r/StableDiffusion

▲ 155 r/StableDiffusion+2 crossposts

Nvidia RTX 2 pass Upscaler (4GB VRAM + 8GB RAM)

Official Link : Nvidia docs

Hi everyone!

Recently, while working on AI videos with the LTX2.3 model, I started thinking a lot about upscaling efficiency, so I made my own RTX Upscale node for ComfyUI.

Most existing ComfyUI workflows use only Video Super Resolution (VSR), but NVIDIA RTX upscaling actually offers four different modes. I implemented all four of them in this node.

After testing it myself, I honestly no longer feel a need to subscribe to Topaz AI.

- DeBlur: The most effective option for sharpening blurry videos, especially AI-generated videos.

- DeNoise: Helps clean up noisy footage. For AI videos, I recommend using it selectively.

- High Bitrate: Good for improving the quality of cleaner source videos.

- Video Super Resolution (VSR): The standard method that was commonly used before.

The main idea I applied is a 2-step upscaling method.

First, DeBlur is used to sharpen the video, and then High Bitrate or VSR is applied as the second pass. In my tests, this produced much better results.
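The ordering described above can be sketched as a plain composition of per-frame passes. This only illustrates the control flow: `sharpen_stub` and `upscale_2x_nearest` are hypothetical stand-ins written in plain Python, not the NVIDIA RTX DeBlur or VSR filters, and none of these names come from the node itself.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]            # one grayscale frame, values in [0, 1]
FramePass = Callable[[Frame], Frame]


def sharpen_stub(frame: Frame) -> Frame:
    """Stand-in for a DeBlur pass: boosts contrast around mid-gray, keeps the size."""
    return [[min(1.0, max(0.0, 0.5 + 1.5 * (v - 0.5))) for v in row] for row in frame]


def upscale_2x_nearest(frame: Frame) -> Frame:
    """Stand-in for a VSR / High Bitrate pass: nearest-neighbour 2x upscale."""
    out: Frame = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]   # duplicate each column
        out.append(wide)
        out.append(list(wide))                      # duplicate each row
    return out


def run_passes(frames: Sequence[Frame], passes: Sequence[FramePass]) -> List[Frame]:
    """Apply every pass, in order, to every frame."""
    result = []
    for frame in frames:
        for apply_pass in passes:
            frame = apply_pass(frame)
        result.append(frame)
    return result


if __name__ == "__main__":
    clip = [[[0.2, 0.8], [0.5, 0.4]]]               # a single 2x2 frame
    upscaled = run_passes(clip, [sharpen_stub, upscale_2x_nearest])
    print(len(upscaled[0][0]), "x", len(upscaled[0]))
```

Because the first pass runs at the source resolution, it touches fewer pixels than it would after the upscale, which is one plausible reason to sharpen first.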

Performance and requirements:

- On an RTX 5090, upscaling a 512x512 video to 1024x1024 takes about 5 seconds.

- For Low RAM / Low VRAM environments, I made a Batch image workflow. With this method, most low-spec systems can usually finish the upscaling within about 1-2 minutes.

- When using the Batch image method, the requirement is around 10GB RAM and 4GB VRAM.
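The batch idea can be illustrated with a small generator that hands frames to the upscaler a few at a time, so only one batch needs to be resident at once. This is a minimal sketch: `iter_batches`, `process_in_batches`, the `upscale_batch` callback, and the batch size are all hypothetical and not taken from the actual workflow.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                        # final, possibly shorter, batch
        yield batch


def process_in_batches(
    frames: Iterable[T],
    upscale_batch: Callable[[List[T]], List[R]],
    batch_size: int,
) -> Iterator[List[R]]:
    """Upscale one batch at a time; the caller can save each result and drop it."""
    for batch in iter_batches(frames, batch_size):
        yield upscale_batch(batch)
```

A caller would typically write each yielded batch to disk before requesting the next one, so peak memory scales with the batch size rather than with the clip length.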

Existing NVIDIA RTX Super Resolution nodes were very difficult to install because the backend setup often caused errors. So I prepared an install_rtx_vfx helper to make the backend installation as close to one-click as possible.

Installation:

  1. Open ComfyUI Manager → Custom Node Manager, then search for deno-custom-nodes and install it.
  2. Important: Completely close ComfyUI before running the installer. If ComfyUI is still running, the installation may not proceed.
  3. Go to ComfyUI/custom_nodes/deno-custom-nodes/tools.
  4. Run install_rtx_vfx.bat → wait for the installation complete message, then close the window. It usually takes about 30 seconds to 1 minute.
  5. Restart ComfyUI and run the Deno RTX Video Super Resolution (2 Pass) node.

For detailed usage, please check the tutorial and workflow links below.

Link : WorkFlow

Link : Tutorial

ㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡ
The DENO RTX Video Super Resolution update is currently being rolled out to ComfyUI Manager / Registry, so it may take a few hours before it appears for everyone. If you want to test it early, please follow the manual installation steps below.

First, completely close ComfyUI. This means closing not only the browser tab, but also the ComfyUI command window, cmd, PowerShell, or any terminal window that is running ComfyUI.

Download the installer from the official DENO GitHub repository:

https://github.com/Deno2026/comfyui-deno-custom-nodes/raw/refs/heads/main/tools/install_rtx_vfx_bat.zip

After downloading the zip file, extract it first. Do not run the .bat file directly from inside the zip file.

After extraction, you will see this file:

install_rtx_vfx.bat

Copy or move this file into the tools folder of your installed DENO custom nodes:

ComfyUI\custom_nodes\deno-custom-nodes\tools\

For example, the final location should look similar to this:

D:\ComfyUI\custom_nodes\deno-custom-nodes\tools\install_rtx_vfx.bat

Important:

Do not run install_rtx_vfx.bat from your Downloads folder. It must be placed inside:

ComfyUI\custom_nodes\deno-custom-nodes\tools\

Once the file is in the correct tools folder, double-click install_rtx_vfx.bat to run it.

If Windows shows a security warning, click “More info” and then “Run anyway.”

When the installer shows the ComfyUI Python path, check that it points to the python_embeded\python.exe used by the ComfyUI you just closed. If the path looks correct, type:

Y

and press Enter.

This installer installs NVIDIA’s official nvidia-vfx Python package from NVIDIA’s official package server, pypi.nvidia.com. It does not download random DLL files.

When you see a green “INSTALL COMPLETE” message or “[OK] NVIDIA RTX VFX is installed,” the installation is complete.

After that, restart ComfyUI and search for:

(Deno) RTX Video Super Resolution

Notes:

- You need an NVIDIA RTX GPU.

- Please use the latest NVIDIA driver.

- macOS is not supported.

- If you do not have the folder ComfyUI\custom_nodes\deno-custom-nodes\tools, please update DENO custom nodes first through ComfyUI Manager or GitHub, then try again.
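Before running the installer, the folder layout described above can be checked with a few lines of Python. This is a hedged sketch, not part of the DENO tooling: `check_layout` and `nvidia_vfx_installed` are hypothetical helper names, and the package check is only meaningful when run with the same Python interpreter that ComfyUI uses.

```python
from importlib import metadata
from pathlib import Path
from typing import Dict


def check_layout(comfy_root: Path) -> Dict[str, bool]:
    """Report whether the tools folder and the installer exist under comfy_root."""
    tools = comfy_root / "custom_nodes" / "deno-custom-nodes" / "tools"
    return {
        "tools_folder": tools.is_dir(),
        "installer_in_tools": (tools / "install_rtx_vfx.bat").is_file(),
    }


def nvidia_vfx_installed() -> bool:
    """True if a distribution named nvidia-vfx is visible to this interpreter."""
    try:
        metadata.version("nvidia-vfx")
        return True
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    root = Path(r"D:\ComfyUI")       # example location from the post; adjust to yours
    for name, ok in check_layout(root).items():
        print(f"{name}: {'ok' if ok else 'missing'}")
    print("nvidia-vfx installed:", nvidia_vfx_installed())
```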

u/Extension-Yard1918 — 4 hours ago

Kijai just uploaded LTX2.3 OmniNFT RL-LoRA for better video and audio!

Reposting this from Twitter (wildminder):

"LTX2.3 OmniNFT RL-LoRA generates high-quality video/audio + visuals and sound are perfectly synchronized, no laggy or mismatched audio.

- realistic Lip-Sync

- action-matched sound

- reduces synchronization errors by 52%

really nice output"

https://reddit.com/link/1thxd1p/video/qvk7394gh52h1/player

The sample above is apparently using LTX2 as a baseline. But obviously Kijai wouldn't have released this LoRA if it weren't compatible with LTX2.3.

Reddit keeps blocking my posts (removed by filters), so I'm editing the links to see if this post will work (just remove the spaces, sorry):

Project page: zghhui . github . io/OmniNFT/

Kijai HF repo: huggingface . co/Kijai/LTX2.3_comfy/tree/main

u/Scriabinical — 11 hours ago

Local I2V finally feels less like image wiggle and more like shot direction with LTX Director

I’ve been experimenting with LTX Director for LTX 2.3, and I think this workflow has a lot of potential.

Local I2V often feels like “make this one image wiggle”: same angle, small motion, maybe blinking or hair movement. But with LTX Director, using multiple images of the same character as key poses/camera angles inside one timeline feels much closer to shot direction or a tiny MV editor.

For this test, I used three source images of the same character with the same outfit/background, but different poses and camera angles. I included the original three images as well, so you can see what LTX Director was working from.

I also added a custom K-pop-style audio track with Custom Audio ON.

After a lot of tuning, it was able to handle:

- multi-image I2V

- smooth pose changes

- camera and face movement between poses

- cute performance gestures

- custom audio timing

- usable lip-sync

It’s still experimental. Hands can break, identity can drift, and transitions need careful prompting. But when the input images are consistent — same character, outfit, background, and style — it becomes much more dynamic than normal single-image I2V.

The most useful prompt idea for me was to treat the images as key poses of the same character, not separate people:

“Treat all images as the same character in different poses and camera angles. Preserve the same face, hairstyle, outfit, and background throughout. Move smoothly between the poses as one continuous close-up performance. Natural lip-sync to the custom audio vocals, clear visible mouth movement, soft blinking, small head tilts, cute gestures, subtle shoulder sway, light hair motion.”

This still needs more testing, but I think LTX Director could be really useful for AI idol clips, character PVs, surreal mascot videos, short music videos, and anything where local video generation needs more than one static angle.

u/Father_hands — 12 hours ago
▲ 22 r/StableDiffusion+1 crossposts

Anima + turbo lora + 2x 5060ti = 4s

I was looking for performance benchmarks for the 5060Ti in a dual-GPU setup with Anima, but didn't find much. Hope this helps anyone looking for similar benchmarks for this specific hardware configuration.

Hardware & Software:

  • GPUs: 2x RTX 5060Ti (OC +250/+2000), each connected over PCIe 4.0 x8
  • Base Model: Anima v1.0 (HF)
  • LoRA: Turbo LoRA (Civitai)
  • Plugin: Raylight (GitHub)

Performance:

Resolution  LoRA  Compile  ulysses  ring  Time (s)
----------  ----  -------  -------  ----  --------
1024x1024   ON    ON       1        2     3.8
1024x1024   ON    OFF      1        2     4.0
1024x1024   OFF   ON       1        2     21.4
1024x1024   OFF   OFF      1        2     23.5
----------  ----  -------  -------  ----  --------
1584x1584   ON    ON       1        2     7.9
1584x1584   ON    OFF      1        2     8.6
1584x1584   OFF   ON       1        2     ERR
1584x1584   OFF   OFF      1        2     ERR
1584x1584   OFF   ON       2        1     60.4
1584x1584   OFF   OFF      2        1     67.7
----------  ----  -------  -------  ----  --------
2048x2048   ON    ON       1        2     13.0
2048x2048   ON    OFF      1        2     14.5
2048x2048   OFF   ON       1        2     85.5
2048x2048   OFF   OFF      1        2     98.0
2048x2048   OFF   ON       2        1     105.1

* There is a typo in the workflow: the compile backend must be inductor, not cudagraphs (which triggers an error).
* The workflow is embedded in the images.
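To make the table easier to read, the relative gains can be computed directly from the reported times. The numbers below are copied from the ulysses=1, ring=2 rows only; the function names are just for illustration.

```python
# Times in seconds from the benchmark table (ulysses=1, ring=2 rows only).
# Key: (resolution, lora_on, compile_on)
times = {
    (1024, True, True): 3.8,
    (1024, True, False): 4.0,
    (1024, False, True): 21.4,
    (1024, False, False): 23.5,
    (1584, True, True): 7.9,
    (1584, True, False): 8.6,
    (2048, True, True): 13.0,
    (2048, True, False): 14.5,
    (2048, False, True): 85.5,
    (2048, False, False): 98.0,
}


def lora_speedup(resolution: int, compile_on: bool) -> float:
    """How many times faster the run is with the Turbo LoRA enabled."""
    return times[(resolution, False, compile_on)] / times[(resolution, True, compile_on)]


def compile_speedup(resolution: int, lora_on: bool) -> float:
    """How many times faster the run is with compile enabled."""
    return times[(resolution, lora_on, False)] / times[(resolution, lora_on, True)]


if __name__ == "__main__":
    for res in (1024, 2048):         # 1584 without LoRA errored in this configuration
        for compile_on in (True, False):
            print(f"{res}px compile={compile_on}: LoRA speedup {lora_speedup(res, compile_on):.2f}x")
    for res in (1024, 1584, 2048):
        print(f"{res}px LoRA on: compile speedup {compile_speedup(res, True):.2f}x")
```

On these figures the Turbo LoRA accounts for roughly a 5.6x to 6.8x reduction in time, while with the LoRA on the uncompiled runs take about 5 to 12 percent longer than the compiled ones.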

u/MagentL — 8 hours ago
▲ 112 r/StableDiffusion+1 crossposts

Update: Characters generator v1.3 - Now with Anima! | Generation of detailed full-body characters

Good afternoon!

This is an update to my character generation workflow.

I was very pleased with the release of Anima-Base. It is quite flexible, knows a lot about characters, handles different styles very well, and its turbo-lora gives quite high-quality results. However, I had to adapt the workflow a little to its behavior in img2img.

It used to be called "Sprite generator" referring to the images of characters from visual novels, but I decided that "Characters generator" would cause less confusion.

What's changed?

- Added the ability to specify margins at the edges of the frame so that the character does not go beyond it.
- Improved tile upscaler using "anima-lllite-inpainting-v2"

Link

u/Ancient-Future6335 — 13 hours ago
▲ 7 r/StableDiffusion+1 crossposts

Character lora tool : GridLoraTester

https://preview.redd.it/7tdi4fa3k52h1.png?width=1828&format=png&auto=webp&s=9b35d7acf7b376c4171e33e0eafdb91b5ed5e1fe

I've been working on this for a few months and it's finally in a state where I think it might be useful to someone other than me. Sharing it here in case you're trying to train character LoRAs on FLUX-2 and you're tired of guessing.

The premise: every time I train a character LoRA, I end up stuck on two questions.

  1. Is my dataset actually balanced and identity-consistent, or am I just hoping?
  2. Once trained, which step actually holds likeness across the whole prompt sweep — not just the one flattering close-up?

GridLoraTester answers both with numbers from face-recognition scores. It's split in two surfaces; you can use either independently.

Dataset curation

  • Face recognition (ArcFace via InsightFace buffalo_l) gives every photo a similarity score against a per-dataset centroid (mean of all detected faces). Off-identity photos surface immediately.
  • Pose × framing classifier (front / ¾ / profile × close-up / medium / wide / extreme). A dataset-health checklist tells you what's balanced and what's under-represented vs published portrait-dataset targets.
  • Prune candidates when you're over a max size — most-redundant photos within over-represented buckets, ranked by k=3 nearest in-bucket cosine. Soft delete, fully reversible.
  • External-photo suggestions — link Immich / Google Photos / a local folder, and the engine mines that library for photos that fit the dataset's identity AND fill an under-rep bucket. Pose-tempered scoring so profile shots aren't penalised. Dedup runs both vs the existing dataset AND across the suggestions themselves, so the same photo on Immich + Google Photos collapses to one suggestion.
  • BlockHash 256-bit near-duplicate detection (10-bit Hamming threshold) underneath all of the above.
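The two scoring ideas in this list, similarity to a dataset centroid and a Hamming threshold on perceptual hashes, can be sketched in plain Python. This is an illustration under stated assumptions rather than GridLoraTester's code: the embeddings are assumed to be L2-normalised before averaging, the threshold is assumed to be inclusive, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def centroid(embeddings: Sequence[Vector]) -> List[float]:
    """Mean of the unit-length embeddings, re-normalised (an assumption)."""
    unit = [normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)


def similarity_to_centroid(embedding: Vector, center: Vector) -> float:
    """Cosine similarity between one embedding and a unit-length centroid."""
    return sum(a * b for a, b in zip(normalize(embedding), center))


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of differing bits between two hashes stored as integers."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(hash_a: int, hash_b: int, threshold: int = 10) -> bool:
    """Treat hashes within `threshold` differing bits as near-duplicates."""
    return hamming_distance(hash_a, hash_b) <= threshold
```

In practice the embeddings would come from a face-recognition model and the hashes from a 256-bit BlockHash; here both are just numbers supplied by the caller.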

Grid testing

  • One row per checkpoint × one column per prompt, same seed across the grid for fair comparison.
  • Every cell scored against the dataset centroid: green ≥ 0.50 / amber ≥ 0.35 / red < 0.35.
  • Per-prompt aspect ratio via [3:4] / [16:9] prefixes; resolution comes from a single MP budget. [trigger] placeholder substituted automatically.
  • Run history per test — flip between runs to compare quant changes, training continuation, or rescore a past run against an updated centroid without regenerating anything.
  • Score-vs-step graph (median / p20 / max). Useful for picking the checkpoint where p20 (consistency) catches up with median (peak) instead of just chasing the spikes.
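The colour bands and the median / p20 / max summary can be sketched as follows. The thresholds come from the list above; everything else is an assumption for illustration: the percentile uses linear interpolation (one of several common definitions), and `most_consistent_step` is only one possible reading of "p20 catches up with median".

```python
import statistics
from typing import Dict, Sequence


def color_for(score: float) -> str:
    """Map a similarity score to the traffic-light bands listed above."""
    if score >= 0.50:
        return "green"
    if score >= 0.35:
        return "amber"
    return "red"


def percentile(values: Sequence[float], pct: float) -> float:
    """Percentile by linear interpolation between sorted neighbours."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def summarize(scores_by_step: Dict[int, Sequence[float]]) -> Dict[int, Dict[str, float]]:
    """Median, 20th percentile, and maximum score for each checkpoint step."""
    return {
        step: {
            "median": statistics.median(scores),
            "p20": percentile(scores, 20),
            "max": max(scores),
        }
        for step, scores in scores_by_step.items()
    }


def most_consistent_step(summary: Dict[int, Dict[str, float]]) -> int:
    """Step with the smallest median-to-p20 gap (ties go to the earlier step)."""
    return min(sorted(summary), key=lambda s: summary[s]["median"] - summary[s]["p20"])


if __name__ == "__main__":
    # Hypothetical scores for two checkpoints across three prompts.
    runs = {1000: [0.20, 0.45, 0.60], 2000: [0.48, 0.52, 0.55]}
    summary = summarize(runs)
    print(summary)
    print("most consistent:", most_consistent_step(summary))
```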

Tech bits, in case you care

  • FLUX-2 Klein via diffusers; FP8 / FP8 dynamic / bf16 / INT8 ConvRot quant paths. INT8 ConvRot uses Hadamard rotation + torch._int_mm cuBLASLt → ~2× faster denoise than FP8 weight-only on Ampere (3090/3080), same VRAM (~9 GB transformer for Klein 9B). LoRA bake-in via Tensor.data.copy_() preserves Parameter identity so torch.compile survives swaps.
  • Prompt-embedding cache in SQLite. After encoding, Qwen3 text encoder is fully unloaded (del + gc + empty_cache()) so it doesn't squat VRAM during the denoise + VAE.
  • Per-shape batching in the grid loop — mixed AR rows don't crash batched inference; prompts grouped by (w, h) before each pipe() call.
  • Dashboard is SvelteKit + better-sqlite3 in WAL mode. Python writes back to the same DB the dashboard reads — no IPC marshalling, just shared SQLite.
  • Idle-TTL on the face worker frees the ORT BFC arena (~5–6 GB) when not in use; lazy-respawn on next request.
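The per-prompt aspect-ratio prefix and the per-shape batching can be illustrated together. This is a sketch under several assumptions that the post does not confirm: the prefix is read as width:height, one megapixel is taken as 1,000,000 pixels, dimensions are snapped to a multiple of 16, and all function names are hypothetical.

```python
import math
import re
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

AR_PREFIX = re.compile(r"^\s*\[(\d+):(\d+)\]\s*")


def parse_prompt(prompt: str, default_ar: Tuple[int, int] = (1, 1)) -> Tuple[Tuple[int, int], str]:
    """Split an optional "[W:H]" prefix from the prompt text."""
    match = AR_PREFIX.match(prompt)
    if not match:
        return default_ar, prompt.strip()
    return (int(match.group(1)), int(match.group(2))), prompt[match.end():].strip()


def shape_for(ar: Tuple[int, int], megapixels: float, multiple: int = 16) -> Tuple[int, int]:
    """Width and height for an aspect ratio under a fixed pixel budget."""
    ar_w, ar_h = ar
    height = math.sqrt(megapixels * 1_000_000 * ar_h / ar_w)
    width = height * ar_w / ar_h

    def snap(value: float) -> int:
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


def group_by_shape(prompts: Sequence[str], megapixels: float, trigger: str) -> Dict[Tuple[int, int], List[str]]:
    """Group prompts by (width, height) so each group can be batched together."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for raw in prompts:
        ar, text = parse_prompt(raw)
        groups[shape_for(ar, megapixels)].append(text.replace("[trigger]", trigger))
    return dict(groups)
```

Each resulting group holds prompts that share one output shape, so a batched pipeline call per group never mixes resolutions.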

What it isn't

  • Not a trainer. It eats the LoRA folder your trainer (ai-toolkit, etc.) already produces.
  • FLUX-2 only right now. The pipeline-load code is reasonably isolated; FLUX-1 / SD3 / Wan2.2 aren't out of the question if there's demand.
  • NVIDIA + ≥ 24 GB VRAM. Linux is the tested path; the dashboard runs on macOS/Windows but the inference side wants Linux + CUDA.

License

Source-available under PolyForm Noncommercial 1.0.0 — free for personal / hobby / research / education. Commercial use is a separate paid license (details in LICENSE). MIT was too permissive for the niche; PolyForm cleanly splits "free for everyone learning" from "paid if you're shipping a product on top".

Repo

https://github.com/Mandrakia/GridLoraTester

Bug reports and PRs welcome. Particularly interested in feedback on the suggestion engine's bucket-targeting heuristic and the grid-test sort UX — those are the two surfaces where my own preferences leak into the defaults most.

Screenshots

Dataset list, Dataset details, Dataset stats, Dataset edit: Prune, Dataset edit: Suggestions, Test setup, Test grid result, Test graph result

u/Mandrakia — 10 hours ago

Anyone using LTX Desktop?

Hey guys, I have tried LTX Desktop and it is really fast. It generated a 10-second 720p 9:16 video in 2-3 minutes at most.

I want to know if anyone else is using it, as I'd like to do more with it.

u/Critical-Team736 — 21 hours ago
▲ 1.5k r/StableDiffusion+1 crossposts

Someone posted a real Monet to twitter but said it was AI generated. The replies are amazing, pretentious and confidently wrong

u/Jenna_AI — 1 day ago
▲ 159 r/StableDiffusion+1 crossposts

HY World + Sharp, 360 Panorama Gaussian Splat

I was trying to get HY World 2.0 / WorldMirror v2 and Sharp to work together in order to create something where a room could be explored. This is about as far as I got. It's still missing something. *Scale button doesn't work with HY World nodes*. But yeah, scaling the splat could help. Also, moving the camera really sucks, but I think that's because the scale of the actual full splat isn't being loaded properly, and I need to figure that out, either through the nodes available or by creating my own (which would be hard af for me, not being a coder). If anyone has ideas, maybe I could throw a sheet together to see if Gemini can craft something. But regardless of all that, it's nice to finally get a panorama working and viewable in 360.

u/DJBFilmz — 1 day ago
▲ 101 r/StableDiffusion+1 crossposts

Full head swap model that keeps facial features strong and matches the head size of the target

Hey guys, I hope everyone is having a great day.
I'm currently working on a project where I need to swap the entire head between two images.
I have tried all sorts of models, both open source and commercial, and always get stuck between two priorities: when one is fulfilled, the other isn't.

The first priority is that the facial features should be strong enough that the person is clearly recognizable as the source.

The second (which is where most commercial models fail) is that the head should be resized to match the target.

The third (not a strong priority, more of a semi-priority) is adaptation of body color or style, for example changing the body color slightly to match the head color of the source.

There are other things, like copying facial emotions and head position from the target, but these are not priorities. For commercial models, I think I have tried every possible model out there.
For open-source models, I have tried BFS with Qwen, basically everything in this repo: https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap. It worked well for matching the target's head size, but the facial expressions got very weak.
Is there a workflow that fulfills these priorities well, even if it requires large models?

▲ 71 r/StableDiffusion+1 crossposts

LumiPic: Oumoumad's (LTX lora fame) SDR->HDR conversion LoRAs for Qwen, soon Klein Base 4 & 9

>LumiPic — Single-Image SDR to HDR LoRA

>Converts standard dynamic range (SDR) images to high dynamic range (HDR) EXR files — float-valued, with range well beyond what an 8-bit SDR output can carry.

Released weeks ago, surprised no one posted about it.

Even if your target use case is not HDR, the extra range can help with exposure and color edits if you want to post-edit your images.
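A tiny numeric example shows why the extra range helps with exposure edits. The highlight values are hypothetical scene-linear numbers, the transfer curve (gamma) is ignored for simplicity, and nothing here is taken from LumiPic itself.

```python
from typing import List


def expose(linear_value: float, stops: float) -> float:
    """Change exposure by a number of stops (each stop doubles or halves light)."""
    return linear_value * (2.0 ** stops)


def to_sdr_8bit(linear_value: float) -> int:
    """Clip to [0, 1] and quantise to 8 bits, as an SDR file would store it."""
    clipped = min(1.0, max(0.0, linear_value))
    return round(clipped * 255)


# Three highlights of increasing brightness; two exceed the SDR maximum of 1.0.
highlights: List[float] = [1.0, 2.0, 4.0]

# Stored as float HDR, the values survive, so darkening by 2 stops separates them.
hdr_darkened = [expose(v, -2) for v in highlights]

# Stored as 8-bit SDR, all three clip to 255 first, so darkening cannot separate them.
sdr_stored = [to_sdr_8bit(v) / 255 for v in highlights]
sdr_darkened = [expose(v, -2) for v in sdr_stored]

if __name__ == "__main__":
    print("HDR after -2 stops:", hdr_darkened)
    print("SDR after -2 stops:", sdr_darkened)
```

The float version keeps three distinct highlight levels after the edit, while the clipped 8-bit version ends up with a single flat value.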

ComfyUI workflows in the files tab.

Edit: video https://www.youtube.com/watch?v=z0ue28hbMTk

huggingface.co
u/tomByrer — 1 day ago
▲ 184 r/StableDiffusion+1 crossposts

How to use LTX Director - A Free Tool for Creating Advanced LTX 2.3 Videos in ComfyUI

Just finished the first tutorial for LTX Director.

It covers how to set up the node and has multiple examples of how to use all of the node's main features. Hopefully it helps!

youtu.be
u/WhatDreamsCost — 1 day ago

are these models outdated?

So I haven't used SD since 2024, and I'm doing some file cleaning/updating.
Are any of these models safe to delete and update?
If so, which new/updated models should I replace them with?
Thanks!

u/baejohnd — 1 day ago
▲ 78 r/StableDiffusion+3 crossposts

I built a Colab notebook that does facial expression copying using LivePortrait. You load a source image (contains a single face with any expression) and a target image (contains a single face whose expression is to be changed), adjust blend sliders, and it transfers the expression while preserving identity.

The notebook replaces LivePortrait's use of InsightFace for face detection with MediaPipe, so the entire pipeline is commercially permissive (MIT + Apache 2.0). It runs on a free Colab T4 GPU.

What it does: expression blend and head rotation blend with adjustable sliders, 512×512 upsampled output.

This is a demo for Face2FaceAI, an Android app I'm building that adds face reinsertion, asymmetry correction, template expressions, and other features — all running on-device. More at face2faceai.com.

The example shows before/after expression swap with face reinsertion (app feature)

Open in Colab | GitHub repo

Feedback welcome — this is my first public release.

u/coolt00nz — 1 day ago
▲ 10 r/StableDiffusion+1 crossposts

Built a workflow to clone any fashion video with your products. DM for early access

I have tested it on 100+ videos across categories like:

  1. Get Ready With Me (GRWM),
  2. OOTD,
  3. Lookbook videos

It still fails in 10-20% of cases. Looking for early users to share feedback!

u/kinraw — 1 day ago

Does anyone have any information on when Anima-Turbo will be released?

Hi friends.

While checking the latest version of Anima, I saw a message saying that Anima-Turbo will be released soon.

But does anyone know how long it might take? Does training a faster Turbo version of a base model usually take a long time?

I'm asking out of ignorance, because I'd like to know whether models normally take many months to be released or not.

u/Hi7u7 — 1 day ago