r/comfyui

▲ 177 r/comfyui+2 crossposts

Nvidia RTX 2 pass Upscaler (4GB VRAM + 8GB RAM)

Official Link : Nvidia docs

Hi everyone!

Recently, while working on AI videos with the LTX2.3 model, I started thinking a lot about upscaling efficiency, so I made my own RTX Upscale node for ComfyUI.

In the existing ComfyUI setup, most workflows mainly used Video Super Resolution (VSR), but NVIDIA RTX upscaling actually has four different options. I implemented all four of them in this node.

After testing it myself, I honestly no longer feel a need to subscribe to Topaz AI.

- DeBlur: The most effective option for sharpening blurry videos, especially AI-generated videos.

- DeNoise: Helps clean up noisy footage. For AI videos, I recommend using it selectively.

- High Bitrate: Good for improving the quality of cleaner source videos.

- Video Super Resolution (VSR): The standard method that was commonly used before.

The main idea I applied is a 2-step upscaling method.

First, DeBlur is used to sharpen the video, and then High Bitrate or VSR is applied as the second pass. In my tests, this produced much better results.

Performance and requirements:

- On an RTX 5090, upscaling a 512x512 video to 1024x1024 takes about 5 seconds.

- For Low RAM / Low VRAM environments, I made a Batch image workflow. With this method, most low-spec systems can usually finish the upscaling within about 1-2 minutes.

- When using the Batch image method, the requirement is around 10GB RAM and 4GB VRAM.
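
A minimal sketch of the batch idea above, assuming each pass (for example DeBlur, then High Bitrate or VSR) can be called as a function on a small list of frames. The names `iter_batches`, `upscale_in_batches`, and the pass callables are hypothetical and are not the node's actual API. The point is only that peak memory is bounded by `batch_size` instead of by the full clip.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(frames: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size frames."""
    for start in range(0, len(frames), batch_size):
        yield frames[start:start + batch_size]


def upscale_in_batches(
    frames: Sequence,
    passes: Sequence[Callable[[Sequence], List]],
    batch_size: int = 8,
) -> List:
    """Run every pass, in order, on one batch at a time and collect the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List = []
    for batch in iter_batches(frames, batch_size):
        for apply_pass in passes:  # e.g. first DeBlur, then VSR (hypothetical callables)
            batch = apply_pass(batch)
        results.extend(batch)
    return results


if __name__ == "__main__":
    # Stand-in passes that only tag frame names, so the flow is visible.
    deblur = lambda batch: [f"{frame}+deblur" for frame in batch]
    vsr = lambda batch: [f"{frame}+vsr" for frame in batch]
    print(upscale_in_batches(["f0", "f1", "f2"], [deblur, vsr], batch_size=2))
```

A smaller `batch_size` trades speed for lower RAM/VRAM use, which matches the low-spec trade-off described above.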

Existing NVIDIA RTX Super Resolution nodes were very difficult to install because the backend setup often caused errors. So I prepared an install_rtx_vfx helper to make the backend installation as close to one-click as possible.

Installation:

  1. Open ComfyUI Manager → Custom Node Manager, then search for deno-custom-nodes and install it.
  2. Important: Completely close ComfyUI before running the installer. If ComfyUI is still running, the installation may not proceed.
  3. Go to ComfyUI/custom_nodes/deno-custom-nodes/tools.
  4. Run install_rtx_vfx.bat → wait for the installation complete message, then close the window. It usually takes about 30 seconds to 1 minute.
  5. Restart ComfyUI and run the Deno RTX Video Super Resolution (2 Pass) node.

For detailed usage, please check the tutorial and workflow links below.

Link : WorkFlow

Link : Tutorial

ㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡ
The DENO RTX Video Super Resolution update is currently being rolled out to ComfyUI Manager / Registry, so it may take a few hours before it appears for everyone. If you want to test it early, please follow the manual installation steps below.

First, completely close ComfyUI. This means closing not only the browser tab, but also the ComfyUI command window, cmd, PowerShell, or any terminal window that is running ComfyUI.

Download the installer from the official DENO GitHub repository:

https://github.com/Deno2026/comfyui-deno-custom-nodes/raw/refs/heads/main/tools/install_rtx_vfx_bat.zip

After downloading the zip file, extract it first. Do not run the .bat file directly from inside the zip file.

After extraction, you will see this file:

install_rtx_vfx.bat

Copy or move this file into the tools folder of your installed DENO custom nodes:

ComfyUI\custom_nodes\deno-custom-nodes\tools\

For example, the final location should look similar to this:

D:\ComfyUI\custom_nodes\deno-custom-nodes\tools\install_rtx_vfx.bat

Important:

Do not run install_rtx_vfx.bat from your Downloads folder. It must be placed inside:

ComfyUI\custom_nodes\deno-custom-nodes\tools\

Once the file is in the correct tools folder, double-click install_rtx_vfx.bat to run it.

If Windows shows a security warning, click “More info” and then “Run anyway.”

When the installer shows the ComfyUI Python path, check that it points to the python_embeded\python.exe used by the ComfyUI you just closed. If the path looks correct, type:

Y

and press Enter.

This installer installs NVIDIA’s official nvidia-vfx Python package from NVIDIA’s official package server, pypi.nvidia.com. It does not download random DLL files.

When you see a green “INSTALL COMPLETE” message or “[OK] NVIDIA RTX VFX is installed,” the installation is complete.

After that, restart ComfyUI and search for:

(Deno) RTX Video Super Resolution

Notes:

- You need an NVIDIA RTX GPU.

- Please use the latest NVIDIA driver.

- macOS is not supported.

- If you do not have the folder ComfyUI\custom_nodes\deno-custom-nodes\tools, please update DENO custom nodes first through ComfyUI Manager or GitHub, then try again.

u/Extension-Yard1918 — 6 hours ago
▲ 4 r/comfyui+1 crossposts

How do I create an 85% to 95% LoRA of a complex character?

Character (synthetic IG persona, fully-locked identity):

~20yo athletic white European woman, platinum-blonde hair with mint-green tips

2 facial piercings (vertical L-brow barbell + horizontal bridge barbell)

Blackwork tattoos: tree-branch on neck/chest + cracked-pattern full sleeves both arms

5 silver rings (consistent count), matte-black nails

Edgy / punk / skate vibe

Setup that I'm using at the moment:

Qwen-Image (20B) via ai-toolkit (Ostris), uint3 quantized + accuracy-recovery adapter, on a 24GB 3090

87 training images, all generated via ChatGPT Images 2 for cross-image consistency (no real photos exist):

74 bare-arm (tattoos + rings visible)

13 covered-outfit (jackets / sleeves / gloves) with num_repeats: 2 → ~26% effective, to teach conditional coverage so prompting "wearing a leather jacket" actually hides the tattoos

Captions: JoyCaption Beta One → manual cleaning → 2 multi-agent verification rounds (38 corrections total)

Caption strategy: omit invariant identity features (hair color, piercings, eye color) so they bind to the trigger word; caption everything that varies (pose, framing, hair state, coverage status, rings-visible vs no-rings, gloves vs no-gloves)

Hyperparams: rank 32 / alpha 16, LR 1e-4, 3000 steps, adamw8bit, flowmatch, multi-res [512, 768, 1024], grad checkpointing, no TE training, caption dropout 0.05
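
For anyone checking the "~26% effective" figure for the covered-outfit images: a quick sketch, assuming `num_repeats` simply multiplies how often each image is seen per epoch. The helper name `effective_share` is made up for illustration and is not part of ai-toolkit.

```python
def effective_share(counts: dict, repeats: dict) -> dict:
    """Return each subset's share of samples after applying per-subset repeats."""
    weighted = {name: counts[name] * repeats.get(name, 1) for name in counts}
    total = sum(weighted.values())
    return {name: weight / total for name, weight in weighted.items()}


if __name__ == "__main__":
    shares = effective_share(
        counts={"bare_arm": 74, "covered": 13},
        repeats={"covered": 2},  # bare_arm defaults to 1 repeat
    )
    # 13 * 2 = 26 covered samples out of 74 + 26 = 100 total
    for name, share in shares.items():
        print(f"{name}: {share:.0%}")
```

Without the repeats, the covered subset would be 13 of 87 images, or about 15%.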

Mid-training (step 1750 / 3000) results:

✅ Tattoos lock fast and consistently across all prompts

✅ Trigger binding clean: prompts without the trigger generate a random woman, not her

⚠️ Face identity inconsistent — best when the prompt has contextual anchors (jacket + backwards cap); drifts on plain "tank top + grey studio"

❌ Piercings often missing or distorted (the main worry)

⚠️ Mild hair-color leak to non-trigger prompts (cosmetic only — face does NOT leak)

Questions:

Is "leave invariant fine details uncaptioned" actually the wrong call for piercings? Should I caption them explicitly even if it costs the auto-trigger-binding?

Is uint3 quantization the bottleneck on fine details like piercings? Worth retraining at fp8 with CPU offload despite the speed hit?

Is 87 images the floor for a character this feature-loaded — do you really need 150+?

Higher rank (64+) for fine-detail capture, or does that just overfit at this dataset size?

Hard-coupled features (tattoos + rings + piercings always present together) — is one LoRA correct, or would stacked / decomposed LoRAs work better here?

Better captioner than JoyCaption Beta One for this kind of fine detail?

Anything obvious I'm doing wrong?

Thanks in advance guys :)

(All the images I'm uploading are consistent and come from GPT Images 2.)

https://preview.redd.it/697181dvg82h1.png?width=1122&format=png&auto=webp&s=d73c3932b0eebf5f23d0bf8dfcc680479d68de45

https://preview.redd.it/bsdie1dvg82h1.png?width=1122&format=png&auto=webp&s=d626161840fd609b21230d1ada8f08d805c282e6

https://preview.redd.it/lfpxv1dvg82h1.png?width=1122&format=png&auto=webp&s=e3f0419c44b62dd75bc4007dda29b80ad6b5191d

https://preview.redd.it/jc4n22dvg82h1.png?width=1122&format=png&auto=webp&s=6853cceab81ac87e1c551d9206fd6deca09a3867

https://preview.redd.it/udseq1dvg82h1.png?width=1122&format=png&auto=webp&s=fdc5b1d1275b4d96067319d2f2e307efd7d13ad9

https://preview.redd.it/mlfsy1dvg82h1.png?width=1122&format=png&auto=webp&s=a08fea65f336f942390a5b2246828b6f4a6193dc

https://preview.redd.it/moe5q2dvg82h1.png?width=1122&format=png&auto=webp&s=878b364380ce19f68978c2c33055ec9863d87aa1

u/ren_cross — 2 hours ago
▲ 24 r/comfyui+1 crossposts

Anima + turbo lora + 2x 5060ti = 4s

I was looking for performance benchmarks for the 5060Ti in a dual-GPU setup with Anima, but didn't find much. Hope this helps anyone looking for similar benchmarks for this specific hardware configuration.

Hardware & Software:

  • GPUs: 2x RTX 5060Ti (OC +250/+2000), each connected over PCIe 4.0 x8
  • Base Model: Anima v1.0 (HF)
  • LoRA: Turbo LoRA (Civitai)
  • Plugin: Raylight (GitHub)

Performance:

Resolution  LoRA  Compile  Ulysses  Ring  Time (s)
1024x1024   ON    ON       1        2     3.8
1024x1024   ON    OFF      1        2     4.0
1024x1024   OFF   ON       1        2     21.4
1024x1024   OFF   OFF      1        2     23.5
----------  ----  -------  -------  ----  --------
1584x1584   ON    ON       1        2     7.9
1584x1584   ON    OFF      1        2     8.6
1584x1584   OFF   ON       1        2     ERR
1584x1584   OFF   OFF      1        2     ERR
1584x1584   OFF   ON       2        1     60.4
1584x1584   OFF   OFF      2        1     67.7
----------  ----  -------  -------  ----  --------
2048x2048   ON    ON       1        2     13.0
2048x2048   ON    OFF      1        2     14.5
2048x2048   OFF   ON       1        2     85.5
2048x2048   OFF   OFF      1        2     98.0
2048x2048   OFF   ON       2        1     105.1

* Typo in the workflow: the compile backend must be inductor, not cudagraphs (cudagraphs triggers an error).
* The workflow is embedded in the images.
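
To put the table in ratio form, here is a small sketch that divides the LoRA-off time by the LoRA-on time for matching rows (ulysses 1, ring 2). The times are copied from the table; the helper `time_ratio` is hypothetical, and the ratio says nothing about why the times differ.

```python
# (resolution, compile) -> (seconds with LoRA ON, seconds with LoRA OFF)
TIMINGS = {
    ("1024x1024", "ON"): (3.8, 21.4),
    ("1024x1024", "OFF"): (4.0, 23.5),
    ("2048x2048", "ON"): (13.0, 85.5),
    ("2048x2048", "OFF"): (14.5, 98.0),
}


def time_ratio(lora_on_s: float, lora_off_s: float) -> float:
    """How many times longer the LoRA-off run took than the LoRA-on run."""
    return lora_off_s / lora_on_s


if __name__ == "__main__":
    for (resolution, compile_flag), (on_s, off_s) in TIMINGS.items():
        ratio = time_ratio(on_s, off_s)
        print(f"{resolution} compile={compile_flag}: {ratio:.1f}x")
```

The 1584x1584 rows are left out because the LoRA-off runs with ulysses 1 / ring 2 errored, so there is no like-for-like pair.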

u/MagentL — 10 hours ago
▲ 125 r/comfyui+1 crossposts

Update Characters generator - v1.3 Now with Anima! | Generation of detailed full-body characters

Good afternoon!

This is an update to my character generation workflow.

I was very pleased with the release of Anima-Base. It is quite flexible, has a lot of knowledge about characters, and generates different styles perfectly, and its turbo-lora gives quite high-quality results. However, I had to adjust a little to its behavior in img2img.

It used to be called "Sprite generator" referring to the images of characters from visual novels, but I decided that "Characters generator" would cause less confusion.

What's changed?

- Added the ability to set margins at the edges of the frame so that the character does not extend beyond it.
- Improved tile upscaler using "anima-lllite-inpainting-v2"

Link

u/Ancient-Future6335 — 15 hours ago

I've worked to optimize this workflow and add Ollama to help with Prompts!

I've worked (I was going to say hard, but it was mostly time) on making the stock Flux.2 workflow better optimized for my RTX 3080 12GB GPU. This setup uses 2x Ollama runs to optimize the prompt generation, and a different Flux.2 Klein model in a GGUF format.

Running 1 pass like this takes about 1 1/2 minutes for the prompt execution plus the image generation. It's about 1 minute for just the image gen, if you get a prompt you like and just re-use that.

Here's the Google drive link: https://drive.google.com/file/d/17HxoWFYnvkXoOmFziuacttjjd5LeKHk3/view?usp=drive_link

The custom nodes I'm using are:

RGThree-Comfy

comfyui-Ollama

ComfyUI-KJNodes

Comfyui-Memory_Cleanup

And then in Ollama (I'm on Windows, so it's a separate app) I'm using the gemma4:e4b model since it's very good at creative writing and image detection.
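
For anyone curious what the two Ollama runs could look like outside of ComfyUI, here is a minimal sketch against Ollama's local REST endpoint (`/api/generate` on port 11434). The two instruction strings and the function names are assumptions for illustration; the actual workflow does this through the comfyui-Ollama nodes, and its prompts may differ.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def ollama_generate(model: str, prompt: str, url: str = OLLAMA_URL) -> str:
    """Send one non-streaming generate request and return the response text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def two_pass_prompt(idea: str, model: str = "gemma4:e4b", generate=ollama_generate) -> str:
    """Pass 1 expands the idea; pass 2 tightens the draft (hypothetical instructions)."""
    draft = generate(model, f"Expand this idea into a detailed image prompt: {idea}")
    return generate(model, f"Tighten this image prompt, keeping only visual details: {draft}")


if __name__ == "__main__":
    # Requires a running Ollama instance with the model already pulled.
    print(two_pass_prompt("a lighthouse in a storm"))
```

Keeping `generate` as a parameter makes it easy to swap in a different backend or to re-use a prompt you like without calling the LLM again.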

Let me know what you guys think!

u/MakionGarvinus — 7 hours ago
▲ 241 r/comfyui+5 crossposts

I cracked the time-freeze cinematic trick — one selfie + Seedance 2.0 reference-to-video = a 15s "snap → frozen world → snap" hero clip with native sound design ❄️ 🎬✨

I am using https://muapi.ai, which has the most powerful Seedance 2 with realistic-face support, along with this Claude skill: https://github.com/SamurAIGPT/Generative-Media-Skills/blob/main/library/motion/freeze-effect-video/SKILL.md

After about 40 failed runs, I finally cracked the "Quicksilver / Zack Snyder time-stop" effect in pure AI: the one where the character snaps their fingers, the world freezes mid-explosion (beer droplets hanging in midair, popcorn floating, people locked mid-cheer), they stroll through the frozen scene, snap again, and reality slams back to life.

Standard image-to-video completely fumbles this. Either (a) the whole shot freezes including the protagonist so nothing happens, (b) you get this jittery half-motion glitch where the "frozen" extras are doing weird micro-twitches that scream AI, or (c) the model just ignores you and renders a normal bar scene with vibes. 15 seconds of "one person moves, 47 other people don't, but the scene still feels alive" is too many physics-violating instructions for a single vague i2v prompt to hold together.

The fix turned out to be three layered tricks that the freeze-effect-video skill bakes in by default.

The Winning Workflow:

Step 1: bytedance-seedance-2-0-reference-to-video-fast takes ONE reference photo of the subject (the only person who'll actually move) as @Image1. That identity anchor is what survives the full 15s without face drift, and crucially it tells the model "everyone else in frame is not @Image1, therefore freeze them." The selfie does double duty as casting and as a hard masking signal.

Step 2: Time-segmented director brief with FIVE explicit beats, hard timecoded:

- [0:00–0:03] Sports bar packed, blurred TVs showing a championship celebration, subject walks confidently through the chaos and snaps their fingers

- [0:03–0:06] A spherical shockwave bursts from the fingertips, air distortion + light refraction rippling outward, EVERYTHING freezes: golden arcs of beer suspended midair, popcorn floating, neon catching dust and liquid, absolute silence

- [0:06–0:09] Only @Image1 moves. Soft echoing footsteps. Camera tracks backward as they duck under a suspended arc of beer and pluck a single floating popcorn kernel from the air

- [0:09–0:11] They stop in front of a frozen fan locked mid-scream, mid-high-five, tilt their head, adjust the brim of their cap, whisper "perfect"

- [0:11–0:15] Snap again, reverse shockwave ripples outward, motion explodes back: beer splashes, cheers return, people land mid-jump, camera pushes through the celebrating crowd, fade to black

Step 3: The load-bearing trick most people skip is an explicit Sound Design line at the bottom of the prompt: "deafening bar celebration → snap → deep shockwave bass drop → absolute silence → footsteps → sharp popcorn crunch → 'perfect' → snap → reverse shockwave → deafening celebration returns." Seedance 2.0 generates audio natively, and if you omit this, the model fills the silent freeze section with random ambient noise that completely murders the effect.
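
The timecoded brief plus the closing Sound Design line can be assembled programmatically. The following is only a sketch of that prompt layout, not code from the freeze-effect-video skill: `Beat`, `build_brief`, and the exact output format are assumptions modeled on the beats listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Beat:
    start: int   # seconds from the start of the clip
    end: int     # seconds from the start of the clip
    action: str  # what happens during this beat


def timecode(seconds: int) -> str:
    """Format whole seconds as M:SS, e.g. 9 -> '0:09'."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def build_brief(beats: List[Beat], sound_design: str) -> str:
    """Join contiguous beats into a timecoded brief that ends with a Sound Design line."""
    for previous, current in zip(beats, beats[1:]):
        if current.start != previous.end:
            raise ValueError(f"gap or overlap at {timecode(current.start)}")
    lines = [f"[{timecode(b.start)}\u2013{timecode(b.end)}] {b.action}" for b in beats]
    lines.append(f"Sound Design: {sound_design}")
    return "\n".join(lines)


if __name__ == "__main__":
    beats = [
        Beat(0, 3, "Subject walks through the packed sports bar and snaps their fingers"),
        Beat(3, 6, "Shockwave bursts outward, EVERYTHING freezes, absolute silence"),
        Beat(6, 9, "Only @Image1 moves, soft echoing footsteps"),
        Beat(9, 11, "They stop in front of a frozen fan and whisper 'perfect'"),
        Beat(11, 15, "Snap again, reverse shockwave, motion explodes back"),
    ]
    print(build_brief(beats, "celebration -> snap -> bass drop -> silence -> snap -> celebration"))
```

The contiguity check is there because a gap between beats leaves part of the 15 seconds undirected.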

The crazy part: I expected to have to comp the bass-drop and the dead-air myself in DaVinci with a separate foley pass. Nope. Seedance writes the silence into the timeline at the exact frame the shockwave hits. The cheer cuts off mid-syllable. The popcorn crunch is on a clean track. The reverse-snap re-explodes the crowd noise. It just shows up correct.

Side by side it's not even close: generic "snap fingers time stops" i2v gives you something that looks like a video buffering bug by second 4. The freeze-effect skill version genuinely looks like a 15s hero shot pulled from a superhero teaser.

And it's not just bars. Swap the scene in the skill: a frozen wedding reception with rice and confetti hanging in midair, freeze-walking through a nightclub at peak drop, a stadium frozen during the championship goal with foam suspended above the crowd, a busy NYC crosswalk with cabs caught mid-honk, a paintball arena with pellets hanging in midair. The five-beat snap → freeze → walk → snap → resume structure holds for any high-energy crowd scene where the contrast between chaos and absolute stillness carries the shot. I think this is currently one of the strongest pipelines for hero-character cinematic moments where you need a physics-violating effect to read as intentional instead of as an AI artifact.

Highly recommend the open-source Freeze Effect Video skill: it ships with the 5-beat director brief, the shockwave/reverse-shockwave symmetry, the "only @Image1 moves" identity lock, and the native sound-design arc baked in. Drop in any selfie, change the venue, ship it.

Who else is making time-stop or bullet-time style hero clips with this stack? Drop your best freeze moments, snap-and-stop scenes, or wildest "everyone but me is paused" experiments below 👇

Let's see who can freeze the wildest scene! ❄️ 🎬⏸️

u/Individual_Hand213 — 17 hours ago
▲ 28 r/comfyui+1 crossposts

"Mossy path" - revisited by monocular stereoscopy

The first displayed image is one from a stereoscopic pair published by stubeans a few hours before this post.

The second image is a side-by-side stereo pair derived from the single image by using the 3D_SBS python tool in a ComfyUI workflow (all the software open-source and freely available for use offline).

The point of this exercise is to demonstrate that an ordinary 2D photograph of a 3D scene contains the necessary information for the brain to construct a 3D view. The brain's visual cortex is given the necessary prompting when the results of an image depth/perspective analysis are separated into the two required images.

There will be subtle differences between viewing the true stereo pair and the ersatz pair. In this instance they seem to be absent or minor. Sometimes, calculations leading to a constructed stereo pair go a little astray and anomalies will be visible when the combined image is perused in the brain.

The construction algorithm has several parameters enabling tweaking the result to alter the impressions of depth and focus.
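
As a rough illustration of how a depth analysis can be turned into two views, here is a toy sketch of depth-based pixel shifting on plain Python lists. It is not the 3D_SBS tool's algorithm: the function names, the `max_shift` parameter, and the hole-filling rule are all assumptions chosen to keep the example short.

```python
def shift_row(row, depth, max_shift, direction):
    """Shift each pixel sideways in proportion to its depth (1.0 = nearest)."""
    width = len(row)
    out = [None] * width
    # Paint far pixels first so nearer pixels win when two land on the same spot.
    for x in sorted(range(width), key=lambda i: depth[i]):
        target = x + direction * round(depth[x] * max_shift)
        if 0 <= target < width:
            out[target] = row[x]
    # Fill uncovered gaps from the left neighbour, then from the right.
    last = None
    for x in range(width):
        if out[x] is None:
            out[x] = last
        else:
            last = out[x]
    nxt = None
    for x in range(width - 1, -1, -1):
        if out[x] is None:
            out[x] = nxt
        else:
            nxt = out[x]
    return out


def make_sbs(image, depth_map, max_shift=2):
    """Build a side-by-side pair: left view on the left half, right view on the right."""
    left = [shift_row(r, d, max_shift, +1) for r, d in zip(image, depth_map)]
    right = [shift_row(r, d, max_shift, -1) for r, d in zip(image, depth_map)]
    return [l + r for l, r in zip(left, right)]


if __name__ == "__main__":
    image = [[10, 20, 30, 40]]
    depth = [[0.0, 1.0, 0.0, 0.0]]  # only the second pixel is near the camera
    print(make_sbs(image, depth, max_shift=1))
```

Here `max_shift` plays the role of a depth-strength parameter: larger values exaggerate the impression of depth but leave bigger gaps to fill, which is one place where visible anomalies can come from.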

Whilst dual lens recording equipment can give optimum results, the enhancement of images taken using lesser apparatus should not be gainsaid. Moreover, 2D pictures of scenes made by artists take on new interest when rendered into 3D; arguably they more closely represent what the artist had in mind, but could not fully realise because of the nature of the medium. Enhancements of this nature don't replace the original constructions, yet they might attune the minds of the artists and the viewers of their works more closely.

I intend to present examples of paintings and drawings revisited in stereoscopy.

u/Statute_of_Anne — 12 hours ago

I made 3 ComfyUI nodes for WAN 2.2 multi-segment video prompting

Been building long-form WAN 2.2 videos with chained segments and got tired of writing prompts manually every time. So I made these:

WAN Prompt Builder (Groq) : single segment, clean WAN-optimized prompt from subject/action/camera/lighting inputs

WAN Prompt Builder Trio : generates 3 coherent prompts for 3 chained segments (~7s each), automatic camera progression wide > medium > close, same scene, no jarring cuts

WAN Prompt Builder Vision : same as Trio but takes an image as input and builds the prompts around it

WAN 2.2 prompt rules are baked in (one action per segment, no chaining, correct sentence structure). Needs a free Groq API key.

Not trying to change the world, just scratching my own itch and figured someone else might find it useful.

u/East_Brilliant569 — 10 hours ago
▲ 13 r/comfyui

Does LoRA order matter?

Just as the post title says, does LoRA order make a difference when using lots of them in succession? I'm assuming it does, but am just wondering if anyone has any practical advice or suggestions about how to approach this

u/Imaginary_Belt4976 — 16 hours ago
▲ 16 r/comfyui

[Free Grab] Juggernaut Z — Cinematic Still Plate Workflow for AI Filmmaking

I've been building a still-plate workflow for a filmmaking-focused pipeline around Juggernaut Z (ZIB) and wanted to share it with the community. It's completely free. If it saves you time and you feel like buying me a coffee, my CashApp is $miguivaotero, but genuinely no pressure and no strings attached, because I believe in free collaboration for open-source models.

{{{{{DOWNLOAD LINK}}}}}

https://drive.google.com/file/d/1Z2m6PVaWObNHl44SlcKTlkrdnnrzUXGr/view?usp=drive_link

{{{{{DOWNLOAD LINK}}}}}

** PLUG AND PLAY ** (ready for generating)

Description:

What's in the workflow:

  • Juggernaut Z as primary model with full LoRA support
  • Two-pass sampler pipeline for texture refinement
  • SeedLogger (Inspire Pack) for seed tracking and repeatability across scenes, essential for multi-shot narrative consistency
  • Use Everywhere nodes for clean global routing, meaning NO SPAGHETTI
  • Full cinematic aspect ratio library baked in as selectable groups: 4:3, 3:2, 16:9, 5:4, Academy Flat, Flat 1.85, Scope 2.39, Cinemascope, Panavision 70, IMAX
  • Global VAE routing
  • Clean output naming
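
For reference, this is roughly how an aspect ratio maps to a latent-friendly resolution. It is a sketch under assumptions: the ~1 megapixel target, the snap to multiples of 16, and the helper name are mine, not values read from the workflow, and only ratios with an unambiguous number in the list above are included.

```python
import math

# Ratios taken from the list above; named formats without a stated number are omitted.
RATIOS = {
    "4:3": 4 / 3,
    "3:2": 3 / 2,
    "16:9": 16 / 9,
    "5:4": 5 / 4,
    "Flat 1.85": 1.85,
    "Scope 2.39": 2.39,
}


def dims_for_ratio(ratio: float, target_pixels: int = 1024 * 1024, multiple: int = 16):
    """Return (width, height) near target_pixels for the ratio, snapped to a multiple."""
    height = math.sqrt(target_pixels / ratio)
    width = height * ratio

    def snap(value: float) -> int:
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        width, height = dims_for_ratio(ratio)
        print(f"{name}: {width}x{height}")
```

Snapping changes the ratio slightly (16:9 comes out as 1360x768), so wide formats may need a small crop if the exact ratio matters.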

Why Juggernaut Z over SDXL or Turbo: Prompt precision and character repeatability across scenes matter more for filmmaking than raw texture scores. Z-Image's natural language S3-DiT architecture gives you semantic control that tag-based SDXL prompting simply doesn't. Juggernaut Z adds the texture and lighting quality on top of that foundation.

Required custom nodes:

  • ComfyUI Inspire Pack
  • cg-use-everywhere
  • ComfyUI core 0.15.1+

----------------------------------------------------------------------------

Hardware note: Built and tested on M4 MPS 24GB unified memory. CUDA users should run fine but flag anything weird in the comments.

Model path: Remap ZIB/juggernaut-Z v10.safetensors to wherever you've stored your Juggernaut Z locally.

link for model download:

https://civitai.red/models/2600510/juggernaut-z

I'll answer any questions about this workflow here or in private chat.

ENJOY!!!!!!!

u/Sir_Latent — 8 hours ago
▲ 123 r/comfyui

ComfyUI Tutorial: Realistic AI Lip Sync Dubbing with LTX 2.3 LoRA, Low-VRAM Workflow (6 GB VRAM, 16 GB RAM)

Hello everyone. In this tutorial we explore IC-LoRA LipDub, the new IC-LoRA released by Lightricks. The model enables lip dubbing, which lets you dub any video into any language. I tested the workflow with French, Italian, German, Japanese, Arabic, and Spanish. The custom workflow handles translation automatically: all you have to do is load your video and speech, then click Run. The workflow is optimized to run on 6 GB of VRAM without crashing.

Workflow Link

https://drive.google.com/file/d/1mk37QNbxVIOYo0-1OvOjfWdDyXzav_5M/view?usp=sharing

Video Tutorial Link

https://youtu.be/5hmismj1LQc

u/cgpixel23 — 19 hours ago
▲ 41 r/comfyui

Wan2.2 vs. LTX2.3: Which video generation model do you recommend?

Hi everyone! I recently got hooked on generative AI. I’ve been having a blast running things locally using ComfyUI and experimenting with different tools. (By the way, my specs are an RTX 3060 12GB VRAM and 64GB RAM.)

When it comes to video generation, which one would you recommend: Wan2.2 or LTX2.3? Of course, I know it's not a direct apples-to-apples comparison since LTX2.3 also generates audio tracks, but I'd love to hear your thoughts and experiences!

EDIT: Thank you all so much for the amazing advice! I'm going to take these insights and just enjoy creating videos based on my specific needs.

If there are any other video generation "babies" out there struggling to choose between the two like I was, I really hope this thread helps you out. Bye! 🚀👋

u/Internal_Jury1523 — 23 hours ago
▲ 26 r/comfyui

The TikTok "color analysis" trend, but as a one-node ComfyUI workflow — drop in a single portrait, get back a 4K Dior-style editorial board with your best colors, undertone, makeup guide, hair, jewelry, and capsule wardrobe in one shot🎨👗💄✨

Workflow link: https://github.com/SamurAIGPT/muapi-comfyui/blob/main/workflows/MuAPI_Skill_ColorAnalysisBoard.json

If you've been on TikTok in the last year you've seen the Korean / Japanese **color analysis** trend — women flying to Seoul or paying NYC stylists $300–$500/hr to sit in a chair with draped fabric swatches while a consultant pronounces them a "Soft Autumn" or a "Deep Winter," then hands them a printed board of best colors, undertone, makeup palette, and capsule wardrobe.

I tried to fake the output with regular ComfyUI workflows for two days and got nowhere. Standard pipelines fumble it three ways: (a) `flux-dev` "color analysis board for this person" gives you a Pinterest moodboard of unrelated stock photos, (b) `nano-banana-edit` keeps the face but renders the "palette swatches" as blurred rectangles with hallucinated nonsense hex codes, (c) anything 1K or below makes the small magazine-style typography unreadable — the whole point of the board is the *legible labels* under each panel.

The fix is one specific edit model, one very specific aesthetic anchor, and 4K resolution.

**The Winning Workflow:**

**Step 1** — Single node: `MuAPIImageToImage` with model `gpt-image-2-image-to-image`. This is the only edit model I tested that holds the reference identity *and* renders dozens of small legible labels ("Your Best Colors," "Undertone: Cool," "Capsule Wardrobe," "Hair," "Jewelry") in the same image without text drift. Flux Kontext gets the face but garbles text. Nano-Banana gets text but loses the face. GPT-Image 2 does both.

**Step 2** — The load-bearing aesthetic anchor: prompt it as *"high-end editorial Color Analysis Board in a luxury fashion magazine style (Dior / Ralph Lauren aesthetic), clean beige/ivory background, minimal elegant typography, grid-based layout."* Without "Dior / Ralph Lauren" the model defaults to scrapbook-y Pinterest energy with mismatched fonts. Without "grid-based layout" you get a single hero panel instead of the 8-panel magazine spread. Those two phrases are the entire vibe.

**Step 3** — Output at `image_size: 3840x2160` (already wired in the workflow's `extra_params_json`). The board has 8+ small labeled panels — swatches, undertone strip, makeup grid, capsule wardrobe — and at 1024 res the labels under each swatch turn to mush. At 4K every fabric name and undertone label is readable, *and* the board doubles as a desktop wallpaper / Pinterest landscape pin without re-cropping.

**The trick most people skip:** the input portrait matters more than the prompt. Bad lighting = bad palette read. The model literally reads your skin, hair, and eye color off the source image to pick swatches, so:

- front-facing, eyes open, natural light (not blue-hour, not sodium-lamp, not a TikTok filter)

- no sunglasses, no heavy makeup, no color-cast (the orange glow from a sunset will push you "warm autumn" even if you're a cool winter)

- hair visible, not in a cap

Give it a clean portrait and the board reads correctly — your actual undertone gets marked, the "best colors" panel skews to your real palette, and the makeup grid recommends shades that would actually look good on you. Give it a blue-tinted phone selfie and the model thinks you're an Icy Winter regardless of reality.

The crazy part: the board includes panels the model wasn't even explicitly asked for in the prompt — it adds "Colors to Avoid," "Prints that Flatter," "Style notes," sometimes a small Pantone-style color number under each swatch — because it's been trained on enough actual fashion magazine spreads to know what belongs there. The Dior/Ralph Lauren reference primes it for *all* the editorial conventions, not just the literal layout.

Side by side, the "consultant board" the AI ships in ~30 seconds reads more polished than the printed PDFs most $300 in-person consultants hand you. The fabric swatches are fabric, not flat rectangles. The makeup palette looks like actual makeup product photography. The capsule wardrobe outfits are styled, not stock.

Drop in one portrait, hit Queue Prompt, get a 4K board. Use it as: a personal style reference, a Pinterest landscape board, a desktop wallpaper, a gift to the friend who keeps asking "do I look better in warm or cool tones?"

Highly recommend the open-source ComfyUI workflow — it ships pre-wired with the gpt-image-2 model, the editorial prompt, and the 3840x2160 resolution baked into the node. Three nodes (LoadImage → MuAPIImageToImage → SaveImage), one queue, one board.

Who else is doing personal-styling outputs in ComfyUI? Drop your best color analysis boards, capsule wardrobes, or "you in your colors" outfit grids below 👇

Let's see whose AI consultant out-styles the $300/hr human one the hardest 🎨👗💄✨

u/Individual_Hand213 — 18 hours ago

General dual GPU questions

I recently got a free eGPU cage that connects via an OCuLink cable. I connected it and did a fresh driver install, and both GPUs (a 16GB and a 12GB card) are detected and working. It doesn't seem to help in ComfyUI, though.

Image gen was never an issue. Video is where I wanted improvements, and there is no noticeable improvement.

  1. You can move the text encoder to GPU 1.
  2. ComfyUI still caches about 40% of the model into shared memory.
  3. Even using an 8GB quant, fully in memory, the generation doesn't go any faster.

For reference, it's about 32 sec/it on my 4080 Super (i9-14700KF, 64GB DDR5); the eGPU is a 3080 Ti. So basically it saved the CPU from doing text encoding, and that's entirely it. Yes, you can move the VAE to it too, but the Wan2.1 VAE, which is what I'm testing, is a mere 200-300MB.

Also, Crystools broke and I have to stop using a specific SVI flow. Feels like going back to square one.

u/redpandafire — 21 hours ago
▲ 4 r/comfyui+2 crossposts

I built a Windows app that pins your model weights in RAM so you stop waiting for disk loads on every model swap - looking for feedback

If you run multiple models in the same session, be it a coding LLM, a reasoning LLM, different ComfyUI checkpoints depending on what you're generating, you already know the problem. Every swap loads gigabytes off disk. Fast NVMe makes it bearable. SATA or spinning rust makes it genuinely painful. And Windows will evict those file cache pages whenever something else needs memory, so you can't count on the OS keeping them warm for you.

I wrote a Windows app called EWE (Extended Weights Exchanger) that addresses this directly. You add your models to a "warm map," set a RAM budget, and EWE pins the weights using Windows memory APIs so they can't be evicted. The next time any application loads that model, it reads from RAM instead of going back to disk. On my setup, swaps that were taking 60-90 seconds now take under 5 seconds.

https://preview.redd.it/q6t7o1nqr42h1.png?width=900&format=png&auto=webp&s=bf4eae93cbb1254fb759a28410db9004d2b4d691

It's not magic - you need enough system RAM to hold what you want to keep warm. But if you have spare RAM sitting idle while you work, this is a pretty direct use for it.

The app is at https://accord-gpu.com/ewe/ if you want to look at what it does. Currently collecting free early access accounts and enrollments for beta access to the products I'm building. EWE is going to be a one-time purchase (no subscription), and I want to get real users on it before setting the price.

A few things I'm genuinely curious about from this community:

  • I wrote this for Ollama and ComfyUI specifically on my box. It reads the Ollama blob manifests and loads .gguf, .safetensors, .ckpt and .pth files so far. What other model formats should it support, and what other applications should I be checking against for compatibility?
  • Is this a workflow pain you actually have, or do most people just absorb the downtime between model uses?
  • Is there an obvious feature I'm missing?
  • What would a fair one-time price look like for something like this for a perpetual license?

Honest feedback is more useful than encouragement here. If this solves a problem you don't actually have I'd rather know now.

u/MrAddams_LibraLogic — 13 hours ago

What can I do with this laptop?

This laptop:

MSI Vector 17 HX AI gaming laptop, 17-inch QHD+ 240 Hz display, Intel Core Ultra 9 275HX, NVIDIA GeForce RTX 5090, 32 GB RAM, 2 TB SSD, Windows 11 Home

Would love to make some AI Generated Pictures and Videos. Any recommendations on what I can do with this engine?

u/Pupsi42069 — 16 hours ago
▲ 160 r/comfyui+1 crossposts

HY World + Sharp, 360 Panorama Gaussian Splat

I was trying to get HY World 2.0 / WorldMirror v2 and Sharp to work together in order to create something where a room could be explored. This is about as far as I got. It's still missing something. *Scale button doesn't work with HY World nodes*. But yeah, scaling the splat could help. Also, moving the camera really sucks, but I think that's because the scale of the actual full splat isn't being loaded properly, and I need to figure that out, either through the nodes available or by creating my own (which would be hard af for me, not being a coder). If anyone has ideas, maybe I could throw a sheet together to see if Gemini can craft something. But regardless of all that, it's nice to finally get a panorama working and viewable in 360.

u/DJBFilmz — 1 day ago

I am experimenting with wordless music and ACE-Step 1.5.

I asked an LLM and it seems to be possible, using glossolalia (speaking in tongues).
I'm working on a song about a woman's emotions and using images to try to put a video to it. Has anyone had success with this challenge?

Here is what a verse for ACE-Step 1.5 looks like:

[Verse 1 - Wave One]
(breath-driven rhythm, close mic, rising softness)
Li-a-ma, se-re-na, vo-lu-me
Ai-ro-sen, ka-li-dra, ne-va
Tae-von, si-le-ni, o-ra-sha
Ga-re-lo, me-li-se, no-vae
u/tostane — 18 hours ago