r/LTXvideo

I built a site to create free AI videos using LTX 2.3 running on my own GPUs
▲ 193 r/LTXvideo+1 crossposts

Lately I've been working on my project, loremotion.com. The goal was simply to let anyone create AI videos without credits, subscriptions, or limits. To actually make that possible, I had to skip the APIs and build my own infrastructure.

I’m mostly using open-source models like LTX 2.3 and Wan 2.1. I’ve personally found LTX 2.3 (specifically the 1.1 distilled version) to give the best results for the speed I’m aiming for. Right now, I’ve capped it at 720p/10-second clips for both Text-to-Video and Image-to-Video.

The Hardware Setup: I'm running this on my own cluster. I've got four of my own GPUs (30 and 40 series) and I rent the rest on spot instances (A100s and RTX Pros), all wired together through Wan2GP. That keeps my costs incredibly low, around $8 a day, which is why I might be able to keep the generations free.

Performance: Depending on which GPU grabs your task, a 720p 10-second render usually takes between 50 and 110 seconds. (If there's any way I can get the generation time much lower, please do let me know.)
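For anyone curious about the dispatch side: conceptually it's just a worker-per-GPU job queue, where whichever card is free grabs the next render. Here is a minimal Python sketch of that idea; the render_clip call and job fields are placeholders, not the actual loremotion backend code.

```python
import queue
import threading

# Hypothetical sketch: one worker thread per GPU pulls jobs from a shared queue,
# so whichever card finishes first simply grabs the next render.
jobs = queue.Queue()

def render_clip(job, gpu_id):
    # Placeholder: the real system would shell out to Wan2GP / LTX 2.3
    # with CUDA_VISIBLE_DEVICES set to gpu_id.
    print(f"GPU {gpu_id} rendering {job['prompt']!r} at {job['res']}")

def worker(gpu_id):
    while True:
        job = jobs.get()
        if job is None:          # poison pill shuts this worker down
            break
        try:
            render_clip(job, gpu_id)
        finally:
            jobs.task_done()

gpu_ids = [0, 1, 2, 3]           # local cards; rented spot GPUs would register the same way
threads = [threading.Thread(target=worker, args=(g,), daemon=True) for g in gpu_ids]
for t in threads:
    t.start()

jobs.put({"prompt": "a cat surfing at sunset", "res": "720p"})
jobs.join()                      # wait for queued clips to finish
for _ in threads:
    jobs.put(None)
```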

Features:

  • Dashboard: Your clips stay there for 48 hours before they’re cleared.
  • Discover: You can choose to push your best renders to a public gallery.
  • Email Alerts: If the queue gets backed up, you can drop your email and I’ll ping you when it's done.

The Catch: To keep the lights on and break even, I had to put ads on the site. I know they’re annoying, but it’s the only way I can offer unlimited generations without a paywall.

Next on the list is getting Video-to-Video working, so if you have ideas on how to improve the generation speed, better models to check out, or features you actually want, please let me know.

Check it out here: loremotion.com

u/Fine-Veterinarian537 — 4 days ago
▲ 20 r/LTXvideo+2 crossposts

Is it possible to FEEL real acting with Open Source AI Tools? (A little experiment)

I spent two weeks working on this at my company for learning and research purposes. I tried to see if you can create compelling shots. In my opinion, you can, and better than Seedance (emotion, not action). But you be the judge. I'll wait and see, and if anyone wants, I'll share my workflow.

Spaghetti Shortfilm by Arturo Pola

u/a-ijoe — 12 hours ago
▲ 116 r/LTXvideo+1 crossposts

Create automated AI music videos with my full LTX 2.3 workflow for ComfyUI. FREE and LOCAL

Sample videos that were created using my workflow:

https://reddit.com/link/1t7ohql/video/u2ngig8bz40h1/player

https://reddit.com/link/1t7ohql/video/dleorv0u100h1/player

In this walkthrough, I show how the workflow takes a song, analyzes the timing, creates scene prompts from lyrics, and generates a finished music video using LTX 2.3.

The walkthrough video is too long to share here, so please watch it on YouTube: HERE

The workflow is split into two parts.

🎵 Workflow 1 handles audio upload, beat detection, scene timing, lyrics, style and theme, story idea, subjects and locations, and prompt generation.
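As a rough illustration of what the beat detection and scene timing step boils down to (a simplified sketch, not the exact code inside my custom nodes), plain librosa can do something like this:

```python
import librosa

# Simplified sketch of beat detection -> scene boundaries (not the custom node code).
y, sr = librosa.load("song.mp3")                        # placeholder path
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

# Group beats into ~10 second chunks so each generated clip starts on a beat.
scene_len = 10.0
scenes, start = [], 0.0
for t in beat_times:
    if t - start >= scene_len:
        scenes.append((start, float(t)))
        start = float(t)

print("estimated tempo:", tempo)
for i, (a, b) in enumerate(scenes):
    print(f"scene {i}: {a:.2f}s -> {b:.2f}s")           # these timings drive the scene prompts
```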

🎬 Workflow 2 handles the actual video generation, including an image-to-video workflow with Z-Image Turbo and LTX 2.3 and a text-to-video workflow with LTX and LoRA support; both have advanced prompt controls, scene generation, Remake Mode, and final video stitching.

✨ This workflow is designed to reduce manual setup time while still giving you control over style, characters, camera motion, timing, seeds, LoRAs, and final edits.

💡 For the best results, I recommend starting with the default settings first, then experimenting with LoRAs, seeds, advanced settings, and Remake Mode as you get more comfortable.

⚙️ Requirements:
ComfyUI
LTX 2.3 models
Z-Image Turbo model
FFmpeg installed for audio stitching
My vrgamedevgirl custom nodes
Impact Pack custom node for auto-queue
llama-cpp-python

At least 16 GB of VRAM (12 GB "might" work, but I have not tested it).
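For context on the FFmpeg requirement above, the final stitching step is conceptually just a concat plus an audio mux. Here is a rough sketch with placeholder paths; the actual stitching node may use different flags.

```python
import subprocess
from pathlib import Path

# Rough sketch of the final stitch: concatenate generated clips, mux the song back in.
clips = sorted(Path("clips").glob("scene_*.mp4"))      # placeholder clip names
with open("list.txt", "w") as f:
    for c in clips:
        f.write(f"file '{c.as_posix()}'\n")

subprocess.run([
    "ffmpeg", "-y",
    "-f", "concat", "-safe", "0", "-i", "list.txt",    # video clips in order
    "-i", "song.mp3",                                   # original audio track
    "-map", "0:v", "-map", "1:a",
    "-c:v", "libx264", "-c:a", "aac",
    "-shortest",                                        # stop at the shorter stream
    "music_video.mp4",
], check=True)
```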

💬 Join my Discord for support, updates, beta features, and to share your work:
Discord Server

⬇️ Download my custom nodes and workflows:

Custom nodes: GitHub custom nodes, or use manager

Workflows are in here: Workflows

Hugging Face: HERE

#ComfyUI #LTX #AIvideo #AIMusicVideo #TextToVideo #ImageToVideo #AIWorkflow #GenerativeAI

u/Cheap_Credit_3957 — 5 days ago
▲ 59 r/LTXvideo+1 crossposts

Inpainting with LTXV 2.3. Results after two weeks of R&D.

Hello!

I am a designer at DOGMA. We do AI work for TV ads, shows, and movies; a Netflix show we worked on recently came out on Netflix Italy, and the company had its first meeting in Hollywood last month.

50% of our work is inpainting on videos, and 100% of our work for Netflix was inpainting, so I've spent the last few weeks doing R&D with LTXV 2.3 to see if and how the tool can help with the practical needs of the movie business. We strongly believe in the sociocultural importance of open source.

First of all, huge thanks to u/ltx_model for becoming the main paladin of the democratization of open-source video generation tools and for the constant improvements to their model; the incredible HDR LoRA is something we were not expecting so soon, so please keep up the amazing work. From our tests, LTXV 2.3 T2V and I2V can be pushed locally up to 5K resolution, with results that have very little to envy from the closed-source Seedance 2. Congratulations also to u/Round_Awareness5490 for his outstanding experimental work and effort in creating LoRAs that extend the capabilities of the main model.

Here is the recap of the R&D (translated from Italian to English).

---

Method 1 / No inpainting LoRA:

You use Add Guide Multi with 2 reference frames, first and last, while the original video goes into VAE Encode. Then you apply an LTXV latent mask to the area that needs to be modified.

Problems: as always when using multiple guide inputs for inpainting, some parts flicker and do not match the original video, especially in the frames close to the first and last reference frames. There is no other way to provide reference frames with this method except by adding more entries in Add Guide Multi. In practice, it is a kind of denoise pass. It works very well if you do not need precision and can avoid reference frames, relying only on the prompt/LoRA.
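Conceptually, the latent masking in Method 1 comes down to blending the freshly denoised latent with the encoded original wherever the mask allows it. A toy numpy sketch of that idea (not the actual LTXV node internals, and with made-up shapes) looks like this:

```python
import numpy as np

# Conceptual sketch of masked latent blending (not the LTXV node code).
# original_latent: VAE-encoded source video, denoised_latent: model output,
# mask: 1 where new content is wanted, 0 where the original must be kept.
def blend_latents(original_latent: np.ndarray,
                  denoised_latent: np.ndarray,
                  mask: np.ndarray) -> np.ndarray:
    return mask * denoised_latent + (1.0 - mask) * original_latent

# Toy shapes: (frames, channels, height, width) in latent space.
orig = np.random.randn(24, 16, 64, 96).astype(np.float32)
new = np.random.randn(24, 16, 64, 96).astype(np.float32)
mask = np.zeros((24, 1, 64, 96), dtype=np.float32)
mask[:, :, 20:44, 30:70] = 1.0        # region to inpaint

out = blend_latents(orig, new, mask)
```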

---

Method 2 / Inpainting with the model ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors:

The 3000-step version seems to be the only one that works most of the time.

This model is trained to take as input a video where the original video is on the right, with the part to be inpainted marked in magenta, and a small reference frame on the left. As output, it provides the final inpainted video using that reference. It also sometimes works if you send as input the whole video with no reference and a white overlay on the masked area (similar to VACE).

Problems: it is excellent if you put Trump's face in the small reference frame, but terrible if you need something precise, because the mini-frame is not even 200px wide, so it has no way to capture precise information. Adding Add Guide Multi partly solves this, but then you are back to the Add Guide Multi problem, meaning flickering and, above all, a mismatch with the original video close to the reference frames. Sending as input only the video with the purple masked area, with the first and last frames already set the way you want them, often, but not always, results in videos where the purple or white artifacts come back in the form of smoke or a solid color.
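To make the expected input format concrete, here is a rough sketch of how one conditioning frame for this LoRA could be assembled (reference strip on the left, magenta over the masked area on the right). The 200px strip width and the layout details are assumptions based on the description above, not the trainer's exact spec.

```python
import numpy as np
from PIL import Image

# Hedged sketch of one conditioning frame for the r2v inpaint LoRA:
# small reference on the left, original frame with magenta over the mask on the right.
MAGENTA = np.array([255, 0, 255], dtype=np.uint8)

def make_conditioning_frame(original: Image.Image,
                            reference: Image.Image,
                            mask: np.ndarray,          # bool array, True = inpaint here
                            ref_width: int = 200) -> Image.Image:
    frame = np.array(original.convert("RGB"))
    frame[mask] = MAGENTA                              # paint the area to be replaced

    ref = reference.convert("RGB").resize((ref_width, original.height))

    canvas = Image.new("RGB", (ref_width + original.width, original.height))
    canvas.paste(ref, (0, 0))                          # reference strip on the left
    canvas.paste(Image.fromarray(frame), (ref_width, 0))
    return canvas

# Toy usage with random images and a rectangular mask.
orig = Image.fromarray(np.random.randint(0, 255, (480, 832, 3), dtype=np.uint8))
ref = Image.fromarray(np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8))
m = np.zeros((480, 832), dtype=bool)
m[150:330, 300:550] = True
cond = make_conditioning_frame(orig, ref, m)
```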

---

Method 3 / Inpainting with the model ltx23_inpaint_rank128_v1_02500steps.safetensors or the model ltx23_inpaint_rank128_v1_10000steps.safetensors:

This model in fact takes the area to be inpainted in the same way VACE did. Here, it seems that the masked area should be white instead of purple. This LoRA does not support any kind of reference, so it is useful for inpainting based only on the prompt. Here too, Add Guide Multi can be used to force it to use start and end reference frames, with all the usage problems and inconsistencies of the previous method.

I tried many variations for each method. For example, I tried passing only the video with the mask applied to all frames except the first and last. I tried using a KSampler Advanced to apply denoise only during the final steps. I tried raising the CFG up to 2.5. All these methods sometimes produce decent results, but never consistent ones. The video that came out well yesterday was a complete fluke. If you change the mask by 1px, it may suddenly, randomly, come out well. Change the seed or change the mask by 1px, and the white or purple little clouds may come back.

---

Besides, the author of the inpainting LoRA himself added a huge number of clarifications on the project page, which basically means it does not always work perfectly without fiddling with parameters. We can use it, but we can hardly hand a general workflow to a junior at the company to speed up production.

None of the official or unofficial workflows I found does the exact kind of work we need: replacing only one part of a video with something for which we provide an exact visual reference, possibly mixed with depth/canny masks, while keeping and matching the original input video exactly, both in terms of resolution and spatiotemporal coherence.

In all these cases, the only way to get back the original video with only the inpainted part changed is still to recomposite the model output over the original video using the mask. This happens because even if you run inference only on a masked part of the latent, your video will still pass through the VAE and therefore it will be modified. We knew this already, but we always keep hoping they will make an ad hoc model or nodes for this.
There are ways to solve it, and as you saw yesterday, somehow, sooner or later, you can get a result that works. But it requires too much time and too many attempts, at least based on what I have tested so far. What we need is an easy, fast, stable, consistent, and precisely customizable solution.
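Since that recomposite step comes up in every method, here is a minimal per-frame numpy sketch of it (a generic illustration, not our production compositing):

```python
import numpy as np

# Generic sketch of the recomposite step: keep the original pixels everywhere
# except inside the (optionally feathered) inpaint mask, where the model output wins.
def recomposite(original: np.ndarray,   # (frames, H, W, 3) uint8, source video
                generated: np.ndarray,  # same shape, model output resized back to source res
                mask: np.ndarray        # (frames, H, W) float in [0, 1]
                ) -> np.ndarray:
    m = mask[..., None].astype(np.float32)
    out = m * generated.astype(np.float32) + (1.0 - m) * original.astype(np.float32)
    return out.astype(np.uint8)
```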

---
Today I will start re-testing VACE 2.1 and the experimental 2.2 merge to see how they compare. VACE 2.1 felt almost magical: you could feed it very complex videos with depth maps, reference frames, pose maps, and masks, all nested in a single guiding video, and with zero prompt you would get exactly what you were expecting, but its generation capabilities are too old for May 2026.

u/axior — 6 days ago
▲ 3 r/LTXvideo+2 crossposts

LTX-2.3 Distilled 1.1 wan2gp

Using LTX-2.3 Distilled 1.1 via Wan2GP, I managed to create a small scene using a reference photo and a reference video. It was produced on a 3060 Ti with 8 GB of VRAM.

u/StifmaissTR — 6 days ago
▲ 7 r/LTXvideo+1 crossposts

Hi all,
I’m using LTX 2.3 in ComfyUI with the workflow from RuneXX:

https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main

Setup:

  • RTX 5090 32GB
  • 64GB RAM

I’m running image-to-video with:

  • first frame conditioning (FF)
  • first + last frame conditioning (FLF-style workflow)

Issue

I’m consistently getting strong identity drift during generation.

This behavior occurs in both:

  • first frame only (FF) workflows
  • first + last frame (FLF) workflows

Even when using a strong reference image:

  • The original image is correctly visible only in the first frame (when used as conditioning)
  • Immediately after the first frame, the face starts to deform and change shape
  • As the sequence progresses, the model increasingly reconstructs a different identity
  • First frame and last frame influence is present but not stable or persistent

What I tested

  • different samplers
  • CFG tuning
  • frame count variations (low/high)
  • FF vs FLF conditioning
  • different guidance strengths

Result is always the same:
→ identity is not preserved across time

Main question

What is the correct way to enforce consistent identity across a full video sequence in LTX 2.3 I2V?

More specifically:

  • Is there a proper method to maintain identity continuity beyond the first frame?
  • Should identity be enforced via a different conditioning strategy (beyond FF / FLF)?
  • Is there a missing identity/face encoder or adapter step in this workflow?
  • Or is LTX 2.3 inherently not designed for persistent identity locking across frames?

Summary of questions

  1. Why does identity only survive the first frame and then degrade immediately (both in FF and FLF)?
  2. What is the correct method to enforce identity consistency across frames in LTX 2.3?
  3. How do you maintain identity continuity across multiple clips / generations?
  4. Are FF / FLF conditioning approaches sufficient for identity locking, or is another mechanism required?
  5. Is there a known best-practice workflow for stable face consistency in ComfyUI LTX?

Media

  • Reference image (input)
  • Generated frame comparison (output)
  • Video (MP4)

Images (example)

FirstFrame

Inconsistent

LastFrame

Video (example)

Video

u/White_Dragon_0 — 8 days ago
▲ 4 r/LTXvideo+1 crossposts

Hi Everyone,

I have a Dell Alienware laptop with an RTX 3080 (8 GB VRAM), 32 GB RAM, and an i9-10980HK.

I have tried LTX 2.3 in Wan2GP and was able to generate a 1080p video in 5-10 minutes.

But when I switch to the ComfyUI GGUF LTX 2.3 workflow, it takes a long time, sometimes close to an hour.

Why is there such a difference between them? Or can someone help me build a GGUF workflow? Maybe this workflow is just too heavy?

https://preview.redd.it/vazakwljhryg1.png?width=1260&format=png&auto=webp&s=4eb78752212c89787bbe8d023a24aeaf06dbbc8e

u/Proud-Dare-8193 — 12 days ago