u/Famous-Sport7862


User @wildmindai on X posted about this new model. Has anyone here tried it yet?

LTX 2.3 audio as a standalone speech model.

Emotional TTS with Scenema Audio.

- Zero-shot expressive voice cloning and speech generation

- 8-step distilled with Gemma 3 12B text encoding

- stage directions via <action> tags

- runs at 1.5x real-time on an RTX 4090

- fits in 16GB VRAM

- 13 languages, 48kHz stereo output

It also generates matching environment sounds.
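For anyone trying to gauge what the speed and output numbers mean in practice, here is a rough back-of-the-envelope sketch. It assumes "1.5x real-time" means 1.5 seconds of audio produced per second of compute (the post doesn't spell this out), and the 16-bit PCM size is my own assumption since no bit depth is given. The function names are just illustrative, not part of any model API.

```python
SAMPLE_RATE_HZ = 48_000  # from the post: 48kHz output
CHANNELS = 2             # from the post: stereo
REALTIME_FACTOR = 1.5    # from the post; ASSUMED: 1.5 s of audio per 1 s of compute


def generation_seconds(audio_seconds: float,
                       realtime_factor: float = REALTIME_FACTOR) -> float:
    """Estimated wall-clock time to generate a clip of the given length."""
    return audio_seconds / realtime_factor


def total_samples(audio_seconds: float) -> int:
    """Number of individual sample values across both channels."""
    return round(audio_seconds * SAMPLE_RATE_HZ) * CHANNELS


def pcm16_bytes(audio_seconds: float) -> int:
    """Uncompressed size, ASSUMING 16-bit PCM (bit depth is not stated in the post)."""
    return total_samples(audio_seconds) * 2  # 2 bytes per 16-bit sample


if __name__ == "__main__":
    clip = 30.0  # seconds of speech we want
    print(f"{clip:.0f} s clip: ~{generation_seconds(clip):.1f} s to generate")
    print(f"samples (both channels): {total_samples(clip):,}")
    print(f"16-bit PCM size: {pcm16_bytes(clip) / 1e6:.2f} MB")
```

Under those assumptions, a 30-second clip would take roughly 20 seconds on the quoted hardware. Real numbers will depend on the text length, model load time, and whatever else is on the GPU.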