
Hey r/tts,
I've been working on a local-first audio production tool built on top of IndexTTS2.
It includes a Script Canvas for scene structuring and emotion detection, a Voice Studio for reusable characters, and a timeline for mixing takes with SFX/ambience.
It's still an early beta and Docker-based (NVIDIA GPU). I'm curious whether anyone here is interested in this kind of workflow tool.
Repo: https://github.com/JaySpiffy/IndexTTS-Workflow-Studio
(Old prototype code is on the legacy-v2 branch)