u/Eastern_Rock7947

Hey everyone, I wanted to share a project I've been working on — a fully self-hosted, browser-based audio production tool built on top of the k2-fsa/OmniVoice diffusion model.

https://preview.redd.it/qcjrpgxvkxvg1.png?width=713&format=png&auto=webp&s=46fd5a44efed966e764d748a015dfa3f61c3db87

What it does:

It lets you turn a script into a finished, multi-speaker audio production — think podcast episodes, audiobook chapters, narrated videos — entirely on your own machine. No cloud, no subscriptions, no data leaving your computer.

Key features:

Voice cloning from a 3–10 second reference clip, with up to 4 independent speakers per project
Voice Designer: no reference audio? Describe a voice using attributes (gender, age, accent, pitch, style) and it generates one that stays consistent across all your paragraphs
Timeline editor with waveform display, drag-to-reposition, trim handles, cut tool, ripple editing, and undo/redo
Media track for dropping in music, SFX or ambience alongside your voice content
Smart text parser: paste your script and it splits it into paragraphs automatically (you can split further into additional paragraphs if needed). Use [Speaker 2]: to switch voices and [pause 2s] to insert timed silences. Drag and drop paragraphs to reorder them, regenerate a single paragraph or several at once, and set a fixed seed or use an adaptive one for each paragraph
Episode save/load — saves everything: text, audio, timeline layout, voice settings, generation params
Pronunciation dictionary — fix proper nouns and technical terms once, applies to all generations
Zero-shot support for 600+ languages out of the box
Statistics: generation demographics
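
The post doesn't describe how the parser or the pronunciation dictionary work internally. Here is a minimal sketch under stated assumptions: paragraphs are separated by blank lines, a [Speaker N]: tag only appears at the start of a paragraph, the chosen speaker stays active until the next tag, and dictionary entries are replaced as whole words. The names `parse_script`, `apply_pronunciations` and `Segment` are hypothetical, not the project's actual code.

```python
import re
from dataclasses import dataclass

# Hypothetical tag syntax, modeled on the post: "[Speaker 2]:" and "[pause 2s]".
SPEAKER_RE = re.compile(r"^\[Speaker (\d+)\]:\s*")
PAUSE_RE = re.compile(r"\[pause (\d+(?:\.\d+)?)s\]")


@dataclass
class Segment:
    kind: str             # "speech" or "pause"
    speaker: int          # speaker active when the segment was emitted
    text: str = ""        # text to synthesize (speech segments only)
    seconds: float = 0.0  # silence length (pause segments only)


def apply_pronunciations(text: str, table: dict[str, str]) -> str:
    """Replace whole-word matches in one pass, trying longer keys first."""
    if not table:
        return text
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: table[m.group(0)], text)


def parse_script(script: str, table: dict[str, str]) -> list[Segment]:
    segments: list[Segment] = []
    speaker = 1  # assumption: the script starts with Speaker 1
    for paragraph in script.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        tag = SPEAKER_RE.match(paragraph)
        if tag:  # switch voices; the new speaker persists until the next tag
            speaker = int(tag.group(1))
            paragraph = paragraph[tag.end():]
        pos = 0
        for pause in PAUSE_RE.finditer(paragraph):
            chunk = paragraph[pos:pause.start()].strip()
            if chunk:
                segments.append(Segment("speech", speaker, apply_pronunciations(chunk, table)))
            segments.append(Segment("pause", speaker, seconds=float(pause.group(1))))
            pos = pause.end()
        tail = paragraph[pos:].strip()
        if tail:
            segments.append(Segment("speech", speaker, apply_pronunciations(tail, table)))
    return segments
```

With the table {"Nginx": "engine x"}, the paragraph "[Speaker 2]: We run Nginx here. [pause 2s] Back to you." becomes three segments for speaker 2: the speech "We run engine x here.", a 2.0 s pause, and the speech "Back to you." Doing the dictionary lookup as a single regex pass means a replacement is never itself re-substituted by another entry.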

Hardware: Runs on NVIDIA GPU, Apple Silicon (MPS), or CPU. Output is 24kHz WAV.
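
The post states only that output is 24 kHz WAV and that [pause 2s] inserts timed silence; it doesn't show how the timeline is rendered to a file. As a rough sketch using only Python's standard `wave` and `array` modules, a pause can become seconds × 24,000 zero samples spliced between generated clips. Mono output and 16-bit samples are assumptions here, and `to_int16` and `assemble` are hypothetical helper names.

```python
import array
import io
import sys
import wave

SAMPLE_RATE = 24_000  # the post states output is 24 kHz WAV


def to_int16(samples) -> array.array:
    """Clip float samples to [-1.0, 1.0] and scale them to 16-bit PCM."""
    return array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))


def assemble(clips, dest, rate: int = SAMPLE_RATE) -> int:
    """Concatenate PCM clips and pauses (given as seconds) into a mono WAV.

    Returns the number of frames written.
    """
    pcm = array.array("h")
    for clip in clips:
        if isinstance(clip, (int, float)):
            # A pause: e.g. 2 s at 24 kHz is 48,000 zero-valued samples.
            pcm.extend(array.array("h", [0]) * round(clip * rate))
        else:
            pcm.extend(clip)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)  # assumption: mono output
        wav.setsampwidth(2)  # assumption: 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())
    return len(pcm)
```

For example, two 4-sample clips around a 2.0 s pause produce 4 + 48,000 + 4 = 48,008 frames. `dest` can be a file path or a file-like object such as `io.BytesIO`, which is handy for streaming the result from a server without touching disk.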

Tech stack: Python/Flask backend, pure HTML/JS frontend (single file, no framework), OmniVoice diffusion model.

The whole thing runs locally: you just open the HTML file in a browser pointed at the Flask server. The only setup is a pip install and downloading the model weights.

Happy to answer questions about the implementation, which will be released soon.

OmniVoice Audio Studio