I’m working on a project that needs highly expressive, human-like voice output: something that feels emotional, wise, and almost “divine” in tone (think storytelling or spiritual narration rather than a standard assistant voice).
Right now, I’m using the Google Gemini TTS API, but I’m running into a few issues:
The voice sounds robotic and flat
It doesn't pause naturally or pay attention to punctuation
There's no real sense of emotion, depth, or storytelling flow
❓ What I’m Looking For:
I’d love recommendations for:
TTS models/APIs that produce very natural, human-like speech
Support for emotional tone, pacing, and expression
Ability to generate “god-like” / narrator-style voices
Fine control over pauses, emphasis, and delivery
🤔 Questions:
Which TTS APIs/models would you recommend for this kind of use case?
Has anyone achieved cinematic or spiritual narration quality with current tools?
Are there techniques (prompting, SSML, fine-tuning, etc.) that can improve output quality significantly?
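To show the kind of fine control I mean, here's a rough Python sketch that wraps narration text in standard SSML elements (`<break>` for pauses, `<emphasis>` for stressed words, `<prosody>` for a slower, lower delivery) before it goes to a TTS engine. The `build_ssml` helper and its default values are my own hypothetical choices. I haven't confirmed that Gemini TTS honors SSML at all, and support for individual tags and attribute values varies between providers, so check each engine's docs.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms=600, emphasize=None, rate="slow", pitch="-2st"):
    """Hypothetical helper: turn plain sentences into an SSML string.

    sentences: plain-text sentences, spoken in order.
    pause_ms:  length of the <break> inserted after each sentence.
    emphasize: optional set of bare words to wrap in <emphasis level="strong">.
    rate/pitch: <prosody> values applied to the whole passage.
    """
    emphasize = emphasize or set()
    parts = []
    for sentence in sentences:
        words = []
        for word in sentence.split():
            safe = escape(word)  # escape &, <, > so the SSML stays well-formed XML
            # Match on the word without trailing punctuation, e.g. "silence." -> "silence".
            if word.strip(".,;:!?") in emphasize:
                safe = f'<emphasis level="strong">{safe}</emphasis>'
            words.append(safe)
        parts.append(" ".join(words))
        # Explicit pause after every sentence, instead of relying on punctuation alone.
        parts.append(f'<break time="{pause_ms}ms"/>')
    body = " ".join(parts)
    return f'<speak><prosody rate="{rate}" pitch="{pitch}">{body}</prosody></speak>'


if __name__ == "__main__":
    print(build_ssml(
        ["In the beginning, there was silence.", "Then came the word."],
        emphasize={"silence"},
    ))
```

Generating the markup from plain sentences keeps the script itself readable, and it makes pause length and pitch easy to tune in one place while comparing engines.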
🙌 Context:
This is for a project focused on delivering wisdom through voice (stories, guidance, reflections), so voice quality matters enormously.
Would really appreciate any suggestions, tools, or even examples you’ve worked with!
Thanks in advance 🙏