u/BaseballAlive5575

r/TextToSpeech (+1 crosspost)

I can't get any model to consistently respect the language code. About 1 in 20 generations pronounces the foreign-language word I input with an American accent. The voice I'm using supports the target language, and I need audio clips of individual words in foreign languages.

I've tried v2.5 with the `previous_text` parameter, which helps but doesn't fix every case. v3 doesn't support `previous_text` but does a decent job on its own. I've also tried generating longer passages and splitting them at each word's timestamps using a variety of methods, but that is inconsistent too, because the word boundaries returned by the API are imperfect.
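A minimal Python sketch of the two approaches described above. It assumes the ElevenLabs REST text-to-speech endpoint (`POST /v1/text-to-speech/{voice_id}`, authenticated with an `xi-api-key` header) and its `/with-timestamps` variant, whose response is assumed to carry character-level timing arrays. The field names `text`, `model_id`, `language_code` and `previous_text` and the default model ID `eleven_turbo_v2_5` are assumptions to check against the current API reference; `build_tts_request` and `word_spans` are hypothetical helper names. The request is only built, not sent, and the splitter pads each word by a small margin to absorb imperfect boundaries.

```python
import json

# Assumption: endpoint and field names as recalled from ElevenLabs' public
# REST docs; verify against the current API reference before relying on them.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(voice_id, api_key, word, language_code,
                      model_id="eleven_turbo_v2_5", previous_text=None,
                      with_timestamps=False):
    """Hypothetical helper: return (url, headers, json_body) without sending."""
    url = f"{API_BASE}/{voice_id}"
    if with_timestamps:
        # Assumed variant that also returns character-level timing.
        url += "/with-timestamps"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    body = {"text": word, "model_id": model_id, "language_code": language_code}
    if previous_text is not None:
        # Optional context text; per the post, v3 does not accept it.
        body["previous_text"] = previous_text
    return url, headers, json.dumps(body)


def word_spans(characters, starts, ends, pad=0.05):
    """Hypothetical helper: group character timings into (word, start, end).

    `characters`, `starts` and `ends` are parallel lists, shaped like the
    assumed alignment arrays. Each word is widened by `pad` seconds on both
    sides (start clamped at 0) to absorb imperfect boundaries. Punctuation
    stays attached to the word it touches.
    """
    spans = []
    word, t0, t1 = "", 0.0, 0.0
    for ch, s, e in zip(characters, starts, ends):
        if ch.isspace():
            # Whitespace closes the current word, if any.
            if word:
                spans.append((word, max(0.0, t0 - pad), t1 + pad))
            word = ""
            continue
        if not word:
            t0 = s  # start time of the word's first character
        word += ch
        t1 = e      # extend to the latest character's end time
    if word:
        spans.append((word, max(0.0, t0 - pad), t1 + pad))
    return spans
```

The resulting start/end pairs can be handed to any audio slicer; a larger `pad` trades clipped onsets for bleed from neighbouring words.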

Is there any way to make the generation always respect the language code? With 1 in 20 generations coming out as junk, ElevenLabs' service is unusable for my use case; I need fewer than 1% of generations to fail this way. Any advice, or recommendations for other APIs that handle non-English languages more consistently, would be much appreciated.
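One common workaround, sketched here in Python as a general pattern rather than an ElevenLabs feature, is to check each clip and regenerate the rejects. `generate` and `looks_correct` are hypothetical callables: the first would wrap whatever TTS call you use, the second some automatic check (for example a speech-to-text or language-identification pass) that you would have to supply and validate yourself. If failures were independent at the reported 1-in-20 rate and the check were perfect, two attempts would leave 0.05^2 = 0.0025, i.e. 0.25%, of words without an accepted clip, under the 1% target. That is a best case: a real check will miss some bad clips and reject some good ones, and if certain words fail repeatedly, retries help less.

```python
from typing import Callable, Optional


def generate_verified(word: str,
                      generate: Callable[[str], bytes],
                      looks_correct: Callable[[bytes], bool],
                      max_attempts: int = 3) -> Optional[bytes]:
    """Regenerate until `looks_correct` accepts the clip or attempts run out.

    Both callables are hypothetical placeholders. Returns None when every
    attempt was rejected, so the word can be queued for manual review
    instead of shipping a bad clip.
    """
    for _ in range(max_attempts):
        clip = generate(word)
        if looks_correct(clip):
            return clip
    return None


def residual_failure_rate(p_bad: float, attempts: int) -> float:
    """Share of words with no accepted clip, assuming independent attempts
    and a check that never misses a bad clip or rejects a good one."""
    return p_bad ** attempts
```

Returning `None` rather than the last rejected clip keeps junk out of the output and leaves a short list of words to regenerate elsewhere or record by hand.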

u/BaseballAlive5575 — 9 days ago