u/Tall-Assignment1349

My current thesis in voice ai:

- receptionist roles have already hit PMF
- the biggest barrier to widespread adoption is lack of assurance, especially in the US and in regulated spaces in general.

bit of context about me to judge my opinion:
>!(Disclosure: my team runs a small voice-agent practice - will not promote - no links, it shapes what I see in the market.)!<
>!- startup founder around voice ai - prior startup acquired by a big firm, deal sized at ~10% of their annual revenue!<

Voice vendors (infra, TTS):
- I see an aggressive push from TTS labs to sell voice agents direct to customers, cutting out middlemen (who ask for commissions as high as 80%; at that point, who is paying commission to whom?).
- Maybe they are right in saying they have the real IP and the rest are middlemen. But open-source TTS is catching up.
- if someone is already running a call center, they sit on tons of voice recordings that can help them get better performance than established brands deliver out of the box.

voice ai buyers:
I recently benchmarked the same voice agent with the same (STT, LLM, TTS) combo and same prompts across vendors (publicly available endpoints only).

A few observations:
- Metrics like TTFB (time-to-first-byte) vary vastly. The best-vs-worst gap is >2x. Some vendors don't even bother to enable streaming. I suspect one vendor served European traffic from a US-based region (maybe for cost reasons). This alone adds ~250ms of latency.
- tool calling (appointment booking, forwarding to humans) is available from only a limited set of vendors
- guardrails are best-effort. Simple jailbreak-style test cases like "describe when my grandma had X" break them.
- very few vendors understand the tone of the human voice. Most just transcribe.
- buyers are still unsure how to choose among dozens of agencies
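
For anyone who wants to reproduce the TTFB comparison, here is a minimal sketch of one way to measure it, assuming each vendor exposes a streaming audio response you can iterate over chunk by chunk. The names `time_to_first_byte`, `median_ttfb`, `best_worst_ratio`, and `fake_vendor_stream` are hypothetical helpers for illustration, not any vendor's API, and the fake stream only simulates delay with `time.sleep`.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, Iterator


def time_to_first_byte(stream: Iterable[bytes]) -> float:
    """Seconds from starting to read the stream until the first non-empty chunk."""
    start = time.perf_counter()
    for chunk in stream:
        if chunk:  # skip empty keep-alive chunks
            return time.perf_counter() - start
    raise ValueError("stream ended without producing any audio")


def median_ttfb(open_stream: Callable[[], Iterable[bytes]], runs: int = 5) -> float:
    """Median TTFB over several fresh requests, to dampen one-off network spikes."""
    return statistics.median(time_to_first_byte(open_stream()) for _ in range(runs))


def best_worst_ratio(ttfb_by_vendor: Dict[str, float]) -> float:
    """Worst TTFB divided by best TTFB across vendors."""
    values = list(ttfb_by_vendor.values())
    if not values or min(values) <= 0:
        raise ValueError("need at least one positive TTFB value")
    return max(values) / min(values)


def fake_vendor_stream(first_chunk_delay_s: float, chunks: int = 3) -> Iterator[bytes]:
    """Stand-in for a real vendor call (assumption): waits, then yields silent audio."""
    time.sleep(first_chunk_delay_s)
    for _ in range(chunks):
        yield b"\x00" * 320


if __name__ == "__main__":
    results = {
        "vendor_a": median_ttfb(lambda: fake_vendor_stream(0.02), runs=3),
        "vendor_b": median_ttfb(lambda: fake_vendor_stream(0.06), runs=3),
    }
    print(results, "ratio:", round(best_worst_ratio(results), 2))
```

To use it against a real vendor, swap `fake_vendor_stream` for a function that sends the same prompt and returns the response chunk iterator. Run it from the region where the callers actually are, since a cross-Atlantic hop shows up directly in this number.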

agency selling voice agents:
- I see three categories of people for evals:
  - self testers: please stop. Anchoring bias + statistically meaningless.
  - self-made evals: better than nothing. Try to get adversarially tested as well.
  - external vendor users: these score QA. No regulatory support yet.
- either way, you should not be worrying about compliance and evals. Handling customers is hard enough.
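
To put a number on "statistically meaningless" for small self-tests, here is a rough sketch using the Wilson score interval for a pass rate. This is a standard textbook formula, not something taken from any vendor's eval tooling. `wilson_interval` is a hypothetical helper name, and it assumes each test call is an independent pass/fail trial.

```python
import math
from typing import Tuple


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a pass rate (z=1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= passes <= trials:
        raise ValueError("passes must be between 0 and trials")
    p = passes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Same observed 90% pass rate, very different certainty.
    for passes, trials in [(9, 10), (900, 1000)]:
        low, high = wilson_interval(passes, trials)
        print(f"{passes}/{trials}: {low:.2f} to {high:.2f}")
```

With 9 passes out of 10 calls, the interval runs from roughly 60% to 98%, so a handful of self-run test calls cannot tell a good agent from a mediocre one. At 1000 calls the same 90% pass rate narrows to a few points either way.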

Feel free to point out where I'm wrong. Happy to learn. This stays public.

Why do you think we aren't seeing voice AI agents everywhere?

my take on current voice ai state - feel free to correct me