
I benchmarked 15+ speech-to-text APIs under various conditions
Hi all, I recently ran a benchmark comparing a bunch of speech-to-text APIs.
It includes the big players like Google, AWS, and MS Azure, open source models like Whisper, speech recognition startups like AssemblyAI / Deepgram / Orchardrun / Speechmatics, and newer LLM-based models like Gemini 2.0 Flash/Pro and GPT-4o. I also benchmarked the real-time streaming versions of some of the APIs.
I mostly did this to decide on the best API for an app I'm building, but figured it might be helpful for other builders too. I'd love to know what other cases would be useful to include.
In my opinion, the winner was Gemini 2.5 Flash for its combination of quality and price. I'd put Orchardrun in second place, since it has the best price and more than decent results.
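For anyone who wants to run a similar comparison on their own audio: the post above doesn't spell out how quality was scored, so this is only a sketch assuming word error rate (WER), a common metric for speech-to-text. The function name `word_error_rate` and the simple lowercase/whitespace normalization are my own assumptions, not necessarily what the benchmark used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: transcripts are normalized only by lowercasing and
    splitting on whitespace. Real benchmarks usually also strip
    punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

You'd call it once per audio clip with your ground-truth transcript and each API's output, then average per provider. Lower is better, and it can go above 1.0 if the API hallucinates a lot of extra words.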