u/Slight_Republic_4242

Image 1 — Quit a chill job, failed at 4 products, then built an open-source alternative to Vapi.
▲ 0 r/webdev

Quit a chill job, failed at 4 products, then built an open-source alternative to Vapi.

My previous company got acquired and I stayed on with zero workload. Pure management, basically coasting. Quit anyway because it was killing me slowly. Teamed up with my childhood friend and spent 9 months trying to figure out what to build. We shipped 4 different products during that time. All of them went nowhere.

Then we started looking at voice AI agent platforms like Vapi and Retell. The pricing model really bothered us. $0.05/min platform fee on top of the LLM and TTS costs you're already paying. You're basically renting infrastructure you could own.

So we built Dograh (https://github.com/dograh-hq/dograh). Open source, self-hostable, bring your own API keys. It's a visual workflow builder for voice agents, similar to what n8n does for general AI agents but specifically for phone calls.

You can set up inbound and outbound call flows with drag and drop. Call transfers, variable extraction from conversations, voicemail detection, knowledge base, tool calls to external APIs. We also built a pre-recorded voice mixing system where you use actual human audio clips for predictable parts of the call (greetings, hold messages, confirmations) and TTS only fires when the agent needs to say something dynamic. Saves a ton on TTS costs and honestly sounds way more natural.

Just shipped Speech-to-Speech support via Gemini 3.1 Flash Live too, which collapses the whole STT+LLM+TTS pipeline into a single connection.

Post-call you get QA scoring with sentiment analysis and full call traces through Langfuse so you can debug what went wrong on a bad call.

We're about 6 months in now. 360+ signups last month, 1M organic impressions in the last 40 days, zero ad spend. Still very early but it's starting to compound.

GitHub: https://github.com/dograh-hq/dograh

BSD-2 licensed

Special thanks to this community that supports me with every post ❤️
Any star to the repo is a blessing ⭐️

Would love honest feedback from this community. What's missing? What would make you actually want to self-host something like this?

u/Slight_Republic_4242 — 5 hours ago
▲ 4 r/GeminiAI+1 crossposts

Tested Gemini 3.1 Flash Live for production voice calls, the feel is noticeably better but latency claims need context

Been building voice agents for a while now, and integrated Gemini 3.1 Flash Live into our open source stack as soon as the API went live. Wanted to share some honest observations.

The good stuff first. The voice cadence and overall feel of calls is genuinely better than what you get from the classic STT + LLM + TTS pipeline. Turn-taking feels more natural. Interruptions are handled way more gracefully. The model just "gets" conversational rhythm in a way that a stitched-together pipeline never really achieved. Cost also looks very competitive, which matters a lot in S2S.

Now the stuff nobody seems to be talking about. We averaged around 922ms latency end-to-end in our testing. That's not bad, but it's not the sub-300ms numbers I've seen some people throw around. We were testing from Asia, so region probably plays a role here. Would love to know what others are seeing from US/EU.
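For anyone comparing numbers, here's roughly what we mean by "end-to-end": the gap between the caller going silent and the first agent audio byte arriving, averaged across turns. A sketch of that measurement, not our actual instrumentation:

```python
import statistics

def turn_latencies_ms(turns):
    """`turns` is a list of (speech_end_ts, first_audio_ts) pairs in
    seconds: when the caller stopped talking, and when the first byte
    of agent audio came back."""
    return [(first_audio - speech_end) * 1000.0
            for speech_end, first_audio in turns]

def summarize(latencies_ms):
    # The average hides tail spikes, so report the worst turn too.
    return {"avg_ms": statistics.mean(latencies_ms),
            "max_ms": max(latencies_ms)}
```

If your "latency" starts the clock somewhere else (e.g. at request send rather than end of speech), your numbers won't be comparable to ours.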

The other thing that caught us off guard is transcripts: you can't access them live during the call, only after it's done. If you're doing any kind of context stitching or real-time context engineering during conversations, this makes things harder.

Honestly though I don't think we're going back to the old pipeline. The quality gap in how the conversation actually feels is too big.

We integrated this into Dograh, our open-source voice agent platform (very much like Vapi), if anyone wants to try it themselves: https://github.com/dograh-hq/dograh

What latency numbers are others getting? And has anyone found a clean workaround for the live transcript limitation?

u/Slight_Republic_4242 — 5 hours ago

Gemini 3.1 Flash Live in production voice agents, honest results after two weeks of testing

I've been testing Gemini 3.1 Flash Live in phone call workflows and figured this community would appreciate some real numbers instead of just benchmark screenshots.

Quick context on what we're doing. We build an open-source voice AI platform (Dograh, https://github.com/dograh-hq/dograh) that lets you create phone call agents with a visual workflow builder. Think inbound/outbound calls, telephony integration, tool calls, knowledge base, the whole thing. We previously ran the standard stack: Deepgram or Gladia for STT, an LLM for reasoning, ElevenLabs or Cartesia for TTS. Three API hops stitched together.

Switching to Gemini 3.1 Flash Live collapsed that into a single connection. Here's what we actually observed.

The voice quality and conversational feel improved significantly. This isn't just "slightly better TTS." The way the model handles pauses, interruptions, and pacing makes the calls feel closer to talking to a real person. That's a meaningful jump.

Latency averaged 922ms in our tests. Honestly I expected lower based on some of the sub-300ms claims floating around. We're testing from Asia (against US servers), which probably explains part of the gap. If you're in the US I'd genuinely love to know your numbers.

One thing that surprised us: you can't access transcripts in real-time during the call. They're available after the call ends. This is fine for post-call analysis but it makes real-time context engineering significantly more complex. For example, if your agent needs to summarise context mid-conversation, you need to rethink how you're handling that flow.
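One workaround we've been considering (not something the Live API gives you): fork the caller audio to a cheap side STT purely for mid-call context, and keep a rolling buffer of its interim and final results. A minimal sketch of that buffer, with an illustrative result format:

```python
class LiveTranscriptBuffer:
    """Rolling transcript fed by a side STT stream, used only for
    mid-call context while the S2S session owns the real audio.
    The (text, is_final) result shape mirrors what most streaming
    STT APIs emit, but is illustrative here."""

    def __init__(self):
        self._final: list[str] = []
        self._interim: str = ""

    def on_result(self, text: str, is_final: bool) -> None:
        if is_final:
            self._final.append(text)
            self._interim = ""
        else:
            # Interim results overwrite each other until finalized.
            self._interim = text

    def context(self, max_chars: int = 2000) -> str:
        """Best-effort transcript so far, trimmed from the left so the
        most recent conversation always survives."""
        parts = self._final + ([self._interim] if self._interim else [])
        return " ".join(parts)[-max_chars:]
```

The side transcript won't exactly match the one the model returns post-call, but for summarising or branching mid-conversation, approximate is usually enough.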

The cost structure looks really competitive compared to running three separate APIs. And the model's tool-calling during live audio sessions is solid.

I think we're at a point where the old STT+LLM+TTS pipeline is starting to feel like the wrong architecture. Gemini 3.1 Flash Live isn't perfect, but it feels like the future direction.

Anyone else building production voice stuff on this? Curious about your experiences, especially around session stability for longer calls.

u/Slight_Republic_4242 — 6 hours ago
🔥 Hot ▲ 113 r/buildinpublic

Quit a chill job after my previous startup got acquired. 9 months of figuring things out. 4 failed products. Then this.

Six months ago me and my childhood friend started building Dograh, an open-source voice AI agent platform competing with companies like Vapi and Retell that are sitting on millions in funding.

Some backstory. My previous company got acquired. I stayed on with the acquirer, zero workload, pure management, basically coasting. Quit anyway. Teamed up with my childhood friend and we spent 9 months trying to figure out what to build. Built 4 different products during that time. All of them failed and went nowhere.

Then we landed on Dograh. Self-hostable, open-source, visual workflow builder for voice AI calling. Think n8n but for voice agents.

Where we are now, 6 months in. 1M organic impressions in the last 40 days. 360+ signups last month on our cloud platform. 20+ meetings booked from inbound. Turned down a VC term sheet to stay bootstrapped and open-source. Zero dollars on ads.

The thing nobody tells you about competing with deep-pocketed players is that it's crowded and slow for a long time before anything compounds. First three months of writing content felt like talking to nobody.

Still figuring a lot of this out. If you're building something open-source against funded competitors I'd love to swap notes.

GitHub: https://github.com/dograh-hq/dograh

▲ 24 r/webdev+2 crossposts

Self-host your own voice AI agent platform

I built an open-source voice AI platform you can run on your own infrastructure.

You pick your LLM, STT, and TTS providers, wire them together in a drag-and-drop builder, and deploy with one Docker command.
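In practice "bring your own keys" boils down to an env file plus compose. The variable names below are illustrative placeholders, not Dograh's actual config; check the repo's README for the real ones:

```shell
# .env — example provider keys (names are illustrative)
OPENAI_API_KEY=sk-your-key-here
DEEPGRAM_API_KEY=your-deepgram-key
ELEVENLABS_API_KEY=your-elevenlabs-key

# Then, from the repo root:
docker compose up -d
```

Since the keys live in your own env file and calls go straight to the providers, there's no per-minute platform fee sitting in the middle.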

Built this because tools like Vapi and Retell are closed, expensive, and you have zero control over your data.

https://github.com/dograh-hq/dograh

▲ 2 r/saasbuild+1 crossposts

Self-host your own voice AI agent platform

I built an open-source voice AI platform you can run on your own infrastructure.

You pick your LLM, STT, and TTS providers, wire them together in a drag-and-drop builder, and deploy with one Docker command.

Built this because tools like Vapi and Retell are closed, expensive, and you have zero control over your data.

https://github.com/dograh-hq/dograh