r/AIVoice_Agents
Need help with a call-based agentic AI project
I'm trying to build an agentic AI system that handles service bookings and suggestions for a car dealership and service centers.
Tech stack:
- STT: Whisper model
- TTS: gTTS
- LLM: Llama 70B versatile
- Backend: Python
- DB: Postgres
I have already built the backend, but I'm facing some latency issues.
I also have to implement this as a calling system.
Current call flow:
User speech → STT → text → LLM → response text → TTS → audio output
Latency:
- STT: 300–700 ms
- LLM: 1.5–3 s (depending on response length)
- TTS: adds another 500 ms to 1 s, especially for longer replies
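Summing the per-stage ranges gives a rough end-to-end figure for one turn. This is only a back-of-the-envelope sketch: it assumes the three stages run strictly one after another and ignores network and playback overhead, which are not listed above. The names `STAGES_MS` and `total_latency_ms` are made up for illustration.

```python
# Per-stage latency ranges in milliseconds, taken from the list above.
STAGES_MS = {
    "stt": (300, 700),
    "llm": (1500, 3000),
    "tts": (500, 1000),
}


def total_latency_ms(stages):
    """Sum best-case and worst-case latency for stages that run back to back."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = total_latency_ms(STAGES_MS)
print(f"End-to-end: {best / 1000:.1f}-{worst / 1000:.1f} s")
# prints "End-to-end: 2.3-4.7 s"
```

So the caller waits roughly 2.3 to 4.7 s after they stop speaking, and the LLM step accounts for more than half of that.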
Architecture:
- Capture audio input
- Send to STT
- Pass transcript to LLM (API-based)
- Generate response
- Convert response to speech via TTS
- Stream/play audio back
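The steps above can be sketched as one blocking function with a timer around each stage, which makes it easy to see where the time goes. This is a minimal sketch, not the actual backend: `transcribe`, `generate_reply`, and `synthesize` are hypothetical stand-ins for the Whisper, LLM API, and gTTS calls, and they return dummy values here.

```python
import time


def transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for the Whisper STT call.
    return "book a service for tomorrow"


def generate_reply(transcript: str) -> str:
    # Hypothetical stand-in for the LLM API call.
    return f"Sure, I can help with: {transcript}"


def synthesize(text: str) -> bytes:
    # Hypothetical stand-in for the gTTS call; returns fake audio bytes.
    return text.encode("utf-8")


def timed(name, func, arg, timings):
    """Run func(arg), store the elapsed seconds under name, return the result."""
    start = time.perf_counter()
    result = func(arg)
    timings[name] = time.perf_counter() - start
    return result


def handle_turn(audio: bytes):
    """Sequential pipeline: each stage waits for the previous one to finish."""
    timings = {}
    transcript = timed("stt", transcribe, audio, timings)
    reply = timed("llm", generate_reply, transcript, timings)
    speech = timed("tts", synthesize, reply, timings)
    return speech, timings


speech, timings = handle_turn(b"fake-audio")
for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.2f} ms")
```

With the real calls swapped in, the printed numbers should roughly match the latency list, which helps confirm which stage to optimize first.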
Right now the system does not stream end-to-end. Each stage waits for the previous one to finish, so it's more of a sequential pipeline.
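For comparison, one common way to make a pipeline like this stream is to cut the LLM output into sentences as tokens arrive and hand each finished sentence to TTS immediately, instead of waiting for the full reply. The sketch below shows only the sentence-splitting part and assumes the LLM client can yield text pieces one at a time. `sentences_from_tokens` and `fake_stream` are hypothetical names, and the regex is naive (it will also split after abbreviations such as "Dr.").

```python
import re
from typing import Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it in the buffer.
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Simulated token stream standing in for a streaming LLM response.
fake_stream = ["Your ", "car ", "is ", "ready. ", "Want ", "to ", "book", "?"]
for sentence in sentences_from_tokens(fake_stream):
    # In a real system this is where the sentence would go to TTS.
    print(sentence)
```

The point of this design is that TTS can start speaking the first sentence while the LLM is still generating the rest, so the perceived delay is closer to the time to the first sentence than to the full response time.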
[This is just a college project, so free tools are much appreciated :)]
I also don't have much experience with these kinds of projects, so I'm just vibe coding this right now :|
u/Useful-Thing-1400 — 3 days ago