u/Useful-Thing-1400

I'm trying to build an agentic AI system that handles service bookings and suggestions for a car dealership and service centers.
Tech stack:

STT - Whisper model
TTS - gTTS
LLM - Llama 70B versatile
Backend - Python
DB - Postgres

I have already built the backend, but I'm facing some latency issues.
I also have to implement this as a calling system (the user talks to it like a phone call).

Current call flow:
User speech → STT → text → LLM → response text → TTS → audio output

Latency:

STT: 300–700 ms
LLM: 1.5–3 s (depending on response length)
TTS: another 500 ms – 1 s, especially for longer replies
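
To put a number on the whole round trip, here is a quick sum of the stage figures above. It assumes the stages run strictly one after another and ignores network and audio playback overhead, which I haven't measured. The names `STAGES_MS` and `total_latency_ms` are just made up for this sketch.

```python
# Per-stage latency ranges in milliseconds, taken from the figures above.
STAGES_MS = {
    "stt": (300, 700),
    "llm": (1500, 3000),
    "tts": (500, 1000),
}


def total_latency_ms(stages):
    """Sum best-case and worst-case latency when stages run back to back."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = total_latency_ms(STAGES_MS)
print(f"end-to-end: {best / 1000:.1f}-{worst / 1000:.1f} s")
# prints "end-to-end: 2.3-4.7 s"
```

So the caller waits roughly 2.3 to 4.7 seconds after they stop speaking before hearing anything, with the LLM step as the largest share.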

Architecture:

Capture audio input
Send to STT
Pass transcript to LLM (API-based)
Generate response
Convert response to speech via TTS
Stream/play audio back

Right now, the system is not streaming end-to-end; it's a sequential pipeline, so each stage waits for the previous one to finish completely before it starts.
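
One direction I'm considering, as a rough sketch only: if the LLM API can stream its reply token by token (an assumption, I haven't wired this up), the text could be cut into sentences as it arrives and each sentence sent to TTS right away instead of waiting for the full response. `sentences_from_tokens` and the fake token list are hypothetical names for illustration, not part of my current backend.

```python
import re
from typing import Iterable, Iterator

# A sentence boundary: whitespace that directly follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as the token stream finishes it."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part is still incomplete, so keep buffering it.
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Stand-in for a streamed LLM reply (hypothetical data).
fake_stream = ["Your ", "car is ", "ready. ", "Shall I ", "book a ", "slot?"]
for sentence in sentences_from_tokens(fake_stream):
    # In the real system this is where the sentence would go to TTS.
    print(sentence)
```

The idea is that the first sentence could start playing while the LLM is still generating the rest. The simple regex will also split after abbreviations like "Dr.", so it would need refining for real replies.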

[This is just a college project, so free tools are much appreciated :)]
I also don't have much experience with these kinds of projects, so I'm just vibe coding this right now :|

Need help with a calling-based agentic AI project