MVP stack advice for voice-based AI
I’m building an MVP for a voice-based conversational AI system related to psychology. The flow is: user speaks → STT → LLM → response → evaluation based on structured criteria. Since it involves psychological interactions, I need an LLM that can understand emotions and nuanced conversations well. I’m planning to “train” the behavior mainly through prompt engineering rather than fine-tuning.
Current stack idea:
- STT/TTS: Sarvam AI (for Indian language support)
- LLM: Claude Haiku (for fast, low-cost responses)
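To make the flow concrete, here is a rough sketch of how I picture the pieces connecting. It assumes each stage (STT, LLM reply, evaluation) sits behind its own function so any one can be swapped later. Every name here (`run_turn`, `fake_stt`, `fake_chat`, `fake_evaluate`, the `acknowledges_user` criterion) is a placeholder I made up for illustration; none of it is a real Sarvam AI or Claude API call.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TurnResult:
    transcript: str          # text produced by the STT stage
    reply: str               # text produced by the LLM stage
    scores: Dict[str, int]   # one score per evaluation criterion


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    chat: Callable[[str], str],
    evaluate: Callable[[str, str], Dict[str, int]],
) -> TurnResult:
    """Run one turn: user speaks -> STT -> LLM -> response -> evaluation."""
    transcript = stt(audio)
    reply = chat(transcript)
    scores = evaluate(transcript, reply)
    return TurnResult(transcript, reply, scores)


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    # A real STT provider would transcribe audio; here the bytes are just text.
    return audio.decode("utf-8")


def fake_chat(text: str) -> str:
    # A real implementation would send a prompt to the LLM.
    return f"It sounds like you said: {text}"


def fake_evaluate(transcript: str, reply: str) -> Dict[str, int]:
    # A real evaluator would apply the structured criteria, possibly via a second LLM call.
    return {"acknowledges_user": int(transcript in reply)}


result = run_turn(b"I feel anxious", fake_stt, fake_chat, fake_evaluate)
print(result)
```

Keeping the stages separate like this is what would let me use one model for the conversation and a different one for the evaluation if a single model turns out not to be enough.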
I’m a non-technical founder using AI/no-code tools for the MVP, so I want something reliable and scalable without overcomplicating things.
Questions:
- Is this architecture (separating STT/TTS and LLM) a good approach?
- Is Haiku enough for both conversation and evaluation?
- Any better alternatives for this use case?
I would really appreciate your suggestions.