u/MaD1254

I’m building an MVP for a voice-based conversational AI system related to psychology. The flow is: user speaks → STT → LLM → response → evaluation based on structured criteria. Since it involves psychological interactions, I need an LLM that can understand emotions and nuanced conversations well. I’m planning to “train” the behavior mainly through prompt engineering rather than fine-tuning.

Current stack idea:

- STT/TTS: Sarvam AI (for Indian language support)

- LLM: Claude Haiku (for fast, low-cost responses)
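
To make the flow concrete, here is a rough sketch of how I picture one turn working with the two pieces kept separate. Everything in it is a placeholder I made up for illustration: `run_turn`, `parse_score`, `fake_stt`, and `fake_llm` are hypothetical names, not real Sarvam AI or Anthropic API calls, and the 1 to 5 scoring scale is just an assumption. The same LLM function is used for both the reply and the evaluation, which is the setup I'm asking about.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TurnResult:
    transcript: str
    reply: str
    scores: Dict[str, Optional[int]]


def parse_score(raw: str) -> Optional[int]:
    """Pull the first digit 1-5 out of the model's answer, or None if absent."""
    match = re.search(r"[1-5]", raw)
    return int(match.group()) if match else None


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],       # speech-to-text provider (hypothetical wrapper)
    llm: Callable[[str, str], str],    # (system_prompt, user_text) -> model text
    system_prompt: str,
    criteria: List[str],
) -> TurnResult:
    # 1. User speech -> text
    transcript = stt(audio)
    # 2. Text -> conversational reply, behavior shaped only by the prompt
    reply = llm(system_prompt, transcript)
    # 3. Evaluate the reply against each structured criterion
    scores: Dict[str, Optional[int]] = {}
    for criterion in criteria:
        eval_prompt = (
            f"Rate the reply on '{criterion}' from 1 to 5. "
            "Answer with a single digit."
        )
        raw = llm(eval_prompt, f"User: {transcript}\nReply: {reply}")
        scores[criterion] = parse_score(raw)
    return TurnResult(transcript, reply, scores)


# Stand-ins so the sketch runs without any real service.
def fake_stt(audio: bytes) -> str:
    return "I feel anxious"


def fake_llm(system_prompt: str, user_text: str) -> str:
    if system_prompt.startswith("Rate"):
        return "Score: 4"
    return "That sounds hard. Tell me more."


result = run_turn(
    b"\x00\x01",
    fake_stt,
    fake_llm,
    system_prompt="You are a supportive listener.",
    criteria=["empathy", "clarity"],
)
```

Because `stt` and `llm` are passed in as plain functions, swapping the speech provider or the model (for example, a bigger model only for the evaluation step) should not require touching the rest of the flow.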

I’m a non-technical founder using AI/no-code tools for the MVP, so I want something reliable and scalable without overcomplicating things.

Questions:

- Is this architecture (separating STT/TTS and LLM) a good approach?

- Is Haiku enough for both conversation and evaluation?

- Any better alternatives for this use case?

Would really appreciate your suggestions.

MVP stack advice for voice-based AI