
Self-hosted voice AI for Discord — runs on a Mac mini, knows your community members, MIT license
I run a voice AI in my Discord gaming server from a Mac mini on my desk. It listens in the voice channel, talks back, and remembers things about each member across sessions. Open-sourced it this week.
What's actually self-hosted:
The bot process, the audio pipeline (VAD, STT, speaker tracking), and the per-user memory (suki_memory.json) all run on your machine. Nothing leaves your hardware except the LLM API calls.
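The per-user memory is just a JSON file on disk. The actual schema of `suki_memory.json` isn't documented here, so this is a hypothetical sketch of what a per-member store keyed by Discord user ID might look like (the `remember`/`load_memory` names are mine, not the repo's):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("suki_memory.json")

def load_memory(path=MEMORY_FILE):
    """Load the per-user memory dict, keyed by Discord user ID (as a string)."""
    if path.exists():
        return json.loads(path.read_text())
    return {}

def remember(memory, user_id, fact):
    """Append a fact to a member's entry, creating the entry if needed."""
    memory.setdefault(str(user_id), {"facts": []})["facts"].append(fact)
    return memory

mem = remember({}, 1234, "mains support in ranked")
# mem == {"1234": {"facts": ["mains support in ranked"]}}
```

Keeping it as one flat JSON file means the whole memory stays inspectable and editable by hand, which fits the self-hosted pitch.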
What's not local:
The LLM calls go to Gemini (primary) and Groq (fast fallback). STT is macOS native speech recognition (Swift) — so the transcription itself is local, but it uses Apple's on-device model, not something you control. I know that's a limitation for the fully-local crowd.
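The primary/fallback split is simple in principle: try Gemini, and if the call fails or times out, retry the same prompt against Groq. A minimal sketch of that routing (function names and stand-in backends are illustrative, not the repo's actual code):

```python
def ask_llm(prompt, primary, fallback):
    """Call the primary LLM backend; on any error, retry with the fallback."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)

# Stand-in backends for illustration only:
def gemini(prompt):
    raise TimeoutError("primary unavailable")

def groq(prompt):
    return f"groq says: {prompt}"

print(ask_llm("hello", gemini, groq))  # groq says: hello
```

In a real voice loop you'd likely want a per-call timeout rather than a bare `except`, since a slow primary is worse than a failed one when someone is waiting for the bot to speak.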
Requirements:
- macOS Ventura or later
- Python 3.12
- Gemini API key + Groq API key + Edge TTS (all have free tiers — light daily use costs me ~$0-2/mo)
- Discord bot with voice permissions
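Since the setup needs three separate credentials, a quick startup check saves confusing failures later. A sketch of validating the environment before launching the bot (the variable names here are hypothetical; check the repo's README for the real ones):

```python
import os

# Hypothetical names -- the actual env vars may differ in the repo.
REQUIRED = ["GEMINI_API_KEY", "GROQ_API_KEY", "DISCORD_BOT_TOKEN"]

def check_env(env=os.environ):
    """Raise early with a clear message if any required credential is missing."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing env vars: {', '.join(missing)}")
```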
macOS only for now. The STT layer uses Swift. Whisper fallback exists for Linux and passes CI on ubuntu-latest, but I haven't run a full voice session on Linux yet. If you're on Linux and want to test the Whisper path, I'd love to know if it works end-to-end.
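The platform split described above amounts to picking an STT backend at startup: the Swift bridge to Apple's on-device recognizer on macOS, Whisper everywhere else. A sketch of that selection (backend labels are mine, not identifiers from the repo):

```python
import sys

def pick_stt_backend(platform=sys.platform):
    """Select the speech-to-text backend by OS: Apple-native on macOS, Whisper elsewhere."""
    return "apple_native" if platform == "darwin" else "whisper"
```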
No Docker yet. macOS's native audio APIs aren't accessible from inside a container, so the macOS build can't be dockerized; Docker becomes realistic once the Linux path is confirmed working.
GitHub: https://github.com/butthead0819-beep/marvin-voice-core
If you get it running, there's a "Show your setup" thread in GitHub Discussions — that's how I'm tracking whether the quickstart actually works on machines that aren't mine.