
Self-hosted voice AI for Discord — runs on a Mac mini, knows your community members, MIT license
I run a voice AI in my Discord gaming server from a Mac mini on my desk. It listens in the voice channel, talks back, and remembers things about each member across sessions. Open-sourced it this week.
What's actually self-hosted:
The bot process, the audio pipeline (VAD, STT, speaker tracking), and the per-user memory (suki_memory.json) all run on your machine. Nothing leaves your hardware except the LLM API calls.
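The per-user memory is just a JSON file on disk. The actual schema of `suki_memory.json` isn't documented here, so this is a hypothetical sketch of what a per-member store keyed by Discord user ID might look like (the `remember`/`load_memory` names are mine, not the repo's):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("suki_memory.json")

def load_memory(path=MEMORY_FILE):
    """Load the per-user memory dict, keyed by Discord user ID (as a string)."""
    if path.exists():
        return json.loads(path.read_text())
    return {}

def remember(memory, user_id, fact):
    """Append a fact to a member's entry, creating the entry if needed."""
    memory.setdefault(str(user_id), {"facts": []})["facts"].append(fact)
    return memory

mem = remember({}, 1234, "mains support in ranked")
# mem == {"1234": {"facts": ["mains support in ranked"]}}
```

Keeping it as one flat JSON file means the whole memory stays inspectable and editable by hand, which fits the self-hosted pitch.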
What's not local:
The LLM calls go to Gemini (primary) and Groq (fast fallback). STT is macOS native speech recognition (Swift) — so the transcription itself is local, but it uses Apple's on-device model, not something you control. I know that's a limitation for the fully-local crowd.
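The primary/fallback split is simple in principle: try Gemini, and if the call fails or times out, retry the same prompt against Groq. A minimal sketch of that routing (function names and stand-in backends are illustrative, not the repo's actual code):

```python
def ask_llm(prompt, primary, fallback):
    """Call the primary LLM backend; on any error, retry with the fallback."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)

# Stand-in backends for illustration only:
def gemini(prompt):
    raise TimeoutError("primary unavailable")

def groq(prompt):
    return f"groq says: {prompt}"

print(ask_llm("hello", gemini, groq))  # groq says: hello
```

In a real voice loop you'd likely want a per-call timeout rather than a bare `except`, since a slow primary is worse than a failed one when someone is waiting for the bot to speak.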
Requirements:
- macOS Ventura or later
- Python 3.12
- Gemini API key + Groq API key + Edge TTS (all have free tiers — light daily use costs me ~$0-2/mo)
- Discord bot with voice permissions
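Since the setup needs three separate credentials, a quick startup check saves confusing failures later. A sketch of validating the environment before launching the bot (the variable names here are hypothetical; check the repo's README for the real ones):

```python
import os

# Hypothetical names -- the actual env vars may differ in the repo.
REQUIRED = ["GEMINI_API_KEY", "GROQ_API_KEY", "DISCORD_BOT_TOKEN"]

def check_env(env=os.environ):
    """Raise early with a clear message if any required credential is missing."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing env vars: {', '.join(missing)}")
```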
macOS only for now. The STT layer uses Swift. Whisper fallback exists for Linux and passes CI on ubuntu-latest, but I haven't run a full voice session on Linux yet. If you're on Linux and want to test the Whisper path, I'd love to know if it works end-to-end.
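The platform split described above amounts to picking an STT backend at startup: the Swift bridge to Apple's on-device recognizer on macOS, Whisper everywhere else. A sketch of that selection (backend labels are mine, not identifiers from the repo):

```python
import sys

def pick_stt_backend(platform=sys.platform):
    """Select the speech-to-text backend by OS: Apple-native on macOS, Whisper elsewhere."""
    return "apple_native" if platform == "darwin" else "whisper"
```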
No Docker yet. macOS's native audio APIs aren't accessible from inside a container, so the macOS build can't be dockerized; Docker becomes realistic once the Linux path is confirmed working.
GitHub: https://github.com/butthead0819-beep/marvin-voice-core
If you get it running, there's a "Show your setup" thread in GitHub Discussions — that's how I'm tracking whether the quickstart actually works on machines that aren't mine.