u/EfficientLetter3654

I refused to pay for Wispr Flow (voice-to-text) so I spent two weeks rebuilding it. Free, runs locally, macOS only.
▲ 129 r/AIAssisted+1 crossposts

Two weeks ago I read a study that said people speak about 3x faster than they type. One of those things you've sort of always known but never actually sat with.

So I started looking at voice-to-text apps. Wispr Flow is the obvious pick and it's genuinely good. But $15/month forever for something I'd mostly use to dictate prompts to an LLM felt like a personal insult. I already pay for too many subscriptions.

So instead of doing the rational thing (paying $15), I spent two weeks of evenings rebuilding it. The math obviously doesn't work. But yeah....

What it is

A menu bar app for macOS. You hold a hotkey, talk, release, and the transcribed + polished text gets pasted wherever your cursor is. Works in any app – Slack, browser, IDE, ChatGPT, whatever.

Two open-source models doing the work:

- Parakeet (NVIDIA) / Whisper for transcription

- Gemma 4 (Google) / Apple Intelligence for polishing the raw transcript into something readable

Everything runs locally. No cloud calls, no API keys, no telemetry, no account. Once it's downloaded it works fully offline.
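For anyone curious how a hold-to-talk flow like the one described above could be wired together, here is a minimal Python sketch. It is not the app's actual code (the post doesn't share it), and every name in it (`DictationPipeline`, `fake_transcribe`, `fake_polish`) is hypothetical. The transcription and polishing stages are plain callables, so stand-ins are used here where a real build would plug in something like Parakeet/Whisper and Gemma.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DictationPipeline:
    """Hypothetical hold-to-talk pipeline: record -> transcribe -> polish -> paste."""
    transcribe: Callable[[bytes], str]  # speech-to-text stage (stand-in for a local model)
    polish: Callable[[str], str]        # cleanup stage (stand-in for a local LLM)
    paste: Callable[[str], None]        # delivers the text to the focused app
    _chunks: List[bytes] = field(default_factory=list, init=False)
    _recording: bool = field(default=False, init=False)

    def on_hotkey_down(self) -> None:
        # Start a fresh recording when the hotkey is pressed.
        self._chunks = []
        self._recording = True

    def on_audio(self, chunk: bytes) -> None:
        # Buffer audio only while the hotkey is held.
        if self._recording:
            self._chunks.append(chunk)

    def on_hotkey_up(self) -> str:
        # On release, run both stages and paste the result.
        self._recording = False
        audio = b"".join(self._chunks)
        if not audio:
            return ""
        text = self.polish(self.transcribe(audio))
        self.paste(text)
        return text


def fake_transcribe(audio: bytes) -> str:
    # Stand-in: pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_polish(raw: str) -> str:
    # Stand-in: drop filler words, capitalize, and end with a period.
    words = [w for w in raw.split() if w.lower() not in {"um", "uh"}]
    text = " ".join(words)
    return text[:1].upper() + text[1:] + ("" if text.endswith(".") else ".")


pasted: List[str] = []
pipeline = DictationPipeline(fake_transcribe, fake_polish, pasted.append)
pipeline.on_hotkey_down()
pipeline.on_audio(b"um hello ")
pipeline.on_audio(b"world")
result = pipeline.on_hotkey_up()
print(result)
```

Keeping the stages as injected callables is just one way to structure it; it makes swapping one transcription or polishing model for another a one-line change.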

Caveats, in order of importance

  1. macOS only. Apple Silicon required (M-series chip). Sorry to Intel Mac and Windows folks – Windows build is next on the list.
  2. It's two weeks old. I'd love to say there are no bugs, but I'm a realist. There are bugs I haven't found yet. There will be more bugs...
  3. I'd estimate it's at ~90% of Wispr Flow's quality. Not 100%. For me personally, it's enough to use it every day.

What it's saving me

40–60 minutes a day, mostly because I write a lot of prompts. Talking to an LLM feels more natural than typing to one. If you write a lot of emails/docs, the savings are probably bigger.
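As a rough sanity check on that number, here is a back-of-the-envelope sketch using the "about 3x faster" figure from the study mentioned earlier. The 40 words-per-minute typing speed and the 3,600 words-per-day volume are assumptions picked for illustration, not measurements, and the estimate ignores transcription time and any edits afterwards.

```python
def minutes_saved(words_per_day: float, typing_wpm: float = 40.0, speedup: float = 3.0) -> float:
    """Estimate minutes saved per day by speaking instead of typing.

    typing_wpm and speedup are illustrative assumptions, not measured values.
    """
    typing_minutes = words_per_day / typing_wpm
    speaking_minutes = words_per_day / (typing_wpm * speedup)
    return typing_minutes - speaking_minutes


# Hypothetical workload: 3,600 words of prompts, emails, and docs per day.
saved = minutes_saved(3600)
print(f"{saved:.0f} minutes saved per day")
```

Plug in your own typing speed and daily word count to see whether the savings would be meaningful for you.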

Download: vox.rizenhq.com (free for personal use, no signup)

The ask

I'm genuinely trying to figure out who this is for besides me. If you try it:

- Tell me where it breaks. I want bug reports more than compliments.

- Tell me what app/workflow you tried it in. I'm trying to understand the actual use cases.

- If there's a feature that would make you switch from Wispr Flow (or start using voice-to-text at all), let me know.

EDIT:

If you see any bugs or want to suggest features - create an issue here.

EDIT 2 (some technical specs, resource consumption, etc.):

  1. No need to download AI models separately. The app will ask you to click "Download" during the onboarding flow and will do everything for you.
  2. Three Gemma 4 models are available: E2B, E4B, and 26B. E2B is very small and will run even on mobile phones. 26B is honestly too big and only usable on really high-end devices. I personally always use E4B: its quality is great for the purpose of this app and it runs really fast.

Regarding resource consumption:

  1. RAM: approximately 200 MB when the app is idle and approximately 300 MB in total while you are speaking. The transcription and polishing phase causes a brief spike to 4-6 GB for a couple of seconds, then usage drops back to 200 MB.
  2. CPU: basically 0 when the app is idle. The biggest spike I saw in Activity Monitor while it was in use was 20%.

EDIT 3

Is it open source? Not right now. I'm considering making it open source though.

BTW, I develop it during my live streams from 8:30 am to 10:30 am ET every day here. I show the code and the decisions I make live on the stream. If you want to ask questions, push for features, push to make it open source, etc., join the stream, post it in the chat, and I'll consider it!

EDIT 4

Seeing the amount of feedback and feature requests in the comments, I've decided to create a Discord server to make sure that nothing gets lost and everything gets addressed. You can join here.

u/EfficientLetter3654 — 1 day ago