I built Ditto, an Electron app for local voice to text on Windows (Whisper.cpp + CUDA)
Disclosure: I made this. Free and open source.
I got tired of paying monthly for voice typing apps like Wispr Flow when my RTX could run Whisper locally faster than the cloud round trip. So I built Ditto, a Windows tray app that does voice to text fully offline.
What it does (three things, kept simple on purpose):
- Floating pill. A small AirTag-style pill floats on top of any window. Press the global shortcut, talk, release. The pill animates with a live waveform driven by your voice while recording.
- Local transcription. Audio goes through `whisper.cpp` with CUDA acceleration. Nothing leaves your machine. First transcription takes about 1.3 seconds (CUDA init), subsequent ones 300 to 500 ms. You pick the model on first launch: base (140 MB) to large (3.1 GB).
- Auto paste. Transcribed text goes to your clipboard and pastes itself into whatever window is focused. Works in any Win32 app: browser, IDE, Discord, Slack, anywhere.
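For anyone wondering what the local transcription step can look like from Electron's main process, here is a minimal sketch. It is an illustration under assumptions, not Ditto's actual code: the `WhisperJob` shape, `buildWhisperArgs`, `transcribe`, and all paths are hypothetical names. The flags used (`-m`, `-f`, `-l`, `-nt` for no timestamps, `-np` for no extra prints) are ones the whisper.cpp command-line tool accepts.

```typescript
import { execFile } from "node:child_process";

// Hypothetical job description; none of these names come from the Ditto repo.
interface WhisperJob {
  exePath: string;   // path to the bundled whisper.cpp CLI executable
  modelPath: string; // path to the model file chosen on first launch
  wavPath: string;   // 16 kHz mono WAV written by the app
  language?: string; // optional language code such as "en"
}

// Build the argument list separately so it can be inspected without spawning anything.
function buildWhisperArgs(job: WhisperJob): string[] {
  const args = ["-m", job.modelPath, "-f", job.wavPath, "-nt", "-np"];
  if (job.language) {
    args.push("-l", job.language);
  }
  return args;
}

// Run the CLI once and resolve with whatever text it wrote to stdout.
function transcribe(job: WhisperJob): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(
      job.exePath,
      buildWhisperArgs(job),
      { windowsHide: true }, // avoid flashing a console window on Windows
      (err, stdout) => {
        if (err) reject(err);
        else resolve(stdout.trim());
      },
    );
  });
}
```

Spawning one process per utterance is only the simplest shape to show. It reloads the model on every call, so an app chasing the 300 to 500 ms range quoted above may well keep something warm between calls instead.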
Notes on the build:
- Electron 39 + React 19 + TypeScript strict, three separate renderer processes (pill, settings, welcome) with typed IPC
- Whisper.cpp 1.8.4 shipped as a standalone exe with bundled CUDA DLLs, no toolchain needed on the user's machine
- Audio resampling done entirely in the renderer via `OfflineAudioContext` before sending to main
- Pill has a transparent invisible margin around it so OS shadow clipping doesn't cut the visual, with click-through driven from main to avoid breaking the drag region
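The renderer-side resampling mentioned above can be sketched roughly like this. Whisper models take 16 kHz mono audio, and an `OfflineAudioContext` created at that rate will resample and downmix whatever buffer is played into it. The function names and the `WHISPER_SAMPLE_RATE` constant are hypothetical, and this is a sketch of the general technique rather than the code in the repo.

```typescript
// Whisper expects 16 kHz mono PCM input.
const WHISPER_SAMPLE_RATE = 16_000;

// Pure helper: how many output frames a recording needs at the target rate.
function resampledLength(
  inputFrames: number,
  inputRate: number,
  targetRate: number = WHISPER_SAMPLE_RATE,
): number {
  return Math.max(1, Math.ceil((inputFrames * targetRate) / inputRate));
}

// Renderer only: relies on the Web Audio API, so it will not run in plain Node.
async function resampleForWhisper(input: AudioBuffer): Promise<Float32Array> {
  const frames = resampledLength(input.length, input.sampleRate);
  // One output channel at 16 kHz; the context resamples the source buffer
  // and downmixes multi-channel input to mono at the destination.
  const ctx = new OfflineAudioContext(1, frames, WHISPER_SAMPLE_RATE);
  const source = ctx.createBufferSource();
  source.buffer = input;
  source.connect(ctx.destination);
  source.start();
  const rendered = await ctx.startRendering();
  return rendered.getChannelData(0);
}
```

Doing this in the renderer means main only ever receives a small `Float32Array` at the final rate, which keeps the IPC payload down compared with shipping raw 48 kHz stereo.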
Repo and release: github.com/asantinos/ditto
Caveats: Windows only for now. The CUDA build means it currently targets NVIDIA GPUs (works fine on RTX laptops). No code signing yet, so SmartScreen warns on first launch. Installer is ~399 MB because of the bundled CUDA DLLs.
Feedback welcome, especially from anyone who has shipped Electron apps with native binary dependencies. The Whisper integration was the trickiest part.