Hi everyone 👋
I’m building a WebRTC SFU in Rust using str0m and I’m looking for ways to anonymize voices in real time (think: video calls involving minors or journalist interviews).
The goal is not just pitch shifting (that’s reversible) but something solid enough to make a voice genuinely unrecognizable. I’ve been looking at the WORLD vocoder (F0 + spectral envelope + aperiodicity decomposition) via FFI, combined with rubato for resampling, but I’m not sure it’s the right call for a low-latency streaming context since WORLD prefers longer segments (~200-500ms).
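To put rough numbers on the latency concern, here's a back-of-the-envelope sketch. It assumes 48 kHz audio packetized in 20 ms frames, which is common for Opus in WebRTC but is an assumption here. It also assumes WORLD is fed whole segments (block-at-a-time processing), so the first sample of each segment has to wait for the segment to fill before analysis can start. The helper names are hypothetical.

```rust
/// Samples in one packetization frame (e.g. one Opus packet's worth of PCM).
fn samples_per_frame(sample_rate_hz: u32, frame_ms: u32) -> usize {
    (sample_rate_hz as usize * frame_ms as usize) / 1000
}

/// Whole frames that must arrive before a `window_ms` segment is full.
/// Rounds up, because a partially needed frame still has to be waited for.
fn frames_per_window(window_ms: u32, frame_ms: u32) -> u32 {
    (window_ms + frame_ms - 1) / frame_ms
}

/// Lower bound on added delay from segment buffering alone, assuming
/// block-at-a-time processing. Ignores compute time, jitter buffer and network.
fn buffering_delay_ms(window_ms: u32, frame_ms: u32) -> u32 {
    frames_per_window(window_ms, frame_ms) * frame_ms
}

/// Does the buffering delay alone fit inside the end-to-end budget?
fn fits_budget(window_ms: u32, frame_ms: u32, budget_ms: u32) -> bool {
    buffering_delay_ms(window_ms, frame_ms) <= budget_ms
}
```

Under those assumptions a 200 ms segment costs at least 200 ms of buffering (10 frames of 960 samples), which already blows a 100 ms budget before any DSP runs. This is why a streaming-friendly analysis with short frames and small lookahead matters more than the vocoder's quality ceiling.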
Constraints:
- Real-time, low latency (targeting <100ms)
- Zero-copy as much as possible (hot path, per-participant processing)
- Rust-first, FFI is fine if the lib is solid
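For the per-participant hot path, one allocation-free pattern is to copy each decoded frame once into a preallocated block buffer and then hand the analysis/synthesis stage a mutable slice to process in place. Here's a minimal sketch. `BlockAccumulator` and `push` are hypothetical names, not part of str0m or any other crate, and "zero-copy" here really means "one copy, no allocation".

```rust
/// Hypothetical per-participant accumulator: gathers decoded PCM frames into a
/// fixed-size analysis block without allocating on the hot path.
struct BlockAccumulator {
    block: Vec<f32>, // allocated once in `new`, reused for every block
    filled: usize,   // number of valid samples currently in `block`
}

impl BlockAccumulator {
    fn new(block_len: usize) -> Self {
        // A zero-length block would never make progress in `push`.
        assert!(block_len > 0, "block_len must be non-zero");
        Self { block: vec![0.0; block_len], filled: 0 }
    }

    /// Copies `frame` into the block. Every time the block fills up,
    /// `on_block` receives a mutable slice so it can transform the audio in
    /// place (e.g. analysis + resynthesis). Leftover samples carry over to the
    /// next block. Returns how many complete blocks were emitted.
    fn push<F: FnMut(&mut [f32])>(&mut self, mut frame: &[f32], mut on_block: F) -> usize {
        let mut emitted = 0;
        while !frame.is_empty() {
            let space = self.block.len() - self.filled;
            let n = space.min(frame.len());
            self.block[self.filled..self.filled + n].copy_from_slice(&frame[..n]);
            self.filled += n;
            frame = &frame[n..];
            if self.filled == self.block.len() {
                on_block(&mut self.block[..]);
                self.filled = 0;
                emitted += 1;
            }
        }
        emitted
    }
}
```

Frame and block sizes don't need to line up: a 960-sample frame can straddle two blocks. Keeping one accumulator per participant means the only per-packet work is a `copy_from_slice`, with no allocation.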
Has anyone tackled something similar? Is WORLD the right tool, or is there a better alternative I'm missing?