Note: I do not own the copyright for the Rocky voice and I do not encourage cloning artists' voices for commercial purposes. This is just a fun personal DIY project intended for the fan community!
Last week I shared my Rocky build here and the support was incredible! Fist my bump to all of you! :)
I received a couple of comments on YouTube asking if I could make Rocky actually talk instead of just displaying text. One comment suggested pairing the Qwen voice cloner with a Piper TTS workflow. I spent the last few days diving into that, and here is how I got it working:
I took a short, clean sample of the Rocky voice and used Qwen3 TTS to clone the profile. Then I used that clone to generate 500 random phrases. I used those 500 audio clips as input to train a custom Piper model.
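If you want to try the same thing, here is a rough sketch of how the generated clips could be indexed for training. It assumes the LJSpeech-style layout that Piper's training docs describe (a `wav/` folder plus a `metadata.csv` with one `id|text` row per clip). The `write_metadata` helper, the `rocky` prefix, and the folder names are made up for illustration and are not from my actual notebook.

```python
from pathlib import Path


def write_metadata(phrases, dataset_dir, prefix="rocky"):
    """Write an LJSpeech-style metadata.csv (id|text) for a list of phrases.

    Assumption: each clip will be saved as <dataset_dir>/wav/<id>.wav by
    whatever tool generates the audio. This helper only writes the index.
    """
    dataset_dir = Path(dataset_dir)
    (dataset_dir / "wav").mkdir(parents=True, exist_ok=True)

    rows = []
    for i, text in enumerate(phrases, start=1):
        clip_id = f"{prefix}_{i:04d}"  # hypothetical naming scheme
        # "|" is the column delimiter, so strip it from the text,
        # then collapse any repeated whitespace.
        cleaned = " ".join(text.replace("|", " ").split())
        rows.append((clip_id, cleaned))

    with open(dataset_dir / "metadata.csv", "w", encoding="utf-8", newline="") as f:
        for clip_id, cleaned in rows:
            f.write(f"{clip_id}|{cleaned}\n")
    return rows
```

Calling `write_metadata(my_phrases, "rocky_dataset")` gives you the clip IDs to use as file names when you save each generated clip into `rocky_dataset/wav/`.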
In the demo video, the model runs directly on the Raspberry Pi with Piper TTS. It seems to run pretty smoothly on the Raspberry Pi Zero 2 W, with decent response times.
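For reference, here is a minimal sketch of calling Piper from Python on the Pi. It assumes the `piper` command-line binary is on your PATH and uses the `--model` and `--output_file` options from the Piper README, with the text piped in on stdin. The file names `rocky.onnx` and `rocky_says.wav` are placeholders, and this is not necessarily how my build wires it up.

```python
import subprocess


def build_piper_command(model_path, output_path):
    """Build the argument list for the Piper CLI (text is read from stdin)."""
    return ["piper", "--model", str(model_path), "--output_file", str(output_path)]


def speak_to_file(text, model_path, output_path):
    """Synthesize `text` to a WAV file by piping it into the Piper CLI.

    Assumption: the `piper` binary is installed and on PATH, and the model's
    .onnx.json config file sits next to the .onnx file.
    """
    cmd = build_piper_command(model_path, output_path)
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)
    return output_path
```

Something like `speak_to_file("Fist my bump!", "rocky.onnx", "rocky_says.wav")` would then write a WAV file that you can play with `aplay` or any other audio player.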
To clone the voice and train the model, I used a Google Colab A100 GPU runtime with High-RAM (2025.10). I used `en_US-lessac-low.onnx` as the base model and trained it for up to 2999 epochs.
You can find the full build video on my YouTube.