Kokoro-82M high quality TTS on a Raspberry Pi
March 10th, 2025
Kokoro-82M is an efficient text-to-speech model capable of running on resource-constrained devices like the Raspberry Pi. The quantized model weighs in at only ~80MB. I’ve previously used eSpeak and Piper for text-to-speech projects, and by comparison I prefer Kokoro’s more natural-sounding voice. However, on my Raspberry Pi 4 (2GB), inference is slower than real time, so these older TTS options still have their place. I imagine inference would be faster on a Raspberry Pi 5.
As a brief side-note: if you’re interested in trying out Kokoro without having to install anything, I made a web version here that runs in your browser (Chrome only at this stage).
For Python on the Pi, the kokoro-onnx package simplifies the process of using Kokoro.
First, install the necessary Python packages:
pip install kokoro-onnx soundfile
soundfile is used for saving the generated audio to a file.
Next, download the pre-trained Kokoro model and voice files:
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.int8.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
These files contain the required model and voice data. Place them in the same directory as your Python script.
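A missing or misnamed file is an easy mistake at this step, so a quick check before loading the model can save some head-scratching. The following is a minimal sketch, not part of kokoro-onnx: the helper name missing_model_files is my own, and the file names are the ones the script in this post expects.

```python
from pathlib import Path

# File names the script in this post expects (assumption: you kept the
# default names from the download step).
REQUIRED_FILES = ("kokoro-v1.0.int8.onnx", "voices-v1.0.bin")


def missing_model_files(directory=".", names=REQUIRED_FILES):
    """Return the entries of `names` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Model and voice files found.")
```

Run it from the directory that holds your script; an empty result means both files are in place.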
Now, create a Python file (e.g. tts.py) with the following:
import soundfile as sf
from kokoro_onnx import Kokoro
kokoro = Kokoro("kokoro-v1.0.int8.onnx", "voices-v1.0.bin")
samples, sample_rate = kokoro.create(
"It was the best of times, it was the worst of times", voice="af_heart", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
print("Created audio.wav")
To play the generated audio, use aplay (usually pre-installed on Raspberry Pi OS):
aplay audio.wav
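To put a number on how far from real time the Pi is, you can time the synthesis call and compare it with the length of the audio produced. This is a rough sketch under my own naming: real_time_factor and timed are hypothetical helpers, not part of kokoro-onnx, and the commented usage assumes the kokoro object from tts.py.

```python
import time


def real_time_factor(elapsed_seconds, num_samples, sample_rate):
    """Synthesis time divided by audio duration.

    A value above 1.0 means synthesis is slower than real time.
    """
    audio_seconds = num_samples / sample_rate
    return elapsed_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Usage with the kokoro object from tts.py (not run here):
# (samples, sample_rate), elapsed = timed(
#     kokoro.create,
#     "It was the best of times, it was the worst of times",
#     voice="af_heart", speed=1.0, lang="en-us",
# )
# print(f"Real-time factor: {real_time_factor(elapsed, len(samples), sample_rate):.2f}")
```

The first call may include one-off warm-up cost, so timing a second call is likely to give a more representative figure.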
For direct audio playback without saving to a file, install sounddevice:
pip install sounddevice
Then modify your Python script to this:
import sounddevice as sd
from kokoro_onnx import Kokoro
kokoro = Kokoro("kokoro-v1.0.int8.onnx", "voices-v1.0.bin")
samples, sample_rate = kokoro.create(
"It was the best of times, it was the worst of times", voice="af_heart", speed=1.0, lang="en-us"
)
sd.play(samples, sample_rate)
sd.wait()