Kokoro voice files (0.bin) are stored as [max_seq_len, 1, 256] f32
tables. Each row is the style embedding the model expects for input
sequences of that token length. load_voices flattens the file into a
single Vec<f32>, but parse_voice_style was iterating from offset 0
and taking the first 256 floats — i.e. style[0], the embedding for
an *empty* (zero-token) sequence — for every synthesis call.
The audible result: clean speech for the first ~600ms-1000ms then a
collapse into low-amplitude murmurs / silence. Reproduces deterministically
on identical input across multiple voices and over the entire
ort 2.0.0-rc.11 ↔ 2.0.0-rc.12 range — i.e. unrelated to the runtime,
the model file, or speed scaling.
Fix: lift the style lookup out of synthesize_with_options and run it
inside synthesize_segment (which is the place that already tokenises
the phoneme string), passing tokens.len() to parse_voice_style. The
helper now picks the [tokens_len*256 .. tokens_len*256+256] slice of
the flat per-voice array, clamping to the highest available row for
inputs longer than the stored table.
Mirrors how upstream Kokoros (lucasjinreal/Kokoros) handles this —
their mix_styles(name, tokens.len()) returns style[tokens_len][0].
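The corrected lookup can be sketched as follows. This is illustrative only (the names `style_row` and `STYLE_DIM` are assumptions, not the crate's actual identifiers); it shows the row-selection-with-clamp described above, given the flat `Vec<f32>` that load_voices produces:

```rust
/// Illustrative sketch: pick the 256-float style row for a given token count,
/// clamping to the last stored row for inputs longer than the table.
/// `style_row` / `STYLE_DIM` are hypothetical names, not the crate's API.
const STYLE_DIM: usize = 256;

fn style_row(flat: &[f32], tokens_len: usize) -> &[f32] {
    let rows = flat.len() / STYLE_DIM;
    // Clamp: sequences longer than the stored table reuse the highest row.
    let row = tokens_len.min(rows.saturating_sub(1));
    &flat[row * STYLE_DIM..row * STYLE_DIM + STYLE_DIM]
}
```

The old behaviour corresponds to always calling this with `tokens_len = 0`, i.e. the empty-sequence embedding.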
Verified end-to-end on a herodemo deploy:
| metric | before fix | after fix |
|---|---|---|
| silent windows (<200) | 32 / 48 (66%) | 12 / 52 (23%) |
| normal windows (>=1000) | 7 / 48 | 34 / 52 |
| duration | 4.75s | 5.20s |
| audible content span | ~600-1200ms | ~400-4400ms |
— same input ('The quick brown fox jumps over the lazy dog.', voice
af_heart, speed 1.0). The fix produces continuous speech across the
whole utterance instead of dropping after the first ~1s.
Signed-off-by: mik-tf
kokoro-micro
A minimal, embeddable Text-to-Speech (TTS) library for Rust using the Kokoro 82M parameter model.
This is a reduced version of kokoro-tiny, created by 8b-is.
Features
- Minimal dependencies - Only essential crates for TTS synthesis
- Auto-downloading - Model files (310MB + 27MB) download automatically to ~/.cache/k/
- Multiple voices - Support for various voice styles with mixing capability
- Speed & gain control - Adjust speech speed and volume
- WAV export - Save synthesized audio to WAV files
- Long text support - Automatic chunking and crossfading for longer texts
- Silent by default - No output unless KOKORO_DEBUG=1 is set
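The long-text feature chunks the input and crossfades adjacent chunks so joins are not audible. A minimal sketch of a linear crossfade join (a hypothetical helper illustrating the idea, not the crate's internals):

```rust
/// Join two audio chunks with a linear crossfade over `overlap` samples.
/// Hypothetical helper; shows the chunk-joining idea, not kokoro-micro's code.
fn crossfade(a: &[f32], b: &[f32], overlap: usize) -> Vec<f32> {
    let overlap = overlap.min(a.len()).min(b.len());
    let mut out = Vec::with_capacity(a.len() + b.len() - overlap);
    out.extend_from_slice(&a[..a.len() - overlap]);
    for i in 0..overlap {
        let t = i as f32 / overlap as f32; // fade a out while b fades in
        out.push(a[a.len() - overlap + i] * (1.0 - t) + b[i] * t);
    }
    out.extend_from_slice(&b[overlap..]);
    out
}
```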
Installation
Add to your Cargo.toml:
[dependencies]
kokoro-micro = "0.2.0"
tokio = { version = "1", features = ["rt", "macros"] }
Quick Start
use kokoro_micro::TtsEngine;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Initialize TTS engine (downloads model on first run)
let mut tts = TtsEngine::new().await?;
// Synthesize speech
// Parameters: text, voice (None for default), speed, gain, language
let audio = tts.synthesize_with_options(
"Hello world!",
None, // voice: None = default "af_sky"
1.0, // speed: 1.0 = normal
1.0, // gain: 1.0 = normal volume
Some("en") // language
)?;
// Save to WAV file
tts.save_wav("output.wav", &audio)?;
Ok(())
}
API Reference
TtsEngine
Main struct for text-to-speech synthesis.
Methods
- new() -> Result<Self, String>
  Create a new TTS engine. Downloads model files to ~/.cache/k/ on first run.
- with_paths(model_path: &str, voices_path: &str) -> Result<Self, String>
  Create engine with custom model file paths.
- voices() -> Vec<String>
  List all available voice names.
- synthesize_with_options(text: &str, voice: Option<&str>, speed: f32, gain: f32, lang: Option<&str>) -> Result<Vec<f32>, String>
  Synthesize text to audio samples.
  - text - Text to synthesize
  - voice - Voice name (e.g., "af_sky", "af_nicole", "am_adam") or None for default
  - speed - Speech speed (0.5 = slower, 1.0 = normal, 2.0 = faster)
  - gain - Volume multiplier (0.5 = quieter, 1.0 = normal, 2.0 = louder)
  - lang - Language code (e.g., "en", "es", "fr") or None for default "en"
- save_wav(path: &str, audio: &[f32]) -> Result<(), String>
  Save audio samples to a WAV file.
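For context, writing f32 samples as a WAV boils down to a 44-byte RIFF header plus 16-bit PCM data. A self-contained sketch of what a helper like save_wav typically does (the 24 kHz sample rate below is an assumption about Kokoro's output, not taken from the crate):

```rust
use std::io::Write;

/// Sketch: write mono f32 samples as 16-bit PCM WAV.
/// Illustrative only; kokoro-micro's save_wav may differ in detail.
fn write_wav(path: &str, audio: &[f32], sample_rate: u32) -> std::io::Result<()> {
    let data_len = (audio.len() * 2) as u32;
    let mut f = std::fs::File::create(path)?;
    f.write_all(b"RIFF")?;
    f.write_all(&(36 + data_len).to_le_bytes())?;
    f.write_all(b"WAVEfmt ")?;
    f.write_all(&16u32.to_le_bytes())?;             // fmt chunk size
    f.write_all(&1u16.to_le_bytes())?;              // PCM format
    f.write_all(&1u16.to_le_bytes())?;              // mono
    f.write_all(&sample_rate.to_le_bytes())?;
    f.write_all(&(sample_rate * 2).to_le_bytes())?; // byte rate
    f.write_all(&2u16.to_le_bytes())?;              // block align
    f.write_all(&16u16.to_le_bytes())?;             // bits per sample
    f.write_all(b"data")?;
    f.write_all(&data_len.to_le_bytes())?;
    for &s in audio {
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        f.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}
```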
Voice Mixing
You can mix multiple voices by using weighted combinations:
// Mix 40% af_sky + 50% af_nicole
let audio = tts.synthesize_with_options(
"Hello!",
Some("af_sky.4+af_nicole.5"),
1.0,
1.0,
Some("en")
)?;
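A suffix like .4 reads as a weight of 0.4 (40%). A rough sketch of how such a spec could be parsed into (voice, weight) pairs (a hypothetical helper; the crate's actual parser may differ):

```rust
/// Parse a mix spec like "af_sky.4+af_nicole.5" into (voice, weight) pairs.
/// ".4" is read as 0.4, ".25" as 0.25; a bare name gets weight 1.0.
/// Hypothetical helper for illustration only.
fn parse_mix(spec: &str) -> Vec<(String, f32)> {
    spec.split('+')
        .map(|part| match part.rsplit_once('.') {
            Some((name, w)) if !w.is_empty() && w.chars().all(|c| c.is_ascii_digit()) => {
                // Scale by the number of digits: "4" -> 0.4, "25" -> 0.25.
                let scale = 10f32.powi(w.len() as i32);
                (name.to_string(), w.parse::<f32>().unwrap() / scale)
            }
            _ => (part.to_string(), 1.0),
        })
        .collect()
}
```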
Available Voices
Common voices include:
- af_sky (default) - Female, gentle
- af_nicole - Female
- af_bella - Female
- am_adam - Male
- am_michael - Male
Use tts.voices() to list all available voices.
Debug Logging
By default, kokoro-micro runs silently with no console output. To enable debug logging (model download progress, synthesis details, etc.), set the KOKORO_DEBUG environment variable:
# Enable debug logging
KOKORO_DEBUG=1 cargo run --example simple
# Or in your code
std::env::set_var("KOKORO_DEBUG", "1");
Debug logging shows:
- Model download progress
- Long-form synthesis chunking information
- Phoneme conversion details
- Audio generation statistics
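The silent-by-default behaviour amounts to a guard like the following (a sketch of the pattern, not the crate's exact code; `debug_enabled` and `debug_log` are illustrative names):

```rust
/// Sketch of a KOKORO_DEBUG gate: log only when the variable is set to "1".
fn debug_enabled() -> bool {
    std::env::var("KOKORO_DEBUG").map(|v| v == "1").unwrap_or(false)
}

fn debug_log(msg: &str) {
    if debug_enabled() {
        eprintln!("[kokoro-micro] {msg}");
    }
}
```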
Example
See examples/simple.rs:
# Run without debug output
cargo run --example simple
# Run with debug output
KOKORO_DEBUG=1 cargo run --example simple
Features
Optional Features
- cuda - Enable CUDA acceleration for ONNX Runtime
[dependencies]
kokoro-micro = { version = "0.2.0", features = ["cuda"] }
Model Files
Model files are automatically downloaded on first use to $HOME/.cache/k/:
- $HOME/.cache/k/0.onnx (310MB) - Kokoro ONNX model
- $HOME/.cache/k/0.bin (27MB) - Voice embeddings
The same cache directory is used on all platforms (Linux, macOS, Windows):
- Linux/macOS: $HOME/.cache/k/ (e.g., /home/user/.cache/k/)
- Windows: %USERPROFILE%/.cache/k/ (e.g., C:\Users\Username\.cache\k\)
Files are cached and shared across all applications using kokoro-micro.
License
Apache-2.0
Credits
Built with the Kokoro 82M parameter TTS model. A reduced version of kokoro-tiny by 8b-is.