When to use
- •When you need to convert text to speech locally (no API keys)
- •When you need to generate audio from long documents (books, articles)
- •When you need seamless audiobook rendering without pop artifacts
- •When you need fast offline TTS rendering (20-50x real-time)
kokoro-tts-tool Skill
Purpose
This skill provides access to the kokoro-tts-tool CLI for local text-to-speech synthesis using the Kokoro-82M model. It runs entirely on-device with ONNX Runtime and is optimized for Apple Silicon.
When to Use This Skill
Use this skill when:
- •Converting text to speech without cloud APIs
- •Generating audio from markdown/text documents
- •Creating audiobooks from long-form content
- •Needing 60+ voices across 8 languages
Do NOT use this skill for:
- •Cloud-based TTS services
- •Real-time voice conversion
- •Speech-to-text (transcription)
CLI Tool: kokoro-tts-tool
Local text-to-speech CLI using Kokoro-82M (82 million parameters).
Installation
# Clone and install
git clone https://github.com/dnvriend/kokoro-tts-tool.git
cd kokoro-tts-tool
uv tool install .
Prerequisites
- •Python 3.14+
- •uv package manager
- •Apple Silicon Mac (recommended)
Quick Start
# Initialize (downloads ~350MB models)
kokoro-tts-tool init
# Synthesize text to speakers
kokoro-tts-tool synthesize "Hello world"
# Save to file
kokoro-tts-tool synthesize "Hello" --output speech.wav
# Stream a document
kokoro-tts-tool infinite --input book.md
Progressive Disclosure
<details> <summary><strong>📖 Core Commands (Click to expand)</strong></summary>
init - Download TTS Models
Downloads the Kokoro ONNX model (~300MB) and voice embeddings (~50MB).
Usage:
kokoro-tts-tool init [OPTIONS]
Options:
- •--force, -f: Re-download models even if they exist
Examples:
# Download models (skips if already present)
kokoro-tts-tool init
# Force re-download
kokoro-tts-tool init --force
synthesize - Convert Text to Speech
Synthesizes text using the Kokoro TTS model. Audio can be played through speakers or saved to file.
Usage:
kokoro-tts-tool synthesize [TEXT] [OPTIONS]
Arguments:
- •TEXT: Text to synthesize (optional if using --stdin)
Options:
- •--stdin, -s: Read text from stdin
- •--voice, -v VALUE: Voice ID (default: af_heart)
- •--output, -o PATH: Save to WAV file
- •--speed FLOAT: Speech speed 0.5-2.0 (default: 1.0)
- •--silence INT: Trailing silence in ms (default: 200)
Examples:
# Play text with default voice
kokoro-tts-tool synthesize "Hello world"
# Use different voice
kokoro-tts-tool synthesize "Hello" --voice am_adam
# Save to file
kokoro-tts-tool synthesize "Hello" --output speech.wav
# Read from stdin
echo "Hello world" | kokoro-tts-tool synthesize --stdin
# Adjust speed
kokoro-tts-tool synthesize "Hello" --speed 1.5
# Multiple options
cat article.txt | kokoro-tts-tool synthesize --stdin \
--voice bf_emma \
--output article.wav \
--speed 0.9
Output: Audio played through speakers (default) or saved as WAV file (24kHz, mono, 16-bit).
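The WAV parameters above (24kHz, mono, 16-bit) make it possible to estimate output file size before rendering. The following is a minimal sketch, not part of kokoro-tts-tool: the function name `wav_size_bytes` is hypothetical, and the 44-byte header is an assumption based on the canonical PCM WAV layout, which the tool may or may not use.

```shell
# Estimate the size in bytes of an uncompressed PCM WAV file.
# Hypothetical helper; the 44-byte header size is an assumption.
wav_size_bytes() {
  local seconds="$1"
  local sample_rate=24000      # 24kHz, as stated for the tool's output
  local channels=1             # mono
  local bytes_per_sample=2     # 16-bit samples
  local header=44              # assumed canonical PCM WAV header
  echo $((header + seconds * sample_rate * channels * bytes_per_sample))
}

# Example: one hour of audio
# wav_size_bytes 3600
```

Under these assumptions, audio costs 48,000 bytes per second, so an hour-long render needs roughly 173 MB of disk space.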
infinite - Stream Long Documents
Reads markdown or plain text, splits intelligently into chunks, and streams to speakers or renders to file.
Usage:
kokoro-tts-tool infinite [OPTIONS]
Options:
- •--input, -i PATH: Input text/markdown file
- •--stdin, -s: Read text from stdin
- •--output, -o PATH: Save to WAV file (fast offline mode)
- •--voice VALUE: Voice ID (default: af_heart)
- •--speed FLOAT: Speech speed 0.5-2.0 (default: 1.0)
- •--chunk-size INT: Target words per chunk 50-1000 (default: 200)
- •--pause INT: Pause between chunks in ms 0-2000 (default: 150)
- •--no-markdown: Treat input as plain text
Examples:
# Stream to speakers
kokoro-tts-tool infinite --input book.md
# Render to WAV (fast, ~2-3min for 1hr audio)
kokoro-tts-tool infinite --input book.md --output audiobook.wav
# Pipe from stdin
cat chapter.md | kokoro-tts-tool infinite --stdin
# With custom voice and speed
kokoro-tts-tool infinite --input notes.md \
--voice am_adam \
--speed 1.2
# Render audiobook with narrator voice
kokoro-tts-tool infinite --input book.md \
--output book.wav \
--voice bm_george \
--speed 0.95
# Longer pauses between chunks for studying (chunk size left at the default of 200)
kokoro-tts-tool infinite --input study.md \
--chunk-size 200 \
--pause 600
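When a script builds the infinite command from user input, it can help to check values against the documented ranges (speed 0.5-2.0, chunk size 50-1000, pause 0-2000) before starting a long render. This is a sketch only: the helper names `int_in_range`, `float_in_range`, and `check_infinite_opts` are hypothetical and not part of the tool, which presumably performs its own validation.

```shell
# Succeeds if $1 is a non-negative integer within [$2, $3].
int_in_range() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-digit input
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Succeeds if $1 is a plain decimal number within [$2, $3].
float_in_range() {
  case "$1" in
    ''|*[!0-9.]*|*.*.*) return 1 ;;   # reject empty, non-numeric, or two dots
  esac
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v >= lo && v <= hi) }'
}

# check_infinite_opts SPEED CHUNK_SIZE PAUSE_MS
# Ranges are taken from the option list for the infinite command.
check_infinite_opts() {
  float_in_range "$1" 0.5 2.0 || { echo "speed must be 0.5-2.0" >&2; return 1; }
  int_in_range "$2" 50 1000   || { echo "chunk size must be 50-1000" >&2; return 1; }
  int_in_range "$3" 0 2000    || { echo "pause must be 0-2000" >&2; return 1; }
}

# Example:
# check_infinite_opts 1.2 200 600 && \
#   kokoro-tts-tool infinite --input study.md --speed 1.2 --chunk-size 200 --pause 600
```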
Output:
- •Speaker mode: Real-time playback, seamless audio
- •File mode: Fast offline rendering (20-50x real-time on M4)
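The 20-50x real-time figure for file mode translates directly into a render-time estimate. The sketch below assumes that range holds for your hardware (the source quotes it for an M4); `render_time_range` is a hypothetical helper name, not a tool command.

```shell
# Print the estimated render time in seconds as "fastest slowest"
# for a given audio duration, assuming 50x and 20x real-time speed.
render_time_range() {
  local audio_seconds="$1"
  echo "$((audio_seconds / 50)) $((audio_seconds / 20))"
}

# Example: one hour of audio
# render_time_range 3600
```

For one hour of audio this gives 72 to 180 seconds, which matches the "~2-3min for 1hr audio" note at the slower end.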
list-voices - List Available Voices
Lists voice information including ID, name, gender, accent, quality grade, and description.
Usage:
kokoro-tts-tool list-voices [OPTIONS]
Options:
- •--language, -l VALUE: Filter by language (English, Japanese, etc.)
- •--gender, -g VALUE: Filter by gender (Male, Female)
- •--json: Output as JSON for scripting
Examples:
# List all voices
kokoro-tts-tool list-voices
# Filter by language
kokoro-tts-tool list-voices --language English
# Filter by gender
kokoro-tts-tool list-voices --gender Female
# Combined filters
kokoro-tts-tool list-voices --language English --gender Male
# JSON output for scripting
kokoro-tts-tool list-voices --json
Voice ID Format:
- •Pattern: [language][gender]_[name]
- •First letter: language (a=American, b=British, j=Japanese, etc.)
- •Second letter: gender (f=Female, m=Male)
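The voice ID pattern can be decoded in a script without calling the tool. The following sketch covers only the language codes named here (a, b, j); any other code is reported as "other" rather than guessed. The function name `describe_voice_id` is hypothetical.

```shell
# Decode a voice ID of the form [language][gender]_[name].
# Only the language codes listed in this document are mapped.
describe_voice_id() {
  local id="$1"
  case "$id" in
    [a-z][fm]_?*) ;;   # shape check: letter, f or m, underscore, name
    *) echo "invalid voice ID: $id" >&2; return 1 ;;
  esac
  local lang_code gender_code name language gender
  lang_code=$(printf '%s' "$id" | cut -c1)
  gender_code=$(printf '%s' "$id" | cut -c2)
  name="${id#*_}"      # everything after the first underscore
  case "$lang_code" in
    a) language="American" ;;
    b) language="British" ;;
    j) language="Japanese" ;;
    *) language="other($lang_code)" ;;
  esac
  case "$gender_code" in
    f) gender="Female" ;;
    m) gender="Male" ;;
  esac
  printf '%s %s %s\n' "$language" "$gender" "$name"
}

# Example:
# describe_voice_id bm_george
```

A shape check like this catches typos such as a missing underscore, but it does not confirm that the voice exists; list-voices remains the authority for that.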
Quality Grades:
- •A/A-: Highest quality (af_heart, af_bella, am_adam)
- •B+/B: Good quality
- •B-: Acceptable quality
info - Display Configuration
Shows information about the Kokoro TTS installation.
Usage:
kokoro-tts-tool info
Examples:
kokoro-tts-tool info
Output:
- •Model status (Ready/Not downloaded)
- •Model file locations
- •Default settings
- •Supported languages
completion - Shell Completion
Generate shell completion scripts for bash, zsh, or fish.
Usage:
kokoro-tts-tool completion SHELL
Arguments:
- •SHELL: Shell type (bash, zsh, fish)
Examples:
# Bash (add to ~/.bashrc)
eval "$(kokoro-tts-tool completion bash)"
# Zsh (add to ~/.zshrc)
eval "$(kokoro-tts-tool completion zsh)"
# Fish
kokoro-tts-tool completion fish > ~/.config/fish/completions/kokoro-tts-tool.fish
Multi-Level Verbosity Logging
Control logging detail with progressive verbosity levels. All logs output to stderr.
Logging Levels:
| Flag | Level | Output | Use Case |
|---|---|---|---|
| (none) | WARNING | Errors and warnings only | Production, quiet mode |
| -v | INFO | + High-level operations | Normal debugging |
| -vv | DEBUG | + Detailed info, full tracebacks | Development |
| -vvv | TRACE | + Library internals | Deep debugging |
Examples:
# INFO level
kokoro-tts-tool -v synthesize "Hello"
# DEBUG level
kokoro-tts-tool -vv infinite --input book.md
# TRACE level
kokoro-tts-tool -vvv synthesize "Hello"
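In wrapper scripts it can be convenient to select the verbosity flag from a level name such as an environment variable. The sketch below maps the four documented levels to their flags; `verbosity_flag` is a hypothetical helper, not something the tool provides.

```shell
# Print the CLI flag for a log level name (WARNING prints nothing,
# since it is the default level with no flag).
verbosity_flag() {
  case "$1" in
    WARNING) : ;;
    INFO)    printf '%s\n' '-v' ;;
    DEBUG)   printf '%s\n' '-vv' ;;
    TRACE)   printf '%s\n' '-vvv' ;;
    *)       echo "unknown level: $1" >&2; return 1 ;;
  esac
}

# Example: the substitution is left unquoted so that WARNING adds no argument,
# and stderr is redirected because all logs go there.
# kokoro-tts-tool $(verbosity_flag "$LOG_LEVEL") synthesize "Hello" 2> tts.log
```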
Pipeline Composition
Compose commands with Unix pipes for workflows.
Examples:
# Get voice IDs as JSON and filter
kokoro-tts-tool list-voices --json | jq '.[].id'
# Read from another command
cat document.md | kokoro-tts-tool infinite --stdin
# Chain with file processing
find . -name "*.md" -exec cat {} \; | kokoro-tts-tool infinite --stdin
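Concatenating many files into a single stream yields one continuous rendering. If one WAV per document is preferred instead, a loop with a derived output name is a possible alternative. This is a sketch under that assumption; `wav_name_for` is a hypothetical helper and the `chapters/` directory is illustrative.

```shell
# Derive an output path by swapping a trailing .md for .wav.
# Paths without a .md suffix simply get .wav appended.
wav_name_for() {
  local input="$1"
  printf '%s.wav\n' "${input%.md}"
}

# Example: render each chapter to its own file (directory name is illustrative)
# for f in chapters/*.md; do
#   kokoro-tts-tool infinite --input "$f" --output "$(wav_name_for "$f")"
# done
```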
Common Issues
Issue: Command not found
# Verify installation
kokoro-tts-tool --version
# Reinstall if needed
cd kokoro-tts-tool
uv tool install . --reinstall
Issue: Models not downloaded
# Initialize models
kokoro-tts-tool init
# Force re-download
kokoro-tts-tool init --force
Issue: Audio not playing
- •Check system volume
- •Try saving to file: --output test.wav
- •Check with verbose: -vv
Issue: Voice not found
# List available voices
kokoro-tts-tool list-voices
# Check voice ID format
kokoro-tts-tool list-voices --json | jq '.[].id'
Getting Help
# General help
kokoro-tts-tool --help
# Command-specific help
kokoro-tts-tool synthesize --help
kokoro-tts-tool infinite --help
Exit Codes
- •0: Success
- •1: Error (validation, runtime, or unexpected)
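Because the CLI signals failure only through a non-zero exit code, scripts that chain several renders may want to stop and report at the first failure. The wrapper below is a generic sketch that works with any command; `run_step` is a hypothetical name and not part of the tool.

```shell
# run_step LABEL COMMAND [ARGS...]
# Runs the command, prints a status line, and propagates its exit code.
run_step() {
  local label="$1"
  shift
  if "$@"; then
    echo "ok: $label"
  else
    local rc=$?   # exit status of the command tested by "if"
    echo "failed (exit $rc): $label" >&2
    return "$rc"
  fi
}

# Example:
# run_step "download models" kokoro-tts-tool init &&
#   run_step "render book" kokoro-tts-tool infinite --input book.md --output book.wav
```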
Output Formats
Default Output:
- •Human-readable formatted output
- •Audio played through speakers
File Output (--output):
- •WAV format (24kHz, mono, 16-bit)
JSON Output (--json on list-voices):
- •Machine-readable voice data
- •Suitable for pipelines and scripting (for example, filtering with jq)
Best Practices
- •Initialize first: Run kokoro-tts-tool init before synthesis
- •Use appropriate voices: Match voice to content (am_adam for audiobooks, bf_emma for education)
- •Leverage infinite for documents: Better than synthesize for long content
- •Use file output for production: --output for consistent results
- •Check voice quality grades: A/A- voices recommended for production
Resources
- •GitHub: https://github.com/dnvriend/kokoro-tts-tool
- •Kokoro-82M Model: https://huggingface.co/hexgrad/Kokoro-82M
- •kokoro-onnx: https://github.com/thewh1teagle/kokoro-onnx