When to use
- •Converting text to speech with ElevenLabs API
- •Exploring available voices and models
- •Managing TTS subscriptions and usage
- •Integrating TTS into workflows and pipelines
ElevenLabs TTS Tool Skill
Purpose
A guide to the elevenlabs-tts-tool CLI, a command-line interface for ElevenLabs text-to-speech synthesis. The tool supports both direct audio playback and file output, with 42+ premium voices and multiple models.
When to Use This Skill
Use this skill when:
- •Converting text to speech for notifications, audiobooks, or content creation
- •Exploring and comparing different voice characteristics
- •Managing ElevenLabs subscription quotas and usage
- •Building voice-enabled workflows and automation
- •Integrating TTS into Claude Code hooks or other tools
Do NOT use this skill for:
- •Direct ElevenLabs API programming (use SDK docs instead)
- •Custom voice cloning (requires ElevenLabs web interface)
- •Real-time streaming TTS (tool focuses on file/playback generation)
CLI Tool: elevenlabs-tts-tool
Professional text-to-speech CLI tool built with Python 3.13+, uv, and the ElevenLabs SDK.
Installation
# Clone repository
git clone https://github.com/dnvriend/elevenlabs-tts-tool.git
cd elevenlabs-tts-tool
# Install globally with uv
uv tool install .
# Verify installation
elevenlabs-tts-tool --version
Prerequisites
- •Python: 3.13 or higher
- •API Key: ElevenLabs API key (get from https://elevenlabs.io/app/settings/api-keys)
- •Environment Variable:
export ELEVENLABS_API_KEY='your-api-key'
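In scripts, it can help to confirm the key is present before calling the tool, so a missing key fails early with a clear message. The sketch below shows one way to do that. The function name require_api_key is hypothetical and not part of elevenlabs-tts-tool.

```shell
# Hypothetical helper (not provided by elevenlabs-tts-tool): return non-zero
# with a message on stderr when ELEVENLABS_API_KEY is unset or empty.
require_api_key() {
    if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
        echo "ELEVENLABS_API_KEY is not set; export it before running the tool" >&2
        return 1
    fi
}

# Usage (commented out so the block has no side effects):
# require_api_key && elevenlabs-tts-tool synthesize "Hello world"
```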
Quick Start
# Set API key
export ELEVENLABS_API_KEY='your-api-key'
# Basic text-to-speech
elevenlabs-tts-tool synthesize "Hello world"
# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam
# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3
Progressive Disclosure
Core Commands
synthesize - Convert Text to Speech
Convert text to speech using ElevenLabs API. Supports direct playback or file output.
Usage:
elevenlabs-tts-tool synthesize [TEXT] [OPTIONS]
Arguments:
- •TEXT: Text to synthesize (optional if --stdin used)
- •--stdin, -s: Read text from stdin instead of argument
- •--voice, -v NAME: Voice name or ID (default: rachel)
- •--model, -m ID: Model ID (default: eleven_turbo_v2_5)
- •--output, -o PATH: Save to audio file instead of playing
- •--format, -f FORMAT: Output format (default: mp3_44100_128)
Examples:
# Basic usage - play through speakers
elevenlabs-tts-tool synthesize "Hello world"
# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam
# Use specific model
elevenlabs-tts-tool synthesize "Hello" --model eleven_multilingual_v2
# Emotional expression (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3
# Multiple emotions
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3
# Add pauses with SSML
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."
# Read from stdin
echo "Text from pipeline" | elevenlabs-tts-tool synthesize --stdin
# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3
# Pipeline integration
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audiobook.mp3
Output: Plays audio through default speakers or saves to specified file format.
Available Formats:
- •mp3_44100_128 (default): MP3, 44.1kHz, 128kbps
- •mp3_44100_64: MP3, 44.1kHz, 64kbps
- •mp3_22050_32: MP3, 22.05kHz, 32kbps
- •pcm_44100: PCM WAV, 44.1kHz (requires Pro tier)
list-voices - Show Available Voices
List all available ElevenLabs voices with characteristics.
Usage:
elevenlabs-tts-tool list-voices
Examples:
# List all voices
elevenlabs-tts-tool list-voices
# Filter by gender (-w matches whole words, so "male" does not also match "female")
elevenlabs-tts-tool list-voices | grep -w female
elevenlabs-tts-tool list-voices | grep -w male
# Filter by accent
elevenlabs-tts-tool list-voices | grep British
elevenlabs-tts-tool list-voices | grep American
# Filter by age
elevenlabs-tts-tool list-voices | grep young
elevenlabs-tts-tool list-voices | grep middle_aged
# Combine filters
elevenlabs-tts-tool list-voices | grep "female.*young.*British"
Output:
Voice       Gender   Age           Accent     Description
====================================================================================================
rachel      female   young         American   Calm and friendly American voice...
adam        male     middle_aged   American   Deep, authoritative American male...
charlotte   female   middle_aged   British    Smooth, professional British voice...
...
====================================================================================================
Total: 42 voices available
Popular Voices:
- •rachel: Calm, friendly American female (default)
- •adam: Deep, authoritative American male
- •charlotte: Professional British female
- •josh: Young, casual American male
- •bella: Expressive Italian female
list-models - Show TTS Models
List all available ElevenLabs TTS models with characteristics and use cases.
Usage:
elevenlabs-tts-tool list-models
Examples:
# List all models
elevenlabs-tts-tool list-models
# Filter by status
elevenlabs-tts-tool list-models | grep stable
elevenlabs-tts-tool list-models | grep deprecated
# Find low-latency models
elevenlabs-tts-tool list-models | grep -i "ultra-low"
# Find multilingual models
elevenlabs-tts-tool list-models | grep -i "multilingual"
Output: Comprehensive model information including:
- •Model ID and version
- •Quality and latency characteristics
- •Language support (mono vs multilingual)
- •Character limits
- •Best use cases
- •Special features (emotions, etc.)
Key Models:
- •eleven_turbo_v2_5: Fast, high-quality (default, best value)
- •eleven_flash_v2_5: Ultra-low latency (real-time applications)
- •eleven_multilingual_v2: 29 languages, production quality
- •eleven_v3: Most expressive with emotion tags (alpha, 2x cost)
Cost Multipliers:
- •Turbo/Flash models: 1x cost
- •Multilingual v2: 1x cost
- •v3 models: 2x cost (half the minutes/tokens)
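The multipliers above can be turned into a rough quota estimate before a generation. The sketch below assumes the multiplier applies linearly to the character count, which is an interpretation of the list above rather than documented billing behavior. The function name estimate_cost is hypothetical.

```shell
# Hypothetical helper: estimate the quota consumed by a text, assuming
# v3 models count 2x and all other models count 1x (per the list above).
# ${#text} counts characters in bash and bytes in some other shells, so
# treat the result as approximate for non-ASCII text.
estimate_cost() (
    text=$1
    model=${2:-eleven_turbo_v2_5}
    chars=${#text}
    case "$model" in
        eleven_v3*) multiplier=2 ;;
        *)          multiplier=1 ;;
    esac
    echo $((chars * multiplier))
)

# Usage (commented out):
# estimate_cost "Hello world" eleven_v3
```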
info - Show Subscription Info
Display subscription tier, character usage, quota limits, and historical usage.
Usage:
elevenlabs-tts-tool info [--days N]
Arguments:
- •--days, -d N: Number of days of historical usage to display (default: 7)
Examples:
# View subscription with last 7 days of usage
elevenlabs-tts-tool info
# View last 30 days of usage
elevenlabs-tts-tool info --days 30
# Quick quota check (1 day)
elevenlabs-tts-tool info --days 1
# Check usage before long generation
elevenlabs-tts-tool info --days 1 && elevenlabs-tts-tool synthesize "Long text..."
Output Information:
- •Subscription tier and status
- •Character usage (used/limit/remaining)
- •Quota reset date
- •Historical usage breakdown by day
- •Average daily usage
- •Projected monthly usage
- •Warnings when approaching quota limits
Use Cases:
- •Monitor character quota consumption
- •Track usage patterns over time
- •Plan when to upgrade subscription tier
- •Avoid hitting quota limits unexpectedly
- •Identify high-usage periods
update-voices - Update Voice Table
Fetch latest voices from ElevenLabs API and update local lookup table.
Usage:
elevenlabs-tts-tool update-voices [--output PATH]
Arguments:
- •--output, -o PATH: Output file path (default: ~/.config/elevenlabs-tts-tool/voices_lookup.json)
Examples:
# Update default voice lookup (user config directory)
elevenlabs-tts-tool update-voices
# Save to custom location
elevenlabs-tts-tool update-voices --output custom_voices.json
# Update before listing voices
elevenlabs-tts-tool update-voices && elevenlabs-tts-tool list-voices
Behavior:
- •Fetches all premade voices from ElevenLabs API
- •Saves to user config directory by default (~/.config/elevenlabs-tts-tool/)
- •Creates config directory if it doesn't exist
- •Updates take precedence over package default
- •Persists across package reinstalls
pricing - Show Pricing Information
Display ElevenLabs pricing tiers and feature comparison.
Usage:
elevenlabs-tts-tool pricing
Examples:
# View full pricing table
elevenlabs-tts-tool pricing
# Find specific tier information
elevenlabs-tts-tool pricing | grep Creator
elevenlabs-tts-tool pricing | grep "44.1kHz PCM"
Output Information:
- •Pricing tiers (Free, Starter, Creator, Pro, Scale, Business)
- •Minutes included per tier
- •Additional minute costs
- •Audio quality options
- •Concurrency limits
- •Priority levels
- •API formats by tier
- •Model cost multipliers
Key Insights:
- •Free tier: 10,000-20,000 characters/month
- •v3 models cost 2x (half the minutes/tokens)
- •Use Flash v2.5 for high-volume integrations
- •Reserve v3 for content requiring emotional expression
- •PCM 44.1kHz requires Pro tier
completion - Shell Completion
Generate shell completion scripts for bash, zsh, or fish.
Usage:
elevenlabs-tts-tool completion [bash|zsh|fish]
Installation:
# Bash (add to ~/.bashrc)
eval "$(elevenlabs-tts-tool completion bash)"
# Zsh (add to ~/.zshrc)
eval "$(elevenlabs-tts-tool completion zsh)"
# Fish (save to completion file)
elevenlabs-tts-tool completion fish > ~/.config/fish/completions/elevenlabs-tts-tool.fish
Features:
- •Tab-complete commands and subcommands
- •Tab-complete options and flags
- •Context-aware completion for file paths
Emotion Control (v3 Models)
The ElevenLabs v3 model (eleven_v3) supports Audio Tags for emotional expression.
Available Emotion Tags:
- •Basic emotions: [happy], [excited], [sad], [angry], [nervous], [curious]
- •Delivery styles: [cheerfully], [playfully], [mischievously], [resigned tone], [flatly], [deadpan]
- •Speech characteristics: [whispers], [laughs], [gasps], [sighs], [pauses], [hesitates], [stammers], [gulps]
Usage Examples:
# Basic emotion (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3
# Multiple emotions in sequence
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3
# Combine emotions with pauses
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [curious] How are you today?" --model eleven_v3
# Whispered speech
elevenlabs-tts-tool synthesize "[whispers] This is a secret message." --model eleven_v3
# Playful delivery
elevenlabs-tts-tool synthesize "[playfully] Guess what I found!" --model eleven_v3
Best Practices:
- •Place tags at the beginning of phrases
- •Align text content with emotional intent
- •Test with different voices for best results
- •Use sparingly - let AI infer emotion from context when possible
- •Remember: v3 models cost 2x as much (half the minutes/tokens)
Pause Control (SSML)
Add natural pauses using SSML <break> tags.
Syntax:
<break time="X.Xs" />
Examples:
# 1-second pause
elevenlabs-tts-tool synthesize "Welcome <break time=\"1.0s\" /> to our service."
# Multiple pauses
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."
# Short pause for emphasis
elevenlabs-tts-tool synthesize "Think about this <break time=\"0.3s\" /> carefully."
# Combine with emotions (requires eleven_v3)
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [cheerfully] How are you?" --model eleven_v3
Limitations:
- •Maximum pause duration: 3 seconds
- •Recommended: 2-4 breaks per generation
- •Too many breaks can cause:
- •AI speedup
- •Audio artifacts
- •Background noise
- •Generation instability
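The limits above can be checked before sending text to the API. The sketch below is a pre-flight check under two assumptions: that more than 4 breaks should be flagged (the upper end of the recommended range), and that breaks are written in the time="X.Xs" form shown earlier. The function name check_breaks is hypothetical.

```shell
# Hypothetical pre-flight check: warn (and return non-zero) when a text has
# more than 4 <break> tags or any break longer than 3 seconds.
check_breaks() {
    printf '%s\n' "$1" \
        | { grep -o '<break time="[0-9.]*s" */>' || true; } \
        | sed 's/[^0-9.]//g' \
        | awk '
            # Each input line is the duration of one <break> tag.
            { count++; if ($1 + 0 > 3) too_long++ }
            END {
                if (count > 4) print "warning: " count " breaks (recommended: 2-4)"
                if (too_long)  print "warning: " too_long " break(s) longer than 3 seconds"
                exit (count > 4 || too_long > 0)
            }' >&2
}

# Usage (commented out):
# check_breaks "$text" && elevenlabs-tts-tool synthesize "$text"
```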
Alternative Methods:
- •Dashes (- or —) for shorter pauses (less consistent)
- •Ellipses (...) for hesitation (may add nervous tone)
- •SSML <break> is most reliable
Verbosity Control
Multi-level verbosity for progressive detail control.
Verbosity Levels:
- •No flag (default): WARNING level - only critical issues
- •-v: INFO level - high-level operations, important events
- •-vv: DEBUG level - detailed operations, API calls, validation steps
- •-vvv: TRACE level - full HTTP requests/responses, ElevenLabs SDK internals
Usage:
# Quiet mode (warnings only)
elevenlabs-tts-tool synthesize "Hello world"
# INFO level
elevenlabs-tts-tool -v synthesize "Hello world"
# DEBUG level (detailed operations)
elevenlabs-tts-tool -vv synthesize "Hello world"
# TRACE level (shows HTTP requests/responses)
elevenlabs-tts-tool -vvv synthesize "Hello world"
Dependent Library Logging:
At trace level (-vvv), the following libraries enable DEBUG logging:
- •elevenlabs - ElevenLabs SDK internals
- •httpx/httpcore - HTTP request/response details
- •urllib3 - Low-level HTTP operations
Pipeline Integration
The tool is designed for composition with other CLI tools.
Design Principles:
- •JSON output to stdout, logs/errors to stderr
- •Stdin support for text input
- •Exit codes for success/failure detection
- •Shell completion for productivity
Examples:
# Read from file
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audio.mp3
# Combine with other tools
gemini-google-search-tool query "AI news" | \
elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3
# Conditional execution
make build && elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3 || \
elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3
# Process multiple texts
for text in "First" "Second" "Third"; do
elevenlabs-tts-tool synthesize "$text" --output "${text}.mp3"
done
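When output file names are derived from the text itself, as in the loop above, spaces and punctuation can produce awkward paths. The sketch below shows one way to derive a safer name. The function name slugify is hypothetical, and the rule it applies (lowercase, non-alphanumerics collapsed to a single dash) is only one reasonable choice.

```shell
# Hypothetical helper: derive a filesystem-friendly name from arbitrary text.
slugify() {
    printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -cs 'a-z0-9' '-' \
        | sed 's/^-//; s/-$//'
}

# Usage (commented out so the block has no side effects):
# for text in "First point" "Second point!"; do
#     elevenlabs-tts-tool synthesize "$text" --output "$(slugify "$text").mp3"
# done
```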
Claude Code Integration
Use elevenlabs-tts-tool as a notification system for Claude Code hooks.
Use Cases:
- •Task Completion Alerts
# After long-running task (emotion tags require eleven_v3)
elevenlabs-tts-tool synthesize "[excited] Task completed successfully!" --model eleven_v3
- •Error Notifications
# On build failure (emotion tags require eleven_v3)
elevenlabs-tts-tool synthesize "[nervous] Error detected. Please check output." --model eleven_v3
- •Custom Workflows
# Shell script integration
make build && elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3 || \
elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3
- •Multi-Tool Integration
# Combine with other CLI tools
gemini-google-search-tool query "AI news" | \
elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3
Hook Configuration:
Create hooks in ~/.config/claude-code/hooks.json:
{
  "hooks": {
    "after_command": {
      "type": "bash",
      "command": "elevenlabs-tts-tool synthesize \"[happy] Task completed!\" --voice rachel --model eleven_v3"
    },
    "on_error": {
      "type": "bash",
      "command": "elevenlabs-tts-tool synthesize \"[nervous] Error occurred!\" --voice adam --model eleven_v3"
    }
  }
}
Benefits:
- •Audio alerts for completed tasks without monitoring terminal
- •Error notifications while away from screen
- •Multi-step automation with voice feedback
- •Voice-enabled AI agent pipelines
Common Issues
Issue: "API key not found" error
# Symptom
Error: ELEVENLABS_API_KEY environment variable not set
Solution:
- •Get API key from https://elevenlabs.io/app/settings/api-keys
- •Export as environment variable:
export ELEVENLABS_API_KEY='your-api-key'
- •Add to shell profile for persistence:
echo 'export ELEVENLABS_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc
Issue: "Voice not found" error
# Symptom
ValueError: Voice 'unknown' not found in lookup table
Solution:
- •List available voices:
elevenlabs-tts-tool list-voices
- •Update voice table if needed:
elevenlabs-tts-tool update-voices
- •Use correct voice name (case-insensitive):
elevenlabs-tts-tool synthesize "Hello" --voice rachel
Issue: Character quota exceeded
# Symptom
Error: Character quota exceeded for this month
Solution:
- •Check current usage:
elevenlabs-tts-tool info
- •Wait until quota reset date
- •Consider upgrading subscription tier:
elevenlabs-tts-tool pricing
- •Use more efficient models (Flash/Turbo vs v3)
Issue: Audio quality issues
Symptom: Poor audio quality or artifacts
Solution:
- •Try different output format:
elevenlabs-tts-tool synthesize "Text" --format mp3_44100_128
- •Use higher-quality model:
elevenlabs-tts-tool synthesize "Text" --model eleven_multilingual_v2
- •For professional content, use PCM format (requires Pro tier):
elevenlabs-tts-tool synthesize "Text" --format pcm_44100
Issue: Emotional tags not working
Symptom: Emotion tags like [happy] are spoken literally
Solution:
- •Ensure you are using the v3 model:
elevenlabs-tts-tool synthesize "[happy] Text" --model eleven_v3
- •Place tags at beginning of phrases
- •Test with different voices (some work better than others)
Issue: Too many SSML breaks causing issues
Symptom: Audio artifacts, speedup, or noise with multiple <break> tags
Solution:
- •Limit to 2-4 breaks per generation
- •Use maximum 3 seconds per break
- •Consider splitting into multiple generations:
elevenlabs-tts-tool synthesize "Part 1" --output part1.mp3
elevenlabs-tts-tool synthesize "Part 2" --output part2.mp3
Getting Help
# Main help
elevenlabs-tts-tool --help
# Command-specific help
elevenlabs-tts-tool synthesize --help
elevenlabs-tts-tool list-voices --help
elevenlabs-tts-tool info --help
# Version information
elevenlabs-tts-tool --version
Additional Resources:
- •GitHub Issues: https://github.com/dnvriend/elevenlabs-tts-tool/issues
- •ElevenLabs Docs: https://elevenlabs.io/docs
- •API Reference: https://elevenlabs.io/docs/api-reference
Free Tier Limitations
ElevenLabs Free Tier (2024-2025):
- •✅ 10,000-20,000 characters per month
- •✅ All 42 premade voices
- •✅ Create up to 3 custom voices
- •✅ MP3 formats (all bitrates)
- •✅ Basic SSML support (<break>, phonemes)
- •✅ Emotional tags (v3 models)
- •✅ Full API access
- •❌ No commercial license (personal/experimentation only)
- •❌ PCM 44.1kHz format (requires Pro tier)
- •⚠️ Max 2,500 characters per single generation
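Longer texts can be split into pieces that fit under the per-generation limit before synthesis. The sketch below is one possible approach using the standard fold utility. The names split_text, synthesize_chunks, MAX_CHARS, and TTS_CMD are all hypothetical. TTS_CMD exists only so the command can be swapped out for a dry run. Because fold counts bytes or columns rather than characters, the bound is approximate for non-ASCII text.

```shell
# Hypothetical limit variable; 2500 matches the free-tier limit listed above.
MAX_CHARS=${MAX_CHARS:-2500}

# Read text on stdin and print chunks of at most MAX_CHARS, one per line,
# breaking at spaces so words stay intact.
split_text() {
    tr '\n' ' ' | fold -s -w "$MAX_CHARS" | sed 's/ *$//'
}

# Synthesize each chunk from stdin into PREFIX_1.mp3, PREFIX_2.mp3, ...
synthesize_chunks() (
    prefix=$1
    n=0
    # The "|| [ -n ... ]" guard keeps a final chunk that lacks a newline.
    split_text | while IFS= read -r chunk || [ -n "$chunk" ]; do
        [ -n "$chunk" ] || continue
        n=$((n + 1))
        # </dev/null stops the command from consuming the remaining chunks.
        ${TTS_CMD:-elevenlabs-tts-tool} synthesize "$chunk" \
            --output "${prefix}_${n}.mp3" </dev/null
    done
)

# Usage (commented out):
# synthesize_chunks audiobook < document.txt
```

Setting TTS_CMD=echo prints the commands instead of running them, which is a cheap way to inspect the chunking before spending quota.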
Upgrade Tiers:
- •Starter ($5/month): 30,000 characters, commercial license
- •Creator ($22/month): 100,000 characters, PCM formats
- •Pro ($99/month): 500,000 characters, PCM 44.1kHz, highest priority
- •Scale ($330/month): 2,000,000 characters
- •Business (custom): Custom limits and features
Rate Limits: Not publicly documented - expect reasonable use restrictions on free tier
Exit Codes
- •0: Success
- •1: General error (validation, API error, etc.)
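Since the tool reports failure through a non-zero exit code, a wrapper can fall back to a text message when speech synthesis fails. The sketch below illustrates this. The function name notify and the TTS_CMD override are hypothetical. TTS_CMD defaults to the real CLI and exists only to make the wrapper easy to try without calling the API.

```shell
# Hypothetical wrapper: speak a message, and fall back to stderr when the
# tool exits with a non-zero status.
notify() {
    if ${TTS_CMD:-elevenlabs-tts-tool} synthesize "$1"; then
        return 0
    fi
    echo "TTS failed; message was: $1" >&2
    return 1
}

# Usage (commented out):
# notify "Build finished"
```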
Output Formats
Audio Formats:
- •mp3_44100_128: MP3, 44.1kHz, 128kbps (default, best quality)
- •mp3_44100_64: MP3, 44.1kHz, 64kbps (good quality, smaller)
- •mp3_22050_32: MP3, 22.05kHz, 32kbps (acceptable quality, smallest)
- •pcm_44100: PCM WAV, 44.1kHz, uncompressed (requires Pro tier)
Text Formats:
- •Human-readable tables for list commands
- •Structured output with clear sections
- •Errors to stderr, audio/data to stdout
Best Practices
- •Use Turbo v2.5 for High Volume: Default model offers best value (1x cost, fast, high quality)
- •Reserve v3 for Emotional Content: Use v3 only when emotion tags needed (costs 2x)
- •Monitor Quota Regularly: Check info command before large generations
- •Update Voices Periodically: Run update-voices monthly to get latest voices
- •Test Voices for Your Use Case: Different voices work better for different content types
- •Use SSML Breaks Sparingly: Limit to 2-4 breaks per generation for stability
- •Pipeline for Efficiency: Combine with other tools for automated workflows
- •Set Verbosity Appropriately: Use -vv or -vvv for debugging, default for production
Resources
- •GitHub Repository: https://github.com/dnvriend/elevenlabs-tts-tool
- •ElevenLabs Documentation: https://elevenlabs.io/docs
- •API Reference: https://elevenlabs.io/docs/api-reference
- •Voice Library: https://elevenlabs.io/voice-library
- •Python SDK: https://github.com/elevenlabs/elevenlabs-python
- •Claude Code: https://docs.anthropic.com/claude-code