When to use
- •Converting text to lifelike speech using AWS Polly
- •Working with multiple voice engines and output formats
- •Tracking TTS costs and AWS billing
- •Implementing TTS in automation pipelines
AWS Polly TTS Tool Skill
Purpose
Professional AWS Polly text-to-speech CLI and library with agent-friendly design, enabling conversion of text to lifelike speech using Amazon Polly's deep learning technology. Supports 60+ voices in 30+ languages across four quality tiers with comprehensive cost tracking.
When to Use This Skill
Use this skill when:
- •You need to convert text to speech using AWS Polly
- •You want to explore available voices and engines
- •You need to track TTS costs or query billing data
- •You're building automation with TTS capabilities
- •You need SSML support for advanced speech control
- •You want to work with different audio formats
Do NOT use this skill for:
- •Non-AWS TTS services (Google, Azure, etc.)
- •Real-time streaming TTS (use AWS SDK directly)
- •Voice cloning or training (Polly doesn't support this)
CLI Tool: aws-polly-tts-tool
Professional AWS Polly TTS CLI and Python library designed with CLI-first philosophy for both command-line and programmatic use.
Installation
# Clone repository git clone https://github.com/dnvriend/aws-polly-tts-tool.git cd aws-polly-tts-tool # Install with uv (Python 3.12) uv tool install . --python 3.12 # Verify installation aws-polly-tts-tool --version
Prerequisites
- •Python 3.12+ (Python 3.13+ has pydub compatibility issues)
- •AWS credentials configured
- •ffmpeg for audio playback (not required for file output)
- •IAM permissions:
polly:DescribeVoices,polly:SynthesizeSpeech,ce:GetCostAndUsage
Quick Start
# Play text with default voice aws-polly-tts-tool synthesize "Hello world" # Save to file aws-polly-tts-tool synthesize "Hello world" --output speech.mp3 # List available voices aws-polly-tts-tool list-voices # Show pricing aws-polly-tts-tool pricing
Progressive Disclosure
<details> <summary><strong>📖 Core Commands (Click to expand)</strong></summary>synthesize - Convert Text to Speech
Main TTS command with full feature support including multiple engines, voices, and output formats.
Usage:
aws-polly-tts-tool synthesize "TEXT" [OPTIONS]
Arguments:
- •
TEXT: Text to synthesize (required, or use--stdin) - •
--stdin/-s: Read text from stdin (enables piping) - •
--voice TEXT: Voice ID (default: Joanna) - •
--output PATH/-o PATH: Save audio to file instead of playing - •
--format TEXT/-f TEXT: Output format (mp3, ogg_vorbis, pcm) - default: mp3 - •
--engine TEXT/-e TEXT: Voice engine (standard, neural, generative, long-form) - default: neural - •
--ssml: Treat input as SSML markup - •
--show-cost: Display character count and cost estimate - •
--region TEXT/-r TEXT: AWS region override - •
-V/-VV/-VVV: Verbosity (INFO/DEBUG/TRACE with AWS SDK details)
Examples:
# Basic synthesis with default voice (Joanna, neural)
aws-polly-tts-tool synthesize "Hello world"
# Use different voice and engine
aws-polly-tts-tool synthesize "Hello" --voice Matthew --engine generative
# Save to file with specific format
aws-polly-tts-tool synthesize "Hello world" --output speech.mp3 --format mp3
# Read from stdin
echo "Hello world" | aws-polly-tts-tool synthesize --stdin
# Read from file
cat article.txt | aws-polly-tts-tool synthesize --stdin --output article.mp3
# Use SSML for advanced control
aws-polly-tts-tool synthesize '<speak>Hello <break time="500ms"/> world</speak>' --ssml
# Show cost estimate
aws-polly-tts-tool synthesize "Hello world" --show-cost
# Multiple options combined with debugging
cat article.txt | aws-polly-tts-tool synthesize --stdin \
--voice Joanna \
--engine neural \
--output article.mp3 \
--show-cost \
-VV
Output:
- •Audio played through speakers (default) or saved to file
- •Character count and cost estimate (with
--show-cost) - •Logs to stderr, keeping stdout clean for piping
list-voices - Discover Available Voices
List and filter AWS Polly voices by engine, language, and gender.
Usage:
aws-polly-tts-tool list-voices [OPTIONS]
Options:
- •
--engine TEXT/-e TEXT: Filter by engine (standard, neural, generative, long-form) - •
--language TEXT/-l TEXT: Filter by language code (e.g., en-US, es-ES, fr-FR) - •
--gender TEXT/-g TEXT: Filter by gender (Female, Male) - •
--region TEXT/-r TEXT: AWS region override - •
-V/-VV/-VVV: Verbosity levels
Examples:
# List all voices aws-polly-tts-tool list-voices # Filter by engine aws-polly-tts-tool list-voices --engine neural # Filter by language aws-polly-tts-tool list-voices --language en-US # Combine filters aws-polly-tts-tool list-voices --engine neural --language en --gender Female # Use with grep for searching aws-polly-tts-tool list-voices | grep British aws-polly-tts-tool list-voices --engine generative | grep Spanish
Output: Table with Voice, Gender, Language, Engines (supported), and Description columns. Dynamically fetched from Polly API (always up-to-date).
list-engines - Display Voice Engines
Show all available voice engines with technology, pricing, and best use cases.
Usage:
aws-polly-tts-tool list-engines
Examples:
# Show all engines with details aws-polly-tts-tool list-engines
Output: Table showing:
- •Standard ($4/1M chars) - Traditional concatenative TTS, 3000 char limit
- •Neural ($16/1M chars) - Natural human-like voices, 3000 char limit
- •Generative ($30/1M chars) - Most advanced emotionally engaged, 3000 char limit
- •Long-form ($100/1M chars) - Optimized for audiobooks, 100,000 char limit
billing - Query AWS Costs
Query AWS Cost Explorer for actual Polly usage costs with engine breakdown.
Usage:
aws-polly-tts-tool billing [OPTIONS]
Options:
- •
--days INT/-d INT: Number of days to query (default: 30) - •
--start-date TEXT: Custom start date (YYYY-MM-DD) - •
--end-date TEXT: Custom end date (YYYY-MM-DD) - •
--region TEXT/-r TEXT: AWS region for Cost Explorer - •
-V/-VV/-VVV: Verbosity levels
Examples:
# Last 30 days of Polly costs aws-polly-tts-tool billing # Last 7 days aws-polly-tts-tool billing --days 7 # Custom date range aws-polly-tts-tool billing --start-date 2025-01-01 --end-date 2025-01-31 # With verbose output aws-polly-tts-tool billing --days 7 -V
Output: Total cost and breakdown by engine (Standard, Neural, Generative, Long-form) in USD.
Note: Requires IAM permission ce:GetCostAndUsage
pricing - Show Pricing Information
Display static pricing information for all Polly engines with cost examples.
Usage:
aws-polly-tts-tool pricing
Examples:
# Show pricing table and examples aws-polly-tts-tool pricing
Output: Comprehensive pricing with:
- •Cost per 1M characters for each engine
- •Technology type and quality level
- •Character limits per request
- •Concurrent request limits
- •Free tier information
- •Best use cases
- •Cost examples (1,000 words, audiobooks)
info - Tool Configuration
Display AWS credentials status and tool configuration.
Usage:
aws-polly-tts-tool info
Examples:
# Verify AWS authentication and show config aws-polly-tts-tool info
Output:
- •AWS credential status (Valid/Invalid)
- •Account ID, User ID, ARN
- •Available engines
- •Output formats
- •Useful command examples
completion - Shell Completion
Generate shell completion scripts for bash, zsh, or fish.
Usage:
aws-polly-tts-tool completion [bash|zsh|fish]
Arguments:
- •
SHELL: Shell type (bash, zsh, or fish) - required
Examples:
# Generate bash completion aws-polly-tts-tool completion bash # Install for bash (add to ~/.bashrc) eval "$(aws-polly-tts-tool completion bash)" # Install for zsh (add to ~/.zshrc) eval "$(aws-polly-tts-tool completion zsh)" # Install for fish aws-polly-tts-tool completion fish > ~/.config/fish/completions/aws-polly-tts-tool.fish # File-based installation (recommended) aws-polly-tts-tool completion bash > ~/.aws-polly-tts-tool-complete.bash echo 'source ~/.aws-polly-tts-tool-complete.bash' >> ~/.bashrc
Output: Shell-specific completion script. After installation, restart shell or source config file.
</details> <details> <summary><strong>⚙️ Advanced Features (Click to expand)</strong></summary>SSML Support
Full SSML (Speech Synthesis Markup Language) support for advanced speech control.
Features:
- •Prosody: Control rate, pitch, volume
- •Breaks: Add pauses of specific duration
- •Emphasis: Add emphasis to words
- •Speaking styles: Newscaster, conversational (select voices)
- •Phonemes: Control pronunciation
Examples:
# Basic pause aws-polly-tts-tool synthesize '<speak>Hello <break time="500ms"/> world</speak>' --ssml # Prosody control (speed, pitch, volume) aws-polly-tts-tool synthesize '<speak><prosody rate="slow" pitch="low">Deep voice</prosody></speak>' --ssml # Emphasis aws-polly-tts-tool synthesize '<speak>I <emphasis level="strong">really</emphasis> like this</speak>' --ssml # Newscaster style (Matthew, Joanna only) aws-polly-tts-tool synthesize '<speak><amazon:domain name="news">Breaking news today</amazon:domain></speak>' --ssml --voice Matthew # Multiple prosody attributes aws-polly-tts-tool synthesize '<speak><prosody rate="fast" pitch="high" volume="loud">Excited announcement!</prosody></speak>' --ssml
SSML Resources:
Multi-Level Verbosity
Progressive logging detail for debugging without code changes.
Levels:
- •Default: Errors and warnings only (clean output)
- •
-V(INFO): High-level operations (voice selection, file operations) - •
-VV(DEBUG): Detailed steps (validation, API calls, character counts) - •
-VVV(TRACE): Full AWS SDK internals (credentials, HTTP requests, boto3 events)
Examples:
# Default: No verbose output aws-polly-tts-tool synthesize "Hello world" --output test.mp3 # INFO level (-V) aws-polly-tts-tool synthesize "Hello world" -V --output test.mp3 # [INFO] Using voice: Joanna (neural engine) # [INFO] Synthesizing audio to file: test.mp3 # DEBUG level (-VV) aws-polly-tts-tool synthesize "Hello world" -VV --output test.mp3 # [DEBUG] Validating engine: neural # [DEBUG] Validating output format: mp3 # [DEBUG] Initializing AWS Polly client # [INFO] Using voice: Joanna (neural engine) # [DEBUG] Synthesized 11 characters # TRACE level (-VVV) - Full AWS SDK details aws-polly-tts-tool synthesize "Hello world" -VVV --output test.mp3 # [DEBUG] Looking for credentials via: env # [INFO] Found credentials in shared credentials file: ~/.aws/credentials # [DEBUG] Starting new HTTPS connection (1): polly.eu-central-1.amazonaws.com:443 # [DEBUG] https://polly.eu-central-1.amazonaws.com:443 "POST /v1/speech HTTP/1.1" 200
Note: All logs go to stderr, keeping stdout clean for data/piping.
Library Usage
Import and use as a Python library for programmatic access.
Basic Usage:
from aws_polly_tts_tool import (
get_polly_client,
synthesize_audio,
save_speech,
VoiceManager,
calculate_cost,
)
# Initialize client
client = get_polly_client(region="us-east-1")
# Synthesize audio
audio_bytes, char_count = synthesize_audio(
client=client,
text="Hello world",
voice_id="Joanna",
output_format="mp3",
engine="neural"
)
# Save to file
save_speech(
client=client,
text="Hello world",
voice_id="Joanna",
output_path=Path("output.mp3"),
engine="neural"
)
# List voices
voice_manager = VoiceManager(client)
voices = voice_manager.list_voices(engine="neural", language="en")
# Calculate cost
cost = calculate_cost(character_count=5000, engine="neural")
print(f"Estimated cost: ${cost:.4f}")
Public API:
- •
get_polly_client(region=None)- Initialize boto3 Polly client - •
synthesize_audio(client, text, voice_id, output_format, engine, text_type)- Synthesize audio - •
save_speech(client, text, voice_id, output_path, ...)- Save to file - •
play_speech(client, text, voice_id, ...)- Play through speakers - •
VoiceManager(client)- Voice discovery and management - •
calculate_cost(char_count, engine)- Cost estimation
Voice Engine Selection Guide
Standard Engine ($4/1M chars)
- •Technology: Traditional concatenative TTS
- •Quality: Basic synthetic sound
- •Limit: 3,000 chars/request
- •Best for: Cost-sensitive applications, basic announcements
- •Free tier: 5M chars/month (12 months)
Neural Engine ($16/1M chars)
- •Technology: Deep learning neural networks
- •Quality: Natural, human-like voices
- •Limit: 3,000 chars/request
- •Best for: General-purpose TTS, recommended for most use cases
- •Free tier: 1M chars/month (12 months)
Generative Engine ($30/1M chars)
- •Technology: Advanced generative AI
- •Quality: Most lifelike, emotionally engaged
- •Limit: 3,000 chars/request
- •Best for: High-quality content, brand voices, engaging experiences
- •Free tier: None
Long-form Engine ($100/1M chars)
- •Technology: Neural with long-context optimization
- •Quality: Consistent over long passages
- •Limit: 100,000 chars/request
- •Best for: Audiobooks, long articles, consistent narration
- •Free tier: None
Decision Matrix:
- •Budget-conscious → Standard
- •General use → Neural (recommended)
- •Premium quality → Generative
- •Audiobooks/articles → Long-form
Cost Tracking Strategies
Immediate Estimates:
# Use --show-cost for instant character count and cost aws-polly-tts-tool synthesize "Text" --show-cost
Actual Billing:
# Query real AWS costs with Cost Explorer aws-polly-tts-tool billing --days 30
Cost Optimization Tips:
- •Use Standard engine for non-critical audio
- •Cache synthesized audio files to avoid re-synthesis
- •Batch process text for efficiency
- •Use Long-form engine only for actual long content
- •Monitor with
billingcommand regularly
Cost Examples:
- •1,000 words (~5,000 chars):
- •Standard: $0.02
- •Neural: $0.08
- •Generative: $0.15
- •Long-form: $0.50
- •50,000 word audiobook:
- •Standard: $1.00
- •Neural: $4.00
- •Generative: $7.50
- •Long-form: $25.00
Common Issues
Issue: No AWS credentials found
# Symptom Error: Unable to locate credentials
Solution:
# Configure AWS credentials aws configure # Or set environment variables export AWS_ACCESS_KEY_ID="your-access-key" export AWS_SECRET_ACCESS_KEY="your-secret-key" export AWS_DEFAULT_REGION="us-east-1" # Verify with aws-polly-tts-tool info
Issue: Audio playback fails on Python 3.13+
# Symptom Error: No module named 'audioop'
Solution: Option 1: Use Python 3.12 (recommended)
mise use python@3.12 uv tool install . --python 3.12
Option 2: Save to file instead (works on all Python versions)
aws-polly-tts-tool synthesize "Hello" --output speech.mp3
Issue: Voice not found
# Symptom Error: Voice 'invalid' not found
Solution:
# List available voices aws-polly-tts-tool list-voices # Filter by engine aws-polly-tts-tool list-voices --engine neural # Case-sensitive voice names aws-polly-tts-tool synthesize "Hello" --voice Joanna # Correct
Issue: Engine not supported by voice
# Symptom Error: Voice doesn't support this engine
Solution:
# Check which engines a voice supports aws-polly-tts-tool list-voices | grep "VoiceName" # Not all voices support all engines # Example: Standard voices don't support neural engine
Issue: Cost Explorer access denied
# Symptom Error: AccessDeniedException when calling GetCostAndUsage
Solution:
Add IAM permission ce:GetCostAndUsage:
{
"Effect": "Allow",
"Action": ["ce:GetCostAndUsage"],
"Resource": "*"
}
Issue: Text too long for engine
# Symptom Error: Text exceeds character limit
Solution:
- •Standard/Neural/Generative: Max 3,000 chars per request
- •Long-form: Max 100,000 chars per request
- •Split long text into chunks or use Long-form engine
Getting Help
# General help aws-polly-tts-tool --help # Command-specific help aws-polly-tts-tool synthesize --help aws-polly-tts-tool list-voices --help # Show version aws-polly-tts-tool --version # Verify configuration aws-polly-tts-tool info
Debug Mode
Use progressive verbosity to diagnose issues:
# Basic debug info aws-polly-tts-tool synthesize "Hello" -V # Detailed debug info aws-polly-tts-tool synthesize "Hello" -VV # Full AWS SDK trace aws-polly-tts-tool synthesize "Hello" -VVV
Best Practices
- •Default to Neural Engine: Best balance of quality and cost for most use cases
- •Use SSML for Control: Add pauses, emphasis, and prosody for natural speech
- •Cache Audio Files: Save synthesized audio to avoid repeated API calls and costs
- •Monitor Costs: Use
billingcommand to track actual spending - •Validate Voice Support: Use
list-voicesto check engine compatibility before synthesis - •Save Critical Audio: Use
--outputto save important audio for offline use - •Use Verbosity: Add
-V/-VV/-VVVwhen debugging issues - •Leverage stdin: Pipe text from files or commands for automation
Resources
- •GitHub: https://github.com/dnvriend/aws-polly-tts-tool
- •Amazon Polly Docs: https://docs.aws.amazon.com/polly/
- •Polly Pricing: https://aws.amazon.com/polly/pricing/
- •SSML Reference: https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
- •Boto3 Polly API: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/polly.html