elevenlabs-voices

High-quality speech synthesis via the ElevenLabs API, with 18 voice personas, 32 languages, sound effects, batch processing, and voice design.

SKILL.md
---
name: elevenlabs-voices
version: 2.0.4
description: High-quality voice synthesis with 18 personas, 32 languages, sound effects, batch processing, and voice design using ElevenLabs API.
tags: [tts, voice, speech, elevenlabs, audio, sound-effects, voice-design, multilingual]
---

ElevenLabs Voice Personas v2.0

Comprehensive voice synthesis toolkit using ElevenLabs API.

✨ Features

  • 18 Voice Personas - Carefully curated voices for different use cases
  • 32 Languages - Multi-language synthesis with the multilingual v2 model
  • Streaming Mode - Real-time audio output while the audio is being generated
  • Sound Effects (SFX) - AI-generated sound effects from text prompts
  • Batch Processing - Process multiple texts in one go
  • Cost Tracking - Monitor character usage and estimated costs
  • Voice Design - Create custom voices from descriptions
  • Pronunciation Dictionary - Custom word pronunciation rules
  • Moltbot Integration - Works with Moltbot's built-in TTS

🎙️ Available Voices

| Voice | Accent | Gender | Persona | Best For |
|---|---|---|---|---|
| rachel | 🇺🇸 US | female | warm | Conversations, tutorials |
| adam | 🇺🇸 US | male | narrator | Documentaries, audiobooks |
| bella | 🇺🇸 US | female | professional | Business, presentations |
| brian | 🇺🇸 US | male | comforting | Meditation, calm content |
| george | 🇬🇧 UK | male | storyteller | Audiobooks, storytelling |
| alice | 🇬🇧 UK | female | educator | Tutorials, explanations |
| callum | 🇺🇸 US | male | trickster | Playful, gaming |
| charlie | 🇦🇺 AU | male | energetic | Sports, motivation |
| jessica | 🇺🇸 US | female | playful | Social media, casual |
| lily | 🇬🇧 UK | female | actress | Drama, elegant content |
| matilda | 🇺🇸 US | female | professional | Corporate, news |
| river | 🇺🇸 US | neutral | neutral | Inclusive, informative |
| roger | 🇺🇸 US | male | casual | Podcasts, relaxed |
| daniel | 🇬🇧 UK | male | broadcaster | News, announcements |
| eric | 🇺🇸 US | male | trustworthy | Business, corporate |
| chris | 🇺🇸 US | male | friendly | Tutorials, approachable |
| will | 🇺🇸 US | male | optimist | Motivation, uplifting |
| liam | 🇺🇸 US | male | social | YouTube, social media |

🎯 Quick Presets

  • default → rachel (warm, friendly)
  • narrator → adam (documentaries)
  • professional → matilda (corporate)
  • storyteller → george (audiobooks)
  • educator → alice (tutorials)
  • calm → brian (meditation)
  • energetic → liam (social media)
  • trustworthy → eric (business)
  • neutral → river (inclusive)
  • british → george
  • australian → charlie
  • broadcaster → daniel (news)
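
The presets above act as aliases for voice names. A minimal sketch of such a lookup, assuming a plain dictionary; `resolve_voice` is a hypothetical helper for illustration and not necessarily how scripts/tts.py resolves presets:

```python
# Preset aliases copied from the list above; the lookup logic is illustrative.
PRESETS = {
    "default": "rachel",
    "narrator": "adam",
    "professional": "matilda",
    "storyteller": "george",
    "educator": "alice",
    "calm": "brian",
    "energetic": "liam",
    "trustworthy": "eric",
    "neutral": "river",
    "british": "george",
    "australian": "charlie",
    "broadcaster": "daniel",
}


def resolve_voice(name: str) -> str:
    """Return the voice behind a preset, or the name itself if it is not a preset."""
    key = name.strip().lower()
    return PRESETS.get(key, key)


print(resolve_voice("broadcaster"))  # daniel
print(resolve_voice("rachel"))       # rachel
```

With a lookup like this, `--voice broadcaster` and `--voice daniel` would select the same voice.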

🌍 Supported Languages (32)

The multilingual v2 model supports these languages:

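| Code | Language | Code | Language |
|---|---|---|---|
| en | English | pl | Polish |
| de | German | nl | Dutch |
| es | Spanish | sv | Swedish |
| fr | French | da | Danish |
| it | Italian | fi | Finnish |
| pt | Portuguese | no | Norwegian |
| ru | Russian | tr | Turkish |
| uk | Ukrainian | cs | Czech |
| ja | Japanese | sk | Slovak |
| ko | Korean | hu | Hungarian |
| zh | Chinese | ro | Romanian |
| ar | Arabic | bg | Bulgarian |
| hi | Hindi | hr | Croatian |
| ta | Tamil | el | Greek |
| id | Indonesian | ms | Malay |
| vi | Vietnamese | th | Thai |
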
bash
# Synthesize in German
python3 scripts/tts.py --text "Guten Tag!" --voice rachel --lang de

# Synthesize in French
python3 scripts/tts.py --text "Bonjour le monde!" --voice adam --lang fr

# List all languages
python3 scripts/tts.py --languages
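
A value passed to `--lang` can be checked against the 32 codes listed above before any request is made. The sketch below is one way to do that; `check_lang` is a hypothetical helper, and whether scripts/tts.py validates codes like this is an assumption:

```python
# Language codes copied from the table above.
SUPPORTED_LANGS = {
    "en", "de", "es", "fr", "it", "pt", "ru", "uk", "ja", "ko", "zh",
    "ar", "hi", "ta", "id", "vi", "pl", "nl", "sv", "da", "fi", "no",
    "tr", "cs", "sk", "hu", "ro", "bg", "hr", "el", "ms", "th",
}


def check_lang(code: str) -> str:
    """Normalize a language code and reject anything outside the supported set."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGS:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized


print(check_lang("DE"))  # de
```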

💻 CLI Usage

Basic Text-to-Speech

bash
# List all voices
python3 scripts/tts.py --list

# Generate speech
python3 scripts/tts.py --text "Hello world" --voice rachel --output hello.mp3

# Use a preset
python3 scripts/tts.py --text "Breaking news..." --voice broadcaster --output news.mp3

# Multi-language
python3 scripts/tts.py --text "Bonjour!" --voice rachel --lang fr --output french.mp3

Streaming Mode

Generate audio with real-time streaming (good for long texts):

bash
# Stream audio while it is being generated
python3 scripts/tts.py --text "This is a long story..." --voice adam --stream

# Streaming with custom output
python3 scripts/tts.py --text "Chapter one..." --voice george --stream --output chapter1.mp3
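
Streaming with `--output` amounts to writing audio chunks to disk as they arrive instead of waiting for the complete file. A minimal sketch of that pattern, assuming the chunks come from some iterator such as an HTTP response body; `save_stream` is a hypothetical name, and the demo uses fake byte chunks rather than real audio:

```python
import tempfile
from pathlib import Path
from typing import Iterable


def save_stream(chunks: Iterable[bytes], output_path: str) -> int:
    """Write audio chunks to disk as they arrive and return the byte count."""
    written = 0
    with Path(output_path).open("wb") as fh:
        for chunk in chunks:
            if not chunk:  # skip empty chunks
                continue
            fh.write(chunk)
            fh.flush()  # push partial audio to the file right away
            written += len(chunk)
    return written


# Demo with placeholder bytes standing in for streamed audio.
demo_path = Path(tempfile.gettempdir()) / "stream_demo.mp3"
total = save_stream([b"abc", b"", b"def"], str(demo_path))
print(total)  # 6
```

Because each chunk is flushed immediately, a player can start reading the file before generation finishes, which is why streaming suits long texts.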

Batch Processing

Process multiple texts from a file:

bash
# From newline-separated text file
python3 scripts/tts.py --batch texts.txt --voice rachel --output-dir ./audio

# From JSON file
python3 scripts/tts.py --batch batch.json --output-dir ./output

JSON batch format:

json
[
  {"text": "First line", "voice": "rachel", "output": "line1.mp3"},
  {"text": "Second line", "voice": "adam", "output": "line2.mp3"},
  {"text": "Third line"}
]

Simple text format (one per line):

code
Hello, this is the first sentence.
This is the second sentence.
And this is the third.
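
Both batch formats can be normalized into the same list of items before synthesis. The sketch below assumes that entries without a `voice` fall back to a default voice and that entries without an `output` get a generated file name; `load_batch` and the `item_001.mp3` naming scheme are hypothetical, not taken from scripts/tts.py:

```python
import json
import tempfile
from pathlib import Path


def load_batch(path: str, default_voice: str = "rachel") -> list:
    """Load batch items from a JSON list or a newline-separated text file."""
    source = Path(path)
    raw = source.read_text(encoding="utf-8")
    if source.suffix.lower() == ".json":
        entries = json.loads(raw)
    else:
        # One text per line; blank lines are ignored.
        entries = [{"text": line.strip()} for line in raw.splitlines() if line.strip()]

    items = []
    for index, entry in enumerate(entries, start=1):
        items.append({
            "text": entry["text"],
            "voice": entry.get("voice", default_voice),
            # Hypothetical naming scheme for entries without an explicit output.
            "output": entry.get("output", f"item_{index:03d}.mp3"),
        })
    return items


# Demo: a two-line text file becomes two batch items.
with tempfile.TemporaryDirectory() as tmp:
    txt = Path(tmp) / "texts.txt"
    txt.write_text("Hello, this is the first sentence.\n\nThis is the second sentence.\n",
                   encoding="utf-8")
    batch = load_batch(str(txt))

print(len(batch), batch[1]["output"])  # 2 item_002.mp3
```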

Usage Statistics

bash
# Show usage stats and cost estimates
python3 scripts/tts.py --stats

# Reset statistics
python3 scripts/tts.py --reset-stats

🎵 Sound Effects (SFX)

Generate AI-powered sound effects from text descriptions:

bash
# Generate a sound effect
python3 scripts/sfx.py --prompt "Thunder rumbling in the distance"

# With specific duration (0.5-22 seconds)
python3 scripts/sfx.py --prompt "Cat meowing" --duration 3 --output cat.mp3

# Adjust prompt influence (0.0-1.0)
python3 scripts/sfx.py --prompt "Footsteps on gravel" --influence 0.5

# Batch SFX generation
python3 scripts/sfx.py --batch sounds.json --output-dir ./sfx

# Show prompt examples
python3 scripts/sfx.py --examples

Example prompts:

  • "Thunder rumbling in the distance"
  • "Cat purring contentedly"
  • "Typing on a mechanical keyboard"
  • "Spaceship engine humming"
  • "Coffee shop background chatter"
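
The `--duration` and `--influence` ranges shown above can be enforced before a request is built. In this sketch the field names `text`, `duration_seconds`, and `prompt_influence` follow the ElevenLabs sound-generation endpoint as I understand it, and the default influence of 0.3 is an assumption; check both against the current API reference. `build_sfx_payload` is a hypothetical helper and sends nothing:

```python
from typing import Optional


def build_sfx_payload(prompt: str, duration: Optional[float] = None,
                      influence: float = 0.3) -> dict:
    """Validate SFX options and return a request body (no request is sent)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0.0 <= influence <= 1.0:
        raise ValueError("influence must be between 0.0 and 1.0")
    payload = {"text": prompt, "prompt_influence": influence}
    if duration is not None:
        if not 0.5 <= duration <= 22:
            raise ValueError("duration must be between 0.5 and 22 seconds")
        payload["duration_seconds"] = duration
    return payload


print(build_sfx_payload("Cat meowing", duration=3))
```

Leaving `duration` unset omits the field, so the service can pick a length on its own.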

🎨 Voice Design

Create custom voices from text descriptions:

bash
# Basic voice design
python3 scripts/voice-design.py --gender female --age middle_aged --accent american \
  --description "A warm, motherly voice"

# With custom preview text
python3 scripts/voice-design.py --gender male --age young --accent british \
  --text "Welcome to the adventure!" --output preview.mp3

# Save to your ElevenLabs library
python3 scripts/voice-design.py --gender female --age young --accent american \
  --description "Energetic podcast host" --save "MyHost"

# List all design options
python3 scripts/voice-design.py --options

Voice Design Options:

| Option | Values |
|---|---|
| Gender | male, female, neutral |
| Age | young, middle_aged, old |
| Accent | american, british, african, australian, indian, latin, middle_eastern, scandinavian, eastern_european |
| Accent Strength | 0.3-2.0 (subtle to strong) |

📖 Pronunciation Dictionary

Customize how words are pronounced:

Edit pronunciations.json:

json
{
  "rules": [
    {
      "word": "Moltbot",
      "replacement": "Molt bot",
      "comment": "Pronounce as two words"
    },
    {
      "word": "API",
      "replacement": "A P I",
      "comment": "Spell out acronym"
    }
  ]
}

Usage:

bash
# Pronunciations are applied automatically
python3 scripts/tts.py --text "The Moltbot API is great" --voice rachel

# Disable pronunciations
python3 scripts/tts.py --text "The API is great" --voice rachel --no-pronunciations
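
Applying the rules comes down to a text substitution before the text is sent for synthesis. The sketch below assumes whole-word, case-sensitive matching, which the skill does not document; `apply_pronunciations` is a hypothetical name, and the `TTS` rule is an invented example alongside the `API` rule from pronunciations.json:

```python
import re

RULES = [
    {"word": "API", "replacement": "A P I"},
    {"word": "TTS", "replacement": "text to speech"},  # hypothetical extra rule
]


def apply_pronunciations(text: str, rules: list) -> str:
    """Replace each rule's word with its spoken form, matching whole words only."""
    for rule in rules:
        pattern = r"\b" + re.escape(rule["word"]) + r"\b"
        # A function replacement keeps backslashes in the rule text literal.
        text = re.sub(pattern, lambda _m, spoken=rule["replacement"]: spoken, text)
    return text


print(apply_pronunciations("The API beats APIs for TTS", RULES))
# The A P I beats APIs for text to speech
```

Whole-word matching leaves "APIs" untouched, so a rule for an acronym does not rewrite longer words that contain it.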

💰 Cost Tracking

The skill tracks your character usage and estimates costs:

bash
python3 scripts/tts.py --stats

Output:

code
📊 ElevenLabs Usage Statistics

  Total Characters: 15,230
  Total Requests:   42
  Since:            2024-01-15

💰 Estimated Costs:
  Starter    $4.57 ($0.30/1k chars)
  Creator    $3.66 ($0.24/1k chars)
  Pro        $2.74 ($0.18/1k chars)
  Scale      $1.68 ($0.11/1k chars)
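
The estimates in this output follow from total characters divided by 1,000, multiplied by the per-plan rate. A short sketch reproducing the figures, using the rates shown above; `estimate_costs` is a hypothetical name, and actual ElevenLabs pricing may differ from these rates:

```python
# Per-1k-character rates as printed in the sample output above.
RATES_PER_1K = {"Starter": 0.30, "Creator": 0.24, "Pro": 0.18, "Scale": 0.11}


def estimate_costs(total_chars: int) -> dict:
    """Return the estimated cost per plan, rounded to cents."""
    return {plan: round(total_chars / 1000 * rate, 2)
            for plan, rate in RATES_PER_1K.items()}


for plan, cost in estimate_costs(15230).items():
    print(f"{plan:<10} ${cost:.2f}")
```

For 15,230 characters this gives $4.57, $3.66, $2.74, and $1.68, matching the sample output.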

🤖 Moltbot TTS Integration

Using with Moltbot's Built-in TTS

Moltbot has built-in TTS support that can use ElevenLabs. Configure it in ~/.moltbot/moltbot.json:

json
{
  "tts": {
    "enabled": true,
    "provider": "elevenlabs",
    "elevenlabs": {
      "apiKey": "your-api-key-here",
      "voice": "rachel",
      "model": "eleven_multilingual_v2"
    }
  }
}

Triggering TTS in Chat

In Moltbot conversations:

  • Use /tts on to enable automatic TTS
  • Use the tts tool directly for one-off speech
  • Request "read this aloud" or "speak this"

Using Skill Scripts from Moltbot

bash
# Moltbot can run these scripts directly
exec python3 /path/to/skills/elevenlabs-voices/scripts/tts.py --text "Hello" --voice rachel

⚙️ Configuration

The scripts look for the API key in this order:

  1. ELEVEN_API_KEY or ELEVENLABS_API_KEY environment variable
  2. Moltbot config (~/.moltbot/moltbot.json → tts.elevenlabs.apiKey)
  3. Skill-local .env file

Create a .env file:

bash
echo 'ELEVEN_API_KEY=your-key-here' > .env
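
The lookup order described in this section can be sketched as a single function. This is an illustration, not the scripts' actual code: `find_api_key` is a hypothetical name, checking `ELEVEN_API_KEY` before `ELEVENLABS_API_KEY` is an assumption, and the `.env` parsing handles only simple `NAME=value` lines:

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

KEY_NAMES = ("ELEVEN_API_KEY", "ELEVENLABS_API_KEY")


def find_api_key(env: Mapping[str, str] = os.environ,
                 config_path: Path = Path.home() / ".moltbot" / "moltbot.json",
                 dotenv_path: Path = Path(".env")) -> Optional[str]:
    """Return the first API key found, following the documented order."""
    # 1. Environment variables
    for name in KEY_NAMES:
        if env.get(name):
            return env[name]

    # 2. Moltbot config: tts.elevenlabs.apiKey
    if config_path.is_file():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            config = {}
        key = config.get("tts", {}).get("elevenlabs", {}).get("apiKey")
        if key:
            return key

    # 3. Skill-local .env file
    if dotenv_path.is_file():
        for line in dotenv_path.read_text(encoding="utf-8").splitlines():
            name, _, value = line.strip().partition("=")
            if name in KEY_NAMES and value:
                return value.strip("'\"")
    return None


missing = Path("/nonexistent/none")
print(find_api_key({"ELEVENLABS_API_KEY": "demo-key"}, missing, missing))  # demo-key
```

Passing `env`, `config_path`, and `dotenv_path` as parameters keeps the lookup easy to test without touching real credentials.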

🎛️ Voice Settings

Each voice has tuned settings for optimal output:

| Setting | Range | Description |
|---|---|---|
| stability | 0.0-1.0 | Higher = consistent, lower = expressive |
| similarity_boost | 0.0-1.0 | How closely to match original voice |
| style | 0.0-1.0 | Exaggeration of speaking style |
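
These three settings travel with each synthesis request as a `voice_settings` object. The sketch below clamps each value into the 0.0-1.0 range and builds a request body shaped like the ElevenLabs text-to-speech API as I understand it; the default values and the `build_tts_payload` helper are assumptions for illustration, and the per-voice tuned values presumably live in voices.json:

```python
def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Limit a value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def build_tts_payload(text: str, stability: float = 0.5,
                      similarity_boost: float = 0.75, style: float = 0.0,
                      model_id: str = "eleven_multilingual_v2") -> dict:
    """Return a request body with voice settings clamped to the valid range."""
    return {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": clamp(stability),
            "similarity_boost": clamp(similarity_boost),
            "style": clamp(style),
        },
    }


payload = build_tts_payload("Hello world", stability=1.4, style=-0.2)
print(payload["voice_settings"])
```

Out-of-range inputs such as `stability=1.4` are pulled back to 1.0 rather than rejected; raising an error instead would be an equally reasonable choice.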

📝 Triggers

  • "use {voice_name} voice"
  • "speak as {persona}"
  • "list voices"
  • "voice settings"
  • "generate sound effect"
  • "design a voice"

📁 Files

code
elevenlabs-voices/
├── SKILL.md              # This documentation
├── README.md             # Quick start guide
├── voices.json           # Voice definitions & settings
├── pronunciations.json   # Custom pronunciation rules
├── examples.md           # Detailed usage examples
├── scripts/
│   ├── tts.py            # Main TTS script
│   ├── sfx.py            # Sound effects generator
│   └── voice-design.py   # Voice design tool
└── references/
    └── voice-guide.md    # Voice selection guide

📋 Changelog

v2.0.0

  • Added support for 32 languages with the --lang parameter
  • Added streaming mode with --stream flag
  • Added sound effects generation (sfx.py)
  • Added batch processing with --batch flag
  • Added cost tracking with --stats flag
  • Added voice design tool (voice-design.py)
  • Added pronunciation dictionary support
  • Added Moltbot TTS integration documentation
  • Improved error handling and progress output