elevenlabs

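Use this skill to generate speech, voiceovers, and audio content with ElevenLabs. It triggers automatically when the user says "text to speech", "generate audio", "voiceover", "clone voice", "transcribe audio", "sound effects", or "ElevenLabs", or when you need to produce audio for courses, videos, or podcasts. The skill relies on the ElevenLabs MCP server for direct integration.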

SKILL.md
--- frontmatter
name: elevenlabs
description: Use for generating speech, voiceovers, and audio content with ElevenLabs. Triggers on "text to speech", "generate audio", "voiceover", "clone voice", "transcribe audio", "sound effects", "ElevenLabs", or when creating audio for courses, videos, or podcasts. Leverages ElevenLabs MCP server for direct integration.

Generating Audio with ElevenLabs

This skill enables AI-powered audio generation using ElevenLabs' text-to-speech, voice cloning, transcription, and sound effects capabilities. It integrates via the official ElevenLabs MCP server for a seamless workflow.

MCP Server Integration

The ElevenLabs MCP server provides direct access to all ElevenLabs capabilities. When properly configured, Claude can generate speech, clone voices, and process audio through natural language commands.

Verifying MCP Server Availability

Check whether the ElevenLabs MCP server is configured by looking for its tools among the available tools. If they are missing, guide the user through setup (see ./mcp-server-setup.md).

Available MCP Tools

When the ElevenLabs MCP server is active, these tools become available:

| Tool | Purpose |
| --- | --- |
| text_to_speech | Convert text to natural speech |
| get_voices | List available voices |
| voice_clone | Clone a voice from audio samples |
| transcribe | Convert audio/video to text |
| sound_effects | Generate sound effects from descriptions |
| voice_isolate | Separate speech from background noise |
| audio_convert | Apply voice effects to audio |
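
As a rough illustration of the availability check, the sketch below compares a list of tool names against the tools named in this document. How that list is obtained depends on the MCP client and is not shown here; `EXPECTED_TOOLS` and `missing_tools` are hypothetical names introduced only for this example.

```python
# Tool names as listed in this document; the running server may expose others.
EXPECTED_TOOLS = frozenset({
    "text_to_speech",
    "get_voices",
    "voice_clone",
    "transcribe",
    "sound_effects",
    "voice_isolate",
    "audio_convert",
})


def missing_tools(available):
    """Return the expected tool names that are absent from `available`, sorted."""
    return sorted(EXPECTED_TOOLS - set(available))


def server_looks_configured(available):
    """Treat the server as configured only if every expected tool is present."""
    return not missing_tools(available)
```

If `missing_tools` returns a non-empty list, that is the cue to walk the user through ./mcp-server-setup.md.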

Core Workflows

1. Text-to-Speech Generation

Generate voiceovers for course content, videos, or podcasts:

code
Generate speech for: "Welcome to Module 1. In this lesson, we'll explore..."
Voice: Use a warm, professional voice
Model: eleven_multilingual_v2 (for stability) or eleven_flash_v2_5 (for speed)

Model Selection Guide:

  • Eleven v3 (Alpha): Most expressive, dramatic delivery, 70+ languages, 5K char limit
  • Eleven Multilingual v2: Most stable for longer content, 29 languages, 10K char limit
  • Eleven Flash v2.5: Ultra-low latency (~75ms), 32 languages, 40K char limit
  • Eleven Turbo v2.5: Balance of quality/speed (250-300ms), 32 languages, 40K char limit
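
The guide above can be turned into a small lookup. This is a minimal sketch, not an official selection rule: the character limits are copied from the list above, the identifiers `eleven_v3` and `eleven_turbo_v2_5` are assumed to follow the same naming as the two model IDs this document mentions, and `pick_model` with its `priority` values is a hypothetical helper.

```python
# Character limits per model, taken from the selection guide above.
CHAR_LIMITS = {
    "eleven_v3": 5_000,
    "eleven_multilingual_v2": 10_000,
    "eleven_flash_v2_5": 40_000,
    "eleven_turbo_v2_5": 40_000,
}

# Preferred models per priority, most preferred first (an illustrative ordering).
PREFERENCES = {
    "expressive": ["eleven_v3", "eleven_multilingual_v2", "eleven_turbo_v2_5"],
    "stability": ["eleven_multilingual_v2", "eleven_turbo_v2_5"],
    "speed": ["eleven_flash_v2_5"],
}


def pick_model(text, priority="stability"):
    """Return the first preferred model whose character limit fits `text`."""
    if priority not in PREFERENCES:
        raise ValueError(f"unknown priority: {priority!r}")
    for model_id in PREFERENCES[priority]:
        if len(text) <= CHAR_LIMITS[model_id]:
            return model_id
    raise ValueError("text exceeds every candidate model's limit; split it first")
```

For example, a 6,000-character script requested with `priority="expressive"` falls back from v3 to Multilingual v2 because it exceeds the 5K limit.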

2. Voice Cloning

Create custom voices from audio samples:

  1. Prepare 1-3 minutes of clean audio (no background noise)
  2. Use the voice_clone tool with the audio file path
  3. Name the voice descriptively (e.g., "course-narrator-professional")
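
Before cloning, it can help to confirm that a sample falls in the 1-3 minute range from step 1. The sketch below does this for uncompressed WAV files using Python's standard `wave` module; other formats such as MP3 would need a different reader. The function names and the 60-180 second bounds are illustrative, derived only from the guidance above.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_suitable_sample_length(seconds, minimum=60.0, maximum=180.0):
    """Check a duration against the 1-3 minute guidance for cloning samples."""
    return minimum <= seconds <= maximum
```

This only checks length; background noise and compression artifacts still need to be judged by ear.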

3. Audio Transcription

Transcribe recordings using Scribe models:

code
Transcribe the audio file at: ./recordings/interview.mp3
Include speaker diarization (up to 32 speakers)

Scribe Models:

  • Scribe v1: 99 languages, speaker diarization
  • Scribe v2 Realtime: 90 languages, ~150ms latency

4. Sound Effects Generation

Create custom sound effects for videos:

code
Generate sound effect: "gentle notification chime"
Generate sound effect: "applause from small audience"

Course Audio Production Workflow

To produce course audio content:

Step 1: Prepare Scripts

Ensure scripts are finalized and saved as text files. See ./voice-generation-workflows.md for script formatting tips.

Step 2: Select Voice

code
List available voices with get_voices
Choose based on:
- Gender and age range
- Accent and language
- Tone (professional, casual, energetic)
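
The selection criteria above amount to filtering a voice list by attributes. The sketch below assumes each voice has been reduced to a flat dictionary with `name`, `gender`, `accent`, and `tone` keys; that shape is hypothetical, since this document does not specify what get_voices returns, and the sample records are placeholders rather than real voices.

```python
def filter_voices(voices, **criteria):
    """Return the voices whose fields match every criterion, ignoring case."""
    def matches(voice):
        return all(
            str(voice.get(key, "")).lower() == str(wanted).lower()
            for key, wanted in criteria.items()
        )

    return [voice for voice in voices if matches(voice)]


# Placeholder records in the assumed shape, for illustration only.
SAMPLE_VOICES = [
    {"name": "Voice A", "gender": "female", "accent": "british", "tone": "professional"},
    {"name": "Voice B", "gender": "male", "accent": "american", "tone": "casual"},
]

shortlist = filter_voices(SAMPLE_VOICES, gender="female", tone="professional")
```

A shortlist produced this way is a starting point; the final choice is still best made by listening to a short sample.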

Step 3: Generate Audio

code
For each lesson script:
1. Generate speech with selected voice
2. Save to content/audio/module-XX/lesson-XX.mp3
3. Verify audio quality
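
The save path in step 2 can be built consistently with a small helper. This is a sketch under the assumption that modules and lessons are numbered with two digits, as in the module-XX/lesson-XX pattern; `lesson_audio_path` and its optional `slug` argument are hypothetical.

```python
from pathlib import Path


def lesson_audio_path(module, lesson, slug=None, root="content/audio"):
    """Build the output path for one lesson's audio file."""
    stem = f"lesson-{lesson:02d}"
    if slug:
        # Optional descriptive suffix, e.g. lesson-01-intro.mp3
        stem = f"{stem}-{slug}"
    return Path(root) / f"module-{module:02d}" / f"{stem}.mp3"


def ensure_parent_dir(path):
    """Create the module directory if it does not exist yet."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
```

Calling `ensure_parent_dir` before writing each file avoids failures when a module folder has not been created yet.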

Step 4: Post-Processing

  • Use voice_isolate to clean any recordings with background noise
  • Apply consistent audio levels across all files

Output File Organization

code
content/
├── audio/
│   ├── module-00/
│   │   ├── lesson-01-intro.mp3
│   │   └── lesson-02-overview.mp3
│   ├── module-01/
│   │   └── ...
│   └── sound-effects/
│       ├── transition-chime.mp3
│       └── success-notification.mp3
└── transcripts/
    └── ...

Best Practices

For Voiceovers

  • Keep segments under 5,000 characters for v3, under 10,000 for Multilingual v2
  • Add natural pauses with ... or line breaks
  • Test the voice with a short sample before full generation
  • Use a consistent voice across a course for a professional feel
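
To stay under the per-model character limits mentioned above, a long script can be split at sentence boundaries before generation. The following is a minimal sketch: `split_script` is a hypothetical helper, the default limit is the Multilingual v2 figure from this document, and the sentence detection is a simple punctuation heuristic rather than a full tokenizer.

```python
import re


def split_script(text, limit=10_000):
    """Split `text` into segments of at most `limit` characters.

    Splits happen after sentence-ending punctuation where possible, so line
    breaks and "..." pauses inside a segment are preserved.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Zero-width split: each piece keeps its own leading whitespace.
    pieces = re.split(r"(?<=[.!?])(?=\s)", text.strip())
    segments, current = [], ""
    for piece in pieces:
        if len(current) + len(piece) <= limit:
            current += piece
            continue
        if current.strip():
            segments.append(current.strip())
        piece = piece.lstrip()
        # A single sentence longer than the limit is cut at the limit.
        while len(piece) > limit:
            segments.append(piece[:limit])
            piece = piece[limit:]
        current = piece
    if current.strip():
        segments.append(current.strip())
    return segments
```

For v3, pass `limit=5_000`; each returned segment can then be generated separately with the same voice.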

For Voice Cloning

  • Use high-quality source audio (no compression artifacts)
  • Provide diverse samples (different emotions, pacing)
  • Label cloned voices clearly in your project

For Transcription

  • Enable speaker diarization for interviews/dialogues
  • Request timestamps for video synchronization
  • Review and correct specialized terminology

Troubleshooting

| Issue | Solution |
| --- | --- |
| MCP server not available | See ./mcp-server-setup.md for configuration |
| Audio quality issues | Try a different model or adjust the voice settings |
| Timeout on large files | Break content into smaller segments |
| Voice sounds unnatural | Adjust text formatting and add punctuation for pacing |

API Credits

ElevenLabs uses a credit-based system:

  • Free tier: 10,000 characters/month
  • Flash/Turbo models: 50% lower cost per character
  • Monitor usage in the dashboard at elevenlabs.io
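
A quick estimate before generating can show whether a script fits the free tier. The sketch below rests on two assumptions that this document does not state outright: that standard models cost one credit per character, and that the 50% discount applies to the model IDs `eleven_flash_v2_5` and `eleven_turbo_v2_5`. Actual billing may differ, so treat the result as a rough guide.

```python
import math

FREE_TIER_CREDITS = 10_000  # free-tier figure from this document, treated as credits
DISCOUNTED_MODELS = {"eleven_flash_v2_5", "eleven_turbo_v2_5"}  # assumed model IDs


def estimate_credits(char_count, model_id):
    """Estimate credits for `char_count` characters (assumes 1 credit per character)."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    rate = 0.5 if model_id in DISCOUNTED_MODELS else 1.0
    return math.ceil(char_count * rate)


def fits_free_tier(char_count, model_id, already_used=0):
    """Check whether a job fits in the remaining free-tier allowance."""
    return already_used + estimate_credits(char_count, model_id) <= FREE_TIER_CREDITS
```

Under these assumptions, a 12,000-character script would exceed the free tier on Multilingual v2 but fit on Flash v2.5.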

References