elevenlabs-transcribe

Transcribe audio to text with ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.

SKILL.md
---
name: elevenlabs-transcribe
description: Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.
homepage: https://elevenlabs.io/speech-to-text
metadata: {"clawdbot":{"emoji":"🎙️","requires":{"bins":["ffmpeg","python3"],"env":["ELEVENLABS_API_KEY"]},"primaryEnv":"ELEVENLABS_API_KEY"}}
---

ElevenLabs Speech-to-Text

Official ElevenLabs skill for speech-to-text transcription.

Convert audio to text with state-of-the-art accuracy. Supports 90+ languages, speaker diarization, and realtime streaming.

Prerequisites

  • ffmpeg installed (brew install ffmpeg on macOS)
  • ELEVENLABS_API_KEY environment variable set
  • Python 3.8+ (dependencies auto-install on first run)
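
An agent may want to verify these prerequisites before invoking the script. The following is a minimal sketch, not part of the skill itself: the function name `missing_prerequisites` and its injectable parameters are hypothetical, and the checks only mirror the list above (binaries on PATH, the API key variable, the Python version).

```python
import os
import shutil
import sys


def missing_prerequisites(env=None, which=shutil.which, version=None):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper: `env`, `which`, and `version` are injectable
    only so the function can be exercised without touching the system.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)

    problems = []
    # Binaries named in the skill metadata ("requires.bins").
    for binary in ("ffmpeg", "python3"):
        if which(binary) is None:
            problems.append(f"{binary} not found on PATH")
    # An empty value is treated the same as an unset variable.
    if not env.get("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is not set")
    if version < (3, 8):
        problems.append("Python 3.8+ required")
    return problems
```

Calling `missing_prerequisites()` with no arguments checks the current environment; printing the returned list gives the user something actionable before any audio is sent.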

Usage

bash
{baseDir}/scripts/transcribe.sh <audio_file> [options]
{baseDir}/scripts/transcribe.sh --url <stream_url> [options]
{baseDir}/scripts/transcribe.sh --mic [options]

Examples

Batch Transcription

Transcribe a local audio file:

bash
{baseDir}/scripts/transcribe.sh recording.mp3

With speaker identification:

bash
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize

Get full JSON response with timestamps:

bash
{baseDir}/scripts/transcribe.sh interview.wav --diarize --json

Realtime Streaming

Stream from a URL (e.g., live radio, podcast):

bash
{baseDir}/scripts/transcribe.sh --url https://npr-ice.streamguys1.com/live.mp3

Transcribe from microphone:

bash
{baseDir}/scripts/transcribe.sh --mic

Stream a local file in realtime (useful for testing):

bash
{baseDir}/scripts/transcribe.sh audio.mp3 --realtime

Quiet Mode for Agents

Suppress status messages on stderr:

bash
{baseDir}/scripts/transcribe.sh --mic --quiet

Options

| Option | Description |
| --- | --- |
| `--diarize` | Identify different speakers in the audio |
| `--lang CODE` | ISO language hint (e.g., en, pt, es, fr) |
| `--json` | Output full JSON with timestamps and metadata |
| `--events` | Tag audio events (laughter, music, applause) |
| `--realtime` | Stream local file instead of batch processing |
| `--partials` | Show interim transcripts during realtime mode |
| `-q`, `--quiet` | Suppress status messages (recommended for agents) |
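
When the script is driven from Python, it can help to assemble the argument list in one place. The sketch below is illustrative only: `build_command` and its keyword names are hypothetical, it uses only the flags listed above, and it assumes that exactly one input (a file, `--url`, or `--mic`) is given per run. It does not check whether a given flag combination is accepted by the script.

```python
def build_command(script, source=None, *, url=None, mic=False, diarize=False,
                  lang=None, json_output=False, events=False,
                  realtime=False, partials=False, quiet=False):
    """Return the argv list for one transcribe.sh invocation (hypothetical helper)."""
    # Assumption: exactly one input source per invocation.
    if sum([source is not None, url is not None, bool(mic)]) != 1:
        raise ValueError("choose exactly one of: source file, url, mic")

    if url is not None:
        cmd = [script, "--url", url]
    elif mic:
        cmd = [script, "--mic"]
    else:
        cmd = [script, source]

    if lang:
        cmd += ["--lang", lang]

    # Boolean switches, appended in the order of the options table.
    switches = [
        (diarize, "--diarize"),
        (json_output, "--json"),
        (events, "--events"),
        (realtime, "--realtime"),
        (partials, "--partials"),
        (quiet, "--quiet"),
    ]
    cmd += [flag for enabled, flag in switches if enabled]
    return cmd
```

Returning a list rather than a single string keeps file names with spaces intact when the result is passed to `subprocess.run`.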

Output Format

Text Mode (default)

Plain text transcription:

code
The quick brown fox jumps over the lazy dog.

JSON Mode (--json)

json
{
  "text": "The quick brown fox jumps over the lazy dog.",
  "language_code": "eng",
  "language_probability": 0.98,
  "words": [
    {"text": "The", "start": 0.0, "end": 0.15, "type": "word", "speaker_id": "speaker_0"}
  ]
}
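
With `--diarize --json`, the per-word `speaker_id` can be folded into speaker turns. The sketch below assumes only the fields shown in the sample above (`words`, `text`, `type`, `speaker_id`). The function name `speaker_turns` is hypothetical, and treating entries whose `type` is not `"word"` as non-spoken is an assumption, not something this document states.

```python
def speaker_turns(response):
    """Group consecutive words by speaker into (speaker_id, text) tuples."""
    turns = []  # each entry: [speaker_id, [word, word, ...]]
    for item in response.get("words", []):
        # Assumption: only entries of type "word" carry spoken text.
        if item.get("type") != "word":
            continue
        speaker = item.get("speaker_id", "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(item["text"])
        else:
            turns.append([speaker, [item["text"]]])
    return [(speaker, " ".join(words)) for speaker, words in turns]
```

The input is the dictionary obtained from `json.loads` on the script's standard output; a response without a `words` list yields an empty result.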

Realtime Mode

Final transcripts print as they are committed. With --partials, interim transcripts are also printed, prefixed with [partial]:

code
[partial] The quick
[partial] The quick brown fox
The quick brown fox jumps over the lazy dog.
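
A consumer that only wants committed text can filter on the `[partial] ` prefix shown above. This is a small sketch under that assumption; the name `final_transcripts` is hypothetical, and it relies on the prefix format staying exactly as in the sample output.

```python
PARTIAL_PREFIX = "[partial] "


def final_transcripts(lines):
    """Yield committed transcript lines, skipping interim and blank lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith(PARTIAL_PREFIX):
            continue
        yield line
```

Because it accepts any iterable of lines, it can be fed the standard output of a running process line by line, or a list of captured lines.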

Supported Formats

Audio: MP3, WAV, M4A, FLAC, OGG, WebM, AAC, AIFF, Opus

Video: MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GPP

Limits: Up to 3GB file size, 10 hours duration
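
A cheap pre-flight check can reject obviously unsuitable files before anything is uploaded. The sketch below is built on several assumptions: the mapping from format names to file extensions is a guess (for example `.3gp` for 3GPP), "3GB" is interpreted as 3 × 1024³ bytes, and the helper name `preflight` is hypothetical. The 10-hour duration limit is not checked, since that would require probing the media.

```python
from pathlib import Path

# Assumption: extensions inferred from the format names listed above.
SUPPORTED_EXTS = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm", ".aac", ".aiff", ".opus",
    ".mp4", ".avi", ".mkv", ".mov", ".wmv", ".flv", ".mpeg", ".3gp",
}
# Assumption: "3GB" read as binary gigabytes.
MAX_BYTES = 3 * 1024 ** 3


def preflight(path, size_bytes=None):
    """Return None if the file looks acceptable, else a short reason string."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext not in SUPPORTED_EXTS:
        return f"unsupported extension: {ext or '(none)'}"
    # size_bytes can be passed in to avoid touching the filesystem.
    size = p.stat().st_size if size_bytes is None else size_bytes
    if size > MAX_BYTES:
        return "file exceeds 3 GB limit"
    return None
```

An extension check is only a heuristic; the service decides what it can actually decode.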

Error Handling

The script exits with non-zero status on errors:

  • Missing API key: Set ELEVENLABS_API_KEY environment variable
  • File not found: Check the file path exists
  • Missing ffmpeg: Install with your package manager
  • API errors: Check API key validity and rate limits
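
Since failures are signalled through the exit status, a caller can wrap the invocation and raise on any non-zero code. The following is a sketch rather than the skill's own interface: `run_transcription` and `TranscriptionError` are hypothetical names, and it assumes error details are written to stderr alongside the status messages.

```python
import subprocess


class TranscriptionError(RuntimeError):
    """Raised when the command exits with a non-zero status."""


def run_transcription(cmd, timeout=None):
    """Run `cmd` (an argv list) and return its stdout, stripped.

    Assumption: the transcript goes to stdout and diagnostics to stderr.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise TranscriptionError(
            f"exit {proc.returncode}: {proc.stderr.strip()}"
        )
    return proc.stdout.strip()
```

For example, `run_transcription(["{baseDir}/scripts/transcribe.sh", "recording.mp3", "--quiet"])` with `{baseDir}` resolved would return the plain-text transcript or raise with whatever the script reported.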

When to Use Each Mode

| Scenario | Command |
| --- | --- |
| Transcribe a recording | `./transcribe.sh file.mp3` |
| Meeting with multiple speakers | `./transcribe.sh meeting.mp3 --diarize` |
| Live radio/podcast stream | `./transcribe.sh --url <url>` |
| Voice input from user | `./transcribe.sh --mic --quiet` |
| Need word timestamps | `./transcribe.sh file.mp3 --json` |