audio-transcriber

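Use when the user provides an audio file path or asks to transcribe audio, listen to a recording, or process a voice message. Triggers on file extensions such as .opus, .mp3, .m4a, .wav, .ogg, .flac, and .webm, or on mentions of WhatsApp audio, voice memos, recordings, or podcasts.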

SKILL.md
--- frontmatter
name: audio-transcriber
description: Use when user provides an audio file path, asks to transcribe audio, listen to a recording, or process voice messages. Triggers on file extensions like .opus, .mp3, .m4a, .wav, .ogg, .flac, .webm, or mentions of WhatsApp audio, voice notes, recordings, or podcasts.

Audio Transcriber

Transcribe audio files locally using OpenAI Whisper. Claude cannot hear — this skill bridges that gap by running Whisper on the user's machine.

When to Use

  • User drops an audio file path (any format: .opus, .mp3, .m4a, .wav, .ogg, .flac, .webm, .aac, .wma)
  • User asks to "listen to", "transcribe", or "process" a recording
  • User mentions WhatsApp audio, voice notes, or voice messages
  • User wants subtitles/captions generated from audio or video

Prerequisites Check

Before transcribing, verify the toolchain:

bash
# 1. Check if Whisper is installed
which whisper

# 2. If not installed, install via pipx (preferred on macOS)
pipx install openai-whisper

# 3. If pipx unavailable, use pip with venv
python3 -m venv /tmp/whisper-env && source /tmp/whisper-env/bin/activate && pip install openai-whisper

# 4. Whisper decodes audio through ffmpeg, so confirm it is on PATH as well
which ffmpeg
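
The check above can be folded into one helper so that every missing tool is reported in a single pass. This is a minimal sketch, not part of the original skill: the function name `missing_tools` is hypothetical, and it only tests whether each command resolves on PATH.

```bash
# Report which required command-line tools are missing from PATH.
# Prints one missing tool name per line; the exit status is the number missing.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

A typical call would be `missing_tools whisper ffmpeg || echo "install the tools listed above first"`, which stays silent when both commands are available.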

Transcription Command

bash
whisper "<audio-file-path>" \
  --model medium \
  --output_dir "<scratchpad-dir>" \
  --output_format txt

Use the session scratchpad directory for output — never pollute the user's project.

Model Selection

| Model | Parameters | Speed | Accuracy | Use When |
|---|---|---|---|---|
| tiny | 39M | Fastest | Low | Quick gist, short clear audio |
| base | 74M | Fast | Fair | Simple, clear speech |
| small | 244M | Moderate | Good | Most casual use |
| medium | 769M | Slow | Great | Default; best tradeoff |
| large-v3 | 1550M | Slowest | Best | Non-English, heavy accents, noisy audio |

Default to medium. Upgrade to large-v3 if user reports poor quality or audio is non-English with heavy dialect.
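
The selection rule can be written down as a small lookup, which may help keep the choice consistent across runs. This is a sketch under assumptions: the function name `pick_model` and the hint words (`gist`, `clear`, `casual`, `hard`) are made up for illustration and are not Whisper options. Only the model names it prints are real.

```bash
# Map a rough description of the audio to a Whisper model name.
# Hint words are hypothetical labels; anything unrecognized falls back to medium.
pick_model() {
  case "${1:-default}" in
    gist)   echo tiny ;;      # quick gist of short, clear audio
    clear)  echo base ;;      # simple, clear speech
    casual) echo small ;;     # most casual use
    hard)   echo large-v3 ;;  # non-English, heavy accents, noisy audio
    *)      echo medium ;;    # default tradeoff
  esac
}
```

For example, `whisper "<audio-file-path>" --model "$(pick_model hard)"` would rerun a poor transcript with large-v3.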

Language Handling

  • Whisper auto-detects language — no flag needed in most cases
  • Force language if auto-detection fails: --language de (ISO code or full name)
  • For translation to English, use: --task translate
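
The two flags above combine naturally: `--language` names the source language and `--task translate` asks for English output. The helper below is a sketch with a hypothetical name, `whisper_lang_flags`. It prints the flags as a single line, so it assumes the language is given as an ISO code without spaces.

```bash
# Print the extra Whisper flags for an optional source language and task.
# $1: ISO language code (empty means auto-detect); $2: transcribe (default) or translate.
whisper_lang_flags() {
  wl_lang="${1:-}"
  wl_task="${2:-transcribe}"
  wl_flags=""
  if [ -n "$wl_lang" ]; then
    wl_flags="--language $wl_lang"
  fi
  if [ "$wl_task" = "translate" ]; then
    wl_flags="${wl_flags:+$wl_flags }--task translate"
  fi
  printf '%s\n' "$wl_flags"
}
```

Usage relies on word splitting, so the substitution is left unquoted on purpose: `whisper "<audio-file-path>" --model medium $(whisper_lang_flags de translate)`.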

Output Formats

| Format | Flag | Use Case |
|---|---|---|
| .txt | --output_format txt | Default; plain transcript |
| .srt | --output_format srt | Subtitles with timestamps |
| .vtt | --output_format vtt | Web subtitles (WebVTT) |
| .json | --output_format json | Programmatic access, segment timestamps (word-level with --word_timestamps True) |
| all | --output_format all | Generate every format |
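
To read the result back, it helps to know where Whisper will write it. The sketch below assumes the naming used by recent openai-whisper releases, where the output file takes the audio file's base name with its last extension replaced by the format. Older releases may name files differently, and the function name `transcript_path` is hypothetical.

```bash
# Predict the transcript path for an audio file, output directory, and format.
# Assumes Whisper writes <output_dir>/<audio name without last extension>.<format>.
transcript_path() {
  tp_base="$(basename "$1")"
  printf '%s/%s.%s\n' "$2" "${tp_base%.*}" "${3:-txt}"
}
```

For instance, `cat "$(transcript_path "/path/to/memo.m4a" "<scratchpad-dir>")"` would print the plain-text transcript after a run with `--output_format txt`.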

Workflow

  1. Verify file exists: ls -la "<path>"
  2. Check Whisper installed — install if missing
  3. Run transcription with medium model, output to scratchpad
  4. Read the .txt output and present to user
  5. Offer follow-ups: summarize, translate, extract action items, analyze
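
Steps 1 through 4 can be sketched as a single shell function. This is an illustration under stated assumptions rather than the skill's own implementation: the name `transcribe_file` is hypothetical, a directory from `mktemp -d` stands in for the session scratchpad, and the transcript file name follows the convention of recent openai-whisper releases (audio base name plus `.txt`).

```bash
# Transcribe one audio file and print the transcript on stdout.
# $1: audio file path; $2: Whisper model name (defaults to medium).
transcribe_file() {
  tf_audio="$1"
  tf_model="${2:-medium}"

  # 1. Verify the file exists.
  if [ ! -f "$tf_audio" ]; then
    echo "audio file not found: $tf_audio" >&2
    return 1
  fi

  # 2. Check that Whisper is installed.
  if ! command -v whisper >/dev/null 2>&1; then
    echo "whisper not found; install it first (e.g. pipx install openai-whisper)" >&2
    return 2
  fi

  # 3. Transcribe into a throwaway directory (stand-in for the scratchpad).
  #    Whisper's progress output is sent to stderr so stdout stays clean.
  tf_out="$(mktemp -d)" || return 3
  whisper "$tf_audio" --model "$tf_model" \
    --output_dir "$tf_out" --output_format txt >&2 || return 4

  # 4. Read the .txt output.
  tf_base="$(basename "$tf_audio")"
  cat "$tf_out/${tf_base%.*}.txt"
}
```

Calling `transcribe_file "<path>"` prints only the transcript, which can then be summarized or translated as in step 5. The distinct return codes make it easy to tell a missing file from a missing install.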

Common Issues

| Problem | Fix |
|---|---|
| pip install blocked by PEP 668 | Use pipx install openai-whisper instead |
| Whisper not found after install | Check that ~/.local/bin is in PATH, or use the full path |
| Poor transcription quality | Upgrade to large-v3, or force the correct --language |
| Unsupported format error | Convert first: ffmpeg -i input.xyz output.wav |
| Very long audio (>30 min) | Still works, just takes time. Warn user about duration. |
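
The "not found after install" case usually comes down to a PATH entry, which can be tested directly. The helper below is a minimal sketch. The name `path_contains` is hypothetical, and the optional second argument exists only so the check can be run against a PATH-style string other than the live one.

```bash
# Succeed if a directory appears as an entry in a PATH-style string.
# $1: directory to look for; $2: colon-separated list (defaults to $PATH).
path_contains() {
  pc_dir="${1%/}"          # ignore a trailing slash on the directory
  pc_path="${2:-$PATH}"
  case ":$pc_path:" in
    *":$pc_dir:"* | *":$pc_dir/:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A quick diagnostic would be `path_contains "$HOME/.local/bin" || echo "add ~/.local/bin to PATH"`. With pipx installs, running `pipx ensurepath` and opening a new shell is the usual way to add the entry.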

Multiple Files

Whisper accepts multiple files:

bash
whisper file1.opus file2.mp3 file3.m4a --model medium --output_dir "<scratchpad>" --output_format txt

Or loop for separate handling:

bash
for f in /path/to/audios/*.opus; do
  whisper "$f" --model medium --output_dir "<scratchpad>" --output_format txt
done
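
When files are handled one at a time, a failure on one file should not stop the rest. The sketch below extends the loop with that behavior. The function name `transcribe_each` is hypothetical; it skips arguments that are not regular files (such as a glob pattern that matched nothing) and returns the number of files Whisper failed on.

```bash
# Run Whisper on each file separately, continuing past failures.
# $1: output directory; remaining arguments: audio files.
transcribe_each() {
  te_out="$1"; shift
  te_failed=0
  for te_file in "$@"; do
    # Skip unmatched glob patterns and other non-files.
    [ -f "$te_file" ] || continue
    if ! whisper "$te_file" --model medium \
         --output_dir "$te_out" --output_format txt >&2; then
      echo "failed: $te_file" >&2
      te_failed=$((te_failed + 1))
    fi
  done
  return "$te_failed"
}
```

A call such as `transcribe_each "<scratchpad>" /path/to/audios/*.opus /path/to/audios/*.m4a` covers several extensions in one pass, and a non-zero exit status signals that at least one file needs a retry.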