Audio Transcription Skill

This skill provides high-quality speech-to-text transcription using multiple AI providers. It automatically handles large files through compression and chunking.

Supported Providers

ElevenLabs Scribe

•Accuracy: 96.7% for English (industry-leading)
•Max file size: 3GB / 10 hours
•Features: Speaker diarization (up to 32 speakers), word-level timestamps
•Cost: $0.40/hour
•Best for: Multi-speaker recordings, highest accuracy needs

OpenAI Whisper

•Accuracy: Excellent
•Max file size: 25MB (automatic chunking for larger files)
•Features: Segment timestamps, language detection
•Cost: $0.006/min ($0.003/min with GPT-4o Mini)
•Best for: Standard transcription, good balance of cost and quality

Google Gemini

•Accuracy: Very good
•Max file size: 2GB
•Features: Multimodal analysis, summarization capabilities
•Cost: ~$0.09-0.23/hour (generous free tier available)
•Best for: Cost-sensitive projects, multimodal needs

Usage

Basic Transcription

bash

bun run src/index.ts transcribe \
  --provider openai \
  --input ./recording.mp3

With Speaker Diarization

bash

bun run src/index.ts transcribe \
  --provider elevenlabs \
  --input ./meeting.mp3 \
  --diarize \
  --timestamps \
  --format srt

Export to Subtitles

bash

bun run src/index.ts transcribe \
  --provider gemini \
  --input ./video.mp4 \
  --format vtt \
  --output ./captions.vtt

View Provider Info

bash

bun run src/index.ts providers

Output Formats

Format	Extension	Description
text	.txt	Plain text transcript
srt	.srt	SubRip subtitle format
vtt	.vtt	WebVTT subtitle format
json	.json	Full structured data with metadata

Large File Handling

The skill automatically handles files larger than provider limits:

•Compression: For OpenAI, files are first compressed using Opus codec
•Chunking: Files are split into 10-minute segments with overlap
•Merging: Results are intelligently merged to avoid duplicates

Configuration

bash

# ElevenLabs
export ELEVENLABS_API_KEY=your_key

# OpenAI
export OPENAI_API_KEY=your_key

# Google Gemini
export GOOGLE_API_KEY=your_key

Dependencies

For chunking support (OpenAI with large files):

•ffmpeg - Audio processing
•ffprobe - Duration detection

Install on macOS:

bash

brew install ffmpeg