Whisper Transcribe Skill
Transcribe audio and video files to text using OpenAI's Whisper with contextual grounding from markdown files.
Purpose
Intelligent audio/video transcription that:
- Converts media files to accurate text transcripts
- Uses markdown context files to correct technical terms, names, and jargon
- Handles various audio/video formats (mp3, wav, m4a, mp4, webm, etc.)
When to Use
- User asks to transcribe an audio or video file
- User wants to convert a recording to text
- User mentions "whisper" in context of transcription
- User needs meeting notes or interview transcripts
- User has media files with domain-specific terminology
Installation
macOS (Recommended for MacBook Pro)
# Install via Homebrew (recommended)
brew install ffmpeg openai-whisper
# Verify installation (prints usage if the CLI is on PATH)
whisper --help
Linux/pip Installation
# Install ffmpeg first
sudo apt install ffmpeg        # Debian/Ubuntu
# or: sudo dnf install ffmpeg  # Fedora
# Install Whisper
pip install openai-whisper
Verify Installation
whisper --help
ffmpeg -version
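The same check can be done from Python before starting a transcription. This is a minimal sketch, assuming both tools are expected on PATH; the helper name `missing_tools` is hypothetical and not part of Whisper or the bundled script.

```python
import shutil


def missing_tools(tools=("whisper", "ffmpeg")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]
```

An empty list means both commands were found; any returned name points at the matching entry in the Troubleshooting section.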
Transcription Workflow
Step 1: Identify Media File and Context
- Locate the audio/video file to transcribe
- Check for markdown files in the same directory (context files)
- If no context files exist, optionally create one using assets/context-template.md
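Locating the context files described in Step 1 might look like the sketch below. It assumes every `.md` file next to the media file counts as context; the function name `find_context_files` is hypothetical, and this is not the code of the bundled script.

```python
from pathlib import Path


def find_context_files(media_path):
    """Return the .md files in the media file's directory, sorted by path."""
    media_dir = Path(media_path).resolve().parent
    return sorted(p for p in media_dir.glob("*.md") if p.is_file())
```

An empty result is the cue to create a context file from the template before transcribing.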
Step 2: Run Whisper Transcription
Basic transcription:
whisper "/path/to/audio.mp3" --output_dir "/path/to/output"
With model selection (trade-off: speed vs accuracy):
# Fast (less accurate)
whisper "audio.mp3" --model tiny
# Balanced (recommended)
whisper "audio.mp3" --model base
# High quality
whisper "audio.mp3" --model small
# Best quality (slower, requires more RAM)
whisper "audio.mp3" --model medium
whisper "audio.mp3" --model large
With language specification:
whisper "audio.mp3" --language en
Output format options:
whisper "audio.mp3" --output_format txt   # Plain text
whisper "audio.mp3" --output_format srt   # Subtitles
whisper "audio.mp3" --output_format vtt   # Web subtitles
whisper "audio.mp3" --output_format json  # Detailed JSON
whisper "audio.mp3" --output_format all   # All formats
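When the options above are combined from a script, it can help to assemble the command as a list first. The sketch below only builds the argument list and runs nothing; it uses the flags shown in this step, while the function name `build_whisper_command` and its defaults are assumptions made for illustration.

```python
def build_whisper_command(audio_path, model="base", language=None,
                          output_format="txt", output_dir=None):
    """Assemble the argument list for one whisper CLI call without running it."""
    cmd = ["whisper", str(audio_path),
           "--model", model,
           "--output_format", output_format]
    # Optional flags are added only when a value was supplied.
    if language:
        cmd += ["--language", language]
    if output_dir:
        cmd += ["--output_dir", str(output_dir)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids quoting problems with paths that contain spaces.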
Step 3: Apply Context Grounding
Use the scripts/transcribe_with_context.py script for automated grounding, or manually apply corrections:
# Automated approach (recommended)
python scripts/transcribe_with_context.py /path/to/audio.mp3
For manual grounding:
- Read the transcript output
- Read all .md files in the media file's directory
- Extract terminology, names, and technical terms from context files
- Search the transcript for likely misrecognitions
- Apply corrections based on context
Common corrections:
- "cooler net ease" -> "Kubernetes"
- "sequel" -> "SQL"
- "post gress" -> "Postgres"
- Names: Match phonetic variations to names in context files
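The phrase-level corrections listed above can be applied with a small lookup table. This is a rough sketch, assuming whole-phrase, case-insensitive matching is good enough; the `CORRECTIONS` table holds only the three examples from the list, and the names `CORRECTIONS` and `apply_corrections` are hypothetical rather than taken from the bundled script.

```python
import re

# Example pairs from the list above; extend with terms from the context files.
CORRECTIONS = {
    "cooler net ease": "Kubernetes",
    "sequel": "SQL",
    "post gress": "Postgres",
}


def apply_corrections(text, corrections=CORRECTIONS):
    """Replace each misrecognized phrase with its corrected form."""
    # Longest phrases first so a short key never pre-empts a longer one.
    for wrong in sorted(corrections, key=len, reverse=True):
        right = corrections[wrong]
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps backslashes in `right` literal.
        text = pattern.sub(lambda match, right=right: right, text)
    return text
```

Some keys, such as "sequel", are also ordinary English words, so a blanket replacement is only safe when the context files show the recording is about databases. Reviewing the changed lines afterwards is a sensible precaution.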
Step 4: Save Corrected Transcript
Save the grounded transcript with a clear filename:
original_filename_transcript.txt
original_filename_transcript.md
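Deriving that filename from the media path could be sketched as follows. It assumes "original_filename" means the media file's name without its extension and that the transcript is saved next to the media file; `transcript_path` is a hypothetical helper name.

```python
from pathlib import Path


def transcript_path(media_path, extension=".txt"):
    """Return <media stem>_transcript<extension> alongside the media file."""
    media = Path(media_path)
    return media.with_name(f"{media.stem}_transcript{extension}")
```

For example, `transcript_path("/path/to/audio.mp3", ".md")` points at `/path/to/audio_transcript.md`.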
Context Files
Context files are markdown files in the same directory as the media file. They provide grounding information to improve transcription accuracy.
What to Include in Context Files
- People: Names of speakers, team members, interviewees
- Technical Terms: Domain-specific vocabulary, product names
- Acronyms: Abbreviations and their expansions
- Organizations: Company names, department names
- Projects: Project codenames, feature names
Context File Example
See assets/context-template.md for a complete template.
    # Meeting Context

    ## Speakers
    - Richard Hightower (host)
    - Jane Smith (engineering lead)

    ## Technical Terms
    - Kubernetes (container orchestration)
    - FastAPI (Python web framework)
    - AlloyDB (Google Cloud database)

    ## Acronyms
    - CI/CD - Continuous Integration/Continuous Deployment
    - PR - Pull Request
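A context file in this shape can be turned into a term list with a few lines of parsing. The sketch below assumes the layout of the example above (second-level `##` headings, one `- ` bullet per term, with an optional parenthesised note or ` - ` expansion after the term); other layouts would need different rules, and `extract_terms` is a hypothetical name.

```python
import re


def extract_terms(markdown_text):
    """Map each '## Heading' to the leading term of every '- ' bullet under it."""
    terms = {}
    section = None
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            section = line[3:].strip()
            terms[section] = []
        elif line.startswith("- ") and section is not None:
            # Drop a trailing "(note)" or " - expansion" so only the term remains.
            term = re.split(r"\s+\(|\s+-\s+", line[2:], maxsplit=1)[0].strip()
            if term:
                terms[section].append(term)
    return terms
```

The returned terms are the candidates to look for when scanning a transcript for misrecognitions.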
Model Selection Guide
Use base for general use, medium for important recordings. See references/whisper-options.md for full model comparison and all available options.
Quick reference, in order of increasing accuracy and decreasing speed: tiny (fastest) < base (balanced) < small (better) < medium (high) < large (best accuracy)
For MacBook Pro with Apple Silicon: small or medium models recommended for best speed/accuracy balance.
Troubleshooting
"whisper: command not found"
# macOS
brew install openai-whisper
# Linux
pip install openai-whisper
export PATH="$HOME/.local/bin:$PATH"
"ffmpeg not found"
# macOS
brew install ffmpeg
# Linux
sudo apt install ffmpeg
Out of memory errors
Use a smaller model:
whisper "audio.mp3" --model tiny
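Stepping down one size at a time, instead of jumping straight to tiny, keeps as much accuracy as memory allows. A minimal sketch, assuming the size order from the Model Selection Guide; `MODEL_ORDER` and `next_smaller_model` are hypothetical names.

```python
# Smallest to largest, following the quick reference in the Model Selection Guide.
MODEL_ORDER = ("tiny", "base", "small", "medium", "large")


def next_smaller_model(model):
    """Return the next model down in size, or None if `model` is already the smallest."""
    index = MODEL_ORDER.index(model)  # raises ValueError for an unknown name
    return MODEL_ORDER[index - 1] if index > 0 else None
```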
Slow transcription
- Use the tiny or base model for faster results
- Ensure the correct architecture is being used (Apple Silicon vs Intel)
Resources
scripts/
The scripts/transcribe_with_context.py script automates the full workflow:
- Finds context files automatically
- Runs Whisper transcription
- Applies context-based corrections
- Saves the final transcript
Usage:
python scripts/transcribe_with_context.py /path/to/audio.mp3
references/
See references/whisper-options.md for complete CLI reference and advanced options.
assets/
The assets/context-template.md provides a template for creating context files to improve transcription accuracy.