Video Processor
Instructions
This skill provides video processing utilities: video download from YouTube and other sites with yt-dlp, audio extraction and format conversion with FFmpeg, and audio transcription with OpenAI's Whisper model.
Prerequisites
Required tools (must be installed in your environment):
- •yt-dlp: Video downloader for YouTube and thousands of other sites
bash
# Install via pip
pip install -U yt-dlp
# Verify installation
yt-dlp --version
- •FFmpeg: Multimedia framework for video/audio processing
bash
# macOS
brew install ffmpeg
# Ubuntu/Debian
apt-get install ffmpeg
# Verify installation
ffmpeg -version
- •OpenAI Whisper: Speech-to-text transcription model
bash
# Install via pip
pip install -U openai-whisper
# Verify installation
whisper --help
Python packages (declared in the script via PEP 723 inline metadata, so uv run resolves them automatically):
- •click (CLI framework)
- •ffmpeg-python (Python wrapper for FFmpeg)
- •yt-dlp (video downloader)
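For orientation, a PEP 723 metadata block declaring these three packages could look like the sketch below. The `requires-python` bound is an assumption made for illustration; the actual header in scripts/video_processor.py may differ.

```python
# /// script
# requires-python = ">=3.10"  # assumed bound, not taken from the script
# dependencies = [
#     "click",
#     "ffmpeg-python",
#     "yt-dlp",
# ]
# ///
```

When a script starts with a block like this, `uv run` creates an isolated environment containing the listed packages before executing it.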
Workflow
Use the scripts/video_processor.py script for all video processing tasks. The script provides a simple CLI with the following commands:
0. Download Video from YouTube or Other Platforms (NEW!)
Download videos from YouTube and thousands of other supported websites:
# Download video
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." output.mp4
# Download audio only (as MP3)
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." --audio-only
# Show video info without downloading
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." --info
# Download with subtitles
uv run .claude/skills/video-processor/scripts/video_processor.py download "https://youtube.com/watch?v=..." output.mp4 --subtitle
Options:
- •--audio-only: Download audio only (extracts to MP3)
- •--subtitle: Download and embed subtitles (supports en, zh-Hans, zh-Hant)
- •--info: Show video information without downloading
- •--format: Specify video format preference (default: best quality)
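As a rough illustration of how these flags might map onto yt-dlp's Python API, the sketch below builds an options dictionary for `yt_dlp.YoutubeDL`. The function names and the exact mapping are assumptions, not the script's actual code; the option keys (`outtmpl`, `format`, `writesubtitles`, `subtitleslangs`, `postprocessors`) are standard yt-dlp options.

```python
def build_ydl_options(output, audio_only=False, subtitle=False, video_format=None):
    """Translate the CLI flags into a yt-dlp options dict (hypothetical mapping)."""
    opts = {"outtmpl": output}
    if video_format:
        opts["format"] = video_format  # otherwise yt-dlp picks its default (best quality)
    postprocessors = []
    if audio_only:
        # Fetch the best audio stream and let FFmpeg convert it to MP3.
        opts["format"] = "bestaudio/best"
        postprocessors.append({"key": "FFmpegExtractAudio", "preferredcodec": "mp3"})
    if subtitle:
        # Download the listed subtitle languages and embed them in the video file.
        opts["writesubtitles"] = True
        opts["subtitleslangs"] = ["en", "zh-Hans", "zh-Hant"]
        postprocessors.append({"key": "FFmpegEmbedSubtitle"})
    if postprocessors:
        opts["postprocessors"] = postprocessors
    return opts


def download(url, output, **flags):
    """Download url with yt-dlp; needs the yt-dlp package and network access."""
    import yt_dlp  # imported lazily so the option builder works without it

    with yt_dlp.YoutubeDL(build_ydl_options(output, **flags)) as ydl:
        ydl.download([url])
```

An `--info` mode could be built on `ydl.extract_info(url, download=False)`, which returns the metadata without fetching the media.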
1. Extract Audio from Video
Extract the audio track from a video file:
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio input.mp4 output.wav
Options:
- •--format: Output audio format (default: wav). Supports: wav, mp3, aac, flac
- •Output is suitable for transcription or standalone audio use
2. Convert Video to MP4
Convert any video file to MP4 format:
uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 input.avi output.mp4
Options:
- •--codec: Video codec (default: libx264). Common options: libx264, libx265, h264
- •--preset: Encoding speed/quality preset (default: medium). Options: ultrafast, fast, medium, slow, veryslow
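To make the two options concrete, here is a minimal sketch of the FFmpeg command line they could translate to. It builds a plain argument list rather than using the ffmpeg-python wrapper the script depends on; the function name, the `-y` overwrite flag, and the AAC audio codec are assumptions.

```python
PRESETS = ("ultrafast", "fast", "medium", "slow", "veryslow")


def mp4_command(src, dst, codec="libx264", preset="medium"):
    """Build an FFmpeg argument list for an MP4 conversion (hypothetical helper)."""
    if preset not in PRESETS:
        raise ValueError(f"Unknown preset {preset!r}; choose one of {', '.join(PRESETS)}")
    return [
        "ffmpeg", "-y",      # -y overwrites an existing output file (assumed behavior)
        "-i", src,
        "-c:v", codec,       # video encoder, e.g. libx264
        "-preset", preset,   # slower presets give smaller files at the same quality
        "-c:a", "aac",       # assumed audio codec for the MP4 container
        dst,
    ]
```

The list can be passed to `subprocess.run(mp4_command("input.avi", "output.mp4"), check=True)` on a machine where FFmpeg is installed.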
3. Convert Video to WebM
Convert any video file to WebM format (web-optimized):
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm input.mp4 output.webm
Options:
- •--codec: Video codec (default: libvpx-vp9). Options: libvpx, libvpx-vp9
- •WebM is optimized for web playback and streaming
4. Transcribe Audio with Whisper
Transcribe audio or video files to text using OpenAI's Whisper model:
# Transcribe video file (audio will be extracted automatically)
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe input.mp4 transcript.txt
# Transcribe audio file directly
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe audio.wav transcript.txt
Options:
- •--model: Whisper model size (default: base). Options:
  - •tiny: Fastest, lowest accuracy (~1GB RAM)
  - •base: Fast, good accuracy (~1GB RAM) [DEFAULT]
  - •small: Balanced (~2GB RAM)
  - •medium: High accuracy (~5GB RAM)
  - •large: Best accuracy, slowest (~10GB RAM)
- •--language: Language code (default: auto-detect). Examples: en, es, fr, de, zh
- •--format: Output format (default: txt). Options: txt, srt, vtt, json
Transcription workflow:
- •If input is video, FFmpeg extracts audio to temporary WAV file
- •Whisper processes the audio file
- •Transcription is saved in requested format
- •Temporary files are cleaned up automatically
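The extract, transcribe, and clean-up steps above can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: `extract_audio` and `run_whisper` are hypothetical callables standing in for the FFmpeg and Whisper calls, and the list of video suffixes is assumed. Writing the result in the requested format is left to the caller.

```python
import tempfile
from pathlib import Path

VIDEO_SUFFIXES = {".mp4", ".avi", ".mkv", ".mov", ".webm"}  # assumed list


def transcribe_media(src, extract_audio, run_whisper):
    """Transcribe src, extracting audio to a temporary WAV first if it is a video."""
    src = Path(src)
    if src.suffix.lower() not in VIDEO_SUFFIXES:
        return run_whisper(src)  # audio input goes to Whisper directly
    # The temporary directory, and the WAV inside it, is removed when the
    # with-block exits, including when extraction or transcription raises.
    with tempfile.TemporaryDirectory() as tmp:
        wav = Path(tmp) / f"{src.stem}.wav"
        extract_audio(src, wav)
        return run_whisper(wav)
```

Using a context manager for the temporary directory keeps the clean-up guarantee in one place instead of repeating it in every error path.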
5. Combined Workflow Example
Process a video end-to-end:
# 1. Extract audio for analysis
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav
# 2. Transcribe to SRT subtitles
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.mp4 lecture.srt --format srt --model small
# 3. Convert to web format
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm lecture.mp4 lecture.webm
Key Technical Details
FFmpeg and Whisper Integration:
- •In this workflow, FFmpeg does not transcribe audio itself; it prepares audio for Whisper
- •The workflow is: Extract audio (FFmpeg) → Transcribe (Whisper) → Optional: Re-integrate with video
- •FFmpeg can also pipe decoded audio to a Python process that passes it to Whisper as an in-memory array, skipping the temporary file (advanced use case)
Audio Format for Transcription:
- •Whisper loads audio through FFmpeg, so any format FFmpeg can decode works; WAV and MP3 are the most common choices
- •Sample rate: Whisper operates on 16 kHz mono audio (script handles conversion automatically)
- •The script extracts audio with optimal settings for Whisper
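A minimal sketch of what Whisper-friendly extraction settings could look like as an FFmpeg argument list. The function name and the specific flags are assumptions about how the script might do it; the flags themselves are standard FFmpeg options.

```python
def whisper_wav_command(src, dst, sample_rate=16000):
    """Build an FFmpeg argument list for a 16 kHz mono WAV (hypothetical helper)."""
    return [
        "ffmpeg", "-y",
        "-i", str(src),
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample to 16 kHz
        "-c:a", "pcm_s16le",       # uncompressed 16-bit PCM
        str(dst),
    ]
```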
Output Formats:
- •txt: Plain text transcript
- •srt: SubRip subtitle format (includes timestamps)
- •vtt: WebVTT subtitle format (web standard)
- •json: Detailed JSON with word-level timestamps
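To show how the timestamped formats relate to Whisper's output, the sketch below renders a list of segments as SRT. It assumes segments shaped like the entries in Whisper's result (`start`, `end`, `text`); the helper names are hypothetical and the script may use Whisper's own writers instead.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm, the form SRT expects."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


print(to_srt([{"start": 0.0, "end": 2.5, "text": " Hello."}]))
# 1
# 00:00:00,000 --> 00:00:02,500
# Hello.
```

WebVTT cues look almost the same, except that the file starts with a `WEBVTT` line and timestamps use a period instead of a comma before the milliseconds.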
Error Handling
The script includes comprehensive error handling:
- •Validates input files exist
- •Checks FFmpeg and Whisper are installed
- •Provides clear error messages for missing dependencies
- •Handles temporary file cleanup on errors
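Checks of this kind can be written with the standard library alone. The sketch below is illustrative and not taken from the script: the helper names and the error message wording are assumptions.

```python
import shutil
from pathlib import Path

REQUIRED_TOOLS = ("ffmpeg", "whisper", "yt-dlp")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def validate_input(path):
    """Return path as a Path, or raise a clear error if the file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    return path
```

A command could call `missing_tools()` first and exit with an install hint for each missing name before touching any files.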
Performance Tips
- •Use tiny or base models for quick drafts
- •Use small or medium for production transcriptions
- •Use large only when maximum accuracy is required
- •For long videos, consider extracting audio first, then transcribing in segments
- •WebM conversion with VP9 takes longer but produces smaller files
Examples
Example 1: Quick Video to MP4 Conversion
User request:
I have an AVI file from my old camera. Can you convert it to MP4?
You would:
- •Use the to-mp4 command with default settings:
bash
uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 old_video.avi output.mp4
- •Confirm the conversion completed successfully
- •Inform the user about the output file location
Example 2: Extract Audio and Transcribe
User request:
I recorded a lecture video and need a transcript. Can you extract the audio and transcribe it?
You would:
- •First extract the audio:
bash
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav
- •Then transcribe using the base model (good balance of speed/accuracy):
bash
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.wav transcript.txt --model base
- •Share the transcript.txt file with the user
Example 3: Create Web-Optimized Video with Subtitles
User request:
I need to put this video on my website with subtitles. Can you help?
You would:
- •Convert to WebM for web optimization:
bash
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm presentation.mp4 presentation.webm
- •Generate SRT subtitle file:
bash
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe presentation.mp4 subtitles.srt --format srt --model small
- •Inform user they now have:
- •presentation.webm (web-optimized video)
- •subtitles.srt (subtitle file for embedding)
Example 4: High-Quality Transcription with Language Specification
User request:
I have a Spanish interview video that needs an accurate transcript for publication.
You would:
- •Use a larger model with language specified for best accuracy:
bash
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.txt --model medium --language es
- •Optionally create SRT for review:
bash
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.srt --format srt --model medium --language es
- •Review the transcript with the user and make any necessary corrections
Example 5: Batch Processing Multiple Videos
User request:
I have a folder of training videos that all need to be converted to WebM and transcribed.
You would:
- •List all video files in the directory:
bash
ls training_videos/*.mp4
- •For each video file, run the conversion and transcription:
bash
# For each video: video1.mp4, video2.mp4, etc.
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm training_videos/video1.mp4 output/video1.webm
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe training_videos/video1.mp4 output/video1.txt --model base
# Repeat for each file
- •Confirm all conversions and transcriptions completed
- •Provide summary of output files
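Rather than repeating the two commands by hand, the per-file loop in this example can be generated. The sketch below only builds the command lists; the helper name is hypothetical, and it assumes all inputs are .mp4 files in a single directory.

```python
from pathlib import Path

SCRIPT = ".claude/skills/video-processor/scripts/video_processor.py"


def batch_commands(src_dir, out_dir, model="base"):
    """Return the to-webm and transcribe commands for every .mp4 in src_dir."""
    commands = []
    for video in sorted(Path(src_dir).glob("*.mp4")):
        out = Path(out_dir) / video.stem  # e.g. output/video1
        commands.append(["uv", "run", SCRIPT, "to-webm", str(video), f"{out}.webm"])
        commands.append(
            ["uv", "run", SCRIPT, "transcribe", str(video), f"{out}.txt", "--model", model]
        )
    return commands
```

Each entry can then be executed with `subprocess.run(command, check=True)`, which stops the batch at the first failing file so the failure can be reported.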
Summary
The video-processor skill provides a unified interface for common video processing tasks:
- •Video download: Download video or audio from YouTube and other supported sites
- •Audio extraction: Extract audio tracks in various formats
- •Format conversion: Convert to MP4 (universal) or WebM (web-optimized)
- •Transcription: Speech-to-text with multiple output formats
- •Flexible: CLI arguments for model selection, language, and output formats
All operations are handled through a single, well-documented script with sensible defaults and comprehensive error handling.