YouTube Transcripts Skill
Extract transcripts from YouTube videos with three-tier fallback:
- •Direct - youtube-transcript-api (fastest)
- •Proxy - IPRoyal residential proxy rotation (handles rate limits)
- •Whisper - yt-dlp audio download → OpenAI Whisper (last resort)
Quick Start
bash
# Get transcript (auto-fallback through all tiers) python .agents/skills/youtube-transcripts/youtube_transcript.py get -i dQw4w9WgXcQ # Skip proxy tier python .agents/skills/youtube-transcripts/youtube_transcript.py get -i VIDEO_ID --no-proxy # Skip whisper tier python .agents/skills/youtube-transcripts/youtube_transcript.py get -i VIDEO_ID --no-whisper # List available transcript languages python .agents/skills/youtube-transcripts/youtube_transcript.py list-languages -i VIDEO_ID # Check proxy configuration python .agents/skills/youtube-transcripts/youtube_transcript.py check-proxy
Commands
Get Transcript
bash
python .agents/skills/youtube-transcripts/youtube_transcript.py get \ --url "https://youtube.com/watch?v=dQw4w9WgXcQ" \ --lang en
Options:
| Option | Short | Description |
|---|---|---|
--url | -u | YouTube video URL |
--video-id | -i | Video ID directly |
--lang | -l | Language code (default: en) |
--no-proxy | Skip proxy tier | |
--no-whisper | Skip Whisper fallback tier | |
--retries | -r | Max retries per tier (default: 3) |
Output: JSON with transcript segments (text, start time, duration)
List Available Languages
bash
python .agents/skills/youtube-transcripts/youtube_transcript.py list-languages -i VIDEO_ID
Output: JSON with available transcript languages
Check Proxy
bash
python .agents/skills/youtube-transcripts/youtube_transcript.py check-proxy python .agents/skills/youtube-transcripts/youtube_transcript.py check-proxy --test-rotation
Tests IPRoyal proxy connectivity and IP rotation.
Output Format
json
{
"meta": {
"video_id": "dQw4w9WgXcQ",
"language": "en",
"took_ms": 3029,
"method": "direct"
},
"transcript": [
{"text": "Hello world", "start": 0.0, "duration": 2.5},
{"text": "This is a test", "start": 2.5, "duration": 3.0}
],
"full_text": "Hello world This is a test...",
"errors": []
}
Method values: direct, proxy, whisper, or null (if all failed)
Three-Tier Fallback
Tier 1: Direct
- •Uses youtube-transcript-api without proxy
- •Fastest, no additional cost
- •May fail with rate limits on repeated requests
Tier 2: IPRoyal Proxy
- •Uses IPRoyal residential proxy (auto-rotates IPs)
- •Handles rate limiting (429) and blocking (403)
- •Requires proxy credentials
Environment variables:
| Variable | Description |
|---|---|
IPROYAL_HOST | Proxy host (e.g., geo.iproyal.com) |
IPROYAL_PORT | Proxy port (e.g., 12321) |
IPROYAL_USER | Proxy username |
IPROYAL_PASSWORD | Proxy password |
Tier 3: Whisper Fallback
- •Downloads audio with yt-dlp
- •Transcribes with OpenAI Whisper API
- •Works for videos with disabled captions
- •Costs ~$0.006/minute of audio
Environment variables:
| Variable | Description |
|---|---|
OPENAI_API_KEY | OpenAI API key for Whisper |
Dependencies
bash
pip install youtube-transcript-api requests yt-dlp openai
- •
youtube-transcript-api- Tier 1 & 2 - •
requests- Proxy support - •
yt-dlp- Tier 3 audio download - •
openai- Tier 3 transcription
Integration with Memory
bash
# Get transcript python .agents/skills/youtube-transcripts/youtube_transcript.py get -i VIDEO_ID > transcript.json # Ingest into memory memory-agent workspace-ingest --source transcript.json --scope youtube
Limitations
- •Tier 1-2 require captions (auto-generated or manual)
- •Tier 3 (Whisper) works for any video but costs money
- •Private/unlisted videos may not be accessible
- •Very long videos may exceed Whisper file size limits (25MB)