
youtube-transcripts

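Extract captions and transcript text from YouTube videos. Use this skill when the user says "get transcript", "transcribe this video", "what does this YouTube video say", "extract captions", "YouTube transcript", or provides a YouTube URL and wants the text content of the video.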

SKILL.md
--- frontmatter
name: youtube-transcripts
description: >
  Extract transcripts from YouTube videos. Use when user says "get transcript",
  "transcribe this video", "what does this YouTube video say", "extract captions",
  "youtube transcript", or provides a YouTube URL and wants the text content.
allowed-tools: Bash, Read
triggers:
  - get transcript
  - transcribe video
  - youtube transcript
  - extract captions
  - what does this video say
  - get subtitles from
  - youtube url text
metadata:
  short-description: YouTube transcript extraction

YouTube Transcripts Skill

Extract transcripts from YouTube videos using a three-tier fallback:

  1. Direct - youtube-transcript-api (fastest)
  2. Proxy - IPRoyal residential proxy rotation (handles rate limits)
  3. Whisper - yt-dlp audio download → OpenAI Whisper (last resort)
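The tier ordering above can be sketched as a small loop that tries each tier in turn and records why earlier tiers failed. This is only an illustration under stated assumptions: `fetch_with_fallback` and the stub tier functions are hypothetical names, not taken from `youtube_transcript.py`, and the real script may structure its retries differently.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shape: each tier is a (name, fetch_function) pair.
Tier = Tuple[str, Callable[[str], List[Dict]]]


def fetch_with_fallback(video_id: str, tiers: List[Tier], retries: int = 3) -> Dict:
    """Try each tier in order; return the first success plus all collected errors."""
    errors: List[str] = []
    for name, fetch in tiers:
        for attempt in range(1, retries + 1):
            try:
                segments = fetch(video_id)
                return {"method": name, "transcript": segments, "errors": errors}
            except Exception as exc:  # a real tool would catch narrower error types
                errors.append(f"{name} attempt {attempt}: {exc}")
    # Every tier exhausted its retries.
    return {"method": None, "transcript": [], "errors": errors}


# Stub tiers standing in for the real direct/proxy implementations.
def failing_direct(video_id: str) -> List[Dict]:
    raise RuntimeError("429 Too Many Requests")


def stub_proxy(video_id: str) -> List[Dict]:
    return [{"text": "Hello world", "start": 0.0, "duration": 2.5}]


result = fetch_with_fallback(
    "dQw4w9WgXcQ",
    [("direct", failing_direct), ("proxy", stub_proxy)],
    retries=2,
)
print(result["method"], len(result["errors"]))
```

Keeping the errors from failed tiers alongside the successful result mirrors the `errors` array in the output format, so a caller can see that a cheaper tier was attempted first.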

Quick Start

bash
# Get transcript (auto-fallback through all tiers)
python .agents/skills/youtube-transcripts/youtube_transcript.py get -i dQw4w9WgXcQ

# Skip proxy tier
python .agents/skills/youtube-transcripts/youtube_transcript.py get -i VIDEO_ID --no-proxy

# Skip whisper tier
python .agents/skills/youtube-transcripts/youtube_transcript.py get -i VIDEO_ID --no-whisper

# List available transcript languages
python .agents/skills/youtube-transcripts/youtube_transcript.py list-languages -i VIDEO_ID

# Check proxy configuration
python .agents/skills/youtube-transcripts/youtube_transcript.py check-proxy

Commands

Get Transcript

bash
python .agents/skills/youtube-transcripts/youtube_transcript.py get \
  --url "https://youtube.com/watch?v=dQw4w9WgXcQ" \
  --lang en

Options:

| Option | Short | Description |
| --- | --- | --- |
| --url | -u | YouTube video URL |
| --video-id | -i | Video ID directly |
| --lang | -l | Language code (default: en) |
| --no-proxy | | Skip proxy tier |
| --no-whisper | | Skip Whisper fallback tier |
| --retries | -r | Max retries per tier (default: 3) |

Output: JSON with transcript segments (text, start time, duration)
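Since the command accepts either `--url` or `--video-id`, a URL has to be reduced to a video ID at some point. The sketch below shows one way that mapping might work using only the standard library. `extract_video_id` is a hypothetical helper, and the set of URL shapes it handles is an assumption, not a description of what the script actually parses.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None if unrecognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None

    if host in {"youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            # Standard watch URLs carry the ID in the "v" query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None

    return None


print(extract_video_id("https://youtube.com/watch?v=dQw4w9WgXcQ"))
```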

List Available Languages

bash
python .agents/skills/youtube-transcripts/youtube_transcript.py list-languages -i VIDEO_ID

Output: JSON with available transcript languages

Check Proxy

bash
python .agents/skills/youtube-transcripts/youtube_transcript.py check-proxy
python .agents/skills/youtube-transcripts/youtube_transcript.py check-proxy --test-rotation

Tests IPRoyal proxy connectivity and IP rotation.

Output Format

json
{
  "meta": {
    "video_id": "dQw4w9WgXcQ",
    "language": "en",
    "took_ms": 3029,
    "method": "direct"
  },
  "transcript": [
    {"text": "Hello world", "start": 0.0, "duration": 2.5},
    {"text": "This is a test", "start": 2.5, "duration": 3.0}
  ],
  "full_text": "Hello world This is a test...",
  "errors": []
}

Method values: direct, proxy, whisper, or null (if all tiers failed)
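A consumer of this JSON might check `meta.method` before trusting the transcript and then turn the segments into timestamped lines. The following is a minimal sketch that relies only on the fields shown in the sample output. `summarize` and `to_timestamped_lines` are illustrative helper names, not part of the skill.

```python
import json
from typing import Dict, List

SAMPLE = """
{
  "meta": {"video_id": "dQw4w9WgXcQ", "language": "en", "took_ms": 3029, "method": "direct"},
  "transcript": [
    {"text": "Hello world", "start": 0.0, "duration": 2.5},
    {"text": "This is a test", "start": 2.5, "duration": 3.0}
  ],
  "full_text": "Hello world This is a test...",
  "errors": []
}
"""


def summarize(data: Dict) -> str:
    """Report which tier produced the transcript, or why every tier failed."""
    method = data["meta"]["method"]
    if method is None:
        return "failed: " + "; ".join(data["errors"])
    segments = data["transcript"]
    end = max((s["start"] + s["duration"] for s in segments), default=0.0)
    return f"{method}: {len(segments)} segments, {end:.1f}s covered"


def to_timestamped_lines(segments: List[Dict]) -> List[str]:
    """Format each segment as [MM:SS] text, using the segment start time."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['text']}")
    return lines


data = json.loads(SAMPLE)
print(summarize(data))
print("\n".join(to_timestamped_lines(data["transcript"])))
```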

Three-Tier Fallback

Tier 1: Direct

  • Uses youtube-transcript-api without proxy
  • Fastest, no additional cost
  • May fail with rate limits on repeated requests

Tier 2: IPRoyal Proxy

  • Uses IPRoyal residential proxy (auto-rotates IPs)
  • Handles rate limiting (429) and blocking (403)
  • Requires proxy credentials

Environment variables:

| Variable | Description |
| --- | --- |
| IPROYAL_HOST | Proxy host (e.g., geo.iproyal.com) |
| IPROYAL_PORT | Proxy port (e.g., 12321) |
| IPROYAL_USER | Proxy username |
| IPROYAL_PASSWORD | Proxy password |
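As an illustration of how these four variables could be combined, the sketch below builds a proxy mapping in the shape that the `requests` library accepts through its `proxies=` argument. `build_proxies` is a hypothetical helper, and the use of an `http://` proxy scheme is an assumption. The script itself may assemble its proxy settings differently.

```python
import os
from typing import Dict, Mapping, Optional
from urllib.parse import quote

REQUIRED_VARS = ("IPROYAL_HOST", "IPROYAL_PORT", "IPROYAL_USER", "IPROYAL_PASSWORD")


def build_proxies(env: Mapping[str, str] = os.environ) -> Optional[Dict[str, str]]:
    """Return a requests-style proxies dict, or None if any variable is missing."""
    if any(not env.get(name) for name in REQUIRED_VARS):
        return None
    # Percent-encode credentials so characters like "@" or ":" cannot break the URL.
    user = quote(env["IPROYAL_USER"], safe="")
    password = quote(env["IPROYAL_PASSWORD"], safe="")
    proxy_url = f"http://{user}:{password}@{env['IPROYAL_HOST']}:{env['IPROYAL_PORT']}"
    return {"http": proxy_url, "https": proxy_url}


example_env = {
    "IPROYAL_HOST": "geo.iproyal.com",
    "IPROYAL_PORT": "12321",
    "IPROYAL_USER": "user",
    "IPROYAL_PASSWORD": "p@ss",
}
proxies = build_proxies(example_env)
print(proxies["https"])
```

Returning `None` when a variable is missing gives the caller a simple way to skip the proxy tier instead of failing with a malformed URL.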

Tier 3: Whisper Fallback

  • Downloads audio with yt-dlp
  • Transcribes with OpenAI Whisper API
  • Works for videos with disabled captions
  • Costs ~$0.006/minute of audio

Environment variables:

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key for Whisper |
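Because Tier 3 is billed by audio length, it can be useful to estimate the cost before falling back to it. The arithmetic below uses the approximate rate quoted above. `estimate_whisper_cost` is an illustrative helper, and the rate constant should be treated as an assumption that may not match current pricing.

```python
WHISPER_USD_PER_MINUTE = 0.006  # approximate rate quoted in this document


def estimate_whisper_cost(duration_seconds: float) -> float:
    """Estimate the transcription cost in USD for audio of the given length."""
    minutes = duration_seconds / 60
    return round(minutes * WHISPER_USD_PER_MINUTE, 4)


# A 10-minute video and a 90-minute video.
print(estimate_whisper_cost(600))
print(estimate_whisper_cost(5400))
```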

Dependencies

bash
pip install youtube-transcript-api requests yt-dlp openai

  • youtube-transcript-api - Tiers 1 and 2
  • requests - Proxy support
  • yt-dlp - Tier 3 audio download
  • openai - Tier 3 transcription

Integration with Memory

bash
# Get transcript
python .agents/skills/youtube-transcripts/youtube_transcript.py get -i VIDEO_ID > transcript.json

# Ingest into memory
memory-agent workspace-ingest --source transcript.json --scope youtube

Limitations

  • Tiers 1 and 2 require existing captions (auto-generated or manual)
  • Tier 3 (Whisper) works for any video but costs money
  • Private/unlisted videos may not be accessible
  • Audio from very long videos may exceed the Whisper file size limit (25MB per upload)
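To get a feel for when the 25MB limit becomes a problem, the sketch below estimates how much audio fits in one upload at a given bitrate and how many pieces a larger file would need. It assumes the limit means 25,000,000 bytes (the conservative reading) and a constant bitrate. Both helper names are hypothetical, and nothing here reflects how the script actually handles oversized audio.

```python
import math

WHISPER_MAX_BYTES = 25_000_000  # assumption: conservative reading of "25MB"


def max_duration_seconds(bitrate_kbps: int) -> float:
    """Longest constant-bitrate audio (in seconds) that fits in one upload."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return WHISPER_MAX_BYTES / bytes_per_second


def chunks_needed(file_size_bytes: int) -> int:
    """Number of pieces required to keep each piece under the size limit."""
    return max(1, math.ceil(file_size_bytes / WHISPER_MAX_BYTES))


# At 64 kbps, roughly 52 minutes of audio fit in a single upload.
print(max_duration_seconds(64) / 60)
print(chunks_needed(60_000_000))
```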