YouTube
Overview
This skill handles YouTube links and transcript operations. It automatically cleans YouTube URLs to canonical short form and fetches transcripts using the MCP YouTube transcript tool, saving them directly to the Bookmarks database to avoid filling up chat context.
Core Capabilities
1. URL Cleaning
When YouTube URLs are encountered (pasted, mentioned, or used), automatically clean them:
Always convert to short form: https://youtu.be/<id>
Remove all:
- •Query parameters (e.g.,
?v=,?si=,&feature=, etc.) - •Tracking identifiers
- •Playlist parameters
- •Timestamp parameters
Supported input formats:
- •
https://www.youtube.com/watch?v=VIDEO_ID(any query params) - •
https://youtube.com/watch?v=VIDEO_ID - •
https://youtu.be/VIDEO_ID(any query params) - •
https://m.youtube.com/watch?v=VIDEO_ID - •
https://www.youtube.com/embed/VIDEO_ID
Use the utility script:
from scripts.youtube_utils import clean_youtube_url, extract_video_id # Clean any YouTube URL clean_url = clean_youtube_url(dirty_url) # Result: "https://youtu.be/eIoohUmYpGI" # Or just extract the ID video_id = extract_video_id(dirty_url) # Result: "eIoohUmYpGI"
2. Transcript Fetching and Saving to Bookmarks
Key principle: Save transcripts directly to Database/Bookmarks to avoid polluting chat context with full transcript content. This allows handling multiple video links without context overflow.
Output location: Database/Bookmarks/youtube-{video_id}.md
Caching: The Database/Bookmarks directory acts as a cache. If a video already has a bookmark file with a valid transcript (not empty/failed), it won't be re-fetched unless forced.
Workflow:
- •Clean the URL using
clean_youtube_url(url)to get canonical short form - •Check if bookmark exists - if valid transcript exists, skip fetching
- •Fetch transcript using
mcp__youtube-transcript__get_transcriptMCP tool (only if not cached) - •Save directly to Bookmarks using the
save_transcript.pyscript - •Return only success message with
cachedflag (not the full transcript)
Simplified workflow for Claude:
When a user provides a YouTube URL, use the process_youtube_url.py script which handles everything:
echo '{"url": "URL", "lang": "en", "force": false}' | \
python3 .claude/skills/youtube/scripts/process_youtube_url.py
This single command:
- •Checks if transcript is already cached in Bookmarks
- •If cached, returns immediately with
cached: true - •If not cached, fetches transcript directly using youtube-transcript-api (Python library)
- •Saves to Database/Bookmarks with proper formatting
- •Returns success status without showing full transcript content
Output:
{
"success": true,
"video_id": "abc123",
"filepath": "Database/Bookmarks/youtube-abc123.md",
"url": "https://youtu.be/abc123",
"title": "Video Title",
"cached": false
}
Benefits:
- •No MCP tool needed - fetches transcripts directly
- •NO context pollution - transcript never appears in chat
- •Automatic caching - same video won't be fetched twice
- •Single command - no multi-step workflow needed
Important:
- •Check cache first to avoid redundant API calls
- •DO NOT return the full transcript text in chat
- •Only inform the user that the transcript was saved/cached
- •Use the
cachedflag in output to inform user if existing bookmark was used - •This prevents context pollution when handling multiple videos
- •Users can read the transcript from the bookmark file later
- •Failed/empty transcripts are NOT cached and will be retried on next request
3. Language Support
The MCP transcript tool supports multiple languages via the lang parameter:
- •Default:
"en"(English) - •Other examples:
"ko"(Korean),"es"(Spanish),"fr"(French), etc.
Always use "en" unless the user specifically requests a different language.
Resources
scripts/process_youtube_url.py
Primary entry point - Use this script for all YouTube transcript operations.
Handles the complete workflow: cache checking → fetching → saving.
Input (stdin JSON):
{
"url": "https://youtube.com/watch?v=...",
"lang": "en", // Optional, defaults to "en"
"force": false // Optional, force re-fetch even if cached
}
Output (stdout JSON):
{
"success": true,
"video_id": "abc123",
"filepath": "Database/Bookmarks/youtube-abc123.md",
"url": "https://youtu.be/abc123",
"title": "Video Title",
"cached": false
}
scripts/fetch_transcript.py
Fetches YouTube transcripts using the youtube-transcript-api Python library.
Called internally by process_youtube_url.py. Uses YouTube's internal API to retrieve transcripts with language fallback support (requested lang → English → first available).
scripts/save_transcript.py
Saves transcript data to Bookmarks database with proper formatting.
Purpose: Write transcript content directly to Database/Bookmarks/youtube-{video_id}.md to avoid polluting chat context.
Input (stdin JSON):
{
"url": "https://youtube.com/watch?v=...",
"title": "Video Title",
"transcript": [{"text": "...", "start": 0.0, "duration": 1.5}, ...],
"force": false // Optional: force re-fetch even if cached
}
Output (stdout JSON):
{
"success": true,
"video_id": "abc123",
"filepath": "Database/Bookmarks/youtube-abc123.md",
"url": "https://youtu.be/abc123",
"title": "Video Title",
"cached": false // True if existing valid transcript was found
}
Key features:
- •Checks cache first - if bookmark exists with valid transcript, returns immediately
- •Only re-fetches if bookmark doesn't exist, has empty/failed transcript, or
force: true - •Concatenates transcript segments into continuous text
- •Creates properly formatted bookmark markdown with frontmatter
- •Automatically creates
Database/Bookmarksdirectory if needed - •Returns only metadata (not full transcript) to avoid context pollution
- •
cachedflag indicates whether existing bookmark was used
scripts/youtube_utils.py
Python utility module providing URL manipulation functions:
URL utilities:
- •
extract_video_id(url): Extract video ID from any YouTube URL format - •
clean_youtube_url(url): Convert any URL to cleanhttps://youtu.be/<id>form
The script is self-contained and can be executed directly for testing URL cleaning functionality.
Best Practices
- •Always clean URLs when YouTube links are pasted or referenced to canonical short form
- •Check cache first before fetching transcripts to avoid redundant API calls
- •Save transcripts to Bookmarks using
save_transcript.pyto avoid context pollution - •DO NOT output full transcript in chat - only inform user of saved/cached location
- •Use English by default for transcripts unless user specifies otherwise
- •Handle multiple videos efficiently - the Bookmarks approach allows processing many videos without context overflow
- •Inform user clearly whether transcript was fetched or loaded from cache
- •Retry failed transcripts - empty/failed transcripts are not cached and will be retried
- •Use
force: trueonly when user explicitly wants to re-fetch existing transcripts