YouTube Video Summarizer
Create hierarchical, time-coded summaries of educational YouTube videos for future reference and study.
Workflow
Note: These scripts require network access to YouTube. If running in a restricted environment, ensure YouTube (www.youtube.com) is accessible.
Step 1: Fetch Video Data
Required: Run the transcript script to get video content with timestamps:
python scripts/fetch_transcript.py <youtube_url>
Fallback: If captions are unavailable, use whisper.cpp to transcribe audio:
python scripts/transcribe_audio.py <youtube_url> [model]
Models: tiny, base (default), small, medium, large. Set WHISPER_MODEL env var to change default.
Optional: If available, fetch metadata for chapters and title:
python scripts/fetch_metadata.py <youtube_url>
The metadata script provides title, chapters (if available), and duration. If unavailable, ask the user for the video title or extract it from the URL.
Error handling:
- •No captions + no whisper.cpp: "No captions available. Install whisper.cpp to enable audio transcription: https://github.com/ggerganov/whisper.cpp"
- •No captions + whisper failed: Report the specific transcription error to the user
Step 2: Analyze Structure
Review the video data to determine structure:
- •Check for chapters: If
metadata.chaptersexists and is non-empty, use these as the top-level structure - •Transcript analysis: Review the transcript for natural topic transitions, key concepts, and information density
- •Determine granularity: Aim for approximately 1 timestamp entry per 2 minutes, adjusting based on content density
Step 3: Generate Hierarchical Summary
Create a markdown document with this structure:
# [Video Title] **Channel**: [Uploader Name] **Duration**: [Duration in HH:MM:SS] **URL**: [Full YouTube URL] --- ## Summary [2-3 sentence overview of the video's main topic and key takeaways] --- ## Detailed Contents ### [Chapter/Topic 1] - **[MM:SS](youtube_url&t=XXs)** - Brief description of key point - **[MM:SS](youtube_url&t=XXs)** - Brief description of key point #### [Subtopic if needed] - **[MM:SS](youtube_url&t=XXs)** - Brief description of key point ### [Chapter/Topic 2] ...
Timestamp format: Use &t=XXs format where XX is seconds (e.g., &t=125s for 2:05)
Brief descriptions: Each bullet should be 1-2 sentences capturing the key information at that timestamp
Hierarchical structure: Use chapters as H3, subtopics as H4, with timestamp bullets under each
Step 4: Quality Check
Before finalizing:
- •Verify all timestamps are clickable links
- •Ensure granularity is appropriate (~1 entry per 2 minutes, adjusted for density)
- •Confirm descriptions are searchable and informative
- •Check that hierarchy reflects the video's natural structure
Step 5: Output
Save the markdown summary to ./[sanitized_video_title]_summary.md (current working directory) and present it to the user.
Best Practices
- •Use existing chapters: When available, YouTube chapters provide excellent top-level structure
- •Adjust density: Dense technical content may need more timestamps; lighter content may need fewer
- •Searchable descriptions: Write descriptions that capture keywords someone would search for later
- •Natural hierarchy: Don't force subtopics if the content flows linearly
- •Link validation: Always test that timestamp links work correctly
Scripts
- •
scripts/fetch_metadata.py- Fetches video title, chapters, duration, and description using yt-dlp - •
scripts/fetch_transcript.py- Fetches transcript with timestamps using youtube-transcript-api - •
scripts/transcribe_audio.py- Fallback: downloads audio and transcribes with whisper.cpp
Environment Variables
| Variable | Purpose |
|---|---|
WHISPER_CPP_PATH | Override whisper.cpp binary location (default: search PATH) |
WHISPER_MODEL | Override default model (default: base) |