Download Transcripts from CSV
Why? Manually downloading transcripts one-by-one is tedious and error-prone. This skill automates bulk transcript downloads with rate limiting, progress tracking, and resume capability.
Quick Start
    # Standard usage (recommended)
    ytscriber download --folder <channel-name> --delay 60

    # Single video (adds to default collection and downloads)
    ytscriber add "https://www.youtube.com/watch?v=VIDEO_ID" --folder random
Workflow
1. Verify Prerequisites
Ensure you have a CSV file with video URLs:
- Created by the `extract-videos` skill, OR
- Manually curated with a `url` column
    # Check CSV exists and has videos
    head -5 ~/Documents/YTScriber/<channel-name>/videos.csv
[!TIP] If you don't have a CSV yet, run `extract-videos` first to build your video list from a YouTube channel.
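The prerequisite check can also be scripted. The following is a minimal sketch that assumes only what this page states: a `videos.csv` file with a `url` column. The function name `check_videos_csv` is hypothetical and is not part of ytscriber.

```python
import csv
import io


def check_videos_csv(csv_text: str) -> int:
    """Return the number of rows with a non-empty `url` value.

    Raises ValueError if the header row has no `url` column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "url" not in reader.fieldnames:
        raise ValueError("videos.csv has no 'url' column")
    # Short rows yield None for missing fields, so fall back to "".
    return sum(1 for row in reader if (row.get("url") or "").strip())


# Example with inline data; in practice, pass the text of your videos.csv.
sample = "url,title\nhttps://www.youtube.com/watch?v=VIDEO_ID,Example\n"
print(check_videos_csv(sample))  # 1
```

A result of 0 means the file has a header but no usable video rows, so there is nothing to download yet.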
2. Run Download Command
    ytscriber download --folder <folder-name> --delay 60
| Option | Description | Default | Notes |
|---|---|---|---|
| `--folder` | Folder name containing videos.csv | Required | Uses platformdirs data path |
| `--delay` | Seconds between requests | 60 | Minimum 30, recommended 60+ |
| `--verbose`, `-v` | Enable verbose output | False | Shows download progress |
[!CAUTION] NEVER set `--delay` below 30 seconds. YouTube will block your IP if you make requests too quickly. The default of 60 seconds exists for a reason. You WILL get banned and may wait hours before downloading again.
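Because the delay dominates total run time, it helps to estimate the wait before starting a batch. This is a rough sketch, assuming one delay per video and ignoring the download time itself; `estimate_wait_minutes` and `MIN_DELAY_SECONDS` are hypothetical names, not part of ytscriber.

```python
MIN_DELAY_SECONDS = 30  # minimum delay stated in the options table


def estimate_wait_minutes(video_count: int, delay_seconds: int = 60) -> float:
    """Approximate time spent waiting between requests, in minutes."""
    if delay_seconds < MIN_DELAY_SECONDS:
        raise ValueError(f"--delay must be at least {MIN_DELAY_SECONDS} seconds")
    return video_count * delay_seconds / 60


print(estimate_wait_minutes(100))      # 100.0 (default 60-second delay)
print(estimate_wait_minutes(100, 90))  # 150.0
```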
3. Validate Downloads
After the command completes:
    # Check how many transcripts downloaded
    ls -la ~/Documents/YTScriber/<folder-name>/transcripts/*.md | wc -l

    # Verify CSV status updated
    grep -c "success" ~/Documents/YTScriber/<folder-name>/videos.csv
The command automatically:
- Downloads transcripts as markdown with YAML frontmatter
- Updates CSV `transcript_downloaded` column with success/error/empty
- Skips already-downloaded videos on re-run
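For a fuller picture than a single `grep` count, the status column can be tallied per value. The sketch below assumes the `transcript_downloaded` column holds `success`, `error`, or `empty` as described above; treating a blank cell as "pending" (not yet attempted) is an assumption, and `tally_status` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def tally_status(csv_text: str) -> Counter:
    """Count rows per `transcript_downloaded` value.

    Blank or missing values are counted as "pending" (an assumption).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        (row.get("transcript_downloaded") or "").strip() or "pending"
        for row in reader
    )


# Example with inline data; in practice, pass the text of your videos.csv.
sample = (
    "url,transcript_downloaded\n"
    "https://a,success\n"
    "https://b,error\n"
    "https://c,\n"
    "https://d,success\n"
)
print(dict(tally_status(sample)))  # {'success': 2, 'error': 1, 'pending': 1}
```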
Output Format
Transcripts are saved as markdown with YAML frontmatter:
    ---
    video_id: dQw4w9WgXcQ
    video_url: https://www.youtube.com/watch?v=dQw4w9WgXcQ
    title: Building Resilient Microservices at Scale
    author: Jane Smith
    published_date: 2025-03-15
    length_minutes: 42.5
    views: 15234
    description: "Video description..."
    is_generated: True
    ---

    [Transcript text as continuous paragraph]
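To work with these files downstream, the frontmatter can be separated from the transcript body. This is a deliberately naive sketch using only the standard library: it handles flat `key: value` lines like the ones shown above and keeps every value as a string. A real YAML parser would be more robust; `split_frontmatter` is a hypothetical helper, not a ytscriber function.

```python
def split_frontmatter(markdown_text: str) -> tuple[dict[str, str], str]:
    """Split a transcript into (metadata, body).

    Returns ({}, original text) when no complete frontmatter block is found.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown_text
    try:
        # Index of the closing "---" delimiter.
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, markdown_text
    meta = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


doc = (
    "---\n"
    "video_id: dQw4w9WgXcQ\n"
    "video_url: https://www.youtube.com/watch?v=dQw4w9WgXcQ\n"
    "is_generated: True\n"
    "---\n"
    "\n"
    "Transcript text\n"
)
meta, body = split_frontmatter(doc)
print(meta["video_id"], "|", body)  # dQw4w9WgXcQ | Transcript text
```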
Examples
Overnight Batch Processing
    # Large channel - run overnight with safe delay
    ytscriber download --folder aws-reinvent-2025 --delay 90 --verbose
Conference Playlist
    # Smaller collection - minimum safe delay
    ytscriber download --folder pycon-2024 --delay 30
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| "No transcript found" | Video lacks captions | Nothing to fix: the video has no captions available, so skip it |
| IP blocked / 403 errors | Too many requests too quickly | Wait 30-60 minutes, then retry with --delay 120 |
| Script interrupted mid-run | Network issue or Ctrl+C | Re-run the exact same command; it skips completed videos |
| Empty transcript files | Auto-captions unavailable or video is live | Check if video has captions on YouTube; skip if not |
| "Folder not found" | Wrong folder name | Verify folder with ls ~/Documents/YTScriber/ |
| Slow downloads | Rate limiting working correctly | This is expected; 100 videos at 60s delay = ~100 minutes of waiting (just under 2 hours) |
Common Mistakes
- Reducing delay to "speed up" - Do not set delay below 30 seconds. You will get IP banned. Use overnight runs for large batches.
- Running without checking CSV first - Always verify your CSV exists and has videos before running. Empty CSVs waste time.
- Not using resume capability - If interrupted, don't start over. Re-run the same command to resume from where you left off.
Quality Checklist
Before running:
- CSV file exists and has `url` column
- Folder name is correct
- Delay is 30+ seconds (60+ recommended)
- Sufficient time allocated (1 min per video minimum)
After running:
- Transcript count matches expected videos
- CSV `transcript_downloaded` column updated
- Spot-check 2-3 transcripts for valid content
- No empty or malformed markdown files
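The last checklist item can be automated. The sketch below flags `.md` files that are empty, lack a leading `---` frontmatter block, or have no text after the frontmatter. It assumes the output format shown earlier on this page and uses a naive split on `---`, so a value that itself contains `---` would confuse it. `find_suspect_transcripts` is a hypothetical helper.

```python
from pathlib import Path


def find_suspect_transcripts(transcripts_dir: Path) -> list[str]:
    """Return names of .md files that look empty or malformed."""
    suspects = []
    for path in sorted(transcripts_dir.glob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        # A well-formed file splits into: "", frontmatter, body.
        parts = text.split("---", 2)
        if not text.startswith("---") or len(parts) < 3 or not parts[2].strip():
            suspects.append(path.name)
    return suspects


# Example call (path layout taken from the commands on this page):
# find_suspect_transcripts(
#     Path.home() / "Documents" / "YTScriber" / "<folder-name>" / "transcripts"
# )
```

An empty result means every transcript file has both frontmatter and body text; it does not confirm the text is accurate, so the manual spot-check still applies.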