TubeScribe 🎬
Turn any YouTube video into a polished document + audio summary in seconds.
Drop a YouTube link → get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.
💸 100% Free & Local
- •No subscription — runs entirely on your machine
- •No API keys required — works out of the box
- •No data leaves your computer — your content stays private
- •No usage limits — summarize as many videos as you want
✨ Features
- •🎯 Smart Speaker Detection — Automatically identifies who's talking in interviews, podcasts, and conversations
- •📝 Clickable Timestamps — Every quote links directly to that moment in the video
- •💬 YouTube Comments — Fetches top comments, summarizes viewer sentiment, highlights best reactions
- •📄 Clean Documents — Export as HTML, DOCX, or Markdown
- •🔊 Audio Summaries — Listen to the key points (MP3/WAV)
🎬 Works With Any Video
- •Interviews & podcasts (multi-speaker detection)
- •Lectures & tutorials (single speaker)
- •Music videos (lyrics extraction)
- •News & documentaries
- •Any YouTube content with captions
Quick Start
When user sends a YouTube URL:
- •Spawn sub-agent with the full pipeline task immediately
- •Reply: "🎬 TubeScribe is processing — I'll let you know when it's ready!"
- •Continue conversation (don't wait!)
- •Sub-agent notification will announce completion with title and details
DO NOT BLOCK — spawn and move on instantly.
First-Time Setup
Run setup to check dependencies and configure defaults:
python skills/tubescribe/scripts/setup.py
This checks: summarize CLI, pandoc/python-docx, ffmpeg, Kokoro TTS
Full Workflow (Single Sub-Agent)
Spawn ONE sub-agent that does the entire pipeline:
sessions_spawn(
task=f"""
## TubeScribe: Process {youtube_url}
Run the COMPLETE pipeline — do not stop until all steps are done.
### Step 1: Extract
```bash
python3 /Users/matusvojtek/.openclaw/workspace/skills/tubescribe/scripts/tubescribe.py "{youtube_url}"
Note the video_id from the output (e.g., "Source: /tmp/tubescribe_ABC123_source.json" → video_id is ABC123).
Step 2: Read source JSON
Read /tmp/tubescribe_<video_id>_source.json and note:
- •metadata.title (for filename)
- •metadata.video_id
- •metadata.channel, upload_date, duration_string
Step 3: Create formatted markdown
Write to /tmp/tubescribe_<video_id>_output.md:
- •
# **<title>**
- •Video info block (Channel, Date, Duration, clickable URL)
- •
## **Participants**— table with bold headers:code| **Name** | **Role** | **Description** | |----------|----------|-----------------|
- •
## **Summary**— 3-5 paragraphs
- •
## **Key Quotes**— 5 best with clickable YouTube timestamps
- •
## **Viewer Sentiment**(if comments exist)
- •
## **Best Comments**(if comments exist) — Top 5, NO lines between them:Just blank line between comments, NOcodeComment text here. <p align="right">▲ 123 @AuthorName</p> Next comment text here. <p align="right">▲ 45 @AnotherAuthor</p>
---separators.
- •
## **Full Transcript**— merge segments, speaker labels, clickable timestamps
Step 4: Create DOCX
Clean the title for filename (remove special chars), then:
pandoc /tmp/tubescribe_<video_id>_output.md -o ~/Documents/TubeScribe/<safe_title>.docx
Step 5: Generate audio
cd ~/.openclaw/tools/kokoro && source .venv/bin/activate
Then Python: read Summary from markdown, generate with Kokoro (voice=0.6af_heart+0.4af_sky), save as MP3 to ~/Documents/TubeScribe/<safe_title>_summary.mp3
Step 6: Cleanup
python3 /Users/matusvojtek/.openclaw/workspace/skills/tubescribe/scripts/tubescribe.py --cleanup <video_id>
Step 7: Open folder
open ~/Documents/TubeScribe/
Report
Tell what was created: DOCX name, MP3 name + duration, video stats. """, label="tubescribe", runTimeoutSeconds=900, cleanup="delete" )
**After spawning, reply immediately:**
> 🎬 Processing "[video title if known, or just the URL]" — I'll let you know when it's ready!
Then continue the conversation. The sub-agent notification announces completion.
## Configuration
Config file: `~/.tubescribe/config.json`
```json
{
"output": {
"folder": "~/Documents/TubeScribe",
"open_folder_after": true,
"open_document_after": false,
"open_audio_after": false
},
"document": {
"format": "docx",
"engine": "pandoc"
},
"audio": {
"enabled": true,
"format": "mp3",
"tts_engine": "builtin"
},
"kokoro": {
"venv_path": "~/.tubescribe/kokoro-env",
"voice_blend": { "af_heart": 0.6, "af_sky": 0.4 },
"speed": 1.05
},
"processing": {
"subagent_timeout": 600,
"cleanup_temp_files": true
}
}
Output Options
| Option | Default | Description |
|---|---|---|
output.folder | ~/Documents/TubeScribe | Where to save files |
output.open_folder_after | true | Open output folder when done |
output.open_document_after | false | Auto-open generated document |
output.open_audio_after | false | Auto-open generated audio summary |
Document Options
| Option | Default | Values | Description |
|---|---|---|---|
document.format | docx | docx, html, md | Output format |
document.engine | pandoc | pandoc, python_docx | Converter for DOCX |
Audio Options
| Option | Default | Values | Description |
|---|---|---|---|
audio.enabled | true | true, false | Generate audio summary |
audio.format | mp3 | mp3, wav | Audio format (mp3 needs ffmpeg) |
audio.tts_engine | builtin | builtin, kokoro | TTS engine (builtin = macOS say) |
Kokoro TTS Options (optional)
| Option | Default | Description |
|---|---|---|
kokoro.venv_path | ~/.tubescribe/kokoro-env | Python venv with Kokoro installed |
kokoro.voice_blend | {af_heart: 0.6, af_sky: 0.4} | Custom voice mix |
kokoro.speed | 1.05 | Playback speed (1.0 = normal, 1.05 = 5% faster) |
Processing Options
| Option | Default | Description |
|---|---|---|
processing.subagent_timeout | 600 | Seconds for sub-agent (increase for long videos) |
processing.cleanup_temp_files | true | Remove /tmp files after completion |
Comment Options
| Option | Default | Description |
|---|---|---|
comments.max_count | 50 | Number of comments to fetch |
comments.timeout | 90 | Timeout for comment fetching (seconds) |
Queue Options
| Option | Default | Description |
|---|---|---|
queue.stale_minutes | 30 | Consider a processing job stale after this many minutes |
Output Structure
~/Documents/TubeScribe/
├── {Video Title}.html # Formatted document (or .docx / .md)
└── {Video Title}_summary.mp3 # Audio summary (or .wav)
After generation, opens the folder (not individual files) so you can access everything.
Dependencies
Required:
- •
summarizeCLI —brew install steipete/tap/summarize - •Python 3.8+
Optional (better quality):
- •
pandoc— DOCX output:brew install pandoc - •
ffmpeg— MP3 audio:brew install ffmpeg - •
yt-dlp— YouTube comments:brew install yt-dlp - •Kokoro TTS — High-quality voices: see https://github.com/hexgrad/kokoro
yt-dlp Search Paths
TubeScribe checks these locations (in order):
| Priority | Path | Source |
|---|---|---|
| 1 | which yt-dlp | System PATH |
| 2 | /opt/homebrew/bin/yt-dlp | Homebrew (Apple Silicon) |
| 3 | /usr/local/bin/yt-dlp | Homebrew (Intel) / Linux |
| 4 | ~/.local/bin/yt-dlp | pip install --user |
| 5 | ~/.local/pipx/venvs/yt-dlp/bin/yt-dlp | pipx |
| 6 | ~/.openclaw/tools/yt-dlp/yt-dlp | TubeScribe auto-install |
If not found, setup downloads a standalone binary to the tools directory. The tools directory version doesn't conflict with system installations.
Queue Handling
When user sends multiple YouTube URLs while one is processing:
Check Before Starting
python skills/tubescribe/scripts/tubescribe.py --queue-status
If Already Processing
# Add to queue instead of starting parallel processing python skills/tubescribe/scripts/tubescribe.py --queue-add "NEW_URL" # → Replies: "📋 Added to queue (position 2)"
After Completion
# Check if more in queue python skills/tubescribe/scripts/tubescribe.py --queue-next # → Automatically pops and processes next URL
Queue Commands
| Command | Description |
|---|---|
--queue-status | Show what's processing + queued items |
--queue-add URL | Add URL to queue |
--queue-next | Process next item from queue |
--queue-clear | Clear entire queue |
Batch Processing (multiple URLs at once)
python skills/tubescribe/scripts/tubescribe.py url1 url2 url3
Processes all URLs sequentially with a summary at the end.
Error Handling
The script detects and reports these errors with clear messages:
| Error | Message |
|---|---|
| Invalid URL | ❌ Not a valid YouTube URL |
| Private video | ❌ Video is private — can't access |
| Video removed | ❌ Video not found or removed |
| No captions | ❌ No captions available for this video |
| Age-restricted | ❌ Age-restricted video — can't access without login |
| Region-blocked | ❌ Video blocked in your region |
| Live stream | ❌ Live streams not supported — wait until it ends |
| Network error | ❌ Network error — check your connection |
| Timeout | ❌ Request timed out — try again later |
When an error occurs, report it to the user and don't proceed with that video.
Tips
- •For long videos (>30 min), increase sub-agent timeout to 900s
- •Speaker detection works best with clear interview/podcast formats
- •Single-speaker videos (tutorials, lectures) skip speaker labels automatically
- •Timestamps link directly to YouTube at that moment
- •Use batch mode for multiple videos:
tubescribe url1 url2 url3