AgentSkillsCN

tubescribe

配备语音检测功能的 YouTube 视频摘要器,支持格式化文档与音频输出。当用户发送 YouTube URL,或请求对 YouTube 视频进行摘要与转录时,此功能便能派上用场。

SKILL.md
--- frontmatter
name: tubescribe
description: "YouTube video summarizer with speaker detection, formatted documents, and audio output. Use when user sends a YouTube URL or asks to summarize/transcribe a YouTube video."

TubeScribe 🎬

Turn any YouTube video into a polished document + audio summary in seconds.

Drop a YouTube link → get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.

💸 100% Free & Local

  • No subscription — runs entirely on your machine
  • No API keys required — works out of the box
  • No data leaves your computer — your content stays private
  • No usage limits — summarize as many videos as you want

✨ Features

  • 🎯 Smart Speaker Detection — Automatically identifies who's talking in interviews, podcasts, and conversations
  • 📝 Clickable Timestamps — Every quote links directly to that moment in the video
  • 💬 YouTube Comments — Fetches top comments, summarizes viewer sentiment, highlights best reactions
  • 📄 Clean Documents — Export as HTML, DOCX, or Markdown
  • 🔊 Audio Summaries — Listen to the key points (MP3/WAV)

🎬 Works With Any Video

  • Interviews & podcasts (multi-speaker detection)
  • Lectures & tutorials (single speaker)
  • Music videos (lyrics extraction)
  • News & documentaries
  • Any YouTube content with captions

Quick Start

When user sends a YouTube URL:

  1. Spawn sub-agent with the full pipeline task immediately
  2. Reply: "🎬 TubeScribe is processing — I'll let you know when it's ready!"
  3. Continue conversation (don't wait!)
  4. Sub-agent notification will announce completion with title and details

DO NOT BLOCK — spawn and move on instantly.

First-Time Setup

Run setup to check dependencies and configure defaults:

bash
python skills/tubescribe/scripts/setup.py

This checks: summarize CLI, pandoc/python-docx, ffmpeg, Kokoro TTS

Full Workflow (Single Sub-Agent)

Spawn ONE sub-agent that does the entire pipeline:

python
sessions_spawn(
    task=f"""
## TubeScribe: Process {youtube_url}

Run the COMPLETE pipeline — do not stop until all steps are done.

### Step 1: Extract
```bash
python3 /Users/matusvojtek/.openclaw/workspace/skills/tubescribe/scripts/tubescribe.py "{youtube_url}"

Note the video_id from the output (e.g., "Source: /tmp/tubescribe_ABC123_source.json" → video_id is ABC123).

Step 2: Read source JSON

Read /tmp/tubescribe_<video_id>_source.json and note:

  • metadata.title (for filename)
  • metadata.video_id
  • metadata.channel, upload_date, duration_string

Step 3: Create formatted markdown

Write to /tmp/tubescribe_<video_id>_output.md:

  1. # **<title>**

  1. Video info block (Channel, Date, Duration, clickable URL)

  1. ## **Participants** — table with bold headers:
    code
    | **Name** | **Role** | **Description** |
    |----------|----------|-----------------|
    

  1. ## **Summary** — 3-5 paragraphs

  1. ## **Key Quotes** — 5 best with clickable YouTube timestamps

  1. ## **Viewer Sentiment** (if comments exist)

  1. ## **Best Comments** (if comments exist) — Top 5, NO lines between them:
    code
    Comment text here.
    
    <p align="right">▲ 123 @AuthorName</p>
    
    Next comment text here.
    
    <p align="right">▲ 45 @AnotherAuthor</p>
    
    Just blank line between comments, NO --- separators.

  1. ## **Full Transcript** — merge segments, speaker labels, clickable timestamps

Step 4: Create DOCX

Clean the title for filename (remove special chars), then:

bash
pandoc /tmp/tubescribe_<video_id>_output.md -o ~/Documents/TubeScribe/<safe_title>.docx

Step 5: Generate audio

bash
cd ~/.openclaw/tools/kokoro && source .venv/bin/activate

Then Python: read Summary from markdown, generate with Kokoro (voice=0.6af_heart+0.4af_sky), save as MP3 to ~/Documents/TubeScribe/<safe_title>_summary.mp3

Step 6: Cleanup

bash
python3 /Users/matusvojtek/.openclaw/workspace/skills/tubescribe/scripts/tubescribe.py --cleanup <video_id>

Step 7: Open folder

bash
open ~/Documents/TubeScribe/

Report

Tell what was created: DOCX name, MP3 name + duration, video stats. """, label="tubescribe", runTimeoutSeconds=900, cleanup="delete" )

code

**After spawning, reply immediately:**
> 🎬 Processing "[video title if known, or just the URL]" — I'll let you know when it's ready!

Then continue the conversation. The sub-agent notification announces completion.

## Configuration

Config file: `~/.tubescribe/config.json`

```json
{
  "output": {
    "folder": "~/Documents/TubeScribe",
    "open_folder_after": true,
    "open_document_after": false,
    "open_audio_after": false
  },
  "document": {
    "format": "docx",
    "engine": "pandoc"
  },
  "audio": {
    "enabled": true,
    "format": "mp3",
    "tts_engine": "builtin"
  },
  "kokoro": {
    "venv_path": "~/.tubescribe/kokoro-env",
    "voice_blend": { "af_heart": 0.6, "af_sky": 0.4 },
    "speed": 1.05
  },
  "processing": {
    "subagent_timeout": 600,
    "cleanup_temp_files": true
  }
}

Output Options

OptionDefaultDescription
output.folder~/Documents/TubeScribeWhere to save files
output.open_folder_aftertrueOpen output folder when done
output.open_document_afterfalseAuto-open generated document
output.open_audio_afterfalseAuto-open generated audio summary

Document Options

OptionDefaultValuesDescription
document.formatdocxdocx, html, mdOutput format
document.enginepandocpandoc, python_docxConverter for DOCX

Audio Options

OptionDefaultValuesDescription
audio.enabledtruetrue, falseGenerate audio summary
audio.formatmp3mp3, wavAudio format (mp3 needs ffmpeg)
audio.tts_enginebuiltinbuiltin, kokoroTTS engine (builtin = macOS say)

Kokoro TTS Options (optional)

OptionDefaultDescription
kokoro.venv_path~/.tubescribe/kokoro-envPython venv with Kokoro installed
kokoro.voice_blend{af_heart: 0.6, af_sky: 0.4}Custom voice mix
kokoro.speed1.05Playback speed (1.0 = normal, 1.05 = 5% faster)

Processing Options

OptionDefaultDescription
processing.subagent_timeout600Seconds for sub-agent (increase for long videos)
processing.cleanup_temp_filestrueRemove /tmp files after completion

Comment Options

OptionDefaultDescription
comments.max_count50Number of comments to fetch
comments.timeout90Timeout for comment fetching (seconds)

Queue Options

OptionDefaultDescription
queue.stale_minutes30Consider a processing job stale after this many minutes

Output Structure

code
~/Documents/TubeScribe/
├── {Video Title}.html         # Formatted document (or .docx / .md)
└── {Video Title}_summary.mp3  # Audio summary (or .wav)

After generation, opens the folder (not individual files) so you can access everything.

Dependencies

Required:

  • summarize CLI — brew install steipete/tap/summarize
  • Python 3.8+

Optional (better quality):

  • pandoc — DOCX output: brew install pandoc
  • ffmpeg — MP3 audio: brew install ffmpeg
  • yt-dlp — YouTube comments: brew install yt-dlp
  • Kokoro TTS — High-quality voices: see https://github.com/hexgrad/kokoro

yt-dlp Search Paths

TubeScribe checks these locations (in order):

PriorityPathSource
1which yt-dlpSystem PATH
2/opt/homebrew/bin/yt-dlpHomebrew (Apple Silicon)
3/usr/local/bin/yt-dlpHomebrew (Intel) / Linux
4~/.local/bin/yt-dlppip install --user
5~/.local/pipx/venvs/yt-dlp/bin/yt-dlppipx
6~/.openclaw/tools/yt-dlp/yt-dlpTubeScribe auto-install

If not found, setup downloads a standalone binary to the tools directory. The tools directory version doesn't conflict with system installations.

Queue Handling

When user sends multiple YouTube URLs while one is processing:

Check Before Starting

bash
python skills/tubescribe/scripts/tubescribe.py --queue-status

If Already Processing

bash
# Add to queue instead of starting parallel processing
python skills/tubescribe/scripts/tubescribe.py --queue-add "NEW_URL"
# → Replies: "📋 Added to queue (position 2)"

After Completion

bash
# Check if more in queue
python skills/tubescribe/scripts/tubescribe.py --queue-next
# → Automatically pops and processes next URL

Queue Commands

CommandDescription
--queue-statusShow what's processing + queued items
--queue-add URLAdd URL to queue
--queue-nextProcess next item from queue
--queue-clearClear entire queue

Batch Processing (multiple URLs at once)

bash
python skills/tubescribe/scripts/tubescribe.py url1 url2 url3

Processes all URLs sequentially with a summary at the end.

Error Handling

The script detects and reports these errors with clear messages:

ErrorMessage
Invalid URL❌ Not a valid YouTube URL
Private video❌ Video is private — can't access
Video removed❌ Video not found or removed
No captions❌ No captions available for this video
Age-restricted❌ Age-restricted video — can't access without login
Region-blocked❌ Video blocked in your region
Live stream❌ Live streams not supported — wait until it ends
Network error❌ Network error — check your connection
Timeout❌ Request timed out — try again later

When an error occurs, report it to the user and don't proceed with that video.

Tips

  • For long videos (>30 min), increase sub-agent timeout to 900s
  • Speaker detection works best with clear interview/podcast formats
  • Single-speaker videos (tutorials, lectures) skip speaker labels automatically
  • Timestamps link directly to YouTube at that moment
  • Use batch mode for multiple videos: tubescribe url1 url2 url3