YouTube Video Summary

Generate a structured, high-quality summary of a YouTube video from its transcript.

Quick Start

code

/transcript-summary <transcript_file_path>
/transcript-summary (then paste transcript text in conversation)

Examples

•/transcript-summary /tmp/youtube-get-transcripts/dQw4w9WgXcQ.en.txt - Summarize from a transcript file
•/transcript-summary followed by pasting transcript text - Summarize inline text

Typical workflow:

code

/youtube-get-transcript https://youtube.com/watch?v=xxx
→ outputs transcript file path

/transcript-summary /tmp/youtube-get-transcripts/VIDEO_ID.en.txt
→ generates structured summary

How it Works

Mode A: File Path

•Execute: {baseDir}/scripts/summary.sh "<transcript_file_path>"
•Parse JSON output to get validated file_path, char_count, and strategy
•Follow the processing strategy indicated by strategy field (see Processing Strategy below)
•Generate a structured summary following the Summary Generation Rules below

Mode B: Inline Text

•The user provides transcript text directly in the conversation (no script execution needed)
•Generate a structured summary following the Summary Generation Rules below

Processing Strategy (Mode A)

The strategy field from summary.sh determines how to handle the transcript:

Strategy: `standard` (< 80,000 chars, ~2 hr EN)

•Read the entire file with the Read tool
•Directly apply Summary Generation Rules

Strategy: `sectioned` (80,000–200,000 chars, ~2 hr – 5 hr EN)

Use a structured multi-phase approach within the main conversation to counter lost-in-middle effects:

•Read the entire file with the Read tool
•Phase 1 — Segment identification: Identify 5-10 major section boundaries (topic shifts, speaker changes, chapter markers) and list them
•Phase 2 — Section-by-section extraction: For each identified section, extract key points, data, quotes, and arguments
•
Phase 2.5 — Topic grouping (if > 6 sections identified):
- •Group related sections into 4-6 macro-topics (e.g., "Background & Context", "Core Argument", "Evidence & Examples", "Implications")
- •Under each macro-topic, aggregate the key points from its constituent sections
- •This reduces synthesis pressure in Phase 3 by providing pre-organized intermediate structure
•Phase 3 — Synthesis: Compose the final structured summary from the macro-topics (or directly from sections if ≤ 6), ensuring every identified section is represented
•Cross-check: Verify that mid-content sections are not underrepresented compared to the beginning and end

Strategy: `chunked` (> 200,000 chars, > 5 hr EN)

Use parallel subagents to process chunks independently, keeping the main conversation context clean:

•Calculate chunk count: ceil(line_count / 1000) — each chunk is ~1000 lines

•Spawn parallel subagents using the Task tool (subagent_type: "general-purpose"):

•Each subagent receives: the file path, start line offset, end line limit, and summarization instructions
•Chunk overlap: Each chunk includes a 50-line overlap with the previous chunk for context continuity (chunk 1: lines 1–1000, chunk 2: lines 951–2000, chunk 3: lines 1951–3000, etc.). The first chunk has no leading overlap.

•Each subagent prompt:

code

Read the file at {file_path} from line {start_offset} to line {end_limit} using the Read tool (with offset and limit parameters).
Then produce a summary of this section with 5-10 bullet points covering:
- Main topics and arguments discussed
- Key data points (numbers, dates, names) in **bold**
- Notable quotes as blockquotes
Write the summary in the same language as the transcript.
IMPORTANT — Boundary continuity: If the beginning of your chunk clearly continues a topic from a previous section, prefix your first bullet with [continues from previous]. If the end of your chunk is mid-topic and clearly continues into the next section, suffix your last bullet with [continues to next]. This helps the synthesis step merge cross-chunk topics.

•Use model: "haiku" for cost efficiency

•Collect all subagent section summaries
•
Synthesize a final structured summary in the main conversation following the Summary Generation Rules
- •During synthesis, check for [continues from previous] and [continues to next] markers across adjacent chunks — merge bullets that belong to the same topic into a single coherent section rather than repeating them

Fallback Rules

•Missing or unknown strategy: If the strategy field is missing, empty, or contains an unrecognized value, default to the standard strategy
•Mode B overflow detection: If inline transcript text (Mode B) appears exceptionally long (estimated > 80,000 chars), advise the user to save the text to a file and use Mode A instead, so that summary.sh can determine the correct processing strategy
•Chunked subagent retry: If a subagent in chunked mode returns an empty result or clearly irrelevant content (e.g., error messages instead of summary bullets), retry that specific chunk once before proceeding with synthesis

Summary Generation Rules

After obtaining the transcript, generate the summary using EXACTLY this structure and rules:

Output Structure

markdown

## Video Info (optional)

| Field | Value |
|-------|-------|
| **Title** | {title} |
| **Channel** | {channel} |
| **Duration** | {duration_string} |
| **Views** | {view_count, formatted with commas} |
| **Upload Date** | {upload_date, formatted as YYYY-MM-DD} |
| **Subtitle** | {subtitle_type} ({transcript_language}) |

## Content Summary

#### {Section Title 1}

- Bullet point with **key data** highlighted
- Another point
- ...

#### {Section Title 2}

- ...

(Continue for all logical sections)

## Key Takeaways

- 3-5 most important conclusions or insights from the video

Video Info Table

•Include the Video Info table if /youtube-get-info results are available in the current conversation context
•Omit the table if no metadata is available (proceed directly to Content Summary)

Content Rules

•
Section structure: Divide the summary into logical sections using H4 (####) headings
- •If the video description contains chapter timestamps, use those as the section skeleton
- •Otherwise, identify 3-8 logical topic shifts in the transcript
- •Each section should have 2-5 bullet points
•
Data preservation: Always preserve and highlight specific data points
- •Percentages, dollar amounts, dates, statistics → bold
- •Person names, company names, product names → bold on first mention
- •Direct quotes that are impactful → use blockquote format
•
Language: Write the summary in the same language as the transcript
- •If transcript is in English, summarize in English
- •If transcript is in Japanese, summarize in Japanese
- •If transcript is in Chinese, summarize in Traditional Chinese (繁體中文)
•
Length: Target compression ratio based on processing strategy:

Strategy Compression Guideline
standard 10-15% Short content, detailed coverage
sectioned 8-12% Medium-long content, balanced density
chunked 7-12% Very long content, high-level synthesis

For Mode B (inline text), use the standard ratio as default
•
Tone: Maintain an informative, neutral tone
- •Present the speaker's arguments faithfully
- •Do not add opinions or editorial commentary
- •Use active voice
•
Key Takeaways: End with 3-5 bullet points summarizing the most important insights
- •These should be standalone — understandable without reading the full summary
- •Prioritize actionable insights and surprising findings

Output Format

Script JSON output (Mode A only):

json

{
  "status": "success",
  "file_path": "/tmp/youtube-get-transcripts/VIDEO_ID.en.txt",
  "char_count": 30000,
  "line_count": 450,
  "strategy": "standard"
}

Notes

•This skill does NOT download videos or subtitles — use /youtube-get-transcript first to obtain a transcript file
•On first run, if jq is not installed, it will be auto-downloaded
•For best results, combine with /youtube-get-info to include the Video Info table in the summary

Strategy	Compression	Guideline
`standard`	10-15%	Short content, detailed coverage
`sectioned`	8-12%	Medium-long content, balanced density
`chunked`	7-12%	Very long content, high-level synthesis

YouTube Video Summary

Quick Start

Examples

How it Works

Mode A: File Path

Mode B: Inline Text

Processing Strategy (Mode A)

Strategy: standard (< 80,000 chars, ~2 hr EN)

Strategy: sectioned (80,000–200,000 chars, ~2 hr – 5 hr EN)

Strategy: chunked (> 200,000 chars, > 5 hr EN)

Fallback Rules

Summary Generation Rules

Output Structure

Video Info Table

Content Rules

Output Format

Notes

Strategy: `standard` (< 80,000 chars, ~2 hr EN)

Strategy: `sectioned` (80,000–200,000 chars, ~2 hr – 5 hr EN)

Strategy: `chunked` (> 200,000 chars, > 5 hr EN)