Consolidate Transcripts

Why? LLMs have finite context windows. This skill merges a channel's transcripts into a single file and counts tokens as it goes, so you can feed the channel's content to Claude or GPT without exceeding the model's limit.

Quick Start

```bash
python scripts/consolidate_transcripts.py <channel_name>
```

Output: `~/Documents/YTScriber/<channel_name>/<channel_name>-consolidated.md`

> [!NOTE]
> This feature is currently a standalone script. A `ytscriber consolidate` CLI command is planned for a future release.

Workflow

1. Identify the Channel

List available channels:

```bash
ls ~/Documents/YTScriber/
```

2. Choose Token Limit

| Use Case | Recommended Limit | Flag |
| --- | --- | --- |
| Claude (200K context) | 150000 | `--limit 150000` |
| GPT-4 Turbo (128K) | 100000 | `--limit 100000` |
| Full archive (Claude Pro) | 800000 | (default) |
| Quick sample | 50000 | `--limit 50000` |

> [!TIP]
> The default 800K limit leaves ~200K tokens for prompts and responses when using a Claude model with a 1M-token context window.
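The limits in the table follow one rule: take the model's context window and subtract the room you want to keep for the prompt and the response. A minimal sketch of that arithmetic follows; `recommended_limit` is a hypothetical helper for illustration and is not part of the script.

```python
def recommended_limit(context_window: int, reserve: int) -> int:
    """Tokens left for transcripts after reserving room for prompt and response."""
    if reserve < 0 or reserve >= context_window:
        raise ValueError("reserve must be non-negative and smaller than the context window")
    return context_window - reserve


# Figures taken from the table and tip above.
print(recommended_limit(1_000_000, 200_000))  # 800000
print(recommended_limit(200_000, 50_000))     # 150000
print(recommended_limit(128_000, 28_000))     # 100000
```

Pass the result to `--limit`. A larger reserve is the safer choice when you expect long responses.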

3. Run Consolidation

```bash
python scripts/consolidate_transcripts.py <channel_name> [--limit TOKENS] [--verbose]
```

Examples:

```bash
# Default (800K tokens)
python scripts/consolidate_transcripts.py library-of-minds

# Custom limit for GPT-4
python scripts/consolidate_transcripts.py aws-reinvent-2025 --limit 100000

# Verbose output showing all included files
python scripts/consolidate_transcripts.py dwarkesh-patel --verbose
```

4. Verify Output

Check the consolidated file was created:

```bash
ls -la ~/Documents/YTScriber/<channel_name>/*-consolidated.md
```

Parameters

| Option | Description | Default |
| --- | --- | --- |
| `channel_name` | Folder name in data directory | Required |
| `--limit, -l` | Maximum tokens to include | 800000 |
| `--verbose, -v` | Show detailed file list | False |

Output Format

The consolidated file includes:

- Header: generation metadata, total transcripts, token/word counts
- Table of Contents: dates, titles, tokens, words per transcript
- Transcripts: full text with title, date, author, source URL
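To make the three parts concrete, here is a rough sketch of how such a file could be assembled. The exact headings and field labels the script writes are not documented here, so the layout, the `Transcript` class, and `render_consolidated` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    title: str
    date: str    # ISO date string such as "2025-01-15"
    author: str
    url: str
    text: str
    tokens: int  # precomputed token count for this transcript
    words: int   # precomputed word count for this transcript


def render_consolidated(channel: str, generated: str, items: List[Transcript]) -> str:
    """Build the header, then the table of contents, then the full transcripts."""
    lines = [
        f"# {channel} (consolidated)",
        "",
        f"Generated: {generated}",
        f"Total transcripts: {len(items)}",
        f"Total tokens: {sum(t.tokens for t in items)}",
        f"Total words: {sum(t.words for t in items)}",
        "",
        "## Table of Contents",
        "",
        "| Date | Title | Tokens | Words |",
        "| --- | --- | --- | --- |",
    ]
    for t in items:
        lines.append(f"| {t.date} | {t.title} | {t.tokens} | {t.words} |")

    lines += ["", "## Transcripts", ""]
    for t in items:
        lines += [
            f"### {t.title}",
            "",
            f"Date: {t.date}",
            f"Author: {t.author}",
            f"Source: {t.url}",
            "",
            t.text,
            "",
        ]
    return "\n".join(lines)


# Placeholder data; the token and word counts are made-up illustration values.
demo = [
    Transcript("Episode 2", "2025-01-15", "Example Host", "https://example.com/2",
               "Second transcript text.", 5, 3),
    Transcript("Episode 1", "2024-03-01", "Example Host", "https://example.com/1",
               "First transcript text.", 5, 3),
]
print(render_consolidated("example-channel", "2025-02-01", demo))
```

Putting the totals and the table of contents first lets a reader, or a model, see what the file contains before reaching the transcript text.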

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| `ModuleNotFoundError: No module named 'tiktoken'` | tiktoken not installed | `pip install tiktoken` |
| `No transcripts found` | Empty transcripts folder | Run `ytscriber download` first |
| `FileNotFoundError` | Channel doesn't exist | Check `ls ~/Documents/YTScriber/` for valid names |
| Output file is small | Few transcripts available | Use `--verbose` to see what was included |
| Token count seems wrong | Old tiktoken version | `pip install --upgrade tiktoken` |
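The first three rows can be checked before running the script. The sketch below is a hypothetical preflight helper, not part of the script. It assumes transcripts are `.md` files somewhere under the channel folder and that any file ending in `-consolidated.md` is previous output; adjust both assumptions to your actual layout.

```python
import importlib.util
from pathlib import Path
from typing import List


def preflight(channel: str, data_dir: Path) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []

    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("tiktoken") is None:
        problems.append("tiktoken is not installed: pip install tiktoken")

    channel_dir = data_dir / channel
    if not channel_dir.is_dir():
        problems.append(f"Channel folder not found: {channel_dir}")
        return problems

    # Assumption: transcripts are .md files; earlier consolidated output is ignored.
    transcripts = [
        p for p in channel_dir.rglob("*.md")
        if not p.name.endswith("-consolidated.md")
    ]
    if not transcripts:
        problems.append(f"No transcripts found in {channel_dir}; run ytscriber download first")
    return problems


if __name__ == "__main__":
    for problem in preflight("library-of-minds", Path.home() / "Documents" / "YTScriber"):
        print(problem)
```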

Common Mistakes

- Wrong channel name: use the folder name exactly as shown by `ls ~/Documents/YTScriber/`, not the YouTube channel name.
- Forgetting to download transcripts first: consolidation requires transcripts to exist. Run `ytscriber download` first.
- Using too high a limit: if the consolidated file exceeds your LLM's context window, the model may reject the input or truncate it. Use the limit guide above.
- Expecting real-time updates: the consolidated file is a snapshot. Re-run consolidation after downloading new transcripts.

Reference

- Transcripts are sorted newest first (descending by date)
- Files without a date in the filename are placed last
- Token counting uses tiktoken's `cl100k_base` encoding, the GPT-4 tokenizer. Claude uses a different tokenizer, so treat the counts as approximate for Claude
- Consolidated files are gitignored (not committed)
- Re-running overwrites the previous consolidated file
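The ordering and limit rules listed above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes filenames carry a `YYYY-MM-DD` date, that undated files are ordered by name, and that selection stops at the first transcript that would exceed the limit. All function names are hypothetical. The demo uses a whitespace word counter so it runs without tiktoken; `tiktoken_counter` shows how a `cl100k_base` counter could be swapped in.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: dates appear in filenames as YYYY-MM-DD.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def order_newest_first(filenames: List[str]) -> List[str]:
    """Dated files first, newest to oldest; undated files last, by name."""
    dated, undated = [], []
    for name in filenames:
        match = DATE_RE.search(name)
        if match:
            dated.append((match.group(0), name))
        else:
            undated.append(name)
    # ISO dates sort correctly as plain strings.
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in dated] + sorted(undated)


def select_within_limit(
    ordered: List[str],
    texts: Dict[str, str],
    limit: int,
    count_tokens: Callable[[str], int],
) -> Tuple[List[str], int]:
    """Take transcripts in order until the next one would exceed the limit."""
    selected, used = [], 0
    for name in ordered:
        cost = count_tokens(texts[name])
        if used + cost > limit:
            break  # assumption: stop here rather than skip ahead to smaller files
        selected.append(name)
        used += cost
    return selected, used


def tiktoken_counter() -> Callable[[str], int]:
    """Counter backed by tiktoken's cl100k_base encoding (requires tiktoken)."""
    import tiktoken  # imported lazily so the demo below runs without it
    encoding = tiktoken.get_encoding("cl100k_base")
    return lambda text: len(encoding.encode(text))


texts = {
    "2024-03-01-a.md": "four five",
    "notes.md": "six",
    "2025-01-15-b.md": "one two three",
}
ordered = order_newest_first(list(texts))
print(ordered)  # ['2025-01-15-b.md', '2024-03-01-a.md', 'notes.md']

# Stand-in counter: whitespace-separated words instead of real tokens.
print(select_within_limit(ordered, texts, 4, lambda t: len(t.split())))
# (['2025-01-15-b.md'], 3)
```

Because the newest transcripts are taken first, a low limit keeps recent material and drops older and undated files.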