YouTube Summary Skill

Generate Markdown learning materials from Japanese podcast episodes on YouTube using command-line tools (curl, yt-dlp, whisper-cli, awk) instead of Python scripts.

Configuration

•Data directory: $PODPILOT_DATA (set this environment variable to your data directory; written as <data_dir> below)
•Whisper model: $WHISPER_MODEL_PATH (path to a whisper.cpp ggml model such as ggml-base.bin)
•Channels: configured in config/podcasts.json

Getting Channel IDs

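Look up each channel's ID (a string beginning with UC) on its YouTube channel page. It is the same value passed as channel_id in the RSS feed URL in step 1. Add each channel to config/podcasts.json: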

```json
{
  "youtube_channels": [
    {
      "channel_name_short": "example",
      "channel_name_long": "Example Podcast",
      "channel_id": "UCxxxxxxxxxxxxxxxxxx"
    }
  ]
}
```
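Before adding an entry, it can help to sanity-check the ID and build the feed URL used in step 1. The following is a minimal sketch. It assumes channel IDs follow the usual shape of UC followed by 22 letters, digits, "-" or "_". The name channel_feed_url is a hypothetical helper, not part of any tool.

```bash
# channel_feed_url (hypothetical helper): validate a YouTube channel ID and
# print its RSS feed URL. Assumes the common ID shape "UC" + 22 characters
# from [A-Za-z0-9_-]; anything else is rejected with exit status 1.
channel_feed_url() {
  local id="$1"
  if ! printf '%s\n' "$id" | grep -Eq '^UC[A-Za-z0-9_-]{22}$'; then
    echo "not a channel ID: $id" >&2
    return 1
  fi
  printf 'https://www.youtube.com/feeds/videos.xml?channel_id=%s\n' "$id"
}
```

For example, channel_feed_url UC_NROu3WWx1KZ7tNl275F7A prints the feed URL for the sjn channel used in Example Usage.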

Workflow

1. Fetch Episodes from RSS

```bash
curl -s "https://www.youtube.com/feeds/videos.xml?channel_id=<CHANNEL_ID>"
```

The response is an Atom feed in which each <entry> element is one video. From each entry, extract:

•<yt:videoId> - Video ID
•<title> - Episode title
•<published> - Publish date
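Below is a minimal line-based sketch of this parsing step in awk. It assumes each XML element sits on its own line and emits a tab-separated listing (video ID, YYYY-MM-DD date, title), which is a hypothetical intermediate format. The name parse_feed is also hypothetical. It is not a general XML parser, and XML entities such as &amp; in titles are left as-is.

```bash
# parse_feed (hypothetical helper): read a YouTube channel feed on stdin and
# print one tab-separated line per <entry>: video_id, date (YYYY-MM-DD), title.
# Line-based sketch: assumes one element per line; entities stay escaped.
parse_feed() {
  awk '
    # Return the text between the first ">" and the last "<" on a line.
    function inner(line,   s) {
      s = line
      sub(/^[^>]*>/, "", s)
      sub(/<[^<]*$/, "", s)
      return s
    }
    /<entry>/                  { in_entry = 1; id = ""; title = ""; date = "" }
    in_entry && /<yt:videoId>/ { id = inner($0) }
    in_entry && /<title>/      { title = inner($0) }   # does not match <media:title>
    in_entry && /<published>/  { date = substr(inner($0), 1, 10) }
    /<\/entry>/ {
      if (id != "") printf "%s\t%s\t%s\n", id, date, title
      in_entry = 0
    }
  '
}
```

Usage: curl -s "https://www.youtube.com/feeds/videos.xml?channel_id=<CHANNEL_ID>" | parse_feed. The feed's own <title> and <published> (outside any <entry>) are ignored.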

2. Download Audio

```bash
yt-dlp -x --audio-format mp3 --audio-quality 0 \
  -o "<data_dir>/<channel>/<date>_<title>_<video_id>.%(ext)s" \
  "https://www.youtube.com/watch?v=<VIDEO_ID>"
```
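Episode titles can contain characters such as "/" that would break the -o path. One way to build the shared <date>_<title>_<video_id> stem safely is sketched below. The name episode_base is hypothetical, and the set of characters replaced with "_" is an assumption chosen for portable filenames. Alternatively, yt-dlp can fill in the title and ID itself through output-template fields such as %(title)s and %(id)s.

```bash
# episode_base (hypothetical helper): build "<data_dir>/<channel>/<date>_<title>_<video_id>".
# Assumption: path separators, shell/filesystem-unfriendly characters and
# whitespace in the title are replaced with "_".
episode_base() {
  local data_dir="$1" channel="$2" date="$3" title="$4" video_id="$5"
  local safe_title
  safe_title=$(printf '%s' "$title" | sed 's#[/\\:*?"<>|[:space:]]#_#g')
  printf '%s/%s/%s_%s_%s\n' "$data_dir" "$channel" "$date" "$safe_title" "$video_id"
}
```

For example, episode_base "$PODPILOT_DATA" sjn 2024-05-01 "<title>" <VIDEO_ID> gives a stem to which .%(ext)s can be appended for the -o argument.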

3. Transcribe with Whisper

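Run whisper-cli (from whisper.cpp) with both TXT and SRT output. Use the MP3's path without its extension as the output base so the transcripts share the audio file's name:

```bash
whisper-cli -m "$WHISPER_MODEL_PATH" -l ja \
  -f "<data_dir>/<channel>/<date>_<title>_<video_id>.mp3" \
  --output-txt --output-srt \
  -of "<data_dir>/<channel>/<date>_<title>_<video_id>"
```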
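Because whisper-cli appends .txt and .srt to the -of value, the output base can be derived from the MP3 path by dropping its extension. The wrapper below sketches this and then checks that both transcripts were written. The name transcribe_episode is hypothetical. It assumes WHISPER_MODEL_PATH is set as described under Configuration and that your whisper-cli build accepts the audio format you pass it.

```bash
# transcribe_episode (hypothetical helper): transcribe one audio file and
# confirm that <base>.txt and <base>.srt exist and are non-empty.
# Prints the output base on success; returns 1 on any failure.
transcribe_episode() {
  local audio="$1"
  local base="${audio%.*}"   # strip the extension: dir/ep.mp3 -> dir/ep
  whisper-cli -m "$WHISPER_MODEL_PATH" -l ja \
    -f "$audio" \
    --output-txt --output-srt \
    -of "$base" || return 1
  local ext
  for ext in txt srt; do
    if [ ! -s "$base.$ext" ]; then
      echo "missing or empty: $base.$ext" >&2
      return 1
    fi
  done
  printf '%s\n' "$base"
}
```

The printed base can be reused directly for step 4, where the SRT file is "$base.srt" and the linked transcript is "${base}_linked.txt".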

4. Create Linked Transcript

Convert SRT to transcript with clickable YouTube timestamps:

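```bash
# Replace <video_id>, <srt_file> and <output> with the episode's values.
VIDEO_ID="<video_id>"
awk -v vid="$VIDEO_ID" '
# Paragraph mode: each blank-line-separated SRT cue is one record,
# and each line of the cue is one field.
BEGIN { RS = ""; FS = "\n" }
NF >= 3 {
    # $1 = cue number, $2 = "HH:MM:SS,mmm --> HH:MM:SS,mmm", $3.. = text lines
    split($2, times, " --> ")
    start_time = times[1]
    gsub(",", ".", start_time)                  # 00:00:05,760 -> 00:00:05.760
    split(start_time, parts, ":")
    # YouTube t= takes whole seconds, so truncate the fraction.
    seconds = int(parts[1] * 3600 + parts[2] * 60 + parts[3])
    text = ""
    for (i = 3; i <= NF; i++) {
        line = $i
        sub(/^[ \t]+/, "", line)                # trim leading spaces in cue text
        if (text != "") text = text "\n"
        text = text line
    }
    printf "[%s] https://www.youtube.com/watch?v=%s&t=%d\n%s\n\n", start_time, vid, seconds, text
}
' "<srt_file>" > "<output>_linked.txt"
```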

Output format:

```text
[00:00:00.000] https://www.youtube.com/watch?v=VIDEO_ID&t=0
皆さんこんにちは...

[00:00:05.760] https://www.youtube.com/watch?v=VIDEO_ID&t=5
このチャンネルでは...
```

5. Generate Lesson

Use the /japanese-lesson skill format to analyze the transcript and create:

•Summary (Japanese with furigana + English)
•Vocabulary tables by JLPT level (N1→N5)
•Grammar points with examples
•Reading comprehension with context clues
•10-question quiz

Save to: <data_dir>/<channel>/<date>_<title>_<video_id>_lesson.md

Output Files

For each episode, create:

```text
<data_dir>/<channel>/
├── <date>_<title>_<video_id>.mp3         # Audio
├── <date>_<title>_<video_id>.txt         # Plain transcript
├── <date>_<title>_<video_id>.srt         # Subtitles
├── <date>_<title>_<video_id>_linked.txt  # Transcript with YouTube links
└── <date>_<title>_<video_id>_lesson.md   # Lesson (vocab, grammar, quiz)
```
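The file list above doubles as a checklist for the final "report files created" step. Below is a minimal sketch under the naming scheme shown. The name report_outputs is a hypothetical helper that takes the shared <data_dir>/<channel>/<date>_<title>_<video_id> stem.

```bash
# report_outputs (hypothetical helper): for one episode stem, print each
# expected output file as "ok" or "missing"; return 1 if any is missing or empty.
report_outputs() {
  local base="$1" status=0 f
  for f in "$base.mp3" "$base.txt" "$base.srt" "${base}_linked.txt" "${base}_lesson.md"; do
    if [ -s "$f" ]; then
      printf 'ok       %s\n' "$f"
    else
      printf 'missing  %s\n' "$f"
      status=1
    fi
  done
  return "$status"
}
```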

Example Usage

User: "Process the latest sjn episode"

•Fetch RSS for sjn (UC_NROu3WWx1KZ7tNl275F7A)
•Show available episodes, let user pick
•Download audio with yt-dlp
•Transcribe with whisper-cli (creates .txt and .srt)
•Convert SRT to linked transcript (_linked.txt)
•Generate lesson using /japanese-lesson format
•Report files created
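The "show episodes, let the user pick" steps can be sketched as follows. This assumes the episode list is held as tab-separated lines of video ID, YYYY-MM-DD publish date and title, which is a hypothetical intermediate format. The names list_episodes and pick_episode are also hypothetical. Sorting on the date column means "latest" does not depend on the order in which the feed lists its entries.

```bash
# Input (assumed format), one episode per line:
#   <video_id> TAB <YYYY-MM-DD> TAB <title>

# list_episodes (hypothetical): number episodes newest first for display.
list_episodes() {
  local tab
  tab=$(printf '\t')
  sort -t "$tab" -k2,2r | awk -F '\t' '{ printf "%d. %s  %s  (%s)\n", NR, $2, $3, $1 }'
}

# pick_episode N (hypothetical): print the video ID of entry N in that same order.
pick_episode() {
  local tab
  tab=$(printf '\t')
  sort -t "$tab" -k2,2r | awk -F '\t' -v n="$1" 'NR == n { print $1 }'
}
```

With this ordering, "the latest episode" is simply pick_episode 1.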

Notes

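•Transcription takes roughly 1 minute per 10 minutes of audio; actual speed depends on hardware and on the Whisper model used
•Lesson generation is done by Claude directly (no external LLM call)
•Each link in a linked transcript opens the YouTube video at that cue's start time, truncated to the whole second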
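For planning a batch, the ratio above can be turned into a quick estimate. This sketch applies the rough 1:10 figure and rounds up, so treat its output only as a ballpark. The name estimate_transcription_minutes is hypothetical. The audio length in seconds could come from a tool such as ffprobe (not shown).

```bash
# estimate_transcription_minutes (hypothetical): rough wall-clock minutes for
# transcribing audio of the given length in seconds, assuming ~1 minute of
# processing per 600 seconds of audio, rounded up.
estimate_transcription_minutes() {
  local seconds="$1"
  echo $(( (seconds + 599) / 600 ))
}
```

For example, a 45-minute episode (2700 seconds) gives an estimate of 5 minutes.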