YouTube Transcriber Pipeline - Claude Code Skill
Version: 1.0.0 | Status: Ready for Claude Code Integration
Overview
Complete 4-skill pipeline for extracting, formatting, organizing, and archiving YouTube transcripts. Integrates with Claude Code CLI using the system's configured API keys (no manual key management needed).
Capabilities
This skill provides a complete workflow:
- •Extract Facts - AI-powered fact extraction from transcripts (uses Claude API via Claude Code)
- •Format Notes - Convert facts to atomic markdown notes with wiki-links
- •Integrate Obsidian - Import notes into Obsidian vault with backlinks
- •Archive Transcripts - Safe archival with integrity verification
Usage in Claude Code
# Run complete pipeline /run-youtube-transcriber --transcript ~/Downloads/transcript.txt # Just extract facts /extract-transcript-facts ~/Downloads/transcript.txt # Just format notes /format-transcript-notes ~/Downloads/notes/ --output ./formatted # Just integrate Obsidian /integrate-obsidian-vault ./notes/ --vault-path ~/Documents/Obsidian/Aaron # Just archive /archive-transcripts ~/Downloads/transcript.txt
Features
Phase 1: Extract Facts (AI-Powered)
- •Uses Claude 3.5 Sonnet via Claude Code CLI
- •7 fact categories (scientific, quotes, methods, examples, statistics, warnings, connections)
- •Hierarchical topic organization
- •Confidence scoring (0.0-1.0)
- •Automatic metadata (actionable, philosophical, controversial)
- •No API key needed - Uses Claude Code's configured authentication
Phase 2: Format as Atomic Notes
- •1 fact = 1 markdown file
- •Wiki-link cross-references [[Topic Name]]
- •YAML front matter with metadata
- •Topic indices and hierarchies
- •Obsidian vault compatibility
Phase 3: Integrate Obsidian
- •Safe vault backup before changes
- •Preserve directory structure
- •Auto-generate backlinks
- •Master index creation
- •Link verification
- •Dry-run mode for testing
Phase 4: Archive Transcripts
- •tar.gz compression
- •MD5 integrity verification
- •Metadata tracking
- •Optional cleanup
- •Verify-only mode
Installation
The skill is ready to use once copied to Claude Code:
# Copy to Claude Code skills directory cp -r ~/github/astoreyai/claude-skills/ ~/.claude/skills/youtube-transcriber # Or individual components cp -r youtube-transcript-extractor ~/.claude/skills/ cp -r transcript-to-logseq ~/.claude/skills/ cp -r transcript-to-obsidian ~/.claude/skills/ cp -r transcript-archiver ~/.claude/skills/
Requirements
For Claude Code CLI
- •Claude Code installed and configured
- •API key already configured in Claude Code (no manual setup needed)
- •Python 3.11+
For Obsidian Integration
- •Obsidian vault at
~/Documents/Obsidian/Aaron/(configurable) - •Read/write permissions to vault directory
Architecture
Claude Code CLI
↓
Transcript Extractor (Phase 1)
├─ Uses Claude Code's API key automatically
├─ Extracts facts via Claude API
└─ Outputs Fact objects
↓
Logseq Formatter (Phase 2)
├─ Converts facts to atomic notes
└─ Creates markdown files
↓
Obsidian Integrator (Phase 3)
├─ Imports notes to vault
├─ Generates backlinks
└─ Creates indices
↓
Transcript Archiver (Phase 4)
├─ Compresses files
├─ Verifies integrity
└─ Stores archive
Future: Ollama Integration
The architecture is designed to support local LLM models via Ollama:
# Future upgrade (when Ollama integration ready) /run-youtube-transcriber --transcript ~/Downloads/transcript.txt \ --llm-backend ollama \ --model mistral-7b # Benefits: # - No API costs # - Runs completely locally # - Works offline # - Full privacy (data never leaves your machine)
Swapping backends will be as simple as changing a configuration parameter.
Testing
All 4 skills are fully tested:
- •Phase 1: 18 tests (extraction logic)
- •Phase 2: 21 tests (formatting, hierarchy)
- •Phase 3: 6 tests (integration, backlinks)
- •Phase 4: 18 tests (archival, verification)
- •Integration: 6 tests (cross-skill compatibility)
Total: 69/69 tests passing (100%)
Performance
- •Fact extraction: 50-100ms per fact (via Claude API)
- •Note formatting: ~500ms per note
- •Obsidian integration: <1s for 100 notes
- •Archival: <2s for 1000 files
Data Flow
Raw Transcript (txt/md)
↓
Phase 1: Extract Facts
→ Fact objects with metadata
↓
Phase 2: Format Notes
→ Markdown files in ~/output/notes/
↓
Phase 3: Integrate Obsidian
→ Updated ~/Documents/Obsidian/Aaron/
↓
Phase 4: Archive
→ ~/youtube-transcriber/transcripts-archive/
Example Workflow
1. Extract Facts
/extract-transcript-facts ~/Downloads/ml-lecture.txt
Output:
✓ Transcript loaded (5,234 chars) ✓ Extracting facts... ✓ Facts extracted: 12 ✓ Topics identified: 4 ✓ Accuracy: 92%
2. Format Notes
/format-transcript-notes ~/Downloads/ml-lecture.txt --output ./notes
Output:
✓ Creating atomic notes... ✓ Notes created: 12 ✓ Topics structured: 4 ✓ Output: ./notes/
3. Integrate Obsidian
/integrate-obsidian-vault ./notes/ --vault-path ~/Documents/Obsidian/Aaron
Output:
✓ Vault path: ~/Documents/Obsidian/Aaron ✓ Notes integrated: 12 ✓ Backlinks created: 8 ✓ Index created: Video-Transcripts/index.md
4. Archive
/archive-transcripts ~/Downloads/ml-lecture.txt
Output:
✓ Files archived: 1 ✓ Archive: ~/youtube-transcriber/transcripts-archive/transcripts_20251122_143200.tar.gz ✓ Size: 45.2 KB
Configuration
Edit config.yaml in the skill directory:
# Paths vault_path: "~/Documents/Obsidian/Aaron" archive_dir: "~/youtube-transcriber/transcripts-archive" # Extraction max_facts: null # No limit confidence_threshold: 0.75 extract_all_categories: true # Formatting notes_per_file: 1 # Atomic notes add_wiki_links: true create_indices: true # Integration backup_before_integrate: true auto_generate_backlinks: true dry_run_mode: false # Set true to test without changes # Archival compression: "gzip" verify_after_archive: true cleanup_originals: false
Troubleshooting
API Key Issues (If Running Standalone)
If you run the scripts outside Claude Code and get API key errors:
# Claude Code integration (recommended) # Just use /extract-transcript-facts - no setup needed # Or manually set API key for standalone use export ANTHROPIC_API_KEY='sk-ant-...' python workflow_runner.py --transcript your_file.txt
Obsidian Vault Not Found
# Create vault directory if needed mkdir -p ~/Documents/Obsidian/Aaron # Set custom vault path /integrate-obsidian-vault ./notes/ --vault-path /your/vault/path
Permission Errors
# Ensure write access to vault and archive directories chmod -R 755 ~/Documents/Obsidian/Aaron mkdir -p ~/youtube-transcriber/transcripts-archive chmod -R 755 ~/youtube-transcriber
Files Included
youtube-transcriber/ ├── youtube-transcript-extractor/ Phase 1 Skill │ ├── src/ │ │ ├── extractor.py │ │ ├── formatters.py │ │ └── metadata.py │ └── tests/ (18 tests) │ ├── transcript-to-logseq/ Phase 2 Skill │ ├── src/ │ │ ├── logseq_formatter.py │ │ ├── obsidian_compat.py │ │ └── hierarchy_builder.py │ └── tests/ (21 tests) │ ├── transcript-to-obsidian/ Phase 3 Skill │ ├── src/ │ │ └── integrator.py │ └── tests/ (6 tests) │ ├── transcript-archiver/ Phase 4 Skill │ ├── src/ │ │ └── archiver.py │ └── tests/ (18 tests) │ ├── workflow_runner.py Full pipeline runner ├── integration_test.py Cross-skill tests (6 tests) └── config.yaml Configuration
Future Roadmap
v1.1.0 (Ollama Integration)
- • Swap Claude API → Ollama for local LLMs
- • Support multiple local models (Mistral, Llama2, etc)
- • Offline-first workflow
v1.2.0 (Advanced Features)
- • Batch transcript processing
- • Custom fact categories
- • Multiple output formats
- • Web UI for management
v1.3.0 (Integration)
- • Direct YouTube URL support (fetch transcripts automatically)
- • Streaming support for long transcripts
- • Multi-language support
Support
- •Documentation: See README.md in each skill directory
- •Tests: Run
python -m pytest tests/in each directory - •Issues: Check integration_test.py for validation
License
Same as parent project (astoreyai)
Status: ✅ Ready for Claude Code integration
Works With:
- •✅ Claude Code CLI (uses configured API key automatically)
- •✅ Ollama (local models - future version)
- •✅ Custom LLM backends (extensible architecture)
No API key management needed when used with Claude Code CLI