YouTube Notes — Project Skill
Project Context
- App: Single Streamlit script at app.py. Run: streamlit run app.py from repo root.
- Flow: Paste URLs → extract video ID (regex for youtube.com/watch?v=ID, youtu.be/ID) → fetch transcript via YouTubeTranscriptApi.get_transcript → summarize with Groq (llama-3.1-8b-instant).
- Env: GROQ_API_KEY in .env; if unset, app shows transcript only and warns.
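
The ID-extraction step of the flow could look roughly like the sketch below. The names YOUTUBE_ID_PATTERN and extract_video_id come from these notes, but the pattern body, the 11-character ID length, and the None return for bad input are assumptions for illustration, not the actual contents of app.py.

```python
import re
from typing import Optional

# Assumed pattern: covers youtube.com/watch?v=ID and youtu.be/ID only.
# The 11-character ID length is an assumption about typical video IDs.
YOUTUBE_ID_PATTERN = re.compile(
    r"(?:youtube\.com/watch\?(?:.*&)?v=|youtu\.be/)([A-Za-z0-9_-]{11})"
)

INVALID_URL_MSG = "Invalid YouTube URL."


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID found in url, or None if no supported format matches."""
    match = YOUTUBE_ID_PATTERN.search(url.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    for candidate in (
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        "https://youtu.be/dQw4w9WgXcQ",
        "https://example.com/video",
    ):
        # Fall back to the user-facing message when nothing matches.
        print(candidate, "->", extract_video_id(candidate) or INVALID_URL_MSG)
```

Returning None and mapping it to the user-facing message at the call site keeps the regex logic separate from what the UI displays.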
When Adding Features
- New URL formats: Extend the regex in YOUTUBE_ID_PATTERN or the logic in extract_video_id; keep invalid-URL handling and user-facing "Invalid YouTube URL."
- New LLM backend (e.g. Ollama): Add a branch in the summarization path (e.g. check env like USE_OLLAMA or model choice); call Ollama API or local endpoint with the same prompt used for Groq; keep transcript fetch and error handling unchanged.
- Export (copy / download): Add a Streamlit control (e.g. st.download_button or copy-to-clipboard) that uses the current notes text; do not change transcript or summarization logic.
- Transcript-only mode: Already supported when GROQ_API_KEY is unset; optional: add an explicit "Show transcript only" toggle that skips the Groq call when set.
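
One way to keep the backend branch and the transcript-only toggle in a single place is a small selector function, sketched below. The function name choose_summarizer, the accepted truthy values for USE_OLLAMA, and the precedence order (toggle first, then Ollama, then Groq) are all hypothetical choices, not behavior taken from app.py.

```python
import os
from typing import Mapping, Optional

# Assumed set of values that count as "on" for USE_OLLAMA.
TRUTHY = {"1", "true", "yes", "on"}


def choose_summarizer(
    env: Optional[Mapping[str, str]] = None,
    transcript_only: bool = False,
) -> str:
    """Return "transcript_only", "ollama", or "groq" for the current settings."""
    env = os.environ if env is None else env

    # An explicit "Show transcript only" toggle skips every LLM call.
    if transcript_only:
        return "transcript_only"

    # Hypothetical opt-in flag for a local Ollama backend.
    if env.get("USE_OLLAMA", "").strip().lower() in TRUTHY:
        return "ollama"

    # Groq is used only when a key is configured.
    if env.get("GROQ_API_KEY"):
        return "groq"

    # No key and no Ollama: fall back to showing the transcript and a warning.
    return "transcript_only"
```

The summarization path can then branch on the returned string and pass the same prompt to whichever backend was selected, leaving transcript fetching untouched.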
Conventions to Preserve
- Transcript: youtube-transcript-api only; no YouTube Data API, no Whisper unless explicitly requested.
- One summarization prompt tuned for programming notes (bullets, concepts, takeaways).
- ~1.5 s delay between transcript requests for multiple URLs.
- Errors: "This video has no captions available." for missing transcript; "Invalid YouTube URL." for bad URLs.
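
The delay and error-message conventions above might be combined in a loop like the following sketch. The function fetch_transcripts and its fetch and sleep parameters are hypothetical; the transcript call is injected so the example runs without network access, and catching a broad Exception is a simplification where the real app would more likely catch the transcript library's own exception classes.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

NO_CAPTIONS_MSG = "This video has no captions available."
REQUEST_DELAY_S = 1.5  # pause between consecutive transcript requests

Result = Tuple[str, Optional[str], Optional[str]]  # (video_id, transcript, error)


def fetch_transcripts(
    video_ids: Iterable[str],
    fetch: Callable[[str], str],
    delay: float = REQUEST_DELAY_S,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Result]:
    """Fetch transcripts one by one, pausing between requests."""
    results: List[Result] = []
    for index, video_id in enumerate(video_ids):
        if index > 0:
            sleep(delay)  # no pause before the first request
        try:
            results.append((video_id, fetch(video_id), None))
        except Exception:
            # Simplification: any fetch failure maps to the captions message.
            results.append((video_id, None, NO_CAPTIONS_MSG))
    return results
```

Passing sleep as a parameter lets tests record the pauses instead of actually waiting, and a failure on one URL does not stop the remaining URLs from being processed.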