Zight Content Analyzer
Analyze Zight screenshots and screen recordings to understand bug reports, feature requests, and user feedback.
Prerequisites
Before processing video content, verify dependencies are installed:
which ffmpeg        # Required for frame extraction and audio conversion
which ffprobe       # Required for video metadata (installed with ffmpeg)
which whisper-cli   # Required for audio transcription
If missing:
- •ffmpeg: brew install ffmpeg (macOS) or apt install ffmpeg (Linux)
- •whisper-cli: brew install whisper-cpp (macOS), then download a model (--create-dirs creates the models directory if it does not exist yet):
curl -L --create-dirs -o ~/.local/share/whisper-cpp/models/ggml-base.en.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"
Image analysis requires no external dependencies.
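The checks above can be wrapped in a small helper so that a missing tool is reported once, up front. This is a minimal sketch, not part of the original workflow: the function name missing_deps is hypothetical, and it uses command -v, the POSIX equivalent of which.

```shell
# Print the name of every tool passed as an argument that is NOT on PATH.
# missing_deps is a hypothetical helper name used only for this sketch.
missing_deps() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_deps ffmpeg ffprobe whisper-cli)"
if [ -n "$missing" ]; then
  # Word splitting on $missing is intentional: one tool name per word.
  echo "Missing tools:" $missing
fi
```

An empty result means the full video path is available; otherwise the fallbacks listed under Error Handling apply.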
Quick Reference
- •oEmbed endpoint: https://oembed.zight.com/oembed?url=URL&format=json
- •Whisper model: ~/.local/share/whisper-cpp/models/ggml-base.en.bin
- •Whisper CLI: whisper-cli
- •Temp directory: /tmp/zight-${session_id}/
See reference.md for Zight page structure and field documentation.
Step 1: Detect Content Type
Fetch the oEmbed metadata to determine if this is an image or video:
curl -sL "https://oembed.zight.com/oembed?url=ZIGHT_URL&format=json"
Check the type field:
- •"photo" → Check the title; if it starts with "Zight Recording" it may be an animated GIF (see Step 2a)
- •"video" or "rich" → Video path (Step 2b)
If oEmbed fails, fall back to fetching the share page with WebFetch and extracting the content type from the page metadata.
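The routing rule above can be sketched as a small function that takes the type and title fields and names the path to follow. The function name route_content and its output labels are hypothetical, chosen only for illustration. How the fields are pulled out of the JSON is left open: jq is one option, but it is not among the listed prerequisites, so treat it as an assumption.

```shell
# Map the oEmbed "type" and "title" fields to an analysis path.
# route_content and the labels it prints are hypothetical names for this sketch.
route_content() {
  kind="$1"
  title="$2"
  case "$kind" in
    photo)
      case "$title" in
        "Zight Recording"*) echo "image-maybe-gif" ;;  # Step 2a, possibly an animated GIF
        *)                  echo "image" ;;            # Step 2a
      esac
      ;;
    video|rich) echo "video" ;;    # Step 2b
    *)          echo "unknown" ;;  # fall back to fetching the share page
  esac
}

route_content photo "Zight Recording 2024-01-01"
route_content rich  "Demo"
```

With jq available, the two arguments could come from jq -r '.type' and jq -r '.title' applied to the saved oEmbed response.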
Step 2a: Image / GIF Analysis
- •Extract the direct image URL from the oEmbed url field
- •Download: curl -sL -o /tmp/zight-img.jpg "CDN_URL"
- •Read the file to view it (Claude's vision will analyze the image)
- •Describe what is shown, focusing on:
- •UI state, error messages, unexpected behavior
- •Which part of the application is visible
- •Any annotations or highlights the user added
Note: Titles containing "Zight Recording" indicate animated content (GIFs or
screen recordings). GIFs returned as type: "photo" can still be viewed as
static images — Claude sees the first frame.
Return the analysis to the caller. Done.
Step 2b: Video / Screen Recording Analysis
2b.1: Extract metadata and download
WebFetch the share page to extract the video metadata. Look for the
Copernicus or gon config object in the page:
- •content_url → direct MP4 download URL
- •transcription.data → existing transcript (may be null)
- •name → original filename with context
Create a working directory and download the video:
SESSION_DIR="/tmp/zight-$(date +%s)"
mkdir -p "$SESSION_DIR/frames"
curl -sL -o "$SESSION_DIR/video.mp4" "CONTENT_URL"
2b.2: Get video info
ffprobe -v quiet -print_format json -show_format -show_streams "$SESSION_DIR/video.mp4"
Note the duration to decide frame extraction rate:
- •Under 30s: 1 frame every 2 seconds
- •30s-120s: 1 frame every 3 seconds
- •Over 120s: 1 frame every 5 seconds
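The duration thresholds above translate directly into a lookup. The sketch below assumes the duration arrives as a decimal number of seconds, as in the format.duration field of the ffprobe JSON, and treats exactly 30 and exactly 120 seconds as part of the middle band. The function name frame_interval is hypothetical.

```shell
# Print the sampling interval in seconds for a video of the given duration.
# awk handles the comparison because ffprobe reports fractional seconds.
frame_interval() {
  awk -v d="$1" 'BEGIN {
    if (d < 30)        print 2
    else if (d <= 120) print 3
    else               print 5
  }'
}

frame_interval 12.48
frame_interval 95
frame_interval 300.2
```

The printed value is what replaces INTERVAL in the fps=1/INTERVAL filter of the next step.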
2b.3: Extract key frames
ffmpeg -i "$SESSION_DIR/video.mp4" \
  -vf "fps=1/INTERVAL" \
  -q:v 2 \
  "$SESSION_DIR/frames/frame-%04d.jpg" 2>&1
2b.4: Transcribe audio
First check if the page metadata had a transcript. If yes, use that and skip to step 2b.5.
If no transcript exists, extract and transcribe:
# Extract audio as 16kHz mono WAV (whisper-cpp requirement)
ffmpeg -i "$SESSION_DIR/video.mp4" \
  -ar 16000 -ac 1 -c:a pcm_s16le \
  "$SESSION_DIR/audio.wav" 2>&1

# Transcribe with timestamps
whisper-cli \
  --model ~/.local/share/whisper-cpp/models/ggml-base.en.bin \
  --output-txt --output-srt \
  --file "$SESSION_DIR/audio.wav" 2>&1
The SRT file at $SESSION_DIR/audio.wav.srt contains timestamped segments.
The TXT file at $SESSION_DIR/audio.wav.txt contains plain text.
Read both files.
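For lining the transcript up against frames later, it can help to flatten each SRT cue into a single "start time, text" line. The sketch below is one way to do that with awk in paragraph mode. It assumes the standard SRT layout (cue number, then a "start --> end" line, then one or more text lines, with blank lines between cues) and keeps only whole seconds. The function name srt_to_lines is hypothetical.

```shell
# Print one "HH:MM:SS text" line per SRT cue in the given file.
srt_to_lines() {
  awk 'BEGIN { RS = ""; FS = "\n" }
  {
    # $1 = cue number, $2 = "start --> end", $3 and later = text lines
    split($2, t, " --> ")
    start = substr(t[1], 1, 8)      # drop the ",mmm" milliseconds
    text = ""
    for (i = 3; i <= NF; i++) {
      line = $i
      sub(/^ +/, "", line)          # trim leading spaces on each text line
      text = (text == "" ? line : text " " line)
    }
    print start " " text
  }' "$1"
}
```

A typical call would be srt_to_lines "$SESSION_DIR/audio.wav.srt".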
2b.5: Analyze frames visually
Read each extracted frame image. For each frame, note:
- •What screen/page is shown
- •Any error messages, toasts, modals
- •What the user is interacting with (cursor position, active elements)
- •State changes from previous frame
Correlate frame timestamps with transcript timestamps to understand what the user was describing at each moment.
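To make that correlation concrete, a frame number can be converted into an approximate timestamp in the same HH:MM:SS form that SRT cues use. The sketch assumes frame N of the frame-%04d.jpg sequence falls at roughly (N - 1) x INTERVAL seconds; ffmpeg's frame selection may shift the true position somewhat, so treat the result as an estimate. The function name frame_time is hypothetical.

```shell
# Print the approximate HH:MM:SS position of a frame.
#   $1 = frame number as it appears in the filename (e.g. 0008)
#   $2 = sampling interval in whole seconds
frame_time() {
  # Strip leading zeros so shell arithmetic does not read "0008" as octal.
  n=${1#"${1%%[!0]*}"}
  n=${n:-0}
  secs=$(( (n - 1) * $2 ))
  printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

frame_time 0008 3   # frame-0008.jpg sampled every 3 seconds
```

A frame's estimated time can then be matched against the transcript segment whose start and end times bracket it.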
2b.6: Synthesize
Produce a structured summary:
## Screen Recording Analysis

**Duration**: X seconds
**Application area**: [which part of the app]

### Timeline
- **0:00-0:05**: User navigates to [page]. Says: "..."
- **0:05-0:12**: Clicks [button]. Error appears: "..."
- ...

### Bug/Issue Summary
[What the user is reporting, what the expected vs actual behavior is]

### Relevant UI States
[Key observations about the application state visible in frames]

### Suggested Investigation Areas
[Files, components, or systems likely involved]
2b.7: Cleanup
rm -rf "$SESSION_DIR"
Error Handling
- •If oEmbed returns an error, fall back to parsing the share page HTML
- •If ffmpeg is not installed, analyze only the thumbnail (available in oEmbed)
- •If whisper-cli is not installed or fails, provide frame analysis without transcript
- •If the video has no audio track, skip transcription entirely
- •Always clean up temp files, even on error
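One way to guarantee the last point is to run the whole session inside a subshell with an EXIT trap, so the working directory is removed whether the steps succeed or fail. This is a sketch under assumptions: it swaps in mktemp -d for the date-based directory name used in Step 2b.1, and run_session and analyze are hypothetical names.

```shell
# Run a command with a fresh session directory, removing it on any exit.
# The parentheses make the function body a subshell, so the trap fires
# when the function finishes, not when the calling shell exits.
run_session() (
  SESSION_DIR="$(mktemp -d "${TMPDIR:-/tmp}/zight-XXXXXX")" || exit 1
  trap 'rm -rf "$SESSION_DIR"' EXIT
  mkdir -p "$SESSION_DIR/frames"
  "$@" "$SESSION_DIR"   # the session directory is passed as the last argument
)

# Hypothetical stand-in for the real download, extraction, and transcription steps.
analyze() { echo "working in $1"; }
run_session analyze
```

Because cleanup lives in the trap, individual steps can fail early without each needing its own rm -rf.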
Notes
- •Zight CDN subdomains vary (p198, p199, etc.) — always use curl, not WebFetch for downloads
- •Page config is in a <script> tag as window.Copernicus (newer) or window.gon (older)
- •Screen recordings are always MP4
- •Some recordings may have no narration — that's fine, frame analysis alone is valuable