Look At - Multimodal File Analysis
Fast, cost-effective file analysis using Google's Gemini 2.5 Flash Lite model for PDFs, images, diagrams, and other media files.
Tool Selection Enforcement
Rationalization Table - STOP When Thinking:
| Excuse | Reality | Do Instead |
|---|---|---|
| "I can read images directly with Read" | You'll waste thousands of context tokens showing the full image | Use look_at for analysis |
| "I'll use Read for this PDF" | You'll lose table structure and visual information by extracting raw text | Use look_at for PDFs with tables/charts/diagrams |
| "Just a quick glance at the file" | Your quick glances still consume full context tokens | Use look_at for targeted extraction |
| "I need exact text, so Read is required" | Gemini's extraction is accurate for most use cases | Use look_at first; fall back to Read only if the extraction is insufficient |
| "look_at adds complexity" | You gain context savings and faster processing | Use look_at for media files |
| "The file is small" | Small files still waste context when loaded uninterpreted | Choose the tool by content type, not file size |
| "I'll process it myself" | You waste reasoning tokens on trivial extraction | Delegate to look_at |
Red Flags - STOP Immediately When Thinking:
- •If you catch yourself thinking "Let me Read this image/PDF/screenshot" → STOP. Use look_at for media files.
- •If you catch yourself thinking "I can see the image directly" → STOP. Seeing it directly still wastes context. Use look_at.
- •If you catch yourself thinking "Just need to glance at this diagram" → STOP. Glancing still costs context tokens. Use look_at.
- •If you catch yourself thinking "The PDF is text-based, so Read is fine" → STOP. If it has structure/tables/charts, use look_at.
Cost & Context Benefits
| Scenario | Read Tool | look_at Tool |
|---|---|---|
| PDF with table | Extracts raw text (~1000 tokens), loses table structure | Extracts table as structured data (~100 tokens) |
| Screenshot | Loads entire image (~500 tokens), requires interpretation | Describes content (~50 tokens) |
| Diagram | Shows image (~800 tokens), requires analysis | Explains architecture (~100 tokens) |
| Multi-page PDF | All pages loaded (~5000 tokens) | Extracts specific sections (~200 tokens) |
Based on the approximate figures above, look_at saves roughly 88-96% of context tokens by extracting only the relevant information.
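The savings figure can be checked against the table's approximate token counts. The sketch below only redoes that arithmetic in Python; the numbers are the illustrative estimates from the table, not measurements, and `savings_percent` is a hypothetical helper name.

```python
# Approximate token counts copied from the table above: (Read, look_at).
SCENARIOS = {
    "PDF with table": (1000, 100),
    "Screenshot": (500, 50),
    "Diagram": (800, 100),
    "Multi-page PDF": (5000, 200),
}


def savings_percent(read_tokens: int, look_at_tokens: int) -> float:
    """Share of context tokens saved by using look_at instead of Read."""
    return 100.0 * (read_tokens - look_at_tokens) / read_tokens


for name, (read_tokens, look_at_tokens) in SCENARIOS.items():
    print(f"{name}: {savings_percent(read_tokens, look_at_tokens):.1f}% saved")
```

With these estimates the per-scenario savings range from 87.5% (diagram) to 96% (multi-page PDF).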
When to Use
Use look_at when you need to:
- •Analyze media files the Read tool cannot interpret
- •Extract specific information or summaries from documents
- •Describe visual content in images or diagrams
- •Analyze charts, tables, or structured data in PDFs
- •Work with analyzed or extracted data rather than raw file contents
Never use look_at when:
- •You need the exact contents of source code or plain text files (use Read)
- •The file will be edited afterward (editing needs the literal content from Read)
- •A simple file read is enough and no interpretation is needed
- •Exact formatting or structure must be preserved
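The rules above can be condensed into a small decision function. This is only a sketch: `choose_tool` and its parameter names are hypothetical and exist purely to illustrate the order in which the rules apply.

```python
def choose_tool(needs_interpretation: bool,
                needs_exact_contents: bool = False,
                will_edit: bool = False) -> str:
    """Return "Read" or "look_at" following the rules listed above."""
    # Exact contents or later edits always require the literal file text.
    if needs_exact_contents or will_edit:
        return "Read"
    # Summaries, visual descriptions, and table extraction go to look_at.
    if needs_interpretation:
        return "look_at"
    # A simple read with no interpretation stays with Read.
    return "Read"
```

The exact-contents check comes first, so a file that will be edited goes to Read even when it would also benefit from interpretation.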
How It Works
- •Provide a file path and a specific goal (what to extract)
- •The helper script uploads the file to Gemini's API
- •Gemini 2.5 Flash Lite analyzes the file and extracts requested information
- •Only the relevant extracted information is returned (saves context tokens)
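The steps above could look roughly like the following in Python with the google-genai SDK. This is a sketch under assumptions, not the contents of look_at.py: the function names `look_at` and `build_prompt` and the prompt wording are hypothetical, and only the SDK calls (`genai.Client`, `client.files.upload`, `client.models.generate_content`) are part of the library.

```python
import os

DEFAULT_MODEL = "gemini-2.5-flash-lite"


def build_prompt(goal: str) -> str:
    """Wrap the goal in extraction instructions (wording is an assumption)."""
    return (
        "Extract only the information requested below. "
        "If it is not present in the file, say so clearly. "
        "Do not add a preamble.\n\n"
        f"Goal: {goal}"
    )


def look_at(file_path: str, goal: str, model: str = DEFAULT_MODEL) -> str:
    """Upload a file to the Gemini API and return the extracted text."""
    # Imported lazily so this module loads even without google-genai installed.
    from google import genai

    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    # Upload the file so the model can read it.
    uploaded = client.files.upload(file=file_path)
    # Ask the model to analyze the file against the goal.
    response = client.models.generate_content(
        model=model,
        contents=[uploaded, build_prompt(goal)],
    )
    # Return only the extracted text (response.text can be None).
    return response.text or ""
```

Keeping the prompt in its own function makes the extraction instructions easy to adjust without touching the upload logic.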
Usage Pattern
CRITICAL - Display Requirement:
Always set the Bash tool description parameter to show a clean invocation:
description: "look-at: [goal text]"
Never display the full Python command to the user.
# Basic usage
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/file.pdf" \
--goal "Extract the title and date from this document"
# With custom model
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/diagram.png" \
--goal "Describe the architecture shown in this diagram" \
--model "gemini-2.5-flash"
IMPORTANT:
- •Always use absolute paths for files
- •Always set the Bash tool description to "look-at: [goal]" for clean UX
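When the command is assembled programmatically, both requirements (absolute path, clean description) can be enforced in one place. The helper below is a hypothetical sketch: `build_invocation` and its `plugin_root` parameter are illustrative names, and the script path simply mirrors the commands shown above.

```python
import os
from typing import Optional


def build_invocation(file_path: str, goal: str,
                     model: Optional[str] = None,
                     plugin_root: Optional[str] = None) -> dict:
    """Return the Bash tool description and argv for one look_at call."""
    if not os.path.isabs(file_path):
        raise ValueError(f"look_at needs an absolute path, got: {file_path!r}")
    # Fall back to the CLAUDE_PLUGIN_ROOT environment variable when no root is given.
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", "")
    script = os.path.join(root, "skills", "look-at", "scripts", "look_at.py")
    argv = ["python3", script, "--file", file_path, "--goal", goal]
    if model:
        argv += ["--model", model]
    return {"description": f"look-at: {goal}", "argv": argv}
```

Returning an argument list rather than a shell string avoids quoting problems when the goal text contains spaces or quotes; the list can be passed directly to `subprocess.run`.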
Response Rules
When using look_at, the response includes:
- •Only the extracted information matching the goal
- •Clear statement if requested information is not found
- •Concise output focused on the goal (no preamble)
Use this extracted information directly in continued work without loading the full file into context.
Supported File Types
| Type | Extensions | MIME Types |
|---|---|---|
| Images | .jpg, .jpeg, .png, .webp, .heic, .heif | image/* |
| Videos | .mp4, .mpeg, .mov, .avi, .webm | video/* |
| Audio | .wav, .mp3, .aiff, .aac, .ogg, .flac | audio/* |
| Documents | .pdf, .txt, .csv, .md, .html | application/pdf, text/* |
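A quick extension check can catch unsupported files before any upload. The sketch below copies the extension lists from the table; `file_category` is a hypothetical helper, and whether look_at.py performs a similar check is not stated here.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the table above.
SUPPORTED = {
    "image": {".jpg", ".jpeg", ".png", ".webp", ".heic", ".heif"},
    "video": {".mp4", ".mpeg", ".mov", ".avi", ".webm"},
    "audio": {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac"},
    "document": {".pdf", ".txt", ".csv", ".md", ".html"},
}


def file_category(path: str) -> Optional[str]:
    """Return the table category for a path, or None if unsupported."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    for category, extensions in SUPPORTED.items():
        if suffix in extensions:
            return category
    return None
```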
Model Options
| Model | Use Case | Speed | Cost |
|---|---|---|---|
| gemini-2.5-flash-lite | Default - fast, cheap analysis | Fastest | Lowest |
| gemini-3-flash | More complex extraction needs | Fast | Low |
| gemini-3-pro-preview | Highest accuracy required | Medium | Medium |
Default is gemini-2.5-flash-lite for optimal speed/cost ratio.
Common Patterns
REMEMBER: Always use description: "look-at: [goal]" in the Bash tool call.
Extract Specific Information
# Bash tool call with:
# description: "look-at: Extract the executive summary section"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/report.pdf" \
--goal "Extract the executive summary section"
Describe Visual Content
# Bash tool call with:
# description: "look-at: List all UI elements and their layout"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/screenshot.png" \
--goal "List all UI elements and their layout"
Analyze Diagrams
# Bash tool call with:
# description: "look-at: Explain the data flow and component relationships"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/architecture.png" \
--goal "Explain the data flow and component relationships"
Extract Structured Data
# Bash tool call with:
# description: "look-at: Extract the table data as JSON"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/table.pdf" \
--goal "Extract the table data as JSON with columns: name, value, date"
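When the goal asks for JSON, the returned text still has to be parsed before use. Models sometimes wrap JSON in a markdown code fence, so a tolerant parser helps. This is a sketch that assumes the script prints the model's text unchanged; `parse_json_output` is a hypothetical helper.

```python
import json
import re

# Built programmatically so this example contains no literal fence marker.
FENCE = "`" * 3
_FENCED = re.compile(
    r"^" + FENCE + r"(?:json)?\s*\n(.*?)\n?" + FENCE + r"$",
    re.DOTALL,
)


def parse_json_output(text: str):
    """Parse JSON from model output, tolerating a surrounding code fence."""
    cleaned = text.strip()
    match = _FENCED.match(cleaned)
    if match:
        # Keep only the content between the opening and closing fences.
        cleaned = match.group(1)
    return json.loads(cleaned)
```

If the output is not valid JSON, `json.loads` raises `json.JSONDecodeError`, which is a reasonable signal to refine the goal or fall back to Read.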
Environment Setup
Required environment variable:
export GOOGLE_API_KEY="your-api-key-here"
Required Python package:
pip install google-genai
For pixi-managed projects, add to pixi.toml:
[dependencies]
google-genai = ">=1.0.0"
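A short preflight check can confirm both requirements before the first call. The sketch below is an assumption about how such a check might look, not part of the skill; `preflight` is a hypothetical name.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    try:
        found = importlib.util.find_spec("google.genai") is not None
    except ModuleNotFoundError:
        # Raised when the parent "google" package itself is missing.
        found = False
    if not found:
        problems.append("google-genai is not installed (pip install google-genai)")
    return problems
```

Calling `preflight()` with no arguments checks the current process environment; passing a mapping makes the check easy to test.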
Cost Optimization
- •Gemini 2.5 Flash Lite is the most cost-effective option
- •Only extracts requested information (saves on output tokens)
- •Avoids loading full files into main conversation context
- •Use specific goals to minimize unnecessary processing
Troubleshooting
| Issue | Solution |
|---|---|
| API key not set | Set GOOGLE_API_KEY environment variable |
| File not found | Use absolute paths, verify file exists |
| Large file timeout | Break into smaller files or use lower-quality images |
| Rate limit errors | Add retry logic or use batch processing |
| Empty response | Check that goal is clear and specific |
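For the rate limit row, "retry logic" usually means exponential backoff. The helper below is a generic sketch, not something shipped with the skill: `with_retries` is a hypothetical name, and it retries on any exception for brevity. In practice, narrow the `except` clause to the rate-limit error your client raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying with exponential backoff and jitter on failure."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow 1x, 2x, 4x ... of base_delay, plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting.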
Examples
See examples/ directory for:
- •analyze_pdf.sh - PDF document extraction
- •describe_image.sh - Image analysis
- •extract_table.sh - Structured data extraction
Related Skills
- •/gemini-batch - For batch processing of many files
- •Standard Read tool - For text files needing exact contents