Iris Skill - Visual Debugging & Ingestion
| name | description |
|---|---|
| iris | Leverages Gemini 3.0 Pro's vision capabilities to analyze screenshots, videos, and PDFs for debugging and context ingestion. USE WHEN you need the AI to "see" a problem or extract structure from visual media. |
The Key Insight
Images are data, not just art. Iris uses high-resolution multimodal processing to bridge the gap between pixels and code. Given a screenshot of a UI bug, it can compare what is rendered against your CSS and identify the line causing the shift.
Usage
Debug UI
```bash
pai run Iris debug "error_screenshot.png"
```
Video Ingestion
```bash
pai run Iris watch "workflow_demo.mp4"
```
PDF Structural Analysis
```bash
pai run Iris parse "complex_spec.pdf"
```
How it Works
Iris uses the Gemini 3.0 media_resolution parameter to ingest visual data at high fidelity. It then runs a "Visual-to-Logic" loop that maps the visual elements it identifies to files in your local repository.
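The mapping step of the "Visual-to-Logic" loop can be illustrated with a minimal sketch. It is not Iris's actual implementation: the function name `map_elements_to_files`, its parameters, and the assumption that the vision step returns plain identifiers (such as CSS class names) are all hypothetical. The sketch only shows how identifiers extracted from an image might be matched against repository files.

```python
import re
from pathlib import Path


def map_elements_to_files(elements, repo_root, suffixes=(".css", ".html", ".js")):
    """Hypothetical helper: locate each identifier in the repository.

    elements  -- identifiers assumed to come from the vision step, e.g. "nav-bar"
    repo_root -- directory to search
    Returns {identifier: [(relative_path, line_number), ...]}.
    """
    root = Path(repo_root)
    hits = {name: [] for name in elements}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in elements:
                # Match the identifier as a whole token, so "nav-bar"
                # does not also match "nav-bar-item".
                pattern = rf"(?<![\w-]){re.escape(name)}(?![\w-])"
                if re.search(pattern, line):
                    hits[name].append((path.relative_to(root).as_posix(), lineno))
    return hits
```

Calling `map_elements_to_files(["nav-bar"], ".")` from a project root would return the files and line numbers that mention that class, which gives a short list of candidates to inspect for the layout bug.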
Strategic Value
- Zero-Friction Debugging: Fix CSS and Layout issues in seconds.
- Visual Onboarding: Ingest complex documentation or whiteboard photos instantly.
- Temporal Insight: "Watch" a video of a system failing to identify the exact state transition where the error occurred.
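To make the Temporal Insight idea concrete, here is a minimal sketch of locating the first transition into a failing state. It assumes the video has already been reduced to a chronological list of `(timestamp_seconds, state)` pairs. That frame format, the `"error"` label, and the function name `first_error_transition` are illustrative assumptions, not part of Iris.

```python
def first_error_transition(frames, error_state="error"):
    """Return (previous_frame, error_frame) for the first transition
    into error_state, or None if no such transition exists.

    frames -- chronological list of (timestamp_seconds, state) tuples
              (a hypothetical format for per-frame analysis results)
    """
    # Compare each frame with its successor and stop at the first
    # change from a non-error state into the error state.
    for prev, curr in zip(frames, frames[1:]):
        if curr[1] == error_state and prev[1] != error_state:
            return prev, curr
    # No transition found. This includes the case where the very first
    # frame is already in the error state.
    return None
```

The returned pair brackets the moment of failure: the last healthy frame and the first failing one, which narrows down where in the recording to look.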