visual-analysis

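Analyze images or videos using Gemini's multimodal capabilities. Use this skill when the user asks about image content or needs OCR, object detection, scene understanding, or video analysis.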

SKILL.md

---
name: visual-analysis
description: Analyze images or videos using Gemini's multimodal capabilities. Use when the user asks about image content, needs OCR, object detection, scene understanding, or video analysis.
allowed-tools: gemini-api, Read, Glob
---

Visual Analysis with Gemini

When to Use This Skill

Automatically invoke this skill when:

•User asks to analyze an image or video file
•User requests OCR or text extraction from images
•User wants to understand image content, objects, or scenes
•User needs detailed visual descriptions
•User asks about what's in a screenshot or photo
•User requests video summarization or event detection

Examples That Trigger This Skill

•"What's in this image?"
•"Analyze screenshot.png"
•"Extract text from this receipt"
•"Describe what you see in photo.jpg"
•"What objects are in this image?"
•"Summarize what happens in video.mp4"

How to Use

•Identify the file: Get the file path from the user's request and resolve it to an absolute path
•Verify the file exists: Use the Read tool to check that the file is accessible
•Call Gemini: Use the analyze_visual tool from the gemini-api MCP server
  •Pass the absolute file path as file_path
  •Pass the user's specific question as the prompt (or a general analysis prompt if there is no specific question)
  •Choose the model: gemini-1.5-flash for speed, gemini-1.5-pro for quality
•Present results: Return Gemini's analysis to the user
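The steps above can be sketched in JavaScript (Node.js) roughly as follows. This is a minimal, hypothetical sketch: `prepareAnalyzeVisualCall`, its `preferQuality` and `cwd` options, and the fallback prompt wording are illustrative names, not part of the gemini-api MCP server, and the existence check uses Node's `fs` module as a stand-in for the Read tool.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Model names taken from this document; the server may accept others.
const FAST_MODEL = "gemini-1.5-flash";
const QUALITY_MODEL = "gemini-1.5-pro";
// Hypothetical fallback prompt used when the user asks no specific question.
const GENERAL_PROMPT = "Describe this file in detail, including any visible text.";

// Hypothetical helper: builds the argument object for analyze_visual.
// fs stands in for the Read tool's accessibility check.
function prepareAnalyzeVisualCall(userPath, question, { preferQuality = false, cwd = process.cwd() } = {}) {
  // Identify the file: resolve the user's path to an absolute path.
  const filePath = path.resolve(cwd, userPath);

  // Verify the file exists and is a regular file.
  if (!fs.existsSync(filePath) || !fs.statSync(filePath).isFile()) {
    throw new Error(`File not found or not a regular file: ${filePath}`);
  }

  // Use the user's question as the prompt, or fall back to a general one.
  const trimmed = typeof question === "string" ? question.trim() : "";
  const prompt = trimmed !== "" ? trimmed : GENERAL_PROMPT;

  // Choose the model: flash for speed, pro for quality.
  const model = preferQuality ? QUALITY_MODEL : FAST_MODEL;

  return { file_path: filePath, prompt, model };
}
```

Throwing on a missing file means the tool is never called with a bad path, so the user can be asked for the correct location before any Gemini request is made.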

Tool Parameters

```json
{
  "file_path": "/absolute/path/to/image.jpg",
  "prompt": "What objects are visible in this image?",
  "model": "gemini-1.5-flash"
}
```

Set "model" to "gemini-1.5-pro" instead when quality matters more than speed.
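Before calling the tool, a client could sanity-check the argument object. The sketch below assumes the three fields shown above, treats `model` as optional, and accepts only the two models named in this document; `validateAnalyzeVisualParams` is a hypothetical helper, and the real server's schema may differ.

```javascript
const path = require("node:path");

// Assumed set of accepted models, taken from this document; the real
// gemini-api MCP server may accept a different list.
const ALLOWED_MODELS = new Set(["gemini-1.5-flash", "gemini-1.5-pro"]);

// Returns a list of problems with an analyze_visual argument object.
// An empty array means the object looks valid.
function validateAnalyzeVisualParams(params) {
  const errors = [];
  if (typeof params.file_path !== "string" || !path.isAbsolute(params.file_path)) {
    errors.push("file_path must be an absolute path string");
  }
  if (typeof params.prompt !== "string" || params.prompt.trim() === "") {
    errors.push("prompt must be a non-empty string");
  }
  // Assumption: model may be omitted and left to the server's default.
  if (params.model !== undefined && !ALLOWED_MODELS.has(params.model)) {
    errors.push(`model must be one of: ${[...ALLOWED_MODELS].join(", ")}`);
  }
  return errors;
}
```

Returning all problems at once, rather than throwing on the first, makes it easier to fix a malformed call in one pass.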

Capabilities

•Image Analysis: Detailed object detection, scene understanding, composition analysis
•OCR: Extract and read text from images (signs, documents, screenshots)
•Video Analysis: Summarize events, detect actions, identify changes over time
•Spatial Reasoning: Understand object relationships and layout
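•Multi-frame Processing: Analyze video clips as sequences of sampled frames (the Gemini API typically samples video at about one frame per second, so very brief events can be missed)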
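One way to map these capabilities onto prompts is to pick a default template by file type and task. The extension lists, the template wording, and the `mediaType` and `defaultPrompt` helpers below are assumptions for illustration; they are not defined by the gemini-api MCP server.

```javascript
const path = require("node:path");

// Assumed extension lists; confirm which formats the server actually accepts.
const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".webp"]);
const VIDEO_EXTENSIONS = new Set([".mp4", ".mov", ".webm", ".avi"]);

// Illustrative prompt wording, one template per capability listed above.
const TEMPLATES = {
  image: "Identify the objects in this image and describe the scene and its composition.",
  ocr: "Extract all readable text from this file, preserving the original line breaks.",
  spatial: "Describe where the main objects are located relative to one another.",
  video: "Summarize the events in this video in chronological order, noting any changes over time.",
};

// Classify a file as "image", "video", or "unknown" by its extension.
function mediaType(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (IMAGE_EXTENSIONS.has(ext)) return "image";
  if (VIDEO_EXTENSIONS.has(ext)) return "video";
  return "unknown";
}

// Return the default prompt for a capability, rejecting unknown file types,
// unknown capabilities, and video summaries of still images.
function defaultPrompt(filePath, capability) {
  const type = mediaType(filePath);
  if (type === "unknown") {
    throw new Error(`Unsupported file type: ${filePath}`);
  }
  if (!Object.prototype.hasOwnProperty.call(TEMPLATES, capability)) {
    throw new Error(`Unknown capability: ${capability}`);
  }
  if (capability === "video" && type !== "video") {
    throw new Error("Video summarization requires a video file");
  }
  return TEMPLATES[capability];
}
```

A template is only a fallback: when the user asks a specific question, passing that question as the prompt usually gives a more targeted answer.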

Best Practices

•For quick analysis, use gemini-1.5-flash
•For detailed or complex images, use gemini-1.5-pro
•Include specific questions in the prompt for targeted analysis
•For videos, mention the time range of interest when relevant (for example, "between 00:30 and 01:15")
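The best practices above can be expressed as two small helpers: one that picks a model and one that scopes a video question to a time range. Both `chooseModel` and `videoPrompt` are hypothetical; the MM:SS timestamp form follows the convention Gemini's video documentation describes for referring to moments in a clip, but it is worth checking against the server you are using.

```javascript
// Hypothetical helper: flash for quick analysis, pro for detailed or complex images.
function chooseModel({ detailed = false } = {}) {
  return detailed ? "gemini-1.5-pro" : "gemini-1.5-flash";
}

// Format a number of seconds as an MM:SS timestamp.
function toTimestamp(totalSeconds) {
  const s = Math.max(0, Math.floor(totalSeconds));
  const mm = String(Math.floor(s / 60)).padStart(2, "0");
  const ss = String(s % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}

// Hypothetical helper: prefix a video question with a time range, if one is given.
function videoPrompt(question, range) {
  if (!range) return question;
  const [start, end] = range;
  if (end < start) {
    throw new Error("end of the time range must not be before its start");
  }
  return `Between ${toTimestamp(start)} and ${toTimestamp(end)}: ${question}`;
}
```

Note that `toTimestamp` keeps counting minutes past 59 (3700 seconds becomes "61:40"), which is usually acceptable for short clips.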