Gemini Vision Skill
Generate AI images and videos by invoking Gemini CLI's vision extension. This skill provides access to:
- Nano Banana (gemini-2.5-flash-image) - Image generation and transformation
- Veo 3 (veo-3.0-generate-001) - Video generation from images
- Webcam capture - Live frame capture for AI processing
Prerequisites
- Gemini CLI: Must be installed and configured
- Vision Extension: Install via `gemini extensions install vision`
- API Key: Set the `GEMINI_API_KEY` environment variable
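The first and third prerequisites can be checked before any generation call. The sketch below is a minimal preflight, assuming the CLI executable is named `gemini` (as in the commands in this document); the function name `missing_prerequisites` is hypothetical, and the sketch does not try to verify that the vision extension is installed.

```python
import os
import shutil


def missing_prerequisites(environ=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if environ is None else environ
    problems = []
    # Assumption: the Gemini CLI is exposed on PATH as "gemini".
    if which("gemini") is None:
        problems.append("gemini CLI not found on PATH")
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the real environment; the two parameters exist only so the check can be exercised without touching it.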
When to Use This Skill
Use this skill when the user asks to:
- Generate images from text prompts
- Transform or reimagine existing images
- Create AI-generated videos from images
- Capture webcam frames for AI processing
- Create "nano banana" style images
- Generate Veo videos
Available Operations
1. Image Generation (Nano Banana)
Generate images from text prompts or transform existing images.
Command Pattern:
gemini -p "/vision:banana prompt=\"Your creative prompt here\" n=1 out_dir=./output"
Parameters:
| Parameter | Default | Description |
|---|---|---|
| prompt | Required | Creative description of desired image |
| n | 1 | Number of images to generate |
| out_dir | "." | Output directory for images |
| model | gemini-2.5-flash-image | Image generation model |
Models Available:
- `gemini-2.5-flash-image` (default, recommended)
- `gemini-3-pro-image-preview` (newer, experimental)
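When the command is launched from a program rather than typed into a shell, the parameters above can be assembled into an argument list, which removes one layer of quoting. This is a sketch only: `banana_argv` is a hypothetical helper, and the backslash escaping of double quotes inside the prompt is an assumption about how the extension parses `prompt="..."`.

```python
def banana_argv(prompt, n=1, out_dir=".", model="gemini-2.5-flash-image"):
    """Build the argument list for a /vision:banana call."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: a backslash escapes a double quote inside prompt="...".
    safe_prompt = prompt.replace('"', '\\"')
    slash_command = (
        f'/vision:banana prompt="{safe_prompt}" '
        f"n={n} out_dir={out_dir} model={model}"
    )
    # A list passed to subprocess.run needs no additional shell quoting.
    return ["gemini", "-p", slash_command]
```

The returned list can be handed to `subprocess.run(argv, check=True)`.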
2. Video Generation (Veo 3)
Generate short videos from images or prompts.
Command Pattern:
gemini -p "/vision:veo prompt=\"Animate this scene\" aspect_ratio=16:9 out_dir=./output"
Parameters:
| Parameter | Default | Description |
|---|---|---|
| prompt | Required | Animation/motion description |
| aspect_ratio | "16:9" | Video aspect ratio (16:9 or 9:16) |
| resolution | auto | Video resolution (e.g., "1080p") |
| negative_prompt | "" | What to avoid in video |
| veo_model | veo-3.0-generate-001 | Video model |
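Because the table allows only two aspect ratios, it can help to validate the options before spending time on a video request. The following is a sketch under assumptions: `veo_params` is a hypothetical helper, and quoting `negative_prompt` the same way as `prompt` is a guess, since the document shows no example of it.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}


def veo_params(prompt, aspect_ratio="16:9", resolution=None, negative_prompt=""):
    """Validate Veo options and return them as key=value tokens."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(
            f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}"
        )
    tokens = [f'prompt="{prompt}"', f"aspect_ratio={aspect_ratio}"]
    if resolution:
        # Leaving resolution out keeps the documented "auto" default.
        tokens.append(f"resolution={resolution}")
    if negative_prompt:
        # Assumption: negative_prompt is quoted like prompt.
        tokens.append(f'negative_prompt="{negative_prompt}"')
    return tokens
```

Joining the tokens with spaces after `/vision:veo ` yields the string passed to `gemini -p`.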
3. Webcam Capture + AI
Capture from webcam and process with AI.
    # Start camera
    gemini -p "/vision:start"

    # Capture and transform
    gemini -p "/vision:banana prompt=\"Transform into oil painting\""

    # Stop camera
    gemini -p "/vision:stop"
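The start/transform/stop sequence leaves the camera running if the middle step fails. One way to guard against that, sketched here with hypothetical helper names (`run_slash_command`, `camera_session`), is a context manager that always issues `/vision:stop`.

```python
from contextlib import contextmanager
import subprocess


def run_slash_command(slash_command):
    """Run one Gemini CLI slash command and raise if it exits non-zero."""
    subprocess.run(["gemini", "-p", slash_command], check=True)


@contextmanager
def camera_session(run=run_slash_command):
    """Start the camera, yield the runner, and always stop the camera."""
    run("/vision:start")
    try:
        yield run
    finally:
        # Runs even when the body raises, so the camera is released.
        run("/vision:stop")


# Usage (not executed here):
#   with camera_session() as run:
#       run('/vision:banana prompt="Transform into oil painting"')
```

The `run` parameter is there so the sequence can be dry-run with a stand-in callable instead of the real CLI.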
Instructions for Claude
When the user requests image or video generation:
- Determine the operation type:
  - Text-to-image → Use `/vision:banana`
  - Image transformation → Use `/vision:banana` with input image
  - Image-to-video → Use `/vision:veo`
  - Webcam capture → Use `/vision:capture` or `/vision:banana`
- Construct the Gemini CLI command: `gemini -p "/vision:<command> prompt=\"<user prompt>\" <params>"`
- Execute via Bash tool:
  - Run the command
  - Capture the output paths
  - Report success and file locations to user
- Handle output:
  - Images saved as `banana_*.png` or `banana_*.jpg`
  - Videos saved as `veo_*.mp4`
  - Return the file paths to the user
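If the CLI output does not list the generated paths explicitly, the naming patterns above are enough to find them. This sketch scans the output directory for files written since the command started; `new_outputs` is a hypothetical helper, and relying on modification times is an assumption that may need a small margin on filesystems with coarse timestamps.

```python
from pathlib import Path

# File name patterns taken from the "Handle output" step.
OUTPUT_PATTERNS = ("banana_*.png", "banana_*.jpg", "veo_*.mp4")


def new_outputs(out_dir, since):
    """Return generated files in out_dir modified at or after `since`.

    `since` is a timestamp in seconds since the epoch, e.g. time.time()
    recorded just before the gemini command was launched.
    """
    found = []
    for pattern in OUTPUT_PATTERNS:
        for path in Path(out_dir).glob(pattern):
            if path.stat().st_mtime >= since:
                found.append(path.resolve())
    return sorted(found)
```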
Example Workflows
Generate a Single Image
User: "Generate an image of a cyberpunk city at sunset"
Action:
gemini -p "/vision:banana prompt=\"A sprawling cyberpunk city at sunset, neon lights reflecting off wet streets, flying cars in the distance, highly detailed, cinematic\" n=1 out_dir=."
Transform an Image
User: "Make this photo look like a Studio Ghibli scene" (with image attached)
Action:
- Save the attached image to a temp location
- Run:
gemini -p "/vision:banana prompt=\"Transform into Studio Ghibli animation style, soft colors, whimsical atmosphere\" input_paths=['/path/to/image.jpg']"
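The "save to a temp location" step can be sketched as follows. Both helper names are hypothetical, and the comma-separated form for more than one path is an assumption; the document only shows `input_paths` with a single entry.

```python
import tempfile


def save_attachment(data, suffix=".jpg"):
    """Write attachment bytes to a temp file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so the Gemini CLI can read it afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        handle.write(data)
        return handle.name


def input_paths_token(paths):
    """Format paths as input_paths=['a', 'b'] for /vision:banana."""
    quoted = ", ".join(f"'{path}'" for path in paths)
    return f"input_paths=[{quoted}]"
```

The caller is responsible for deleting the temp file once generation has finished.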
Generate a Video
User: "Create a video of ocean waves"
Action:
gemini -p "/vision:veo prompt=\"Calm ocean waves gently rolling onto a sandy beach, golden hour lighting, peaceful atmosphere\" aspect_ratio=16:9"
Webcam to Art
User: "Take a photo of me and make it look like a Renaissance painting"
Action:
    # Capture and transform in one step
    gemini -p "/vision:banana prompt=\"Transform into a Renaissance oil painting, dramatic lighting, classical composition\""
Output Format
Always report results in this format:
    ## Generated Content

    **Type:** Image/Video

    **Files:**
    - `/path/to/banana_20251227_123456_000.png`

    **Prompt Used:** [the prompt]

    **Model:** gemini-2.5-flash-image

    To view: Open the file path above or use `open /path/to/file`
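A small formatter keeps the report consistent across operations. This is one plausible rendering of the template, not a required implementation; `format_report` is a hypothetical name, and the exact blank-line layout is a choice made here.

```python
def format_report(kind, files, prompt, model):
    """Render the result summary using the fields of the report template."""
    if not files:
        raise ValueError("at least one output file is required")
    file_lines = "\n".join(f"- `{path}`" for path in files)
    return (
        "## Generated Content\n\n"
        f"**Type:** {kind}\n\n"
        "**Files:**\n"
        f"{file_lines}\n\n"
        f"**Prompt Used:** {prompt}\n\n"
        f"**Model:** {model}\n\n"
        # The viewing hint points at the first generated file.
        f"To view: Open the file path above or use `open {files[0]}`"
    )
```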
Error Handling
Common issues and solutions:
| Error | Solution |
|---|---|
| "Camera not found" | Run /vision:devices to list cameras |
| "GEMINI_API_KEY not set" | Export the API key in environment |
| "Model not available" | Check model ID spelling |
| "Generation failed" | Try simpler prompt or different model |
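The table can be turned into a lookup so that a failed run produces a suggestion automatically. The sketch assumes the error text appears somewhere in the CLI output, which may not hold for every failure; `suggest_fix` is a hypothetical helper.

```python
# Error text and solutions copied from the table above.
ERROR_HINTS = {
    "Camera not found": "Run /vision:devices to list cameras",
    "GEMINI_API_KEY not set": "Export the API key in environment",
    "Model not available": "Check model ID spelling",
    "Generation failed": "Try simpler prompt or different model",
}


def suggest_fix(cli_output):
    """Return the first matching hint for the CLI output, or None."""
    lowered = cli_output.lower()
    for needle, hint in ERROR_HINTS.items():
        # Case-insensitive substring match; an assumption about output format.
        if needle.lower() in lowered:
            return hint
    return None
```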
Script Usage (Alternative)
For programmatic access, use the helper script:
    python ~/.claude/skills/gemini-vision/scripts/gemini_vision.py \
      --operation banana \
      --prompt "Your prompt here" \
      --output-dir ./output \
      --count 1
Options:
- `--operation`: banana, veo, capture, devices
- `--prompt`: The generation prompt
- `--output-dir`: Where to save files
- `--count`: Number of images (for banana)
- `--aspect-ratio`: For veo (16:9 or 9:16)
- `--model`: Override default model