gemini-api

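Delegate tasks to Google Gemini 3 series models via the REST API. Use this skill when a task benefits from Gemini's multimodal capabilities, such as image generation (Nano Banana Pro), image editing (Nano Banana Pro), image understanding/vision (Gemini 3 Pro), or text processing (Gemini 3 Pro). Always prefer the Pro models.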

SKILL.md
--- frontmatter
name: gemini-api
description: Delegate tasks to Google Gemini 3 series models via REST API. Use when tasks benefit from Gemini's multimodal capabilities - image generation (Nano Banana Pro), image editing (Nano Banana Pro), image understanding/vision (Gemini 3 Pro), or text tasks (Gemini 3 Pro). Always prioritize Pro models.

Gemini API (REST)

Delegate tasks to Gemini 3 series models. Requires the GEMINI_API_KEY environment variable to be set.

Models

| Model | Model ID | Best For |
| --- | --- | --- |
| Gemini 3 Pro | gemini-3-pro-preview | Text generation, vision/image understanding, OCR (DEFAULT for text output) |
| Gemini 3 Flash | gemini-3-flash-preview | Fast text/vision tasks (only when speed needed) |
| Nano Banana Pro | gemini-3-pro-image-preview | Image generation, image editing (DEFAULT for image output) |
| Nano Banana | gemini-2.5-flash-image | Fast image generation (only when speed needed) |

Defaults:

  • Text output tasks (text generation, vision, OCR): Use gemini-3-pro-preview
  • Image output tasks (image generation, editing): Use gemini-3-pro-image-preview

API Endpoint

code
POST https://generativelanguage.googleapis.com/v1beta/models/{MODEL_ID}:generateContent?key={API_KEY}
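
As a small illustration, the endpoint template above can be filled in programmatically. The helper name `build_endpoint` is hypothetical and not part of this skill; it only substitutes the model ID and URL-encodes the key.

```python
from urllib.parse import quote

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_endpoint(model_id: str, api_key: str) -> str:
    """Fill in the generateContent URL template (hypothetical helper)."""
    # quote() percent-encodes any character that is not URL-safe.
    model = quote(model_id, safe="")
    key = quote(api_key, safe="")
    return f"{BASE_URL}/{model}:generateContent?key={key}"


if __name__ == "__main__":
    # "YOUR_KEY" is a placeholder, not a real key.
    print(build_endpoint("gemini-3-pro-preview", "YOUR_KEY"))
```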

Request Structure

For text models (with thinking):

json
{
  "contents": [{"parts": [{"text": "..."}]}],
  "generationConfig": {
    "thinkingConfig": {"thinkingLevel": "high"}
  }
}

For image models (no thinking support):

json
{
  "contents": [{"parts": [{"text": "..."}]}],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}

Thinking Levels (Text models only)

| Level | Use Case |
| --- | --- |
| low | Simple tasks, minimal latency |
| high | Complex reasoning (default) |

Note: Image models (gemini-3-pro-image-preview, gemini-2.5-flash-image) do NOT support thinking levels.
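
The two request shapes and the thinking-level rule can be combined in one small builder. This is a minimal sketch, assuming only the two image model IDs and the two levels listed in this document; `build_payload` is a hypothetical name, not something the skill ships.

```python
# Model IDs and levels are taken from the tables in this document.
IMAGE_MODELS = {"gemini-3-pro-image-preview", "gemini-2.5-flash-image"}
THINKING_LEVELS = {"low", "high"}


def build_payload(model_id: str, prompt: str, thinking_level: str = "high") -> dict:
    """Build a generateContent body for a text or image model (hypothetical helper)."""
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    if model_id in IMAGE_MODELS:
        # Image models: ask for text and image output, and send no thinkingConfig.
        payload["generationConfig"] = {"responseModalities": ["TEXT", "IMAGE"]}
    else:
        if thinking_level not in THINKING_LEVELS:
            raise ValueError(f"unsupported thinking level: {thinking_level!r}")
        payload["generationConfig"] = {
            "thinkingConfig": {"thinkingLevel": thinking_level}
        }
    return payload
```

Keeping the branch on the model ID in one place makes it harder to send `thinkingConfig` to an image model by accident.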


Examples

1. Text Generation (Gemini 3 Pro)

bash
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent?key=${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Explain quantum entanglement simply"}]}],
    "generationConfig": {"thinkingConfig": {"thinkingLevel": "high"}}
  }' | jq -r '.candidates[0].content.parts[0].text'
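
For callers that prefer Python over curl, the same request could be sent with the standard library alone. This is a sketch under the assumptions already stated in this document (endpoint, body shape, `GEMINI_API_KEY`); the function names `make_request` and `generate_text` are hypothetical, and error handling is omitted.

```python
import json
import os
import urllib.request


def make_request(model_id: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    url = (
        "https://generativelanguage.googleapis.com/v1beta/models/"
        f"{model_id}:generateContent?key={api_key}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_text(prompt: str, model_id: str = "gemini-3-pro-preview") -> str:
    """Send a text prompt and return the first text part of the first candidate."""
    payload = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingLevel": "high"}},
    }
    request = make_request(model_id, payload, os.environ["GEMINI_API_KEY"])
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Mirrors the jq filter used in the curl example.
    return body["candidates"][0]["content"]["parts"][0]["text"]
```

Splitting request construction from sending keeps the network call in one place, so the rest can be checked offline.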

2. Image Understanding / Vision (Gemini 3 Pro)

Analyze an image - use Gemini 3 Pro for vision tasks (text output):

bash
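# Strip newlines so the base64 text stays valid inside the JSON string
# (GNU base64 wraps its output at 76 columns by default).
IMAGE_B64=$(base64 < photo.jpg | tr -d '\n')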

curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent?key=${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"inlineData": {"mimeType": "image/jpeg", "data": "'"${IMAGE_B64}"'"}},
        {"text": "Describe this image in detail. What objects, people, or text do you see?"}
      ]
    }],
    "generationConfig": {"thinkingConfig": {"thinkingLevel": "high"}}
  }' | jq -r '.candidates[0].content.parts[0].text'
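
The `inlineData` part used above can also be built in Python, which avoids shell quoting and line-wrapping issues with base64. A minimal sketch: `image_part` and `vision_payload` are hypothetical helpers, and guessing the MIME type from the file extension is an assumption that may not suit every input.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(path: str) -> dict:
    """Encode an image file as an inlineData part (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    # b64encode returns bytes with no line breaks, so it is safe in JSON.
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": data}}


def vision_payload(path: str, question: str) -> dict:
    """Pair an image with a text question, image first as in the curl example."""
    return {
        "contents": [{"parts": [image_part(path), {"text": question}]}],
        "generationConfig": {"thinkingConfig": {"thinkingLevel": "high"}},
    }
```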

3. Image Generation (Nano Banana Pro)

Generate images using Nano Banana Pro:

bash
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent?key=${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "A cozy coffee shop interior, warm lighting, watercolor style"}]}],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -d > coffee_shop.png

4. Image Editing (Nano Banana Pro)

Edit images using Nano Banana Pro:

bash
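# Strip newlines so the base64 text stays valid inside the JSON string
# (GNU base64 wraps its output at 76 columns by default).
IMAGE_B64=$(base64 < input.jpg | tr -d '\n')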

curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent?key=${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"inlineData": {"mimeType": "image/jpeg", "data": "'"${IMAGE_B64}"'"}},
        {"text": "Remove the background and make it transparent"}
      ]
    }],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -d > edited.png

5. Text in Images (Nano Banana Pro)

Nano Banana Pro excels at rendering text:

bash
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent?key=${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Create a professional business card for \"Jane Smith, CEO\" at \"TechCorp Inc\" with modern minimalist design"}]}],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -d > business_card.png

6. Fast Tasks (Flash models - only when speed needed)

bash
# Fast text/vision (gemini-3-flash-preview)
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent?key=${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"contents": [{"parts": [{"text": "Quick summary"}]}]}' | jq -r '.candidates[0].content.parts[0].text'

# Fast image generation (gemini-2.5-flash-image)
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent?key=${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "A simple icon"}]}],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -d > icon.png

When to Delegate to Gemini

| Task | Model |
| --- | --- |
| Text generation | gemini-3-pro-preview (default) |
| Image understanding/vision | gemini-3-pro-preview (default) |
| OCR / text extraction | gemini-3-pro-preview (default) |
| Image generation | gemini-3-pro-image-preview (default) |
| Image editing | gemini-3-pro-image-preview (default) |
| Graphics with text/logos | gemini-3-pro-image-preview (default) |
| Fast text/vision (user requested) | gemini-3-flash-preview |
| Fast image tasks (user requested) | gemini-2.5-flash-image |
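
The routing in this table reduces to two questions: is the output text or an image, and did the user ask for speed? A minimal sketch of that lookup follows; the task labels and the `select_model` name are illustrative assumptions, not identifiers used by the skill.

```python
DEFAULT_MODELS = {"text": "gemini-3-pro-preview", "image": "gemini-3-pro-image-preview"}
FAST_MODELS = {"text": "gemini-3-flash-preview", "image": "gemini-2.5-flash-image"}

# Illustrative task labels mapped to the kind of output they produce.
TASK_OUTPUT = {
    "text generation": "text",
    "vision": "text",
    "ocr": "text",
    "image generation": "image",
    "image editing": "image",
    "graphics with text": "image",
}


def select_model(task: str, fast: bool = False) -> str:
    """Return the model ID for a task, using Pro unless speed was requested."""
    output_kind = TASK_OUTPUT[task.strip().lower()]
    return (FAST_MODELS if fast else DEFAULT_MODELS)[output_kind]
```

An unknown task raises `KeyError`, which seems preferable to silently picking a model.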

Response Structure

Text response:

json
{"candidates": [{"content": {"parts": [{"text": "..."}]}}]}

Image response:

json
{"candidates": [{"content": {"parts": [
  {"text": "Here is your image..."},
  {"inlineData": {"mimeType": "image/png", "data": "<base64>"}}
]}}]}
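
Given the response shapes above, text and image parts can be separated in a few lines. This sketch assumes the first candidate is the one of interest and that every `inlineData.data` value is base64, as shown; `split_parts` is a hypothetical name.

```python
import base64


def split_parts(response: dict) -> tuple[list[str], list[bytes]]:
    """Separate text strings from decoded image bytes in the first candidate."""
    texts: list[str] = []
    images: list[bytes] = []
    for part in response["candidates"][0]["content"]["parts"]:
        if "text" in part:
            texts.append(part["text"])
        elif "inlineData" in part:
            images.append(base64.b64decode(part["inlineData"]["data"]))
    return texts, images
```

Each entry in `images` can then be written to disk with `open(path, "wb")`, which is what the `base64 -d > file.png` step does in the curl examples.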

Python Script

Use scripts/gemini_api.py (prioritizes Pro models):

bash
# Text generation (defaults to gemini-3-pro-preview)
python scripts/gemini_api.py text "Explain REST APIs"

# Image understanding (defaults to gemini-3-pro-preview)
python scripts/gemini_api.py vision photo.jpg "What's in this image?"

# Image generation (defaults to Nano Banana Pro)
python scripts/gemini_api.py generate "A sunset over mountains" -o sunset.png

# Image editing (defaults to Nano Banana Pro)
python scripts/gemini_api.py edit input.jpg "Add a rainbow" -o output.png

# Fast mode (only when user requests speed)
python scripts/gemini_api.py text "Quick question" --model flash
python scripts/gemini_api.py vision photo.jpg "Quick check" --model flash
python scripts/gemini_api.py generate "Simple icon" -o icon.png --model flash
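
The source of scripts/gemini_api.py is not shown here, so the following is only a guess at how a command-line interface matching the invocations above could be declared with `argparse`. The `pro` default for `--model`, the `--output` long form, and the default output filename are assumptions; only `-o` and `--model flash` appear in the examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the text/vision/generate/edit subcommands (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="gemini_api.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # Options shared by every subcommand. "pro" as the default is an assumption.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--model", choices=["pro", "flash"], default="pro")

    text = sub.add_parser("text", parents=[common])
    text.add_argument("prompt")

    vision = sub.add_parser("vision", parents=[common])
    vision.add_argument("image")
    vision.add_argument("prompt")

    generate = sub.add_parser("generate", parents=[common])
    generate.add_argument("prompt")
    generate.add_argument("-o", "--output", default="output.png")

    edit = sub.add_parser("edit", parents=[common])
    edit.add_argument("image")
    edit.add_argument("prompt")
    edit.add_argument("-o", "--output", default="output.png")
    return parser
```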