Image Generation Skill

Generate and edit images via Gemini's native image generation API using curl.

Models

Model	ID	Best for
Nano Banana	`gemini-2.5-flash-image`	Fast, high-volume, low-latency
Nano Banana Pro	`gemini-3-pro-image-preview`	Pro asset production, complex prompts, accurate text rendering, 4K

Default to gemini-2.5-flash-image unless the user asks for high quality, 4K, search grounding, or text-heavy images.

Text-to-Image

bash

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "YOUR PROMPT HERE"}]}]
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png

Read the output image file to show it to the user.

Image Editing (image + text → image)

Encode an existing image as base64 and send it alongside a text prompt:

bash

BASE64_IMG=$(base64 -i input.png)

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"contents\": [{
      \"parts\": [
        {\"text\": \"YOUR EDIT PROMPT HERE\"},
        {\"inline_data\": {\"mime_type\": \"image/png\", \"data\": \"$BASE64_IMG\"}}
      ]
    }]
  }" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png

Pro Model Options

When using gemini-3-pro-image-preview, you can set aspect ratio and resolution:

bash

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "YOUR PROMPT HERE"}]}],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png

Aspect Ratios

1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9

Resolutions (Pro only)

1K (default), 2K, 4K — must be uppercase K.

Search Grounding (Pro only)

Generate images based on real-time info (weather, news, etc.):

bash

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "YOUR PROMPT HERE"}]}],
    "tools": [{"google_search": {}}],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"]
    }
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png

Handling Responses

The API returns JSON with parts that can contain text and/or image data. Extract text and image separately:

bash

# save full response
RESPONSE=$(curl -s -X POST ... )

# extract text (if any)
echo "$RESPONSE" | jq -r '.candidates[0].content.parts[] | select(.text) | .text'

# extract and save image
echo "$RESPONSE" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -D > output.png

Workflow

•Understand what the user wants to generate or edit
•Pick the right model (flash for speed, pro for quality/text/4K)
•Write a detailed, descriptive prompt — more detail = better results
•Run the curl command, save the image
•Read the image file to display it inline
•If the user wants edits, use the image editing flow with the previous output as input

Tips

•Prompts should be descriptive and specific — style, composition, lighting, mood
•For image editing, describe what to change, not what to keep
•Pro model has a "thinking" mode — may take longer but produces better results
•All generated images include a SynthID watermark
•Pro supports up to 14 reference images in a single request (up to 6 objects + 5 humans)
•If the response has no inlineData, check for error messages in the JSON