image-generation

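Generates and edits images with the Google Gemini, OpenAI GPT Image, and xAI Grok Image APIs via shell scripts. Use this skill when the user asks to "generate an image", "create an image", "edit an image", "modify an image", "make a picture", "draw me a", "text to image", "generate with Gemini", "generate with OpenAI", "generate with xAI", "generate with Grok", "GPT image", "Gemini image", or "Grok image".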

SKILL.md
--- frontmatter
name: image-generation
description: Generates and edits images using Google Gemini, OpenAI GPT Image, and xAI Grok Image APIs via shell scripts. This skill should be used when the user asks to "generate an image", "create an image", "edit an image", "modify an image", "make a picture", "draw me a", "text to image", "generate with gemini", "generate with openai", "generate with xai", "generate with grok", "gpt image", "gemini image", or "grok image".
version: 2026.2.1

Image Generation with Gemini, OpenAI, and xAI

Generate and edit images using Google Gemini, OpenAI GPT Image 1.5, and xAI Grok Image APIs via shell scripts.

Available Providers

Google Gemini

  • Model: gemini-3-pro-image-preview (default), gemini-2.5-flash-image (faster)
  • Strengths: Multi-turn editing, aspect ratio control, Google Search grounding for factual imagery, up to 4K resolution
  • Aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 4:5, 5:4, 21:9
  • Env var: GEMINI_API_KEY

OpenAI GPT Image 1.5

  • Model: gpt-image-1.5
  • Strengths: Superior text rendering, transparent backgrounds, up to 16 input images for editing, quality tiers
  • Sizes: 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait)
  • Quality: low (fast/cheap), medium, high (best fidelity)
  • Env var: OPENAI_API_KEY

xAI Grok Image

  • Model: grok-imagine-image (default), grok-2-image (basic generation only)
  • Strengths: Prompt revision by chat model, flat per-image pricing, diverse style range, many aspect ratios
  • Aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 2:1, 1:2, 19.5:9, 9:19.5, 20:9, 9:20, auto
  • Editing: Same endpoint as generation; source image passed as data URI
  • Env var: XAI_API_KEY or GROK_API_KEY
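
Before choosing a provider it can help to see which API keys are actually set. The following is a minimal sketch, not part of the skill's scripts: `available_providers` is a hypothetical helper name, and it only checks the environment variables listed above (treating `XAI_API_KEY` and `GROK_API_KEY` as interchangeable).

```bash
# Hypothetical helper: print the providers whose API key is present.
available_providers() {
  local found=""
  if [ -n "${GEMINI_API_KEY:-}" ]; then found="$found gemini"; fi
  if [ -n "${OPENAI_API_KEY:-}" ]; then found="$found openai"; fi
  # Either variable name counts as an xAI key.
  if [ -n "${XAI_API_KEY:-${GROK_API_KEY:-}}" ]; then found="$found xai"; fi
  # Drop the leading space before printing.
  echo "${found# }"
}
```

Calling `available_providers` prints a space-separated list such as `gemini xai`, or an empty line when no key is configured.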

Usage

Text-to-Image Generation

Use the scripts at ${CLAUDE_PLUGIN_ROOT}/scripts/:

```bash
# Gemini
bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" \
  --mode generate \
  --prompt "a serene mountain landscape at sunset" \
  --output ./generated.png

# OpenAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/openai.sh" \
  --mode generate \
  --prompt "a serene mountain landscape at sunset" \
  --output ./generated.png

# xAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/xai.sh" \
  --mode generate \
  --prompt "a serene mountain landscape at sunset" \
  --output ./generated.png
```
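
Because the three generate commands differ only in the script name, they can be wrapped in one function. This is a sketch under that assumption; `provider_script` and `generate_image` are hypothetical names, not functions shipped with the skill.

```bash
# Hypothetical helper: map a provider name to its script path.
provider_script() {
  case "$1" in
    gemini|openai|xai) echo "${CLAUDE_PLUGIN_ROOT}/scripts/$1.sh" ;;
    *) echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper. Usage: generate_image <provider> <prompt> <output>
generate_image() {
  local script
  script="$(provider_script "$1")" || return 1
  bash "$script" --mode generate --prompt "$2" --output "$3"
}
```

For example, `generate_image gemini "a serene mountain landscape at sunset" ./generated.png` runs the same command as the Gemini example above.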

Image Editing

```bash
# Gemini
bash "${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh" \
  --mode edit \
  --prompt "change the sky to a starry night" \
  --input-image ./original.png \
  --output ./edited.png

# OpenAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/openai.sh" \
  --mode edit \
  --prompt "change the sky to a starry night" \
  --input-image ./original.png \
  --output ./edited.png

# xAI
bash "${CLAUDE_PLUGIN_ROOT}/scripts/xai.sh" \
  --mode edit \
  --prompt "change the sky to a starry night" \
  --input-image ./original.png \
  --output ./edited.png
```
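
The provider notes say that xAI edits pass the source image as a data URI. How `xai.sh` builds that URI is not shown here, so the following is only an illustrative sketch: `image_to_data_uri` is a hypothetical name, and the set of accepted file extensions is an assumption.

```bash
# Hypothetical helper: encode an image file as a base64 data URI.
image_to_data_uri() {
  local file="$1" mime
  case "$file" in
    *.png) mime="image/png" ;;
    *.jpg|*.jpeg) mime="image/jpeg" ;;
    *.webp) mime="image/webp" ;;
    *) echo "unsupported image type: $file" >&2; return 1 ;;
  esac
  # tr removes the line wrapping that some base64 implementations add.
  printf 'data:%s;base64,%s' "$mime" "$(base64 < "$file" | tr -d '\n')"
}
```

The result has the form `data:image/png;base64,...` and can be embedded directly in a JSON request body.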

Parallel Generation

To generate with multiple providers simultaneously:

  1. Create a task per provider with TaskCreate, using activeForm for spinner text:
    • "Generate image with Gemini" (activeForm: "Generating image with Gemini...")
    • "Generate image with OpenAI" (activeForm: "Generating image with OpenAI...")
    • "Generate image with xAI" (activeForm: "Generating image with xAI...")
  2. Mark all tasks in_progress with TaskUpdate
  3. Launch Task subagents (subagent_type: Bash) in the same message so they run concurrently
  4. As each subagent returns, mark its task completed via TaskUpdate
  5. Present all output file paths to the user
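
Outside the task and subagent workflow, the same fan-out can be approximated in plain shell with background jobs and `wait`. This is a sketch of that alternative, not the mechanism described in the steps above; `run_provider` and `run_parallel` are hypothetical names.

```bash
# Assumption: each provider script accepts the flags shown in the Usage section.
# Usage: run_provider <provider> <prompt> <output>
run_provider() {
  bash "${CLAUDE_PLUGIN_ROOT}/scripts/$1.sh" --mode generate --prompt "$2" --output "$3"
}

# Usage: run_parallel <prompt> <outdir> <provider>...
run_parallel() {
  local prompt="$1" outdir="$2" started="" entry failed=0
  shift 2
  for entry in "$@"; do
    # Start each provider in the background and remember its PID.
    run_provider "$entry" "$prompt" "$outdir/$entry.png" &
    started="$started $entry:$!"
  done
  for entry in $started; do
    # wait returns the exit status of that specific background job.
    if wait "${entry#*:}"; then
      echo "ok: $outdir/${entry%%:*}.png"
    else
      echo "failed: ${entry%%:*}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

`run_parallel "a serene mountain landscape at sunset" ./out gemini openai xai` prints one `ok:` line per successful provider and returns non-zero if any provider failed, so successful results can still be presented.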

Prompting Tips

General

  • Be specific and descriptive: "a golden retriever puppy playing in autumn leaves, soft afternoon light" beats "dog in park"
  • Specify style explicitly: "watercolor painting", "photorealistic", "flat vector illustration"
  • Include composition details: "close-up", "aerial view", "centered", "rule of thirds"
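
The three tips above amount to assembling a prompt from a subject, a style, and composition details. A small sketch of that idea follows; `build_prompt` is a hypothetical helper and the comma-joined format is just one reasonable convention.

```bash
# Hypothetical helper. Usage: build_prompt <subject> [style] [composition]
build_prompt() {
  local prompt="$1"
  # Append the optional parts only when they are provided.
  if [ -n "${2:-}" ]; then prompt="$prompt, $2"; fi
  if [ -n "${3:-}" ]; then prompt="$prompt, $3"; fi
  printf '%s\n' "$prompt"
}
```

For example, `build_prompt "a lighthouse" "watercolor painting" "aerial view"` yields a single prompt string that can be passed to `--prompt`.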

Text in Images

  • OpenAI GPT Image 1.5 renders text significantly better than the other two providers; prefer it when the image must contain legible text
  • Put text in quotes or ALL CAPS in the prompt: a sign that reads "OPEN 24 HOURS"
  • Specify typography details: font style, size, color, placement

Editing

  • Describe what to change, not the whole image
  • Be specific about which elements to preserve vs modify
  • Gemini supports iterative multi-turn refinement
  • OpenAI accepts up to 16 reference images
  • xAI revises the prompt with a chat model before generation

Error Handling

  • Scripts exit with code 1 on failure and print error details to stderr
  • If an API key is missing, the script exits immediately with a clear message
  • HTTP errors include the status code and API error message
  • If multiple providers are used in parallel and one fails, report the error and present the successful results
  • Rate limit errors (HTTP 429) mean the provider's rate limit or quota has been exceeded; retry later or switch to another provider
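
Since the scripts signal failure through a non-zero exit code, falling back to another provider can be sketched as a simple loop. The names `try_provider` and `generate_with_fallback` are hypothetical, and the sketch does not distinguish rate limits from other failures because the exact stderr format is not specified here.

```bash
# Assumption: each provider script accepts the flags shown in the Usage section.
# Usage: try_provider <provider> <prompt> <output>
try_provider() {
  bash "${CLAUDE_PLUGIN_ROOT}/scripts/$1.sh" --mode generate --prompt "$2" --output "$3"
}

# Usage: generate_with_fallback <prompt> <output> <provider>...
# Prints the name of the first provider that succeeds.
generate_with_fallback() {
  local prompt="$1" output="$2" provider
  shift 2
  for provider in "$@"; do
    if try_provider "$provider" "$prompt" "$output"; then
      echo "$provider"
      return 0
    fi
    echo "provider $provider failed, trying next" >&2
  done
  return 1
}
```

`generate_with_fallback "a red bicycle" ./out.png gemini openai xai` tries the providers in the order given and stops at the first success.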

Script Options Reference

gemini.sh

| Flag | Values | Default |
|---|---|---|
| --mode | generate, edit | (required) |
| --prompt | text | (required) |
| --output | file path | (required) |
| --input-image | file path | (edit only) |
| --aspect-ratio | 1:1, 16:9, etc. | 1:1 |
| --model | gemini model name | gemini-3-pro-image-preview |

openai.sh

| Flag | Values | Default |
|---|---|---|
| --mode | generate, edit | (required) |
| --prompt | text | (required) |
| --output | file path | (required) |
| --input-image | file path | (edit only) |
| --size | 1024x1024, 1536x1024, 1024x1536 | 1024x1024 |
| --quality | low, medium, high | high |
| --background | transparent, opaque, auto | auto |
| --model | OpenAI model name | gpt-image-1.5 |

xai.sh

| Flag | Values | Default |
|---|---|---|
| --mode | generate, edit | (required) |
| --prompt | text | (required) |
| --output | file path | (required) |
| --input-image | file path | (edit only) |
| --aspect-ratio | 1:1, 16:9, 9:16, 4:3, 3:4, etc. | (none) |
| --model | xAI model name | grok-imagine-image |
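
Flag values can be checked locally before calling a script, which avoids spending an API request on a typo. The sketch below covers only the `openai.sh` values listed in the reference above; `valid_openai_size` and `valid_openai_quality` are hypothetical helpers, and whether the scripts perform similar validation themselves is not stated.

```bash
# Hypothetical helper: accept only the sizes listed for openai.sh.
valid_openai_size() {
  case "$1" in
    1024x1024|1536x1024|1024x1536) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: accept only the quality tiers listed for openai.sh.
valid_openai_quality() {
  case "$1" in
    low|medium|high) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might write `valid_openai_size "$size" || { echo "bad size: $size" >&2; exit 1; }` before invoking the script with `--size "$size"`.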