Gemini Image Generation
Generate high-quality images using Google Gemini 3 Pro Image (gemini-3-pro-image-preview). This skill covers API setup, prompt engineering for image generation, aspect ratio selection, batch workflows, and failure mitigation.
When to Use
- Writing image prompts for Gemini 3 Pro Image
- Setting up a Python script to call the Gemini image generation API
- Debugging 503 errors, text rendering issues, or quota problems
- Building a prompt-as-file workflow for batch image generation
- Choosing aspect ratios and resolutions for specific use cases
Model Reference
| Property | Value |
|---|---|
| Model ID | gemini-3-pro-image-preview |
| SDK | google-genai Python package |
| Install | uv add google-genai |
| Cost (3 Pro) | ~$0.13/image |
| Cost (2.5 Flash) | ~$0.04/image (lower quality) |
| Free Tier Quota | Zero -- billing MUST be enabled |
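Because every call is billed, it can help to estimate a batch before running it. The sketch below uses the approximate per-image prices from the table above. The function name, the tier labels, and the assumption that every retry attempt is billed are illustrative, not documented behavior.

```python
# Approximate per-image prices copied from the Model Reference table.
APPROX_COST_PER_IMAGE_USD = {
    "3 Pro": 0.13,
    "2.5 Flash": 0.04,
}


def estimate_batch_cost(num_prompts: int, attempts_per_prompt: int = 1,
                        tier: str = "3 Pro") -> float:
    """Return a rough upper-bound cost in USD for a batch.

    Assumes every attempt is billed, which may overestimate the real bill.
    """
    if num_prompts < 0 or attempts_per_prompt < 1:
        raise ValueError("num_prompts must be >= 0 and attempts_per_prompt >= 1")
    per_image = APPROX_COST_PER_IMAGE_USD[tier]
    return round(num_prompts * attempts_per_prompt * per_image, 2)


if __name__ == "__main__":
    print(estimate_batch_cost(20))                  # one attempt each on 3 Pro
    print(estimate_batch_cost(20, 3, "2.5 Flash"))  # worst case with 3 attempts
```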
Supported Aspect Ratios
1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Supported Resolutions
1K, 2K, 4K -- uppercase K required (lowercase fails silently)
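Since a lowercase resolution string fails silently, checking settings before calling the API is cheap insurance. The following is a minimal sketch. The helper name validate_image_settings is hypothetical, and the allowed values are simply the lists above.

```python
SUPPORTED_ASPECT_RATIOS = {
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
}
SUPPORTED_RESOLUTIONS = {"1K", "2K", "4K"}


def validate_image_settings(aspect_ratio: str, resolution: str) -> tuple:
    """Raise ValueError early instead of letting a bad value fail silently."""
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio!r}")
    if resolution not in SUPPORTED_RESOLUTIONS:
        hint = ""
        # Catch the common lowercase mistake ("2k") and suggest the fix.
        if resolution.upper() in SUPPORTED_RESOLUTIONS:
            hint = f" (did you mean {resolution.upper()!r}?)"
        raise ValueError(f"Unsupported resolution: {resolution!r}{hint}")
    return aspect_ratio, resolution
```

Calling this on frontmatter values before building the request config turns a silent failure into an explicit error.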
4K Dimensions by Aspect Ratio
| Aspect Ratio | Dimensions | Print Size (300 DPI) |
|---|---|---|
| 16:9 | 5504x3072 | ~18x10 in |
| 9:16 | 3072x5504 | ~10x18 in |
| 1:1 | ~4096x4096 | ~14x14 in |
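The print sizes in the table are the pixel dimensions divided by the print resolution. A small sketch of that arithmetic, with a hypothetical helper name, for checking other dimensions or DPI values:

```python
def print_size_inches(width_px: int, height_px: int, dpi: int = 300) -> tuple:
    """Convert pixel dimensions to print size in inches at the given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (round(width_px / dpi, 1), round(height_px / dpi, 1))


if __name__ == "__main__":
    # 4K dimensions taken from the table above.
    for label, (w, h) in {"16:9": (5504, 3072), "1:1": (4096, 4096)}.items():
        print(label, print_size_inches(w, h))
```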
API Setup
Minimal Working Example
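from pathlib import Path

from google import genai
from google.genai import types

MODEL = "gemini-3-pro-image-preview"

# Short placeholder prompt for illustration; real prompts should be
# 300-500 words of prose (see Prompt Engineering).
prompt_text = (
    "A single red maple leaf resting on wet grey slate, lit softly from the left. "
    "No text anywhere in the image."
)

client = genai.Client(api_key="YOUR_API_KEY")  # or GOOGLE_API_KEY env var

config = types.GenerateContentConfig(
    response_modalities=["IMAGE"],
    image_config=types.ImageConfig(
        aspect_ratio="16:9",
        image_size="2K",
    ),
)

response = client.models.generate_content(
    model=MODEL,
    contents=prompt_text,
    config=config,
)

# Save the first part that carries image data.
for part in response.parts:
    if part.inline_data is not None:
        image = part.as_image()
        image.save(str(Path("output.png")))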
Configuration Notes
- response_modalities MUST be ["IMAGE"] for image output
- aspect_ratio accepts any supported ratio string (e.g., "16:9")
- image_size accepts "1K", "2K", or "4K" -- uppercase K required
- Billing MUST be enabled on the Google Cloud project -- free tier has zero image quota
Prompt Engineering
Core Principle: Essays, Not Tag Lists
Write 300-500 word prose descriptions. The model responds to narrative context far better than comma-separated keywords. Think of it as writing a scene for a cinematographer, not filing metadata.
Structured Prompt Flow
Organize prompts in this order for best results:
- Subject -- who or what is the focal point, with exact position and materials
- Environment -- setting, background, spatial relationships
- Secondary elements -- supporting objects, atmospheric effects
- Lighting -- direction, color temperature, time of day
- Mood -- emotional tone, atmosphere (build implicitly through prior sections)
- Style -- medium, color constraints, texture, camera/lens, artistic reference
Use ### markdown headers to separate sections. Both humans and the model benefit from structural markers.
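The section order above can be enforced mechanically when prompts are assembled from parts. This is a minimal sketch: build_prompt and SECTION_ORDER are hypothetical names, and the section titles are the ones listed above.

```python
SECTION_ORDER = [
    "Subject",
    "Environment",
    "Secondary elements",
    "Lighting",
    "Mood",
    "Style",
]


def build_prompt(sections: dict) -> str:
    """Join sections under ### headers in the recommended order.

    Sections may be passed in any order; missing ones are skipped.
    """
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"Unknown sections: {sorted(unknown)}")
    blocks = [
        f"### {name}\n{sections[name].strip()}"
        for name in SECTION_ORDER
        if name in sections
    ]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_prompt({
        "Style": "Three-color screen-print on cream paper stock.",
        "Subject": "A lone fire lookout tower in the center-frame.",
    }))
```

Rejecting unknown section names is a design choice here. It catches typos such as "Lightning" before they reach the model.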
Hyper-Specificity Wins
| Weak | Strong |
|---|---|
| "fantasy armor" | "ornate elven plate armor, etched with silver leaf patterns" |
| "vintage car" | "1950s American sedan -- rounded fenders, chrome bumper" |
| "night sky" | "deep indigo sky scattered with cold white pinprick stars" |
| "old building" | "crumbling limestone facade with iron balconets, moss in the mortar joints" |
Every vague noun is a coin flip. Every specific detail is a constraint the model can honor.
Positive Descriptions Over Exclusion Lists
Describe what SHOULD be in the image rather than listing what should not. The model interprets positive instructions more reliably than negatives.
Mandatory exception: Always include "No text anywhere in the image." as the final line of the prompt. Text rendering is unreliable, and this directive consistently suppresses garbled lettering.
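A small guard can make the final-line rule hard to forget. The sketch below appends the directive only when it is missing. The function name is hypothetical.

```python
NO_TEXT_LINE = "No text anywhere in the image."


def ensure_no_text_line(prompt: str) -> str:
    """Return the prompt with the no-text directive as its final line."""
    body = prompt.rstrip()
    if body.endswith(NO_TEXT_LINE):
        return body
    return f"{body}\n\n{NO_TEXT_LINE}"
```

The function is idempotent, so it is safe to run on every prompt just before generation.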
Spatial Anchoring
Place elements using concrete spatial language:
- Grid positions: "lower-right third," "upper-left quadrant," "center-frame"
- Relative placement: "behind and to the left," "extending from edge to edge"
- Scale relationships: "disproportionately large," "tiny against the horizon"
- Depth planes: "foreground," "middle ground," "far distance"
Vague spatial language ("in the scene," "nearby") produces vague compositions. Be a set designer, not a narrator.
Color Strategy
Define a constrained palette and assign roles:
- Warm isolation: Reserve one warm color for the emotional anchor. Everything else stays cool (or vice versa). A single warm element in a cool scene draws the eye like a campfire in the dark.
- Layer order: Describe which colors print on top of which (matters for screen-print and layered styles)
- Limit: 4-6 colors maximum. More colors = less cohesion
The Subject Rule
One primary subject per image. If there are two figures, one leads. If there's a landscape and a figure, decide which dominates. Every element should serve the subject -- supporting it spatially, tonally, or narratively.
Physical Metaphors Beat Abstract Concepts
Describe tangible, visible qualities rather than abstract feelings. "Warm golden light pooling on weathered wood" lands. "A sense of nostalgia" does not.
Containment Language Prevents Edge Bleed
When visual effects (glow, smoke, light rays) should stay within the composition, explicitly describe containment boundaries. Without this, effects bleed to frame edges and overwhelm the subject.
Style Specification
The Style section is the technical contract with the model. Include:
- Medium: What this image looks like physically (screen-print, oil painting, photograph, patent drawing)
- Color constraints: Exact palette, number of layers/inks
- Texture: Halftone grain, paper stock, film grain, brush strokes -- where and how heavy
- Imperfections: Misregistration, foxing, scratches, light leaks. Perfect images feel sterile
- Mood summary: 3-5 evocative words capturing the emotional target
- Exclusions: "No text anywhere in the image" (always include this)
Style Archetypes
Screen-print poster
- Flat color separation, distinct ink layers
- Visible halftone dot grain (specify WHERE it's heaviest)
- Paper stock texture showing through
- Slight ink misregistration between layers
- Bold, graphic, high-contrast
Cinematic matte painting
- 35mm film grain
- Photorealistic lighting with painterly atmosphere
- Deep depth of field or selective focus
- Color grading (specify warm/cool, lifted blacks, etc.)
Vintage document
- Aged paper with foxing stains, fold creases, fiber texture
- Period-appropriate rendering (ink, watercolor, pencil)
- Institutional formatting (seals, stamps, margins)
- Looks like a photograph of a real physical object
Illustration / editorial
- Clean linework or defined shapes
- Limited palette, often with one accent color
- Graphic composition, strong silhouettes
- Can evoke specific decades (1960s travel poster, 1920s deco)
Mood Through Specifics
Mood emerges from specific choices, not adjectives:
| Mood | How to achieve it |
|---|---|
| Lonesome | Single subject, vast negative space, cool palette, distant horizon |
| Ominous | Low angle, heavy darks, subject backlit or partially obscured |
| Tender | Close framing, warm light, soft edges, intimate scale |
| Epic | Wide aspect ratio, dramatic sky, small figure against large landscape |
| Uncanny | Familiar scene with one wrong element (reversed shadow, missing reflection) |
State the mood explicitly in the Style section, but build it implicitly through every preceding section.
Aspect Ratio as Storytelling
Aspect ratio drives composition and narrative -- it is not just a technical setting.
| Ratio | Use Case | Example |
|---|---|---|
| 16:9 | Wide landscapes, cinematic establishing shots | Campfire panoramas, desert highway vistas |
| 3:4 | Portraits, character studies, detailed objects | Patent drawings, character concepts |
| 1:1 | Symmetrical compositions, contained scenes | Emblems, face-to-face encounters |
| 9:16 | Phone wallpapers, tall/vertical drama | Towers, waterfalls, vertical compositions |
| 21:9 | Ultra-wide cinematic, banner images | Epic landscapes, film-style frames |
Critical: 9:16 requires full recomposition. Reusing a prompt composed for 16:9 produces unusable results. Write a dedicated prompt that places subject and environment within the vertical frame from the start.
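When a target surface has fixed pixel dimensions (a banner slot, a phone screen), the nearest supported ratio can be picked programmatically before the prompt is written. This is a sketch under assumptions: the helper name is hypothetical, and comparing ratios on a log scale is one reasonable choice, not something the API prescribes.

```python
import math

SUPPORTED_ASPECT_RATIOS = [
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
]


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio string nearest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height

    def distance(ratio: str) -> float:
        w, h = ratio.split(":")
        # Log distance treats "too wide" and "too tall" symmetrically.
        return abs(math.log((int(w) / int(h)) / target))

    return min(SUPPORTED_ASPECT_RATIOS, key=distance)


if __name__ == "__main__":
    print(closest_aspect_ratio(1920, 1080))
    print(closest_aspect_ratio(1080, 1920))
```

The result only selects the frame shape. The composition still has to be written for that frame, as noted above.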
Workflow: Prompt-as-File Architecture
For projects generating multiple images, use a file-based workflow:
Prompt File Format
---
name: kebab-case-name
aspect_ratio: '16:9'
resolution: 2K
style: screen-print-poster
last_generated: null
last_updated: '2026-01-15T12:00:00Z'
---

### Subject
[300-500 word prompt body organized with ### headers]

### Environment
...

### Style
...

No text anywhere in the image.
Generation Script Pattern
import sys
import time
import yaml
from pathlib import Path
from datetime import datetime

from google import genai
from google.genai import types

MODEL = "gemini-3-pro-image-preview"
PROMPT_DIR = Path("prompts")
OUTPUT_DIR = Path("output")
MAX_RETRIES = 3

client = genai.Client()  # uses GOOGLE_API_KEY env var


def generate(prompt_path: Path) -> Path | None:
    """Generate an image from a prompt file with retry logic."""
    text = prompt_path.read_text()
    # Split YAML frontmatter from prompt body
    _, fm_raw, body = text.split("---", 2)
    frontmatter = yaml.safe_load(fm_raw)

    config = types.GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio=frontmatter.get("aspect_ratio", "16:9"),
            image_size=frontmatter.get("resolution", "2K"),
        ),
    )

    for attempt in range(MAX_RETRIES):
        try:
            response = client.models.generate_content(
                model=MODEL,
                contents=body.strip(),
                config=config,
            )
            for part in response.parts:
                if part.inline_data is not None:
                    timestamp = datetime.now().strftime("%Y-%m-%d-%H%M")
                    name = frontmatter.get("name", prompt_path.stem)
                    output_path = OUTPUT_DIR / f"{name}-{timestamp}.png"
                    part.as_image().save(str(output_path))
                    return output_path
        except Exception as e:
            if "503" in str(e) or "UNAVAILABLE" in str(e):
                wait = 10 * (2 ** attempt)
                print(f"  Retry {attempt + 1}/{MAX_RETRIES} in {wait}s: {e}")
                time.sleep(wait)
            else:
                raise

    print(f"  Failed after {MAX_RETRIES} retries: {prompt_path.name}")
    return None


if __name__ == "__main__":
    OUTPUT_DIR.mkdir(exist_ok=True)
    if len(sys.argv) > 1:
        # Generate specific prompt
        generate(Path(sys.argv[1]))
    else:
        # Batch: generate all prompts with delay between calls
        for prompt in sorted(PROMPT_DIR.glob("*.md")):
            print(f"Generating: {prompt.name}")
            result = generate(prompt)
            if result:
                print(f"  Saved: {result}")
            time.sleep(15)  # mandatory delay between calls
Batch Shell Pattern
for prompt in prompts/*.md; do
    python generate.py "$prompt"
    sleep 15
done
The 15-second delay between calls is not optional. See "503 UNAVAILABLE" below.
Failures and Workarounds
Free Tier Has Zero Quota
The free tier provides zero image generation quota. Billing MUST be enabled on the Google Cloud project. The API returns a quota exhaustion error with no useful message.
503 UNAVAILABLE -- Batch Overload
Rapid sequential generation triggers 503 errors from aggressive rate limiting.
Fix: Minimum 10-15 second delay between calls, plus exponential backoff:
| Retry | Wait |
|---|---|
| 1 | 10s |
| 2 | 20s |
| 3 | 40s |
| After 3 | Skip and log |
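The wait times in the table follow a doubling schedule starting at 10 seconds. A minimal sketch of that schedule, with a hypothetical function name:

```python
def backoff_waits(max_retries: int = 3, base_seconds: int = 10) -> list:
    """Return the wait in seconds before each retry: base * 2**attempt."""
    return [base_seconds * (2 ** attempt) for attempt in range(max_retries)]


if __name__ == "__main__":
    for retry, wait in enumerate(backoff_waits(), start=1):
        print(f"Retry {retry}: wait {wait}s")
```

Computing the schedule in one place keeps the retry loop and any logging consistent with the table.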
Text Rendering Is Unreliable
The model cannot reliably render text. Letters come out garbled, misspelled, or stylistically inconsistent.
Mitigation: Always include "No text anywhere in the image" in every prompt. For label-like elements, describe the physical object (engraved metal plate, embossed leather tag) rather than the text content.
Output Format Inconsistency
The API sometimes returns JPEG data even when the output filename ends in .png. Verify actual format or convert explicitly if format consistency matters downstream.
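One way to verify the actual format is to inspect the first bytes of the saved data. PNG and JPEG files begin with well-known signatures. The sketch below covers only those two formats, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return "png" or "jpeg" based on magic bytes, or None if unrecognized."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def path_with_true_suffix(path: Path, data: bytes) -> Path:
    """Return the path with its suffix corrected to match the actual data."""
    suffixes = {"png": ".png", "jpeg": ".jpg"}
    detected = sniff_image_format(data)
    if detected is None:
        return path  # unknown format: leave the name unchanged
    return path.with_suffix(suffixes[detected])
```

For example, path_with_true_suffix(Path("output.png"), Path("output.png").read_bytes()) gives the name the file should have. Converting the pixel data itself would need an imaging library, which this sketch leaves out.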
Dense Prompt Timeouts
Complex prompts with heavy detail can time out on the first attempt. Retry logic handles this -- the same prompt typically succeeds on the second or third call.
Overly Long Prompts (800+ Words)
The model begins ignoring later instructions in very long prompts. The sweet spot is 300-500 words. If a prompt exceeds this, restructure rather than truncate -- move the most important visual details earlier.
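A rough word count before generation flags prompts that drift outside the sweet spot. This sketch counts whitespace-separated tokens, so ### headers count as words too. The thresholds are the ones stated above, and the status labels are hypothetical.

```python
def prompt_length_status(prompt: str) -> str:
    """Classify a prompt by approximate word count."""
    words = len(prompt.split())
    if words >= 800:
        return "too long"   # later instructions start being ignored
    if words > 500:
        return "long"       # consider restructuring
    if words < 300:
        return "short"      # likely under-specified
    return "ok"             # 300-500 word sweet spot
```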
Multiple Similar Subjects
When a prompt requests multiple instances of similar subjects (e.g., several characters in similar clothing), the model makes them near-identical. Differentiate aggressively with unique physical details for each.
Iteration Strategy
Edit, Don't Re-Roll
Refine existing prompts based on output rather than starting from scratch. Small targeted edits converge faster than fresh attempts. The model is sensitive to incremental phrasing changes.
Archive Every Generation
Use timestamp-based filenames to preserve iterations: {name}-{YYYY-MM-DD-HHMM}.png. Each generation costs money -- never overwrite previous output.
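Minute-resolution timestamps can collide when two generations finish within the same minute. The sketch below builds the filename pattern described above and adds a numeric suffix on collision. The suffix scheme and the function name are assumptions, not part of the original workflow.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_path(output_dir: Path, name: str,
                 now: Optional[datetime] = None) -> Path:
    """Return a non-existing path of the form {name}-{YYYY-MM-DD-HHMM}.png."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d-%H%M")
    candidate = output_dir / f"{name}-{stamp}.png"
    counter = 2
    # Never overwrite: append -2, -3, ... if the name is already taken.
    while candidate.exists():
        candidate = output_dir / f"{name}-{stamp}-{counter}.png"
        counter += 1
    return candidate
```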
Print-Shop Language for Screen-Print Results
When generating print-style artwork, use vocabulary from the print shop: ink layers, halftone grain, paper stock texture, misregistration effects, spot color separation. The model has strong training signal on print production terminology.
Common Pitfalls
What Fails
- Contradictory instructions: "vibrant and muted" -- pick one
- Too many subjects: Three equally weighted elements compete for attention
- Abstract concepts without visual anchors: "the feeling of loss" -- show a specific scene that evokes it
- Overly long prompts: 800+ words causes the model to contradict itself
- Requesting specific text: Gemini renders text poorly -- always suppress it
- Relying on negatives: "no people, no buildings, no cars" -- describe what IS there
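Some of these pitfalls can be caught with a simple lint pass before spending money on a generation. The sketch below flags negative phrasing while ignoring the mandatory no-text directive. The regular expression and the function name are illustrative assumptions, and the heuristic will miss many cases.

```python
import re

NO_TEXT_LINE = "No text anywhere in the image."
# Matches phrases such as "no people" or "without cars".
NEGATION = re.compile(r"\b(?:no|without)\s+\w+", re.IGNORECASE)


def find_negations(prompt: str) -> list:
    """Return negative phrases in the prompt, excluding the no-text directive."""
    cleaned = prompt.replace(NO_TEXT_LINE, "")
    return [match.group(0) for match in NEGATION.finditer(cleaned)]


if __name__ == "__main__":
    sample = "A quiet road, no people, no cars.\nNo text anywhere in the image."
    print(find_negations(sample))
```

Each hit is a candidate for rewriting as a positive description, for example "an empty road" in place of "no cars".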
What Succeeds
- One clear scene: A single moment, frozen, described spatially
- Constrained palette: Fewer colors = stronger visual identity
- Specific textures: "visible halftone grain heaviest on the sky gradient" beats "textured"
- Emotional detail in every section: Each element carries meaning, not just geometry
- The final sentence of each section lands on a feeling, not a measurement