Generate Illustrations with Gemini
Analyze a blog post or topic, propose illustrations, and generate them via the Gemini image generation API.
Arguments
Parse the user's input for:
- •Target: A file path (e.g.,
_d/four-healths.md) or a freeform topic (e.g., "meditation benefits") - •
--style 'description': Override the default illustration style entirely - •
--ref 'path': One or more reference images for character consistency (can be repeated). When using the default raccoon style, always pass the canonical reference image (see below) unless the user opts out - •
--api-url 'url': Override the Gemini API endpoint (default below) - •
--count N: Max number of images to generate (default: 3) - •
--aspect 'W:H': Aspect ratio viaimageConfig(default: 3:4, portrait). Valid values:1:1,2:3,3:2,3:4,4:3,4:5,5:4,9:16,16:9,21:9 - •
--transparent: Generate on magenta chroma-key background (#FF00FF), then strip it via ImageMagick border-seeded flood fill. The scanner samples pixels along every image edge and keeps only the ones that are actually near#FF00FFas flood seeds — so shots where Gemini rendered grass/scenery into some corners (which broke the earlier 4-corners-only approach when flood-fill started from grass at 30% fuzz and ate the subject) still strip cleanly. Only magenta pixels reachable from the image edges are made transparent; interior magenta-tinted pixels (pink fur highlights, glass reflections, lobster-claw reds) are preserved automatically. Fast (sub-second) and pixel-accurate — no ML needed. Requiresmagick(ImageMagick). If the subject fills the frame and no border pixel is near-magenta, the strip is skipped with an error rather than producing a swiss-cheese result. After the strip, two layered evals auto-run — see Automatic eval below. - •
--no-eval: Skip the alpha-mask eval pass that looks for interior holes, residual magenta, and edge fringe (needs numpy/pillow/scipy — theuv run --scriptshebang installs them automatically, but plainpython3invocations withoutuvmay need this flag). The alpha-mean signal still runs. - •
--eval-strict: Exit nonzero when any alpha-mask eval threshold trips. Useful when a calling agent wants to retry or fail loudly instead of silently shipping a broken alpha mask.
Automatic eval
When --transparent is active, generate.py runs two complementary evals on the output and prints metrics to stderr.
(1) Alpha-mean signal (evaluate_strip, always on — same thresholds test_generate.py asserts):
- •Status
healthy— alpha mean in 15–85% band. - •Status
subject_eaten— alpha mean below 15%; the strip ate the subject. Regenerate with a magenta border on all four sides. - •Status
nothing_stripped— alpha mean above 85%; subject fills the frame. Widen the crop.
(2) Alpha-mask quality signal (eval_alpha, opt-out via --no-eval):
- •
interior_hole_px— pixels in transparent regions that only become enclosed once the opaque mask is morphologically closed by 1 pixel. Isolates flood-fill bleed-through damage: thin 1–2-pixel channels the chroma-strip drills through the character (neck, between fingers, limb outlines) that topologically connect a real interior hole to the outside background so a naive "enclosed transparent" check reports zero. Legitimate design gaps (armpit openings, space between legs) are wider than 2 px and stay unaffected by the closing, so they don't false-alarm. - •
interior_hole_largest_px— pixels in the single biggest channel-revealed hole. More stable across images; good thresholding target because one big visible hole is what a human notices. - •
residual_magenta_px— opaque pixels still near chroma color (signals trapped magenta pockets corner-seeded flood couldn't reach). - •
edge_fringe_px— partial-alpha pixels (signals halo).
Output format:
eval [healthy] out.webp: alpha=51.3% size=74.0KB [eval] /tmp/out.webp: holes=0 (largest=0), residual=0, fringe=0 [OK] [eval] /tmp/out.webp: holes=4508 (largest=4356), residual=0, fringe=0 [WARN: interior damage likely — check alpha mask]
Thresholds for the mask-quality signal are conservative by default (holes > 500, residual > 500, fringe > 2000). Pass --no-eval to skip it. Pass --eval-strict to exit nonzero when a mask-quality threshold trips.
Why: visual inspection on a light or dark background hides interior damage (holes read as shadow/shading). The alpha mask is the ground truth. Baking both evals into the skill makes them the default, so a silently-broken output can't ship. See /hill-climbing for the "eval becomes regression guard" pattern.
Configuration
- •Auth:
GOOGLE_API_KEY— auto-loaded from~/.envbygenerate.py - •Default style: Read from
raccoon-style.txt(in this skill's directory) bygenerate.py - •Reference image: Auto-resolved by
generate.py(searches~/gits/blog*/images/raccoon-nerd.webp) - •Low-level script:
gemini-image.shhandles single API calls (used internally bygenerate.py) - •Generation wrapper:
../image-explore/generate.pyhandles env loading, style, ref image, and parallel batch execution
When --style is provided, it replaces the default raccoon style entirely (it is not appended).
Workflow
Phase 1: Read & Analyze Content
If the target is a file path:
- •Read the file
- •Identify the main themes/sections
- •Note any existing images (look for
blob_image,local_image,image_floatincludes, and raw markdown images) - •Identify sections that would benefit from illustrations — prioritize sections that have no images yet
If the target is a freeform topic:
- •Use it directly as the theme for image generation
- •Skip the content analysis and go straight to Phase 2
Phase 2: Design Illustrations
For each illustration opportunity, prepare:
- •Section: Which part of the post it enhances (or "standalone" for topic-based)
- •Filename: Following the project convention —
raccoon-{descriptor}.webpfor raccoon style,{descriptor}.webpotherwise - •Prompt: A detailed generation prompt that combines the style + the specific scene/action
- •For raccoon style, include a
Shirt text: 'SOMETHING'directive relevant to the section - •Include the aspect ratio in the prompt (e.g., "portrait orientation, 3:4 aspect ratio")
- •For raccoon style, include a
Present at most --count illustrations (default 3).
Phase 3: Confirm with User
Present the illustration plan as a table:
| # | Section | Filename | Prompt Summary |
|---|---|---|---|
| 1 | Health | raccoon-kettlebell.webp | Raccoon lifting kettlebell, shirt: "FIT FELLOW" |
| 2 | Family | raccoon-picnic.webp | Raccoon at family picnic, shirt: "FAMILY TIME" |
Ask the user to approve, modify, or remove items before generating. Use AskUserQuestion to confirm.
Phase 4: Generate Images
Use generate.py from the image-explore skill. It handles env loading (~/.env), raccoon style
(from raccoon-style.txt), reference image resolution, and parallel batch execution automatically.
- •
Resolve the script path:
bashCHOP_ROOT="$(cd "$(dirname "$(readlink -f ~/.claude/skills/gen-image/SKILL.md)")" && git rev-parse --show-toplevel)" GEN="$CHOP_ROOT/skills/image-explore/generate.py"
- •
Single image:
bashuv run "$GEN" single --scene "Raccoon lifting kettlebell in a gym" --shirt "FIT" --output raccoon-kettlebell.webp
The script's PEP 723 shebang auto-installs deps (typer + numpy/pillow/scipy for the
--transparenteval). Pass--aspect,--ref, or--styleto override defaults. Under--transparent, pass--no-evalto skip the mask-quality eval on stock python3 callers without numpy/scipy, and--eval-strictto exit nonzero when any eval threshold trips. - •
Multiple images (parallel): Write a JSON file and use batch mode:
json[ { "scene": "Raccoon lifting kettlebell in a gym", "shirt": "FIT", "output": "raccoon-kettlebell.webp" }, { "scene": "Raccoon at family picnic", "shirt": "FAMILY", "output": "raccoon-picnic.webp" } ]bashuv run "$GEN" batch illustrations.json --aspect 3:4
- •
After generation, show each image to the user by reading the file with the Read tool (which renders images inline).
- •
If generation fails, report the error and ask if the user wants to retry with a modified prompt or skip.
Auto-eval runs on every generation. When --transparent is set, generate.py runs two complementary evals right after the chroma-key pass — the alpha-mean signal (always) and the alpha-mask quality signal (interior holes, residual magenta, edge fringe; opt-out via --no-eval). Details and thresholds in the Automatic eval subsection above. See /hill-climbing for the "eval becomes regression guard" pattern.
Verifying transparent output. Don't judge chroma-key quality by compositing on a solid background — interior holes read as the background color. Extract the alpha channel as a mask: magick out.webp -alpha extract mask.png. A clean mask is a solid silhouette; swiss-cheese holes mean the chroma ate interior color data.
Never chain chroma passes on different magenta tones (e.g. a second pass on #E040E0 to catch pink shadow remnants). It eats magenta-tinted highlights inside fluffy characters. If the first pass has fringe, regenerate the source with a stricter prompt (no shadow on ground, no gradient, no environment) rather than filtering harder.
Phase 5: Insert References (Optional)
Ask the user if they want the images inserted into the post. If yes:
- •
For images stored in the blog's
assets/images/directory, use:code{% include local_image_float_right.html src="filename.webp" %} - •
For images stored in the external blob repo (
idvorkin/blob), use:code{% include blob_image_float_right.html src="blog/filename.webp" %} - •
Insert the include tag just below the relevant section header (after any front matter or introductory text)
If the target was a freeform topic (not a file), skip this phase — just tell the user where the files were saved.
Output Directory
- •If editing a blog post, save images to the same directory convention the blog uses (ask user if unsure)
- •If freeform topic, save to
~/tmp/and tell the user the paths
Error Handling
- •Missing API key:
generate.pyauto-loads from~/.env. If still missing, tell the user to setGOOGLE_API_KEY - •API error: Show the error message, suggest checking the API key or endpoint
- •No jq: The helper script (
gemini-image.sh) requiresjq - •No cwebp: Images will be saved as PNG instead of WebP — inform the user
Safety
- •Always confirm before generating (API calls cost money)
- •Never generate more than 5 images in a single run without explicit user approval
- •Show each generated image to the user for review