Optimize With Environments
Goal
Use GEPA to optimize system prompts in a controlled, reproducible loop.
Scope
The current GEPA path supports system prompt optimization only. If the user asks for an unsupported optimization target, stop and clarify before proceeding.
Endpoint And Model Selection Nudge
- Encourage users to define reusable aliases in `configs/endpoints.toml`.
- Ask whether optimization should be validated on instruct or reasoning models.
- Instruct go-tos: `gpt-4.1` series, `qwen3` instruct series.
- Reasoning go-tos: `gpt-5` series, `qwen3` thinking series, `glm` series.
- For benchmark reporting, keep model family fixed between baseline and optimized comparisons unless the user requests a cross-family study.
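
As a minimal sketch of validating on both families, the snippet below prints the baseline command from the core workflow once per family instead of running it. `default_model` is a hypothetical helper. `gpt-4.1-mini` is the model used elsewhere in this guide, and `gpt-5-mini` is an assumed id from the `gpt-5` series. Substitute whatever aliases your `configs/endpoints.toml` defines.

```bash
# Hypothetical helper: map a model family to one placeholder model id.
# gpt-4.1-mini appears in this guide; gpt-5-mini is an assumed gpt-5 series id.
default_model() {
  case "$1" in
    instruct)  echo "gpt-4.1-mini" ;;
    reasoning) echo "gpt-5-mini" ;;
    *)         echo "unknown family: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) one baseline command per family.
for family in instruct reasoning; do
  model="$(default_model "$family")"
  echo "[$family] prime eval run my-env -m $model -n 50 -r 3 -s"
done
```

Each family gets its own baseline, so any optimized run can later be compared against the baseline from the same family.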
Core Workflow
- Verify baseline first:
  ```bash
  prime eval run my-env -m gpt-4.1-mini -n 50 -r 3 -s
  ```
- Run GEPA:
  ```bash
  prime gepa run my-env -m gpt-4.1-mini -M gpt-4.1-mini -B 500 -n 100 -N 50
  ```
- Or run from config:
  ```bash
  prime gepa run configs/gepa/wordle.toml
  ```
- Re-evaluate with the optimized prompt and compare against the baseline.
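
The workflow above can be sketched as a single shell script, assuming the `prime` CLI is on `PATH`. Pinning the environment and model in shared variables keeps both runs on identical settings. The `run` wrapper and `DRY_RUN` switch are hypothetical conveniences that print commands by default. The re-evaluation step is left out because this guide does not say how the optimized prompt is passed back to `prime eval`.

```bash
# Minimal sketch: baseline and GEPA runs sharing one set of settings.
# Assumes the `prime` CLI is installed; run() and DRY_RUN are hypothetical
# conveniences that only print each command while DRY_RUN=1 (the default).
set -eu

ENV_ID="my-env"
MODEL="gpt-4.1-mini"   # shared by every step so the comparison stays controlled
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Baseline, with the same flags as the core workflow.
run prime eval run "$ENV_ID" -m "$MODEL" -n 50 -r 3 -s

# 2. GEPA with a 500-call budget, 100 train and 50 validation examples.
run prime gepa run "$ENV_ID" -m "$MODEL" -M "$MODEL" -B 500 -n 100 -N 50

# Alternative to step 2: run prime gepa run configs/gepa/wordle.toml
```

Set `DRY_RUN=0` once the printed commands look right.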
High-Value Settings
- `-B/--max-calls`: total optimization budget.
- `-n/--num-train` and `-N/--num-val`: train/validation split sizes.
- `--minibatch-size`: reflection granularity.
- `--perfect-score`: skip already-solved minibatches when max score is known.
- `--state-columns`: include environment-specific context in reflection data.
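
To get a rough sense of what `-B` buys, the arithmetic below divides the budget by the validation size. It assumes, as with the GEPA library's metric-call budget, that each scored example costs one call and that candidates are scored on the full validation set. Under that assumption `B / N` is an upper bound on full validation passes, because minibatch evaluations also consume budget. `budget_passes` is a hypothetical helper.

```bash
# Rough budget check. Assumption: every scored example costs one call from -B,
# and a candidate is scored on all -N validation examples, so B / N bounds the
# number of full validation passes (minibatch calls lower the real number).
budget_passes() {
  local max_calls="$1" num_val="$2"
  if [ "$num_val" -le 0 ]; then
    echo "num-val must be positive" >&2
    return 1
  fi
  echo $(( max_calls / num_val ))   # integer division rounds down
}

echo "With -B 500 and -N 50: at most $(budget_passes 500 50) full validation passes"
```

With the core workflow's settings, that bound is 10 passes. Raise `-B` or lower `-N` if a run stops after only a few candidates.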
Output Artifacts
Expect and inspect:
- `best_prompt.txt`
- `pareto_frontier.jsonl`
- `metadata.json`
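
Here is a minimal sketch for inspecting those files. It assumes they land in a single output directory whose path you pass in. That layout is an assumption; only the three file names come from this guide. `inspect_gepa_output` is a hypothetical helper that fails if any artifact is missing.

```bash
# Hypothetical helper: summarize a GEPA output directory.
inspect_gepa_output() {
  local dir="$1" f
  # Fail early if any expected artifact is absent.
  for f in best_prompt.txt pareto_frontier.jsonl metadata.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  echo "== best_prompt.txt =="
  cat "$dir/best_prompt.txt"
  # JSON Lines: one record per newline-terminated line.
  echo "== pareto_frontier.jsonl: $(wc -l < "$dir/pareto_frontier.jsonl" | tr -d ' ') records =="
  echo "== metadata.json =="
  cat "$dir/metadata.json"
}
```

Usage: `inspect_gepa_output path/to/gepa-output`, where the directory name is a placeholder.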
Quality Rules
- Do not optimize on top of broken reward logic.
- For weak deterministic checks, fix rubric quality before GEPA tuning.
- Keep model, sampling, and dataset conditions stable during baseline-vs-GEPA comparison.
- Report limitations directly when feature gaps block requested optimization.
Deliverable
Return:
- Baseline metrics.
- Optimized metrics.
- Prompt diff summary.
- Recommendation to adopt, iterate, or stop.
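
Below is a hypothetical sketch of assembling the deliverable in shell. `recommend` assumes mean rewards on a 0 to 1 scale and maps the change to adopt, iterate, or stop. Its thresholds are placeholders, not part of this guide: adopt at a gain of 0.05 or more, iterate on a smaller gain, stop otherwise. `prompt_diff_summary` counts added and removed lines between two prompt files using `diff -u`.

```bash
# Hypothetical: placeholder thresholds on mean reward; tune per benchmark.
recommend() {
  awk -v b="$1" -v o="$2" 'BEGIN {
    delta = o - b
    if (delta >= 0.05)  rec = "adopt"
    else if (delta > 0) rec = "iterate"
    else                rec = "stop"
    printf "baseline=%.3f optimized=%.3f delta=%+.3f -> %s\n", b, o, delta, rec
  }'
}

# Count changed lines between the baseline prompt and the optimized prompt.
prompt_diff_summary() {
  # diff exits 1 when files differ; "|| true" keeps that from failing the pipe.
  { diff -u "$1" "$2" || true; } | awk '
    NR <= 2 { next }          # skip the ---/+++ file headers
    /^\+/   { added++ }
    /^-/    { removed++ }
    END     { printf "+%d/-%d lines\n", added, removed }'
}
```

For example, `recommend 0.62 0.71` reports a delta of +0.090 and recommends adopt. `prompt_diff_summary baseline_prompt.txt best_prompt.txt` summarizes the prompt change, where `baseline_prompt.txt` is a placeholder for wherever the original system prompt is saved.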