Optimize With Environments

Goal

Use GEPA to optimize system prompts in a controlled, reproducible loop.

Scope

The current GEPA path supports system prompt optimization only. If the user asks for an unsupported optimization target, stop and clarify before proceeding.

Endpoint And Model Selection Nudge

•Encourage users to define reusable aliases in configs/endpoints.toml.
•Ask whether optimization should be validated on instruct or reasoning models.
•Instruct go-tos: gpt-4.1 series, qwen3 instruct series.
•Reasoning go-tos: gpt-5 series, qwen3 thinking series, glm series.
•For benchmark reporting, keep model family fixed between baseline and optimized comparisons unless the user requests a cross-family study.

Core Workflow

•Verify baseline first:

```bash
prime eval run my-env -m gpt-4.1-mini -n 50 -r 3 -s
```

•Run GEPA:

```bash
prime gepa run my-env -m gpt-4.1-mini -M gpt-4.1-mini -B 500 -n 100 -N 50
```

•Or run from config:

```bash
prime gepa run configs/gepa/wordle.toml
```

•Re-evaluate with the optimized prompt and compare against the baseline.
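
The baseline-versus-optimized comparison in the last step can be sketched in Python. This is a minimal illustration that assumes the per-rollout scores from both evaluations have already been collected into plain lists. The score values and the `summarize` and `compare` helpers are hypothetical and are not part of the `prime` CLI.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return count, mean, and sample standard deviation for a list of scores."""
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return {"n": len(scores), "mean": mean(scores), "stdev": spread}


def compare(baseline_scores, optimized_scores):
    """Summarize both runs and report the change in mean score."""
    base = summarize(baseline_scores)
    opt = summarize(optimized_scores)
    return {"baseline": base, "optimized": opt, "delta": opt["mean"] - base["mean"]}


# Hypothetical scores, for illustration only.
baseline_scores = [0.4, 0.5, 0.6, 0.5]
optimized_scores = [0.6, 0.7, 0.6, 0.7]

report = compare(baseline_scores, optimized_scores)
for name in ("baseline", "optimized"):
    stats = report[name]
    print(f"{name}: mean={stats['mean']:.3f} stdev={stats['stdev']:.3f} n={stats['n']}")
print(f"delta: {report['delta']:+.3f}")
```

Reporting the spread alongside the mean helps show whether a small gain could simply be rollout noise.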

High-Value Settings

•-B/--max-calls: total optimization budget.
•-n/--num-train and -N/--num-val: train/validation split sizes.
•--minibatch-size: reflection granularity.
•--perfect-score: skip already-solved minibatches when max score is known.
•--state-columns: include environment-specific context in reflection data.
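
As a rough sketch, the settings above can be assembled into a single command line from Python so the values live in one reviewable place. The `build_gepa_command` helper is hypothetical, and the values for `--minibatch-size` and `--perfect-score` are illustrative assumptions, not recommendations. `--state-columns` is left out because its value format is not described here.

```python
import shlex


def build_gepa_command(env, model, max_calls, num_train, num_val,
                       minibatch_size=None, perfect_score=None):
    """Assemble a `prime gepa run` argument list using the flags listed above."""
    # The workflow command passes the same model to -m and -M; this sketch does the same.
    argv = [
        "prime", "gepa", "run", env,
        "-m", model, "-M", model,
        "--max-calls", str(max_calls),
        "--num-train", str(num_train),
        "--num-val", str(num_val),
    ]
    if minibatch_size is not None:
        argv += ["--minibatch-size", str(minibatch_size)]
    if perfect_score is not None:
        argv += ["--perfect-score", str(perfect_score)]
    return argv


# Illustrative values only; the optional settings are assumptions.
argv = build_gepa_command("my-env", "gpt-4.1-mini", 500, 100, 50,
                          minibatch_size=3, perfect_score=1.0)
print(shlex.join(argv))  # print the command for review instead of running it
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without the `prime` CLI installed.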

Output Artifacts

Expect and inspect:

•best_prompt.txt
•pareto_frontier.jsonl
•metadata.json
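
A quick inspection of these artifacts might look like the following sketch. It assumes all three files sit in one run directory and deliberately assumes nothing about the fields inside them. The `summarize_artifacts` helper and the placeholder file contents in the demo are hypothetical.

```python
import json
import tempfile
from pathlib import Path

EXPECTED = ("best_prompt.txt", "pareto_frontier.jsonl", "metadata.json")


def summarize_artifacts(run_dir):
    """Check that the three artifacts exist and summarize them without assuming a schema."""
    run_dir = Path(run_dir)
    missing = [name for name in EXPECTED if not (run_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing artifacts: {', '.join(missing)}")

    prompt = (run_dir / "best_prompt.txt").read_text(encoding="utf-8")
    # JSONL holds one JSON value per non-empty line.
    lines = (run_dir / "pareto_frontier.jsonl").read_text(encoding="utf-8").splitlines()
    frontier = [json.loads(line) for line in lines if line.strip()]
    metadata = json.loads((run_dir / "metadata.json").read_text(encoding="utf-8"))

    return {
        "prompt_chars": len(prompt),
        "frontier_entries": len(frontier),
        "metadata_keys": sorted(metadata) if isinstance(metadata, dict) else [],
    }


# Demo on placeholder files; real GEPA output will have different contents.
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp)
    (run / "best_prompt.txt").write_text("You are a careful solver.", encoding="utf-8")
    (run / "pareto_frontier.jsonl").write_text(
        '{"placeholder": 1}\n{"placeholder": 2}\n', encoding="utf-8")
    (run / "metadata.json").write_text('{"placeholder": true}', encoding="utf-8")
    summary = summarize_artifacts(run)
    print(summary)
```

Failing loudly on a missing file is intentional: an incomplete run should not be mistaken for a finished optimization.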

Quality Rules

•Do not optimize on top of broken reward logic.
•If deterministic checks are weak, improve rubric quality before GEPA tuning.
•Keep model, sampling, and dataset conditions stable during baseline-vs-GEPA comparison.
•Report limitations directly when feature gaps block requested optimization.
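
One way to enforce the stable-conditions rule is to record the settings of each run and check that only the system prompt changed. The sketch below assumes each run is described by a plain dictionary. The key names, the example values, and the `changed_conditions` helper are all hypothetical.

```python
def changed_conditions(baseline, optimized, keys=("model", "sampling", "dataset")):
    """Return the keys whose values differ between the two run descriptions."""
    return [key for key in keys if baseline.get(key) != optimized.get(key)]


# Hypothetical run descriptions, for illustration only.
baseline_run = {
    "model": "gpt-4.1-mini",
    "sampling": {"temperature": 0.7},
    "dataset": {"num_examples": 50, "rollouts": 3},
    "system_prompt": "original prompt",
}
# Only the system prompt should differ in the optimized run.
optimized_run = {**baseline_run, "system_prompt": "optimized prompt"}

drift = changed_conditions(baseline_run, optimized_run)
if drift:
    print("Comparison is not like-for-like; changed:", ", ".join(drift))
else:
    print("Conditions match; only the prompt differs.")
```

If the check reports drift, rerun the baseline under the new conditions before attributing any score change to the optimized prompt.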

Deliverable

Return:

•Baseline metrics.
•Optimized metrics.
•Prompt diff summary.
•Recommendation to adopt, iterate, or stop.
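
The prompt diff summary and the recommendation can be drafted with a small helper such as the sketch below. It uses Python's standard `difflib` module. The example prompts, the `min_gain` threshold, and the decision rule in `recommend` are illustrative assumptions, not a policy defined by GEPA.

```python
import difflib


def prompt_diff_summary(old, new):
    """Count added and removed lines between two prompts and build a unified diff."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    diff = difflib.unified_diff(old_lines, new_lines,
                                fromfile="baseline", tofile="optimized", lineterm="")
    return {"added": added, "removed": removed, "diff": "\n".join(diff)}


def recommend(baseline_mean, optimized_mean, min_gain=0.02):
    """Map the change in mean score to adopt / iterate / stop (threshold is an assumption)."""
    gain = optimized_mean - baseline_mean
    if gain >= min_gain:
        return "adopt"
    if gain > 0:
        return "iterate"
    return "stop"


# Hypothetical prompts and scores, for illustration only.
old_prompt = "You are a helpful assistant.\nAnswer briefly."
new_prompt = "You are a helpful assistant.\nThink step by step.\nAnswer briefly."

summary = prompt_diff_summary(old_prompt, new_prompt)
print(f"lines added: {summary['added']}, removed: {summary['removed']}")
print(summary["diff"])
print("recommendation:", recommend(0.50, 0.65))
```

A fixed threshold is only a starting point; in practice the decision should also account for run-to-run variance and the cost of the longer prompt.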