Frontend Design Improvements Loop
Use this skill to run end-to-end benchmark iterations that improve frontend-design behavior through instruction tuning, not model weight tuning.
Scope
- Repository: improved-frontend-skills-for-gpt
- Canonical prompt: repo root prompt.md (same prompt for every version)
- Active version workspace: experiments/version-X/
- Primary references:
  - research/targetted-designs/
  - research/theo-screenshots-2k-clean/opus45_with_skill/
  - research/theo-screenshots-2k-clean/opus_iterations/
Read references/opus_targets.md, references/mutation_axes.md, and references/experiment_topologies.md before drafting a new version.
Non-Negotiables
- Never modify prompt.md unless explicitly asked.
- Always run Codex in the target version folder (-C experiments/version-X/...).
- Keep one mutation hypothesis per version.
- Keep each version self-contained (SKILL.md, t4-canvas/, README.md, CRITQUES.md, screenshots/).
- Do not delete previous versions.
- New versions must be isolated by default: do not inherit the prior t4-canvas implementation unless explicitly requested.
Workflow
1) Choose baseline and one mutation hypothesis
- Baseline is the latest completed experiments/version-*.
- Pick one mutation axis only (see references/mutation_axes.md).
- Write the hypothesis in the new version README before implementation.
2) Create the next version
- Use scripts/new_version_from_previous.sh <previous-version-dir> <new-version-dir>.
- Default behavior is isolated: copy prior SKILL.md, create fresh empty t4-canvas/.
- Legacy mode only when explicitly requested: --copy-app.
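The isolated default described above can be pictured roughly as follows. This is not the source of scripts/new_version_from_previous.sh, only an illustration of the documented behavior. It assumes SKILL.md lives at the path given in step 3, and the function name create_isolated_version is hypothetical.

```python
import shutil
from pathlib import Path

# Location of the skill file inside a version folder, taken from step 3.
SKILL_REL = Path(".agents/skills/frontend-design/SKILL.md")


def create_isolated_version(previous: Path, new: Path, copy_app: bool = False) -> None:
    """Carry SKILL.md forward and start t4-canvas/ empty unless copy_app is set."""
    if new.exists():
        raise FileExistsError(f"{new} already exists; refusing to overwrite a version")
    (new / SKILL_REL).parent.mkdir(parents=True)
    shutil.copy2(previous / SKILL_REL, new / SKILL_REL)
    if copy_app:
        # Legacy mode, mirroring --copy-app: inherit the prior implementation.
        shutil.copytree(previous / "t4-canvas", new / "t4-canvas")
    else:
        # Isolated default: a fresh, empty app directory.
        (new / "t4-canvas").mkdir()
```

Refusing to write into an existing directory keeps the "do not delete previous versions" rule from being broken by accident.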
2b) Horizontal idea branches (parallel)
- Use scripts/new_parallel_from_previous.sh <previous-version-dir> <new-version-dir>...
- This creates multiple sibling versions from one baseline for parallel idea exploration.
- Keep the mutation axis distinct across siblings.
3) Tune frontend-design skill instructions
- Edit only experiments/version-X/.agents/skills/frontend-design/SKILL.md.
- Keep constraints auditable and measurable.
- Avoid vague language; use explicit guards and pass/fail criteria.
4) Run Codex headlessly with canonical prompt
- Use scripts/run_headless_iteration.sh <version-dir> <repo-root>/prompt.md [run-label].
- Optional reliability args: [max-attempts] [retry-delay-sec].
- This forces codex exec to run with its cwd inside the version folder and logs artifacts.
- Prefer short, restartable runs over one giant run.
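The optional max-attempts and retry-delay-sec arguments suggest a bounded retry loop. The sketch below shows one way such a loop could work, run with the working directory set to the version folder. How run_headless_iteration.sh actually retries is not documented here, so the function name run_with_retries and the default values are assumptions.

```python
import subprocess
import time
from typing import Sequence


def run_with_retries(cmd: Sequence[str], cwd: str, max_attempts: int = 3,
                     retry_delay_sec: float = 5.0) -> int:
    """Run cmd inside cwd until it exits 0; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Pause before the next attempt; no pause after the last failure.
            time.sleep(retry_delay_sec)
    raise RuntimeError(f"command failed after {max_attempts} attempts")
```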
5) Capture screenshots for /1 to /5
- Start the version app in t4-canvas/.
- Use Playwriter and capture full-page screenshots for routes /1 to /5.
- Save under experiments/version-X/screenshots/.
6) Critique and score
- Score against the rubric in references/scoring_rubric.md.
- Compare against Opus reference sets (not generic web quality).
- Write critique notes to experiments/version-X/CRITQUES.md after every run.
- Record:
  - wins
  - regressions
  - next mutation
7) Decide next action
- If net gain: keep mutation and continue.
- If mixed: keep only if rubric delta is positive on target dimensions.
- If regression: revert mutation in next version and try a different axis.
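The three-way rule above can be written as a small function over per-dimension rubric deltas (new score minus baseline). This is a sketch under stated assumptions: "net gain" is read as at least one dimension up and none down, an all-zero result is treated like a regression, and "positive on target dimensions" is read as a positive sum over them. The dimension names in the usage lines are hypothetical, not taken from references/scoring_rubric.md.

```python
from typing import Dict, Iterable


def decide_next_action(deltas: Dict[str, float], target_dimensions: Iterable[str]) -> str:
    """Map per-dimension rubric deltas to 'keep' or 'revert'."""
    improved = any(d > 0 for d in deltas.values())
    regressed = any(d < 0 for d in deltas.values())
    if improved and not regressed:
        return "keep"    # net gain
    if not improved:
        return "revert"  # regression, or no change (assumption)
    # Mixed result: keep only if the targeted dimensions moved up overall.
    target_delta = sum(deltas.get(name, 0.0) for name in target_dimensions)
    return "keep" if target_delta > 0 else "revert"


# Hypothetical dimensions: typography improved, layout regressed.
print(decide_next_action({"typography": 1.0, "layout": -2.0}, ["typography"]))  # keep
print(decide_next_action({"typography": 1.0, "layout": -2.0}, ["layout"]))      # revert
```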
Reliability Pattern For Long Runs
- Design for 3-7 minute chunks per run.
- Do not require tmux by default; run codex exec directly with artifact checkpoints.
- Persist outputs every run:
  - artifacts/<run-label>/events.jsonl
  - artifacts/<run-label>/stderr.log
  - artifacts/<run-label>/final.md
- If interrupted, run codex exec resume --last and continue from the latest checkpoint.
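Before resuming or scoring a run, it can help to confirm that the three artifacts listed above were actually written. A minimal sketch, assuming artifacts/ sits inside the version folder (the source gives only the relative path); the function name missing_artifacts is hypothetical.

```python
from pathlib import Path
from typing import List

# File names taken from the artifact list above.
EXPECTED_ARTIFACTS = ("events.jsonl", "stderr.log", "final.md")


def missing_artifacts(version_dir: Path, run_label: str) -> List[str]:
    """Return the expected artifact names that are absent for this run label."""
    run_dir = version_dir / "artifacts" / run_label
    return [name for name in EXPECTED_ARTIFACTS if not (run_dir / name).is_file()]
```

An empty list means the run persisted everything; a missing final.md is a reasonable signal that the run was interrupted.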
Deliverable Contract Per Version
Each version must include:
- experiments/version-X/.agents/skills/frontend-design/SKILL.md (mutation applied)
- experiments/version-X/README.md with:
  - hypothesis
  - exact mutation
  - rubric score delta
  - next step
- experiments/version-X/CRITQUES.md with expected-vs-output critique notes
- experiments/version-X/screenshots/version-X-route-1.png ... version-X-route-5.png
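The contract above lends itself to a mechanical file check. The sketch below verifies only that the files exist; it does not inspect README.md for the hypothesis, mutation, score delta, or next step. It assumes the screenshot prefix matches the version folder name, and the function name missing_deliverables is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_deliverables(version_dir: Path) -> List[str]:
    """Return the contract files, relative to version_dir, that do not exist."""
    name = version_dir.name  # e.g. "version-7"
    required = [
        Path(".agents/skills/frontend-design/SKILL.md"),
        Path("README.md"),
        Path("CRITQUES.md"),
    ]
    # One screenshot per route, /1 through /5.
    required += [Path("screenshots") / f"{name}-route-{n}.png" for n in range(1, 6)]
    return [p.as_posix() for p in required if not (version_dir / p).is_file()]
```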
Scripts
- scripts/new_version_from_previous.sh
- scripts/new_parallel_from_previous.sh
- scripts/run_headless_iteration.sh