frontend-design-improvements-loop

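Run repeatable multi-version frontend design benchmark loops for this repository. Use it when creating new experiment versions, tuning the frontend-design SKILL.md guidance, running Codex inside `experiments/version-X/` with the shared `prompt.md`, capturing full-page `/1.. /5` screenshots, and scoring outputs against the Opus-with-skill reference set.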

SKILL.md

---
name: frontend-design-improvements-loop
description: Run repeatable multi-version frontend-design benchmark loops for this repo. Use this when creating new experiment versions, tuning frontend-design SKILL.md instructions, running Codex from inside `experiments/version-X/` with the shared `prompt.md`, capturing full-page `/1.. /5` screenshots, and scoring outputs against Opus-with-skill reference sets.
---

Frontend Design Improvements Loop

Use this skill to run end-to-end benchmark iterations that improve frontend-design behavior through instruction tuning, not model weight tuning.

Scope

•Repository: improved-frontend-skills-for-gpt
•Canonical prompt: repo root prompt.md (same prompt for every version)
•Active version workspace: experiments/version-X/
•Primary references:
  - research/targetted-designs/
  - research/theo-screenshots-2k-clean/opus45_with_skill/
  - research/theo-screenshots-2k-clean/opus_iterations/

Read references/opus_targets.md, references/mutation_axes.md, and references/experiment_topologies.md before drafting a new version.

Non-Negotiables

•Never modify prompt.md unless explicitly asked.
•Always run Codex in the target version folder (-C experiments/version-X/...).
•Keep one mutation hypothesis per version.
•Keep each version self-contained (SKILL.md, t4-canvas/, README.md, CRITQUES.md, screenshots/).
•Do not delete previous versions.
•New versions must be isolated by default: do not inherit the prior version's t4-canvas implementation unless explicitly requested.

Workflow

1) Choose baseline and one mutation hypothesis

•The baseline is the latest completed experiments/version-*.
•Pick one mutation axis only (see references/mutation_axes.md).
•Write the hypothesis in the new version's README before implementation.
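Picking "the latest completed" version is easy to get wrong with plain string sorting, because `version-10` sorts before `version-9`. The following is a minimal Python sketch that sorts numerically. It assumes folders are named `version-<integer>`. It also uses a hypothetical completeness test: README.md plus all five route screenshots present. The repo may define "completed" differently.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: version folders are named "version-<integer>"; anything else is skipped.
VERSION_RE = re.compile(r"^version-(\d+)$")


def is_completed(version_dir: Path) -> bool:
    """Hypothetical check: README.md and screenshots for routes 1..5 all exist."""
    shots = version_dir / "screenshots"
    return (version_dir / "README.md").is_file() and all(
        (shots / f"{version_dir.name}-route-{i}.png").is_file() for i in range(1, 6)
    )


def latest_completed_version(experiments: Path) -> Optional[Path]:
    """Return the highest-numbered completed version folder, or None if there is none."""
    best = None
    for child in experiments.iterdir():
        m = VERSION_RE.match(child.name)
        if m and child.is_dir() and is_completed(child):
            n = int(m.group(1))  # numeric comparison, so version-10 > version-9
            if best is None or n > best[0]:
                best = (n, child)
    return best[1] if best else None
```

For example, `latest_completed_version(Path("experiments"))` run from the repo root would return the baseline to branch from.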

2) Create the next version

•Use scripts/new_version_from_previous.sh <previous-version-dir> <new-version-dir>.
•Default behavior is isolated: copy the prior SKILL.md and create a fresh, empty t4-canvas/.
•Use legacy mode (--copy-app) only when explicitly requested.
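The isolated default can be pictured with the Python approximation below. This is not the contents of scripts/new_version_from_previous.sh, which this document does not show. It only mirrors the behavior described above: copy SKILL.md, start an empty t4-canvas/, and copy the app only in legacy mode. Refusing to overwrite an existing folder is an added assumption, in line with "do not delete previous versions".

```python
import shutil
from pathlib import Path

# Location of the tuned skill file inside each version (see step 3).
SKILL_REL = Path(".agents/skills/frontend-design/SKILL.md")


def new_version_from_previous(prev: Path, new: Path, copy_app: bool = False) -> None:
    """Sketch of creating an isolated version folder from a previous one."""
    if new.exists():
        # Assumption: never overwrite or merge into an existing version.
        raise FileExistsError(new)
    (new / SKILL_REL).parent.mkdir(parents=True)
    shutil.copy2(prev / SKILL_REL, new / SKILL_REL)  # carry the skill forward
    if copy_app:
        # Legacy mode, only when explicitly requested (--copy-app).
        shutil.copytree(prev / "t4-canvas", new / "t4-canvas")
    else:
        (new / "t4-canvas").mkdir()  # fresh, empty app workspace
    (new / "screenshots").mkdir()
```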

2b) Horizontal idea branches (parallel)

•Use scripts/new_parallel_from_previous.sh <previous-version-dir> <new-version-dir> [<new-version-dir> ...].
•This creates multiple sibling versions from one baseline for parallel idea exploration.
•Give each sibling a distinct mutation axis.
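Sibling versions only show something if their axes differ, so a pre-flight check on a branch plan is worthwhile. The sketch below is a minimal version of that check. The plan format (new folder name mapped to an axis label) and the axis labels themselves are hypothetical. The real axes are listed in references/mutation_axes.md.

```python
from collections import Counter


def check_distinct_axes(branches: dict) -> None:
    """Raise ValueError if two sibling versions share a mutation axis.

    `branches` maps a new version directory name to its (hypothetical) axis label.
    """
    counts = Counter(branches.values())
    dupes = sorted(axis for axis, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"mutation axes reused across siblings: {', '.join(dupes)}")
```

Usage: call `check_distinct_axes({"version-8a": "typography", "version-8b": "layout"})` before running the parallel script. It returns silently when every axis is unique.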

3) Tune `frontend-design` skill instructions

•Edit only experiments/version-X/.agents/skills/frontend-design/SKILL.md.
•Keep constraints auditable and measurable.
•Avoid vague language; use explicit guards and pass/fail criteria.
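One way to make the "edit only SKILL.md" rule auditable is an explicit pass/fail guard over the list of changed files. The Python sketch below assumes repo-relative, forward-slash paths, such as those printed by `git diff --name-only`. Any other changed path counts as out of scope, and that includes prompt.md, which must stay untouched.

```python
from pathlib import PurePosixPath

SKILL_REL = PurePosixPath(".agents/skills/frontend-design/SKILL.md")


def out_of_scope_edits(version_dir: str, changed: list) -> list:
    """Return changed paths other than the version's frontend-design SKILL.md.

    An empty result means the tuning step passed the guard.
    """
    allowed = PurePosixPath(version_dir) / SKILL_REL
    return [p for p in changed if PurePosixPath(p) != allowed]
```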

4) Run Codex headlessly with canonical prompt

•Use scripts/run_headless_iteration.sh <version-dir> <repo-root>/prompt.md [run-label].
•Optional reliability args: [max-attempts] [retry-delay-sec].
•This forces codex exec to run with its working directory inside the version folder and logs the run artifacts.
•Prefer short, restartable runs over one giant run.
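The [max-attempts] and [retry-delay-sec] arguments suggest a simple retry loop. The Python sketch below illustrates that pattern under stated assumptions. It is not scripts/run_headless_iteration.sh. It runs a command with `cwd` set to the version folder and writes stdout to events.jsonl and stderr to stderr.log. It also keeps only the last attempt's events, which is a design choice of this sketch. No Codex CLI flags are assumed. The caller supplies the full command, for example one starting with `["codex", "exec", ...]`.

```python
import subprocess
import time
from pathlib import Path


def run_with_retries(cmd, cwd, artifacts, max_attempts=3, retry_delay_sec=5.0):
    """Run `cmd` inside `cwd`, retrying on a non-zero exit. Returns the last exit code."""
    artifacts = Path(artifacts)
    artifacts.mkdir(parents=True, exist_ok=True)
    code = 1
    for attempt in range(1, max_attempts + 1):
        # events.jsonl is overwritten per attempt; stderr.log accumulates all attempts.
        with open(artifacts / "events.jsonl", "w") as out, \
                open(artifacts / "stderr.log", "a") as err:
            err.write(f"--- attempt {attempt} ---\n")
            err.flush()  # keep the marker ahead of the child's stderr output
            code = subprocess.run(cmd, cwd=cwd, stdout=out, stderr=err).returncode
        if code == 0:
            break
        if attempt < max_attempts:
            time.sleep(retry_delay_sec)
    return code
```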

5) Capture screenshots for routes `/1` to `/5`

•Start the version app in t4-canvas/.
•Use Playwriter and capture full-page screenshots for routes /1 to /5.
•Save under experiments/version-X/screenshots/.
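The capture step can be sketched in Python. It takes a stand-in approach: it uses Playwright's Python API (`sync_playwright`, `page.goto`, and `page.screenshot(full_page=True)`) instead of Playwriter. The dev-server URL `http://localhost:3000` is an assumption, so adjust it to whatever t4-canvas actually serves. Filenames follow the deliverable contract (`version-X-route-N.png`), and paths are relative to the repo root.

```python
from pathlib import Path

# Assumption: the t4-canvas dev server is reachable at this base URL.
BASE_URL = "http://localhost:3000"


def screenshot_plan(version: str, base_url: str = BASE_URL) -> list:
    """Pair each route /1../5 with its deliverable screenshot path."""
    out_dir = Path("experiments") / version / "screenshots"
    return [
        (f"{base_url.rstrip('/')}/{i}", out_dir / f"{version}-route-{i}.png")
        for i in range(1, 6)
    ]


def capture(version: str, base_url: str = BASE_URL) -> None:
    """Take full-page screenshots with Playwright (a stand-in for Playwriter)."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for url, path in screenshot_plan(version, base_url):
            path.parent.mkdir(parents=True, exist_ok=True)
            page.goto(url, wait_until="networkidle")
            page.screenshot(path=str(path), full_page=True)
        browser.close()
```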

6) Critique and score

•Score against the rubric in references/scoring_rubric.md.
•Compare against the Opus reference sets, not against generic web quality.
•Write critique notes to experiments/version-X/CRITQUES.md after every run.
•Record:
  - wins
  - regressions
  - next mutation
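A fixed entry shape keeps CRITQUES.md comparable across runs. The Python sketch below formats the three required fields. The markdown heading style, the "none" placeholder, and the use of the run label as the heading are assumptions, not an existing template in the repo.

```python
from pathlib import Path


def critique_entry(run_label: str, wins: list, regressions: list, next_mutation: str) -> str:
    """Format one critique entry with wins, regressions, and the next mutation."""
    lines = [f"## {run_label}", "", "Wins:"]
    lines += [f"- {w}" for w in wins] or ["- none"]
    lines += ["", "Regressions:"]
    lines += [f"- {r}" for r in regressions] or ["- none"]
    lines += ["", f"Next mutation: {next_mutation}", ""]
    return "\n".join(lines)


def append_critique(version_dir: str, entry: str) -> None:
    # Append rather than overwrite, so each run's notes are kept.
    with open(Path(version_dir) / "CRITQUES.md", "a") as f:
        f.write(entry + "\n")
```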

7) Decide next action

•If net gain: keep the mutation and continue.
•If mixed: keep the mutation only if the rubric delta is positive on the target dimensions.
•If regression: revert the mutation in the next version and try a different axis.
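The three decision rules can be written as one function over per-dimension rubric scores. This is a minimal sketch under assumptions that the rules themselves leave open:
- "net gain" means at least one dimension improved and none regressed;
- "mixed" means some dimensions improved and some regressed, and the mutation is kept only if every target dimension improved;
- a run with no change at all is treated like a regression, since nothing justifies keeping it.

Dimension names are hypothetical.

```python
def decide(baseline: dict, candidate: dict, targets: set) -> str:
    """Return "keep" or "revert" from per-dimension rubric scores."""
    deltas = {dim: candidate[dim] - baseline[dim] for dim in baseline}
    gains = any(d > 0 for d in deltas.values())
    losses = any(d < 0 for d in deltas.values())
    if gains and not losses:
        return "keep"  # net gain
    if gains and losses:
        # Mixed: keep only if every target dimension moved up.
        return "keep" if targets and all(deltas[t] > 0 for t in targets) else "revert"
    return "revert"  # regression, or (by assumption) no change at all
```

The `deltas` dictionary is also what the README's "rubric score delta" field would record.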

Reliability Pattern For Long Runs

•Design for 3-7 minute chunks per run.
•Do not require tmux by default; run codex exec directly with artifact checkpoints.
•Persist outputs on every run:
  - artifacts/<run-label>/events.jsonl
  - artifacts/<run-label>/stderr.log
  - artifacts/<run-label>/final.md
•If interrupted, run codex exec resume --last and pick up from the latest checkpoint.
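Before resuming, it helps to know how far a run got. The Python sketch below inspects one `artifacts/<run-label>/` folder. It treats a missing final.md as "not finished" and counts the JSON lines in events.jsonl up to the first unparseable one, on the assumption that an interrupted run can leave a truncated last line. The `artifacts_root` parameter is the caller's choice of where `artifacts/` lives.

```python
import json
from pathlib import Path

REQUIRED = ("events.jsonl", "stderr.log", "final.md")


def checkpoint_status(artifacts_root, run_label: str) -> dict:
    """Report which checkpoint files exist and how many events were recorded."""
    run_dir = Path(artifacts_root) / run_label
    missing = [name for name in REQUIRED if not (run_dir / name).is_file()]
    events = 0
    events_file = run_dir / "events.jsonl"
    if events_file.is_file():
        for line in events_file.read_text().splitlines():
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError:
                break  # likely a truncated line from an interrupted run
            events += 1
    return {"complete": not missing, "missing": missing, "events": events}
```

If `complete` is false, that is the cue to resume rather than start a fresh run.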

Deliverable Contract Per Version

Each version must include:

•experiments/version-X/.agents/skills/frontend-design/SKILL.md (mutation applied)
•experiments/version-X/README.md with:
  - hypothesis
  - exact mutation
  - rubric score delta
  - next step
•experiments/version-X/CRITQUES.md with expected-vs-output critique notes
•experiments/version-X/screenshots/version-X-route-1.png ... version-X-route-5.png
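The contract above is concrete enough to check mechanically. The Python sketch below lists what a version folder is missing. The README check is a loose assumption: it only looks for each field name as case-insensitive text and does not parse real sections.

```python
from pathlib import Path

SKILL_REL = Path(".agents/skills/frontend-design/SKILL.md")
README_FIELDS = ("hypothesis", "exact mutation", "rubric score delta", "next step")


def missing_deliverables(version_dir: Path) -> list:
    """Return contract items missing from a version folder (empty list = complete)."""
    name = version_dir.name
    problems = []
    for rel in (SKILL_REL, Path("README.md"), Path("CRITQUES.md")):
        if not (version_dir / rel).is_file():
            problems.append(rel.as_posix())
    readme = version_dir / "README.md"
    if readme.is_file():
        text = readme.read_text().lower()
        problems += [f"README.md: {field}" for field in README_FIELDS if field not in text]
    for i in range(1, 6):
        shot = Path("screenshots") / f"{name}-route-{i}.png"
        if not (version_dir / shot).is_file():
            problems.append(shot.as_posix())
    return problems
```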

Scripts

•scripts/new_version_from_previous.sh
•scripts/new_parallel_from_previous.sh
•scripts/run_headless_iteration.sh