AgentSkillsCN

frontend-design-improvements-loop

为本仓库运行可重复的多版本前端设计基准测试循环。适用于创建新的实验版本、调整前端设计 SKILL.md 指导说明、在 `experiments/version-X/` 内部运行 Codex 并使用共享的 `prompt.md`、截取全页面 `/1.. /5` 屏幕截图,以及参照 Opus-with-skill 参考集对输出结果进行评分。

SKILL.md
--- frontmatter
name: frontend-design-improvements-loop
description: Run repeatable multi-version frontend-design benchmark loops for this repo. Use this when creating new experiment versions, tuning frontend-design SKILL.md instructions, running Codex from inside `experiments/version-X/` with the shared `prompt.md`, capturing full-page `/1.. /5` screenshots, and scoring outputs against Opus-with-skill reference sets.

Frontend Design Improvements Loop

Use this skill to run end-to-end benchmark iterations that improve frontend-design behavior through instruction tuning, not model weight tuning.

Scope

  • Repository: improved-frontend-skills-for-gpt
  • Canonical prompt: repo root prompt.md (same prompt for every version)
  • Active version workspace: experiments/version-X/
  • Primary references:
    • research/targetted-designs/
    • research/theo-screenshots-2k-clean/opus45_with_skill/
    • research/theo-screenshots-2k-clean/opus_iterations/

Read references/opus_targets.md, references/mutation_axes.md, and references/experiment_topologies.md before drafting a new version.

Non-Negotiables

  1. Never modify prompt.md unless explicitly asked.
  2. Always run Codex in the target version folder (-C experiments/version-X/...).
  3. Keep one mutation hypothesis per version.
  4. Keep each version self-contained (SKILL.md, t4-canvas/, README.md, CRITQUES.md, screenshots/).
  5. Do not delete previous versions.
  6. New versions must be isolated by default: do not inherit prior t4-canvas implementation unless explicitly requested.

Workflow

1) Choose baseline and one mutation hypothesis

  • Baseline is the latest completed experiments/version-*.
  • Pick one mutation axis only (see references/mutation_axes.md).
  • Write the hypothesis in the new version README before implementation.

2) Create the next version

  • Use scripts/new_version_from_previous.sh <previous-version-dir> <new-version-dir>.
  • Default behavior is isolated: copy prior SKILL.md, create fresh empty t4-canvas/.
  • Legacy mode only when explicitly requested: --copy-app.

2b) Horizontal idea branches (parallel)

  • Use scripts/new_parallel_from_previous.sh <previous-version-dir> <new-version-dir>....
  • This creates multiple sibling versions from one baseline for parallel idea exploration.
  • Keep mutation axis distinct across siblings.

3) Tune frontend-design skill instructions

  • Edit only experiments/version-X/.agents/skills/frontend-design/SKILL.md.
  • Keep constraints auditable and measurable.
  • Avoid vague language; use explicit guards and pass/fail criteria.

4) Run Codex headlessly with canonical prompt

  • Use scripts/run_headless_iteration.sh <version-dir> <repo-root>/prompt.md [run-label].
  • Optional reliability args: [max-attempts] [retry-delay-sec].
  • This forces codex exec to run with cwd inside version folder and logs artifacts.
  • Prefer short, restartable runs over one giant run.

5) Capture screenshots for /1.. /5

  • Start the version app in t4-canvas/.
  • Use Playwriter and capture full-page screenshots for routes /1 to /5.
  • Save under experiments/version-X/screenshots/.

6) Critique and score

  • Score against rubric in references/scoring_rubric.md.
  • Compare against Opus reference sets (not generic web quality).
  • Write critique notes to experiments/version-X/CRITQUES.md after every run.
  • Record:
    • wins
    • regressions
    • next mutation

7) Decide next action

  • If net gain: keep mutation and continue.
  • If mixed: keep only if rubric delta is positive on target dimensions.
  • If regression: revert mutation in next version and try a different axis.

Reliability Pattern For Long Runs

  • Design for 3-7 minute chunks per run.
  • Do not require tmux by default; run direct codex exec with artifact checkpoints.
  • Persist outputs every run:
    • artifacts/<run-label>/events.jsonl
    • artifacts/<run-label>/stderr.log
    • artifacts/<run-label>/final.md
  • If interrupted, continue with codex exec resume --last and continue from latest checkpoint.

Deliverable Contract Per Version

Each version must include:

  • experiments/version-X/.agents/skills/frontend-design/SKILL.md (mutation applied)
  • experiments/version-X/README.md with:
    • hypothesis
    • exact mutation
    • rubric score delta
    • next step
  • experiments/version-X/CRITQUES.md with expected-vs-output critique notes
  • experiments/version-X/screenshots/version-X-route-1.png ... version-X-route-5.png

Scripts

  • scripts/new_version_from_previous.sh
  • scripts/new_parallel_from_previous.sh
  • scripts/run_headless_iteration.sh