Spec-First Development

A structured workflow for LLM-assisted coding that delays implementation until decisions are explicit.

When This Activates

•"Build X" or "Create Y" (new features/projects)
•"Implement..." (non-trivial functionality)
•"Add a feature that..." (multi-step work)
•Any request that touches 3+ files or has unclear requirements

When to Skip

•Single-file changes under 50 lines
•Typo fixes, log additions, config tweaks
•User explicitly says "just do it" or "quick fix"

Core Principles

•Delay implementation until tradeoffs are explicit: Use conversation to clarify constraints, compare options, surface risks. Only then write code.
•Treat the model like a junior engineer with infinite typing speed: Provide structure: clear interfaces, small tasks, explicit acceptance criteria. Code is cheap; understanding and correctness are scarce.
•Specs beat prompts: For anything non-trivial, create a durable artifact (spec file) that can be re-fed, diffed, and reused across sessions.
•Generated code is disposable; tests are not: Assume rewrites. Design for easy replacement: small modules, minimal coupling, clean seams, strong tests.
•The model is over-confident; reality is the judge: Everything important gets verified by execution: tests, linters, typecheckers, reproducible builds.

The 6-Stage Workflow

Stage A: Frame the Problem (conversation mode)

Goal: Decide before you implement.

Prompts that work:

•"List 3 viable approaches. Compare on: complexity, failure modes, testability, future change, time to first demo."
•"What assumptions are you making? Which ones are risky?"
•"Propose a minimal version that can be deleted later without regret."

Output: Decision notes for decisions.md

Stage B: Write spec.md (freeze decisions)

Goal: Turn decisions into unambiguous requirements.

markdown

# spec.md

## Purpose
One paragraph: what this is for.

## Non-Goals
Explicitly state what you are NOT building.

## Interfaces
Inputs/outputs, data types, file formats, API endpoints, CLI commands.

## Key Decisions
Libraries, architecture, persistence choices, constraints.

## Edge Cases and Failure Modes
Timeouts, retries, partial failures, invalid input, concurrency, idempotency.

## Acceptance Criteria
Bullet list of testable statements. Avoid "should be fast."
Prefer: "processes 1k items under 2s on M1 Mac."

## Test Plan
Unit/integration boundaries, fixtures, golden files, what must be mocked.
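
A testable acceptance criterion can be turned directly into an executable check. The sketch below is one way to express "processes 1k items under 2s"; `process_items` is a hypothetical stand-in for whatever the spec actually describes, and the hardware part of the criterion ("on M1 Mac") is not enforced by the code, only by where you run it.

```python
import time


def process_items(items):
    """Hypothetical function under spec; replace with the real implementation."""
    return [item * 2 for item in items]


def check_throughput(func, n_items=1000, budget_seconds=2.0):
    """Return (passed, elapsed_seconds) for one timed run of func over n_items."""
    items = list(range(n_items))
    start = time.perf_counter()
    result = func(items)
    elapsed = time.perf_counter() - start
    # The criterion has two parts: every item was processed, and within budget.
    passed = len(result) == n_items and elapsed < budget_seconds
    return passed, elapsed


if __name__ == "__main__":
    passed, elapsed = check_throughput(process_items)
    print(f"passed={passed} elapsed={elapsed:.4f}s")
```

A single timed run is noisy; for a criterion close to its budget, consider repeating the run and asserting on the slowest result.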

Stage C: Generate todo.md (planning mode)

Goal: Stepwise checklist where each step has a verification command.

markdown

# todo.md

- [ ] Add project scaffolding (build/run/test commands)
  Verify: `npm run build && npm test`

- [ ] Implement module X with interface Y
  Verify: `npm test -- --grep "module X"`

- [ ] Add tests for edge cases A/B/C
  Verify: `npm test -- --grep "edge cases"`

- [ ] Wire integration
  Verify: `npm run integration`

- [ ] Add docs
  Verify: `npm run docs && test -s docs/index.html`

Each item must be independently checkable. This prevents "looks right" progress.
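
To make "independently checkable" mechanical, the checklist can be parsed and its commands executed. This is a minimal sketch, assuming the exact layout shown in the todo.md template above (a `- [ ]` line followed by an indented `Verify:` line with the command in backticks); the function names are illustrative, not part of any existing tool.

```python
import re
import subprocess

# "- [ ] title" or "- [x] title"
ITEM_RE = re.compile(r"^- \[( |x|X)\] (.+)$")
# Indented "Verify: `command`"
VERIFY_RE = re.compile(r"^\s+Verify:\s*`(.+)`\s*$")


def parse_todo(text):
    """Return one dict per checklist item: done flag, title, verify command."""
    items = []
    for line in text.splitlines():
        item = ITEM_RE.match(line)
        if item:
            items.append({"done": item.group(1) != " ",
                          "title": item.group(2),
                          "verify": None})
            continue
        verify = VERIFY_RE.match(line)
        if verify and items:
            # Attach the command to the most recent checklist item.
            items[-1]["verify"] = verify.group(1)
    return items


def missing_verification(items):
    """Titles of items that have no verification command."""
    return [item["title"] for item in items if not item["verify"]]


def run_verification(command, timeout=600):
    """Run one command in a shell; return (passed, combined stdout+stderr)."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=timeout)
    return result.returncode == 0, result.stdout + result.stderr
```

The output string returned by `run_verification` is what gets pasted back into the conversation in Stage D. Because commands run through a shell, only use this on a todo.md you wrote or reviewed yourself.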

Stage D: Execute Changes (implementation mode)

Goal: Small diffs, frequent verification, controlled context.

Rules:

•One logical change per step
•Keep focus on one interface at a time
•After each change: run verification command, paste actual output back
•Commit early and often

For large codebases:

•Provide only relevant files plus spec/todo
•If you summarize the repo, do it once and keep the summary as a reusable artifact

Stage E: Verify and Review (adversarial mode)

Goal: Force the model to try to break its own work.

Prompts:

•"Act as a hostile reviewer. Find correctness bugs, not style nits. List concrete failing scenarios."
•"Given these acceptance criteria, which are not actually satisfied? Be specific."
•"Propose 5 tests that would fail if the implementation is wrong."

Stage F: Decide What Lasts

Goal: Keep the system easy to delete and rewrite.

Heuristics:

•Keep "policy" (business rules) separate from "mechanism" (I/O, DB, HTTP)
•Prefer shallow abstractions that can be removed without cascade
•Invest in tests and fixtures more than clever architecture
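
A small sketch of the policy/mechanism split. Everything here is hypothetical (the `Order` type, the discount rule, the `load_order` and `charge` callables); the point is only that the business rule is a pure function, and the I/O is passed in so either side can be rewritten without touching the other.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Order:
    subtotal_cents: int
    is_member: bool


def discount_cents(order: Order) -> int:
    """Policy: a pure business rule with no I/O.

    Hypothetical rule: members get 10% off orders of 5000 cents or more.
    """
    if order.is_member and order.subtotal_cents >= 5000:
        return order.subtotal_cents // 10
    return 0


def checkout(order_id: str,
             load_order: Callable[[str], Order],
             charge: Callable[[str, int], None]) -> int:
    """Mechanism is injected: DB access and payment calls arrive as callables."""
    order = load_order(order_id)
    total = order.subtotal_cents - discount_cents(order)
    charge(order_id, total)
    return total
```

Tests for `discount_cents` need no mocks at all, and tests for `checkout` need only two plain functions as fakes, so both survive a rewrite of the database or HTTP layer.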

The Three-File Convention

Keep at the root of every project:

code

spec.md      # what/why/constraints
todo.md      # steps + verification commands
decisions.md # tradeoffs, rejected options, assumptions
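
If you want the convention enforced rather than remembered, a check like the following could run in CI or before an agent session starts. It is a sketch that assumes the three files live directly under the project root, as listed above; the function name is illustrative.

```python
from pathlib import Path

REQUIRED_FILES = ("spec.md", "todo.md", "decisions.md")


def missing_artifacts(project_root):
    """Return the required files that are absent from project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts(".")
    if missing:
        raise SystemExit("Missing: " + ", ".join(missing))
    print("All three files present.")
```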

Agent Readiness Checklist (IMPACT)

Before running autonomous/agentic execution, verify:

Dimension	Question	If No...
Intent	Do you have acceptance criteria and a test harness?	Don't run agent
Memory	Do you have durable artifacts (spec/todo) so it can resume?	It will thrash
Planning	Can it produce/update a plan with checkpoints?	It will improvise badly
Authority	Are its permissions restricted to what it needs (edit, test, commit)?	Too risky
Control Flow	Does it decide next step based on tool output?	It's just generating blobs
Tools	Does it have minimum necessary tooling and nothing extra?	Attack surface too large

Approve at meaningful checkpoints (end of todo item, after test suite passes), not every micro-step.
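
The checklist can also be written down as a gate that refuses to start an agent run while any answer is "no". This is an illustrative sketch, not an existing tool: the field names mirror the six dimensions and the messages are copied from the "If No..." column.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Readiness:
    intent: bool
    memory: bool
    planning: bool
    authority: bool
    control_flow: bool
    tools: bool


# Consequences taken from the "If No..." column of the checklist.
IF_NO = {
    "intent": "Don't run agent",
    "memory": "It will thrash",
    "planning": "It will improvise badly",
    "authority": "Too risky",
    "control_flow": "It's just generating blobs",
    "tools": "Attack surface too large",
}


def blockers(readiness):
    """Return (dimension, consequence) pairs for every 'no' answer, in checklist order."""
    return [(f.name, IF_NO[f.name])
            for f in fields(readiness)
            if not getattr(readiness, f.name)]


def ready(readiness):
    """True only when all six dimensions are satisfied."""
    return not blockers(readiness)
```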

Prompt Patterns

Authoritarian (for correctness):

code

Edit these files: [paths]
Interface: [exact signatures]
Acceptance criteria: [list]
Required tests: [list]
Don't change anything else.

Options and tradeoffs (for design):

code

Give me 3 options and a recommendation.
Make the recommendation conditional on constraints A/B/C.

Context discipline (for large codebases):

code

Only use the files I provided.
If you need more context, ask for a specific file and explain why.

Make it provable:

code

Add a test that fails on the buggy version and passes on the correct one.
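
What "provable" looks like in practice, as a hedged illustration: the `chunk` functions below are invented for this example, with a deliberate off-by-one in the buggy version. The same test is run against both implementations, and it only counts as a regression test because it distinguishes them.

```python
def chunk_buggy(items, size):
    """Buggy version: the range stops early and drops a trailing partial chunk."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def chunk_fixed(items, size):
    """Corrected version: every item lands in some chunk."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def keeps_partial_chunk(chunk):
    """Regression test: five items in chunks of two must keep the final [5]."""
    return chunk([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]


if __name__ == "__main__":
    # The test must fail before the fix and pass after it.
    assert not keeps_partial_chunk(chunk_buggy)
    assert keeps_partial_chunk(chunk_fixed)
    print("regression test distinguishes buggy from fixed")
```

A test that passes on both versions proves nothing about the fix, so run it against the old code at least once before trusting it.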

Output Format

When this skill activates, produce:

code

SPEC-FIRST WORKFLOW

STAGE A - FRAMING:
[3 approaches with tradeoffs]
[Recommendation]

STAGE B - SPEC:
[Draft spec.md content]

STAGE C - TODO:
[Draft todo.md with verification commands]

Ready to proceed to Stage D (execution)?