Characterization Tests (Golden Master / Feathers)
Create characterization tests for an existing subsystem whose behavior is unclear, messy, or AI-generated. Pin down current observable behavior so that refactoring and incremental replacement are safe.
Input
Subsystem / entrypoint / behavior surface to lock down: $ARGUMENTS
Preconditions:
- At least one runnable entry point (function, CLI command, server route, etc.)
- A concept capsule is NOT required, but use its vocabulary if one exists
Procedure
1. Identify the behavior surface
List 1-3 surfaces to lock down: public functions, CLI commands, HTTP endpoints, file-in→file-out, serialized outputs. Prefer the most stable public surface available.
2. Minimal fixture set
- One canonical happy path
- 1-3 meaningful edge cases
- One "weird but real" case you suspect is fragile
Keep it small — a suite that's too big becomes unmaintainable.
3. Stabilize nondeterminism
Before capturing outputs, neutralize noise sources:
- Timestamps: freeze time
- Randomness: seed or stub
- Ordering: sort keys, canonicalize arrays where order is not meaningful
- OS-dependent paths/newlines: normalize
- Network calls: stub or record once; do not hit live services
- Concurrency: force single-threaded if needed for determinism
If behavior is inherently nondeterministic, define a tolerant comparator (ignore specific fields, assert shape/membership rather than equality).
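As a minimal sketch of the scrubbing and tolerant comparison described above, assuming the captured output is either plain text or JSON-compatible data. The helper names (`normalize_text`, `canonical_json`), the placeholder tokens, and the regular expressions are illustrative assumptions, not part of any existing tool; adjust the patterns to the noise your subsystem actually emits.

```python
import json
import re

# Hypothetical noise patterns; extend or replace them for your subsystem.
ISO_TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"
)
UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def normalize_text(raw: str) -> str:
    """Replace volatile fragments in text output with stable placeholders."""
    text = raw.replace("\r\n", "\n")  # normalize OS-dependent newlines
    text = ISO_TIMESTAMP.sub("<TIMESTAMP>", text)
    text = UUID.sub("<UUID>", text)
    return text


def canonical_json(value, ignore_keys=frozenset(), unordered_keys=frozenset()) -> str:
    """Serialize JSON-compatible data so that equal behavior yields equal text.

    ignore_keys:    dict keys dropped everywhere (inherently volatile fields).
    unordered_keys: dict keys whose list values are sorted, because their
                    order carries no meaning.
    """

    def scrub(node, parent_key=None):
        if isinstance(node, dict):
            return {k: scrub(v, k) for k, v in node.items() if k not in ignore_keys}
        if isinstance(node, list):
            items = [scrub(item) for item in node]
            if parent_key in unordered_keys:
                # Sort by serialized form so mixed or nested items still compare.
                items.sort(key=lambda item: json.dumps(item, sort_keys=True))
            return items
        return node

    return json.dumps(scrub(value), sort_keys=True, indent=2) + "\n"
```

Scrubbing after the fact is the fallback. Where the subsystem lets you inject a clock or a seed, freezing the source of noise keeps more of the real output under test.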
4. Capture golden outputs
For each fixture: run the surface, record observable output (return value / stdout+stderr / exit code / HTTP status+body), store as snapshot/golden file.
- Stable, readable format (text, JSON)
- Keep goldens small
- Make it obvious how to intentionally update them (a "regenerate" command)
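One possible shape for the golden store and its "regenerate" switch, sketched under assumptions: the directory `tests/golden`, the `.golden.txt` suffix, the environment variable `UPDATE_GOLDENS`, and the helper name `check_golden` are all hypothetical choices, not an existing convention.

```python
import os
from pathlib import Path

GOLDEN_DIR = Path("tests/golden")  # hypothetical location
UPDATE_ENV = "UPDATE_GOLDENS"      # hypothetical opt-in switch


def check_golden(name: str, actual: str, golden_dir: Path = GOLDEN_DIR) -> None:
    """Compare `actual` with the stored golden, or rewrite it when asked to."""
    path = golden_dir / f"{name}.golden.txt"

    if os.environ.get(UPDATE_ENV) == "1":
        # Intentional regeneration only: never triggered by a plain test run.
        path.parent.mkdir(parents=True, exist_ok=True)
        # Bytes I/O avoids platform newline translation in the stored file.
        path.write_bytes(actual.encode("utf-8"))
        return

    if not path.exists():
        raise AssertionError(
            f"Missing golden {path}; run with {UPDATE_ENV}=1 to create it"
        )

    expected = path.read_bytes().decode("utf-8")
    assert actual == expected, (
        f"Output for '{name}' differs from {path}. "
        f"If the change is intended, rerun with {UPDATE_ENV}=1 and review the diff."
    )
```

Because regeneration requires an explicit environment variable, a golden can only change when someone asks for it, and the resulting file diff shows up in review.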
5. Write characterization tests
Assert: given fixture inputs → output matches golden (or tolerant comparator).
- Assert observable behavior only, not internal structure
- Do NOT refactor production code while writing these tests
- Few strong assertions over many weak ones
6. Coverage report
- Surfaces covered
- Fixture list (what cases are locked down)
- Known nondeterminism and how it was stabilized
- Behavior gaps (what remains unknown/unlocked)
- Recommended next step: usually /pragma:refactor or strangler seams
7. Commit (tests only)
Characterization: lock down current behavior of <subsystem>
Constraints
- No refactor during characterization. Don't touch production code: you're measuring the thing you're locking down.
- Keep fixtures minimal. If you are tempted to add many cases, that is a sign you need a capsule and real spec work instead.
- Tests must be deterministic (or explicitly tolerant in controlled ways).
- Golden updates must be intentional. Accidental snapshot churn destroys trust.
Output
- The coverage report
- The test/fixture locations
- The exact command(s) to run the characterization suite
- The recommended next step (/pragma:refactor or strangler approach)
- Lifecycle: State: stabilizing, Next: /pragma:refactor, Loop: /pragma:consult (default unless user explicitly continues directly)