Characterization Tests (Golden Master / Feathers)

Create characterization tests for an existing subsystem whose behavior is unclear, messy, or AI-generated. Pin down current observable behavior so that refactoring and incremental replacement are safe.

Input

Subsystem / entrypoint / behavior surface to lock down: $ARGUMENTS

Preconditions:

•At least one runnable entry point (function, CLI command, server route, etc.)
•A concept capsule is NOT required, but use its vocabulary if one exists

Procedure

1. Identify the behavior surface

List 1-3 surfaces to lock down: public functions, CLI commands, HTTP endpoints, file-in→file-out, serialized outputs. Prefer the most stable public surface available.

2. Minimal fixture set

•One canonical happy path
•1-3 meaningful edge cases
•One "weird but real" case you suspect is fragile

Keep the fixture set small: every extra golden is another file to review and regenerate, and an oversized suite becomes unmaintainable.

3. Stabilize nondeterminism

Before capturing outputs, neutralize noise sources:

•Timestamps: freeze time
•Randomness: seed or stub
•Ordering: sort keys, canonicalize arrays where order is not meaningful
•OS-dependent paths/newlines: normalize
•Network calls: stub them, or record once and replay; never hit live services
•Concurrency: force single-threaded if needed for determinism
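The source-side stabilizations above can be sketched as follows. This is a minimal illustration rather than a prescribed harness: `legacy_make_ticket` is a hypothetical stand-in for the code under test, and `run_stabilized`, the seed, and the frozen epoch value are assumptions chosen for the example. It assumes the legacy code reads the clock as `time.time()` and uses the global `random` module.

```python
import random
import time
from unittest import mock


def legacy_make_ticket(payload):
    """Hypothetical stand-in for the code under test: reads the clock and global RNG."""
    return {
        "id": random.randint(1000, 9999),
        "created": int(time.time()),
        # Set iteration order for strings can differ between processes.
        "tags": list(set(payload["tags"])),
    }


def run_stabilized(payload, seed=1234, frozen_epoch=1_700_000_000):
    """Call the legacy function with the clock frozen and the global RNG seeded."""
    saved_state = random.getstate()
    random.seed(seed)
    try:
        with mock.patch("time.time", return_value=frozen_epoch):
            result = legacy_make_ticket(payload)
    finally:
        random.setstate(saved_state)  # do not leak the seed into other tests
    # Tag order is assumed not to be meaningful, so canonicalize it in the harness.
    result["tags"] = sorted(result["tags"])
    return result
```

All stabilization lives in the test harness; the production function is left untouched, which keeps the "no refactor" constraint intact.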

If behavior is inherently nondeterministic, define a tolerant comparator (ignore specific fields, assert shape/membership rather than equality).
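A tolerant comparator might look like the sketch below. The function name `matches_tolerantly` and its parameters are hypothetical, and it only handles flat dictionaries; nested outputs would need a recursive variant.

```python
def matches_tolerantly(actual, golden, ignore=frozenset(), unordered=frozenset()):
    """Compare two flat dicts.

    Keys in `ignore` must be present but their values are not compared.
    Keys in `unordered` hold lists that are compared without regard to order.
    """
    if actual.keys() != golden.keys():
        return False  # the shape must still match exactly
    for key, expected in golden.items():
        if key in ignore:
            continue
        got = actual[key]
        if key in unordered:
            if sorted(got) != sorted(expected):
                return False
        elif got != expected:
            return False
    return True
```

Listing the ignored and unordered keys explicitly at the call site keeps the tolerance controlled: anything not named is still compared for strict equality.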

4. Capture golden outputs

For each fixture: run the surface, record observable output (return value / stdout+stderr / exit code / HTTP status+body), store as snapshot/golden file.

•Stable, readable format (text, JSON)
•Keep goldens small
•Make it obvious how to intentionally update them (a "regenerate" command)
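One way to capture and compare goldens with an explicit regenerate switch is sketched below. The directory `tests/goldens`, the `UPDATE_GOLDENS` environment variable, and both function names are assumptions made for this example, not an established convention of any particular tool.

```python
import json
import os
from pathlib import Path

GOLDEN_DIR = Path("tests/goldens")  # hypothetical location


def to_golden_text(output):
    """Serialize output in a stable, diff-friendly form."""
    return json.dumps(output, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def check_golden(name, output, golden_dir=GOLDEN_DIR):
    """Compare output with the stored golden; rewrite the golden only on request."""
    path = Path(golden_dir) / f"{name}.json"
    actual = to_golden_text(output)
    if os.environ.get("UPDATE_GOLDENS") == "1":
        path.parent.mkdir(parents=True, exist_ok=True)
        # Force "\n" line endings so goldens are identical across platforms.
        with path.open("w", encoding="utf-8", newline="\n") as handle:
            handle.write(actual)
        return
    if not path.exists():
        raise AssertionError(f"no golden at {path}; rerun with UPDATE_GOLDENS=1 to create it")
    expected = path.read_text(encoding="utf-8")
    assert actual == expected, f"{name}: output differs from golden {path}"
```

With this shape, regenerating is a deliberate act (for instance running the suite with `UPDATE_GOLDENS=1` set, then reviewing the resulting diff), and a missing golden fails loudly instead of being created silently.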

5. Write characterization tests

Assert: given fixture inputs → output matches golden (or tolerant comparator).

•Assert observable behavior only, not internal structure
•Do NOT refactor production code while writing these tests
•Few strong assertions over many weak ones
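A characterization test in this style could look like the following sketch. `legacy_price_quote` is a hypothetical stand-in for the real entry point, and the goldens are inlined only to keep the example self-contained; in a real suite they would be the golden files captured in step 4.

```python
import json
import unittest


def legacy_price_quote(order):
    """Hypothetical stand-in for the real entry point; import yours instead."""
    total = sum(item["qty"] * item["unit_cents"] for item in order["items"])
    if order.get("coupon") == "HALF":
        total //= 2  # rounds down; pinned as-is, not judged
    return {"total_cents": total, "lines": len(order["items"])}


# Fixture name -> (input, golden output as canonical JSON).
CASES = {
    "happy_path": (
        {"items": [{"qty": 2, "unit_cents": 250}]},
        '{"lines": 1, "total_cents": 500}',
    ),
    "empty_order": (
        {"items": []},
        '{"lines": 0, "total_cents": 0}',
    ),
    "odd_total_with_coupon": (
        {"items": [{"qty": 1, "unit_cents": 99}], "coupon": "HALF"},
        '{"lines": 1, "total_cents": 49}',
    ),
}


class CharacterizePriceQuote(unittest.TestCase):
    def test_matches_golden(self):
        for name, (order, golden) in CASES.items():
            with self.subTest(fixture=name):
                actual = json.dumps(legacy_price_quote(order), sort_keys=True)
                # One strong assertion on the whole observable output.
                self.assertEqual(actual, golden)
```

The third case records the truncating division exactly as the code behaves today. Whether that rounding is correct is a question for later spec work; the test only states what currently happens.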

6. Coverage report

•Surfaces covered
•Fixture list (what cases are locked down)
•Known nondeterminism and how it was stabilized
•Behavior gaps (what remains unknown/unlocked)
•Recommended next step: usually /pragma:refactor or strangler seams

7. Commit (tests only)

Commit only test code, fixtures, and goldens (no production changes), using the message:

Characterization: lock down current behavior of <subsystem>

Constraints

•No refactor during characterization. Don't touch production code — you're measuring the thing you're locking down.
•Keep fixtures minimal. If you are tempted to add many cases, that is a sign you need a capsule and real spec work instead.
•Tests must be deterministic (or explicitly tolerant in controlled ways).
•Golden updates must be intentional. Accidental snapshot churn destroys trust.

Output

•The coverage report
•The test/fixture locations
•The exact command(s) to run the characterization suite
•The recommended next step (/pragma:refactor or strangler approach)
•Lifecycle: State: stabilizing, Next: /pragma:refactor, Loop: /pragma:consult (default unless user explicitly continues directly)