Workflow:
- Reproduce
  - Run the repo's standard test command(s)
  - Re-run with verbose output, isolated to the failing project/suite
  - If flaky: run the failing tests multiple times to confirm nondeterminism
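Here is a minimal sketch of the flaky check in Python. It assumes the failing test can be called as a plain zero-argument function. `rerun` and `is_flaky` are hypothetical helpers, not part of any test framework. In practice the same idea is usually applied by re-invoking the repo's own test command in a loop.

```python
from collections import Counter
from typing import Callable


def rerun(test: Callable[[], None], runs: int = 20) -> Counter:
    """Run `test` repeatedly and tally outcomes (hypothetical helper).

    Returning normally counts as a pass, AssertionError as a failure,
    and any other exception as an error.
    """
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            test()
            outcomes["pass"] += 1
        except AssertionError:
            outcomes["fail"] += 1
        except Exception:
            outcomes["error"] += 1
    return outcomes


def is_flaky(outcomes: Counter) -> bool:
    # Nondeterministic if the same test both passed and did not pass.
    return outcomes["pass"] > 0 and (outcomes["fail"] + outcomes["error"]) > 0
```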
- Classify failure
  - Logic/assertion mismatch (product bug vs. test bug)
  - Race condition / timing / async ordering
  - Environment/config differences (timezone, culture, file paths, OS)
  - Data dependence / test-order dependence
  - Snapshot/golden-file drift
  - External dependency (network, DB, file system) not isolated
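To check for suspected test-order dependence, you can run the suite in several shuffled orders, record each order's seed, and see whether failures follow the order. The sketch below assumes tests are zero-argument Python callables and that a hypothetical `reset` function restores shared state between trials. `find_order_dependence` is illustrative, not a library API.

```python
import random
from typing import Callable, Dict, List, Tuple


def find_order_dependence(
    tests: Dict[str, Callable[[], None]],
    reset: Callable[[], None],
    trials: int = 10,
    seed: int = 0,
) -> List[Tuple[int, List[str], str]]:
    """Run tests in shuffled orders; return (trial_seed, order, failing_test)
    for every trial in which some test raised (hypothetical helper)."""
    master = random.Random(seed)
    failures = []
    for _ in range(trials):
        # Record a per-trial seed so the exact order can be replayed later.
        trial_seed = master.randrange(2**32)
        order = sorted(tests)
        random.Random(trial_seed).shuffle(order)
        reset()  # start every trial from a clean state
        for name in order:
            try:
                tests[name]()
            except Exception:
                failures.append((trial_seed, order, name))
                break  # stop this trial at the first failing test
    return failures
```

If a failure shows up only in some orders and always for the same test, state is probably leaking between tests. The recorded seed replays that exact order.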
- Minimize
  - Reduce to the smallest repro:
    - single test
    - single project
    - single seed dataset
  - Identify the first bad commit if possible (use git bisect if needed)
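If a predicate can tell you whether an input still reproduces the failure, shrinking a seed dataset can be automated. Below is a simplified, greedy variant of delta debugging in Python. `minimize` and the `still_fails` predicate are hypothetical names. The predicate is assumed to be deterministic; confirm that first, or the reduction will wander.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def minimize(items: List[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily shrink a failing input (hypothetical helper).

    Tries dropping chunks of `items` (halves, then quarters, ... down to
    single elements) and keeps any smaller list that still fails. The result
    is 1-minimal: removing any single remaining element makes it pass.
    """
    if not still_fails(items):
        raise ValueError("the full input must reproduce the failure")
    chunk = max(len(items) // 2, 1)
    while True:
        i = 0
        reduced = False
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if still_fails(candidate):
                items = candidate  # keep the smaller input; retry same position
                reduced = True
            else:
                i += chunk  # this chunk is needed; move past it
        if chunk == 1 and not reduced:
            return items
        chunk = max(chunk // 2, 1)
```

For example, `minimize(list(range(10)), lambda xs: 3 in xs and 7 in xs)` shrinks the input to `[3, 7]`.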
- Fix
  - Prefer a product fix over changing expected outputs
  - Make tests deterministic:
    - freeze time (clock abstraction)
    - remove randomness, or seed it deterministically
    - isolate shared mutable state
    - ensure proper awaits and synchronization
  - Add regression coverage (or strengthen existing coverage)
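The clock-abstraction and seeded-randomness points might look like this in Python. `Clock`, `FixedClock`, `is_expired`, and `sample_ids` are hypothetical names chosen for illustration. The idea is that code under test receives its time source and random generator as parameters instead of calling `datetime.now()` or the global `random` module directly.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Protocol


class Clock(Protocol):
    def now(self) -> datetime: ...


class SystemClock:
    """Production clock: reads the real time, always in UTC."""

    def now(self) -> datetime:
        return datetime.now(timezone.utc)


@dataclass
class FixedClock:
    """Test clock: returns a frozen instant that the test advances explicitly."""

    current: datetime

    def now(self) -> datetime:
        return self.current

    def advance(self, delta: timedelta) -> None:
        self.current += delta


def is_expired(issued_at: datetime, ttl: timedelta, clock: Clock) -> bool:
    # Code under test asks the injected clock instead of the system time.
    return clock.now() - issued_at >= ttl


def sample_ids(ids: list, k: int, rng: random.Random) -> list:
    # Randomness comes from an injected, seedable generator.
    return rng.sample(ids, k)
```

Tests construct a `FixedClock` and a `random.Random(seed)`. Production code passes `SystemClock()` and an unseeded generator.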
- Prevent recurrence
  - Add guardrails (timeouts; retries only for known flaky external dependencies)
  - Document the root cause in a short comment or the test name, if helpful
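A retry guardrail can be scoped so it masks only known transient failures from an external dependency and never hides a logic bug. Here is a minimal Python sketch that treats `ConnectionError`-style exceptions as the known flaky cases. `retry_transient` is a hypothetical decorator, and this sketch leaves timeouts to the test runner.

```python
import time
from functools import wraps
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_transient(
    transient: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.1,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry only on the listed exception types (hypothetical decorator)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, attempts):
                try:
                    return fn(*args, **kwargs)
                except transient:
                    time.sleep(delay * attempt)  # simple linear backoff
            # Final attempt: any exception, transient or not, propagates.
            return fn(*args, **kwargs)

        return wrapper

    return decorator
```

Anything outside the `transient` tuple, such as an `AssertionError` or `ValueError`, propagates on the first attempt, so a real regression still fails the build.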
Finish with:
- Root cause (1–3 bullets)
- Fix summary (what changed)
- Commands run + results
- Remaining flakes/todos (if any)