Writing Quality Tests
Overview
High-signal tests prove behavior, not implementation. Favor stable interfaces, explicit oracles, and fast feedback. Default to the lowest level that proves the behavior; climb the pyramid only when integration proof is required.
Core rule: If a test is nondeterministic or tied to internals, it is debt. Fix it.
When to Use
- Designing or refactoring tests for new features, bug fixes, or regressions
- Hardening flaky tests or slow suites
- Reviewing test submissions for clarity, coverage, and maintainability
- Choosing between unit, contract, integration, or end-to-end coverage for a change
- Not for manual exploratory testing or load/perf-only work; use this for automated behavioral/regression checks
Non-Negotiables
- Deterministic: same input -> same result; no hidden dependence on time, network, or randomness
- Behavioral oracles: assertions map to business behavior or contracts, never incidental internals
- Minimal coupling: tests fail for product changes, not helper refactors
- Focused scope: one behavior per test; isolated fixtures; clear names
- Fast feedback: prefer fast layers; cache expensive setup; parallelize safely
Workflow
- Prove it fails: capture the regression input or wished-for case and watch the test fail (or reproduce the bug) before changing code.
- Clarify behavior: state preconditions, action, postconditions, and invariants.
- Pick level: unit for pure logic; contract for external calls; integration for seams; E2E only to prove flows or contracts end-to-end.
- Design oracle: assert outputs, state, events, and invariants; avoid implementation details or transient UI.
- Shape fixtures: use builders/factories; avoid globals; randomize with seeds only when helpful and log the seed.
- Write the test: AAA (arrange-act-assert) or GWT (given-when-then); table-driven for variants; property-based for algebraic invariants.
- Validate: run the focused test first, then the suite. If flaky, hunt nondeterminism (time, randomness, order, network) and remove it.
- Document intent: the name states the behavior; the failure message points to the expected contract.
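The workflow above can be sketched as a table-driven test in arrange-act-assert form. This is a minimal illustration, not a prescribed layout: `apply_discount` and the `CASES` table are hypothetical names invented for the example, and the plain loop stands in for whatever parametrization mechanism the test runner offers.

```python
from decimal import Decimal


def apply_discount(total: Decimal, percent: int) -> Decimal:
    """Hypothetical function under test (not from the original guide)."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be in [0, 100], got {percent}")
    return (total * (100 - percent) / 100).quantize(Decimal("0.01"))


# Arrange: one row per behavior variant (name, total, percent, expected).
CASES = [
    ("no discount", Decimal("20.00"), 0, Decimal("20.00")),
    ("half off", Decimal("20.00"), 50, Decimal("10.00")),
    ("full discount", Decimal("20.00"), 100, Decimal("0.00")),
    ("empty cart", Decimal("0.00"), 25, Decimal("0.00")),
]


def test_apply_discount_returns_discounted_total():
    for name, total, percent, expected in CASES:
        # Act
        actual = apply_discount(total, percent)
        # Assert: the message names the case and the expected contract.
        assert actual == expected, f"{name}: expected {expected}, got {actual}"
```

Adding a variant is a one-line change to the table, which keeps diffs small and makes a failing case easy to identify by name.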
Patterns to Prefer
- Boundary and mutation pairs: min/max/empty/null plus one mutated variation to prove invariants.
- Table-driven cases: enumerate input/output pairs to avoid duplicate tests and improve diffability.
- Property-based checks: algebraic properties (idempotence, reversibility, ordering), round-trips, monotonic counters.
- Contracts at seams: mock at boundaries you own; for third-party calls, pin to contract tests or recorded fixtures.
- Guarded goldens: only for complex structured output; require explicit review of golden updates.
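A round-trip property from the list above might look like the following sketch. It uses a hand-rolled generator built on the standard library's seeded `random.Random` rather than Hypothesis, so it runs without extra dependencies; the run-length codec, the `TEST_SEED` environment variable, and `DEFAULT_SEED` are all assumptions made up for illustration.

```python
import os
import random
from itertools import groupby

DEFAULT_SEED = 12345  # hypothetical fixed default so runs are reproducible


def rle_encode(text):
    """Hypothetical codec under test: 'aab' -> [('a', 2), ('b', 1)]."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs):
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(seed, cases=200):
    rng = random.Random(seed)  # local RNG: no shared global random state
    for _ in range(cases):
        length = rng.randint(0, 20)  # includes the empty-string boundary
        text = "".join(rng.choice("ab ") for _ in range(length))
        decoded = rle_decode(rle_encode(text))
        # The seed goes into the failure message so the run can be replayed.
        assert decoded == text, f"round-trip failed for {text!r} (seed={seed})"


def test_rle_round_trip():
    # TEST_SEED is an assumed override for replaying a specific failure.
    seed = int(os.environ.get("TEST_SEED", DEFAULT_SEED))
    print(f"rle round-trip seed={seed}")
    check_round_trip(seed)
```

A property-based library adds shrinking and smarter generation on top of this idea; the part worth keeping in either form is that the property (decode after encode returns the input) is the oracle, and the seed is always recoverable.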
Coverage Strategy
- Coverage is opt-in: never run coverage unless explicitly requested by the user in the current session (e.g., "improve coverage on file X to Y%"). PM/teammate/CI pressure does not override this rule.
- Pyramid discipline: many unit tests, fewer integration, very few E2E. Use E2E to prove cross-service flow or UI contract.
- Change-based coverage: every test should fail without the code change and pass with it; capture the regression input/output.
- Critical paths first: auth, billing, migrations, data loss, irreversible actions. Add invariants that must never be violated.
- Data and time: cover time zones, DST, leap years, ordering, pagination, idempotency, and retry semantics.
- Observability: log seeds for randomized tests; emit diagnostics on failure (inputs, seed, environment versions).
Example (explicit coverage request): User: "improve coverage on file X to 80%". Run targeted coverage for that file only, add behavior-driven tests to hit missing branches, and avoid coverage runs outside that request.
```bash
pytest --cov=path/to/file.py --cov-report=term-missing
```
Flake Prevention
- Remove time races: replace sleeps with waits on explicit conditions; freeze or inject clocks.
- Isolate state: fresh fixtures per test; unique temp dirs/ports; clean databases; no shared singletons.
- Control randomness: seed RNG, capture seed in failure output, prefer deterministic builders.
- Network and IO: stub external calls; if unavoidable, record/replay; set tight timeouts and retries with jitter disabled in tests.
- Parallel safety: ensure fixtures are parallel-safe or mark tests serial; avoid global mutable state.
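Clock injection, mentioned in the list above, can be illustrated with a small sketch. `FakeClock` and `Session` are hypothetical classes written for this example; the point is only that the code under test asks an injected object for the time instead of reading the wall clock.

```python
from datetime import datetime, timedelta, timezone


class FakeClock:
    """Hypothetical test clock: time moves only when the test says so."""

    def __init__(self, now):
        self._now = now

    def now(self):
        return self._now

    def advance(self, delta):
        self._now += delta


class Session:
    """Hypothetical code under test that takes its clock as a dependency."""

    def __init__(self, clock, ttl):
        self._clock = clock
        self._expires_at = clock.now() + ttl

    def is_expired(self):
        return self._clock.now() >= self._expires_at


def test_session_expires_exactly_at_ttl():
    clock = FakeClock(datetime(2024, 1, 1, tzinfo=timezone.utc))
    session = Session(clock, ttl=timedelta(minutes=30))

    clock.advance(timedelta(minutes=29, seconds=59))
    assert not session.is_expired(), "session should be live just before the TTL"

    clock.advance(timedelta(seconds=1))
    assert session.is_expired(), "session should expire at the TTL boundary"
```

Because no real time passes, the test covers the exact boundary instantly and gives the same result on a loaded CI machine as on a laptop.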
Review Checklist
- Name states behavior and level (e.g., "adds item to cart (integration)").
- Single reason to fail; assertions map to user-visible behavior or contract.
- Fixtures minimal and local; builders hide irrelevant details; no shared hidden state.
- Negative and edge cases present; regression case for the original bug captured.
- Tests run quickly; slow/expensive flows justified and focused.
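As one way to read the fixture item in the checklist, a builder can supply valid defaults so that each test spells out only the field it cares about. The `Order` type, `make_order` builder, and `shipping_cost` rule below are hypothetical and exist only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Order:
    customer_id: str
    items: Tuple[str, ...]
    country: str
    coupon: Optional[str] = None


def make_order(**overrides) -> Order:
    """Hypothetical builder: valid defaults, overridden per test."""
    defaults = dict(customer_id="cust-1", items=("sku-1",), country="US", coupon=None)
    defaults.update(overrides)
    return Order(**defaults)


def shipping_cost(order: Order) -> int:
    """Hypothetical rule under test, in cents: flat domestic rate, higher abroad."""
    return 500 if order.country == "US" else 1500


def test_international_order_pays_higher_shipping():
    # Only the field relevant to this behavior appears in the test body.
    order = make_order(country="DE")
    assert shipping_cost(order) == 1500


def test_domestic_order_pays_flat_rate():
    assert shipping_cost(make_order()) == 500
```

Each test builds its own immutable `Order`, so there is no shared state to leak between tests, and adding a field to `Order` touches the builder rather than every test.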
Hygiene (adaptable patterns)
- Structure: Given–When–Then or AAA so intent is obvious.
- Hypothesis: fix generators or code instead of suppressing health checks; log seeds for repro.
- Async correctness: use real async paths/fakes; don’t hide missing awaits with sync doubles.
- Assertion scope: assert behavior/contract fields; avoid brittle full-payload snapshots unless testing a contract.
- Coverage as health, not blocker: focus on low-coverage behavior-heavy files; be pragmatic with legacy or infra-heavy areas.
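The async-correctness item can be made concrete with a fake whose methods are real coroutines. In this sketch `FakeMailer` and `notify_signup` are hypothetical names; the test drives the code through `asyncio.run` so that a missing `await` in the code under test would leave the fake's record empty and fail the assertion.

```python
import asyncio


class FakeMailer:
    """Hypothetical async fake: same awaitable shape as a real client."""

    def __init__(self):
        self.sent = []

    async def send(self, to, subject):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        self.sent.append((to, subject))


async def notify_signup(mailer, email):
    """Hypothetical code under test."""
    await mailer.send(email, "Welcome")
    return True


def test_notify_signup_sends_welcome_mail():
    mailer = FakeMailer()

    ok = asyncio.run(notify_signup(mailer, "a@example.com"))

    assert ok
    # If notify_signup forgot to await send(), nothing would be recorded here.
    assert mailer.sent == [("a@example.com", "Welcome")]
```

A synchronous double with a plain `send` method would record the call even when the production code never awaits it, which is exactly the bug this style of fake is meant to expose.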
Marks (for selective runs)
- unit: isolated logic with external deps mocked
- contract/integration: cross-component seams with real wiring or adapters
- async: true async paths; avoid sync fakes masking awaits
- property: Hypothesis-based invariants in dedicated property files
- slow: >1s or real infra; justify and keep focused
Common Anti-Patterns
- Brittle UI or text snapshots without intent; prefer semantic assertions or scoped snapshots.
- Over-mocking internals; mocking within the module under test; asserting call order that is not part of the contract.
- Sleep-based waits; reliance on wall-clock time; unseeded randomness.
- Combined scenarios covering multiple behaviors in one test; global fixtures that hide setup.
- Golden files updated blindly; tests that assert logging implementation rather than outcomes.
- Running coverage by default instead of waiting for explicit coverage requests.
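To illustrate the alternative to sleep-based waits from the list above, the sketch below blocks on an explicit completion signal with a bounded timeout. `start_worker` is a hypothetical stand-in for any background job; the 2-second timeout is an arbitrary assumption and acts only as a failure bound, not as the expected duration.

```python
import threading


def start_worker(done, results):
    """Hypothetical background job that signals completion explicitly."""

    def work():
        results.append("processed")
        done.set()  # the explicit condition the test waits on

    thread = threading.Thread(target=work)
    thread.start()
    return thread


def test_worker_publishes_result():
    done = threading.Event()
    results = []

    worker = start_worker(done, results)

    # Instead of time.sleep(0.5) and hoping the work has finished:
    # return as soon as the signal arrives, fail loudly if it never does.
    assert done.wait(timeout=2.0), "worker did not signal completion within 2s"
    worker.join()
    assert results == ["processed"]
```

The test finishes as soon as the worker does, so it is both faster than a fixed sleep on a quick machine and more reliable on a slow one.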
Red Flags - Stop and Fix
- Tests pass or fail intermittently
- Assertions tied to private methods or call order instead of observable behavior
- Unseeded randomness, sleeps instead of explicit waits, or shared mutable fixtures
- Golden updates accepted without review of intent
- A test never failed before the code change
- Running coverage without the user explicitly asking
- Running coverage due to PM/teammate/CI pressure