---
name: tdd
description: "Strict Red-Green-Refactor workflow for robust, self-documenting code. Discovers project test setup via codebase exploration before assuming frameworks. Use when: (1) Implementing new features with test-first approach, (2) Fixing bugs with reproduction tests, (3) Refactoring existing code with test safety net, (4) Adding tests to legacy code, (5) Ensuring code quality before committing, (6) When tests exist but workflow unclear, or (7) When establishing testing practices in a new project. Triggers: test, tdd, red-green-refactor, failing test, test first, test-driven, write tests, add tests, run tests."
metadata:
  author: ai-dev-atelier
  version: "1.0"
---

Test-Driven Development (TDD)

Strict Red-Green-Refactor workflow for robust, self-documenting, production-ready code.

Quick Navigation

| Situation | Go To |
| --- | --- |
| New to this codebase | Step 1: Explore Environment |
| Know the framework, starting work | Step 2: Select Mode |
| Need the core loop reference | Step 3: Core TDD Loop |
| Complex edge cases to cover | Property-Based Testing |
| Tests are flaky/unreliable | Flaky Test Management |
| Need isolated test environment | Hermetic Testing |
| Measuring test quality | Mutation Testing |

The Three Rules (Robert C. Martin)

  1. No Production Code without a failing test
  2. Write Only Enough Test to Fail (compilation errors count)
  3. Write Only Enough Code to Pass (no optimizations yet)

The Loop: 🔴 RED (write failing test) → 🟢 GREEN (minimal code to pass) → 🔵 REFACTOR (clean up) → Repeat


Step 1: Explore Test Environment

Do NOT assume anything. Explore the codebase first.

Checklist:

  • Search for test files: glob("**/*.test.*"), glob("**/*.spec.*"), glob("**/test_*.py")
  • Check package.json scripts, Makefile, or CI workflows
  • Look for config: vitest.config.*, jest.config.*, pytest.ini, Cargo.toml

Framework Detection:

| Language | Config Files | Test Command |
| --- | --- | --- |
| Node.js | package.json, vitest.config.* | npm test, bun test |
| Python | pyproject.toml, pytest.ini | pytest |
| Go | go.mod, *_test.go | go test ./... |
| Rust | Cargo.toml | cargo test |
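
The detection table can be sketched as a small lookup. This is a minimal illustration, not part of the skill: `MARKERS` and `detect_test_commands` are hypothetical names, the mapping simply mirrors the table, and it only inspects the top level of the given directory.

```python
from pathlib import Path

# (glob pattern, candidate command) pairs mirroring the table above.
# Hypothetical mapping: a hit is a hint, not proof of the real test command.
MARKERS = [
    ("vitest.config.*", "npm test"),
    ("jest.config.*", "npm test"),
    ("package.json", "npm test"),
    ("pytest.ini", "pytest"),
    ("pyproject.toml", "pytest"),
    ("go.mod", "go test ./..."),
    ("Cargo.toml", "cargo test"),
]


def detect_test_commands(root: Path) -> list[str]:
    """Return candidate test commands for marker files found directly in root."""
    found: list[str] = []
    for pattern, command in MARKERS:
        # any() is True as soon as the glob yields one matching path
        if any(root.glob(pattern)) and command not in found:
            found.append(command)
    return found
```

A result such as `["pytest"]` should still be confirmed against package.json scripts, the Makefile, or CI workflows, since projects often wrap the runner in their own command.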

Step 2: Select Mode

| Mode | When | First Action |
| --- | --- | --- |
| New Feature | Adding functionality | Read existing module tests, confirm green baseline |
| Bug Fix | Reproducing issue | Write failing reproduction test FIRST |
| Refactor | Cleaning code | Ensure ≥80% coverage on target code |
| Legacy | No tests exist | Add characterization tests before changing |

Tie-breaker: If coverage <20% or tests absent → use Legacy Mode first.

Mode: New Feature

  1. Read existing tests for the module
  2. Run tests to confirm green baseline
  3. Enter Core Loop for new behavior
  4. Commits: test(module): add test for X → feat(module): implement X

Mode: Bug Fix

  1. Write failing reproduction test (MUST fail before fix)
  2. Confirm failure is assertion error, not syntax error
  3. Write minimal fix
  4. Run full test suite
  5. Commits: test: add failing test for bug #123 → fix: description (#123)
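
A minimal sketch of the bug-fix flow, assuming a hypothetical `truncate` function with an off-by-one bug. The test is written first and must fail with an assertion error against the buggy code; only then is the one-line fix applied.

```python
def truncate(text: str, limit: int) -> str:
    """Hypothetical function under repair.

    The buggy version returned text[:limit + 1], one character too many.
    """
    return text[:limit]  # minimal fix for the off-by-one


def test_result_never_exceeds_limit():
    # Written BEFORE the fix. Against the buggy slice this failed with an
    # AssertionError ('abcd' != 'abc'), not an import or syntax error,
    # which is the kind of failure step 2 asks you to confirm.
    assert truncate("abcdef", 3) == "abc"
```

The test stays in the suite afterwards as a regression guard for the original report.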

Mode: Refactor

  1. Run coverage on the specific function you'll refactor
  2. If coverage <80% → add characterization tests first
  3. Refactor in small steps (ONE change → run tests → repeat)
  4. Never change behavior during refactor

Mode: Legacy Code

  1. Find Seams - insertion points for tests (Sensing Seams, Separation Seams)
  2. Break Dependencies - use Sprout Method or Wrap Method
  3. Add characterization tests (capture current behavior)
  4. Build safety net: happy path + error cases + boundaries
  5. Then apply TDD for your changes
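
The legacy steps can be sketched as follows. Everything here is hypothetical: `legacy_invoice_total` stands in for untested code with an undocumented quirk, the characterization tests pin what it does today, and the new behavior is sprouted into its own function.

```python
def legacy_invoice_total(items):
    """Stand-in for untested legacy code; the quirk below is intentional."""
    total = 0
    for _name, price, qty in items:
        total += price * qty
    if total > 100:
        total = total * 0.9  # undocumented bulk discount
    return round(total, 2)


def test_characterize_small_order():
    # Characterization: assert what the code DOES today, not what it should do
    assert legacy_invoice_total([("pen", 2.5, 4)]) == 10.0


def test_characterize_discount_boundary():
    assert legacy_invoice_total([("desk", 100, 1)]) == 100   # exactly 100: no discount
    assert legacy_invoice_total([("desk", 101, 1)]) == 90.9  # above 100: 10% off


def shipping_fee(total: float) -> float:
    """Sprout Method: new behavior lives in a new function built test-first."""
    return 0.0 if total >= 50 else 5.0


def invoice_with_shipping(items):
    """Wrap-style caller: the legacy body is reused untouched."""
    total = legacy_invoice_total(items)
    return round(total + shipping_fee(total), 2)
```

If a characterization test reveals behavior that looks wrong, record it as-is first; changing it is a separate, test-driven step.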

→ See references/examples.md for full code examples of each mode.


Step 3: The Core TDD Loop

Before Starting: Scenario List

List all behaviors to cover:

  • Happy path cases
  • Edge cases and boundaries
  • Error/failure cases
  • Pessimism: 3 ways this could fail (network, null, invalid state)

🔴 RED Phase

  1. Write ONE test (single behavior or edge case)
  2. Use AAA: Arrange → Act → Assert
  3. Run test, verify it FAILS for expected reason

Checks:

  • Is failure an assertion error? (Not SyntaxError/ModuleNotFoundError)
  • Can I explain why this should fail?
  • If test passes immediately → STOP. Test is broken or feature exists.

🟢 GREEN Phase

  1. Write minimal code to pass
  2. Do NOT implement "perfect" solution
  3. Verify test passes

Checks:

  • Is this the simplest solution?
  • Can I delete any of this code and still pass?

🔵 REFACTOR Phase

  1. Look for duplication, unclear names, magic values
  2. Clean up without changing behavior
  3. Verify tests still pass

Repeat

Select next scenario, return to RED.

Triangulation: If implementation is too specific (hardcoded), write another test with different inputs to force generalization.
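
A sketch of the loop with triangulation, using a hypothetical `is_leap_year` example. The code shows the end state; the comments record which test forced each step.

```python
def is_leap_year(year: int) -> bool:
    # GREEN 1 was `return True`: hardcoded, but enough for the first test.
    # Triangulating with 2023 forced `year % 4 == 0`;
    # the century tests forced the full rule below.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def test_2024_is_leap():
    year = 2024                     # Arrange
    result = is_leap_year(year)     # Act
    assert result is True           # Assert


def test_2023_is_not_leap():
    # Triangulation: a second input that the hardcoded `return True` fails
    assert is_leap_year(2023) is False


def test_1900_is_not_leap():
    assert is_leap_year(1900) is False


def test_2000_is_leap():
    assert is_leap_year(2000) is True
```

Each test was added one at a time, seen failing, and then made to pass before the next scenario was picked.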


Stop Conditions

| Signal | Response |
| --- | --- |
| Test passes immediately | Check assertions, verify feature isn't already built |
| Test fails for wrong reason | Fix setup/imports first |
| Flaky test | STOP. Fix non-determinism immediately |
| Slow feedback (>5s) | Optimize or mock external calls |
| Coverage decreased | Add tests for uncovered paths |

Test Distribution: The Testing Trophy

The Testing Trophy (Kent C. Dodds) reflects modern testing reality: integration tests give the best confidence-to-effort ratio.

```
          _____________
         /   System    \      ← Few, slow, high confidence; brittle (E2E)
        /_______________\
       /                 \
      /    Integration    \   ← Real interactions between units: **BEST ROI** (Integration)
      \                   /
       \_________________/
         \    Unit     /      ← Fast & cheap but test in isolation (Unit)
          \___________/
          /   Static  \       ← Typecheck, linting: typos/types (Static)
         /_____________\
```

Layer Breakdown

| Layer | What | Tools | When |
| --- | --- | --- | --- |
| Static | Type errors, syntax, linting | TypeScript, ESLint | Always on, catches 50%+ of bugs for free |
| Unit | Pure functions, algorithms, utilities | vitest, jest, pytest | Isolated logic with no dependencies |
| Integration | Components + hooks + services together | Testing Library, MSW, Testcontainers | Real user flows, real(ish) data |
| E2E | Full app in browser | Playwright, Cypress | Critical paths only (login, checkout) |

Why Integration Tests Win

Unit tests prove code works in isolation. Integration tests prove code works together.

| Concern | Unit Test | Integration Test |
| --- | --- | --- |
| Component renders | | |
| Component + hook works | | |
| Component + API works | | |
| User flow works | | |
| Catches real bugs | Sometimes | Usually |

The insight: Most bugs live in the seams between modules, not inside pure functions. Integration tests catch seam bugs; unit tests don't.

Practical Guidance

  1. Start with integration tests - Test the way users use your code
  2. Drop to unit tests for complex algorithms or edge cases
  3. Use E2E sparingly - Slow, flaky, expensive to maintain
  4. Let static analysis do the heavy lifting - TypeScript catches more bugs than most unit tests
  5. Prefer fakes over mocks - Fakes have real behavior; mocks just return canned data
  6. SMURF quality: Speed, Maintainability, Utilization, Reliability, Fidelity

Anti-Patterns

| Pattern | Problem | Fix |
| --- | --- | --- |
| Mirror Blindness | Same agent writes test AND code | State test intent before GREEN |
| Happy Path Bias | Only success scenarios | Include errors in Scenario List |
| Refactoring While Red | Changing structure with failing tests | Get to GREEN first |
| The Mockery | Over-mocking hides bugs | Prefer fakes or real implementations |
| Coverage Theater | Tests without meaningful assertions | Assert behavior, not lines |
| Multi-Test Step | Multiple tests before implementing | One test at a time |
| Verification Trap 🤖 | AI tests what the code does, not what it should do | State intent in plain language; separate agent review |
| Test Exploitation 🤖 | LLMs exploit weak assertions or overload operators | Use PBT alongside examples; strict equality |
| Assertion Omission 🤖 | Missing edge cases (null, undefined, boundaries) | Scenario list with errors; test.each |
| Hallucinated Mock 🤖 | AI generates fake mocks without proper setup | Testcontainers for integration; real Fakes for unit |

Critical: Verify tests by (1) running them, (2) having separate agent review, (3) never trusting generated tests blindly.


Advanced Techniques

Use these techniques at specific points in your workflow:

| Technique | Use During | Purpose |
| --- | --- | --- |
| Test Doubles | 🔴 RED phase | Isolate dependencies when writing tests |
| Property-Based Testing | 🔴 RED phase | Cover edge cases for complex logic |
| Contract Testing | 🔴 RED phase | Define API expectations between services |
| Snapshot Testing | 🔴 RED phase | Capture UI/response structure |
| Hermetic Testing | 🔵 Setup | Ensure test isolation and determinism |
| Mutation Testing | ✅ After GREEN | Validate test suite effectiveness |
| Coverage Analysis | ✅ After GREEN | Find untested code paths |
| Flaky Test Management | 🔧 Maintenance | Fix unreliable tests blocking CI |

Test Doubles (Use: Writing Tests with Dependencies)

When: Your code depends on something slow, unreliable, or complex (DB, API, filesystem).

| Type | Purpose | When |
| --- | --- | --- |
| Stub | Returns canned answers | Need specific return values |
| Mock | Verifies interactions | Need to verify calls made |
| Fake | Simplified implementation | Need real behavior without cost |
| Spy | Records calls | Need to observe without changing |

Decision: Dependency slow/unreliable? → Fake (complex) or Stub (simple). Need to verify calls? → Mock/Spy. Otherwise → real implementation.
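
The four doubles can be sketched with Python's standard `unittest.mock`. The repository and mailer interfaces (`InMemoryUserRepo`, `greet`, `register`, `send_welcome`) are hypothetical and exist only for this illustration.

```python
from unittest.mock import Mock


class InMemoryUserRepo:
    """Fake: a working, simplified implementation of a hypothetical repo."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def get(self, user_id):
        return self._users.get(user_id)


def greet(repo, user_id):
    name = repo.get(user_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"


def register(repo, mailer, user_id, name):
    repo.save(user_id, name)
    mailer.send_welcome(name)


def test_greet_with_stub():
    repo = Mock()
    repo.get.return_value = "Ada"      # Stub: canned answer, calls not checked
    assert greet(repo, 1) == "Hello, Ada!"


def test_register_with_fake_and_mock():
    repo = InMemoryUserRepo()          # Fake: real behavior, no database
    mailer = Mock()                    # Mock: the interaction is the assertion
    register(repo, mailer, 1, "Ada")
    assert repo.get(1) == "Ada"
    mailer.send_welcome.assert_called_once_with("Ada")


def test_spy_records_calls_on_real_object():
    real = InMemoryUserRepo()
    spy = Mock(wraps=real)             # Spy: delegates to real object, records calls
    spy.save(2, "Grace")
    assert real.get(2) == "Grace"
    spy.save.assert_called_once_with(2, "Grace")
```

Note that the fake is the only double here whose state can be asserted on directly, which is why the decision rule favors it for complex dependencies.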

→ See references/examples.md: Test Double Examples


Hermetic Testing (Use: Test Environment Setup)

When: Setting up test infrastructure. Tests must be isolated and deterministic.

Principles:

  • Isolation: Unique temp directories/state per test
  • Reset: Clean up in setUp/tearDown
  • Determinism: No time-based logic or shared mutable state
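
A minimal sketch of the three principles using only the standard library. `write_report` is a hypothetical unit under test; the key assumption is that it takes the current time as a parameter instead of reading the system clock.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_report(directory: Path, now: datetime) -> Path:
    """Hypothetical unit under test: the clock is injected, not read globally."""
    path = directory / f"report-{now:%Y%m%d}.txt"
    path.write_text(f"generated {now.isoformat()}\n")
    return path


def test_report_name_uses_injected_date():
    # Determinism: a fixed timestamp instead of datetime.now()
    fixed_now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
    # Isolation + Reset: a unique directory that is deleted on exit
    with tempfile.TemporaryDirectory() as tmp:
        path = write_report(Path(tmp), fixed_now)
        assert path.name == "report-20240131.txt"
        assert path.read_text() == "generated 2024-01-31T12:00:00+00:00\n"
```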

Database Strategies:

| Strategy | Speed | Fidelity | Use When |
| --- | --- | --- | --- |
| In-memory (SQLite) | Fast | Low | Unit tests, simple queries |
| Testcontainers | Medium | High | Integration tests |
| Transactional Rollback | Fast | High | Tests sharing schema (80x faster than TRUNCATE) |
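
The transactional rollback strategy can be sketched as below. SQLite in memory is used here only to keep the example self-contained; in practice the same pattern would wrap a connection to the real database engine. `make_connection`, `run_in_rollback`, and `insert_one_user` are hypothetical helper names.

```python
import sqlite3


def make_connection() -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, so BEGIN/ROLLBACK are explicit
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def run_in_rollback(conn: sqlite3.Connection, test_body) -> None:
    """Run test_body inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        test_body(conn)
    finally:
        conn.execute("ROLLBACK")  # schema stays, rows written by the test vanish


def insert_one_user(conn: sqlite3.Connection) -> None:
    conn.execute("INSERT INTO users (name) VALUES ('Ada')")
    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    assert count == 1  # would be 2 on a second run if state leaked
```

Running `insert_one_user` twice through `run_in_rollback` passes both times because each run starts from the same empty table.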

→ See references/examples.md: Hermetic Testing Examples


Property-Based Testing (Use: Writing Tests for Complex Logic)

When: Writing tests for algorithms, state machines, serialization, or code with many edge cases.

Tools: fast-check (JS/TS), Hypothesis (Python), proptest (Rust)

Properties to Test:

  • Commutativity: f(a, b) == f(b, a)
  • Associativity: f(f(a, b), c) == f(a, f(b, c))
  • Identity: f(a, identity) == a
  • Round-trip: decode(encode(x)) == x
  • Metamorphic: If input changes by X, output changes by Y (useful when you don't know expected output)

How: Replace multiple example-based tests with one property test that generates random inputs.

Critical: Always log the seed on failure. Without it, you cannot reproduce the failing case.
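
A hand-rolled sketch of a round-trip property with a logged seed, using only the standard library. It is a stand-in for what fast-check, Hypothesis, or proptest provide (those libraries also shrink failing inputs, which this sketch does not). The run-length codec and helper names are hypothetical.

```python
import random


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Hypothetical run-length encoder: 'aab' -> [('a', 2), ('b', 1)]."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in runs)


def check_round_trip(seed: int, cases: int = 200) -> None:
    rng = random.Random(seed)  # all randomness flows from the seed
    for _ in range(cases):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab \n0") for _ in range(length))
        # The seed is part of the failure message so the case can be replayed
        assert rle_decode(rle_encode(text)) == text, f"seed={seed} input={text!r}"


def test_round_trip_property():
    seed = random.randrange(2**32)
    print(f"property seed: {seed}")  # logged even when the test passes
    check_round_trip(seed)
```

To replay a failure, call `check_round_trip` with the seed from the log.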

→ See references/examples.md: Property-Based Testing Examples


Mutation Testing (Use: Validating Test Quality)

When: After tests pass, to verify they actually catch bugs. Use for critical code (auth, payments) or before major refactors.

Tools: Stryker (JS/TS), PIT (Java), mutmut (Python)

How: Tool mutates your code (e.g., changes > to >=). If tests still pass → your tests are weak.

Interpretation:

  • >80% mutation score = good test suite
  • Survived mutants = tests don't catch those changes → add tests for these
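
What a mutation tool automates can be sketched by hand. This is an illustration only, not how Stryker, PIT, or mutmut are invoked: `can_vote`, the mutant, and both suites are hypothetical.

```python
def can_vote(age: int) -> bool:
    return age >= 18            # original


def can_vote_mutant(age: int) -> bool:
    return age > 18             # mutant: ">=" replaced by ">"


def weak_suite(fn) -> bool:
    """Passes whenever fn is correct far from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case that distinguishes >= from >."""
    return weak_suite(fn) and fn(18) is True


def mutation_score(suite, mutants) -> float:
    """Fraction of mutants the suite kills (fails on)."""
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)
```

The weak suite passes on both the original and the mutant, so the mutant survives; adding the boundary assertion kills it. Surviving mutants usually point at exactly this kind of missing boundary test.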

Equivalent Mutant Problem: Some mutants change syntax but not behavior (e.g., i < 10 → i != 10 in a loop where i only increments). These can't be killed, so a 100% score is often impossible. Focus on surviving mutants in critical paths, not chasing perfect scores.

When NOT to use: Tool-generated code (OpenAPI clients, Protobuf stubs, ORM models), simple DTOs/getters, legacy code with slow tests, or CI pipelines that must finish in <5 minutes. Use --incremental --since main for PR-focused runs. Note: This does NOT mean skip mutation testing on code you (the agent) wrote—always validate your own work.

→ See references/examples.md: Mutation Testing Examples


Flaky Test Management (Use: CI/CD Maintenance)

When: Tests fail intermittently, blocking CI or eroding trust in the test suite.

Root Causes:

| Cause | Fix |
| --- | --- |
| Timing (setTimeout, races) | Fake timers, await properly |
| Shared state | Isolate per test |
| Randomness | Seed or mock |
| Network | Use MSW or fakes |
| Order dependency | Make tests independent |
| Parallel transaction conflicts | Isolate DB connections per worker |

How: Detect (--repeat 10) → Quarantine (separate suite) → Fix root cause → Restore
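
The detect step can be sketched as a repeat-run helper, here paired with the "Randomness → seed" fix from the table. All names (`pick_winner`, `flaky_test`, `stable_test`, `count_failures`) are hypothetical; real runners expose their own repeat options.

```python
import random


def pick_winner(entries, rng=random):
    """Hypothetical unit under test; rng is injectable for determinism."""
    return rng.choice(entries)


def flaky_test():
    # Depends on global, unseeded randomness: passes only some of the time
    assert pick_winner(["a", "b"]) == "a"


def stable_test():
    # Fix: the random source is seeded, so the outcome is reproducible
    entries = ["a", "b", "c"]
    first = pick_winner(entries, random.Random(7))
    second = pick_winner(entries, random.Random(7))
    assert first == second


def count_failures(test_fn, runs: int = 10) -> int:
    """Run test_fn repeatedly and count assertion failures."""
    failures = 0
    for _ in range(runs):
        try:
            test_fn()
        except AssertionError:
            failures += 1
    return failures
```

A count strictly between zero and the number of runs is the signature of a flaky test; zero failures across many runs is the reintroduction criterion described below.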

Quarantine Rules:

  • Issue-linked: Every quarantined test MUST link to a tracking issue. Prevents "quarantine-and-forget."
  • Mute, don't skip: Prefer muting (runs but doesn't fail build) over skipping. You still collect failure data.
  • Reintroduction criteria: Test must pass N consecutive runs (e.g., 100) on main before leaving quarantine.

→ See references/examples.md: Flaky Test Examples


Contract Testing (Use: Writing Tests for Service Boundaries)

When: Writing tests for code that calls or exposes APIs. Prevents integration breakage.

How (Pact): Consumer defines expected interactions → Contract published → Provider verifies → CI fails if contract broken.
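
The consumer-driven flow can be sketched without any library. This is not Pact's API: `CONTRACT`, `provider_handle`, and `verify_contract` are hypothetical names, and the contract format is invented for the illustration.

```python
# Consumer side: the expected interaction, published as plain data.
CONTRACT = {
    "request": {"method": "GET", "path": "/users/1"},
    "response": {"status": 200, "body_types": {"id": int, "name": str}},
}


def provider_handle(method: str, path: str):
    """Hypothetical provider handler returning (status, body)."""
    if method == "GET" and path == "/users/1":
        # Extra fields are fine; the contract only names what the consumer needs
        return 200, {"id": 1, "name": "Ada", "email": "ada@example.com"}
    return 404, {}


def verify_contract(contract, handler) -> list[str]:
    """Provider side: return violations; an empty list means the contract holds."""
    request, expected = contract["request"], contract["response"]
    status, body = handler(request["method"], request["path"])
    problems = []
    if status != expected["status"]:
        problems.append(f"status {status} != {expected['status']}")
    for field, kind in expected["body_types"].items():
        if field not in body:
            problems.append(f"missing field {field!r}")
        elif not isinstance(body[field], kind):
            problems.append(f"{field!r} is not {kind.__name__}")
    return problems
```

In CI, a non-empty result from the provider's verification step is what fails the build.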

→ See references/examples.md: Contract Testing Examples


Coverage Analysis (Use: Finding Gaps After Tests Pass)

When: After writing tests, to find untested code paths. NOT a goal in itself.

| Metric | Measures | Threshold |
| --- | --- | --- |
| Line | Lines executed | 70-80% |
| Branch | Decision paths | 60-70% |
| Mutation | Test effectiveness | >80% |

Risk-Based Prioritization: P0 (auth, payments) → P1 (core logic) → P2 (helpers) → P3 (config)

Warning: High coverage ≠ good tests. Tests must assert meaningful behavior.


Snapshot Testing (Use: Writing Tests for UI/Output Structure)

When: Writing tests for UI components, API responses, or error message formats.

Appropriate: UI structure, API response shapes, error formats. Avoid: Behavior testing, dynamic content, entire pages.

How: Capture output once, verify it doesn't change unexpectedly. Always review diffs carefully.
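
The capture-then-compare mechanic can be sketched with a small helper. `assert_matches_snapshot` is a hypothetical name; real snapshot tools add update flags and diff output on top of the same idea.

```python
import json
from pathlib import Path


def assert_matches_snapshot(value, snapshot_file: Path) -> None:
    """Record value on first run; afterwards fail if the rendering changes."""
    # sort_keys keeps the rendering stable regardless of dict insertion order
    rendered = json.dumps(value, indent=2, sort_keys=True)
    if not snapshot_file.exists():
        snapshot_file.write_text(rendered)  # first run: capture the output
        return
    expected = snapshot_file.read_text()
    assert rendered == expected, f"snapshot mismatch for {snapshot_file.name}"
```

Because the first run always passes, a newly written snapshot file needs the same review as any other test expectation before it is committed.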

→ See references/examples.md: Snapshot Testing Examples


Integration with Other Skills

| Task | Skill | Usage |
| --- | --- | --- |
| Committing | git-commit | test: for RED, feat: for GREEN |
| Code Quality | code-quality | Run during REFACTOR phase |
| Documentation | docs-check | Check if behavior changes need docs |

References

Tools: Testcontainers | fast-check | Stryker | MSW | Pact