code-testing

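Use this skill when you need to write tests, review test quality, debug flaky tests, or decide what to test and how. It covers test structure, the pyramid vs. trophy testing models, mocking strategy, and common anti-patterns.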

SKILL.md
---
name: code-testing
description: Use when writing tests, reviewing test quality, debugging flaky tests, or deciding what/how to test. Covers test structure, pyramid vs trophy, mocking strategy, and anti-patterns.
---

Code Testing

Core philosophy: Test behavior, not implementation. Good tests survive refactoring, serve as documentation, and catch real bugs without false alarms.

"Write tests. Not too many. Mostly integration." — Guillermo Rauch

When to Use

  • Writing new tests for features/bugfixes
  • Reviewing test quality or coverage
  • Debugging flaky/intermittent tests
  • Deciding test strategy (unit vs integration vs E2E)
  • Choosing what to mock

FIRST Principles

| Principle | Meaning |
| --- | --- |
| Fast | Milliseconds. Slow tests don't get run. |
| Isolated | Pass alone, in sequence, or parallel. |
| Repeatable | Same inputs = same results. No env/time deps. |
| Self-validating | Auto pass/fail. No human inspection. |
| Timely | Written close to (or before) code. |

Test Structure: AAA

typescript
test('calculates order total with tax', () => {
  // Arrange
  const order = new Order();
  order.addItem({ price: 100 });

  // Act
  const total = order.calculateTotal({ taxRate: 0.1 });

  // Assert
  expect(total).toBeCloseTo(110, 2); // tolerates floating-point noise such as 100 * 1.1
});

One behavior per test. Avoid repeating the cycle (Act → Assert → Act → Assert) within a single test; split it into separate tests instead.
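The test above assumes an `Order` class that is not shown. A minimal sketch of one possible shape follows. The class, the `addItem` and `calculateTotal` signatures, and the rounding to two decimals are assumptions made for illustration, not part of any library.

```typescript
// Hypothetical Order class matching the calls in the AAA example above.
interface Item {
  price: number;
}

class Order {
  private items: Item[] = [];

  addItem(item: Item): void {
    this.items.push(item);
  }

  // Assumption: totals are rounded to two decimals so that floating-point
  // noise (100 * 1.1 evaluates to 110.00000000000001) does not leak out.
  calculateTotal(options: { taxRate: number }): number {
    const subtotal = this.items.reduce((sum, item) => sum + item.price, 0);
    const withTax = subtotal * (1 + options.taxRate);
    return Math.round(withTax * 100) / 100;
  }
}

// Arrange
const order = new Order();
order.addItem({ price: 100 });

// Act
const total = order.calculateTotal({ taxRate: 0.1 });

// Assert: total is 110 after rounding
```

Rounding inside `calculateTotal` is one design choice; storing prices as integer cents is a common alternative that avoids the issue altogether.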

What Makes Tests Good vs Bad

Good tests are:

  • Behavioral — sensitive to what code does
  • Structure-insensitive — unaffected by internal refactoring

Bad tests:

  • Test implementation details (internal state, private methods)
  • Create false negatives (break on valid refactors)
  • Create false positives (pass while real bugs exist)
typescript
// BAD: tests implementation
expect(wrapper.state('openIndex')).toBe(1);

// GOOD: tests behavior
await userEvent.click(screen.getByText('Section 2'));
expect(screen.getByText('Section 2 content')).toBeVisible();
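The same distinction holds outside UI code. As a hedged sketch, the hypothetical `Accordion` class below keeps its open index private. A check that goes through the public `visibleContent()` method keeps passing if the internal field is later renamed or replaced by another representation.

```typescript
// Hypothetical accordion model; all names are illustrative only.
class Accordion {
  // Internal detail: could become a set of ids without changing behavior.
  private openIndex = 0;

  constructor(private readonly sections: { title: string; content: string }[]) {}

  // Public behavior: clicking a title opens that section.
  click(title: string): void {
    for (let i = 0; i < this.sections.length; i++) {
      if (this.sections[i].title === title) {
        this.openIndex = i;
      }
    }
  }

  // Public behavior: only the open section's content is visible.
  visibleContent(): string {
    return this.sections[this.openIndex].content;
  }
}

const accordion = new Accordion([
  { title: 'Section 1', content: 'Section 1 content' },
  { title: 'Section 2', content: 'Section 2 content' },
]);
accordion.click('Section 2');

// Behavioral check: asserts on what a user would observe.
const visible = accordion.visibleContent();
```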

Testing Pyramid vs Trophy

Traditional Pyramid (Google/Fowler)

code
      /\        E2E (5%)
     /  \
    /----\      Integration (15%)
   /------\
  /--------\    Unit (80%)

Favor when: Algorithmic code, stable interfaces, libraries, constrained resources.

Testing Trophy (Kent C. Dodds)

code
        E2E (critical paths)
       /    \
      /------\     Integration (largest)
     /--------\
    /----------\   Unit (smaller)
   /______________\ Static (TS, ESLint)

Favor when: Apps with complex integrations, UI components, modern frontend.

Core insight: For most apps, integration tests offer the best confidence-to-cost ratio.

What to Test at Each Layer

| Layer | Test | Examples |
| --- | --- | --- |
| Unit | Pure functions, algorithms, utils | formatCurrency(1234.5) → '$1,234.50' |
| Integration | Components together, API endpoints, DB ops | Cart calculates totals correctly |
| E2E | Critical user journeys only | Signup → purchase → confirmation |
| Contract | API compatibility between services | Consumer expectations match provider |
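As an illustration of the unit row, here is a minimal sketch of a `formatCurrency` helper. The source names the function and its expected output but not its implementation, so the use of `Intl.NumberFormat` with the `en-US` locale and USD is an assumption.

```typescript
// Hypothetical implementation: formats a number as US dollars.
const usdFormatter = new Intl.NumberFormat('en-US', {
  style: 'currency',
  currency: 'USD',
});

function formatCurrency(amount: number): string {
  return usdFormatter.format(amount);
}

// Unit-level checks: pure function, no collaborators, edge cases included.
const formatted = formatCurrency(1234.5); // '$1,234.50'
const zero = formatCurrency(0);           // '$0.00'
```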

Test Doubles

| Double | Purpose | Verification |
| --- | --- | --- |
| Dummy | Fill params, never used | None |
| Fake | Working shortcut (in-memory DB) | State |
| Stub | Canned responses | State |
| Spy | Record calls + stub | State or Behavior |
| Mock | Verify specific calls made | Behavior |

Key distinction: Tests built on stubs verify state (what is the result?); tests built on mocks verify behavior (which calls were made?).
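To make the table concrete, the sketch below hand-rolls a stub, a spy, and a fake without any mocking library. The `RateProvider` and `Mailer` interfaces, the `FakeUserRepo` class, and the `convert` and `register` functions are hypothetical names introduced only for this example.

```typescript
interface RateProvider { rateFor(currency: string): number; }
interface Mailer { send(to: string, subject: string): void; }

// Stub: returns a canned response; the test then checks the resulting state.
const stubRates: RateProvider = { rateFor: () => 2 };

function convert(amount: number, currency: string, rates: RateProvider): number {
  return amount * rates.rateFor(currency);
}

// Spy: records calls so the test can inspect them afterwards.
class SpyMailer implements Mailer {
  calls: { to: string; subject: string }[] = [];
  send(to: string, subject: string): void {
    this.calls.push({ to, subject });
  }
}

// Fake: a working but simplified implementation (in-memory instead of a DB).
class FakeUserRepo {
  private users: string[] = [];
  save(email: string): void { this.users.push(email); }
  exists(email: string): boolean { return this.users.indexOf(email) !== -1; }
}

function register(email: string, repo: FakeUserRepo, mailer: Mailer): boolean {
  if (repo.exists(email)) return false; // duplicate: no save, no mail
  repo.save(email);
  mailer.send(email, 'Welcome');
  return true;
}

// State verification with a stub
const converted = convert(10, 'EUR', stubRates); // 20

// Behavior verification with a spy, state verification with a fake
const repo = new FakeUserRepo();
const mailer = new SpyMailer();
const first = register('a@example.com', repo, mailer);  // true
const second = register('a@example.com', repo, mailer); // false, already registered
```

After both calls, `mailer.calls` holds exactly one entry, which is the behavior-level evidence that the duplicate registration sent no mail.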

Mocking Rules

  • Mock at system boundaries — external APIs, databases
  • Don't mock what you don't own — wrap third-party libs in your interfaces
  • Over-mocking = testing mocks — if setup > test code, you're testing wrong
typescript
// BAD: mock everything
jest.mock('./userService');
jest.mock('./emailService');
jest.mock('./logger');
// Now testing that mocks return what you told them to

// GOOD: real implementations where practical
const db = createTestDatabase();
const service = new UserService(db); // real service, test DB
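The second rule can be sketched as follows: application code depends on a small interface it owns, a thin adapter would translate that interface to the third-party SDK, and tests substitute an in-memory implementation. `PaymentGateway`, `InMemoryGateway`, and `checkout` are hypothetical names, and no real payment SDK is used here.

```typescript
// Interface owned by the application, shaped by what the app needs.
interface PaymentGateway {
  charge(customerId: string, cents: number): { ok: boolean; reference: string };
}

// In production, an adapter class would implement PaymentGateway by calling
// the vendor SDK. Only that adapter knows the vendor's API.

// In tests, an in-memory implementation stands in at the boundary.
class InMemoryGateway implements PaymentGateway {
  charges: { customerId: string; cents: number }[] = [];

  charge(customerId: string, cents: number) {
    this.charges.push({ customerId, cents });
    return { ok: cents > 0, reference: 'test-' + this.charges.length };
  }
}

// Real application logic, exercised with only the boundary replaced.
function checkout(gateway: PaymentGateway, customerId: string, cents: number): string {
  const result = gateway.charge(customerId, cents);
  return result.ok ? 'paid:' + result.reference : 'declined';
}

const gateway = new InMemoryGateway();
const receipt = checkout(gateway, 'cust-1', 2500); // 'paid:test-1'
const declined = checkout(gateway, 'cust-1', 0);   // 'declined'
```

If the vendor changes its SDK, only the adapter needs updating; tests written against `PaymentGateway` stay untouched.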

Anti-Patterns

The Liar (validates nothing)

typescript
// Always passes, tests nothing
test('renders component', () => {
  const wrapper = render(<MyComponent />);
  expect(wrapper).toBeTruthy(); // Always true!
});

Fix: Watch the test fail first. If no change to the code under test can make it fail, it verifies nothing.
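One way to apply this fix without tooling is to run the assertion against a deliberately broken implementation and confirm that it fails. In this sketch `slugify` and `brokenSlugify` are hypothetical functions; the weak check accepts both, while the behavioral check rejects the broken one.

```typescript
function slugify(title: string): string {
  return title.trim().toLowerCase().replace(/\s+/g, '-');
}

// Deliberately wrong: forgets to replace whitespace.
function brokenSlugify(title: string): string {
  return title.trim().toLowerCase();
}

// Liar-style check: any non-empty string is truthy, so this cannot fail here.
function weakCheck(fn: (s: string) => string): boolean {
  return Boolean(fn('Hello World'));
}

// Behavioral check: pins the observable result.
function strongCheck(fn: (s: string) => string): boolean {
  return fn('Hello World') === 'hello-world';
}

const weakPassesOnBroken = weakCheck(brokenSlugify);     // true: bug goes unnoticed
const strongPassesOnBroken = strongCheck(brokenSlugify); // false: bug is caught
const strongPassesOnReal = strongCheck(slugify);         // true
```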

Brittle Tests (break on refactor)

  • Testing internal state/private methods
  • Over-specifying assertions
  • Fragile selectors (XPath, CSS classes)
  • Order-dependent tests

Fix: Test through public interfaces only.

Flaky Tests (intermittent failures)

Root causes:

  • 54% — improper async handling (sleep() vs explicit waits)
  • 31% — race conditions
  • Shared state between tests

Fixes:

typescript
// BAD
await sleep(3000);

// GOOD
await screen.findByText('Loaded');
await waitFor(() => expect(result).toBe(expected));

  • Reset shared state in beforeEach rather than afterEach, so a test that crashed mid-run cannot leak state into the next one
  • Mock time-dependent operations
  • Run tests in random order to expose hidden dependencies
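For the time-dependent case, one common approach is to pass the clock in rather than calling `Date.now()` directly, so a test can pin the time. The `Clock` type and `isExpired` function below are hypothetical; fake timers provided by a test runner (for example `jest.useFakeTimers()`) are an alternative.

```typescript
// A clock is just a function returning epoch milliseconds.
type Clock = () => number;

// Production code would pass Date.now; tests pass a fixed clock.
function isExpired(expiresAtMs: number, now: Clock): boolean {
  return now() >= expiresAtMs;
}

const fixedNow = Date.UTC(2024, 0, 1, 12, 0, 0);
const frozenClock: Clock = () => fixedNow;

const stillValid = isExpired(fixedNow + 1000, frozenClock); // false
const expired = isExpired(fixedNow - 1000, frozenClock);    // true
const atBoundary = isExpired(fixedNow, frozenClock);        // true
```

Because the clock is frozen, the boundary case is exact and repeatable, which would not be possible with a real clock and a sleep.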

Ice Cream Cone (inverted pyramid)

Heavy manual testing → slow E2E → minimal unit tests.

Result: Late feedback, high cost, poor edge case coverage.

Test Naming

Names should read like specifications:

typescript
// GOOD
'calculates total price including tax'
'prevents adding out-of-stock items to cart'
'returns empty array when input is null'

// BAD
'test1'
'testCalculateTotal_ReturnsCorrectValue'

DAMP over DRY

Tests don't have tests — they must be obviously correct on inspection.

  • DRY for mechanics — test factories, common setup
  • DAMP for meaning — explicit values that matter
typescript
// DRY helper for mechanics
const createTestUser = (overrides = {}) => ({
  id: 1, name: 'Test User', ...overrides
});

// DAMP test with meaningful values
test('loyal customers receive 20% discount', () => {
  const customer = createTestUser({ loyaltyYears: 5 }); // loyalty matters!
  expect(calculateDiscount(customer)).toBe(0.20);
});
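A typed variant of the factory together with one possible `calculateDiscount` is sketched below. The `User` shape, the five-year threshold, and the 20% rate are assumptions chosen to match the test above, not a documented business rule.

```typescript
interface User {
  id: number;
  name: string;
  loyaltyYears: number;
}

// DRY: defaults live in one place; Partial<User> lets each test override
// only the fields that matter to it.
function createTestUser(overrides: Partial<User> = {}): User {
  return { id: 1, name: 'Test User', loyaltyYears: 0, ...overrides };
}

// Hypothetical rule: five or more loyalty years earns 20% off.
function calculateDiscount(user: User): number {
  return user.loyaltyYears >= 5 ? 0.2 : 0;
}

// DAMP: the value that drives the outcome is spelled out at the call site.
const loyalDiscount = calculateDiscount(createTestUser({ loyaltyYears: 5 }));
const newDiscount = calculateDiscount(createTestUser({ loyaltyYears: 0 }));
```

Typing the overrides as `Partial<User>` means a misspelled field name in a test is rejected by the compiler instead of being silently ignored.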

Coverage: Useful Metric, Terrible Target

Coverage reveals untested code — useful for finding gaps.

Don't chase 100%: doing so produces trivial tests, tests without assertions, and wasted time.

Target 70-85% as a sanity check, not a strict gate.
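To see why coverage alone can mislead, the sketch below imitates by hand what a mutation-testing tool automates. `isAdult`, the mutant, and the two suites are hypothetical. Both suites execute every line of `isAdult`, yet only the one with the boundary case detects the mutated comparison.

```typescript
type AgeCheck = (age: number) => boolean;

const isAdult: AgeCheck = (age) => age >= 18;

// Mutant: the kind of small change a mutation tool would inject.
const isAdultMutant: AgeCheck = (age) => age > 18;

// Suite A: full line coverage, but no boundary case.
function suiteA(fn: AgeCheck): boolean {
  return fn(30) === true && fn(5) === false;
}

// Suite B: same coverage, plus the boundary value.
function suiteB(fn: AgeCheck): boolean {
  return suiteA(fn) && fn(18) === true;
}

const mutantSurvivesA = suiteA(isAdultMutant); // true: suite A passes on buggy code
const mutantSurvivesB = suiteB(isAdultMutant); // false: suite B kills the mutant
```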

Quick Reference

| Situation | Approach |
| --- | --- |
| Pure function with edge cases | Unit test |
| Component interactions | Integration test |
| Critical user journey | E2E test |
| External API | Mock at boundary |
| Internal collaborator | Real implementation |
| Flaky timing | Explicit waits, not sleep |
| Test breaks on refactor | Test behavior, not implementation |
| High coverage, bugs slip through | Check assertions, try mutation testing |

Red Flags

  • Tests that break on every refactor
  • Setup code larger than test code
  • sleep() / arbitrary delays
  • Testing internal state
  • Mocking everything
  • Tests without meaningful assertions
  • Order-dependent tests
  • Coverage as strict gate