test-health-audit

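Use when a test suite has flaky tests, slow test runs, or coverage gaps. Use after adding many tests, before a release, or when CI is unreliable. Applies when test results pass and fail intermittently, runs take too long, or coverage is unclear.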

SKILL.md
--- frontmatter
name: test-health-audit
description: Use when test suite has flaky tests, slow test runs, or coverage gaps. Use after adding many tests, before releases, or when CI is unreliable. Use when tests pass/fail inconsistently, take too long, or coverage is unknown.

Test Health Audit

Systematic review of a test suite for flakiness, performance, and coverage. Spawns a team to audit, fix, optimize, and verify.

When to Use

  • Tests pass/fail inconsistently (flaky)
  • Test suite is slow or getting slower
  • Coverage is unknown or declining
  • Before a release or after a large feature lands
  • CI is unreliable due to test issues

The Process

```dot
digraph audit {
    rankdir=TB;
    "Spawn audit agents (parallel)" [shape=box];
    "Flakiness audit" [shape=box];
    "Performance audit" [shape=box];
    "Fix flaky tests" [shape=box];
    "Optimize slow tests" [shape=box];
    "10x verification run" [shape=box];
    "Push PR" [shape=box];

    "Spawn audit agents (parallel)" -> "Flakiness audit";
    "Spawn audit agents (parallel)" -> "Performance audit";
    "Flakiness audit" -> "Fix flaky tests";
    "Performance audit" -> "Optimize slow tests";
    "Fix flaky tests" -> "10x verification run";
    "Optimize slow tests" -> "10x verification run";
    "10x verification run" -> "Push PR";
}
```

Phase 1: Parallel Audits

Spawn two research agents simultaneously:

Flakiness Audit (test-automator or QA agent):

  • Identify all flaky tests and categorize root causes
  • Search for these patterns across ALL test files:
| Pattern | Risk | Example |
|---|---|---|
| Shared filesystem paths | HIGH | Tests reading/writing same directory |
| Non-seeded RNG | HIGH | `thread_rng()`, `rand::rng()` in test paths |
| Timing dependencies | MEDIUM | `sleep()`, `Instant::now()`, elapsed checks |
| Global/static mutable state | HIGH | `lazy_static`, shared `Mutex`, `AtomicBool` |
| Execution order dependencies | MEDIUM | Tests that assume prior test ran |
| Probabilistic assertions | LOW | Monte Carlo with tight margins |
  • Produce ranked report: test name, file:line, root cause, severity, fix approach
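
The pattern search above can be sketched as a small scanner. This is only an illustration: the regexes in `RISK_PATTERNS` are hypothetical starting points covering three of the table rows, and the `**/*.rs` glob assumes a Rust project. A real audit would still need a human or agent to confirm each hit and assign a root cause.

```python
import re
from pathlib import Path

# Hypothetical regexes for three rows of the pattern table; tune them per codebase.
RISK_PATTERNS = {
    "non-seeded RNG": ("HIGH", re.compile(r"\bthread_rng\(\)|\brand::rng\(\)")),
    "timing dependency": ("MEDIUM", re.compile(r"\bsleep\(|Instant::now\(\)|\.elapsed\(\)")),
    "global mutable state": ("HIGH", re.compile(r"\blazy_static!|\bstatic\s+mut\b|\bAtomicBool\b")),
}


def scan_text(text, filename="<memory>"):
    """Return (filename, line_number, category, risk) for every matching line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for category, (risk, pattern) in RISK_PATTERNS.items():
            if pattern.search(line):
                findings.append((filename, lineno, category, risk))
    return findings


def scan_tree(root, glob="**/*.rs"):
    """Scan every file under root matching glob; HIGH-risk findings come first."""
    findings = []
    for path in sorted(Path(root).glob(glob)):
        findings.extend(scan_text(path.read_text(errors="ignore"), str(path)))
    order = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
    return sorted(findings, key=lambda f: (order[f[3]], f[0], f[1]))
```

Calling `scan_tree("tests")` gives a candidate list in file:line form that can seed the ranked report.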

Performance Audit (performance-engineer agent):

  • Profile test suite timing per file and per test
  • Identify slowest tests and why they're slow:
| Pattern | Fix |
|---|---|
| Excessive loop counts (50k+) | Reduce to minimum proving the point |
| Brute-force probabilistic triggering | Pre-computed seeds or boosted parameters |
| Redundant negative assertions (P=0) | 100 iterations sufficient for structural impossibility |
| Expensive AI computation in tests | Reduce search depth or board size |
| Duplicate coverage across files | Consolidate or remove redundant tests |
  • Produce ranked report: test name, time contribution, optimization, expected savings
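
One possible way to get the per-test ranking is to read a JUnit-style XML report, which pytest can write with `--junitxml`; other runners may need a plugin for this format. The sketch below assumes each `testcase` element carries `classname`, `name`, and `time` attributes. The function name `slowest_tests` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def slowest_tests(junit_xml, top=5):
    """Return (test id, seconds, share of total time) for the slowest test cases."""
    root = ET.fromstring(junit_xml)
    timings = [
        (f"{case.get('classname', '')}::{case.get('name', '')}", float(case.get("time", 0)))
        for case in root.iter("testcase")
    ]
    # Guard against an empty or all-zero report to avoid dividing by zero.
    total = sum(seconds for _, seconds in timings) or 1.0
    ranked = sorted(timings, key=lambda t: t[1], reverse=True)[:top]
    return [(name, seconds, seconds / total) for name, seconds in ranked]
```

The share-of-total column corresponds to the "time contribution" field of the report; the optimization and expected savings still have to be filled in by whoever reads the test.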

Phase 2: Parallel Fixes

Spawn fix agents based on audit findings. Common fix patterns:

Flakiness fixes:

  • Shared filesystem -> isolated temp directories per test
  • Non-seeded RNG -> seeded RNG, or boosted parameters that make the outcome deterministic
  • Timing dependencies -> remove the timing dependency where possible; otherwise use generous tolerances
  • Global state -> per-test initialization
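
As an illustration of the first, second, and fourth fixes, here is a minimal sketch using Python's `unittest`. `roll_damage` is a hypothetical function under test, and the seed `1234` is arbitrary. The point is that every test gets its own directory and its own RNG instead of sharing them.

```python
import random
import tempfile
import unittest
from pathlib import Path


def roll_damage(rng):
    """Hypothetical code under test: takes its RNG as a parameter."""
    return rng.randint(1, 6)


class IsolatedDeterministicTest(unittest.TestCase):
    def setUp(self):
        # Per-test state: a fresh temp directory and a fresh seeded RNG.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.workdir = Path(self._tmp.name)
        self.rng = random.Random(1234)

    def test_save_file_is_private_to_this_test(self):
        save = self.workdir / "save.json"
        save.write_text("{}")
        # No other test can have written here, so the listing is predictable.
        self.assertEqual([p.name for p in self.workdir.iterdir()], ["save.json"])

    def test_same_seed_gives_same_rolls(self):
        first = [roll_damage(self.rng) for _ in range(20)]
        replay = random.Random(1234)
        self.assertEqual(first, [roll_damage(replay) for _ in range(20)])
```

Passing the RNG in as a parameter is the design choice that makes seeding possible; code that reaches for a global generator has to be refactored or patched first.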

Performance fixes:

  • Structural impossibility tests (P=0 by code path): reduce to 100-1k iterations
  • Rare event triggering: boost parameters to increase probability + seeded RNG
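  • Statistical distribution tests: reduce samples and widen thresholds to match (sampling error grows roughly as 1/sqrt(n) as the sample count n shrinks)
  • Combat/simulation loops: give test characters overwhelming stats so the loop ends in a few iterations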
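
A minimal sketch of the first two performance fixes, assuming a hypothetical `attack` function with a `crit_chance` parameter. Instead of looping tens of thousands of times to hit a rare branch, the test raises the probability to 1.0. The P=0 case runs 100 iterations, because no number of iterations can reach a branch the code path rules out.

```python
import random


def attack(rng, crit_chance):
    """Hypothetical code under test: returns 'crit' with probability crit_chance."""
    return "crit" if rng.random() < crit_chance else "hit"


def test_crit_path_is_reachable():
    rng = random.Random(42)  # seeded so any failure is reproducible
    # Boosted parameter: random() is always below 1.0, so one call is enough.
    assert attack(rng, crit_chance=1.0) == "crit"


def test_crit_never_fires_at_zero():
    rng = random.Random(42)
    # Structural impossibility: random() is never below 0.0, so 100 iterations suffice.
    assert all(attack(rng, crit_chance=0.0) == "hit" for _ in range(100))
```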

Phase 3: Coverage Check

After fixes, verify coverage hasn't regressed:

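```bash
# Rust (requires the cargo-llvm-cov subcommand)
cargo llvm-cov --summary-only

# JS/TS
npx jest --coverage --coverageReporters=text-summary

# Python (requires the pytest-cov plugin; "term" prints the per-module table)
pytest --cov --cov-report=term
```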

Flag any modules below project threshold.
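
Flagging can be automated once the coverage tool emits machine-readable output. The sketch below assumes a report shaped like the JSON that coverage.py writes with `coverage json`: a top-level `files` mapping whose entries hold `summary.percent_covered`. That shape is an assumption to verify against your tool's actual output. The threshold value is whatever the project has agreed on.

```python
import json


def modules_below_threshold(report_json, threshold):
    """Return (path, percent) for files under threshold, lowest coverage first."""
    report = json.loads(report_json)
    low = [
        (path, data["summary"]["percent_covered"])
        for path, data in report["files"].items()
        if data["summary"]["percent_covered"] < threshold
    ]
    return sorted(low, key=lambda item: item[1])
```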

Phase 4: Verification

Run the full test suite 10 times consecutively:

```bash
# Rust: print the summary line(s) of each run
for i in $(seq 1 10); do echo "Run $i"; cargo test 2>&1 | grep "test result:"; done

# JS/TS: print the summary line of each run
for i in $(seq 1 10); do echo "Run $i"; npx jest 2>&1 | grep "Tests:"; done
```

Pass criteria: 0 failures across all 10 runs.
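
The same loop can be written so that the pass criterion is checked from exit codes rather than read off grep output. This is a sketch, not part of the original process: `run_repeatedly` is a hypothetical helper, and its default command (pytest via the current interpreter) is only a placeholder.

```python
import subprocess
import sys


def run_repeatedly(command=None, runs=10):
    """Run command `runs` times; return the 1-based indices of the runs that failed."""
    # Placeholder default; pass e.g. ["cargo", "test"] or ["npx", "jest"] instead.
    command = command or [sys.executable, "-m", "pytest"]
    failed = []
    for i in range(1, runs + 1):
        result = subprocess.run(command, capture_output=True, text=True)
        status = "ok" if result.returncode == 0 else "FAILED"
        print(f"Run {i}: {status}")
        if result.returncode != 0:
            failed.append(i)
    return failed
```

An empty return value from `run_repeatedly(["cargo", "test"])` corresponds to the pass criterion of 0 failures across all 10 runs.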

Team Composition

| Role | Count | Task |
|---|---|---|
| QA / test-automator | 1 | Flakiness audit |
| performance-engineer | 1 | Performance audit |
| QA / general-purpose | 1-3 | Fix flaky tests (parallelize by category) |
| Dev / general-purpose | 1 | Optimize slow tests |

Scale QA agents based on number of flaky test categories found.

Common Mistakes

| Mistake | Fix |
|---|---|
| Reducing iteration count too aggressively | Keep enough for statistical confidence (3x expected hits minimum) |
| Fixing flakiness by adding sleep/retry | Fix root cause (isolation, determinism), not symptoms |
| Modifying production code to fix tests | Only change test code unless the production code has a genuine bug |
| Ignoring Monte Carlo tests as "probably fine" | Check margins are generous (2-5x expected range) |
| Skipping the 10x verification | Flakiness is probabilistic; 1 run proves nothing |
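
The arithmetic behind "1 run proves nothing" can be sketched as follows, under the simplifying assumption that a flaky test fails independently on each run with a fixed rate. With a 5% failure rate, a single run passes 95% of the time, and ten consecutive runs still all pass roughly 60% of the time.

```python
def chance_all_runs_pass(failure_rate, runs):
    """Probability that a test failing independently at failure_rate passes every run."""
    return (1.0 - failure_rate) ** runs


for runs in (1, 10):
    print(f"{runs} run(s): {chance_all_runs_pass(0.05, runs):.3f}")
```

Under this model, repeated runs lower the chance of missing a flaky test but never eliminate it, which is why fixing the root cause matters more than the run count.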