AgentSkillsCN

Flaky Test Detector

易出错测试检测器

SKILL.md

Skill: flaky-test-detector

Purpose

Identify and quarantine non-deterministic (flaky) tests to prevent the agent from getting stuck in a self-correction loop trying to fix a non-existent bug.

When to Use

  • When a test fails during verification but the code change seems correct.
  • When the same test passes and fails on subsequent runs without code changes.
  • As a pre-flight check before starting a new feature.

How It Works

Step 1: Detect Flaky Tests

Run the test suite multiple times and compare results:

bash
# For Node.js/Jest
for i in {1..5}; do npm test -- --json --outputFile=.ralph/test_run_$i.json; done

# For Python/Pytest
for i in {1..5}; do pytest --json-report --json-report-file=.ralph/test_run_$i.json; done

Step 2: Analyze Results

Compare the test results across all runs. A test is considered flaky if it passes in some runs and fails in others.

bash
# Example analysis (pseudocode)
# If test X passed in runs 1, 2, 4 but failed in runs 3, 5 -> FLAKY

Step 3: Quarantine Flaky Tests

For each flaky test, add a skip marker with a reason:

Jest (JavaScript/TypeScript):

javascript
// Before
test('should process data correctly', () => { ... });

// After
test.skip('should process data correctly', () => { ... }); // FLAKY: Intermittent failure, see .ralph/flaky_tests.md

Pytest (Python):

python
# Before
def test_should_process_data_correctly():
    ...

# After
@pytest.mark.skip(reason="FLAKY: Intermittent failure, see .ralph/flaky_tests.md")
def test_should_process_data_correctly():
    ...

Step 4: Log and Create Follow-Up Story

  1. Log the flaky test to .ralph/flaky_tests.md:

    markdown
    ## Flaky Test: test_should_process_data_correctly
    - **File**: tests/test_data_processor.py
    - **Detected**: 2026-01-30
    - **Failure Rate**: 2/5 runs (40%)
    - **Hypothesis**: Possible race condition in async data fetching.
    
  2. Create a new user story in prd.json:

    json
    {
      "id": "FLAKY-001",
      "title": "Fix flaky test: test_should_process_data_correctly",
      "description": "Investigate and fix the root cause of the flaky test.",
      "acceptanceCriteria": ["Test passes 10/10 times consecutively."],
      "passes": false,
      "priority": "low"
    }
    

Configuration

Add the following to your CLAUDE.md:

markdown
## Flaky Test Detection

- **Number of Test Runs**: 5
- **Flaky Threshold**: Any test that fails at least once is considered flaky.
- **Quarantine Action**: Skip with reason.