Skill: flaky-test-detector
Purpose
Identify and quarantine non-deterministic (flaky) tests to prevent the agent from getting stuck in a self-correction loop trying to fix a non-existent bug.
When to Use
- •When a test fails during verification but the code change seems correct.
- •When the same test passes and fails on subsequent runs without code changes.
- •As a pre-flight check before starting a new feature.
How It Works
Step 1: Detect Flaky Tests
Run the test suite multiple times and compare results:
bash
# For Node.js/Jest
for i in {1..5}; do npm test -- --json --outputFile=.ralph/test_run_$i.json; done
# For Python/Pytest
for i in {1..5}; do pytest --json-report --json-report-file=.ralph/test_run_$i.json; done
Step 2: Analyze Results
Compare the test results across all runs. A test is considered flaky if it passes in some runs and fails in others.
bash
# Example analysis (pseudocode) # If test X passed in runs 1, 2, 4 but failed in runs 3, 5 -> FLAKY
Step 3: Quarantine Flaky Tests
For each flaky test, add a skip marker with a reason:
Jest (JavaScript/TypeScript):
javascript
// Before
test('should process data correctly', () => { ... });
// After
test.skip('should process data correctly', () => { ... }); // FLAKY: Intermittent failure, see .ralph/flaky_tests.md
Pytest (Python):
python
# Before
def test_should_process_data_correctly():
...
# After
@pytest.mark.skip(reason="FLAKY: Intermittent failure, see .ralph/flaky_tests.md")
def test_should_process_data_correctly():
...
Step 4: Log and Create Follow-Up Story
- •
Log the flaky test to
.ralph/flaky_tests.md:markdown## Flaky Test: test_should_process_data_correctly - **File**: tests/test_data_processor.py - **Detected**: 2026-01-30 - **Failure Rate**: 2/5 runs (40%) - **Hypothesis**: Possible race condition in async data fetching.
- •
Create a new user story in
prd.json:json{ "id": "FLAKY-001", "title": "Fix flaky test: test_should_process_data_correctly", "description": "Investigate and fix the root cause of the flaky test.", "acceptanceCriteria": ["Test passes 10/10 times consecutively."], "passes": false, "priority": "low" }
Configuration
Add the following to your CLAUDE.md:
markdown
## Flaky Test Detection - **Number of Test Runs**: 5 - **Flaky Threshold**: Any test that fails at least once is considered flaky. - **Quarantine Action**: Skip with reason.