Autonomous Test Runner
Purpose
Executes the test suite autonomously and classifies failures into actionable categories. This skill never fixes code blindly—it always determines the root cause first.
When to Use
- •After generating tests
- •During iterative fixing cycles
- •Before any release gate
Instructions
1. Run the Full Test Suite
bash
# Example commands npm test pytest go test ./...
2. Classify Each Failure
Every failure must be classified before any action:
| Classification | Definition | Next Step |
|---|---|---|
| Implementation Bug | Code doesn't match spec | → self-healing-debugger |
| Spec Mismatch | Spec is unclear/wrong | → spec-violation-detector |
| Test Error | Test itself is broken | Fix test, not code |
| Environment Issue | Setup/config problem | Fix environment |
| Flaky Test | Intermittent failure | Stabilize or quarantine |
3. Produce Failure Summary
markdown
## Test Results Summary **Run:** 2026-01-22 09:30:00 **Total:** 147 tests **Passed:** 142 (96.6%) **Failed:** 5 ### Failures | Test | Spec ID | Classification | Recommended Action | |------|---------|----------------|-------------------| | test_login_token | AUTH-001 | Implementation Bug | Fix token expiry | | test_rate_limit | AUTH-002 | Spec Mismatch | Clarify limit value | | test_cart_total | CART-001 | Test Error | Fix assertion |
4. Do Not Assume Intent
code
❌ Wrong: "This test fails, I'll change the assertion to pass"
✅ Right: "This test fails, classification: Implementation Bug,
code returns 3600, spec says 86400, fix the code"
Classification Decision Tree
mermaid
flowchart TD
FAIL[Test Failed] --> Q1{Code matches spec?}
Q1 -->|No| BUG[Implementation Bug]
Q1 -->|Yes| Q2{Spec clear?}
Q2 -->|No| SPEC[Spec Mismatch]
Q2 -->|Yes| Q3{Test correct?}
Q3 -->|No| TEST[Test Error]
Q3 -->|Yes| Q4{Repeatable?}
Q4 -->|No| FLAKY[Flaky Test]
Q4 -->|Yes| ENV[Environment Issue]
BUG --> FIX1[→ self-healing-debugger]
SPEC --> FIX2[→ spec-violation-detector]
TEST --> FIX3[Fix test directly]
FLAKY --> FIX4[Stabilize or quarantine]
ENV --> FIX5[Fix environment]
Inputs
- •Test files from
self-test-generator - •Relevant spec clauses
- •Previous test run history (for flaky detection)
Outputs
- •Test execution results
- •Classified failure list
- •Recommended actions per failure
Integration
- •Precedes:
self-healing-debuggerorspec-violation-detector - •Follows:
self-test-generator - •Feeds into:
delivery-readiness-gate
How to provide feedback
- •Be specific: "The failure in 'API-001' was classified as a 'Test Error', but it's actually an 'Implementation Bug'."
- •Explain why: "Incorrect classification sends the Agent to the wrong skill (fixing test instead of code)."
- •Suggest alternatives: "Re-classify as 'Implementation Bug' and trigger
self-healing-debugger."
Classification before action, always.