Test Runner Skill
You are a helper for running tests in this repository.
The test-runner is a mechanical skill that executes tests and captures output. It does not make judgment calls about test quality or coverage. Those decisions belong to agents (test-critic, test-author).
Purpose
Execute the repository's test suite with appropriate scoping and capture structured output for downstream agents. The skill:
- •Runs tests (scoped or full)
- •Captures raw output and exit codes
- •Produces structured summaries with counts
- •Handles framework-specific invocation patterns
What this skill does NOT do:
- •Write or modify tests (that's test-author)
- •Judge test quality (that's test-critic)
- •Make decisions about what to test (that's orchestrator / test-strategist)
- •Classify failures (that's flakiness-detector)
When to Use
| Context | Selection Strategy | Example |
|---|---|---|
| AC loop (Flow 3) | AC-scoped, fail-fast | Run only tests for AC-001 |
| Post-AC verification | Changed-file scope | Tests for files touched in this AC |
| Global hardening (Flow 3) | Full suite | All tests after all ACs complete |
| Gate verification (Flow 5) | Full suite, no fail-fast | Comprehensive verification |
| Quick sanity check | Smoke tests only | Fast feedback during iteration |
Test Selection Strategies
1. Run All Tests (Full Suite)
Use when: Global hardening, Gate verification, or no scope information available.
# Rust cargo test --workspace --tests --color=always # Python (pytest) pytest --color=yes -v # JavaScript (jest) npm test -- --colors # Go go test ./...
2. Run Changed Tests (Scoped by Diff)
Use when: Focused verification of what changed since main.
# Get changed files git diff --name-only origin/main...HEAD # Filter to test files and run those specifically # (Framework-specific filtering)
3. Run Related Tests (Dependency Aware)
Use when: Implementation changed, need to run tests that exercise that code.
For Rust:
cargo test -p <crate> -- <test-name-pattern> cargo test --test <integration-test-name>
For pytest:
pytest tests/unit/test_<module>.py -v pytest -k "auth or login"
4. Run AC-Scoped Tests
Use when: Verifying a specific Acceptance Criterion in the AC loop.
Test naming conventions for AC filtering:
- •
test_ac_001_*(name prefix) - •
@AC-001marker/tag - •File-based:
tests/ac_001_*.py
# pytest with marker pytest -m "AC_001" --color=yes # cargo test with pattern cargo test ac_001 # jest with pattern npm test -- --testNamePattern="AC-001"
5. Run Smoke Tests Only
Use when: Quick sanity check, not full verification.
# If smoke marker exists pytest -m smoke --color=yes # Or first N tests pytest --maxfail=5 -x
Commands by Framework
Rust (cargo test)
# Full suite cargo test --workspace --tests --color=always # Specific crate cargo test -p <crate-name> # Pattern match cargo test <pattern> # Single test cargo test <full::path::to::test> # Integration test file cargo test --test <name> # With fail-fast (implicit with --test-threads=1) cargo test -- --test-threads=1
Python (pytest)
# Full suite pytest --color=yes -v # Specific file pytest tests/unit/test_auth.py -v # Pattern match pytest -k "login or session" -v # Marker-based pytest -m "unit" -v pytest -m "AC_001" -v # Fail-fast pytest -x --color=yes # With coverage pytest --cov=src --cov-report=term-missing
JavaScript/TypeScript (jest)
# Full suite npm test -- --colors # Specific file npm test -- --colors tests/auth.test.ts # Pattern match npm test -- --testNamePattern="login" # Fail-fast npm test -- --bail --colors # With coverage npm test -- --coverage
Go (go test)
# Full suite go test ./... # Specific package go test ./pkg/auth/... # Pattern match go test -run "TestLogin" ./... # Fail-fast go test -failfast ./... # Verbose go test -v ./...
Output Format
Raw Output
Save raw test output to test_output.log (overwrite per run).
Structured Summary
Save parsed summary to test_summary.md with this structure:
# Test Summary ## Overall Status - **Result:** PASS | FAIL - **Exit Code:** <int> - **Duration:** <time> ## Counts - Passed: <int> - Failed: <int> - Skipped: <int> - xfailed: <int> - xpassed: <int> ## Failing Tests (if any) - `path::to::test_name` - <short error> - `path::to::other_test` - <short error> ## Top Error Snippets
<first 5-10 lines of first failure>
Canonical Summary Line
For machine parsing, include a canonical line:
passed=15 failed=2 skipped=1 xfailed=0 xpassed=0
Use null for any count that cannot be reliably extracted.
Handling Test Failures
Exit Codes
| Code | Meaning |
|---|---|
| 0 | All tests passed |
| 1+ | One or more tests failed |
| null | Command could not execute |
Failure Output
When tests fail:
- •Capture the full output to
test_output.log - •Extract failing test names to
test_summary.md - •Include up to 20 lines of the most relevant error output
- •Report exit code accurately
Do Not Modify on Failure
The test-runner skill does not fix failing tests. It reports results. Routing decisions belong to the orchestrator. Fixes belong to code-implementer or test-author.
Flaky Test Awareness
The test-runner itself does not classify flakiness. However, it should:
- •Record raw results faithfully - Each run produces accurate counts
- •Include test identifiers - So flakiness-detector can compare across runs
- •Note any timeout or intermittent signals - In the summary notes
For actual flakiness classification, the orchestrator routes to flakiness-detector, which:
- •Re-runs failures with a budget
- •Classifies: DETERMINISTIC_REGRESSION vs FLAKY vs ENV_TOOLING
- •Produces a worklist for routing
Integration with Agents
Called by: test-executor
The test-executor agent invokes this skill and writes the formal artifact:
- •
.runs/<run-id>/build/test_execution.md
The test-executor handles:
- •Mode selection (verify vs verify_ac)
- •Fail-fast decisions based on flow context
- •Formal handoff to orchestrator
Called by: test-author
The test-author agent may invoke this skill to verify tests work:
- •Quick run of newly written tests
- •Scope to just the tests being authored
- •Not a formal execution report
Informs: flakiness-detector
The flakiness-detector reads test-runner output to understand what failed, then re-runs to classify. It does not call test-runner directly; it runs tests itself with a rerun budget.
Configuration
Repository Config
If demo-swarm.config.json exists, read test commands from:
{
"commands": {
"test": "cargo test --workspace --tests --color=always",
"test_scoped": "cargo test -p {crate}",
"test_ac": "cargo test {ac_pattern}"
},
"flakiness": {
"budget_seconds": 180,
"rerun_count": 3,
"command": null
}
}
Fallback Detection
If no config exists, detect the stack:
- •
Cargo.tomlpresent → Rust/cargo test - •
package.jsonwith test script → npm test - •
pytest.iniorpyproject.toml→ pytest - •
go.modpresent → go test
Do not invent commands. If detection fails, record missing_required and bounce to pack-customizer.
Examples
Example 1: AC-Scoped Run (Flow 3 Microloop)
# Run tests for AC-001 only, fail-fast cargo test ac_001 -- --test-threads=1
Output in test_summary.md:
# Test Summary ## Overall Status - **Result:** FAIL - **Exit Code:** 1 - **Duration:** 4.2s ## Counts - Passed: 3 - Failed: 1 - Skipped: 0 ## Failing Tests - `auth::tests::test_ac_001_invalid_password` - assertion failed: expected 401, got 200
Example 2: Full Suite (Global Hardening)
cargo test --workspace --tests --color=always
Output:
# Test Summary ## Overall Status - **Result:** PASS - **Exit Code:** 0 - **Duration:** 45.3s ## Counts - Passed: 127 - Failed: 0 - Skipped: 3 - xfailed: 2 - xpassed: 0 ## Notes - 3 tests skipped: require external service - 2 xfail tests: known issues tracked in #123, #124
Example 3: Changed Files Scope
# Identify changed files git diff --name-only origin/main...HEAD | grep '\.rs$' # Run tests for changed crates cargo test -p auth_service -p session_manager
Troubleshooting
Test Command Not Found
Symptom: cargo: command not found or similar.
Solution: This is an environment issue. Record as ENV_TOOLING and bounce to fix environment. Do not continue with broken tooling.
Cannot Determine Counts
Symptom: Test output does not match expected format.
Solution: Record counts as null rather than guessing. Note the parsing issue in summary. The skill contract is to report accurately, not to fabricate.
Tests Hang or Timeout
Symptom: Tests do not complete within reasonable time.
Solution: Note the timeout in summary. If the run was partial, include whatever counts were observed. Record as potentially flaky for flakiness-detector to investigate.
No Test Files Found
Symptom: Test command runs but finds nothing to test.
Solution: This is valid output. Report passed=0 failed=0 skipped=0. The orchestrator decides if this is expected (new feature without tests yet) or a problem (tests deleted).
Behavior (Inherited from Original)
- •
Prefer scoped test runs:
- •Use
git diff --name-only origin/main...HEADto list changed files when available. - •If the caller provides a list of affected modules/files, attempt to run targeted tests first.
- •Use
- •
Runtime flags and run bounds:
- •Prefer to keep runs bounded; if the full suite is required, note this in the summary.
- •
Capture output and artifacts:
- •Save raw output to
test_output.log(overwrite per run) and a parsed summary totest_summary.md. - •
test_summary.mdshould include: overall status (PASS/FAIL), failing test names, top error snippets, and counts.
- •Save raw output to
- •
Failure handling:
- •Exit status is used by calling subagent; include failing test names in
test_summary.md.
- •Exit status is used by calling subagent; include failing test names in
- •
Do not modify source or tests.
- •
When used in Flow 3 / Flow 4, callers should provide the scope (files/modules/tests) if known.