Specwright Gate: Test Quality
Three-tier test quality analysis. Tier 1 blocks, Tier 2 warns, Tier 3 is informational. Prefer ast_grep_search for structural queries (fallback to Grep).
Default verdict is FAIL. Evidence must be cited before any verdict. Absence of evidence is evidence of non-compliance.
Step 1: Read Configuration and State
Read .specwright/config.json for:
- •
project.languages— to identify test file patterns - •
commands.test— test execution command - •
gates.tests— optional custom thresholds
Read .specwright/state/workflow.json. Extract currentEpic.id and currentEpic.specDir.
If no epic active, STOP: "No active epic."
Create evidence directory:
mkdir -p {specDir}/evidence/
Step 2: Identify Scope
2a: Determine test file patterns
Based on project.languages from config:
- •TypeScript/JavaScript:
*.test.ts,*.spec.ts,*.test.js,*.spec.js,__tests__/** - •Python:
test_*.py,*_test.py,tests/** - •Go:
*_test.go - •Rust: files containing
#[cfg(test)],tests/directory - •Other: ask LLM to identify test patterns
2b: Get changed files
git diff --name-only main...HEAD 2>/dev/null || git diff --name-only HEAD~10
Filter to source files (exclude test files) to identify what needs test coverage.
2c: Get test files for changed source files
Map changed source files to their corresponding test files using naming conventions from the language.
If commands.test is null AND zero test files found: write ERROR status, STOP.
Step 3: Tier 1 — Essential Checks (BLOCK on failure)
Any Tier 1 failure sets gate status to FAIL.
3a: Test Coverage
If a coverage command is available in config or can be inferred:
- •Run coverage and capture output
- •Parse coverage percentage (format varies by language — use LLM to parse)
- •Threshold: >= 80% (configurable via
gates.tests.coverageThreshold) - •If coverage tool unavailable: WARN (not FAIL), log "Coverage tool not configured"
3b: Assertion Density
For each test file in scope:
- •Use Grep to find test functions/methods (patterns vary by language)
- •Use Grep to find assertion calls (assert, expect, should, etc.)
- •Calculate:
density = total assertions / total test functions - •Threshold: >= 2.5 assertions per test (configurable)
- •Note: LLM should recognize assertion patterns for the project's test framework
3c: Tautological Tests
Tests that always pass:
- •Zero assertions, self-equality checks, truthy-only checks
- •Assertions passing under trivial mutations (constant return, branch swap, conditional removal). Non-constraining assertions (checking definedness vs expected value) are BLOCK.
- •If loose check (optional field) vs weak assertion (computed value definedness) is unclear, invoke AskUserQuestion.
- •Detection: Grep + LLM analysis
- •Threshold: Zero tautological tests
3d: Changed Code Has Tests
For each changed source file:
- •Verify a corresponding test file exists
- •Verify the test file has been updated (if source file was modified)
- •New source files without tests = FAIL
- •Threshold: All changed source files have test coverage
Step 4: Tier 2 — Quality Signals (WARN on failure)
Tier 2 failures do NOT block but record warnings.
4a: Negative Test Ratio
Count test functions matching error/invalid/failure patterns.
- •Threshold: >= 30% of tests cover error cases
4b: Test-to-Code Ratio
Compare total test lines to total source lines in scope.
- •Threshold: >= 0.8:1 (configurable)
4c: Flaky Patterns
Search for patterns known to cause flaky tests:
- •Sleep/delay calls in tests
- •Time-dependent assertions
- •Random value generation without seeds
- •Threshold: Zero flaky patterns
4d: Mock Ratio
If mocking is used, compare mock setup to assertion volume.
- •Threshold: More assertions than mock setup lines
4e: Test Isolation
Check for tests that depend on external state:
- •Database calls without cleanup/fixtures
- •Network calls without mocking
- •File system operations without temp directories
Step 5: Tier 3 — Detection Rules (INFO only)
Flag test quality opportunities: weak assertions (not-null vs specific values), over-mocking (>5 dependencies), missing boundary values (zero/negative/large), long setup (>50 lines).
Step 6: Baseline Check
If .specwright/baselines/gate-tests.json exists: matching entries downgrade BLOCK->WARN, WARN->INFO. Expired entries ignored. Partial match: AskUserQuestion. Log downgrades in evidence.
Step 7: Update Gate Status
Self-critique checkpoint: Before finalizing — did I accept anything without citing proof? Did I give benefit of the doubt? Would a skeptical auditor agree? Gaps are not future work. TODOs are not addressed. Partial implementations do not match intent. If ambiguous, FAIL.
Determine final status:
- •Incomplete analysis: ERROR (invoke AskUserQuestion)
- •Any Tier 1 failure: FAIL
- •All Tier 1 pass, any Tier 2 warning: WARN
- •All pass: PASS
Update .specwright/state/workflow.json gates.tests:
{"status": "PASS|WARN|FAIL|ERROR", "lastRun": "<ISO>", "evidence": "{specDir}/evidence/test-quality.md"}
Step 8: Save Evidence
Write {specDir}/evidence/test-quality.md: Tier 1 table (Coverage, Assertion Density, Tautological, Changed Code Coverage with PASS/FAIL + detail), Tier 2 table (Negative Ratio, Test:Code Ratio, Flaky, Mock Ratio, Isolation with PASS/WARN + detail), Tier 3 findings with file:line.
Step 9: Output Result
TESTS GATE: {PASS|WARN|FAIL}
Tier 1 (BLOCK): {pass}/{total} passed
Tier 2 (WARN): {pass}/{total} passed
Tier 3 (INFO): {count} findings