AgentSkillsCN

verification-before-completion

当您即将宣称某项工作已完成、问题已修复、测试已通过,或任务已告一段落时,此技能要求您在做出任何成功声明之前,先运行验证命令并确认输出结果。始终秉持“证据先行,再做论断”的原则。

SKILL.md
--- frontmatter
name: verification-before-completion
description: "Use when about to claim work is complete, fixed, passing, or done. Requires running verification commands and confirming output before making any success claims. Evidence before assertions, always."

Verification Before Completion

Overview

Claiming work is complete without verification is dishonesty, not efficiency.

Core principle: Evidence before claims, always.

Violating the letter of this rule is violating the spirit of this rule.

The Iron Law

code
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

If you haven't run the verification command in this response, you cannot claim it passes.

When to Use

Always — before ANY of these claims:

  • "Tests pass"
  • "Build succeeds"
  • "Bug is fixed"
  • "Feature is complete"
  • "Linter is clean"
  • "Type-safe"
  • "No regressions"
  • "It works"
  • "Done"

When NOT to Use

This skill has no exceptions. If you're about to claim something is done, use this skill.

Anti-Shortcut Rules

code
YOU CANNOT:
- Say "should work" — run it and show the output
- Say "tests pass" without test output in this response — stale evidence is not evidence
- Say "build succeeds" because "linter passed" — they're different things
- Trust a sub-agent's success claim — verify independently
- Use "looks good" as verification — what command proves it?
- Claim "done" based on mental reasoning — evidence is output, not thought
- Skip full test suite because "only this file changed" — regressions are unexpected
- Express satisfaction ("Great!", "Perfect!") before verification — verify, then celebrate

Common Rationalizations (Don't Accept These)

ExcuseReality
"Should work now"RUN the verification
"I'm confident"Confidence ≠ evidence
"Just this once"No exceptions
"Linter passed"Linter ≠ compiler ≠ tests
"Agent said success"Verify independently
"I'm tired"Exhaustion ≠ excuse
"Partial check is enough"Partial proves nothing
"Different words, so rule doesn't apply"Spirit over letter
"I checked mentally"Mental checks aren't evidence
"It's a trivial change"Trivial changes cause most outages

Iron Questions

code
1. What COMMAND proves this claim? (name the exact command)
2. Have I run that command in THIS response? (not earlier, not "just now")
3. Does the output ACTUALLY confirm the claim? (read the output — don't skim)
4. Did I check the EXIT CODE? (0 = success, not always)
5. Are there any WARNINGS I'm ignoring? (warnings often become errors)
6. Did I run the FULL relevant suite? (not just one test)
7. Am I claiming more than the evidence shows? ("tests pass" vs "these 3 tests pass")
8. Would I bet money on this claim based on this evidence? (if not, gather more)

The Gate Function

code
BEFORE claiming any status or expressing satisfaction:

1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output — check exit code, count failures
4. VERIFY: Does output actually confirm the claim?
   - If NO → State actual status with evidence
   - If YES → State claim WITH evidence
5. ONLY THEN: Make the claim

Skip any step = lying, not verifying.

Verification Requirements

ClaimRequiresNot Sufficient
"Tests pass"Test output: 0 failures, all passPrevious run, "should pass"
"Build succeeds"Build output: exit 0"Linter passed"
"Linter clean"Linter output: 0 errors, 0 warningsPartial check
"Bug fixed"Reproduction test passes + no regressions"Code changed"
"Type-safe"Type checker output: 0 errors"No red squiggles"
"Feature complete"Line-by-line requirement check + all tests"Tests pass"
"No regressions"Full suite greenTargeted tests only
"Agent completed"VCS diff of actual changesAgent's "success" claim
"Deployed"Health check returns 200 + metrics stableDeploy command exit 0
"Performance improved"Before/after benchmarks"Feels faster"
"Security fixed"Vulnerability scan passes"Patched the code"

Patterns

Tests

code
✅ [Run test command] → [Output shows: 34/34 pass] → "All 34 tests pass"
❌ "Should pass now"
❌ "Tests look correct"

Regression Test (TDD Red-Green)

code
✅ Write test → Run (FAIL) → Fix → Run (PASS) → Revert fix → Run (FAIL) → Restore → Run (PASS)
❌ "I wrote a regression test" (without red-green cycle)

Build

code
✅ [Run build] → [Output shows: exit 0, no errors] → "Build passes"
❌ "Linter passed, so build should be fine"

Requirements

code
✅ Re-read plan → Create checklist → Verify each item with evidence → Report gaps or completion
❌ "Tests pass, so feature is complete"

Deployment

code
✅ Deploy → Health check → Metrics stable for 15min → Error rate baseline →  "Deployed successfully"
❌ "Deploy command succeeded"

Red Flags — STOP

You are about to violate this rule if:

  • Using "should", "probably", "seems to", "looks like"
  • Expressing satisfaction before verification ("Great!", "Perfect!", "Done!")
  • About to commit/push/PR without running tests
  • Trusting a subagent's success report without checking
  • Relying on partial verification ("well, this test passes...")
  • Thinking "just this once"
  • Tired and wanting to be done
  • ANY wording implying success without having run verification

Integration

  • Used after: Every executing-plans task
  • Used after: Every systematic-debugging fix
  • Used before: Every git-workflow merge
  • Enables: Honest code-review findings
  • Prevents: False "done" claims in incident-response