
Empirical Validation

Sufficient proof must be provided before work is marked complete; never settle for "trust me, it works".

SKILL.md
---
name: Empirical Validation
description: Requires proof before marking work complete — no "trust me, it works"
---

Empirical Validation

Core Principle

"The code looks correct" is NOT validation.

Every change must be verified with empirical evidence before being marked complete.

Validation Methods by Change Type

| Change Type | Required Validation | Tool |
|---|---|---|
| UI Changes | Screenshot showing expected visual state | browser_subagent |
| API Endpoints | Command showing correct response | run_command |
| Build/Config | Successful build or test output | run_command |
| Data Changes | Query showing expected data state | run_command |
| File Operations | File listing or content verification | run_command |
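
For the Data Changes row, the evidence is the result of a query compared against the state you expected. A minimal sketch, assuming a SQLite database accessed through Python's standard `sqlite3` module; `verify_query` and the `users` table are hypothetical names for illustration. It returns the rows it actually saw, so those rows (not just a pass/fail flag) can go into the journal.

```python
import sqlite3
from typing import Any, Sequence


def verify_query(
    conn: sqlite3.Connection,
    query: str,
    params: Sequence[Any],
    expected: list[tuple[Any, ...]],
) -> tuple[bool, list[tuple[Any, ...]]]:
    """Run a read-only check query and compare its rows with the expected state.

    Returns (matches, actual_rows) so the actual rows can be recorded as evidence.
    """
    actual = conn.execute(query, params).fetchall()
    return actual == expected, actual


if __name__ == "__main__":
    # Hypothetical example: confirm a change set the role of one user.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, role TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?)", ("test@test.com", "admin"))
    ok, rows = verify_query(
        conn,
        "SELECT email, role FROM users WHERE email = ?",
        ("test@test.com",),
        [("test@test.com", "admin")],
    )
    print(ok, rows)
```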

Validation Protocol

Before Marking Any Task "Done"

  1. Identify Verification Criteria

    • What should be true after this change?
    • How can that be observed?
  2. Execute Verification

    • Run the appropriate command or action
    • Capture the output/evidence
  3. Document Evidence

    • Add to .agent/state/JOURNAL.md under the task
    • Include actual output, not just "passed"
  4. Confirm Against Criteria

    • Does evidence match expected outcome?
    • If not, task is NOT complete
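
The four steps above can be sketched as a small gate in Python. This is an illustration under stated assumptions, not part of any existing tool: `collect_evidence`, `record_evidence` and `is_complete` are hypothetical helpers, the `expected` string stands in for the criterion from step 1, and the journal entry layout is made up. The point it encodes is that completion depends on captured output, never on a claim.

```python
import subprocess
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class Evidence:
    command: list[str]
    exit_code: int
    output: str  # combined stdout + stderr, exactly as produced


def collect_evidence(command: list[str]) -> Evidence:
    """Step 2: run the verification command and capture its real output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return Evidence(command, result.returncode, result.stdout + result.stderr)


def record_evidence(journal: Path, task: str, ev: Evidence) -> None:
    """Step 3: append the actual output (not just "passed") under a task heading."""
    journal.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with journal.open("a", encoding="utf-8") as f:
        f.write(f"\n### {task} ({stamp})\n")
        f.write(f"$ {' '.join(ev.command)}\n")
        f.write(f"exit code: {ev.exit_code}\n")
        f.write(ev.output.rstrip() + "\n")


def is_complete(ev: Evidence, expected: str) -> bool:
    """Step 4: complete only if the command succeeded AND its output shows the expected state."""
    return ev.exit_code == 0 and expected in ev.output
```

A typical call chain would be `collect_evidence(...)` on the build or test command, then `record_evidence(Path(".agent/state/JOURNAL.md"), task_name, ev)`, then `is_complete(ev, expected_text)`. The substring check is deliberately crude: the exit code carries most of the weight, and the substring ties the evidence back to the criterion stated in step 1.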

Examples

API Endpoint Verification

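```powershell
# Good: Actual test showing response
# In Windows PowerShell, curl is an alias for Invoke-WebRequest, so use Invoke-RestMethod (or curl.exe)
Invoke-RestMethod -Method Post -Uri http://localhost:3000/api/login `
  -ContentType 'application/json' -Body '{"email":"test@test.com"}' |
  ConvertTo-Json -Compress
# Output: {"success":true,"token":"..."}

# Bad: Just saying "endpoint works"
```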
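
The same check can be scripted when a plain command is not enough, for example to separate "request sent" from "response verified". A minimal sketch using only Python's standard library; `post_json` and `login_response_ok` are hypothetical helpers, and it assumes the endpoint answers with JSON carrying `success` and `token`, as in the sample output above.

```python
import json
import urllib.error
import urllib.request


def post_json(url: str, payload: dict, timeout: float = 10.0) -> tuple[int, str]:
    """POST a JSON body and return (status, raw body); HTTP errors are returned, not hidden."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method="POST", headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a body worth recording as evidence.
        return err.code, err.read().decode("utf-8", errors="replace")


def login_response_ok(status: int, body: str) -> bool:
    """Evidence check: HTTP 200 and a JSON object with success == true and a non-empty token."""
    if status != 200:
        return False
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(doc, dict) and doc.get("success") is True and bool(doc.get("token"))
```

Usage would be `status, body = post_json("http://localhost:3000/api/login", {"email": "test@test.com"})`, then recording `status` and `body` verbatim in the journal before calling `login_response_ok(status, body)`. Returning the raw body keeps the actual output available as evidence instead of collapsing it to a boolean.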

UI Verification

```text
# Good: Take screenshot with browser tool
- Navigate to /dashboard
- Capture screenshot
- Confirm: Header visible? Data loaded? Layout correct?

# Bad: "The component should render correctly"
```
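
A screenshot only counts as evidence if it actually exists and is a real image. A minimal sketch that checks this by reading the PNG signature and the IHDR width and height; it assumes the browser tool saves screenshots as PNG files, and `screenshot_dimensions` is a hypothetical helper. It cannot answer "Header visible? Data loaded?"; that still requires looking at the image.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def screenshot_dimensions(path: Path) -> tuple[int, int]:
    """Return (width, height) of a PNG screenshot; raise if the file is not usable evidence."""
    with path.open("rb") as f:
        # 8-byte signature, then IHDR chunk length (4) and type (4), then width (4) and height (4).
        header = f.read(24)
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE) or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG screenshot")
    width, height = struct.unpack(">II", header[16:24])  # big-endian unsigned ints
    if width == 0 or height == 0:
        raise ValueError(f"{path} has zero-sized dimensions")
    return width, height
```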

Build Verification

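```powershell
# Good: Show build output
npm run build
# Output: Successfully compiled...
# Also confirm the exit code; a clean-looking log alone is not proof
if ($LASTEXITCODE -ne 0) { throw "Build failed (exit code $LASTEXITCODE)" }

# Bad: "Build should work now"
```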

Forbidden Phrases

Never use these as justification for completion:

  • "This should work"
  • "The code looks correct"
  • "I've made similar changes before"
  • "Based on my understanding"
  • "It follows the pattern"
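
These phrases can be caught mechanically before a completion note is accepted. A minimal sketch of such a lint; `forbidden_justifications` is a hypothetical helper, and matching is a case-insensitive substring search that also normalizes typographic apostrophes and line breaks. A note that passes the lint can still lack evidence, so this complements the protocol rather than replacing it.

```python
FORBIDDEN_PHRASES = (
    "this should work",
    "the code looks correct",
    "i've made similar changes before",
    "based on my understanding",
    "it follows the pattern",
)


def forbidden_justifications(note: str) -> list[str]:
    """Return the forbidden phrases that appear in a completion note (case-insensitive)."""
    # Lower-case, turn typographic apostrophes into ASCII, and collapse whitespace/newlines.
    normalized = " ".join(note.lower().replace("\u2019", "'").split())
    return [phrase for phrase in FORBIDDEN_PHRASES if phrase in normalized]
```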

Integration

This skill integrates with:

  • /verify — Primary workflow using this skill
  • /execute — Must validate before marking tasks complete
  • Rule 4 in GEMINI.md — Empirical Validation enforcement

Failure Handling

If verification fails:

  1. Do NOT mark task complete
  2. Document the failure in .agent/state/STATE.md
  3. Create fix task if cause is known
  4. Trigger Context Health Monitor if 3+ failures
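
The failure-handling steps can be sketched as a function that returns the follow-up actions. This is an illustration only: `TaskState` and `handle_failure` are hypothetical, the action strings are placeholders for whatever the agent actually does, and counting failures per task is an assumption, since the rule does not say whether "3+ failures" is per task or per session.

```python
from dataclasses import dataclass, field


@dataclass
class TaskState:
    name: str
    complete: bool = False
    failures: list[str] = field(default_factory=list)


def handle_failure(task: TaskState, reason: str, cause_known: bool, threshold: int = 3) -> list[str]:
    """Apply the failure-handling steps and return the follow-up actions to take."""
    task.complete = False  # step 1: a failed verification never marks the task complete
    task.failures.append(reason)
    actions = [f"record in .agent/state/STATE.md: {task.name}: {reason}"]  # step 2
    if cause_known:
        actions.append(f"create fix task for {task.name}")  # step 3
    # step 4 (assumption: failures are counted per task, not per session)
    if len(task.failures) >= threshold:
        actions.append("trigger Context Health Monitor")
    return actions
```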