verifying-correctness

在宣称代码“已完成”之前、实施之后、提交之前使用。该技能可检查需求覆盖率、并发控制、错误处理、资源管理、边界条件以及安全性。

SKILL.md

--- frontmatter

name: verifying-correctness
description: Use before claiming code is "done", after implementation, before commit. Checks requirements coverage, concurrency, error handling, resources, boundaries, and security.

Verifying Correctness

Design quality ≠ correctness. Well-designed code can still have bugs, missing requirements, or safety issues.

When to Use

•Before claiming implementation is "done"
•Before committing new code
•After strategic-coding design work
•When reviewing your own code

Dimension Detection & Checks

For each dimension: detect if it applies, then verify.

1. Requirements Coverage

Detect: Were requirements stated? (explicit list, user request, spec)

If YES, verify:

• List each requirement explicitly
• For each: point to code that implements it
• Any requirement without code? → Not done
• Any code without requirement? → Scope creep or missing requirement

Red flag: "I think I covered everything" without explicit mapping

2. Concurrency Safety

Detect: Any of these present?

•Multiple threads/processes accessing same data
•Async/await patterns
•Shared mutable state (class attributes, globals)
•"Thread-safe" in requirements or docstring
•Web handlers, queue workers, background tasks

If YES, verify:

• All shared mutable state identified
• Each access point protected (lock, atomic, queue, immutable)
• No time-of-check to time-of-use (TOCTOU) gaps
• Lock ordering consistent (if multiple locks)

Red flag: "It's probably fine" or "Python GIL handles it"

3. Error Handling

Detect: Can any operation fail?

•I/O (file, network, database)
•External calls (APIs, subprocesses)
•Resource acquisition (memory, connections)
•User input processing
•Parsing/deserialization

If YES, verify:

• Each failure point has explicit handling OR propagates
• No bare except: or except Exception: pass
• Error messages actionable (what failed, why, how to fix)
• Partial failures handled (rollback, cleanup, consistent state)

Red flag: "Errors are rare" or "caller handles it" without checking caller

4. Resource Management

Detect: Does code acquire resources?

•File handles, sockets, connections
•Locks, semaphores
•Memory allocations (large buffers, caches)
•External service handles
•Background threads/processes

If YES, verify:

• Every acquire has corresponding release
• Release happens in finally/context manager/destructor
• Release happens on error paths too
• No resource leaks on repeated calls
• Bounded growth (caches have limits, queues have limits)

Red flag: "It cleans up eventually" or daemon threads without shutdown

5. Boundary Conditions

Detect: Does code handle variable-size input?

•Collections (lists, dicts, sets)
•Strings, byte arrays
•Numeric ranges
•Optional/nullable values

If YES, verify:

• Empty input: What happens with [], "", None, 0?
• Single item: Edge case often different from N items
• Maximum size: What if input is huge? Memory? Time?
• Invalid values: Negative numbers, NaN, special characters?
• Type boundaries: int overflow, float precision?

Red flag: "Nobody would pass that" or "that's an edge case"

6. Security (if applicable)

Detect: Does code handle untrusted input?

•User-provided data (forms, API requests)
•File contents from external sources
•URLs, paths, identifiers from users
•Data that becomes SQL, shell, HTML, or code

If YES, verify:

• Input validated before use
• No string concatenation for SQL/shell/HTML (use parameterized)
• Path traversal prevented (no ../ exploitation)
• Secrets not logged or exposed in errors
• Auth/authz checked before action, not after

Red flag: "It's internal only" (internals get exposed)

Quick Checklist (Minimum)

Before "done", answer YES to all that apply:

Dimension	Detection Trigger	Verified?
Requirements	Requirements were stated	[ ] Each mapped to code
Concurrency	Shared state exists	[ ] All access protected
Errors	Operations can fail	[ ] All failures handled
Resources	Resources acquired	[ ] All released (incl. errors)
Boundaries	Variable-size input	[ ] Edge cases handled
Security	Untrusted input	[ ] Input validated

Anti-Rationalization

Thought	Reality
"Design is good, so it works"	Design ≠ correctness. Check anyway.
"It's simple code"	Simple code has bugs too. Check anyway.
"I'll add error handling later"	Later = never. Check now.
"Edge cases are rare"	Edge cases cause production incidents.
"It's not user-facing"	Internal code gets exposed. Check anyway.
"Tests will catch it"	Tests check what you wrote, not what you missed.

Output Format

When verifying, output:

code

## Correctness Verification

### Requirements: [PASS/FAIL/N/A]
- Requirement 1 → implemented in X
- Requirement 2 → implemented in Y

### Concurrency: [PASS/FAIL/N/A]
- Shared state: [list]
- Protection: [how]

### Errors: [PASS/FAIL/N/A]
- Failure points: [list]
- Handling: [approach]

### Resources: [PASS/FAIL/N/A]
### Boundaries: [PASS/FAIL/N/A]
### Security: [PASS/FAIL/N/A]

**Verdict:** [DONE / NOT DONE - list blockers]

Relationship to Other Skills

•strategic-coding: Design quality → use FIRST
•verifying-correctness: Actual correctness → use BEFORE "done"
•reviewing-module-design: Review others' code → use for PRs

Order: Design (strategic) → Implement → Verify (this skill) → Done