Finding Bugs

Iron Law

The agent MUST NOT declare "no issues found" or produce any findings summary without completing ALL six phases for EVERY item in scope. Completing Phase 3 for one function does not excuse skipping Phase 4 for that same function. Finding one bug does not excuse skipping the remaining scope. Each phase produces explicit written output before the next phase begins.

When to Use

The agent activates this skill when any of these conditions hold:

•Post-merge integration review -- the user merged multiple PRs or branches and wants to check for interaction bugs.
•Proactive audit -- the user asks the agent to audit, review, or find bugs in a module, file, or codebase with no known failure.
•Vague suspicion -- the user says "something feels off" or "can you check this area" without a specific error message.

The agent does NOT use this skill when:

•A specific error message or test failure exists (use systematic-debugging).
•The goal is to write or improve tests (use TDD workflow).

Phase Pipeline

The agent executes these six phases in strict order. No phase may be skipped. Each phase produces written output before the next begins. If the scope contains N items, every item passes through every phase.

dot

digraph finding_bugs {
    rankdir=TB;
    node [shape=box, style="rounded,filled", fillcolor="#e8e8e8", fontname="Helvetica"];
    edge [fontname="Helvetica", fontsize=10];

    scope [label="Phase 1\nScope Definition", fillcolor="#c6e4f7"];
    contracts [label="Phase 2\nContract Inventory", fillcolor="#c6e4f7"];
    impact [label="Phase 3\nImpact Tracing &\nSpec Check", fillcolor="#c6e4f7"];
    adversarial [label="Phase 4\nAdversarial Analysis", fillcolor="#f7d6c6"];
    gaps [label="Phase 5\nGap Analysis", fillcolor="#f7d6c6"];
    report [label="Phase 6\nShallow Verification\n& Report", fillcolor="#d4edda"];

    scope -> contracts [label="scope list"];
    contracts -> impact [label="contract table"];
    impact -> adversarial [label="traced issues"];
    adversarial -> gaps [label="adversarial findings"];
    gaps -> report [label="coverage gaps"];

    subgraph cluster_legend {
        label="Legend";
        style=dashed;
        node [shape=plaintext, fillcolor=white];
        l1 [label="Blue = inventory phases"];
        l2 [label="Orange = analysis phases"];
        l3 [label="Green = verification & output"];
    }
}
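
The rule that every item passes through every phase can be checked mechanically before a report is written. The following is a minimal sketch, assuming the agent keeps its written phase output in a nested dict keyed by item and phase; the dict layout and the helper names are hypothetical, and the phase keys simply reuse the node names from the graph above.

```python
# Phase keys mirror the node names in the pipeline graph.
PHASES = ("scope", "contracts", "impact", "adversarial", "gaps", "report")

def missing_phase_outputs(outputs, items):
    """Return (item, phase) pairs that have no written output yet."""
    return [
        (item, phase)
        for item in items
        for phase in PHASES
        # Blank or whitespace-only text does not count as written output.
        if not outputs.get(item, {}).get(phase, "").strip()
    ]

def all_phases_complete(outputs, items):
    """True only when every item has output for every phase."""
    return not missing_phase_outputs(outputs, items)
```

An item that is absent from the dict counts as missing all six phases, so scope items that were never analyzed cannot slip through silently.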

Phase 1: Scope Definition

The agent defines the exact scope before reading any implementation code.

•Identify every changed file, function, and module in scope.
•Map the blast radius: which other code depends on, or is called by, the changed code.
•Write the scope list explicitly. Example: "Scope: PR #76, PR #81, files X, Y, Z, callers A, B, C."

If the user does not provide scope, the agent asks for it. The agent never assumes scope from context alone.
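
Mapping the blast radius amounts to walking a call graph outward from the changed code. The sketch below illustrates one way to do that, assuming the call graph is already available as a plain dict; `CALLS` and every function name in it are invented for the example, and a real project would derive the graph from its own tooling.

```python
from collections import deque

# Hypothetical call graph: each key calls the functions in its list.
CALLS = {
    "handle_request": ["parse_order", "save_order"],
    "parse_order": ["normalize_sku"],
    "save_order": ["normalize_sku", "write_row"],
    "nightly_export": ["write_row"],
}

def blast_radius(changed, calls):
    """Changed functions, their transitive callers, and their direct callees."""
    # Invert the graph so the walk can go from a callee up to its callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, ()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)

    # Code called by the changes is also part of the scope list.
    for fn in changed:
        affected.update(calls.get(fn, ()))
    return sorted(affected)

print(blast_radius(["normalize_sku"], CALLS))
# ['handle_request', 'normalize_sku', 'parse_order', 'save_order']
```

In this toy graph `nightly_export` stays out of scope for a change to `normalize_sku`, because it neither calls nor is called by anything on the affected path.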

Phase 2: Contract Inventory

For EACH function or module in scope, the agent enumerates in writing:

Contract Element	What to Document
Input preconditions	Types, ranges, nullability, required state
Output postconditions	Return values, side effects, state mutations
Invariants	What must be true before and after execution
Implicit assumptions	Data format, ordering, encoding, timing

The agent writes this table before proceeding. Skipping this phase is a leading cause of missed bugs: without an explicit contract list, there is nothing to trace against.
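
To make the four contract elements concrete, the sketch below writes them out for a single function and enforces the checkable ones at runtime. `apply_discount` and its rules are hypothetical and exist only to illustrate the shape of a contract entry.

```python
def apply_discount(price_cents, percent):
    """Hypothetical function used only to illustrate a written contract.

    Input preconditions:    price_cents >= 0 and 0 <= percent <= 100
    Output postconditions:  0 <= result <= price_cents
    Invariants:             no shared state is read or mutated
    Implicit assumptions:   amounts are integer cents, never floats
    """
    if price_cents < 0:
        raise ValueError("price_cents must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be in [0, 100]")

    # Integer arithmetic keeps the result in whole cents (rounds down).
    result = price_cents * (100 - percent) // 100

    # Postcondition check: a discount never raises the price or makes it negative.
    assert 0 <= result <= price_cents
    return result

print(apply_discount(1000, 25))  # 750
```

Each line of the docstring is something Phase 3 can trace against: a caller that passes a float price, for instance, violates the implicit assumption even though no check rejects it.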

Phase 3: Impact Tracing and Spec Check

For each item in scope, the agent performs both impact tracing and specification checking.

Specification conformance: Compare the implementation against its specification (PR description, issue requirements, doc comments, design docs). Does the code do what it claims to do? The agent must read the specification source, not guess at intent.

Impact tracing:

•Do all callers still satisfy preconditions after the change?
•Do all callees still satisfy postconditions?
•Are cross-change interactions safe? (Two independently correct changes can break when combined.)
•FedEx tour: trace one key data entity through its full lifecycle across the changed code.
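
A cross-change interaction is easiest to see in a small case. The sketch below assumes two hypothetical changes that each look correct in isolation: one starts parsing user IDs from text, the other adds a cache keyed by integer ID. All names and data are invented for illustration.

```python
from typing import List, Optional

# Change A (hypothetical): IDs now come from a comma-separated string,
# so they are returned as strings.
def load_user_ids(raw: str) -> List[str]:
    return [part.strip() for part in raw.split(",") if part.strip()]

# Change B (hypothetical): lookups go through a cache keyed by integer ID.
CACHE = {1: "alice", 2: "bob"}

def lookup(user_id) -> Optional[str]:
    return CACHE.get(user_id)

# Combined path: string IDs never match integer keys, so every lookup misses.
def resolve_names(raw: str) -> List[Optional[str]]:
    return [lookup(uid) for uid in load_user_ids(raw)]

print(lookup(1))              # alice
print(resolve_names("1, 2"))  # [None, None]
```

Following one user ID from parsing to lookup, in the manner of the FedEx tour, exposes the mismatch that neither change shows on its own.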

Phase 4: Adversarial Analysis

The agent applies a "prove it breaks" mindset. This phase is MANDATORY even if Phases 2-3 found nothing. Finding nothing earlier means this phase is MORE important, not less.

Technique	What to Probe
Boundary analysis	Zero, null, empty, max values at edges of changed logic
Invalid state transitions	Can callers reach this code in an unexpected order?
Race conditions	If two callers invoke simultaneously, what happens?
Time boundaries	Timezone, leap year, month-end if relevant
Input edge cases	Special characters, whitespace, encoding in changed paths
Defect clustering	Spend more time on modules with prior bug history

The agent writes at least one adversarial scenario per item in scope. If no scenario produces a finding, the agent documents what was tried and why it did not break.
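
Boundary analysis can be written down as a table of named scenarios that are actually executed. The sketch below probes a hypothetical `paginate` function; the function, the `probe` helper, and the scenario names are assumptions made for this example.

```python
def paginate(items, page_size):
    """Hypothetical function under analysis: split items into pages."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]

def probe(fn, *args):
    """Run one adversarial scenario and record the outcome instead of crashing."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("raised", type(exc).__name__)

# One entry per boundary the agent wants to try.
scenarios = {
    "empty input": ([], 3),
    "page_size zero": ([1, 2, 3], 0),
    "page_size negative": ([1, 2, 3], -1),
    "exact multiple": ([1, 2, 3, 4], 2),
    "one past multiple": ([1, 2, 3, 4, 5], 2),
}

results = {name: probe(paginate, *args) for name, args in scenarios.items()}
for name, outcome in results.items():
    print(name, outcome)
```

In this sketch a page size of zero raises `ValueError`, which is at least loud. A negative page size returns an empty list without any error, silently dropping every item, and that is the scenario worth recording as a finding.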

Phase 5: Gap Analysis

The agent identifies:

•Changed code paths with no test coverage.
•Contracts from Phase 2 that have no corresponding test assertions.
•Pesticide paradox: tests that exist but have not evolved with the code (the test still passes but does not exercise the new behavior).

Phase 6: Shallow Verification and Report

For each suspect found in Phases 3-5:

•Trace one concrete code path to confirm it is a real issue.
•If confirmed, add to findings with severity and confidence.
•If not reproducible through reasoning, mark it as "Uncertain."
•Produce the formal report (see Output Format below).

The agent does NOT fix bugs during this phase. Fixing mid-scan causes the agent to forget remaining scope items. Findings are recorded; fixes come later.

Bug-Finding Disciplines

Tier 1: Contract and Invariant Analysis (Primary)

Discipline	Description
Invariant checking	"This variable must always be positive here"
Type narrowing	"Type says X but runtime value could be Y"
Error path tracing	"If this call fails, does the caller handle it?"
Invalid state transitions	Can the system reach a state violating its contract?
Implicit coupling	"Two modules share assumptions about data format"
Specification conformance	Does the implementation match what it claims to do?
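
The type narrowing and implicit coupling rows often show up together. In the sketch below the annotation promises a string, but the runtime value can be `None`, and a second function silently relies on the promise. The environment variable name and both functions are hypothetical.

```python
import os

def cache_dir() -> str:
    # The annotation says str, but os.environ.get returns None
    # when the variable is unset.
    return os.environ.get("HYPOTHETICAL_CACHE_DIR")  # type: ignore[return-value]

def cache_path(name: str) -> str:
    # Implicit assumption: cache_dir() never returns None.
    # With the variable unset, os.path.join raises TypeError here.
    return os.path.join(cache_dir(), name)
```

Tracing the error path answers the Tier 1 question directly: if `cache_dir` yields `None`, nothing between it and `os.path.join` handles that case, so the failure surfaces far from its cause.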

Tier 2: Change Impact Analysis

Discipline	Description
Defect clustering (Pareto)	Focus on modules with frequent changes or prior bugs
Pesticide paradox	Tests unchanged but code changed -- tests may be blind
FedEx tour	Trace a data entity through its lifecycle across changed code
Boundary analysis	Zero, null, empty, max int, off-by-one at changed edges

Tier 3: Concurrency and Timing

Discipline	Description
Race conditions	Two simultaneous callers -- what happens?
State machine analysis	Does new code respect transition ordering?
Time boundaries	Timezone, leap year, month-end logic in changed code
Session contradictions	Does code handle stale or concurrent sessions?
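
The "two simultaneous callers" question can be answered by reasoning through an explicit interleaving. The sketch below is a deterministic simulation rather than real threading: a hypothetical check-then-act `withdraw` is written as a generator so that the pause between its read and its write can be scheduled by hand.

```python
def withdraw(account, amount):
    """Hypothetical check-then-act update with an explicit pause point."""
    current = account["balance"]                # read
    yield                                       # a second caller may run here
    if current >= amount:
        account["balance"] = current - amount   # write based on the earlier read

def run_sequential(account, amounts):
    # Each caller finishes before the next one starts.
    for amount in amounts:
        for _ in withdraw(account, amount):
            pass
    return account["balance"]

def run_interleaved(account, amounts):
    callers = [withdraw(account, amount) for amount in amounts]
    for caller in callers:
        next(caller)        # every caller performs its read first
    for caller in callers:
        for _ in caller:    # then every caller performs its write
            pass
    return account["balance"]

print(run_sequential({"balance": 100}, [30, 30]))   # 40
print(run_interleaved({"balance": 100}, [30, 30]))  # 70
```

Both callers read a balance of 100, so the second write overwrites the first and one withdrawal is lost. Simulating the schedule this way keeps the finding reproducible, which timing-dependent thread tests usually are not.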

Tier 4: Input and Encoding

Discipline	Description
Special characters	Does changed code safely handle emoji, control characters, and injection payloads?
Null and whitespace	Are empty or whitespace inputs handled in new code?
Pairwise interactions	When multiple parameters combine, do unexpected pairs break?
Resource limits	What if memory or storage is exhausted during this path?
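
Null and whitespace handling can be probed with a short list of inputs. The sketch below compares a naive truthiness check with a stricter one; both validators are hypothetical, and the probe list is only a starting point.

```python
def has_username_naive(value):
    """Hypothetical validation as it might appear in changed code."""
    return bool(value)

def has_username_strict(value):
    """Rejects None, empty, and whitespace-only input."""
    return value is not None and value.strip() != ""

# None, empty, space, tab/newline, no-break space, zero-width space, normal.
probes = [None, "", " ", "\t\n", "\u00a0", "\u200b", "bob"]

report = [
    (repr(p), has_username_naive(p), has_username_strict(p))
    for p in probes
]
for row in report:
    print(row)
```

The naive check accepts every non-empty string, including a single space. The stricter check rejects whitespace-only input but still accepts the zero-width space (U+200B), because `str.strip` does not treat it as whitespace. Whether that matters depends on the contract recorded in Phase 2.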

Rationalization Defense

When the agent catches itself thinking any of the following, the agent stops and applies the correction in the right column.

Rationalization	Correction
"The code looks fine"	Did the agent complete all six phases? If not, the agent does not know that.
"I will just fix this quickly"	Stop. Add it to findings and keep scanning. Fixing mid-scan means forgetting the rest.
"This scope is too large"	Break it into chunks. Analyze each chunk through all phases. Never skip a chunk.
"The tests pass so it is probably fine"	Pesticide paradox -- passing tests only prove what they test. Check for gaps.
"This change is trivial"	Trivial changes in high-traffic code paths cause production outages. Check contracts.
"I already know what this code does"	Read it again. Assumptions are where bugs hide.
"The PR was reviewed by humans"	Humans skim. The agent does the systematic trace they did not.
"This is just a refactor"	Refactors change structure. Structure changes break implicit coupling. Check callers.
"I checked the main change, the surrounding code is fine"	Cross-change interactions are a leading source of integration bugs. Trace the blast radius.
"I found a bug, so the review is complete"	Finding one bug does not excuse skipping remaining scope. Complete all phases for all items.

Red Flags

The agent is producing low-quality output if any of these are true:

•Declaring "no issues found" in under 2 minutes for any non-trivial scope.
•Skipping Phase 4 because Phases 2-3 found nothing.
•Reporting only one severity level (all "low" or all "high").
•Never looking at callers or callees of changed code.
•Summarizing a function without reading its implementation.
•Stopping analysis after the first finding.
•Producing free-form narrative instead of the structured report format.
•Listing techniques without executing them ("I would check boundaries" instead of actually checking boundaries).

Output Format

The agent produces exactly this structure. Every section is mandatory. Empty sections are written as "None identified" -- they are never omitted.

code

## Bug Finding Report

**Scope:** [what was analyzed -- PRs, modules, files]
**Branch/Commits:** [git refs]

### Phase Completion Checklist

- [ ] Phase 1: Scope defined (N items)
- [ ] Phase 2: Contracts inventoried (N functions)
- [ ] Phase 3: Impact traced and specs checked
- [ ] Phase 4: Adversarial analysis completed (N scenarios tested)
- [ ] Phase 5: Gap analysis completed
- [ ] Phase 6: Findings verified

### Findings

#### [F1] Severity: HIGH | file.cpp:123
**Category:** Contract violation / Integration issue / Spec mismatch / ...
**What:** One-sentence description of the bug.
**Reasoning:** Step-by-step trace of how the agent arrived at this conclusion.
**Confidence:** Confirmed / Likely / Uncertain

#### [F2] ...

### Coverage Gaps
- [G1] function_name() -- changed but no test covers the new path

### Spec Deviations
- [S1] PR #N says "X" but implementation does "Y"

### Summary
- X findings (H high, M medium, L low)
- Y coverage gaps
- Z spec deviations

Severity levels:

•HIGH: Confirmed bug, crash, or data corruption.
•MEDIUM: Likely issue, needs investigation to confirm.
•LOW: Code smell, edge case, or minor inconsistency.

Confidence levels:

•Confirmed: Traced a concrete code path demonstrating the issue.
•Likely: Strong reasoning supports the issue, but no single concrete path was traced.
•Uncertain: Suspicious but could not verify through reasoning alone.
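
The Summary section of the report can be derived from the findings rather than counted by hand. The sketch below assumes findings are held in a small record type; the `Finding` class and `summary_lines` helper are hypothetical, while the severity and confidence values are the ones defined above.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Finding:
    ident: str        # e.g. "F1"
    severity: str     # "HIGH", "MEDIUM", or "LOW"
    location: str     # e.g. "file.cpp:123"
    confidence: str   # "Confirmed", "Likely", or "Uncertain"

def summary_lines(findings, gaps, deviations):
    """Build the three Summary bullets from the collected results."""
    counts = Counter(f.severity for f in findings)  # missing keys count as 0
    return [
        f"- {len(findings)} findings "
        f"({counts['HIGH']} high, {counts['MEDIUM']} medium, {counts['LOW']} low)",
        f"- {len(gaps)} coverage gaps",
        f"- {len(deviations)} spec deviations",
    ]

print("\n".join(summary_lines([], [], [])))
```

Generating the tallies this way keeps the Summary consistent with the Findings section, and a tally showing only one severity level makes the corresponding red flag easy to spot.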

Quick Reference

Phase	Input	Output	Key Question
1. Scope	User request	Scope list with blast radius	What exactly are we analyzing?
2. Contracts	Scope list	Contract table per function	What must be true for this code?
3. Impact + Spec	Contracts	Traced issues, spec deviations	Does reality match the contract and spec?
4. Adversarial	Traced issues	Adversarial findings	How can I make this break?
5. Gaps	All prior phases	Coverage gaps list	What is NOT tested?
6. Report	All findings	Formal ranked report	Is each finding real or theoretical?