Systematic Debugging

The Iron Law

code

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST

If you haven't completed Phase 1, you cannot propose fixes.

Random fixes waste time and create new bugs. Quick patches mask underlying issues.

The Four Phases

Complete each phase before proceeding to the next.

Phase 1: Root Cause Investigation

BEFORE attempting ANY fix:

•
Read error messages carefully — Don't skip past them. Read stack traces completely. Note line numbers, file paths, error codes.
•
Reproduce consistently — Can you trigger it reliably? What are the exact steps? If not reproducible → gather more data, don't guess.
•
Check recent changes — Git diff, recent commits, new dependencies, config changes, environmental differences.
•
Gather evidence in multi-component systems — Add diagnostic logging at each component boundary. Run once to see WHERE it breaks. Then investigate that specific component.
•
Trace data flow — Where does the bad value originate? What called this with bad input? Keep tracing up until you find the source. Fix at source, not at symptom.

Phase 2: Pattern Analysis

•Find working examples — Locate similar working code in the same codebase
•Compare against references — Read reference implementation COMPLETELY. Don't skim.
•Identify differences — What's different between working and broken? List every difference.
•Understand dependencies — What settings, config, environment does this need?

Phase 3: Hypothesis and Testing

•Form single hypothesis — State clearly: "I think X is the root cause because Y"
•Test minimally — Smallest possible change to test hypothesis. One variable at a time.
•Verify — Did it work? Yes → Phase 4. No → NEW hypothesis. DON'T add more fixes on top.
•If you don't know — Say "I don't understand X." Don't pretend.

Phase 4: Implementation

•Create failing test case — Use TDD skill. Simplest reproduction.
•Implement single fix — Address root cause. ONE change. No "while I'm here" improvements.
•Verify fix — Test passes? No other tests broken? Issue resolved?
•If fix doesn't work — Count attempts. If < 3: return to Phase 1. If ≥ 3: STOP and question the architecture. Discuss with human before more attempts.

Track Progress

Use the todo tool to track debugging phases:

code

todo create "Debugging: [issue description]"
todo batch items: [
  { text: "Phase 1: Root cause investigation", group: "Investigation" },
  { text: "Phase 2: Pattern analysis", group: "Investigation" },
  { text: "Phase 3: Form and test hypothesis", group: "Fix" },
  { text: "Phase 4: Implement and verify fix", group: "Fix" }
]

Red Flags — STOP and Return to Phase 1

•"Quick fix for now, investigate later"
•"Just try changing X and see"
•"Add multiple changes, run tests"
•"I don't fully understand but this might work"
•Proposing solutions before tracing data flow
•"One more fix attempt" (when already tried 2+)
•Each fix reveals new problem in different place

Common Rationalizations

Excuse	Reality
"Issue is simple"	Simple issues have root causes too.
"Emergency, no time"	Systematic is FASTER than guess-and-check.
"Just try this first"	First fix sets the pattern. Do it right.
"I see the problem"	Seeing symptoms ≠ understanding root cause.
"One more fix" (after 2+)	3+ failures = architectural problem.

Quick Reference

Phase	Key Activities	Done When
1. Root Cause	Read errors, reproduce, check changes, trace data	Understand WHAT and WHY
2. Pattern	Find working examples, compare	Identified differences
3. Hypothesis	Form theory, test minimally	Confirmed or new hypothesis
4. Implementation	Create test, fix, verify	Bug resolved, tests pass