Systematic Debugging Skill
Overview
This skill provides a structured four-phase debugging framework emphasizing root cause discovery before attempting fixes. Core principle: "Random fixes waste time and create new bugs. Quick patches mask underlying issues."
Quick Start
- Investigate: gather evidence, reproduce consistently
- Analyze: compare with working patterns
- Hypothesize: form and test specific theories
- Implement: fix with test coverage
When to Use
- Bug reports requiring investigation
- Test failures with unclear causes
- Production incidents
- Performance regressions
- Integration failures
- Any debugging that requires more than 5 minutes
The Four Phases
Phase 1: Root Cause Investigation
Objective: Understand the problem completely before attempting any fix.
Steps:
- Examine error messages thoroughly
- Reproduce the issue consistently
- Review recent changes (commits, configs, dependencies)
- Gather diagnostic evidence (logs, traces, metrics)
- For multi-component systems, add instrumentation at each boundary
Questions to answer:
- What exactly is failing?
- When did it start failing?
- What changed recently?
- Can I reproduce it reliably?
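
One way to answer "Can I reproduce it reliably?" with a number rather than an impression is to run the failing path repeatedly and record how often it fails. The sketch below is illustrative only: `reproduction_rate` and `flaky_operation` are hypothetical names, and `flaky_operation` is a stand-in for whatever code path is under investigation.

```python
import random
from typing import Callable


def reproduction_rate(attempt: Callable[[], None], runs: int = 20) -> float:
    """Run `attempt` repeatedly and return the fraction of runs that raised."""
    failures = 0
    for _ in range(runs):
        try:
            attempt()
        except Exception:
            failures += 1
    return failures / runs


def flaky_operation(rng: random.Random) -> None:
    # Hypothetical stand-in for the failing code path under investigation.
    if rng.random() < 0.3:
        raise RuntimeError("simulated intermittent failure")


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the experiment itself is repeatable
    rate = reproduction_rate(lambda: flaky_operation(rng), runs=100)
    print(f"failure reproduced in {rate:.0%} of runs")
```

A rate well below 100% suggests the reproduction steps are still missing a condition (timing, data, environment) that is worth pinning down before moving on.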
Phase 2: Pattern Analysis
Objective: Find working examples and understand differences.
Steps:
- Locate working examples in the codebase
- Compare against reference implementations in full, not by skimming
- Identify differences systematically
- Understand all dependencies
Key comparisons:
- Working vs. broken code paths
- Expected vs. actual behavior
- Known good state vs. current state
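
The "known good state vs. current state" comparison can be made mechanical when both states are available as key/value data (configuration, environment variables, dependency versions). A minimal sketch, assuming both states fit in flat dictionaries with string keys; `diff_states` and the sample values are hypothetical:

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # marker for a key present on only one side


def diff_states(
    known_good: Dict[str, Any], current: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (known_good_value, current_value)} for every differing key."""
    differences: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(known_good) | set(current)):
        good_value = known_good.get(key, MISSING)
        current_value = current.get(key, MISSING)
        if good_value != current_value:
            differences[key] = (good_value, current_value)
    return differences


if __name__ == "__main__":
    # Illustrative values only.
    known_good = {"timeout_s": 30, "retries": 3, "tls": True}
    current = {"timeout_s": 5, "retries": 3, "cache": "on"}
    for key, (good, now) in diff_states(known_good, current).items():
        print(f"{key}: known good={good!r}, current={now!r}")
```

Listing every difference, including the ones that look irrelevant, keeps the comparison systematic instead of stopping at the first plausible suspect.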
Phase 3: Hypothesis and Testing
Objective: Form and validate theories before changing code.
Steps:
- Formulate a specific hypothesis
- Design a test for the hypothesis
- Test with minimal changes (one variable at a time)
- Verify results before proceeding
Hypothesis format: "The bug occurs because [condition] when [trigger], which causes [symptom]."
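
To keep hypotheses in this format and make them easy to record, the three slots can be captured in a small structure. This is only a sketch; the `Hypothesis` class and the example values are hypothetical and not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    condition: str  # what is wrong
    trigger: str    # when it goes wrong
    symptom: str    # what the user or test observes

    def statement(self) -> str:
        """Render the hypothesis in the format given above."""
        return (
            f"The bug occurs because {self.condition} when {self.trigger}, "
            f"which causes {self.symptom}."
        )


if __name__ == "__main__":
    # Illustrative example, not taken from a real incident.
    h = Hypothesis(
        condition="the cache key omits the tenant ID",
        trigger="two tenants request the same report",
        symptom="one tenant seeing the other's data",
    )
    print(h.statement())
```

Forcing all three fields to be filled in makes it harder to start testing with a vague theory such as "something is wrong with caching".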
Phase 4: Implementation
Objective: Fix the root cause with proper verification.
Steps:
- Create a failing test case reproducing the bug
- Implement a single fix addressing the root cause
- Verify the test passes
- Verify no other tests broke
- Document the fix
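
As an illustration of these steps, the sketch below uses a made-up off-by-one bug. `page_count` is a hypothetical function; the assumed original implementation used plain floor division and dropped the final partial page. The first test is the one that would be written before the fix and seen to fail; the second guards behavior that already worked.

```python
import unittest


def page_count(total_items: int, page_size: int) -> int:
    """Number of pages needed to show `total_items` items (hypothetical example)."""
    # Assumed buggy version: total_items // page_size, which dropped the
    # final partial page. The single fix is to use ceiling division.
    return -(-total_items // page_size)


class PageCountRegressionTest(unittest.TestCase):
    def test_partial_last_page_is_counted(self):
        # Reproduces the bug: 101 items at 50 per page need 3 pages, not 2.
        self.assertEqual(page_count(101, 50), 3)

    def test_exact_multiple_is_unchanged(self):
        # Guards behavior that worked before the fix.
        self.assertEqual(page_count(100, 50), 2)
```

Saved as a test module, this can be run with `python -m unittest`; running the full suite afterwards covers the "no other tests broke" step.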
Critical Safeguards
Hard Stop Rule
If three or more fixes fail: STOP and question the architecture.
Repeated failed fixes usually point to a deeper structural problem that needs discussion, not more symptom-patching.
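
The rule can also be enforced mechanically, for example in a debugging script or agent loop. The following is a minimal sketch under that assumption; `FixAttemptLog` and `ArchitectureReviewRequired` are hypothetical names.

```python
from typing import List


class ArchitectureReviewRequired(RuntimeError):
    """Raised when the failed-fix limit is reached."""


class FixAttemptLog:
    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.failed: List[str] = []

    def record_failure(self, description: str) -> None:
        """Record a failed fix; stop the process once the limit is hit."""
        self.failed.append(description)
        if len(self.failed) >= self.limit:
            raise ArchitectureReviewRequired(
                f"{len(self.failed)} fixes failed: {self.failed}. "
                "Stop patching and review the design."
            )
```

Raising an exception rather than logging a warning makes the stop hard to ignore, which is the point of the rule.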
Red Flags (Restart Process)
- Proposing solutions before investigation
- Attempting multiple simultaneous fixes
- Assuming without verification
- Skipping the reproduction step
- "It should work" without evidence
Debugging Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Shotgun debugging | Random changes hoping something works | Systematic investigation |
| Printf debugging only | Incomplete picture | Structured instrumentation |
| Blame the framework | Avoids understanding | Verify framework behavior |
| "Works on my machine" | Environment assumptions | Document exact repro steps |
| Quick patch | Hides root cause | Find and fix actual cause |
Instrumentation Strategies
Logging Strategy
1. Entry/exit of suspected functions
2. Input/output values at boundaries
3. State changes at key points
4. Timing information for performance issues
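
A decorator is one common way to get entry/exit, input/output, and timing logs without editing the body of a suspected function. The sketch below uses only the standard library; `traced` and `apply_discount` are hypothetical names, and state changes inside the function would still need their own log calls.

```python
import functools
import logging
import time

logger = logging.getLogger("debug.trace")


def traced(func):
    """Log entry, arguments, result, and elapsed time for each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.exception("error in %s after %.3f ms", func.__name__, elapsed_ms)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.debug(
            "exit %s result=%r elapsed=%.3f ms", func.__name__, result, elapsed_ms
        )
        return result

    return wrapper


@traced
def apply_discount(price: float, percent: float) -> float:
    # Hypothetical function under suspicion.
    return price * (1 - percent / 100)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    apply_discount(200.0, 25)
```

Because the decorator re-raises exceptions after logging them, adding it does not change the behavior being investigated.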
Boundary Tracing
For multi-component systems:
    [Input] -> [Component A] -> [Component B] -> [Output]
       ^             ^                ^              ^
       |             |                |              |
    Check 1       Check 2          Check 3        Check 4

Add verification at each boundary to isolate the failure point.
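
The idea can be sketched as a pipeline runner that validates the data before the first component and after every component, and reports the first boundary where a check fails. This is a simplified model (one check per handoff), and every name in it, including the deliberately buggy `component_b`, is hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[Any], Any]]
Check = Callable[[Any], bool]


def first_failing_boundary(
    value: Any, stages: List[Stage], checks: List[Check]
) -> Optional[str]:
    """checks[0] validates the input; checks[i + 1] validates the output of
    stages[i]. Return a label for the first failing boundary, or None."""
    if len(checks) != len(stages) + 1:
        raise ValueError("need one check for the input plus one per stage")
    if not checks[0](value):
        return "input"
    for (name, stage), check in zip(stages, checks[1:]):
        value = stage(value)
        if not check(value):
            return f"after {name}"
    return None


# Illustrative pipeline: component_b has a planted bug that drops zeros.
STAGES: List[Stage] = [
    ("component_a", lambda text: text.strip().split(",")),
    ("component_b", lambda parts: [int(p) for p in parts if p != "0"]),
]
CHECKS: List[Check] = [
    lambda text: isinstance(text, str),
    lambda parts: len(parts) == 3,
    lambda numbers: len(numbers) == 3,
]

if __name__ == "__main__":
    print(first_failing_boundary("1,0,2", STAGES, CHECKS))
```

Once the failing boundary is known, the investigation can focus on the single component between the last passing check and the first failing one.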
Best Practices
Do
- Reproduce before investigating
- Document investigation steps
- Test one hypothesis at a time
- Write a regression test for every bug fix
- Share findings with the team
- Update documentation when the cause is environment-related
Don't
- Jump to conclusions
- Make multiple changes at once
- Fix symptoms instead of causes
- Skip the hypothesis step
- Merge fixes without tests
- Ignore intermittent failures
Error Handling
| Situation | Action |
|---|---|
| Cannot reproduce | Gather more context, check environment differences |
| Multiple potential causes | Isolate and test each separately |
| Fix breaks other things | Revert, investigate dependencies |
| Root cause unclear after investigation | Escalate, add more instrumentation |
Metrics
| Metric | Target | Description |
|---|---|---|
| First-fix success rate | >80% | Fixes that resolve issue first time |
| Regression rate | <5% | Bug fixes causing new bugs |
| Investigation time ratio | >60% | Share of debugging time spent investigating rather than coding |
| Documentation rate | 100% | Bugs documented with root cause |
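
These metrics can be computed from simple per-bug records. The sketch below assumes one record per fixed bug and interprets the investigation time ratio as investigation time divided by total (investigation plus coding) time; `BugRecord`, its fields, and `debugging_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BugRecord:
    fix_attempts: int            # 1 means the first fix resolved the issue
    caused_regression: bool      # the fix introduced a new bug
    investigation_minutes: float
    coding_minutes: float
    root_cause_documented: bool


def debugging_metrics(records: List[BugRecord]) -> Dict[str, float]:
    """Compute the four metrics from the table as fractions in [0, 1]."""
    if not records:
        raise ValueError("no bug records to summarize")
    total = len(records)
    investigation = sum(r.investigation_minutes for r in records)
    coding = sum(r.coding_minutes for r in records)
    spent = investigation + coding
    return {
        "first_fix_success_rate": sum(r.fix_attempts == 1 for r in records) / total,
        "regression_rate": sum(r.caused_regression for r in records) / total,
        "investigation_time_ratio": investigation / spent if spent else 0.0,
        "documentation_rate": sum(r.root_cause_documented for r in records) / total,
    }
```

Comparing the returned fractions against the targets in the table (for example, `first_fix_success_rate > 0.8`) gives a quick health check on the debugging process.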
Debugging Checklist
- [ ] Issue reproduced consistently
- [ ] Recent changes reviewed
- [ ] Error messages fully understood
- [ ] Working comparison found
- [ ] Hypothesis documented
- [ ] Single-variable test performed
- [ ] Root cause identified
- [ ] Failing test written
- [ ] Fix implemented
- [ ] All tests pass
- [ ] Fix documented
Related Skills
- tdd-obra: Test-first development
- writing-plans: Plan implementations
- code-reviewer: Code quality review
Version History
- 1.0.0 (2026-01-19): Initial release adapted from obra/superpowers