Systematic Debugging Skill

Use when debugging errors, investigating failures, or troubleshooting data collection issues.

Based on obra/superpowers systematic-debugging (MIT License).

Core Principle

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.

Symptom-level patches create technical debt and mask underlying problems.

Four Mandatory Phases

Phase 1: Root Cause Investigation

•
Read error messages thoroughly
- •Full stack traces, not just the last line
- •Error codes and their documentation
- •Timestamps and request context
•
Reproduce consistently
- •Document exact reproduction steps
- •Identify conditions (specific app IDs, languages, etc.)
- •Check if issue is intermittent or consistent
•
Check recent changes
- •Git log for recent commits
- •Configuration changes
- •External API changes (rate limits, response format)

•

Gather evidence at component boundaries

python

# Add diagnostic logging at boundaries
logging.debug(f"API request: {url}, headers: {headers}")
logging.debug(f"API response: status={resp.status_code}, body={resp.text[:500]}")

•
Trace data flow backward
- •Start from the error location
- •Follow the call stack up
- •Identify where data becomes invalid

Phase 2: Pattern Analysis

•
Find working examples
- •Compare failing vs successful requests
- •Identify what's different (app ID, locale, timing)
•
Compare against references
- •Check API documentation
- •Compare with known-good implementations
- •Review similar code in the codebase
•
Identify all differences
- •Headers, parameters, encoding
- •Timing, ordering, state
•
Understand dependencies
- •External services state
- •Database state
- •Network conditions

Phase 3: Hypothesis Testing

•
Form a single clear hypothesis
- •"The error occurs because X"
- •Be specific, not vague
•
Test with minimal changes
- •Change ONE variable at a time
- •Log before and after
•
Verify results
- •Did the change fix the issue?
- •Did it introduce new issues?
- •Is it reproducible?

Phase 4: Implementation

•

Create a failing test case first

python

def test_handles_malformed_date():
    # Reproduces the bug
    result = parse_date("Invalid Date Format")
    assert result is None  # Should not crash

•
Implement single fix addressing root cause
- •Not a workaround
- •Not multiple changes at once
•
Verify the fix
- •Original test passes
- •No regression in other tests
- •Manual verification if needed

Critical Guardrails

Red Flags - Stop and Restart

•Proposing solutions before investigation
•Attempting multiple simultaneous fixes
•Making changes without understanding why
•Ignoring error messages or logs

Three-Fix Rule

If three or more fix attempts fail:

STOP. This is NOT a failed hypothesis - this is wrong architecture.

•Step back and reassess the problem
•Consider if the approach is fundamentally flawed
•Discuss with team before continuing

Debugging Data Collection Issues

Common Root Causes

•
Rate limiting
- •Check response headers for rate limit info
- •Review request timing and patterns
•
Response format changes
- •API may have changed without notice
- •Compare current vs expected response structure
•
Network issues
- •Timeouts, connection resets
- •DNS resolution failures
- •Check network binding configuration
•
Data parsing errors
- •Unexpected null values
- •Encoding issues (UTF-8, special characters)
- •Date format variations by locale
•
State management
- •Database connection pool exhaustion
- •Cursor position issues
- •Transaction isolation problems

Diagnostic Checklist

python

# 1. Log the full request
logging.debug(f"Request: {method} {url}")
logging.debug(f"Headers: {headers}")
logging.debug(f"Body: {body}")

# 2. Log the full response
logging.debug(f"Response status: {resp.status_code}")
logging.debug(f"Response headers: {resp.headers}")
logging.debug(f"Response body: {resp.text[:1000]}")

# 3. Check database state
logging.debug(f"DB pool: active={pool.get_stats()}")

# 4. Track timing
start = time.time()
# ... operation ...
logging.debug(f"Operation took {time.time() - start:.2f}s")

Time Investment

Systematic debugging: 15-30 minutes, 95% first-time success Guess-and-check: 2-3 hours, 40% success rate

The systematic approach is faster even under time pressure.