Systematic Debugging Workflow
I'll help you debug issues systematically using the scientific method - hypothesis formation, testing, and iterative refinement.
Arguments: $ARGUMENTS - error description, reproduction steps, or context
Token Optimization
Target: 50% reduction (4,000-6,000 → 1,500-3,000 tokens)
Core Optimization Strategies
1. Hypothesis-Driven Debugging (Not Exhaustive Analysis)
- •❌ AVOID: Reading entire codebase to find bugs
- •✅ DO: Form hypotheses about likely causes, test top 2-3 first
- •Token savings: 90% (200 tokens vs 2,000+ tokens)
- •Pattern: Prioritize recently changed files, common failure patterns
2. Git Diff for Recently Changed Files (Likely Bug Source)
- •❌ AVOID: ls -R, then reading all files
- •✅ DO: git diff --name-only HEAD~3..HEAD to find changed files
- •✅ DO: git log --oneline --since="3 days ago" for recent commits
- •Token savings: 85% (300 tokens vs 2,000+ tokens)
- •Pattern: Bugs often introduced in recent changes
3. Stack Trace Parsing with Grep
- •❌ AVOID: Reading entire log files with Read tool
- •✅ DO: grep -i "error\|exception\|fatal" logs/*.log | tail -20
- •✅ DO: Parse stack traces to extract file paths and line numbers
- •Token savings: 95% (100 tokens vs 2,000+ tokens for large logs)
- •Pattern: Stack traces reveal exact failure locations
4. Test Failure Analysis Caching
- •✅ Cache test results in debug/state.json
- •✅ Cache hypothesis outcomes to avoid retesting
- •✅ Cache reproduction steps once confirmed
- •Token savings: 70% on subsequent debugging turns
- •Pattern: Multi-turn debugging sessions benefit from state
5. Progressive Investigation (Narrow Before Deep)
- •✅ Start with stack trace → identify file → read specific function
- •✅ Hypothesis testing: test most likely causes first
- •✅ Binary search through git history when needed
- •Token savings: 60% (stop early when cause found)
- •Pattern: Most bugs have obvious causes in changed code
6. Session State Tracking for Multi-Turn Debugging
- •✅ Session files in debug/ directory
- •✅ Track tested hypotheses to avoid repetition
- •✅ Resume from last checkpoint on subsequent runs
- •Token savings: 80% on resumed sessions (skip completed work)
- •Pattern: Complex bugs require multiple debugging turns
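The hypothesis tracking described in items 4 and 6 could be sketched roughly as follows. The file name `hypotheses.tsv`, its tab-separated layout, and the three function names are assumptions made for illustration; the document only states that outcomes are kept under `debug/`.

```bash
#!/bin/bash
# Sketch: remember which hypotheses were already tested so a resumed
# session can skip them. File name and format are illustrative assumptions.

# record_hypothesis <number> <outcome> <description>
# Appends one tab-separated line: number, outcome, description.
record_hypothesis() {
    local dir="${DEBUG_DIR:-debug}"
    mkdir -p "$dir"
    printf '%s\t%s\t%s\n' "$1" "$2" "$3" >> "$dir/hypotheses.tsv"
}

# already_tested <number>
# Succeeds (exit 0) if the hypothesis has a recorded outcome.
already_tested() {
    local log="${DEBUG_DIR:-debug}/hypotheses.tsv"
    [ -f "$log" ] && awk -F'\t' -v n="$1" '$1 == n { found = 1 } END { exit !found }' "$log"
}

# next_untested <total>
# Prints the lowest hypothesis number in 1..total with no recorded outcome.
# Fails (exit 1) when every hypothesis has been tested.
next_untested() {
    local n
    for ((n = 1; n <= $1; n++)); do
        if ! already_tested "$n"; then
            echo "$n"
            return 0
        fi
    done
    return 1
}
```

A resumed session could call `next_untested 5` to pick up where it stopped, and call `record_hypothesis 2 confirmed "Race condition in cache"` after each test.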
Token Usage by Operation
| Operation | Unoptimized | Optimized | Savings |
|---|---|---|---|
| Initial bug analysis | 2,000-3,000 | 500-1,000 | 67-75% |
| Hypothesis formation | 1,500-2,000 | 400-800 | 60-73% |
| Stack trace parsing | 2,000+ | 100-200 | 90-95% |
| File investigation | 2,000+ | 300-600 | 70-85% |
| Test reproduction | 1,000-1,500 | 200-400 | 73-80% |
| Session resume | 2,000-3,000 | 300-600 | 80-85% |
Average Reduction: 50% (4,000-6,000 → 1,500-3,000 tokens)
Debugging-Specific Patterns
Stack Trace Analysis:
# Extract file paths and line numbers from stack traces
grep -E "at .+ \(.+:[0-9]+:[0-9]+\)" error.log | head -10
# Focus investigation on these specific files/lines
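To turn matching trace lines into a short list of places to read, something like the following sketch may help. It assumes Node.js-style frames of the form `at fn (path/to/file.js:12:34)`; the function name `extract_frames` and the choice to skip `node_modules` frames are illustrative assumptions.

```bash
#!/bin/bash
# Sketch: reduce a Node.js-style stack trace to unique "file:line" pairs.
# Assumes frames look like "at fn (path/to/file.js:12:34)".

# extract_frames [log_file...]
# Reads the given files (or stdin) and prints each application frame once,
# in order of first appearance.
extract_frames() {
    grep -ohE '\([^()]+:[0-9]+:[0-9]+\)' "$@" \
        | sed -E 's/^\((.+):([0-9]+):[0-9]+\)$/\1:\2/' \
        | grep -v 'node_modules' \
        | awk '!seen[$0]++'
}
```

Usage: `extract_frames error.log | head -5` lists the first few application frames, which are usually the only locations worth opening.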
Recent Changes Focus:
# Find files changed in the last 10 commits (likely bug sources)
git diff --name-only HEAD~10..HEAD
# Only read files that changed recently
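When many files changed, ranking them by how often they were touched can help decide what to read first. This is a sketch under assumptions: the function names are hypothetical, and treating frequently changed files as more suspicious is a heuristic rather than something the workflow prescribes.

```bash
#!/bin/bash
# Sketch: rank recently changed files by how many commits touched them.

# rank_by_frequency
# Reads file names from stdin (one per line, repeats allowed) and prints
# each name once, most frequent first.
rank_by_frequency() {
    grep -v '^$' | sort | uniq -c | sort -rn | sed -E 's/^ *[0-9]+ //'
}

# recently_changed_files [since]
# Lists files touched by commits in the given window, most-touched first.
recently_changed_files() {
    git log --since="${1:-3 days ago}" --name-only --pretty=format: | rank_by_frequency
}
```

Usage: `recently_changed_files "1 week ago" | head -5` gives a short reading list without opening any file.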
Hypothesis Prioritization:
- •Recent changes (80% of bugs) - Check git diff first
- •Stack trace files (90% reliability) - Read exact failure locations
- •Error message patterns (70% of bugs) - Grep for similar errors
- •Environment/config (20% of bugs) - Check if configs changed
- •External dependencies (10% of bugs) - Check updates
Binary Search for Regressions:
# Use git bisect to find regression commit
git bisect start HEAD v1.2.3
git bisect run npm test # Automated testing
# Saves 95% tokens vs manual testing each commit
Caching Behavior
Session Location: debug/ (in project root)
- •debug/plan.md - Debugging plan with hypotheses and results
- •debug/state.json - Session state and test results
- •debug/reproduction.log - Issue reproduction steps and logs
Cache Location: .claude/cache/debug/
- •hypotheses.json - Tested hypotheses and outcomes
- •stack-traces.json - Parsed stack trace information
- •changed-files.json - Recently changed files analysis
Cache Validity:
- •Until issue resolved (status: "solved" in state.json)
- •Until source files change (checksum-based)
- •7 days maximum for stale sessions
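The checksum and age rules above could be checked along these lines. This is a minimal sketch: the stamp-file approach and the function names are assumptions, and `cksum` is used only because it is available on most systems, not because the workflow specifies it.

```bash
#!/bin/bash
# Sketch: decide whether cached debugging results can be reused.
# The stamp-file layout is an illustrative assumption.

# source_checksum <file>...
# Prints one checksum covering the content of all given files.
source_checksum() {
    cat "$@" | cksum | awk '{print $1}'
}

# save_cache_stamp <stamp_file> <file>...
# Stores the current checksum of the given files in the stamp file.
save_cache_stamp() {
    local stamp="$1"; shift
    source_checksum "$@" > "$stamp"
}

# cache_is_valid <stamp_file> <file>...
# Succeeds only if the stamp exists, is not older than about 7 days,
# and the files still hash to the stored checksum.
cache_is_valid() {
    local stamp="$1"; shift
    [ -f "$stamp" ] || return 1
    # find prints the stamp only if it is more than 7 days old
    # (find rounds file age down to whole days)
    [ -z "$(find "$stamp" -mtime +7)" ] || return 1
    [ "$(cat "$stamp")" = "$(source_checksum "$@")" ]
}
```

Usage: call `save_cache_stamp debug/.stamp src/users.js` after analysis, then `cache_is_valid debug/.stamp src/users.js` before reusing cached results.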
Shared With:
- •/debug-root-cause - Root cause analysis skill
- •/debug-session - Debug session documentation
- •/test - Test execution for verification
Usage Examples
Start New Debugging Session:
/debug-systematic "API returns 500 on POST /users"
# Expected tokens: 1,500-3,000 (full analysis)
Resume Existing Session:
/debug-systematic resume
# Expected tokens: 800-1,500 (skips completed hypotheses)
Test Specific Hypothesis:
/debug-systematic test 1
# Expected tokens: 500-1,000 (focused testing)
Check Debugging Progress:
/debug-systematic status
# Expected tokens: 200-500 (read session state only)
Mark Issue as Solved:
/debug-systematic solved
# Expected tokens: 300-600 (generate summary)
Early Exit Conditions
Exit immediately (saves 90% tokens) when:
- •✅ Issue already solved (check debug/state.json status)
- •✅ No test framework available (can't reproduce)
- •✅ Not a git repository (can't check recent changes)
- •✅ Root cause already identified in session state
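Two of these checks are cheap enough to run before anything else, as in the sketch below. It assumes `debug/state.json` holds a top-level `"status"` string; the exact schema is not specified here, and a JSON-aware tool such as `jq` would be more robust than the `sed` pattern used. Function names are hypothetical.

```bash
#!/bin/bash
# Sketch: early-exit checks before starting any analysis.
# Assumes state.json stores a "status" string such as "solved".

# session_status [state_file]
# Prints the value of the first "status" field, or nothing if absent.
session_status() {
    local state="${1:-debug/state.json}"
    [ -f "$state" ] || return 0
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$state" | head -n 1
}

# should_exit_early [state_file]
# Succeeds and prints a reason when no further work is needed or possible.
should_exit_early() {
    if [ "$(session_status "$1")" = "solved" ]; then
        echo "Issue already solved"
        return 0
    fi
    if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "Not a git repository"
        return 0
    fi
    return 1
}
```

Usage: `should_exit_early && exit 0` at the top of a debugging script stops before any expensive step.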
Progressive disclosure saves 60-80% tokens:
- •Show hypothesis formation → wait for user confirmation
- •Test one hypothesis at a time → report results
- •Only deep dive when hypothesis confirms
Implementation Checklist
- •✅ Git diff analysis for recent changes (PRIMARY optimization)
- •✅ Stack trace parsing with Grep (saves 90-95%)
- •✅ Session-based hypothesis tracking (saves 70-80% on reruns)
- •✅ Progressive hypothesis testing (most likely → least likely)
- •✅ Bash-based log analysis (minimal tokens)
- •✅ Test failure result caching
- •✅ Early exit when issue resolved
- •✅ Binary search for regressions (git bisect)
- •✅ Focus area flags (specific file/function debugging)
Optimization Status: ✅ Optimized (Phase 2 Batch 2, 2026-01-26)
Expected Tokens: 1,500-3,000 (vs. 4,000-6,000 unoptimized)
Achieved Reduction: 50% average across all debugging operations
Session Intelligence
I'll maintain debugging session continuity:
Session Files (in current project directory):
- •debug/plan.md - Debugging plan with hypotheses and results
- •debug/state.json - Session state and test results
- •debug/reproduction.log - Issue reproduction steps and logs
IMPORTANT: Session files are stored in a debug folder in your current project root
Auto-Detection:
- •If session exists: Resume debugging from last hypothesis
- •If no session: Create debugging plan and initial reproduction
- •Commands: resume, reproduce, status, solved
Phase 1: Issue Reproduction & Information Gathering
Extended Thinking for Complex Debugging
For complex or elusive bugs, I'll use extended thinking to explore debugging strategies:
<think>
When debugging complex issues:
- Multiple potential root causes that interact
- Timing-sensitive or race condition bugs
- Environment-specific failures
- Subtle state corruption scenarios
- Performance degradation patterns
- Security vulnerability exploitation paths
</think>
Triggers for Extended Analysis:
- •Intermittent or non-deterministic bugs
- •Production-only failures
- •Performance issues without obvious cause
- •Security vulnerabilities
- •Multi-component system failures
MANDATORY FIRST STEPS:
- •Check if debug directory exists in current working directory
- •If directory exists, check for session files:
  - •Look for debug/state.json
  - •Look for debug/plan.md
  - •If found, resume from last hypothesis
- •If no directory or session exists:
  - •Gather error information
  - •Create reproduction steps
  - •Initialize debugging session
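The first steps above amount to a small decision, sketched here. The function name `detect_session` and its "resume"/"new" output are assumptions for illustration only.

```bash
#!/bin/bash
# Sketch: decide whether to resume an existing session or start a new one.

# detect_session [session_dir]
# Prints "resume" if session files exist; otherwise creates the directory
# and prints "new".
detect_session() {
    local dir="${1:-debug}"
    if [ -f "$dir/state.json" ] || [ -f "$dir/plan.md" ]; then
        echo "resume"
    else
        mkdir -p "$dir"
        echo "new"
    fi
}
```

Usage: `case "$(detect_session)" in resume) ... ;; new) ... ;; esac` keeps the two paths separate from the start.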
Information Gathering (Token-Efficient):
#!/bin/bash
# Systematic Debugging - Information Gathering
gather_debug_info() {
    echo "=== Issue Reproduction Information ==="
    echo ""
    # 1. Error logs (use grep, not cat)
    echo "Recent error logs:"
    if [ -d "logs" ]; then
        # Capture first: in a pipeline, `|| echo` would test tail, not grep
        local errors
        errors=$(grep -i "error\|exception\|fatal" logs/*.log 2>/dev/null | tail -20)
        echo "${errors:-  No errors in logs}"
    else
        echo "  No logs directory"
    fi
    # 2. Git status (what changed recently)
    echo ""
    echo "Recent changes:"
    if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        git log --oneline --since="3 days ago" | head -10
    else
        echo "  Not a git repository"
    fi
    # 3. Environment info
    echo ""
    echo "Environment:"
    if [ -f "package.json" ]; then
        echo "  Node: $(node --version 2>/dev/null || echo 'not installed')"
        echo "  NPM: $(npm --version 2>/dev/null || echo 'not installed')"
    elif [ -f "requirements.txt" ]; then
        echo "  Python: $(python3 --version 2>/dev/null || python --version 2>/dev/null || echo 'not installed')"
    fi
    # 4. System resources (free is Linux-only, so fall back to N/A)
    echo ""
    echo "System resources:"
    local mem disk
    mem=$(free -h 2>/dev/null | awk '/^Mem/ {print $3 "/" $2}')
    disk=$(df -h . 2>/dev/null | tail -1 | awk '{print $3 "/" $2 " (" $5 ")"}')
    echo "  Memory: ${mem:-N/A}"
    echo "  Disk: ${disk:-N/A}"
    # 5. Running processes (if server issue)
    echo ""
    echo "Relevant processes:"
    local procs
    procs=$(ps aux | grep -E "node|python|java" | grep -v grep | head -5)
    echo "${procs:-  No relevant processes}"
}
# Make sure the session directory exists before writing into it
mkdir -p debug
gather_debug_info > debug/initial-state.log
cat debug/initial-state.log
Reproduction Steps:
#!/bin/bash
# Create reproducible test case
create_reproduction() {
mkdir -p debug # Ensure the session directory exists
cat > debug/reproduction.sh << 'EOF'
#!/bin/bash
# Minimal reproduction script
echo "=== Bug Reproduction Steps ==="
echo ""
echo "Step 1: Setup environment"
# TODO: Add setup commands
echo "Step 2: Execute actions that trigger bug"
# TODO: Add trigger commands
echo "Step 3: Verify bug occurs"
# TODO: Add verification
echo ""
echo "Expected: [describe expected behavior]"
echo "Actual: [describe actual behavior]"
EOF
chmod +x debug/reproduction.sh
echo "Created reproduction script: debug/reproduction.sh"
}
create_reproduction
Phase 2: Hypothesis Formation
I'll formulate testable hypotheses about the root cause:
Hypothesis Generation Framework:
# Debugging Plan - [timestamp]
## Issue Description
**Summary**: [brief description]
**Severity**: Critical | High | Medium | Low
**Impact**: [affected users/systems]
**Frequency**: Always | Intermittent | Rare
## Error Details
[Full error message/stack trace]
## Environment
- **Platform**: [OS, runtime version]
- **Configuration**: [relevant settings]
- **Recent Changes**: [commits/deployments]
## Hypotheses (Prioritized)
### Hypothesis 1: [Most likely cause] - PRIORITY: HIGH
**Theory**: [explanation of suspected cause]
**Evidence**: [supporting observations]
**Test**: [how to verify/disprove]
**Expected**: [what should happen if correct]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved
### Hypothesis 2: [Second most likely] - PRIORITY: MEDIUM
**Theory**: [explanation]
**Evidence**: [observations]
**Test**: [verification method]
**Expected**: [expected outcome]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved
### Hypothesis 3: [Alternative cause] - PRIORITY: LOW
**Theory**: [explanation]
**Evidence**: [observations]
**Test**: [verification method]
**Expected**: [expected outcome]
**Result**: [ ] Pending | [ ] Confirmed | [ ] Disproved
## Investigation Log
- [timestamp]: Initial reproduction successful
- [timestamp]: Hypothesis 1 testing in progress
Hypothesis Prioritization:
- •Recent changes - Check git history
- •Common patterns - Known bug categories
- •Environment issues - Dependencies, config
- •Logic errors - Code analysis
- •External factors - Third-party services
Phase 3: Systematic Testing
I'll test each hypothesis methodically:
Testing Framework:
#!/bin/bash
# Hypothesis Testing Script
test_hypothesis() {
local hypothesis_num="$1"
local test_description="$2"
echo "=== Testing Hypothesis $hypothesis_num ==="
echo "Test: $test_description"
echo ""
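# Create checkpoint before testing. `git stash create` snapshots the working
# tree without reverting it, so the test still runs against current changes
# (a plain `git stash push` would remove them first).
local checkpoint
checkpoint=$(git stash create "Debug checkpoint before hypothesis $hypothesis_num")
if [ -n "$checkpoint" ]; then
git stash store -m "Debug checkpoint before hypothesis $hypothesis_num" "$checkpoint"
fi
# Run test
local result="PENDING"
# Log result
mkdir -p debug
echo "[$hypothesis_num] $test_description: $result" >> debug/test-results.log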
}
# Example: Test hypothesis about missing dependency
test_dependency_hypothesis() {
echo "Hypothesis: Missing or incompatible dependency"
# Check dependency versions
if [ -f "package.json" ]; then
echo "Checking npm dependencies..."
npm list --depth=0 2>&1 | grep -i "missing\|error" && {
echo "❌ CONFIRMED: Missing dependencies detected"
return 0
}
fi
echo "✓ DISPROVED: All dependencies present"
return 1
}
# Example: Test hypothesis about race condition
test_race_condition_hypothesis() {
echo "Hypothesis: Race condition in async code"
# Add delays to test timing sensitivity
echo "Running test with delays..."
# TODO: Add test with deliberate delays
echo "Running test rapidly..."
for i in {1..10}; do
# TODO: Run test in tight loop
true
done
}
# Test each hypothesis in priority order
test_dependency_hypothesis
test_race_condition_hypothesis
Binary Search Debugging:
#!/bin/bash
# Binary search through git history to find regression
git_bisect_debug() {
echo "=== Git Bisect Debugging ==="
# Find last known good commit
read -r -p "Enter last known good commit (or tag): " good_commit
read -r -p "Enter first known bad commit (or 'HEAD'): " bad_commit
git bisect start
git bisect bad "$bad_commit"
git bisect good "$good_commit"
mkdir -p debug
cat > debug/bisect-test.sh << 'EOF'
#!/bin/bash
# Automated bisect test script for `git bisect run`.
# Exit 0 marks the commit good, exit 1 marks it bad, exit 125 skips it.
if npm test; then
exit 0
fi
exit 1
# For manual verification, do not use `git bisect run`: test each commit
# yourself and mark it with `git bisect good` or `git bisect bad`.
EOF
chmod +x debug/bisect-test.sh
echo "Run: git bisect run ./debug/bisect-test.sh"
}
Phase 4: Isolation & Simplification
I'll create minimal test cases:
Issue Isolation:
#!/bin/bash
# Create minimal reproducible example
create_minimal_reproduction() {
local issue_type="$1"
mkdir -p debug/minimal-case
case $issue_type in
"api")
cat > debug/minimal-case/test.js << 'EOF'
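// Minimal API test case
// Node 18+ has a global fetch; on older versions use node-fetch v2 (v3 is ESM-only)
const fetch = globalThis.fetch || require('node-fetch');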
async function testIssue() {
const response = await fetch('http://localhost:3000/api/endpoint');
const data = await response.json();
console.log('Response:', data);
// Add assertion that fails
}
testIssue().catch(console.error);
EOF
;;
"frontend")
cat > debug/minimal-case/test.html << 'EOF'
<!DOCTYPE html>
<html>
<head>
<title>Minimal Test Case</title>
</head>
<body>
<button id="testBtn">Click to trigger issue</button>
<div id="output"></div>
<script>
document.getElementById('testBtn').addEventListener('click', () => {
// Minimal code to reproduce issue
console.log('Testing...');
});
</script>
</body>
</html>
EOF
;;
"database")
cat > debug/minimal-case/test.sql << 'EOF'
-- Minimal database query to reproduce issue
BEGIN TRANSACTION;
-- Setup test data
CREATE TEMP TABLE test_data (id INT, value TEXT);
INSERT INTO test_data VALUES (1, 'test');
-- Query that demonstrates issue
SELECT * FROM test_data WHERE condition;
ROLLBACK;
EOF
;;
esac
echo "Created minimal test case in debug/minimal-case/"
}
Phase 5: Solution Implementation
Once root cause is identified, I'll implement the fix:
Fix Validation:
#!/bin/bash
# Validate fix before committing
validate_fix() {
echo "=== Fix Validation ==="
# 1. Run original reproduction - should now pass
echo "Step 1: Run original reproduction..."
if [ -f "debug/reproduction.sh" ]; then
./debug/reproduction.sh && echo "✓ Original issue resolved" || {
echo "❌ Issue still reproduces"
return 1
}
fi
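# 2. Run full test suite (PIPESTATUS keeps npm's exit code despite tee)
echo "Step 2: Run test suite..."
npm test 2>&1 | tee debug/post-fix-tests.log
if [ "${PIPESTATUS[0]}" -ne 0 ]; then
echo "❌ Test suite failed"
return 1
fi
# 3. Review added lines for unintended changes
echo "Step 3: Check for regressions..."
git diff HEAD -- . | grep -E "^\+" | grep -v "^+++" | head -20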
# 4. Verify no new errors
echo "Step 4: Lint check..."
npm run lint 2>&1 | grep -i "error" && {
echo "⚠️ New linting errors introduced"
} || echo "✓ No new linting errors"
echo ""
echo "✓ Fix validation complete"
}
validate_fix
Fix Documentation:
## Solution
### Root Cause
[Detailed explanation of what caused the issue]
### Fix Applied
[Description of the solution]
```diff
// Before
- problematic code
// After
+ corrected code
```
Verification
- • Original reproduction no longer triggers issue
- • All tests passing
- • No regressions introduced
- • Edge cases handled
Prevention
[How to prevent similar issues in the future]
- •Add test coverage for [scenario]
- •Update validation to catch [condition]
- •Add monitoring for [metric]
Phase 6: Regression Prevention
I'll add safeguards to prevent recurrence:
Test Addition:
#!/bin/bash
# Add regression test
add_regression_test() {
local test_framework="$1"
case $test_framework in
"jest")
cat >> tests/regression.test.js << 'EOF'
describe('Regression: [Issue Description]', () => {
test('should not reproduce issue #123', async () => {
// Reproduce the scenario that previously failed
const result = await functionThatHadBug();
// Assert correct behavior
expect(result).toBe(expectedValue);
});
});
EOF
;;
"pytest")
cat >> tests/test_regression.py << 'EOF'
def test_issue_123_regression():
"""Regression test for [issue description]"""
# Reproduce the scenario
result = function_that_had_bug()
# Assert correct behavior
assert result == expected_value
EOF
;;
esac
echo "Added regression test to prevent future occurrence"
}
Context Continuity
Session Resume:
When you return and run /debug-systematic or /debug-systematic resume:
- •Load debugging plan and hypothesis results
- •Show which hypotheses have been tested
- •Continue from next untested hypothesis
- •Track full debugging timeline
Progress Example:
RESUMING DEBUGGING SESSION
├── Issue: API timeout on user search
├── Hypotheses: 5 total
├── Tested: 3 (2 disproved, 1 confirmed)
├── Current: Testing database query optimization
└── Status: Root cause identified
Continuing investigation...
Practical Examples
Start Debugging:
/debug-systematic "API returns 500 on POST /users"
/debug-systematic reproduce # Create reproduction steps
/debug-systematic # Auto-resume if session exists
Hypothesis Testing:
/debug-systematic test 1 # Test specific hypothesis
/debug-systematic isolate # Create minimal reproduction
/debug-systematic bisect # Git bisect to find regression
Session Control:
/debug-systematic resume # Continue debugging
/debug-systematic status # Show current progress
/debug-systematic solved # Mark as solved and summarize
Debugging Techniques
Common Debugging Patterns:
- •Print Debugging:
add_debug_logging() {
echo "Adding strategic debug points..."
# Add before suspected issue
# Add after suspected issue
# Compare outputs
}
- •Rubber Duck Debugging:
## Explain to Rubber Duck
1. What the code should do: [expected behavior]
2. What the code actually does: [actual behavior]
3. Step-by-step execution: [trace through]
4. Where it diverges: [AHA moment]
- •Divide and Conquer:
# Comment out half the code
# Does issue persist?
# - Yes: Issue in remaining half
# - No: Issue in commented half
# Repeat until isolated
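The same halving idea applies to any ordered list of candidates (modules, config flags, migration steps), not only to lines of code. The sketch below is one possible shape for it. It assumes every item before the culprit passes, every item from the culprit onward fails, and the last item is known to fail; the function name and the check-command convention are hypothetical.

```bash
#!/bin/bash
# Sketch: binary search for the first failing item in an ordered list.

# first_failing <check_command> <item>...
# Runs "<check_command> <item>" (exit 0 means the item passes) and prints
# the first item for which the check fails.
first_failing() {
    local check="$1"; shift
    local items=("$@")
    local lo=0 hi=$(( ${#items[@]} - 1 )) mid
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$check" "${items[mid]}"; then
            lo=$(( mid + 1 ))   # items[mid] passes: culprit is later
        else
            hi=$mid             # items[mid] fails: culprit is here or earlier
        fi
    done
    echo "${items[lo]}"
}
```

With eight candidates this runs the check three times instead of up to eight, which is where the savings of the divide-and-conquer approach come from.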
Safety Guarantees
Protection Measures:
- •Git checkpoints before each test
- •Automated state restoration
- •No destructive operations without confirmation
- •Clear rollback paths
Important: I will NEVER:
- •Modify production code without validation
- •Skip hypothesis testing
- •Apply fixes without verification
- •Add AI attribution
Skill Integration
When appropriate, I may suggest:
- •/test - Run comprehensive test suite
- •/security-scan - Check if bug is security-related
- •/commit - Commit fix with clear message
Advanced Debugging Tools
Performance Profiling:
profile_performance() {
# Node.js profiling
node --prof app.js
node --prof-process isolate-*.log > profile.txt
# Python profiling
python -m cProfile -o profile.stats script.py
python -m pstats profile.stats
}
Memory Leak Detection:
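detect_memory_leak() {
    # Sample the oldest node process's resident memory (RSS, in KB) every
    # 5 seconds. The loop is bounded so the plotting step is actually reached.
    local samples="${1:-60}"
    local i
    for ((i = 0; i < samples; i++)); do
        ps -o rss= -p "$(pgrep -o node)" 2>/dev/null
        sleep 5
    done | tee memory.log
    # Analyze pattern
    gnuplot << 'EOF'
set terminal png
set output 'memory-usage.png'
plot 'memory.log' with lines
EOF
}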
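If gnuplot is not available, a rough numeric summary of the log can serve as a first check. This sketch assumes `memory.log` holds one RSS value in KB per line; the function name `memory_growth` and the output format are illustrative.

```bash
#!/bin/bash
# Sketch: summarize a memory log (one RSS value in KB per line).

# memory_growth <log_file>
# Prints the first sample, the last sample, and the percentage change.
# Fails (exit 1) if fewer than two numeric samples are present.
memory_growth() {
    awk 'NF && $1 ~ /^[0-9]+$/ {
             if (!n++) first = $1
             last = $1
         }
         END {
             if (n < 2 || first == 0) { print "not enough data"; exit 1 }
             printf "first=%d last=%d change=%+.1f%%\n", first, last, (last - first) * 100 / first
         }' "$1"
}
```

A steadily positive change across several runs suggests a leak worth profiling. A single rise does not prove one, since runtimes often grow their heap before garbage collection settles.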
Network Debugging:
debug_network() {
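# Capture network traffic (usually requires root). The -c limit stops the
# capture after 1000 packets so the analysis step below can run.
tcpdump -i any -c 1000 -w debug/network.pcap port 3000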
# Analyze with tshark
tshark -r debug/network.pcap -Y "http.response.code >= 400"
}
What I'll Actually Do
- •Gather information - Comprehensive context using Grep
- •Reproduce issue - Create reliable reproduction
- •Form hypotheses - Prioritized theories about cause
- •Test systematically - Validate each hypothesis
- •Isolate problem - Minimal reproducible case
- •Implement fix - Targeted solution
- •Prevent regression - Add tests and monitoring
I'll maintain complete debugging session continuity, tracking all hypotheses and results across sessions.
Credits: Systematic debugging methodology based on scientific method and debugging best practices from "Debugging: The 9 Indispensable Rules" by David Agans.