Technique: Root Cause Tracing
Concept: Bugs are rarely where they explode. The explosion (Stack Trace) is just where the bad data finally violated a constraint.
The Protocol
Do not fix the line in the stack trace yet. Follow the data upstream.
Step 1: Identify the "Contraband"
What specific value caused the crash?
- Example: A null user ID.
- Example: An empty string "" where a date was expected.
- Example: A value of -1 in a price field.
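To make this step concrete, here is a minimal sketch of naming the contraband explicitly. The field names and constraint rules are hypothetical, chosen only to mirror the three examples above; a real system would have its own.

```python
from datetime import date

# Hypothetical constraints for an order record (illustrative, not from any real schema).
CONSTRAINTS = {
    "user_id": lambda v: v is not None,
    "order_date": lambda v: isinstance(v, date),
    "price": lambda v: isinstance(v, (int, float)) and v >= 0,
}

def find_contraband(record):
    """Return {field: value} for every field that violates its constraint."""
    return {
        field: record.get(field)
        for field, is_valid in CONSTRAINTS.items()
        if not is_valid(record.get(field))
    }

# A record carrying all three kinds of bad value from the examples above.
bad_record = {"user_id": None, "order_date": "", "price": -1}
contraband = find_contraband(bad_record)
print(contraband)
```

Writing down the exact field and value gives you something specific to search for while walking upstream.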
Step 2: The Upstream Walk (The "5 Whys")
Open your IDE. Use "Find Usages" or "Call Hierarchy."
- Level 0 (Crash Site): processPayment(user.id) -> Crashed because user.id is null.
  - Question: Who called processPayment?
- Level 1: CheckoutService.submitOrder called it.
  - Question: Where did CheckoutService get the user object?
- Level 2: It was passed in from SessionManager.getCurrentUser().
  - Question: Why did SessionManager return a user object with a null ID?
- Level 3: SessionManager hydrated it from the Redis Cache.
  - Question: Why is the data in Redis corrupt?
- Level 4 (Root Cause): The LoginHandler wrote the user to Redis before the database generated the ID.
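The walk above can be reproduced in a small simulation. This is a sketch under stated assumptions: a plain dict stands in for Redis, db_save stands in for the database, and the snake_case function names are hypothetical counterparts of the components named in the walk.

```python
import traceback

redis_cache = {}  # stand-in for Redis (assumption: a plain dict)

def db_save(user):
    user["id"] = 42  # stand-in for the database generating the ID
    return user

def login_handler(name):
    # Level 4 (root cause): the snapshot is cached BEFORE the ID exists.
    user = {"name": name, "id": None}
    redis_cache["current_user"] = dict(user)
    db_save(user)

def get_current_user():
    # Levels 2-3: the session layer hydrates the user from the cache.
    return dict(redis_cache["current_user"])

def process_payment(user_id):
    # Level 0 (crash site): the constraint is finally enforced here.
    if user_id is None:
        raise ValueError("user_id is None")
    return f"charged user {user_id}"

def submit_order():
    # Level 1: checkout passes the user along.
    user = get_current_user()
    return process_payment(user["id"])

def crash_frames():
    """Trigger the crash and return the function names in the traceback."""
    login_handler("ada")
    try:
        submit_order()
    except ValueError as exc:
        # Frames are ordered outermost first; the last one is the crash site.
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []

frames = crash_frames()
print(frames)
```

In this sketch the traceback ends at process_payment, and login_handler does not appear in it at all. The bad value travelled through the cache rather than the call stack, which is why the walk has to follow the data and not just the frames.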
Step 3: Verify the Origin
- Hypothesis: The bug is in LoginHandler (Level 4), not processPayment (Level 0).
- Test: Fix the LoginHandler ordering.
- Result: The crash at Level 0 disappears because the data is now correct.
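One way to check the hypothesis is to run the same downstream code against both orderings. The sketch below is illustrative only: the dict-based cache, the make_db helper, and the function names are assumptions, not the real LoginHandler.

```python
def make_db():
    """Return a save function that assigns incrementing IDs (stand-in for a database)."""
    counter = {"next": 1}

    def save(user):
        user["id"] = counter["next"]
        counter["next"] += 1
        return user

    return save

def login_buggy(name, cache, save):
    user = {"name": name, "id": None}
    cache["current_user"] = dict(user)  # snapshot written before the ID exists
    save(user)

def login_fixed(name, cache, save):
    user = save({"name": name, "id": None})  # let the database assign the ID first
    cache["current_user"] = dict(user)       # then cache the complete record

def process_payment(user_id):
    # Level 0 is left untouched in both runs.
    if user_id is None:
        raise ValueError("user_id is None")
    return f"charged user {user_id}"

def checkout(cache):
    return process_payment(cache["current_user"]["id"])

def crashes(login):
    """Run login followed by checkout; report whether Level 0 still crashes."""
    cache = {}
    login("ada", cache, make_db())
    try:
        checkout(cache)
    except ValueError:
        return True
    return False

print(crashes(login_buggy), crashes(login_fixed))
```

Only the write order in the login step differs between the two runs. If the crash disappears with the fixed ordering and process_payment is unchanged, that supports the hypothesis that the origin was upstream.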
Common Traps
- The "Null Check" Trap: Adding if (user.id == null) return; at Level 0 fixes the crash, but it creates a "Zombie Order" (an order that fails silently). Do not do this.
- The "Sanitizer" Trap: Cleaning the string at Level 1 hides the fact that the database at Level 5 is corrupt.
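The "Null Check" trap can be sketched as follows. The orders and payments lists are hypothetical stand-ins for real tables, and the function names are illustrative.

```python
orders = []    # stand-in for an orders table
payments = []  # stand-in for a payments table

def process_payment(user_id, amount):
    if user_id is None:
        return  # the trap: no crash, no payment, no signal to the caller
    payments.append((user_id, amount))

def submit_order(user, amount):
    process_payment(user["id"], amount)  # the caller cannot tell nothing was charged
    order = {"user": user["name"], "amount": amount, "status": "submitted"}
    orders.append(order)
    return order

# A user hydrated from a corrupt cache entry, as in the walk above.
zombie = submit_order({"name": "ada", "id": None}, 20)
print(len(orders), len(payments))
```

In this sketch the order is recorded as submitted while no payment exists for it. The guard removed the symptom, but the bad user record is still being produced upstream and now nothing reports it.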
Rule: Fix the data generation, not the data consumption.