Debugging Skill

Intent

Reproduce failures reliably, isolate root cause, apply minimal fix, and verify without regressions.

Trigger

Use this skill for:

•Test or runtime failures.
•Regressions after recent changes.
•Intermittent behavior that still has a reproducible signal.

Do not use this skill for feature implementation without a failure signal.

Procedure

Step 1: Capture failure signal (mandatory)

Collect exact failure context first:

•failing command
•full error output
•failing file and line hints
•expected behavior vs actual behavior

If the user does not provide a command, request one command that reproduces the issue.

Blocking output:

text

DEBUG_SIGNAL
- command: "<exact command>"
- failure_type: <compile|test|runtime|integration|unknown>
- first_error_line: "<copied line>"
- affected_scope: "<module or feature>"

Step 2: Reproduce exactly (mandatory)

Run the exact command from Step 1 before reading broad code.

Recommended command patterns:

bash

# Keep full output for diagnosis
bun test <target> 2>&1
bun run typecheck 2>&1

Rules:

•If reproduction fails (cannot reproduce), stop and ask for missing context.
•Do not patch before successful reproduction.
•Do not broaden scope until first reproduction is captured.

Step 3: Generate ranked hypotheses (max 3)

Each hypothesis must include:

•one-sentence cause statement
•file:line to inspect
•validation action that can falsify it

Template:

text

HYPOTHESIS_1 (most likely)
- cause: "<statement>"
- inspect: "<path:line>"
- validation: "<command or read target>"

HYPOTHESIS_2
...

HYPOTHESIS_3
...

Hard rules:

•Maximum 3 active hypotheses.
•Maximum 3 tool calls per hypothesis before moving on.
•Never apply a fix before at least one hypothesis is confirmed.

Step 4: Validate hypotheses one by one

Validation order:

•Read implicated files and call chain.
•Correlate with error stack and data flow.
•Run a focused command to confirm/refute.

Validation checklist:

•Does the suspected line execute in failing path?
•Does input/state at that line match expectation?
•Does changing only this point explain observed error?

If hypothesis is refuted, document reason and move to next.

Step 5: Apply minimal fix

Fix only confirmed root cause lines.

Change boundary rules:

•Prefer surgical edit over rewrite.
•Keep public API unchanged unless failure requires API change.
•Do not bundle cleanup/refactor with bug fix.
•Avoid speculative guards that only hide symptoms.

Blocking output before verification:

text

FIX_PLAN
- root_cause: "<confirmed cause>"
- files_to_change:
  - <path>
- why_minimal: "<why this is smallest valid fix>"

Step 6: Verify in two layers

Layer A: exact repro command from Step 2. Layer B: broader safety check for nearby regressions.

Verification sequence:

bash

# Layer A: exact reproduction command
bun test <same-target-as-step-2>

# Layer B: broader check
bun test

If Layer A passes but Layer B fails:

•report as partial success
•list new failures and probable relation
•avoid claiming full completion

Step 7: Emit final debugging report

text

DEBUG_REPORT
- root_cause: "<single sentence>"
- fix_description: "<what changed>"
- evidence:
  - "<before signal>"
  - "<after signal>"
- verification: <pass|partial|fail>
- residual_risk: "<if any>"

Decision Shortcuts

•Compile/type errors: start at first compiler error, not downstream cascades.
•Test assertion mismatch: inspect fixture/setup before business logic.
•Runtime null/undefined errors: validate data contract and call path entry.
•Flaky tests: classify deterministic vs timing/order dependency before editing.

Stop Conditions

•Cannot reproduce after two explicit attempts with the provided command.
•Three hypotheses are exhausted with no confirmed root cause.
•Proposed fix creates two or more new failing tests.
•Tool-call budget exceeds configured limit without convergence.

When stopping, provide:

•what was tried
•strongest remaining hypothesis
•minimal missing input required from user

Anti-Patterns (never)

•Fixing symptom lines without tracing call chain.
•Adding broad try/catch to suppress failure.
•Editing test expectations to match buggy behavior.
•Mixing refactor with bug fix in one patch.
•Claiming "fixed" without rerunning the original failing command.

References

•Failure classification and triage prompts: skills/base/debugging/references/failure-triage.md

Example

Input:

text

"`bun test test/runtime/runtime.test.ts` fails with a type mismatch in digest.ts."

Expected workflow:

•Reproduce exact test failure.
•Produce 1-3 ranked hypotheses with file targets.
•Confirm one root cause and apply minimal edit.
•Rerun same test, then broader suite.
•Return DEBUG_REPORT with evidence lines.