AI Debug
Figure out why an existing AI feature is broken.
Works with:
- Linear MCP: pull issue/bug details
- Manual: describe the symptoms
Entry Point
When this skill is invoked, start with:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AI DEBUG
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

When AI fails, teams blame the model. But 90% of failures are
context failures.

What's going wrong?

1. Provide a Linear issue ID
2. Describe the symptoms

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
Usage
```
/ai-debug             # Describe symptoms manually
/ai-debug LIN-123     # Start from a Linear bug/issue
```
What It Does
Works backwards from symptoms to root cause using the 4D audit:
| Symptom | Likely Root Cause | Focus Area |
|---|---|---|
| Hallucinations | Missing domain context, no grounding | D2, D4 |
| Inconsistency | Vague job definition, missing rules | D1, D4 |
| Generic outputs | Missing user/environment context | D2 |
| Wrong tone/format | Missing constraints, no examples | D1, D4 |
| Slow responses | Too much context, bad discovery | D2, D3 |
| High costs | Dumping everything into the prompt | D2, D3 |
| Demo vs prod mismatch | Discovery strategy broken | D3, D4 |
Key insight: When AI fails, teams blame the model. But 90% of failures are context failures.
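If you want the symptom table available to tooling (for example, to decide which D dimensions to audit first), a minimal Python sketch could encode it as a lookup. `SYMPTOM_MAP` and `focus_areas` are hypothetical names, and ranking dimensions by how many reported symptoms point at them is an assumed prioritization rule, not part of the framework.

```python
# Hypothetical encoding of the symptom table: symptom -> (likely root cause, focus areas).
SYMPTOM_MAP: dict[str, tuple[str, tuple[str, ...]]] = {
    "hallucinations": ("Missing domain context, no grounding", ("D2", "D4")),
    "inconsistency": ("Vague job definition, missing rules", ("D1", "D4")),
    "generic outputs": ("Missing user/environment context", ("D2",)),
    "wrong tone/format": ("Missing constraints, no examples", ("D1", "D4")),
    "slow responses": ("Too much context, bad discovery", ("D2", "D3")),
    "high costs": ("Dumping everything into the prompt", ("D2", "D3")),
    "demo vs prod mismatch": ("Discovery strategy broken", ("D3", "D4")),
}

ALL_DIMENSIONS = ["D1", "D2", "D3", "D4"]


def focus_areas(symptoms: list[str]) -> list[str]:
    """Return D dimensions to audit, most frequently implicated first (assumed rule)."""
    counts: dict[str, int] = {}
    for symptom in symptoms:
        entry = SYMPTOM_MAP.get(symptom.strip().lower())
        if entry is None:
            continue  # Symptom not in the table; it contributes no focus area.
        for dim in entry[1]:
            counts[dim] = counts.get(dim, 0) + 1
    if not counts:
        return list(ALL_DIMENSIONS)  # Nothing matched: fall back to a full 4D audit.
    # Sort by descending count, then by dimension name for a stable order.
    return sorted(counts, key=lambda d: (-counts[d], d))
```

For example, `focus_areas(["Hallucinations", "High costs"])` returns `["D2", "D3", "D4"]`, so the Data dimension is audited first.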
The 4D Audit
D1: Was the Job Defined?
- Can you articulate exactly what the model should produce?
- Is there a written spec for inputs, outputs, constraints?
- Do engineers and PMs agree on what "good" looks like?
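One way to make the D1 questions concrete is to keep the written spec as a small record and flag whatever is missing. This is a sketch: `JobSpec`, its fields, and `d1_gaps` are hypothetical and should be adapted to however your team actually writes specs.

```python
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Hypothetical written spec for one AI feature (the D1 job definition)."""
    task: str                  # What the model should produce, in one sentence.
    inputs: list[str]          # Inputs the model receives.
    output_format: str         # Expected shape of the output.
    constraints: list[str] = field(default_factory=list)
    good_examples: list[str] = field(default_factory=list)  # Outputs eng and PM agree are "good".


def d1_gaps(spec: JobSpec) -> list[str]:
    """List the parts of the spec that are missing or empty."""
    gaps = []
    if not spec.task.strip():
        gaps.append("task is not articulated")
    if not spec.inputs:
        gaps.append("inputs are not specified")
    if not spec.output_format.strip():
        gaps.append("output format is not specified")
    if not spec.constraints:
        gaps.append("no constraints written down")
    if not spec.good_examples:
        gaps.append("no agreed examples of 'good' output")
    return gaps
```

An empty `good_examples` list is a direct signal for the third question: nobody has written down what "good" looks like.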
D2: Is Context Right?
- What context is the model actually receiving?
- Walk through the 6 layers: Intent, User, Domain, Rules, Environment, Exposition
- Is context structured or dumped as raw text?
- Is there too much context (token bloat)?
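The D2 questions can be partly automated when context is stored per layer instead of as one string. The sketch below assumes context arrives as a dict keyed by the six layer names and treats any other key as a sign of unstructured dumping. Token counts use a rough four-characters-per-token heuristic, and the `budget_tokens` default of 4000 is an arbitrary placeholder; use your model's real tokenizer and limits for actual numbers.

```python
# The six layers from the D2 checklist. Everything else here is an assumption:
# the dict-per-layer shape, the token heuristic, and the default budget.
LAYERS = ("Intent", "User", "Domain", "Rules", "Environment", "Exposition")


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4) if text else 0


def audit_context(context: dict[str, str], budget_tokens: int = 4000) -> dict[str, object]:
    """Report missing layers, non-layer keys (possible raw dumps), and token usage."""
    missing = [layer for layer in LAYERS if not context.get(layer, "").strip()]
    unstructured = [key for key in context if key not in LAYERS]
    per_layer = {key: estimate_tokens(value) for key, value in context.items()}
    total = sum(per_layer.values())
    return {
        "missing_layers": missing,
        "unstructured_keys": unstructured,
        "tokens_per_layer": per_layer,
        "total_tokens": total,
        "over_budget": total > budget_tokens,
    }
```

A large `unstructured_keys` entry that also dominates `tokens_per_layer` is the "dumping everything into the prompt" pattern from the symptom table.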
D3: Is Context Fetched Reliably?
- How is each piece of context being fetched at runtime?
- What happens when a data source is unavailable?
- Is there visibility into what context is used per request?
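For D3, a minimal sketch of fetching each context piece with an explicit fallback and recording, per request, where each piece came from. The source callables, fallback values, and the `"live"`/`"fallback"`/`"missing"` labels are assumptions for illustration; visibility comes from the standard `logging` module.

```python
import logging
from typing import Callable

logger = logging.getLogger("ai_debug.context")  # Hypothetical logger name.


def fetch_context(
    request_id: str,
    sources: dict[str, Callable[[], str]],
    fallbacks: dict[str, str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Fetch each context piece, falling back when a source fails.

    Returns the context plus a per-request record of where each piece came from.
    """
    context: dict[str, str] = {}
    provenance: dict[str, str] = {}
    for name, fetch in sources.items():
        try:
            context[name] = fetch()
            provenance[name] = "live"
        except Exception as exc:  # Any source failure triggers the fallback path.
            if name in fallbacks:
                context[name] = fallbacks[name]
                provenance[name] = "fallback"
            else:
                provenance[name] = "missing"  # Piece is omitted from the context.
            logger.warning("request %s: source %r failed: %s", request_id, name, exc)
    logger.info("request %s context provenance: %s", request_id, provenance)
    return context, provenance
```

Logging the provenance on every request answers the third question directly: when an output looks wrong, you can see whether it was built on live, fallback, or missing context.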
D4: Are Failures Being Caught?
- Are there pre-checks before calling the model?
- Are there post-checks validating output?
- What's the fallback UX when things break?
- Is there a feedback loop capturing failures?
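The four D4 checks can wrap the model call. This hypothetical sketch assumes the feature expects a JSON object with known fields; `call_model`, `FALLBACK_MESSAGE`, and the in-memory `failures` list stand in for your real model client, fallback UX, and feedback store.

```python
import json
from typing import Callable, Optional

# Hypothetical fallback copy shown to users when a check fails.
FALLBACK_MESSAGE = "We couldn't generate a reliable answer. Please try again or contact support."


def pre_check(context: dict[str, str], required: list[str]) -> Optional[str]:
    """Return a reason to skip the model call, or None if the context is usable."""
    missing = [key for key in required if not context.get(key)]
    return f"missing context: {', '.join(missing)}" if missing else None


def post_check(output: str, required_fields: list[str]) -> Optional[str]:
    """Assumes JSON output; return a failure reason, or None if the output passes."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return "output is not valid JSON"
    if not isinstance(data, dict):
        return "output is not a JSON object"
    missing = [f for f in required_fields if f not in data]
    return f"missing fields: {', '.join(missing)}" if missing else None


def guarded_call(
    call_model: Callable[[dict[str, str]], str],
    context: dict[str, str],
    required_context: list[str],
    required_fields: list[str],
    failures: list[str],
) -> str:
    """Run pre-check, model call, and post-check; record failures for the feedback loop."""
    reason = pre_check(context, required_context)
    if reason is None:
        output = call_model(context)
        reason = post_check(output, required_fields)
        if reason is None:
            return output
    failures.append(reason)  # Feedback loop: keep every failure for later review.
    return FALLBACK_MESSAGE
```

Failing the pre-check skips the model call entirely, which saves cost and avoids generating output from context already known to be incomplete.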
Output
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CONTEXT AUDIT COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Feature: [Name]
Symptoms: [What was reported]

D1 Demand:     [CLEAR / GAP / CRITICAL]
D2 Data:       [CLEAR / GAP / CRITICAL]
D3 Discovery:  [CLEAR / GAP / CRITICAL]
D4 Defense:    [CLEAR / GAP / CRITICAL]

Primary Issue: [Root cause summary]

RECOMMENDED FIXES (prioritized):
1. [Highest impact fix]
2. [Second fix]
3. [Third fix]

Quick Win: [Smallest change that would help]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
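If the report is assembled programmatically, the template maps onto a small record. This is a hypothetical sketch (`AuditResult` and `render_report` are illustrative names) that rejects any status other than CLEAR, GAP, or CRITICAL and numbers the fixes in the order given, so the caller is responsible for prioritizing them.

```python
from dataclasses import dataclass

STATUSES = ("CLEAR", "GAP", "CRITICAL")
DIMENSIONS = (("D1", "Demand"), ("D2", "Data"), ("D3", "Discovery"), ("D4", "Defense"))
RULE = "━" * 62  # Banner rule; the width is arbitrary.


@dataclass
class AuditResult:
    feature: str
    symptoms: str
    statuses: dict[str, str]  # e.g. {"D1": "CLEAR", "D2": "CRITICAL", ...}
    primary_issue: str
    fixes: list[str]          # Expected to be ordered by impact, highest first.
    quick_win: str


def render_report(result: AuditResult) -> str:
    """Render the audit summary in the same layout as the template."""
    lines = [RULE, "CONTEXT AUDIT COMPLETE", RULE, "",
             f"Feature: {result.feature}", f"Symptoms: {result.symptoms}", ""]
    for dim, name in DIMENSIONS:
        status = result.statuses[dim]  # KeyError if a dimension was not audited.
        if status not in STATUSES:
            raise ValueError(f"{dim}: unknown status {status!r}")
        label = f"{name}:"
        lines.append(f"{dim} {label:<12}{status}")
    lines += ["", f"Primary Issue: {result.primary_issue}", "",
              "RECOMMENDED FIXES (prioritized):"]
    lines += [f"{i}. {fix}" for i, fix in enumerate(result.fixes, start=1)]
    lines += ["", f"Quick Win: {result.quick_win}", "", RULE]
    return "\n".join(lines)
```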
Workflow
- Collect symptoms (what's going wrong)
- Map symptoms to likely causes using the table above
- Audit each D dimension with diagnostic questions
- Identify root cause and prioritize fixes
- Offer to add findings to Linear or export
Questions to ask at each step:
- "What specific behavior are you seeing?"
- "What should it be doing instead?"
- "When did this start happening?"
- "Does it happen every time or intermittently?"
Framework: 4D Context Canvas (Aakash Gupta & Miqdad Jaffer)

Best for: Debugging hallucinations, inconsistency, and performance issues in AI features