Error Recovery
Overview
Handle failures gracefully with structured recovery.
Core principle: When things break, don't panic. Assess, preserve, recover, verify, document.
Announce at start: "I'm using error-recovery to handle this failure."
The Recovery Protocol
code
Error Detected
       │
       ▼
┌─────────────┐
│ 1. ASSESS   │ ← Severity? Scope? Impact?
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ 2. PRESERVE │ ← Capture evidence before it's lost
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ 3. RECOVER  │ ← Follow decision tree
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ 4. VERIFY   │ ← Confirm clean state
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ 5. DOCUMENT │ ← Record what happened
└─────────────┘
Step 1: Assess Severity
Severity Levels
| Level | Description | Examples |
|---|---|---|
| Critical | System unusable, data at risk | Build completely broken, tests cause data loss |
| Major | Significant functionality broken | Feature doesn't work, many tests failing |
| Minor | Isolated issue, workaround exists | Single test flaky, style error |
| Info | Warning only, not blocking | Deprecation notice, performance hint |
Assessment Questions
markdown
## Error Assessment

**Error:** [Description of error]
**Location:** [Where it occurred]

### Severity Checklist
- [ ] Is the system still functional?
- [ ] Is any data at risk?
- [ ] Are other features affected?
- [ ] Is this blocking progress?

### Scope
- Files affected: [list]
- Features affected: [list]
- Users affected: [none/some/all]
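The checklist answers can be turned into a severity level mechanically. The following is a minimal sketch, assuming a hypothetical helper named `classify_severity` that takes four yes/no answers in checklist order. Neither the function nor the exact mapping is part of this skill; the rules are one possible reading of the Severity Levels table.

```bash
# Hypothetical helper (not part of this skill): map the four checklist
# answers (yes/no) to a severity level from the table above.
# Usage: classify_severity FUNCTIONAL DATA_AT_RISK OTHERS_AFFECTED BLOCKING
classify_severity() {
  functional=$1
  data_at_risk=$2
  others_affected=$3
  blocking=$4

  if [ "$functional" = "no" ] || [ "$data_at_risk" = "yes" ]; then
    echo "Critical"   # system unusable or data at risk
  elif [ "$others_affected" = "yes" ]; then
    echo "Major"      # damage spreads beyond one spot
  elif [ "$blocking" = "yes" ]; then
    echo "Minor"      # isolated, but still in the way (assumed mapping)
  else
    echo "Info"       # warning only, not blocking
  fi
}
```

For example, `classify_severity yes no yes yes` describes a working system with no data at risk but several affected features, which this sketch rates as Major.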
Step 2: Preserve Evidence
Capture BEFORE attempting fixes:
Error Logs
bash
# Capture error output
pnpm test 2>&1 | tee error-log.txt

# Or from failed command
./failing-command 2>&1 | tee error-log.txt
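One caveat when scripting this: without `pipefail`, a shell pipeline returns the exit status of its last command, so `cmd 2>&1 | tee file` reports `tee`'s status and a failing run can look successful to the caller. The sketch below keeps both the log and the original exit code. The helper name `run_and_log` is hypothetical and not part of this skill.

```bash
# Hypothetical helper (not part of this skill): run a command, save its
# stdout and stderr to a log file, print the log, and return the
# command's own exit status.
# Usage: run_and_log LOG_FILE COMMAND [ARGS...]
run_and_log() {
  log_file=$1
  shift
  status=0
  "$@" >"$log_file" 2>&1 || status=$?
  cat "$log_file"
  return "$status"
}
```

A call such as `run_and_log error-log.txt pnpm test` would then fail exactly when the tests fail. The trade-off is that output appears only after the command finishes rather than streaming as it does with `tee`.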
Stack Traces
markdown
## Stack Trace
Error: Connection refused
    at Database.connect (src/db/connection.ts:45)
    at UserService.init (src/services/user.ts:23)
    at main (src/index.ts:12)
State Capture
bash
# Git state
git status
git diff

# Environment state
env | grep -E "NODE|NPM|PATH"

# Dependency state
pnpm list
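Writing the state to files, rather than only printing it, keeps the evidence around after the recovery commands have changed things. A minimal sketch follows, assuming a hypothetical helper `capture_state` and a caller-chosen output directory. The file names are illustrative, not a convention of this skill.

```bash
# Hypothetical helper (not part of this skill): save git, environment,
# and dependency state into OUT_DIR. A failing command does not abort
# the capture; its error message ends up in the corresponding file.
# Usage: capture_state OUT_DIR
capture_state() {
  out_dir=$1
  mkdir -p "$out_dir" || return 1

  git status >"$out_dir/git-status.txt" 2>&1 || true
  git diff >"$out_dir/git-diff.patch" 2>&1 || true
  env | grep -E "NODE|NPM|PATH" >"$out_dir/env.txt" 2>&1 || true
  pnpm list >"$out_dir/deps.txt" 2>&1 || true
  return 0
}
```

Something like `capture_state "evidence-$(date +%Y%m%d-%H%M%S)"` gives each incident its own directory.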
Screenshot (if visual)
For UI errors, capture screenshots before changes.
Step 3: Recover
Decision Tree
code
                What type of failure?
                          │
     ┌─────────────┬──────┴──────┬─────────────┐
     │             │             │             │
   Code          Build      Environment    External
   Error         Error         Issue        Service
     │             │             │             │
     ▼             ▼             ▼             ▼
┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
│ Git      │  │ Clean    │  │ Re-init  │  │ Wait /   │
│ recovery │  │ build    │  │          │  │ retry    │
└──────────┘  └──────────┘  └──────────┘  └──────────┘
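The decision tree is a four-way dispatch, which a `case` statement expresses directly. This is only a sketch: the helper name `recovery_for` and the lowercase type keywords are assumptions made for illustration.

```bash
# Hypothetical helper (not part of this skill): map a failure type to
# the recovery section to follow.
# Usage: recovery_for code|build|environment|external
recovery_for() {
  case $1 in
    code)        echo "Code Error Recovery (git)" ;;
    build)       echo "Build Error Recovery (clean build)" ;;
    environment) echo "Environment Error Recovery (re-init)" ;;
    external)    echo "External Service Error (wait/retry)" ;;
    *)           echo "Unknown failure type: $1" >&2; return 1 ;;
  esac
}
```

An unknown type returns a non-zero status instead of guessing, so a calling script can stop and ask for classification.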
Code Error Recovery
Single file broken:
bash
# Revert just that file
git checkout HEAD -- path/to/file.ts
Feature broken (multiple files):
bash
# Find last good commit
git log --oneline

# Revert to that commit (soft reset keeps changes staged)
git reset --soft [GOOD_COMMIT]

# Or hard reset (discards changes)
git reset --hard [GOOD_COMMIT]
Working directory is a mess:
bash
# Stash current changes (add --include-untracked to also stash new, untracked files)
git stash

# Verify clean state
git status

# Optionally recover stash later
git stash pop
Build Error Recovery
bash
# Clean build artifacts
rm -rf node_modules dist build .cache

# Reinstall dependencies
pnpm install --frozen-lockfile  # Clean install from lock file

# Rebuild
pnpm build
Environment Error Recovery
bash
# Check environment
env | grep -E "NODE|PNPM"

# Reset Node modules
rm -rf node_modules
pnpm install --frozen-lockfile

# If using nvm, verify version
nvm use

# Re-run init script
./scripts/init.sh
External Service Error
bash
# Check if service is up
curl -I https://service.example.com/health

# If down, wait and retry
sleep 60
curl -I https://service.example.com/health

# If still down, check status page
# Document as external blocker
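The wait-and-retry step can be wrapped in a bounded loop so it never waits forever. The sketch below assumes a hypothetical helper `retry`; the attempt count and delay are caller-supplied, not values this skill prescribes.

```bash
# Hypothetical helper (not part of this skill): run a command up to
# MAX_ATTEMPTS times, sleeping DELAY_SECONDS between failed attempts.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $attempt of $max_attempts failed" >&2
    # No point sleeping after the final attempt
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

A possible call is `retry 3 60 curl -fsS -o /dev/null https://service.example.com/health`. The `-f` flag matters here: plain `curl -I` exits 0 for any HTTP response, including a 500, while `-f` makes curl exit non-zero on HTTP error statuses so the loop can detect an unhealthy service.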
Step 4: Verify
After recovery, verify clean state:
Basic Verification
bash
# Clean working directory
git status
# Expected: "nothing to commit, working tree clean" or known changes

# Tests pass
pnpm test

# Build succeeds
pnpm build

# Types check
pnpm typecheck
Functionality Verification
bash
# Run the specific thing that was broken
pnpm test --grep "specific test"

# Or verify the feature manually
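When running the verification commands by hand, it is easy to stop at the first failure and miss the rest. A minimal sketch that runs every check and summarizes the results, assuming a hypothetical helper `run_checks` that takes each check as one quoted command line:

```bash
# Hypothetical helper (not part of this skill): run every check, print
# PASS or FAIL for each, and return non-zero if any check failed.
# Command output is discarded; rerun a failing check directly to see it.
# Usage: run_checks "COMMAND LINE" ["COMMAND LINE"...]
run_checks() {
  failed=0
  for check in "$@"; do
    if sh -c "$check" >/dev/null 2>&1; then
      echo "PASS: $check"
    else
      echo "FAIL: $check"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For the checks listed under Basic Verification, this could be called as `run_checks "pnpm test" "pnpm build" "pnpm typecheck"`.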
Step 5: Document
Issue Comment
bash
gh issue comment [ISSUE_NUMBER] --body "## Error Recovery

**Error encountered:** [Description]

**Severity:** Major

**Evidence:**
\`\`\`
[Error output]
\`\`\`

**Recovery actions:**
1. [Action 1]
2. [Action 2]

**Verification:**
- [x] Tests pass
- [x] Build succeeds

**Root cause:** [If known]

**Prevention:** [If applicable]
"
Knowledge Graph
javascript
// Store for future reference
mcp__memory__add_observations({
observations: [{
entityName: "Issue #[NUMBER]",
contents: [
"Encountered [error type] on [date]",
"Caused by: [root cause]",
"Resolved by: [recovery action]"
]
}]
});
Common Recovery Patterns
"Tests were passing, now failing"
bash
# What changed?
git diff HEAD~3

# Did dependencies change?
git diff HEAD~3 pnpm-lock.yaml

# Clean reinstall
rm -rf node_modules && pnpm install --frozen-lockfile
"Works locally, fails in CI"
bash
# Check for environment differences
# - Node version
# - OS differences
# - Env vars

# Run with CI-like settings
CI=true pnpm test
"Build was working, now broken"
bash
# Check TypeScript errors
pnpm typecheck

# Check for circular dependencies
pnpm dlx madge --circular src/

# Clean build
rm -rf dist && pnpm build
"I broke everything"
bash
# Don't panic

# Find last known good state
git log --oneline

# Reset to that state
git reset --hard [GOOD_COMMIT]

# Verify
pnpm test

# Start again more carefully
Escalation
If recovery fails after 2-3 attempts:
markdown
## Escalation: Unrecoverable Error

**Issue:** #[NUMBER]
**Error:** [Description]

**Recovery attempts:**
1. [Attempt 1] - [Result]
2. [Attempt 2] - [Result]

**Current state:** [Broken/Partially working]

**Evidence preserved:** [Links to logs, screenshots]

**Requesting help with:** [Specific question]
Mark issue as Blocked and await human input.
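The "try a few recoveries, then escalate" rule can be sketched as a loop over candidate recovery commands, verifying after each one. The helper name `recover_or_escalate` and its messages are hypothetical; the number of attempts is simply the number of recovery commands passed in.

```bash
# Hypothetical helper (not part of this skill): try each recovery
# command in order and run the verify command after it. Return 0 on the
# first verified recovery; return 1 when every attempt has failed,
# which is the signal to escalate.
# Usage: recover_or_escalate "VERIFY CMD" "RECOVERY CMD" ["RECOVERY CMD"...]
recover_or_escalate() {
  verify_cmd=$1
  shift
  attempt=0
  for recovery_cmd in "$@"; do
    attempt=$((attempt + 1))
    echo "Attempt $attempt: $recovery_cmd"
    if sh -c "$recovery_cmd" && sh -c "$verify_cmd"; then
      echo "Recovered after $attempt attempt(s)"
      return 0
    fi
  done
  echo "Escalate: $attempt recovery attempt(s) failed"
  return 1
}
```

The printed attempt lines can be pasted into the "Recovery attempts" section of the escalation template.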
Checklist
When an error occurs:
- [ ] Severity assessed
- [ ] Evidence preserved (logs, state, screenshots)
- [ ] Recovery action selected
- [ ] Recovery executed
- [ ] Clean state verified
- [ ] Tests pass
- [ ] Build succeeds
- [ ] Issue documented
Integration
This skill is called by:
- issue-driven-development - When errors occur
- ci-monitoring - CI failures
This skill may trigger:
- research-after-failure - If cause is unknown
- Issue update via issue-lifecycle