Integrator Debug Patterns
Shared patterns across all Integrator debug commands. Auto-loaded as context.
See also: @integrator-architecture, @arco-architecture
Interaction Model
CRITICAL -- All debug commands follow this collaborative model:
- Work in series, not in parallel -- tackle one step at a time, like a human debugging alongside the user.
- Explain as you go -- before each action, explain what you're about to do and why. After each result, share your interpretation and hypotheses.
- Keep the user in the loop at all times -- never run multiple investigative steps silently. Present findings, state your current hypothesis, and confirm direction before moving on.
- Show evidence -- CRITICAL: after every log fetch, DLQ peek, or API call, quote the relevant snippets (log lines, payloads, status codes, error messages, timestamps) directly in your response. Never summarize without showing the raw data that supports the conclusion. The user must see the evidence, not just the interpretation.
- Ask before branching -- if the investigation could go in multiple directions, present the options and let the user decide.
Self-Improvement Protocol
During debug sessions, new insights about architecture, log patterns, or debugging strategy will emerge.
When a new insight is discovered:
- Briefly state: "I learned X. Should I update the skill/command now or later?"
- If approved, edit the relevant skill or command file immediately
- Never update silently -- always ask first
What counts as a new insight:
- New API key name -> caller mapping
- New DLQ name -> flow mapping
- New log group or log pattern
- Architectural detail not yet documented (routing, auth, retries)
- New debugging strategy or shortcut
- Correction to existing documentation
Cross-Command Delegation
Debug commands can suggest running another command for deeper investigation:
| From | To | When |
|---|---|---|
| /debug-integrator-alarm | /debug-integrator-tid | After identifying a transactionId from alarm logs |
| /debug-integrator-dlq | /debug-integrator-tid | After extracting transactionId from DLQ message |
| /debug-integrator-orders | /debug-integrator-tid | After finding transactionId for a specific order |
| /debug-integrator-alarm | /debug-integrator-dlq | When alarm is DLQ-related |
Always suggest the delegation explicitly and let the user decide.
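Before suggesting /debug-integrator-tid, the transactionId has to be pulled out of the fetched logs or DLQ message. A minimal sketch, assuming the input is JSONL and carries a string field literally named `transactionId`; both the field layout and the `extract_tids` helper are illustrative assumptions, not part of the documented tools:

```shell
# Hypothetical helper: read JSONL log lines on stdin and print each distinct
# transactionId with its occurrence count, most frequent first.
# Assumes the field appears as "transactionId":"<value>" (whitespace around
# the colon is tolerated). Lines without the field are ignored.
extract_tids() {
  grep -oE '"transactionId"[[:space:]]*:[[:space:]]*"[^"]+"' \
    | sed -E 's/.*"([^"]+)"$/\1/' \
    | sort | uniq -c | sort -rn \
    | awk '{print $2 "\t" $1}'
}

# Usage (logs.jsonl is a placeholder file name):
#   extract_tids < logs.jsonl
```

The candidates can then be quoted to the user, who picks which transactionId to hand to the next command.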
Common Tools
| Tool | Purpose |
|---|---|
| aws-get-cloudwatch-logs | Fetch and paginate CloudWatch logs; omit --start-date for progressive mode (see @aws-tools) |
| aws-get-integrator-logs | Fetch all 6 Integrator log groups in parallel, merge by timestamp with __source labels (see @aws-tools) |
| aws-get-api-keys | List/filter API keys by suffix to identify callers (see @aws-tools) |
| aws-get-dlq-summary | DLQ attributes + peek at messages with identifier extraction (see @aws-tools) |
| jsonl-distribution-table.js | Group JSONL by specified --fields into a distribution table (see @aws-tools) |
| jsonl-merge-and-sort-by-field.js | Merge multiple JSONL files, sort by --sort-field (see @aws-tools) |
| gh api | Check recent deployments, PRs, and commits on GitHub |
| aws cloudwatch describe-alarms | List active CloudWatch alarms |
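As an illustration of the last row, the sketch below wraps `aws cloudwatch describe-alarms` so that only alarms currently in the ALARM state are listed. The `list_active_alarms` function name and the selected output columns are assumptions made for this example; the flags themselves are standard AWS CLI options.

```shell
# Hypothetical wrapper: list alarms that are currently firing.
# --state-value ALARM filters out OK and INSUFFICIENT_DATA alarms.
# --query trims the response to the fields most useful as evidence.
# Without --alarm-types, only metric alarms are returned.
list_active_alarms() {
  aws cloudwatch describe-alarms \
    --state-value ALARM \
    --query 'MetricAlarms[].{name:AlarmName,since:StateUpdatedTimestamp,reason:StateReason}' \
    --output table
}

# Usage (requires AWS credentials and region to be configured):
#   list_active_alarms
```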
Codebase Investigation
After gathering log evidence, leverage the Integrator codebase to deepen the investigation:
- Check the OpenAPI spec in docs/integrator/ for endpoint routing (HTTP_PROXY vs HTTP)
- Search for the endpoint handler, auth logic, and downstream client
- Trace the request flow through code to understand what could produce the observed error
- Cross-reference error messages from logs with error strings in code
- Identify configuration, env vars, or external dependencies involved
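Cross-referencing a logged error message with the code can be as simple as a fixed-string search. A minimal sketch, assuming the message appears verbatim in the source tree; the `find_error_source` helper and the excluded directory names are illustrative choices, not project conventions:

```shell
# Hypothetical helper: locate where a logged error message is defined in code.
# -F treats the message as a literal string, so characters such as ( . [ in
# log output are not interpreted as regex syntax.
# -rn searches recursively and prints file:line:content for each hit.
find_error_source() {
  local message="$1" root="${2:-.}"
  grep -rnF --exclude-dir=.git --exclude-dir=node_modules -- "$message" "$root"
}

# Usage (the message is a made-up example):
#   find_error_source 'Upstream timeout' .
```

If the full message is built dynamically (for example with an interpolated ID), searching for its static prefix is usually more productive than searching for the whole line.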
Smoke tests at scripts/smoke-tests/ can reproduce issues by calling endpoints via the API GW URL with an API key.
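For a one-off reproduction outside the smoke-test scripts, the same kind of call can be sketched with curl. This assumes the key is passed in the `x-api-key` header (the usual API Gateway convention) and that `API_GW_URL` and `API_KEY` are environment variables you set yourself; those variable names, the `smoke_call` helper, and the GET-only shape are assumptions for illustration, not a description of the existing smoke tests:

```shell
# Hypothetical helper: call one endpoint through API Gateway and print the
# response body followed by the HTTP status code, so both can be quoted
# as evidence.
# API_GW_URL: base URL of the API Gateway stage (assumed env var).
# API_KEY:    API key value for the caller being reproduced (assumed env var).
smoke_call() {
  local path="$1"
  : "${API_GW_URL:?set API_GW_URL to the API Gateway base URL}"
  : "${API_KEY:?set API_KEY to a valid API key}"
  curl -sS -X GET \
    -H "x-api-key: ${API_KEY}" \
    -w '\nHTTP %{http_code}\n' \
    "${API_GW_URL%/}${path}"
}

# Usage (the path is a placeholder):
#   API_GW_URL=<api gw url> API_KEY=<key> smoke_call /some/endpoint
```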
Deployment Check
Use gh to check if any deploy happened close to the first error timestamp:
gh api repos/arco-cv/arco2-integrator/deployments --paginate \
  -q '.[] | select(.created_at >= "<date>") | {created_at, environment, sha: .sha[0:7], description}'
- If a deploy happened shortly before the errors started: inspect the commit
- If the gap is large (hours): likely unrelated -- focus on external causes
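The gap check above can be made explicit with a small calculation. A minimal sketch, assuming GNU `date` (for the `-d` option) and ISO-8601 UTC timestamps such as the `created_at` values returned by the deployments API; the `deploy_gap` helper and the 30-minute default threshold are illustrative assumptions, not a documented rule:

```shell
# Hypothetical helper: classify the gap between a deploy and the first error.
# Arguments: deploy timestamp, first-error timestamp, optional threshold in
# minutes (defaults to 30, an arbitrary value chosen for this sketch).
# Requires GNU date for `date -d`.
deploy_gap() {
  local deploy_ts="$1" error_ts="$2" threshold_min="${3:-30}"
  local deploy_epoch error_epoch gap_s
  deploy_epoch=$(date -u -d "$deploy_ts" +%s) || return 1
  error_epoch=$(date -u -d "$error_ts" +%s) || return 1
  gap_s=$(( error_epoch - deploy_epoch ))
  if [ "$gap_s" -lt 0 ]; then
    # The deploy came after the errors began, so it cannot have caused them.
    echo "deploy-after-first-error"
  elif [ "$gap_s" -le $(( threshold_min * 60 )) ]; then
    echo "suspect gap=$(( gap_s / 60 ))m"
  else
    echo "likely-unrelated gap=$(( gap_s / 60 ))m"
  fi
}

# Usage:
#   deploy_gap 2024-05-01T10:00:00Z 2024-05-01T10:12:00Z
```

The classification is only a prompt for discussion: a "suspect" result still needs the commit inspected, and the threshold should be agreed with the user rather than assumed.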
Findings Summary Template
After investigation, summarize:
- Key evidence (quoted log lines, payloads, status codes that support the conclusion)
- Root cause (if identifiable from logs + code)
- Timeline of events (with timestamps from actual logs)
- Affected transactions/documents
- Relevant code paths and configuration
- Current hypothesis and confidence level
- Suggested next steps