Build Debugging
Analyze failed Buildkite builds to identify root causes and provide actionable fixes.
When to use
- •"Why did build X fail?"
- •"Debug this build"
- •"What's wrong with my CI?"
- •"Fix this build failure"
- •"Help me understand this error"
- •
/buildkite:debug
Available MCP Tools
| Tool | Purpose |
|---|---|
get_build | Fetch build details including all jobs and their states |
read_logs | Get full log output for a specific job |
search_logs | Search for patterns within job logs |
tail_logs | Show last N log entries |
get_build_test_engine_runs | Get Test Engine results for the build |
get_failed_executions | Get details of failed tests |
list_artifacts_for_build | List uploaded artifacts |
get_artifact | Download a specific artifact |
list_annotations | List build annotations |
Input Parsing
Parse build information from $ARGUMENTS or the user's message:
| Input Format | Example |
|---|---|
| Full URL | https://buildkite.com/org/pipeline/builds/123 |
| Build number | 123 |
| Pipeline + build | my-pipeline#123 or my-pipeline 123 |
| Description | "the latest failed build on main" |
If no build specified, ask the user which build to debug.
Approach
- •
Fetch the build with
buildkite_get_build- •Note the overall state, branch, commit, and message
- •Check if this is a retry or first attempt
- •
Identify failed jobs in the jobs array
- •Look for
state: "failed"orstate: "timed_out" - •Note job names/labels to understand what failed
- •Check job exit codes
- •Look for
- •
Read logs with
buildkite_read_logsfor failed jobs- •Focus on the last 50-100 lines where failures surface
- •Look for the FIRST error, not just the last (cascading failures are common)
- •
Check test results if applicable
- •Use
buildkite_get_build_test_engine_runsfor Test Engine data - •Use
buildkite_get_failed_test_executionsfor failure details
- •Use
- •
Review artifacts for additional context
- •Test reports, coverage data, debug outputs
Common Failure Patterns
Exit Codes
| Code | Meaning | Action |
|---|---|---|
| 1 | General error | Check command output |
| 127 | Command not found | Missing dependency or PATH issue |
| 137 | OOM killed (128+9) | Increase memory or optimize |
| 143 | SIGTERM (128+15) | Timeout or cancelled |
Test Failures
- •Flaky tests: Check if same test passed on retry
- •Environment differences: Compare agent tags, env vars
- •Timing issues: Race conditions or async problems
Infrastructure Issues
- •Agent disconnected: Network or agent health
- •Timeout: Job exceeded
timeout_in_minutes - •No agents: Check queue and agent tags
Response Format
- •Summary: One-line description of what failed
- •Root Cause: What actually caused the failure
- •Evidence: Relevant log snippets (use code blocks)
- •Recommendation: Specific steps to fix
- •Prevention: How to avoid this in future (if applicable)
Example Interaction
text
User: Why did build 456 fail? 1. Fetch build 456 with buildkite_get_build 2. Find failed job: "Run Tests" with exit code 1 3. Read logs, find: "Error: Cannot find module 'lodash'" 4. Respond with root cause (missing dependency) and fix (add to package.json)