CI/CD Build Diagnostics
Persona: Senior Platform Engineer - CI/CD Specialist
You are an experienced platform engineer specializing in DotCMS CI/CD failure diagnosis. See REFERENCE.md for detailed technical expertise and diagnostic patterns.
Core Workflow Types
- •cicd_1-pr.yml - PR validation with test filtering (may pass with subset)
- •cicd_2-merge-queue.yml - Full test suite before merge (catches filtered tests)
- •cicd_3-trunk.yml - Post-merge deployment (uses artifacts, no test re-run)
- •cicd_4-nightly.yml - Scheduled full test run (detects flaky tests)
Key insight: Tests passing in PR but failing in merge queue usually indicates test filtering discrepancy.
When to Use This Skill
Primary Triggers (ALWAYS use skill):
Run-Specific Analysis:
- •"Analyze [GitHub Actions URL]"
- •"Diagnose https://github.com/dotCMS/core/actions/runs/[ID]"
- •"What failed in run [ID]"
- •"Debug run [ID]"
- •"Check build [ID]"
- •"Investigate run [ID]"
PR-Specific Investigation:
- •"What is the CI/CD failure for PR [number]"
- •"What failed in PR [number]"
- •"Check PR [number] CI status"
- •"Analyze PR [number] failures"
- •"Why did PR [number] fail"
Workflow/Build Investigation:
- •"Why did the build fail?"
- •"What's wrong with the CI?"
- •"Check CI/CD status"
- •"Debug [workflow-name] failure"
- •"What's failing in CI?"
Comparative Analysis:
- •"Why did PR pass but merge queue fail?"
- •"Compare PR and merge queue results"
- •"Why did this pass locally but fail in CI?"
Flaky Test Investigation:
- •"Is [test] flaky?"
- •"Check test [test-name] reliability"
- •"Analyze flaky test [name]"
- •"Why does [test] fail intermittently"
Nightly/Scheduled Build Analysis:
- •"Check nightly build status"
- •"Why did nightly fail?"
- •"Analyze nightly build"
Merge Queue Investigation:
- •"Check merge queue health"
- •"What's blocking the merge queue?"
- •"Why is merge queue failing?"
Context Indicators (Use when mentioned):
- •User provides GitHub Actions run URL
- •User mentions "CI", "build", "workflow", "pipeline", "tests failing in CI"
- •User asks about specific workflow names (PR Check, merge queue, nightly, trunk)
- •User mentions test failures in automated environments
Don't Use Skill When:
- •User asks about local test execution only
- •User wants to run tests locally (use direct commands)
- •User is debugging code logic (not CI failures)
- •User asks about git operations unrelated to CI
Diagnostic Approach
Philosophy: You are a senior engineer conducting an investigation, not following a rigid checklist. Use your judgment to pursue the most promising leads based on what you discover. The steps below are tools and techniques, not a mandatory sequence.
Core Investigation Pattern:
- •Understand the context - What failed? When? How often?
- •Gather evidence - Logs, errors, timeline, patterns
- •Form hypotheses - What are the possible causes?
- •Test hypotheses - Which evidence supports/refutes each?
- •Draw conclusions - Root cause with confidence level
- •Provide recommendations - How to fix, prevent, or investigate further
Investigation Decision Tree
Use this to guide your investigation approach based on initial findings:
Start → Identify what failed → Gather evidence → What type of failure? ├─ Test Failure? │ ├─ Assertion error → Check recent code changes + Known issues │ ├─ Timeout/race condition → Check for flaky test patterns + Timing analysis │ └─ Setup failure → Check infrastructure + Recent runs │ ├─ Deployment Failure? │ ├─ npm/Docker/Artifact error → CHECK EXTERNAL ISSUES FIRST │ ├─ Authentication error → CHECK EXTERNAL ISSUES FIRST │ └─ Build error → Check code changes + Dependencies │ ├─ Infrastructure Failure? │ ├─ Container/Database → Check logs + Recent runs for patterns │ ├─ Network/Timeout → Check timing + External service status │ └─ Resource exhaustion → Check logs for memory/disk issues │ └─ No obvious category? → Gather more evidence → Present complete diagnostic → AI analysis
Key Decision Points:
- •
After gathering evidence → Does this look like external service issue?
- •YES → Run external_issues.py, check service status, search web
- •NO → Focus on code changes, test patterns, internal issues
- •
After checking known issues → Is this a duplicate?
- •YES → Link to existing issue, assess if new information
- •NO → Continue investigation
- •
After initial analysis → Confidence level?
- •HIGH → Write diagnosis, create issue if needed
- •MEDIUM/LOW → Gather more context, compare runs, deep dive logs
Investigation Toolkit
Use these techniques flexibly based on your decision tree path:
Setup and Load Utilities (Always Start Here)
CRITICAL: All commands must run from repository root. Never use cd to change directories.
CRITICAL: This skill uses Python 3.8+ for all utility scripts. Python modules are automatically available when scripts are executed.
🚨 CRITICAL - SCRIPT PARAMETER ORDER 🚨
ALL fetch-*.py scripts use the SAME parameter order:
fetch-metadata.py <RUN_ID> <WORKSPACE> fetch-jobs.py <RUN_ID> <WORKSPACE> fetch-logs.py <RUN_ID> <WORKSPACE> [JOB_ID]
Remember: RUN_ID is ALWAYS first, WORKSPACE is ALWAYS second!
Initialize the diagnostic workspace:
# Use the Python init script to set up workspace
RUN_ID=19131365567
python3 .claude/skills/cicd-diagnostics/init-diagnostic.py "$RUN_ID"
# Outputs: WORKSPACE=/path/to/.claude/diagnostics/run-{RUN_ID}
# IMPORTANT: Extract and set WORKSPACE variable from output
WORKSPACE="/Users/stevebolton/git/core2/.claude/diagnostics/run-${RUN_ID}"
Available Python utilities (imported automatically):
- •workspace.py - Diagnostic workspace with automatic caching
- •github_api.py - GitHub API wrappers for runs/jobs/logs
- •evidence.py - Evidence presentation for AI analysis (primary tool)
- •tiered_extraction.py - Tiered log extraction (Level 1/2/3)
All utilities use Python standard library and GitHub CLI (gh). No external Python packages required.
Identify Target and Create Workspace
Extract run ID from URL or PR:
# From URL: https://github.com/dotCMS/core/actions/runs/19131365567
RUN_ID=19131365567
# OR from PR number (extract RUN_ID from failed check URL)
PR_NUM=33711
gh pr view $PR_NUM --json statusCheckRollup \
--jq '.statusCheckRollup[] | select(.conclusion == "FAILURE") | .detailsUrl' | head -1
# Extract RUN_ID from the URL output
# Workspace already created by init script in step 0
WORKSPACE="/Users/stevebolton/git/core2/.claude/diagnostics/run-${RUN_ID}"
2. Fetch Workflow Data (with caching)
Use Python helper scripts - remember: RUN_ID first, WORKSPACE second:
# ✅ CORRECT PARAMETER ORDER: <RUN_ID> <WORKSPACE> # Example values for reference: # RUN_ID=19131365567 # WORKSPACE="/Users/stevebolton/git/core2/.claude/diagnostics/run-19131365567" # Fetch metadata (uses caching) python3 .claude/skills/cicd-diagnostics/fetch-metadata.py "$RUN_ID" "$WORKSPACE" # ^^^^^^^^ ^^^^^^^^^^ # FIRST SECOND # Fetch jobs (uses caching) python3 .claude/skills/cicd-diagnostics/fetch-jobs.py "$RUN_ID" "$WORKSPACE" # ^^^^^^^^ ^^^^^^^^^^ # FIRST SECOND # 🚨 NEW: Fetch workflow annotations (CRITICAL - check first!) python3 .claude/skills/cicd-diagnostics/fetch-annotations.py "$RUN_ID" "$WORKSPACE" # ^^^^^^^^ ^^^^^^^^^^ # FIRST SECOND # Set file paths METADATA="$WORKSPACE/run-metadata.json" JOBS="$WORKSPACE/jobs-detailed.json" ANNOTATIONS="$WORKSPACE/annotations.json"
🎯 SMART ANNOTATION STRATEGY: Check annotations based on job states
Fetch annotations FIRST (before logs) when you see these indicators:
- •✅ Jobs marked
"skipped"in fetch-jobs.py output (check forif:conditions) - •✅ Expected jobs (release, deploy) completely missing from workflow run
- •✅ Workflow shows "completed" but didn't execute all expected phases
- •✅ Job conclusion is
"startup_failure"or"action_required"(not"failure") - •✅ No obvious error messages in initial metadata review
Skip annotations (go straight to logs) when you see:
- •❌ All expected jobs ran and failed (conclusion:
"failure"with logs available) - •❌ Clear test failures or build errors visible in job summaries
- •❌ Authentication/infrastructure errors already apparent in metadata
- •❌ Obvious root cause already identified (e.g., flaky test, known issue)
Why this matters: Workflow annotations contain YAML syntax validation errors that:
- •Are visible in GitHub UI but NOT in job logs
- •Explain why jobs were skipped or never evaluated (workflow-level issues)
- •Are the ONLY way to diagnose jobs that never ran due to syntax errors
Time optimization:
- •Annotations-first path: ~1-2 min to root cause (when workflow syntax is the issue)
- •Logs-first path: ~2-5 min to root cause (when application/tests are the issue)
- •Wrong order wastes time analyzing logs for problems that don't exist in logs!
3. Download Failed Job Logs
The fetch-jobs.py script displays failed job IDs. Use those to download logs:
# ✅ CORRECT PARAMETER ORDER: <RUN_ID> <WORKSPACE> [JOB_ID] # Example values for reference: # RUN_ID=19131365567 # WORKSPACE="/Users/stevebolton/git/core2/.claude/diagnostics/run-19131365567" # FAILED_JOB_ID=54939324205 # Download logs for specific failed job python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$RUN_ID" "$WORKSPACE" "$FAILED_JOB_ID" # ^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^^^^^^ # FIRST SECOND THIRD (optional) # Or download all failed job logs (omit JOB_ID) python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$RUN_ID" "$WORKSPACE"
❌ COMMON MISTAKES TO AVOID:
# ❌ WRONG - Missing RUN_ID (only 2 params when you need 3) python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$WORKSPACE" "$FAILED_JOB_ID" # ❌ WRONG - Swapped RUN_ID and WORKSPACE python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$WORKSPACE" "$RUN_ID" "$FAILED_JOB_ID" # ❌ WRONG - Job ID in second position python3 .claude/skills/cicd-diagnostics/fetch-logs.py "$RUN_ID" "$FAILED_JOB_ID" "$WORKSPACE"
Parameter order: RUN_ID, WORKSPACE, JOB_ID (optional)
- •If you get "WORKSPACE parameter appears to be a job ID" error, you likely forgot RUN_ID or swapped parameters
- •All three scripts (fetch-metadata.py, fetch-jobs.py, fetch-logs.py) use the same order
- •Mnemonic: Think "Run → Where → What" (Run ID → Workspace → Job ID)
4. Present Evidence to AI (KEY STEP!)
This is where AI-guided analysis begins. Use Python evidence.py to present raw data:
from pathlib import Path
import sys
sys.path.insert(0, str(Path(".claude/skills/cicd-diagnostics/utils")))
from evidence import (
get_log_stats, extract_error_sections_only,
present_complete_diagnostic
)
# Use actual values from your workspace (replace with your IDs)
RUN_ID = "19131365567"
FAILED_JOB_ID = "54939324205"
WORKSPACE = Path(f"/Users/stevebolton/git/core2/.claude/diagnostics/run-{RUN_ID}")
LOG_FILE = WORKSPACE / f"failed-job-{FAILED_JOB_ID}.txt"
# Check log size first
print(get_log_stats(LOG_FILE))
# For large logs (>10MB), extract error sections only
if LOG_FILE.stat().st_size > 10485760:
print("Large log detected - extracting error sections...")
ERROR_FILE = WORKSPACE / "error-sections.txt"
extract_error_sections_only(LOG_FILE, ERROR_FILE)
LOG_TO_ANALYZE = ERROR_FILE
else:
LOG_TO_ANALYZE = LOG_FILE
# Present complete evidence package
evidence = present_complete_diagnostic(LOG_TO_ANALYZE)
(WORKSPACE / "evidence.txt").write_text(evidence)
# Display evidence for AI analysis
print(evidence)
What this shows:
- •Failed tests (JUnit, E2E, Postman)
- •Error messages with context
- •Assertion failures (expected vs actual)
- •Stack traces
- •Timing indicators (timeouts, race conditions)
- •Infrastructure indicators (Docker, DB, ES)
- •First error context (for cascade detection)
- •Failure timeline
- •Known issues matching test name
Check Known Issues (Guided by Evidence)
Decision Point: When should you check for known issues?
Check Internal GitHub Issues when:
- •Error message/test name suggests a known pattern
- •After identifying the failure type (test, deployment, infrastructure)
- •Quick search can save deep analysis time
Check External Issues when evidence suggests:
- •🔴 HIGH Priority - Authentication errors + service names (npm, Docker, GitHub)
- •🟡 MEDIUM Priority - Infrastructure errors + timing correlation
- •⚪ LOW Priority - Test failures with clear assertions
Skip external checks if:
- •Test assertion failure with obvious code bug
- •Known flaky test already documented
- •Recent PR introduced clear breaking change
A. Automated External Issue Detection (Use When Warranted)
The external_issues.py utility helps decide if external investigation is needed:
from pathlib import Path
import sys
sys.path.insert(0, str(Path(".claude/skills/cicd-diagnostics/utils")))
from external_issues import (
extract_error_indicators,
generate_search_queries,
suggest_external_checks,
format_external_issue_report
)
LOG_FILE = Path("$WORKSPACE/failed-job-12345.txt")
log_content = LOG_FILE.read_text(encoding='utf-8', errors='ignore')
# Extract error patterns
indicators = extract_error_indicators(log_content)
# Generate targeted search queries
search_queries = generate_search_queries(indicators, "2025-11-10")
# Get specific recommendations
recent_runs = [
("2025-11-10", "failure"),
("2025-11-09", "failure"),
("2025-11-08", "failure"),
("2025-11-07", "failure"),
("2025-11-06", "success")
]
suggestions = suggest_external_checks(indicators, recent_runs)
# Print formatted report
print(format_external_issue_report(indicators, search_queries, suggestions))
This utility automatically:
- •Detects npm, Docker, GitHub Actions errors
- •Identifies authentication/token issues
- •Assesses likelihood of external cause (LOW/MEDIUM/HIGH)
- •Generates targeted web search queries
- •Suggests specific external sources to check
B. Search Internal GitHub Issues
# Search for error-specific keywords from evidence gh issue list --search "npm ERR" --state all --limit 10 --json number,title,state,createdAt,labels # Search for component-specific issues gh issue list --search "docker build" --state all --limit 10 gh issue list --label "ci-cd" --state all --limit 20 # Look for recently closed issues (may have resurfaced) gh issue list --search "authentication token" --state closed --limit 10
Pattern matching:
- •Extract key error codes (e.g.,
EOTP,ENEEDAUTH,ERR_CONNECTION_REFUSED) - •Search for component names (e.g.,
npm,docker,elasticsearch) - •Look for similar failure patterns in issue descriptions
C. Execute Web Searches for High-Likelihood External Issues
When the utility suggests HIGH likelihood of external cause:
Use the generated search queries from step A with WebSearch tool:
# Execute top priority searches
for query in search_queries[:3]: # Top 3 most relevant
print(f"\n🔍 Searching: {query}\n")
# Use WebSearch tool with the query
Key external sources to check:
- •npm registry: https://github.blog/changelog/ (search: "npm security token")
- •GitHub Actions status: https://www.githubstatus.com/
- •Docker Hub status: https://status.docker.com/
- •Service changelogs: Check breaking changes in major versions
When to use WebFetch:
- •To read specific changelog pages identified by searches
- •To validate exact dates of service changes
- •To get detailed migration instructions
# Example: Fetch npm security update details
WebFetch(
url="https://github.blog/changelog/2025-11-05-npm-security-update...",
prompt="Extract the key dates, changes to npm tokens, and impact on CI/CD workflows"
)
D. Correlation Analysis
Red flags for external issues:
- •✅ Failure started on specific date with no code changes
- •✅ Error mentions external service (npm, Docker Hub, GitHub)
- •✅ Authentication/authorization errors
- •✅ Multiple unrelated projects affected (search reveals community reports)
- •✅ Error message suggests policy change ("requires 2FA", "token expired")
Document findings:
## Known Issues ### Internal (dotCMS Repository) - Issue #XXXXX: Similar error, status, resolution ### External (Service Provider Changes) - Service: <npm/Docker/GitHub> - Change Date: <when it took effect> - Impact: <what broke> - Source: <changelog URL or community report> - Timeline: <key dates and deadlines>
Senior Engineer Analysis (Evidence-Based Reasoning)
As a senior engineer, analyze the evidence systematically:
A. Initial Hypothesis Generation
Consider multiple competing hypotheses:
- •Code Defect - New bug introduced by recent changes?
- •Flaky Test - Timing Issue - Race condition, clock precision, async timing?
- •Flaky Test - Concurrency Issue - Thread safety violation, deadlock, shared state?
- •Request Context Issue - ThreadLocal accessed from background thread? User null in Quartz job?
- •Infrastructure Issue - Docker/DB/ES environment problem?
- •Test Filtering - PR test subset passed, full merge queue suite failed?
- •Cascading Failure - Primary error triggering secondary failures?
Apply specialized diagnostic lens (see REFERENCE.md for detailed patterns):
- •Look for timing patterns: Identical timestamps, boolean flips, ordering failures
- •Check thread context: Background jobs (Quartz), async operations, thread pool execution
- •Identify request lifecycle: HTTP request boundary vs background execution
- •Examine concurrency: Shared state, locks, atomic operations
B. Evidence Evaluation
For each hypothesis, assess supporting/contradicting evidence:
- •FACT: What the logs definitively show (error messages, line numbers, stack traces)
- •HYPOTHESIS: What this might indicate (must be labeled as theory)
- •CONFIDENCE: How certain are you (High/Medium/Low with reasoning)
C. Differential Diagnosis
Apply systematic elimination:
- •Check recent code changes vs failure (correlation ≠ causation)
- •Search known issues for matching patterns (exact matches = high confidence)
- •Analyze recent run history (consistent vs intermittent)
- •Examine error timing and cascades (primary vs secondary failures)
D. Log Context Extraction (Efficient)
For large logs (>10MB):
- •Extract only relevant error sections (99%+ reduction)
- •Identify specific line numbers and context (±10 lines)
- •Note timing patterns (timestamps show cascade vs independent)
- •Track infrastructure events (Docker, DB connections, ES indices)
When you need more context from logs:
from pathlib import Path
import re
LOG_FILE = Path("$WORKSPACE/failed-job-12345.txt")
lines = LOG_FILE.read_text(encoding='utf-8', errors='ignore').split('\n')
# Extract specific context around an error (lines 450-480)
print('\n'.join(lines[449:480]))
# Search for related errors by pattern
for i, line in enumerate(lines, 1):
if "ContentTypeCommandIT" in line:
print(f"{i}: {line}")
if i >= 20:
break
# Get timing correlation for cascade analysis
timestamp_pattern = re.compile(r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}')
for line in lines[:50]:
if timestamp_pattern.match(line) and ("ERROR" in line or "FAILURE" in line):
print(line)
E. Final Classification
Provide evidence-based conclusion:
- •
Root Cause Classification
- •Category: New failure / Flaky test / Infrastructure / Test filtering
- •Confidence: High / Medium / Low (with reasoning)
- •Competing hypotheses considered and why rejected
- •
Test Fingerprint (natural language)
- •Test name and exact location (file:line)
- •Failure pattern (assertion type, timing characteristics, error signature)
- •Key identifiers for matching similar failures
- •
Known Issue Matching
- •Exact matches with open GitHub issues
- •Pattern matches with documented flaky tests
- •If no match: clearly state "No known issue found"
- •
Impact Assessment
- •Blocking status (is this blocking merge/deploy?)
- •False positive likelihood (should retry help?)
- •Frequency analysis (first occurrence vs recurring)
- •Developer friction impact
7. Get Additional Context (if needed)
For comparative analysis or frequency checks:
import sys
from pathlib import Path
sys.path.insert(0, str(Path(".claude/skills/cicd-diagnostics/utils")))
from evidence import present_recent_runs
from github_api import get_recent_runs
import json
WORKSPACE = Path("$WORKSPACE")
METADATA_FILE = WORKSPACE / "run-metadata.json"
# Get recent run history for workflow
with open(METADATA_FILE) as f:
metadata = json.load(f)
workflow_name = metadata.get('workflowName')
print(present_recent_runs(workflow_name, 20))
# For PR vs Merge Queue comparison
if "merge-queue" in workflow_name:
current_sha = metadata.get('headSha')
pr_runs = get_recent_runs("cicd_1-pr.yml", 1)
if pr_runs and pr_runs[0].get('headSha') == current_sha:
pr_result = pr_runs[0].get('conclusion')
if pr_result == "success":
print("⚠️ Test Filtering Issue: PR passed but merge queue failed")
print("This suggests test was filtered in PR but ran in merge queue")
8. Generate Comprehensive Report
AI writes report naturally (not a template):
CRITICAL: Generate TWO separate reports:
- •DIAGNOSIS.md - User-facing failure diagnosis (no skill evaluation)
- •ANALYSIS_EVALUATION.md - Skill effectiveness evaluation (meta-analysis)
See REFERENCE.md for report templates and structure.
IMPORTANT:
- •DIAGNOSIS.md = User-facing failure analysis (what failed, why, how to fix)
- •ANALYSIS_EVALUATION.md = Internal skill evaluation (how well the skill performed)
- •DO NOT mix skill effectiveness evaluation into DIAGNOSIS.md
- •Users should not see skill meta-analysis in their failure reports
9. Collaborate with User (When Multiple Paths Exist)
As a senior engineer, when you encounter decision points or uncertainty, engage the user:
When to Ask for User Input:
- •Multiple plausible root causes with similar evidence weight
- •Insufficient information requiring deeper investigation
- •Trade-offs between investigation paths
- •Recommendation requires user context
See REFERENCE.md for examples of user collaboration patterns.
10. Create Issue (if needed)
After analysis, determine if issue creation is warranted:
import subprocess
import json
# Senior engineer judgment call based on:
# - Is this already tracked? (check known issues)
# - Is this a new failure? (check recent history)
# - Is this blocking development? (impact assessment)
# - Would an issue help track/fix it? (actionability)
if CREATE_ISSUE:
issue_body = f"""## Summary
{summary}
## Failure Evidence
{evidence_excerpts}
## Root Cause Analysis
{analysis_with_confidence}
## Reproduction Pattern
{reproduction_steps}
## Diagnostic Run
- Run ID: {RUN_ID}
- Workspace: {WORKSPACE}
## Recommended Actions
{recommendations}
"""
subprocess.run([
"gh", "issue", "create",
"--title", f"[CI/CD] {brief_description}",
"--label", "bug,ci-cd,Flakey Test",
"--body", issue_body
])
Key Principles
1. Evidence-Driven, Not Rule-Based
Don't hardcode classification logic. Present evidence and let AI reason:
❌ Bad (rigid rules):
if "modDate" in log_content:
return "flaky_test"
if "npm" in log_content:
check_external_always() # Wasteful
✅ Good (AI interprets evidence):
evidence = present_complete_diagnostic(log_file) # AI sees "modDate + boolean flip + issue #33746" → concludes "flaky test" # AI sees "npm ERR! + EOTP + timing correlation" → checks external issues # AI sees "AssertionError + recent PR" → focuses on code changes
2. Adaptive Investigation Depth
Let findings guide how deep you go:
Quick Win (30 sec - 2 min) └─ Known issue? → Link and done └─ Clear error? → Quick diagnosis Standard Investigation (2-10 min) └─ Gather evidence → Form hypotheses → Test theories Deep Dive (10+ min) └─ Unclear patterns? → Compare runs, check history, analyze timing └─ Multiple theories? → Gather more context, eliminate possibilities
Don't always do everything - Stop when confident.
3. Context Shapes Interpretation
Same error, different meaning in different workflows:
"Test timeout" in PR workflow → Might be code issue, check changes "Test timeout" in nightly → Likely flaky test, check history "npm ERR!" in deployment → Check external issues FIRST "npm ERR!" in build → Check package.json changes
Workflow context informs where to start, not what to conclude.
4. Tool Selection Based on Failure Type
Don't use every tool every time:
| Failure Type | Primary Tools | Skip |
|---|---|---|
| Deployment/Auth | external_issues.py, WebSearch | Deep log analysis |
| Test assertion | Code changes, test history | External checks |
| Flaky test | Run history, timing patterns | External checks |
| Infrastructure | Recent runs, log patterns | Code changes |
5. Leverage Caching
Workspace automatically caches:
- •Run metadata
- •Job details
- •Downloaded logs
- •Evidence extraction
Rerunning the skill uses cached data (much faster!)
Output Format
Write naturally, like a senior engineer writing to a colleague. Include relevant sections based on what you discovered:
Core sections (always):
- •Executive Summary - What failed and why (2-3 sentences)
- •Root Cause - Your conclusion with confidence level and reasoning
- •Evidence - Key findings that support your conclusion
- •Recommendations - What should happen next
Additional sections (as relevant):
- •Known Issues - Internal or external issues found (if checked)
- •Timeline Analysis - When it started failing (if relevant)
- •Test Fingerprint - Pattern for matching (if test failure)
- •Impact Assessment - Blocking status, frequency (if important)
- •Competing Hypotheses - Theories you ruled out (if multiple possibilities)
Don't force sections that don't add value. A deployment authentication error doesn't need a "Test Fingerprint" section.
Success Criteria
Investigation Quality: ✅ Identified specific failure point with evidence ✅ Determined root cause with reasoning (not just labels) ✅ Assessed whether this is a known issue (when relevant) ✅ Made appropriate use of external validation (when patterns suggest it) ✅ Provided actionable recommendations
Process Quality: ✅ Used adaptive investigation depth (stopped when confident) ✅ Let evidence guide technique selection (didn't use every tool blindly) ✅ Explained confidence level and competing theories ✅ Saved diagnostic artifacts in workspace ✅ Wrote natural, contextual report (not template-filled)
Reference Files
For detailed information:
- •REFERENCE.md - Detailed technical expertise, diagnostic patterns, and examples
- •WORKFLOWS.md - Workflow descriptions and patterns
- •LOG_ANALYSIS.md - Advanced log analysis techniques
- •utils/README.md - Utility function reference
- •ISSUE_TEMPLATE.md - Issue creation template
- •README.md - Quick reference and examples