AgentSkillsCN

eia-ci-failure-patterns

在诊断CI/CD失败时使用。可通过CI失败日志或流水线错误来触发。

SKILL.md
--- frontmatter
name: eia-ci-failure-patterns
description: "Use when diagnosing CI/CD failures. Trigger with CI failure logs or pipeline errors."
license: Apache-2.0
compatibility: Requires AI Maestro installed.
metadata:
  version: 1.0.0
  author: integrator-agent
  tags: "ci, cd, github-actions, debugging, cross-platform, devops"
  platforms: "linux, macos, windows"
  languages: "python, javascript, typescript, rust, go, bash, powershell"
agent: debug-specialist
context: fork

CI Failure Patterns Skill

Overview

This skill teaches you how to systematically diagnose and fix Continuous Integration (CI) failures. CI pipelines fail for predictable reasons that fall into identifiable pattern categories. By recognizing these patterns, you can quickly identify root causes and apply proven fixes.

Prerequisites

Before using this skill, ensure:

  1. Access to CI/CD logs from the failed pipeline
  2. python3 available in PATH for running diagnostic scripts
  3. Repository access to view workflow files

Instructions

Follow these steps to diagnose and fix CI failures:

  1. Collect the CI failure log from the GitHub Actions workflow run
  2. Run the diagnostic script: python scripts/eia_diagnose_ci_failure.py --log-file ci.log
  3. If the script identifies a pattern, read the corresponding reference document for the fix
  4. If the script doesn't identify the pattern, follow the Diagnosis Decision Tree to determine the failure category
  5. Read the appropriate reference document for detailed diagnosis and fix instructions
  6. Apply the recommended fix to your code or workflow configuration
  7. Verify the fix locally (when possible) before pushing to CI
  8. Push the changes and monitor the CI run to confirm the fix resolved the issue

Checklist

Copy this checklist and track your progress:

  • Collect CI failure log from GitHub Actions workflow run
  • Run diagnostic script: python scripts/eia_diagnose_ci_failure.py --log-file ci.log
  • Identify failure pattern (from script output or Decision Tree)
  • Read the corresponding reference document for the identified pattern
  • Apply the recommended fix to code or workflow configuration
  • Verify the fix locally before pushing (if possible)
  • Push changes to trigger new CI run
  • Monitor CI run and confirm the fix resolved the issue

Use this skill when:

  • A CI workflow fails and you need to diagnose the cause
  • Tests pass locally but fail in CI
  • You see platform-specific errors (Windows vs Linux vs macOS)
  • Exit codes indicate failure but the error message is unclear
  • Dependency installation fails in CI but works locally
  • GitHub Actions infrastructure issues occur (labels, permissions, runners)

Output

This skill produces the following outputs:

Output TypeDescription
Diagnostic reportJSON or text report from eia_diagnose_ci_failure.py identifying failure patterns and suggested fixes
Platform issue scanJSON or text report from eia_detect_platform_issue.py listing platform-specific code patterns that may cause CI failures
Fix recommendationsStep-by-step instructions from reference documents for resolving identified failure patterns
Verification stepsCommands and procedures to verify fixes locally before pushing to CI

Failure Pattern Categories

CI failures fall into six main categories:

CategoryDescriptionReference Document
Cross-PlatformOS differences causing failurescross-platform-patterns.md
Exit CodesShell/script exit code handlingexit-code-patterns.md
SyntaxShell syntax, heredocs, quotingsyntax-patterns.md
DependenciesImport paths, missing packagesdependency-patterns.md
InfrastructureGitHub runners, labels, permissionsgithub-infrastructure-patterns.md
Language-SpecificPython, JS, Rust, Go peculiaritieslanguage-specific-patterns.md
Bot CategoriesPR author classification for automationbot-categories.md
Claude PR HandlingWorkflow for Claude Code Action PRsclaude-pr-handling.md

Diagnosis Decision Tree

Follow this decision tree to identify the failure category:

code
START: CI Failure Occurred
│
├─► Does error mention path or file not found?
│   ├─► YES → Check Cross-Platform Patterns (temp paths, separators, case sensitivity)
│   └─► NO ↓
│
├─► Does error show non-zero exit code?
│   ├─► YES → Check Exit Code Patterns (persistence, tool-specific codes)
│   └─► NO ↓
│
├─► Does error mention syntax error or unexpected token?
│   ├─► YES → Check Syntax Patterns (heredocs, quoting, line endings)
│   └─► NO ↓
│
├─► Does error mention import/require/module not found?
│   ├─► YES → Check Dependency Patterns (import resolution, versions)
│   └─► NO ↓
│
├─► Does error mention GitHub-specific resources (labels, runners)?
│   ├─► YES → Check Infrastructure Patterns (labels, permissions, runners)
│   └─► NO ↓
│
└─► Check Language-Specific Patterns for the failing language

Quick Reference: Most Common Patterns

PatternSymptomQuick Fix
Temp path differences"File not found" on temp filesUse tempfile.gettempdir() (Python) or os.tmpdir() (JS)
Path separator\ fails on LinuxUse path.join() or normalize paths
Exit code persistencePowerShell shows wrong exit codeAdd explicit exit 0 at script end
Heredoc terminator"Unexpected end of file"Ensure terminator at column 0, no trailing spaces
Missing labels"Label X not found"Create labels via API before workflow runs
Module import"ModuleNotFoundError"Check relative imports, PYTHONPATH, package structure

Reference Documents with Table of Contents

Cross-Platform Patterns

File: references/cross-platform-patterns.md

Contents:

  • 1.1 Temporary Path Differences
    • 1.1.1 Windows temp path handling ($env:TEMP)
    • 1.1.2 Linux/macOS temp path handling (/tmp, $TMPDIR)
    • 1.1.3 Language-specific solutions (Python, JavaScript, Bash, PowerShell)
  • 1.2 Path Separator Differences
    • 1.2.1 Forward slash vs backslash behavior
    • 1.2.2 Path normalization techniques
    • 1.2.3 Platform-agnostic path construction
  • 1.3 Line Ending Differences
    • 1.3.1 CRLF vs LF detection
    • 1.3.2 Git autocrlf configuration
    • 1.3.3 Fixing line ending issues in CI
  • 1.4 Case Sensitivity Differences
    • 1.4.1 Filesystem case sensitivity by platform
    • 1.4.2 Common case-related CI failures
    • 1.4.3 Enforcing consistent casing

Exit Code Patterns

File: references/exit-code-patterns.md

Contents:

  • 2.1 Exit Code Persistence
    • 2.1.1 PowerShell $LASTEXITCODE behavior
    • 2.1.2 Bash $? behavior and pitfalls
    • 2.1.3 Solutions for reliable exit code handling
  • 2.2 Common Exit Codes by Tool
    • 2.2.1 Git exit codes
    • 2.2.2 npm/yarn/pnpm exit codes
    • 2.2.3 pytest exit codes
    • 2.2.4 cargo exit codes
  • 2.3 GitHub Actions Exit Code Handling
    • 2.3.1 Step failure detection
    • 2.3.2 continue-on-error behavior
    • 2.3.3 Custom exit code handling

Syntax Patterns

File: references/syntax-patterns.md

Contents:

  • 3.1 Here-String and Heredoc Terminator Issues
    • 3.1.1 PowerShell here-string requirements ("@ at column 0)
    • 3.1.2 Bash heredoc requirements (EOF at column 0)
    • 3.1.3 YAML multiline string indentation
  • 3.2 Shell Quoting Differences
    • 3.2.1 Bash quoting rules
    • 3.2.2 POSIX sh quoting rules
    • 3.2.3 Zsh quoting differences
    • 3.2.4 PowerShell quoting rules
  • 3.3 Command Substitution Syntax
    • 3.3.1 Backticks vs $() differences
    • 3.3.2 Nested command substitution

Dependency Patterns

File: references/dependency-patterns.md

Contents:

  • 4.1 Module Import Path Issues
    • 4.1.1 Relative path calculation
    • 4.1.2 Language-specific import resolution
    • 4.1.3 Working directory assumptions
  • 4.2 Missing Dependencies in CI
    • 4.2.1 Lock file synchronization
    • 4.2.2 Optional dependencies
    • 4.2.3 Development vs production dependencies
  • 4.3 Version Mismatches
    • 4.3.1 Pinned version conflicts
    • 4.3.2 Transitive dependency issues
    • 4.3.3 CI-specific version requirements

GitHub Infrastructure Patterns

File: references/github-infrastructure-patterns.md

Contents:

  • 5.1 Missing Labels
    • 5.1.1 Creating labels before workflow uses them
    • 5.1.2 Label API format and authentication
    • 5.1.3 Label naming conventions
  • 5.2 Platform Exceptions and Documentation
    • 5.2.1 Runner operating system differences
    • 5.2.2 Pre-installed software differences
    • 5.2.3 Environment variable differences
  • 5.3 Runner Architecture
    • 5.3.1 x64 vs ARM runner differences
    • 5.3.2 Self-hosted runner considerations
    • 5.3.3 Runner resource limits

Language-Specific Patterns

File: references/language-specific-patterns.md

Contents:

  • 6.1 Python CI Patterns
    • 6.1.1 Virtual environment issues
    • 6.1.2 pip installation failures
    • 6.1.3 pytest configuration issues
  • 6.2 JavaScript/TypeScript CI Patterns
    • 6.2.1 node_modules caching issues
    • 6.2.2 npm vs yarn vs pnpm differences
    • 6.2.3 ESM vs CommonJS issues
  • 6.3 Rust CI Patterns
    • 6.3.1 cargo build failures
    • 6.3.2 Target directory management
    • 6.3.3 Cross-compilation issues
  • 6.4 Go CI Patterns
    • 6.4.1 Module resolution issues
    • 6.4.2 Go version mismatches
    • 6.4.3 CGO dependencies

Bot Categories

File: references/bot-categories.md

Contents:

  • 7.1 Category Table
    • 7.1.1 agent-controlled (100% reliability)
    • 7.1.2 mention-triggered (60-70% reliability)
    • 7.1.3 human (variable reliability)
  • 7.2 Signal Interpretation
    • 7.2.1 Understanding reliability percentages
    • 7.2.2 Decision matrix by author category
  • 7.3 Classifying PR Authors
    • 7.3.1 Pattern matching for bot identification
    • 7.3.2 Handling PRs by category
  • 7.4 Best Practices and Troubleshooting

Claude PR Handling

File: references/claude-pr-handling.md

Contents:

  • 8.1 Claude Code Action Integration
    • 8.1.1 GitHub Actions workflow setup
    • 8.1.2 Detecting PRs needing review
  • 8.2 Collecting Previous Feedback
    • 8.2.1 Extracting previous Claude feedback
    • 8.2.2 Synthesizing feedback for @claude mention
  • 8.3 Requesting Claude Review via @claude Mention
    • 8.3.1 Creating actionable summaries
    • 8.3.2 Posting with @claude mention
  • 8.4 Integration with Monitoring Cycle
    • 8.4.1 Autonomous monitoring loop
    • 8.4.2 ActionRequired classification
  • 8.5 Response Patterns
    • 8.5.1 Simple fix requests
    • 8.5.2 Multiple issues
    • 8.5.3 CI failure analysis
    • 8.5.4 Code review requests
  • 8.6 Troubleshooting Claude Code Action

Diagnostic Scripts

This skill includes two Python scripts for automated diagnosis:

eia_diagnose_ci_failure.py

Analyzes CI failure logs to identify patterns and suggest fixes.

bash
# Analyze a log file
python scripts/eia_diagnose_ci_failure.py --log-file /path/to/ci.log

# Analyze from stdin
cat ci.log | python scripts/eia_diagnose_ci_failure.py --stdin

# Output as JSON
python scripts/eia_diagnose_ci_failure.py --log-file ci.log --json

eia_detect_platform_issue.py

Scans source code for platform-specific patterns that may cause CI failures.

bash
# Scan a directory
python scripts/eia_detect_platform_issue.py --path /path/to/project

# Scan specific file types
python scripts/eia_detect_platform_issue.py --path . --extensions .py .js .sh

# Output as JSON
python scripts/eia_detect_platform_issue.py --path . --json

Workflow: Diagnosing a CI Failure

  1. Collect the failure log from the CI system
  2. Run the diagnostic script: python scripts/eia_diagnose_ci_failure.py --log-file ci.log
  3. Follow the decision tree if the script doesn't identify the pattern
  4. Read the appropriate reference document for detailed fix instructions
  5. Apply the fix following the documented pattern
  6. Verify locally before pushing (where possible)
  7. Run CI again to confirm the fix

Examples

Example 1: Diagnosing a Cross-Platform Path Failure

bash
# CI log shows: FileNotFoundError: /tmp/build/output.txt
# Run diagnostic script
python scripts/eia_diagnose_ci_failure.py --log-file ci.log

# Output identifies cross-platform temp path issue
# Fix: Use tempfile.gettempdir() instead of hardcoded /tmp

Example 2: Fixing a Heredoc Syntax Error

bash
# CI log shows: syntax error near unexpected token `newline`
# Cause: Heredoc EOF terminator has trailing whitespace
# Fix: Ensure EOF is at column 0 with no trailing spaces

Error Handling

The diagnostic script doesn't identify my failure

Read the full error message and search for keywords in the reference documents. Common keywords:

  • "not found", "no such file" → Cross-Platform Patterns
  • "exit code", "returned 1" → Exit Code Patterns
  • "syntax error", "unexpected" → Syntax Patterns
  • "import", "module", "require" → Dependency Patterns
  • "permission", "label", "runner" → Infrastructure Patterns

My fix works locally but still fails in CI

This usually indicates a cross-platform issue. Check:

  1. Are you testing on the same OS as CI?
  2. Are file paths hardcoded?
  3. Are environment variables the same?

CI passes sometimes but fails randomly

Check for:

  1. Race conditions in tests
  2. External service dependencies
  3. Time-sensitive tests
  4. Resource exhaustion on shared runners

Resources

See Also

  • GitHub Actions documentation: https://docs.github.com/en/actions
  • CI/CD best practices: Use matrix builds to test all platforms
  • Error handling: Always use explicit exit codes