AgentSkillsCN

pipeline-diagnostics

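Diagnose CI/CD pipeline failures, analyze Jenkins build logs, and troubleshoot deployment issues. Use when a build fails, when checking pipeline status, when investigating errors, or when assessing deployment health.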

SKILL.md
--- frontmatter
name: pipeline-diagnostics
description: Diagnose CI/CD pipeline failures, analyze Jenkins build logs, and troubleshoot deployment issues. Use when builds fail, when checking pipeline status, when investigating errors, or when assessing deployment health.

Pipeline Diagnostics

This skill helps diagnose CI/CD pipeline issues for the Elohim project using the Jenkins MCP integration.

Pipeline Architecture

code
┌──────────────────────────────────────────────────────────────────┐
│                     Elohim CI/CD Orchestration                    │
├──────────────────────────────────────────────────────────────────┤
│                                                                   │
│  GitHub Push                                                      │
│      ↓                                                            │
│  Orchestrator  ←── Analyzes changesets, triggers dependencies     │
│      ↓                                                            │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │  Parallel Builds (if dependencies met):                     │ │
│  │                                                              │ │
│  │  elohim-holochain  →  DNA builds, hApp packaging            │ │
│  │  elohim-edge       →  Doorway, storage containers           │ │
│  │  elohim-app        →  Angular build                         │ │
│  └─────────────────────────────────────────────────────────────┘ │
│      ↓                                                            │
│  elohim-genesis  ←── Seed validation & deployment                 │
│      ↓                                                            │
│  Health Checks  ←── Post-deployment verification                  │
│                                                                   │
└──────────────────────────────────────────────────────────────────┘
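
The ordering in the diagram can be sketched as a small dependency graph. This is only an illustration: the `DEPENDS_ON` edges are an assumption read off the diagram and the job table (orchestrator first, three builds in parallel, genesis after app and edge), not something taken from the Jenkinsfiles, and `build_waves` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Assumed edges: each job maps to the jobs it waits for.
DEPENDS_ON = {
    "elohim-orchestrator": set(),
    "elohim-holochain": {"elohim-orchestrator"},
    "elohim-edge": {"elohim-orchestrator"},
    "elohim-app": {"elohim-orchestrator"},
    "elohim-genesis": {"elohim-edge", "elohim-app"},
}


def build_waves(depends_on):
    """Group jobs into waves; jobs in the same wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Everything whose dependencies are finished is ready now.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = build_waves(DEPENDS_ON)
for number, wave in enumerate(waves, start=1):
    print(number, ", ".join(wave))
```

When a downstream job fails, reading the waves from left to right gives a reasonable order in which to inspect the upstream jobs.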

Jenkins Job Reference

| Job Name | Purpose | Triggers |
|---|---|---|
| elohim-orchestrator | Changeset analysis, pipeline coordination | GitHub webhook |
| elohim-holochain | Rust DNA compilation, hApp packaging | Orchestrator |
| elohim-edge | Docker containers for doorway, storage | Orchestrator |
| elohim-app | Angular build, static assets | Orchestrator |
| elohim-genesis | Content seeding, verification | After app/edge |
| elohim-steward | Tauri desktop app (manual) | Manual only |

Using MCP Tools for Diagnostics

Check Jenkins Status

code
Use mcp__jenkins__getStatus to check Jenkins health

Get Job Information

code
Use mcp__jenkins__getJob with jobFullName="elohim-holochain"
Use mcp__jenkins__getJob with jobFullName="elohim-app"
Use mcp__jenkins__getJob with jobFullName="elohim-genesis"

Get Build Details

code
Use mcp__jenkins__getBuild with jobFullName="elohim-holochain"
  (omit buildNumber for latest)

Use mcp__jenkins__getBuild with jobFullName="elohim-holochain" buildNumber=123
  (specific build)

Analyze Build Logs

code
Use mcp__jenkins__getBuildLog with jobFullName="elohim-holochain"
  limit=-100  (last 100 lines)

Use mcp__jenkins__searchBuildLog with:
  jobFullName="elohim-holochain"
  pattern="error|failed"
  ignoreCase=true
  contextLines=3
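
To make the search parameters concrete, here is a minimal local sketch of what a case-insensitive pattern search with context lines does to a log. It is not the MCP tool's implementation; `search_log` is a hypothetical helper and `sample_log` is an invented log used only for illustration.

```python
import re


def search_log(log_text, pattern, ignore_case=True, context_lines=3):
    """Return (line_number, excerpt) pairs for every line matching pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if regex.search(line):
            # Keep up to context_lines lines on each side of the match.
            start = max(0, index - context_lines)
            end = min(len(lines), index + context_lines + 1)
            hits.append((index + 1, lines[start:end]))
    return hits


# Invented sample log, for illustration only.
sample_log = "\n".join([
    "Compiling zome",
    "warning: unused import",
    "error[E0433]: failed to resolve",
    "Build step finished",
])

hits = search_log(sample_log, r"error|failed", context_lines=1)
for line_number, excerpt in hits:
    print(line_number, excerpt)
```

A larger `context_lines` value usually helps with compiler errors, where the useful detail often sits a few lines below the line that matched.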

Check Test Results

code
Use mcp__jenkins__getTestResults with jobFullName="elohim-app"
  onlyFailingTests=true

Common Failure Patterns

DNA Build Failures

Pattern: WASM compilation error

code
Search logs for: "error\[E" or "cannot find" or "unresolved"

Common causes:

  • Missing RUSTFLAGS for getrandom backend
  • Incompatible dependency versions
  • Syntax errors in zome code

Fix checklist:

  1. Check RUSTFLAGS='--cfg getrandom_backend="custom"' is set
  2. Verify Cargo.lock is committed
  3. Check zome source for compile errors

Angular Build Failures

Pattern: TypeScript errors

code
Search logs for: "error TS" or "Cannot find module"

Common causes:

  • Type mismatches after model changes
  • Missing imports
  • Circular dependencies

Fix checklist:

  1. Run npm run build locally
  2. Check for type sync between elohim-service and elohim-app
  3. Verify all imports resolve

Seeding Failures

Pattern: Connection timeout

code
Search logs for: "ETIMEDOUT" or "WebSocket" or "connection refused"

Common causes:

  • Doorway not ready
  • Wrong admin URL
  • Network policy blocking

Fix checklist:

  1. Check doorway health endpoint
  2. Verify HOLOCHAIN_ADMIN_URL environment variable
  3. Check K8s pod status

Pattern: Schema validation

code
Search logs for: "missing required" or "validation failed"

Fix:

  1. Run npm run validate in genesis/seeder
  2. Check content files for missing id/title fields
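
The second step can be approximated with a short check like the one below. This is a sketch under assumptions: it treats each content item as a dictionary (for example, parsed from JSON), and `missing_fields` and the sample `records` are hypothetical. The real rules live in the `npm run validate` script.

```python
REQUIRED_FIELDS = ("id", "title")


def missing_fields(record, required=REQUIRED_FIELDS):
    """List required fields that are absent or blank in one content record."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# Invented records, for illustration only.
records = [
    {"id": "intro", "title": "Introduction"},
    {"id": "orphan"},
    {"title": ""},
]

# Map the index of each invalid record to the fields it lacks.
report = {}
for index, record in enumerate(records):
    problems = missing_fields(record)
    if problems:
        report[index] = problems

print(report)
```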

Docker Build Failures

Pattern: Image build error

code
Search logs for: "COPY failed" or "RUN failed" or "denied"

Common causes:

  • Missing build artifacts from previous stage
  • Harbor registry auth issues
  • Dockerfile syntax
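
The search patterns from the sections above can be collected into one lookup for a first-pass triage of a log excerpt. This is a sketch, not part of the skill: `FAILURE_PATTERNS` and `classify` are hypothetical names, and matching is kept case-sensitive on the assumption that "cannot find" (Rust) and "Cannot find module" (TypeScript) should stay distinct.

```python
import re

# (category, regex) pairs taken from the "Search logs for" lines above.
FAILURE_PATTERNS = [
    ("DNA build (WASM compilation)", r"error\[E|cannot find|unresolved"),
    ("Angular build (TypeScript)", r"error TS|Cannot find module"),
    ("Seeding (connection)", r"ETIMEDOUT|WebSocket|connection refused"),
    ("Seeding (schema validation)", r"missing required|validation failed"),
    ("Docker build", r"COPY failed|RUN failed|denied"),
]


def classify(log_text):
    """Return every failure category whose pattern occurs in the log text."""
    return [
        category
        for category, pattern in FAILURE_PATTERNS
        if re.search(pattern, log_text)
    ]


# Invented log lines, for illustration only.
print(classify("src/app.ts:3:1 - error TS2307: Cannot find module './model'"))
print(classify("Error: connect ETIMEDOUT"))
```

A log can match more than one category, so the result is a list rather than a single label.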

Environment Mapping

| Branch Pattern | Environment | Doorway URL |
|---|---|---|
| dev, feat-*, claude-* | Alpha | doorway-dev.elohim.host |
| staging-* | Staging | doorway-staging.elohim.host |
| main | Production | doorway.elohim.host |
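
The mapping above can be expressed as a small lookup, which is handy when checking which doorway a failing branch build was aimed at. This sketch assumes the branch patterns are shell-style globs and that unmatched branches map to nothing, since the source does not say what happens to them. `resolve_environment` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# (branch patterns, environment, doorway host), in the order listed above.
ENVIRONMENTS = [
    (("dev", "feat-*", "claude-*"), "Alpha", "doorway-dev.elohim.host"),
    (("staging-*",), "Staging", "doorway-staging.elohim.host"),
    (("main",), "Production", "doorway.elohim.host"),
]


def resolve_environment(branch):
    """Return (environment, doorway host) for a branch, or None if unmatched."""
    for patterns, environment, doorway in ENVIRONMENTS:
        if any(fnmatchcase(branch, pattern) for pattern in patterns):
            return environment, doorway
    return None


print(resolve_environment("feat-login"))
print(resolve_environment("main"))
```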

Diagnostic Workflow

1. Identify Failed Build

code
Use mcp__jenkins__getJobs to list jobs and spot the failing one
Use mcp__jenkins__getBuild to get failure details

2. Get Error Context

code
Use mcp__jenkins__searchBuildLog with pattern matching:
- "error" (case insensitive)
- "failed"
- "Exception"
- "panic"

3. Analyze Stage

Look at the stage name to determine which pipeline component failed:

  • "Build DNAs" → Rust/WASM issues
  • "Build App" → Angular/TypeScript issues
  • "Seed Content" → Doorway/connection issues
  • "Deploy" → K8s/Docker issues

4. Check Dependencies

code
Use mcp__jenkins__getBuildChangeSets to see what changed
Use mcp__jenkins__getBuildScm for commit info

Triggering Builds

Retry Failed Build

code
Use mcp__jenkins__triggerBuild with jobFullName="elohim-holochain"

Trigger with Parameters

code
Use mcp__jenkins__triggerBuild with:
  jobFullName="elohim-genesis"
  parameters={"SKIP_SEEDING": "false", "ENVIRONMENT": "dev"}
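
The example above passes the boolean as the string "false". Assuming the tool expects every parameter value as a string (the source does not confirm this), the arguments could be assembled as in this sketch. `trigger_arguments` is a hypothetical helper that only builds the argument object and does not call Jenkins.

```python
import json


def trigger_arguments(job_full_name, **parameters):
    """Build the argument object for a parameterized trigger.

    Booleans become "true"/"false"; every other value is converted with str().
    """
    normalized = {}
    for name, value in parameters.items():
        if isinstance(value, bool):
            normalized[name] = "true" if value else "false"
        else:
            normalized[name] = str(value)
    return {"jobFullName": job_full_name, "parameters": normalized}


arguments = trigger_arguments("elohim-genesis", SKIP_SEEDING=False, ENVIRONMENT="dev")
print(json.dumps(arguments, indent=2))
```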

Key Jenkinsfile Locations

| File | Purpose |
|---|---|
| /projects/elohim/Jenkinsfile | Root orchestrator |
| /projects/elohim/orchestrator/Jenkinsfile | Pipeline controller |
| /projects/elohim/holochain/Jenkinsfile | DNA/hApp builds |
| /projects/elohim/genesis/Jenkinsfile | Seeding pipeline |
| /projects/elohim/steward/Jenkinsfile | Desktop app |

Artifact Flow

code
elohim-holochain
    ↓ elohim.happ
elohim-edge
    ↓ doorway:tag, storage:tag
elohim-app
    ↓ dist/elohim-app
elohim-genesis
    ↓ seed verification

Each pipeline fetches artifacts from upstream jobs. If a build fails at a fetch stage, check that the upstream job completed and that its artifact is available.
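
To see which jobs a missing artifact may affect, the flow can be treated as an ordered list. This is a sketch that assumes the flow is a simple linear chain exactly as drawn above, which may be a simplification of the real job dependencies. `ARTIFACT_CHAIN` and `downstream_of` are hypothetical names.

```python
# Order taken from the Artifact Flow listing; assumed to be linear.
ARTIFACT_CHAIN = [
    "elohim-holochain",
    "elohim-edge",
    "elohim-app",
    "elohim-genesis",
]


def downstream_of(job):
    """Return the jobs listed after the given job in the artifact flow."""
    position = ARTIFACT_CHAIN.index(job)
    return ARTIFACT_CHAIN[position + 1:]


print(downstream_of("elohim-edge"))
```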

Quick Diagnostics Commands

Check all pipeline health

code
1. mcp__jenkins__getStatus (overall Jenkins)
2. mcp__jenkins__getJobs (list jobs)
3. For each job: mcp__jenkins__getBuild (latest status)
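
Once the latest result of each job has been collected, the summary step can look like the sketch below. It assumes the results were gathered by hand into a mapping of job name to Jenkins result string (such as SUCCESS, FAILURE, UNSTABLE or ABORTED, with None for a build that is still running). The shape of the getBuild response is not specified in this document, so `latest_results` is invented sample data and `unhealthy_jobs` is a hypothetical helper.

```python
HEALTHY_RESULTS = {"SUCCESS"}


def unhealthy_jobs(latest_results):
    """Return the jobs whose latest build is neither successful nor running."""
    return {
        job: result
        for job, result in latest_results.items()
        if result is not None and result not in HEALTHY_RESULTS
    }


# Invented results, for illustration only.
latest_results = {
    "elohim-holochain": "SUCCESS",
    "elohim-edge": "FAILURE",
    "elohim-app": None,  # still building
    "elohim-genesis": "ABORTED",
}

failing = unhealthy_jobs(latest_results)
print(failing)
```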

Investigate specific failure

code
1. mcp__jenkins__getBuild with jobFullName + buildNumber
2. mcp__jenkins__getBuildLog with limit=-200 (tail)
3. mcp__jenkins__searchBuildLog with error patterns
4. mcp__jenkins__getTestResults if tests failed

Check deployment health

code
After genesis completes, verify via:
- stats:dev / stats:prod commands
- Doorway health endpoints
- Application smoke tests