rca

Root cause analysis workflows: systematic investigation of failure causes.

SKILL.md
---
name: rca
description: Root cause analysis workflows - systematic investigation of failures
---

RCA Skills

Root cause analysis workflows for systematic failure investigation.

Auto-Select Sub-Skill

When this skill is invoked, determine the right sub-skill based on context:

Step 1: Determine what's available

Check for HyperShift cluster:

bash
ls ~/clusters/hcp/kagenti-hypershift-custom-*/auth/kubeconfig 2>/dev/null

Check for Kind cluster:

bash
kind get clusters 2>/dev/null
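
The two checks above can be wrapped in small helpers so their results are easy to reuse. This is a minimal sketch, not part of the skill itself: the function names `has_hypershift_cluster` and `has_kind_cluster` are hypothetical, the kubeconfig glob is the one from the check above, and the sketch assumes `kind get clusters` prints nothing on stdout when no clusters exist.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not defined by the skill.

# Succeeds if at least one HyperShift kubeconfig matches the glob used above.
has_hypershift_cluster() {
  ls "${HOME}"/clusters/hcp/kagenti-hypershift-custom-*/auth/kubeconfig >/dev/null 2>&1
}

# Succeeds if `kind` is installed and reports at least one cluster.
# Assumption: with no clusters, `kind get clusters` writes nothing to stdout.
has_kind_cluster() {
  command -v kind >/dev/null 2>&1 || return 1
  [ -n "$(kind get clusters 2>/dev/null)" ]
}

# Report what is available.
if has_hypershift_cluster; then
  echo "HyperShift cluster: available"
else
  echo "HyperShift cluster: none"
fi

if has_kind_cluster; then
  echo "Kind cluster: available"
else
  echo "Kind cluster: none"
fi
```

Both helpers only read state, so they should be safe to run before any routing decision is made.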

Step 2: Route based on failure source and access

code
Where did the failure occur?
    │
    ├─ CI pipeline (GitHub Actions) ─────────────────────────┐
    │                                                         │
    │   Do you have a live cluster matching the CI env?       │
    │       │                                                 │
    │       ├─ HyperShift cluster available                   │
    │       │   → Use `rca:hypershift` (deep investigation)   │
    │       │                                                 │
    │       ├─ Kind cluster available (for Kind CI failures)  │
    │       │   → Use `rca:kind` (reproduce locally)          │
    │       │                                                 │
    │       └─ No cluster                                     │
    │           → Use `rca:ci` (logs and artifacts only)      │
    │           → If inconclusive, ask user to create cluster │
    │                                                         │
    ├─ Local Kind cluster ──────────────────────────────────┐ │
    │   → Use `rca:kind` (full local access)                │ │
    │                                                       │ │
    └─ HyperShift cluster ─────────────────────────────────┐│ │
        → Use `rca:hypershift` (full remote access)        ││ │
                                                           ││ │
After RCA is complete, switch to TDD for fix iteration: ◄──┘┘ │
    - `tdd:ci` (CI-only)                                       │
    - `tdd:hypershift` (live cluster)                          │
    - `tdd:kind` (local cluster)                               │
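
The routing tree above can be sketched as a small shell function. This is an illustrative sketch under stated assumptions: the function name `select_rca_skill` and its `yes`/`no` arguments are hypothetical, and for CI failures it checks HyperShift before Kind simply because the tree lists them in that order. In practice the cluster should match the environment the CI job ran in.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the routing tree; names and arguments are illustrative.
#
# Usage: select_rca_skill <failure_source> <has_hypershift> <has_kind>
#   failure_source:            ci | kind | hypershift
#   has_hypershift, has_kind:  yes | no
select_rca_skill() {
  local failure_source="$1" has_hypershift="$2" has_kind="$3"

  case "$failure_source" in
    ci)
      # CI failure: prefer a live cluster, fall back to logs and artifacts.
      # Assumption: HyperShift is checked first, following the tree's order.
      if [ "$has_hypershift" = "yes" ]; then
        echo "rca:hypershift"
      elif [ "$has_kind" = "yes" ]; then
        echo "rca:kind"
      else
        echo "rca:ci"
      fi
      ;;
    kind)
      echo "rca:kind"
      ;;
    hypershift)
      echo "rca:hypershift"
      ;;
    *)
      echo "unknown failure source: $failure_source" >&2
      return 1
      ;;
  esac
}

# Example: a CI failure with only a Kind cluster available.
select_rca_skill ci no yes
```

When the result is `rca:ci` and the investigation is inconclusive, the tree's next step is to ask the user to create a cluster; the sketch leaves that decision to the caller.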

Available Skills

| Skill | Access | Auto-approve | Best for |
|-------|--------|--------------|----------|
| rca:ci | CI logs/artifacts only | N/A | CI failures, no cluster |
| rca:hypershift | Full cluster access | All read ops | Deep investigation |
| rca:kind | Full local access | All ops | Kind failures, fast repro |

Related Skills

  • tdd:ci - Fix iteration after RCA (CI-driven)
  • tdd:hypershift - Fix iteration with live cluster
  • tdd:kind - Fix iteration on Kind
  • k8s:logs - Query and analyze component logs
  • k8s:pods - Debug pod issues