AgentSkillsCN

rca:kind

在本地 Kind 集群上进行根因分析——快速定位问题,同时拥有完全的本地调试权限。

SKILL.md
--- frontmatter
name: rca:kind
description: Root cause analysis on local Kind cluster - fast local debugging with full access

RCA Kind

Root cause analysis workflow for failures on local Kind clusters.

When to Use

  • Kind E2E tests failed locally or in CI
  • Need to reproduce and debug CI Kind failures
  • Ollama/agent issues specific to the Kind environment

Auto-approved: All read and debug operations on Kind clusters are auto-approved.

Workflow

code
Failure → Reproduce locally → Inspect cluster → Identify root cause → Fix → Verify

Create working directory for analysis:

bash
mkdir -p /tmp/kagenti/rca

Phase 1: Reproduce

Deploy the cluster if not running:

bash
./.github/scripts/local-setup/kind-full-test.sh --skip-cluster-destroy

Phase 2: Inspect

Check pod status:

bash
kubectl get pods -n kagenti-system

Check for crashes:

bash
kubectl get pods -n kagenti-system --field-selector=status.phase!=Running

Check events:

bash
kubectl get events -n kagenti-system --sort-by='.lastTimestamp'

Phase 3: Diagnose

Check component logs:

bash
kubectl logs -n kagenti-system deployment/kagenti-ui --tail=50

Check Ollama (Kind-specific):

bash
kubectl logs -n kagenti-system deployment/ollama --tail=50

Check agent namespace:

bash
kubectl get pods -n team1
bash
kubectl logs -n team1 deployment/weather-service --tail=50

Phase 4: Fix and Verify

After fixing, re-run the specific failing test:

bash
uv run pytest kagenti/tests/e2e/ -v -k "test_name"

Kind-Specific Issues

IssueCauseFix
Ollama OOMModel too large for KindUse smaller model or increase Docker memory
DNS resolutionCoreDNS not readyWait or restart CoreDNS pod
Port conflicts8080 already in uselsof -i :8080 and kill process
Image pull errorsLocal registry not configuredCheck kind-registry container is running

Escalation

If the issue can't be reproduced locally, escalate:

  • CI-only failure → Use rca:ci to analyze from logs
  • Need real OpenShift → Use rca:hypershift with a live cluster

Related Skills

  • rca:ci - RCA from CI logs only
  • rca:hypershift - RCA with live HyperShift cluster
  • tdd:kind - TDD workflow on Kind
  • kind:cluster - Create/destroy Kind clusters
  • k8s:pods - Debug pod issues
  • kagenti:ui-debug - Debug UI issues (502, API, proxy)