infrastructure

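Infrastructure debugging for Kubernetes and AWS. Use when investigating pod crashes, deployment issues, resource shortages, container failures, or cloud infrastructure issues.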

SKILL.md
---
name: infrastructure
description: Infrastructure debugging for Kubernetes and AWS. Use when investigating pod crashes, deployment issues, resource problems, container failures, or cloud infrastructure issues.
---

Infrastructure Debugging

Available Domains

Kubernetes

For pod crashes, deployment issues, resource problems, and container failures. Use: /infrastructure-kubernetes

AWS (future)

For EC2, ECS, Lambda, and CloudWatch issues. Coming soon.

Quick Reference

Kubernetes Issues

```bash
# List pods in namespace
python .claude/skills/infrastructure-kubernetes/scripts/list_pods.py -n otel-demo

# Get pod events (ALWAYS check first!)
python .claude/skills/infrastructure-kubernetes/scripts/get_events.py <pod-name> -n otel-demo

# Get pod logs
python .claude/skills/infrastructure-kubernetes/scripts/get_logs.py <pod-name> -n otel-demo --tail 100
```
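
The commands above can be composed for any pod and namespace. The following is a minimal sketch, not part of the skill itself: the helper name `triage_commands` and the pod name `example-pod` are hypothetical, while the script paths and the `-n` and `--tail` flags are taken from the quick reference. It only builds the command lines, with events ahead of logs, and does not run anything.

```python
from __future__ import annotations

import shlex

# Script directory as written in the quick reference.
SCRIPTS = ".claude/skills/infrastructure-kubernetes/scripts"


def triage_commands(pod: str, namespace: str = "otel-demo", tail: int = 100) -> list[list[str]]:
    """Hypothetical helper: return the quick-reference commands as argv lists.

    Order follows the quick reference: list pods, then events, then logs.
    """
    return [
        ["python", f"{SCRIPTS}/list_pods.py", "-n", namespace],
        ["python", f"{SCRIPTS}/get_events.py", pod, "-n", namespace],
        ["python", f"{SCRIPTS}/get_logs.py", pod, "-n", namespace, "--tail", str(tail)],
    ]


if __name__ == "__main__":
    # "example-pod" is a placeholder pod name, not one from the source.
    for argv in triage_commands("example-pod"):
        print(shlex.join(argv))
```

Returning argv lists rather than one shell string keeps pod names from being reinterpreted by a shell if the lists are later handed to `subprocess.run`.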

Common Patterns

| Symptom | First Action | Script |
|---|---|---|
| Pod CrashLoopBackOff | Check events | get_events.py |
| Pod OOMKilled | Check resources | get_resources.py |
| Pod Pending | Check events + nodes | get_events.py |
| Deployment stuck | Check rollout history | get_history.py |
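
The symptom-to-script mapping above can also be expressed as a small lookup. This is an illustrative sketch only: the names `FIRST_ACTION` and `first_action` are hypothetical, and because the source does not show the arguments for `get_resources.py` or `get_history.py`, the lookup returns just the action and script name rather than a full command.

```python
from __future__ import annotations

# Rows copied from the Common Patterns table: symptom -> (first action, script).
FIRST_ACTION = {
    "CrashLoopBackOff": ("Check events", "get_events.py"),
    "OOMKilled": ("Check resources", "get_resources.py"),
    "Pending": ("Check events + nodes", "get_events.py"),
    "Deployment stuck": ("Check rollout history", "get_history.py"),
}


def first_action(symptom: str) -> tuple[str, str] | None:
    """Hypothetical helper: case-insensitive substring match against the table.

    Returns (action, script) for a listed symptom, or None if it is not listed.
    """
    wanted = symptom.strip().lower()
    for key, value in FIRST_ACTION.items():
        if key.lower() in wanted:
            return value
    return None


if __name__ == "__main__":
    print(first_action("Pod OOMKilled"))
```

Symptoms that are not in the table return `None`, so a caller can fall back to checking events first, as the quick reference recommends.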