Incident Diagnosis Skill
Systematically diagnose incidents by collecting data from multiple sources (K8s, metrics, logs).
When to Use This Skill
- Responding to alerts
- Diagnosing service degradation
- Collecting incident context
- Understanding root cause
- Escalating with full context
Steps
- Collect K8s state: get pods, events, and resource usage
- Check metrics: query Prometheus for trends
- Review logs: search Loki for errors
- Correlate data: find patterns across sources
- Identify root cause: match patterns to known issues
- Suggest remediation: recommend actions
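
The steps above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation. It only builds the kubectl commands and query URLs without running them, and the `Signal` type, the 120-second correlation window, and the `KNOWN_ISSUES` table with its remediation hints are all assumptions made for the example. The endpoint paths are the standard Prometheus and Loki `query_range` HTTP APIs.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Signal:
    """One observation from any source (hypothetical type for this sketch)."""
    source: str     # "k8s", "metrics", or "logs"
    timestamp: int  # Unix time in seconds
    message: str


def kubectl_commands(namespace):
    """Step 1: commands that collect pods, events, and resource usage."""
    return [
        ["kubectl", "get", "pods", "-n", namespace, "-o", "wide"],
        ["kubectl", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp"],
        ["kubectl", "top", "pods", "-n", namespace],  # needs metrics-server
    ]


def prometheus_range_url(base_url, promql, start, end, step="60s"):
    """Step 2: URL for a Prometheus range query (start/end in Unix seconds)."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"


def loki_range_url(base_url, logql, start, end, limit=100):
    """Step 3: URL for a Loki range query (Loki takes nanosecond timestamps)."""
    params = urlencode({
        "query": logql,
        "start": start * 1_000_000_000,
        "end": end * 1_000_000_000,
        "limit": limit,
    })
    return f"{base_url}/loki/api/v1/query_range?{params}"


def correlate(signals, window_seconds=120):
    """Step 4: cluster signals that occur close together in time.

    A new cluster starts whenever the gap to the previous signal exceeds
    window_seconds. Only clusters seen by more than one source are kept.
    """
    groups, current = [], []
    for signal in sorted(signals, key=lambda s: s.timestamp):
        if current and signal.timestamp - current[-1].timestamp > window_seconds:
            groups.append(current)
            current = []
        current.append(signal)
    if current:
        groups.append(current)
    return [g for g in groups if len({s.source for s in g}) > 1]


# Steps 5 and 6: illustrative pattern-to-remediation table (assumed, not exhaustive).
KNOWN_ISSUES = {
    "OOMKilled": "Raise the container memory limit or look for a memory leak.",
    "CrashLoopBackOff": "Inspect the previous container logs for the startup failure.",
    "ImagePullBackOff": "Verify the image name, tag, and registry credentials.",
}


def suggest(group):
    """Return remediation hints for every known pattern found in the group."""
    return [
        hint
        for pattern, hint in KNOWN_ISSUES.items()
        if any(pattern in s.message for s in group)
    ]
```

In use, each command from `kubectl_commands` would be run with `subprocess.run`, the two URLs fetched with any HTTP client, and the results converted to `Signal` values before being passed to `correlate` and `suggest`. Keeping the query construction separate from the network calls makes the correlation logic easy to test offline.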