Kubernetes Troubleshooting
Overview
You are a Kubernetes troubleshooting expert. Use these procedures to diagnose and fix common Kubernetes issues.
When to Use
Use this skill when the user reports issues with Kubernetes pods, services, deployments, or cluster resources.
Diagnostic Procedures
Pod Not Starting
- Check pod status:
  kubectl get pods -n <namespace>
- Describe the pod:
  kubectl describe pod <name> -n <namespace>
- Check events:
  kubectl get events -n <namespace> --sort-by='.lastTimestamp'
- Check logs (use --previous if crash-looping):
  kubectl logs <pod-name> -n <namespace> --previous
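
The checks above can be chained into one helper so they always run in the same order. This is a minimal sketch, not a definitive tool: `diagnose_pod` is a hypothetical function name, and the `--tail=50` limit and the per-pod event filter are assumptions added for readability.

```shell
# Sketch: run the "Pod Not Starting" checks in order for one pod.
# diagnose_pod is a hypothetical helper name, not part of kubectl.
# The body uses ( ) so pod and ns stay local to the function.
diagnose_pod() (
  pod=$1
  ns=${2:-default}

  echo "== Status =="
  kubectl get pod "$pod" -n "$ns" -o wide

  echo "== Describe =="
  kubectl describe pod "$pod" -n "$ns"

  echo "== Events for this pod, oldest first =="
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by='.lastTimestamp'

  echo "== Logs (current container) =="
  kubectl logs "$pod" -n "$ns" --tail=50

  echo "== Logs (previous container, if it restarted) =="
  kubectl logs "$pod" -n "$ns" --previous --tail=50 \
    || echo "No previous container logs available."
)
```

Call it as `diagnose_pod <pod-name> <namespace>`; the namespace falls back to `default` when omitted.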
Common Causes
- ImagePullBackOff: Wrong image name, missing registry credentials, or network issues
- CrashLoopBackOff: Application crash; check logs, resource limits, and missing config
- Pending: Insufficient resources, node affinity/taint issues, or PVC not bound
- OOMKilled: Container exceeded its memory limit; increase memory limits in the deployment spec
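
To connect a pod's reported state to the causes listed above, the status fields can be read directly with a JSONPath query. The sketch below assumes two hypothetical helpers, `pod_reasons` and `explain_reason`; the hint strings simply restate the list above and are not kubectl output.

```shell
# Sketch: read a pod's phase and container reasons, then map each to a hint.
# pod_reasons and explain_reason are hypothetical helper names.
pod_reasons() (
  pod=$1
  ns=${2:-default}
  # Pod phase, e.g. Pending or Running.
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.status.phase}'
  echo
  # Reason a container is currently waiting, e.g. CrashLoopBackOff.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}'
  echo
  # Reason the previous container instance terminated, e.g. OOMKilled.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
  echo
)

explain_reason() {
  case "$1" in
    ImagePullBackOff)
      echo "ImagePullBackOff: check image name, registry credentials, and network." ;;
    CrashLoopBackOff)
      echo "CrashLoopBackOff: check logs, resource limits, and config." ;;
    Pending)
      echo "Pending: check node resources, affinity/taints, and PVC binding." ;;
    OOMKilled)
      echo "OOMKilled: container exceeded its memory limit; raise the limit." ;;
    Running|Succeeded)
      ;; # healthy phases, nothing to report
    *)
      echo "No hint for '$1'; inspect events and describe output." ;;
  esac
}

# Usage: for r in $(pod_reasons <pod-name> <namespace>); do explain_reason "$r"; done
```

For an OOMKilled container, one way to raise the limit is `kubectl set resources deployment <name> -n <namespace> --limits=memory=<value>`, with the value chosen from observed usage.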
Service Not Reachable
- Verify service exists:
  kubectl get svc -n <namespace>
- Check endpoints:
  kubectl get endpoints <service-name> -n <namespace>
- Test DNS resolution (run the test pod in the service's namespace so the short name resolves):
  kubectl run tmp --image=busybox --rm -it -n <namespace> -- nslookup <service-name>
- Check network policies:
  kubectl get networkpolicies -n <namespace>
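
An empty endpoints list usually means the service selector matches no Ready pod, so it is worth checking that first. The following is a rough sketch under that assumption; `check_service` is a hypothetical helper name.

```shell
# Sketch: confirm a service exists and has at least one ready endpoint.
# check_service is a hypothetical helper name, not part of kubectl.
check_service() (
  svc=$1
  ns=${2:-default}

  # Stop early if the service itself is missing.
  kubectl get svc "$svc" -n "$ns" || exit 1

  # Collect the IPs of ready endpoints; empty output means none are ready.
  addrs=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')

  if [ -z "$addrs" ]; then
    echo "No ready endpoints: the selector may not match any Ready pod."
    echo "Service selector:"
    kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}'
    echo
  else
    echo "Endpoints: $addrs"
  fi
)
```

If the selector is printed, compare it with `kubectl get pods -n <namespace> --show-labels` to see which labels differ.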
Resource Debugging
- View resource usage:
  kubectl top pods -n <namespace>
- Check node capacity:
  kubectl describe nodes | grep -A 5 "Allocated resources"
- View resource quotas:
  kubectl get resourcequota -n <namespace>
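
Putting live usage next to the configured requests and limits makes it easier to see whether a pod is under-provisioned or the namespace is hitting its quota. This is a sketch only: `resource_report` is a hypothetical helper name, the column names are arbitrary, and `kubectl top` returns data only when the cluster exposes the Metrics API (typically via metrics-server).

```shell
# Sketch: show usage, configured requests/limits, and quotas for a namespace.
# resource_report is a hypothetical helper name, not part of kubectl.
resource_report() (
  ns=${1:-default}

  echo "== Usage, highest memory first (needs the Metrics API) =="
  kubectl top pods -n "$ns" --sort-by=memory \
    || echo "Metrics unavailable; is metrics-server installed?"

  echo "== Configured requests and limits =="
  kubectl get pods -n "$ns" -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory,MEM_LIM:.spec.containers[*].resources.limits.memory'

  echo "== Resource quotas =="
  kubectl get resourcequota -n "$ns"
)
```

Run it as `resource_report <namespace>` before deciding to add nodes.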
Best Practices
- Always check events first; they often reveal the root cause immediately
- Use the --previous flag on logs for crash-looping pods
- Check resource limits before scaling up nodes