Kubernetes Troubleshooting Skill
Systematic debugging for Kubernetes issues.
When to Use
- Pods stuck in Pending/CrashLoopBackOff
- OOMKilled containers
- Service connectivity issues
- Deployment rollout failures
- PVC/storage problems
Diagnostic Flow
1. Get Status
bash
kubectl get pods -o wide
kubectl get events --sort-by='.lastTimestamp'
kubectl describe pod <pod>
2. Check Logs
bash
kubectl logs <pod> --previous       # crashed container
kubectl logs <pod> -c <container>   # specific container
stern <pod-prefix>                  # multiple pods (stern is a separate CLI, not part of kubectl)
3. Resource Issues
bash
kubectl top pods   # requires metrics-server in the cluster
kubectl describe node <node> | grep -A5 "Allocated resources"
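The three steps above can be chained into one helper. This is a minimal sketch, assuming `kubectl` is on `PATH` and already points at the right cluster; the function name `triage_pod` and the 50-line log tail are illustrative choices, not part of kubectl.

```shell
# Hypothetical helper: run the status, log, and resource checks for one pod.
# Usage: triage_pod <pod> [namespace]
triage_pod() {
  local pod="$1" ns="${2:-default}"
  if [ -z "$pod" ]; then
    echo "usage: triage_pod <pod> [namespace]" >&2
    return 2
  fi
  echo "== status =="
  kubectl get pod "$pod" -n "$ns" -o wide
  echo "== events for this pod =="
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by='.lastTimestamp'
  echo "== previous container logs =="
  # Fails harmlessly when the container has never restarted.
  kubectl logs "$pod" -n "$ns" --previous --tail=50 || echo "(no previous container)"
  echo "== resource usage =="
  # Fails harmlessly when metrics-server is not installed.
  kubectl top pod "$pod" -n "$ns" || echo "(metrics not available)"
}
```

Call it as `triage_pod <pod> <ns>`. Each section keeps going if an optional step fails, so one missing piece does not hide the rest of the output.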
Common Issues
Pending Pod
| Cause | Check | Fix |
|---|---|---|
| Insufficient resources | kubectl describe pod -> Events | Lower resource requests or add nodes |
| No matching node | Check nodeSelector/affinity | Fix selectors |
| PVC not bound | kubectl get pvc | Check storage class |
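The checks in this table can be narrowed to the scheduler's own messages. This is a rough sketch, assuming the pod already exists; `why_pending` is a hypothetical name. The scheduler places pods by their resource requests, so the second command prints those rather than limits.

```shell
# Hypothetical helper: show why the scheduler rejected a pod and what it requests.
# Usage: why_pending <pod> [namespace]
why_pending() {
  local pod="$1" ns="${2:-default}"
  # FailedScheduling events carry the scheduler's reason,
  # such as insufficient cpu/memory or an unmatched node selector.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod,reason=FailedScheduling"
  # Per-container resource requests for the pod.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources.requests}{"\n"}{end}'
}
```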
CrashLoopBackOff
| Cause | Check | Fix |
|---|---|---|
| App error | kubectl logs --previous | Fix app code |
| Missing config | Check ConfigMap/Secret mounts | Create missing resources |
| Bad command | Check command/args in spec | Fix entrypoint |
| OOMKilled | kubectl describe pod -> Last State | Increase memory limit |
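To tell these causes apart without reading the full `describe` output, the last termination record can be pulled directly from the pod status. This is a minimal sketch; `last_termination` is a hypothetical name. The columns are empty for containers that have not terminated yet.

```shell
# Hypothetical helper: print each container's last termination reason and exit code.
# Usage: last_termination <pod> [namespace]
last_termination() {
  local pod="$1" ns="${2:-default}"
  # One line per container: name, reason (e.g. OOMKilled, Error), exit code.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

A reason of `OOMKilled` points at the memory limit. A reason of `Error` with a non-zero exit code usually points at the application or its command, and that is where `kubectl logs --previous` helps.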
ImagePullBackOff
| Cause | Check | Fix |
|---|---|---|
| Wrong image | Check image name/tag | Fix image reference |
| Private registry | Check imagePullSecrets | Add registry credentials |
| Rate limit | Check events | Use registry mirror |
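For the private registry case, adding credentials might look like the sketch below. It assumes pods in the namespace run under the `default` service account; the function name `add_registry_creds` and its argument order are illustrative. Passing a password on the command line leaves it in shell history, so treat this as a starting point only.

```shell
# Hypothetical helper: create a registry credential and attach it to the
# "default" service account in a namespace.
# Usage: add_registry_creds <secret-name> <server> <user> <password> [namespace]
add_registry_creds() {
  local name="$1" server="$2" user="$3" password="$4" ns="${5:-default}"
  kubectl create secret docker-registry "$name" -n "$ns" \
    --docker-server="$server" \
    --docker-username="$user" \
    --docker-password="$password"
  # New pods using this service account will pull with the secret.
  kubectl patch serviceaccount default -n "$ns" \
    -p "{\"imagePullSecrets\": [{\"name\": \"$name\"}]}"
}
```

Existing pods are not updated. Delete or restart them so they are recreated with the pull secret.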
Service Not Reachable
bash
# Check endpoints exist
kubectl get endpoints <service>

# Check selector matches pods
kubectl get pods -l <selector>

# Test from inside cluster
kubectl run debug --rm -it --image=alpine -- wget -qO- <service>:<port>
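The selector check above needs the Service's selector first. The sketch below reads it from the Service and reuses it for the pod query, assuming a selector-based Service; `check_service` is a hypothetical name.

```shell
# Hypothetical helper: print a Service's selector, the pods it matches,
# and the endpoints Kubernetes has registered for it.
# Usage: check_service <service> [namespace]
check_service() {
  local svc="$1" ns="${2:-default}" selector
  # Render the selector map as comma-separated key=value pairs.
  selector=$(kubectl get service "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  selector="${selector%,}"   # drop the trailing comma
  if [ -z "$selector" ]; then
    echo "service $svc has no selector" >&2
    return 1
  fi
  echo "selector: $selector"
  kubectl get pods -n "$ns" -l "$selector" -o wide
  kubectl get endpoints "$svc" -n "$ns"
}
```

If the pod list is empty, the selector and pod labels disagree. If pods are listed but the endpoints are empty, the pods are probably not Ready.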
Quick Commands
bash
# Pods not in the Running phase (also lists Succeeded pods; CrashLoopBackOff pods still report Running)
kubectl get pods --field-selector=status.phase!=Running

# Events for namespace
kubectl get events --sort-by='.lastTimestamp' -n <ns>

# Resource usage
kubectl top pods --sort-by=memory

# Shell into pod
kubectl exec -it <pod> -- /bin/sh

# Port forward for debugging (local 8080 -> pod 80)
kubectl port-forward <pod> 8080:80

# Restart deployment
kubectl rollout restart deployment/<name>

# Check rollout status
kubectl rollout status deployment/<name>
Log Patterns to Search
bash
# Errors
kubectl logs <pod> | grep -i error

# Python tracebacks
kubectl logs <pod> | grep -A 20 "Traceback"

# OOM (kernel OOM kills usually show in pod status, not container logs; check kubectl describe pod too)
kubectl logs <pod> | grep -iE "out of memory|oom|killed"