kubernetes-troubleshooting

Kubernetes Troubleshooting

Overview

You are a Kubernetes troubleshooting expert. Use these procedures to diagnose and fix common Kubernetes issues.

When to Use

Use this skill when the user reports issues with Kubernetes pods, services, deployments, or cluster resources.

Diagnostic Procedures

Pod Not Starting

•Check pod status: kubectl get pods -n <namespace>
•Describe the pod: kubectl describe pod <pod-name> -n <namespace>
•Check events: kubectl get events -n <namespace> --sort-by='.lastTimestamp'
•Check logs: kubectl logs <pod-name> -n <namespace> --previous (--previous shows logs from the last terminated container; omit it if the pod has not restarted)
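
The four steps above can be chained into one helper. This is a minimal sketch, not an official tool: the function name diagnose_pod, its argument order, and the fallback to current logs when --previous fails are assumptions made for illustration. It uses only the kubectl commands listed above.

```shell
# Hypothetical helper: run the pod diagnostics above in order.
# Usage: diagnose_pod <pod-name> [namespace]
diagnose_pod() {
  pod="$1"
  ns="${2:-default}"
  if [ -z "$pod" ]; then
    echo "usage: diagnose_pod <pod-name> [namespace]" >&2
    return 2
  fi

  echo "== Pod status =="
  kubectl get pods -n "$ns"

  echo "== Pod description =="
  kubectl describe pod "$pod" -n "$ns"

  echo "== Events, sorted by last timestamp =="
  kubectl get events -n "$ns" --sort-by='.lastTimestamp'

  echo "== Logs =="
  # --previous fails if the container has never restarted,
  # so fall back to the current container's logs (an assumption of this sketch).
  kubectl logs "$pod" -n "$ns" --previous 2>/dev/null || kubectl logs "$pod" -n "$ns"
}
```

Called as diagnose_pod <pod-name> <namespace>, it prints each section in the order the steps are listed, so the events appear directly above the logs.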

Common Causes

•ImagePullBackOff: Wrong image name, missing registry credentials, or network issues
•CrashLoopBackOff: Container starts, crashes, and is restarted repeatedly; check logs, resource limits, and missing config
•Pending: Insufficient resources, node affinity/taint issues, PVC not bound
•OOMKilled: Container exceeded its memory limit; increase the memory limit in the deployment spec or reduce the application's memory use
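
The status-to-cause list above can be kept as a small lookup. The sketch below is illustrative only: the function name explain_pod_status is hypothetical, and ErrImagePull is included on the assumption that you may see it before the back-off begins.

```shell
# Hypothetical helper: map a pod status to the first things to check.
# Usage: explain_pod_status <status>
explain_pod_status() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "Check the image name, registry credentials, and network access to the registry." ;;
    CrashLoopBackOff)
      echo "Check logs with --previous, resource limits, and missing config." ;;
    Pending)
      echo "Check for insufficient resources, node affinity or taints, and unbound PVCs." ;;
    OOMKilled)
      echo "Compare memory usage with the memory limit; raise the limit if needed." ;;
    *)
      echo "No hint for status '$1'; describe the pod and check events." ;;
  esac
}

# Example (STATUS is the third column of the default kubectl get pod output):
#   explain_pod_status "$(kubectl get pod <pod-name> -n <namespace> --no-headers | awk '{print $3}')"
```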

Service Not Reachable

•Verify service exists: kubectl get svc -n <namespace>
•Check endpoints: kubectl get endpoints <service-name> -n <namespace> (an empty list usually means the service selector matches no ready pods)
•Test DNS resolution: kubectl run tmp --image=busybox --rm -it --restart=Never -n <namespace> -- nslookup <service-name> (run it in the service's namespace so the short name resolves)
•Check network policies: kubectl get networkpolicies -n <namespace>
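
The non-interactive checks above can be run in sequence, stopping at the first failure. This is a sketch under stated assumptions: check_service is a hypothetical name, and the jsonpath expression assumes the standard Endpoints layout (subsets, addresses, ip). The DNS test is left out because it needs an interactive pod.

```shell
# Hypothetical helper: walk the service checks above in order.
# Usage: check_service <service-name> [namespace]
check_service() {
  svc="$1"
  ns="${2:-default}"

  # Stop early if the service does not exist in this namespace.
  if ! kubectl get svc "$svc" -n "$ns" >/dev/null 2>&1; then
    echo "Service '$svc' not found in namespace '$ns'"
    return 1
  fi

  # Collect the backing pod IPs; an empty result means no ready pod backs the service.
  ips=$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "Service '$svc' has no ready endpoints; compare its selector with the pod labels"
    return 1
  fi
  echo "Endpoints: $ips"

  # Network policies in the namespace may still block traffic to those pods.
  kubectl get networkpolicies -n "$ns"
}
```

A non-zero return tells you which layer to look at first: the service object itself, or the pods behind it.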

Resource Debugging

•View resource usage: kubectl top pods -n <namespace> (requires metrics-server)
•Check allocated node resources: kubectl describe nodes | grep -A 5 "Allocated resources"
•View resource quotas: kubectl get resourcequota -n <namespace>
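
The three resource views above can be gathered in one pass. A minimal sketch, assuming a hypothetical function name resource_report. It continues when kubectl top fails, which typically happens when metrics-server is not installed.

```shell
# Hypothetical helper: gather the resource views above for one namespace.
# Usage: resource_report [namespace]
resource_report() {
  ns="${1:-default}"

  echo "== Pod usage =="
  # kubectl top depends on metrics-server; keep going if it is unavailable.
  kubectl top pods -n "$ns" || echo "kubectl top failed (is metrics-server installed?)"

  echo "== Allocated node resources =="
  kubectl describe nodes | grep -A 5 "Allocated resources"

  echo "== Resource quotas =="
  kubectl get resourcequota -n "$ns"
}
```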

Best Practices

•Always check events first — they often reveal the root cause immediately
•Use the --previous flag with kubectl logs for crash-looping pods
•Check pod resource requests and limits before scaling up nodes