Kubernetes Autoscaling
Comprehensive autoscaling using HPA, VPA, and KEDA with kubectl-mcp-server tools.
Quick Reference
HPA (Horizontal Pod Autoscaler)
Basic CPU-based scaling:
yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Apply and verify:
code
apply_manifest(hpa_yaml, namespace)
get_hpa(namespace)
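To see how the 70% target turns into a replica count, the minimal Python sketch below applies the ratio formula documented for the HPA controller: desired replicas = ceil(current replicas × current metric / target metric), with the default 10% tolerance and clamping to `minReplicas`/`maxReplicas`. The function name and sample numbers are illustrative assumptions; the real controller also accounts for pod readiness and missing metrics, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Apply the HPA ratio formula, then clamp to the configured bounds."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Bounds and target taken from the my-app-hpa manifest (2..10 replicas, 70% CPU).
print(desired_replicas(4, 90, 70, 2, 10))   # 6
print(desired_replicas(4, 72, 70, 2, 10))   # 4 (within tolerance)
print(desired_replicas(8, 140, 70, 2, 10))  # 10 (clamped to maxReplicas)
```

When several metrics are configured, the controller evaluates each one this way and uses the largest result.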
VPA (Vertical Pod Autoscaler)
Right-size resource requests:
yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
KEDA (Event-Driven Autoscaling)
Detect KEDA Installation
code
keda_detect_tool()
List ScaledObjects
code
keda_scaledobjects_list_tool(namespace)
keda_scaledobject_get_tool(name, namespace)
List ScaledJobs
code
keda_scaledjobs_list_tool(namespace)
Trigger Authentication
code
keda_triggerauths_list_tool(namespace)
keda_triggerauth_get_tool(name, namespace)
KEDA-Managed HPAs
code
keda_hpa_list_tool(namespace)
See KEDA-TRIGGERS.md for trigger configurations.
Common KEDA Triggers
Queue-Based Scaling (AWS SQS)
yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-scaler
spec:
  scaleTargetRef:
    name: queue-processor
  minReplicaCount: 0  # Scale to zero!
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.region.amazonaws.com/...
      queueLength: "5"
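`queueLength` is a per-replica target: roughly how many messages one pod is expected to handle. As a rough sketch of the resulting arithmetic, assuming the replica count is simply the backlog divided by that target and clamped to `minReplicaCount`/`maxReplicaCount`, the function below (a hypothetical helper, not part of KEDA or kubectl-mcp-server) estimates the replica count for a given backlog. The real path goes through a KEDA-managed HPA, so tolerance and stabilization can make actual behavior differ.

```python
import math


def replicas_for_queue(visible_messages: int, queue_length: int = 5,
                       min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Estimate replicas for a backlog, using the sqs-scaler settings as defaults."""
    if visible_messages <= 0:
        # An empty queue lets the workload fall back to minReplicaCount.
        return min_replicas
    desired = math.ceil(visible_messages / queue_length)
    return max(min_replicas, min(max_replicas, desired))


print(replicas_for_queue(0))     # 0
print(replicas_for_queue(12))    # 3
print(replicas_for_queue(1000))  # 100 (clamped to maxReplicaCount)
```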
Cron-Based Scaling
yaml
triggers:
- type: cron
  metadata:
    timezone: America/New_York
    start: 0 8 * * 1-5   # 8 AM weekdays
    end: 0 18 * * 1-5    # 6 PM weekdays
    desiredReplicas: "10"
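To sanity-check which moments fall inside this window, the sketch below hard-codes the same schedule (08:00 to 18:00, Monday to Friday) in Python. It is not a cron parser. It assumes the datetime passed in is already expressed in the trigger's timezone, and the `min_replicas` fallback is a hypothetical value because the fragment above does not show `minReplicaCount`.

```python
from datetime import datetime


def in_business_window(local_time: datetime) -> bool:
    """True from 08:00 up to (not including) 18:00, Monday to Friday."""
    is_weekday = local_time.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and 8 <= local_time.hour < 18


def cron_replicas(local_time: datetime, desired: int = 10,
                  min_replicas: int = 0) -> int:
    """Replica count the cron trigger alone would ask for at this moment."""
    return desired if in_business_window(local_time) else min_replicas


print(cron_replicas(datetime(2024, 1, 8, 9, 0)))   # Monday 09:00 -> 10
print(cron_replicas(datetime(2024, 1, 8, 18, 0)))  # Monday 18:00 -> 0
print(cron_replicas(datetime(2024, 1, 6, 9, 0)))   # Saturday 09:00 -> 0
```

If the ScaledObject has other triggers as well, they can still request more replicas outside the window.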
Prometheus Metrics
yaml
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus:9090
    metricName: http_requests_total
    query: sum(rate(http_requests_total{app="myapp"}[2m]))
    threshold: "100"
Scaling Strategies
| Strategy | Tool | Use Case |
|---|---|---|
| CPU/Memory | HPA | Steady traffic patterns |
| Custom metrics | HPA v2 | Business metrics |
| Event-driven | KEDA | Queue processing, cron |
| Vertical | VPA | Right-size requests |
| Scale to zero | KEDA | Cost savings, idle workloads |
Cost-Optimized Autoscaling
Scale to Zero with KEDA
Reduce costs for idle workloads:
code
keda_scaledobjects_list_tool(namespace)
# ScaledObjects with minReplicaCount: 0 can scale to zero
Right-Size with VPA
Get recommendations and apply:
code
get_resource_recommendations(namespace)
# Apply VPA recommendations
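Before applying a recommendation, it can help to quantify how far it sits from the current request. The sketch below parses Kubernetes CPU quantities and reports the percentage change. The helper names and input values are hypothetical; the output format of `get_resource_recommendations` is not shown here, so the values are assumed to be plain quantity strings such as `"250m"`.

```python
def cpu_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("250m", "0.5", "2") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cpu_change_percent(current_request: str, recommended_target: str) -> float:
    """Percentage change from the current request to the recommended target."""
    current = cpu_millicores(current_request)
    target = cpu_millicores(recommended_target)
    return round((target - current) / current * 100, 1)


print(cpu_change_percent("500m", "250m"))  # -50.0 (over-provisioned)
print(cpu_change_percent("100m", "250m"))  # 150.0 (under-provisioned)
```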
Predictive Scaling
Use cron triggers for known patterns:
yaml
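# Scale up before traffic spike
triggers:
- type: cron
  metadata:
    timezone: America/New_York  # the cron trigger needs a timezone; value reused from the Cron-Based Scaling example
    start: 0 7 * * *   # 7 AM
    end: 0 9 * * *     # 9 AM
    desiredReplicas: "20"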
Multi-Cluster Autoscaling
Compare KEDA ScaledObjects across clusters by passing a context:
code
keda_scaledobjects_list_tool(namespace, context="production")
keda_scaledobjects_list_tool(namespace, context="staging")
Troubleshooting
HPA Not Scaling
code
get_hpa(namespace)
get_pod_metrics(name, namespace)  # Metrics available?
describe_pod(name, namespace)     # Resource requests set?
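The "resource requests set?" check matters because utilization is measured against requests. The sketch below shows the idea under simplifying assumptions: one container per pod, with usage and requests already converted to millicores. The function and tuple layout are illustrative, not an API of metrics-server or kubectl-mcp-server.

```python
from typing import List, Optional, Tuple


def average_cpu_utilization(
        pods: List[Tuple[int, Optional[int]]]) -> Optional[int]:
    """pods holds (usage_millicores, request_millicores or None) per pod.

    Returns the utilization percentage, or None when it cannot be computed.
    """
    if not pods:
        return None
    # A pod without a CPU request has no defined utilization.
    if any(request is None or request <= 0 for _, request in pods):
        return None
    total_usage = sum(usage for usage, _ in pods)
    total_request = sum(request for _, request in pods)
    return total_usage * 100 // total_request


print(average_cpu_utilization([(350, 500), (450, 500)]))  # 80
print(average_cpu_utilization([(350, 500), (450, None)]))  # None
```

A `None` result corresponds to the case where the HPA cannot act on the CPU metric until requests are added to the pod spec.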
KEDA Not Triggering
code
keda_scaledobject_get_tool(name, namespace)  # Check status
get_events(namespace)                        # Check events
Common Issues
| Symptom | Check | Resolution |
|---|---|---|
| HPA targets show `<unknown>` | Metrics server | Install metrics-server |
| KEDA no scale | Trigger auth | Check TriggerAuthentication |
| VPA not updating | Update mode | Set updateMode: Auto |
| Scale down slow | Stabilization | Adjust stabilizationWindowSeconds |
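Slow scale-down is usually the stabilization window working as designed: when scaling down, the HPA acts on the highest recommendation seen within the window (300 seconds by default). The Python sketch below models only that rule. The class name and timestamps are hypothetical, and the real controller also applies scaling policies that are left out here.

```python
from collections import deque


class ScaleDownStabilizer:
    """Return the highest replica recommendation seen within the window."""

    def __init__(self, window_seconds: int = 300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp_seconds, recommended_replicas)

    def recommend(self, now: float, desired: int) -> int:
        self._history.append((now, desired))
        # Forget recommendations that have aged out of the window.
        while self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        return max(replicas for _, replicas in self._history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
print(stabilizer.recommend(0, 10))   # 10
print(stabilizer.recommend(60, 4))   # 10 (the earlier peak is still in the window)
print(stabilizer.recommend(400, 4))  # 4 (the peak has aged out)
```

Lowering `stabilizationWindowSeconds` shortens how long an old peak holds the replica count up, at the cost of more flapping.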
Best Practices
- Always Set Resource Requests
  - HPA requires requests to calculate utilization
- Use Multiple Metrics
  - Combine CPU + custom metrics for accuracy
- Stabilization Windows
  - Prevent flapping with scaleDown stabilization
- Scale to Zero Carefully
  - Consider cold start time
  - Use activation threshold
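The last practice can be illustrated with a small sketch. KEDA separates activation (0 to 1 replica) from scaling (1 to N): the workload only leaves zero once the metric is strictly greater than the activation value. The function below is a simplified model of that split, not KEDA's implementation. The `threshold` default of 100 matches the Prometheus example, while the activation value of 20 is an assumed number chosen for illustration.

```python
import math


def keda_replicas(metric_value: float, threshold: float = 100,
                  activation_threshold: float = 20,
                  min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Simplified model of KEDA's activation and scaling phases."""
    # Activation phase: stay at the minimum until the metric exceeds the
    # activation value (strictly greater, not greater-or-equal).
    if metric_value <= activation_threshold:
        return min_replicas
    # Scaling phase: the threshold acts as a per-replica target.
    desired = math.ceil(metric_value / threshold)
    return max(1, min_replicas, min(max_replicas, desired))


print(keda_replicas(10))   # 0 (below activation)
print(keda_replicas(20))   # 0 (equal is not enough)
print(keda_replicas(50))   # 1
print(keda_replicas(450))  # 5
```

A higher activation value avoids paying the cold-start cost for trickle traffic, at the price of ignoring that traffic until the threshold is crossed.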
Related Skills
- k8s-cost - Cost optimization
- k8s-troubleshoot - Debug scaling issues