Cloud Cost Optimization

When to Use This Skill

Use when evaluating cloud spend, choosing compute procurement strategies, setting up cost monitoring, or managing GPU costs for ML workloads. Relevant for startups burning through credits or approaching profitability.

Compute Procurement Strategy

Workload Type	On-Demand	Reserved/Savings Plan	Spot/Preemptible	Strategy
Production API servers	Baseline	1yr for base load	No	Reserve 60-70% base, on-demand for peaks
Dev/staging environments	Off nights/weekends	No	Yes with fallback	Spot + scheduled shutdown
ML training (checkpointable)	No	No	Yes	Spot with checkpointing, 60-90% savings
ML inference (latency-sensitive)	Burst overflow	1yr for steady state	No	Reserve base, on-demand burst
Batch processing / ETL	No	No	Yes	Spot with retry logic
CI/CD runners	No	No	Yes	Spot with on-demand fallback
GPU fine-tuning	No	No	Yes	Spot A100/H100, checkpoint every 30min
Database (RDS/CloudSQL)	No	1yr reserved	No	Always reserve; 30-40% savings

Startup Credits Programs

Provider	Program	Amount	Duration	Gotcha
AWS	Activate	$10K-$100K	1-2 years	Must apply through accelerator/VC
GCP	for Startups	$100K-$200K	1-2 years	Requires <$100K revenue
Azure	for Startups	$5K-$150K	1-2 years	Linked to Founders Hub tier
CoreWeave	Startup program	Varies	Varies	GPU-focused, good for ML
Lambda Labs	Direct	Pay-as-go	N/A	Often cheapest H100 spot

AWS Cost Explorer Queries

Monthly Cost Breakdown by Service

bash

aws ce get-cost-and-usage \
  --time-period Start=$(date -d '-30 days' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  | jq '.ResultsByTime[0].Groups
        | sort_by(.Metrics.BlendedCost.Amount | tonumber)
        | reverse | .[:10]'

Find Idle Resources

bash

# Unattached EBS volumes (pure waste)
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Type:VolumeType}' \
  --output table

# Idle Elastic IPs (charged when unattached)
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].{IP:PublicIp,AllocId:AllocationId}' \
  --output table

Spot Instance Fallback Pattern

Terraform with Mixed Instance Policy

hcl

resource "aws_autoscaling_group" "workers" {
  desired_capacity = 4
  max_size         = 10
  min_size         = 2

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 1  # 1 guaranteed instance
      on_demand_percentage_above_base_capacity = 0  # rest are spot
      spot_allocation_strategy                 = "capacity-optimized"
      spot_max_price                           = "" # on-demand price cap
    }
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.worker.id
        version            = "$Latest"
      }
      override { instance_type = "m6i.xlarge"; weighted_capacity = "1" }
      override { instance_type = "m6a.xlarge"; weighted_capacity = "1" }
      override { instance_type = "m5.xlarge";  weighted_capacity = "1" }
      override { instance_type = "m5a.xlarge"; weighted_capacity = "1" }
    }
  }
}

Spot Interruption Handler (Python)

python

import requests, signal, sys, threading, time

METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def check_spot_interruption() -> dict | None:
    try:
        resp = requests.get(METADATA_URL, timeout=1)
        if resp.status_code == 200:
            return resp.json()
    except requests.exceptions.RequestException:
        pass
    return None

def graceful_shutdown(checkpoint_fn):
    def handler(signum, frame):
        checkpoint_fn()
        sys.exit(0)
    signal.signal(signal.SIGTERM, handler)

    def poll():
        while True:
            if check_spot_interruption():
                checkpoint_fn()
                sys.exit(0)
            time.sleep(5)
    threading.Thread(target=poll, daemon=True).start()

Auto-Scaling Policies

Target Tracking with Scale-Down Protection

yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65

Scheduled Scaling for Dev Environments

bash

# Scale down at 20:00 weekdays
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name dev-workers \
  --min-size 0 --max-size 0 --desired-capacity 0

# Scale up at 08:00 weekdays
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name dev-workers \
  --min-size 1 --max-size 4 --desired-capacity 2

Budget Alerts

Terraform Budget with Threshold Alerts

hcl

resource "aws_budgets_budget" "monthly" {
  name         = "monthly-total"
  budget_type  = "COST"
  limit_amount = "5000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  dynamic "notification" {
    for_each = [50, 80, 100, 120]
    content {
      comparison_operator       = "GREATER_THAN"
      threshold                 = notification.value
      threshold_type            = "PERCENTAGE"
      notification_type         = notification.value <= 100 ? "FORECASTED" : "ACTUAL"
      subscriber_email_addresses = ["eng-leads@company.com"]
      subscriber_sns_topic_arns  = [aws_sns_topic.budget_alerts.arn]
    }
  }
}

resource "aws_budgets_budget" "gpu" {
  name         = "gpu-spend"
  budget_type  = "COST"
  limit_amount = "2000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  cost_filter {
    name   = "Service"
    values = ["Amazon Elastic Compute Cloud - Compute"]
  }

  notification {
    comparison_operator       = "GREATER_THAN"
    threshold                 = 80
    threshold_type            = "PERCENTAGE"
    notification_type         = "ACTUAL"
    subscriber_email_addresses = ["ml-team@company.com"]
  }
}

Gotchas and Anti-Patterns

Hidden Costs: Data Egress

•Problem: Cross-region and internet egress charges silently accumulate. $0.09/GB adds up with large datasets or high-traffic APIs.
•Fix: Keep compute and storage co-located. Use CloudFront/CDN for public content. Monitor DataTransfer-Out-Bytes. Watch NAT Gateway costs ($0.045/GB).

Hidden Costs: API Calls and Storage

•Problem: S3 LIST requests cost 5x GET. DynamoDB capacity units surprise teams. CloudWatch log ingestion at $0.50/GB.
•Fix: Audit API call patterns. Set S3 lifecycle policies. Use DynamoDB on-demand for unpredictable workloads.

GPU Idle Waste

•Problem: GPU instances running 24/7 for jobs that run hours/day. Idle p4d.24xlarge costs ~$800/day.
•Fix: Use spot with checkpointing. Implement job queues that acquire/release GPUs per job. Alert on GPU utilization < 10% over 1 hour.

Over-Provisioned Dev Environments

•Problem: Dev mirrors production sizing. 8 devs on m5.2xlarge 24/7 = $4,300/month mostly idle.
•Fix: Use t3.medium for dev. Schedule auto-shutdown outside business hours (saves 65%). Use cloud workstations on demand.