AWS Cost Optimization & FinOps
Systematic workflows for AWS cost optimization and financial operations management.
When to Use This Skill
Use this skill when you need to:
- Find cost savings: Identify unused resources, rightsizing opportunities, or commitment discounts
- Analyze spending: Understand cost trends, detect anomalies, or break down costs
- Optimize architecture: Choose cost-effective services, storage tiers, or instance types
- Implement FinOps: Set up governance, tagging, budgets, or monthly reviews
- Make purchase decisions: Evaluate Reserved Instances, Savings Plans, or Spot instances
- Troubleshoot costs: Investigate unexpected bills or cost spikes
- Plan budgets: Forecast costs or evaluate impact of new projects
Cost Optimization Workflow
Follow this systematic approach for AWS cost optimization:
┌─────────────────────────────────────────────┐
│  1. DISCOVER                                │
│    What are we spending money on?           │
│    Run: find_unused_resources.py            │
│    Run: cost_anomaly_detector.py            │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│  2. ANALYZE                                 │
│    Where are the optimization opportunities?│
│    Run: rightsizing_analyzer.py             │
│    Run: detect_old_generations.py           │
│    Run: spot_recommendations.py             │
│    Run: analyze_ri_recommendations.py       │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│  3. PRIORITIZE                              │
│    What should we optimize first?           │
│    - Quick wins (low risk, high savings)    │
│    - Low-hanging fruit (easy to implement)  │
│    - Strategic improvements                 │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│  4. IMPLEMENT                               │
│    Execute optimization actions             │
│    - Delete unused resources                │
│    - Rightsize instances                    │
│    - Purchase commitments                   │
│    - Migrate to new generations             │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│  5. MONITOR                                 │
│    Verify savings and track metrics         │
│    - Monthly cost reviews                   │
│    - Tag compliance monitoring              │
│    - Budget variance tracking               │
└─────────────────────────────────────────────┘
Core Workflows
Workflow 1: Monthly Cost Optimization Review
Frequency: Run monthly (first week of each month)
Step 1: Find Unused Resources
# Scan for waste across all resources
python3 scripts/find_unused_resources.py

# Expected output:
# - Unattached EBS volumes
# - Old snapshots
# - Unused Elastic IPs
# - Idle NAT Gateways
# - Idle EC2 instances
# - Unused load balancers
# - Estimated monthly savings
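To make the first check concrete, here is a minimal sketch of how unattached EBS volumes could be picked out of an EC2 DescribeVolumes response and given a rough monthly cost. It is not the logic of find_unused_resources.py, which this document does not show. The function names, the sample volume IDs, and the per-GB prices are assumptions for illustration; check current pricing for your region.

```python
from typing import Any

# Assumed per-GB-month prices in USD, for illustration only.
ASSUMED_PRICE_PER_GB = {"gp2": 0.10, "gp3": 0.08}
FALLBACK_PRICE_PER_GB = 0.10


def unattached_volumes(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return volumes in the 'available' state, meaning not attached to an instance."""
    return [v for v in response.get("Volumes", []) if v.get("State") == "available"]


def estimated_monthly_cost(volumes: list[dict[str, Any]]) -> float:
    """Sum size (GiB) times the assumed per-GB price for each volume."""
    total = sum(
        v["Size"] * ASSUMED_PRICE_PER_GB.get(v.get("VolumeType", ""), FALLBACK_PRICE_PER_GB)
        for v in volumes
    )
    return round(total, 2)


# Sample data shaped like a DescribeVolumes response (made-up IDs).
sample = {
    "Volumes": [
        {"VolumeId": "vol-aaa", "State": "available", "Size": 100, "VolumeType": "gp2"},
        {"VolumeId": "vol-bbb", "State": "in-use", "Size": 500, "VolumeType": "gp3"},
        {"VolumeId": "vol-ccc", "State": "available", "Size": 50, "VolumeType": "gp3"},
    ]
}
waste = unattached_volumes(sample)
print([v["VolumeId"] for v in waste], estimated_monthly_cost(waste))
```

With real data, the response would come from `boto3.client("ec2").describe_volumes()`, optionally with the `status` filter set to `available`.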
Step 2: Analyze Cost Anomalies
# Detect unusual spending patterns
python3 scripts/cost_anomaly_detector.py --days 30

# Expected output:
# - Cost spikes and anomalies
# - Top cost drivers
# - Period-over-period comparison
# - 30-day forecast
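One common way to flag a cost spike is to compare each day against the mean and standard deviation of the preceding days. The sketch below illustrates that idea only; the method used by cost_anomaly_detector.py is not described here, and the function name, window, and thresholds are assumptions.

```python
from statistics import mean, stdev


def find_cost_spikes(daily_costs: list[float], window: int = 7,
                     threshold: float = 3.0,
                     min_relative_sigma: float = 0.05) -> list[int]:
    """Return indexes of days whose cost exceeds the trailing mean by more than
    `threshold` standard deviations of the preceding `window` days.

    `window` must be at least 2 so a standard deviation can be computed.
    """
    spikes = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu = mean(history)
        # Floor sigma so a near-constant history does not flag trivial changes.
        sigma = max(stdev(history), min_relative_sigma * mu)
        if daily_costs[i] - mu > threshold * sigma:
            spikes.append(i)
    return spikes


# Made-up daily costs in USD; day 7 is an obvious jump.
costs = [100, 102, 98, 101, 99, 100, 103, 180, 101]
print(find_cost_spikes(costs))
```

A trailing window keeps the check cheap and avoids comparing against future data, at the cost of a short period after each spike where the inflated history hides smaller anomalies.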
Step 3: Identify Rightsizing Opportunities
# Find oversized instances
python3 scripts/rightsizing_analyzer.py --days 30

# Expected output:
# - EC2 instances with low utilization
# - RDS instances with low utilization
# - Recommended smaller instance types
# - Estimated savings
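A rightsizing recommendation can be as simple as "step down one size when CPU stays low". The sketch below shows that rule under stated assumptions: the 20% average and 40% peak CPU limits, the simplified size ladder, and the function name are all hypothetical, and the script's real criteria may differ. Any suggested type should be checked against what the instance family actually offers (for example, not every family has a `medium` size) and against memory and network needs.

```python
from typing import Optional

# Simplified size ladder; real families skip or add sizes (assumption for illustration).
SIZE_LADDER = ["nano", "micro", "small", "medium", "large",
               "xlarge", "2xlarge", "4xlarge", "8xlarge"]


def suggest_smaller(instance_type: str, avg_cpu: float, max_cpu: float,
                    avg_limit: float = 20.0, max_limit: float = 40.0) -> Optional[str]:
    """Suggest the next size down when both average and peak CPU are low."""
    family, _, size = instance_type.partition(".")
    if size not in SIZE_LADDER:
        return None  # unknown size, make no suggestion
    index = SIZE_LADDER.index(size)
    if index == 0 or avg_cpu >= avg_limit or max_cpu >= max_limit:
        return None  # already smallest, or utilization too high to downsize
    return f"{family}.{SIZE_LADDER[index - 1]}"


print(suggest_smaller("m5.2xlarge", avg_cpu=8.0, max_cpu=25.0))
print(suggest_smaller("m5.2xlarge", avg_cpu=55.0, max_cpu=90.0))
```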
Step 4: Generate Monthly Report
# Use the template to compile findings
mkdir -p reports
cp assets/templates/monthly_cost_report.md "reports/$(date +%Y-%m)-cost-report.md"

# Fill in:
# - Findings from scripts
# - Action items
# - Team cost breakdowns
# - Optimization wins
Step 5: Team Review Meeting
- Present findings to engineering teams
- Assign optimization tasks
- Track action items to completion
Workflow 2: Commitment Purchase Analysis (RI/Savings Plans)
When: Quarterly or when usage patterns stabilize
Step 1: Analyze Current Usage
# Identify workloads suitable for commitments
python3 scripts/analyze_ri_recommendations.py --days 60

# Looks for:
# - EC2 instances running consistently for 60+ days
# - RDS instances with stable usage
# - Calculates ROI for 1yr vs 3yr commitments
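The 1-year versus 3-year comparison comes down to simple arithmetic on hourly rates and any upfront payment. The following is a sketch of that arithmetic, not the calculation used by analyze_ri_recommendations.py. The function names and all prices are hypothetical, and the comparison assumes the instance runs 24/7 for the whole term.

```python
from typing import Optional

HOURS_PER_YEAR = 8760
HOURS_PER_MONTH = 730


def commitment_savings(on_demand_hourly: float, committed_hourly: float,
                       upfront: float, term_years: int) -> dict[str, float]:
    """Compare total On-Demand cost with total commitment cost over the term."""
    hours = HOURS_PER_YEAR * term_years
    on_demand_total = on_demand_hourly * hours
    committed_total = upfront + committed_hourly * hours
    savings = on_demand_total - committed_total
    return {
        "savings": round(savings, 2),
        "savings_pct": round(100 * savings / on_demand_total, 1),
    }


def break_even_months(on_demand_hourly: float, committed_hourly: float,
                      upfront: float) -> Optional[float]:
    """Months of 24/7 use needed to recover the upfront payment, or None if never."""
    hourly_saving = on_demand_hourly - committed_hourly
    if hourly_saving <= 0:
        return None
    return upfront / (hourly_saving * HOURS_PER_MONTH)


# Hypothetical rates in USD per hour.
print(commitment_savings(0.10, 0.06, upfront=0.0, term_years=1))
print(commitment_savings(0.10, 0.04, upfront=0.0, term_years=3))
print(break_even_months(0.10, 0.06, upfront=146.0))
```

The break-even figure is useful when a workload might be retired before the term ends: if it is likely to disappear sooner than the break-even point, the commitment loses money.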
Step 2: Review Recommendations
Evaluate each recommendation:
✅ Good candidate if:
- Running 24/7 for 60+ days
- Workload is stable and predictable
- No plans to change architecture
- Savings > 30%

❌ Poor candidate if:
- Workload is variable or experimental
- Architecture changes planned
- Instance type may change
- Dev/test environment
Step 3: Choose Commitment Type
Reserved Instances:
- Standard RI: Highest RI discount (up to 72%), least flexibility
- Convertible RI: Lower discount (up to 66%), can be exchanged for a different instance family or type
- Best for: Specific instance types, stable workloads
Savings Plans:
- Compute SP: Flexible across instance families, sizes, and regions (up to 66% savings)
- EC2 Instance SP: Flexible across sizes within one instance family in one region (up to 72% savings)
- Best for: Variable workloads within constraints
Decision Matrix:
Known instance type, won't change → Standard RI
May need to change types → Convertible RI or Compute SP
Variable workloads → Compute Savings Plan
Maximum flexibility → Compute Savings Plan
Step 4: Purchase and Track
- Purchase through AWS Console or CLI
- Tag commitments with purchase date and owner
- Monitor utilization monthly
- Aim for >90% utilization
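The utilization target matters because a commitment is billed for every committed hour, used or not. As a rough sketch (function names are hypothetical, and the model ignores upfront payment timing): with a discount d, the commitment costs (1 - d) of the On-Demand rate per committed hour, so it only beats On-Demand when utilization is above 1 - d.

```python
def break_even_utilization(discount: float) -> float:
    """Utilization below which the commitment costs more than On-Demand."""
    return 1.0 - discount


def effective_savings_rate(utilization: float, discount: float) -> float:
    """Savings relative to paying On-Demand for only the hours actually used.

    Negative values mean the commitment costs more than On-Demand would have.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 - (1.0 - discount) / utilization


# A 40% discount at different utilization levels.
for util in (1.0, 0.9, 0.6, 0.5):
    print(util, round(effective_savings_rate(util, discount=0.40), 3))
```

At 90% utilization, a 40% discount still yields about a third off; at 60% it breaks even, and below that it loses money.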
Reference: See references/best_practices.md for detailed commitment strategies
Workflow 3: Instance Generation Migration
When: During architecture reviews or optimization sprints
Step 1: Detect Old Instances
# Find outdated instance generations
python3 scripts/detect_old_generations.py

# Identifies:
# - t2 → t3 migrations (10% savings)
# - m4 → m5 → m6i migrations
# - Intel → Graviton opportunities (20% savings)
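Detecting an old generation is essentially a lookup on the family prefix of the instance type. The sketch below assumes a small mapping that contains only the migrations named above; the mapping, the function name, and the one-step-at-a-time behavior are illustrative and not taken from detect_old_generations.py. The suggested type keeps the same size, which may not exist in the newer family.

```python
from typing import Optional

# Illustrative mapping limited to the migrations listed in this workflow.
NEXT_GENERATION = {"t2": "t3", "m4": "m5", "m5": "m6i"}


def suggest_upgrade(instance_type: str) -> Optional[str]:
    """Return the same size in the next generation, or None if no mapping exists."""
    family, sep, size = instance_type.partition(".")
    if not sep or family not in NEXT_GENERATION:
        return None
    return f"{NEXT_GENERATION[family]}.{size}"


for itype in ("t2.medium", "m4.xlarge", "m5.large", "c7g.large"):
    print(itype, "->", suggest_upgrade(itype))
```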
Step 2: Prioritize Migrations
Quick Wins (Low Risk):
t2 → t3: Usually a drop-in replacement (the AMI needs ENA and NVMe drivers), 10% savings
m4 → m5: Better performance, 5% savings (same driver requirements)
gp2 → gp3: No downtime, 20% savings
Medium Effort (Test Required):
x86 → Graviton (ARM64): 20% savings
- Requires ARM64 compatibility testing
- Most modern frameworks support ARM64
- Test in staging first
Step 3: Execute Migration
For EC2 (x86 to x86):
- Stop instance
- Change instance type
- Start instance
- Verify application
For Graviton Migration:
- Create ARM64 AMI or Docker image
- Launch new Graviton instance
- Test thoroughly
- Cut over traffic
- Terminate old instance
Step 4: Validate Savings
- Monitor new costs in Cost Explorer
- Verify performance is acceptable
- Document migration for other teams
Reference: See references/best_practices.md → Compute Optimization
Workflow 4: Spot Instance Evaluation
When: For fault-tolerant workloads or Auto Scaling Groups
Step 1: Identify Candidates
# Analyze workloads for Spot suitability
python3 scripts/spot_recommendations.py

# Evaluates:
# - Instances in Auto Scaling Groups (good candidates)
# - Dev/test/staging environments
# - Batch processing workloads
# - CI/CD and build servers
Step 2: Assess Suitability
Excellent for Spot:
- Stateless applications
- Batch jobs
- CI/CD pipelines
- Data processing
- Auto Scaling Groups
NOT suitable for Spot:
- Databases (without replicas)
- Stateful applications
- Real-time services
- Mission-critical workloads
Step 3: Implementation Strategy
Option 1: Fargate Spot (Easiest)
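# ECS task definition
requiresCompatibilities:
  - FARGATE

# ECS service (the capacity provider strategy is set on the service
# or RunTask call, not in the task definition)
capacityProviderStrategy:
  - capacityProvider: FARGATE_SPOT
    weight: 70  # about 70% of tasks on Spot
  - capacityProvider: FARGATE
    weight: 30  # about 30% of tasks On-Demand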
Option 2: EC2 Auto Scaling with Spot
# Mixed instances policy
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandBaseCapacity: 2
    OnDemandPercentageAboveBaseCapacity: 30
    SpotAllocationStrategy: capacity-optimized
  LaunchTemplate:
    Overrides:
      - InstanceType: m5.large
      - InstanceType: m5a.large
      - InstanceType: m5n.large
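To see what the two On-Demand settings in the policy above mean for a given group size, the split can be worked out by hand: the base capacity is filled with On-Demand first, then the percentage applies to whatever remains. The sketch below encodes that arithmetic. The function name is hypothetical, and rounding the On-Demand share up is an assumption; the service's exact rounding is not stated in this document.

```python
def split_capacity(desired: int, on_demand_base: int,
                   on_demand_pct_above_base: int) -> tuple[int, int]:
    """Return (on_demand, spot) instance counts for a desired capacity."""
    base = min(desired, on_demand_base)
    above_base = desired - base
    # Integer ceiling of above_base * pct / 100 (rounding up is an assumption).
    on_demand_above = -(-above_base * on_demand_pct_above_base // 100)
    on_demand = base + on_demand_above
    return on_demand, desired - on_demand


# Settings from the policy above: base of 2, then 30% On-Demand.
for desired in (1, 2, 12, 22):
    print(desired, split_capacity(desired, on_demand_base=2, on_demand_pct_above_base=30))
```

For a desired capacity of 12, this gives 2 base On-Demand instances plus 3 of the remaining 10, leaving 7 on Spot.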
Option 3: EC2 Spot Fleet
# Create Spot Fleet with diverse instance types
aws ec2 request-spot-fleet --spot-fleet-request-config file://spot-fleet.json
Step 4: Implement Interruption Handling
# Handle the 2-minute termination notice
# Instance metadata: /latest/meta-data/spot/instance-action
# In application:
1. Poll for termination notice
2. Gracefully shut down (save state)
3. Drain connections
4. Exit
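The polling step can be sketched with the standard library alone. This is a minimal illustration, assuming IMDSv2 is enabled and that the endpoint returns HTTP 404 until an interruption is scheduled; the function names and the 5-second poll interval are hypothetical choices. The fetch function is injectable so the loop can be exercised off-instance.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

IMDS = "http://169.254.169.254"


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the raw JSON body of spot/instance-action, or None on HTTP 404."""
    token_request = urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(token_request, timeout=timeout) as resp:
            token = resp.read().decode()
        action_request = urllib.request.Request(
            f"{IMDS}/latest/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(action_request, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no interruption scheduled
        raise


def wait_for_interruption(fetch: Callable[[], Optional[str]] = fetch_instance_action,
                          poll_seconds: float = 5.0,
                          max_polls: Optional[int] = None,
                          sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Poll until a notice appears; return it as a dict, or None after max_polls."""
    polls = 0
    while max_polls is None or polls < max_polls:
        body = fetch()
        if body is not None:
            return json.loads(body)
        polls += 1
        sleep(poll_seconds)
    return None
```

On an instance, `notice = wait_for_interruption()` would block until a notice arrives; the application would then save state, drain connections, and exit before the time given in `notice["time"]`.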
Reference: See references/best_practices.md → Compute Optimization → Spot Instances
Quick Reference: Cost Optimization Scripts
All Scripts Location
ls scripts/
# find_unused_resources.py
# analyze_ri_recommendations.py
# detect_old_generations.py
# spot_recommendations.py
# rightsizing_analyzer.py
# cost_anomaly_detector.py
Script Usage Patterns
Monthly Review (Run all):
python3 scripts/find_unused_resources.py
python3 scripts/cost_anomaly_detector.py --days 30
python3 scripts/rightsizing_analyzer.py --days 30
Quarterly Optimization:
python3 scripts/analyze_ri_recommendations.py --days 60
python3 scripts/detect_old_generations.py
python3 scripts/spot_recommendations.py
Specific Region Only:
python3 scripts/find_unused_resources.py --region us-east-1
python3 scripts/rightsizing_analyzer.py --region us-west-2
Named AWS Profile:
python3 scripts/find_unused_resources.py --profile production
python3 scripts/cost_anomaly_detector.py --profile production --days 60
Script Requirements
# Install dependencies
pip install boto3 tabulate

# AWS credentials required
# Configure via: aws configure
# Or use: --profile PROFILE_NAME
Service-Specific Optimization
Compute Optimization
Key Actions:
- Migrate to Graviton (20% savings)
- Use Spot for fault-tolerant workloads (around 70% savings, up to 90%)
- Purchase RIs for stable workloads (40-65% savings)
- Right-size oversized instances
Reference: references/best_practices.md → Compute Optimization
Storage Optimization
Key Actions:
- Convert gp2 → gp3 (20% savings)
- Implement S3 lifecycle policies (50-95% savings)
- Delete old snapshots
- Use S3 Intelligent-Tiering
Reference: references/best_practices.md → Storage Optimization
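As an illustration of the S3 lifecycle item above, the sketch below builds a lifecycle configuration that moves objects to cheaper storage classes as they age. The helper name, the rule ID, and the day counts are assumptions for illustration; suitable thresholds depend on access patterns, and retrieval and minimum-duration charges should be checked before applying a rule.

```python
from typing import Any, Optional


def build_lifecycle_configuration(prefix: str, ia_after_days: int = 30,
                                  glacier_after_days: int = 90,
                                  expire_after_days: Optional[int] = None) -> dict[str, Any]:
    """Build a lifecycle configuration that tiers objects under `prefix` by age."""
    if ia_after_days < 30:
        raise ValueError("S3 requires at least 30 days before moving to STANDARD_IA")
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be later than ia_after_days")
    rule: dict[str, Any] = {
        "ID": "tier-down-old-objects",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


config = build_lifecycle_configuration("logs/", expire_after_days=365)
print(config)
```

The resulting dictionary has the shape accepted by `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`. Note that applying it replaces the bucket's existing lifecycle rules.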
Network Optimization
Key Actions:
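- Replace NAT Gateways with VPC Endpoints where traffic goes only to AWS services (save $25-30/month per gateway removed; S3 and DynamoDB gateway endpoints have no hourly charge)
- Use CloudFront to reduce data transfer costs
- Colocate resources in the same AZ when possible to avoid cross-AZ data transfer charges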
Reference: references/best_practices.md → Network Optimization
Database Optimization
Key Actions:
- Right-size RDS instances
- Use gp3 storage (20% cheaper than gp2)
- Evaluate Aurora Serverless for variable workloads
- Purchase RDS Reserved Instances
Reference: references/best_practices.md → Database Optimization
Service Alternatives Decision Guide
Need help choosing between services?
Question: "Should I use EC2, Lambda, or Fargate?"
Answer: See references/service_alternatives.md → Compute Alternatives
Question: "Which S3 storage class should I use?"
Answer: See references/service_alternatives.md → Storage Alternatives
Question: "Should I use RDS or Aurora?"
Answer: See references/service_alternatives.md → Database Alternatives
Question: "NAT Gateway vs VPC Endpoint vs NAT Instance?"
Answer: See references/service_alternatives.md → Networking Alternatives
FinOps Governance & Process
Setting Up FinOps
Phase 1: Foundation (Month 1)
- Enable Cost Explorer
- Set up AWS Budgets
- Define tagging strategy
- Activate cost allocation tags
Phase 2: Visibility (Months 2-3)
- Implement tagging enforcement
- Run optimization scripts
- Set up monthly reviews
- Create team cost reports
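Tagging enforcement usually starts with a report of which resources are missing required tags. Here is a minimal sketch of such a check. The required tag keys and function names are hypothetical (the actual tagging strategy is defined in Phase 1), and the tag lists use the `Key`/`Value` shape that most AWS APIs return.

```python
# Hypothetical required tag keys; substitute the keys from your tagging strategy.
REQUIRED_TAGS = frozenset({"Owner", "Team", "Environment", "CostCenter"})


def missing_tags(resource_tags: list[dict[str, str]],
                 required: frozenset[str] = REQUIRED_TAGS) -> set[str]:
    """Return required keys that are absent or have an empty value."""
    present = {t["Key"] for t in resource_tags if t.get("Value", "").strip()}
    return set(required) - present


def compliance_rate(resources: dict[str, list[dict[str, str]]]) -> float:
    """Fraction of resources carrying every required tag (1.0 if there are none)."""
    if not resources:
        return 1.0
    compliant = sum(1 for tags in resources.values() if not missing_tags(tags))
    return compliant / len(resources)


# Made-up resources keyed by ID.
inventory = {
    "i-111": [{"Key": "Owner", "Value": "alice"}, {"Key": "Team", "Value": "data"},
              {"Key": "Environment", "Value": "prod"}, {"Key": "CostCenter", "Value": "42"}],
    "i-222": [{"Key": "Owner", "Value": "bob"}, {"Key": "Team", "Value": ""}],
}
print(sorted(missing_tags(inventory["i-222"])), compliance_rate(inventory))
```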
Phase 3: Culture (Ongoing)
- Cost metrics in engineering KPIs
- Cost review in architecture decisions
- Regular optimization sprints
- FinOps champions in each team
Full Guide: See references/finops_governance.md
Monthly Review Process
Week 1: Data Collection
- Run all optimization scripts
- Export Cost & Usage Reports
- Compile findings
Week 2: Analysis
- Identify trends
- Find opportunities
- Prioritize actions
Week 3: Team Reviews
- Present to engineering teams
- Discuss optimizations
- Assign action items
Week 4: Executive Reporting
- Create executive summary
- Forecast next quarter
- Report optimization wins
Template: See assets/templates/monthly_cost_report.md
Detailed Process: See references/finops_governance.md → Monthly Review Process
Cost Optimization Checklist
Quick Wins (Do First)
- Delete unattached EBS volumes
- Delete old EBS snapshots (>90 days)
- Release unused Elastic IPs
- Convert gp2 → gp3 volumes
- Stop/terminate idle EC2 instances
- Enable S3 Intelligent-Tiering
- Set up AWS Budgets and alerts
Medium Effort (This Quarter)
- Right-size oversized instances
- Migrate to newer instance generations
- Purchase Reserved Instances for stable workloads
- Implement S3 lifecycle policies
- Replace NAT Gateways with VPC Endpoints (where applicable)
- Enable automated resource scheduling (dev/test)
- Implement tagging strategy and enforcement
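For the resource scheduling item in the list above, the potential saving follows directly from how many hours per week the environment actually needs to run. The sketch below works through that arithmetic; the function name and the example schedule are assumptions. It covers hourly compute charges only, since attached EBS volumes continue to bill while an instance is stopped.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def scheduled_savings_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of compute hours avoided compared with running 24/7."""
    running_hours = hours_per_day * days_per_week
    if not 0 <= running_hours <= HOURS_PER_WEEK:
        raise ValueError("schedule must fit within one week")
    return 1.0 - running_hours / HOURS_PER_WEEK


# Example schedule: dev/test running 12 hours a day, Monday to Friday.
fraction = scheduled_savings_fraction(hours_per_day=12, days_per_week=5)
print(f"{fraction:.1%} of compute hours avoided")
```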
Strategic Initiatives (Ongoing)
- Migrate to Graviton instances
- Implement Spot for fault-tolerant workloads
- Establish monthly cost review process
- Set up cost allocation by team
- Implement chargeback/showback model
- Create FinOps culture and practices
Troubleshooting Cost Issues
"My bill suddenly increased"
- Run cost anomaly detection:
  python3 scripts/cost_anomaly_detector.py --days 30
- Check Cost Explorer for service breakdown
- Review CloudTrail for resource creation events
- Check for Auto Scaling events
- Verify that no Reserved Instances have expired
"I need to reduce costs by X%"
Follow the optimization workflow:
- Run all discovery scripts
- Calculate total potential savings
- Prioritize by: Savings Amount × (1 / Effort)
- Focus on quick wins first
- Implement strategic changes for long-term savings
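The prioritization formula in the list above, Savings Amount × (1 / Effort), can be applied mechanically once each opportunity has an estimated saving and a relative effort score. In this sketch the class name, the 1-to-5 effort scale, and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    monthly_savings: float  # estimated USD per month
    effort: float           # relative effort, e.g. 1 (trivial) to 5 (major project)

    @property
    def score(self) -> float:
        """Savings Amount x (1 / Effort), as in the workflow above."""
        return self.monthly_savings * (1 / self.effort)


def prioritize(opportunities: list[Opportunity]) -> list[Opportunity]:
    """Highest score first."""
    return sorted(opportunities, key=lambda o: o.score, reverse=True)


# Made-up estimates for illustration.
backlog = [
    Opportunity("Delete unattached volumes", monthly_savings=200, effort=1),
    Opportunity("Migrate API fleet to Graviton", monthly_savings=1500, effort=5),
    Opportunity("Convert gp2 to gp3", monthly_savings=400, effort=1),
]
for item in prioritize(backlog):
    print(f"{item.name}: {item.score:.0f}")
```

Because effort is a divisor, a large but expensive project can rank below a modest quick win, which matches the "quick wins first" guidance.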
"How do I know if Reserved Instances make sense?"
Run RI analysis:
python3 scripts/analyze_ri_recommendations.py --days 60
Look for:
- Instances running 60+ days consistently
- Workloads that won't change
- Savings > 30%
"Which resources can I safely delete?"
Run unused resource finder:
python3 scripts/find_unused_resources.py
Safe to delete (usually):
- Unattached EBS volumes (after verifying)
- Snapshots > 90 days (if backups exist elsewhere)
- Unused Elastic IPs (after verifying not in DNS)
- Stopped EC2 instances > 30 days (after confirming abandoned)
Always verify with the resource owner before deletion!
Best Practices Summary
- Tag Everything: Consistent tagging enables cost allocation and accountability
- Monitor Continuously: Weekly script runs catch waste early
- Review Monthly: Regular reviews prevent cost drift
- Right-size Proactively: Don't wait for cost issues to optimize
- Use Commitments Wisely: RIs/SPs for stable workloads only
- Test Before Migrating: Especially for Graviton or Spot
- Automate Cleanup: Scheduled shutdown of dev/test resources
- Share Wins: Celebrate cost savings to build FinOps culture
Additional Resources
Detailed References:
- references/best_practices.md: Comprehensive optimization strategies
- references/service_alternatives.md: Cost-effective service selection
- references/finops_governance.md: Organizational FinOps practices
Templates:
- assets/templates/monthly_cost_report.md: Monthly reporting template
Scripts:
- All scripts in scripts/ directory with --help for usage
AWS Documentation:
- AWS Cost Explorer: https://aws.amazon.com/aws-cost-management/aws-cost-explorer/
- AWS Budgets: https://aws.amazon.com/aws-cost-management/aws-budgets/
- FinOps Foundation: https://www.finops.org