AgentSkillsCN

Cost Optimization

AWS成本分析、Cost Explorer及节支建议方面的专家。

SKILL.md
--- frontmatter
name: Cost Optimization
description: Specialist in AWS cost analysis, Cost Explorer, and savings recommendations.

Cost Optimization Engineer

Overview

This skill focuses on analyzing AWS costs, identifying optimization opportunities, and providing actionable recommendations to reduce cloud spending while maintaining performance and reliability.

Core Capabilities

  • AWS Cost Explorer data analysis
  • Identifying idle and underutilized resources
  • Right-sizing recommendations
  • Reserved Instance and Savings Plan analysis
  • Cost trend analysis and anomaly detection
  • Scheduler ROI calculations

Instructions

1. Understanding Cost Analysis Constraints

Important Limits:

  • AWS Cost Explorer API only provides data for the last 14 months
  • Do NOT request data older than 14 months
  • Cost data has typically 24-48 hour delay (not real-time)
  • Granularity options: DAILY, MONTHLY, HOURLY (hourly limited to last 14 days)

2. Basic Cost Analysis Workflow

Step 1: Get Overall Account Costs

bash
# Last 30 days total cost
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics "UnblendedCost" \
  --profile <profile>

Step 2: Break Down by Service

bash
# Cost by service for last month
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '1 month ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics "UnblendedCost" \
  --group-by Type=DIMENSION,Key=SERVICE \
  --profile <profile>

Step 3: Identify Top Cost Drivers

Parse the JSON output to identify services with highest costs, then drill down into those services.

3. EC2 Cost Analysis

Daily EC2 costs for last 30 days:

bash
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity DAILY \
  --metrics "UnblendedCost" \
  --filter file:///tmp/ec2-filter.json \
  --profile <profile>

Where /tmp/ec2-filter.json:

json
{
  "Dimensions": {
    "Key": "SERVICE",
    "Values": ["Amazon Elastic Compute Cloud - Compute"]
  }
}

Break down EC2 by instance type:

bash
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '7 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity DAILY \
  --metrics "UnblendedCost" \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE \
  --filter file:///tmp/ec2-filter.json \
  --profile <profile>

Identify Idle EC2 Instances:

bash
# List running instances
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,LaunchTime]' \
  --output table \
  --profile <profile>

# Check CPU utilization (last 7 days)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=<instance-id> \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 \
  --statistics Average,Maximum \
  --profile <profile>

Recommendation: Instances with <5% avg CPU and no significant network I/O are candidates for shutdown or downsizing.

4. RDS Cost Analysis

RDS costs for last 30 days:

bash
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity DAILY \
  --metrics "UnblendedCost" \
  --filter file:///tmp/rds-filter.json \
  --profile <profile>

Where /tmp/rds-filter.json:

json
{
  "Dimensions": {
    "Key": "SERVICE",
    "Values": ["Amazon Relational Database Service"]
  }
}

List RDS instances and check utilization:

bash
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,Engine,MultiAZ]' \
  --output table \
  --profile <profile>

# Check database connections
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=<db-id> \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 \
  --statistics Average,Maximum \
  --profile <profile>

Optimization Opportunities:

  • Dev/test databases running 24/7 → Add to scheduler
  • Multi-AZ in non-production → Consider single-AZ
  • Over-provisioned instances with low CPU → Downsize

5. EBS Volume Cost Analysis

Identify unattached EBS volumes:

bash
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[*].[VolumeId,Size,VolumeType,CreateTime]' \
  --output table \
  --profile <profile>

Estimate savings: Unattached volumes are 100% waste. Calculate: Volume_Size_GB * $0.10/GB/month (approximate gp3 cost).

6. Scheduler ROI Calculation

Calculate potential savings from scheduling:

  1. Identify resources in scheduler:

    bash
    # This would query your DynamoDB table
    aws dynamodb scan \
      --table-name nucleus-ops-main \
      --filter-expression "begins_with(SK, :sk)" \
      --expression-attribute-values '{":sk":{"S":"SCHEDULE#"}}' \
      --profile <profile>
    
  2. Estimate hourly costs:

    • EC2: instance_hours * hourly_rate
    • RDS: instance_hours * hourly_rate
    • Calculate runtime reduction (e.g., 168 hrs/week → 50 hrs/week = 70% reduction)
  3. Monthly savings: hourly_cost * hours_saved * 30 days

7. Cost Anomaly Detection

Monitor month-over-month trends:

bash
# Compare current month to last month
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '60 days ago' +%Y-%m-01),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics "UnblendedCost" \
  --group-by Type=DIMENSION,Key=SERVICE \
  --profile <profile>

Identify anomalies:

  • Services with >30% month-over-month increase
  • New services that appeared suddenly
  • Cost spikes on specific days

8. Multi-Account Cost Analysis

When analyzing costs across multiple accounts:

  1. Get credentials for each account using get_aws_credentials
  2. Run cost queries for each account
  3. Aggregate and compare:
    • Total costs per account
    • Cost per service per account
    • Identify accounts with highest spend
    • Find optimization opportunities unique to each account

Format findings clearly:

code
Account: Production (123456789012)
- Total: $5,234/month
- Top services: EC2 ($2,100), RDS ($1,800), S3 ($500)
- Savings opportunity: $1,470/month (28%)

Account: Staging (987654321098)
- Total: $2,456/month
- Top services: EC2 ($1,200), RDS ($900)
- Savings opportunity: $1,840/month (75%) - schedule resources!

9. Actionable Recommendations

Always provide specific, actionable recommendations:

Good recommendation: "Instance i-0123456789 (t3.large) has averaged 3% CPU over 30 days. Recommend downgrading to t3.small to save ~$35/month."

Bad recommendation: "Some instances are underutilized. Consider optimization."

10. Cost Optimization Report Template

When asked for a cost optimization report, structure as:

  1. Executive Summary

    • Total monthly cost
    • Potential monthly savings
    • Savings percentage
  2. Top Cost Drivers

    • Service breakdown
    • Instance counts and types
  3. Quick Wins (easy, high-impact)

    • Unattached EBS volumes → Delete
    • Stopped instances → Terminate or start
    • Idle resources → Add to scheduler
  4. Strategic Optimizations (requires planning)

    • Right-sizing recommendations
    • Reserved Instance opportunities
    • Architecture changes
  5. Scheduler Impact

    • Current scheduled resources
    • Estimated savings from scheduling
    • Additional resources to schedule

Best Practices

  • Always specify time ranges carefully - don't exceed 14 months
  • Use write_file tool to save JSON responses for complex analysis
  • Calculate ROI - show dollar impact, not just percentages
  • Prioritize quick wins - Sort recommendations by effort vs impact
  • Account for growth - Consider future capacity needs before downsizing

Example Workflow

User: "Analyze EC2 costs for the last 3 months"

  1. Get overall account cost for last 90 days
  2. Filter by EC2 service, break down by instance type
  3. List all running EC2 instances
  4. Check CPU and network utilization for top cost instances
  5. Identify idle instances (<5% CPU)
  6. Calculate savings from stopping idle instances
  7. Provide formatted report with specific instance IDs and $ savings