Service Cost Deep Dive

Name: service-cost-deep-dive
Rating: 92
Author: Cloudzero

Purpose

This skill produces a detailed cost analysis of a single cloud service. It breaks the service's spend down by every relevant dimension and identifies service-specific optimization opportunities.

When to Use

•"Analyze my [service name] costs"
•"Deep dive into EC2 spending"
•"Break down RDS costs"
•"Why is [service] so expensive?"
•"Optimize my Lambda costs"
•Service-specific cost reviews
•Targeted optimization efforts
•Understanding service usage patterns
•Keywords: deep dive, analyze, breakdown, detailed, specific service, EC2, RDS, S3, Lambda, etc.

Prerequisites

This skill builds on the understand-cloudzero-organization skill.

Before applying this procedure:

•If you haven't already in this session, load the understand-cloudzero-organization skill and follow its instructions
•Reference the cached organization context (don't reload unnecessarily)

How This Skill Works

Step 1: Identify the Service

Determine which service to analyze:

code

# If the user names a service, look up its exact FQDID
get_available_dimensions(filter="Service")

# Get all dimension values to find exact match
get_dimension_values(dimension="CZ:Service", match="[user's service name]")
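
The lookup above may return several candidate values. As a minimal sketch of choosing one, the helper below assumes the values arrive as a plain list of strings; `resolve_service` and the sample values are hypothetical and not part of the CloudZero tooling. It prefers an exact match, then a unique substring match, then a fuzzy match, and returns None when the input is ambiguous so the user can be asked.

```python
from difflib import get_close_matches
from typing import Optional


def resolve_service(user_input: str, candidates: list[str]) -> Optional[str]:
    """Pick the candidate that best matches what the user typed, or None."""
    by_lower = {value.lower(): value for value in candidates}
    key = user_input.strip().lower()

    # 1. Exact match, ignoring case.
    if key in by_lower:
        return by_lower[key]

    # 2. Substring match: accept it only when exactly one candidate fits.
    partial = [value for value in candidates if key in value.lower()]
    if len(partial) == 1:
        return partial[0]
    if len(partial) > 1:
        return None  # Ambiguous: ask the user which service they mean.

    # 3. Closest fuzzy match, or None when nothing is similar enough.
    close = get_close_matches(key, list(by_lower), n=1, cutoff=0.6)
    return by_lower[close[0]] if close else None


# Illustrative values only; real values come from get_dimension_values.
values = ["AmazonEC2", "AmazonRDS", "AmazonS3", "AWSLambda"]
print(resolve_service("ec2", values))
```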

Step 2: Overall Service Cost Analysis

Get a high-level view of the service:

Total Service Cost:

code

get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    cost_type="real_cost"
)

Service Cost Trend:

code

get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    granularity="daily",
    cost_type="real_cost"
)

Calculate:

•Total cost for period
•Average daily cost
•Trend direction (growing/declining/stable)
•Percentage of total cloud spend
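
The four figures above can be derived from the daily series. The sketch below assumes the daily costs have already been extracted into a list of numbers and that total cloud spend for the same period is known. The function name, the half-versus-half trend comparison, and the 5% "stable" band are illustrative choices, not something this skill prescribes.

```python
def summarize_service_cost(daily_costs, total_cloud_spend, stable_band=0.05):
    """Summarize one service's daily cost series for a period."""
    if not daily_costs:
        raise ValueError("daily_costs must not be empty")

    total = sum(daily_costs)
    average = total / len(daily_costs)

    # Trend: compare the mean of the first half with the mean of the second.
    half = len(daily_costs) // 2
    if half == 0:
        change = 0.0
    else:
        first_avg = sum(daily_costs[:half]) / half
        second_avg = sum(daily_costs[half:]) / (len(daily_costs) - half)
        change = (second_avg - first_avg) / first_avg if first_avg else 0.0

    if change > stable_band:
        trend = "growing"
    elif change < -stable_band:
        trend = "declining"
    else:
        trend = "stable"

    # Share of total cloud spend, as a percentage.
    share = total / total_cloud_spend * 100 if total_cloud_spend else 0.0

    return {
        "total": total,
        "daily_average": average,
        "trend": trend,
        "change_ratio": change,
        "share_of_total_pct": share,
    }
```

A half-versus-half comparison is deliberately crude. For longer periods a linear fit or a month-over-month comparison may describe the trend better.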

Step 3: Multi-Dimensional Breakdown

Break down service costs by all relevant dimensions:

By Account:

code

get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    group_by=["CZ:Account"],
    limit=20
)

By Region:

code

get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    group_by=["CZ:Region"],
    limit=20
)

By Account and Region:

code

get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    group_by=["CZ:Account", "CZ:Region"],
    limit=50
)

By Usage Type (if available):

code

# Discover if usage type dimension exists
get_available_dimensions(filter="UsageType")

# If available, group by it
get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    group_by=["CZ:UsageType"],
    limit=50
)

By Resource (if available):

code

# Discover if resource dimension exists
get_available_dimensions(filter="Resource")

# If available, get top resources
get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    group_by=["CZ:Resource"],
    limit=50
)
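
Each breakdown above yields rows of a grouping label and a cost. To turn those rows into the "% of Service" columns used in the output tables, something like the following can help. It is only a sketch: it assumes the rows have been reduced to `(label, cost)` pairs, and `rank_by_share` is a hypothetical helper rather than part of the tooling. The cumulative column shows how few groups account for most of the spend.

```python
def rank_by_share(rows, total=None):
    """Rank (label, cost) rows by cost and add share and cumulative share.

    If total is omitted, the sum of the rows is used. Pass the full service
    total when the rows were truncated by a limit.
    """
    if total is None:
        total = sum(cost for _, cost in rows)

    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    result = []
    running = 0.0
    for label, cost in ranked:
        share = cost / total * 100 if total else 0.0
        running += share
        result.append(
            {
                "label": label,
                "cost": cost,
                "share_pct": round(share, 1),
                "cumulative_pct": round(running, 1),
            }
        )
    return result


# Illustrative numbers only.
for row in rank_by_share([("us-west-2", 250.0), ("us-east-1", 500.0), ("eu-west-1", 250.0)]):
    print(row)
```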

Step 4: Tag-Based Analysis

Understand how the service is used across environments and teams:

By Environment:

code

get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    group_by=["CZ:Tag:Environment"],
    limit=10
)

By Team (if tagged):

code

get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    group_by=["CZ:Tag:Team"],
    limit=20
)

By Application (if tagged):

code

get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    group_by=["CZ:Tag:Application"],
    limit=20
)

Step 5: Custom Dimension Attribution

Use organization-specific dimensions:

code

# Discover custom dimensions
get_available_dimensions(filter="User:Defined")

# Analyze by custom dimensions
get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    group_by=["User:Defined:Team"],
    limit=20
)

Step 6: Untagged Resource Analysis

Identify resources without proper tagging:

code

# Look for costs that don't have environment tags
get_cost_data(
    filters={
        "CZ:Service": ["[service_name]"],
        "CZ:Tag:Environment": [""]  # Empty/untagged
    },
    group_by=["CZ:Account", "CZ:Region"],
    limit=50
)
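
To report how much of the service's spend is untagged, the result of a group-by on the environment tag can be reduced to a coverage figure. The sketch below assumes the grouped costs have been collected into a mapping from tag value to cost, with an empty string (or None) standing for untagged spend, mirroring the empty filter value used above. `tagging_coverage` is a hypothetical helper.

```python
def tagging_coverage(cost_by_tag):
    """Compute untagged cost and tagging coverage from a tag -> cost mapping."""
    total = sum(cost_by_tag.values())
    # Empty or missing tag values count as untagged.
    untagged = sum(cost for tag, cost in cost_by_tag.items() if not tag)
    untagged_pct = untagged / total * 100 if total else 0.0
    return {
        "total": total,
        "untagged_cost": untagged,
        "untagged_pct": untagged_pct,
        "coverage_pct": 100.0 - untagged_pct if total else 0.0,
    }


# Illustrative numbers only.
print(tagging_coverage({"production": 600.0, "staging": 150.0, "": 250.0}))
```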

Step 7: Time-Based Pattern Analysis

Understand usage patterns:

Hourly patterns (if looking at a short period):

code

get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    granularity="hourly",
    date_range="last 7 days"
)

Daily patterns:

code

get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    granularity="daily",
    date_range="last 90 days"
)

Identify:

•Weekday vs. weekend patterns
•Peak usage times
•Idle periods
•Unusual spikes
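
The weekday versus weekend comparison can be sketched as follows. It assumes a gap-free list of daily costs and a known start date; `weekday_weekend_split` is a hypothetical helper, not part of the tooling.

```python
from datetime import date, timedelta
from statistics import mean


def weekday_weekend_split(start, daily_costs):
    """Return (weekday average, weekend average, weekend/weekday ratio)."""
    weekday_costs, weekend_costs = [], []
    for offset, cost in enumerate(daily_costs):
        day = start + timedelta(days=offset)
        # date.weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
        if day.weekday() >= 5:
            weekend_costs.append(cost)
        else:
            weekday_costs.append(cost)

    weekday_avg = mean(weekday_costs) if weekday_costs else 0.0
    weekend_avg = mean(weekend_costs) if weekend_costs else 0.0
    ratio = weekend_avg / weekday_avg if weekday_avg else None
    return weekday_avg, weekend_avg, ratio


# Illustrative week starting on Monday, 2024-01-01.
print(weekday_weekend_split(date(2024, 1, 1), [100, 100, 100, 100, 100, 40, 40]))
```

A ratio close to 1 for a development or staging workload may point to resources that run through the weekend without being used, which is a candidate for scheduling. For production workloads a flat profile is often expected.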

Step 8: Service-Specific Optimization Analysis

For Compute Services (EC2, ECS, EKS, Lambda):

•Instance type distribution
•Utilization patterns
•Rightsizing opportunities
•Spot instance eligibility
•Reserved Instance/Savings Plan coverage
•Idle/underutilized instances

For Storage Services (S3, EBS, EFS):

•Storage class distribution
•Growth rate
•Old/unused data
•Lifecycle policy opportunities
•Snapshot costs

For Database Services (RDS, DynamoDB, Redshift):

•Instance sizes and types
•Multi-AZ costs
•Backup costs
•Read replica costs
•Reserved Instance opportunities

For Data Transfer:

•Egress costs by destination
•Inter-region transfer
•Optimization through caching/CDN

For Serverless (Lambda, API Gateway):

•Request volume vs. cost
•Memory allocation efficiency
•Cold start impact
•Duration optimization opportunities

Step 9: Cost Type Comparison

Compare different cost perspectives:

code

# Real cost (default)
get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    cost_type="real_cost"
)

# On-demand cost (to calculate savings)
get_cost_data(
    filters={"CZ:Service": ["[service_name]"]},
    cost_type="on_demand_cost"
)

Calculate effective savings rate:

code

Savings Rate = ((On-Demand Cost - Real Cost) / On-Demand Cost) * 100
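
As a worked instance of the formula above, the sketch below computes the rate from the two totals. The function name and the sample figures are illustrative.

```python
def savings_rate(on_demand_cost, real_cost):
    """Effective savings rate as a percentage of on-demand cost."""
    if on_demand_cost <= 0:
        raise ValueError("on_demand_cost must be positive")
    return (on_demand_cost - real_cost) / on_demand_cost * 100


# On-demand cost of $10,000 against a real cost of $7,500:
# (10,000 - 7,500) / 10,000 * 100 = 25%.
print(savings_rate(10_000, 7_500))
```

A rate of 0% means the service is billed entirely at on-demand prices. A negative rate would mean the real cost exceeds the on-demand equivalent, which is worth investigating.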

Output Format

Provide comprehensive service analysis:

1. Executive Summary

•Service name
•Total cost for period: $X
•Percentage of total cloud spend: X%
•Trend: [Growing/Stable/Declining] at X% rate
•Top optimization opportunity
•Estimated savings potential: $X

2. Service Cost Overview

Total Cost: $X,XXX
Time Period: [dates]
Daily Average: $XXX
Trend: [Growing/Stable/Declining]
Growth Rate: X% [MoM/WoW]

Cost Distribution:

•Percentage of total cloud spend: XX%
•Rank among all services: #X

3. Geographic Distribution

By Region:

Region	Cost	% of Service	Key Resources
us-east-1	$X,XXX	XX%	[Details]
us-west-2	$X,XXX	XX%	[Details]
...	...	...	...

Insights:

•Most expensive region: [Region] at $X
•Multi-region distribution: [Analysis]
•Regional efficiency differences: [Details]

4. Account Distribution

By Account:

Account	Cost	% of Service	Trend
Account A	$X,XXX	XX%	+X%
Account B	$X,XXX	XX%	-X%
...	...	...	...

Insights:

•Highest spending account: [Account]
•Fastest growing account: [Account] at +X%
•Accounts to investigate: [List with reasons]

5. Usage Breakdown

By Usage Type / Resource Type:

Type	Cost	% of Service	Notes
Type A	$X,XXX	XX%	[Details]
Type B	$X,XXX	XX%	[Details]
...	...	...	...

Insights:

•Most expensive usage type: [Type]
•Unusual or unexpected usage: [Details]

6. Tagging and Attribution

By Environment:

•Production: $X,XXX (XX%)
•Staging: $X,XXX (XX%)
•Development: $X,XXX (XX%)
•Untagged: $X,XXX (XX%) ⚠️

By Team/Application:

•[Team/Application A]: $X,XXX (XX%)
•[Team/Application B]: $X,XXX (XX%)
•Untagged: $X,XXX ⚠️

Tagging Issues:

•XX% of costs are untagged
•[Specific accounts/regions with tagging gaps]

7. Usage Patterns

Time-Based Patterns:

•Peak usage time: [Time] with $X/hour
•Off-peak usage: [Time] with $X/hour
•Weekend vs. weekday: [Comparison]
•Opportunities for scheduling: [Details]

Trend Analysis:

•7-day trend: [Pattern description]
•30-day trend: [Pattern description]
•Notable events: [Spikes or dips with dates]

8. Service-Specific Optimization Opportunities

[Customize based on service type]

For Compute (EC2 example):

•Rightsizing: [X instances appear oversized] - Potential savings: $X/month
•Reserved Instances: [Coverage is X%, opportunity for Y% more] - Potential savings: $X/month
•Spot Instances: [Workloads eligible for spot] - Potential savings: $X/month
•Idle Resources: [X instances with <10% utilization] - Potential savings: $X/month
•Instance Generation: [Old generation instances] - Upgrade for better price/performance

For Storage (S3 example):

•Storage Classes: [X TB eligible for Glacier/IA] - Potential savings: $X/month
•Lifecycle Policies: [Objects not using lifecycle rules] - Potential savings: $X/month
•Versioning: [Old versions consuming storage] - Potential savings: $X/month
•Incomplete Multipart Uploads: [Cleanup needed] - Potential savings: $X/month

For Databases (RDS example):

•Instance Sizing: [Over-provisioned instances] - Potential savings: $X/month
•Reserved Instances: [On-demand instances eligible] - Potential savings: $X/month
•Multi-AZ: [Non-prod shouldn't use Multi-AZ] - Potential savings: $X/month
•Backup Retention: [Excessive retention] - Potential savings: $X/month
•Read Replicas: [Underutilized replicas] - Potential savings: $X/month

9. Savings Analysis

Current Savings (if using RIs/SPs):

•On-Demand Cost: $X,XXX
•Real Cost: $Y,YYY
•Current Savings: $Z,ZZZ (XX%)

Additional Savings Potential:

Total Potential Savings: $[Sum]/month (XX% reduction)

10. Detailed Recommendations

Immediate Actions (Quick Wins):

•[Action with high impact, low effort]
•[Action with high impact, low effort]
•[Action with high impact, low effort]

Short-Term Actions (1-2 weeks):

•[Action requiring some planning]
•[Action requiring some planning]

Long-Term Actions (1-3 months):

•[Action requiring significant effort or time]
•[Architectural changes]

Monitoring and Governance:

•[Set up alerts for specific thresholds]
•[Implement tagging policies]
•[Regular review cadence]

11. Comparison to Best Practices

Industry Benchmarks:

•Typical [service] costs for similar workloads: [Range]
•Your position: [Above/Below/Within] range
•Efficiency score: [Assessment]

Optimization Maturity:

•Tagging coverage: [Score]
•RI/SP coverage: [Score]
•Rightsizing implementation: [Score]
•Overall maturity: [Score]

Skill-Specific Best Practices

•Use all available dimensions - Don't stop at basic account/region
•Leverage service-specific knowledge - Different services need different analysis
•Calculate savings potential - Quantify all recommendations
•Prioritize by impact - Focus on highest-value optimizations
•Consider business context - Some "inefficiencies" may be intentional
•Compare cost types - Use on_demand_cost to calculate savings
•Look for untagged resources - Often indicates governance gaps

For general cost analysis best practices, see ${CLAUDE_PLUGIN_ROOT}/references/best-practices.md

Service-Specific Analysis Guides

Compute Services (EC2, ECS, Lambda)

Key Dimensions:

•Instance type, size, family
•Purchase option (On-Demand, RI, Spot)
•Utilization metrics (if available)
•Operating system

Key Questions:

•Are instances rightsized?
•Is RI/SP coverage optimal?
•Are spot instances being used where appropriate?
•Are there idle instances?
•Is auto-scaling configured?

Storage Services (S3, EBS, Glacier)

Key Dimensions:

•Storage class
•Request type (PUT, GET, etc.)
•Data transfer
•Region

Key Questions:

•Are appropriate storage classes being used?
•Are lifecycle policies implemented?
•Are old snapshots being cleaned up?
•Is versioning causing unnecessary costs?
•Are there forgotten buckets/volumes?

Database Services (RDS, DynamoDB, Redshift)

Key Dimensions:

•Engine type
•Instance class
•Multi-AZ vs. Single-AZ
•Backup storage
•Read replicas

Key Questions:

•Are instances rightsized?
•Is RI coverage appropriate?
•Are non-prod databases too large?
•Is backup retention optimized?
•Are read replicas necessary?

Networking (Data Transfer, VPC, NAT Gateway)

Key Dimensions:

•Transfer type (internet, inter-region, intra-region)
•Source and destination
•NAT Gateway data processing

Key Questions:

•Can traffic be routed more efficiently?
•Is CDN/CloudFront being used effectively?
•Are unnecessary cross-region transfers occurring?
•Are NAT Gateways necessary or can VPC endpoints help?

Advanced Techniques

Anomaly Detection Within Service

Compare the service's costs to its own historical patterns:

•Identify days with unusual spending
•Detect gradual drift over time
•Flag new resource types or usage patterns
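
One way to identify days with unusual spending is a robust outlier test on the daily series. The sketch below uses the modified z-score, based on the median and the median absolute deviation, with the commonly used cutoff of 3.5. This is one possible technique chosen for illustration, not a method this skill prescribes, and `flag_unusual_days` is a hypothetical helper. It catches isolated spikes or dips, not gradual drift.

```python
from statistics import median


def flag_unusual_days(daily_costs, threshold=3.5):
    """Return indexes of days whose cost is an outlier by modified z-score."""
    if not daily_costs:
        return []

    center = median(daily_costs)
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = median([abs(cost - center) for cost in daily_costs])
    if mad == 0:
        return []  # No variation to compare against.

    flagged = []
    for index, cost in enumerate(daily_costs):
        score = 0.6745 * (cost - center) / mad
        if abs(score) > threshold:
            flagged.append(index)
    return flagged


# Illustrative series with one obvious spike on the last day.
print(flag_unusual_days([100, 102, 98, 101, 99, 100, 300]))
```

Gradual drift needs a different comparison, for example the current week's average against the average from several weeks earlier.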

Efficiency Scoring

Create a composite score based on:

•Tagging coverage (%)
•RI/SP coverage (%)
•Rightsizing adoption (%)
•Storage class optimization (%)
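
A composite score can be as simple as a weighted average of the percentages above. The sketch below assumes each metric is already expressed on a 0 to 100 scale. The metric names, the equal default weights, and `efficiency_score` itself are illustrative assumptions; the right weighting depends on the organization.

```python
def efficiency_score(metrics, weights=None):
    """Weighted average of percentage metrics, on a 0 to 100 scale."""
    if not metrics:
        raise ValueError("metrics must not be empty")
    if weights is None:
        # Default: every metric counts equally.
        weights = {name: 1.0 for name in metrics}

    total_weight = sum(weights[name] for name in metrics)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    weighted_sum = sum(metrics[name] * weights[name] for name in metrics)
    return weighted_sum / total_weight


# Illustrative inputs only.
scores = {"tagging": 80, "ri_sp": 60, "rightsizing": 40, "storage_class": 20}
print(efficiency_score(scores))
print(efficiency_score(scores, {"tagging": 2, "ri_sp": 1, "rightsizing": 1, "storage_class": 1}))
```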

What-If Scenarios

Model potential optimizations:

•"If we rightsize all oversized instances..."
•"If we increase RI coverage to 80%..."
•"If we migrate to newer instance generation..."
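
As a rough sketch of the coverage scenario, the estimate below multiplies the on-demand-equivalent spend by the additional coverage and by an assumed discount. Everything here is a simplifying assumption: the function is hypothetical, the discount rate must come from actual pricing for the commitment being considered, and the model ignores upfront payments and the risk of unused commitments.

```python
def coverage_what_if(on_demand_equivalent, current_coverage, target_coverage, discount):
    """Estimate savings per period from raising RI/SP coverage.

    on_demand_equivalent: eligible spend priced at on-demand rates.
    current_coverage, target_coverage: fractions between 0 and 1.
    discount: assumed fractional discount of the commitment vs. on-demand.
    """
    for name, value in (
        ("current_coverage", current_coverage),
        ("target_coverage", target_coverage),
        ("discount", discount),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Only coverage added beyond the current level produces new savings.
    additional = max(target_coverage - current_coverage, 0.0)
    return on_demand_equivalent * additional * discount


# Illustrative: $10,000 eligible spend, 50% -> 80% coverage, assumed 30% discount.
print(round(coverage_what_if(10_000, 0.5, 0.8, 0.3), 2))
```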

Peer Comparison

Compare service usage across:

•Different accounts (why does Account A spend more?)
•Different regions (why is us-east-1 more expensive?)
•Different teams (what do efficient teams do differently?)

Tips for Effective Analysis

•Be service-specific: EC2 analysis differs from S3 analysis
•Quantify everything: Every recommendation should have dollar impact
•Consider dependencies: Some costs enable savings elsewhere
•Think holistically: Optimization in one area may increase costs in another
•Provide implementation guidance: Don't just identify issues, suggest how to fix them
•Follow up: Recommend ongoing monitoring after optimization
