cloud-infrastructure

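AWS/GCP cloud infrastructure: Well-Architected, security, cost efficiency, observability. Use when working with Terraform outputs, IAM policies, VPC design, load balancers, or cloud architecture decisions.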

SKILL.md
---
name: cloud-infrastructure
description: "AWS/GCP cloud infrastructure: Well-Architected, security, cost, observability. Use when working with Terraform outputs, IAM policies, VPC design, load balancers, or cloud architecture decisions."
allowed-tools: [mcp__acp__Read, mcp__acp__Edit, mcp__acp__Write, mcp__acp__Bash]
---

ABOUTME: AWS/GCP cloud infrastructure patterns and best practices

ABOUTME: Well-Architected, security, cost optimization, observability

Cloud Infrastructure (AWS & GCP)

Quick Reference

bash
# AWS
aws sts get-caller-identity
aws --profile staging ecs list-services

# GCP
gcloud auth list
gcloud config set project PROJECT_ID

# Security scanning
trivy config . && checkov -d .

See: terraform/SKILL.md | _PATTERNS.md


AWS Well-Architected (6 Pillars)

| Pillar | Key Practices |
| --- | --- |
| Operational Excellence | IaC, runbooks, observability, chaos engineering |
| Security | Least-privilege IAM, GuardDuty/Security Hub, KMS encryption, SCPs |
| Reliability | Multi-AZ, auto-scaling, backups sized to RTO/RPO targets |
| Performance Efficiency | Right-sizing, caching, serverless, read replicas |
| Cost Optimization | Reserved Instances/Savings Plans, Spot, tagging |
| Sustainability | Optimize utilization, Graviton |
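
The "least privilege IAM" practice in the Security row can be sketched as a policy limited to one action on one resource. This is a minimal illustration under stated assumptions, not a recommended baseline: the variable `app_bucket_arn`, the statement ID, and the policy name are all hypothetical.

```hcl
# Hypothetical input: ARN of the single bucket the application reads from.
variable "app_bucket_arn" {
  type = string
}

data "aws_iam_policy_document" "app_read" {
  statement {
    sid       = "ReadAppObjects"
    effect    = "Allow"
    actions   = ["s3:GetObject"]            # one action, not "s3:*"
    resources = ["${var.app_bucket_arn}/*"] # objects in one bucket, not "*"
  }
}

resource "aws_iam_policy" "app_read" {
  name   = "app-read-objects" # hypothetical name
  policy = data.aws_iam_policy_document.app_read.json
}
```

Building the document with `aws_iam_policy_document` rather than a raw JSON string lets `terraform validate` catch structural mistakes before apply.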

AWS ECS vs EKS

| Factor | ECS | EKS |
| --- | --- | --- |
| Complexity | Lower | Higher (K8s) |
| Multi-cloud | No | Yes |
| Cost | Free control plane | $0.10/hr/cluster |

ECS Task Definition

hcl
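resource "aws_ecs_task_definition" "app" {
  family                   = "app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc" # required for Fargate
  cpu                      = 256      # 0.25 vCPU
  memory                   = 512      # MiB; must be a valid pairing with cpu
  # Fargate needs a task execution role to pull private ECR images.
  # var.execution_role_arn is an assumed input, not defined in this document.
  execution_role_arn = var.execution_role_arn
  container_definitions = jsonencode([{
    name      = "app"
    image     = "${var.ecr_repo}:${var.image_tag}"
    essential = true
    # Assumes curl exists in the image and the app serves /health on port 80.
    healthCheck = { command = ["CMD-SHELL", "curl -f http://localhost/health || exit 1"] }
  }])
}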
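
The task definition above describes a single task. The auto-scaling practice listed under Reliability could be applied to an ECS service running that task with Application Auto Scaling. The sketch below assumes such a service already exists; `cluster_name`, `service_name`, the capacity bounds, and the 60% CPU target are illustrative values, not recommendations.

```hcl
# Hypothetical inputs identifying an existing ECS service.
variable "cluster_name" {
  type = string
}

variable "service_name" {
  type = string
}

resource "aws_appautoscaling_target" "app" {
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${var.service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2  # assumed floor, e.g. one task per AZ
  max_capacity       = 10 # assumed ceiling
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu-target-tracking" # hypothetical name
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 60 # average CPU percent to hold; tune per workload

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```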

GCP Cloud Run

yaml
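apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: app  # placeholder service name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"    # keep one instance warm
        autoscaling.knative.dev/maxScale: "100"  # upper bound on instances
    spec:
      containerConcurrency: 80  # max concurrent requests per instance
      containers:
        - image: gcr.io/project/image  # placeholder image path
          resources: { limits: { cpu: "1", memory: "512Mi" } }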

Cost Optimization

| Tool | Use |
| --- | --- |
| Cost Explorer / Compute Optimizer | AWS analysis |
| Infracost | IaC cost in PRs |
| GCP FinOps Hub | Gemini recommendations |

Compute: Reserved Instances/Savings Plans (up to 72% savings), Spot (up to 90%), Graviton, auto-scaling

Storage: S3 Intelligent-Tiering, lifecycle policies

Networking: VPC endpoints (avoid NAT gateway data-processing charges)
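
The storage items above (S3 Intelligent-Tiering and lifecycle policies) might be combined in a single lifecycle rule. This is a sketch under assumptions: the bucket, the variable `log_bucket_name`, and the 30-day and 365-day thresholds are hypothetical and should follow your own retention requirements.

```hcl
# Hypothetical input: name of a bucket holding log objects.
variable "log_bucket_name" {
  type = string
}

resource "aws_s3_bucket" "logs" {
  bucket = var.log_bucket_name
}

resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "tier-then-expire" # hypothetical rule name
    status = "Enabled"

    filter {} # empty filter: the rule applies to every object in the bucket

    # Move objects to Intelligent-Tiering after 30 days (assumed threshold).
    transition {
      days          = 30
      storage_class = "INTELLIGENT_TIERING"
    }

    # Delete objects after one year (assumed retention period).
    expiration {
      days = 365
    }
  }
}
```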


Observability (OpenTelemetry)

Why OTEL: Vendor-agnostic, unified traces/metrics/logs

yaml
receivers:
  otlp: { protocols: { grpc: { endpoint: 0.0.0.0:4317 } } }
processors:
  batch: { timeout: 1s }
exporters:
  awsxray: { region: us-east-1 }
service:
  pipelines:
    traces: { receivers: [otlp], processors: [batch], exporters: [awsxray] }

Networking

AWS VPC Design

code
VPC (10.0.0.0/16)
├── Public → ALB, NAT
├── Private → ECS/EKS, Lambda
└── Isolated → RDS (no internet)
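
The three tiers in the diagram could be carved out of the /16 with `cidrsubnet`. The sketch below assumes two availability zones and /20 subnets, and declares its own `aws_vpc.main` so that it stands alone. It covers addressing only: the route tables that actually make a tier public, private, or isolated (internet gateway route, NAT route, or no default route) are left out.

```hcl
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # Two AZs is an assumption; widen the slice for more.
  azs = slice(data.aws_availability_zones.available.names, 0, 2)
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# /16 + 4 new bits = /20 per subnet.
# Subnet numbers 0-1 are public, 2-3 private, 4-5 isolated.
resource "aws_subnet" "public" {
  count                   = length(local.azs)
  vpc_id                  = aws_vpc.main.id
  availability_zone       = local.azs[count.index]
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 4, count.index)
  map_public_ip_on_launch = true
}

resource "aws_subnet" "private" {
  count             = length(local.azs)
  vpc_id            = aws_vpc.main.id
  availability_zone = local.azs[count.index]
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 4, count.index + 2)
}

resource "aws_subnet" "isolated" {
  count             = length(local.azs)
  vpc_id            = aws_vpc.main.id
  availability_zone = local.azs[count.index]
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 4, count.index + 4)
}
```

With these inputs the first public subnet is 10.0.0.0/20 and the first private subnet is 10.0.32.0/20, which leaves ten of the sixteen /20 ranges unallocated for later growth.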

VPC Endpoints

hcl
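resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway" # S3 gateway endpoints carry no hourly charge
  # A gateway endpoint routes no traffic until it is attached to a route table.
  # aws_route_table.private is an assumed resource name.
  route_table_ids = [aws_route_table.private.id]
}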

Code Review Checklist

| Category | Checks |
| --- | --- |
| Security | No hardcoded secrets, least privilege IAM, KMS encryption, logging enabled |
| Cost | Tagged resources, right-sized, auto-scaling |
| Reliability | Multi-AZ, health checks, backups |
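
One way to make the "Tagged resources" check hard to fail is to set tags once at the provider level, so every taggable resource inherits them. This is a sketch: the tag keys and the variables `environment`, `owner`, and `cost_center` are hypothetical and should match your organization's tagging policy.

```hcl
variable "region" {
  type = string
}

# Hypothetical tag inputs.
variable "environment" {
  type = string
}

variable "owner" {
  type = string
}

variable "cost_center" {
  type = string
}

provider "aws" {
  region = var.region

  # Applied to every resource this provider creates that supports tags.
  default_tags {
    tags = {
      Environment = var.environment
      Owner       = var.owner
      CostCenter  = var.cost_center
    }
  }
}
```

Resource-level `tags` blocks still work alongside this and take precedence over a default tag with the same key.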

Resources

AWS: Well-Architected | Security Best Practices

GCP: Architecture Framework

Tools: Checkov | Infracost | OpenTelemetry