AgentSkillsCN

aws-architecture

AWS 云架构模式与最佳实践。在设计、部署或评审 AWS 基础设施时,本指南将为您提供全面支持,涵盖 EC2、ECS、EKS、Lambda、RDS、S3、IAM 以及 VPC 等服务。

SKILL.md
--- frontmatter
name: aws-architecture
description: "AWS cloud architecture patterns and best practices. Use when designing, deploying, or reviewing AWS infrastructure including EC2, ECS, EKS, Lambda, RDS, S3, IAM, and VPC."

AWS Architecture

Comprehensive guide for building secure, scalable, and cost-effective infrastructure on Amazon Web Services.

When to Use

  • Designing AWS architecture for new projects
  • Deploying applications to AWS services
  • Setting up networking (VPC, subnets, security groups)
  • Configuring IAM policies and roles
  • Optimizing costs and performance
  • Security hardening AWS resources

Core Services Overview

Compute

ServiceUse CaseKey Features
EC2Virtual serversFull control, any workload
ECSContainer orchestrationDocker on AWS
EKSManaged KubernetesK8s without cluster management
LambdaServerless functionsEvent-driven, pay-per-use
FargateServerless containersNo server management

Storage

ServiceUse CaseKey Features
S3Object storageUnlimited, 11 9s durability
EBSBlock storage (EC2)Persistent volumes
EFSFile storage (NFS)Shared across instances
FSxManaged file systemsWindows/Lustre

Database

ServiceUse CaseKey Features
RDSManaged relationalMySQL, PostgreSQL, etc.
AuroraHigh-performance SQL5x MySQL, 3x PostgreSQL
DynamoDBNoSQL key-valueServerless, any scale
ElastiCacheIn-memory cacheRedis, Memcached
DocumentDBMongoDB compatibleManaged document DB

Networking

ServiceUse CaseKey Features
VPCVirtual networkIsolated cloud network
ALB/NLBLoad balancingLayer 7/Layer 4
CloudFrontCDNGlobal edge caching
Route 53DNSDomain management
API GatewayAPI managementREST/WebSocket APIs

VPC Architecture

Standard 3-Tier VPC

code
┌─────────────────────────────────────────────────────────────────┐
│ VPC: 10.0.0.0/16                                                │
│                                                                 │
│  ┌─────────────────────┐    ┌─────────────────────┐            │
│  │   Public Subnet     │    │   Public Subnet     │            │
│  │   10.0.1.0/24       │    │   10.0.2.0/24       │            │
│  │   (us-east-1a)      │    │   (us-east-1b)      │            │
│  │   ┌─────────────┐   │    │   ┌─────────────┐   │            │
│  │   │ NAT Gateway │   │    │   │ NAT Gateway │   │            │
│  │   └─────────────┘   │    │   └─────────────┘   │            │
│  │   ┌─────────────┐   │    │   ┌─────────────┐   │            │
│  │   │     ALB     │   │    │   │     ALB     │   │            │
│  │   └─────────────┘   │    │   └─────────────┘   │            │
│  └─────────────────────┘    └─────────────────────┘            │
│                                                                 │
│  ┌─────────────────────┐    ┌─────────────────────┐            │
│  │   Private Subnet    │    │   Private Subnet    │            │
│  │   10.0.10.0/24      │    │   10.0.20.0/24      │            │
│  │   (us-east-1a)      │    │   (us-east-1b)      │            │
│  │   ┌─────────────┐   │    │   ┌─────────────┐   │            │
│  │   │ App Servers │   │    │   │ App Servers │   │            │
│  │   └─────────────┘   │    │   └─────────────┘   │            │
│  └─────────────────────┘    └─────────────────────┘            │
│                                                                 │
│  ┌─────────────────────┐    ┌─────────────────────┐            │
│  │   Database Subnet   │    │   Database Subnet   │            │
│  │   10.0.100.0/24     │    │   10.0.200.0/24     │            │
│  │   (us-east-1a)      │    │   (us-east-1b)      │            │
│  │   ┌─────────────┐   │    │   ┌─────────────┐   │            │
│  │   │     RDS     │───┼────┼───│ RDS Replica │   │            │
│  │   └─────────────┘   │    │   └─────────────┘   │            │
│  └─────────────────────┘    └─────────────────────┘            │
└─────────────────────────────────────────────────────────────────┘

Terraform VPC Example

hcl
# Using AWS VPC Module
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.0.0"

  name = "${var.project}-${var.environment}"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  private_subnets = ["10.0.10.0/24", "10.0.20.0/24", "10.0.30.0/24"]
  database_subnets = ["10.0.100.0/24", "10.0.200.0/24", "10.0.300.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = var.environment != "prod"  # Cost saving for non-prod
  enable_vpn_gateway     = false
  enable_dns_hostnames   = true
  enable_dns_support     = true

  # Database subnet group
  create_database_subnet_group = true

  # VPC Flow Logs
  enable_flow_log                      = true
  create_flow_log_cloudwatch_iam_role  = true
  create_flow_log_cloudwatch_log_group = true

  tags = local.common_tags

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
}

Security Groups

hcl
# ALB Security Group
resource "aws_security_group" "alb" {
  name_prefix = "${var.project}-alb-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS from internet"
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTP (redirect to HTTPS)"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, { Name = "${var.project}-alb-sg" })
}

# Application Security Group
resource "aws_security_group" "app" {
  name_prefix = "${var.project}-app-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
    description     = "App port from ALB"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, { Name = "${var.project}-app-sg" })
}

# Database Security Group
resource "aws_security_group" "db" {
  name_prefix = "${var.project}-db-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
    description     = "PostgreSQL from app"
  }

  tags = merge(local.common_tags, { Name = "${var.project}-db-sg" })
}

IAM Best Practices

Least Privilege Policies

hcl
# Application Role
resource "aws_iam_role" "app" {
  name = "${var.project}-app-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      }
    }]
  })
}

# S3 Access (specific bucket, specific actions)
resource "aws_iam_policy" "s3_access" {
  name = "${var.project}-s3-access"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "ListBucket"
        Effect = "Allow"
        Action = ["s3:ListBucket"]
        Resource = aws_s3_bucket.data.arn
      },
      {
        Sid    = "ReadWriteObjects"
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject"
        ]
        Resource = "${aws_s3_bucket.data.arn}/*"
      }
    ]
  })
}

# Secrets Manager Access
resource "aws_iam_policy" "secrets_access" {
  name = "${var.project}-secrets-access"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "secretsmanager:GetSecretValue"
      ]
      Resource = [
        "arn:aws:secretsmanager:${var.region}:${data.aws_caller_identity.current.account_id}:secret:${var.project}/*"
      ]
    }]
  })
}

resource "aws_iam_role_policy_attachment" "app_s3" {
  role       = aws_iam_role.app.name
  policy_arn = aws_iam_policy.s3_access.arn
}

resource "aws_iam_role_policy_attachment" "app_secrets" {
  role       = aws_iam_role.app.name
  policy_arn = aws_iam_policy.secrets_access.arn
}

Cross-Account Access

hcl
# Role that can be assumed from another account
resource "aws_iam_role" "cross_account" {
  name = "CrossAccountDeployRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        AWS = "arn:aws:iam::${var.trusted_account_id}:root"
      }
      Action = "sts:AssumeRole"
      Condition = {
        StringEquals = {
          "sts:ExternalId" = var.external_id
        }
      }
    }]
  })
}

ECS/Fargate

ECS Service with Fargate

hcl
# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "${var.project}-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# Task Definition
resource "aws_ecs_task_definition" "app" {
  family                   = "${var.project}-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  task_role_arn            = aws_iam_role.app.arn

  container_definitions = jsonencode([{
    name  = "app"
    image = "${var.ecr_repository_url}:${var.image_tag}"

    portMappings = [{
      containerPort = 8080
      hostPort      = 8080
      protocol      = "tcp"
    }]

    environment = [
      { name = "NODE_ENV", value = "production" }
    ]

    secrets = [
      {
        name      = "DATABASE_URL"
        valueFrom = aws_secretsmanager_secret.db_url.arn
      }
    ]

    logConfiguration = {
      logDriver = "awslogs"
      options = {
        awslogs-group         = aws_cloudwatch_log_group.app.name
        awslogs-region        = var.region
        awslogs-stream-prefix = "app"
      }
    }

    healthCheck = {
      command     = ["CMD-SHELL", "wget -q --spider http://localhost:8080/health || exit 1"]
      interval    = 30
      timeout     = 5
      retries     = 3
      startPeriod = 60
    }
  }])
}

# ECS Service
resource "aws_ecs_service" "app" {
  name            = "${var.project}-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.app_count
  launch_type     = "FARGATE"

  deployment_configuration {
    minimum_healthy_percent = 50
    maximum_percent         = 200
  }

  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080
  }

  lifecycle {
    ignore_changes = [desired_count]  # Allow autoscaling
  }
}

# Auto Scaling
resource "aws_appautoscaling_target" "app" {
  max_capacity       = 10
  min_capacity       = var.environment == "prod" ? 2 : 1
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "${var.project}-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension
  service_namespace  = aws_appautoscaling_target.app.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}

Lambda

Lambda Function with API Gateway

hcl
# Lambda Function
resource "aws_lambda_function" "api" {
  function_name = "${var.project}-api"
  role          = aws_iam_role.lambda.arn
  handler       = "index.handler"
  runtime       = "nodejs20.x"
  timeout       = 30
  memory_size   = 256

  filename         = data.archive_file.lambda.output_path
  source_code_hash = data.archive_file.lambda.output_base64sha256

  environment {
    variables = {
      NODE_ENV     = "production"
      TABLE_NAME   = aws_dynamodb_table.main.name
    }
  }

  vpc_config {
    subnet_ids         = module.vpc.private_subnets
    security_group_ids = [aws_security_group.lambda.id]
  }

  tracing_config {
    mode = "Active"  # X-Ray tracing
  }
}

# API Gateway
resource "aws_apigatewayv2_api" "main" {
  name          = "${var.project}-api"
  protocol_type = "HTTP"

  cors_configuration {
    allow_origins = ["https://${var.domain}"]
    allow_methods = ["GET", "POST", "PUT", "DELETE"]
    allow_headers = ["Content-Type", "Authorization"]
    max_age       = 3600
  }
}

resource "aws_apigatewayv2_integration" "lambda" {
  api_id             = aws_apigatewayv2_api.main.id
  integration_type   = "AWS_PROXY"
  integration_uri    = aws_lambda_function.api.invoke_arn
  integration_method = "POST"
}

resource "aws_apigatewayv2_route" "default" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "$default"
  target    = "integrations/${aws_apigatewayv2_integration.lambda.id}"
}

resource "aws_apigatewayv2_stage" "prod" {
  api_id      = aws_apigatewayv2_api.main.id
  name        = "prod"
  auto_deploy = true

  default_route_settings {
    throttling_burst_limit = 100
    throttling_rate_limit  = 50
  }
}

resource "aws_lambda_permission" "apigw" {
  statement_id  = "AllowAPIGateway"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.api.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_apigatewayv2_api.main.execution_arn}/*/*"
}

RDS

Aurora PostgreSQL

hcl
resource "aws_rds_cluster" "main" {
  cluster_identifier = "${var.project}-aurora"
  engine             = "aurora-postgresql"
  engine_version     = "15.4"
  database_name      = var.db_name
  master_username    = var.db_username
  master_password    = random_password.db.result

  db_subnet_group_name   = module.vpc.database_subnet_group_name
  vpc_security_group_ids = [aws_security_group.db.id]

  storage_encrypted = true
  kms_key_id        = aws_kms_key.db.arn

  backup_retention_period = var.environment == "prod" ? 30 : 7
  preferred_backup_window = "03:00-04:00"
  skip_final_snapshot     = var.environment != "prod"

  enabled_cloudwatch_logs_exports = ["postgresql"]

  serverlessv2_scaling_configuration {
    min_capacity = var.environment == "prod" ? 1 : 0.5
    max_capacity = var.environment == "prod" ? 16 : 4
  }

  tags = local.common_tags
}

resource "aws_rds_cluster_instance" "main" {
  count = var.environment == "prod" ? 2 : 1

  identifier         = "${var.project}-aurora-${count.index}"
  cluster_identifier = aws_rds_cluster.main.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.main.engine
  engine_version     = aws_rds_cluster.main.engine_version

  performance_insights_enabled = true

  tags = local.common_tags
}

S3

Secure S3 Bucket

hcl
resource "aws_s3_bucket" "data" {
  bucket = "${var.project}-data-${data.aws_caller_identity.current.account_id}"

  tags = local.common_tags
}

resource "aws_s3_bucket_versioning" "data" {
  bucket = aws_s3_bucket.data.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.s3.arn
    }
    bucket_key_enabled = true
  }
}

resource "aws_s3_bucket_public_access_block" "data" {
  bucket = aws_s3_bucket.data.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_lifecycle_configuration" "data" {
  bucket = aws_s3_bucket.data.id

  rule {
    id     = "archive"
    status = "Enabled"

    transition {
      days          = 90
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 180
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }

    noncurrent_version_expiration {
      noncurrent_days = 30
    }
  }
}

Cost Optimization

Strategies

StrategySavingsEffort
Reserved Instances (1yr)30-40%Low
Savings Plans20-30%Low
Spot Instances60-90%Medium
Right-sizing20-50%Medium
S3 LifecycleVariableLow

Cost Tags

hcl
locals {
  common_tags = {
    Project     = var.project
    Environment = var.environment
    ManagedBy   = "terraform"
    Owner       = var.owner
    CostCenter  = var.cost_center
  }
}

Budget Alerts

hcl
resource "aws_budgets_budget" "monthly" {
  name         = "${var.project}-monthly-budget"
  budget_type  = "COST"
  limit_amount = var.monthly_budget
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = [var.budget_alert_email]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = [var.budget_alert_email]
  }
}

CLI Reference

bash
# EC2
aws ec2 describe-instances --filters "Name=tag:Environment,Values=prod"
aws ec2 start-instances --instance-ids i-1234567890abcdef0
aws ec2 stop-instances --instance-ids i-1234567890abcdef0

# ECS
aws ecs list-clusters
aws ecs list-services --cluster my-cluster
aws ecs update-service --cluster my-cluster --service my-service --force-new-deployment

# Lambda
aws lambda invoke --function-name my-function output.json
aws lambda update-function-code --function-name my-function --s3-bucket bucket --s3-key code.zip

# S3
aws s3 ls s3://bucket-name/
aws s3 cp file.txt s3://bucket-name/
aws s3 sync ./folder s3://bucket-name/folder

# RDS
aws rds describe-db-instances
aws rds create-db-snapshot --db-instance-identifier mydb --db-snapshot-identifier mydb-snapshot

# Secrets Manager
aws secretsmanager get-secret-value --secret-id my-secret
aws secretsmanager create-secret --name my-secret --secret-string '{"key":"value"}'

# CloudWatch
aws logs tail /aws/lambda/my-function --follow
aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization ...

Security Checklist

  • VPC with private subnets for sensitive workloads
  • Security groups with least privilege
  • IAM roles (not users) for applications
  • No hardcoded credentials
  • S3 buckets encrypted and private
  • RDS encrypted at rest and in transit
  • CloudTrail enabled for audit
  • GuardDuty enabled for threat detection
  • Config rules for compliance

Integration

Works with:

  • /terraform - AWS provider configuration
  • /devops - AWS deployment pipelines
  • /security - AWS security review
  • cost-optimization skill - AWS cost analysis