Designing & Optimizing AWS Architectures Rule Book
Overview
Purpose: Standardize how the agent designs and optimizes AWS architectures.
Scope:
- Greenfield: design new infrastructure.
- Brownfield: analyze existing architectures and propose improvements.
Reference Frameworks:
- AWS Well-Architected Framework
- Well-Architected Lenses (Serverless, SaaS, ML, etc.)
Phases
- Discover: gather requirements and current context.
- Design: propose a new architecture.
- Review: map an existing system against the Well-Architected Framework.
- Optimize: recommend improvements.
Workflow
Step 1: Context Gathering
- Start by clarifying whether the goal is to design new infrastructure or optimize an existing one.
- If it's new, focus first on the core objective (what the system needs to achieve). Other details, such as constraints and workloads, can be explored gradually as the design unfolds.
- For existing environments, first locate the infrastructure (accounts, regions, IaC repositories). Then review the supporting assets: IaC definitions, diagrams, monitoring data, and cost reports.
Step 2: Requirements Definition
- Functional (APIs, batch jobs, analytics).
- Non-functional (availability, performance, security, compliance, cost).
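Non-functional targets are easier to design against once they are numbers. The sketch below captures requirements in a hypothetical Requirements record (field names are illustrative, not a prescribed schema) and converts an availability percentage into a monthly downtime budget, assuming a 30-day month.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical record for captured requirements; field names are illustrative.
@dataclass
class Requirements:
    functional: List[str] = field(default_factory=list)   # e.g. APIs, batch jobs, analytics
    availability_pct: float = 99.9                          # availability target in percent
    p95_latency_ms: Optional[int] = None                    # performance target
    compliance: List[str] = field(default_factory=list)    # e.g. ["HIPAA"]
    monthly_budget_usd: Optional[float] = None              # cost ceiling


def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Downtime budget (minutes) implied by an availability target over `days`."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


reqs = Requirements(functional=["REST API", "nightly batch job"], availability_pct=99.9)
print(f"{allowed_downtime_minutes(reqs.availability_pct):.1f} min/month")  # 43.2 min/month
```

A 99.9% target leaves roughly 43 minutes per month, while 99.99% leaves about 4.3, which usually changes the design (for example, from single-AZ to multi-AZ).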
Step 3: Architecture Mapping
- Match requirements to AWS services (compute, storage, networking, database).
- Consider serverless-first designs when applicable:
  - Compute → Lambda, Step Functions, Fargate
  - API → API Gateway (REST, HTTP, WebSocket) or AppSync (GraphQL)
  - Storage → S3, DynamoDB
  - Messaging → SNS, SQS, EventBridge
  - Security → IAM, Cognito, AWS WAF, KMS
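The serverless-first mapping can be kept as a lookup table so that the same defaults are applied consistently. This is a minimal sketch: the capability keys and the "needs manual mapping" fallback are assumptions, and the only constraint encoded is Lambda's 15-minute maximum timeout, which rules Lambda out for longer single tasks.

```python
from typing import Dict, List

# Serverless-first defaults copied from the mapping above.
SERVERLESS_DEFAULTS: Dict[str, List[str]] = {
    "compute": ["Lambda", "Step Functions", "Fargate"],
    "api": ["API Gateway", "AppSync"],
    "storage": ["S3", "DynamoDB"],
    "messaging": ["SNS", "SQS", "EventBridge"],
    "security": ["IAM", "Cognito", "AWS WAF", "KMS"],
}

LAMBDA_MAX_TIMEOUT_S = 900  # Lambda's maximum function timeout is 15 minutes


def candidate_services(capabilities: List[str], max_task_seconds: int = 0) -> Dict[str, List[str]]:
    """Map each capability to serverless candidates; unknown capabilities are flagged."""
    result: Dict[str, List[str]] = {}
    for cap in capabilities:
        services = list(SERVERLESS_DEFAULTS.get(cap.lower(), ["<needs manual mapping>"]))
        # A single task longer than Lambda's timeout rules Lambda out for that task.
        if cap.lower() == "compute" and max_task_seconds > LAMBDA_MAX_TIMEOUT_S:
            services.remove("Lambda")
        result[cap] = services
    return result


print(candidate_services(["compute", "storage", "search"], max_task_seconds=3600))
```

Unmapped capabilities are surfaced rather than guessed, so they get an explicit design decision.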
Step 4: Well-Architected Review
- Pillars checklist (the framework has six pillars):
  - Operational Excellence: monitoring, IaC, automation.
  - Security: IAM least privilege, encryption, threat detection.
  - Reliability: high availability (HA), backup/restore, fault isolation.
  - Performance Efficiency: caching, scaling, right-sizing.
  - Cost Optimization: Spot Instances, Reserved Instances (RIs), lifecycle rules, serverless.
  - Sustainability: maximize utilization, minimize idle resources, prefer managed services.
- Serverless Lens focus:
  - Minimize undifferentiated operational work.
  - Use event-driven orchestration (Step Functions, EventBridge).
  - Use managed data stores (DynamoDB, Aurora Serverless).
  - Secure with IAM permissions boundaries and managed identity (Cognito).
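One way to make the review repeatable is to encode the checklist and diff it against the controls actually found in the environment. The sketch below is hypothetical: the control identifiers are invented shorthand for the bullets above, and it covers the five pillars named in the checklist; any further pillar (such as Sustainability) follows the same shape.

```python
from typing import Dict, List, Set

# Hypothetical control identifiers, one per checklist item.
CHECKLIST: Dict[str, List[str]] = {
    "Operational Excellence": ["monitoring", "iac", "automation"],
    "Security": ["least_privilege", "encryption", "threat_detection"],
    "Reliability": ["high_availability", "backup_restore", "fault_isolation"],
    "Performance Efficiency": ["caching", "scaling", "right_sizing"],
    "Cost Optimization": ["pricing_model", "lifecycle_rules", "serverless"],
}


def find_gaps(implemented: Set[str]) -> Dict[str, List[str]]:
    """Return the missing controls per pillar, omitting pillars with no gaps."""
    gaps: Dict[str, List[str]] = {}
    for pillar, controls in CHECKLIST.items():
        missing = [c for c in controls if c not in implemented]
        if missing:
            gaps[pillar] = missing
    return gaps


# Controls discovered in a hypothetical brownfield account.
for pillar, missing in find_gaps({"monitoring", "iac", "automation", "encryption"}).items():
    print(f"{pillar}: missing {', '.join(missing)}")
```

The output feeds directly into the "Current Gap" column of the recommendations table in Step 5.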
Step 5: Proposal / Optimization
- Draft an architecture diagram.
- For existing systems, generate a recommendations table with the columns: Pillar, Current Gap, Recommendation, Expected Impact.
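The recommendations table can be generated from structured findings so every row has all four columns. This is a minimal sketch; the Recommendation record and the sample finding are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Recommendation:
    pillar: str
    current_gap: str
    recommendation: str
    expected_impact: str


def _cell(text: str) -> str:
    # Escape pipes so cell text cannot break the Markdown table layout.
    return text.replace("|", "\\|")


def to_markdown_table(recs: List[Recommendation]) -> str:
    lines = [
        "| Pillar | Current Gap | Recommendation | Expected Impact |",
        "| --- | --- | --- | --- |",
    ]
    for r in recs:
        cells = [r.pillar, r.current_gap, r.recommendation, r.expected_impact]
        lines.append("| " + " | ".join(_cell(c) for c in cells) + " |")
    return "\n".join(lines)


# Hypothetical finding, for illustration only.
print(to_markdown_table([
    Recommendation("Reliability", "Single-AZ RDS instance",
                   "Enable Multi-AZ deployment", "Database survives an AZ outage"),
]))
```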
Step 6: Validation
- Identify risks and mitigations.
- Estimate costs (before/after).
- Define a load test strategy.
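Before/after cost estimates are simple arithmetic, but they are easy to get wrong with floats. The sketch below uses Decimal for exact currency math; the line items and monthly amounts are hypothetical placeholders, not real AWS prices, and should be replaced with figures from current pricing research.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict


def monthly_total(items: Dict[str, Decimal]) -> Decimal:
    return sum(items.values(), Decimal("0"))


def compare(before: Dict[str, Decimal], after: Dict[str, Decimal]) -> Dict[str, Decimal]:
    """Summarize monthly before/after cost, savings, and savings percentage."""
    b, a = monthly_total(before), monthly_total(after)
    savings = b - a
    pct = (savings / b * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP) if b else Decimal("0")
    return {"before": b, "after": a, "savings": savings,
            "savings_pct": pct, "annual_savings": savings * 12}


# Hypothetical monthly figures (USD), for illustration only.
before = {"ec2": Decimal("1200.00"), "rds": Decimal("800.00"), "nat_gateway": Decimal("150.00")}
after = {"lambda": Decimal("300.00"), "aurora_serverless": Decimal("500.00"), "nat_gateway": Decimal("150.00")}

for key, value in compare(before, after).items():
    print(f"{key}: {value}")
```

Keeping unchanged items (like the NAT gateway here) on both sides makes the comparison explicit about what the optimization does not affect.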
Step 7: Report
- Write everything into a Markdown architecture file.
- Include: Overview, Requirements, Architecture, Diagrams, Well-Architected Review, Optimizations, Risks, Costs.
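A fixed skeleton guarantees that no required section is silently dropped from the report. This is a sketch under the assumption that missing sections should be marked with a visible TODO; the output path passed to write_report is up to the caller.

```python
from pathlib import Path
from typing import Dict, Optional

REPORT_SECTIONS = [
    "Overview", "Requirements", "Architecture", "Diagrams",
    "Well-Architected Review", "Optimizations", "Risks", "Costs",
]


def report_skeleton(title: str, content: Optional[Dict[str, str]] = None) -> str:
    """Build the Markdown report with every required section; empty ones get a TODO marker."""
    content = content or {}
    parts = [f"# {title}", ""]
    for section in REPORT_SECTIONS:
        parts += [f"## {section}", "", content.get(section, "_TODO_"), ""]
    return "\n".join(parts)


def write_report(path: str, text: str) -> None:
    Path(path).write_text(text, encoding="utf-8")


print(report_skeleton("Order Service Architecture", {"Overview": "Serverless order API."}))
```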
Security References
1. Identity & Access
- Enforce least-privilege IAM policies.
- Prefer IAM roles over static access keys.
- Use ABAC (tags) or RBAC (groups, roles, accounts) for scalable access control.
- Require MFA for privileged accounts.
- Use IAM Identity Center (formerly AWS SSO) for central identity management.
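Least privilege is easiest to enforce when policies are generated narrowly and linted for wildcards. The sketch builds an identity policy that grants read access to a single S3 prefix (bucket and prefix names are placeholders) and flags Allow statements that use `*` actions or resources. The linter is a simplified heuristic, not a substitute for IAM Access Analyzer.

```python
import json
from typing import Dict, List


def read_only_prefix_policy(bucket: str, prefix: str) -> Dict:
    """Identity policy allowing reads under one S3 prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*"},
            {"Effect": "Allow", "Action": ["s3:ListBucket"],
             "Resource": f"arn:aws:s3:::{bucket}",
             "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}}},
        ],
    }


def wildcard_findings(policy: Dict) -> List[str]:
    """Flag Allow statements with '*' or 'service:*' actions, or '*' resources."""
    findings: List[str] = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"Statement {i}: wildcard action {actions}")
        if "*" in resources:
            findings.append(f"Statement {i}: wildcard resource")
    return findings


policy = read_only_prefix_policy("example-bucket", "reports")
print(json.dumps(policy, indent=2))
print(wildcard_findings(policy))
```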
2. Data Protection
- Encrypt all data at rest (S3, EBS, RDS, DynamoDB, etc.) with KMS customer managed keys.
- Encrypt all data in transit (TLS 1.2+).
- Enable S3 Block Public Access and apply least-privilege bucket policies.
- Use Secrets Manager or Parameter Store; never hardcode credentials.
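Encryption in transit can be enforced at the bucket level with Deny statements on the `aws:SecureTransport` and `s3:TlsVersion` condition keys. The sketch below generates such a bucket policy; the bucket name is a placeholder, and in practice these statements would be merged with the bucket's existing least-privilege Allow statements.

```python
import json
from typing import Dict


def enforce_tls_bucket_policy(bucket: str) -> Dict:
    """Bucket policy denying non-TLS requests and TLS versions below 1.2."""
    arns = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "DenyInsecureTransport", "Effect": "Deny", "Principal": "*",
             "Action": "s3:*", "Resource": arns,
             "Condition": {"Bool": {"aws:SecureTransport": "false"}}},
            {"Sid": "DenyOutdatedTls", "Effect": "Deny", "Principal": "*",
             "Action": "s3:*", "Resource": arns,
             "Condition": {"NumericLessThan": {"s3:TlsVersion": 1.2}}},
        ],
    }


def policy_json(bucket: str) -> str:
    return json.dumps(enforce_tls_bucket_policy(bucket), indent=2)


print(policy_json("example-bucket"))
```

The resulting JSON can be applied with `aws s3api put-bucket-policy` or, preferably, declared in the IaC template so it is not lost on rebuild.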
3. Network Security
- Use a VPC with private subnets for workloads.
- Restrict inbound and outbound traffic with security groups and NACLs.
- Use VPC endpoints for private access to AWS services (no public internet path).
- Add AWS WAF and Shield for public-facing endpoints.
- Prefer an ALB/NLB with TLS termination over exposing EC2 instances directly.
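A quick audit of ingress rules catches the most common exposure: ports open to the whole internet. This sketch assumes rules have been exported into a hypothetical IngressRule record (not the EC2 API response shape) and assumes a policy where only HTTPS (443) may be internet-facing, typically on a load balancer.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class IngressRule:  # hypothetical, simplified rule record
    cidr: str
    from_port: int
    to_port: int


PUBLIC_ALLOWED_PORTS = {443}  # assumption: only HTTPS may be open to the internet


def open_to_internet(rule: IngressRule) -> bool:
    # A /0 prefix (0.0.0.0/0 or ::/0) matches every address.
    return ipaddress.ip_network(rule.cidr, strict=False).prefixlen == 0


def risky_rules(rules: List[IngressRule]) -> List[IngressRule]:
    """Internet-open rules other than single allowed ports."""
    return [
        r for r in rules
        if open_to_internet(r)
        and not (r.from_port == r.to_port and r.from_port in PUBLIC_ALLOWED_PORTS)
    ]


rules = [
    IngressRule("0.0.0.0/0", 443, 443),     # HTTPS on a load balancer: allowed
    IngressRule("0.0.0.0/0", 22, 22),       # SSH from anywhere: flagged
    IngressRule("10.0.0.0/16", 5432, 5432),  # internal database traffic: allowed
    IngressRule("::/0", 0, 65535),          # all ports over IPv6: flagged
]
for r in risky_rules(rules):
    print(f"RISK: {r.cidr} ports {r.from_port}-{r.to_port}")
```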
4. Monitoring & Logging
- Enable CloudTrail in all regions and send logs to a centralized S3 bucket.
- Enable AWS Config rules for compliance enforcement.
- Integrate GuardDuty, Security Hub, and Inspector for threat and vulnerability detection.
- Centralize logs (CloudWatch Logs / OpenSearch) and set retention policies.
- Use CloudWatch alarms for anomalies, cost spikes, and security events.
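Retention policies are easy to forget, because a CloudWatch Logs group with no retention setting keeps data indefinitely. The sketch checks an inventory of log groups, assumed to resemble CloudWatch Logs `DescribeLogGroups` output where `retentionInDays` is absent when logs never expire. The 90-day minimum is a hypothetical policy value, not an AWS default.

```python
from typing import Dict, List

MIN_RETENTION_DAYS = 90  # hypothetical organizational minimum


def retention_findings(log_groups: List[Dict], min_days: int = MIN_RETENTION_DAYS) -> List[str]:
    """Flag log groups without a retention policy or below the minimum."""
    findings: List[str] = []
    for group in log_groups:
        name = group["logGroupName"]
        days = group.get("retentionInDays")  # absent means "never expire"
        if days is None:
            findings.append(f"{name}: no retention policy (logs never expire)")
        elif days < min_days:
            findings.append(f"{name}: retention {days}d below minimum {min_days}d")
    return findings


# Hypothetical inventory, for illustration only.
inventory = [
    {"logGroupName": "/aws/lambda/orders", "retentionInDays": 365},
    {"logGroupName": "/aws/lambda/payments"},
    {"logGroupName": "/app/debug", "retentionInDays": 7},
]
print("\n".join(retention_findings(inventory)))
```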
5. Resilience & Recovery
- Use multi-AZ deployments for critical data stores.
- Enforce automated backups with retention policies.
- Test disaster recovery scenarios to verify that RTO/RPO targets are met.
- Use infrastructure as code (Terraform/CDK/CloudFormation) to rebuild environments securely.
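DR test results are only useful when compared against explicit targets. The sketch assumes periodic backups, where worst-case data loss is roughly one backup interval, and compares a measured restore time against the RTO. The record types and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrTarget:
    rto_minutes: float  # maximum tolerated time to restore service
    rpo_minutes: float  # maximum tolerated data loss window


@dataclass
class DrTestResult:
    restore_minutes: float          # measured restore time in the DR test
    backup_interval_minutes: float  # time between recovery points


def check_dr(target: DrTarget, result: DrTestResult) -> List[str]:
    issues: List[str] = []
    if result.restore_minutes > target.rto_minutes:
        issues.append(f"RTO missed: restore took {result.restore_minutes} min "
                      f"(target {target.rto_minutes} min)")
    # With periodic backups, worst-case data loss is about one backup interval.
    if result.backup_interval_minutes > target.rpo_minutes:
        issues.append(f"RPO missed: backups every {result.backup_interval_minutes} min "
                      f"(target {target.rpo_minutes} min)")
    return issues


# Hypothetical test: daily snapshots cannot meet a 15-minute RPO.
print(check_dr(DrTarget(rto_minutes=60, rpo_minutes=15),
               DrTestResult(restore_minutes=45, backup_interval_minutes=1440)))
```

A failing RPO here would point toward continuous backup or point-in-time recovery rather than more frequent snapshots alone.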
6. Governance & Compliance
- Apply service control policies (SCPs) through AWS Organizations.
- Enforce tagging standards for resources (cost, owner, env).
- Align with compliance frameworks (ISO, SOC 2, HIPAA, GDPR) when required.
- Use Trusted Advisor and the Well-Architected Tool for regular reviews.
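Tagging standards can be checked mechanically against a resource inventory. This sketch uses the `cost`, `owner`, and `env` keys from the standard above; the allowed `env` values and the ARNs are hypothetical.

```python
from typing import Dict, List

REQUIRED_TAGS = {"cost", "owner", "env"}   # keys from the tagging standard
ALLOWED_ENVS = {"dev", "staging", "prod"}  # hypothetical allowed values


def tag_violations(resources: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each non-compliant resource ARN to its tagging problems."""
    violations: Dict[str, List[str]] = {}
    for arn, tags in resources.items():
        problems = [f"missing tag '{k}'" for k in sorted(REQUIRED_TAGS - set(tags))]
        env = tags.get("env")
        if env is not None and env not in ALLOWED_ENVS:
            problems.append(f"invalid env '{env}'")
        if problems:
            violations[arn] = problems
    return violations


# Hypothetical inventory, for illustration only.
inventory = {
    "arn:aws:s3:::reports": {"cost": "cc-101", "owner": "data-team", "env": "prod"},
    "arn:aws:lambda:us-east-1:123456789012:function:etl": {"owner": "data-team", "env": "qa"},
}
for arn, problems in tag_violations(inventory).items():
    print(arn, problems)
```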
Cost References
- Native Cost Tools First: Use the cloud provider's billing tools as the primary data source.
- Credits Excluded: Always exclude credits unless analyzing discount impact.
- Comprehensive Discovery: Identify ALL infrastructure components.
- Current Pricing: Research current standard pricing only.
- Python Calculations: Use Python for ALL numeric operations.
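The "credits excluded" and "Python for all numbers" rules combine naturally: aggregate billing line items in Python and skip credits by default. The line-item fields (`service`, `type`, `amount`) are a hypothetical simplified export format; real billing exports use their own column names. Amounts are kept as strings and parsed with Decimal to avoid float rounding.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List


def cost_by_service(line_items: List[Dict[str, str]], include_credits: bool = False) -> Dict[str, Decimal]:
    """Sum cost per service, excluding credit line items unless requested."""
    totals: Dict[str, Decimal] = defaultdict(lambda: Decimal("0"))
    for item in line_items:
        if item["type"] == "Credit" and not include_credits:
            continue
        totals[item["service"]] += Decimal(item["amount"])
    return dict(totals)


# Hypothetical export rows, for illustration only.
items = [
    {"service": "AmazonEC2", "type": "Usage", "amount": "120.50"},
    {"service": "AmazonEC2", "type": "Credit", "amount": "-100.00"},
    {"service": "AmazonS3", "type": "Usage", "amount": "10.25"},
]
print(cost_by_service(items))                        # gross cost, credits excluded
print(cost_by_service(items, include_credits=True))  # only for discount-impact analysis
```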
NOTE: Don't implement anything until you have generated the report and received my permission.