Architecture
A comprehensive architecture skill that helps design, evaluate, and document software architectures for robust, scalable, and maintainable systems.
Quick Start
Basic architecture workflow:
# Understand requirements (functional + non-functional) # Identify constraints and trade-offs # Design system components and relationships # Document architecture decisions # Validate against requirements
Core Capabilities
1. System Architecture Design
Design complete system architectures:
- •Monolithic: Single deployable unit
- •Microservices: Distributed services architecture
- •Serverless: Event-driven, function-based
- •Event-Driven: Asynchronous message-based
- •Layered: Separation of concerns in layers
- •Hexagonal: Ports and adapters pattern
- •CQRS: Command Query Responsibility Segregation
- •Event Sourcing: State as sequence of events
2. Design Patterns
Apply proven design patterns:
Creational Patterns:
- •Singleton, Factory, Builder, Prototype, Abstract Factory
Structural Patterns:
- •Adapter, Bridge, Composite, Decorator, Facade, Proxy
Behavioral Patterns:
- •Observer, Strategy, Command, State, Template Method, Chain of Responsibility
3. Architecture Quality Attributes
Evaluate and optimize for:
- •Performance: Response time, throughput, resource usage
- •Scalability: Horizontal and vertical scaling
- •Availability: Uptime, fault tolerance, disaster recovery
- •Security: Authentication, authorization, encryption, data protection
- •Maintainability: Code quality, modularity, testability
- •Reliability: Error handling, resilience, redundancy
- •Usability: User experience, API design
- •Observability: Logging, monitoring, tracing
4. Technology Selection
Evaluate and recommend technologies:
- •Databases: SQL vs NoSQL, selection criteria
- •Message Queues: Kafka, RabbitMQ, SQS
- •Caching: Redis, Memcached, CDN
- •API Protocols: REST, GraphQL, gRPC
- •Cloud Platforms: AWS, Azure, GCP
- •Containerization: Docker, Kubernetes
5. Architecture Documentation
Document architecture effectively:
- •C4 Model: Context, Container, Component, Code diagrams
- •Architecture Decision Records (ADRs): Document key decisions
- •Data Flow Diagrams: How data moves through system
- •Sequence Diagrams: Component interactions
- •Deployment Diagrams: Infrastructure and deployment
Architecture Patterns
Microservices Architecture
┌─────────────────────────────────────────────────┐
│ API Gateway / BFF │
└────────┬──────────┬──────────┬──────────────────┘
│ │ │
┌────▼───┐ ┌───▼────┐ ┌──▼──────┐
│ User │ │ Order │ │ Payment │
│Service │ │Service │ │ Service │
└────┬───┘ └───┬────┘ └──┬──────┘
│ │ │
┌────▼───┐ ┌──▼─────┐ ┌──▼──────┐
│ User │ │ Order │ │ Payment │
│ DB │ │ DB │ │ DB │
└────────┘ └────────┘ └─────────┘
│ │ │
└────┴─────────┴──────────┴──────┘
Message Bus
Characteristics:
- •Independent deployment and scaling
- •Polyglot persistence
- •Decentralized data management
- •Resilience through isolation
- •Technology diversity
Trade-offs:
- •✅ Independent scaling
- •✅ Technology flexibility
- •✅ Fault isolation
- •❌ Distributed system complexity
- •❌ Data consistency challenges
- •❌ Operational overhead
Event-Driven Architecture
┌──────────┐ ┌──────────────┐ ┌──────────┐
│ Producer │─────▶│ Event Bus │─────▶│Consumer 1│
└──────────┘ │ (Kafka/SNS) │ └──────────┘
└───────┬──────┘
│
┌────▼──────┐
│Consumer 2 │
└───────────┘
Characteristics:
- •Asynchronous communication
- •Loose coupling between components
- •Scalable event processing
- •Event replay capability
Use Cases:
- •Real-time data processing
- •Microservices integration
- •IoT systems
- •Activity tracking
Layered Architecture
┌─────────────────────────────────┐ │ Presentation Layer │ ← Controllers, Views ├─────────────────────────────────┤ │ Business Logic Layer │ ← Services, Domain ├─────────────────────────────────┤ │ Data Access Layer │ ← Repositories, DAOs ├─────────────────────────────────┤ │ Database Layer │ ← Database └─────────────────────────────────┘
Characteristics:
- •Clear separation of concerns
- •Each layer has specific responsibility
- •Dependencies flow downward
- •Easy to understand and maintain
Hexagonal Architecture (Ports & Adapters)
┌─────────────────────────┐
│ External Systems │
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ Adapters │ ← HTTP, CLI, Message Queue
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ Ports │ ← Interfaces
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ Domain Logic │ ← Core Business Logic
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ Ports │ ← Interfaces
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ Adapters │ ← Database, APIs, File System
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ External Systems │
└─────────────────────────┘
Characteristics:
- •Domain logic independent of external concerns
- •Testable in isolation
- •Flexible adapter implementation
- •Clear boundaries
Scalability Patterns
Horizontal Scaling
┌─────────────┐
│Load Balancer│
└──────┬──────┘
┌────────┼────────┐
┌────▼───┐ ┌─▼────┐ ┌─▼─────┐
│Server 1│ │Server│ │Server │
│ │ │ 2 │ │ 3 │
└────┬───┘ └─┬────┘ └─┬─────┘
└───────┴────────┘
│
┌──────▼──────┐
│ Database │
└─────────────┘
Techniques:
- •Load balancing
- •Stateless services
- •Shared data layer
- •Session management
Caching Strategy
Client ──▶ CDN ──▶ API Server ──▶ Redis ──▶ Database
(Static) (Cache) (Cache) (Source)
Cache Levels:
- •CDN: Static assets
- •Application Cache: Query results, computed data
- •Database Cache: Query cache
Cache Patterns:
- •Cache-Aside: Application manages cache
- •Read-Through: Cache loads data automatically
- •Write-Through: Write to cache and DB
- •Write-Behind: Async writes to DB
Database Scaling
Vertical Scaling:
- •Increase server resources
- •Limited by hardware
Horizontal Scaling:
- •Replication: Master-Slave, Multi-Master
- •Sharding: Partition data across servers
- •CQRS: Separate read and write databases
Architecture Decision Framework
Decision Template
# Decision: [Title] ## Context - What problem are we solving? - What are the constraints? - What are the requirements? ## Options Considered ### Option 1: [Name] **Pros:** - Pro 1 - Pro 2 **Cons:** - Con 1 - Con 2 **Estimated Effort:** [Low/Medium/High] **Risk Level:** [Low/Medium/High] ### Option 2: [Name] [Same structure] ## Decision We chose [Option X] because [reasoning]. ## Consequences - Positive: [benefits] - Negative: [trade-offs] - Risks: [what could go wrong] - Mitigation: [how to address risks] ## Validation How will we validate this decision?
Common Architecture Patterns
API Gateway Pattern
"""
API Gateway centralizes external requests and routes to services.
"""
class APIGateway:
def __init__(self):
self.user_service = UserService()
self.order_service = OrderService()
self.auth_service = AuthService()
async def handle_request(self, request: Request) -> Response:
# Authentication
if not await self.auth_service.authenticate(request):
return Response(status=401)
# Rate limiting
if not await self.rate_limiter.check(request.user_id):
return Response(status=429)
# Route to appropriate service
if request.path.startswith('/users'):
return await self.user_service.handle(request)
elif request.path.startswith('/orders'):
return await self.order_service.handle(request)
return Response(status=404)
Circuit Breaker Pattern
"""
Circuit breaker prevents cascading failures.
"""
class CircuitBreaker:
def __init__(self, failure_threshold=5, timeout=60):
self.failure_threshold = failure_threshold
self.timeout = timeout
self.failures = 0
self.last_failure_time = None
self.state = 'CLOSED' # CLOSED, OPEN, HALF_OPEN
async def call(self, func, *args, **kwargs):
if self.state == 'OPEN':
if self._should_attempt_reset():
self.state = 'HALF_OPEN'
else:
raise CircuitBreakerOpen('Service unavailable')
try:
result = await func(*args, **kwargs)
self._on_success()
return result
except Exception as e:
self._on_failure()
raise
def _on_success(self):
self.failures = 0
self.state = 'CLOSED'
def _on_failure(self):
self.failures += 1
self.last_failure_time = time.time()
if self.failures >= self.failure_threshold:
self.state = 'OPEN'
def _should_attempt_reset(self):
return (time.time() - self.last_failure_time) >= self.timeout
Repository Pattern
"""
Repository pattern abstracts data access.
"""
from abc import ABC, abstractmethod
from typing import List, Optional
class UserRepository(ABC):
@abstractmethod
async def find_by_id(self, user_id: int) -> Optional[User]:
pass
@abstractmethod
async def find_by_email(self, email: str) -> Optional[User]:
pass
@abstractmethod
async def save(self, user: User) -> User:
pass
@abstractmethod
async def delete(self, user_id: int) -> bool:
pass
class SQLUserRepository(UserRepository):
def __init__(self, db_session):
self.db = db_session
async def find_by_id(self, user_id: int) -> Optional[User]:
result = await self.db.execute(
"SELECT * FROM users WHERE id = ?", (user_id,)
)
row = result.fetchone()
return User.from_row(row) if row else None
async def save(self, user: User) -> User:
if user.id:
await self.db.execute(
"UPDATE users SET name = ?, email = ? WHERE id = ?",
(user.name, user.email, user.id)
)
else:
result = await self.db.execute(
"INSERT INTO users (name, email) VALUES (?, ?)",
(user.name, user.email)
)
user.id = result.lastrowid
return user
System Design Process
1. Requirements Gathering
Functional Requirements:
- •What should the system do?
- •What features are needed?
- •What are the use cases?
Non-Functional Requirements:
- •Performance: Latency, throughput targets
- •Scale: Expected users, data volume
- •Availability: Uptime requirements
- •Security: Compliance, data protection
- •Cost: Budget constraints
2. Capacity Planning
Users: 10 million Daily Active Users: 1 million Requests per second: 1M users × 10 requests/day / 86400 seconds ≈ 116 RPS Peak traffic (3x average): 350 RPS Data: - Per user: 1 KB metadata + 100 KB content - Total: 10M × 101 KB ≈ 1 TB Bandwidth: - Request size: 1 KB - Response size: 10 KB - Bandwidth: 350 RPS × 11 KB ≈ 3.85 MB/s ≈ 30 Mbps
3. High-Level Design
┌─────────┐
│ Client │
└────┬────┘
│
┌────▼──────────┐
│ CDN │ (Static content)
└────┬──────────┘
│
┌────▼──────────┐
│ Load Balancer │
└────┬──────────┘
│
┌────▼──────────┐
│ Web Servers │ (3+ instances)
└────┬──────────┘
│
┌────▼──────────┐
│ App Servers │ (5+ instances)
└────┬──────────┘
│
┌────▼──────────┬────────────┐
│ │ │
│ Cache │ Database │ Message
│ (Redis) │ (Master/ │ Queue
│ │ Slaves) │ (Kafka)
└──────────────┴────────────┴─────────┘
4. Detailed Component Design
Design each component with:
- •Inputs and outputs
- •Data models
- •API contracts
- •Error handling
- •Monitoring
5. Identify Bottlenecks
- •Single points of failure
- •Performance bottlenecks
- •Scaling limitations
- •Data consistency issues
6. Optimization
- •Caching strategy
- •Database indexing
- •Load balancing
- •CDN for static content
- •Async processing
- •Connection pooling
Best Practices
- •Start Simple: Begin with simplest architecture that works
- •Design for Failure: Assume components will fail
- •Loose Coupling: Minimize dependencies between components
- •High Cohesion: Group related functionality
- •Separation of Concerns: Each component has single responsibility
- •Document Decisions: Use ADRs for important choices
- •Consider Trade-offs: Every decision has pros and cons
- •Plan for Scale: Design with growth in mind
- •Security First: Build security in from the start
- •Measure Everything: Observability is crucial
When to Use This Skill
Use this skill when:
- •Designing new systems
- •Evaluating architecture options
- •Planning system migrations
- •Addressing scalability issues
- •Making technology decisions
- •Documenting architecture
- •Conducting architecture reviews
- •Planning for growth
- •Solving system design problems
- •Training team on architecture patterns
Examples
See EXAMPLES.md for complete architecture examples including:
- •E-commerce system design
- •Social media platform
- •Video streaming service
- •Real-time analytics system
- •Multi-tenant SaaS application
For architecture templates, see templates/.
For architecture decision records, see adr/.