System Architecture
When to Use
Activate this skill when:
- •Designing a new module, service, or major feature that requires structural decisions
- •Choosing between architectural approaches (e.g., where to place logic, how to structure data flow)
- •Planning database schema changes or refactoring existing schema
- •Making frontend state management decisions (server state vs client state, context vs store)
- •Evaluating technology trade-offs for a new capability
- •Creating or reviewing Architecture Decision Records (ADRs)
- •Setting up a new project or major subsystem from scratch
Do NOT use this skill for:
- •Writing implementation code (use
python-backend-expertorreact-frontend-expert) - •API contract design or endpoint specifications (use
api-design-patterns) - •Testing patterns or strategies (use
pytest-patternsorreact-testing-patterns) - •Deployment or infrastructure decisions (use
docker-best-practicesordeployment-pipeline)
Instructions
Project Layer Architecture
The standard Python/React full-stack architecture follows a layered pattern with strict dependency direction.
Backend Layers (FastAPI)
HTTP Request
↓
┌─────────────────────┐
│ Routers (routes/) │ ← HTTP concerns: request parsing, response formatting, status codes
│ │ Uses: Depends() for injection, Pydantic schemas for validation
├─────────────────────┤
│ Services │ ← Business logic: orchestration, validation rules, domain operations
│ (services/) │ No HTTP awareness. Raises domain exceptions, not HTTPException.
├─────────────────────┤
│ Repositories │ ← Data access: queries, CRUD operations, database interactions
│ (repositories/) │ No business logic. Returns model instances or None.
├─────────────────────┤
│ Models (models/) │ ← SQLAlchemy ORM models: table definitions, relationships, indexes
│ Schemas (schemas/) │ ← Pydantic v2 models: request/response contracts, validation
└─────────────────────┘
↓
Database
Dependency direction rules:
- •Routers depend on Services (never on Repositories directly)
- •Services depend on Repositories (never on Routers)
- •Repositories depend on Models (never on Services)
- •Schemas are shared across layers but define no dependencies themselves
- •Never skip layers: no direct database access from routes
Dependency injection pattern:
# Router depends on Service via Depends()
@router.post("/users", response_model=UserResponse)
async def create_user(
data: UserCreate,
service: UserService = Depends(get_user_service),
) -> UserResponse:
return await service.create_user(data)
# Service depends on Repository via constructor injection
class UserService:
def __init__(self, repo: UserRepository) -> None:
self.repo = repo
# Repository depends on AsyncSession via Depends()
class UserRepository:
def __init__(self, session: AsyncSession) -> None:
self.session = session
Frontend Layers (React/TypeScript)
┌─────────────────────┐ │ Pages (pages/) │ ← Route-level components: data fetching, layout composition ├─────────────────────┤ │ Layouts │ ← Page structure: navigation, sidebars, content areas │ (layouts/) │ ├─────────────────────┤ │ Features │ ← Domain-specific: UserProfile, OrderList, ChatPanel │ (features/) │ Composed from shared components + hooks ├─────────────────────┤ │ Shared Components │ ← Reusable UI: Button, Modal, Table, Form, Input │ (components/) │ No business logic. Configurable via props. ├─────────────────────┤ │ Hooks (hooks/) │ ← Custom hooks: useAuth, usePagination, useDebounce │ API (api/) │ ← API client functions, TanStack Query configurations ├─────────────────────┤ │ Types (types/) │ ← Shared TypeScript interfaces and type definitions └─────────────────────┘
Component dependency direction:
- •Pages import Features and Layouts
- •Features import Shared Components and Hooks
- •Shared Components import only other Shared Components and Types
- •Hooks import API functions and Types
- •API functions import Types only
Decision Framework
When facing architectural decisions, follow this structured process:
Step 1: Define the Problem
- •What capability is needed?
- •What are the non-functional requirements? (performance, scalability, maintainability)
- •What constraints exist? (team size, timeline, existing infrastructure)
Step 2: Identify Options
- •List 2-3 viable architectural approaches
- •For each option, document:
- •How it works (brief technical description)
- •Advantages
- •Disadvantages
- •Risks
Step 3: Evaluate Against Criteria
| Criterion | Weight | Description |
|---|---|---|
| Maintainability | High | Can the team understand, modify, and debug this easily? |
| Testability | High | Can each component be tested in isolation? |
| Performance | Medium | Does it meet latency and throughput requirements? |
| Team familiarity | Medium | Does the team have experience with this approach? |
| Operational cost | Low | What are the infrastructure and maintenance costs? |
| Future flexibility | Low | How easily can this evolve as requirements change? |
Step 4: Decide and Document
- •Choose the option that best satisfies the weighted criteria
- •Document the decision in an ADR (see
references/architecture-decision-record-template.md) - •Record what was NOT chosen and why — this context is valuable for future decisions
Step 5: Communicate
- •Share the ADR with the team
- •Identify any migration or rollout steps needed
- •Flag reversibility: is this a one-way door or a two-way door?
Database Schema Design
Design Principles
- •Start normalized (3NF) — Denormalize only for proven performance bottlenecks, not speculation
- •One migration per logical change — Each Alembic migration should represent a single, coherent schema modification
- •Always include downgrade — Every migration must have a working
downgrade()function - •Index strategically:
- •Primary keys (automatic)
- •Foreign keys (always)
- •Columns in WHERE clauses of frequent queries
- •Composite indexes for multi-column lookups
- •Partial indexes for filtered queries (e.g.,
WHERE is_active = true)
SQLAlchemy 2.0 Async Patterns
# Model definition with Mapped types (SQLAlchemy 2.0 style)
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
email: Mapped[str] = mapped_column(String(255), unique=True, index=True)
is_active: Mapped[bool] = mapped_column(default=True)
created_at: Mapped[datetime] = mapped_column(server_default=func.now())
# Relationships: ALWAYS use eager loading with async
posts: Mapped[list["Post"]] = relationship(
back_populates="author",
lazy="selectin", # or "joined" — NEVER "lazy" with async
)
Async session rules:
- •One
AsyncSessionper request — never share across concurrent tasks - •Use
async withcontext manager for automatic cleanup - •Map session boundaries to transaction boundaries
- •Use
selectinorjoinedloading — lazy loading is incompatible with asyncio - •Use
run_sync()only as a last resort for legacy code
Migration Planning
- •Schema change → Generate migration:
alembic revision --autogenerate -m "description" - •Review generated migration — verify column types, indexes, constraints
- •Test upgrade:
alembic upgrade head - •Test downgrade:
alembic downgrade -1 - •Test data preservation: ensure existing data survives the round-trip
Frontend Architecture
State Management Decision Tree
Is the data from the server?
├── YES → Use TanStack Query (useQuery, useMutation)
│ Configure staleTime, gcTime, query keys
│
└── NO → Is it needed across multiple components?
├── YES → Is it complex with actions/reducers?
│ ├── YES → Use Zustand store
│ └── NO → Use React Context
│
└── NO → Use useState / useReducer locally
TanStack Query conventions:
- •Query keys:
[resource, ...identifiers](e.g.,["users", userId],["posts", { page, limit }]) - •Use
queryOptions()factory to centralize key + fn definitions — prevents copy-paste key errors - •Set
staleTimebased on data freshness needs (default 0 is too aggressive for most cases) - •Invalidate with
invalidateQueries()after mutations — never manualrefetch() - •Handle all states:
isPending,isError,data
Component design rules:
- •Props for configuration, hooks for data
- •Lift state only as high as needed — no premature context creation
- •Keep components under 200 lines — extract sub-components or custom hooks when larger
- •Use
childrenand composition over deep prop drilling
Routing Structure
Organize routes to mirror the URL structure:
src/ ├── pages/ │ ├── HomePage.tsx → / │ ├── LoginPage.tsx → /login │ ├── users/ │ │ ├── UserListPage.tsx → /users │ │ └── UserDetailPage.tsx → /users/:id │ └── settings/ │ └── SettingsPage.tsx → /settings
Cross-Cutting Concerns
Authentication Flow
Login Request
↓
Backend: Validate credentials → Generate JWT (access + refresh tokens)
↓
Frontend: Store access token in memory, refresh token in httpOnly cookie
↓
API Calls: Attach access token via Authorization header
↓
Token Expired: Use refresh token to obtain new access token
↓
Refresh Failed: Redirect to login
Architecture decisions for auth:
- •Access tokens: short-lived (15-30 min), stored in memory (not localStorage)
- •Refresh tokens: longer-lived (7-30 days), stored in httpOnly cookie
- •Backend: FastAPI
Depends()chain for token validation → user extraction → permission check - •Frontend: Auth context providing
user,login(),logout(),isAuthenticated
Error Handling Strategy
Errors should be handled at the appropriate layer:
| Layer | Error Type | Action |
|---|---|---|
| Router | HTTPException | Return HTTP error response with status code |
| Service | Domain exceptions | Raise custom exceptions (e.g., UserNotFoundError) |
| Repository | Database exceptions | Catch and re-raise as domain exceptions or let propagate |
| Frontend | API errors | Display user-friendly messages, retry where appropriate |
Backend exception hierarchy:
class AppError(Exception):
"""Base application error."""
class NotFoundError(AppError):
"""Resource not found."""
class ConflictError(AppError):
"""Resource conflict (duplicate, version mismatch)."""
class ValidationError(AppError):
"""Business rule violation."""
Router-level exception handler maps domain exceptions to HTTP responses:
@app.exception_handler(NotFoundError)
async def not_found_handler(request: Request, exc: NotFoundError):
return JSONResponse(status_code=404, content={"detail": str(exc)})
Logging Architecture
Backend (structlog):
- •Structured JSON logs in production
- •Human-readable console in development
- •Bind request context (request_id, user_id) at middleware level
- •Log at service layer (business events), not repository layer (too noisy)
- •Use log levels: DEBUG (development only), INFO (business events), WARNING (recoverable issues), ERROR (failures requiring attention)
Frontend:
- •
console.*in development - •Structured error reporting to backend or Sentry in production
- •Log user actions for debugging, not for analytics
Configuration Management
Backend (pydantic-settings):
class Settings(BaseSettings):
model_config = SettingsConfigDict(env_file=".env")
database_url: str
redis_url: str = "redis://localhost:6379"
jwt_secret: str
debug: bool = False
Frontend (environment variables):
- •
VITE_API_URLfor API base URL - •Build-time injection via Vite's
import.meta.env - •No secrets in frontend environment variables
Examples
Architecture Decision: Real-Time Notifications
Problem: The application needs real-time notifications for users (new messages, status updates).
Options evaluated:
| Option | Pros | Cons |
|---|---|---|
| WebSocket | True bidirectional, low latency | Complex connection management, harder to scale |
| Server-Sent Events (SSE) | Simple, HTTP-based, auto-reconnect | Unidirectional (server→client only), limited browser connections |
| Polling | Simplest implementation, works everywhere | Higher latency, unnecessary server load |
Decision: WebSocket for this use case.
Rationale: Notifications require low latency and the system will eventually need bidirectional communication (typing indicators, presence). SSE would work for notifications alone but would require a separate solution for future bidirectional needs. Polling introduces unacceptable latency for real-time UX.
Architecture:
- •Backend: FastAPI WebSocket endpoint with
ConnectionManagerclass - •Frontend: Custom
useWebSockethook with automatic reconnection - •Scaling: Redis pub/sub for multi-instance message distribution
- •Persistence: Store notifications in database for offline users
- •Fallback: REST endpoint for notification history and initial load
See references/architecture-decision-record-template.md for the full ADR format.
Edge Cases
Monolith vs Microservices
Default to modular monolith for teams smaller than 10 developers. A modular monolith provides:
- •Clear module boundaries without network overhead
- •Shared database with module-specific schemas
- •Easy refactoring and code navigation
- •Simple deployment and debugging
Consider microservices only when:
- •Independent scaling is required for specific components
- •Different modules need different technology stacks
- •Team size exceeds 10 and ownership boundaries are clear
- •Deployment independence is a business requirement
Migration path: Design module boundaries in the monolith as if they were services (no direct cross-module database access, communicate via service interfaces). This makes extraction to microservices straightforward when needed.
When to Break the Layer Pattern
The strict Router → Service → Repository pattern should be followed for standard CRUD operations. Acceptable exceptions:
- •Background tasks: May call services directly without going through a router
- •Event handlers: Domain event listeners may call services from any context
- •CLI commands: Management scripts may access services or repositories directly
- •Migrations: Data migrations may access models directly (no service/repo layer needed)
- •Health checks: May access the database directly for simple connectivity verification
In all cases, business logic should still live in the service layer — these exceptions are about the entry point, not about bypassing business rules.
Evolving Architecture
When the architecture needs to change:
- •Write an ADR documenting the motivation and the proposed change
- •Identify all affected modules and their dependencies
- •Plan an incremental migration — never big-bang rewrites
- •Maintain backward compatibility during transition (strangler fig pattern)
- •Set a deadline for completing the migration and removing legacy code