System Architecture

When to Use

Activate this skill when:

•Designing a new module, service, or major feature that requires structural decisions
•Choosing between architectural approaches (e.g., where to place logic, how to structure data flow)
•Planning database schema changes or refactoring existing schema
•Making frontend state management decisions (server state vs client state, context vs store)
•Evaluating technology trade-offs for a new capability
•Creating or reviewing Architecture Decision Records (ADRs)
•Setting up a new project or major subsystem from scratch

Do NOT use this skill for:

•Writing implementation code (use python-backend-expert or react-frontend-expert)
•API contract design or endpoint specifications (use api-design-patterns)
•Testing patterns or strategies (use pytest-patterns or react-testing-patterns)
•Deployment or infrastructure decisions (use docker-best-practices or deployment-pipeline)

Instructions

Project Layer Architecture

The standard Python/React full-stack architecture follows a layered pattern with strict dependency direction.

Backend Layers (FastAPI)

code

HTTP Request
    ↓
┌─────────────────────┐
│   Routers (routes/)  │  ← HTTP concerns: request parsing, response formatting, status codes
│                      │     Uses: Depends() for injection, Pydantic schemas for validation
├─────────────────────┤
│   Services           │  ← Business logic: orchestration, validation rules, domain operations
│   (services/)        │     No HTTP awareness. Raises domain exceptions, not HTTPException.
├─────────────────────┤
│   Repositories       │  ← Data access: queries, CRUD operations, database interactions
│   (repositories/)    │     No business logic. Returns model instances or None.
├─────────────────────┤
│   Models (models/)   │  ← SQLAlchemy ORM models: table definitions, relationships, indexes
│   Schemas (schemas/) │  ← Pydantic v2 models: request/response contracts, validation
└─────────────────────┘
    ↓
Database

Dependency direction rules:

•Routers depend on Services (never on Repositories directly)
•Services depend on Repositories (never on Routers)
•Repositories depend on Models (never on Services)
•Schemas are shared across layers but define no dependencies themselves
•Never skip layers: no direct database access from routes

Dependency injection pattern:

python

# Router depends on Service via Depends()
@router.post("/users", response_model=UserResponse)
async def create_user(
    data: UserCreate,
    service: UserService = Depends(get_user_service),
) -> UserResponse:
    return await service.create_user(data)

# Service depends on Repository via constructor injection
class UserService:
    def __init__(self, repo: UserRepository) -> None:
        self.repo = repo

# Repository depends on AsyncSession via Depends()
class UserRepository:
    def __init__(self, session: AsyncSession) -> None:
        self.session = session

Frontend Layers (React/TypeScript)

code

┌─────────────────────┐
│   Pages (pages/)     │  ← Route-level components: data fetching, layout composition
├─────────────────────┤
│   Layouts            │  ← Page structure: navigation, sidebars, content areas
│   (layouts/)         │
├─────────────────────┤
│   Features           │  ← Domain-specific: UserProfile, OrderList, ChatPanel
│   (features/)        │     Composed from shared components + hooks
├─────────────────────┤
│   Shared Components  │  ← Reusable UI: Button, Modal, Table, Form, Input
│   (components/)      │     No business logic. Configurable via props.
├─────────────────────┤
│   Hooks (hooks/)     │  ← Custom hooks: useAuth, usePagination, useDebounce
│   API (api/)         │  ← API client functions, TanStack Query configurations
├─────────────────────┤
│   Types (types/)     │  ← Shared TypeScript interfaces and type definitions
└─────────────────────┘

Component dependency direction:

•Pages import Features and Layouts
•Features import Shared Components and Hooks
•Shared Components import only other Shared Components and Types
•Hooks import API functions and Types
•API functions import Types only

Decision Framework

When facing architectural decisions, follow this structured process:

Step 1: Define the Problem

•What capability is needed?
•What are the non-functional requirements? (performance, scalability, maintainability)
•What constraints exist? (team size, timeline, existing infrastructure)

Step 2: Identify Options

•List 2-3 viable architectural approaches
•
For each option, document:
- •How it works (brief technical description)
- •Advantages
- •Disadvantages
- •Risks

Step 3: Evaluate Against Criteria

Criterion	Weight	Description
Maintainability	High	Can the team understand, modify, and debug this easily?
Testability	High	Can each component be tested in isolation?
Performance	Medium	Does it meet latency and throughput requirements?
Team familiarity	Medium	Does the team have experience with this approach?
Operational cost	Low	What are the infrastructure and maintenance costs?
Future flexibility	Low	How easily can this evolve as requirements change?

Step 4: Decide and Document

•Choose the option that best satisfies the weighted criteria
•Document the decision in an ADR (see references/architecture-decision-record-template.md)
•Record what was NOT chosen and why — this context is valuable for future decisions

Step 5: Communicate

•Share the ADR with the team
•Identify any migration or rollout steps needed
•Flag reversibility: is this a one-way door or a two-way door?

Database Schema Design

Design Principles

•Start normalized (3NF) — Denormalize only for proven performance bottlenecks, not speculation
•One migration per logical change — Each Alembic migration should represent a single, coherent schema modification
•Always include downgrade — Every migration must have a working downgrade() function
•
Index strategically:
- •Primary keys (automatic)
- •Foreign keys (always)
- •Columns in WHERE clauses of frequent queries
- •Composite indexes for multi-column lookups
- •Partial indexes for filtered queries (e.g., WHERE is_active = true)

SQLAlchemy 2.0 Async Patterns

python

# Model definition with Mapped types (SQLAlchemy 2.0 style)
class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str] = mapped_column(String(255), unique=True, index=True)
    is_active: Mapped[bool] = mapped_column(default=True)
    created_at: Mapped[datetime] = mapped_column(server_default=func.now())

    # Relationships: ALWAYS use eager loading with async
    posts: Mapped[list["Post"]] = relationship(
        back_populates="author",
        lazy="selectin",  # or "joined" — NEVER "lazy" with async
    )

Async session rules:

•One AsyncSession per request — never share across concurrent tasks
•Use async with context manager for automatic cleanup
•Map session boundaries to transaction boundaries
•Use selectin or joined loading — lazy loading is incompatible with asyncio
•Use run_sync() only as a last resort for legacy code

Migration Planning

•Schema change → Generate migration: alembic revision --autogenerate -m "description"
•Review generated migration — verify column types, indexes, constraints
•Test upgrade: alembic upgrade head
•Test downgrade: alembic downgrade -1
•Test data preservation: ensure existing data survives the round-trip

Frontend Architecture

State Management Decision Tree

code

Is the data from the server?
├── YES → Use TanStack Query (useQuery, useMutation)
│         Configure staleTime, gcTime, query keys
│
└── NO → Is it needed across multiple components?
         ├── YES → Is it complex with actions/reducers?
         │         ├── YES → Use Zustand store
         │         └── NO  → Use React Context
         │
         └── NO → Use useState / useReducer locally

TanStack Query conventions:

•Query keys: [resource, ...identifiers] (e.g., ["users", userId], ["posts", { page, limit }])
•Use queryOptions() factory to centralize key + fn definitions — prevents copy-paste key errors
•Set staleTime based on data freshness needs (default 0 is too aggressive for most cases)
•Invalidate with invalidateQueries() after mutations — never manual refetch()
•Handle all states: isPending, isError, data

Component design rules:

•Props for configuration, hooks for data
•Lift state only as high as needed — no premature context creation
•Keep components under 200 lines — extract sub-components or custom hooks when larger
•Use children and composition over deep prop drilling

Routing Structure

Organize routes to mirror the URL structure:

code

src/
├── pages/
│   ├── HomePage.tsx           → /
│   ├── LoginPage.tsx          → /login
│   ├── users/
│   │   ├── UserListPage.tsx   → /users
│   │   └── UserDetailPage.tsx → /users/:id
│   └── settings/
│       └── SettingsPage.tsx   → /settings

Cross-Cutting Concerns

Authentication Flow

code

Login Request
    ↓
Backend: Validate credentials → Generate JWT (access + refresh tokens)
    ↓
Frontend: Store access token in memory, refresh token in httpOnly cookie
    ↓
API Calls: Attach access token via Authorization header
    ↓
Token Expired: Use refresh token to obtain new access token
    ↓
Refresh Failed: Redirect to login

Architecture decisions for auth:

•Access tokens: short-lived (15-30 min), stored in memory (not localStorage)
•Refresh tokens: longer-lived (7-30 days), stored in httpOnly cookie
•Backend: FastAPI Depends() chain for token validation → user extraction → permission check
•Frontend: Auth context providing user, login(), logout(), isAuthenticated

Error Handling Strategy

Errors should be handled at the appropriate layer:

Layer	Error Type	Action
Router	`HTTPException`	Return HTTP error response with status code
Service	Domain exceptions	Raise custom exceptions (e.g., `UserNotFoundError`)
Repository	Database exceptions	Catch and re-raise as domain exceptions or let propagate
Frontend	API errors	Display user-friendly messages, retry where appropriate

Backend exception hierarchy:

python

class AppError(Exception):
    """Base application error."""

class NotFoundError(AppError):
    """Resource not found."""

class ConflictError(AppError):
    """Resource conflict (duplicate, version mismatch)."""

class ValidationError(AppError):
    """Business rule violation."""

Router-level exception handler maps domain exceptions to HTTP responses:

python

@app.exception_handler(NotFoundError)
async def not_found_handler(request: Request, exc: NotFoundError):
    return JSONResponse(status_code=404, content={"detail": str(exc)})

Logging Architecture

Backend (structlog):

•Structured JSON logs in production
•Human-readable console in development
•Bind request context (request_id, user_id) at middleware level
•Log at service layer (business events), not repository layer (too noisy)
•Use log levels: DEBUG (development only), INFO (business events), WARNING (recoverable issues), ERROR (failures requiring attention)

Frontend:

•console.* in development
•Structured error reporting to backend or Sentry in production
•Log user actions for debugging, not for analytics

Configuration Management

Backend (pydantic-settings):

python

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    redis_url: str = "redis://localhost:6379"
    jwt_secret: str
    debug: bool = False

Frontend (environment variables):

•VITE_API_URL for API base URL
•Build-time injection via Vite's import.meta.env
•No secrets in frontend environment variables

Examples

Architecture Decision: Real-Time Notifications

Problem: The application needs real-time notifications for users (new messages, status updates).

Options evaluated:

Option	Pros	Cons
WebSocket	True bidirectional, low latency	Complex connection management, harder to scale
Server-Sent Events (SSE)	Simple, HTTP-based, auto-reconnect	Unidirectional (server→client only), limited browser connections
Polling	Simplest implementation, works everywhere	Higher latency, unnecessary server load

Decision: WebSocket for this use case.

Rationale: Notifications require low latency and the system will eventually need bidirectional communication (typing indicators, presence). SSE would work for notifications alone but would require a separate solution for future bidirectional needs. Polling introduces unacceptable latency for real-time UX.

Architecture:

•Backend: FastAPI WebSocket endpoint with ConnectionManager class
•Frontend: Custom useWebSocket hook with automatic reconnection
•Scaling: Redis pub/sub for multi-instance message distribution
•Persistence: Store notifications in database for offline users
•Fallback: REST endpoint for notification history and initial load

See references/architecture-decision-record-template.md for the full ADR format.

Edge Cases

Monolith vs Microservices

Default to modular monolith for teams smaller than 10 developers. A modular monolith provides:

•Clear module boundaries without network overhead
•Shared database with module-specific schemas
•Easy refactoring and code navigation
•Simple deployment and debugging

Consider microservices only when:

•Independent scaling is required for specific components
•Different modules need different technology stacks
•Team size exceeds 10 and ownership boundaries are clear
•Deployment independence is a business requirement

Migration path: Design module boundaries in the monolith as if they were services (no direct cross-module database access, communicate via service interfaces). This makes extraction to microservices straightforward when needed.

When to Break the Layer Pattern

The strict Router → Service → Repository pattern should be followed for standard CRUD operations. Acceptable exceptions:

•Background tasks: May call services directly without going through a router
•Event handlers: Domain event listeners may call services from any context
•CLI commands: Management scripts may access services or repositories directly
•Migrations: Data migrations may access models directly (no service/repo layer needed)
•Health checks: May access the database directly for simple connectivity verification

In all cases, business logic should still live in the service layer — these exceptions are about the entry point, not about bypassing business rules.

Evolving Architecture

When the architecture needs to change:

•Write an ADR documenting the motivation and the proposed change
•Identify all affected modules and their dependencies
•Plan an incremental migration — never big-bang rewrites
•Maintain backward compatibility during transition (strangler fig pattern)
•Set a deadline for completing the migration and removing legacy code