Operational Patterns

Patterns and recommendations for security, observability, CI/CD, database migrations, and environment management. Use when generating the Security Architecture, Observability, and DevOps Blueprint deliverables.

Security Architecture

Auth Strategy Selection

Project Type	Recommended Auth	Rationale
Simple app, few roles	Clerk or Supabase Auth	Managed service, minimal setup, built-in UI components
Multi-tenant SaaS	Auth0 or Clerk (organizations)	Organization-level isolation, role management, SSO
API-only / B2B	API keys + JWT	Simple machine-to-machine auth
Enterprise / compliance-heavy	Auth0 or Keycloak (self-hosted)	Fine-grained control, audit logs, compliance certifications
Mobile app	Firebase Auth or Clerk	Native SDK support, social login, biometrics

API Security Checklist

Every REST API service should implement these protections:

Protection	Implementation	Priority
Rate limiting	`express-rate-limit`, Upstash Ratelimit, or API gateway rate limits	Must-have
Input validation	Zod schemas on all request bodies and query params. Reject unknown fields.	Must-have
CORS	Whitelist specific origins. Never use `*` in production.	Must-have
Helmet headers	`helmet` middleware for security headers (CSP, X-Frame-Options, HSTS)	Must-have
SQL/NoSQL injection	Parameterized queries only. Never interpolate user input into queries.	Must-have
XSS prevention	Sanitize HTML output. Use frameworks with built-in escaping (React, Next.js).	Must-have
CSRF protection	SameSite cookies + CSRF tokens for cookie-based auth. Not needed for bearer-only APIs.	Conditional
Request size limits	Limit body size (e.g. 1MB default, higher for file uploads)	Should-have
Authentication	Verify the JWT or session on every protected route. Enforce it in shared middleware rather than per route.	Must-have
Authorization	Check user roles and permissions after authentication, in separate middleware.	Must-have
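
The input validation row can be illustrated without any library. What follows is a minimal sketch in plain Python, not Zod itself; the `CREATE_ORDER_SCHEMA` fields and the `validate_body` helper are hypothetical names chosen for the example. It shows the two behaviors the checklist asks for: type checks on every field and rejection of unknown fields.

```python
from typing import Any

# Hypothetical schema: field name -> (expected type, required?)
CREATE_ORDER_SCHEMA: dict[str, tuple[type, bool]] = {
    "product_id": (str, True),
    "quantity": (int, True),
    "note": (str, False),
}


def validate_body(body: dict[str, Any], schema: dict[str, tuple[type, bool]]) -> list[str]:
    """Return a list of validation errors; an empty list means the body is valid."""
    errors: list[str] = []

    # Reject unknown fields instead of silently ignoring them.
    for key in sorted(body.keys() - schema.keys()):
        errors.append(f"unknown field: {key}")

    for name, (expected_type, required) in schema.items():
        if name not in body:
            if required:
                errors.append(f"missing field: {name}")
            continue
        value = body[name]
        # bool is a subclass of int in Python, so a bool must not pass as an int.
        is_bool_as_int = expected_type is int and isinstance(value, bool)
        if is_bool_as_int or not isinstance(value, expected_type):
            errors.append(f"wrong type for {name}: expected {expected_type.__name__}")

    return errors
```

A request handler would return HTTP 400 with the error list whenever it is non-empty, before any business logic runs.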

Data Protection

Concern	Recommendation
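Encryption at rest	Use the database provider's built-in encryption. Supabase and MongoDB Atlas encrypt by default; on RDS, confirm that storage encryption is enabled when the instance is created.
Encryption in transit	TLS 1.2 or later (prefer TLS 1.3) on all public endpoints. Enforce HTTPS redirects. Internal service-to-service traffic may use plain HTTP inside a VPC when the threat model and compliance requirements allow it.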
PII handling	Identify PII fields (email, name, phone, address, IP). Log them only when necessary. Mask in non-prod environments.
Secrets management	Never hardcode secrets. Use platform env vars (Vercel, Railway) or a secrets manager (Doppler, AWS SSM). Rotate API keys periodically.
Data retention	Define retention periods per data type. Implement soft deletes for user data. Support data export/deletion for GDPR.
Backups	Automated daily database backups with point-in-time recovery. Test restores quarterly.
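
As an illustration of the PII handling row, the sketch below masks PII fields before a record is logged or copied into a non-production environment. It is one possible approach under stated assumptions: the `PII_FIELDS` set mirrors the fields named in the table, and the masking format (`a***@example.com`) is an arbitrary choice for the example.

```python
from typing import Any

# Assumed list of PII keys, taken from the PII handling row; adjust per project.
PII_FIELDS = {"email", "name", "phone", "address", "ip"}


def mask_email(value: str) -> str:
    """Keep the first character and the domain: 'alice@example.com' -> 'a***@example.com'."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        return "***"
    return f"{local[0]}***@{domain}"


def mask_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record with PII fields masked; other fields pass through."""
    masked: dict[str, Any] = {}
    for key, value in record.items():
        if key not in PII_FIELDS or value is None:
            masked[key] = value
        elif key == "email" and isinstance(value, str):
            masked[key] = mask_email(value)
        else:
            masked[key] = "***"
    return masked
```

Keeping the email domain is a trade-off: it preserves some debugging value but may still be identifying for small domains, so stricter environments may prefer full masking.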

OWASP Top 10 Quick Reference

When assessing security, check against these common threats:

•Broken Access Control: Ensure authorization checks on every endpoint, not just authentication
•Cryptographic Failures: Don't store passwords in plaintext (use bcrypt/argon2 via auth provider), don't log tokens
•Injection: Parameterized queries, input validation, no eval/exec of user input
•Insecure Design: Rate limit auth endpoints, implement account lockout, use CAPTCHA on public forms
•Security Misconfiguration: Remove default credentials, disable debug mode in production, review CORS
•Vulnerable Components: Keep dependencies updated, run npm audit / pip-audit in CI
•Authentication Failures: Use a managed auth provider, implement MFA for admin roles
•Data Integrity Failures: Verify webhook signatures, validate JWT signatures, use CSP headers
•Logging Failures: Log auth events, access denials, and input validation failures
•SSRF: Validate and whitelist URLs before making server-side requests
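
The "verify webhook signatures" item above can be sketched with the Python standard library. This assumes a generic scheme in which the sender attaches a hex-encoded HMAC-SHA256 of the raw request body; real providers differ in header names, encodings, and timestamp handling, so treat `sign_payload` and `verify_webhook` as hypothetical helpers rather than any provider's API.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature a sender would attach to the payload."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, payload: bytes, received_signature: str) -> bool:
    """Check the signature over the raw request body using a constant-time comparison."""
    expected = sign_payload(secret, payload)
    # Compare as bytes so that unexpected non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))
```

The signature must be computed over the raw bytes as received. Re-serializing parsed JSON can change whitespace or key order and make valid requests fail verification.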

Observability

Observability Stack Recommendations

Project Size	Logging	Tracing	Metrics	Alerting	Monthly Cost
MVP / startup	Axiom (free tier) or console + Vercel logs	Not needed yet	Vercel/Railway built-in	Sentry (free tier)	$0
Growing (1K-10K users)	Axiom or Betterstack	Sentry performance	PostHog + Sentry	Sentry + Slack webhooks	$20-50/mo
Production (10K+ users)	Datadog or Grafana Cloud	OpenTelemetry → Jaeger/Datadog	Prometheus + Grafana or Datadog	PagerDuty + Datadog	$100-500/mo
Enterprise	Datadog or Splunk	Datadog APM or Honeycomb	Datadog or custom Prometheus	PagerDuty + Datadog	$500+/mo

Structured Logging

All services should use structured JSON logging:

```json
{
  "level": "info",
  "timestamp": "2026-02-07T10:30:00.000Z",
  "service": "api-server",
  "requestId": "req_abc123",
  "userId": "usr_xyz",
  "action": "create_order",
  "duration_ms": 145,
  "message": "Order created successfully"
}
```

Logging rules:

•Use log levels consistently: error (broken), warn (degraded), info (business events), debug (dev only)
•Include requestId for request correlation across services
•Include userId for audit trail (mask in logs if compliance requires)
•Never log: passwords, tokens, full credit card numbers, API keys
•Always log: auth failures, permission denials, input validation errors, external API errors
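
The logging rules above could be wired up roughly as follows with Python's standard `logging` module. This is a sketch, not a recommended library: passing structured fields through `extra={"context": {...}}` is a convention invented for this example, and the `REDACTED_KEYS` set is an assumed starting list.

```python
import json
import logging
from datetime import datetime, timezone

# Keys that must never reach the log output (from the "Never log" rule above).
REDACTED_KEYS = {"password", "token", "api_key", "card_number"}


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname.lower(),
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "service": self.service,
            "message": record.getMessage(),
        }
        # Structured fields arrive via logging's `extra` under a "context" key
        # (a convention of this sketch, not of the logging module).
        context = getattr(record, "context", {})
        for key, value in context.items():
            entry[key] = "[REDACTED]" if key in REDACTED_KEYS else value
        return json.dumps(entry)


def build_logger(service: str) -> logging.Logger:
    """Create (or reuse) a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
    return logger
```

A call would then look like `build_logger("api-server").info("Order created successfully", extra={"context": {"requestId": "req_abc123", "duration_ms": 145}})`.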

Key Metrics to Track

Category	Metric	Alert Threshold
Availability	Uptime percentage	< 99.5% over 24h
Latency	Request duration p50, p95, p99	p99 > 2s
Error rate	5xx errors / total requests	> 1% over 5 minutes
Throughput	Requests per second	Unusual spike or drop (>3x baseline)
Queue depth	Jobs waiting in queue	> 1000 for > 5 minutes
Database	Connection pool usage, query duration	Pool > 80%, queries > 500ms
AI/LLM	Token usage, response time, failure rate	Failure rate > 5%, response > 30s
Business	Signups, conversions, active users	Unusual drops (context-dependent)
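
To make two of the thresholds concrete, here is a small sketch that evaluates the latency (p99 > 2s) and error rate (> 1%) rows for one window of samples. In practice the monitoring tool computes these; the function names and the nearest-rank percentile method are assumptions made for illustration.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of empty sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_alerts(durations_ms: list[float], status_codes: list[int]) -> list[str]:
    """Return the alerts that fire for one evaluation window (e.g. the last 5 minutes)."""
    alerts: list[str] = []
    # Latency row: p99 > 2s.
    if durations_ms and percentile(durations_ms, 99) > 2000:
        alerts.append("latency_p99")
    # Error rate row: 5xx errors / total requests > 1%.
    if status_codes:
        error_rate = sum(1 for code in status_codes if code >= 500) / len(status_codes)
        if error_rate > 0.01:
            alerts.append("error_rate")
    return alerts
```

Selecting the samples that belong to the window is left to the caller in this sketch.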

Health Check Pattern

Every service exposes health endpoints with tiered checks. A detailed response, such as a readiness endpoint might return, looks like this:

```json
{
  "status": "healthy",
  "service": "api-server",
  "version": "1.2.3",
  "uptime_seconds": 86400,
  "checks": {
    "database": { "status": "healthy", "latency_ms": 5 },
    "redis": { "status": "healthy", "latency_ms": 2 },
    "external_api": { "status": "degraded", "latency_ms": 1500, "note": "slow but responding" }
  }
}
```

•/health: quick liveness check (returns 200 if the process is running, without calling dependencies)
•/health/ready: readiness check (returns 200 only if all dependencies are reachable)
•Both endpoints are polled by load balancers, container orchestrators, and monitoring
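
The split between liveness and readiness might be implemented along these lines. The sketch is framework-agnostic and makes two assumptions not stated above: a failed readiness check answers with HTTP 503, and a "degraded" dependency still counts as ready. The `Check` callables stand in for real database or Redis pings.

```python
from typing import Callable

# A dependency check returns (status, latency_ms); status is "healthy", "degraded", or "unhealthy".
Check = Callable[[], tuple[str, int]]


def liveness(service: str, version: str) -> tuple[int, dict]:
    """/health: answers 200 whenever the process can respond; no dependency calls."""
    return 200, {"status": "healthy", "service": service, "version": version}


def readiness(service: str, version: str, checks: dict[str, Check]) -> tuple[int, dict]:
    """/health/ready: run every dependency check; 503 if any dependency is unreachable."""
    results: dict[str, dict] = {}
    for name, check in checks.items():
        try:
            status, latency_ms = check()
            results[name] = {"status": status, "latency_ms": latency_ms}
        except Exception as exc:  # a failing check must not crash the endpoint
            results[name] = {"status": "unhealthy", "error": type(exc).__name__}

    statuses = {result["status"] for result in results.values()}
    if "unhealthy" in statuses:
        overall, code = "unhealthy", 503
    elif "degraded" in statuses:
        overall, code = "degraded", 200  # slow but reachable still counts as ready
    else:
        overall, code = "healthy", 200
    return code, {"status": overall, "service": service, "version": version, "checks": results}
```

Keeping dependency calls out of the liveness path avoids restart loops when a downstream service is slow: the orchestrator stops routing traffic but does not kill a process that is otherwise fine.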

CI/CD Pipeline

Pipeline Templates by Provider

GitHub Actions (recommended for most projects):

```
Stages: lint → test → build → security → deploy

Triggers:
  - Push to main → deploy to production
  - Push to develop → deploy to staging
  - Pull request → run lint + test only
  - Manual dispatch → deploy to any environment
```

Pipeline stages:

Stage	What It Does	Tools
Lint	Code style, formatting, type checking	Node.js: ESLint, Prettier, tsc --noEmit. Python: Ruff, mypy
Test	Unit tests, integration tests	Jest, Vitest, pytest
Build	Compile, bundle, Docker image	tsc, next build, docker build
Security	Dependency audit, secret scanning	npm audit, pip-audit, Trivy, GitGuardian
Deploy	Push to hosting provider	Vercel CLI, Railway CLI, AWS CDK, Docker push
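
The trigger rules for this pipeline can be read as a small decision function. The sketch below expresses them in Python purely for illustration; an actual pipeline would encode the same logic in the CI provider's workflow configuration. The event names follow GitHub Actions (`push`, `pull_request`, `workflow_dispatch`), the stage order follows the table above, and `plan_pipeline` is a hypothetical helper.

```python
from typing import Optional

CHECK_STAGES = ["lint", "test"]
# Stage order follows the pipeline stages table.
DEPLOY_STAGES = ["lint", "test", "build", "security", "deploy"]

# Branch -> environment mapping from the trigger list.
BRANCH_ENVIRONMENTS = {"main": "production", "develop": "staging"}


def plan_pipeline(event: str, branch: str = "", target: Optional[str] = None) -> dict:
    """Map a CI event to the stages to run and the environment to deploy to (if any)."""
    if event == "pull_request":
        return {"stages": CHECK_STAGES, "environment": None}
    if event == "workflow_dispatch":
        if target is None:
            raise ValueError("manual dispatch requires a target environment")
        return {"stages": DEPLOY_STAGES, "environment": target}
    if event == "push" and branch in BRANCH_ENVIRONMENTS:
        return {"stages": DEPLOY_STAGES, "environment": BRANCH_ENVIRONMENTS[branch]}
    # Pushes to other branches run checks only (an assumption; the trigger list does not cover them).
    return {"stages": CHECK_STAGES, "environment": None}
```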

Branch Strategy Selection

Team Size	Recommended Strategy	Workflow
Solo / 1-2 devs	github-flow	`main` + feature branches. Merge via PR. Deploy on merge to main.
3-5 devs	github-flow	Same, but require PR reviews. Use staging environment for pre-prod testing.
6-10 devs	gitflow or trunk-based	Gitflow if you need scheduled releases. Trunk-based if you deploy continuously.
10+ devs	trunk-based with feature flags	Short-lived branches (<1 day). Feature flags for incomplete features.

Environment Promotion

```
Feature branch → PR review → merge to develop → auto-deploy to staging →
manual promote to production (merge develop → main) → auto-deploy to production
```

For simpler projects:

```
Feature branch → PR review → merge to main → auto-deploy to production
```

Database Migrations

Migration Tool Selection

Stack	Recommended Tool	Alternatives
Node.js + PostgreSQL	Prisma Migrate	Knex, TypeORM, Drizzle Kit
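Node.js + MongoDB	Mongoose schemas (enforced in application code; MongoDB itself is schema-on-read)	migrate-mongo
Python + PostgreSQL	Alembic	Django migrations, SQLAlchemy-migrate (legacy)
Python + MongoDB	No formal migration tool needed; script data changes when document shapes change	None listed (mongomock is a test double for testing, not a migration tool)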

Migration Strategy

Concern	Recommendation
Versioning	Sequential numbered migrations (001_create_users.sql, 002_add_orders.sql). Never edit applied migrations.
Rollback	Every migration has an up and a down. Test rollbacks before deploying.
CI integration	Run pending migrations automatically in CI before tests. Run in staging before production.
Zero-downtime	Avoid breaking changes in one step. Add column → backfill → make required → remove old.
Seed data	Dev seeds: faker/factory data for local development. Staging seeds: anonymized subset of production data.
Production	Run migrations before deploying new code. Use advisory locks to prevent concurrent migrations.
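
The versioning rules in this table (sequential numbering, never edit applied migrations) can be enforced mechanically. The following is a minimal sketch, assuming the runner stores a checksum for each applied migration; that bookkeeping is an assumption of this example, and `pending_migrations` is a hypothetical helper, not part of Prisma Migrate or Alembic.

```python
import hashlib


def checksum(sql: str) -> str:
    """Fingerprint a migration's SQL so later edits can be detected."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def pending_migrations(files: dict[str, str], applied: dict[str, str]) -> list[str]:
    """Return unapplied migration names in version order.

    files:   migration name -> SQL text found on disk
    applied: migration name -> checksum recorded when it was applied
    """
    for name, recorded in applied.items():
        if name not in files:
            raise RuntimeError(f"applied migration is missing from disk: {name}")
        if checksum(files[name]) != recorded:
            raise RuntimeError(f"applied migration was edited: {name}")

    # Names start with a sequence number (001_, 002_, ...), which defines the order.
    def version(name: str) -> int:
        return int(name.split("_", 1)[0])

    return sorted((name for name in files if name not in applied), key=version)
```

Sorting by the parsed number rather than by string keeps the order correct even if the zero-padding is inconsistent (for example `9_` and `010_`).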

Common Migration Patterns

Pattern	When	Example
Add nullable column	Safe, no downtime	`ALTER TABLE users ADD COLUMN phone TEXT;`
Rename column	Requires a multi-step migration across deploys	Step 1: Add new column + backfill, and write to both columns. Step 2: Switch reads to the new column. Step 3: Drop old column.
Add index	Can lock table on large datasets	Use `CREATE INDEX CONCURRENTLY` on PostgreSQL (it cannot run inside a transaction block)
Change column type	Risky: may lose data	Create new column, migrate data, drop old column
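
The "add new column + backfill" step is where large tables usually cause trouble, so a batched backfill is sketched below. It runs against an in-memory SQLite database as a stand-in for PostgreSQL, and the `fullname` and `display_name` columns are hypothetical. The point is the loop shape: small batches, one short transaction each.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy users.fullname into users.display_name in small batches; return rows updated."""
    total = 0
    while True:
        cursor = conn.execute(
            """
            UPDATE users
               SET display_name = fullname
             WHERE id IN (
                   SELECT id FROM users
                    WHERE display_name IS NULL AND fullname IS NOT NULL
                    LIMIT ?)
            """,
            (batch_size,),
        )
        conn.commit()  # one short transaction per batch keeps lock time low
        if cursor.rowcount == 0:
            return total
        total += cursor.rowcount


# Demo against an in-memory SQLite database (a stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.executemany("INSERT INTO users (fullname) VALUES (?)", [(f"user {i}",) for i in range(25)])
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")  # step 1: add nullable column
updated = backfill_in_batches(conn, batch_size=10)  # step 2: backfill in batches
```

The old column is dropped only in a later migration, after all readers and writers have moved to the new one.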

Environment Strategy

Environment Definitions

Environment	Purpose	Data	Access	Deploy Trigger
Local	Developer machine	Seed data / Docker Compose	Developer only	Manual
Development	Shared dev environment	Seed data	Dev team	Push to `develop`
Staging	Pre-production testing	Anonymized prod data or rich seeds	Dev team + QA	Push to `staging`, push to `develop` in setups without a shared Development environment, or manual promote
Production	Live users	Real data	Restricted access	Push to `main` or manual promote

Config Management

Environment variable categories:

Category	Examples	Where Stored
Service config	PORT, NODE_ENV, LOG_LEVEL	.env file (local), platform env vars (deployed)
Database	DATABASE_URL, REDIS_URL	Platform env vars, secrets manager
Third-party API keys	STRIPE_SECRET_KEY, SENDGRID_API_KEY	Secrets manager (Doppler, AWS SSM)
Feature flags	ENABLE_AI_AGENT, ENABLE_BETA_FEATURES	Feature flag service or env vars
Internal service URLs	API_SERVER_URL, AGENT_SERVICE_URL	Platform env vars, service discovery

Config validation:

•Validate all environment variables on service startup using Zod, envalid, or pydantic-settings
•Fail fast with clear error messages if required vars are missing
•Log which environment the service is running in (but never log secret values)
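
A fail-fast startup check along the lines described above might look like this. It is a standard-library sketch rather than Zod, envalid, or pydantic-settings; the `Settings` fields reuse variable names from the table, and the default port of 3000 is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping

VALID_LOG_LEVELS = {"debug", "info", "warn", "error"}


class ConfigError(Exception):
    """Raised at startup when the environment is incomplete or malformed."""


@dataclass(frozen=True)
class Settings:
    port: int
    log_level: str
    database_url: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Validate all variables at once so the error lists every problem, not just the first."""
    problems: list[str] = []

    database_url = env.get("DATABASE_URL", "")
    if not database_url:
        problems.append("DATABASE_URL is required")

    raw_port = env.get("PORT", "3000")  # default port is an assumption of this sketch
    if not raw_port.isdecimal() or not 1 <= int(raw_port) <= 65535:
        problems.append(f"PORT must be an integer in 1-65535, got {raw_port!r}")

    log_level = env.get("LOG_LEVEL", "info")
    if log_level not in VALID_LOG_LEVELS:
        problems.append(f"LOG_LEVEL must be one of {sorted(VALID_LOG_LEVELS)}")

    if problems:
        # Report variable names only; never echo secret values such as DATABASE_URL.
        raise ConfigError("invalid configuration: " + "; ".join(problems))
    return Settings(port=int(raw_port), log_level=log_level, database_url=database_url)
```

At service startup this would be called once as `load_settings(os.environ)`, letting the process exit with the collected error message before it accepts traffic.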

Feature Flags

Approach	When to Use	Tool
Environment variables	Simple on/off for 1-2 features	`ENABLE_FEATURE_X=true`
Config file	Multiple flags, no runtime changes needed	`features.json` loaded on startup
Feature flag service	Runtime toggling, gradual rollouts, A/B testing	LaunchDarkly ($10/mo), Unleash (open source), PostHog (free tier)
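
The first and last rows of this table differ mainly in how the on/off decision is made. The sketch below shows both ends under stated assumptions: `env_flag` parses a simple environment variable flag, and `in_rollout` shows one common way a gradual rollout can be made deterministic by hashing the user id. Neither reflects the internals of LaunchDarkly, Unleash, or PostHog.

```python
import hashlib


def env_flag(env: dict[str, str], name: str, default: bool = False) -> bool:
    """Parse an on/off flag such as ENABLE_FEATURE_X=true from environment variables."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user in or out of a gradual rollout.

    Hashing flag + user id gives each user a stable bucket in 0-99, so the same
    user keeps the same answer as the percentage grows.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Including the flag name in the hash means different flags select different user subsets, so the same early users are not exposed to every experiment.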

Choosing What to Include

Not every project needs all operational patterns. Use this guide:

Project Stage	Include	Skip
MVP / proof of concept	Basic auth, console logging, simple CI (lint + test + deploy), env vars	Tracing, alerting, feature flags, multi-environment
Early startup (pre-product-market fit)	Managed auth, structured logging, Sentry, GitHub Actions CI/CD, staging env	APM, custom metrics, PagerDuty, complex migration strategy
Growing product (1K+ users)	All security checklist items, observability stack, full CI/CD pipeline, migration tooling, staging + production	Enterprise compliance, self-hosted tooling
Production / enterprise	Everything above + compliance audits, APM, distributed tracing, PagerDuty, feature flags, multi-region	Nothing — you need it all

When generating blueprints, match the depth to the project's stage and complexity. Don't overwhelm an MVP with enterprise patterns.