Database Workflow
Language-agnostic guidelines for database design, migrations, schema management, ORM patterns, and query optimization.
CRITICAL: No Premature Optimization
Default to simplicity. Optimize only when you have measured evidence of a problem.
- •Don't add indexes "just in case"
- •Don't over-normalize without understanding query patterns
- •Don't implement complex caching without profiling first
- •Measure before optimizing (EXPLAIN, query logs, monitoring)
Migration Strategies
Version Control for Schema Changes
Treat migrations as code:
- •Store in version control alongside application code
- •Timestamp or sequential numbering (001, 002, 003...)
- •Atomic, single-responsibility changes
- •Document WHY in migration files, not just WHAT
Naming convention:
migrations/ ├── 001_create_users_table.sql ├── 002_add_email_index.sql ├── 003_create_orders_table.sql └── 004_add_user_fk_to_orders.sql
Up/Down Migrations
Every migration must be reversible:
-- UP: Create table
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
username VARCHAR(255) UNIQUE NOT NULL,
email VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- DOWN: Drop table
DROP TABLE users;
Complex reversals require care:
-- UP: Add constraint
ALTER TABLE orders ADD CONSTRAINT fk_user
FOREIGN KEY (user_id) REFERENCES users(id);
-- DOWN: Remove constraint (PostgreSQL)
ALTER TABLE orders DROP CONSTRAINT fk_user;
Idempotent Migrations
Migrations must be safely re-runnable:
-- ✅ Good: Idempotent (safe to run multiple times) CREATE TABLE IF NOT EXISTS users (...); CREATE INDEX IF NOT EXISTS idx_users_email ON users(email); -- ❌ Bad: Fails if run twice CREATE TABLE users (...); CREATE INDEX idx_users_email ON users(email);
Rollback Strategies
Two approaches:
- •
Down migrations (reversible):
- •Run DOWN SQL to undo changes
- •Works if migration is reversible
- •Better for critical systems
- •
Snapshot migrations (not reversible):
- •Never roll back; always migrate forward
- •Create new migration to fix issues
- •Simpler, safer in practice
- •Better for high-availability systems
Best practice: Design migrations to be reversible when possible, but plan for forward-only rollbacks in production.
Migration Naming Conventions
Use descriptive, action-oriented names:
✅ Good: - 001_create_users_table - 002_add_email_unique_constraint - 003_create_index_users_email - 004_rename_column_user_id_to_author_id - 005_add_soft_delete_columns ❌ Bad: - 001_update - 002_fix - 003_schema_change - 004_v2
Include timestamp + sequence:
2024_11_17_001_create_users_table.sql 2024_11_17_002_add_email_index.sql
Schema Design Principles
Normalization (1NF, 2NF, 3NF)
First Normal Form (1NF):
- •Eliminate repeating groups
- •All columns contain atomic (non-divisible) values
- •Each row is unique
-- ❌ NOT 1NF: phone_numbers is a repeating group
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255),
phone_numbers VARCHAR(255) -- "555-1234, 555-5678"
);
-- ✅ 1NF: Separate table for phone numbers
CREATE TABLE user_phones (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
phone_number VARCHAR(20)
);
Second Normal Form (2NF):
- •Meets 1NF
- •Remove partial dependencies (non-key columns depend on ALL of primary key)
-- ❌ NOT 2NF: course_name depends only on course_id, not on (student_id, course_id)
CREATE TABLE enrollments (
student_id BIGINT,
course_id BIGINT,
course_name VARCHAR(255), -- Should be in courses table
grade CHAR(1),
PRIMARY KEY (student_id, course_id)
);
-- ✅ 2NF: Move course_name to separate table
CREATE TABLE courses (
id BIGINT PRIMARY KEY,
name VARCHAR(255)
);
CREATE TABLE enrollments (
student_id BIGINT,
course_id BIGINT REFERENCES courses(id),
grade CHAR(1),
PRIMARY KEY (student_id, course_id)
);
Third Normal Form (3NF):
- •Meets 2NF
- •Remove transitive dependencies (non-key columns don't depend on other non-key columns)
-- ❌ NOT 3NF: city and state depend on zip_code, not on user_id
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
zip_code VARCHAR(5),
city VARCHAR(100),
state CHAR(2)
);
-- ✅ 3NF: Move location info to separate table
CREATE TABLE zip_codes (
code VARCHAR(5) PRIMARY KEY,
city VARCHAR(100),
state CHAR(2)
);
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
zip_code VARCHAR(5) REFERENCES zip_codes(code)
);
Denormalization (When and Why)
Denormalize when:
- •Query patterns are read-heavy (much more than writes)
- •Measurement shows join performance is a bottleneck
- •Reporting queries need fast access to aggregated data
- •Cache invalidation is simpler than join performance
Common denormalization patterns:
- •Stored aggregates:
-- Track order count on user without JOIN
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
order_count INT DEFAULT 0
);
-- Keep in sync with trigger or application code
- •Purposeful redundancy:
-- Store user email on order to avoid JOIN at read time
CREATE TABLE orders (
id BIGINT PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
user_email VARCHAR(255), -- Denormalized for reporting
total_amount DECIMAL(10,2)
);
- •Pre-computed views:
-- Materialized view for dashboards
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT
DATE_TRUNC('month', created_at) as month,
COUNT(*) as order_count,
SUM(total_amount) as total_revenue
FROM orders
GROUP BY DATE_TRUNC('month', created_at);
-- Refresh periodically, not on every insert
REFRESH MATERIALIZED VIEW monthly_sales;
Indexing Strategies
Index only when needed. Measure first.
-- ✅ Create index for frequently searched columns CREATE INDEX idx_users_email ON users(email); CREATE INDEX idx_orders_user_id ON orders(user_id); CREATE INDEX idx_orders_created_at ON orders(created_at DESC); -- ❌ Avoid: Index on every column -- ❌ Avoid: Index on low-cardinality columns (boolean flags) -- ❌ Avoid: Duplicate indexes
Index types:
- •Single-column index (most common):
CREATE INDEX idx_users_email ON users(email);
- •Composite index (for multi-column WHERE/JOIN):
-- Good for: WHERE user_id = X AND created_at > Y CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
- •Partial index (index subset of rows):
-- Index only active users (avoid indexing soft-deleted rows)
CREATE INDEX idx_active_users_email ON users(email)
WHERE deleted_at IS NULL;
- •Unique index (enforce constraint):
CREATE UNIQUE INDEX idx_users_email ON users(email);
Index maintenance:
- •Monitor for unused indexes (query database statistics)
- •Drop unused indexes after measurement
- •ANALYZE/VACUUM regularly to update statistics
Primary Keys
Use surrogate keys (auto-incrementing ID) by default:
-- ✅ Recommended: Surrogate key
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(255)
);
-- ✅ Good for high-volume tables
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
event_type VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Natural keys only if:
- •Column(s) are guaranteed immutable
- •Never reassigned or repurposed
- •Are short and stable
-- ✅ Natural key (country code is stable, never changes)
CREATE TABLE countries (
code CHAR(2) PRIMARY KEY,
name VARCHAR(255)
);
-- ❌ Natural key (email changes, should use surrogate)
CREATE TABLE users (
email VARCHAR(255) PRIMARY KEY
);
Foreign Keys and Constraints
Always define foreign key relationships:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
total_amount DECIMAL(10,2) NOT NULL
);
-- With explicit constraint name for cleaner error messages
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL,
product_id BIGINT NOT NULL,
quantity INT NOT NULL CHECK (quantity > 0),
CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES orders(id),
CONSTRAINT fk_product FOREIGN KEY (product_id) REFERENCES products(id)
);
Cascade behaviors:
-- DELETE CASCADE: Delete orders when user is deleted (use with caution) CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE -- RESTRICT: Prevent user deletion if orders exist (safer default) CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT -- SET NULL: Set user_id to NULL if user deleted (for optional relationships) CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
Constraints
Use constraints to enforce data integrity at database level:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
age INT CHECK (age >= 18),
role VARCHAR(50) DEFAULT 'user' CHECK (role IN ('user', 'admin', 'moderator')),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
status VARCHAR(50) NOT NULL DEFAULT 'pending',
total_amount DECIMAL(10,2) NOT NULL CHECK (total_amount > 0),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
-- Composite constraint
CONSTRAINT valid_status CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
-- Unique constraint on composite columns
CONSTRAINT unique_user_order_date UNIQUE (user_id, DATE(created_at))
);
ORM Patterns
Active Record vs Data Mapper
Active Record:
- •Model contains both data and database logic
- •Simple for small projects
- •Model directly calls database
- •Example: Rails, Django ORM
# Active Record pattern
class User(Model):
name = CharField()
email = CharField()
def save(self):
# Object knows how to save itself
db.insert('users', {...})
@staticmethod
def find_by_email(email):
return db.query('SELECT * FROM users WHERE email = ?', email)
# Usage
user = User(name='John', email='john@example.com')
user.save()
found_user = User.find_by_email('john@example.com')
Data Mapper:
- •Model contains only data; repository handles database logic
- •Better separation of concerns
- •Model is a plain object
- •Example: SQLAlchemy, TypeORM
# Data Mapper pattern
class User:
def __init__(self, name, email):
self.name = name
self.email = email
class UserRepository:
def save(self, user):
# Repository handles persistence
db.insert('users', {'name': user.name, 'email': user.email})
def find_by_email(self, email):
row = db.query('SELECT * FROM users WHERE email = ?', email)
return User(row['name'], row['email']) if row else None
# Usage
user = User('John', 'john@example.com')
repo = UserRepository()
repo.save(user)
found_user = repo.find_by_email('john@example.com')
Choose based on project scale:
- •Small apps: Active Record (simpler)
- •Medium to large: Data Mapper (more flexible)
Query Builders
Use query builders to avoid string concatenation and SQL injection:
# ❌ Vulnerable to SQL injection
query = f"SELECT * FROM users WHERE email = '{email}'"
result = db.execute(query)
# ✅ Safe: Parameterized query
result = db.query('SELECT * FROM users WHERE email = ?', [email])
# ✅ Better: Query builder
result = (
db.select(User)
.where(User.email == email)
.where(User.active == True)
.order_by(User.created_at.desc())
.limit(10)
.execute()
)
Benefits of query builders:
- •Prevent SQL injection
- •Cleaner, more maintainable code
- •Database-agnostic (can switch databases)
- •Compose queries dynamically
Eager Loading vs Lazy Loading
Lazy Loading (default, but can cause N+1):
# Each user fetch triggers a separate query for posts
users = db.query(User).limit(10).all()
for user in users:
print(user.posts) # Additional query per user = 10+ queries!
Eager Loading (prevent N+1):
# Single query with JOIN
users = db.query(User).join(Post).limit(10).all()
for user in users:
print(user.posts) # No additional queries
# Or use explicit eager loading
users = db.query(User).options(joinedload(User.posts)).limit(10).all()
N+1 Query Problem
Recognize and fix N+1 problems:
# ❌ N+1 Problem: 1 query for users + N queries for posts
users = db.query(User).all() # 1 query
for user in users:
posts = db.query(Post).filter(Post.user_id == user.id).all() # N more queries
# ✅ Fix 1: Eager loading with JOIN
users = db.query(User).join(Post).distinct().all()
# ✅ Fix 2: Use IN clause for batch loading
user_ids = [u.id for u in users]
posts = db.query(Post).filter(Post.user_id.in_(user_ids)).all()
# Now merge posts back to users in memory
# ✅ Fix 3: Use ORM eager loading
users = db.query(User).options(selectinload(User.posts)).all()
Query Optimization
Index Usage
Write queries that use indexes:
-- ✅ Uses index on email SELECT * FROM users WHERE email = 'john@example.com'; -- ✅ Uses index on user_id, created_at SELECT * FROM orders WHERE user_id = 123 AND created_at > '2024-01-01' ORDER BY created_at DESC; -- ❌ Can't use index (function on column) SELECT * FROM users WHERE LOWER(email) = 'john@example.com'; -- Fix: CREATE INDEX idx_users_email_lower ON users(LOWER(email)); -- ❌ Can't use index (leading wildcard) SELECT * FROM users WHERE email LIKE '%@example.com'; -- Fix: Use full-text search or store domain separately -- ✅ Can use index (trailing wildcard) SELECT * FROM users WHERE email LIKE 'john%';
Query Analysis (EXPLAIN)
Always analyze slow queries before optimizing:
-- PostgreSQL EXPLAIN ANALYZE SELECT * FROM orders o JOIN users u ON o.user_id = u.id WHERE u.email = 'john@example.com' ORDER BY o.created_at DESC LIMIT 10; -- MySQL EXPLAIN SELECT * FROM orders o JOIN users u ON o.user_id = u.id WHERE u.email = 'john@example.com' ORDER BY o.created_at DESC LIMIT 10;
Look for:
- •Seq Scan (full table scan) - add index?
- •Hash Join (expensive) - add index on join column
- •Sort (expensive) - add index with correct sort order
- •High execution time - which step is slow?
Avoiding SELECT *
Fetch only columns you need:
-- ❌ Fetches all columns (slower, more bandwidth) SELECT * FROM orders WHERE user_id = 123; -- ✅ Fetch only needed columns SELECT id, user_id, total_amount, created_at FROM orders WHERE user_id = 123; -- ✅ Reduces memory/bandwidth especially for large text columns SELECT id, user_id, total_amount FROM orders WHERE user_id = 123;
Connection Pooling
Use connection pooling in production:
# ❌ Anti-pattern: New connection per query
def get_user(user_id):
conn = connect() # New connection!
user = conn.query('SELECT * FROM users WHERE id = ?', user_id)
conn.close()
return user
# ✅ Use connection pool
pool = ConnectionPool(
host='localhost',
database='myapp',
min_size=5,
max_size=20,
timeout=30
)
def get_user(user_id):
with pool.get_connection() as conn: # Reuses from pool
return conn.query('SELECT * FROM users WHERE id = ?', user_id)
Pool configuration (tune for your workload):
- •
min_size: Minimum idle connections (default 5-10) - •
max_size: Maximum concurrent connections (default 20-50) - •
timeout: Connection acquisition timeout - •
idle_timeout: Close idle connections after N seconds
SQL vs NoSQL Considerations
When to Use SQL (ACID, Relations)
Use SQL when:
- •Data has strong relationships (orders → users → addresses)
- •Consistency is critical (financial transactions, inventory)
- •Need complex queries with JOINs
- •Data is structured and schema is stable
- •Multi-record transactions (ACID guarantees)
Example scenarios:
- •E-commerce: Products, Orders, Users with complex relationships
- •Banking: Transactions must be atomic and consistent
- •CRM: Complex queries across related entities
- •Content management: Structured articles with authors, tags, categories
When to Use NoSQL (Scale, Flexibility)
Use NoSQL when:
- •Data is unstructured or semi-structured
- •Schema evolves rapidly (different object shapes)
- •Horizontal scaling is priority (sharding)
- •High write throughput needed
- •Document-oriented data (JSON-like)
Example scenarios:
- •Real-time analytics: Time-series data
- •Content platforms: Variable document structures
- •Caching layer: Key-value store
- •Event streaming: Logs, events, activity feeds
- •Catalog systems: Products with variable attributes
Document database (MongoDB, Firebase):
✅ Good: User profiles with flexible attributes
✅ Good: Product catalog with variable specs
❌ Bad: Complex multi-entity queries and JOINs
Document structure:
{
_id: 1,
name: "John",
email: "john@example.com",
preferences: {
language: "en",
theme: "dark",
notifications: true
}
}
Key-value store (Redis, Memcached):
✅ Good: Caching, sessions, rate limiting
✅ Good: Real-time leaderboards
❌ Bad: Complex queries, relationships
Structure:
user:123 → { name, email, created_at }
session:abc123 → { user_id, expires_at }
Graph database (Neo4j):
✅ Good: Social networks, recommendations ✅ Good: Complex relationship queries ❌ Bad: Simple CRUD operations Relationships: User -[:FOLLOWS]-> User User -[:COMMENTED_ON]-> Post
Testing Database Code
Test Databases
Use separate test database:
# config.py
if os.getenv('ENV') == 'test':
DATABASE_URL = 'postgresql://test_user:test_pass@localhost/test_db'
else:
DATABASE_URL = os.getenv('DATABASE_URL')
# conftest.py (pytest)
@pytest.fixture(autouse=True)
def setup_test_db():
"""Create test database and tables before each test."""
# Create tables
db.create_all()
yield
# Cleanup
db.drop_all()
Fixtures and Seeds
Use fixtures for test data:
# conftest.py
import pytest
from app.models import User, Order
@pytest.fixture
def sample_user(db):
"""Create a test user."""
user = User(name='John Doe', email='john@example.com')
db.add(user)
db.commit()
return user
@pytest.fixture
def sample_orders(db, sample_user):
"""Create orders for test user."""
orders = [
Order(user_id=sample_user.id, total_amount=100.00),
Order(user_id=sample_user.id, total_amount=200.00),
]
db.add_all(orders)
db.commit()
return orders
# test_orders.py
def test_get_user_orders(sample_user, sample_orders):
"""Test fetching user orders."""
orders = Order.query.filter_by(user_id=sample_user.id).all()
assert len(orders) == 2
assert sum(o.total_amount for o in orders) == 300.00
Transaction Rollback
Rollback transactions to isolate tests:
# conftest.py - Automatic rollback after each test
@pytest.fixture(autouse=True)
def db_transaction(db):
"""Wrap each test in a transaction that rolls back."""
transaction = db.begin_nested()
yield
transaction.rollback() # Undo all changes from this test
# Or use explicit rollback
def test_create_user():
db.begin()
user = User(name='John', email='john@example.com')
db.add(user)
db.commit()
assert user.id is not None
db.rollback()
# Changes are undone, database is clean for next test
Common Patterns (SQL and NoSQL)
Soft Deletes
Mark records as deleted instead of removing:
-- Add deleted_at column ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP NULL; -- Soft delete (update) UPDATE users SET deleted_at = CURRENT_TIMESTAMP WHERE id = 123; -- Query only active records SELECT * FROM users WHERE deleted_at IS NULL; -- Create index on deleted_at for efficient queries CREATE INDEX idx_users_active ON users(deleted_at) WHERE deleted_at IS NULL;
Pros: Recoverable, audit trail, can restore data Cons: Need to remember to filter deleted records everywhere
Audit Logs
Track all data changes:
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
entity_type VARCHAR(100),
entity_id BIGINT,
action VARCHAR(20), -- CREATE, UPDATE, DELETE
old_values JSONB,
new_values JSONB,
changed_by BIGINT REFERENCES users(id),
changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- PostgreSQL trigger to auto-log changes
CREATE FUNCTION audit_trigger() RETURNS TRIGGER AS $$
BEGIN
INSERT INTO audit_log (entity_type, entity_id, action, new_values, changed_at)
VALUES ('user', NEW.id, TG_OP, row_to_json(NEW), NOW());
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER users_audit AFTER UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger();
Timestamps (Created/Updated)
Track record creation and modification:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Auto-update updated_at on changes (PostgreSQL)
CREATE FUNCTION update_timestamp() RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER users_update_timestamp BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION update_timestamp();
Hierarchical Data (Trees)
Store tree structures in relational database:
-- Option 1: Adjacency List (simple, slow to query)
CREATE TABLE categories (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255),
parent_id BIGINT REFERENCES categories(id)
);
-- Query children
SELECT * FROM categories WHERE parent_id = 5;
-- Query ancestors (recursive, expensive)
WITH RECURSIVE ancestors AS (
SELECT id, name, parent_id FROM categories WHERE id = 5
UNION ALL
SELECT c.id, c.name, c.parent_id
FROM categories c
JOIN ancestors a ON c.id = a.parent_id
)
SELECT * FROM ancestors;
-- Option 2: Closure Table (trade space for query speed)
CREATE TABLE categories (id BIGSERIAL PRIMARY KEY, name VARCHAR(255));
CREATE TABLE category_closure (
ancestor_id BIGINT REFERENCES categories(id),
descendant_id BIGINT REFERENCES categories(id),
depth INT,
PRIMARY KEY (ancestor_id, descendant_id)
);
-- Query all descendants
SELECT c.* FROM categories c
JOIN category_closure cc ON c.id = cc.descendant_id
WHERE cc.ancestor_id = 5 AND cc.depth > 0;
Pagination
Implement efficient pagination:
-- ❌ OFFSET is slow for large offsets SELECT * FROM orders LIMIT 10 OFFSET 100000; -- ✅ Better: Keyset pagination (cursor-based) SELECT * FROM orders WHERE id > 12345 -- Last ID from previous page ORDER BY id LIMIT 10; -- ✅ With composite key SELECT * FROM orders WHERE (user_id, created_at) > (123, '2024-11-17') ORDER BY user_id, created_at LIMIT 10;
Transactions
Use transactions for data consistency:
# ❌ No transaction - inconsistent state if error occurs
user = db.query(User).get(123)
user.balance -= 50
db.commit()
account = db.query(Account).get(456)
account.balance += 50
db.commit() # If this fails, money disappears!
# ✅ Transaction - all-or-nothing
try:
with db.transaction():
user = db.query(User).get(123)
user.balance -= 50
account = db.query(Account).get(456)
account.balance += 50
db.flush()
except Exception:
# Everything rolls back automatically
raise
Common Pitfalls
N+1 Queries
Problem: One query per item instead of batching Fix: Use eager loading or batch queries
Missing Indexes
Problem: Slow queries on large tables Fix: Analyze slow queries with EXPLAIN, add indexes on WHERE/JOIN columns
No Transaction Boundaries
Problem: Inconsistent data if error occurs mid-operation Fix: Wrap multi-step operations in transactions
Over-Normalization
Problem: Too many JOINs make queries slow and complex Fix: Denormalize strategically where proven necessary
Unbounded Queries
Problem: SELECT * without LIMIT causes memory exhaustion Fix: Always LIMIT and paginate large result sets
Wrong Cascade Rules
Problem: Accidental data loss or orphaned records Fix: Choose CASCADE, RESTRICT, or SET NULL deliberately
Storing Passwords in Plain Text
Problem: Security breach if database compromised Fix: Store bcrypt hashes, never plain passwords
Accidental Type Mismatches
Problem: VARCHAR(255) used for numbers, can't query properly Fix: Use correct data types (INTEGER, BIGINT, DECIMAL, not VARCHAR)
Not Testing Migrations
Problem: Migration fails in production, downtime Fix: Test migrations on production-like data before deploying
Schema Changes Without Backward Compatibility
Problem: Old application code breaks with new schema Fix: Support both old and new columns temporarily, add deprecation period
No Query Timeouts
Problem: Slow query locks database, cascading failures Fix: Set statement timeouts and query timeouts in production
See project-specific database configuration in .claude/CLAUDE.md if present.