PostgreSQL Patterns

Problem Statement

Alembic generates migrations but doesn't understand PostgreSQL performance implications. This skill covers reviewing migrations for PostgreSQL-specific issues and writing efficient queries.

Pattern: Index Review

When to Add Indexes

sql

-- ✅ ADD INDEX: Foreign keys (almost always)
CREATE INDEX ix_assessments_user_id ON assessments (user_id);

-- ✅ ADD INDEX: Frequently filtered columns
CREATE INDEX ix_assessments_status ON assessments (status);

-- ✅ ADD INDEX: Columns in WHERE + ORDER BY together
CREATE INDEX ix_assessments_user_status ON assessments (user_id, status);

-- ✅ ADD INDEX: Columns used in JOIN conditions
CREATE INDEX ix_answers_question_id ON answers (question_id);

When NOT to Add Indexes

sql

-- ❌ SKIP: Small tables (< 1000 rows)
-- ❌ SKIP: Write-heavy tables with rare reads
-- ❌ SKIP: Low cardinality columns alone (boolean, status with 3 values)
-- ❌ SKIP: Columns rarely used in WHERE/JOIN/ORDER BY

Index Column Order Matters

sql

-- For query: WHERE user_id = ? AND status = ? ORDER BY created_at
-- ✅ CORRECT: Most selective first, ORDER BY column last
CREATE INDEX ix_assessments_user_status_created 
ON assessments (user_id, status, created_at);

-- ❌ WRONG: Order doesn't match query pattern
CREATE INDEX ix_assessments_created_status_user 
ON assessments (created_at, status, user_id);

Pattern: Partial Indexes

Problem: Full index on column where you only query subset of values.

sql

-- Full index (indexes all rows)
CREATE INDEX ix_assessments_status ON assessments (status);

-- ✅ BETTER: Partial index (only active assessments)
CREATE INDEX ix_assessments_active 
ON assessments (user_id, created_at) 
WHERE status = 'active';

-- Use case: "Get user's active assessments sorted by date"
-- The partial index is smaller and faster

-- Common patterns:
-- WHERE deleted_at IS NULL (soft deletes)
-- WHERE status != 'archived'
-- WHERE is_active = true

In Alembic:

python

op.execute("""
    CREATE INDEX ix_assessments_active 
    ON assessments (user_id, created_at) 
    WHERE status = 'active'
""")

Pattern: JSONB Indexes

sql

-- GIN index for @> (contains) queries
CREATE INDEX ix_settings_data ON user_settings USING GIN (data);

-- Query: Find users with specific setting
SELECT * FROM user_settings WHERE data @> '{"theme": "dark"}';

-- Expression index for specific JSON path
CREATE INDEX ix_settings_theme ON user_settings ((data->>'theme'));

-- Query: Find by specific key
SELECT * FROM user_settings WHERE data->>'theme' = 'dark';

Pattern: Concurrent Index Creation

Problem: CREATE INDEX locks the table. On large tables, this blocks writes.

sql

-- ❌ BLOCKS WRITES during creation
CREATE INDEX ix_events_user_id ON events (user_id);

-- ✅ DOESN'T BLOCK (but slower to create)
CREATE INDEX CONCURRENTLY ix_events_user_id ON events (user_id);

In Alembic:

python

# Must disable transaction for CONCURRENTLY
def upgrade():
    op.execute("COMMIT")  # End current transaction
    op.execute(
        "CREATE INDEX CONCURRENTLY ix_events_user_id ON events (user_id)"
    )

Pattern: Query Performance Analysis

sql

-- EXPLAIN ANALYZE shows actual execution
EXPLAIN ANALYZE 
SELECT * FROM assessments 
WHERE user_id = 'abc-123' AND status = 'active';

-- What to look for:
-- ✅ "Index Scan" or "Index Only Scan" - good
-- ❌ "Seq Scan" on large table - needs index
-- ❌ "Sort" with high cost - consider index on ORDER BY column
-- ❌ "Nested Loop" with many rows - might need different join strategy

Key metrics:

•cost: Estimated units (lower is better)
•rows: Estimated row count
•actual time: Real milliseconds
•loops: How many times executed

Pattern: UUID Performance

sql

-- UUIDs as primary keys have tradeoffs
-- ❌ Random UUIDs (uuid4) cause index fragmentation
-- ✅ Time-ordered UUIDs (uuid7) maintain insertion order

-- If using uuid4, consider:
-- 1. BRIN index for time-ordered queries (if you have created_at)
-- 2. Covering indexes to avoid heap fetches
-- 3. Accept some fragmentation (usually fine under 10M rows)

Pattern: Constraint Review

sql

-- ✅ GOOD: Named constraints (can be dropped/modified)
ALTER TABLE assessments 
ADD CONSTRAINT fk_assessments_user_id 
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE;

-- ❌ BAD: Unnamed constraints (auto-generated names are ugly)
ALTER TABLE assessments 
ADD FOREIGN KEY (user_id) REFERENCES users(id);

-- ✅ GOOD: CHECK constraints for data integrity
ALTER TABLE assessments 
ADD CONSTRAINT chk_assessments_rating 
CHECK (rating >= 1.0 AND rating <= 5.5);

-- ✅ GOOD: Unique constraints with meaningful names
ALTER TABLE users 
ADD CONSTRAINT uq_users_email UNIQUE (email);

Pattern: Bulk Operations

sql

-- ❌ SLOW: Row-by-row updates
UPDATE users SET role = 'member' WHERE id = 'id1';
UPDATE users SET role = 'member' WHERE id = 'id2';
-- ... thousands more

-- ✅ FAST: Batch update
UPDATE users SET role = 'member' 
WHERE id IN ('id1', 'id2', 'id3', ...);

-- ✅ FAST: Update with subquery
UPDATE users SET role = 'member'
WHERE id IN (
    SELECT user_id FROM legacy_members WHERE migrated = false
);

-- For very large updates, batch to avoid long locks:
UPDATE users SET role = 'member'
WHERE id IN (
    SELECT id FROM users 
    WHERE role IS NULL 
    LIMIT 10000
);
-- Run in loop until no rows affected

Pattern: Table Locking Awareness

Know what locks what:

Operation	Lock Type	Blocks
SELECT	AccessShare	Nothing
INSERT/UPDATE/DELETE	RowExclusive	Nothing (row-level)
CREATE INDEX	ShareLock	INSERT/UPDATE/DELETE
CREATE INDEX CONCURRENTLY	ShareUpdateExclusive	Other schema changes
ALTER TABLE (most)	AccessExclusive	Everything
DROP TABLE	AccessExclusive	Everything

Danger zone:

sql

-- ❌ LOCKS ENTIRE TABLE
ALTER TABLE users ADD COLUMN bio TEXT NOT NULL DEFAULT '';

-- ✅ MINIMAL LOCKING (PostgreSQL 11+)
ALTER TABLE users ADD COLUMN bio TEXT;  -- Fast, nullable
-- Then backfill with UPDATE in batches
-- Then: ALTER TABLE users ALTER COLUMN bio SET NOT NULL;

Pattern: Connection Management

sql

-- Check active connections
SELECT 
    datname,
    usename,
    application_name,
    state,
    query_start,
    query
FROM pg_stat_activity
WHERE datname = 'your_db';

-- Kill long-running query
SELECT pg_cancel_backend(pid);  -- Graceful
SELECT pg_terminate_backend(pid);  -- Force

-- Check for locks
SELECT 
    l.locktype,
    l.relation::regclass,
    l.mode,
    l.granted,
    a.usename,
    a.query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;

Pattern: Data Type Choices

Use Case	Type	Notes
Primary key	`UUID`	Use uuid7 for ordering if possible
Foreign key	Match parent type
Timestamps	`TIMESTAMPTZ`	Always with timezone
Money	`NUMERIC(12,2)`	Never FLOAT
JSON data	`JSONB`	Not JSON (JSONB is faster)
Short strings	`VARCHAR(n)`	With reasonable limit
Long text	`TEXT`	No length limit
Boolean	`BOOLEAN`	Not integer
Enum-like	`VARCHAR` or native ENUM	VARCHAR is more flexible

Migration Review Checklist (PostgreSQL-Specific)

Useful Diagnostic Queries

sql

-- Table sizes
SELECT 
    relname as table,
    pg_size_pretty(pg_total_relation_size(relid)) as total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Index usage
SELECT 
    indexrelname as index,
    idx_scan as times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;  -- Unused indexes at top

-- Slow queries (if pg_stat_statements enabled)
SELECT 
    query,
    calls,
    mean_exec_time,
    total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;