PostgreSQL Patterns
Problem Statement
Alembic generates migrations but doesn't understand PostgreSQL performance implications. This skill covers reviewing migrations for PostgreSQL-specific issues and writing efficient queries.
Pattern: Index Review
When to Add Indexes
sql
-- ✅ ADD INDEX: Foreign keys (almost always) CREATE INDEX ix_assessments_user_id ON assessments (user_id); -- ✅ ADD INDEX: Frequently filtered columns CREATE INDEX ix_assessments_status ON assessments (status); -- ✅ ADD INDEX: Columns in WHERE + ORDER BY together CREATE INDEX ix_assessments_user_status ON assessments (user_id, status); -- ✅ ADD INDEX: Columns used in JOIN conditions CREATE INDEX ix_answers_question_id ON answers (question_id);
When NOT to Add Indexes
sql
-- ❌ SKIP: Small tables (< 1000 rows) -- ❌ SKIP: Write-heavy tables with rare reads -- ❌ SKIP: Low cardinality columns alone (boolean, status with 3 values) -- ❌ SKIP: Columns rarely used in WHERE/JOIN/ORDER BY
Index Column Order Matters
sql
-- For query: WHERE user_id = ? AND status = ? ORDER BY created_at -- ✅ CORRECT: Most selective first, ORDER BY column last CREATE INDEX ix_assessments_user_status_created ON assessments (user_id, status, created_at); -- ❌ WRONG: Order doesn't match query pattern CREATE INDEX ix_assessments_created_status_user ON assessments (created_at, status, user_id);
Pattern: Partial Indexes
Problem: Full index on column where you only query subset of values.
sql
-- Full index (indexes all rows) CREATE INDEX ix_assessments_status ON assessments (status); -- ✅ BETTER: Partial index (only active assessments) CREATE INDEX ix_assessments_active ON assessments (user_id, created_at) WHERE status = 'active'; -- Use case: "Get user's active assessments sorted by date" -- The partial index is smaller and faster -- Common patterns: -- WHERE deleted_at IS NULL (soft deletes) -- WHERE status != 'archived' -- WHERE is_active = true
In Alembic:
python
op.execute("""
CREATE INDEX ix_assessments_active
ON assessments (user_id, created_at)
WHERE status = 'active'
""")
Pattern: JSONB Indexes
sql
-- GIN index for @> (contains) queries
CREATE INDEX ix_settings_data ON user_settings USING GIN (data);
-- Query: Find users with specific setting
SELECT * FROM user_settings WHERE data @> '{"theme": "dark"}';
-- Expression index for specific JSON path
CREATE INDEX ix_settings_theme ON user_settings ((data->>'theme'));
-- Query: Find by specific key
SELECT * FROM user_settings WHERE data->>'theme' = 'dark';
Pattern: Concurrent Index Creation
Problem: CREATE INDEX locks the table. On large tables, this blocks writes.
sql
-- ❌ BLOCKS WRITES during creation CREATE INDEX ix_events_user_id ON events (user_id); -- ✅ DOESN'T BLOCK (but slower to create) CREATE INDEX CONCURRENTLY ix_events_user_id ON events (user_id);
In Alembic:
python
# Must disable transaction for CONCURRENTLY
def upgrade():
op.execute("COMMIT") # End current transaction
op.execute(
"CREATE INDEX CONCURRENTLY ix_events_user_id ON events (user_id)"
)
Pattern: Query Performance Analysis
sql
-- EXPLAIN ANALYZE shows actual execution EXPLAIN ANALYZE SELECT * FROM assessments WHERE user_id = 'abc-123' AND status = 'active'; -- What to look for: -- ✅ "Index Scan" or "Index Only Scan" - good -- ❌ "Seq Scan" on large table - needs index -- ❌ "Sort" with high cost - consider index on ORDER BY column -- ❌ "Nested Loop" with many rows - might need different join strategy
Key metrics:
- •
cost: Estimated units (lower is better) - •
rows: Estimated row count - •
actual time: Real milliseconds - •
loops: How many times executed
Pattern: UUID Performance
sql
-- UUIDs as primary keys have tradeoffs -- ❌ Random UUIDs (uuid4) cause index fragmentation -- ✅ Time-ordered UUIDs (uuid7) maintain insertion order -- If using uuid4, consider: -- 1. BRIN index for time-ordered queries (if you have created_at) -- 2. Covering indexes to avoid heap fetches -- 3. Accept some fragmentation (usually fine under 10M rows)
Pattern: Constraint Review
sql
-- ✅ GOOD: Named constraints (can be dropped/modified) ALTER TABLE assessments ADD CONSTRAINT fk_assessments_user_id FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE; -- ❌ BAD: Unnamed constraints (auto-generated names are ugly) ALTER TABLE assessments ADD FOREIGN KEY (user_id) REFERENCES users(id); -- ✅ GOOD: CHECK constraints for data integrity ALTER TABLE assessments ADD CONSTRAINT chk_assessments_rating CHECK (rating >= 1.0 AND rating <= 5.5); -- ✅ GOOD: Unique constraints with meaningful names ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email);
Pattern: Bulk Operations
sql
-- ❌ SLOW: Row-by-row updates
UPDATE users SET role = 'member' WHERE id = 'id1';
UPDATE users SET role = 'member' WHERE id = 'id2';
-- ... thousands more
-- ✅ FAST: Batch update
UPDATE users SET role = 'member'
WHERE id IN ('id1', 'id2', 'id3', ...);
-- ✅ FAST: Update with subquery
UPDATE users SET role = 'member'
WHERE id IN (
SELECT user_id FROM legacy_members WHERE migrated = false
);
-- For very large updates, batch to avoid long locks:
UPDATE users SET role = 'member'
WHERE id IN (
SELECT id FROM users
WHERE role IS NULL
LIMIT 10000
);
-- Run in loop until no rows affected
Pattern: Table Locking Awareness
Know what locks what:
| Operation | Lock Type | Blocks |
|---|---|---|
| SELECT | AccessShare | Nothing |
| INSERT/UPDATE/DELETE | RowExclusive | Nothing (row-level) |
| CREATE INDEX | ShareLock | INSERT/UPDATE/DELETE |
| CREATE INDEX CONCURRENTLY | ShareUpdateExclusive | Other schema changes |
| ALTER TABLE (most) | AccessExclusive | Everything |
| DROP TABLE | AccessExclusive | Everything |
Danger zone:
sql
-- ❌ LOCKS ENTIRE TABLE ALTER TABLE users ADD COLUMN bio TEXT NOT NULL DEFAULT ''; -- ✅ MINIMAL LOCKING (PostgreSQL 11+) ALTER TABLE users ADD COLUMN bio TEXT; -- Fast, nullable -- Then backfill with UPDATE in batches -- Then: ALTER TABLE users ALTER COLUMN bio SET NOT NULL;
Pattern: Connection Management
sql
-- Check active connections
SELECT
datname,
usename,
application_name,
state,
query_start,
query
FROM pg_stat_activity
WHERE datname = 'your_db';
-- Kill long-running query
SELECT pg_cancel_backend(pid); -- Graceful
SELECT pg_terminate_backend(pid); -- Force
-- Check for locks
SELECT
l.locktype,
l.relation::regclass,
l.mode,
l.granted,
a.usename,
a.query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;
Pattern: Data Type Choices
| Use Case | Type | Notes |
|---|---|---|
| Primary key | UUID | Use uuid7 for ordering if possible |
| Foreign key | Match parent type | |
| Timestamps | TIMESTAMPTZ | Always with timezone |
| Money | NUMERIC(12,2) | Never FLOAT |
| JSON data | JSONB | Not JSON (JSONB is faster) |
| Short strings | VARCHAR(n) | With reasonable limit |
| Long text | TEXT | No length limit |
| Boolean | BOOLEAN | Not integer |
| Enum-like | VARCHAR or native ENUM | VARCHAR is more flexible |
Migration Review Checklist (PostgreSQL-Specific)
- • Large table indexes use CONCURRENTLY
- • Foreign keys have ON DELETE behavior specified
- • Constraints have explicit names
- • Non-nullable columns on existing tables use 3-step process
- • Indexes match actual query patterns
- • Partial indexes considered for filtered queries
- • No unnecessary indexes on small tables
- • JSONB columns have appropriate GIN indexes if queried
- • UUIDs: aware of fragmentation implications
- • TIMESTAMPTZ used for all timestamps (not TIMESTAMP)
Useful Diagnostic Queries
sql
-- Table sizes
SELECT
relname as table,
pg_size_pretty(pg_total_relation_size(relid)) as total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
-- Index usage
SELECT
indexrelname as index,
idx_scan as times_used,
pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC; -- Unused indexes at top
-- Slow queries (if pg_stat_statements enabled)
SELECT
query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;