SQL Guide
Applies to: PostgreSQL 15+, MySQL 8+, Database Migrations, Query Optimization
Core Principles
- •Parameterized Queries Always: Never concatenate user input into SQL strings -- use bind parameters (
$1,?,:name) without exception - •Explicit Over Implicit: Name all constraints, specify column lists in INSERT, avoid
SELECT *in production code - •Migrations Are Immutable: Once applied to a shared environment, never modify a migration -- create a new one
- •Indexes Are Not Free: Every index speeds reads but slows writes; justify each index with a query plan
- •Transactions Are Boundaries: Keep transactions short, choose the correct isolation level, and always handle rollback
Guardrails
Naming Conventions
- •Tables:
snake_case, plural (users,order_items,audit_logs) - •Columns:
snake_case, singular (email,created_at,is_active) - •Primary keys:
id(integer or UUID depending on project convention) - •Foreign keys:
<singular_table>_id(user_id,order_id) - •Indexes:
idx_<table>_<columns>(idx_users_email,idx_orders_user_id_created_at) - •Unique constraints:
uq_<table>_<columns>(uq_users_email) - •Check constraints:
ck_<table>_<description>(ck_orders_positive_total) - •Boolean columns:
is_orhas_prefix (is_active,has_verified_email) - •Timestamps:
created_at,updated_at,deleted_at(alwaysTIMESTAMPTZ)
Query Safety
- •ALWAYS use parameterized queries -- no string interpolation of user input
- •ALWAYS specify column lists in
INSERTstatements - •NEVER use
SELECT *in application code (acceptable in ad-hoc queries only) - •ALWAYS add
LIMITto queries that return lists (prevent unbounded result sets) - •NEVER use
TRUNCATEorDROPin application code without explicit safeguards - •ALWAYS use
EXISTSinstead ofCOUNT(*) > 0for existence checks - •ALWAYS qualify column names with table aliases in JOINs
sql
-- GOOD: parameterized, explicit columns, bounded SELECT u.id, u.email, u.created_at FROM users u WHERE u.email = $1 AND u.is_active = true LIMIT 1; -- BAD: string concatenation, SELECT *, no LIMIT SELECT * FROM users WHERE email = '" + userInput + "';
Indexing
- •Create indexes to support
WHERE,JOIN, andORDER BYclauses - •Composite index column order: equality columns first, then range, then sort
- •Use
UNIQUEindexes to enforce business rules at the database level - •Prefer partial indexes when filtering on a known subset (
WHERE is_active = true) - •Use
CONCURRENTLYfor index creation on live tables (PostgreSQL) - •Review index usage periodically -- drop unused indexes
sql
-- Composite index: equality (status) first, then range (created_at) CREATE INDEX CONCURRENTLY idx_orders_status_created ON orders (status, created_at DESC); -- Partial index: only index active users CREATE INDEX idx_users_email_active ON users (email) WHERE is_active = true;
Transactions
- •Keep transactions as short as possible (no I/O or external calls inside)
- •Use
READ COMMITTEDfor most operations (PostgreSQL default) - •Use
REPEATABLE READwhen a transaction reads the same data multiple times - •Use
SERIALIZABLEfor financial or inventory operations requiring strict consistency - •Always explicitly
COMMITorROLLBACK-- never leave transactions hanging - •Use
SAVEPOINTfor partial rollback within complex transactions
sql
BEGIN; SAVEPOINT before_update; UPDATE accounts SET balance = balance - 100.00 WHERE id = $1; UPDATE accounts SET balance = balance + 100.00 WHERE id = $2; -- If second update fails, roll back to savepoint -- ROLLBACK TO SAVEPOINT before_update; COMMIT;
Migrations
- •Every migration MUST have both
upanddownfunctions - •Migrations MUST be idempotent (
IF NOT EXISTS,IF EXISTSguards) - •Never rename or drop columns in a single step -- use a multi-step process
- •Add columns as
NULLfirst, backfill, then addNOT NULLconstraint - •Never modify a migration that has been applied to a shared environment
- •Migration filenames:
YYYYMMDDHHMMSS_description.sqlor sequential numbering - •Test both
upanddownin development before committing
sql
-- UP: 20250115120000_add_users_phone.sql ALTER TABLE users ADD COLUMN IF NOT EXISTS phone VARCHAR(20); CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_phone ON users (phone) WHERE phone IS NOT NULL; -- DOWN: 20250115120000_add_users_phone.sql DROP INDEX CONCURRENTLY IF EXISTS idx_users_phone; ALTER TABLE users DROP COLUMN IF EXISTS phone;
Schema Design
Standard Table Template
sql
CREATE TABLE IF NOT EXISTS orders (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CONSTRAINT ck_orders_valid_status
CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
total_cents INTEGER NOT NULL
CONSTRAINT ck_orders_positive_total CHECK (total_cents >= 0),
notes TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Auto-update updated_at (PostgreSQL)
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_orders_updated_at
BEFORE UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION update_updated_at();
Design Rules
- •Use
BIGINT GENERATED ALWAYS AS IDENTITYfor auto-increment PKs (PostgreSQL 10+) - •Use
UUIDPKs when IDs are exposed externally or distributed systems require it - •Use
TIMESTAMPTZ(notTIMESTAMP) for all time columns -- always store UTC - •Add
NOT NULLconstraints by default; allowNULLonly when semantically meaningful - •Use
CHECKconstraints to enforce domain rules at the database level - •Use
ENUMtypes sparingly -- preferVARCHARwithCHECKfor easier migration - •Add
ON DELETE CASCADEorON DELETE SET NULLexplicitly to every foreign key - •Use soft deletes (
deleted_at TIMESTAMPTZ) when audit trail is required
Key Patterns
Common Table Expressions (CTEs)
Prefer CTEs over subqueries for readability and maintainability.
sql
-- Readable: each step has a name
WITH active_orders AS (
SELECT user_id, COUNT(*) AS order_count, SUM(total_cents) AS total_spent
FROM orders
WHERE status != 'cancelled'
AND created_at >= now() - INTERVAL '30 days'
GROUP BY user_id
),
high_value_users AS (
SELECT user_id
FROM active_orders
WHERE total_spent > 50000 -- over $500
)
SELECT u.id, u.email, ao.order_count, ao.total_spent
FROM users u
JOIN active_orders ao ON ao.user_id = u.id
WHERE u.id IN (SELECT user_id FROM high_value_users)
ORDER BY ao.total_spent DESC;
Window Functions
Use window functions for ranking, running totals, and row comparisons without self-joins.
sql
-- Rank users by spending within each region
SELECT
u.id,
u.region,
SUM(o.total_cents) AS total_spent,
RANK() OVER (PARTITION BY u.region ORDER BY SUM(o.total_cents) DESC) AS rank
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.region;
-- Running total of daily revenue
SELECT
date_trunc('day', created_at) AS day,
SUM(total_cents) AS daily_revenue,
SUM(SUM(total_cents)) OVER (ORDER BY date_trunc('day', created_at)) AS running_total
FROM orders
WHERE status = 'delivered'
GROUP BY date_trunc('day', created_at)
ORDER BY day;
UPSERT (INSERT ... ON CONFLICT)
sql
-- PostgreSQL: insert or update on conflict
INSERT INTO user_preferences (user_id, theme, language)
VALUES ($1, $2, $3)
ON CONFLICT (user_id)
DO UPDATE SET
theme = EXCLUDED.theme,
language = EXCLUDED.language,
updated_at = now();
Batch Operations
sql
-- Batch insert with unnest (PostgreSQL)
INSERT INTO tags (name, category)
SELECT unnest($1::text[]), unnest($2::text[])
ON CONFLICT (name) DO NOTHING;
-- Batch update with VALUES list
UPDATE products AS p
SET price_cents = v.new_price
FROM (VALUES
(1, 2999),
(2, 4999),
(3, 1499)
) AS v(id, new_price)
WHERE p.id = v.id;
Parameterized Queries (Application Code)
python
# Python (psycopg2/asyncpg) -- ALWAYS use parameterized queries
cursor.execute(
"SELECT id, email FROM users WHERE email = %s AND is_active = %s",
(email, True),
)
# Node.js (pg)
const result = await pool.query(
'SELECT id, email FROM users WHERE email = $1 AND is_active = $2',
[email, true]
);
# Go (database/sql)
row := db.QueryRowContext(ctx,
"SELECT id, email FROM users WHERE email = $1 AND is_active = $2",
email, true,
)
Performance
EXPLAIN ANALYZE
Always use EXPLAIN ANALYZE to validate query plans before deploying.
sql
-- Check execution plan and actual timing EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT u.id, u.email, COUNT(o.id) AS order_count FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE u.created_at >= '2025-01-01' GROUP BY u.id, u.email ORDER BY order_count DESC LIMIT 20;
What to look for:
- •
Seq Scanon large tables -- usually needs an index - •
Nested Loopwith large outer sets -- considerHash Joinvia index or rewrite - •
Sortoperations with high cost -- add index matchingORDER BY - •
Rows Removed by Filtermuch larger thanActual Rows-- index the filter column - •
Buffers: shared hitvsshared read-- high reads indicate cold cache
Index Strategy
| Access Pattern | Index Type | Example |
|---|---|---|
| Equality lookup | B-tree (default) | WHERE email = $1 |
| Range scan | B-tree | WHERE created_at > $1 |
| Full-text search | GIN + tsvector | WHERE search_vector @@ to_tsquery($1) |
| JSONB field access | GIN | WHERE metadata @> '{"key": "val"}' |
| Array containment | GIN | WHERE tags @> ARRAY['sql'] |
| Geometric/spatial | GiST | WHERE location <-> point($1,$2) < $3 |
Pattern matching (LIKE 'abc%') | B-tree with text_pattern_ops | WHERE name LIKE 'abc%' |
N+1 Prevention
sql
-- BAD: N+1 (one query per user to fetch orders) -- Application loops: for each user, SELECT orders WHERE user_id = ? -- GOOD: single query with JOIN SELECT u.id, u.email, o.id AS order_id, o.total_cents FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE u.is_active = true ORDER BY u.id, o.created_at DESC; -- GOOD: batch with ANY (when you have a list of IDs) SELECT id, email, created_at FROM users WHERE id = ANY($1::bigint[]);
Pagination
sql
-- Cursor-based pagination (preferred for large datasets) SELECT id, email, created_at FROM users WHERE created_at < $1 -- cursor: last seen created_at AND is_active = true ORDER BY created_at DESC LIMIT 25; -- Offset-based pagination (simpler but slower for deep pages) SELECT id, email, created_at FROM users WHERE is_active = true ORDER BY created_at DESC LIMIT 25 OFFSET $1;
Tooling
Essential Commands
bash
# PostgreSQL CLI psql -h localhost -U myuser -d mydb # Connect psql -f migration.sql mydb # Run migration file pg_dump -Fc mydb > backup.dump # Backup (custom format) pg_restore -d mydb backup.dump # Restore # pgcli (enhanced CLI with autocomplete) pgcli -h localhost -U myuser -d mydb # Schema inspection \dt # List tables \d+ table_name # Describe table with details \di # List indexes \df # List functions
Migration Tools
bash
# golang-migrate migrate create -ext sql -dir db/migrations -seq add_users_phone migrate -path db/migrations -database "$DATABASE_URL" up migrate -path db/migrations -database "$DATABASE_URL" down 1 # Alembic (Python/SQLAlchemy) alembic revision -m "add users phone" alembic upgrade head alembic downgrade -1 # Prisma (Node.js) npx prisma migrate dev --name add_users_phone npx prisma migrate deploy # dbmate dbmate new add_users_phone dbmate up dbmate rollback
Useful Diagnostic Queries (PostgreSQL)
sql
-- Find slow queries (requires pg_stat_statements extension) SELECT query, calls, mean_exec_time, total_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10; -- Find unused indexes SELECT schemaname, tablename, indexname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 AND indexname NOT LIKE '%pkey%' ORDER BY pg_relation_size(indexrelid) DESC; -- Table sizes SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC; -- Active connections and queries SELECT pid, state, query, query_start, now() - query_start AS duration FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC;
References
For detailed patterns and examples, see:
- •references/patterns.md -- Window functions, CTE patterns, indexing strategies, advanced query techniques
External References
- •PostgreSQL Documentation
- •Use The Index, Luke -- SQL indexing and tuning
- •Modern SQL -- Modern SQL features across databases
- •pgMustard EXPLAIN Guide -- Reading query plans
- •SQL Style Guide (Holywell) -- Formatting conventions
- •Postgres Wiki: Don't Do This -- Common PostgreSQL anti-patterns