SQL Query Optimization Patterns
Comprehensive guide to optimizing SQL queries for performance, including EXPLAIN plan analysis, index design strategies, query rewriting patterns, and N+1 query detection. Works across PostgreSQL, MySQL, and SQLite.
Quick Reference
When to use this skill:
- •Slow database queries (>100ms for simple queries, >1s for complex)
- •High database CPU usage
- •Analyzing EXPLAIN plans
- •Designing database indexes
- •Debugging N+1 query problems
- •Optimizing JOIN operations
- •Reducing table scans
Common triggers:
- •"This query is too slow"
- •"How do I optimize this SQL"
- •"What indexes should I add"
- •"Explain this EXPLAIN plan"
- •"Fix N+1 queries"
- •"Database CPU at 100%"
Typical improvements:
- •3 seconds → 50ms (60x faster)
- •Full table scan → Index scan
- •1000 queries → 2 queries (N+1 elimination)
Part 1: Understanding EXPLAIN Plans
Reading PostgreSQL EXPLAIN
EXPLAIN ANALYZE SELECT u.name, p.title FROM users u JOIN posts p ON u.id = p.user_id WHERE u.created_at > '2024-01-01' ORDER BY p.created_at DESC LIMIT 10;
Key Metrics to Watch:
- •Seq Scan (bad): Full table scan, reads every row
- •Index Scan (good): Uses index, reads only needed rows
- •Cost: Estimated computational cost (lower is better)
- •Actual time: Real execution time in milliseconds
- •Rows: Number of rows processed at each step
Bad EXPLAIN Example
Seq Scan on users u (cost=0.00..1234.00 rows=1000 width=50)
(actual time=0.123..45.678 rows=950 loops=1)
Filter: (created_at > '2024-01-01'::date)
Rows Removed by Filter: 50000
Planning Time: 0.234 ms
Execution Time: 3456.789 ms
Problems:
- •Seq Scan: Reading entire table (50,950 rows)
- •Rows Removed by Filter: Filtering after reading (wasteful)
- •Execution Time: 3.5 seconds (way too slow)
Good EXPLAIN Example (After Index)
Index Scan using users_created_at_idx on users u (cost=0.29..123.45 rows=950 width=50) (actual time=0.012..3.456 rows=950 loops=1) Index Cond: (created_at > '2024-01-01'::date) Planning Time: 0.123 ms Execution Time: 4.567 ms
Improvements:
- •Index Scan: Using index (only reads needed rows)
- •Index Cond: Filtering during index scan (efficient)
- •Execution Time: 4.5ms (750x faster!)
Part 2: Index Design Strategies
When to Add an Index
✅ Add index when:
- •Column frequently in WHERE clauses
- •Column frequently in JOIN conditions
- •Column frequently in ORDER BY
- •Query does Seq Scan on large table (>10K rows)
- •Query execution time >100ms
❌ Don't add index when:
- •Table has <1000 rows (Seq Scan is fast enough)
- •Column rarely queried
- •Column has very low cardinality (e.g., boolean with 50/50 distribution)
- •Table has heavy writes (indexes slow down INSERTs/UPDATEs)
Single-Column Indexes
-- Create index on frequently queried column CREATE INDEX idx_users_email ON users(email); CREATE INDEX idx_posts_user_id ON posts(user_id); CREATE INDEX idx_orders_created_at ON orders(created_at);
Query that benefits:
-- Before index: Seq Scan (3000ms) -- After index: Index Scan (5ms) SELECT * FROM users WHERE email = 'user@example.com';
Composite Indexes (Multi-Column)
Order matters! Put most selective column first.
-- Good: Specific to general CREATE INDEX idx_orders_user_status ON orders(user_id, status, created_at); -- Query that uses all columns SELECT * FROM orders WHERE user_id = 123 AND status = 'pending' ORDER BY created_at DESC;
Index usage rules:
-- ✅ Uses index (matches from left) WHERE user_id = 123 WHERE user_id = 123 AND status = 'pending' WHERE user_id = 123 AND status = 'pending' AND created_at > '2024-01-01' -- ❌ Doesn't use index (skips first column) WHERE status = 'pending' WHERE created_at > '2024-01-01'
Partial Indexes (Filtered)
Index only subset of rows (smaller, faster):
-- Only index active users CREATE INDEX idx_users_active_email ON users(email) WHERE status = 'active'; -- Query that uses partial index SELECT * FROM users WHERE email = 'user@example.com' AND status = 'active'; -- Must include filter condition!
Benefits:
- •Smaller index (faster scans, less storage)
- •Only indexes rows you actually query
- •Great for status fields with skewed distribution
Covering Indexes (INCLUDE)
Include columns in index to avoid table access:
-- Postgres: INCLUDE clause CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (name, created_at); -- Query doesn't need to access table (all data in index) SELECT name, created_at FROM users WHERE email = 'user@example.com';
Benefit: Index-Only Scan (even faster than Index Scan)
Part 3: Query Rewriting Patterns
Problem 1: SELECT * (Fetching Unnecessary Data)
❌ Bad (fetches all columns):
SELECT * FROM users WHERE id = 123;
✅ Good (fetches only needed columns):
SELECT id, name, email FROM users WHERE id = 123;
Impact:
- •Smaller result set
- •Less memory usage
- •Faster network transfer
- •Can use covering indexes
Problem 2: N+1 Queries
❌ Bad (1 query for posts + N queries for users):
# 1 query
posts = db.execute("SELECT * FROM posts LIMIT 10")
# N queries (10 separate queries!)
for post in posts:
user = db.execute("SELECT * FROM users WHERE id = ?", post.user_id)
print(f"{user.name}: {post.title}")
✅ Good (2 queries total):
# 1 query for posts
posts = db.execute("SELECT * FROM posts LIMIT 10")
# 1 query for all users
user_ids = [p.user_id for p in posts]
users = db.execute("SELECT * FROM users WHERE id IN (?)", user_ids)
users_by_id = {u.id: u for u in users}
# No additional queries
for post in posts:
user = users_by_id[post.user_id]
print(f"{user.name}: {post.title}")
Even Better (1 query with JOIN):
SELECT u.name, p.title FROM posts p JOIN users u ON p.user_id = u.id LIMIT 10;
Problem 3: Implicit Type Conversion
❌ Bad (index not used due to implicit conversion):
-- user_id is INTEGER, but '123' is string SELECT * FROM posts WHERE user_id = '123'; -- Seq Scan!
✅ Good (explicit type, index used):
SELECT * FROM posts WHERE user_id = 123; -- Index Scan
Problem 4: Function Calls in WHERE Clause
❌ Bad (can't use index):
-- Function prevents index usage SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
✅ Good (use functional index):
-- Create index on function result CREATE INDEX idx_users_lower_email ON users(LOWER(email)); -- Now query can use index SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
✅ Even Better (store lowercase, regular index):
-- Store email in lowercase
CREATE INDEX idx_users_email ON users(email);
-- Query with lowercase value
SELECT * FROM users WHERE email = LOWER('user@example.com');
Problem 5: OR Conditions
❌ Bad (often does Seq Scan):
SELECT * FROM users WHERE email = 'user@example.com' OR username = 'johndoe';
✅ Good (use UNION, allows index usage):
SELECT * FROM users WHERE email = 'user@example.com' UNION SELECT * FROM users WHERE username = 'johndoe';
Problem 6: NOT IN with Subquery
❌ Bad (very slow, especially with large subquery):
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM blocked_users);
✅ Good (use LEFT JOIN with NULL check):
SELECT u.* FROM users u LEFT JOIN blocked_users b ON u.id = b.user_id WHERE b.user_id IS NULL;
Part 4: JOIN Optimization
JOIN Order Matters
Database optimizes join order, but you can help:
Small table first, large table second:
-- Good: Join small table (categories: 10 rows) to large table (products: 10M rows) SELECT p.* FROM categories c JOIN products p ON c.id = p.category_id WHERE c.name = 'Electronics';
Avoid Cartesian Products
❌ Bad (produces rows1 × rows2 × rows3):
SELECT * FROM users, posts, comments; -- 100 × 10000 × 50000 = 50 billion rows!
✅ Good (proper JOIN conditions):
SELECT * FROM users u JOIN posts p ON u.id = p.user_id JOIN comments c ON p.id = c.post_id;
EXISTS vs IN for Subqueries
Scenario: Find users who have at least one post
❌ Slower (IN loads all post IDs):
SELECT * FROM users WHERE id IN (SELECT user_id FROM posts);
✅ Faster (EXISTS stops at first match):
SELECT * FROM users u WHERE EXISTS (SELECT 1 FROM posts p WHERE p.user_id = u.id);
JOIN vs Subquery
Use JOIN when you need data from both tables:
SELECT u.name, p.title FROM users u JOIN posts p ON u.id = p.user_id;
Use Subquery when you only need filtering:
SELECT * FROM users WHERE id IN (SELECT DISTINCT user_id FROM posts);
Part 5: Aggregation Optimization
COUNT(*) Optimization
Problem: COUNT(*) on large tables is slow
❌ Slow (counts all rows):
SELECT COUNT(*) FROM orders; -- Full table scan on 10M rows
✅ Faster (approximate count from statistics):
-- PostgreSQL: Use statistics (fast but approximate) SELECT reltuples::bigint AS estimate FROM pg_class WHERE relname = 'orders';
✅ Also Good (exact count with index):
-- If you have an index, COUNT can use index-only scan CREATE INDEX idx_orders_id ON orders(id); SELECT COUNT(*) FROM orders; -- Index-only scan
GROUP BY Optimization
❌ Bad (groups after fetching all data):
SELECT user_id, COUNT(*) FROM posts GROUP BY user_id;
✅ Good (index helps grouping):
-- Create index on grouped column CREATE INDEX idx_posts_user_id ON posts(user_id); SELECT user_id, COUNT(*) FROM posts GROUP BY user_id; -- Can use index for grouping
HAVING vs WHERE
❌ Bad (filters after grouping):
SELECT user_id, COUNT(*) as post_count FROM posts GROUP BY user_id HAVING user_id > 1000; -- Filters after grouping all users
✅ Good (filters before grouping):
SELECT user_id, COUNT(*) as post_count FROM posts WHERE user_id > 1000 -- Filters before grouping (can use index) GROUP BY user_id;
Part 6: Pagination Optimization
OFFSET is Slow for Large Offsets
❌ Bad (OFFSET scans all skipped rows):
-- Page 1000 (skips 50,000 rows, then returns 50) SELECT * FROM posts ORDER BY created_at DESC LIMIT 50 OFFSET 50000; -- Scans 50,050 rows!
✅ Good (keyset pagination with WHERE):
-- Save last seen ID from previous page SELECT * FROM posts WHERE created_at < '2024-01-15 10:30:00' -- Last seen timestamp ORDER BY created_at DESC LIMIT 50; -- Only fetches 50 rows
Implementation:
# Page 1
results = db.execute("""
SELECT id, created_at, title
FROM posts
ORDER BY created_at DESC, id DESC
LIMIT 50
""")
last_created_at = results[-1].created_at
last_id = results[-1].id
# Page 2 (using last seen values)
results = db.execute("""
SELECT id, created_at, title
FROM posts
WHERE (created_at, id) < (%s, %s)
ORDER BY created_at DESC, id DESC
LIMIT 50
""", last_created_at, last_id)
Benefits:
- •Consistent performance for any page number
- •No OFFSET overhead
- •Works with millions of rows
Part 7: Database-Specific Patterns
PostgreSQL-Specific
1. EXPLAIN ANALYZE (shows actual execution):
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = 'user@example.com';
2. pg_stat_statements (find slow queries):
-- Enable extension
CREATE EXTENSION pg_stat_statements;
-- Find slowest queries
SELECT
calls,
mean_exec_time,
total_exec_time,
query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
3. VACUUM and ANALYZE:
-- Update statistics after bulk changes VACUUM ANALYZE table_name;
MySQL-Specific
1. EXPLAIN FORMAT=JSON (detailed output):
EXPLAIN FORMAT=JSON SELECT * FROM users WHERE email = 'user@example.com';
2. SHOW PROFILE (detailed timing):
SET profiling = 1; SELECT * FROM users WHERE email = 'user@example.com'; SHOW PROFILE FOR QUERY 1;
3. Query Cache (deprecated in MySQL 8.0):
-- Old MySQL versions only SHOW VARIABLES LIKE 'query_cache%';
SQLite-Specific
1. EXPLAIN QUERY PLAN:
EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'user@example.com';
2. ANALYZE (update statistics):
ANALYZE table_name;
3. Indexes are critical (SQLite has no query optimizer):
-- SQLite relies heavily on indexes CREATE INDEX idx_users_email ON users(email);
Part 8: Monitoring and Diagnosis
Find Missing Indexes (PostgreSQL)
-- Queries doing sequential scans on large tables
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 25;
Find Unused Indexes (PostgreSQL)
-- Indexes that are never used
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey' -- Exclude primary keys
ORDER BY pg_relation_size(indexrelid) DESC;
Check Index Bloat
-- Postgres: Check for bloated indexes
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild bloated indexes
REINDEX INDEX index_name;
Part 9: Common Patterns by Use Case
Pattern 1: Recent Records with Pagination
-- Create composite index CREATE INDEX idx_posts_created_user ON posts(created_at DESC, user_id); -- Query uses index efficiently SELECT * FROM posts WHERE user_id = 123 ORDER BY created_at DESC LIMIT 20;
Pattern 2: Search Across Multiple Columns
-- Create GIN index for full-text search (Postgres)
CREATE INDEX idx_posts_search ON posts
USING gin(to_tsvector('english', title || ' ' || content));
-- Fast full-text search
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || content) @@
to_tsquery('english', 'database & optimization');
Pattern 3: Range Queries
-- Create index on range column CREATE INDEX idx_orders_created_at ON orders(created_at); -- Query uses index SELECT * FROM orders WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31' ORDER BY created_at;
Pattern 4: Distinct Values with Filter
-- Create composite index CREATE INDEX idx_posts_category_user ON posts(category_id, user_id); -- Query uses index SELECT DISTINCT category_id FROM posts WHERE user_id = 123;
Part 10: Quick Optimization Checklist
Diagnosis:
- • Run EXPLAIN ANALYZE on slow queries
- • Check for Seq Scan on tables >10K rows
- • Look for missing indexes
- • Identify N+1 query patterns
- • Check query execution time (aim for <100ms)
Index Strategy:
- • Index columns in WHERE clauses
- • Index columns in JOIN conditions
- • Index columns in ORDER BY
- • Use composite indexes for multi-column queries
- • Consider partial indexes for filtered queries
- • Remove unused indexes
Query Rewriting:
- • Select only needed columns (not SELECT *)
- • Use JOIN instead of N+1 queries
- • Use proper types (avoid implicit conversion)
- • Avoid functions in WHERE (use functional indexes)
- • Use UNION instead of OR for multiple conditions
- • Use EXISTS instead of IN for large subqueries
Pagination:
- • Use keyset pagination (not OFFSET) for large datasets
- • Include stable sort column (id) in pagination
- • Index pagination columns
Monitoring:
- • Enable query logging for slow queries
- • Monitor index usage
- • Check for index bloat
- • Regularly VACUUM ANALYZE (Postgres)
Resources
PostgreSQL:
- •EXPLAIN docs: https://www.postgresql.org/docs/current/sql-explain.html
- •Index types: https://www.postgresql.org/docs/current/indexes-types.html
- •Performance tips: https://wiki.postgresql.org/wiki/Performance_Optimization
MySQL:
- •EXPLAIN output: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html
- •Optimization: https://dev.mysql.com/doc/refman/8.0/en/optimization.html
Tools:
- •pgAdmin (Postgres GUI)
- •MySQL Workbench
- •DBeaver (multi-database)
- •explain.depesz.com (Postgres EXPLAIN visualizer)