SQL Query Optimizer
Analyze and optimize SQL queries for better performance through query rewriting, index suggestions, and execution plan analysis.
Trigger Conditions
Activate this skill when the user:
- •Says "this query is slow", "optimize this query"
- •Asks about query performance or execution time
- •Wants to understand why a query is slow
- •Needs help with execution plans (EXPLAIN)
- •Asks about index usage or missing indexes
Optimization Workflow
1. ANALYZE -> Understand query structure and intent 2. IDENTIFY -> Find performance bottlenecks 3. MEASURE -> Review execution plan if available 4. REWRITE -> Apply optimization techniques 5. VALIDATE -> Ensure results remain correct 6. RECOMMEND -> Suggest indexes or schema changes
Pre-Flight Checklist
Before optimizing:
- [ ] Understand the query's purpose - [ ] Know the approximate data volumes - [ ] Identify current indexes - [ ] Have execution plan if possible - [ ] Verify correctness requirements
Reading Execution Plans
PostgreSQL EXPLAIN
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
Key metrics:
- •actual time: Real execution time in ms
- •rows: Estimated vs actual row counts
- •loops: Number of iterations
- •buffers: Disk I/O (shared hit = cached, read = disk)
MySQL EXPLAIN
EXPLAIN SELECT ...; EXPLAIN ANALYZE SELECT ...; -- MySQL 8.0.18+
Key columns:
- •type: Access method (ALL = bad, ref/range/const = good)
- •possible_keys: Indexes that could be used
- •key: Index actually used
- •rows: Estimated rows to examine
- •Extra: Additional info (Using filesort, Using temporary = warnings)
SQL Server
SET STATISTICS IO ON; SET STATISTICS TIME ON; -- Run query SET SHOWPLAN_ALL ON;
Key metrics:
- •logical reads: Pages read from cache
- •physical reads: Pages read from disk
- •CPU time: Processing time
Common Performance Issues
Issue 1: Full Table Scan
Symptom: Seq Scan/TABLE ACCESS FULL on large table
Causes:
- •No index on WHERE/JOIN columns
- •Function applied to indexed column
- •Type mismatch in comparison
- •LIKE with leading wildcard
Solutions:
Bad (can't use index):
SELECT * FROM users WHERE LOWER(email) = 'john@example.com'; SELECT * FROM users WHERE YEAR(created_at) = 2024; SELECT * FROM orders WHERE customer_id::text = '12345';
Good (index-friendly):
SELECT * FROM users WHERE email = 'john@example.com'; -- case-insensitive collation SELECT * FROM users WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'; SELECT * FROM orders WHERE customer_id = 12345;
Issue 2: Inefficient JOINs
Symptom: Nested loop join with high row counts
Solutions:
Ensure join columns are indexed:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Use appropriate join type:
SELECT * FROM large_table l JOIN small_table s ON l.key = s.key;
Issue 3: Subquery vs JOIN
Symptom: Correlated subquery executes once per row
Bad (executes subquery for each order):
SELECT o.*,
(SELECT name FROM customers WHERE id = o.customer_id) AS customer_name
FROM orders o;
Good (single join operation):
SELECT o.*, c.name AS customer_name FROM orders o JOIN customers c ON o.customer_id = c.id;
Issue 4: IN vs EXISTS
For checking existence (large subquery result):
Less efficient:
SELECT * FROM products WHERE category_id IN (SELECT id FROM categories WHERE active = true);
More efficient:
SELECT * FROM products p WHERE EXISTS (SELECT 1 FROM categories c WHERE c.id = p.category_id AND c.active = true);
For small, known value lists, IN is fine:
SELECT * FROM products WHERE category_id IN (1, 2, 3, 4, 5);
Issue 5: SELECT *
Problem: Retrieves unnecessary columns, can't use covering index
Bad:
SELECT * FROM orders WHERE customer_id = 123;
Good:
SELECT id, order_date, total FROM orders WHERE customer_id = 123;
With covering index:
CREATE INDEX idx_orders_customer_covering ON orders(customer_id) INCLUDE (id, order_date, total);
Issue 6: ORDER BY Without Index
Symptom: "Using filesort" in execution plan
Bad (sorts in memory/disk):
SELECT * FROM products ORDER BY created_at DESC LIMIT 10;
Good (with index):
CREATE INDEX idx_products_created_at ON products(created_at DESC); SELECT * FROM products ORDER BY created_at DESC LIMIT 10;
Issue 7: Pagination with Large Offset
Problem: OFFSET scans and discards rows
Bad (scans 10,010 rows):
SELECT * FROM products ORDER BY id LIMIT 10 OFFSET 10000;
Good (keyset/cursor pagination):
SELECT * FROM products WHERE id > 10000 ORDER BY id LIMIT 10;
Issue 8: OR Conditions
Problem: OR often prevents index usage
Bad:
SELECT * FROM users WHERE email = 'a@b.com' OR phone = '1234567890';
Good (if both columns indexed):
SELECT * FROM users WHERE email = 'a@b.com' UNION SELECT * FROM users WHERE phone = '1234567890';
Issue 9: Redundant DISTINCT
Problem: DISTINCT requires sorting/hashing all rows
Bad (if relationship guarantees uniqueness):
SELECT DISTINCT c.id, c.name FROM customers c JOIN orders o ON c.id = o.customer_id;
Good:
SELECT c.id, c.name FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
Issue 10: Functions in WHERE
Problem: Functions prevent index usage
Bad:
SELECT * FROM events WHERE DATE(event_time) = '2024-01-15';
Good:
SELECT * FROM events WHERE event_time >= '2024-01-15 00:00:00' AND event_time < '2024-01-16 00:00:00';
Optimization Techniques
Technique 1: Query Rewriting
Split complex queries:
WITH filtered_orders AS (
SELECT * FROM orders WHERE status = 'completed' AND order_date >= '2024-01-01'
)
SELECT c.name, COUNT(*)
FROM filtered_orders fo
JOIN customers c ON fo.customer_id = c.id
GROUP BY c.name;
Materialize intermediate results:
CREATE TEMP TABLE recent_orders AS SELECT * FROM orders WHERE order_date >= '2024-01-01'; CREATE INDEX idx_temp_customer ON recent_orders(customer_id); SELECT c.name, COUNT(*) FROM recent_orders ro JOIN customers c ON ro.customer_id = c.id GROUP BY c.name;
Technique 2: Index Optimization
Composite index for multiple columns:
CREATE INDEX idx_orders_status_date ON orders(status, order_date);
Partial/filtered index:
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
Covering index:
CREATE INDEX idx_orders_covering ON orders(customer_id) INCLUDE (order_date, total);
Technique 3: Batch Processing
Process in chunks instead of one large query:
DO $$
DECLARE
batch_size INT := 10000;
last_id INT := 0;
BEGIN
LOOP
UPDATE large_table
SET processed = true
WHERE id > last_id
AND id <= last_id + batch_size
AND processed = false;
EXIT WHEN NOT FOUND;
SELECT MAX(id) INTO last_id FROM large_table WHERE processed = true;
COMMIT;
END LOOP;
END $$;
Technique 4: Denormalization
Add redundant data to avoid joins:
ALTER TABLE orders ADD COLUMN customer_name VARCHAR(100); UPDATE orders o SET customer_name = (SELECT name FROM customers c WHERE c.id = o.customer_id);
Technique 5: Materialized Views
CREATE MATERIALIZED VIEW sales_summary AS
SELECT
product_id,
DATE_TRUNC('month', order_date) AS month,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_revenue
FROM order_items oi
JOIN orders o ON oi.order_id = o.id
WHERE o.status = 'completed'
GROUP BY product_id, DATE_TRUNC('month', order_date);
CREATE INDEX idx_sales_summary_product ON sales_summary(product_id);
REFRESH MATERIALIZED VIEW sales_summary;
Index Recommendations
When to Create Indexes
| Scenario | Index Type |
|---|---|
| WHERE clause equality | B-tree (default) |
| WHERE clause range | B-tree |
| Full-text search | GIN/Full-text |
| Array/JSON containment | GIN |
| Geospatial queries | GiST/SP-GiST |
| Multiple columns in WHERE | Composite |
| Frequently updated columns | Consider carefully |
Index Creation Guidelines
- •Index selective columns - High cardinality (many unique values)
- •Index foreign keys - Used in JOINs
- •Index ORDER BY columns - For sorted queries
- •Consider write overhead - Indexes slow down INSERT/UPDATE
- •Monitor unused indexes - Remove if not used
Output Format
When optimizing a query, provide:
- •Analysis: What makes the query slow
- •Optimized Query: Rewritten version
- •Explanation: What changed and why
- •Index Recommendations: Indexes to create
- •Expected Improvement: Estimated performance gain
Additional Resources
| File | Content |
|---|---|
| examples.md | Before/after optimization examples |