SQL Performance Tuning
Database-level performance optimization including indexing strategies, query profiling, schema design, and bottleneck identification.
Trigger Conditions
Activate this skill when the user:
- •Asks about database performance or slow queries
- •Needs help with indexing strategies
- •Wants to understand query execution plans
- •Has database bottlenecks to investigate
- •Needs schema optimization advice
- •Asks about database configuration tuning
Performance Tuning Workflow
code
1. BASELINE -> Measure current performance 2. IDENTIFY -> Find bottlenecks and slow queries 3. ANALYZE -> Understand root causes 4. OPTIMIZE -> Apply improvements 5. VERIFY -> Confirm improvements 6. MONITOR -> Set up ongoing monitoring
Indexing Strategies
Index Types
| Type | Use Case | Example |
|---|---|---|
| B-Tree | Default, equality/range queries | CREATE INDEX idx ON table(col) |
| Hash | Equality only (PostgreSQL) | CREATE INDEX idx ON table USING HASH(col) |
| GIN | Arrays, JSONB, full-text | CREATE INDEX idx ON table USING GIN(col) |
| GiST | Geometric, full-text, ranges | CREATE INDEX idx ON table USING GIST(col) |
| BRIN | Large tables with sorted data | CREATE INDEX idx ON table USING BRIN(col) |
When to Create Indexes
code
DO create indexes on: - Primary keys (automatic) - Foreign keys (for JOINs) - Columns in WHERE clauses - Columns in ORDER BY - Columns in GROUP BY - Columns in JOIN conditions DON'T over-index: - Tables with frequent writes - Low-cardinality columns (few unique values) - Small tables (full scan may be faster) - Columns rarely used in queries
Composite Index Guidelines
sql
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
Column Order Matters:
- •Put equality columns first
- •Put range/inequality columns last
- •Most selective columns first
code
Good for: WHERE customer_id = 123 AND order_date > '2024-01-01' Good for: WHERE customer_id = 123 Bad for: WHERE order_date > '2024-01-01' (can't use leading column)
Covering Indexes
Include non-key columns to enable index-only scans:
PostgreSQL:
sql
CREATE INDEX idx_orders_covering ON orders(customer_id) INCLUDE (order_date, total);
SQL Server:
sql
CREATE INDEX idx_orders_covering ON orders(customer_id) INCLUDE (order_date, total);
Partial/Filtered Indexes
Index only relevant rows:
sql
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active'; CREATE INDEX idx_recent_orders ON orders(customer_id) WHERE order_date >= '2024-01-01';
Expression/Functional Indexes
Index computed values:
sql
CREATE INDEX idx_users_email_lower ON users(LOWER(email)); CREATE INDEX idx_orders_year ON orders(EXTRACT(YEAR FROM order_date));
Query Profiling
PostgreSQL
Enable timing:
sql
\timing on
Execution plan:
sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
Key metrics to watch:
- •actual time: Real execution time
- •rows: Estimated vs actual (large difference = statistics issue)
- •buffers shared hit/read: Cache efficiency
- •loops: Iteration count for nested operations
Find slow queries:
sql
SELECT
query,
calls,
total_time / 1000 AS total_seconds,
mean_time / 1000 AS mean_seconds,
rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
MySQL
Enable profiling:
sql
SET profiling = 1; SHOW PROFILES; SHOW PROFILE FOR QUERY 1;
Execution plan:
sql
EXPLAIN ANALYZE SELECT ...;
Find slow queries:
sql
SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY SUM_TIMER_WAIT DESC LIMIT 20;
SQL Server
Execution plan:
sql
SET STATISTICS IO ON; SET STATISTICS TIME ON;
Find slow queries:
sql
SELECT TOP 20
qs.total_elapsed_time / qs.execution_count AS avg_elapsed,
qs.execution_count,
SUBSTRING(qt.text, qs.statement_start_offset / 2,
CASE WHEN qs.statement_end_offset = -1 THEN LEN(qt.text)
ELSE (qs.statement_end_offset - qs.statement_start_offset) / 2 END) AS query
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY avg_elapsed DESC;
Execution Plan Analysis
Common Operations
| Operation | Description | Concern Level |
|---|---|---|
| Seq Scan / Table Scan | Reads entire table | High for large tables |
| Index Scan | Uses index, fetches from table | Good |
| Index Only Scan | Uses index only, no table fetch | Best |
| Bitmap Index Scan | Uses multiple indexes | Good for OR conditions |
| Nested Loop | For each outer row, scan inner | Okay for small inner |
| Hash Join | Build hash, probe | Good for equality joins |
| Merge Join | Sort both, merge | Good for sorted data |
| Sort | Orders result set | Watch memory usage |
Warning Signs
code
- Seq Scan on large table -> Add index - High estimated vs actual rows -> Run ANALYZE - Sort with external merge -> Increase work_mem - Nested Loop with many iterations -> Check indexes on inner table - Hash/Sort spilling to disk -> Increase memory settings
Schema Optimization
Data Types
code
Use appropriate sizes: - SMALLINT (2 bytes) vs INT (4 bytes) vs BIGINT (8 bytes) - VARCHAR(50) vs VARCHAR(255) vs TEXT - TIMESTAMP vs DATE - DECIMAL(10,2) vs FLOAT Avoid: - TEXT/BLOB for small known-length strings - VARCHAR(MAX) when length is known - FLOAT for money (use DECIMAL)
Normalization vs Denormalization
code
Normalize for: - Data integrity - Reduced redundancy - Write-heavy workloads Denormalize for: - Read-heavy workloads - Eliminating expensive JOINs - Reporting/analytics queries
Table Partitioning
PostgreSQL range partitioning:
sql
CREATE TABLE orders (
id SERIAL,
order_date DATE,
total DECIMAL(10,2)
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
Benefits:
- •Partition pruning (skip irrelevant partitions)
- •Easier data archival/deletion
- •Parallel query execution
Connection and Resource Management
Connection Pooling
code
Use connection pooler (PgBouncer, ProxySQL): - Reduces connection overhead - Limits max connections - Reuses existing connections
Memory Settings
PostgreSQL:
sql
shared_buffers = 25% of RAM work_mem = 256MB (per operation, careful with concurrency) maintenance_work_mem = 1GB effective_cache_size = 75% of RAM
MySQL:
sql
innodb_buffer_pool_size = 70% of RAM query_cache_size = 0 (disabled in MySQL 8.0+) sort_buffer_size = 4MB join_buffer_size = 4MB
Lock and Concurrency Issues
Identifying Locks
PostgreSQL:
sql
SELECT
pid,
usename,
pg_blocking_pids(pid) AS blocked_by,
query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
MySQL:
sql
SELECT * FROM information_schema.INNODB_LOCK_WAITS; SHOW ENGINE INNODB STATUS;
Lock Prevention
sql
SELECT ... FOR UPDATE SKIP LOCKED;
UPDATE ... WHERE id IN (
SELECT id FROM table
WHERE condition
LIMIT 100
FOR UPDATE SKIP LOCKED
);
Deadlock Prevention
- •Access tables in consistent order
- •Keep transactions short
- •Use appropriate isolation levels
- •Implement retry logic
Maintenance Tasks
Statistics Updates
PostgreSQL:
sql
ANALYZE table_name; ANALYZE;
MySQL:
sql
ANALYZE TABLE table_name;
SQL Server:
sql
UPDATE STATISTICS table_name;
Index Maintenance
Reindex bloated indexes:
sql
REINDEX INDEX idx_name; REINDEX TABLE table_name;
Rebuild (SQL Server):
sql
ALTER INDEX idx_name ON table_name REBUILD;
Vacuuming (PostgreSQL)
sql
VACUUM table_name; VACUUM FULL table_name; VACUUM ANALYZE table_name;
Autovacuum settings:
sql
autovacuum = on autovacuum_vacuum_threshold = 50 autovacuum_analyze_threshold = 50 autovacuum_vacuum_scale_factor = 0.1
Monitoring Queries
Table Size
sql
SELECT
relname AS table,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size,
pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
Index Usage
sql
SELECT
relname AS table,
indexrelname AS index,
idx_scan AS scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
Unused Indexes
sql
SELECT
relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelid NOT IN (
SELECT conindid FROM pg_constraint WHERE contype = 'p'
)
ORDER BY pg_relation_size(indexrelid) DESC;
Cache Hit Ratio
sql
SELECT
sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS table_hit_ratio,
sum(idx_blks_hit) / nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0) AS index_hit_ratio
FROM pg_statio_user_tables;
Output Format
When providing performance recommendations:
- •Current Issue: What's causing poor performance
- •Impact: How it affects the system
- •Solution: Specific actions to take
- •Implementation: SQL/commands to run
- •Verification: How to confirm improvement
- •Trade-offs: Any downsides to consider
Additional Resources
| File | Content |
|---|---|
| examples.md | Performance tuning case studies |