PostgreSQL Optimization
Overview
Expert in PostgreSQL performance tuning, query optimization, and database administration. Specializes in EXPLAIN analysis, indexing strategies, connection pooling, partitioning, and production-grade PostgreSQL operations.
When to Use
- •Diagnosing slow queries with EXPLAIN ANALYZE
- •Creating optimal indexes for query patterns
- •Designing database schemas for performance
- •Configuring PostgreSQL for production workloads
- •Implementing connection pooling (PgBouncer, Supavisor)
- •Setting up partitioning for large tables
- •Analyzing and reducing lock contention
- •Migrating or upgrading PostgreSQL versions
Capabilities
Query Optimization
- •EXPLAIN / EXPLAIN ANALYZE interpretation
- •Query plan analysis and optimization
- •Identifying sequential scans vs index scans
- •Join optimization and query rewriting
- •CTE vs subquery performance trade-offs
- •Window function optimization
Indexing Strategies
- •B-tree, GIN, GiST, BRIN index selection
- •Partial indexes for filtered queries
- •Expression indexes for computed values
- •Covering indexes (INCLUDE clause)
- •Index-only scans optimization
- •Concurrent index creation
Schema Design
- •Normalization vs denormalization trade-offs
- •JSONB column design and indexing
- •Array columns and operations
- •Enum types vs lookup tables
- •Foreign key cascade strategies
- •Table inheritance and partitioning
Configuration Tuning
- •Memory settings (shared_buffers, work_mem, effective_cache_size)
- •Connection limits and pooling
- •WAL and checkpoint tuning
- •Autovacuum configuration
- •Statistics collection settings
Advanced Features
- •Partitioning (range, list, hash)
- •Materialized views with refresh strategies
- •Full-text search with tsvector/tsquery
- •PostGIS geospatial queries
- •Logical replication setup
- •pg_stat_statements analysis
Dependencies
Works well with:
- •
database-modeler- Schema design and ERD creation - •
data-pipeline-engineer- ETL and data processing - •
site-reliability-engineer- Database monitoring and alerting - •
nextjs-app-router-expert- Full-stack data fetching
Examples
Reading EXPLAIN ANALYZE Output
sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT u.*, COUNT(o.id) as order_count FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE u.created_at > '2024-01-01' GROUP BY u.id; -- Key metrics to look for: -- - "Seq Scan" on large tables → needs index -- - "Rows Removed by Filter" high → filter before join -- - "Sort Method: external merge" → increase work_mem -- - "Buffers: shared hit" vs "shared read" → cache efficiency
Creating Effective Indexes
sql
-- Basic B-tree for equality and range queries CREATE INDEX CONCURRENTLY idx_orders_user_created ON orders (user_id, created_at DESC); -- Partial index for common filter CREATE INDEX CONCURRENTLY idx_orders_pending ON orders (created_at) WHERE status = 'pending'; -- GIN index for JSONB containment queries CREATE INDEX CONCURRENTLY idx_products_metadata ON products USING GIN (metadata jsonb_path_ops); -- Covering index to enable index-only scans CREATE INDEX CONCURRENTLY idx_users_email_covering ON users (email) INCLUDE (name, created_at); -- Expression index for case-insensitive search CREATE INDEX CONCURRENTLY idx_users_email_lower ON users (LOWER(email));
Optimizing N+1 Queries
sql
-- BAD: N+1 pattern (1 + N queries) SELECT * FROM posts WHERE user_id = $1; -- Then for each post: SELECT * FROM comments WHERE post_id = $1; -- GOOD: Single query with lateral join SELECT p.*, c.comments FROM posts p LEFT JOIN LATERAL ( SELECT json_agg(c.*) as comments FROM comments c WHERE c.post_id = p.id ) c ON true WHERE p.user_id = $1; -- GOOD: Window function for aggregates SELECT p.*, COUNT(*) OVER (PARTITION BY p.user_id) as user_post_count FROM posts p WHERE p.user_id = $1;
Table Partitioning
sql
-- Create partitioned table by date range
CREATE TABLE events (
id BIGSERIAL,
event_type TEXT NOT NULL,
payload JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Automate partition creation with pg_partman
CREATE EXTENSION pg_partman;
SELECT partman.create_parent('public.events', 'created_at', 'native', 'monthly');
Connection Pooling Config (PgBouncer)
ini
; pgbouncer.ini [databases] myapp = host=localhost dbname=myapp [pgbouncer] listen_addr = 0.0.0.0 listen_port = 6432 auth_type = md5 auth_file = /etc/pgbouncer/userlist.txt ; Pool settings pool_mode = transaction ; Recommended for most apps max_client_conn = 1000 default_pool_size = 20 reserve_pool_size = 5 ; Timeouts server_idle_timeout = 600 client_idle_timeout = 0
Performance Configuration
sql
-- Check current settings SHOW shared_buffers; -- ~25% of RAM SHOW effective_cache_size; -- ~75% of RAM SHOW work_mem; -- Per-operation, start small (64MB) SHOW maintenance_work_mem; -- For VACUUM, CREATE INDEX (512MB-1GB) -- Recommended production settings (for 32GB RAM server) ALTER SYSTEM SET shared_buffers = '8GB'; ALTER SYSTEM SET effective_cache_size = '24GB'; ALTER SYSTEM SET work_mem = '64MB'; ALTER SYSTEM SET maintenance_work_mem = '1GB'; ALTER SYSTEM SET random_page_cost = 1.1; -- For SSD storage ALTER SYSTEM SET effective_io_concurrency = 200; -- For SSD -- Reload configuration SELECT pg_reload_conf();
Finding Slow Queries
sql
-- Enable pg_stat_statements CREATE EXTENSION pg_stat_statements; -- Top 10 slowest queries by total time SELECT round(total_exec_time::numeric, 2) as total_ms, calls, round(mean_exec_time::numeric, 2) as avg_ms, round((100 * total_exec_time / sum(total_exec_time) OVER())::numeric, 2) as pct, query FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10; -- Queries with most I/O SELECT round(shared_blks_read::numeric, 2) as disk_reads, round(shared_blks_hit::numeric, 2) as cache_hits, round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 2) as cache_hit_ratio, query FROM pg_stat_statements ORDER BY shared_blks_read DESC LIMIT 10;
Analyzing Table Bloat
sql
-- Check table bloat SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size, pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) as table_size, n_dead_tup, n_live_tup, round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_pct FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10; -- Manual VACUUM for critical tables VACUUM (VERBOSE, ANALYZE) orders; -- Reclaim space (requires exclusive lock) VACUUM FULL orders; -- Use during maintenance window
Best Practices
- •Always use EXPLAIN ANALYZE - Don't guess, measure actual query performance
- •Create indexes CONCURRENTLY - Avoid blocking writes during index creation
- •Partial indexes for hot paths - Index only the rows you query frequently
- •Use connection pooling - PgBouncer or Supavisor for production
- •Monitor pg_stat_statements - Track query performance over time
- •Regular ANALYZE - Keep statistics current for query planner
- •**Avoid SELECT *** - Only fetch columns you need
- •Batch large updates - Process in chunks to avoid lock contention
- •Use prepared statements - Reduce parsing overhead for repeated queries
Common Pitfalls
- •Missing indexes - Check for sequential scans on large tables
- •Over-indexing - Too many indexes slow down writes
- •work_mem too low - Causes disk-based sorts and hash joins
- •Connection exhaustion - Not using connection pooling
- •Stale statistics - Autovacuum not running frequently enough
- •Bloated tables - Not vacuuming after large deletes/updates
- •N+1 queries - Fetching related data in loops instead of joins
- •SELECT * everywhere - Fetching unnecessary columns