PostgreSQL Optimization

Overview

Expert in PostgreSQL performance tuning, query optimization, and database administration. Specializes in EXPLAIN analysis, indexing strategies, connection pooling, partitioning, and production-grade PostgreSQL operations.

When to Use

•Diagnosing slow queries with EXPLAIN ANALYZE
•Creating optimal indexes for query patterns
•Designing database schemas for performance
•Configuring PostgreSQL for production workloads
•Implementing connection pooling (PgBouncer, Supavisor)
•Setting up partitioning for large tables
•Analyzing and reducing lock contention
•Migrating or upgrading PostgreSQL versions

Capabilities

Query Optimization

•EXPLAIN / EXPLAIN ANALYZE interpretation
•Query plan analysis and optimization
•Identifying sequential scans vs index scans
•Join optimization and query rewriting
•CTE vs subquery performance trade-offs
•Window function optimization

Indexing Strategies

•B-tree, GIN, GiST, BRIN index selection
•Partial indexes for filtered queries
•Expression indexes for computed values
•Covering indexes (INCLUDE clause)
•Index-only scans optimization
•Concurrent index creation

Schema Design

•Normalization vs denormalization trade-offs
•JSONB column design and indexing
•Array columns and operations
•Enum types vs lookup tables
•Foreign key cascade strategies
•Table inheritance and partitioning

Configuration Tuning

•Memory settings (shared_buffers, work_mem, effective_cache_size)
•Connection limits and pooling
•WAL and checkpoint tuning
•Autovacuum configuration
•Statistics collection settings

Advanced Features

•Partitioning (range, list, hash)
•Materialized views with refresh strategies
•Full-text search with tsvector/tsquery
•PostGIS geospatial queries
•Logical replication setup
•pg_stat_statements analysis

Dependencies

Works well with:

•database-modeler - Schema design and ERD creation
•data-pipeline-engineer - ETL and data processing
•site-reliability-engineer - Database monitoring and alerting
•nextjs-app-router-expert - Full-stack data fetching

Examples

Reading EXPLAIN ANALYZE Output

sql

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;

-- Key metrics to look for:
-- - "Seq Scan" on large tables → needs index
-- - "Rows Removed by Filter" high → filter before join
-- - "Sort Method: external merge" → increase work_mem
-- - "Buffers: shared hit" vs "shared read" → cache efficiency

Creating Effective Indexes

sql

-- Basic B-tree for equality and range queries
CREATE INDEX CONCURRENTLY idx_orders_user_created
ON orders (user_id, created_at DESC);

-- Partial index for common filter
CREATE INDEX CONCURRENTLY idx_orders_pending
ON orders (created_at)
WHERE status = 'pending';

-- GIN index for JSONB containment queries
CREATE INDEX CONCURRENTLY idx_products_metadata
ON products USING GIN (metadata jsonb_path_ops);

-- Covering index to enable index-only scans
CREATE INDEX CONCURRENTLY idx_users_email_covering
ON users (email) INCLUDE (name, created_at);

-- Expression index for case-insensitive search
CREATE INDEX CONCURRENTLY idx_users_email_lower
ON users (LOWER(email));

Optimizing N+1 Queries

sql

-- BAD: N+1 pattern (1 + N queries)
SELECT * FROM posts WHERE user_id = $1;
-- Then for each post: SELECT * FROM comments WHERE post_id = $1;

-- GOOD: Single query with lateral join
SELECT p.*, c.comments
FROM posts p
LEFT JOIN LATERAL (
  SELECT json_agg(c.*) as comments
  FROM comments c
  WHERE c.post_id = p.id
) c ON true
WHERE p.user_id = $1;

-- GOOD: Window function for aggregates
SELECT
  p.*,
  COUNT(*) OVER (PARTITION BY p.user_id) as user_post_count
FROM posts p
WHERE p.user_id = $1;

Table Partitioning

sql

-- Create partitioned table by date range
CREATE TABLE events (
  id BIGSERIAL,
  event_type TEXT NOT NULL,
  payload JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Automate partition creation with pg_partman
CREATE EXTENSION pg_partman;
SELECT partman.create_parent('public.events', 'created_at', 'native', 'monthly');

Connection Pooling Config (PgBouncer)

ini

; pgbouncer.ini

[databases]
myapp = host=localhost dbname=myapp

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Pool settings
pool_mode = transaction        ; Recommended for most apps
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5

; Timeouts
server_idle_timeout = 600
client_idle_timeout = 0

Performance Configuration

sql

-- Check current settings
SHOW shared_buffers;        -- ~25% of RAM
SHOW effective_cache_size;  -- ~75% of RAM
SHOW work_mem;              -- Per-operation, start small (64MB)
SHOW maintenance_work_mem;  -- For VACUUM, CREATE INDEX (512MB-1GB)

-- Recommended production settings (for 32GB RAM server)
ALTER SYSTEM SET shared_buffers = '8GB';
ALTER SYSTEM SET effective_cache_size = '24GB';
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET random_page_cost = 1.1;  -- For SSD storage
ALTER SYSTEM SET effective_io_concurrency = 200;  -- For SSD

-- Reload configuration
SELECT pg_reload_conf();

Finding Slow Queries

sql

-- Enable pg_stat_statements
CREATE EXTENSION pg_stat_statements;

-- Top 10 slowest queries by total time
SELECT
  round(total_exec_time::numeric, 2) as total_ms,
  calls,
  round(mean_exec_time::numeric, 2) as avg_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER())::numeric, 2) as pct,
  query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Queries with most I/O
SELECT
  round(shared_blks_read::numeric, 2) as disk_reads,
  round(shared_blks_hit::numeric, 2) as cache_hits,
  round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 2) as cache_hit_ratio,
  query
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;

Analyzing Table Bloat

sql

-- Check table bloat
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size,
  pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) as table_size,
  n_dead_tup,
  n_live_tup,
  round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Manual VACUUM for critical tables
VACUUM (VERBOSE, ANALYZE) orders;

-- Reclaim space (requires exclusive lock)
VACUUM FULL orders;  -- Use during maintenance window

Best Practices

•Always use EXPLAIN ANALYZE - Don't guess, measure actual query performance
•Create indexes CONCURRENTLY - Avoid blocking writes during index creation
•Partial indexes for hot paths - Index only the rows you query frequently
•Use connection pooling - PgBouncer or Supavisor for production
•Monitor pg_stat_statements - Track query performance over time
•Regular ANALYZE - Keep statistics current for query planner
•**Avoid SELECT *** - Only fetch columns you need
•Batch large updates - Process in chunks to avoid lock contention
•Use prepared statements - Reduce parsing overhead for repeated queries

Common Pitfalls

•Missing indexes - Check for sequential scans on large tables
•Over-indexing - Too many indexes slow down writes
•work_mem too low - Causes disk-based sorts and hash joins
•Connection exhaustion - Not using connection pooling
•Stale statistics - Autovacuum not running frequently enough
•Bloated tables - Not vacuuming after large deletes/updates
•N+1 queries - Fetching related data in loops instead of joins
•SELECT * everywhere - Fetching unnecessary columns