Query Optimization

Analyze database queries and provide actionable optimization recommendations.

Workflow

Copy and track progress:

code

Query Optimization:
- [ ] Identify database type
- [ ] Analyze query structure
- [ ] Check for red flags
- [ ] Provide recommendations
- [ ] Suggest verification

•Identify database type from syntax or user context
•Analyze query structure - parse SELECT, FROM, WHERE, JOIN, ORDER BY, GROUP BY clauses
•Check for red flags (see Quick Reference)
•Provide concrete recommendations with copy-pasteable code
•Suggest verification - EXPLAIN command or metric to confirm

For complex queries (3+ JOINs or nested subqueries):

•Decompose into logical units
•Analyze each unit's access pattern
•Synthesize recommendations in priority order

Analysis Checklist

For each query, evaluate:

•Index usage: Missing or inefficient indexes
•Scan type: Full scans on large tables
•Join efficiency: Join order, type, missing conditions
•Filtering: WHERE placement, function usage on columns
•Projection: SELECT * vs specific columns
•Pagination: OFFSET vs keyset/cursor
•N+1 detection: Loop patterns in code context

Quick Reference

SQL Red Flags

•SELECT * on large tables
•WHERE function(column) prevents index use
•LIKE '%pattern' (leading wildcard)
•OR conditions spanning different columns
•Large OFFSET values
•NOT IN with nullable subquery
•Missing LIMIT on unbounded queries

NoSQL Red Flags

•MongoDB: COLLSCAN, $where, unbounded arrays
•Redis: KEYS *, missing TTL, wrong data structure
•DynamoDB: Scan operations, hot partitions
•Elasticsearch: Deep pagination, wildcard prefix

Examples

Example 1: Missing Index

Input:

sql

SELECT * FROM orders WHERE customer_email = 'user@example.com';

Output:

code

## Issue [HIGH]
Full table scan on `orders` - no index on `customer_email`

## Impact
Query time grows linearly with table size; causes lock contention

## Solution
CREATE INDEX idx_orders_customer_email ON orders(customer_email);

## Verification
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_email = 'user@example.com';
-- Should show "Index Scan" instead of "Seq Scan"

Example 2: N+1 Query Pattern

Input:

python

users = db.query("SELECT * FROM users")
for user in users:
    orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")

Output:

code

## Issue [HIGH]
N+1 query pattern - executes 1 + N queries instead of 1

## Impact
100 users = 101 queries; network latency multiplied

## Solution
SELECT u.*, o.* FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

## Verification
Check query count before/after; should reduce from N+1 to 1

Example 3: When NOT to Optimize

Input:

sql

SELECT * FROM config WHERE key = 'app_version';

Output:

code

## No Issues Detected

This query on a small lookup table (typically <100 rows) is acceptable.
Adding an index would add write overhead without meaningful read benefit.

## When to Reconsider
- If table grows beyond 10,000 rows
- If query is called >100 times/second

Example 4: Incomplete Information

Input:

sql

SELECT * FROM large_table WHERE status = ?;

Output:

code

## Issue [MEDIUM]
Preliminary analysis without schema information

## Potential Issues
1. `SELECT *` - may fetch unnecessary columns
2. `status` column - if low cardinality (e.g., 5 values), index may not help

## Information Needed
- Table row count
- Cardinality of `status` column
- Existing indexes: `SHOW INDEX FROM large_table;`

Example 5: MongoDB COLLSCAN

Input:

javascript

db.orders.find({ customer_email: "user@example.com" })

Output:

code

## Issue [HIGH]
COLLSCAN on `orders` - no index on `customer_email`

## Impact
Query scans entire collection; O(n) performance

## Solution
db.orders.createIndex({ customer_email: 1 })

## Verification
db.orders.find({ customer_email: "..." }).explain("executionStats")
// Should show "stage": "IXSCAN" instead of "COLLSCAN"

Response Format

Structure recommendations as:

code

## Issue [HIGH/MEDIUM/LOW]
[Specific problem identified]

## Impact
[Performance/resource cost]

## Solution
[Concrete fix with code]

## Verification
[EXPLAIN command or metric to confirm]

Confidence Levels

When uncertainty exists, indicate confidence:

•HIGH: Pattern clearly matches known anti-pattern
•MEDIUM: Likely issue, depends on data distribution/schema
•LOW: Potential issue, needs EXPLAIN to confirm

Response Constraints

•Max 5 recommendations per query - prioritize by impact
•Order by severity: HIGH → MEDIUM → LOW
•One primary fix per issue - mention alternatives briefly
•Code must be copy-pasteable - no placeholders like <table_name>
•Verification commands must be complete - include all flags

Prioritization Matrix

Issue Type	Impact	Priority
Missing JOIN index	Query blocks	P0
Full table scan (>100k rows)	Slow response	P0
N+1 pattern	Latency × N	P1
SELECT *	Memory/bandwidth	P1
Missing LIMIT	Resource exhaustion	P1
Large OFFSET	Slow pagination	P2
Suboptimal ORDER BY	Sort overhead	P2

Reference Files

Consult these for detailed patterns (use grep to find specific sections):

•
references/sql-patterns.md - SQL optimization
- •Grep: ## Index, ## Query Rewrite, ## Join, ## Aggregation
•
references/nosql-patterns.md - MongoDB, Redis, DynamoDB, Elasticsearch
- •Grep: ## MongoDB, ## Redis, ## DynamoDB, ## Elasticsearch
•
references/explain-analysis.md - Execution plans
- •Grep: ## PostgreSQL, ## MySQL, ## MongoDB
•
references/common-scenarios.md - Common patterns
- •Grep: ## Slow pagination, ## N+1, ## Dashboard, ## Bulk
•
references/orm-patterns.md - ORM fixes
- •Grep: ### Django, ### SQLAlchemy, ### ActiveRecord, ### Prisma

Utility Scripts

All scripts use Python standard library only (no pip install required).

Run these for automated analysis:

Query Analysis

analyze_query.py - Detect anti-patterns and score complexity:

bash

python scripts/analyze_query.py "SELECT * FROM orders WHERE ..."
python scripts/analyze_query.py --file query.sql --json

suggest_index.py - Generate index recommendations:

bash

python scripts/suggest_index.py "SELECT ... FROM orders WHERE status = 'pending'"

EXPLAIN & Logs

parse_explain.py - Analyze EXPLAIN output:

bash

python scripts/parse_explain.py --pg explain_output.txt
python scripts/parse_explain.py --mysql explain_output.txt

diff_explain.py - Compare before/after EXPLAIN outputs:

bash

python scripts/diff_explain.py --pg before.txt after.txt
python scripts/diff_explain.py --mysql before.txt after.txt --json

parse_slow_log.py - Analyze slow query logs:

bash

python scripts/parse_slow_log.py --pg /var/log/postgresql.log --top 20
python scripts/parse_slow_log.py --mysql slow-query.log --json

Schema & Code

analyze_schema.py - Find schema optimization opportunities:

bash

python scripts/analyze_schema.py schema.sql

detect_orm_issues.py - Find ORM anti-patterns in code:

bash

python scripts/detect_orm_issues.py app.py
python scripts/detect_orm_issues.py --dir ./models

check_migration.py - Check migration safety:

bash

python scripts/check_migration.py --pg migration.sql
python scripts/check_migration.py --mysql migration.sql

Workflow with Scripts

For query optimization:

•analyze_query.py → detect issues
•suggest_index.py → generate indexes
•User runs EXPLAIN (before)
•User applies optimization
•User runs EXPLAIN (after)
•diff_explain.py → verify improvement

For verifying optimization impact:

bash

# Save before state
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > before.txt

# Apply optimization (add index, rewrite query, etc.)

# Save after state  
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > after.txt

# Compare
python scripts/diff_explain.py --pg before.txt after.txt

For codebase audit:

•detect_orm_issues.py --dir ./ → find N+1 patterns
•analyze_schema.py → check schema
•Prioritize fixes by severity

For migration review:

•check_migration.py → safety check
•Fix critical issues before deployment
•Use safe alternatives (CONCURRENTLY, etc.)

EXPLAIN Commands

Provide appropriate commands:

sql

-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- MySQL
EXPLAIN ANALYZE SELECT ...;

-- SQLite
EXPLAIN QUERY PLAN SELECT ...;

javascript

// MongoDB
db.collection.find({...}).explain("executionStats")

Index Recommendation Format

When suggesting indexes:

sql

-- PostgreSQL/MySQL
CREATE INDEX [CONCURRENTLY] idx_table_columns 
ON table(col1, col2, col3);
-- Rationale: [why this column order, covering considerations]

javascript

// MongoDB (ESR: Equality, Sort, Range)
db.collection.createIndex({ equality_field: 1, sort_field: -1, range_field: 1 })

Handling Missing Information

When user provides incomplete context:

No schema provided:

"To give precise recommendations, I need the table schema. Could you share:

•Column data types

•Existing indexes

•Approximate row counts

Meanwhile, here's what I can infer from the query..."

No EXPLAIN output:

"Run this command and share the output:
sql
EXPLAIN (ANALYZE, BUFFERS) <their query>;
This will show actual vs estimated rows and scan types."

Unknown database type:

"Which database are you using? (PostgreSQL, MySQL, SQLite, MongoDB, etc.) The optimization strategies differ significantly between them."

Conditional Workflows

Analyzing a query:

•Run analyze_query.py first
•Check for anti-patterns
•Suggest fixes with examples

Interpreting EXPLAIN output:

•Run parse_explain.py with appropriate flag
•Identify warnings
•Correlate with query structure

Full optimization request:

•Analyze query → identify issues
•Suggest indexes → generate CREATE statements
•Request EXPLAIN → interpret results
•Provide before/after comparison

Complex optimization (3+ issues or architectural decisions): Use extended reasoning to:

•Map full query execution path
•Identify all optimization opportunities
•Evaluate trade-offs between approaches
•Prioritize by impact and implementation cost

Script Error Handling

If scripts fail:

•Check Python version (requires 3.7+)
•Verify input format (SQL string, valid file path)
•Fall back to manual analysis using reference files

Common errors:

•FileNotFoundError → Check file path, use absolute paths
•json.JSONDecodeError → Input not valid JSON, check --pg/--mysql flags
•Empty output → Query may be valid, no issues detected

When scripts are unavailable: Perform manual analysis using Quick Reference red flags and reference files.

Unsupported Databases

For databases not explicitly covered (Oracle, SQL Server, CockroachDB, etc.):

•Apply general SQL patterns from references/sql-patterns.md
•Note: "Optimization patterns based on PostgreSQL/MySQL; verify syntax for [database]"
•Recommend user consult database-specific documentation for EXPLAIN syntax and index creation