Query Optimization
Analyze database queries and provide actionable optimization recommendations.
Workflow
Copy and track progress:
Query Optimization: - [ ] Identify database type - [ ] Analyze query structure - [ ] Check for red flags - [ ] Provide recommendations - [ ] Suggest verification
- •Identify database type from syntax or user context
- •Analyze query structure - parse SELECT, FROM, WHERE, JOIN, ORDER BY, GROUP BY clauses
- •Check for red flags (see Quick Reference)
- •Provide concrete recommendations with copy-pasteable code
- •Suggest verification - EXPLAIN command or metric to confirm
For complex queries (3+ JOINs or nested subqueries):
- •Decompose into logical units
- •Analyze each unit's access pattern
- •Synthesize recommendations in priority order
Analysis Checklist
For each query, evaluate:
- •Index usage: Missing or inefficient indexes
- •Scan type: Full scans on large tables
- •Join efficiency: Join order, type, missing conditions
- •Filtering: WHERE placement, function usage on columns
- •Projection: SELECT * vs specific columns
- •Pagination: OFFSET vs keyset/cursor
- •N+1 detection: Loop patterns in code context
Quick Reference
SQL Red Flags
- •
SELECT *on large tables - •
WHERE function(column)prevents index use - •
LIKE '%pattern'(leading wildcard) - •
ORconditions spanning different columns - •Large
OFFSETvalues - •
NOT INwith nullable subquery - •Missing
LIMITon unbounded queries
NoSQL Red Flags
- •MongoDB: COLLSCAN,
$where, unbounded arrays - •Redis:
KEYS *, missing TTL, wrong data structure - •DynamoDB: Scan operations, hot partitions
- •Elasticsearch: Deep pagination, wildcard prefix
Examples
Example 1: Missing Index
Input:
SELECT * FROM orders WHERE customer_email = 'user@example.com';
Output:
## Issue [HIGH] Full table scan on `orders` - no index on `customer_email` ## Impact Query time grows linearly with table size; causes lock contention ## Solution CREATE INDEX idx_orders_customer_email ON orders(customer_email); ## Verification EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_email = 'user@example.com'; -- Should show "Index Scan" instead of "Seq Scan"
Example 2: N+1 Query Pattern
Input:
users = db.query("SELECT * FROM users")
for user in users:
orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")
Output:
## Issue [HIGH] N+1 query pattern - executes 1 + N queries instead of 1 ## Impact 100 users = 101 queries; network latency multiplied ## Solution SELECT u.*, o.* FROM users u LEFT JOIN orders o ON u.id = o.user_id; ## Verification Check query count before/after; should reduce from N+1 to 1
Example 3: When NOT to Optimize
Input:
SELECT * FROM config WHERE key = 'app_version';
Output:
## No Issues Detected This query on a small lookup table (typically <100 rows) is acceptable. Adding an index would add write overhead without meaningful read benefit. ## When to Reconsider - If table grows beyond 10,000 rows - If query is called >100 times/second
Example 4: Incomplete Information
Input:
SELECT * FROM large_table WHERE status = ?;
Output:
## Issue [MEDIUM] Preliminary analysis without schema information ## Potential Issues 1. `SELECT *` - may fetch unnecessary columns 2. `status` column - if low cardinality (e.g., 5 values), index may not help ## Information Needed - Table row count - Cardinality of `status` column - Existing indexes: `SHOW INDEX FROM large_table;`
Example 5: MongoDB COLLSCAN
Input:
db.orders.find({ customer_email: "user@example.com" })
Output:
## Issue [HIGH]
COLLSCAN on `orders` - no index on `customer_email`
## Impact
Query scans entire collection; O(n) performance
## Solution
db.orders.createIndex({ customer_email: 1 })
## Verification
db.orders.find({ customer_email: "..." }).explain("executionStats")
// Should show "stage": "IXSCAN" instead of "COLLSCAN"
Response Format
Structure recommendations as:
## Issue [HIGH/MEDIUM/LOW] [Specific problem identified] ## Impact [Performance/resource cost] ## Solution [Concrete fix with code] ## Verification [EXPLAIN command or metric to confirm]
Confidence Levels
When uncertainty exists, indicate confidence:
- •HIGH: Pattern clearly matches known anti-pattern
- •MEDIUM: Likely issue, depends on data distribution/schema
- •LOW: Potential issue, needs EXPLAIN to confirm
Response Constraints
- •Max 5 recommendations per query - prioritize by impact
- •Order by severity: HIGH → MEDIUM → LOW
- •One primary fix per issue - mention alternatives briefly
- •Code must be copy-pasteable - no placeholders like
<table_name> - •Verification commands must be complete - include all flags
Prioritization Matrix
| Issue Type | Impact | Priority |
|---|---|---|
| Missing JOIN index | Query blocks | P0 |
| Full table scan (>100k rows) | Slow response | P0 |
| N+1 pattern | Latency × N | P1 |
| SELECT * | Memory/bandwidth | P1 |
| Missing LIMIT | Resource exhaustion | P1 |
| Large OFFSET | Slow pagination | P2 |
| Suboptimal ORDER BY | Sort overhead | P2 |
Reference Files
Consult these for detailed patterns (use grep to find specific sections):
- •
references/sql-patterns.md- SQL optimization- •Grep:
## Index,## Query Rewrite,## Join,## Aggregation
- •Grep:
- •
references/nosql-patterns.md- MongoDB, Redis, DynamoDB, Elasticsearch- •Grep:
## MongoDB,## Redis,## DynamoDB,## Elasticsearch
- •Grep:
- •
references/explain-analysis.md- Execution plans- •Grep:
## PostgreSQL,## MySQL,## MongoDB
- •Grep:
- •
references/common-scenarios.md- Common patterns- •Grep:
## Slow pagination,## N+1,## Dashboard,## Bulk
- •Grep:
- •
references/orm-patterns.md- ORM fixes- •Grep:
### Django,### SQLAlchemy,### ActiveRecord,### Prisma
- •Grep:
Utility Scripts
All scripts use Python standard library only (no pip install required).
Run these for automated analysis:
Query Analysis
analyze_query.py - Detect anti-patterns and score complexity:
python scripts/analyze_query.py "SELECT * FROM orders WHERE ..." python scripts/analyze_query.py --file query.sql --json
suggest_index.py - Generate index recommendations:
python scripts/suggest_index.py "SELECT ... FROM orders WHERE status = 'pending'"
EXPLAIN & Logs
parse_explain.py - Analyze EXPLAIN output:
python scripts/parse_explain.py --pg explain_output.txt python scripts/parse_explain.py --mysql explain_output.txt
diff_explain.py - Compare before/after EXPLAIN outputs:
python scripts/diff_explain.py --pg before.txt after.txt python scripts/diff_explain.py --mysql before.txt after.txt --json
parse_slow_log.py - Analyze slow query logs:
python scripts/parse_slow_log.py --pg /var/log/postgresql.log --top 20 python scripts/parse_slow_log.py --mysql slow-query.log --json
Schema & Code
analyze_schema.py - Find schema optimization opportunities:
python scripts/analyze_schema.py schema.sql
detect_orm_issues.py - Find ORM anti-patterns in code:
python scripts/detect_orm_issues.py app.py python scripts/detect_orm_issues.py --dir ./models
check_migration.py - Check migration safety:
python scripts/check_migration.py --pg migration.sql python scripts/check_migration.py --mysql migration.sql
Workflow with Scripts
For query optimization:
- •
analyze_query.py→ detect issues - •
suggest_index.py→ generate indexes - •User runs EXPLAIN (before)
- •User applies optimization
- •User runs EXPLAIN (after)
- •
diff_explain.py→ verify improvement
For verifying optimization impact:
# Save before state psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > before.txt # Apply optimization (add index, rewrite query, etc.) # Save after state psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > after.txt # Compare python scripts/diff_explain.py --pg before.txt after.txt
For codebase audit:
- •
detect_orm_issues.py --dir ./→ find N+1 patterns - •
analyze_schema.py→ check schema - •Prioritize fixes by severity
For migration review:
- •
check_migration.py→ safety check - •Fix critical issues before deployment
- •Use safe alternatives (CONCURRENTLY, etc.)
EXPLAIN Commands
Provide appropriate commands:
-- PostgreSQL EXPLAIN (ANALYZE, BUFFERS) SELECT ...; -- MySQL EXPLAIN ANALYZE SELECT ...; -- SQLite EXPLAIN QUERY PLAN SELECT ...;
// MongoDB
db.collection.find({...}).explain("executionStats")
Index Recommendation Format
When suggesting indexes:
-- PostgreSQL/MySQL CREATE INDEX [CONCURRENTLY] idx_table_columns ON table(col1, col2, col3); -- Rationale: [why this column order, covering considerations]
// MongoDB (ESR: Equality, Sort, Range)
db.collection.createIndex({ equality_field: 1, sort_field: -1, range_field: 1 })
Handling Missing Information
When user provides incomplete context:
No schema provided:
"To give precise recommendations, I need the table schema. Could you share:
- •Column data types
- •Existing indexes
- •Approximate row counts
Meanwhile, here's what I can infer from the query..."
No EXPLAIN output:
"Run this command and share the output:
sqlEXPLAIN (ANALYZE, BUFFERS) <their query>;This will show actual vs estimated rows and scan types."
Unknown database type:
"Which database are you using? (PostgreSQL, MySQL, SQLite, MongoDB, etc.) The optimization strategies differ significantly between them."
Conditional Workflows
Analyzing a query:
- •Run
analyze_query.pyfirst - •Check for anti-patterns
- •Suggest fixes with examples
Interpreting EXPLAIN output:
- •Run
parse_explain.pywith appropriate flag - •Identify warnings
- •Correlate with query structure
Full optimization request:
- •Analyze query → identify issues
- •Suggest indexes → generate CREATE statements
- •Request EXPLAIN → interpret results
- •Provide before/after comparison
Complex optimization (3+ issues or architectural decisions): Use extended reasoning to:
- •Map full query execution path
- •Identify all optimization opportunities
- •Evaluate trade-offs between approaches
- •Prioritize by impact and implementation cost
Script Error Handling
If scripts fail:
- •Check Python version (requires 3.7+)
- •Verify input format (SQL string, valid file path)
- •Fall back to manual analysis using reference files
Common errors:
- •
FileNotFoundError→ Check file path, use absolute paths - •
json.JSONDecodeError→ Input not valid JSON, check --pg/--mysql flags - •Empty output → Query may be valid, no issues detected
When scripts are unavailable: Perform manual analysis using Quick Reference red flags and reference files.
Unsupported Databases
For databases not explicitly covered (Oracle, SQL Server, CockroachDB, etc.):
- •Apply general SQL patterns from
references/sql-patterns.md - •Note: "Optimization patterns based on PostgreSQL/MySQL; verify syntax for [database]"
- •Recommend user consult database-specific documentation for EXPLAIN syntax and index creation