---
name: performance-optimization
description: >
  Use when profiling and optimizing code for performance: bottleneck
  identification, caching, async patterns, DB query optimization, memory
  reduction, and latency improvement.
trigger: >
  When the user reports slow performance, asks for optimization or profiling,
  or mentions high latency, slow queries, memory pressure, or CPU spikes.
references:
- CLAUDE.md [ARCHITECTURE] caching and async patterns
---
Performance Optimization Skill
Purpose
Systematically identify and resolve performance bottlenecks using profiling, categorization, and targeted optimization techniques. Every optimization must be backed by before/after benchmarks and protected by regression tests.
Workflow — Follow Each Step in Order
Step 1: Identify the Performance Complaint
Ask clarifying questions or inspect the reported issue to determine what is slow:
- Slow endpoint: HTTP response time exceeds the acceptable threshold
- Slow query: Database queries take too long or run too frequently
- High memory: Process memory grows unbounded or exceeds limits
- High CPU: CPU saturation during specific operations
- Slow startup: Application boot time is excessive
- Slow batch job: Scheduled or background tasks take too long
Document the specific symptom, the affected code path, and any available metrics (response time, p50/p95/p99 latencies, memory usage, query counts).
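When raw timing samples are available, the percentile latencies mentioned above can be derived with the standard library. This is a minimal sketch, assuming the samples are response times in milliseconds; the `latency_percentiles` helper and the sample data are illustrative and not part of this skill.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p95, and p99 for a list of latency samples (hypothetical helper)."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Illustrative data: 1 ms, 2 ms, ..., 101 ms.
samples = list(range(1, 102))
print(latency_percentiles(samples))
```

Recording the same three percentiles before and after a change keeps tail latency visible, which an average alone would hide.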
Step 2: Explore the Relevant Code Path End-to-End
Trace the full execution path from entry point to completion:
- Identify the entry point (route handler, CLI command, cron job, message consumer)
- Map every function call, service interaction, and I/O operation in the path
- Note external dependencies: databases, caches, APIs, file systems, queues
- Identify any middleware, decorators, or interceptors in the path
- Check for existing caching, connection pooling, or optimization attempts
Step 3: Profile — Add Timing and Logging at Each Stage
Insert instrumentation to measure where time is actually spent:
- Add timing decorators or context managers around each major stage
- Log wall-clock time, CPU time, and I/O wait time separately
- For database paths: enable query logging with execution times
- For HTTP paths: log time-to-first-byte and total response time
- For memory concerns: snapshot heap usage before and after key operations
Example instrumentation pattern:
```python
import logging
import time

logger = logging.getLogger(__name__)


def timed_section(name):
    """Return a context manager that logs the wall-clock time of its body."""

    class Timer:
        def __enter__(self):
            self.start = time.perf_counter()
            return self

        def __exit__(self, *args):
            elapsed = time.perf_counter() - self.start
            logger.info("[PERF] %s: %.4fs", name, elapsed)

    return Timer()


# Usage (db, User, and user_id come from the surrounding application)
with timed_section("fetch_user_data"):
    user = db.query(User).filter_by(id=user_id).first()
```
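The pattern above reports wall-clock time only. To separate CPU time from waiting and to capture a heap snapshot for the same section, one possible extension uses `time.process_time` and `tracemalloc` from the standard library. The `measure` helper and the result keys are assumptions made for illustration, and `tracemalloc` adds overhead, so it is better suited to diagnosis than to production paths.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(name, results):
    """Record wall time, CPU time, and peak traced memory (hypothetical helper)."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        cpu = time.process_time() - cpu_start
        wall = time.perf_counter() - wall_start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[name] = {
            "wall_s": wall,
            "cpu_s": cpu,
            # Wall time not spent on the CPU approximates I/O or sleep wait.
            "wait_s": max(wall - cpu, 0.0),
            "peak_bytes": peak,
        }


results = {}
with measure("sleep_then_allocate", results):
    time.sleep(0.05)        # stands in for an I/O wait
    data = [0] * 100_000    # stands in for a large allocation
print(results["sleep_then_allocate"])
```

A large gap between `wall_s` and `cpu_s` points toward an I/O-bound stage, while a high `peak_bytes` points toward a memory concern.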
Step 4: Categorize the Bottleneck
Based on profiling results, classify the bottleneck into one of four categories:
| Category | Indicators |
|---|---|
| CPU-bound | High CPU usage, slow computation, no I/O wait |
| I/O-bound | Low CPU, long wait times on network/disk, many sequential calls |
| Query-bound | Slow SQL queries, high query count (N+1), missing indexes |
| Memory-bound | High memory usage, frequent GC, large object allocations |
If multiple categories apply, address them in order of greatest impact.
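As a rough aid, the timing-based categories in the table can be approximated from the measurements collected in Step 3. The sketch below is an assumption-laden heuristic: the `classify_bottleneck` function and its 0.8 and 0.5 thresholds are illustrative choices, and memory-bound problems need heap measurements rather than timings, so they are not covered here.

```python
def classify_bottleneck(wall_s, cpu_s, db_s=0.0, cpu_ratio=0.8, db_ratio=0.5):
    """Guess a bottleneck category from stage timings (illustrative thresholds)."""
    if wall_s <= 0:
        raise ValueError("wall_s must be positive")
    if db_s / wall_s >= db_ratio:
        return "query-bound"    # most of the time is spent inside the database
    if cpu_s / wall_s >= cpu_ratio:
        return "cpu-bound"      # the process was computing almost the whole time
    return "io-bound"           # the process was mostly waiting


print(classify_bottleneck(wall_s=1.0, cpu_s=0.9))
print(classify_bottleneck(wall_s=1.0, cpu_s=0.1))
print(classify_bottleneck(wall_s=1.0, cpu_s=0.1, db_s=0.7))
```

Treat the result as a starting hypothesis to confirm against the profiling data, not as a verdict.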
Step 5: Fix I/O-Bound Bottlenecks
Apply these techniques for I/O-bound problems:
- Convert sequential I/O to async: Use `asyncio`, `aiohttp`, `httpx` for concurrent HTTP calls; use async DB drivers (`asyncpg`, `aiomysql`)
- Add connection pooling: Configure pool sizes for DB connections, HTTP sessions, and Redis connections; reuse connections across requests
- Batch requests: Replace N individual API calls with a single batch call; use `IN` clauses instead of loops of single-row queries
- Add timeouts: Set connect and read timeouts on all external calls to prevent indefinite waits
- Use streaming: For large payloads, stream data instead of buffering the entire response in memory
```python
# BEFORE: Sequential HTTP calls (urls is a list supplied by the caller)
import requests

results = []
for url in urls:
    resp = requests.get(url, timeout=10)
    results.append(resp.json())

# AFTER: Concurrent HTTP calls
import asyncio
import httpx

async def fetch_all(urls):
    async with httpx.AsyncClient(timeout=10.0) as client:
        tasks = [client.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)
        return [r.json() for r in responses]
```
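Unbounded concurrency can overwhelm a downstream service or exhaust a connection pool, so it is often worth capping the number of calls in flight and giving each call a timeout. The following sketch uses only `asyncio`; `fake_fetch` is a stand-in for a real network call, and the `limit` and `timeout` defaults are arbitrary illustrative values.

```python
import asyncio


async def fake_fetch(item, delay):
    """Stand-in for a network call; a real client call would go here."""
    await asyncio.sleep(delay)
    return f"ok:{item}"


async def fetch_bounded(items, delay=0.01, limit=5, timeout=1.0):
    """Run fake_fetch for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def one(item):
        async with semaphore:
            try:
                return await asyncio.wait_for(fake_fetch(item, delay), timeout)
            except asyncio.TimeoutError:
                return None  # the caller decides how to handle a timed-out call

    return await asyncio.gather(*(one(item) for item in items))


results = asyncio.run(fetch_bounded(range(10)))
print(results)
```

`asyncio.gather` preserves input order, so each result lines up with the item that produced it.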
Step 6: Fix CPU-Bound Bottlenecks
Apply these techniques for CPU-bound problems:
- Optimize the algorithm: Replace O(n^2) with O(n log n); use appropriate data structures (sets for membership, dicts for lookup)
- Add caching: Use `functools.lru_cache` for pure functions; use Redis or Memcached for shared/distributed caching; set appropriate TTLs
- Precompute: Move expensive computations to build time or startup time when inputs are known in advance
- Offload to async workers: Use Celery, RQ, or similar task queues to move heavy computation out of the request path
- Use efficient serialization: Replace JSON with msgpack or protobuf for internal communication; avoid repeated serialization
```python
# BEFORE: Recomputes expensive result every call
def get_report(user_id):
    data = fetch_all_transactions(user_id)
    return compute_aggregates(data)  # expensive

# AFTER: Cache with TTL
import json

import redis

cache = redis.Redis()

def get_report(user_id):
    cache_key = f"report:{user_id}"
    cached = cache.get(cache_key)
    if cached is not None:  # a cached empty result is still a hit
        return json.loads(cached)
    data = fetch_all_transactions(user_id)
    result = compute_aggregates(data)
    cache.setex(cache_key, 300, json.dumps(result))  # 5 min TTL
    return result
```
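For pure functions that run inside a single process, `functools.lru_cache` avoids the network round trip of an external cache. This is a minimal sketch; `shipping_cost` and its formula are invented for illustration, and the global counter exists only to show that the second call does not recompute.

```python
from functools import lru_cache

call_count = 0


@lru_cache(maxsize=1024)
def shipping_cost(weight_kg, zone):
    """Return a cost that depends only on the arguments, so caching is safe."""
    global call_count
    call_count += 1  # counts real computations, not cache hits
    return round(weight_kg * 1.5 + zone * 2.0, 2)


shipping_cost(2, 1)
shipping_cost(2, 1)  # served from the cache
print(call_count, shipping_cost.cache_info())
```

Unlike the Redis version, this cache has no TTL and is not shared between processes, so it fits values that never go stale for a given input.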
Step 7: Fix Query-Bound Bottlenecks
Apply these techniques for database query problems:
- Analyze query plans: Run `EXPLAIN ANALYZE` on slow queries; look for sequential scans, nested loops, and high row estimates
- Add indexes: Create indexes on columns used in WHERE, JOIN, and ORDER BY clauses; use composite indexes for multi-column filters
- Fix N+1 queries: Use eager loading (`joinedload`, `selectinload` in SQLAlchemy; `select_related`, `prefetch_related` in Django)
- Optimize ORM usage: Use `.only()` or `.defer()` to load only needed columns; avoid loading entire objects when only IDs are needed
- Use database-level aggregation: Push SUM, COUNT, GROUP BY to the database instead of computing in application code
- Add query result caching: Cache frequently-read, rarely-changed query results with appropriate invalidation
```python
# BEFORE: N+1 query (1 query for orders + N queries for users)
orders = session.query(Order).all()
for order in orders:
    print(order.user.name)  # triggers lazy load each iteration

# AFTER: Eager load with joinedload (1 query total)
from sqlalchemy.orm import joinedload

orders = session.query(Order).options(joinedload(Order.user)).all()
for order in orders:
    print(order.user.name)  # no additional query
```
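The same N+1 shape appears without an ORM, where the fix is a single `IN` query. The sketch below uses an in-memory SQLite database so that it runs standalone; the schema, the data, and both helper functions are illustrative assumptions. It counts executed statements with `set_trace_callback` to make the difference measurable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'ada'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 1);
    """
)

statements = []
conn.set_trace_callback(statements.append)  # records every executed statement


def names_n_plus_one(user_ids):
    """Issue one query per user id."""
    return {
        uid: conn.execute("SELECT name FROM users WHERE id = ?", (uid,)).fetchone()[0]
        for uid in user_ids
    }


def names_batched(user_ids):
    """Issue one query for all ids, using an IN clause with bound parameters."""
    placeholders = ", ".join("?" for _ in user_ids)
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", list(user_ids)
    )
    return dict(rows.fetchall())


user_ids = sorted({row[0] for row in conn.execute("SELECT user_id FROM orders")})
statements.clear()
slow = names_n_plus_one(user_ids)
n_plus_one_queries = len(statements)
statements.clear()
fast = names_batched(user_ids)
batched_queries = len(statements)
print(n_plus_one_queries, batched_queries)
```

Only the placeholders are interpolated into the SQL string; the values themselves stay bound parameters, which keeps the batched query safe from injection.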
Step 8: Fix Memory-Bound Bottlenecks
Apply these techniques for memory problems:
- Use generators: Replace list comprehensions with generator expressions for large datasets; use `yield` instead of building lists in memory
- Reduce data copies: Avoid unnecessary `.copy()` calls; use views (for example `memoryview` or NumPy slices) where possible, since slicing a built-in list copies it; process data in chunks
- Profile with memory_profiler: Use the `@profile` decorator or `memory_usage()` to identify allocation hotspots
- Use appropriate data types: Use `__slots__` on frequently-instantiated classes; use the `array` module or NumPy for numeric data instead of lists
- Implement streaming processing: Process large files line-by-line instead of loading them entirely into memory
```python
# BEFORE: Loads all rows into memory
def process_large_file(path):
    with open(path) as f:
        data = f.readlines()  # entire file in memory
    return [transform(line) for line in data]

# AFTER: Generator-based streaming
def process_large_file(path):
    with open(path) as f:
        for line in f:  # one line at a time
            yield transform(line)
```
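When per-item processing is too fine-grained, for example when writing to a database in batches, chunking keeps memory bounded while still amortizing per-call overhead. A minimal sketch, assuming a chunk size of 1000 is acceptable; `chunked` and `total_in_chunks` are hypothetical helpers.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items without materializing the whole input."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def total_in_chunks(numbers, size=1000):
    """Sum a large stream one chunk at a time (hypothetical workload)."""
    return sum(sum(chunk) for chunk in chunked(numbers, size))


print(list(chunked(range(7), 3)))
print(total_in_chunks(range(1_000_000)))
```

At any moment only one chunk is held in memory, so peak usage depends on `size` rather than on the length of the input.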
Step 9: Implement Fix with Before/After Benchmarks
- Record the baseline metric before making changes (response time, query count, memory peak, CPU usage)
- Implement the optimization using the technique from the applicable step above
- Record the same metric after the change
- Document the improvement as a percentage and absolute value
- If improvement is less than 10%, reconsider whether the optimization is worthwhile given added complexity
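The benchmark steps above can be sketched with `timeit`. The two `find_common_*` functions are an invented workload that stands in for the code being optimized, and `improvement` is a hypothetical helper that reports both the absolute and the percentage gain.

```python
import timeit


def find_common_slow(a, b):
    """O(n*m): list membership test inside a loop."""
    return [x for x in a if x in b]


def find_common_fast(a, b):
    """O(n+m): set membership test."""
    b_set = set(b)
    return [x for x in a if x in b_set]


def improvement(before_s, after_s):
    """Return (absolute seconds saved, percent improvement)."""
    saved = before_s - after_s
    return saved, 100.0 * saved / before_s


a = list(range(1000))
b = list(range(500, 1500))
# Taking min() over several repeats reduces noise from other processes.
before = min(timeit.repeat(lambda: find_common_slow(a, b), number=3, repeat=3))
after = min(timeit.repeat(lambda: find_common_fast(a, b), number=3, repeat=3))
saved, pct = improvement(before, after)
print(f"before={before:.4f}s after={after:.4f}s saved={saved:.4f}s ({pct:.1f}%)")
```

Running both versions on identical inputs in the same process keeps the comparison fair; the absolute numbers will vary by machine.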
Step 10: Write Regression Test
Create a test that will fail if the performance bottleneck is reintroduced:
```python
import time

import pytest


def test_endpoint_response_time(client):
    """Regression test: /api/reports must respond within 500ms."""
    # client is a test-client fixture, for example the one from pytest-django
    start = time.perf_counter()
    response = client.get("/api/reports")
    elapsed = time.perf_counter() - start
    assert response.status_code == 200
    assert elapsed < 0.5, f"Response took {elapsed:.2f}s, exceeds 500ms threshold"


@pytest.mark.django_db
def test_query_count(django_assert_max_num_queries):
    """Regression test: report generation must use <= 5 queries."""
    # django_assert_num_queries would require exactly 5 queries
    with django_assert_max_num_queries(5):
        generate_report(user_id=1)
```
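Single-shot timing assertions tend to be flaky on shared CI machines. One way to reduce false failures, offered here as a sketch rather than a prescribed pattern, is to time several runs and assert on the fastest. The `best_of` helper, the sorting workload, and the 100 ms budget are illustrative assumptions.

```python
import time


def best_of(func, repeats=5):
    """Return the fastest wall-clock time over several runs of func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def test_sort_stays_fast():
    """Hypothetical regression test: sorting 10,000 integers stays under 100 ms."""
    data = list(range(10_000, 0, -1))
    elapsed = best_of(lambda: sorted(data))
    assert elapsed < 0.1, f"best run took {elapsed:.3f}s, exceeds 100ms budget"
```

The minimum is the measurement least disturbed by unrelated load, which makes the threshold easier to set with a modest safety margin.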
Step 11: Verify
Run the complete verification sequence:
- Run the specific benchmark to confirm the improvement meets expectations
- Run the full test suite to confirm no regressions
- Review the change against CLAUDE.md [ARCHITECTURE] caching and async patterns
- Confirm the optimization does not introduce new failure modes (cache stampede, connection pool exhaustion, race conditions)
- Document the optimization: what was slow, why, what was changed, and the measured improvement
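Of the failure modes listed above, a cache stampede is the easiest to introduce by accident: many callers miss the cache at once and all recompute the same value. The sketch below shows one in-process mitigation using a per-key lock. `SingleFlightCache` is a hypothetical class with no TTL or eviction, and a cache shared across processes would need a distributed lock instead.

```python
import threading


class SingleFlightCache:
    """In-process cache where only one thread computes a missing key."""

    def __init__(self):
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the _locks dictionary

    def get(self, key, compute):
        if key in self._values:
            return self._values[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled the key while we waited.
            if key not in self._values:
                self._values[key] = compute()
            return self._values[key]


calls = []


def expensive():
    calls.append(1)  # records each real computation
    return 42


cache = SingleFlightCache()
threads = [
    threading.Thread(target=cache.get, args=("report", expensive)) for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), cache.get("report", expensive))
```

Eight threads request the same key, but the re-check inside the lock ensures `expensive` runs only once.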
Enforced Standards
Google-Style Docstrings (MANDATORY)
Every function, method, and class written or modified during performance optimization MUST have a Google-style docstring. No exceptions. This includes:
- One-line summary in imperative mood
- Args section for all parameters
- Returns section describing what is returned
- Raises section for all exceptions
- See CLAUDE.md [STANDARDS] for full specification and examples.
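As an illustration of the four required parts, a docstring in this style might look like the following. The `cache_key` function is invented for the example, and the project-specific rules in CLAUDE.md [STANDARDS] take precedence wherever they differ from this sketch.

```python
def cache_key(prefix, user_id):
    """Build the cache key for a user-scoped entry.

    Args:
        prefix: Namespace for the cached value, for example "report".
        user_id: Positive integer identifying the user.

    Returns:
        The key as a string in the form "<prefix>:<user_id>".

    Raises:
        ValueError: If user_id is not a positive integer.
    """
    if not isinstance(user_id, int) or user_id <= 0:
        raise ValueError("user_id must be a positive integer")
    return f"{prefix}:{user_id}"


print(cache_key("report", 7))
```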
Git Commit Format (MANDATORY)
All commits created during performance optimization MUST follow this format:
- Signed commits: Always use `git commit -S`
- Semantic prefix: `feat:`, `fix:`, `refactor:`, `test:`, `docs:`, `chore:`, `ci:`
- File-change table in the commit body:
```
type: concise description

| File (Location) | Summary of Change |
|---|---|
| path/to/file.py | What changed in this file |

Author: PrabhukumarSivamoorthy@gmail.com
```
- See CLAUDE.md [GIT] for full specification.
Checklist Before Completion
- [ ] Bottleneck identified and categorized
- [ ] Root cause understood (not just symptom treated)
- [ ] Before/after benchmarks recorded
- [ ] Regression test written and passing
- [ ] Full test suite passing
- [ ] No new failure modes introduced
- [ ] Change documented with metrics
- [ ] Consistent with CLAUDE.md [ARCHITECTURE] patterns