Python Performance Optimization
Expert guidance for profiling, optimizing, and accelerating Python applications through systematic analysis, algorithmic improvements, efficient data structures, and acceleration techniques.
When to Use This Skill
- •Code runs too slowly for production requirements
- •High CPU usage or memory consumption issues
- •Need to reduce API response times or batch processing duration
- •Application fails to scale under load
- •Optimizing data processing pipelines or scientific computing
- •Reducing cloud infrastructure costs through efficiency gains
- •Profile-guided optimization after measuring performance bottlenecks
Core Concepts
The Golden Rule: Never optimize without profiling first. 80% of execution time is spent in 20% of code.
Optimization Hierarchy (in priority order):
- •Algorithm complexity - O(n²) → O(n log n) provides exponential gains
- •Data structure choice - List → Set for lookups (10,000x faster)
- •Language features - Comprehensions, built-ins, generators
- •Caching - Memoization for repeated calculations
- •Compiled extensions - NumPy, Numba, Cython for hot paths
- •Parallelism - Multiprocessing for CPU-bound work
Key Principle: Algorithmic improvements beat micro-optimizations every time.
Quick Reference
Load detailed guides for specific optimization areas:
| Task | Load reference |
|---|---|
| Profile code and find bottlenecks | skills/python-performance-optimization/references/profiling.md |
| Algorithm and data structure optimization | skills/python-performance-optimization/references/algorithms.md |
| Memory optimization and generators | skills/python-performance-optimization/references/memory.md |
| String concatenation and file I/O | skills/python-performance-optimization/references/string-io.md |
| NumPy, Numba, Cython, multiprocessing | skills/python-performance-optimization/references/acceleration.md |
Optimization Workflow
Phase 1: Measure
- •Profile with cProfile - Identify slow functions
- •Line profile hot paths - Find exact slow lines
- •Memory profile - Check for memory bottlenecks
- •Benchmark baseline - Record current performance
Phase 2: Analyze
- •Check algorithm complexity - Is it O(n²) or worse?
- •Evaluate data structures - Are you using lists for lookups?
- •Identify repeated work - Can results be cached?
- •Find I/O bottlenecks - Database queries, file operations
Phase 3: Optimize
- •Improve algorithms first - Biggest impact
- •Use appropriate data structures - Set/dict for O(1) lookups
- •Apply caching -
@lru_cachefor expensive functions - •Use generators - For large datasets
- •Leverage NumPy/Numba - For numerical code
- •Parallelize - Multiprocessing for CPU-bound tasks
Phase 4: Validate
- •Re-profile - Verify improvements
- •Benchmark - Measure speedup quantitatively
- •Test correctness - Ensure optimizations didn't break functionality
- •Document - Explain why optimization was needed
Common Optimization Patterns
Pattern 1: Replace List with Set for Lookups
python
# Slow: O(n) lookup if item in large_list: # Bad # Fast: O(1) lookup if item in large_set: # Good
Pattern 2: Use Comprehensions
python
# Slower
result = []
for i in range(n):
result.append(i * 2)
# Faster (35% speedup)
result = [i * 2 for i in range(n)]
Pattern 3: Cache Expensive Calculations
python
from functools import lru_cache
@lru_cache(maxsize=None)
def expensive_function(n):
# Result cached automatically
return complex_calculation(n)
Pattern 4: Use Generators for Large Data
python
# Memory inefficient
def read_file(path):
return [line for line in open(path)] # Loads entire file
# Memory efficient
def read_file(path):
for line in open(path): # Streams line by line
yield line.strip()
Pattern 5: Vectorize with NumPy
python
# Pure Python: ~500ms result = sum(i**2 for i in range(1000000)) # NumPy: ~5ms (100x faster) import numpy as np result = np.sum(np.arange(1000000)**2)
Common Mistakes to Avoid
- •Optimizing before profiling - You'll optimize the wrong code
- •Using lists for membership tests - Use sets/dicts instead
- •String concatenation in loops - Use
"".join()orStringIO - •Loading entire files into memory - Use generators
- •N+1 database queries - Use JOINs or batch queries
- •Ignoring built-in functions - They're C-optimized and fast
- •Premature optimization - Focus on algorithmic improvements first
- •Not benchmarking - Always measure improvements quantitatively
Decision Tree
Start here: Profile with cProfile to find bottlenecks
Hot path is algorithm?
- •Yes → Check complexity, improve algorithm, use better data structures
- •No → Continue
Hot path is computation?
- •Numerical loops → NumPy or Numba
- •CPU-bound → Multiprocessing
- •Already fast enough → Done
Hot path is memory?
- •Large data → Generators, streaming
- •Many objects →
__slots__, object pooling - •Caching needed →
@lru_cacheor custom cache
Hot path is I/O?
- •Database → Batch queries, indexes, connection pooling
- •Files → Buffering, streaming
- •Network → Async I/O, request batching
Best Practices
- •Profile before optimizing - Measure to find real bottlenecks
- •Optimize algorithms first - O(n²) → O(n) beats micro-optimizations
- •Use appropriate data structures - Set/dict for lookups, not lists
- •Leverage built-ins - C-implemented built-ins are faster than pure Python
- •Avoid premature optimization - Optimize hot paths identified by profiling
- •Use generators for large data - Reduce memory usage with lazy evaluation
- •Batch operations - Minimize overhead from syscalls and network requests
- •Cache expensive computations - Use
@lru_cacheor custom caching - •Consider NumPy/Numba - Vectorization and JIT for numerical code
- •Parallelize CPU-bound work - Use multiprocessing to utilize all cores
Resources
- •Python Performance: https://wiki.python.org/moin/PythonSpeed
- •cProfile: https://docs.python.org/3/library/profile.html
- •NumPy: https://numpy.org/doc/stable/user/absolute_beginners.html
- •Numba: https://numba.pydata.org/
- •Cython: https://cython.readthedocs.io/
- •High Performance Python (Book by Gorelick & Ozsvald)