Python Performance
Performance profiling, benchmarking, and optimization for Python.
Core Principle
Profile before optimizing: use profiling tools to identify the real bottlenecks. Premature optimization wastes time on code that may not be slow at all.
Profiling Tools Decision Matrix
| Tool | Use When | What It Shows |
|---|---|---|
| cProfile | Find slow functions | Call counts and time per function |
| line_profiler | Bottleneck in a specific function | Time per line |
| memory_profiler | Suspected memory issues | Memory usage per line |
| py-spy | Profiling production processes | Sampled call stacks of a running process |
| timeit | Micro-benchmarks | Execution time only |
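For the `timeit` row, a minimal micro-benchmark sketch using the standard-library `timeit` module. The two statements compared are illustrative only; taking the minimum over several repeats is the least noisy estimate, since slower runs usually reflect interference from other processes.

```python
import timeit

# repeat() runs each statement `number` times, `repeat` times over,
# and returns one total time (in seconds) per repeat
loop_times = timeit.repeat(
    "r = []\nfor i in range(1000):\n    r.append(i * i)",
    number=200,
    repeat=5,
)
comp_times = timeit.repeat(
    "[i * i for i in range(1000)]",
    number=200,
    repeat=5,
)

print(f"append loop:        {min(loop_times):.4f} s per 200 runs")
print(f"list comprehension: {min(comp_times):.4f} s per 200 runs")
```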
Basic Profiling
cProfile - Function-level
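```python
import cProfile
import pstats

def expensive_function():
    # Placeholder workload; replace with the code you want to profile
    return sum(i * i for i in range(1_000_000))

profiler = cProfile.Profile()
profiler.enable()
result = expensive_function()
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')  # sort by time including callees
stats.print_stats(20)           # show the top 20 entries
```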
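When profiler output needs to be logged or checked in a test rather than printed, `pstats.Stats` accepts a `stream` argument. A sketch assuming Python 3.8+, where `cProfile.Profile` works as a context manager; `build_squares` is a hypothetical workload.

```python
import cProfile
import io
import pstats

def build_squares(n):
    # Hypothetical workload to profile
    return [i ** 2 for i in range(n)]

# Since Python 3.8 the profiler enables on entry and disables on exit
with cProfile.Profile() as profiler:
    build_squares(200_000)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats(pstats.SortKey.CUMULATIVE)  # equivalent to 'cumulative'
stats.print_stats(10)

report = buffer.getvalue()  # the report as a string instead of stdout
print(report)
```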
line_profiler - Line-level
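```python
# `profile` is injected as a builtin by kernprof; running this file
# with plain `python` raises NameError
@profile
def slow_function():
    results = []
    for i in range(10000):
        results.append(i ** 2)
    return results

slow_function()  # the function must actually run to be measured

# Run: kernprof -l -v script.py
```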
See profiling-workflow.md for:
- Complete profiling workflow
- Interpreting profiler output
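The decision matrix lists `memory_profiler` for memory issues. As a dependency-free alternative (a swapped-in technique, not the tool the matrix names), the standard-library `tracemalloc` module reports which source lines allocated the most memory. `make_table` is a hypothetical workload.

```python
import tracemalloc

def make_table(rows):
    # Hypothetical workload: allocates many small strings
    return [f"row-{i}" for i in range(rows)]

tracemalloc.start()
table = make_table(100_000)
current, peak = tracemalloc.get_traced_memory()  # bytes now traced, and the peak
snapshot = tracemalloc.take_snapshot()           # must be taken before stop()
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
# Top 3 source lines by allocated size
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```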
Optimization Strategies
Algorithm Optimization (Biggest Impact)
```python
# BAD - O(n²): compares every pair of items
def find_duplicates_slow(items):
    for i, item in enumerate(items):
        for other in items[i + 1:]:
            if item == other:
                return True
    return False

# GOOD - O(n) on average: a set drops duplicates (items must be hashable)
def find_duplicates_fast(items):
    return len(items) != len(set(items))
```
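`find_duplicates_fast` only answers whether a duplicate exists. If the duplicated values themselves are needed, the same O(n) idea extends to `collections.Counter`. A sketch assuming hashable items; `which_duplicates` is a hypothetical helper name.

```python
from collections import Counter

def which_duplicates(items):
    # One O(n) pass to count occurrences, then keep values seen more than once
    counts = Counter(items)
    return [value for value, count in counts.items() if count > 1]

print(which_duplicates(["a", "b", "a", "c", "b"]))  # ['a', 'b']
```

Because `Counter` is a `dict` subclass, results come back in first-seen order.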
Data Structure Choice
```python
# Use a set for membership testing: O(1) average lookup vs O(n) for a list
allowed_set = {1, 2, 3, 4, 5}
x = 3
if x in allowed_set:
    pass  # x is allowed
```
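To measure the gap between list and set membership tests, a rough `timeit` comparison over the same values. The size and looked-up value are arbitrary choices for illustration, and absolute timings vary by machine.

```python
import timeit

values = list(range(10_000))
values_list = values
values_set = set(values)
target = 9_999  # worst case for the list: the last element

# A list is scanned element by element (O(n)); a set hashes the key (O(1) on average)
list_time = timeit.timeit(lambda: target in values_list, number=1_000)
set_time = timeit.timeit(lambda: target in values_set, number=1_000)

print(f"list: {list_time:.4f} s, set: {set_time:.6f} s for 1000 lookups")
```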
See optimization-strategies.md for:
- Function call overhead
- String operations
- Dictionary optimizations
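As one illustration of the string-operations topic: building a string with repeated `+=` may copy the growing string on each step (CPython sometimes resizes in place, but that is an implementation detail), while `str.join` computes the final size once. The `parts` list is made-up input.

```python
parts = [str(i) for i in range(10_000)]

# Potentially quadratic: each += may allocate a new, longer string
slow = ""
for p in parts:
    slow += p

# Linear: join sums the lengths first, then copies each piece once
fast = "".join(parts)

print(slow == fast)
```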
NumPy for Numerical Computing
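```python
import numpy as np

data = list(range(1_000_000))  # example input

# BAD - pure Python loop: every element goes through the interpreter
result = [x**2 + 2*x + 1 for x in data]

# GOOD - NumPy vectorization (often 10-100x faster on large arrays)
arr = np.array(data)
result = arr**2 + 2*arr + 1
```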
See numpy-optimization.md for:
- Broadcasting
- Avoiding loops with vectorization
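A small illustration of broadcasting: subtracting each column's mean from a 2-D array without a Python loop. The array values are arbitrary; NumPy stretches the shape `(3,)` mean vector across every row of the `(4, 3)` array.

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(4, 3)  # shape (4, 3)
col_means = data.mean(axis=0)                     # shape (3,)

# Broadcasting aligns trailing dimensions: (4, 3) - (3,) -> (4, 3)
centered = data - col_means

print(col_means)  # [4.5 5.5 6.5]
print(centered)   # each column now has mean zero
```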
Numba for JIT Compilation
```python
import numpy as np
from numba import jit

@jit(nopython=True)  # compile fully; raise an error rather than fall back to object mode
def monte_carlo_pi_fast(n):
    inside = 0
    for _ in range(n):
        x = np.random.random()
        y = np.random.random()
        if x**2 + y**2 <= 1:
            inside += 1
    return 4 * inside / n
```
See numba-patterns.md for:
- Type signatures
- Parallel execution
Multiprocessing for CPU-Bound Work
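```python
from multiprocessing import Pool

def cpu_intensive_task(data):
    return sum(x * x for x in data)  # placeholder CPU-bound work

def process_parallel(datasets):
    # Pool() starts one worker per available CPU; map splits datasets among them
    with Pool() as pool:
        return pool.map(cpu_intensive_task, datasets)
```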
See parallel-processing.md for:
- Process vs thread pools
- Shared memory
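A sketch of the process-vs-thread choice using the standard-library `concurrent.futures`, whose `ProcessPoolExecutor` and `ThreadPoolExecutor` share one interface. On a standard CPython build, processes sidestep the GIL for CPU-bound work, while threads are cheaper and suit I/O-bound work. `count_primes` and `run` are hypothetical names; the `if __name__ == "__main__":` guard is needed on platforms that start workers in a fresh interpreter.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def count_primes(limit):
    # Hypothetical CPU-bound task: naive trial division below `limit`
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run(executor_cls, limits):
    # Same code for both pool types; only the executor class changes
    with executor_cls() as executor:
        return list(executor.map(count_primes, limits))

if __name__ == "__main__":
    limits = [20_000] * 4
    # CPU-bound: worker processes can run on separate cores
    print(run(ProcessPoolExecutor, limits))
    # With the GIL, threads are unlikely to speed up this pure-Python loop
    print(run(ThreadPoolExecutor, limits))
```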
Performance Anti-Patterns
See performance-anti-patterns.md for examples.
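One common anti-pattern, given here as an illustration rather than taken from that file: using a list as a FIFO queue with `list.pop(0)`. Removing the first element shifts every remaining element (O(n) per pop), while `collections.deque.popleft()` is O(1).

```python
from collections import deque

tasks = list(range(10_000))

# Anti-pattern: each pop(0) shifts all remaining elements left
order_list = []
queue_list = list(tasks)
while queue_list:
    order_list.append(queue_list.pop(0))

# Better: deque supports O(1) appends and pops at both ends
order_deque = []
queue_deque = deque(tasks)
while queue_deque:
    order_deque.append(queue_deque.popleft())

print(order_list == order_deque)  # same FIFO order, different cost
```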
source: Python performance docs