Pytest Profiling & Optimization
Systematic workflow for diagnosing, fixing, and reporting on slow Python/pytest test suites. Measure first, optimize based on data, never guess.
Before Starting
- •
Read project conventions — Check
AGENTS.md,CLAUDE.md, orpyproject.tomlto understand:- •Which test directories exist (unit, integration, e2e)
- •How to run tests (
uv run pytest,pytest,make test, etc.) - •Test database setup and fixture patterns
- •
Ask the user:
- •Which test directory or marker to profile (or profile the full suite)
- •Where to write the report file
- •
Check for pyinstrument — If not installed, ask the user before adding it:
bashuv pip list | grep pyinstrument # If missing, ask before running: uv add --dev pyinstrument
Phase 1: Measure
All steps are read-only. No code changes.
Step 1 — Baseline timing + slowest tests
uv run pytest <test_dir> --durations=20 -q
Record:
- •Total wall time
- •Number of tests (passed, failed, skipped)
- •Top 20 slowest phases (setup, call, teardown)
Note whether the slowest items are setup, call, or teardown — this determines where to dig.
Step 2 — Fixture setup/teardown chains
Pick 2-3 of the slowest tests and run:
uv run pytest <test_dir> -k "name_of_slow_test" -q --setup-show
This shows the full fixture dependency chain. Look for:
- •Function-scoped fixtures that could be session-scoped
- •Expensive fixtures called repeatedly (DB setup, app creation, crypto)
- •Long teardown sequences
Step 3 — CPU profiling with pyinstrument
uv run pyinstrument -r text -m pytest <test_dir> -k "name_of_slow_test" -q
The call tree shows exactly where CPU time is spent. Repeat for 2-3 of the slowest tests to find common patterns.
Common hotspots to look for:
- •
bcrypt.hashpw/bcrypt.gensalt— password hashing (~200ms per call at default 12 rounds) - •
MetaData.create_all— SQLAlchemy table creation - •
TRUNCATE ... CASCADE— PostgreSQL lock acquisition - •
TestClient(app)— ASGI lifespan enter/exit - •Module imports — heavy libraries loaded at import time (matplotlib, boto3, numpy)
- •RSA key generation —
rsa.generate_private_key() - •Network connections — Redis/Valkey, external services
Step 4 — Micro-benchmarks (optional)
If a specific operation is suspect, isolate and measure it:
uv run python -c "
import time
# ... setup code ...
times = []
for i in range(10):
start = time.time()
# ... the suspect operation ...
times.append(time.time() - start)
print(f'avg: {sum(times)/len(times)*1000:.1f}ms')
"
This is useful for comparing alternatives (e.g., TRUNCATE vs DELETE, bcrypt rounds 12 vs 4).
Phase 2: Analyze
Summarize findings before touching any code:
- •Where is time spent? — Categorize by fixture setup, test body, teardown
- •Which fixtures are expensive? — List per-call cost and how many tests use them
- •Is there a common bottleneck? — One or two things that dominate across all tests
- •Estimate total impact —
(per-call cost) × (number of calls) = total savings
Phase 3: Optimize
Design targeted fixes based on the profiling data. Apply one fix at a time and measure after each.
Common fixes
Slow bcrypt hashing
Patch bcrypt.gensalt to use minimum rounds (4) in tests via a session-scoped autouse fixture in the root tests/conftest.py:
@pytest.fixture(autouse=True, scope="session")
def _fast_bcrypt():
original_gensalt = bcrypt.gensalt
def fast_gensalt(rounds=4, prefix=b"2b"):
return original_gensalt(rounds=4, prefix=prefix)
with patch("bcrypt.gensalt", fast_gensalt):
yield
Slow TRUNCATE teardown
Replace TRUNCATE ... CASCADE with DELETE FROM in reverse FK order. TRUNCATE acquires ACCESS EXCLUSIVE locks (~300ms even on empty tables). DELETE with row-level locks takes ~2ms for small row counts.
Expensive function-scoped fixtures
If a fixture is pure setup with no per-test state, consider widening its scope:
- •
scope="module"— shared within one test file - •
scope="session"— shared across the entire run
Only do this if tests don't mutate the fixture's state.
Slow RSA key generation
Cache a test keypair as a session-scoped fixture instead of generating per-test.
Slow module imports
These are one-time costs and generally not worth optimizing. Note them in the report but don't act on them unless they dominate.
Measurement after each fix
After each optimization:
- •Run the full suite:
uv run pytest <test_dir> -q --tb=no - •Record the new wall time
- •Calculate delta from previous state
Phase 4: Commit
Commit each optimization as a separate commit with measured timings in the message:
perf(tests): <what changed> <Why it helps.> <previous>s → <new>s (saved ~<delta>s)
Use conventional commits (perf: prefix for performance improvements).
Phase 5: Report
Write a report to the location chosen by the user. The report must include all of the following sections:
Report template
# Test Suite Performance: Analysis & Fixes **Date:** YYYY-MM-DD **Result:** <before>s → <after>s (<speedup>x faster) ## Baseline - Test command: `<exact command>` - Total tests: X passed, Y failed (pre-existing), Z skipped - Wall time: Xs ## Methodology 1. `pytest --durations=20` — identify slowest test phases 2. `pytest --setup-show` — trace fixture setup chains 3. `pyinstrument` — CPU call trees on slowest tests ## Profiling Findings | Component | Time | Scope | Notes | |---|---|---|---| | ... | ... | ... | ... | ## Bottleneck Breakdown (estimated) - **<bottleneck 1>:** N calls × Xms = ~Ys (Z%) - **<bottleneck 2>:** N calls × Xms = ~Ys (Z%) - **Actual test work:** ~Ys (Z%) ## Fixes ### Fix N: <title> **Commit:** `<commit message>` **File:** `<path>` **Time saved:** <before>s → <after>s (**-Xs**) <Description of what changed and why.> ### Combined Result | State | Wall time | Delta | |---|---|---| | Baseline | Xs | — | | + fix 1 | Xs | -Xs | | + fix 2 | Xs | -Xs | | **Total saved** | | **-Xs (Nx)** | ## Remaining Slow Tests | Test | Time | Reason | |---|---|---| | ... | ... | ... |
Commit the report as a separate docs: commit after all optimization commits.
Key Principles
- •Measure before optimizing — never guess where time is spent
- •One fix at a time — measure after each change to attribute savings accurately
- •Don't optimize genuine workload — raster I/O, large batch operations, etc. are expected costs
- •Test correctness first — verify the same tests pass/fail before and after each fix
- •Keep production code unchanged — all optimizations go in test infrastructure only