Pytest Development
Workflow
- •Identify behavior - Target a single function/behavior per test
- •Write the test - Plain function (never a class) with AAA structure
- •Run and iterate - Refine for readability and isolation
Best Practices
- •Write plain functions, no test classes
- •Name tests
test_<function>_<behavior>— be specific about inputs and outcomes - •One test per distinct behavior; multiple assertions are fine when verifying one operation's effects
- •Use parametrization when testing the same logic with different inputs (e.g., True/False flags)
- •Skip trivial tests that verify unchanged/default behavior (e.g., "False stays False")
- •Use
HTTPStatus.OKnot200for status assertions - •Test both success and error paths
- •Verify database state after operations
- •Use parametrization for input/output matrices, not loops
- •Keep fixtures in app-specific
conftest.pyfiles - •Use
pytestmarkfor module-level markers - •Aim for meaningful coverage, not just high percentages
- •Keep imports at the top of the file, never inside test functions
- •Use blank lines to separate AAA sections, no comments needed
- •Use modern type hints:
dict[str, Any],list[int],str | None(notDict,List,Optional) - •Consolidate assertions: Verify status code, response body, and side effects (DB state) in a single test function. Do not split these into separate tests.
- •Avoid implementation details: Do not test logging calls, private methods, path construction strings, or simple wrapper delegation.
- •Dataclasses for Parametrization: If a parametrized test needs more than 2 arguments, use a dataclass to structure the test cases.
Project Structure
Always organize tests using src layout with unit/integration split:
my-service/ ├── src/ │ └── my_service/ ├── tests/ │ ├── conftest.py # Global fixtures │ ├── mocks/ # Shared mock objects (optional) │ │ └── fake_s3.py │ ├── unit/ │ │ ├── conftest.py # Unit fixtures (mocks) │ │ └── test_api.py │ └── integration/ │ ├── conftest.py # Integration fixtures (containers) │ └── test_database.py └── pyproject.toml
- •Place unit tests in
tests/unit/ - •Place integration tests in
tests/integration/ - •Create the directories if they don't exist
Unit tests: Mock everything, run in milliseconds, no external services.
Integration tests: Real dependencies via testcontainers.
Anti-Patterns (Do Not Generate)
- •Test Classes: Do not use
class TestFoo:to group tests. Use plain functions with descriptive names liketest_foo_does_x(). Test classes add unnecessary indentation andselfparameters. - •Logging Verification: Do not test that
logger.errorwas called. Exception raising is sufficient contract verification. - •Wrapper Tests: Do not test methods that simply call another method (delegation). Test the underlying logic or the full chain.
- •Path Construction: Do not write tests solely to verify a URL string is built correctly; this is covered by the actual API call test.
- •Atomic Fragmentation: Do not write separate tests for
status_code,response_data, anddb_state. Combine them into one behavioral test.
Test Naming
Good names — specific about function and behavior:
def test_user_creation_with_valid_data(): ... def test_login_fails_with_invalid_password(): ... def test_api_returns_404_for_missing_resource(): ... def test_order_total_includes_tax_for_california(): ...
Bad names — vague or meaningless:
def test_1(): ... # Not descriptive def test_user(): ... # Too vague def test_it_works(): ... # What works?
Test Structure (AAA Pattern)
Use blank lines to separate Arrange/Act/Assert — no comments needed:
from http import HTTPStatus
# Good — plain function
def test_create_user_returns_201_with_valid_data(client, make_user_payload):
payload = make_user_payload(email="new@example.com")
response = client.post("/users", json=payload)
assert response.status_code == HTTPStatus.CREATED
assert response.json()["email"] == "new@example.com"
# Bad — test class adds unnecessary structure
class TestCreateUser: # Don't do this
def test_returns_201(self, client, make_user_payload):
payload = make_user_payload(email="new@example.com")
response = client.post("/users", json=payload)
assert response.status_code == HTTPStatus.CREATED
Test both success and error paths:
def test_create_user_returns_400_with_invalid_email(client, make_user_payload):
payload = make_user_payload(email="invalid-email")
response = client.post("/users", json=payload)
assert response.status_code == HTTPStatus.BAD_REQUEST
assert "email" in response.json()["errors"]
Verify database state after operations:
def test_delete_user_removes_from_database(client, db_session, make_user):
user = make_user(email="delete@example.com")
db_session.add(user)
db_session.commit()
user_id = user.id
response = client.delete(f"/users/{user_id}")
assert response.status_code == HTTPStatus.NO_CONTENT
assert db_session.query(User).filter_by(id=user_id).first() is None
Testing exceptions:
def test_withdraw_raises_on_insufficient_funds(make_account):
account = make_account(balance=50.00)
with pytest.raises(InsufficientFundsError) as exc_info:
account.withdraw(100.00)
assert exc_info.value.available == 50.00
assert exc_info.value.requested == 100.00
Test Consolidation
Combine when one operation affects multiple fields:
# Good: One test verifies all effects of mark_completed()
def test_mark_order_completed_updates_status_and_timestamp(db_session, make_order):
order = make_order(status="pending", completed_at=None)
mark_completed(order.id)
result = db_session.query(Order).filter_by(id=order.id).first()
assert result.status == "completed"
assert result.completed_at is not None
# Bad: Separate tests for each field
def test_mark_order_completed_sets_status(): ...
def test_mark_order_completed_sets_timestamp(): ...
Parametrize for input variations:
# Good: One parametrized test
@pytest.mark.parametrize("mark_error", [
pytest.param(False, id="without_error"),
pytest.param(True, id="with_error"),
])
def test_create_job_status(db_session, mark_error):
create_job_status("run-123", mark_error=mark_error)
result = db_session.query(JobStatus).first()
assert result.has_error is mark_error
# Bad: Separate tests for True and False
def test_create_job_status_without_error(): ...
def test_create_job_status_with_error(): ...
Skip trivial tests:
# Skip: Testing that False stays False adds no value def test_error_stays_false_when_not_marked(): ... # Don't write this
Consolidate Initialization Tests: Instead of testing every service property separately, verify the container in one go.
# Good: Comprehensive Container Test
def test_service_container_initializes_all_services(mock_client):
container = ServiceContainer(mock_client)
# Verify all services exist and share the client
assert isinstance(container.users, UserService)
assert isinstance(container.orders, OrderService)
assert container.users.client is mock_client
assert container.orders.client is mock_client
Fixtures
Factory Pattern
Use make_* naming with inner factory functions:
@pytest.fixture
def make_chat_session():
def _make(user, **kwargs):
defaults = {"session_id": uuid4()}
defaults.update(kwargs)
return ChatSession.objects.create(user=user, **defaults)
return _make
# Usage
def test_session_behavior(admin_user, make_chat_session):
session = make_chat_session(admin_user, status="active")
assert session.user == admin_user
Full example with factory fixture:
@pytest.fixture
def make_order():
def _make(user, items=None, **kwargs):
defaults = {"status": "pending", "items": items or []}
defaults.update(kwargs)
return Order(user=user, **defaults)
return _make
def test_order_total_calculates_with_tax(make_order, make_user):
user = make_user(state="CA")
order = make_order(user=user, items=[{"name": "Widget", "price": 100.00, "quantity": 2}])
total = order.calculate_total()
assert total == 214.50 # 200 + 7.25% tax
Conftest Hierarchy
tests/conftest.py — Global fixtures:
@pytest.fixture(scope="session")
def app_settings():
return {"debug": True, "db_url": "sqlite:///:memory:"}
tests/unit/conftest.py — Unit-specific mocks:
@pytest.fixture
def mock_db_session(mocker):
return mocker.MagicMock()
tests/integration/conftest.py — Real resources:
@pytest.fixture(scope="module")
def db_session(postgres_container):
engine = create_engine(postgres_container.get_connection_url())
with Session(engine) as session:
yield session
session.rollback()
Fixture Scopes
| Scope | Lifecycle | Use for |
|---|---|---|
function | Each test (default) | Test-specific data |
module | Per test file | Expensive setup shared by file |
session | Entire test run | Containers, app config |
Teardown
Use yield for cleanup:
@pytest.fixture
def temp_file():
path = Path("/tmp/test_file.txt")
path.write_text("test")
yield path
path.unlink() # Cleanup after test
Mocking
pytest-mock
Use the mocker fixture (not unittest.mock directly):
def test_api_call(mocker):
mock_client = mocker.patch("myservice.api.external_client")
mock_client.return_value.fetch.return_value = {"data": "test"}
result = call_external_api()
assert result == {"data": "test"}
mock_client.return_value.fetch.assert_called_once()
Mock assertions:
mock_client.assert_called_once() mock_client.assert_awaited_once_with(user=admin_user, project=project) mock_client.assert_not_called() assert mock_client.call_count == 1 assert mock_client.call_args.kwargs["id"] == "test_123"
Patching patterns:
# Patch where it's used, not where it's defined
mocker.patch("myservice.handlers.requests.get")
# Patch object attribute
mocker.patch.object(MyClass, "method", return_value="mocked")
# Patch with side effect
mocker.patch("module.func", side_effect=ValueError("error"))
Shared Mocks
Inline mocks — For simple, single-test mocks:
def test_payment(mocker):
mock_stripe = mocker.patch("myservice.payments.stripe")
mock_stripe.charge.return_value = {"id": "ch_123"}
# test logic
Shared mocks — For complex fakes reused across multiple tests, create tests/mocks/ module:
# tests/mocks/fake_s3.py
class FakeS3Client:
def __init__(self):
self.storage = {}
def put_object(self, Bucket, Key, Body):
self.storage[f"{Bucket}/{Key}"] = Body
def get_object(self, Bucket, Key):
return {"Body": self.storage[f"{Bucket}/{Key}"]}
# tests/conftest.py
from tests.mocks.fake_s3 import FakeS3Client
@pytest.fixture
def fake_s3():
return FakeS3Client()
freezegun (Time-Based Testing)
from freezegun import freeze_time
def test_session_expiry(session_service, admin_user):
with freeze_time("2023-01-01 12:00:00"):
session = session_service.create_session(user=admin_user)
with freeze_time("2023-01-01 13:00:00"): # 1 hour later
assert session_service.is_expired(session) is True
testcontainers (Integration Tests)
PostgreSQL:
from testcontainers.postgres import PostgresContainer
@pytest.fixture(scope="module")
def postgres_container():
with PostgresContainer("postgres:15") as postgres:
yield postgres
@pytest.fixture
def db_session(postgres_container):
engine = create_engine(postgres_container.get_connection_url())
Base.metadata.create_all(engine)
with Session(engine) as session:
yield session
session.rollback()
Redis:
from testcontainers.redis import RedisContainer
@pytest.fixture(scope="module")
def redis_container():
with RedisContainer("redis:7") as redis:
yield redis
Place container fixtures in tests/integration/conftest.py with scope="module" or scope="session".
Parametrization
Rule: Use pytest.mark.parametrize for inputs. If the test case requires more than 2 arguments, define a dataclass at module level.
Dataclass Placement
Define parametrization dataclasses after imports, before tests:
from dataclasses import dataclass, field
from typing import Any
import pytest
pytestmark = pytest.mark.unit
@dataclass
class ApiRequestCase:
method: str
path: str
status_code: int
response_data: dict[str, Any]
request_kwargs: dict[str, Any] = field(default_factory=dict)
assertion_checks: dict[str, Any] = field(default_factory=dict)
def test_api_client_request_successful_scenarios(case):
...
Complex Parametrization (Dataclass Pattern)
Use dataclasses for 3+ parameters with modern type hints:
from dataclasses import dataclass, field
from typing import Any
@dataclass
class ApiCase:
payload: dict[str, Any]
expected_status: int
expected_error: str | None = None
headers: dict[str, str] = field(default_factory=dict)
@pytest.mark.parametrize("case", [
ApiCase(
payload={"email": "bad-format", "name": "Test"},
expected_status=400,
expected_error="Invalid email"
),
ApiCase(
payload={"email": "good@test.com"},
expected_status=400,
expected_error="Field 'name' required"
),
], ids=lambda c: f"status_{c.expected_status}")
def test_create_user_validation(client, case):
response = client.post("/users", json=case.payload, headers=case.headers)
assert response.status_code == case.expected_status
if case.expected_error:
assert case.expected_error in response.json()["detail"]
ID Generation Tips:
- •Use lambda to generate IDs from dataclass fields
- •Keep IDs concise but meaningful
- •Do not add fields solely for test IDs (like
description) - •Examples:
- •
ids=lambda c: c.method.lower() - •
ids=lambda c: f"{c.status}_{c.type}" - •
ids=lambda c: f"{c.method.lower()}_{c.status_code}" - •
ids=lambda c: "guaranteed_pass" if c.guaranteed_pass else "guaranteed_fail"
- •
Type Hints:
- •Use lowercase:
dict[str, Any],list[int], notDict,List - •Use pipe syntax:
str | None, notOptional[str] - •Only import
Anyfrom typing when needed
Simple Parametrization (Tuple Pattern)
For 1 or 2 arguments, tuples are acceptable:
@pytest.mark.parametrize("status_code, should_retry", [
(500, True),
(400, False),
], ids=["server_error", "client_error"])
def test_retry_logic(status_code, should_retry):
assert should_retry_request(status_code) == should_retry
Markers
import pytest pytestmark = pytest.mark.integration # Module-level @pytest.mark.slow def test_heavy_computation(): ... @pytest.mark.skip(reason="Not implemented yet") def test_future_feature(): ... @pytest.mark.skipif(sys.platform == "win32", reason="Unix only") def test_unix_specific(): ... @pytest.mark.xfail(reason="Known bug #123") def test_known_bug(): ...
Run by marker: pytest -m "not slow" or pytest -m integration
Running Tests
uv run pytest # All tests uv run pytest -m unit # Unit tests only uv run pytest -m integration # Integration tests only uv run pytest tests/unit/test_module.py::test_name # Specific test uv run pytest -v -s # Verbose with print output uv run pytest --cov=src --cov-report=term-missing # Coverage report uv run pytest --cov-fail-under=80 # Enforce 80% coverage uv run pytest -k "user" # Match test names
Configuration
# pyproject.toml
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
addopts = "-v --tb=short"
markers = [
"unit: fast running unit tests",
"integration: slow running integration tests",
]
[tool.pytest_env]
# Environment variables for tests (requires pytest-env)
DATABASE_URL = "postgresql://test:test@localhost:5432/test_db"
REDIS_URL = "redis://localhost:6379/0"
ENV = "test"
DEBUG = "false"
Checklist
Before finishing, verify:
- • Tests placed in
tests/unit/ortests/integration/ - • Imports at top of file, not inside functions
- • Modern type hints:
dict[str, Any],list[int],str | None - • Dataclasses defined at module level (after imports, before tests)
- • Plain functions, no test classes
- • Names follow
test_<function>_<behavior> - • Blank lines separate AAA sections (no comments)
- • One test per behavior (combine assertions for one operation's effects)
- • Parametrization used for input variations (True/False, different values)
- • Dataclasses used for 3+ parametrization arguments
- • Lambda-based test IDs generated from dataclass fields
- • No trivial tests (e.g., "False stays False")
- • Using
HTTPStatusenum, not raw status codes - • Testing both success and error paths
- • Factory fixtures use
make_*naming - • Mocks use
mockerfixture, notunittest.mock - • Patching where used, not where defined
- • Time tests use
freezegun - • Integration tests use
testcontainers