AgentSkillsCN

pytest-unit-testing

全面指南:如何用Python中的pytest高效编写单元测试。适用于以下场景:(1) 新建测试文件或测试函数;(2) 在项目中搭建pytest环境;(3) 测试会抛出错误或异常的函数;(4) 确定测试范围与测试方式;(5) 通过fixture与参数化复用测试代码;(6) 测试数据处理函数(尤其是pandas与numpy);(7) 调试失败的测试用例。本技能强调纯函数设计、清晰的测试组织结构,以及可测试代码与良好代码设计之间的紧密关联。

SKILL.md
--- frontmatter
name: pytest-unit-testing
description: "Comprehensive guide for writing effective unit tests with pytest in Python. Use when: (1) Writing new test files or test functions, (2) Setting up pytest in a project, (3) Testing functions that raise errors or exceptions, (4) Deciding what to test and how, (5) Reusing test code with fixtures and parametrization, (6) Testing data processing functions (especially pandas/numpy), (7) Debugging failing tests. This skill emphasizes pure functions, clear test organization, and the relationship between testable code and good code design."

Unit Testing with pytest

Core Philosophy

Unit testing validates that individual pieces of code work correctly in isolation. The key insight: code that is easy to test is usually well-designed code. Writing tests forces you to think about function interfaces, inputs, outputs, and edge cases.

Why Test?

  1. Catch bugs early - Before they compound into harder problems
  2. Enable safe refactoring - Change code confidently knowing tests verify behavior
  3. Document expected behavior - Tests serve as executable specifications
  4. Improve code design - Testable code tends to be modular and well-structured

Pure Functions: The Foundation of Testable Code

Pure functions are the easiest to test because they:

  • Always return the same output for the same input
  • Have no side effects (don't modify external state)
  • Don't depend on external state (files, databases, time, random)
python
# ✅ Pure function - easy to test
def calculate_tax(income, rate):
    return income * rate

# ❌ Impure function - hard to test
def calculate_tax_and_log(income, rate):
    result = income * rate
    with open("log.txt", "a") as f:  # Side effect
        f.write(f"{result}\n")
    return result

Design principle: Extract the pure logic from impure operations. Test the pure part thoroughly.

pytest Mechanics

File and Function Discovery

pytest automatically discovers tests using these conventions:

PatternWhat pytest finds
test_*.py or *_test.pyTest files
test_* functionsTest functions
Test* classesTest classes
code
project/
├── src/
│   └── calculations.py
└── tests/
    ├── test_calculations.py    # ✅ discovered
    └── helper_utils.py         # ❌ not discovered (no test_ prefix)

Running Tests

bash
# Run all tests
pixi run pytest

# Verbose output (shows each test name)
pixi run pytest -v

# Run specific file
pixi run pytest tests/test_calculations.py

# Run specific test function
pixi run pytest tests/test_calculations.py::test_add_positive_numbers

# Stop on first failure
pixi run pytest -x

# Run with debugger on failure
pixi run pytest --pdbp

Writing Tests

Basic Assertions

pytest uses plain assert statements. On failure, pytest shows detailed comparison:

python
def test_add_numbers():
    result = add(2, 3)
    assert result == 5

def test_list_contains_item():
    items = get_items()
    assert "apple" in items
    assert len(items) == 3

def test_approximate_equality():
    # For floats, use pytest.approx
    assert calculate_pi() == pytest.approx(3.14159, rel=1e-5)

Testing DataFrames (pandas)

python
import pandas as pd
import pandas.testing as tm

def test_clean_data():
    raw = pd.DataFrame({"value": ["-99", "10", "-77"]})
    result = clean_missing_codes(raw)
    
    expected = pd.DataFrame({"value": [pd.NA, 10, pd.NA]})
    tm.assert_frame_equal(result, expected)

def test_column_types():
    df = process_data(input_df)
    assert df["category"].dtype == pd.CategoricalDtype()

Testing NumPy Arrays

python
import numpy as np
import numpy.testing as npt

def test_array_calculation():
    result = normalize(np.array([1, 2, 3]))
    expected = np.array([0.0, 0.5, 1.0])
    npt.assert_array_almost_equal(result, expected)

Testing Exceptions

Use pytest.raises to verify code raises expected errors:

python
import pytest

def test_divide_by_zero_raises():
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)

def test_invalid_input_message():
    # Also verify the error message
    with pytest.raises(ValueError, match="must be positive"):
        calculate_sqrt(-5)

def test_type_error_for_wrong_input():
    with pytest.raises(TypeError) as exc_info:
        process_data("not a list")
    assert "must be a list" in str(exc_info.value)

What to Test

Test Categories

  1. Normal cases - Expected inputs produce expected outputs
  2. Edge cases - Boundary conditions, empty inputs, single elements
  3. Error cases - Invalid inputs raise appropriate exceptions

Example: Testing a Data Cleaning Function

python
def clean_agreement_scale(sr):
    """Convert survey responses to ordered categorical."""
    sr = sr.replace({"-77": pd.NA, "-99": pd.NA})
    categories = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
    return sr.astype(pd.CategoricalDtype(categories=categories, ordered=True))

Tests to write:

python
def test_clean_agreement_normal_values():
    """Normal case: valid responses are preserved."""
    sr = pd.Series(["agree", "disagree"])
    result = clean_agreement_scale(sr)
    assert list(result) == ["agree", "disagree"]

def test_clean_agreement_missing_codes():
    """Edge case: missing codes become NA."""
    sr = pd.Series(["-77", "-99", "agree"])
    result = clean_agreement_scale(sr)
    assert pd.isna(result.iloc[0])
    assert pd.isna(result.iloc[1])
    assert result.iloc[2] == "agree"

def test_clean_agreement_returns_ordered_categorical():
    """Output type: categorical with correct ordering."""
    sr = pd.Series(["agree"])
    result = clean_agreement_scale(sr)
    assert result.dtype.ordered
    assert result.dtype.categories.tolist() == [
        "strongly disagree", "disagree", "neutral", "agree", "strongly agree"
    ]

def test_clean_agreement_empty_series():
    """Edge case: empty input returns empty categorical."""
    sr = pd.Series([], dtype=str)
    result = clean_agreement_scale(sr)
    assert len(result) == 0

Reusing Test Code

Fixtures: Shared Setup

Fixtures provide reusable test data and setup. Defined with @pytest.fixture:

python
import pytest

@pytest.fixture
def sample_survey_data():
    """Survey data with various response patterns."""
    return pd.DataFrame({
        "q1": ["agree", "-77", "disagree"],
        "q2": ["-99", "strongly agree", "neutral"]
    })

@pytest.fixture
def empty_dataframe():
    return pd.DataFrame()

# Tests automatically receive fixtures by parameter name
def test_clean_survey(sample_survey_data):
    result = clean_survey(sample_survey_data)
    assert result.shape == sample_survey_data.shape

def test_handle_empty(empty_dataframe):
    result = process(empty_dataframe)
    assert len(result) == 0

Parametrization: Multiple Test Cases

Test the same logic with different inputs using @pytest.mark.parametrize:

python
import pytest

@pytest.mark.parametrize("input_val,expected", [
    (0, 0),
    (1, 1),
    (5, 120),      # 5! = 120
    (10, 3628800), # 10! = 3628800
])
def test_factorial(input_val, expected):
    assert factorial(input_val) == expected

@pytest.mark.parametrize("invalid_input", [-1, -5, -100])
def test_factorial_negative_raises(invalid_input):
    with pytest.raises(ValueError, match="must be non-negative"):
        factorial(invalid_input)

Combining Fixtures and Parametrization

python
@pytest.fixture
def calculator():
    return Calculator()

@pytest.mark.parametrize("a,b,expected", [
    (1, 2, 3),
    (0, 0, 0),
    (-1, 1, 0),
])
def test_calculator_add(calculator, a, b, expected):
    assert calculator.add(a, b) == expected

Test Organization

File Structure

code
project/
├── src/
│   ├── data_cleaning.py
│   └── analysis.py
└── tests/
    ├── conftest.py          # Shared fixtures
    ├── test_data_cleaning.py
    └── test_analysis.py

conftest.py: Shared Fixtures

Place fixtures used across multiple test files in conftest.py:

python
# tests/conftest.py
import pytest
import pandas as pd

@pytest.fixture
def sample_data():
    return pd.read_csv("tests/data/sample.csv")

@pytest.fixture
def database_connection():
    conn = create_connection()
    yield conn  # Test runs here
    conn.close()  # Cleanup after test

Debugging Test Failures

Understanding pytest Output

code
FAILED tests/test_calc.py::test_add - AssertionError: assert 4 == 5

pytest shows:

  1. Which test failed
  2. The assertion that failed
  3. Actual vs expected values

Using the Debugger

bash
# Drop into debugger on failure
pytest --pdb

# Drop into debugger at test start
pytest --pdb --trace

Inspecting Failures

python
def test_data_processing():
    result = complex_processing(data)
    
    # Add debug prints (shown on failure)
    print(f"Result shape: {result.shape}")
    print(f"Result columns: {result.columns.tolist()}")
    
    assert result.shape == (100, 5)

Error Handling Design

Good error handling makes code easier to test and debug.

Pattern: Fail Early with Clear Messages

Validate inputs at function entry, before any processing. This makes debugging easier and prevents partial work.

  1. Identify risky inputs - Those from users or not validated elsewhere
  2. List failure modes - Start easy (wrong type) → specific (wrong structure)
  3. Write _fail_if_... functions - One per condition
  4. Call validators early - Before any processing
  5. Test error messages - Ensure they're helpful
python
def create_markdown_table(data):
    """Create markdown table from list of dicts or dict of lists."""
    _fail_if_neither_dict_nor_list(data)
    
    if isinstance(data, dict):
        _fail_if_dict_of_wrong_types(data)
        _fail_if_dict_of_lists_with_different_lengths(data)
        data = convert_dol_to_lod(data)
    else:
        _fail_if_list_of_wrong_types(data)
        _fail_if_list_of_dicts_with_different_keys(data)
    
    return _format_table(data)

def _fail_if_neither_dict_nor_list(data):
    if not isinstance(data, (list, dict)):
        raise TypeError(
            f"data must be a list of dicts or dict of lists. Got {type(data)}"
        )

def _fail_if_list_of_wrong_types(data):
    invalid_rows = [i for i, row in enumerate(data) if not isinstance(row, dict)]
    if invalid_rows:
        report = "The following rows are not dictionaries:\n"
        for i in invalid_rows:
            report += f"  Row {i} has type {type(data[i])}\n"
        raise TypeError(report)

Testing Error Handling

python
def test_create_table_rejects_string():
    with pytest.raises(TypeError, match="must be a list of dicts"):
        create_markdown_table("not valid")

def test_create_table_reports_invalid_rows():
    with pytest.raises(TypeError) as exc_info:
        create_markdown_table([{"a": 1}, "invalid", {"a": 2}])
    assert "Row 1" in str(exc_info.value)

Best Practices Summary

  1. Test pure functions - Extract pure logic, test it thoroughly
  2. One concept per test - Each test verifies one behavior
  3. Descriptive names - test_factorial_negative_raises not test_1
  4. Test edge cases - Empty inputs, boundaries, single elements
  5. Test error conditions - Verify exceptions and messages
  6. Use fixtures - Share setup code, keep tests DRY
  7. Use parametrization - Test multiple inputs without duplication
  8. Assert specific expectations - Not just "no error", but correct values
  9. Keep tests fast - Slow tests get run less often
  10. Tests document behavior - Write tests that explain what code does

Quick Reference

python
# Basic test
def test_something():
    assert function() == expected

# Testing exceptions
with pytest.raises(ValueError, match="pattern"):
    bad_function()

# Fixture
@pytest.fixture
def data():
    return setup_data()

# Parametrization
@pytest.mark.parametrize("input,expected", [(1, 2), (3, 4)])
def test_func(input, expected):
    assert func(input) == expected

# DataFrame comparison
pd.testing.assert_frame_equal(result, expected)

# Array comparison  
np.testing.assert_array_almost_equal(result, expected)

# Float comparison
assert result == pytest.approx(expected, rel=1e-5)