Python Library Usage Guidelines
Overview
Standardized library choices for all Python projects. Ensures consistency, avoids reinventing the wheel, and enforces preferred tools over alternatives.
General Rule: Always prefer existing library functions over custom implementations.
Library Selection Policy:
- The first entry in each list is the preferred option.
- Alternatives may be used only if they offer specific advantages or required functionality.
- Propose new functions to our internal libraries (lib_*) if needed; we can change them.
Quick Reference Table
| Category | Preferred Library | Do NOT Use |
|---|---|---|
| CLI exit/traceback | lib_cli_exit_tools | manual sys.exit |
| Logging (apps) | lib_log_rich | print(), bare logging |
| Logging (libs) | logging stdlib | lib_log_rich |
| Terminal output | rich | colorama (fallback only) |
| TUI | textual | curses |
| Domain models | dataclasses | dict, Pydantic |
| Boundary validation | pydantic | manual parsing |
| Enums | IntEnum / StrEnum | plain Enum, magic strings |
| TOML | rtoml | tomllib, tomli |
| JSON | orjson | json stdlib |
| YAML | ruamel.yaml | PyYAML |
| HTTP | httpx | requests |
| Compression (stream) | isal | gzip stdlib |
| Compression (store) | libdeflate | gzip stdlib |
| Paths | pathlib.Path | os.path |
| .env files | python-dotenv | manual parsing |
| Config management | lib_layered_config | manual argparse + env |
| Database (ODBC) | pyodbc | raw ODBC bindings |
| Database (MySQL) | mysql-connector-python / SQLAlchemy | PyMySQL, mysqlclient |
| ORM / query builder | SQLAlchemy | custom ORM, raw SQL for complex apps |
| Testing | pytest | unittest |
| Date/time | datetime + zoneinfo | pytz |
| Subprocess | subprocess.run | os.system |
| Type checking | mypy | no type checking |
CLI Tools
- lib_cli_exit_tools: for handling CLI exit codes, tracebacks, and related utilities. Propose new functions if needed.
Logging
| Context | Library | Configuration |
|---|---|---|
| Applications | lib_log_rich | Full structured logging with rich output |
| Libraries/Packages | logging (stdlib) | logging.getLogger(__name__) + NullHandler() |
Why the distinction:
- Libraries must not configure logging (leave that to the consuming application)
- Libraries install NullHandler to prevent "no handler found" warnings
- Applications control output format, level, and destination via lib_log_rich
Library pattern:
# src/<pkg>/__init__.py
import logging

logging.getLogger(__name__).addHandler(logging.NullHandler())
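The split between library and application responsibilities can be illustrated with a stdlib-only sketch. The package name "mylib", the function do_work, and the ListHandler class are hypothetical, and lib_log_rich is deliberately left out so the example runs without internal dependencies. A real application would configure logging once at startup rather than attaching a handler to a single logger.

```python
import logging

# Library side: attach only a NullHandler, never configure output.
lib_logger = logging.getLogger("mylib")  # hypothetical package name
lib_logger.addHandler(logging.NullHandler())


def do_work() -> int:
    """Hypothetical library function: logs, but does not decide where logs go."""
    lib_logger.warning("cache miss for key %s", "user:42")
    return 42


class ListHandler(logging.Handler):
    """Hypothetical handler that collects formatted messages in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


# Application side: the consumer decides format and destination.
app_handler = ListHandler()
app_handler.setFormatter(logging.Formatter("%(name)s:%(levelname)s:%(message)s"))
lib_logger.addHandler(app_handler)

result = do_work()
```

The library code is identical whether or not the application attaches a handler, which is what lets each consuming application pick its own format and destination.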
Application pattern (basic):
import lib_log_rich
from lib_log_rich.runtime import attach_std_logging
# Initialize with RuntimeConfig
lib_log_rich.init(lib_log_rich.RuntimeConfig(
    service="my-app",
    environment="dev",
    console_level="INFO",     # INFO and above to console
    backend_level="WARNING",  # WARNING and above to journald/EventLog
))
# Bridge stdlib logging so existing logging.* calls work
attach_std_logging()
# Use lib_log_rich's LoggerProxy for new code
logger = lib_log_rich.getLogger(__name__)
logger.info("Application started", extra={"port": 8080})
# Or use stdlib logging (bridged automatically)
import logging
logging.info("This also works!")
# Clean shutdown (flushes queues, closes adapters)
lib_log_rich.shutdown()
Application pattern (with lib_layered_config):
# logging_setup.py - centralized logging initialization
from lib_layered_config import Config
import lib_log_rich.config
import lib_log_rich.runtime
def init_logging(config: Config) -> None:
    """Initialize logging from layered config [lib_log_rich] section."""
    if not lib_log_rich.runtime.is_initialised():
        lib_log_rich.config.enable_dotenv()  # optional: load LOG_* from .env
        log_config = dict(config.get("lib_log_rich", default={}))
        log_config.setdefault("service", "my-app")
        log_config.setdefault("environment", "prod")
        runtime_config = lib_log_rich.runtime.RuntimeConfig(**log_config)
        lib_log_rich.runtime.init(runtime_config)
        lib_log_rich.runtime.attach_std_logging()
Key RuntimeConfig options:
- service (required): Logical service name
- environment (required): Deployment label (dev/stage/prod)
- console_level: Threshold for Rich console (default: INFO)
- backend_level: Threshold for journald/EventLog (default: WARNING)
- queue_enabled: Background queue for async delivery (default: True)
- enable_journald / enable_eventlog / enable_graylog: Backend adapters
Shutdown (required for clean exit):
# Synchronous shutdown: flushes queues, drains adapters, clears state
lib_log_rich.shutdown()

# Async shutdown: for asyncio applications (call from inside a coroutine)
await lib_log_rich.shutdown_async()
Important: Always call shutdown() before process exit to ensure:
- Background queue is drained (pending log events are delivered)
- Graylog/journald/EventLog adapters flush their buffers
- Global runtime state is cleared
Safe to call multiple times after initialization.
Propose new functions to lib_log_rich if needed — we can change that library.
Colored Terminal Output
- rich: preferred for beautiful, formatted console output.
- colorama: may be used if rich is unavailable or not suitable.
Text-based User Interfaces (TUI)
- textual: for building rich, interactive TUIs in the terminal.
Data Models: Dataclasses vs Pydantic
Rule: Prefer structured types (dataclasses or Pydantic) over dicts. Data flowing between modules must use typed models, not raw dicts.
Decision Matrix
| Use Case | Recommended | Why |
|---|---|---|
| Domain entities & value objects | dataclasses | Keep domain dependency-free |
| Internal DTOs (trusted data) | dataclasses | Lightweight, no parsing needed |
| External input validation | pydantic.BaseModel | Type coercion, sanitization |
| JSON/dict serialization | pydantic.BaseModel | Built-in .model_dump() |
| Configuration from env/files | pydantic_settings.BaseSettings | Validation + env parsing |
| Performance-critical paths | dataclasses | Lower overhead |
Preference Order
- dataclasses (stdlib): default for domain/application layer.
  - Always use @dataclass(frozen=True, slots=True) for immutability and efficiency.
  - Use for internal data that doesn't need validation or format conversion.
- pydantic.BaseModel: use at boundaries where data enters/exits the system.
  - Parsing untrusted input (HTTP requests, CLI args, config files).
  - Type coercion needed (string -> int, string -> datetime).
  - Serialization to JSON, dict, or other formats required.
  - Use model_config = ConfigDict(frozen=True) for immutability.
- pydantic.dataclasses: when you want dataclass syntax with Pydantic validation.
  - Useful for gradual migration from stdlib dataclasses.
Dict Usage Policy
Default: Use typed structures (dataclass, Pydantic) for all data.
Dicts are acceptable only when ALL of these apply:
- Truly dynamic/schema-less data (e.g., arbitrary JSON passthrough with no business logic)
- Fewer than 3 keys
- No cross-module flow (contained within a single function/helper)
- No business logic operates on the dict
Never acceptable:
- Passing dict[str, Any] between modules or layers
- Using dict keys in conditionals or business logic
- Dict as function return type for structured data
When in doubt: Convert to a typed structure at the boundary.
Anti-patterns
- Passing dict[str, Any] between modules
- Using Pydantic in the domain layer
- Using dataclasses for untrusted external input
- Mixing dict-based and model-based data in the same flow
Data Flow Pattern
External Input -> Pydantic (validate) -> Dataclass (domain logic) -> Pydantic (serialize) -> External Output
Enums: Prefer IntEnum
Rule: Replace string literals representing categories, statuses, modes, event types, or command names with Enum classes. Prefer IntEnum over Enum where possible.
Why IntEnum?
| Feature | Enum | IntEnum |
|---|---|---|
| JSON serialization | Requires custom encoder (stdlib json) | Works natively (serializes as int) |
| Database storage | Needs explicit conversion | Stores as integer directly |
| Comparison with int | Status.ACTIVE == 1 -> False | Status.ACTIVE == 1 -> True |
| Arithmetic operations | Not supported | Supported (inherits from int) |
| Pydantic compatibility | Works | Works + auto-coerces from int |
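The comparison and serialization rows can be checked with a short sketch. The Color and Priority classes are hypothetical, and stdlib json is used here only to keep the example dependency-free.

```python
import json
from enum import Enum, IntEnum


class Color(Enum):  # plain Enum (hypothetical)
    RED = 1


class Priority(IntEnum):  # IntEnum (hypothetical)
    LOW = 1
    HIGH = 2


# Comparison with int
plain_equals_int = Color.RED == 1
int_enum_equals_int = Priority.LOW == 1

# stdlib json serializes IntEnum members as integers.
encoded = json.dumps({"priority": Priority.HIGH})

# stdlib json rejects plain Enum members unless a custom encoder is supplied.
try:
    json.dumps({"color": Color.RED})
    plain_enum_serializable = True
except TypeError:
    plain_enum_serializable = False

# Round trip from a stored integer back to the enum member.
restored = Priority(json.loads(encoded)["priority"])
```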
When to use each
| Use Case | Recommended |
|---|---|
| Status codes, error codes, log levels | IntEnum |
| Database-backed enumerations | IntEnum |
| API response codes | IntEnum |
| Purely semantic categories (no numeric meaning) | Enum or StrEnum |
| String-based identifiers (e.g., HTTP methods) | StrEnum |
Example
from enum import IntEnum, StrEnum

# Preferred for numeric categories
class TaskStatus(IntEnum):
    PENDING = 0
    RUNNING = 1
    COMPLETED = 2
    FAILED = 3

# Works seamlessly with JSON, databases, comparisons
assert TaskStatus.RUNNING == 1  # True
assert TaskStatus.RUNNING > TaskStatus.PENDING  # True

# Use StrEnum (Python 3.11+) for string-based identifiers.
# Values are explicit because auto() would yield the lower-cased names ("get", "post").
class HttpMethod(StrEnum):
    GET = "GET"
    POST = "POST"
    PUT = "PUT"
    DELETE = "DELETE"
Anti-patterns
- Using plain Enum when values need JSON serialization
- Magic strings: if status == "pending" instead of if status == Status.PENDING
- Magic numbers: if code == 1 instead of if code == ErrorCode.INVALID_INPUT
Serialization Formats
TOML
Use rtoml for all TOML parsing operations.
- Preferred over stdlib tomllib (Python 3.11+, read-only) for consistency across Python versions
- Preferred over tomli (third-party, read-only)
- Provides both read and write capabilities
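from pathlib import Path

import rtoml

path = Path("config.toml")  # hypothetical file name

# Reading from file: pass a Path or an open text file (a plain str is parsed as TOML text)
config = rtoml.load(path)

# Writing to file
rtoml.dump(data, path)

# Reading from string
config = rtoml.loads(toml_string)

# Writing to string
toml_string = rtoml.dumps(data)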
JSON
Use orjson for all JSON serialization/deserialization.
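- Significantly faster than stdlib json (the orjson README cites roughly 10x for dumps() and 2x for loads(); measure with your own payloads)
- Native support for dataclasses, datetime, and UUID; numpy arrays require option=orjson.OPT_SERIALIZE_NUMPY
- Returns bytes (not str), which is more efficient for I/O
- Strict by default: NaN/Infinity are serialized as null and rejected when parsing; key sorting is opt-in via OPT_SORT_KEYS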
import orjson
# Reading
data = orjson.loads(json_bytes) # accepts bytes or str
# Writing
json_bytes = orjson.dumps(data)
# With options (pretty print, sorted keys); datetimes are emitted in ISO 8601 form without extra options
json_bytes = orjson.dumps(
    data,
    option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS,
)
# Dataclass serialization (native support)
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class User:
    name: str
    created: datetime

orjson.dumps(User("Alice", datetime.now(timezone.utc)))  # works directly
YAML
Use ruamel.yaml for all YAML operations.
- Preserves comments and formatting on round-trip (read -> modify -> write)
- Full YAML 1.2 support (unlike PyYAML, which only supports YAML 1.1)
- Preferred over PyYAML
from ruamel.yaml import YAML
yaml = YAML()
yaml.preserve_quotes = True
# Reading
with open("config.yaml") as f:
    data = yaml.load(f)
# Writing (preserves comments and formatting)
with open("config.yaml", "w") as f:
    yaml.dump(data, f)
HTTP/REST Calls
Use httpx for all HTTP requests. Do not use requests.
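- Modern async-first design with sync support
- HTTP/1.1 by default; HTTP/2 is available with the httpx[http2] extra and http2=True on the client
- Connection pooling and keep-alive when a Client/AsyncClient is reused
- Type-annotated API
- Explicit timeouts are required by this guideline; httpx itself defaults to 5 seconds, unlike requests, which waits indefinitely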
import httpx
# Synchronous usage
response = httpx.get("https://api.example.com/data", timeout=10.0)
data = response.json()
# With client (recommended for multiple requests)
with httpx.Client(timeout=30.0) as client:
    response = client.post(
        "https://api.example.com/submit",
        json={"key": "value"},
    )

# Async usage (must run inside a coroutine, e.g. asyncio.run(fetch()))
async def fetch() -> httpx.Response:
    async with httpx.AsyncClient(timeout=30.0) as client:
        return await client.get("https://api.example.com/data")
Required patterns:
- Always specify timeout (never rely on defaults)
- Use Client/AsyncClient context managers for connection reuse
- Prefer async in async codebases
Database Access
Decision Matrix
| Use Case | Library | Notes |
|---|---|---|
| ODBC connections (any DB) | pyodbc | Most widely used, stable, works with any ODBC driver |
| MySQL (simple tasks) | mysql-connector-python | Official Oracle driver, pure Python, no external libs |
| MySQL (ORM / complex) | SQLAlchemy | Use mysql-connector-python as MySQL backend |
| Async database access | SQLAlchemy (async) | With asyncio extension + async driver |
Usage Patterns
# Direct ODBC connection (any database with ODBC driver)
import pyodbc
conn = pyodbc.connect(connection_string)
cursor = conn.cursor()
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
rows = cursor.fetchall()
conn.close()
# Direct MySQL connection (simple tasks)
import mysql.connector
conn = mysql.connector.connect(
    host="localhost", database="mydb", user="user", password="pass"
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
rows = cursor.fetchall()
conn.close()
# SQLAlchemy with mysql-connector-python backend (ORM / complex queries)
from sqlalchemy import create_engine
engine = create_engine(
    "mysql+mysqlconnector://user:pass@host/db",
    pool_pre_ping=True,
    pool_recycle=3600,
)
Rules
- Use pyodbc for ad-hoc or cross-database ODBC access
- Use mysql-connector-python directly for simple MySQL tasks
- Use SQLAlchemy (with mysql-connector-python as backend) for ORM, migrations, or complex queries
- Always use parameterized queries; never string-format SQL
- Always configure connection pooling for long-running applications
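The parameterized-query rule can be demonstrated with a small sketch. It uses stdlib sqlite3 purely as a stand-in so that it runs without a database server; the table, the data, and the find_user_names helper are hypothetical. The "?" placeholder matches pyodbc, while mysql-connector-python uses "%s".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)


def find_user_names(connection: sqlite3.Connection, name: str) -> list[str]:
    """Hypothetical helper: the driver binds the value, so it is never parsed as SQL."""
    cursor = connection.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in cursor.fetchall()]


malicious = "alice' OR '1'='1"

# Parameterized: the whole string is treated as a literal name, so nothing matches.
safe_rows = find_user_names(conn, malicious)

# String-formatted (shown only to illustrate the failure): the input rewrites the query.
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious}'"
unsafe_rows = sorted(row[0] for row in conn.execute(unsafe_sql).fetchall())

conn.close()
```

With the formatted query, the injected condition is always true and every row is returned.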
Testing
| Component | Library | Notes |
|---|---|---|
| Test framework | pytest | Preferred over unittest |
| Coverage | pytest-cov | Coverage reporting integrated with pytest |
| Mocking | unittest.mock | stdlib — no extra dependency needed |
| Async tests | pytest-asyncio | For testing async code |
| Parameterization | pytest built-in | @pytest.mark.parametrize |
| Fixtures | pytest built-in | Prefer fixtures over setUp/tearDown |
Rules:
- Use pytest style (functions + fixtures), not unittest style (classes + setUp/tearDown)
- Use pytest.raises for exception testing, not try/except
- Name test files test_*.py, test functions test_*
- Use conftest.py for shared fixtures
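A test module following these rules might look like the sketch below. The parse_port function and the default_port fixture are hypothetical stand-ins for real project code, and in a real project the function under test would be imported rather than defined in the test file.

```python
import pytest


def parse_port(value: str) -> int:
    """Hypothetical function under test: parse a TCP port number."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


@pytest.fixture
def default_port() -> int:
    """Shared setup lives in a fixture (move it to conftest.py to share across files)."""
    return 8080


def test_default_port_is_valid(default_port: int) -> None:
    assert parse_port(str(default_port)) == default_port


@pytest.mark.parametrize(
    ("raw", "expected"),
    [("80", 80), ("443", 443), ("65535", 65535)],
)
def test_parse_port_valid(raw: str, expected: int) -> None:
    assert parse_port(raw) == expected


@pytest.mark.parametrize("raw", ["0", "65536", "-1"])
def test_parse_port_rejects_out_of_range(raw: str) -> None:
    with pytest.raises(ValueError, match="out of range"):
        parse_port(raw)
```

Saved as test_ports.py (a hypothetical file name), this is collected automatically when pytest is run.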
Compression (gzip alternatives)
Use high-performance compression libraries instead of stdlib gzip.
| Use Case | Library | Why |
|---|---|---|
| Web transport | isal (ISA-L / igzip) | Intel-optimized, 2-4x faster compression/decompression |
| Storage | libdeflate | Higher compression ratios, optimized for single-shot ops |
# Web transport: ISA-L (igzip) — streaming, fast
import isal.igzip as igzip
# Compress for HTTP response
compressed = igzip.compress(data, compresslevel=1) # level 0-3
# Decompress incoming
decompressed = igzip.decompress(compressed)
# Streaming with file-like objects
with igzip.open("file.gz", "wb") as f:
    f.write(data)

# Storage: libdeflate (single-shot, high ratio)
import libdeflate

# Compress for archival (levels 1-12, default 6)
compressor = libdeflate.Compressor(compression_level=9)
compressed = compressor.compress(data)

# Decompress (the decompressed size must be known up front)
decompressor = libdeflate.Decompressor()
decompressed = decompressor.decompress(compressed, len(data))
Selection guide:
- Streaming / chunked data -> isal.igzip
- HTTP compression -> isal.igzip (compatible with standard gzip)
- Archive / backup -> libdeflate (better ratios)
- Unknown decompressed size -> isal.igzip (handles streaming)
Date & Time
Use datetime (stdlib) with timezone awareness enforced. Use zoneinfo (stdlib) for named timezones.
- Always use timezone-aware datetimes; never naive datetimes
- Use zoneinfo (Python 3.9+) for named timezones; do not use pytz. On Windows, install the tzdata package, because no system timezone database is available there
- Store/transmit as UTC; convert to local only for display
from datetime import datetime, timezone
from zoneinfo import ZoneInfo
# Always timezone-aware
now_utc = datetime.now(timezone.utc)
now_berlin = datetime.now(ZoneInfo("Europe/Berlin"))
# Convert between timezones
local_time = now_utc.astimezone(ZoneInfo("Europe/Berlin"))
# Parse ISO format
parsed = datetime.fromisoformat("2025-01-15T10:30:00+01:00")
# Format for storage/transmission (ISO 8601 UTC)
timestamp_str = now_utc.isoformat()
Anti-patterns
- datetime.now() without timezone (creates naive datetime)
- Using pytz (replaced by zoneinfo in stdlib since Python 3.9)
- Storing naive datetimes in databases
- Ordering comparisons between naive and aware datetimes (raise TypeError; == silently returns False)
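A small guard at module boundaries can enforce the rules above. The to_utc helper below is a hypothetical sketch, and a fixed +01:00 offset is used instead of a named zone so that it runs even where no timezone database is installed.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value: datetime) -> datetime:
    """Hypothetical guard: reject naive datetimes, normalize aware ones to UTC."""
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError("naive datetime is not allowed; attach a timezone first")
    return value.astimezone(timezone.utc)


aware = datetime(2025, 1, 15, 10, 30, tzinfo=timezone(timedelta(hours=1)))
naive = datetime(2025, 1, 15, 10, 30)

normalized = to_utc(aware)

# Ordering a naive against an aware datetime fails at runtime.
try:
    _ = naive < aware
    comparison_failed = False
except TypeError:
    comparison_failed = True

# The guard turns the silent mistake into an explicit error.
try:
    to_utc(naive)
    naive_rejected = False
except ValueError:
    naive_rejected = True
```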
Directory & Filepath Handling
Always use pathlib.Path and ensure cross-platform compatibility.
Critical Rule: Accept Linux-style paths (//share/directory, /path/to/file) even on Windows.
- UNC paths (//server/share) must work on all platforms
- Never use os.path for new code; prefer pathlib
- Use Path.as_posix() when storing/transmitting paths
- Use forward slashes (/) in configuration files and APIs
from pathlib import Path
import sys
def normalize_path(path_str: str) -> Path:
    """Convert any path string to a native Path, preserving UNC paths.

    Args:
        path_str: Path string (Linux or Windows style).

    Returns:
        Native Path object.

    Examples:
        >>> normalize_path('//server/share/file.txt')  # UNC path
        >>> normalize_path('/home/user/file.txt')  # Linux path
    """
    # On Windows, convert forward-slash UNC paths to native backslash form
    if sys.platform == "win32" and path_str.startswith("//"):
        return Path(path_str.replace("/", "\\"))
    return Path(path_str)

# For storage/transmission, always use POSIX style
def path_to_string(path: Path) -> str:
    """Convert Path to portable string representation."""
    return path.as_posix()

# Accept both styles in function signatures
def process_file(file_path: str | Path) -> None:
    path = Path(file_path) if isinstance(file_path, str) else file_path
    # ... process
Configuration & API Guidelines:
- Store paths as POSIX strings (/path/to/file, //server/share)
- Parse with Path() or normalize_path() at runtime
- Never hardcode backslashes in source code or configs
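The round trip between stored POSIX strings and native paths can be checked on any operating system with pathlib's pure path classes. This is a minimal sketch; the share and file names are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A UNC path as it would be stored in a config file (hypothetical share).
stored = "//server/share/reports/2025.csv"

# Interpreted with Windows semantics, the forward-slash form is a UNC path.
as_windows = PureWindowsPath(stored)
windows_native = str(as_windows)         # backslash form used by Windows APIs
windows_roundtrip = as_windows.as_posix()  # back to the stored form

# Interpreted with POSIX semantics, the two leading slashes are preserved.
posix_roundtrip = str(PurePosixPath(stored))

# A drive-letter path also converts to forward slashes for storage.
drive_stored = PureWindowsPath(r"C:\data\in.csv").as_posix()
```

Pure paths never touch the file system, which makes them convenient for testing Windows behavior on Linux CI runners and vice versa.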
.env File Reading
Use python-dotenv for loading .env files.
- Use find_dotenv() to walk up parent directories automatically
- Do not commit .env files to version control
from dotenv import load_dotenv, find_dotenv
# find_dotenv() walks up directories to find .env
load_dotenv(find_dotenv(usecwd=True))
# Or with explicit path
load_dotenv("/path/to/project/.env")
# Access loaded values
import os
db_host = os.getenv("DB_HOST", "localhost")
Integration with Pydantic BaseSettings (pip install pydantic-settings):
from pydantic_settings import BaseSettings, SettingsConfigDict

class AppSettings(BaseSettings):
    db_host: str = "localhost"
    db_port: int = 5432
    debug: bool = False

    # SettingsConfigDict (not pydantic.ConfigDict) carries the env_file options
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

settings = AppSettings()  # auto-reads .env + environment variables
Configuration Management
Use lib_layered_config for application configuration.
- Supports layered config: defaults -> config file -> environment variables -> CLI args
- Integrates with lib_log_rich for logging setup (see Logging section)
For simpler cases or libraries, use pydantic_settings.BaseSettings (see .env section above).
Subprocess / External Commands
Use subprocess.run (stdlib) for running external commands.
- Always use list form for arguments, not shell=True
- Set check=True to raise on non-zero exit codes
- Set capture_output=True when you need stdout/stderr
- Set timeout to prevent hanging
import subprocess
result = subprocess.run(
    ["git", "status", "--porcelain"],
    capture_output=True,
    text=True,
    check=True,
    timeout=30,
)
print(result.stdout)
Anti-patterns
- shell=True with unsanitized input (command injection risk)
- os.system() (no output capture, no error handling)
- Missing timeout on external calls
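Both failure modes that check=True and timeout introduce should be handled by the caller. The run_command wrapper below is a hypothetical sketch of one way to do that; it invokes the current Python interpreter so that the example runs on any platform.

```python
import subprocess
import sys


def run_command(args: list[str], timeout: float = 30.0) -> str:
    """Hypothetical wrapper: run a command in list form and return its stdout."""
    try:
        result = subprocess.run(
            args,
            capture_output=True,
            text=True,
            check=True,
            timeout=timeout,
        )
    except subprocess.CalledProcessError as exc:
        # Raised by check=True when the exit code is non-zero.
        raise RuntimeError(
            f"{args[0]} exited with code {exc.returncode}: {exc.stderr.strip()}"
        ) from exc
    except subprocess.TimeoutExpired as exc:
        # Raised when the process runs longer than the timeout.
        raise RuntimeError(f"{args[0]} timed out after {exc.timeout} seconds") from exc
    return result.stdout


output = run_command([sys.executable, "-c", "print('hello')"])
```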
Type Checking
Use mypy for static type checking.
- Run as part of CI/CD pipeline
- Use strict mode (--strict) for new projects
- Use py.typed marker file for typed libraries (PEP 561)
# pyproject.toml
[tool.mypy]
python_version = "3.10"  # set to the project's minimum supported version (StrEnum requires 3.11)
strict = true
warn_return_any = true
warn_unused_configs = true