Observability Skill
Comprehensive guide to logging, debugging, profiling, and performance monitoring in Python applications.
When This Skill Activates
- •Adding logging to code
- •Debugging production issues
- •Profiling performance bottlenecks
- •Monitoring application metrics
- •Analyzing stack traces
- •Performance optimization
- •Keywords: "logging", "debug", "profiling", "performance", "monitoring"
Core Concepts
1. Structured Logging
Structured logging with JSON format for machine-readable logs and rich context.
Why Structured Logging?
- •Machine-parseable (easy to search, filter, aggregate)
- •Context-rich (attach metadata to log entries)
- •Consistent format across services
Key Features:
- •JSON-formatted logs
- •Log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- •Context logging with extra metadata
- •Best practices for meaningful logs
Example:
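import logging

logger = logging.getLogger(__name__)

# Fields passed via `extra` become attributes on the log record. They show up
# in the output only if the handler's formatter (e.g. a JSON formatter) emits them.
logger.info("User action", extra={
    "user_id": 123,
    "action": "login",
    "ip": "192.168.1.1",
})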
See: docs/structured-logging.md for Python logging setup and patterns
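The call above attaches context to the record, but the standard library does not ship a JSON formatter. The following is a minimal sketch of one, assuming one JSON object per line is the desired format; the class name `JsonFormatter`, the output field names, and the logger name are illustrative assumptions, not something defined by this skill.

```python
import json
import logging

# Attribute names present on every LogRecord; anything else came from `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy the fields that were passed via `extra=`.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-serializable values from raising.
        return json.dumps(payload, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("observability_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("User action", extra={"user_id": 123, "action": "login"})
```

Deriving the standard attribute set from a throwaway record, rather than hard-coding it, keeps the sketch working across Python versions that add new record attributes.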
2. Debugging Techniques
Interactive debugging with pdb/ipdb and effective debugging strategies.
Tools:
- •Print debugging - Quick and simple
- •pdb - Python's built-in debugger
- •ipdb - IPython-enhanced debugger
- •Post-mortem debugging - Debug after crash
pdb Commands:
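- •n (next) - Execute the current line, stepping over function calls
- •s (step) - Step into the function called on the current line
- •c (continue) - Continue execution until the next breakpoint
- •p variable - Print a variable's value
- •l (list) - List source code around the current line
- •q (quit) - Quit the debugger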
Example:
import pdb; pdb.set_trace() # Debugger starts here
See: docs/debugging.md for interactive debugging patterns
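Post-mortem debugging, listed under Tools above, can also be wired in once for a whole script. This is a sketch under the assumption that it runs only in development; the helper names `post_mortem_hook` and `install_post_mortem_hook` are hypothetical.

```python
import pdb
import sys
import traceback


def post_mortem_hook(exc_type, exc_value, tb):
    """Print the traceback, then open pdb at the frame that raised."""
    traceback.print_exception(exc_type, exc_value, tb)
    # Start the debugger only when a human is attached to the terminal.
    interactive = (
        sys.stdin is not None
        and sys.stdin.isatty()
        and sys.stderr is not None
        and sys.stderr.isatty()
    )
    if interactive:
        pdb.post_mortem(tb)


def install_post_mortem_hook():
    """Hypothetical helper: route uncaught exceptions to post_mortem_hook."""
    sys.excepthook = post_mortem_hook
```

Calling `install_post_mortem_hook()` at the top of a development entry point is one option; running the script with `python -m pdb script.py` gives similar behavior without code changes, since pdb enters post-mortem mode when the program exits abnormally.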
3. Profiling
CPU and memory profiling to identify performance bottlenecks.
Tools:
- •cProfile - CPU profiling (built-in)
- •line_profiler - Line-by-line CPU profiling
- •memory_profiler - Memory usage analysis
- •py-spy - Sampling profiler (no code changes)
cProfile Example:
python -m cProfile -s cumulative script.py
Profile Decorator:
import cProfile
import functools
import pstats

def profile(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        try:
            result = func(*args, **kwargs)
        finally:
            profiler.disable()  # stop profiling even if func raises
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        stats.print_stats(10)  # Top 10 functions
        return result
    return wrapper

@profile
def slow_function():
    # Your code here
    pass
See: docs/profiling.md for comprehensive profiling techniques
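When the report should go to a logger or a file rather than stdout, `pstats.Stats` accepts a `stream` argument. The sketch below captures the report as a string; `profile_to_text` and `sum_of_squares` are hypothetical names used only for illustration.

```python
import cProfile
import io
import pstats


def profile_to_text(func, *args, limit=10, **kwargs):
    """Hypothetical helper: run func under cProfile, return (result, report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    # Write the report into the buffer instead of stdout.
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


def sum_of_squares(n):
    return sum(i * i for i in range(n))


result, report = profile_to_text(sum_of_squares, 10_000)
```

The returned `report` string can then be passed to `logger.debug(...)` or written to a file alongside other diagnostics.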
4. Monitoring & Metrics
Performance monitoring, timing decorators, and simple metrics.
Timing Patterns:
- •Timing decorator - Measure function execution time
- •Context manager timer - Measure code block duration
- •Performance assertions - Fail if too slow
Simple Metrics:
- •Counters - Track event occurrences
- •Histograms - Track value distributions
Example:
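import logging
import time
from functools import wraps

logger = logging.getLogger(__name__)

def timer(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic clock intended for measuring intervals
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        # Log instead of print() so timings follow the same path as other logs
        logger.info("%s took %.2fs", func.__name__, duration)
        return result
    return wrapper

@timer
def process_data():
    # Your code here
    pass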
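The other two timing patterns listed above, a context manager timer and a performance assertion, can be combined in one helper. This is a minimal sketch; the name `timed` and its `max_seconds` parameter are assumptions made for illustration.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def timed(label, max_seconds=None):
    """Hypothetical helper: log a block's duration; optionally fail if too slow."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs whether or not the block raised.
        duration = time.perf_counter() - start
        logger.info("%s took %.3fs", label, duration)
    # Reached only when the block itself did not raise.
    if max_seconds is not None and duration > max_seconds:
        raise AssertionError(
            f"{label} took {duration:.3f}s (limit {max_seconds}s)"
        )


with timed("build list", max_seconds=5.0):
    squares = [i * i for i in range(1000)]
```

Placing the assertion after the `finally` block means a slow block is reported as too slow only when it otherwise succeeded, so the timing check never masks the block's own exception.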
See: docs/monitoring-metrics.md for stack traces, timers, and metrics
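For the counters and histograms mentioned under Simple Metrics, an in-process registry is often enough to start with. The sketch below assumes a single process and no export backend; the `Metrics` class and its method names are hypothetical.

```python
import statistics
from collections import Counter, defaultdict


class Metrics:
    """Hypothetical in-process registry, not a replacement for a metrics backend."""

    def __init__(self):
        self.counters = Counter()
        self.histograms = defaultdict(list)

    def increment(self, name, amount=1):
        """Count one or more occurrences of an event."""
        self.counters[name] += amount

    def observe(self, name, value):
        """Record one sample for a value distribution."""
        self.histograms[name].append(value)

    def summary(self, name):
        """Summarize the recorded samples for one histogram."""
        values = self.histograms.get(name, [])
        if not values:
            return {"count": 0}
        return {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }


metrics = Metrics()
for latency_ms in (12, 20, 28):
    metrics.increment("requests")
    metrics.observe("latency_ms", latency_ms)
```

Logging `metrics.summary("latency_ms")` once after a loop gives the same information as a log line per iteration, with far less noise.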
5. Best Practices & Anti-Patterns
Debugging strategies and logging anti-patterns to avoid.
Debugging Best Practices:
- •Binary Search Debugging - Narrow down the problem area
- •Rubber Duck Debugging - Explain the problem to someone (or something)
- •Add Assertions - Catch bugs early
- •Simplify and Isolate - Reproduce with minimal code
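Binary search debugging can be made mechanical when the candidates are ordered and the failure, once introduced, persists. The sketch below assumes exactly that; `first_failing` and the version numbers are hypothetical, and for commits `git bisect` automates the same idea.

```python
def first_failing(items, is_good):
    """Return the index of the first item for which is_good() is False.

    Assumes items[0] is good, items[-1] is bad, and that every item after
    the first bad one is also bad (as with an ordered list of commits).
    """
    low, high = 0, len(items) - 1  # invariant: items[low] good, items[high] bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(items[mid]):
            low = mid
        else:
            high = mid
    return high


# Hypothetical scenario: versions 1..100, regression introduced in version 37.
versions = list(range(1, 101))
index = first_failing(versions, lambda version: version < 37)
culprit = versions[index]
```

Each check halves the remaining range, so 100 candidates need at most 7 checks instead of up to 100.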
Logging Anti-Patterns to Avoid:
- •Logging sensitive data (passwords, tokens)
- •Logging on every loop iteration (count events and log a summary instead)
- •No context in error logs
- •Inconsistent log formats
- •Too verbose logging (noise)
See: docs/best-practices-antipatterns.md for detailed strategies
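One way to guard against the first anti-pattern is a logging filter that masks known sensitive fields before any handler formats them. This is a sketch under stated assumptions: the key names in `SENSITIVE_KEYS` and the class `RedactSensitiveFilter` are hypothetical, and the filter only covers fields passed via `extra=`, not secrets interpolated into the message text.

```python
import logging

SENSITIVE_KEYS = {"password", "token", "api_key"}  # assumed field names


class RedactSensitiveFilter(logging.Filter):
    """Hypothetical filter: masks sensitive attributes supplied via `extra=`."""

    def filter(self, record):
        for key in SENSITIVE_KEYS:
            if hasattr(record, key):
                setattr(record, key, "[REDACTED]")
        return True  # keep the record; only its sensitive fields are masked


handler = logging.StreamHandler()
# Attached to the handler so it also covers records from child loggers.
handler.addFilter(RedactSensitiveFilter())
```

A filter like this is a safety net rather than a substitute for not passing secrets to the logger in the first place.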
Quick Reference
| Tool | Use Case | Details |
|---|---|---|
| Structured Logging | Production logs | docs/structured-logging.md |
| pdb/ipdb | Interactive debugging | docs/debugging.md |
| cProfile | CPU profiling | docs/profiling.md |
| line_profiler | Line-by-line profiling | docs/profiling.md |
| memory_profiler | Memory analysis | docs/profiling.md |
| Timer decorator | Function timing | docs/monitoring-metrics.md |
| Context timer | Code block timing | docs/monitoring-metrics.md |
Logging Cheat Sheet
import logging
# Setup
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
# Usage
logger.debug("Debug message") # Detailed diagnostic
logger.info("Info message") # General information
logger.warning("Warning message") # Warning (recoverable)
logger.error("Error message") # Error (handled)
logger.critical("Critical message") # Critical (unrecoverable)
# With context
logger.info("User action", extra={"user_id": 123, "action": "login"})
Debugging Cheat Sheet
# pdb
import pdb; pdb.set_trace()
# ipdb (enhanced)
import ipdb; ipdb.set_trace()
# Post-mortem (debug after crash)
import pdb, sys
try:
    # Your code
    pass
except Exception:
    pdb.post_mortem(sys.exc_info()[2])
Profiling Cheat Sheet
# CPU profiling
python -m cProfile -s cumulative script.py
# Line profiling
kernprof -l -v script.py
# Memory profiling
python -m memory_profiler script.py
# Sampling profiler (no code changes)
py-spy top --pid 12345
Progressive Disclosure
This skill uses progressive disclosure to prevent context bloat:
- •Index (this file): High-level concepts and quick reference (<500 lines)
- •Detailed docs: docs/*.md files with implementation details (loaded on demand)
Available Documentation:
- •docs/structured-logging.md - Logging setup, levels, JSON format, best practices
- •docs/debugging.md - Print debugging, pdb/ipdb, post-mortem debugging
- •docs/profiling.md - cProfile, line_profiler, memory_profiler, py-spy
- •docs/monitoring-metrics.md - Stack traces, timing patterns, simple metrics
- •docs/best-practices-antipatterns.md - Debugging strategies and logging anti-patterns
Cross-References
Related Skills:
- •error-handling-patterns - Error handling best practices
- •python-standards - Python coding conventions
- •testing-guide - Testing and debugging strategies
- •performance-optimization - Performance tuning techniques
Related Tools:
- •Python logging - Standard library logging module
- •pdb/ipdb - Interactive debuggers
- •cProfile - CPU profiling
- •memory_profiler - Memory analysis
- •py-spy - Sampling profiler
Key Takeaways
- •Use structured logging - JSON format for machine-readable logs
- •Log at appropriate levels - DEBUG < INFO < WARNING < ERROR < CRITICAL
- •Include context - Add metadata to logs (user_id, request_id, etc.)
- •Don't log sensitive data - Passwords, tokens, PII
- •Use pdb/ipdb for debugging - Interactive debugging is powerful
- •Profile before optimizing - Measure to find real bottlenecks
- •Use cProfile for CPU profiling - Identify slow functions
- •Use line_profiler for line-level profiling - Fine-grained analysis
- •Use memory_profiler for memory leaks - Track memory usage
- •Time critical sections - Decorator or context manager
- •Binary search debugging - Narrow down problem area
- •Simplify and isolate - Reproduce with minimal code
Hard Rules
FORBIDDEN:
- •Logging sensitive data (passwords, tokens, API keys) at any level
- •Using print() for production logging (MUST use structured logging)
- •Swallowing exceptions silently without logging
- •except Exception: (or except Exception as e:) without a subsequent raise or logging.exception() / logger.error(..., exc_info=True)
- •Bare except: pass - discards the exception with zero handling
- •except Exception: pass - syntactically explicit but just as silent as bare except: pass (it only spares BaseException subclasses such as KeyboardInterrupt)
- •contextlib.suppress() wrapping error-critical operations without an inline justification comment
- •finally blocks that contain return, break, or continue - these suppress any pending exception from the try body
REQUIRED (compliant exception handling MUST use at least one of):
- •Re-raise: After logging, call raise (bare) or raise NewError(...) from original_exc to propagate the exception
- •Log with exc_info: logger.error("Operation failed", exc_info=True) or logging.exception("Operation failed") - preserves the full stack trace in the logs
- •contextlib.suppress() with justification: Acceptable ONLY for genuinely non-critical cleanup operations; MUST include an inline comment explaining why suppression is safe
# COMPLIANT: re-raise after logging
try:
    process(data)
except ValueError as exc:
    logger.error("Invalid data: %s", exc, exc_info=True)
    raise

# COMPLIANT: log with exc_info (caller gets full stack trace in logs)
try:
    send_metric(value)
except ExternalServiceError:
    logger.exception("Metric send failed; continuing without metric")

# COMPLIANT: contextlib.suppress with justification
with contextlib.suppress(FileNotFoundError):
    # Optional cache file; absence is expected on first run
    cache_path.unlink()

# NON-COMPLIANT: silent swallow
try:
    critical_operation()
except Exception:
    pass  # FORBIDDEN

# NON-COMPLIANT: log without exc_info and without re-raise
try:
    critical_operation()
except Exception as e:
    logger.error("Failed: %s", e)  # FORBIDDEN: no stack trace, exception swallowed
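The snippets above use placeholder names. For the `raise NewError(...) from original_exc` form of the re-raise rule, a self-contained sketch may help; `ConfigError` and `parse_port` are hypothetical names chosen for illustration.

```python
import logging

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Hypothetical domain error used for illustration."""


def parse_port(raw):
    """Convert a raw config value to an int, wrapping parse failures."""
    try:
        return int(raw)
    except ValueError as exc:
        # Log with the stack trace, then propagate a domain-level error.
        logger.error("Invalid port %r", raw, exc_info=True)
        # `from exc` stores the original exception on __cause__.
        raise ConfigError(f"invalid port: {raw!r}") from exc
```

Because the original `ValueError` is kept on `__cause__`, the traceback shown to the caller includes both the domain error and the low-level failure that triggered it.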