Structured Logging Skill
Overview
This skill provides JSON-formatted logging for audit trails, debugging, and compliance monitoring. Logs are written as newline-delimited JSON to daily files (logs/YYYY-MM-DD.json) with automatic 30-day retention.
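The storage layout described above can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation: `append_entry` is a hypothetical helper, and the use of UTC for both the file name and the timestamp is an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_entry(log_dir: str, entry: dict) -> Path:
    """Append one JSON object as a single line to today's log file (hypothetical helper)."""
    now = datetime.now(timezone.utc)  # assumption: daily files roll over at UTC midnight
    path = Path(log_dir) / f"{now:%Y-%m-%d}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"), **entry}
    # Append mode keeps earlier entries; one json.dumps() call yields exactly one line
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path
```

Because each entry is a complete JSON object on its own line, files can be appended to without rewriting earlier entries and read back line by line.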
When to Use
Use this skill to:
- Log agent invocations and iterations
- Track quality metrics over time
- Record errors with full stack traces
- Monitor execution times and performance
- Create audit trails for compliance
- Debug agent behavior and refinement loops
Installation
IMPORTANT: This skill has its own isolated virtual environment (.venv) managed by uv. Do NOT use system Python.
Initialize the skill's environment:
# From the skill directory
cd .agent/skills/structured-logging
uv sync  # Creates .venv (no external dependencies, uses Python stdlib)
No external dependencies - uses Python standard library.
Usage
CRITICAL: Always use uv run to execute code with this skill's .venv, NOT system Python.
Initialize Logger
# From .agent/skills/structured-logging/ directory
# Run with: uv run python -c "..."
from structured_logging import StructuredLogger
# Initialize with defaults
logger = StructuredLogger(
    log_dir="logs",      # Directory for log files
    retention_days=30    # Auto-delete logs older than 30 days
)
Log Agent Operations
from src.models.evaluator_schema import QualityMetrics
# Log successful operation
logger.log(
    log_level="INFO",
    agent_or_skill_name="summary_subagent",
    operation_type="invoke",
    input_summary="Clinical note: 2500 words, cardiology",
    output_summary="Summary generated: 5 key problems, 12 citations",
    execution_time_ms=45000,
    quality_metrics=QualityMetrics(
        citation_coverage=0.92,
        hallucination_rate=0.03,
        jaccard_overlap=0.75
    )
)
Log Errors
from src.models.evaluator_schema import ErrorDetails
# Log error with full context
logger.log(
    log_level="ERROR",
    agent_or_skill_name="ollama_client",
    operation_type="error",
    input_summary="Prompt: Generate clinical summary...",
    execution_time_ms=300500,
    error_details=ErrorDetails(
        error_reference_id="ERR-2025-A3F",
        stack_trace="Traceback (most recent call last)...",
        context={"model": "phi4:14b", "timeout": 300},
        file_paths=["src/skills/ollama_client.py"]
    )
)
Log Iterative Refinement
# Log each iteration in the refinement loop.
# metrics_pass, iteration_time, and current_metrics are placeholders for values
# produced by the refinement and evaluation step of each iteration.
for iteration in range(1, 6):
    logger.log(
        log_level="INFO",
        agent_or_skill_name="main_orchestrator",
        operation_type="iterate",
        input_summary=f"Iteration {iteration}: Refining based on evaluator feedback",
        output_summary=f"Status: {'pass' if metrics_pass else 'fail'}",
        execution_time_ms=iteration_time,
        quality_metrics=current_metrics
    )
Read and Filter Logs
from datetime import datetime

# Read today's logs
entries = logger.read_logs()

# Read specific date
entries = logger.read_logs(date=datetime(2025, 10, 24))

# Filter by log level
errors = logger.read_logs(log_level="ERROR")

# Filter by agent
agent_logs = logger.read_logs(agent_name="evaluator_agent")
Cleanup Old Logs
# Manually trigger cleanup (also runs automatically)
deleted_count = logger.cleanup_old_logs()
print(f"Deleted {deleted_count} expired log files")
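For readers who want to see what date-based retention involves, here is a minimal sketch. It is not the code behind `cleanup_old_logs()`: the function name `cleanup_old_logs_sketch`, the `today` parameter, and the choice to keep a file dated exactly `retention_days` ago are all assumptions made for illustration.

```python
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import Optional


def cleanup_old_logs_sketch(
    log_dir: str, retention_days: int = 30, today: Optional[date] = None
) -> int:
    """Delete YYYY-MM-DD.json files older than retention_days; return the number removed."""
    today = today or date.today()
    cutoff = today - timedelta(days=retention_days)
    deleted = 0
    for path in Path(log_dir).glob("*.json"):
        try:
            file_date = datetime.strptime(path.stem, "%Y-%m-%d").date()
        except ValueError:
            continue  # Not named like a daily log file; leave it alone
        if file_date < cutoff:  # assumption: a file dated exactly at the cutoff is kept
            path.unlink()
            deleted += 1
    return deleted
```

Deriving the age from the file name rather than the file's modification time keeps the result stable even if old files are copied or touched.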
Log Format
File Path: logs/YYYY-MM-DD.json
Format: Newline-delimited JSON (one entry per line)
Example Entry:
{
  "timestamp": "2025-10-24T14:30:22Z",
  "log_level": "INFO",
  "agent_or_skill_name": "summary_subagent",
  "operation_type": "invoke",
  "input_summary": "Clinical note: 2500 words",
  "output_summary": "Summary: 5 problems, 12 citations",
  "execution_time_ms": 45000,
  "quality_metrics": {
    "citation_coverage": 0.92,
    "hallucination_rate": 0.03,
    "jaccard_overlap": 0.75
  },
  "error_details": null
}
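The example entry is shown across several lines for readability, while the file format stores each entry on one line. The sketch below shows how the same entry collapses to a single line with the standard `json` module; `to_ndjson_line` is a hypothetical helper name, not part of the skill.

```python
import json


def to_ndjson_line(entry: dict) -> str:
    """Serialize one entry to a single line, as it would sit in a daily log file."""
    # Without indent=, json.dumps emits no line breaks; newlines inside strings are escaped
    return json.dumps(entry, ensure_ascii=False)


entry = {
    "timestamp": "2025-10-24T14:30:22Z",
    "log_level": "INFO",
    "agent_or_skill_name": "summary_subagent",
    "operation_type": "invoke",
    "input_summary": "Clinical note: 2500 words",
    "output_summary": "Summary: 5 problems, 12 citations",
    "execution_time_ms": 45000,
    "quality_metrics": {
        "citation_coverage": 0.92,
        "hallucination_rate": 0.03,
        "jaccard_overlap": 0.75,
    },
    "error_details": None,  # Serialized as JSON null
}

line = to_ndjson_line(entry)
restored = json.loads(line)  # Round-trips back to the same dictionary
```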
Querying Logs with jq
# Show all errors from today
cat logs/$(date +%Y-%m-%d).json | jq 'select(.log_level == "ERROR")'

# Show quality metrics for iterations ≥3 (assumes entries carry an iteration_number field)
cat logs/*.json | jq 'select(.operation_type == "iterate" and .iteration_number >= 3) | .quality_metrics'

# Find error by reference ID
cat logs/*.json | jq 'select(.error_details.error_reference_id == "ERR-2025-A3F")'

# Calculate average execution time (-s slurps all entries into one array first)
cat logs/$(date +%Y-%m-%d).json | jq -s 'map(.execution_time_ms) | add / length'
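The same kind of query can be run from Python when jq is not available. This is a sketch that reads the files directly rather than going through `read_logs()`; `iter_entries` and `average_execution_ms` are hypothetical helper names.

```python
import json
from pathlib import Path
from statistics import mean
from typing import Iterable, Iterator


def iter_entries(log_dir: str, pattern: str = "*.json") -> Iterator[dict]:
    """Yield every entry from the matching newline-delimited JSON files, oldest file first."""
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():  # Skip blank lines
                    yield json.loads(line)


def average_execution_ms(entries: Iterable[dict]) -> float:
    """Average execution_time_ms over entries that have a value; 0.0 if none do."""
    times = [
        e["execution_time_ms"]
        for e in entries
        if e.get("execution_time_ms") is not None
    ]
    return mean(times) if times else 0.0
```

For example, `average_execution_ms(iter_entries("logs", "2025-10-24.json"))` would average a single day, and a list comprehension over `iter_entries("logs")` can filter on `log_level` or `agent_or_skill_name`.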
Best Practices
- Sanitize PHI: Never log actual clinical content - use summaries only
- Include Execution Time: Always track performance metrics
- Use Error References: Generate unique error IDs (ERR-YYYY-NNN) for user-facing messages
- Log Quality Metrics: Track citation coverage, hallucination rate, Jaccard overlap
- Rotation: Rely on daily rotation, not manual log management
- Retention: 30 days is suitable for debugging and compliance
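Two of these practices, PHI-free summaries and error references, lend themselves to small helpers. The sketch below is illustrative only: `summarize_input` and `new_error_reference` are hypothetical names, and the three-character uppercase hex suffix is an assumption modeled on the ERR-2025-A3F example used elsewhere in this document.

```python
import secrets
from datetime import datetime, timezone
from typing import Optional


def summarize_input(text: str, label: str = "Clinical note") -> str:
    """Describe the input by size only, so no clinical content reaches the log."""
    return f"{label}: {len(text.split())} words"


def new_error_reference(now: Optional[datetime] = None) -> str:
    """Build an ID shaped like ERR-2025-A3F (suffix format is an assumption)."""
    now = now or datetime.now(timezone.utc)
    suffix = secrets.token_hex(2)[:3].upper()  # 3 hex characters, e.g. "A3F"
    return f"ERR-{now.year}-{suffix}"
```

A three-character suffix allows only 4096 distinct values per year, so a real implementation that needs guaranteed uniqueness would want a longer suffix or a collision check.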
Integration with Agents
All agents and skills should log:
- Start: Before operation begins (input summary)
- End: After operation completes (output summary, execution time)
- Errors: With full stack trace and error reference ID
- Metrics: Quality scores for validation operations
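One way to apply the start, end, and error points consistently is a context manager wrapped around each operation. The sketch below is an assumption-laden illustration: `logged_operation` is a hypothetical helper, `RecordingLogger` is a stand-in that only records calls so the example runs on its own, and `error_details` is passed as a plain dict where the real logger takes an `ErrorDetails` object.

```python
import time
import traceback
from contextlib import contextmanager


class RecordingLogger:
    """Stand-in for StructuredLogger that only records log() calls (illustration only)."""

    def __init__(self):
        self.calls = []

    def log(self, **fields):
        self.calls.append(fields)


@contextmanager
def logged_operation(logger, name: str, input_summary: str):
    """Log the start, then either the end with timing or an ERROR entry if the body raises."""
    logger.log(log_level="INFO", agent_or_skill_name=name,
               operation_type="invoke", input_summary=input_summary)
    started = time.perf_counter()
    result = {}  # The body stores its output summary here
    try:
        yield result
    except Exception:
        logger.log(
            log_level="ERROR", agent_or_skill_name=name, operation_type="error",
            input_summary=input_summary,
            execution_time_ms=int((time.perf_counter() - started) * 1000),
            # Plain dict for illustration; the real logger expects ErrorDetails
            error_details={"stack_trace": traceback.format_exc()},
        )
        raise  # Logging must not swallow the failure
    else:
        logger.log(
            log_level="INFO", agent_or_skill_name=name, operation_type="invoke",
            input_summary=input_summary,
            output_summary=result.get("output_summary"),
            execution_time_ms=int((time.perf_counter() - started) * 1000),
        )
```

Inside the `with` block, the caller sets `result["output_summary"]` once the operation finishes; the exception is re-raised after logging so callers still see the original error.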
Implementation
See structured_logging.py for the full Python implementation.