Run Evaluation Skill
Perform a comprehensive post-mortem analysis of the latest Universal Agent run.
Workflow
Step 1: Identify the Latest Session
Find the most recent session directory:
ls -lt /home/kjdragan/lrepos/universal_agent/AGENT_RUN_WORKSPACES/ | grep session_ | head -1
Extract the session path (e.g., /home/kjdragan/lrepos/universal_agent/AGENT_RUN_WORKSPACES/session_20260115_094820).
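The same lookup can be sketched in Python for cases where the evaluation is scripted rather than run by hand. This is a minimal sketch, not part of the skill itself: the helper name `latest_session` is hypothetical, and it assumes that modification time is the right ordering, as `ls -lt` does.

```python
from pathlib import Path
from typing import Optional


def latest_session(workspaces: Path) -> Optional[Path]:
    """Return the most recently modified session_* directory, or None."""
    # Keep only directories whose name starts with "session_".
    sessions = [p for p in workspaces.glob("session_*") if p.is_dir()]
    # Pick the newest by modification time, mirroring `ls -lt | head -1`.
    return max(sessions, key=lambda p: p.stat().st_mtime, default=None)
```

Calling `latest_session(Path("/home/kjdragan/lrepos/universal_agent/AGENT_RUN_WORKSPACES"))` would return the session path to use as {session_dir} in the steps that follow.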
Step 2: Read the Run Log
Load the full run log for context:
cat {session_dir}/run.log
This contains the complete terminal output including:
- Tool calls and responses
- Error messages
- Timing information
- Agent decisions
Step 3: Extract Key Metrics from run.log
Parse the log for:
- Tool call count: Count occurrences of 🔧 [
- Error indicators: Search for Error, Failed, Exception, ❌
- Timing: Look at +Xs timestamps for latency
- Retries/deduplication: Search for Idempotent, retry, deduped
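The parsing above could be sketched roughly as follows. The marker strings come from the list above; the function name `extract_metrics`, the returned keys, and the regular expression for the +Xs marker (assumed to look like "+12.3s") are illustrative assumptions, since the exact log format is not shown here.

```python
import re
from collections import Counter

ERROR_MARKERS = ("Error", "Failed", "Exception", "❌")
RETRY_MARKERS = ("Idempotent", "retry", "deduped")
# Assumed shape of the elapsed-time marker, e.g. "+12.3s"; adjust to the real log.
ELAPSED_RE = re.compile(r"\+(\d+(?:\.\d+)?)s\b")


def extract_metrics(log_text: str) -> dict:
    """Count tool calls, error lines, retry lines, and elapsed-time marks."""
    counts = Counter()
    elapsed = []
    for line in log_text.splitlines():
        # A line may contain more than one tool-call marker.
        counts["tool_calls"] += line.count("🔧 [")
        # Matching is case-sensitive, exactly as the markers are listed.
        if any(marker in line for marker in ERROR_MARKERS):
            counts["error_lines"] += 1
        if any(marker in line for marker in RETRY_MARKERS):
            counts["retry_lines"] += 1
        elapsed.extend(float(s) for s in ELAPSED_RE.findall(line))
    return {
        "tool_calls": counts["tool_calls"],
        "error_lines": counts["error_lines"],
        "retry_lines": counts["retry_lines"],
        "max_elapsed_s": max(elapsed, default=0.0),
    }
```

Error and retry counts are per line rather than per occurrence, so a line containing both ❌ and Failed is counted once.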
Step 4: Query Logfire for Trace Analysis
Use the Logfire MCP tools to analyze the run. Get the trace_id from the run.log (appears near the top).
Key queries:
- Find all exceptions in the run:
SELECT start_timestamp, span_name, exception_type, exception_message FROM records WHERE trace_id = '{trace_id}' AND is_exception = true ORDER BY start_timestamp DESC
- Find slowest operations:
SELECT span_name, duration, message FROM records WHERE trace_id = '{trace_id}' AND duration IS NOT NULL ORDER BY duration DESC LIMIT 20
- Find tool execution timeline:
SELECT start_timestamp, span_name, duration, message FROM records WHERE trace_id = '{trace_id}' AND (span_name LIKE '%tool%' OR message LIKE '%Tool%') ORDER BY start_timestamp
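- Find warnings and errors (Logfire numbers levels on the OpenTelemetry severity scale, where warn is 13 and error is 17):
SELECT start_timestamp, message, level, exception_message FROM records WHERE trace_id = '{trace_id}' AND level >= 13 ORDER BY start_timestamp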
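One way to pull the trace_id out of run.log and restrict a query to that single run is sketched below. It assumes the ID is printed as a standard 32-character lowercase hex OpenTelemetry trace ID within the first lines of the log; the helper names `find_trace_id` and `scoped_exceptions_query` and the 50-line window are hypothetical.

```python
import re
from typing import Optional

# OpenTelemetry trace IDs are 16 bytes, usually printed as 32 lowercase hex chars.
TRACE_ID_RE = re.compile(r"\b[0-9a-f]{32}\b")


def find_trace_id(log_text: str, head_lines: int = 50) -> Optional[str]:
    """Return the first trace-ID-shaped token near the top of the log."""
    for line in log_text.splitlines()[:head_lines]:
        match = TRACE_ID_RE.search(line)
        if match:
            return match.group(0)
    return None


def scoped_exceptions_query(trace_id: str) -> str:
    """Build the exceptions query restricted to one trace."""
    # Validate before interpolating so arbitrary text never reaches the SQL.
    if not TRACE_ID_RE.fullmatch(trace_id):
        raise ValueError(f"not a 32-character hex trace id: {trace_id!r}")
    return (
        "SELECT start_timestamp, span_name, exception_type, exception_message "
        "FROM records "
        f"WHERE trace_id = '{trace_id}' AND is_exception = true "
        "ORDER BY start_timestamp DESC"
    )
```

Any other 32-character hex string near the top of the log (a hash, for example) would also match, so the result is worth checking against the log by eye.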
Step 5: Analyze Session Artifacts
Check the session directory structure:
find {session_dir} -type f \( -name "*.md" -o -name "*.json" -o -name "*.html" \) | head -30
Verify expected outputs exist:
- tasks/{task_name}/refined_corpus.md: Research corpus
- work_products/*.html: Final report
- search_results/: Search result JSON files (may be archived)
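The existence check can be sketched as a small helper. The glob patterns follow the paths listed above; the labels and the function name `missing_artifacts` are illustrative. The search_results/ directory is left out of the required set here because it may have been archived, which is an assumption about how strict the check should be.

```python
from pathlib import Path
from typing import List

# Label -> glob pattern, relative to the session directory.
EXPECTED_PATTERNS = {
    "research corpus": "tasks/*/refined_corpus.md",
    "final report": "work_products/*.html",
}


def missing_artifacts(session_dir: Path) -> List[str]:
    """Return the labels of expected outputs that match no file."""
    return [
        label
        for label, pattern in EXPECTED_PATTERNS.items()
        if not any(session_dir.glob(pattern))
    ]
```

An empty list means every required output was found; anything else feeds the "Missing expected files" warning indicator.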
Step 6: Generate Evaluation Report
Produce a structured report with these sections:
Evaluation Report Template
# Agent Run Evaluation Report
**Session:** {session_dir}
**Timestamp:** {datetime}
**Total Duration:** {total_time}
## Executive Summary
[1-2 sentence overall assessment]
## Metrics Overview
| Metric | Value | Status |
|--------|-------|--------|
| Total Tool Calls | X | ✅/⚠️/❌ |
| Exceptions | X | ✅/⚠️/❌ |
| Average Tool Latency | Xs | ✅/⚠️/❌ |
| Retries/Dedupes | X | ✅/⚠️/❌ |
## Happy Path Analysis
- [Did the agent follow the expected workflow?]
- [Were there any unexpected detours?]
- [Did sub-agents complete their tasks?]
## Exceptions & Errors
[List each exception with context and potential cause]
## Performance Bottlenecks
[List slowest operations and why they were slow]
## Opportunities for Improvement
1. [Specific actionable recommendation]
2. [Specific actionable recommendation]
3. [Specific actionable recommendation]
## Logfire Trace Links
- [Link to full trace in Logfire UI]
Evaluation Criteria
Happy Path Indicators (✅)
- Sub-agents return successfully
- No more than 1 retry per tool
- finalize_research finds search results
- Report written to work_products/
- Email sent successfully
Warning Indicators (⚠️)
- Tool retries (2-3 attempts)
- Idempotency guard triggered
- Long latencies (>60s per tool)
- Missing expected files
Critical Indicators (❌)
- Exceptions raised
- Tool returning None
- Infinite loop detection
- Budget exceeded
- HarnessError raised
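The three tiers above can be collapsed into one status with a simple precedence rule: any critical indicator wins, then any warning indicator, otherwise the run is on the happy path. The sketch below is one possible encoding. The `RunSignals` fields and the `classify` function are hypothetical names, and treating more than three retries as a warning is an assumption, since the criteria do not cover that case.

```python
from dataclasses import dataclass


@dataclass
class RunSignals:
    exceptions: int = 0
    # None results, loop detection, budget exceeded, HarnessError.
    critical_flags: int = 0
    max_retries_per_tool: int = 0
    idempotency_guard_hits: int = 0
    max_tool_latency_s: float = 0.0
    missing_files: int = 0


def classify(signals: RunSignals) -> str:
    """Map collected signals to "critical", "warning", or "happy"."""
    if signals.exceptions > 0 or signals.critical_flags > 0:
        return "critical"
    if (
        signals.max_retries_per_tool >= 2
        or signals.idempotency_guard_hits > 0
        or signals.max_tool_latency_s > 60
        or signals.missing_files > 0
    ):
        return "warning"
    return "happy"
```

For example, `classify(RunSignals(max_tool_latency_s=75.0))` yields "warning", which corresponds to ⚠️ in the report's Status column.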
Output
Write the evaluation report to:
{session_dir}/run_evaluation.md
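If the report is assembled programmatically, writing it could look like the sketch below. It renders only the header, the executive summary, and the metrics table from the template; the function names `render_metrics_table` and `write_evaluation` are hypothetical, and the remaining template sections would be appended in the same way.

```python
from pathlib import Path
from typing import Iterable, Tuple


def render_metrics_table(rows: Iterable[Tuple[str, object, str]]) -> str:
    """Render (metric, value, status) rows as the template's markdown table."""
    lines = ["| Metric | Value | Status |", "|--------|-------|--------|"]
    lines += [f"| {name} | {value} | {status} |" for name, value, status in rows]
    return "\n".join(lines)


def write_evaluation(session_dir: Path, summary: str, rows) -> Path:
    """Write a partial evaluation report to {session_dir}/run_evaluation.md."""
    body = "\n".join([
        "# Agent Run Evaluation Report",
        f"**Session:** {session_dir}",
        "## Executive Summary",
        summary,
        "## Metrics Overview",
        render_metrics_table(rows),
    ]) + "\n"
    out = session_dir / "run_evaluation.md"
    out.write_text(body, encoding="utf-8")
    return out
```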