Dashboard Skill
Generate a metrics dashboard from parallel agent output logs. Aggregates data
from ~/.claude/.agent_outputs/ to show trends, consensus patterns, and model
usage over time.
Arguments
- •
$ARGUMENTS-- One of:- •(empty) -- Show full dashboard for last 30 days
- •
today-- Show today's metrics only - •
week-- Show last 7 days - •
month-- Show last 30 days (default) - •
all-- Show all available data
Phase 1: Data Collection
Scan ~/.claude/.agent_outputs/ for result files:
| File Pattern | Data Source |
|---|---|
results_*.json | JSON output from --json runs |
summary_*.md | Markdown summaries |
cursor_*.txt | Raw Cursor agent output |
gemini_*.txt | Raw Gemini agent output |
claude_*.txt | Raw Claude agent output |
Parse timestamps from filenames (YYYYMMDD_HHMMSS) to filter by date range.
If no result files exist, report:
text
No agent output data found in ~/.claude/.agent_outputs/ Run parallel_agent.sh with --json to generate trackable data.
Phase 2: Metrics Extraction
From each results_*.json, extract:
| Metric | JSON Path | Type |
|---|---|---|
| Mode | .mode | categorical |
| Agent status | .agents.{name}.status | categorical |
| Agent model | .agents.{name}.model | categorical |
| Credit fallback | .agents.{name}.credit_fallback | boolean |
| Consensus score | .cross_verification.consensus_score | numeric |
| Confidence level | .cross_verification.confidence | categorical |
| Agent count | .cross_verification.agent_count | numeric |
| Validation pass | .agents.{name}.validated | boolean |
Phase 3: Aggregation
Compute the following aggregate metrics:
Task Metrics
| Metric | Computation |
|---|---|
| Total runs | Count of result files in range |
| Runs per day | Total / days in range |
| Mode distribution | Count by mode (review, analyze, prompt) |
Agent Metrics
| Metric | Computation |
|---|---|
| Availability rate | complete / (complete + failed + missing) per agent |
| Failure rate | failed / total per agent |
| Credit fallback rate | credit_fallback=true / total per agent |
| Model distribution | Count by model tier per agent |
Consensus Metrics
| Metric | Computation |
|---|---|
| Mean consensus | Average consensus score |
| High confidence % | Runs with confidence=high / total |
| Medium confidence % | Runs with confidence=medium / total |
| Low confidence % | Runs with confidence=low / total |
Error Patterns
Scan agent output files for common error patterns:
| Pattern | Regex |
|---|---|
| Timeout | timeout|timed out|exceeded |
| Auth failure | auth|unauthorized|401|403 |
| Rate limit | rate.limit|quota|429|too many requests |
| Not found | not found|404|no such |
| Credit exhaustion | credit|billing|exceeded.*limit |
Phase 4: Output Dashboard
markdown
## Agent Dashboard
**Period**: {start-date} to {end-date}
**Total runs**: {count}
**Avg runs/day**: {avg}
### Task Distribution
| Mode | Count | % |
|------|-------|---|
| review | 15 | 50% |
| analyze | 8 | 27% |
| prompt | 7 | 23% |
### Agent Availability
| Agent | Available | Failed | Missing | Availability |
|-------|-----------|--------|---------|-------------|
| Cursor | 25 | 3 | 2 | 83% |
| Gemini | 28 | 1 | 1 | 93% |
| Claude | 27 | 2 | 1 | 90% |
### Model Usage
| Agent | Model | Count | Fallbacks |
|-------|-------|-------|-----------|
| Cursor | auto | 18 | 0 |
| Cursor | gpt-5.1-codex | 8 | 2 |
| Claude | sonnet | 20 | 0 |
| Claude | haiku | 5 | 3 |
| Claude | opus | 5 | 0 |
### Consensus Trends
| Metric | Value |
|--------|-------|
| Mean consensus score | 78% |
| High confidence runs | 60% |
| Medium confidence runs | 30% |
| Low confidence runs | 10% |
### Error Patterns
| Pattern | Occurrences | Most Affected Agent |
|---------|-------------|-------------------|
| Timeout | 3 | Cursor |
| Rate limit | 2 | Gemini |
| Credit exhaustion | 1 | Claude |
### Recommendations
Based on the data:
- {actionable insight 1}
- {actionable insight 2}
- {actionable insight 3}
Generate 2-3 actionable recommendations based on the data. Examples:
- •"Cursor failure rate is 17% -- consider using
--no-cursoror checking installation" - •"Credit fallback triggered 3 times for Claude opus -- consider defaulting to sonnet"
- •"Mean consensus is 65% (medium) -- review synthesis quality for disagreements"
Safety Checks
- •Read-only analysis -- never modify or delete log files
- •Handle malformed JSON gracefully (skip and note)
- •Report if
.agent_outputs/directory is missing or empty - •Cap file scanning to 1000 most recent files