dashboard

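Visualize agent efficiency metrics: task completion rates, common error patterns, consensus scores, and model usage distribution. Reads data from .claude/.agent_outputs/ logs and outputs Markdown tables and summaries.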

SKILL.md
---
name: dashboard
description: |
  Visualize agent efficiency metrics: task completion rates, common error
  patterns, consensus scores, and model usage distribution. Reads from
  .claude/.agent_outputs/ logs and outputs markdown tables and summaries.
---

Dashboard Skill

Generate a metrics dashboard from parallel agent output logs. Aggregates data from ~/.claude/.agent_outputs/ to show trends, consensus patterns, and model usage over time.

Arguments

  • $ARGUMENTS -- One of:
    • (empty) -- Show full dashboard for last 30 days
    • today -- Show today's metrics only
    • week -- Show last 7 days
    • month -- Show last 30 days (default)
    • all -- Show all available data
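As a minimal sketch of how the argument could be turned into a date window, the helper below maps each value to a `(start, end)` pair. The name `resolve_range` and the choice to treat `week` and `month` as rolling 7-day and 30-day windows are assumptions for illustration, not part of the skill itself.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def resolve_range(argument: str, now: datetime) -> Tuple[Optional[datetime], datetime]:
    """Map the $ARGUMENTS value to a (start, end) window (hypothetical helper).

    A start of None means "no lower bound", used for the `all` argument.
    """
    arg = argument.strip().lower() or "month"  # empty falls back to the 30-day default
    if arg == "today":
        start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elif arg == "week":
        start = now - timedelta(days=7)
    elif arg == "month":
        start = now - timedelta(days=30)
    elif arg == "all":
        start = None
    else:
        raise ValueError(f"unknown argument: {argument!r}")
    return start, now
```

Passing `now` explicitly keeps the function deterministic, which makes the window easy to test.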

Phase 1: Data Collection

Scan ~/.claude/.agent_outputs/ for result files:

| File Pattern | Data Source |
|--------------|-------------|
| `results_*.json` | JSON output from `--json` runs |
| `summary_*.md` | Markdown summaries |
| `cursor_*.txt` | Raw Cursor agent output |
| `gemini_*.txt` | Raw Gemini agent output |
| `claude_*.txt` | Raw Claude agent output |

Parse timestamps from filenames (YYYYMMDD_HHMMSS) to filter by date range.

If no result files exist, report:

text
No agent output data found in ~/.claude/.agent_outputs/
Run parallel_agent.sh with --json to generate trackable data.
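The timestamp parsing and date filtering described above could look roughly like this. The function names are hypothetical, and the sketch assumes the `YYYYMMDD_HHMMSS` stamp can appear anywhere in the filename (for example `results_20250115_093000.json`); files without a valid stamp are simply ignored.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional

# Eight digits, underscore, six digits: the YYYYMMDD_HHMMSS stamp.
TIMESTAMP_RE = re.compile(r"(\d{8}_\d{6})")


def parse_timestamp(filename: str) -> Optional[datetime]:
    """Return the timestamp embedded in a filename, or None if absent or invalid."""
    match = TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    except ValueError:  # e.g. month 13 or hour 25
        return None


def filter_by_range(
    names: Iterable[str], start: Optional[datetime], end: datetime
) -> List[str]:
    """Keep filenames whose timestamp falls in [start, end]; start=None means unbounded."""
    kept = []
    for name in names:
        stamp = parse_timestamp(name)
        if stamp is None:
            continue
        if (start is None or stamp >= start) and stamp <= end:
            kept.append(name)
    return kept
```

If `filter_by_range` returns an empty list, the caller would print the "No agent output data found" message shown above.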

Phase 2: Metrics Extraction

From each results_*.json, extract:

| Metric | JSON Path | Type |
|--------|-----------|------|
| Mode | `.mode` | categorical |
| Agent status | `.agents.{name}.status` | categorical |
| Agent model | `.agents.{name}.model` | categorical |
| Credit fallback | `.agents.{name}.credit_fallback` | boolean |
| Consensus score | `.cross_verification.consensus_score` | numeric |
| Confidence level | `.cross_verification.confidence` | categorical |
| Agent count | `.cross_verification.agent_count` | numeric |
| Validation pass | `.agents.{name}.validated` | boolean |
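A sketch of the extraction step, following the JSON paths listed above. The function name `extract_metrics` and the flat output shape are assumptions; missing keys come back as `None` (or `False` for the boolean fields) rather than raising, and malformed JSON yields `None` so the caller can skip and note the file.

```python
import json
from typing import Any, Dict, Optional


def extract_metrics(raw: str) -> Optional[Dict[str, Any]]:
    """Pull the dashboard fields out of one results_*.json payload (hypothetical helper)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed file: caller skips it
    if not isinstance(data, dict):
        return None

    agents = data.get("agents")
    if not isinstance(agents, dict):
        agents = {}
    cross = data.get("cross_verification")
    if not isinstance(cross, dict):
        cross = {}

    return {
        "mode": data.get("mode"),
        "agents": {
            name: {
                "status": info.get("status"),
                "model": info.get("model"),
                "credit_fallback": bool(info.get("credit_fallback", False)),
                "validated": bool(info.get("validated", False)),
            }
            for name, info in agents.items()
            if isinstance(info, dict)
        },
        "consensus_score": cross.get("consensus_score"),
        "confidence": cross.get("confidence"),
        "agent_count": cross.get("agent_count"),
    }
```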

Phase 3: Aggregation

Compute the following aggregate metrics:

Task Metrics

| Metric | Computation |
|--------|-------------|
| Total runs | Count of result files in range |
| Runs per day | Total / days in range |
| Mode distribution | Count by mode (review, analyze, prompt) |
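The task metrics reduce to a count and a division. A minimal sketch, assuming the caller has already collected one mode string per run in range (the helper name and return shape are illustrative only):

```python
from collections import Counter
from typing import Dict, Sequence


def task_metrics(modes: Sequence[str], days_in_range: int) -> Dict[str, object]:
    """Compute total runs, runs per day, and per-mode (count, percent) pairs."""
    total = len(modes)
    counts = Counter(modes)
    return {
        "total_runs": total,
        "runs_per_day": total / days_in_range if days_in_range > 0 else 0.0,
        # Percentages are rounded to whole numbers for display.
        "mode_distribution": {
            mode: (n, round(100 * n / total)) for mode, n in counts.items()
        },
    }
```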

Agent Metrics

| Metric | Computation |
|--------|-------------|
| Availability rate | complete / (complete + failed + missing) per agent |
| Failure rate | failed / total per agent |
| Credit fallback rate | credit_fallback=true / total per agent |
| Model distribution | Count by model tier per agent |
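The per-agent formulas can be sketched as follows. This assumes each run contributes one record per agent with a `status` of `complete`, `failed`, or `missing`, and that a record with no status counts as missing; both the record shape and that fallback are assumptions.

```python
from collections import Counter
from typing import Dict, List


def agent_metrics(records: List[dict]) -> Dict[str, object]:
    """Aggregate one agent's per-run records into the rates from the table above."""
    total = len(records)
    statuses = Counter(r.get("status") or "missing" for r in records)
    complete = statuses["complete"]
    failed = statuses["failed"]
    missing = statuses["missing"]
    denominator = complete + failed + missing
    fallbacks = sum(1 for r in records if r.get("credit_fallback"))
    return {
        "availability_rate": complete / denominator if denominator else 0.0,
        "failure_rate": failed / total if total else 0.0,
        "credit_fallback_rate": fallbacks / total if total else 0.0,
        "model_distribution": dict(Counter(r["model"] for r in records if r.get("model"))),
    }
```

With 25 complete, 3 failed, and 2 missing runs, this gives an availability of 25/30 (83%) and a failure rate of 3/30 (10%).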

Consensus Metrics

| Metric | Computation |
|--------|-------------|
| Mean consensus | Average consensus score |
| High confidence % | Runs with confidence=high / total |
| Medium confidence % | Runs with confidence=medium / total |
| Low confidence % | Runs with confidence=low / total |
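A short sketch of the consensus aggregation, assuming each run is a dict with optional `consensus_score` and `confidence` keys. Runs without a numeric score are left out of the mean but still count toward the confidence percentages; that treatment is an assumption.

```python
from statistics import mean
from typing import Dict, List, Optional


def consensus_metrics(runs: List[dict]) -> Dict[str, Optional[float]]:
    """Compute mean consensus and the share of runs at each confidence level."""
    scores = [
        r["consensus_score"]
        for r in runs
        if isinstance(r.get("consensus_score"), (int, float))
    ]
    total = len(runs)

    def share(level: str) -> float:
        if total == 0:
            return 0.0
        return 100 * sum(1 for r in runs if r.get("confidence") == level) / total

    return {
        "mean_consensus": mean(scores) if scores else None,
        "high_pct": share("high"),
        "medium_pct": share("medium"),
        "low_pct": share("low"),
    }
```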

Error Patterns

Scan agent output files for common error patterns:

| Pattern | Regex |
|---------|-------|
| Timeout | `timeout\|timed out\|exceeded` |
| Auth failure | `auth\|unauthorized\|401\|403` |
| Rate limit | `rate.limit\|quota\|429\|too many requests` |
| Not found | `not found\|404\|no such` |
| Credit exhaustion | `credit\|billing\|exceeded.*limit` |
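The pattern scan could be sketched like this. The regexes are copied from the table above; matching case-insensitively and counting matching lines (rather than individual occurrences) are assumptions made for the example.

```python
import re
from collections import Counter

# Regexes taken from the table above; IGNORECASE is an assumption.
ERROR_PATTERNS = {
    "Timeout": re.compile(r"timeout|timed out|exceeded", re.IGNORECASE),
    "Auth failure": re.compile(r"auth|unauthorized|401|403", re.IGNORECASE),
    "Rate limit": re.compile(r"rate.limit|quota|429|too many requests", re.IGNORECASE),
    "Not found": re.compile(r"not found|404|no such", re.IGNORECASE),
    "Credit exhaustion": re.compile(r"credit|billing|exceeded.*limit", re.IGNORECASE),
}


def count_error_patterns(text: str) -> Counter:
    """Count, per pattern, how many lines of an agent output file match it."""
    counts: Counter = Counter()
    for line in text.splitlines():
        for label, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

The patterns overlap: a line such as "quota exceeded" matches both Timeout and Rate limit, and `auth` also matches words like "author". Counts are therefore best read as approximate signals rather than exact tallies.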

Phase 4: Output Dashboard

markdown
## Agent Dashboard

**Period**: {start-date} to {end-date}
**Total runs**: {count}
**Avg runs/day**: {avg}

### Task Distribution

| Mode | Count | % |
|------|-------|---|
| review | 15 | 50% |
| analyze | 8 | 27% |
| prompt | 7 | 23% |

### Agent Availability

| Agent | Available | Failed | Missing | Availability |
|-------|-----------|--------|---------|-------------|
| Cursor | 25 | 3 | 2 | 83% |
| Gemini | 28 | 1 | 1 | 93% |
| Claude | 27 | 2 | 1 | 90% |

### Model Usage

| Agent | Model | Count | Fallbacks |
|-------|-------|-------|-----------|
| Cursor | auto | 18 | 0 |
| Cursor | gpt-5.1-codex | 8 | 2 |
| Claude | sonnet | 20 | 0 |
| Claude | haiku | 5 | 3 |
| Claude | opus | 5 | 0 |

### Consensus Trends

| Metric | Value |
|--------|-------|
| Mean consensus score | 78% |
| High confidence runs | 60% |
| Medium confidence runs | 30% |
| Low confidence runs | 10% |

### Error Patterns

| Pattern | Occurrences | Most Affected Agent |
|---------|-------------|-------------------|
| Timeout | 3 | Cursor |
| Rate limit | 2 | Gemini |
| Credit exhaustion | 1 | Claude |

### Recommendations

Based on the data:
- {actionable insight 1}
- {actionable insight 2}
- {actionable insight 3}

Generate 2-3 actionable recommendations based on the data. Examples:

  • "Cursor was unavailable (failed or missing) in 17% of runs -- consider using --no-cursor or checking installation"
  • "Credit fallback triggered 3 times for Claude opus -- consider defaulting to sonnet"
  • "Mean consensus is 65% (medium) -- review synthesis quality for disagreements"
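The tables in the dashboard template can be produced with a small formatter. The sketch below is one possible approach; `render_table` is a hypothetical helper, and it sizes each separator cell to the header width so the output matches the layout of the template.

```python
from typing import Iterable, Sequence


def render_table(headers: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Render headers and rows as a Markdown pipe table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        # Separator dashes span the header text plus one space of padding each side.
        "|" + "|".join("-" * (len(h) + 2) for h in headers) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)
```

For example, `render_table(["Mode", "Count", "%"], [("review", 15, "50%")])` yields the header, separator, and first row of the Task Distribution table.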

Safety Checks

  • Read-only analysis -- never modify or delete log files
  • Handle malformed JSON gracefully (skip and note)
  • Report if .agent_outputs/ directory is missing or empty
  • Cap file scanning to 1000 most recent files
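The safety checks can be combined in a single read-only loader. This is a sketch under stated assumptions: it covers only `results_*.json` files, treats "most recent" as the lexicographically largest filenames (which holds when names share a prefix followed by a `YYYYMMDD_HHMMSS` stamp), and raises if the directory is missing so the caller can report it.

```python
import json
from pathlib import Path
from typing import List, Tuple

MAX_FILES = 1000  # cap from the safety checks above


def load_recent_results(directory: Path, limit: int = MAX_FILES) -> Tuple[List[dict], List[str]]:
    """Read up to `limit` of the newest result files without modifying anything.

    Returns (parsed_records, skipped_filenames).
    """
    if not directory.is_dir():
        raise FileNotFoundError(f"{directory} is missing")

    # Newest first; relies on the timestamp in the filename sorting lexicographically.
    files = sorted(directory.glob("results_*.json"), key=lambda p: p.name, reverse=True)
    records: List[dict] = []
    skipped: List[str] = []
    for path in files[:limit]:
        try:
            with path.open("r", encoding="utf-8") as handle:  # opened read-only
                records.append(json.load(handle))
        except (json.JSONDecodeError, UnicodeDecodeError, OSError):
            skipped.append(path.name)  # skip and note, per the safety checks
    return records, skipped
```

An empty `records` list with an empty `skipped` list indicates the directory exists but holds no result files, which the dashboard should also report.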