AgentSkillsCN

get-results

Extract and analyze simulation results from a Coval run. Use when the user wants to review evaluation outcomes or debug agent behavior.

SKILL.md
---
name: get-results
description: Retrieve and analyze simulation results from a Coval run. Use when user wants to review evaluation outcomes or debug agent behavior.
argument-hint: "[run-id or simulation-id]"
---

Get Simulation Results

Retrieve results for $ARGUMENTS.

Workflow

If Run ID Provided

List all simulations for the run:

```bash
coval simulations list --run-id <run_id> --format json
```

If Simulation ID Provided

Get detailed simulation data:

```bash
coval simulations get <simulation_id> --format json
```

This returns:

  • Status (COMPLETED, FAILED)
  • Test case ID
  • Transcript (conversation history)
  • Timestamps
  • Error message (if failed)
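The two branches above can be sketched as a small shell dispatcher that picks the right command for whatever ID was passed in. This is only a sketch: the function name `results_command` is hypothetical, and the rule that simulation IDs start with `sim_` is an assumption taken from the example IDs in this skill, not a documented Coval guarantee. The function prints the command instead of running it, so the choice can be checked first.

```bash
# Print (but do not run) the coval command for a given ID.
# ASSUMPTION: simulation IDs start with "sim_", as in this skill's examples;
# any other non-empty value is treated as a run ID.
results_command() {
  local id="$1"
  if [ -z "$id" ]; then
    echo "usage: results_command <run-id or simulation-id>" >&2
    return 1
  fi
  case "$id" in
    sim_*) printf 'coval simulations get %s --format json\n' "$id" ;;
    *)     printf 'coval simulations list --run-id %s --format json\n' "$id" ;;
  esac
}
```

If the prefix rule does not hold for your IDs, replace the `case` pattern with whatever distinguishes the two kinds of ID in your account.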

Step 2: Analyze Results

For each simulation, extract:

| Field | Description |
|-------|-------------|
| `status` | COMPLETED or FAILED |
| `test_case_id` | Which test case was run |
| `transcript` | Full conversation |
| `has_audio` | Whether audio is available |
| `error_message` | Failure reason (if any) |
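One possible way to pull these fields out of the JSON is a short `jq` filter. The sketch below assumes that `jq` is installed and that the fields are top-level keys of a single simulation object; the actual output of `coval simulations get --format json` may nest them differently. The function name `simulation_fields` is hypothetical. The transcript is left out because it does not fit on one line.

```bash
# Reduce one simulation JSON object (read from stdin) to a single
# tab-separated line: status, test_case_id, has_audio, error_message.
# ASSUMPTION: these are top-level keys; adjust the paths if the real
# payload nests them. Requires jq.
simulation_fields() {
  jq -r '[
    .status,
    .test_case_id,
    (.has_audio | tostring),
    (.error_message // "")
  ] | @tsv'
}
```

Under those assumptions it could be used as `coval simulations get <simulation_id> --format json | simulation_fields`. A missing `error_message` becomes an empty last column.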

Step 3: Present Summary

```
## Results for Run <run_id>

| Simulation | Status | Test Case | Audio |
|------------|--------|-----------|-------|
| sim_abc123 | COMPLETED | tc_xyz | Yes |
| sim_def456 | FAILED | tc_uvw | No |

### Failed Simulations
- sim_def456: "Connection timeout"

### View Details
`coval simulations get <sim_id>`

### Download Audio
`coval simulations audio <sim_id> -o output.wav`
```
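The table part of this template can be produced with plain shell. The sketch below assumes each simulation has already been reduced to one tab-separated line of simulation ID, status, test case ID, and `has_audio` (`true` or `false`). That column layout and the function name `render_summary` are conventions of this sketch, not part of the Coval CLI.

```bash
# Render the summary heading and table from tab-separated lines on stdin:
#   <simulation_id> TAB <status> TAB <test_case_id> TAB <has_audio>
# ASSUMPTION: all four columns are non-empty; empty columns would shift,
# because the shell collapses consecutive tabs when splitting.
render_summary() {
  local run_id="$1" sim status test_case audio tab
  tab="$(printf '\t')"
  printf '## Results for Run %s\n\n' "$run_id"
  printf '| Simulation | Status | Test Case | Audio |\n'
  printf '|------------|--------|-----------|-------|\n'
  # The "|| [ -n ... ]" clause keeps a final line that lacks a newline.
  while IFS="$tab" read -r sim status test_case audio || [ -n "$sim" ]; do
    [ -n "$sim" ] || continue
    if [ "$audio" = "true" ]; then audio="Yes"; else audio="No"; fi
    printf '| %s | %s | %s | %s |\n' "$sim" "$status" "$test_case" "$audio"
  done
}
```

For example, `printf 'sim_abc123\tCOMPLETED\ttc_xyz\ttrue\n' | render_summary <run_id>` prints the heading, the table header, and one row ending in `Yes`.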

Filtering

List simulations with filters:

```bash
coval simulations list --run-id <run_id> --filter "status=FAILED"
coval simulations list --run-id <run_id> --filter "has_audio=true"
```
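If filters are built from user input, a small guard can keep the command limited to the two filter fields shown above. This is a sketch only: `list_filtered_command` is a hypothetical helper, and the allow-list reflects just the fields used in this skill's examples, since other fields may or may not be accepted by `--filter`. Like the examples, it prints the command and does not run it.

```bash
# Print a filtered list command for one field=value pair.
# ASSUMPTION: only the fields shown in this skill (status, has_audio)
# are allowed here; extend the list once other fields are confirmed.
list_filtered_command() {
  local run_id="$1" field="$2" value="$3"
  case "$field" in
    status|has_audio) ;;
    *) echo "unsupported filter field: $field" >&2; return 1 ;;
  esac
  printf 'coval simulations list --run-id %s --filter "%s=%s"\n' \
    "$run_id" "$field" "$value"
}
```

Calling `list_filtered_command <run_id> status FAILED` prints the first command shown above with the run ID filled in.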