Benchmark Logging
Use this skill to run and document benchmark comparisons between sciClaw and baseline workflows.
When to use
- "run benchmark"
- "compare baseline vs sciclaw"
- "log benchmark outcomes"
- "add acceptance criteria"
Minimum benchmark record
- Benchmark ID and date.
- Task category and scenario definition.
- Baseline command sequence.
- sciClaw command sequence.
- Metrics: task success, reproducibility, latency, and resource usage.
- Acceptance decision (pass/fail) with rationale.
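One way to keep every record complete is to give the minimum fields a fixed shape and refuse to serialize a record that is missing any of them. The sketch below assumes a Python dataclass stored as JSON. The field names, the split into per-arm metrics, and the specific resource measure (`peak_memory_mb`) are hypothetical choices, not a schema this skill prescribes.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class Metrics:
    # Hypothetical field names and units; adapt to whatever the benchmark measures.
    task_success: bool
    reproducibility: float  # e.g. fraction of repeated runs with identical outputs
    latency_s: float  # wall-clock seconds
    peak_memory_mb: float  # one possible resource-usage measure


@dataclass
class BenchmarkRecord:
    benchmark_id: str
    run_date: date
    task_category: str
    scenario: str
    baseline_commands: list[str]
    sciclaw_commands: list[str]
    baseline_metrics: Metrics
    sciclaw_metrics: Metrics
    accepted: bool  # acceptance decision: True = pass, False = fail
    rationale: str

    def validate(self) -> None:
        # Reject records that are missing any of the minimum fields.
        if not self.benchmark_id:
            raise ValueError("benchmark_id is required")
        if not self.baseline_commands or not self.sciclaw_commands:
            raise ValueError("both command sequences are required")
        if not self.rationale.strip():
            raise ValueError("the acceptance decision needs a rationale")

    def to_json(self) -> str:
        # Validate first so an incomplete record never reaches the log.
        self.validate()
        data = asdict(self)  # recurses into the nested Metrics dataclasses
        data["run_date"] = self.run_date.isoformat()  # date is not JSON-serializable
        return json.dumps(data, indent=2)
```

Validation happens in `to_json` so that the acceptance rationale is checked at the moment the record is written, not later when it is summarized.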
Workflow
- Freeze scenario definitions before running.
- Execute baseline and sciClaw runs with the same inputs.
- Record metric values and artifact paths.
- Log failures with root-cause notes and retry policy.
- Add manuscript-ready summary sentences only after data is logged.
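The first four workflow steps can be sketched as a small harness. This is a minimal illustration under stated assumptions: each arm is wrapped as a Python callable that takes the scenario inputs and returns artifact paths, "freezing" means fingerprinting the scenario with SHA-256 over canonical JSON, and the retry policy is a fixed attempt limit. The names `freeze`, `run_arm`, `run_benchmark`, and `max_attempts` are hypothetical; the actual baseline and sciClaw command invocations are not shown.

```python
from __future__ import annotations

import copy
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical runner shape: takes scenario inputs, returns artifact paths.
Runner = Callable[[Dict[str, Any]], List[str]]


def freeze(scenario: dict[str, Any]) -> str:
    """Return a SHA-256 fingerprint of a scenario definition.

    Canonical JSON (sorted keys, fixed separators) makes the hash independent
    of key order, so only a real change to the scenario changes the fingerprint.
    """
    canonical = json.dumps(scenario, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class RunResult:
    arm: str  # "baseline" or "sciclaw"
    success: bool
    latency_s: float  # latency of the successful attempt only; NaN if all failed
    attempts: int
    artifacts: list[str] = field(default_factory=list)
    failures: list[str] = field(default_factory=list)  # one note per failed attempt


def run_arm(arm: str, runner: Runner, inputs: dict[str, Any], max_attempts: int = 2) -> RunResult:
    """Run one arm under a fixed retry policy, logging every failed attempt."""
    failures: list[str] = []
    for attempt in range(1, max_attempts + 1):
        start = time.perf_counter()
        try:
            artifacts = runner(inputs)
        except Exception as exc:
            # The error type and message are a starting point for the root-cause note.
            failures.append(f"attempt {attempt}: {type(exc).__name__}: {exc}")
            continue
        latency = time.perf_counter() - start
        return RunResult(arm, True, latency, attempt, artifacts, failures)
    return RunResult(arm, False, float("nan"), max_attempts, [], failures)


def run_benchmark(
    scenario: dict[str, Any],
    frozen_hash: str,
    baseline: Runner,
    sciclaw: Runner,
    max_attempts: int = 2,
) -> dict[str, RunResult]:
    """Run both arms on identical copies of the frozen scenario inputs."""
    # Refuse to run if the scenario was edited after it was frozen.
    if freeze(scenario) != frozen_hash:
        raise RuntimeError("scenario definition changed after it was frozen")
    results: dict[str, RunResult] = {}
    for arm, runner in (("baseline", baseline), ("sciclaw", sciclaw)):
        # deepcopy so one arm cannot mutate the inputs the other arm sees.
        inputs = copy.deepcopy(scenario["inputs"])
        results[arm] = run_arm(arm, runner, inputs, max_attempts)
    return results
```

Because the fingerprint is recorded before any run, a later mismatch shows that the scenario changed after freezing. Counting attempts alongside the failure notes keeps the retry policy visible in the logged data instead of hiding flaky runs.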