Run Evaluation
Evaluate a Przeprogramowani website implementation against benchmark criteria.
10xBench Structure
- 10x-bench (this repository) - contains the implementation to evaluate
- 10x-bench-eval (companion repository) - contains the evaluation criteria and scoring methodology
What this skill does
Systematically evaluates website implementations by:
- Reading benchmark criteria from 10x-bench-eval/benchmark/criteria.md
- Setting up the implementation (npm install, npm run build, npm run dev)
- Testing against all evaluation criteria and asking the user for feedback where needed
- Generating structured results in 10x-bench/eval-results/{model-name}-attempt-{number}/eval-results.csv
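The setup step in the list above can be sketched roughly as follows. This is only an illustration, not how the skill itself is implemented: the helper name `set_up_implementation` and the choice of Python are assumptions, and the only things taken from this document are the three npm commands.

```python
import subprocess
from pathlib import Path

# Blocking setup steps named in the list above; each must finish before the next starts.
SETUP_COMMANDS = [
    ["npm", "install"],
    ["npm", "run", "build"],
]

# The dev server keeps running, so it is started in the background instead.
DEV_COMMAND = ["npm", "run", "dev"]


def set_up_implementation(impl_dir: str) -> subprocess.Popen:
    """Install, build, then start the dev server in impl_dir (hypothetical helper)."""
    root = Path(impl_dir)
    for command in SETUP_COMMANDS:
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        subprocess.run(command, cwd=root, check=True)
    # Popen returns immediately; the caller is responsible for terminate() when done.
    return subprocess.Popen(DEV_COMMAND, cwd=root)
```

Keeping install and build blocking means a failed step stops the evaluation early, before any criteria are tested against a broken build.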
How to use
Invoke with the directory path to evaluate:
/run-eval /path/to/implementation
If no path is given, provide it when prompted.
Output
Generates eval-results.csv in the ./eval-results/{model-name}-attempt-{number}/ directory.
See 10x-bench-eval/benchmark/eval.md for complete evaluation guidelines.
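As a minimal sketch of the output location described above, the path pattern can be expanded like this. The function name `results_csv_path`, its parameters, and the sample model name are hypothetical; only the directory and file naming pattern come from this document.

```python
from pathlib import Path


def results_csv_path(model_name: str, attempt: int, root: str = ".") -> Path:
    """Build the eval-results.csv path for one attempt (hypothetical helper)."""
    # Follows the pattern {model-name}-attempt-{number} used in this document.
    attempt_dir = f"{model_name}-attempt-{attempt}"
    return Path(root) / "eval-results" / attempt_dir / "eval-results.csv"


if __name__ == "__main__":
    # "example-model" is a placeholder, not a real model name from the benchmark.
    print(results_csv_path("example-model", 1).as_posix())
```

The CSV contents are not sketched here, because the columns are defined by the evaluation guidelines rather than by this document.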