run-eval

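Evaluates Przeprogramowani website implementations against the benchmark criteria. Analyzes tech stack, pages, content accuracy, SEO, and responsive design. Use it when you need to evaluate LLM-generated website attempts in the Przeprogramowani benchmark repository. Important: do not use this skill while creating the website; use it only when the user directly asks for an evaluation.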

name: run-eval
description: Evaluate Przeprogramowani website implementations against benchmark criteria. Analyzes tech stack, pages, content accuracy, SEO, and responsiveness. Use when evaluating LLM-generated website attempts in the Przeprogramowani benchmark repository. IMPORTANT: Do not use this skill during the task of creating the website. Use it only to evaluate the website based on a direct request from the user.

Run Evaluation

Name: run-eval
Rating: 78
Author: przeprogramowani

Evaluate a Przeprogramowani website implementation against benchmark criteria.

10xBench Structure

•10x-bench (this repository) - contains the implementation to evaluate
•10x-bench-eval (companion repository) - contains the evaluation criteria and scoring methodology

What this skill does

Systematically evaluates website implementations by:

•Reading benchmark criteria from 10x-bench-eval/benchmark/criteria.md
•Setting up the implementation (npm install, npm run build, npm run dev)
•Testing against all evaluation criteria and asking the user for feedback where needed
•Generating structured results in 10x-bench/eval-results/{model-name}-attempt-{number}/eval-results.csv
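
The setup step in the list above can be sketched as a small Python helper. This is only an illustration, not the skill's actual implementation: the names `SETUP_STEPS` and `start_implementation` are hypothetical, and the sketch assumes `npm` is on the PATH and that the implementation directory contains a `package.json` with `build` and `dev` scripts.

```python
import subprocess
from pathlib import Path

# The three setup commands named in the list above, in order.
SETUP_STEPS = [
    ["npm", "install"],
    ["npm", "run", "build"],
    ["npm", "run", "dev"],
]


def start_implementation(impl_dir: str) -> subprocess.Popen:
    """Install and build the site, then start the dev server in the background.

    Hypothetical helper: assumes `npm` is available on the PATH.
    """
    cwd = Path(impl_dir)
    # Install and build must finish successfully before the server starts;
    # check=True raises CalledProcessError on a non-zero exit code.
    for argv in SETUP_STEPS[:-1]:
        subprocess.run(argv, cwd=cwd, check=True)
    # `npm run dev` keeps running, so it is started without waiting for it.
    return subprocess.Popen(SETUP_STEPS[-1], cwd=cwd)
```

The returned process handle lets the caller stop the dev server with `terminate()` once the criteria have been checked.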

How to use

Invoke with the directory path to evaluate:

/run-eval /path/to/implementation

Or provide the path when prompted if not specified.

Output

Generates eval-results.csv in the ./eval-results/{model-name}-attempt-{number}/ directory.
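
As a rough sketch of how the output location is composed from the `{model-name}` and `{number}` placeholders, the Python snippet below builds the path and creates the target directory. The function names and the `example-model` value are hypothetical. The CSV columns are deliberately left out because the format is defined in the evaluation guidelines, not here.

```python
from pathlib import Path


def results_path(model_name: str, attempt: int, root: str = ".") -> Path:
    """Build <root>/eval-results/{model-name}-attempt-{number}/eval-results.csv."""
    return (
        Path(root)
        / "eval-results"
        / f"{model_name}-attempt-{attempt}"
        / "eval-results.csv"
    )


def prepare_results_dir(model_name: str, attempt: int, root: str = ".") -> Path:
    """Create the attempt directory if needed and return the CSV path."""
    path = results_path(model_name, attempt, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# "example-model" is a placeholder, not a model name from the benchmark.
print(results_path("example-model", 1).as_posix())
```

Keeping the path logic in one function makes it easy to confirm that each attempt lands in its own directory and that earlier results are not overwritten.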

See 10x-bench-eval/benchmark/eval.md for complete evaluation guidelines.