Researcher Evaluation Skill

Name: researcher-evaluation
Rating: 62
Author: VibeTechnologies

Technical playbook for evaluating GenAI agents. OpenCode uses agents/benchmark.py directly.

Required Output Format

Framework	Input	Output	Score	Feedback	Recommendations
AutoGen	{task}	{truncated}...	4/5	{feedback}	{improvements}
CrewAI	{task}	{truncated}...	3/5	{feedback}	{improvements}
OpenHands	{task}	{truncated}...	5/5	{feedback}	{improvements}
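
A minimal sketch of how one row in the format above could be produced. The `EvalRow` type, the `format_row` and `truncate` helpers, and the 80-character truncation limit are assumptions made for illustration; the source does not specify how the output column is truncated.

```python
from dataclasses import dataclass

# Column order taken from the header row above.
HEADER = "\t".join(
    ["Framework", "Input", "Output", "Score", "Feedback", "Recommendations"]
)

MAX_OUTPUT_CHARS = 80  # assumed limit; not specified in the source


@dataclass
class EvalRow:
    """Hypothetical container for one evaluated framework run."""
    framework: str
    task: str
    output: str
    score: int  # 1-5 scale, as in the example rows
    feedback: str
    recommendations: str


def truncate(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Collapse whitespace and cut to `limit` characters, appending '...' if cut."""
    flat = " ".join(text.split())
    return flat if len(flat) <= limit else flat[:limit] + "..."


def format_row(row: EvalRow) -> str:
    """Render one tab-separated row in the column order of HEADER."""
    if not 1 <= row.score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {row.score}")
    cells = [
        row.framework,
        row.task,
        truncate(row.output),
        f"{row.score}/5",
        row.feedback,
        row.recommendations,
    ]
    # Collapse embedded tabs and newlines so a cell cannot break the row layout.
    return "\t".join(" ".join(cell.split()) for cell in cells)
```

Calling `print(HEADER)` followed by `print(format_row(row))` for each framework yields a table in the layout shown above.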

Dimension	Description
Accuracy	Facts correct, no hallucinations
Completeness	All sub-tasks addressed
Actionability	Concrete next steps
Clarity	Well-structured
Relevance	On topic
Efficiency	Concise
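
One way the six dimensions could be combined into the single x/5 score is sketched below. The equal weighting, the 1-5 rating per dimension, and the `overall_score` helper are assumptions; the source does not say how the dimensions are aggregated.

```python
from statistics import mean

# Dimension names taken from the table above (lowercased).
DIMENSIONS = (
    "accuracy",
    "completeness",
    "actionability",
    "clarity",
    "relevance",
    "efficiency",
)


def overall_score(ratings: dict[str, int]) -> int:
    """Average per-dimension ratings (1-5 each) into one 1-5 score.

    Assumes equal weights; halves round up.
    """
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing dimensions: {', '.join(missing)}")
    for name in DIMENSIONS:
        if not 1 <= ratings[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {ratings[name]}")
    average = mean(ratings[name] for name in DIMENSIONS)
    return int(average + 0.5)  # round half up instead of banker's rounding
```

A weighted mean would be the natural variation if, for example, accuracy should count for more than efficiency.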

Chain-of-Thought evaluation steps:

```bash
python -m agents.benchmark --tasks github-issue-triage --frameworks autogen crewai openhands
```
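
For readers unfamiliar with multi-value flags, the following sketch shows how a command line of this shape could be parsed with `argparse`. It is not the actual contents of agents/benchmark.py; `build_parser` and the help strings are hypothetical, and only the flag names and example values come from the command above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(prog="agents.benchmark")
    # nargs="+" lets each flag accept one or more space-separated values.
    parser.add_argument("--tasks", nargs="+", required=True,
                        help="task names to run")
    parser.add_argument("--frameworks", nargs="+", required=True,
                        help="agent frameworks to compare")
    return parser


# Parse the same arguments as the shell command, passed as an explicit list.
args = build_parser().parse_args(
    ["--tasks", "github-issue-triage",
     "--frameworks", "autogen", "crewai", "openhands"]
)
```

Here `args.tasks` holds a one-element list and `args.frameworks` holds the three framework names in the order given.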