Evals Codebase Context
When to Use This Skill
Use this skill when the task involves:
- •Modifying or debugging the evals execution infrastructure
- •Adding new eval exercises or languages
- •Working with the evals web interface (apps/web-evals)
- •Modifying the public evals display page on roocode.com
- •Understanding where evals code lives in this monorepo
When NOT to Use This Skill
Do NOT use this skill when:
- •Working on unrelated parts of the codebase (extension, webview-ui, etc.)
- •The task is purely about the VS Code extension's core functionality
- •Working on the main website pages that don't involve evals
Key Disambiguation: Two "Evals" Locations
This monorepo has two distinct evals-related locations that can cause confusion:
| Component | Path | Purpose |
|---|---|---|
| Evals Execution System | packages/evals/ | Core eval infrastructure: CLI, DB schema, Docker configs |
| Evals Management UI | apps/web-evals/ | Next.js app for creating/monitoring eval runs (localhost:3446) |
| Website Evals Page | apps/web-roo-code/src/app/evals/ | Public roocode.com page displaying eval results |
| External Exercises Repo | Roo-Code-Evals | Actual coding exercises (NOT in this monorepo) |
Directory Structure Reference
packages/evals/ - Core Evals Package
packages/evals/
├── ARCHITECTURE.md # Detailed architecture documentation
├── ADDING-EVALS.md # Guide for adding new exercises/languages
├── README.md # Setup and running instructions
├── docker-compose.yml # Container orchestration
├── Dockerfile.runner # Runner container definition
├── Dockerfile.web # Web app container
├── drizzle.config.ts # Database ORM config
├── src/
│ ├── index.ts # Package exports
│ ├── cli/ # CLI commands for running evals
│ │ ├── runEvals.ts # Orchestrates complete eval runs
│ │ ├── runTask.ts # Executes individual tasks in containers
│ │ ├── runUnitTest.ts # Validates task completion via tests
│ │ └── redis.ts # Redis pub/sub integration
│ ├── db/
│ │ ├── schema.ts # Database schema (runs, tasks)
│ │ ├── queries/ # Database query functions
│ │ └── migrations/ # SQL migrations
│ └── exercises/
│ └── index.ts # Exercise loading utilities
└── scripts/
└── setup.sh # Local macOS setup script
apps/web-evals/ - Evals Management Web App
apps/web-evals/ ├── src/ │ ├── app/ │ │ ├── page.tsx # Home page (runs list) │ │ ├── runs/ │ │ │ ├── new/ # Create new eval run │ │ │ └── [id]/ # View specific run status │ │ └── api/runs/ # SSE streaming endpoint │ ├── actions/ # Server actions │ │ ├── runs.ts # Run CRUD operations │ │ ├── tasks.ts # Task queries │ │ ├── exercises.ts # Exercise listing │ │ └── heartbeat.ts # Controller health checks │ ├── hooks/ # React hooks (SSE, models, etc.) │ └── lib/ # Utilities and schemas
apps/web-roo-code/src/app/evals/ - Public Website Evals Page
apps/web-roo-code/src/app/evals/ ├── page.tsx # Fetches and displays public eval results ├── evals.tsx # Main evals display component ├── plot.tsx # Visualization component └── types.ts # EvalRun type (extends packages/evals types)
This page displays eval results on the public roocode.com website. It imports types from @roo-code/evals but does NOT run evals.
Architecture Overview
The evals system is a distributed evaluation platform that runs AI coding tasks in isolated VS Code environments:
┌─────────────────────────────────────────────────────────────┐ │ Web App (apps/web-evals) ──────────────────────────────── │ │ │ │ │ ▼ │ │ PostgreSQL ◄────► Controller Container │ │ │ │ │ │ ▼ ▼ │ │ Redis ◄───► Runner Containers (1-25 parallel) │ └─────────────────────────────────────────────────────────────┘
Key components:
- •Controller: Orchestrates eval runs, spawns runners, manages task queue (p-queue)
- •Runner: Isolated Docker container with VS Code + Roo Code extension + language runtimes
- •Redis: Pub/sub for real-time events (NOT task queuing)
- •PostgreSQL: Stores runs, tasks, metrics
Common Tasks Quick Reference
Adding a New Eval Exercise
- •Add exercise to Roo-Code-Evals repo (external)
- •See
packages/evals/ADDING-EVALS.mdfor structure
Modifying Eval CLI Behavior
Edit files in packages/evals/src/cli/:
- •
runEvals.ts- Run orchestration - •
runTask.ts- Task execution - •
runUnitTest.ts- Test validation
Modifying the Evals Web Interface
Edit files in apps/web-evals/src/:
- •
app/runs/new/new-run.tsx- New run form - •
actions/runs.ts- Run server actions
Modifying the Public Evals Display Page
Edit files in apps/web-roo-code/src/app/evals/:
Database Schema Changes
- •Edit
packages/evals/src/db/schema.ts - •Generate migration:
cd packages/evals && pnpm drizzle-kit generate - •Apply migration:
pnpm drizzle-kit migrate
Running Evals Locally
# From repo root pnpm evals # Opens web UI at http://localhost:3446
Ports (defaults):
- •PostgreSQL: 5433
- •Redis: 6380
- •Web: 3446
Testing
# packages/evals tests cd packages/evals && npx vitest run # apps/web-evals tests cd apps/web-evals && npx vitest run
Key Types/Exports from @roo-code/evals
The package exports are defined in packages/evals/src/index.ts:
- •Database queries:
getRuns,getTasks,getTaskMetrics, etc. - •Schema types:
Run,Task,TaskMetrics - •Used by both
apps/web-evalsandapps/web-roo-code