HarshJudge E2E Testing
AI-native E2E testing with MCP tools and visual evidence capture.
Core Principles
- Evidence First: Screenshot before and after every action
- Fail Fast: Stop on error, report with context
- Complete Runs: Always call completeRun, even on failure
- Step Isolation: Each step executes in its own spawned agent for token efficiency
- Knowledge Accumulation: Learnings go to prd.md, not scenarios
Step-Based Execution
HarshJudge uses a step-based agent pattern for token-efficient test execution:
```
Main Agent                          Step Agents (spawned per step)
    │
    ├─ startRun(scenarioSlug)
    │      ↓
    │   Returns: runId, steps[]
    │
    ├─► Spawn Agent: Step 01 ──────────────────────► Execute actions
    │      │                                              │
    │      │ ◄─────────────────────────────────── Return: { status, evidencePaths }
    │      │
    │   completeStep(runId, "01", status)
    │      │
    ├─► Spawn Agent: Step 02 ──────────────────────► Execute actions
    │      │                                              │
    │      │ ◄─────────────────────────────────── Return: { status, evidencePaths }
    │      │
    │   completeStep(runId, "02", status)
    │      │
    │   ... (repeat for each step)
    │
    └─ completeRun(runId, finalStatus)
```
Benefits:
- Each step agent has isolated context (no token accumulation)
- Large outputs (screenshots, logs) are saved to files, not returned
- The main agent only receives concise summaries
- Token usage is optimized automatically, without manual management
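The flow in the diagram can be sketched as a small orchestration loop. This is a minimal sketch under stated assumptions, not HarshJudge's implementation: `HarshJudgeTools` and `spawn_step_agent` are hypothetical stand-ins for the MCP client and the agent spawner, and the `id` key on each entry of `steps` is an assumed field name.

```python
from typing import Any, Callable, Protocol


class HarshJudgeTools(Protocol):
    """Hypothetical wrapper around the HarshJudge MCP tools used in a run."""

    def start_run(self, scenario_slug: str) -> dict[str, Any]: ...
    def complete_step(self, run_id: str, step_id: str, status: str) -> None: ...
    def complete_run(self, run_id: str, status: str) -> None: ...


def run_scenario(
    tools: HarshJudgeTools,
    spawn_step_agent: Callable[[str, str, str], dict[str, Any]],
    scenario_slug: str,
) -> str:
    """Run each step in its own agent and stop at the first failure."""
    run = tools.start_run(scenario_slug)      # assumed to return runId and steps[]
    run_id = run["runId"]
    final_status = "pass"
    previous = "first step"
    try:
        for step in run["steps"]:
            step_id = step["id"]              # assumed field name
            # The step agent returns only a concise summary, never raw evidence.
            result = spawn_step_agent(scenario_slug, step_id, previous)
            tools.complete_step(run_id, step_id, result["status"])
            previous = result["status"]
            if result["status"] != "pass":    # Fail Fast: skip the remaining steps
                final_status = "fail"
                break
    except Exception:
        final_status = "fail"
        raise
    finally:
        # Complete Runs: the run is finalized even when a step raised.
        tools.complete_run(run_id, final_status)
    return final_status
```

The `finally` clause is what enforces the "Complete Runs" principle: `complete_run` is called whether the loop finished, broke on a failed step, or raised.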
Workflows
| Intent | Reference | Key Tools |
|---|---|---|
| Initialize project | references/setup.md | initProject |
| Create scenario | references/create.md | createScenario |
| Run scenario | references/run.md | startRun, completeStep, completeRun |
| Fix failed test | references/iterate.md | getStatus, createScenario |
| Check status | references/status.md | getStatus |
Project Structure
```
.harshJudge/
  config.yaml                 # Project configuration
  prd.md                      # Product requirements (from assets/prd.md template)
  scenarios/{slug}/
    meta.yaml                 # Scenario definition + run statistics
    steps/                    # Individual step files
      01-step-slug.md         # Step 01 details
      02-step-slug.md         # Step 02 details
      ...
    runs/{runId}/             # Run history
      result.json             # Run result with per-step data
      step-01/evidence/       # Step 01 evidence
      step-02/evidence/       # Step 02 evidence
      ...
  snapshots/                  # Inspection tool outputs (token-saving pattern)
```
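To make the layout concrete, the following sketch builds paths from a scenario slug, run ID, and step ID. It assumes that `runs/{runId}/` is nested under `scenarios/{slug}/` and that step directories are named `step-{stepId}`. The helper names `step_file` and `evidence_dir` are hypothetical and not part of HarshJudge.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath(".harshJudge")


def step_file(slug: str, step_id: str, step_slug: str) -> PurePosixPath:
    """Path of one step definition, e.g. steps/01-step-slug.md."""
    return ROOT / "scenarios" / slug / "steps" / f"{step_id}-{step_slug}.md"


def evidence_dir(slug: str, run_id: str, step_id: str) -> PurePosixPath:
    """Evidence directory for one step of one run (assumed nesting)."""
    return ROOT / "scenarios" / slug / "runs" / run_id / f"step-{step_id}" / "evidence"
```

For example, `evidence_dir("login-flow", "run-123", "01")` yields `.harshJudge/scenarios/login-flow/runs/run-123/step-01/evidence`.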
Quick Reference
HarshJudge MCP Tools
| Tool | Purpose |
|---|---|
| initProject | Initialize project (spawns dashboard) |
| createScenario | Create/update scenario with step files |
| toggleStar | Toggle/set scenario starred status |
| startRun | Start test run, returns step list |
| recordEvidence | Capture evidence for a step |
| completeStep | Complete a step, get next step ID |
| completeRun | Finalize run with status |
| getStatus | Check project or scenario status |
| openDashboard / closeDashboard | Manage dashboard server |
Playwright MCP Tools
| Tool | Purpose |
|---|---|
| browser_navigate | Navigate to URL |
| browser_snapshot | Get accessibility tree (use before click/type) |
| browser_click | Click element using ref |
| browser_type | Type into input using ref |
| browser_take_screenshot | Capture screenshot for evidence |
| browser_console_messages | Get console logs |
| browser_network_requests | Get network activity |
| browser_wait_for | Wait for text/condition |
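The "use before click/type" note on browser_snapshot can be checked mechanically against a planned sequence of tool calls. The sketch below is a hypothetical lint, not part of HarshJudge or Playwright MCP. It assumes that refs go stale after a navigation or after any interaction, which is a conservative policy chosen for illustration.

```python
INTERACTIONS = {"browser_click", "browser_type"}


def unsnapshotted_interactions(tool_names: list[str]) -> list[int]:
    """Return indexes of click/type calls with no fresh browser_snapshot before them."""
    fresh = False  # True once a snapshot has been taken since the last page change
    problems: list[int] = []
    for index, name in enumerate(tool_names):
        if name == "browser_snapshot":
            fresh = True
        elif name == "browser_navigate":
            fresh = False  # assumption: refs from the previous page are stale
        elif name in INTERACTIONS:
            if not fresh:
                problems.append(index)
            fresh = False  # assumption: the page may have changed after interacting
    return problems
```

An empty result means every click or type in the plan is preceded by its own snapshot.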
Step Agent Prompt Template
When spawning an agent for each step:
```
Execute step {stepId} of scenario {scenarioSlug}:

## Step Content
{content from steps/{stepId}-{slug}.md}

## Project Context
Base URL: {from config.yaml}
Auth: {from prd.md if needed}

## Previous Step
Status: {pass|fail|first step}

## Your Task
1. Execute the actions using Playwright MCP tools
2. Use browser_snapshot before clicking to get element refs
3. Capture before/after screenshots using browser_take_screenshot
4. Record evidence using recordEvidence with step={stepNumber}

Return ONLY a JSON object:
{
  "status": "pass" | "fail",
  "evidencePaths": ["path1.png", "path2.png"],
  "error": null | "error message"
}

DO NOT return full evidence content. DO NOT explain your work.
```
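Because the main agent relies on this JSON shape, it may help to validate each reply before calling completeStep. The following is a minimal sketch that assumes the reply arrives as a raw JSON string. `parse_step_result` is a hypothetical helper, not a HarshJudge API.

```python
import json
from typing import Any


def parse_step_result(raw: str) -> dict[str, Any]:
    """Parse a step agent reply and check it matches the template's JSON shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("step result must be a JSON object")
    if data.get("status") not in ("pass", "fail"):
        raise ValueError("status must be 'pass' or 'fail'")
    paths = data.get("evidencePaths")
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise ValueError("evidencePaths must be a list of strings")
    error = data.get("error")
    if error is not None and not isinstance(error, str):
        raise ValueError("error must be null or a string")
    # Keep only the three expected keys so stray content never reaches the main agent.
    return {"status": data["status"], "evidencePaths": paths, "error": error}
```

Dropping unexpected keys keeps the main agent's context small even when a step agent ignores the "ONLY a JSON object" instruction and adds extra fields.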
Error Handling
On ANY error:
- STOP - Do not proceed
- Report - Tool, params, error, resolution
- Check prd.md - Is this a known pattern?
- Do NOT retry - Unless user instructs
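The four report fields named above (tool, params, error, resolution) can be carried in a small record. This is an illustrative sketch only: `ErrorReport`, its `known_pattern` flag, and the rendered text layout are assumptions, not a format defined by HarshJudge.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ErrorReport:
    """One stop-and-report record; field names follow the list above."""

    tool: str                    # tool that failed, e.g. "browser_click"
    params: dict[str, Any]       # arguments that were passed to the tool
    error: str                   # error message as returned
    resolution: str              # suggested next action for the user
    known_pattern: bool = False  # True if prd.md already documents this failure

    def render(self) -> str:
        """Format the report as plain text for the user."""
        return "\n".join([
            f"Tool: {self.tool}",
            f"Params: {json.dumps(self.params, sort_keys=True)}",
            f"Error: {self.error}",
            f"Resolution: {self.resolution}",
            f"Known pattern in prd.md: {'yes' if self.known_pattern else 'no'}",
            "Retry: not attempted (waiting for user instruction)",
        ])
```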