Multi-Agent Code Review

Overview

Dispatch 3+ parallel specialist agents to independently verify code correctness, then synthesize results. Designed for validating ports/rewrites where a reference implementation exists.

When to Use

•Porting code between frameworks (e.g. Taichi -> Warp, NumPy -> JAX)
•Major refactors where original behavior must be preserved
•Any scenario with a reference implementation to compare against

Agent Roles

dot

digraph agents {
  rankdir=LR;
  A1 [label="Numerical\nTester" shape=box];
  A2 [label="Visual\nComparator" shape=box];
  A3 [label="Code\nReviewer" shape=box];
  A4 [label="Coordinator" shape=doubleoctagon];
  A1 -> A4; A2 -> A4; A3 -> A4;
}

Agent	Responsibility	Output
Numerical Tester	Run both implementations headless, compare outputs at checkpoints	Max/mean error per step, pass/fail
Visual Comparator	Render intermediate frames, save side-by-side images	PNG files in `output/`, visual diff report
Code Reviewer	Line-by-line comparison of kernels, math, data types	Critical issues, warnings, missing features
Coordinator	Synthesize all findings, make go/no-go decision	Final verdict with evidence

Workflow

1. Preparation (main context)

Read key files to understand scope. Identify:

•Reference implementation (original)
•Target implementation (port/rewrite)
•Shared config/data files
•Known differences (intentional omissions)

2. Dispatch (parallel)

Launch agents 1-3 simultaneously with run_in_background: true. Each prompt must include:

•Exact file paths for both implementations
•How to run (uv run, wp.set_device("cpu"), etc.)
•What to compare (positions, volumes, centroids, etc.)
•Where to write output (test files, image dirs)
•Constraints ("do NOT modify production code", "code in English")

3. Iterate on Failures

Agents may hit errors (wrong attribute names, missing deps, permission issues). Resume or redo the agent with fixes. Do not abandon on first failure.

4. Synthesize (coordinator)

After all agents complete, compile:

•Numerical: pass/fail per mode, error magnitudes
•Visual: qualitative match, divergence frames
•Code: critical issues, warnings, missing features
•Root cause if any test fails (e.g. "GPU race condition in Gauss-Seidel mode")

Agent Prompt Template

code

You are Agent N: [Role Name].

## Task
[One sentence goal]

## Project
- Directory: [path]
- Reference: [file] (original)
- Target: [file] (port)
- Run with: `uv run python ...`
- Config: [config file path]

## Steps
1. Read [specific files]
2. Write test script to [specific location]
3. Run and capture output
4. Analyze results

## Constraints
- gui=False, render_mode=None (headless)
- [device constraints]
- Code and comments in English

## Output
Return: [exact deliverables]

Key Lessons

•Specify device explicitly - GPU vs CPU can cause completely different results due to race conditions
•Test multiple solver modes - Jacobi (deterministic) and Gauss-Seidel (non-deterministic) may behave differently
•Check NaN/Inf early - Diverging simulations produce NaN within a few steps; detect and report immediately
•Shared axis limits - Visual comparisons need identical scales to be meaningful
•Subagents may lack permissions - If a background agent fails on Write/Bash, redo the work in main context

Quick Reference: Test Commands

bash

# Numerical (CPU, both modes)
uv run python tests/test_muscle_warp_vs_taichi.py --mode both --steps 100

# Visual comparison (generates output/*.png)
uv run python tests/test_visual_comparison.py

# GPU stability
uv run python tests/test_warp_cuda_jacobi.py
uv run python tests/test_warp_cpu_vs_cuda.py