Multi-Agent Review Protocol
Overview
Multi-agent review launches real external AI CLIs (Codex, Gemini, Cursor-Agent) in parallel with an in-session Claude review. Each model reviews from its strength perspective, then results are consolidated with consensus/divergence detection.
Key Change (v2.0): Replaced fake "Claude-pretending-to-be-other-models" with actual CLI invocations.
Model Configuration
Active Models
| Model | CLI Command | Focus Areas |
|---|---|---|
| Claude | In-session | Edge cases, security, architectural coherence |
| Codex | codex exec | Implementation feasibility, API design, DX |
| Gemini | gemini --yolo | Industry patterns, breadth, documentation |
| Cursor | cursor-agent --print | Code architecture, file structure, modules |
CLI Locations
CODEX_BIN="$(which codex)" # Usually ~/.nvm/.../bin/codex GEMINI_BIN="$(which gemini)" # Usually ~/.nvm/.../bin/gemini CURSOR_BIN="$(which cursor-agent)" # Usually ~/.local/bin/cursor-agent
Model-Specific Focus Areas
Claude (In-Session):
Focus especially on: - Edge cases and failure modes - Security implications - Architectural coherence - Long-term maintainability - Subtle logical errors
Codex (GPT/OpenAI):
Focus especially on: - Implementation feasibility - API design patterns - Code structure recommendations - Developer experience - Practical tradeoffs
Gemini (Google):
Focus especially on: - Industry patterns and alternatives - Documentation completeness - Breadth of considerations - Research-backed recommendations - Cross-domain insights
Cursor-Agent:
Focus especially on: - File and folder structure recommendations - Module boundaries and organization - How this would be navigated in an IDE - Import/export patterns - Where code should actually live
Architecture
File Structure
~/.claude/ ├── scripts/ │ ├── multi-model-review.sh # Main orchestrator │ ├── lib/ │ │ └── cli-wrappers.sh # Per-model launch functions │ └── templates/ │ ├── codex-review-prompt.txt │ ├── gemini-review-prompt.txt │ └── cursor-review-prompt.txt ├── reviews/ │ └── reviews-YYYY-MM-DD-HHMM/ # Output directories │ ├── manifest.json │ ├── claude_feedback.md │ ├── codex_feedback.md │ ├── gemini_feedback.md │ ├── cursor_feedback.md │ └── consolidated_feedback.md
When installed as a plugin, scripts are located at ${CLAUDE_PLUGIN_ROOT}/scripts/ instead of ~/.claude/scripts/.
Execution Flow
User: /spec-review-multi specs/my-feature.md
1. Create: ~/.claude/reviews/reviews-2026-02-01-1630/
2. Launch in parallel:
|
|-- [Background] codex exec -> codex_feedback.md
|-- [Background] gemini -> gemini_feedback.md
|-- [Background] cursor -> cursor_feedback.md
|
+-- [In-session] Claude reviews -> claude_feedback.md
(edge cases, security, architecture)
3. Monitor: Poll every 30s, show progress
4. Consolidate: Apply consensus/divergence logic (4 sources)
5. Output: consolidated_feedback.md
Spawning Pattern
External CLIs (Background Processes)
The orchestrator script launches all CLIs with appropriate flags:
# Codex - uses exec mode with full-auto codex exec --full-auto -o "$OUTPUT_FILE" "$PROMPT" # Gemini - uses yolo mode for auto-approval gemini --yolo "$PROMPT" > "$OUTPUT_FILE" # Cursor - uses print mode for non-interactive cursor-agent --print --output-format text "$PROMPT" > "$OUTPUT_FILE"
Claude (In-Session)
Claude performs its review while external CLIs work, maximizing parallelism:
- •Read spec file content
- •Apply Claude-specific review prompt
- •Write output to
claude_feedback.md - •Continue monitoring external progress
Monitoring Strategy
Sentinel Files
Each CLI wrapper creates a .done sentinel file on completion:
# Success
echo "success" > "${OUTPUT_FILE}.done"
# Failure
echo "error:$EXIT_CODE" > "${OUTPUT_FILE}.done"
Progress Polling
Check every 30 seconds:
# Count completed ls -la ~/.claude/reviews/reviews-*/*.done 2>/dev/null | wc -l # Check status cat ~/.claude/reviews/reviews-*/*.done
Progress Report Format
Review Progress: - Claude: Complete (in-session) - Codex: Working (5 min) - Gemini: Complete - Cursor: Failed (error:1) Files found: 2/4
Timeout Configuration
| Setting | Value |
|---|---|
| Per-model timeout | 15 minutes |
| Total timeout | 20 minutes |
| Poll interval | 30 seconds |
| Minimum for consensus | 2 reviews |
Completion Handling
Full Completion (4/4)
Proceed immediately to consolidation.
Partial Completion (2-3/4)
After timeout, offer options:
3/4 reviews complete. Codex timed out. Options: A) Proceed with 3 reviews (sufficient for consensus detection) B) Wait 5 more minutes C) Retry failed CLI
Minimal Completion (1/4)
Only 1/4 reviews complete. Options: A) Proceed with single review (no consensus possible) B) Wait longer C) Abort and try manual review
No Completion (0/4)
No reviews completed within timeout. Possible causes: - CLI authentication issues - Network problems - Prompt too large Options: A) Retry with fresh processes B) Generate prompt pack for manual execution C) Fall back to Claude-only review
Consolidation Logic
Source Attribution
Every point must be tagged with source model(s):
- [Point] — [Claude] - [Point] — [Codex, Gemini] - [Point] — [Claude, Codex, Gemini, Cursor]
Consensus Detection
Definition: Item raised by 2+ models (even if worded differently)
Detection Method:
- •Parse all feedback files
- •Extract individual points
- •Semantic matching (not exact wording)
- •Tag matches with CONSENSUS
Semantic Matching Examples:
- •"Needs error handling" = "Failure states undefined"
- •"Missing dependency" = "Order unclear"
- •"Too complex" = "Should simplify"
Divergence Detection
Definition: Models explicitly disagree on approach
Format:
DIVERGENT: [Topic] - [Position A] — Claude, Codex - [Position B] — Gemini, Cursor - **Resolution needed:** [What decision is required]
Output Format
Consolidated Feedback
# Consolidated Spec Feedback — [PROJECT_NAME] **Sources:** claude_feedback.md, codex_feedback.md, gemini_feedback.md, cursor_feedback.md **Date:** [DATE] **Review Duration:** [X minutes] --- ## Consensus Summary Items flagged by 2+ reviewers: 1. [Issue] — Claude, Codex, Gemini 2. [Issue] — Claude, Cursor 3. [Issue] — All 4 models --- ## By Category ### Implementation Feasibility (Codex Lead) [Points with validation from others] ### Architecture & Structure (Cursor Lead) [Points with validation from others] ### Security & Edge Cases (Claude Lead) [Points] ### Patterns & Breadth (Gemini Lead) [Points with validation from others] --- ## Divergent Opinions [Any DIVERGENT items] --- ## Appendix: Full Reviews ### Claude (In-Session) [Full content] ### Codex (CLI) [Full content] ### Gemini (CLI) [Full content] ### Cursor (CLI) [Full content]
Manifest File (JSON)
{
"timestamp": "2026-02-01T16:30:00-08:00",
"spec_file": "/path/to/spec.md",
"output_dir": "~/.claude/reviews/reviews-2026-02-01-1630",
"duration_seconds": 423,
"reviewers": [
{"name": "claude", "output": "claude_feedback.md", "status": "success"},
{"name": "codex", "output": "codex_feedback.md", "status": "success"},
{"name": "gemini", "output": "gemini_feedback.md", "status": "success"},
{"name": "cursor", "output": "cursor_feedback.md", "status": "error:1"}
]
}
Quality Validation
Pre-Consolidation Checks
For each feedback file, verify:
- • File exists and has content (> 100 bytes)
- • Contains expected section headers
- • Has substantive content (not just headers)
- • Follows expected output format
Post-Consolidation Checks
- • All source files referenced
- • Consensus items tagged
- • Divergent items tagged
- • Appendix contains full reviews
- • No points lost in consolidation
Error Handling
CLI Not Found
codex CLI not found Options: A) Install: npm install -g @openai/codex B) Proceed without Codex (3 reviewers) C) Generate prompt for manual execution
CLI Authentication Failure
gemini authentication failed The CLI may need re-authentication: gemini login Options: A) Skip Gemini for now B) Wait while you authenticate
Output File Corruption
[Model] output file appears corrupted or incomplete. Options: A) Exclude from consolidation B) Request retry C) Include partial content with warning
Fallback: Prompt Pack
If CLIs aren't working, generate copy-paste commands:
# Save prompts to files cat > /tmp/codex_prompt.txt << 'EOF' [Full prompt here] EOF # Terminal 1 - Codex: codex exec --full-auto -o codex_feedback.md "$(cat /tmp/codex_prompt.txt)" # Terminal 2 - Gemini: gemini --yolo "$(cat /tmp/gemini_prompt.txt)" > gemini_feedback.md # Terminal 3 - Cursor: cursor-agent --print --output-format text "$(cat /tmp/cursor_prompt.txt)" > cursor_feedback.md
Protocol version: 2.0 | Updated: 2026-02-01 v1.0: Fake Claude subagents pretending to be other models v2.0: Real CLI invocations with actual model diversity