You are executing the /showdown judge skill. Models will cross-judge each other's responses from a previous showdown, blind and anonymized.
Step 1: Determine Input Source
Check if a /showdown was run earlier in this conversation and you still have the responses in context.
- If yes: Use the in-memory responses (prompt + 3 model responses). Skip to Step 2.
- If no (or the user provided a file path as argument):
  - If $ARGUMENTS contains a file path, read that file.
  - Otherwise, list available showdown files:
    ls -t ./showdown-output/showdown-*.md 2>/dev/null | head -10
  - If files are found, present a numbered list using AskUserQuestion and let the user pick.
  - If no files are found, tell the user: "No showdown output files found. Run /showdown first."
  - Read the chosen file and extract the original prompt and each model's full response (between the --- separators).
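The lookup order in Step 1 can be sketched as a small shell function. This is only an illustration: `resolve_showdown_source` is a hypothetical helper, and falling back to the file listing when the given path does not exist is an assumption, not something the skill specifies.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_showdown_source is a hypothetical helper, not part of the skill.
# $1 = optional file path from the arguments, $2 = output directory.
resolve_showdown_source() {
  local arg="${1:-}" dir="${2:-./showdown-output}"
  if [ -n "$arg" ] && [ -f "$arg" ]; then
    printf '%s\n' "$arg"            # an explicit, existing path wins
    return 0
  fi
  local files
  files=$(ls -t "$dir"/showdown-*.md 2>/dev/null | head -10)
  if [ -z "$files" ]; then
    echo "No showdown output files found. Run /showdown first." >&2
    return 1
  fi
  # Numbered list (newest first) for the user to pick from.
  printf '%s\n' "$files" | awk '{ print NR ") " $0 }'
}
```

The numbered output is what would be handed to AskUserQuestion; the function itself never prompts.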
Step 2: Choose Custom Dimension
Based on the original prompt topic, choose ONE custom scoring dimension that's relevant. Examples:
- Technical prompt → "Technical Correctness"
- Creative prompt → "Creativity"
- Business/strategy → "Actionability"
- Debate/opinion → "Persuasiveness"
- Code-related → "Code Quality"
The 4 fixed dimensions are always: Accuracy, Depth, Clarity, Originality.
Announce the 5 dimensions to the user before proceeding.
Step 3: Anonymize Responses
Randomly assign letters A, B, C to the three models. Record the mapping internally (e.g., A=Claude, B=GPT, C=Gemini). The assignment MUST be randomized — do not always use the same order.
To randomize, use:
echo "A B C" | tr ' ' '\n' | sort -R | tr '\n' ' '
Assign the first letter to the first model in models.conf order, second to second, etc.
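A minimal sketch of that pairing, assuming the model names are passed in models.conf order. `assign_letters` is a hypothetical helper, and the three names in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: assign_letters is a hypothetical helper.
# Arguments are the model names in models.conf order; each one is paired
# with the next letter from a freshly shuffled A/B/C sequence.
assign_letters() {
  local letters i=0 model
  # Same shuffle as the command above: one letter per line, random order.
  letters=($(printf '%s\n' A B C | sort -R))
  for model in "$@"; do
    printf '%s=%s\n' "${letters[$i]}" "$model"
    i=$((i + 1))
  done
}

assign_letters "Claude" "GPT" "Gemini"
```

Each run prints three `letter=model` lines; the letter order varies between runs while the model order stays fixed.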
Step 4: Construct Judge Prompts
For each judge model, construct a prompt containing ONLY the 2 responses from the OTHER models (not the judge's own). Use the anonymous letters.
Judge prompt template:
You are evaluating two AI-generated responses to the following prompt.
---
ORIGINAL PROMPT:
{original_prompt}
---
RESPONSE {letter_1}:
{response_from_other_model_1}
---
RESPONSE {letter_2}:
{response_from_other_model_2}
---
SCORING RUBRIC:
Rate each response 1-10 on the following dimensions. For each score, provide a 1-2 sentence justification.
1. **Accuracy** — Factual correctness and absence of hallucinations
2. **Depth** — Thoroughness of analysis, nuance, and insight
3. **Clarity** — How well-organized, readable, and understandable the response is
4. **Originality** — Novel framing, unique insights, or creative approach
5. **{custom_dimension}** — {custom_description}
IMPORTANT: Evaluate solely on content quality. Do not attempt to identify which model wrote which response. One of these may share your architecture — judge purely on merit.
OUTPUT FORMAT (use exactly this structure):
### Response {letter_1}
- Accuracy: X/10 — justification
- Depth: X/10 — justification
- Clarity: X/10 — justification
- Originality: X/10 — justification
- {custom_dimension}: X/10 — justification
**Overall:** 2-3 sentence assessment
### Response {letter_2}
- Accuracy: X/10 — justification
- Depth: X/10 — justification
- Clarity: X/10 — justification
- Originality: X/10 — justification
- {custom_dimension}: X/10 — justification
**Overall:** 2-3 sentence assessment
### Winner: {letter} (or Tie)
**Reasoning:** 2-3 sentences
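The rule that a judge never sees its own response can be sketched as follows. `responses_for_judge` is a hypothetical helper that takes the judge's own letter and returns the two letters to fill into {letter_1} and {letter_2}.

```shell
#!/usr/bin/env bash
# Sketch only: responses_for_judge is a hypothetical helper.
# $1 = the letter assigned to the judge's own response.
responses_for_judge() {
  local own="$1" letter out=""
  for letter in A B C; do
    if [ "$letter" != "$own" ]; then
      out="${out:+$out }$letter"   # collect every letter except the judge's own
    fi
  done
  printf '%s\n' "$out"
}

responses_for_judge B   # prints "A C"
```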
Step 5: Fire Judge Calls in Parallel
Build the JSON input for judge.sh and pipe it in:
echo '<json>' | bash ~/.claude/skills/showdown/scripts/judge.sh
The JSON format:
{
"judges": [
{
"model": "claude-opus-4-6",
"display_name": "Claude Opus 4.6",
"prompt": "<constructed judge prompt for Claude>"
},
{
"model": "gpt-5.3-codex",
"display_name": "GPT-5.3 Codex",
"prompt": "<constructed judge prompt for GPT>"
},
{
"model": "gemini-3-pro-preview",
"display_name": "Gemini 3 Pro",
"prompt": "<constructed judge prompt for Gemini>"
}
]
}
Important: Use a temp file for the JSON input since it may be very large:
# Write the JSON to a temp file, then feed it to judge.sh on stdin
TMPJSON=$(mktemp)
cat > "$TMPJSON" << 'ENDJSON'
{...}
ENDJSON
bash ~/.claude/skills/showdown/scripts/judge.sh < "$TMPJSON"
rm -f "$TMPJSON"
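Judge prompts contain double quotes and newlines, so they have to be escaped before they can sit inside the "prompt" field. The sketch below shows one way to do that in plain bash. `json_escape` and `judge_entry` are hypothetical helpers (not part of judge.sh), and only the most common escapes are handled; a dedicated JSON tool would be more robust.

```shell
#!/usr/bin/env bash
# Sketch only: json_escape and judge_entry are hypothetical helpers, not part of judge.sh.

# Escape backslashes, double quotes, newlines, tabs and carriage returns.
# Other control characters are NOT handled here.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}       # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  printf '%s' "$s"
}

# Print one object for the "judges" array: model, display name, prompt.
judge_entry() {
  printf '{"model": "%s", "display_name": "%s", "prompt": "%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

prompt=$'Rate "Response A"\nand Response B.'
judge_entry "claude-opus-4-6" "Claude Opus 4.6" "$prompt"
```

The three entries would then be joined with commas and wrapped in the `{"judges": [...]}` structure shown above.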
Step 6: Present Anonymized Verdicts
Show each judge's verdict as returned, keeping responses anonymous (A/B/C):
## Judge Verdicts (Anonymized)
### Judge: Claude Opus 4.6 (judging responses {letter_x} and {letter_y}) — <duration>s
<judge response verbatim>
---
### Judge: GPT-5.3 Codex (judging responses {letter_x} and {letter_y}) — <duration>s
<judge response verbatim>
---
### Judge: Gemini 3 Pro (judging responses {letter_x} and {letter_y}) — <duration>s
<judge response verbatim>
If a judge failed, note: **<Judge>**: Failed — <error message>
Step 7: Reveal & Leaderboard
After ALL verdicts are shown, reveal the mapping and generate the leaderboard.
## Reveal
| Letter | Model |
|--------|-------|
| A | <model name> |
| B | <model name> |
| C | <model name> |
Then parse all judge scores and compute the leaderboard. For each model, average the scores it received from the 2 judges that evaluated it.
### Leaderboard
| Model | Accuracy | Depth | Clarity | Originality | {Custom} | Avg | Wins |
|-------|----------|-------|---------|-------------|----------|-----|------|
| ... | ... | ... | ... | ... | ... | ... | ... |
*Scores averaged across judges. Wins = times picked as winner by judges.*
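The averaging step can be sketched with awk, assuming the judges followed the OUTPUT FORMAT from Step 4 exactly. `average_scores` is a hypothetical helper: it reads the concatenated verdicts on stdin and prints the mean score per response letter and dimension. Mapping letters back to model names and counting wins are left out.

```shell
#!/usr/bin/env bash
# Sketch only: average_scores is a hypothetical helper.
# Reads judge verdicts on stdin; prints "letter<TAB>dimension<TAB>mean" lines.
average_scores() {
  awk '
    /^### Response / { letter = $3; next }     # start of a scored response
    /^### Winner/    { letter = ""; next }     # end of one verdict
    letter != "" && /^- [^:]+: [0-9]+\/10/ {
      dim = $0
      sub(/^- /, "", dim); sub(/:.*/, "", dim)                # dimension name
      score = $0
      sub(/^- [^:]+: /, "", score); sub(/\/10.*/, "", score)  # numeric score
      sum[letter "\t" dim] += score
      n[letter "\t" dim]++
    }
    END {
      for (k in sum) printf "%s\t%.1f\n", k, sum[k] / n[k]
    }
  ' | sort
}
```

With three judges, each letter appears in two verdicts, so every printed mean is taken over two scores. A verdict that deviates from the format contributes nothing, which is worth checking before trusting the table.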
Then write a Narrative Synthesis (3-5 sentences):
- Who won overall and why
- Where judges agreed/disagreed
- Any surprising patterns (e.g., a model rated highest by its competitors)
- Whether the blind judging aligned with or diverged from the original comparison analysis
Step 8: Save (Append to Existing File)
Ask the user if they want to save the judge results using AskUserQuestion.
If yes:
- If the source was a markdown file from ./showdown-output/, append the entire judge section (Steps 6-7 output) to that same file.
- If the source was in-memory (same session), check if a showdown markdown was saved earlier. If so, append to it. If not, save as a new file: showdown-YYYY-MM-DD-HHMMSS-judge.md
The appended section should be:
---
## Judge Verdicts
**Custom Dimension:** {custom_dimension} — {custom_description}
**Anonymization:** A={model}, B={model}, C={model}
### Judge: <name> (judging {letters}) — <duration>s
<full verdict>
---
### Judge: <name> (judging {letters}) — <duration>s
<full verdict>
---
### Judge: <name> (judging {letters}) — <duration>s
<full verdict>
---
### Leaderboard
| Model | Accuracy | Depth | Clarity | Originality | {Custom} | Avg | Wins |
|-------|----------|-------|---------|-------------|----------|-----|------|
| ... | ... | ... | ... | ... | ... | ... | ... |
### Narrative Synthesis
<3-5 sentence synthesis>
Tell the user the file path after saving.
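The save rules in Step 8 can be sketched as one function. `save_judge_section` is a hypothetical helper, and writing a new file into ./showdown-output is an assumption, since the step only gives the file name pattern.

```shell
#!/usr/bin/env bash
# Sketch only: save_judge_section is a hypothetical helper.
# $1 = file holding the judge section (Steps 6-7 output)
# $2 = existing showdown file to append to (may be empty)
# $3 = directory for a new file (assumed: ./showdown-output)
save_judge_section() {
  local section="$1" target="${2:-}" dir="${3:-./showdown-output}"
  if [ -z "$target" ] || [ ! -f "$target" ]; then
    mkdir -p "$dir"
    target="$dir/showdown-$(date +%Y-%m-%d-%H%M%S)-judge.md"
  fi
  cat "$section" >> "$target"   # append, never overwrite
  printf '%s\n' "$target"       # path to report back to the user
}
```

The printed path is the one to tell the user after saving.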