You are executing the /showdown judge skill. Models will cross-judge each other's responses from a previous showdown, blind and anonymized.

Step 1: Determine Input Source

Check if a /showdown was run earlier in this conversation and you still have the responses in context.

•If yes: Use the in-memory responses (prompt + 3 model responses). Skip to Step 2.
•If no (or user provided a file path as argument):
1. If $ARGUMENTS contains a file path, read that file
2. Otherwise, list available showdown files:
  bash
  ls -t ./showdown-output/showdown-*.md 2>/dev/null | head -10
3. If files found, present a numbered list using AskUserQuestion and let the user pick
4. If no files found, tell the user: "No showdown output files found. Run /showdown first."
5. Read the chosen file and extract: the original prompt, and each model's full response (between the --- separators)
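
The listing and fallback steps above can be sketched as a small shell function. This is a minimal sketch, not part of the skill's scripts: the function name list_showdown_files and its optional directory argument are hypothetical, and only the ls command and the fallback message come from the steps above.

```shell
# Hypothetical helper: print a numbered list of the 10 most recent
# showdown files, or the fallback message if none exist.
list_showdown_files() {
  dir="${1:-./showdown-output}"   # directory argument is an assumption
  files=$(ls -t "$dir"/showdown-*.md 2>/dev/null | head -10)
  if [ -z "$files" ]; then
    echo "No showdown output files found. Run /showdown first."
    return 1
  fi
  # Number each path so it can be offered as a choice to the user.
  printf '%s\n' "$files" | awk '{ printf "%d. %s\n", NR, $0 }'
}
```

Calling list_showdown_files with no argument checks ./showdown-output; the numbered lines can then be passed to AskUserQuestion as options.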

Step 2: Choose Custom Dimension

Based on the original prompt topic, choose ONE custom scoring dimension that's relevant. Examples:

•Technical prompt → "Technical Correctness"
•Creative prompt → "Creativity"
•Business/strategy → "Actionability"
•Debate/opinion → "Persuasiveness"
•Code-related → "Code Quality"

The 4 fixed dimensions are always: Accuracy, Depth, Clarity, Originality.

Announce the 5 dimensions to the user before proceeding.

Step 3: Anonymize Responses

Randomly assign letters A, B, C to the three models. Record the mapping internally (e.g., A=Claude, B=GPT, C=Gemini). The assignment MUST be randomized — do not always use the same order.

To randomize, use:

bash

echo "A B C" | tr ' ' '\n' | sort -R | tr '\n' ' '

Assign the first shuffled letter to the first model in models.conf order, the second letter to the second model, and so on.
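
The pairing of shuffled letters with models can be sketched as follows. This is an illustrative sketch under assumptions: the function name assign_letters is hypothetical, and the model names are passed as arguments because the layout of models.conf is not described here.

```shell
# Hypothetical helper: pair shuffled letters with models.
# $1 is the space-separated shuffled letter string (a trailing space is fine);
# the remaining arguments are model names in models.conf order.
assign_letters() {
  letters=$1
  shift
  for model in "$@"; do
    letter=${letters%% *}     # take the first letter
    letters=${letters#* }     # drop it from the remaining string
    printf '%s=%s\n' "$letter" "$model"
  done
}
```

For example, `assign_letters "$(echo "A B C" | tr ' ' '\n' | sort -R | tr '\n' ' ')" Claude GPT Gemini` prints one letter=model line per model, in a different letter order on each run.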

Step 4: Construct Judge Prompts

For each judge model, construct a prompt containing ONLY the 2 responses from the OTHER models (not the judge's own). Use the anonymous letters.
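
As a rough illustration of the exclusion rule, the helper below returns the two letters a judge should see, given the letter assigned to that judge's own response. The function name others_for is hypothetical.

```shell
# Hypothetical helper: given the letter assigned to the judge's own
# response, print the two remaining letters that the judge evaluates.
others_for() {
  judge_letter=$1
  out=""
  for letter in A B C; do
    if [ "$letter" != "$judge_letter" ]; then
      out="${out:+$out }$letter"   # join with a single space
    fi
  done
  printf '%s\n' "$out"
}
```

With the mapping A=Claude, the Claude judge receives the responses returned by `others_for A`, which are B and C.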

Judge prompt template:

code

You are evaluating two AI-generated responses to the following prompt.

---
ORIGINAL PROMPT:
{original_prompt}
---

RESPONSE {letter_1}:
{response_from_other_model_1}

---

RESPONSE {letter_2}:
{response_from_other_model_2}

---

SCORING RUBRIC:
Rate each response 1-10 on the following dimensions. For each score, provide a 1-2 sentence justification.

1. **Accuracy** — Factual correctness and absence of hallucinations
2. **Depth** — Thoroughness of analysis, nuance, and insight
3. **Clarity** — How well-organized, readable, and understandable the response is
4. **Originality** — Novel framing, unique insights, or creative approach
5. **{custom_dimension}** — {custom_description}

IMPORTANT: Evaluate solely on content quality. Do not attempt to identify which model wrote which response. One of these may share your architecture — judge purely on merit.

OUTPUT FORMAT (use exactly this structure):

### Response {letter_1}
- Accuracy: X/10 — justification
- Depth: X/10 — justification
- Clarity: X/10 — justification
- Originality: X/10 — justification
- {custom_dimension}: X/10 — justification
**Overall:** 2-3 sentence assessment

### Response {letter_2}
- Accuracy: X/10 — justification
- Depth: X/10 — justification
- Clarity: X/10 — justification
- Originality: X/10 — justification
- {custom_dimension}: X/10 — justification
**Overall:** 2-3 sentence assessment

### Winner: {letter} (or Tie)
**Reasoning:** 2-3 sentences

Step 5: Fire Judge Calls in Parallel

Build the JSON input for judge.sh and pipe it in:

bash

echo '<json>' | bash ~/.claude/skills/showdown/scripts/judge.sh

The JSON format:

json

{
  "judges": [
    {
      "model": "claude-opus-4-6",
      "display_name": "Claude Opus 4.6",
      "prompt": "<constructed judge prompt for Claude>"
    },
    {
      "model": "gpt-5.3-codex",
      "display_name": "GPT-5.3 Codex",
      "prompt": "<constructed judge prompt for GPT>"
    },
    {
      "model": "gemini-3-pro-preview",
      "display_name": "Gemini 3 Pro",
      "prompt": "<constructed judge prompt for Gemini>"
    }
  ]
}

Important: Use a temp file for the JSON input since it may be very large:

bash

# Write JSON to temp file, then feed it to judge.sh on stdin
TMPJSON=$(mktemp)
cat > "$TMPJSON" << 'ENDJSON'
{...}
ENDJSON
bash ~/.claude/skills/showdown/scripts/judge.sh < "$TMPJSON"
rm -f "$TMPJSON"
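
Judge prompts contain quotes and newlines, so they must be escaped before being embedded in the JSON. One possible way to do that is sketched below, assuming jq is installed (the skill does not state this). The function names judge_entry and build_judges_json are hypothetical; the field names match the JSON format above.

```shell
# Hypothetical helper: emit one judge object with properly escaped strings.
# $1 = model id, $2 = display name, $3 = constructed judge prompt
judge_entry() {
  jq -n --arg model "$1" --arg display_name "$2" --arg prompt "$3" \
    '{model: $model, display_name: $display_name, prompt: $prompt}'
}

# Hypothetical helper: read judge objects on stdin and wrap them
# in the {"judges": [...]} envelope.
build_judges_json() {
  jq -s '{judges: .}'
}
```

Running the three judge_entry calls inside a `{ ...; }` group and piping the group to `build_judges_json > "$TMPJSON"` would produce the file that is fed to judge.sh.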

Step 6: Present Anonymized Verdicts

Show each judge's verdict as returned, keeping responses anonymous (A/B/C):

code

## Judge Verdicts (Anonymized)

### Judge: Claude Opus 4.6 (judging responses {letter_x} and {letter_y}) — <duration>s

<judge response verbatim>

---

### Judge: GPT-5.3 Codex (judging responses {letter_x} and {letter_y}) — <duration>s

<judge response verbatim>

---

### Judge: Gemini 3 Pro (judging responses {letter_x} and {letter_y}) — <duration>s

<judge response verbatim>

If a judge failed, note: **<Judge>**: Failed — <error message>

Step 7: Reveal & Leaderboard

After ALL verdicts are shown, reveal the mapping and generate the leaderboard.

code

## Reveal

| Letter | Model |
|--------|-------|
| A | <model name> |
| B | <model name> |
| C | <model name> |

Then parse all judge scores and compute the leaderboard. For each model, average the scores it received from the 2 judges that evaluated it.

code

### Leaderboard

| Model | Accuracy | Depth | Clarity | Originality | {Custom} | Avg | Wins |
|-------|----------|-------|---------|-------------|----------|-----|------|
| ... | ... | ... | ... | ... | ... | ... | ... |

*Scores averaged across judges. Wins = times picked as winner by judges.*
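
The averaging and win counting can be sketched with awk, assuming every judge followed the output format from Step 4 exactly. The function names average_scores and count_wins are hypothetical. Both read the concatenated verdicts on stdin and report by letter, so the letter-to-model mapping still has to be applied afterwards.

```shell
# Hypothetical helper: print "letter dimension average" for each
# response letter and scoring dimension found in the verdicts.
average_scores() {
  awk '
    /^### Response / { letter = $3; next }   # start of a scored response
    /^### Winner/    { letter = ""; next }   # end of a judge verdict
    letter != "" && /^- [^:]+: [0-9]+\/10/ {
      line = $0
      sub(/^- /, "", line)
      dim = line;   sub(/:.*/, "", dim)              # text before the colon
      score = line; sub(/^[^:]*: /, "", score)
      sub(/\/10.*/, "", score)                       # number before /10
      key = letter SUBSEP dim
      sum[key] += score
      n[key]++
    }
    END {
      for (key in sum) {
        split(key, parts, SUBSEP)
        printf "%s %s %.1f\n", parts[1], parts[2], sum[key] / n[key]
      }
    }
  ' | sort
}

# Hypothetical helper: print "letter wins" from the Winner lines.
# A tie is reported under the key "Tie".
count_wins() {
  awk '/^### Winner: / { wins[$3]++ } END { for (w in wins) print w, wins[w] }' | sort
}
```

A verdict that deviates from the format is skipped silently by this sketch, so it is worth checking that each letter has scores from two judges before filling in the table.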

Then write a Narrative Synthesis (3-5 sentences):

•Who won overall and why
•Where judges agreed/disagreed
•Any surprising patterns (e.g., a model rated highest by its competitors)
•Whether the blind judging aligned with or diverged from the original comparison analysis

Step 8: Save (Append to Existing File)

Use AskUserQuestion to ask the user whether they want to save the judge results.

If yes:

•If the source was a markdown file from ./showdown-output/, append the entire judge section (Steps 6-7 output) to that same file
•If the source was in-memory (same session), check if a showdown markdown was saved earlier. If so, append to it. If not, save as a new file: showdown-YYYY-MM-DD-HHMMSS-judge.md
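
The choice between appending and creating a new file can be sketched as below. This is a sketch under assumptions: the function name save_judge_section and its arguments are hypothetical, and the output directory for a new file is passed in explicitly because the step above does not name one.

```shell
# Hypothetical helper: append the rendered judge section to the existing
# showdown file, or write it to a new timestamped file if none exists.
# $1 = file holding the rendered judge section
# $2 = existing showdown markdown path (may be empty)
# $3 = directory for a new file (an assumption, not specified by the skill)
save_judge_section() {
  section_file=$1
  source_file=$2
  out_dir=$3
  if [ -n "$source_file" ] && [ -f "$source_file" ]; then
    target=$source_file
  else
    target="$out_dir/showdown-$(date +%Y-%m-%d-%H%M%S)-judge.md"
  fi
  cat "$section_file" >> "$target"
  printf '%s\n' "$target"   # path to report back to the user
}
```

The printed path is the one to tell the user after saving.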

The appended section should be:

markdown


---

## Judge Verdicts

**Custom Dimension:** {custom_dimension} — {custom_description}
**Anonymization:** A={model}, B={model}, C={model}

### Judge: <name> (judging {letters}) — <duration>s

<full verdict>

---

### Judge: <name> (judging {letters}) — <duration>s

<full verdict>

---

### Judge: <name> (judging {letters}) — <duration>s

<full verdict>

---

### Leaderboard

| Model | Accuracy | Depth | Clarity | Originality | {Custom} | Avg | Wins |
|-------|----------|-------|---------|-------------|----------|-----|------|
| ... | ... | ... | ... | ... | ... | ... | ... |

### Narrative Synthesis

<3-5 sentence synthesis>

Tell the user the file path after saving.