eidola-evaluate — Test Simulacrum Fidelity

Name: eidola-evaluate
Rating: 87
Author: queelius

You are evaluating how well a persona simulacrum represents the real person.

Before Starting

•Confirm the persona directory (default: current directory)
•Check that the required files exist: CLAUDE.md, arkiv/data.db, arkiv/manifest.json
•Ask: "Is the person available for calibration? Or should I evaluate against data only?"
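
The file check above can be scripted. The following is a minimal sketch, not part of the skill itself; the function name `missing_required_files` is a hypothetical helper, and only the three file paths come from the checklist.

```python
from pathlib import Path

# Required files, copied from the checklist above.
REQUIRED_FILES = ("CLAUDE.md", "arkiv/data.db", "arkiv/manifest.json")


def missing_required_files(persona_dir="."):
    """Return the required files that are absent from persona_dir.

    Hypothetical helper: an empty list means the directory looks complete.
    """
    root = Path(persona_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_required_files()` with no argument checks the current directory, which matches the default stated above. A non-empty result is a reason to stop and ask before evaluating.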

Method 1: Held-Out Test

Build test cases from records in arkiv/data.db, then compare the simulacrum against them:

•Find 10 questions the person actually answered in conversations
•Read portrait/ files to understand what the simulacrum should know
•Ask the simulacrum each question (using CLAUDE.md as the system prompt)
•Compare the simulacrum's response to the person's actual response
•Score each response from 1 to 5 on: topic alignment, voice similarity, factual accuracy
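
This document does not describe the schema of arkiv/data.db, so a reasonable first step before looking for answered questions is to list the tables and columns that actually exist. The sketch below uses only standard SQLite introspection (`sqlite_master` and `PRAGMA table_info`); the function name `describe_schema` is a hypothetical helper.

```python
import sqlite3


def describe_schema(db_path="arkiv/data.db"):
    """Map each table name in the SQLite file to its list of column names.

    Note: sqlite3.connect creates an empty database if db_path does not
    exist, so confirm the file is present before calling this.
    """
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # Quote the identifier so unusual table names do not break the PRAGMA.
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info yields one row per column; index 1 is the name.
            schema[table] = [
                row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")
            ]
        return schema
    finally:
        conn.close()
```

With the column names in hand, the query for question and answer pairs can be written against the real schema rather than a guessed one.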

Report:

•How many responses captured the right topic/stance
•Where voice diverged (too formal, too casual, wrong vocabulary)
•Any factual errors or hallucinations
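
One way to turn per-question scores into the figures reported above is sketched here. It assumes each dimension is scored from 1 to 5 and that a topic score of 4 or more counts as "captured the right topic/stance"; both the threshold and the names `HeldOutScore` and `summarize` are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HeldOutScore:
    question: str
    topic: int    # topic alignment, assumed 1-5
    voice: int    # voice similarity, assumed 1-5
    factual: int  # factual accuracy, assumed 1-5
    note: str = ""  # e.g. "too formal", "wrong vocabulary"


def summarize(scores, pass_mark=4):
    """Aggregate held-out scores; pass_mark is an assumed threshold."""
    if not scores:
        raise ValueError("no scores to summarize")
    return {
        "questions_tested": len(scores),
        "topic_hits": sum(1 for s in scores if s.topic >= pass_mark),
        "topic_mean": round(mean(s.topic for s in scores), 1),
        "voice_mean": round(mean(s.voice for s in scores), 1),
        "factual_mean": round(mean(s.factual for s in scores), 1),
        "notes": [s.note for s in scores if s.note],
    }
```

The `notes` list keeps the qualitative observations (voice divergence, factual errors) next to the numbers, since the averages alone do not say what went wrong.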

Method 2: Calibration Interview (requires person)

If the person is available:

•Generate 5 responses as the simulacrum on varied topics
•Show each to the person
•Ask: "On a scale of 1-5, how much does this sound like you? What's off?"
•Record feedback
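
Recorded feedback can be reduced to the average reported later. This is a small sketch assuming feedback is kept as `(rating, comment)` pairs; the function name `average_rating` is hypothetical.

```python
from statistics import mean


def average_rating(feedback):
    """Average the 1-5 ratings from a list of (rating, comment) pairs."""
    ratings = [rating for rating, _comment in feedback]
    if not ratings:
        raise ValueError("no feedback recorded")
    for rating in ratings:
        # The interview question above asks for a rating on a 1-5 scale.
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return round(mean(ratings), 1)
```

The comments are not used in the average, but they are the part most useful for revising CLAUDE.md, so keep them alongside the number.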

Method 3: Hallucination Check

Ask the simulacrum about topics NOT in the data:

•Pick 5 topics with no records in arkiv/data.db
•Ask the simulacrum about each
•Check whether it says "I don't know" or "I'm not sure", or whether it confabulates
•Flag any confident claims on unknown topics
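
A first pass over the responses can be automated with a keyword check. The sketch below is a crude heuristic, not a substitute for reading each answer: the phrase list is an assumed, non-exhaustive set based on the two examples above, and `flag_confabulations` is a hypothetical name.

```python
import re

# Assumed, non-exhaustive phrases that count as appropriate uncertainty.
UNCERTAINTY_PATTERNS = (
    r"\bI don['’]?t know\b",
    r"\bI['’]?m not sure\b",
    r"\bI am not sure\b",
)
_UNCERTAIN = re.compile("|".join(UNCERTAINTY_PATTERNS), re.IGNORECASE)


def flag_confabulations(responses):
    """Return topics whose response contains no uncertainty phrase.

    responses maps each out-of-data topic to the simulacrum's answer.
    A flagged topic is only a candidate; a response can hedge in other
    words, or hedge and then confabulate anyway.
    """
    return [
        topic for topic, text in responses.items() if not _UNCERTAIN.search(text)
    ]
```

The number of topics that are not flagged gives a starting value for the "appropriate uncertainty" count, subject to manual review.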

Output

Write evaluation.md to the persona directory:

# Evaluation Report

**Date:** [today]
**Persona:** [name]

## Held-Out Test
- Questions tested: N
- Voice fidelity: X/5
- Topic accuracy: X/5
- Notes: [findings]

## Hallucination Check
- Topics tested: N
- Appropriate uncertainty: X/N
- Flags: [any confabulations]

## Calibration (if done)
- Average self-rating: X/5
- Key feedback: [notes]

## Recommendations
- [suggested improvements to CLAUDE.md]
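
The template above can also be filled in programmatically. The following sketch assumes the results are passed in as plain dictionaries; the key names and the functions `render_report` and `write_report` are hypothetical, while the headings and field labels follow the template.

```python
from datetime import date
from pathlib import Path


def render_report(persona, held_out, hallucination, recommendations,
                  calibration=None, today=None):
    """Build the evaluation.md text. Dictionary keys are assumed names."""
    today = today or date.today().isoformat()
    lines = [
        "# Evaluation Report",
        "",
        f"**Date:** {today}",
        f"**Persona:** {persona}",
        "",
        "## Held-Out Test",
        f"- Questions tested: {held_out['n']}",
        f"- Voice fidelity: {held_out['voice']}/5",
        f"- Topic accuracy: {held_out['topic']}/5",
        f"- Notes: {held_out['notes']}",
        "",
        "## Hallucination Check",
        f"- Topics tested: {hallucination['n']}",
        f"- Appropriate uncertainty: {hallucination['ok']}/{hallucination['n']}",
        f"- Flags: {hallucination['flags']}",
        "",
    ]
    # The calibration section is optional: include it only if the interview ran.
    if calibration is not None:
        lines += [
            "## Calibration",
            f"- Average self-rating: {calibration['rating']}/5",
            f"- Key feedback: {calibration['feedback']}",
            "",
        ]
    lines.append("## Recommendations")
    lines += [f"- {item}" for item in recommendations]
    return "\n".join(lines) + "\n"


def write_report(persona_dir, text):
    """Write the report to evaluation.md in the persona directory."""
    path = Path(persona_dir) / "evaluation.md"
    path.write_text(text, encoding="utf-8")
    return path
```

Passing `calibration=None` omits that section entirely, which corresponds to evaluating against data only.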