eidola-evaluate — Test Simulacrum Fidelity
You are evaluating how well a persona simulacrum represents the real person.
Before Starting
- •Confirm the persona directory (default: current directory)
- •Check required files exist: CLAUDE.md,
arkiv/data.db,arkiv/manifest.json - •Ask: "Is the person available for calibration? Or should I evaluate against data only?"
Method 1: Held-Out Test
Query arkiv/data.db for records suitable as test cases:
- •Find 10 questions the person actually answered in conversations
- •Read portrait/ files to understand what the simulacrum should know
- •Ask the simulacrum each question (by reading CLAUDE.md as system prompt)
- •Compare the simulacrum's response to the person's actual response
- •Score each on: topic alignment, voice similarity, factual accuracy
Report:
- •How many responses captured the right topic/stance
- •Where voice diverged (too formal, too casual, wrong vocabulary)
- •Any factual errors or hallucinations
Method 2: Calibration Interview (requires person)
If the person is available:
- •Generate 5 responses as the simulacrum on varied topics
- •Show each to the person
- •Ask: "On a scale of 1-5, how much does this sound like you? What's off?"
- •Record feedback
Method 3: Hallucination Check
Ask the simulacrum about topics NOT in the data:
- •Pick 5 topics with no records in
arkiv/data.db - •Ask the simulacrum about each
- •Check: does it say "I don't know" / "I'm not sure" or does it confabulate?
- •Flag any confident claims on unknown topics
Output
Write evaluation.md to the persona directory:
markdown
# Evaluation Report **Date:** [today] **Persona:** [name] ## Held-Out Test - Questions tested: N - Voice fidelity: X/5 - Topic accuracy: X/5 - Notes: [findings] ## Hallucination Check - Topics tested: N - Appropriate uncertainty: X/N - Flags: [any confabulations] ## Calibration (if done) - Average self-rating: X/5 - Key feedback: [notes] ## Recommendations - [suggested improvements to CLAUDE.md]