Compare Image Evaluations
Overview
Analyzes the three evaluation methods used during story generation to understand image quality issues. Reads directly from the database (not logs) for accurate per-page results.
When to Use
- •After story generation completes with quality issues
- •When debugging why characters look wrong or inconsistent
- •When the user asks to "analyze evaluations" or "check image quality"
- •When investigating why certain pages have low scores
Quick Start
bash
# Analyze latest story node scripts/compare-image-evaluations-db.js # Analyze specific story node scripts/compare-image-evaluations-db.js job_1769380512008_aog73mrs6
The Three Evaluation Methods
| Method | What It Checks | Stored In |
|---|---|---|
| Quality Eval | Single image against prompt and references | sceneImages[].qualityReasoning |
| Incremental Consistency | Current page vs previous pages | sceneImages[].retryHistory |
| Final Consistency | All pages in batches | Triggers regeneration, logged only |
Data Sources
Quality Evaluation (Primary)
Stored in stories.data.sceneImages[].qualityReasoning:
javascript
{
"figures": [...], // Detected figures with position, hair, clothing
"matches": [{ // Character identification
"figure": 1,
"reference": "Lukas",
"confidence": 0.95,
"face_bbox": [0.32, 0.48, 0.45, 0.55],
"hair_ok": true,
"clothing_ok": true,
"issues": ["expression mismatch"]
}],
"score": 8, // 0-10 (multiply by 10 for percentage)
"verdict": "PASS", // PASS, SOFT_FAIL, HARD_FAIL
"issues_summary": "...", // Human-readable issues
"fixable_issues": [...] // Issues that can be fixed by inpainting
}
Incremental Consistency
Stored in sceneImages[].retryHistory:
javascript
{
"type": "consistency",
"consistencyScore": 7, // 0-10
"consistencyIssues": [
"[MAJOR] clothing: Manuel's outfit changed..."
]
}
Final Consistency
Not stored in DB - only triggers regeneration. Check logs if needed:
bash
grep "CONSISTENCY REGEN" logfile.log
Output Format
The script produces a table plus detailed issues:
code
| Page | Score | Verdict | Characters | Issues | |------|-------|---------|------------|--------| | 7 | 50% ⚠️ | PASS | Lukas, Man | Compositional error... | | 8 | 55% ⚠️ | PASS | Lukas, Man | Manuel in Roger's position... | ## Detailed Issues (pages with score < 80%) ### Page 8 (55%) Characters: Lukas, Manuel, Roger Issues: Incorrect character in central position (Manuel instead of Roger) Fixable issues: - [CRITICAL] character: The character in the center should be Roger, not Manuel Figure matches: - Figure 1: Manuel (90%) - Figure 2: Manuel (90%) ← DUPLICATE, should be Roger - Figure 3: Lukas (95%)
Why Database > Logs
| Approach | Pros | Cons |
|---|---|---|
| Database | Accurate per-page mapping, structured data, no parsing errors | Only has final state |
| Logs | Has timing, retries, all attempts | Interleaved/parallel execution causes wrong page associations |
Always prefer database for evaluation analysis. Use logs only for timing/cost analysis.
Common Issues Found
| Issue Type | Example | Severity |
|---|---|---|
| Character swap | "Manuel in Roger's position" | CRITICAL |
| Missing character | "Roger missing from scene" | CRITICAL |
| Clothing mismatch | "Matthias wearing Lukas's outfit" | CRITICAL |
| Wrong colors | "Sneakers green instead of blue" | MODERATE |
| Pose mismatch | "Hand in pocket instead of pointing" | MODERATE |
| Object issues | "Medallion not glowing" | MINOR |
Related Scripts
bash
# Full story log analysis (timing, costs) node scripts/analyze-story-log.js # Extract faces from evaluations node scripts/extract-faces.js <storyId> # Check evaluation data structure node scripts/show-eval-fields.js <storyId>