compare-image-evaluations

Analyze image quality evaluations in a story, reading directly from the database to ensure accurate per-page results

SKILL.md
--- frontmatter
name: compare-image-evaluations
description: Analyze image quality evaluations from a story - reads directly from database for accurate per-page results

Compare Image Evaluations

Overview

Analyzes the results of the three evaluation methods used during story generation to diagnose image quality issues. Reads directly from the database (not logs) for accurate per-page results.

When to Use

  • After story generation completes with quality issues
  • When debugging why characters look wrong or inconsistent
  • When the user asks to "analyze evaluations" or "check image quality"
  • When investigating why certain pages have low scores

Quick Start

bash
# Analyze latest story
node scripts/compare-image-evaluations-db.js

# Analyze specific story
node scripts/compare-image-evaluations-db.js job_1769380512008_aog73mrs6

The Three Evaluation Methods

| Method | What It Checks | Stored In |
|--------|----------------|-----------|
| Quality Eval | Single image against prompt and references | sceneImages[].qualityReasoning |
| Incremental Consistency | Current page vs previous pages | sceneImages[].retryHistory |
| Final Consistency | All pages in batches | Triggers regeneration, logged only |

Data Sources

Quality Evaluation (Primary)

Stored in stories.data.sceneImages[].qualityReasoning:

javascript
{
  "figures": [...],           // Detected figures with position, hair, clothing
  "matches": [{               // Character identification
    "figure": 1,
    "reference": "Lukas",
    "confidence": 0.95,
    "face_bbox": [0.32, 0.48, 0.45, 0.55],
    "hair_ok": true,
    "clothing_ok": true,
    "issues": ["expression mismatch"]
  }],
  "score": 8,                 // 0-10 (multiply by 10 for percentage)
  "verdict": "PASS",          // PASS, SOFT_FAIL, HARD_FAIL
  "issues_summary": "...",    // Human-readable issues
  "fixable_issues": [...]     // Issues that can be fixed by inpainting
}
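
To make the field layout above concrete, here is a minimal sketch of condensing one `qualityReasoning` object into a table-ready summary. `summarizeQuality` and the `sample` object are hypothetical illustrations, not part of the actual script, and the sketch assumes the field may arrive either as a parsed object or as a JSON string.

```javascript
// Hypothetical helper: condense one qualityReasoning object into a short summary.
function summarizeQuality(qualityReasoning) {
  // Assumption: the field may be a JSON string or an already-parsed object.
  const q = typeof qualityReasoning === 'string'
    ? JSON.parse(qualityReasoning)
    : qualityReasoning || {};
  const matches = Array.isArray(q.matches) ? q.matches : [];
  return {
    // score is 0-10, so multiply by 10 for a percentage
    percent: typeof q.score === 'number' ? Math.round(q.score * 10) : null,
    verdict: q.verdict || 'UNKNOWN',
    // unique character names, in order of first appearance
    characters: [...new Set(matches.map((m) => m.reference))],
    issues: q.issues_summary || '',
  };
}

// Illustrative data shaped like the structure documented above.
const sample = {
  matches: [{ figure: 1, reference: 'Lukas', confidence: 0.95 }],
  score: 8,
  verdict: 'PASS',
  issues_summary: 'expression mismatch',
};

console.log(summarizeQuality(sample));
```

Returning `null` for a missing score (rather than 0) keeps unevaluated pages distinguishable from pages that genuinely scored zero.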

Incremental Consistency

Stored in sceneImages[].retryHistory:

javascript
{
  "type": "consistency",
  "consistencyScore": 7,      // 0-10
  "consistencyIssues": [
    "[MAJOR] clothing: Manuel's outfit changed..."
  ]
}
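
The issue strings above follow a `[SEVERITY] category: detail` pattern, which can be split into fields for filtering. The sketch below assumes `retryHistory` is an array that mixes consistency entries with other entry types; `parseConsistencyIssue`, `consistencyChecks`, and the `"other"` entry type are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: split "[MAJOR] clothing: ..." into its parts.
function parseConsistencyIssue(text) {
  const m = /^\[([A-Z_]+)\]\s*([^:]+):\s*(.*)$/.exec(text);
  return m
    ? { severity: m[1], category: m[2].trim(), detail: m[3] }
    : { severity: null, category: null, detail: text }; // unrecognized format: keep the raw text
}

// Assumption: retryHistory is an array; only entries with type "consistency" are relevant here.
function consistencyChecks(retryHistory) {
  return (retryHistory || [])
    .filter((entry) => entry && entry.type === 'consistency')
    .map((entry) => ({
      score: entry.consistencyScore,
      issues: (entry.consistencyIssues || []).map(parseConsistencyIssue),
    }));
}

// Illustrative data shaped like the entry documented above.
const retryHistory = [
  { type: 'other' }, // hypothetical non-consistency entry, skipped by the filter
  {
    type: 'consistency',
    consistencyScore: 7,
    consistencyIssues: ["[MAJOR] clothing: Manuel's outfit changed..."],
  },
];

console.log(JSON.stringify(consistencyChecks(retryHistory), null, 2));
```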

Final Consistency

Not stored in the database; this check only triggers regeneration. Check the logs if needed:

bash
grep "CONSISTENCY REGEN" logfile.log

Output Format

The script produces a table plus detailed issues:

code
| Page | Score | Verdict | Characters | Issues |
|------|-------|---------|------------|--------|
|  7 |  50% ⚠️ | PASS    | Lukas, Man | Compositional error... |
|  8 |  55% ⚠️ | PASS    | Lukas, Man | Manuel in Roger's position... |

## Detailed Issues (pages with score < 80%)

### Page 8 (55%)
Characters: Lukas, Manuel, Roger
Issues: Incorrect character in central position (Manuel instead of Roger)
Fixable issues:
  - [CRITICAL] character: The character in the center should be Roger, not Manuel
Figure matches:
  - Figure 1: Manuel (90%)
  - Figure 2: Manuel (90%) ← DUPLICATE, should be Roger
  - Figure 3: Lukas (95%)
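
The "DUPLICATE" flag in the sample output marks one reference character matched to more than one figure. A check along these lines could detect that case from the `matches` array. This is a sketch under that assumption, not the script's actual logic, and `findDuplicateMatches` is a hypothetical name.

```javascript
// Hypothetical helper: find reference characters matched to more than one figure.
function findDuplicateMatches(matches) {
  const byReference = new Map();
  for (const m of matches || []) {
    const figures = byReference.get(m.reference) || [];
    figures.push(m.figure);
    byReference.set(m.reference, figures);
  }
  // Keep only references that appear on two or more figures.
  return [...byReference]
    .filter(([, figures]) => figures.length > 1)
    .map(([reference, figures]) => ({ reference, figures }));
}

// Matches mirroring Page 8 in the sample output above.
const page8Matches = [
  { figure: 1, reference: 'Manuel', confidence: 0.9 },
  { figure: 2, reference: 'Manuel', confidence: 0.9 },
  { figure: 3, reference: 'Lukas', confidence: 0.95 },
];

console.log(findDuplicateMatches(page8Matches));
```

A duplicate alone does not say which character is missing; comparing the matched names against the page's expected cast (Lukas, Manuel, Roger in this example) would be needed for that.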

Why Database > Logs

| Approach | Pros | Cons |
|----------|------|------|
| Database | Accurate per-page mapping, structured data, no parsing errors | Only has final state |
| Logs | Has timing, retries, all attempts | Interleaved/parallel execution causes wrong page associations |

Always prefer the database for evaluation analysis. Use logs only for timing and cost analysis.

Common Issues Found

| Issue Type | Example | Severity |
|------------|---------|----------|
| Character swap | "Manuel in Roger's position" | CRITICAL |
| Missing character | "Roger missing from scene" | CRITICAL |
| Clothing mismatch | "Matthias wearing Lukas's outfit" | CRITICAL |
| Wrong colors | "Sneakers green instead of blue" | MODERATE |
| Pose mismatch | "Hand in pocket instead of pointing" | MODERATE |
| Object issues | "Medallion not glowing" | MINOR |
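
When triaging a page with several issues, it can help to list the most severe first. The sketch below orders tagged issue strings by the severities in this table. The numeric ranks, the helper names, and the choice to sort unrecognized tags last are assumptions made for illustration.

```javascript
// Assumed ranking, taken from the severity column above (lower sorts first).
const SEVERITY_RANK = { CRITICAL: 0, MODERATE: 1, MINOR: 2 };

// Extract the leading "[TAG]" from an issue string, or null if there is none.
function severityOf(issue) {
  const m = /^\[([A-Z]+)\]/.exec(issue);
  return m ? m[1] : null;
}

// Return a new array sorted by severity; untagged or unknown tags go last.
function sortBySeverity(issues) {
  const rank = (issue) => SEVERITY_RANK[severityOf(issue)] ?? 3;
  return [...issues].sort((a, b) => rank(a) - rank(b));
}

// Illustrative issue strings built from the examples in the table.
const issues = [
  '[MINOR] object: Medallion not glowing',
  '[CRITICAL] character: Roger missing from scene',
  '[MODERATE] color: Sneakers green instead of blue',
];

for (const issue of sortBySeverity(issues)) {
  console.log(issue);
}
```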

Related Scripts

bash
# Full story log analysis (timing, costs)
node scripts/analyze-story-log.js

# Extract faces from evaluations
node scripts/extract-faces.js <storyId>

# Check evaluation data structure
node scripts/show-eval-fields.js <storyId>