MIP Clustering Phase
Analyze all chapter annotations and cluster metaphors into 4 major Conceptual Metaphor Theory (CMT) systems.
Your Task
Read all 9 annotation files and perform systematic clustering to identify the 4 most significant conceptual metaphor systems in the novel.
Process
1. Load All Annotations
Read each file in mip-analysis/annotations/:
- chapter-01.json through chapter-09.json
- Extract all metaphor annotations into a unified dataset
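The loading step could be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name `load_annotations` is hypothetical, and the file layout (either a bare list of annotations or an object with a `"metaphors"` key) is an assumption, since the annotation schema is not specified here.

```python
import json
from pathlib import Path


def load_annotations(annotation_dir="mip-analysis/annotations", chapters=range(1, 10)):
    """Merge the per-chapter annotation files into one list of dicts."""
    dataset = []
    for n in chapters:
        path = Path(annotation_dir) / f"chapter-{n:02d}.json"
        # A missing file raises FileNotFoundError, so a skipped chapter is noticed.
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Assumption: a file is either a bare list of annotations or an
        # object that holds them under a "metaphors" key.
        items = data if isinstance(data, list) else data.get("metaphors", [])
        for item in items:
            # Tag every annotation with its chapter number for later counting.
            dataset.append({**item, "chapter": n})
    return dataset
```

Letting a missing file raise an error, rather than skipping it silently, keeps the "all 9 files processed" requirement checkable.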
2. Cluster by Domain Mapping
Group metaphors by their source→target domain mappings:
- Identify recurring patterns
- Note variations within patterns
- Track frequency across chapters
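A first mechanical pass at this grouping might look like the sketch below. The field names `source_domain`, `target_domain`, `lexical_unit`, and `chapter` are assumptions about the annotation records, and the exact-label matching is deliberately naive: near-synonymous domain labels still need to be merged by judgment.

```python
from collections import defaultdict


def cluster_by_mapping(dataset, n_chapters=9):
    """Group annotations by a TARGET IS SOURCE label and count them per chapter."""
    clusters = defaultdict(
        lambda: {"ids": [], "lexical": set(), "distribution": [0] * n_chapters}
    )
    for ann in dataset:
        # Assumed fields: "target_domain", "source_domain", "lexical_unit", "chapter".
        name = f'{ann["target_domain"]} IS {ann["source_domain"]}'.upper()
        cluster = clusters[name]
        cluster["ids"].append(ann["id"])
        cluster["lexical"].add(ann["lexical_unit"].lower())
        cluster["distribution"][ann["chapter"] - 1] += 1
    return {
        name: {
            "frequency": len(c["ids"]),
            "distribution": c["distribution"],
            "instance_ids": c["ids"],
            "lexical_manifestations": sorted(c["lexical"]),
        }
        for name, c in clusters.items()
    }
```

The per-chapter counts produced here have the same shape as the `distribution` array in the output file, one entry per chapter.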
3. Rank CMT Systems
Evaluate each potential system by:
- Frequency: How many instances across all chapters?
- Distribution: How evenly spread across the novel?
- Thematic significance: How central to the novel's themes?
- Variety: How many different lexical manifestations?
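Three of these criteria can be quantified; thematic significance remains a judgment call. The sketch below is one possible approach, not a required method: it measures evenness as normalized Shannon entropy (a choice made here for illustration) and sorts candidates by frequency, then evenness, then variety. The names `evenness` and `rank_systems` are hypothetical.

```python
import math


def evenness(distribution):
    """Normalized Shannon entropy: 1.0 is perfectly even, 0.0 is a single chapter."""
    total = sum(distribution)
    if total == 0 or len(distribution) < 2:
        return 0.0
    probs = [count / total for count in distribution if count > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return max(0.0, entropy / math.log(len(distribution)))


def rank_systems(candidates, top_n=4):
    """Rank candidates given as {name: {"distribution": [...], "lexical_manifestations": [...]}}."""
    rows = []
    for name, c in candidates.items():
        rows.append({
            "name": name,
            "frequency": sum(c["distribution"]),
            "evenness": round(evenness(c["distribution"]), 3),
            "variety": len(set(c["lexical_manifestations"])),
        })
    # Assumed priority order: frequency first, then evenness, then variety.
    rows.sort(key=lambda r: (r["frequency"], r["evenness"], r["variety"]), reverse=True)
    return rows[:top_n]
```

The resulting order is a starting point only; a system that ranks lower numerically may still deserve selection if it is central to the novel's themes.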
4. Select Top 4 Systems
Choose the 4 most significant systems. Expected candidates:
- DESIRE IS LIGHT - green light, Daisy's luminosity, brightness imagery
- MORAL DECAY IS PHYSICAL WASTE - ashes, dust, grey, rot
- THE PAST IS A PLACE - spatial metaphors for time, "back" imagery
- WEALTH IS DISPLAY/PERFORMANCE - shirts, parties, possessions as presentation
Output Files
mip-analysis/cmt-systems.json
```json
{
  "total_metaphors_analyzed": 67,
  "systems": [
    {
      "id": "system-1",
      "name": "DESIRE IS LIGHT",
      "mapping": "DESIRE IS LIGHT",
      "frequency": 18,
      "distribution": [2, 1, 3, 2, 4, 1, 2, 1, 2],
      "key_instances": ["ch1-003", "ch5-007", "ch9-012"],
      "lexical_manifestations": ["green light", "gleaming", "luminous", "bright"],
      "thematic_function": "Gatsby's romantic aspiration and its ultimate unattainability"
    }
  ],
  "unclustered": [
    {
      "id": "ch3-005",
      "reason": "Unique instance, no recurring pattern"
    }
  ]
}
```
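Before writing this file, its internal consistency can be checked with something like the sketch below. The function name `validate_systems` is hypothetical; the checks mirror what the example implies (one `distribution` entry per chapter, `frequency` equal to the sum of `distribution`) plus the requirements of four systems and at least three key instances each.

```python
def validate_systems(doc, n_chapters=9, expected_systems=4):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    systems = doc.get("systems", [])
    if len(systems) != expected_systems:
        problems.append(f"expected {expected_systems} systems, found {len(systems)}")
    for system in systems:
        label = system.get("id", "<no id>")
        dist = system.get("distribution", [])
        if len(dist) != n_chapters:
            problems.append(f"{label}: distribution has {len(dist)} entries, not {n_chapters}")
        if sum(dist) != system.get("frequency"):
            problems.append(f"{label}: frequency does not equal the sum of distribution")
        if len(system.get("key_instances", [])) < 3:
            problems.append(f"{label}: fewer than 3 key instances")
    return problems
```

In the example above, the distribution `[2, 1, 3, 2, 4, 1, 2, 1, 2]` sums to 18, which matches the stated `frequency`, so that entry passes the per-system checks.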
mip-analysis/cmt-summary.md
```markdown
# CMT Systems Analysis Summary

## Quantitative Overview
- Total metaphors analyzed: X
- Metaphors clustered into systems: Y (Z%)
- Unclustered/unique metaphors: N

## System Rankings

### 1. [SYSTEM NAME] (N instances, X% of total)
**Mapping**: TARGET IS SOURCE
**Distribution**: Chapters where it appears
**Key examples**: 3-4 representative quotes
**Thematic significance**: Why this matters to the novel

[Repeat for all 4 systems]

## Distribution Analysis
[Chapter-by-chapter breakdown]

## Clustering Notes
[Any decisions made, edge cases, overlapping metaphors]
```
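The Quantitative Overview figures (X, Y, Z, N) can be derived from the contents of cmt-systems.json rather than counted by hand. The sketch below assumes that no metaphor is counted in more than one system; if overlapping metaphors were double-counted, Y would overstate the clustered total. The function name `overview` is hypothetical.

```python
def overview(doc):
    """Compute the summary's headline numbers from a parsed cmt-systems.json dict."""
    total = doc["total_metaphors_analyzed"]
    # Assumption: each metaphor belongs to at most one system.
    clustered = sum(system["frequency"] for system in doc["systems"])
    percent = round(100 * clustered / total, 1) if total else 0.0
    return {
        "total": total,                              # X
        "clustered": clustered,                      # Y
        "clustered_percent": percent,                # Z
        "unclustered": len(doc["unclustered"]),      # N
    }
```

Each system's share of the total for its "N instances, X% of total" heading follows the same arithmetic, using that system's `frequency` in place of the clustered sum.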
Success Criteria
- All 9 annotation files read and processed
- 4 distinct CMT systems identified with clear mappings
- Quantitative data accurate (frequencies, distributions)
- Each system has 3+ key examples identified
- Summary provides clear rationale for system selection
- Ready for the /mip-outline phase