ORM Structure Guide
This skill provides context about the ORM structure in narrativegraphs/db/.
Core Concepts
The data model supports two graph paradigms:
| Graph Type | Primary Annotations | Has Relations/Predicates |
|---|---|---|
| NarrativeGraph | Triplets (subject-predicate-object) | Yes |
| CooccurrenceGraph | Tuplets (entity-entity pairs) | No |
Annotation Types (have doc_id directly)
TripletOrm (triplets.py)
- Represents a subject-predicate-object extraction from text
- Has: doc_id, subject_id, predicate_id, object_id, relation_id, cooccurrence_id
- Stores span positions and text for subject, predicate, and object
- Mixes in AnnotationMixin (provides doc_id, timestamp, document relationship)
TupletOrm (tuplets.py)
- Represents an entity-entity cooccurrence extraction
- Has: doc_id, entity_one_id, entity_two_id, cooccurrence_id
- Stores span positions and text for both entities
- Mixes in AnnotationMixin
EntityOccurrenceOrm (entityoccurrences.py)
- Represents a single entity mention/occurrence in text
- Has: doc_id, entity_id, span_start, span_end, span_text
- Relationships: entity (→ EntityOrm), document (→ DocumentOrm)
- Mixes in AnnotationMixin
- Used by EntityOrm to derive alt_labels (alternative surface forms)
Higher-Level ORMs (backed by annotations)
All these mix in AnnotationBackedTextStatsMixin which provides:
- Stats columns: frequency, doc_frequency, spread, adjusted_tf_idf, first_occurrence, last_occurrence
- _annotations property (abstract, returns backing triplets/tuplets)
- doc_ids property (derived from _annotations)
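As a rough illustration of how the derived values relate to the backing annotations, the sketch below uses plain Python classes rather than the real ORM. The Annotation stand-in, the TextStatsSketch class, and the definitions used here (frequency as the number of backing annotations, doc_frequency as the number of distinct documents) are assumptions made for illustration; the actual mixin stores these as columns and may compute them differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for a triplet/tuplet row; only doc_id matters here."""
    doc_id: int


class TextStatsSketch:
    """Illustrative analogue of AnnotationBackedTextStatsMixin (not the real class)."""

    def __init__(self, annotations: List[Annotation]):
        self._backing = list(annotations)

    @property
    def _annotations(self) -> List[Annotation]:
        # In the real ORMs this is abstract and returns triplets and/or tuplets.
        return self._backing

    @property
    def doc_ids(self) -> List[int]:
        # Distinct document ids, derived from the backing annotations.
        return sorted({a.doc_id for a in self._annotations})

    @property
    def frequency(self) -> int:
        # Assumed definition: total number of backing annotations.
        return len(self._annotations)

    @property
    def doc_frequency(self) -> int:
        # Assumed definition: number of distinct documents.
        return len(self.doc_ids)


stats = TextStatsSketch([Annotation(1), Annotation(1), Annotation(3)])
print(stats.doc_ids, stats.frequency, stats.doc_frequency)
```

The point of the design is that each higher-level ORM only has to say which annotations back it; everything document-related can then be derived from that single property.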
EntityOrm (entities.py)
- Canonical entity (e.g., "Microsoft", "Satya Nadella")
- Relationships:
  - occurrences → EntityOccurrenceOrm (all mentions of this entity)
  - subject_triplets / object_triplets → triplets property
  - _entity_one_tuplets / _entity_two_tuplets → tuplets property
  - subject_relations / object_relations → relations property
  - _entity_one_cooccurrences / _entity_two_cooccurrences → cooccurrences property
- _annotations returns triplets + tuplets (union for both graph types)
- Has alt_labels hybrid property (derived from occurrences span texts)
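The two derived members of EntityOrm can be pictured with plain dataclasses. This is a minimal sketch, not the real model: OccurrenceSketch and EntitySketch are hypothetical names, the annotation lists hold placeholder strings, and the rule that alt_labels excludes the canonical label is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OccurrenceSketch:
    """Hypothetical stand-in for EntityOccurrenceOrm."""
    doc_id: int
    span_text: str


@dataclass
class EntitySketch:
    """Illustrative analogue of EntityOrm using plain lists instead of relationships."""
    label: str
    occurrences: List[OccurrenceSketch] = field(default_factory=list)
    triplets: List[str] = field(default_factory=list)  # placeholder annotation objects
    tuplets: List[str] = field(default_factory=list)   # placeholder annotation objects

    @property
    def _annotations(self) -> List[str]:
        # Union of both annotation kinds, so stats work for either graph type.
        return self.triplets + self.tuplets

    @property
    def alt_labels(self) -> List[str]:
        # Assumed rule: distinct surface forms other than the canonical label.
        forms = {o.span_text for o in self.occurrences}
        return sorted(forms - {self.label})


entity = EntitySketch(
    label="Microsoft",
    occurrences=[
        OccurrenceSketch(1, "Microsoft"),
        OccurrenceSketch(2, "MSFT"),
        OccurrenceSketch(2, "Microsoft Corp."),
    ],
    triplets=["t1"],
    tuplets=["u1", "u2"],
)
print(entity.alt_labels)
print(entity._annotations)
```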
PredicateOrm (predicates.py)
- Canonical predicate/verb (e.g., "acquired", "announced")
- Relationships: triplets, relations
- _annotations returns triplets
- Has alt_labels hybrid property
RelationOrm (relations.py)
- Canonical relation tuple: (subject_entity, predicate, object_entity)
- Has: subject_id, predicate_id, object_id, significance
- Relationships: subject, predicate, object, triplets
- _annotations returns triplets
- Has alt_labels hybrid property
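One way to read "canonical relation" is that every triplet with the same (subject_id, predicate_id, object_id) key belongs to the same relation. The sketch below shows that grouping in plain Python; TripletSketch and group_into_relations are hypothetical names, and how the real pipeline assigns relation_id is not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class TripletSketch(NamedTuple):
    """Hypothetical stand-in for TripletOrm, keeping only the id columns."""
    doc_id: int
    subject_id: int
    predicate_id: int
    object_id: int


RelationKey = Tuple[int, int, int]


def group_into_relations(
    triplets: Iterable[TripletSketch],
) -> Dict[RelationKey, List[TripletSketch]]:
    """Group triplets by their (subject_id, predicate_id, object_id) key."""
    groups: Dict[RelationKey, List[TripletSketch]] = defaultdict(list)
    for t in triplets:
        groups[(t.subject_id, t.predicate_id, t.object_id)].append(t)
    return dict(groups)


relations = group_into_relations([
    TripletSketch(doc_id=1, subject_id=1, predicate_id=10, object_id=2),
    TripletSketch(doc_id=2, subject_id=1, predicate_id=10, object_id=2),
    TripletSketch(doc_id=2, subject_id=2, predicate_id=11, object_id=3),
])
print({key: len(members) for key, members in relations.items()})
```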
CooccurrenceOrm (cooccurrences.py)
- Canonical cooccurrence: (entity_one, entity_two) where entity_one_id <= entity_two_id
- Has: entity_one_id, entity_two_id, pmi
- Relationships: entity_one, entity_two, tuplets
- _annotations returns tuplets
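The ordering constraint entity_one_id <= entity_two_id means an unordered pair has exactly one stored form, so (A, B) and (B, A) map to the same cooccurrence. A minimal sketch of that normalization, with canonical_pair as a hypothetical helper name rather than a function from the codebase:

```python
from collections import Counter
from typing import Tuple


def canonical_pair(a: int, b: int) -> Tuple[int, int]:
    """Order two entity ids so the smaller id comes first."""
    return (a, b) if a <= b else (b, a)


# Pairs observed in either order collapse onto one canonical key.
observed = [(7, 3), (3, 7), (5, 5), (3, 9)]
counts = Counter(canonical_pair(a, b) for a, b in observed)
print(counts)
```

Normalizing before lookup or insert avoids duplicate rows for the same pair and keeps the two entity columns unambiguous.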
DocumentOrm (documents.py)
- Source document with text, str_id, timestamp
- Relationships: triplets, tuplets, entity_occurrences
- Has categories via CategorizableMixin
Mixins (common.py, documents.py)
- CategorizableMixin: Provides category support
- CategoryMixin: Base for category tables (e.g., EntityCategory)
- HasAltLabels: For ORMs with alternative surface forms
- AnnotationMixin: For triplets/tuplets (provides doc_id, document relationship)
- AnnotationBackedTextStatsMixin: For higher-level ORMs (stats + doc_ids)
Relationship Diagram
    DocumentOrm
        │
        ├── triplets ──────────► TripletOrm ◄── subject/object ── EntityOrm
        │                            │                              │
        │                            ├── predicate ── PredicateOrm  │
        │                            │                   │          │
        │                            └── relation ─── RelationOrm ◄─┘
        │                                                │
        ├── tuplets ────────────► TupletOrm ◄────────────┼── entity_one/two ── EntityOrm
        │                            │                   │
        │                            └── cooccurrence ── CooccurrenceOrm
        │
        └── entity_occurrences ─► EntityOccurrenceOrm ◄── entity ── EntityOrm