Class Diagram to Neo4j Extraction Skill
Overview
This skill extracts structured data from UML class diagrams (images) and populates Neo4j graph databases. It's designed for:
- TMF (TM Forum) API specification diagrams
- UML class diagrams
- Entity-relationship diagrams
- Schema diagrams
Workflow
1. Image Analysis
- Use vision models (GPT-4 Vision, Claude Vision, etc.) to analyze diagram images
- Extract text, boxes, lines, and relationships
- Identify entities, properties, and relationships
2. Structured Extraction
- Parse entities (classes) with their properties
- Extract relationships (associations, inheritance, etc.)
- Capture cardinality and relationship metadata
- Handle color coding and visual indicators
3. Data Normalization
- Convert to structured format (YAML/JSON)
- Normalize entity names and types
- Standardize relationship types
- Handle references and aliases
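The name and relationship-type normalization above could look roughly like the sketch below. The function names and the chosen conventions (PascalCase entity names, snake_case relationship types such as has_specification) are assumptions for illustration, not the skill's actual rules.

```python
import re


def normalize_entity_name(raw: str) -> str:
    """Return a PascalCase entity name, e.g. 'product offering' -> 'ProductOffering'."""
    words = re.findall(r"[A-Za-z0-9]+", raw)
    # Upper-case only the first letter so inner capitals ("productOffering") survive.
    return "".join(word[:1].upper() + word[1:] for word in words)


def normalize_relationship_type(raw: str) -> str:
    """Return a snake_case relationship type, e.g. 'hasSpecification' -> 'has_specification'."""
    # Split camelCase by inserting a space before each inner capital letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", raw)
    words = re.findall(r"[A-Za-z0-9]+", spaced)
    return "_".join(word.lower() for word in words)
```

Applying the same two functions to every extracted diagram keeps names comparable across diagrams, which matters once entities are merged by name.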
4. Neo4j Population
- Generate Cypher queries
- Create nodes with properties
- Create relationships with metadata
- Handle constraints and indexes
Usage Patterns
Pattern 1: Direct Image → Neo4j
from classdiagram_to_neo4j import extract_and_populate

# Extract from image and populate Neo4j
extract_and_populate(
    image_path="diagrams/product_offering.png",
    neo4j_uri="bolt://localhost:7687",
    neo4j_user="neo4j",
    neo4j_password="password"
)
Pattern 2: Extract → Review → Populate
from classdiagram_to_neo4j import extract_diagram, populate_neo4j

# Step 1: Extract to JSON/YAML
data = extract_diagram(
    image_path="diagrams/product_offering.png",
    output_format="json",
    output_path="extracted.json"
)

# Step 2: Review/edit JSON if needed
# ... manual review ...

# Step 3: Populate Neo4j
populate_neo4j(
    data=data,
    neo4j_uri="bolt://localhost:7687",
    neo4j_user="neo4j",
    neo4j_password="password"
)
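For the review step in Pattern 2, a short summary of the extracted data can be easier to scan than the raw JSON. The helper below is a hypothetical sketch (review_summary is not part of the package) that assumes the data follows the structure shown under "Step 2: Structured Output".

```python
def review_summary(data: dict) -> list[str]:
    """Return one line per entity and relationship, flagging unknown endpoints."""
    lines = []
    entities = data.get("entities", {})
    for name, entity in entities.items():
        properties = entity.get("properties", [])
        lines.append(f"entity {name}: {len(properties)} properties")
    for rel in data.get("relationships", []):
        # An endpoint that was never extracted as an entity deserves a manual look.
        missing = [end for end in (rel["from"], rel["to"]) if end not in entities]
        flag = f" (not extracted as entities: {', '.join(missing)})" if missing else ""
        cardinality = rel.get("cardinality", "?")
        lines.append(f"{rel['from']} -[{rel['type']} {cardinality}]-> {rel['to']}{flag}")
    return lines
```

Printing these lines before calling populate_neo4j gives a quick check that entity names and cardinalities match the diagram.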
Pattern 3: Batch Processing
from classdiagram_to_neo4j import extract_diagram, populate_neo4j

# Process multiple diagrams
diagrams = [
    "diagrams/product_offering.png",
    "diagrams/category.png",
    "diagrams/pricing.png"
]

for diagram_path in diagrams:
    data = extract_diagram(diagram_path, output_format="json")
    populate_neo4j(
        data=data,
        neo4j_uri="bolt://localhost:7687",
        neo4j_user="neo4j",
        neo4j_password="password"
    )
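In a longer batch, one unreadable diagram should probably not abort the rest. The sketch below shows one way to collect failures per diagram. process_batch is a hypothetical helper, and it takes the extract and populate steps as plain callables so that it makes no assumptions about the package's signatures.

```python
from typing import Callable, Iterable


def process_batch(
    diagram_paths: Iterable[str],
    extract: Callable[[str], dict],
    populate: Callable[[dict], None],
) -> dict[str, str]:
    """Run extract then populate for each diagram; return {path: error} for failures."""
    failures: dict[str, str] = {}
    for path in diagram_paths:
        try:
            populate(extract(path))
        except Exception as exc:
            # Record the problem and continue with the next diagram.
            failures[path] = f"{type(exc).__name__}: {exc}"
    return failures
```

In practice, extract and populate would be thin wrappers (for example built with functools.partial) around extract_diagram and populate_neo4j with the output format and connection settings already bound.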
Diagram Types Supported
TMF-Style Diagrams
- ProductOffering hub diagrams
- Category relationships
- Specification diagrams
- Reference entity diagrams
UML Class Diagrams
- Classes with attributes
- Associations with multiplicities
- Inheritance hierarchies
- Aggregations and compositions
Schema Diagrams
- Database schemas
- API schemas
- Domain models
Extraction Process
Step 1: Vision Analysis
The vision model analyzes the image and extracts:
- Entities: Boxes/classes with names
- Properties: Attributes within entities
- Relationships: Lines/arrows between entities
- Metadata: Cardinality, roles, types
- Visual Indicators: Colors, borders, dashed lines
Step 2: Structured Output
Extracted data is normalized into:
meta:
  source: "diagrams/product_offering.png"
  extracted_at: "2024-01-01T00:00:00Z"
  diagram_type: "uml_class"
entities:
  ProductOffering:
    label: "ProductOffering"
    properties:
      - name: "id"
        type: "string"
        required: true
      - name: "name"
        type: "string"
        required: true
      - name: "isBundle"
        type: "boolean"
        required: false
relationships:
  - from: "ProductOffering"
    to: "ProductSpecification"
    type: "has_specification"
    cardinality: "0..1"
    direction: "out"
    properties:
      role: null
Step 3: Neo4j Population
Generates Cypher queries:
// Create schema block
MERGE (sb:SchemaBlock {id: 'tmf620_productoffering'})
SET sb.title = 'ProductOffering Diagram',
    sb.artifact = 'diagrams/productoffering.png';

// Create entities with FQN
MERGE (e:Entity {fqn: 'tmf620_productoffering#ProductOffering'})
SET e.name = 'ProductOffering',
    e.specId = 'tmf620_productoffering',
    e.kind = 'Entity';

// Create fields
MERGE (f:Field {fqn: 'tmf620_productoffering#ProductOffering.name'})
SET f.name = 'name',
    f.type = 'string',
    f.required = true;

// Link field to entity
MATCH (e:Entity {fqn: 'tmf620_productoffering#ProductOffering'})
MATCH (f:Field {fqn: 'tmf620_productoffering#ProductOffering.name'})
MERGE (e)-[:HAS_FIELD]->(f);

// Create relationships
MATCH (from:Entity {fqn: 'tmf620_productoffering#ProductOffering'})
MATCH (to:Entity {fqn: 'tmf620_productoffering#ProductSpecification'})
MERGE (from)-[r:RELATES_TO {
  type: 'has_specification',
  fromCardinality: '0..1',
  toCardinality: '1',
  direction: 'out'
}]->(to);
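When these statements are generated from extracted data, query parameters are usually safer than interpolating names into the Cypher text. The sketch below builds (query, parameters) pairs for entities and fields using the same labels and FQN pattern as the queries above. entity_statements is a hypothetical helper, not the skill's generator.

```python
def entity_statements(spec_id: str, data: dict) -> list[tuple[str, dict]]:
    """Build parameterized (query, params) pairs for entities and their fields."""
    statements = []
    for name, entity in data.get("entities", {}).items():
        entity_fqn = f"{spec_id}#{name}"
        statements.append((
            "MERGE (e:Entity {fqn: $fqn}) "
            "SET e.name = $name, e.specId = $specId, e.kind = 'Entity'",
            {"fqn": entity_fqn, "name": name, "specId": spec_id},
        ))
        for prop in entity.get("properties", []):
            statements.append((
                "MATCH (e:Entity {fqn: $entityFqn}) "
                "MERGE (f:Field {fqn: $fieldFqn}) "
                "SET f.name = $name, f.type = $type, f.required = $required "
                "MERGE (e)-[:HAS_FIELD]->(f)",
                {
                    "entityFqn": entity_fqn,
                    "fieldFqn": f"{entity_fqn}.{prop['name']}",
                    "name": prop["name"],
                    "type": prop.get("type"),
                    # Assumption: a property without a 'required' key is optional.
                    "required": prop.get("required", False),
                },
            ))
    return statements
```

Each pair can then be passed to the Neo4j Python driver as session.run(query, params).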
Key Features
1. Scalable Data Model
- Uses stable labels (:Entity, :RefType, :SchemaBlock) instead of per-class labels
- Uses FQN (Fully Qualified Name) for entity identity: <specId>#<entityName>
- Uses a generic RELATES_TO relationship type with a type property
- Avoids label explosion and supports namespacing
- See references/SCALABLE_RELATIONSHIP_MODEL.md
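The FQN convention can be captured in a small value type. The sketch below builds and parses names of the form <specId>#<entityName>, with an optional .<fieldName> suffix as used for Field nodes in the Cypher example. The Fqn class and parse_fqn are illustrative names, and the parser assumes that spec IDs contain no "#" and entity names contain no ".".

```python
from typing import NamedTuple, Optional


class Fqn(NamedTuple):
    spec_id: str
    entity: str
    field: Optional[str] = None

    def __str__(self) -> str:
        base = f"{self.spec_id}#{self.entity}"
        return f"{base}.{self.field}" if self.field else base


def parse_fqn(text: str) -> Fqn:
    """Split '<specId>#<entityName>[.<fieldName>]' into its parts."""
    spec_id, separator, rest = text.partition("#")
    if not separator or not spec_id or not rest:
        raise ValueError(f"not an FQN: {text!r}")
    entity, _, field = rest.partition(".")
    return Fqn(spec_id, entity, field or None)
```

Because the spec ID is part of the identity, the same entity name extracted from two specifications yields two distinct nodes.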
2. Provenance Tracking
- Tracks source diagram via SchemaBlock nodes
- Uses FQN for entity identity (supports multiple versions)
- Maintains extraction metadata (specId, extracted_at)
- Links entities to schema blocks via CONTAINS_ENTITY
3. Conflict Resolution
- Handles duplicate entities
- Merges properties intelligently
- Resolves relationship conflicts
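The exact merge rules are not spelled out here. As one possible strategy, the sketch below merges two property lists by property name, keeping values that are already known and filling only the gaps. merge_properties is a hypothetical helper and the "existing value wins" rule is an assumption.

```python
def merge_properties(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge property dicts by 'name'; existing values win, gaps are filled."""
    merged = {prop["name"]: dict(prop) for prop in existing}
    for prop in incoming:
        current = merged.setdefault(prop["name"], {})
        for key, value in prop.items():
            # Only fill keys that are missing or None on the existing property.
            if current.get(key) is None:
                current[key] = value
    return list(merged.values())
```

A stricter variant could report a conflict instead of silently keeping the existing value when both sides disagree.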
4. Validation
- Validates extracted data structure before population
- Checks for missing required fields
- Verifies relationship consistency
- Validates cardinality formats
- Can be disabled with the --no-validate flag
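A minimal sketch of such checks follows, assuming the structure shown under "Step 2: Structured Output" and UML-style cardinalities such as 1, *, 0..1, or 1..*. The validate function and its messages are illustrative and do not describe the skill's actual validator.

```python
import re

# "*" alone, a single number, or "<lower>..<upper>" where upper may be "*".
CARDINALITY = re.compile(r"\*|\d+(\.\.(\*|\d+))?")


def validate(data: dict) -> list[str]:
    """Return human-readable problems; an empty list means the data passed."""
    problems = []
    entities = data.get("entities", {})
    for entity_name, entity in entities.items():
        for prop in entity.get("properties", []):
            if not prop.get("name"):
                problems.append(f"{entity_name}: property without a name")
    for index, rel in enumerate(data.get("relationships", [])):
        for key in ("from", "to", "type"):
            if not rel.get(key):
                problems.append(f"relationship {index}: missing '{key}'")
        for end in ("from", "to"):
            target = rel.get(end)
            if target and target not in entities:
                problems.append(f"relationship {index}: unknown entity '{target}'")
        cardinality = rel.get("cardinality")
        if cardinality is not None and not CARDINALITY.fullmatch(str(cardinality)):
            problems.append(f"relationship {index}: bad cardinality '{cardinality}'")
    return problems
```

Returning a list rather than raising on the first problem lets a reviewer see every issue in one pass.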
5. Property Persistence
- Properties are stored as :Field nodes
- Fields are linked to entities via HAS_FIELD relationships
- Property metadata (type, required, default) is fully persisted
Configuration
Vision Model Settings
vision:
  provider: "openai"  # or "anthropic"
  model: "gpt-4o"  # or "claude-3-5-sonnet-20241022"
  max_tokens: 8000
  temperature: 0.1
  use_structured_output: true  # Uses JSON mode when available
Neo4j Settings
neo4j:
  uri: "bolt://localhost:7687"
  user: "neo4j"
  password: "password"
  database: "neo4j"
  create_constraints: true
  create_indexes: true
Extraction Settings
extraction:
  include_properties: true
  include_methods: false
  normalize_names: true
  handle_references: true
  extract_cardinality: true
Output Formats
YAML Format
See schema_examples/tmf620/productoffering_hub.core.example.yaml for example.
JSON Format
{
  "meta": {
    "source": "diagrams/product_offering.png",
    "extracted_at": "2024-01-01T00:00:00Z"
  },
  "entities": {
    "ProductOffering": {
      "label": "ProductOffering",
      "properties": [...]
    }
  },
  "relationships": [...]
}
Cypher Format
See schema_examples/neo4j/tmf620_productoffering_scalable_model.cypher for example.
Integration with Existing Tools
With TMF MCP Builder
import sys
from pathlib import Path

# Make the bundled scripts importable
sys.path.insert(0, str(Path(__file__).parent / "scripts"))

from extract_and_populate import extract_and_populate
from neo4j import GraphDatabase

# Extract and populate
extract_and_populate(
    image_path="diagrams/tmf620_productoffering.png",
    neo4j_password="password"
)

# Query for relevant subgraph
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # A variable-length pattern yields a list of relationships, so filter every hop
    result = session.run("""
        MATCH path = (e:Entity {name: 'ProductOffering'})-[:RELATES_TO*1..2]->(related)
        WHERE all(rel IN relationships(path) WHERE rel.type IN ['has_specification', 'has_price'])
        RETURN e, relationships(path) AS rels, related
    """)
    # Process results...
driver.close()
Best Practices
- Pre-process Images
  - Ensure high resolution
  - Remove noise and artifacts
  - Standardize format (PNG preferred)
- Validate Extraction
  - Review extracted YAML/JSON
  - Verify entity names
  - Check relationship cardinalities
- Incremental Updates
  - Use merge strategies
  - Track changes
  - Maintain provenance
- Query Optimization
  - Create indexes on common properties
  - Use relationship type filters
  - Limit hop depth
- Error Handling
  - Handle missing entities
  - Validate relationships
  - Log extraction issues
Examples
See examples/ directory for:
- Simple UML class diagram extraction
- TMF ProductOffering diagram extraction
- Batch processing example
- Custom extraction rules
References
- references/SCALABLE_RELATIONSHIP_MODEL.md - Relationship modeling approach
- references/VISION_EXTRACTION_PROMPTS.md - Vision model prompts
- NEO4J_REQUIREMENTS.md - Neo4j server version requirements
- schema_examples/neo4j/ - Example Cypher scripts
Neo4j Server Requirements
Important: Relationship property indexes require Neo4j server version 4.3+.
- The requirements.txt specifies the Python driver version, not the server version
- Check your Neo4j server version: neo4j version or CALL dbms.components()
- See NEO4J_REQUIREMENTS.md for full compatibility details
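One way to respect this requirement is to create the relationship property index only when the reported server version is 4.3 or later. The sketch below is illustrative: the index names and the indexed properties are assumptions, and it assumes a server that accepts the CREATE INDEX ... IF NOT EXISTS FOR ... ON form.

```python
import re


def parse_version(text: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as '4.4.12' or '5.13.0'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def index_statements(server_version: str) -> list[str]:
    """Return index DDL suited to the given server version."""
    statements = [
        "CREATE INDEX entity_name IF NOT EXISTS FOR (e:Entity) ON (e.name)",
    ]
    # Relationship property indexes are only available from server 4.3 onward.
    if parse_version(server_version) >= (4, 3):
        statements.append(
            "CREATE INDEX relates_to_type IF NOT EXISTS "
            "FOR ()-[r:RELATES_TO]-() ON (r.type)"
        )
    return statements
```

The version string would typically come from the versions column returned by CALL dbms.components().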
Troubleshooting
Common Issues
- Low Extraction Quality
  - Increase image resolution
  - Use a more capable vision model
  - Provide more context in prompts
- Missing Relationships
  - Check diagram clarity
  - Verify relationship detection logic
  - Review extraction output
- Neo4j Population Errors
  - Check constraints
  - Verify relationship types
  - Review Cypher syntax
- Performance Issues
  - Batch operations
  - Use transactions
  - Create indexes
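For the performance item, batching usually means sending many rows per query with UNWIND rather than one query per node. The sketch below chunks rows and issues one parameterized query per chunk. The row shape, the default batch size of 500, and the helper names are assumptions. session is expected to be a Neo4j Python driver session, whose run method accepts query parameters as keyword arguments.

```python
from itertools import islice
from typing import Iterable, Iterator

UPSERT_ENTITIES = (
    "UNWIND $rows AS row "
    "MERGE (e:Entity {fqn: row.fqn}) "
    "SET e.name = row.name, e.specId = row.specId"
)


def chunked(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def upsert_entities(session, rows: Iterable[dict], batch_size: int = 500) -> int:
    """Send rows in batches; return the number of queries issued."""
    queries = 0
    for batch in chunked(rows, batch_size):
        session.run(UPSERT_ENTITIES, rows=batch)
        queries += 1
    return queries
```

The batch size is a trade-off between round trips and transaction memory, so it is worth tuning against real diagram volumes.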
Future Enhancements
- Support for sequence diagrams
- Support for activity diagrams
- Multi-page diagram handling
- Automatic relationship inference
- Diagram versioning and diff