PrivGraph: Mermaid Security & Privacy Annotation

PrivGraph annotates Mermaid diagrams with privacy and security metadata using comment-based directives, then evaluates configurable rules to surface control gaps.

Core Workflow

When a user presents a Mermaid diagram for privacy/security review:

•Parse the diagram — identify nodes, edges, subgraphs, and any existing @pg: annotations
•Classify — determine what data flows where (direct identifiers, health data, credentials, etc.)
•Annotate — add @pg: directives as Mermaid comments for data classes, controls, boundaries, compliance
•Evaluate — check the annotated diagram against the rule set (see references/rules.yaml)
•Report — surface findings with severity, description, and remediation guidance

When a user asks to create a new annotated diagram from scratch, follow steps 2–5 after helping them build the diagram.

Annotation Syntax

All annotations are Mermaid comments and do not affect rendering:

code

%% @pg:<directive> <target> <parameters>

Directives

@pg:data-class — classify data on a flow

code

%% @pg:data-class A-->B DIRECT_ID, LOCATION

@pg:control — declare present or missing (!) controls on a node or flow

code

%% @pg:control UserDB encrypted-at-rest, access-controlled
%% @pg:control UserDB !retention-policy
%% @pg:control A-->B encrypted-in-transit

@pg:boundary — declare trust boundary crossing on a flow

code

%% @pg:boundary A-->B third-party

@pg:compliance — declare applicable compliance frameworks

code

%% @pg:compliance UserDB GDPR, CCPA

@pg:risk-accept — suppress a rule finding with justification

code

%% @pg:risk-accept Cache PG-003 "24hr TTL enforced at Redis level. Approved 2024-01-10"

@pg:meta — diagram-level metadata

code

%% @pg:meta owner "platform-team"
%% @pg:meta system "checkout-service"
%% @pg:meta reviewed "2024-01-15"

Data Classifications

Classification	Description	Examples
`DIRECT_ID`	Direct identifiers	Names, emails, SSNs, phone numbers
`INDIRECT_ID`	Indirect/pseudonymous identifiers	Opaque IDs, UUIDs, session tokens, pseudonyms
`PHI`	Protected health information	EHR, Diagnoses, prescriptions, lab results
`PCI`	Payment card data	Card numbers, CVVs
`CREDENTIALS`	Authentication secrets	Passwords, API keys, tokens
`BIOMETRIC`	Biometric data	Fingerprints, face data, voice prints
`LOCATION`	Location data	GPS coordinates, IP-derived location
`FINANCIAL`	Financial data	Bank accounts, transactions
`SENSITIVE`	Other sensitive data	Catch-all for sensitive categories
`PUBLIC`	Non-sensitive data	Public content, marketing copy

Controls

encrypted-at-rest, encrypted-in-transit, access-controlled, audit-logged, retention-policy, anonymized, minimized, consent-managed, dpa-in-place

Prefix with ! to mark a control as explicitly missing.

Boundaries

internal, third-party, cross-region, cross-cloud, public-internet, user-device

Compliance Frameworks

GDPR, CCPA, HIPAA, PCI-DSS, SOC2, FERPA

Rule Evaluation

After annotating, evaluate the diagram against the rule set. Load references/rules.yaml for the full rule definitions. Here is the summary:

Rule	Sev	Trigger
PG-001	HIGH	DIRECT_ID/INDIRECT_ID/PHI flow missing `encrypted-in-transit`
PG-002	HIGH	DIRECT_ID/INDIRECT_ID/PHI node missing `encrypted-at-rest`
PG-003	MED	Sensitive data node missing `retention-policy`
PG-004	HIGH	CREDENTIALS flow missing encryption
PG-005	CRIT	DIRECT_ID to `third-party` missing `dpa-in-place`
PG-006	HIGH	PHI to `third-party` boundary
PG-007	MED	`cross-region` flow missing compliance annotation
PG-008	HIGH	HIPAA-scoped flow missing `audit-logged`
PG-009	MED	GDPR-scoped node missing `retention-policy`
PG-010	LOW	Flow crosses boundary with no controls documented
PG-011	INFO	Node has zero controls documented

For each violation, check if a @pg:risk-accept suppresses it (matching target + rule ID). Suppressed findings should be noted but not reported as active violations.

Reporting Format

Present findings grouped by severity. Use this structure:

code

## PrivGraph Review: [system name or file]

### Findings

🔴 **CRITICAL — PG-005: Direct Identifiers to Third Party Without DPA**
Flow: UserSvc → Analytics
DIRECT_ID is transferred to a third party without a documented DPA.
→ Add: `%% @pg:control Analytics dpa-in-place`

🟡 **HIGH — PG-001: Unencrypted Identifier Transfer**
Flow: Mobile → API
DIRECT_ID transferred without encryption in transit.
→ Add: `%% @pg:control Mobile-->API encrypted-in-transit`

### Summary
X critical, Y high, Z medium | N findings suppressed by @pg:risk-accept

Always output the complete annotated diagram (original diagram with all @pg: annotations added) so the user can copy-paste it directly.

Handling Ambiguity

When data classifications are unclear from the diagram alone:

•Ask the user what data flows through ambiguous edges
•Default to the more sensitive classification when context suggests it
•Flag uncertain classifications with a comment: %% TODO: Confirm data classification for A-->B

When a diagram has no annotations yet, annotate it fully rather than just listing what's missing. Show the user what a complete annotation looks like.

Supported Diagram Types

•Flowcharts: flowchart LR/TB/etc. — nodes, edges, subgraphs all supported
•Sequence diagrams: sequenceDiagram — participants map to nodes, messages map to edges, boxes map to subgraphs

For annotation target syntax in sequence diagrams, use participant names: %% @pg:data-class Client->>Server DIRECT_ID

Examples

See examples/flowchart-annotated.md and examples/sequence-annotated.md for complete worked examples with annotations and findings.