Data Classification

Purpose

Classify data elements by sensitivity tier, define handling requirements for each tier, and map PII flows to ensure appropriate protections are applied throughout the data lifecycle.

Inputs

•Data model or schema being analyzed
•Data elements and their sources (user input, system generated, third-party)
•Storage and processing architecture
•Integration points and data sharing arrangements
•Applicable regulatory context (from compliance review if available)

Process

Step 1: Inventory Data Elements

Catalog every data element in scope. For each element, document:

•Name and description: What is the field and what does it represent?
•Source: User-provided, system-generated, derived, or third-party
•Format: Free text, structured (email, phone), numeric, binary, etc.
•Volume: Approximate record count and growth rate
•Current protections: Encryption at rest/transit, access controls, masking

Step 2: Classify by Sensitivity Tier

Assign each data element to a sensitivity tier:

•Public: Information intended for public access. No PII. No competitive sensitivity. Example: marketing copy, public API docs, open-source code.
•Internal: Not public, but low impact if disclosed. Non-identifying operational data. Example: internal project names, non-sensitive configs, aggregated metrics.
•Confidential: Business-sensitive or indirect PII. Disclosure causes material harm. Example: email addresses, IP addresses, financial reports, API keys, internal strategies.
•Restricted: Direct PII, protected health information, payment data, credentials. Disclosure triggers regulatory obligations. Example: SSN, medical records, credit card numbers, passwords, biometric data.

Step 3: Map PII Flows

For each PII element, trace the complete data flow:

•Collection: Where does it enter the system? What consent covers it?
•Transit: How does it move between services? Is it encrypted in transit?
•Processing: Which services touch it? Is the access minimized to what's necessary?
•Storage: Where is it persisted? Is it encrypted at rest? Is it in the right geography?
•Sharing: Does it leave the system boundary? Under what contractual protections?
•Deletion: How is it removed? Does deletion cascade to all copies?

Step 4: Define Handling Requirements Per Tier

Specify the minimum controls required for each sensitivity tier:

•Access control: Who can read/write/delete? What authentication is required?
•Encryption: At rest, in transit, application-level? Key management approach?
•Masking/redaction: In logs, error messages, API responses, UI displays?
•Retention: Maximum retention period, automated expiry mechanism?
•Audit: What access events must be logged? What level of detail?
•Backup/recovery: Backup frequency, encryption of backups, geographic constraints?

Step 5: Identify Cross-Boundary Data Transfers

Document every case where data moves across trust boundaries:

•Internal service to external API (analytics, payment processors, email services)
•Production to non-production environments (test data, staging copies)
•Geographic transfers (EU to US, cross-region replication)
•Employee access (admin tools, support dashboards, database queries)
•For each transfer: document the justification, contractual protections (DPA, SCCs), and technical safeguards

Step 6: Specify Encryption and Masking Requirements

Define specific technical controls for each data element:

•Encryption at rest: AES-256 for Restricted, AES-128 minimum for Confidential
•Encryption in transit: TLS 1.2+ for all tiers, mTLS for Restricted inter-service
•Application-level encryption: Field-level encryption for Restricted elements in shared databases
•Log masking: Restricted fields never appear in logs; Confidential fields are masked/truncated
•Display masking: SSN shows last 4 only, credit cards show last 4 only, emails partially masked in admin views
•Test data: Restricted and Confidential data must be synthesized or anonymized for non-production

Output Format

Data Classification Matrix

Data Element	Source	Sensitivity Tier	PII?	Regulatory Scope	Owner
Email address	User input	Confidential	Yes	GDPR, CCPA	User Service
Session token	System	Restricted	No	SOC2	Auth Service
Page views	System	Internal	No	—	Analytics
...	...	...	...	...	...

Handling Requirements Table

Tier	Access Control	Encryption (Rest)	Encryption (Transit)	Log Masking	Retention	Audit Level
Public	Open	Optional	TLS 1.2+	None	Unlimited	None
Internal	Role-based	Optional	TLS 1.2+	None	2 years	Read-only
Confidential	Role-based + MFA	AES-128+	TLS 1.2+	Masked	Defined per type	Read/Write
Restricted	Need-to-know + MFA	AES-256	TLS 1.2+ / mTLS	Never logged	Minimum viable	Full (who/what/when)

PII Flow Diagram

code

[User Input] ──TLS──▶ [API Gateway] ──mTLS──▶ [User Service]
                                                    │
                                              [Encrypted DB]
                                                    │
                                    ┌───────────────┼───────────────┐
                                    ▼               ▼               ▼
                              [Analytics]     [Email SaaS]    [Backup Store]
                              (anonymized)    (DPA in place)  (AES-256)

Quality Checks

• Every data element in scope is inventoried and classified
• All PII elements are identified and mapped through their full lifecycle
• Handling requirements are defined for every sensitivity tier
• Cross-boundary data transfers are documented with justification and protections
• Encryption requirements are specific (algorithm, key size) not generic
• Log masking rules prevent Restricted data from appearing in any log output
• Test/staging environments have no production Restricted or Confidential data
• Classification matrix identifies an owner for each data element

Evolution Notes