Data Classification
Purpose
Classify data elements by sensitivity tier, define handling requirements for each tier, and map PII flows to ensure appropriate protections are applied throughout the data lifecycle.
Inputs
- Data model or schema being analyzed
- Data elements and their sources (user input, system-generated, third-party)
- Storage and processing architecture
- Integration points and data sharing arrangements
- Applicable regulatory context (from the compliance review, if available)
Process
Step 1: Inventory Data Elements
Catalog every data element in scope. For each element, document:
- Name and description: What is the field and what does it represent?
- Source: User-provided, system-generated, derived, or third-party
- Format: Free text, structured (email, phone), numeric, binary, etc.
- Volume: Approximate record count and growth rate
- Current protections: Encryption at rest and in transit, access controls, masking
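
One way to keep the inventory consistent is to record each element in a small typed structure. The sketch below is illustrative only: the `DataElement` class, its field names, and the sample values are assumptions, not part of any existing schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    USER_PROVIDED = "user-provided"
    SYSTEM_GENERATED = "system-generated"
    DERIVED = "derived"
    THIRD_PARTY = "third-party"


@dataclass
class DataElement:
    """One inventory row; attribute names are hypothetical."""
    name: str
    description: str
    source: Source
    data_format: str            # e.g. "free text", "email", "numeric"
    approx_records: int         # approximate record count
    monthly_growth_pct: float   # approximate growth rate
    protections: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of inventory attributes that were left empty."""
        missing = []
        if not self.name.strip():
            missing.append("name")
        if not self.description.strip():
            missing.append("description")
        if not self.data_format.strip():
            missing.append("data_format")
        if not self.protections:
            missing.append("protections")
        return missing


# Sample values are made up for illustration.
email = DataElement(
    name="email",
    description="Login and contact address supplied at signup",
    source=Source.USER_PROVIDED,
    data_format="email",
    approx_records=250_000,
    monthly_growth_pct=3.0,
    protections=["TLS in transit", "encrypted at rest"],
)
```

Calling `missing_fields()` on each record gives a quick list of inventory rows that still need attention before classification starts.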
Step 2: Classify by Sensitivity Tier
Assign each data element to a sensitivity tier:
- Public: Information intended for public access. No PII. No competitive sensitivity. Examples: marketing copy, public API docs, open-source code.
- Internal: Not public, but low impact if disclosed. Non-identifying operational data. Examples: internal project names, non-sensitive configs, aggregated metrics.
- Confidential: Business-sensitive data or indirect PII. Disclosure causes material harm. Examples: email addresses, IP addresses, financial reports, internal strategies.
- Restricted: Direct PII, protected health information, payment data, credentials. Disclosure triggers regulatory obligations. Examples: SSN, medical records, credit card numbers, passwords, API keys, biometric data.
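
The tier definitions can be applied mechanically once each element carries descriptive tags. The following is a minimal sketch under that assumption; the tag vocabulary and the `classify` function are hypothetical, and a human reviewer should still confirm each result.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical tag vocabulary, loosely derived from the tier descriptions.
RESTRICTED_TAGS = {"direct_pii", "phi", "payment", "credential", "biometric"}
CONFIDENTIAL_TAGS = {"indirect_pii", "business_sensitive"}


def classify(tags: set[str], public: bool = False) -> Tier:
    """Return the highest tier implied by an element's tags."""
    if tags & RESTRICTED_TAGS:
        return Tier.RESTRICTED
    if tags & CONFIDENTIAL_TAGS:
        return Tier.CONFIDENTIAL
    # Untagged data is Internal unless it is explicitly meant for publication.
    return Tier.PUBLIC if public else Tier.INTERNAL
```

Checking the Restricted tags first means an element that matches several tiers always lands in the strictest one.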
Step 3: Map PII Flows
For each PII element, trace the complete data flow:
- Collection: Where does it enter the system? What consent covers it?
- Transit: How does it move between services? Is it encrypted in transit?
- Processing: Which services touch it? Is access limited to what is necessary?
- Storage: Where is it persisted? Is it encrypted at rest? Is it stored in the right geography?
- Sharing: Does it leave the system boundary? Under what contractual protections?
- Deletion: How is it removed? Does deletion cascade to all copies?
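
A flow map is only useful if every lifecycle stage has an answer. As a rough sketch, a record per PII element can flag the stages that are still undocumented; the `PiiFlow` class and the sample notes are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The six lifecycle stages traced in this step.
STAGES = ("collection", "transit", "processing", "storage", "sharing", "deletion")


@dataclass
class PiiFlow:
    element: str
    # Maps a stage name to a short note describing how that stage is handled.
    stages: dict[str, str] = field(default_factory=dict)

    def undocumented_stages(self) -> list[str]:
        """Return the stages that have no note yet, in lifecycle order."""
        return [s for s in STAGES if not self.stages.get(s, "").strip()]


# Sample notes are made up for illustration.
flow = PiiFlow(
    element="email",
    stages={
        "collection": "signup form, covered by account consent",
        "transit": "TLS between gateway and user service",
        "storage": "encrypted database, EU region",
    },
)
print(flow.undocumented_stages())  # ['processing', 'sharing', 'deletion']
```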
Step 4: Define Handling Requirements Per Tier
Specify the minimum controls required for each sensitivity tier:
- Access control: Who can read/write/delete? What authentication is required?
- Encryption: At rest, in transit, application-level? Key management approach?
- Masking/redaction: In logs, error messages, API responses, UI displays?
- Retention: Maximum retention period, automated expiry mechanism?
- Audit: What access events must be logged? What level of detail?
- Backup/recovery: Backup frequency, encryption of backups, geographic constraints?
Step 5: Identify Cross-Boundary Data Transfers
Document every case where data moves across trust boundaries:
- Internal service to external API (analytics, payment processors, email services)
- Production to non-production environments (test data, staging copies)
- Geographic transfers (EU to US, cross-region replication)
- Employee access (admin tools, support dashboards, database queries)

For each transfer, document the justification, the contractual protections (DPA, SCCs), and the technical safeguards.
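
The per-transfer documentation can be captured as a record that reports its own gaps. This is a sketch only: the `Transfer` class and the `needs_contract` flag are hypothetical, and whether a given transfer requires a DPA or SCCs is a legal determination the code does not make.

```python
from dataclasses import dataclass, field


@dataclass
class Transfer:
    source: str
    destination: str
    kind: str              # e.g. "external API", "prod to non-prod", "geographic"
    needs_contract: bool   # set by the reviewer, not inferred by the code
    justification: str = ""
    contractual: list[str] = field(default_factory=list)  # e.g. ["DPA", "SCCs"]
    safeguards: list[str] = field(default_factory=list)   # e.g. ["TLS 1.2+"]

    def gaps(self) -> list[str]:
        """List the documentation items still missing for this transfer."""
        gaps = []
        if not self.justification.strip():
            gaps.append("justification")
        if self.needs_contract and not self.contractual:
            gaps.append("contractual protections")
        if not self.safeguards:
            gaps.append("technical safeguards")
        return gaps


# Sample transfers are made up for illustration.
email_saas = Transfer(
    "User Service", "Email SaaS", "external API", True,
    justification="transactional email delivery",
    contractual=["DPA"],
    safeguards=["TLS 1.2+"],
)
staging_copy = Transfer("production", "staging", "prod to non-prod", False)
```

Here `email_saas.gaps()` is empty, while `staging_copy.gaps()` reports the missing justification and safeguards.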
Step 6: Specify Encryption and Masking Requirements
Define specific technical controls for each data element:
- Encryption at rest: AES-256 for Restricted, AES-128 minimum for Confidential
- Encryption in transit: TLS 1.2+ for all tiers, mTLS for Restricted inter-service traffic
- Application-level encryption: Field-level encryption for Restricted elements in shared databases
- Log masking: Restricted fields never appear in logs; Confidential fields are masked or truncated
- Display masking: SSNs and credit card numbers show the last 4 digits only; emails are partially masked in admin views
- Test data: Restricted and Confidential data must be synthesized or anonymized for non-production environments
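
The display and log masking rules translate into small helper functions. The sketch below assumes simple string inputs and lowercase tier labels; the function names and the exact mask formats (for example `***-**-6789`) are illustrative choices, not a prescribed standard.

```python
import re


def _last4(value: str) -> str:
    """Return the last 4 digits, or an empty string if fewer than 4 exist."""
    digits = re.sub(r"\D", "", value)
    return digits[-4:] if len(digits) >= 4 else ""


def mask_ssn(ssn: str) -> str:
    return "***-**-" + (_last4(ssn) or "****")


def mask_card(number: str) -> str:
    digit_count = len(re.sub(r"\D", "", number))
    last4 = _last4(number)
    return "*" * (digit_count - len(last4)) + last4


def mask_email(address: str) -> str:
    # Keep only the first character of the local part.
    local, _, domain = address.partition("@")
    return local[:1] + "***@" + domain


def redact_for_log(record: dict[str, str], tiers: dict[str, str]) -> dict[str, str]:
    """Drop Restricted fields and truncate Confidential ones before logging."""
    out = {}
    for key, value in record.items():
        tier = tiers.get(key, "restricted")  # unknown fields are treated as Restricted
        if tier == "restricted":
            continue  # never logged
        out[key] = value[:2] + "***" if tier == "confidential" else value
    return out


print(mask_ssn("123-45-6789"))           # ***-**-6789
print(mask_card("4111 1111 1111 1111"))  # ************1111
print(mask_email("alice@example.com"))   # a***@example.com
```

Treating unclassified fields as Restricted in `redact_for_log` is a deliberate fail-closed choice: a field that nobody classified never reaches the log.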
Output Format
Data Classification Matrix
| Data Element | Source | Sensitivity Tier | PII? | Regulatory Scope | Owner |
|---|---|---|---|---|---|
| Email address | User input | Confidential | Yes | GDPR, CCPA | User Service |
| Session token | System | Restricted | No | SOC 2 | Auth Service |
| Page views | System | Internal | No | — | Analytics |
| ... | ... | ... | ... | ... | ... |
Handling Requirements Table
| Tier | Access Control | Encryption (Rest) | Encryption (Transit) | Log Masking | Retention | Audit Level |
|---|---|---|---|---|---|---|
| Public | Open | Optional | TLS 1.2+ | None | Unlimited | None |
| Internal | Role-based | Optional | TLS 1.2+ | None | 2 years | Read-only |
| Confidential | Role-based + MFA | AES-128+ | TLS 1.2+ | Masked | Defined per type | Read/Write |
| Restricted | Need-to-know + MFA | AES-256 | TLS 1.2+ / mTLS | Never logged | Minimum viable | Full (who/what/when) |
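
The handling table can also be kept as data so that tooling can check a store against its tier. The following sketch mirrors the rows above; the `Handling` class and the `at_rest_ok` helper are hypothetical names, and only the at-rest key-size check is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handling:
    access_control: str
    min_aes_bits: Optional[int]  # None means encryption at rest is optional
    log_masking: str
    retention: str
    audit: str


# Values copied from the handling requirements table.
HANDLING = {
    "Public": Handling("Open", None, "None", "Unlimited", "None"),
    "Internal": Handling("Role-based", None, "None", "2 years", "Read-only"),
    "Confidential": Handling(
        "Role-based + MFA", 128, "Masked", "Defined per type", "Read/Write"
    ),
    "Restricted": Handling(
        "Need-to-know + MFA", 256, "Never logged", "Minimum viable",
        "Full (who/what/when)",
    ),
}


def at_rest_ok(tier: str, aes_bits: Optional[int]) -> bool:
    """Check a store's AES key size (None = unencrypted) against the tier minimum."""
    required = HANDLING[tier].min_aes_bits
    if required is None:
        return True
    return aes_bits is not None and aes_bits >= required
```

For example, `at_rest_ok("Restricted", 128)` is false because Restricted data requires AES-256, while `at_rest_ok("Confidential", 256)` passes.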
PII Flow Diagram
    [User Input] ──TLS──▶ [API Gateway] ──mTLS──▶ [User Service]
                                                        │
                                                  [Encrypted DB]
                                                        │
                                        ┌───────────────┼───────────────┐
                                        ▼               ▼               ▼
                                   [Analytics]    [Email SaaS]   [Backup Store]
                                  (anonymized)   (DPA in place)     (AES-256)
Quality Checks
- Every data element in scope is inventoried and classified
- All PII elements are identified and mapped through their full lifecycle
- Handling requirements are defined for every sensitivity tier
- Cross-boundary data transfers are documented with justification and protections
- Encryption requirements are specific (algorithm, key size), not generic
- Log masking rules prevent Restricted data from appearing in any log output
- Test/staging environments contain no production Restricted or Confidential data
- The classification matrix identifies an owner for each data element
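
Some of these checks can be automated against the classification matrix. The sketch below assumes each row is a dictionary with hypothetical keys `element`, `tier`, `pii`, and `owner`; it covers only the checks that can be read off the matrix itself.

```python
VALID_TIERS = {"Public", "Internal", "Confidential", "Restricted"}


def check_matrix(rows: list[dict[str, str]]) -> list[str]:
    """Return a list of human-readable problems found in the matrix."""
    problems = []
    for row in rows:
        name = row.get("element", "<unnamed>")
        tier = row.get("tier")
        if tier not in VALID_TIERS:
            problems.append(f"{name}: missing or unknown tier")
        if not row.get("owner", "").strip():
            problems.append(f"{name}: no owner")
        # Public and Internal are defined as containing no identifying data.
        if row.get("pii") == "Yes" and tier in {"Public", "Internal"}:
            problems.append(f"{name}: PII classified below Confidential")
    return problems


# Sample rows; the second one is deliberately broken.
rows = [
    {"element": "Email address", "tier": "Confidential", "pii": "Yes", "owner": "User Service"},
    {"element": "Phone number", "tier": "Internal", "pii": "Yes", "owner": ""},
]
for problem in check_matrix(rows):
    print(problem)
```

An empty result does not prove the classification is correct; it only shows that the matrix is complete and internally consistent.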