Database Implementation Patterns Skill

Database implementation patterns and best practices for the CODE phase of PACT framework.

Quick Reference

Schema Design Decision Tree

code

Is this a new table?
├─ YES → Start with 3NF normalization
│   ├─ High read frequency (>80% reads)?
│   │   └─ Consider selective denormalization for hot paths
│   └─ High write frequency (>80% writes)?
│       └─ Keep normalized, optimize with indexes
└─ NO → Extending existing table?
    ├─ Adding columns → Check NULL handling strategy
    ├─ Changing columns → Plan migration strategy
    └─ Removing columns → Implement soft deprecation first

Data type selection:
├─ Identifiers → BIGINT (future-proof) or UUID (distributed)
├─ Timestamps → TIMESTAMP WITH TIME ZONE (always)
├─ Money → DECIMAL(19,4) (never FLOAT)
├─ Text → VARCHAR with explicit limits (avoid TEXT unless needed)
└─ Boolean → BOOLEAN (not TINYINT or CHAR)

Normalization Guidelines

First Normal Form (1NF)

•Atomic values only (no arrays in columns)
•Each column contains single value type
•Unique column names
•Order doesn't matter

Second Normal Form (2NF)

•Already in 1NF
•No partial dependencies on composite keys
•All non-key attributes depend on entire primary key

Third Normal Form (3NF)

•Already in 2NF
•No transitive dependencies
•Non-key attributes depend only on primary key

When to Denormalize

•Read-heavy queries accessing multiple tables (proven by metrics)
•Aggregation calculations performed frequently (>1000x/day)
•Historical snapshots needed for audit/compliance
•Always document the performance justification

Index Strategy Cheat Sheet

Always Index

•Primary keys (automatic in most RDBMS)
•Foreign keys (not automatic in all RDBMS)
•Columns in WHERE clauses (high selectivity)
•Columns in JOIN conditions
•Columns in ORDER BY (if used frequently)

Consider Indexing

•Columns in GROUP BY
•Columns with high cardinality (many unique values)
•Frequently filtered date ranges

Never Index

•Small tables (<1000 rows)
•Columns with low cardinality (gender, boolean)
•Frequently updated columns (index maintenance cost)
•Wide text columns (use full-text search instead)

Composite Index Guidelines

sql

-- Order matters! Most selective first
CREATE INDEX idx_user_activity
ON user_events(user_id, event_type, created_at);

-- This query uses the index efficiently:
WHERE user_id = ? AND event_type = ? AND created_at > ?

-- This query only uses first part:
WHERE user_id = ?

-- This query CANNOT use the index:
WHERE event_type = ? AND created_at > ?

Index Types Quick Guide

•B-Tree (default): Equality, range queries, sorting
•Hash: Equality only, faster than B-Tree for exact matches
•GIN/GiST: Full-text search, JSON, arrays, geometric data
•Partial: Index subset of rows (e.g., WHERE deleted_at IS NULL)
•Covering: Include extra columns to avoid table lookup

Query Optimization Checklist

Before Writing Queries

• Understand the access pattern (read/write ratio)
• Know the data volume (current and projected)
• Identify the most selective filter (start here)
• Plan JOIN order (smallest result set first)

While Writing Queries

• Use explicit JOIN syntax (not implicit comma joins)
• Filter early (WHERE before JOIN when possible)
• Avoid SELECT * (specify needed columns)
• Use appropriate JOIN type (INNER vs LEFT vs RIGHT)
• Consider EXISTS vs IN for subqueries
• Limit result sets with WHERE, not application code

After Writing Queries

• Run EXPLAIN/EXPLAIN ANALYZE
• Check for sequential scans on large tables
• Verify index usage on JOIN conditions
• Look for nested loops on large datasets
• Ensure cardinality estimates are accurate
• Test with production-like data volume

Common Anti-Patterns to Avoid

sql

-- ❌ N+1 Query Problem
SELECT * FROM users;
-- Then for each user:
SELECT * FROM orders WHERE user_id = ?;

-- ✅ Use JOIN instead
SELECT u.*, o.*
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

-- ❌ Function on indexed column prevents index use
WHERE YEAR(created_at) = 2024

-- ✅ Rewrite to preserve index
WHERE created_at >= '2024-01-01'
  AND created_at < '2025-01-01'

-- ❌ OR on different columns prevents index use
WHERE email = ? OR username = ?

-- ✅ Use UNION if both columns are indexed
SELECT * FROM users WHERE email = ?
UNION
SELECT * FROM users WHERE username = ?

-- ❌ Implicit type conversion
WHERE user_id = '123'  -- user_id is INT

-- ✅ Use correct type
WHERE user_id = 123

Data Integrity Patterns

Constraint Hierarchy

•NOT NULL: Required fields
•UNIQUE: Enforce uniqueness at DB level
•CHECK: Business rule validation
•FOREIGN KEY: Referential integrity
•PRIMARY KEY: Combines NOT NULL + UNIQUE

Referential Integrity Actions

sql

-- Prevent deletion if referenced
ON DELETE RESTRICT  -- Default, explicit is better

-- Delete dependent rows automatically
ON DELETE CASCADE  -- Use carefully, can cascade widely

-- Set FK to NULL when parent deleted
ON DELETE SET NULL  -- Parent must allow NULL

-- Prevent orphans without cascade
ON DELETE NO ACTION  -- Similar to RESTRICT

Soft Delete Pattern

sql

-- Add columns
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;
ALTER TABLE users ADD COLUMN deleted_by BIGINT;

-- Create partial index for active records only
CREATE INDEX idx_active_users
ON users(email)
WHERE deleted_at IS NULL;

-- Queries always filter
SELECT * FROM users WHERE deleted_at IS NULL;

-- Instead of DELETE
UPDATE users
SET deleted_at = NOW(), deleted_by = ?
WHERE id = ?;

Audit Trail Pattern

sql

-- Audit table captures all changes
CREATE TABLE user_audit (
  audit_id BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL,
  operation CHAR(1) NOT NULL,  -- I, U, D
  changed_at TIMESTAMP NOT NULL DEFAULT NOW(),
  changed_by BIGINT,
  old_values JSONB,
  new_values JSONB
);

-- Trigger to populate audit table
CREATE TRIGGER user_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_user_changes();

Migration Patterns

Zero-Downtime Migration Strategy

•Additive changes first: Add new columns/tables
•Dual write: Write to both old and new schema
•Backfill data: Populate new schema from old
•Dual read: Read from new schema with fallback
•Remove old schema: After validation period

Safe Column Addition

sql

-- Step 1: Add column with default (fast, no rewrite)
ALTER TABLE users
ADD COLUMN status VARCHAR(20) DEFAULT 'active' NOT NULL;

-- Step 2: Update existing rows if needed (in batches)
UPDATE users
SET status = CASE
  WHEN email_verified THEN 'active'
  ELSE 'pending'
END
WHERE id >= ? AND id < ?;

-- Step 3: Remove default after backfill (allows NULLs for new rows)
ALTER TABLE users ALTER COLUMN status DROP DEFAULT;

Safe Column Removal

sql

-- Step 1: Stop writing to column (code deploy)
-- Step 2: Wait 1+ deployment cycles
-- Step 3: Drop column (can cause table rewrite in some RDBMS)
ALTER TABLE users DROP COLUMN old_field;

Renaming Strategy

sql

-- Don't rename! Instead:
-- 1. Add new column
-- 2. Dual write to both
-- 3. Backfill
-- 4. Switch reads
-- 5. Remove old column

-- If you must rename immediately (small table):
BEGIN;
ALTER TABLE users RENAME COLUMN old_name TO new_name;
-- Update all application code simultaneously
COMMIT;

Transaction Management

ACID Review

•Atomicity: All or nothing
•Consistency: Valid state transitions
•Isolation: Concurrent transactions don't interfere
•Durability: Committed data survives crashes

Isolation Levels (from weakest to strongest)

sql

-- Read Uncommitted: Dirty reads possible
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Read Committed: Most common default
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Repeatable Read: Prevents non-repeatable reads
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Serializable: Strongest, can cause deadlocks
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Transaction Best Practices

•Keep transactions as short as possible
•Acquire locks in consistent order (prevent deadlocks)
•Don't perform I/O inside transactions
•Use appropriate isolation level (not always serializable)
•Handle deadlocks with retry logic

Deadlock Prevention

sql

-- ❌ Can deadlock if two processes reverse order
-- Process A:
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- Process B:
UPDATE accounts SET balance = balance - 50 WHERE id = 2;
UPDATE accounts SET balance = balance + 50 WHERE id = 1;

-- ✅ Always lock in same order (by ID)
UPDATE accounts SET balance = balance + CASE
  WHEN id = 1 THEN -100
  WHEN id = 2 THEN 100
END
WHERE id IN (1, 2)
ORDER BY id;  -- Consistent lock order

Security Best Practices

Access Control

sql

-- Principle of least privilege
CREATE ROLE app_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;

CREATE ROLE app_readwrite;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO app_readwrite;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_readwrite;

-- Never grant DELETE to application roles
-- Deletions should go through admin processes or soft delete

Row-Level Security (RLS)

sql

-- Enable RLS on table
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Policy: users see only their own documents
CREATE POLICY user_documents ON documents
FOR SELECT
USING (user_id = current_user_id());

-- Policy: admins see everything
CREATE POLICY admin_documents ON documents
FOR ALL
USING (is_admin());

Data Encryption

•Encrypt sensitive data at rest (database level)
•Encrypt in transit (SSL/TLS connections)
•Never store passwords in plain text (use bcrypt, argon2)
•Store API keys in encrypted columns
•Consider field-level encryption for PII

SQL Injection Prevention

sql

-- ❌ NEVER construct queries with string concatenation
query = "SELECT * FROM users WHERE email = '" + user_input + "'"

-- ✅ ALWAYS use parameterized queries
query = "SELECT * FROM users WHERE email = ?"
params = [user_input]

Performance Monitoring

Key Metrics to Track

•Query execution time (p50, p95, p99)
•Slow query log (queries >100ms)
•Connection pool utilization
•Cache hit ratio (>90% ideal)
•Index usage statistics
•Table bloat (vacuum effectiveness)
•Lock wait times
•Deadlock frequency

Query Analysis

sql

-- PostgreSQL: Find slow queries
SELECT
  calls,
  total_time,
  mean_time,
  query
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;

-- MySQL: Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1;  -- 100ms

-- Check index usage
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,  -- Number of index scans
  idx_tup_read  -- Number of index entries returned
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

When to Use Sequential Thinking

Database design often requires deep reasoning about trade-offs. Use the mcp__sequential-thinking__sequentialthinking tool when:

Schema Design Decisions

•Evaluating normalization vs denormalization trade-offs
•Choosing between different data modeling approaches
•Deciding on partitioning strategies for large tables
•Planning complex migration paths

Performance Optimization

•Analyzing query execution plans with multiple optimization options
•Designing composite indexes with multiple column orderings
•Evaluating caching strategies vs query optimization
•Balancing read performance vs write performance

Data Integrity Design

•Designing constraint strategies that balance integrity and flexibility
•Planning soft delete vs hard delete implications
•Evaluating cascade delete vs manual cleanup approaches

Transaction Design

•Choosing appropriate isolation levels for complex business logic
•Designing retry strategies for deadlock scenarios
•Planning distributed transaction patterns

Example Sequential Thinking Prompt

code

I need to design a schema for a multi-tenant SaaS application with these requirements:
- 1000+ tenants, average 10k records per tenant
- Strong data isolation between tenants required for compliance
- 90% of queries filter by tenant_id
- Need to support per-tenant schema customization in the future

Should I use:
1. Shared schema with tenant_id column
2. Separate schema per tenant
3. Separate database per tenant

Walk through the trade-offs of each approach considering:
- Query performance and indexing
- Data isolation and security
- Operational complexity
- Future extensibility
- Cost at scale

Reference Files

Detailed patterns and examples available in:

•schema-design.md: Normalization, denormalization, relationships, data types
•query-optimization.md: Indexes, query plans, performance tuning, caching
•data-integrity.md: Constraints, transactions, validation, audit trails

Integration with PACT Workflow

This skill supports the CODE phase for database implementation:

Input from ARCHITECT Phase

•Entity-relationship diagrams
•Data model specifications
•Access pattern requirements
•Performance SLAs
•Security requirements

Output for TEST Phase

•Database schema DDL scripts
•Sample data DML scripts
•Query examples for common access patterns
•Index definitions with justification
•Migration scripts
•Performance baseline metrics

Quality Gates

•All tables have primary keys
•All foreign keys have indexes
•All queries have EXPLAIN plans reviewed
•All migrations tested on copy of production data
•All sensitive data has encryption plan
•All access patterns have appropriate indexes

Common Pitfalls

Design Pitfalls

•Over-normalization causing excessive JOINs
•Under-normalization causing data anomalies
•Using wrong data types (especially for money, dates)
•Not planning for NULL handling
•Forgetting time zone handling

Implementation Pitfalls

•Adding indexes without measuring impact
•Not testing migrations on production-size data
•Using SELECT * in application code
•Not handling transaction deadlocks
•Performing I/O operations inside transactions

Security Pitfalls

•Granting excessive permissions to application users
•Not encrypting sensitive data at rest
•Using string concatenation for queries (SQL injection)
•Logging sensitive data in plain text
•Not implementing row-level security for multi-tenant apps

Performance Pitfalls

•Not analyzing query execution plans
•Creating too many indexes (write performance suffers)
•Using functions on indexed columns in WHERE clauses
•Not batching large updates
•Not monitoring connection pool exhaustion

Success Criteria

A well-implemented database solution demonstrates:

•Data Integrity: Constraints enforce business rules at DB level
•Performance: Query response times meet SLAs (usually <100ms)
•Security: Least privilege access, encryption for sensitive data
•Maintainability: Clear schema documentation, standard patterns
•Scalability: Design handles 10x current data volume
•Reliability: Backup/recovery tested, transaction handling robust
•Observability: Query performance monitored, slow queries identified

This skill is part of the PACT Framework for principled AI-assisted development.