Database Design - Schema, Indexes, and Query Optimization
When to use this skill
- •Designing new database schemas and data models
- •Planning table relationships and foreign key constraints
- •Creating indexes to optimize query performance
- •Defining database constraints and validation rules
- •Designing for scalability and future growth
- •Planning database migration strategies
- •Choosing between SQL and NoSQL databases
- •Implementing database sharding or partitioning
- •Normalizing data to reduce redundancy
- •Optimizing slow database queries
- •Designing audit trails and soft deletes
- •Planning backup and disaster recovery strategies
When to use this skill
- •Designing database schemas, choosing between SQL/NoSQL, optimizing queries, planning migrations, or solving data modeling problems.
- •When working on related tasks or features
- •During development that requires this expertise
Use when: Designing database schemas, choosing between SQL/NoSQL, optimizing queries, planning migrations, or solving data modeling problems.
Core Principles
- •Normalize First, Denormalize When Needed - Start with proper normalization
- •Index Strategically - Balance query speed vs write overhead
- •Design for Scale - Consider growth patterns from day one
- •Choose the Right Tool - SQL vs NoSQL depends on use case
- •Migrations Are Forever - Plan schema changes carefully
Schema Design Fundamentals
1. Normalization (Reducing Redundancy)
sql
-- ❌ Denormalized - data duplication CREATE TABLE orders ( id INT PRIMARY KEY, customer_name VARCHAR(100), customer_email VARCHAR(100), customer_address TEXT, -- Repeated for every order! product_name VARCHAR(200), product_price DECIMAL(10,2), -- Duplicated product data quantity INT ); -- ✅ Normalized - separate concerns CREATE TABLE customers ( id INT PRIMARY KEY, name VARCHAR(100) NOT NULL, email VARCHAR(100) UNIQUE NOT NULL, address TEXT ); CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(200) NOT NULL, price DECIMAL(10,2) NOT NULL, stock INT DEFAULT 0 ); CREATE TABLE orders ( id INT PRIMARY KEY, customer_id INT NOT NULL REFERENCES customers(id), created_at TIMESTAMP DEFAULT NOW(), status VARCHAR(20) DEFAULT 'pending' ); CREATE TABLE order_items ( id INT PRIMARY KEY, order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE, product_id INT NOT NULL REFERENCES products(id), quantity INT NOT NULL CHECK (quantity > 0), price_at_purchase DECIMAL(10,2) NOT NULL -- Snapshot for history );
2. Primary Keys & Foreign Keys
sql
-- ✅ Auto-incrementing integer (traditional) CREATE TABLE users ( id SERIAL PRIMARY KEY, -- PostgreSQL -- id INT AUTO_INCREMENT PRIMARY KEY, -- MySQL email VARCHAR(255) UNIQUE NOT NULL ); -- ✅ UUID (distributed systems, no conflicts) CREATE TABLE posts ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), user_id UUID NOT NULL REFERENCES users(id), title VARCHAR(200) NOT NULL ); -- ✅ Composite primary key (junction tables) CREATE TABLE user_roles ( user_id INT NOT NULL REFERENCES users(id), role_id INT NOT NULL REFERENCES roles(id), granted_at TIMESTAMP DEFAULT NOW(), PRIMARY KEY (user_id, role_id) ); -- ✅ Foreign key with cascade options CREATE TABLE comments ( id INT PRIMARY KEY, post_id INT NOT NULL REFERENCES posts(id) ON DELETE CASCADE, -- Delete comments when post deleted user_id INT NOT NULL REFERENCES users(id) ON DELETE SET NULL, -- Keep comment, nullify user content TEXT NOT NULL );
3. Indexes for Performance
sql
-- ✅ Single column index (frequently queried)
CREATE INDEX idx_users_email ON users(email);
-- ✅ Composite index (multi-column queries)
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, created_at DESC);
-- ✅ Partial index (conditional)
CREATE INDEX idx_active_users
ON users(email) WHERE status = 'active';
-- ✅ Full-text search index
CREATE INDEX idx_posts_search
ON posts USING GIN(to_tsvector('english', title || ' ' || content));
-- ✅ Unique index (enforce constraint + speed lookups)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
-- ❌ Over-indexing (slows writes)
-- Don't index every column blindly
-- Indexes cost disk space and slow INSERT/UPDATE/DELETE
Index Strategy:
code
Index when: ✓ Frequently used in WHERE clauses ✓ Used in JOIN conditions ✓ Used in ORDER BY / GROUP BY ✓ High cardinality (many unique values) Don't index when: ✗ Small tables (< 1000 rows) ✗ Low cardinality (e.g., boolean, status with 2-3 values) ✗ Frequently updated columns (unless reads >> writes) ✗ Already covered by composite index
4. Relationships
sql
-- ✅ One-to-Many: User has many posts CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100)); CREATE TABLE posts ( id INT PRIMARY KEY, user_id INT NOT NULL REFERENCES users(id), title VARCHAR(200) ); -- ✅ Many-to-Many: Users ↔ Roles (junction table) CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100)); CREATE TABLE roles (id INT PRIMARY KEY, name VARCHAR(50)); CREATE TABLE user_roles ( user_id INT REFERENCES users(id), role_id INT REFERENCES roles(id), PRIMARY KEY (user_id, role_id) ); -- ✅ One-to-One: User has one profile CREATE TABLE users (id INT PRIMARY KEY, email VARCHAR(255)); CREATE TABLE profiles ( id INT PRIMARY KEY, user_id INT UNIQUE NOT NULL REFERENCES users(id), -- UNIQUE enforces 1:1 bio TEXT, avatar_url VARCHAR(500) ); -- ✅ Self-referencing: Tree structure CREATE TABLE categories ( id INT PRIMARY KEY, name VARCHAR(100), parent_id INT REFERENCES categories(id) -- Points to itself );
5. Data Types
sql
-- ✅ Choose appropriate types CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(200) NOT NULL, -- Variable length string description TEXT, -- Long text price DECIMAL(10,2) NOT NULL, -- Exact precision for money stock INT DEFAULT 0, -- Whole number weight FLOAT, -- Approximate number is_active BOOLEAN DEFAULT true, -- True/false created_at TIMESTAMP DEFAULT NOW(), -- Date + time metadata JSONB, -- JSON data (PostgreSQL) tags TEXT[] -- Array (PostgreSQL) ); -- ❌ Wrong types -- price FLOAT - ❌ Floating point errors for money -- created_at VARCHAR(50) - ❌ Use proper datetime types -- status INT - ❌ Use ENUM or VARCHAR for readability
SQL vs NoSQL Choice
When to Use SQL (PostgreSQL, MySQL)
code
✓ Complex relationships and joins ✓ ACID transactions required ✓ Structured, predictable data ✓ Complex queries with aggregations ✓ Strong consistency needed ✓ Examples: Financial systems, e-commerce, CRM
When to Use NoSQL
Document Stores (MongoDB, Firestore):
code
✓ Flexible, evolving schema ✓ Nested/hierarchical data ✓ Rapid prototyping ✓ Document-centric access patterns ✓ Examples: CMS, catalogs, user profiles
Key-Value Stores (Redis, DynamoDB):
code
✓ Simple key lookups ✓ Caching layer ✓ Session storage ✓ Rate limiting counters ✓ Examples: Cache, sessions, real-time leaderboards
Time-Series (InfluxDB, TimescaleDB):
code
✓ Time-stamped data ✓ High write throughput ✓ Aggregations over time ✓ Examples: Metrics, logs, IoT sensors
Query Optimization
1. Identify Slow Queries
sql
-- PostgreSQL - explain query plan EXPLAIN ANALYZE SELECT u.name, COUNT(p.id) as post_count FROM users u LEFT JOIN posts p ON p.user_id = u.id GROUP BY u.id, u.name; /* Output shows: - Seq Scan (bad) vs Index Scan (good) - Execution time - Rows scanned vs returned */
2. Avoid N+1 Queries
typescript
// ❌ N+1 Problem - 1 query + N queries for each user
const users = await db.query('SELECT * FROM users');
for (const user of users) {
const posts = await db.query(
'SELECT * FROM posts WHERE user_id = ?',
[user.id]
); // N queries!
}
// ✅ Single query with JOIN
const results = await db.query(`
SELECT
u.id, u.name, u.email,
p.id as post_id, p.title as post_title
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
`);
// ✅ Using Prisma ORM with eager loading
const users = await prisma.user.findMany({
include: {
posts: true // Efficiently loads related posts
}
});
3. Use Efficient Joins
sql
-- ✅ INNER JOIN - only matching rows SELECT u.name, p.title FROM users u INNER JOIN posts p ON p.user_id = u.id; -- ✅ LEFT JOIN - all users, even without posts SELECT u.name, COUNT(p.id) as post_count FROM users u LEFT JOIN posts p ON p.user_id = u.id GROUP BY u.id, u.name; -- ❌ Avoid subqueries in SELECT when possible SELECT u.name, (SELECT COUNT(*) FROM posts WHERE user_id = u.id) as post_count -- Slow! FROM users u; -- ✅ Better: use JOIN SELECT u.name, COUNT(p.id) as post_count FROM users u LEFT JOIN posts p ON p.user_id = u.id GROUP BY u.id, u.name;
4. Pagination
sql
-- ✅ Offset pagination (simple but slower for deep pages) SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 40; -- Page 3 -- ✅ Cursor pagination (faster for deep pages) SELECT * FROM posts WHERE created_at < '2024-01-01 12:00:00' ORDER BY created_at DESC LIMIT 20;
5. Aggregations
sql
-- ✅ Efficient counting SELECT COUNT(*) FROM users WHERE status = 'active'; -- ✅ Grouping with aggregates SELECT category, COUNT(*) as total, AVG(price) as avg_price, MAX(price) as max_price FROM products GROUP BY category HAVING COUNT(*) > 10; -- HAVING filters groups, WHERE filters rows -- ✅ Window functions (PostgreSQL) SELECT name, salary, AVG(salary) OVER (PARTITION BY department) as dept_avg_salary FROM employees;
Migration Strategies
1. Schema Migrations with Prisma
typescript
// schema.prisma
model User {
id Int @id @default(autoincrement())
email String @unique
name String?
posts Post[]
createdAt DateTime @default(now())
}
model Post {
id Int @id @default(autoincrement())
title String
content String?
published Boolean @default(false)
author User @relation(fields: [authorId], references: [id])
authorId Int
createdAt DateTime @default(now())
@@index([authorId])
}
bash
# Generate migration npx prisma migrate dev --name add_post_table # Apply to production npx prisma migrate deploy
2. Safe Migration Patterns
sql
-- ✅ Add column (safe - doesn't lock table) ALTER TABLE users ADD COLUMN phone VARCHAR(20); -- ✅ Add index concurrently (PostgreSQL) CREATE INDEX CONCURRENTLY idx_users_email ON users(email); -- ❌ Risky - locks entire table ALTER TABLE users ALTER COLUMN email TYPE TEXT; -- ✅ Better: Multi-step migration -- Step 1: Add new column ALTER TABLE users ADD COLUMN email_new TEXT; -- Step 2: Backfill data UPDATE users SET email_new = email; -- Step 3: Swap columns (in separate deployment) ALTER TABLE users DROP COLUMN email; ALTER TABLE users RENAME COLUMN email_new TO email;
3. Data Migrations
typescript
// migration-001-backfill-slugs.ts
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
async function up() {
const posts = await prisma.post.findMany({
where: { slug: null }
});
for (const post of posts) {
await prisma.post.update({
where: { id: post.id },
data: { slug: generateSlug(post.title) }
});
}
}
function generateSlug(title: string) {
return title.toLowerCase().replace(/\s+/g, '-');
}
Advanced Patterns
1. Soft Deletes
sql
-- ✅ Keep deleted records CREATE TABLE posts ( id INT PRIMARY KEY, title VARCHAR(200), deleted_at TIMESTAMP NULL -- NULL = not deleted ); -- Query only active records SELECT * FROM posts WHERE deleted_at IS NULL; -- "Delete" record (soft delete) UPDATE posts SET deleted_at = NOW() WHERE id = 123; -- Create view for convenience CREATE VIEW active_posts AS SELECT * FROM posts WHERE deleted_at IS NULL;
2. Audit Trails
sql
-- ✅ Track all changes CREATE TABLE users_audit ( id SERIAL PRIMARY KEY, user_id INT NOT NULL, field_changed VARCHAR(50), old_value TEXT, new_value TEXT, changed_by INT REFERENCES users(id), changed_at TIMESTAMP DEFAULT NOW() ); -- Trigger to auto-populate CREATE OR REPLACE FUNCTION audit_user_changes() RETURNS TRIGGER AS $$ BEGIN INSERT INTO users_audit (user_id, field_changed, old_value, new_value) VALUES (OLD.id, 'email', OLD.email, NEW.email); RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER user_email_audit AFTER UPDATE OF email ON users FOR EACH ROW EXECUTE FUNCTION audit_user_changes();
3. Polymorphic Associations
sql
-- ✅ Comments on multiple resource types CREATE TABLE comments ( id INT PRIMARY KEY, commentable_type VARCHAR(50) NOT NULL, -- 'Post', 'Photo', 'Video' commentable_id INT NOT NULL, content TEXT NOT NULL, -- Composite index for polymorphic lookup INDEX idx_commentable (commentable_type, commentable_id) ); -- Query comments for a post SELECT * FROM comments WHERE commentable_type = 'Post' AND commentable_id = 123;
Database Checklist
code
Schema Design: □ Properly normalized (3NF minimum) □ Primary keys on all tables □ Foreign keys with appropriate CASCADE/SET NULL □ Check constraints for data validation □ NOT NULL constraints where appropriate □ UNIQUE constraints for business rules Indexes: □ Foreign keys indexed □ WHERE clause columns indexed □ JOIN columns indexed □ Composite indexes for multi-column queries □ Partial indexes for filtered queries □ No redundant indexes Performance: □ EXPLAIN ANALYZE on slow queries □ N+1 queries eliminated □ Proper use of JOINs vs subqueries □ Pagination implemented □ Connection pooling configured Migrations: □ All changes in migration files □ Rollback capability tested □ Safe column additions/removals □ Data backfilled before constraints □ Zero-downtime migrations for production Monitoring: □ Slow query log enabled □ Connection pool metrics tracked □ Disk usage monitored □ Replication lag checked (if applicable) □ Regular VACUUM (PostgreSQL)
Resources
Remember: Good database design is the foundation of scalable applications. Invest time upfront in proper schema design and indexing.