Database Modelling
Description
Design database schemas that balance normalisation, query performance, and evolvability — regardless of whether you use a relational database (PostgreSQL, SQL Server, MySQL), a document store (MongoDB, CosmosDB), or a combination. This skill covers the universal data modelling principles: entity identification, relationship mapping, indexing strategy, migration discipline, and the trade-offs between normalisation and denormalisation.
When To Use
- •Starting a new feature that requires persistent data
- •Designing the initial schema for a new service or application
- •Adding a relationship between existing entities
- •Queries are slow and you suspect missing or incorrect indexes
- •Evaluating whether to normalise or denormalise a data structure
- •Planning a schema migration on a live database
Prerequisites
- •Basic SQL knowledge (SELECT, JOIN, WHERE, CREATE TABLE)
- •Understanding of primary keys, foreign keys, and constraints
- •Familiarity with at least one database system
Instructions
1. Identify Entities and Relationships
Start with the domain, not the database. Map out:
- •Entities — the nouns: User, Order, Product, Invoice
- •Attributes — the data each entity holds
- •Relationships — how entities relate: a User has many Orders, an Order belongs to a User, an Order has many OrderItems
Classify relationships:
| Type | Example | Implementation |
|---|---|---|
| One-to-One | User → Profile | FK on either side, or same table |
| One-to-Many | User → Orders | FK on the "many" side |
| Many-to-Many | Order → Products | Junction/join table (OrderItems) |
2. Normalise First, Denormalise with Purpose
Start normalised (3NF) — every fact stored once:
-- ✅ Normalised: customer data in one place CREATE TABLE customers ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email TEXT NOT NULL UNIQUE, name TEXT NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT now() ); CREATE TABLE orders ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), customer_id UUID NOT NULL REFERENCES customers(id), status TEXT NOT NULL DEFAULT 'pending', total NUMERIC(12,2) NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT now() );
Denormalise only when:
- •A read-heavy query joins 5+ tables and performance is proven insufficient
- •You need a materialised view or cache table for reporting
- •You're using a document store where embedding is the idiomatic pattern
Always document why you denormalised. Future developers will see the duplication and try to "fix" it.
3. Choose the Right Primary Keys
| Strategy | Pros | Cons |
|---|---|---|
| UUID (v4/v7) | Globally unique, no coordination needed, safe to expose | Larger, poor index locality (v4). Use UUIDv7 for time-ordered. |
| Auto-increment | Compact, fast inserts, natural ordering | Leaks count, requires coordination in distributed systems |
| ULID / KSUID | Time-ordered + unique, compacts well | Less ecosystem support |
Recommendation: UUIDv7 or ULID for new systems. They combine global uniqueness with time-ordering for index efficiency. Never expose auto-increment IDs externally.
4. Design Indexes Intentionally
Indexes are the difference between a 5ms query and a 5s query:
-- Index columns that appear in WHERE, JOIN, and ORDER BY CREATE INDEX idx_orders_customer_id ON orders(customer_id); CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC); -- Partial index for active records only CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending'; -- Covering index to avoid table lookups CREATE INDEX idx_orders_summary ON orders(customer_id, status, total);
Indexing rules of thumb:
- •Every foreign key should be indexed. Without it,
JOINandON DELETE CASCADEdo full table scans. - •Composite indexes: Put high-cardinality columns first. An index on
(status, date)works forWHERE status = 'active' AND date > '...'but not forWHERE date > '...'alone. - •Don't over-index. Each index slows writes. Benchmark before adding.
- •Use
EXPLAIN ANALYZEto verify your indexes are actually being used.
5. Plan for Schema Evolution
Your schema will change. Plan for safe migrations:
- •Always use migration files. Never modify a database manually in production. Use a migration tool (Flyway, Liquibase, Prisma Migrate, EF Migrations, Alembic).
- •Make migrations backwards-compatible when possible — the old code should still work against the new schema during deployment.
Safe migration patterns:
| Change | Safe approach |
|---|---|
| Add column | ALTER TABLE ADD COLUMN ... DEFAULT ... (not null requires default) |
| Remove column | Stop reading it first → deploy → then drop the column |
| Rename column | Add new column → backfill → update code → drop old column |
| Change type | Add new column → backfill with cast → update code → drop old |
| Add constraint | Add as NOT VALID → validate separately (avoids full table lock) |
- •Never rename or drop a column in the same deploy that changes the code. The old code is still running during rollout.
6. Handle Soft Deletes and Temporal Data
Decide early whether records are hard-deleted or soft-deleted:
-- Soft delete pattern ALTER TABLE orders ADD COLUMN deleted_at TIMESTAMPTZ; -- All queries must filter: WHERE deleted_at IS NULL -- Consider a view for convenience: CREATE VIEW active_orders AS SELECT * FROM orders WHERE deleted_at IS NULL;
Trade-offs:
- •Soft delete: Keeps audit trail, enables undo, but complicates every query and unique constraints
- •Hard delete: Simpler queries, but data is gone (use an audit/event log instead for history)
For time-sensitive data (prices, configurations), consider temporal patterns (valid_from/valid_to) or event sourcing.
Best Practices
- •Use constraints aggressively.
NOT NULL,UNIQUE,CHECK, andFOREIGN KEYconstraints catch bugs at the database level before they reach your application. - •Store timestamps in UTC with timezone (
TIMESTAMPTZ). Convert to local time at the presentation layer. - •Use appropriate data types.
NUMERIC/DECIMALfor money (notFLOAT),TEXToverVARCHAR(n)in PostgreSQL,JSONBfor semi-structured data when schema flexibility is needed. - •Name things consistently.
snake_casefor tables and columns, singular or plural table names (pick one), descriptive foreign keys (customer_idnotcid). - •Add
created_atandupdated_atto every table. You'll need them for debugging, auditing, and ordering. - •Avoid
SELECT *in application code. Specify columns to benefit from covering indexes and reduce over-fetching.
Common Pitfalls
- •Missing indexes on foreign keys. This is the #1 cause of slow queries in production. Every FK needs an index.
- •Storing computed values without a sync strategy. An
order_countcolumn on the user table is stale the moment a new order is inserted unless you maintain it with triggers or application logic. - •Polymorphic associations. A
commentable_type+commentable_idpattern breaks foreign key constraints. Prefer separate join tables or table inheritance. - •Premature denormalisation. "We might need it to be fast" is not a measured performance problem. Start normalised and optimise with evidence.
- •N+1 queries at the application layer. Not a schema problem per se, but poor schema design (missing relationships, no eager-load hints) makes N+1 more likely. Know your ORM's loading behaviour.
- •Running migrations without a rollback plan. Every migration should be reversible. Test the rollback before deploying.