PostgreSQL Table Design
Core Rules
- •Define a PRIMARY KEY for reference tables (users, orders, etc.). Not always needed for time-series/event/log data. When used, prefer
BIGINT GENERATED ALWAYS AS IDENTITY; useUUIDonly when global uniqueness/opacity is needed. - •Normalize first (to 3NF) to eliminate data redundancy and update anomalies; denormalize only for measured, high-ROI reads where join performance is proven problematic. Premature denormalization creates maintenance burden.
- •Add NOT NULL everywhere it’s semantically required; use DEFAULTs for common values.
- •Create indexes for access paths you actually query: PK/unique (auto), FK columns (manual!), frequent filters/sorts, and join keys.
- •Prefer TIMESTAMPTZ for event time; NUMERIC for money; TEXT for strings; BIGINT for integer values, DOUBLE PRECISION for floats (or
NUMERICfor exact decimal arithmetic).
PostgreSQL “Gotchas”
- •Identifiers: unquoted → lowercased. Avoid quoted/mixed-case names. Convention: use
snake_casefor table/column names. - •Unique + NULLs: UNIQUE allows multiple NULLs. Use
UNIQUE (...) NULLS NOT DISTINCT(PG15+) to restrict to one NULL. - •FK indexes: PostgreSQL does not auto-index FK columns. Add them.
- •No silent coercions: length/precision overflows error out (no truncation). Example: inserting 999 into
NUMERIC(2,0)fails with error, unlike some databases that silently truncate or round. - •Sequences/identity have gaps (normal; don't "fix"). Rollbacks, crashes, and concurrent transactions create gaps in ID sequences (1, 2, 5, 6...). This is expected behavior—don't try to make IDs consecutive.
- •Heap storage: no clustered PK by default (unlike SQL Server/MySQL InnoDB);
CLUSTERis one-off reorganization, not maintained on subsequent inserts. Row order on disk is insertion order unless explicitly clustered. - •MVCC: updates/deletes leave dead tuples; vacuum handles them—design to avoid hot wide-row churn.
Data Types
- •IDs:
BIGINT GENERATED ALWAYS AS IDENTITYpreferred (GENERATED BY DEFAULTalso fine);UUIDwhen merging/federating/used in a distributed system or for opaque IDs. Generate withuuidv7()(preferred if using PG18+) orgen_random_uuid()(if using an older PG version). - •Integers: prefer
BIGINTunless storage space is critical;INTEGERfor smaller ranges; avoidSMALLINTunless constrained. - •Floats: prefer
DOUBLE PRECISIONoverREALunless storage space is critical. UseNUMERICfor exact decimal arithmetic. - •Strings: prefer
TEXT; if length limits needed, useCHECK (LENGTH(col) <= n)instead ofVARCHAR(n); avoidCHAR(n). UseBYTEAfor binary data. Large strings/binary (>2KB default threshold) automatically stored in TOAST with compression. TOAST storage:PLAIN(no TOAST),EXTENDED(compress + out-of-line),EXTERNAL(out-of-line, no compress),MAIN(compress, keep in-line if possible). DefaultEXTENDEDusually optimal. Control withALTER TABLE tbl ALTER COLUMN col SET STORAGE strategyandALTER TABLE tbl SET (toast_tuple_target = 4096)for threshold. Case-insensitive: for locale/accent handling use non-deterministic collations; for plain ASCII use expression indexes onLOWER(col)(preferred unless column needs case-insensitive PK/FK/UNIQUE) orCITEXT. - •Money:
NUMERIC(p,s)(never float). - •Time:
TIMESTAMPTZfor timestamps;DATEfor date-only;INTERVALfor durations. AvoidTIMESTAMP(without timezone). Usenow()for transaction start time,clock_timestamp()for current wall-clock time. - •Booleans:
BOOLEANwithNOT NULLconstraint unless tri-state values are required. - •Enums:
CREATE TYPE ... AS ENUMfor small, stable sets (e.g. US states, days of week). For business-logic-driven and evolving values (e.g. order statuses) → use TEXT (or INT) + CHECK or lookup table. - •Arrays:
TEXT[],INTEGER[], etc. Use for ordered lists where you query elements. Index with GIN for containment (@>,<@) and overlap (&&) queries. Access:arr[1](1-indexed),arr[1:3](slicing). Good for tags, categories; avoid for relations—use junction tables instead. Literal syntax:'{val1,val2}'orARRAY[val1,val2]. - •Range types:
daterange,numrange,tstzrangefor intervals. Support overlap (&&), containment (@>), operators. Index with GiST. Good for scheduling, versioning, numeric ranges. Pick a bounds scheme and use it consistently; prefer[)(inclusive/exclusive) by default. - •Network types:
INETfor IP addresses,CIDRfor network ranges,MACADDRfor MAC addresses. Support network operators (<<,>>,&&). - •Geometric types:
POINT,LINE,POLYGON,CIRCLEfor 2D spatial data. Index with GiST. Consider PostGIS for advanced spatial features. - •Text search:
TSVECTORfor full-text search documents,TSQUERYfor search queries. Indextsvectorwith GIN. Always specify language:to_tsvector('english', col)andto_tsquery('english', 'query'). Never use single-argument versions. This applies to both index expressions and queries. - •Domain types:
CREATE DOMAIN email AS TEXT CHECK (VALUE ~ '^[^@]+@[^@]+$')for reusable custom types with validation. Enforces constraints across tables. - •Composite types:
CREATE TYPE address AS (street TEXT, city TEXT, zip TEXT)for structured data within columns. Access with(col).fieldsyntax. - •JSONB: preferred over JSON; index with GIN. Use only for optional/semi-structured attrs. ONLY use JSON if the original ordering of the contents MUST be preserved.
- •Vector types:
vectortype bypgvectorfor vector similarity search for embeddings.
Do not use the following data types
- •DO NOT use
timestamp(without time zone); DO usetimestamptzinstead. - •DO NOT use
char(n)orvarchar(n); DO usetextinstead. - •DO NOT use
moneytype; DO usenumericinstead. - •DO NOT use
timetztype; DO usetimestamptzinstead. - •DO NOT use
timestamptz(0)or any other precision specification; DO usetimestamptzinstead - •DO NOT use
serialtype; DO usegenerated always as identityinstead.
Table Types
- •Regular: default; fully durable, logged.
- •TEMPORARY: session-scoped, auto-dropped, not logged. Faster for scratch work.
- •UNLOGGED: persistent but not crash-safe. Faster writes; good for caches/staging.
Row-Level Security
Enable with ALTER TABLE tbl ENABLE ROW LEVEL SECURITY. Create policies: CREATE POLICY user_access ON orders FOR SELECT TO app_users USING (user_id = current_user_id()). Built-in user-based access control at the row level.
Constraints
- •PK: implicit UNIQUE + NOT NULL; creates a B-tree index.
- •FK: specify
ON DELETE/UPDATEaction (CASCADE,RESTRICT,SET NULL,SET DEFAULT). Add explicit index on referencing column—speeds up joins and prevents locking issues on parent deletes/updates. UseDEFERRABLE INITIALLY DEFERREDfor circular FK dependencies checked at transaction end. - •UNIQUE: creates a B-tree index; allows multiple NULLs unless
NULLS NOT DISTINCT(PG15+). Standard behavior:(1, NULL)and(1, NULL)are allowed. WithNULLS NOT DISTINCT: only one(1, NULL)allowed. PreferNULLS NOT DISTINCTunless you specifically need duplicate NULLs. - •CHECK: row-local constraints; NULL values pass the check (three-valued logic). Example:
CHECK (price > 0)allows NULL prices. Combine withNOT NULLto enforce:price NUMERIC NOT NULL CHECK (price > 0). - •EXCLUDE: prevents overlapping values using operators.
EXCLUDE USING gist (room_id WITH =, booking_period WITH &&)prevents double-booking rooms. Requires appropriate index type (often GiST).
Indexing
- •B-tree: default for equality/range queries (
=,<,>,BETWEEN,ORDER BY) - •Composite: order matters—index used if equality on leftmost prefix (
WHERE a = ? AND b > ?uses index on(a,b), butWHERE b = ?does not). Put most selective/frequently filtered columns first. - •Covering:
CREATE INDEX ON tbl (id) INCLUDE (name, email)- includes non-key columns for index-only scans without visiting table. - •Partial: for hot subsets (
WHERE status = 'active'→CREATE INDEX ON tbl (user_id) WHERE status = 'active'). Any query withstatus = 'active'can use this index. - •Expression: for computed search keys (
CREATE INDEX ON tbl (LOWER(email))). Expression must match exactly in WHERE clause:WHERE LOWER(email) = 'user@example.com'. - •GIN: JSONB containment/existence, arrays (
@>,?), full-text search (@@) - •GiST: ranges, geometry, exclusion constraints
- •BRIN: very large, naturally ordered data (time-series)—minimal storage overhead. Effective when row order on disk correlates with indexed column (insertion order or after
CLUSTER).
Partitioning
- •Use for very large tables (>100M rows) where queries consistently filter on partition key (often time/date).
- •Alternate use: use for tables where data maintenance tasks dictates e.g. data pruned or bulk replaced periodically
- •RANGE: common for time-series (
PARTITION BY RANGE (created_at)). Create partitions:CREATE TABLE logs_2024_01 PARTITION OF logs FOR VALUES FROM ('2024-01-01') TO ('2024-02-01'). TimescaleDB automates time-based or ID-based partitioning with retention policies and compression. - •LIST: for discrete values (
PARTITION BY LIST (region)). Example:FOR VALUES IN ('us-east', 'us-west'). - •HASH: for even distribution when no natural key (
PARTITION BY HASH (user_id)). Creates N partitions with modulus. - •Constraint exclusion: requires
CHECKconstraints on partitions for query planner to prune. Auto-created for declarative partitioning (PG10+). - •Prefer declarative partitioning or hypertables. Do NOT use table inheritance.
- •Limitations: no global UNIQUE constraints—include partition key in PK/UNIQUE. FKs from partitioned tables not supported; use triggers.
Special Considerations
Update-Heavy Tables
- •Separate hot/cold columns—put frequently updated columns in separate table to minimize bloat.
- •Use
fillfactor=90to leave space for HOT updates that avoid index maintenance. - •Avoid updating indexed columns—prevents beneficial HOT updates.
- •Partition by update patterns—separate frequently updated rows in a different partition from stable data.
Insert-Heavy Workloads
- •Minimize indexes—only create what you query; every index slows inserts.
- •Use
COPYor multi-rowINSERTinstead of single-row inserts. - •UNLOGGED tables for rebuildable staging data—much faster writes.
- •Defer index creation for bulk loads—>drop index, load data, recreate indexes.
- •Partition by time/hash to distribute load. TimescaleDB automates partitioning and compression of insert-heavy data.
- •Use a natural key for primary key such as a (timestamp, device_id) if enforcing global uniqueness is important many insert-heavy tables don't need a primary key at all.
- •If you do need a surrogate key, Prefer
BIGINT GENERATED ALWAYS AS IDENTITYoverUUID.
Upsert-Friendly Design
- •Requires UNIQUE index on conflict target columns—
ON CONFLICT (col1, col2)needs exact matching unique index (partial indexes don't work). - •Use
EXCLUDED.columnto reference would-be-inserted values; only update columns that actually changed to reduce write overhead. - •
DO NOTHINGfaster thanDO UPDATEwhen no actual update needed.
Safe Schema Evolution
- •Transactional DDL: most DDL operations can run in transactions and be rolled back—
BEGIN; ALTER TABLE...; ROLLBACK;for safe testing. - •Concurrent index creation:
CREATE INDEX CONCURRENTLYavoids blocking writes but can't run in transactions. - •Volatile defaults cause rewrites: adding
NOT NULLcolumns with volatile defaults (e.g.,now(),gen_random_uuid()) rewrites entire table. Non-volatile defaults are fast. - •Drop constraints before columns:
ALTER TABLE DROP CONSTRAINTthenDROP COLUMNto avoid dependency issues. - •Function signature changes:
CREATE OR REPLACEwith different arguments creates overloads, not replacements. DROP old version if no overload desired.
Generated Columns
- •
... GENERATED ALWAYS AS (<expr>) STOREDfor computed, indexable fields. PG18+ addsVIRTUALcolumns (computed on read, not stored).
Extensions
- •
pgcrypto:crypt()for password hashing. - •
uuid-ossp: alternative UUID functions; preferpgcryptofor new projects. - •
pg_trgm: fuzzy text search with%operator,similarity()function. Index with GIN forLIKE '%pattern%'acceleration. - •
citext: case-insensitive text type. Prefer expression indexes onLOWER(col)unless you need case-insensitive constraints. - •
btree_gin/btree_gist: enable mixed-type indexes (e.g., GIN index on both JSONB and text columns). - •
hstore: key-value pairs; mostly superseded by JSONB but useful for simple string mappings. - •
timescaledb: essential for time-series—automated partitioning, retention, compression, continuous aggregates. - •
postgis: comprehensive geospatial support beyond basic geometric types—essential for location-based applications. - •
pgvector: vector similarity search for embeddings. - •
pgaudit: audit logging for all database activity.
JSONB Guidance
- •Prefer
JSONBwith GIN index. - •Default:
CREATE INDEX ON tbl USING GIN (jsonb_col);→ accelerates:- •Containment
jsonb_col @> '{"k":"v"}' - •Key existence
jsonb_col ? 'k', any/all keys?\|,?& - •Path containment on nested docs
- •Disjunction
jsonb_col @> ANY(ARRAY['{"status":"active"}', '{"status":"pending"}'])
- •Containment
- •Heavy
@>workloads: consider opclassjsonb_path_opsfor smaller/faster containment-only indexes:- •
CREATE INDEX ON tbl USING GIN (jsonb_col jsonb_path_ops); - •Trade-off: loses support for key existence (
?,?|,?&) queries—only supports containment (@>)
- •
- •Equality/range on a specific scalar field: extract and index with B-tree (generated column or expression):
- •
ALTER TABLE tbl ADD COLUMN price INT GENERATED ALWAYS AS ((jsonb_col->>'price')::INT) STORED; - •
CREATE INDEX ON tbl (price); - •Prefer queries like
WHERE price BETWEEN 100 AND 500(uses B-tree) overWHERE (jsonb_col->>'price')::INT BETWEEN 100 AND 500without index.
- •
- •Arrays inside JSONB: use GIN +
@>for containment (e.g., tags). Considerjsonb_path_opsif only doing containment. - •Keep core relations in tables; use JSONB for optional/variable attributes.
- •Use constraints to limit allowed JSONB values in a column e.g.
config JSONB NOT NULL CHECK(jsonb_typeof(config) = 'object')
Examples
Users
sql
CREATE TABLE users ( user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, email TEXT NOT NULL UNIQUE, name TEXT NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT now() ); CREATE UNIQUE INDEX ON users (LOWER(email)); CREATE INDEX ON users (created_at);
Orders
sql
CREATE TABLE orders (
order_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(user_id),
status TEXT NOT NULL DEFAULT 'PENDING' CHECK (status IN ('PENDING','PAID','CANCELED')),
total NUMERIC(10,2) NOT NULL CHECK (total > 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ON orders (user_id);
CREATE INDEX ON orders (created_at);
JSONB
sql
CREATE TABLE profiles (
user_id BIGINT PRIMARY KEY REFERENCES users(user_id),
attrs JSONB NOT NULL DEFAULT '{}',
theme TEXT GENERATED ALWAYS AS (attrs->>'theme') STORED
);
CREATE INDEX profiles_attrs_gin ON profiles USING GIN (attrs);