PostgreSQL + TimescaleDB Skill
Purpose: Design and optimize PostgreSQL databases for time-series retail data
When to use: Database schema design, migrations, query optimization, time-series operations
Core Competencies
1. Time-Series Data Modeling (TimescaleDB)
Best Practice: Partition by time for retail transaction/inventory data
-- Create hypertable (automatically partitions by time)
CREATE TABLE transactions (
    transaction_id UUID NOT NULL,
    store_id UUID NOT NULL,
    product_id UUID NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    quantity INT NOT NULL,
    total_amount DECIMAL(10,2) NOT NULL,
    -- Unique constraints on a hypertable must include the partitioning column
    PRIMARY KEY (transaction_id, timestamp)
);
-- Convert to hypertable (CRITICAL for time-series performance)
SELECT create_hypertable('transactions', 'timestamp');
-- Retention policy (keep only last 2 years)
SELECT add_retention_policy('transactions', INTERVAL '2 years');
Why TimescaleDB:
- 10-100x faster queries on time-range filters
- Automatic partitioning (no manual PARTITION BY needed)
- Compression (70-95% storage savings on old data; see the sketch after this list)
- Continuous aggregates (pre-computed rollups)
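Compression is opt-in per hypertable and applied by a background policy. A minimal sketch, assuming TimescaleDB 2.x and that segmenting by store/product matches the query patterns:
-- Enable compression (segmentby columns are an assumption; tune to your queries)
ALTER TABLE transactions SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'store_id, product_id'
);
-- Compress chunks once they are older than 30 days
SELECT add_compression_policy('transactions', INTERVAL '30 days');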
When to use hypertables:
- ✅ Transactions (POS sales)
- ✅ Inventory levels (snapshots over time; see the sketch after this list)
- ✅ Forecasts (predictions over time)
- ❌ Products (relatively static, no time dimension)
- ❌ Stores (metadata, updated rarely)
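Inventory snapshots get the same treatment. A minimal sketch (the 7-day chunk interval is an assumption to tune against data volume; as with transactions, any PRIMARY KEY/UNIQUE constraint on the table must include the time column):
SELECT create_hypertable('inventory_levels', 'timestamp',
    chunk_time_interval => INTERVAL '7 days');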
2. Multi-Tenant Schema Design
Pattern: Single database, row-level security
-- All tables have customer_id
CREATE TABLE products (
product_id UUID PRIMARY KEY,
customer_id UUID NOT NULL, -- Tenant isolation
sku VARCHAR(100) NOT NULL,
name VARCHAR(255) NOT NULL,
-- ... other columns
UNIQUE(customer_id, sku) -- SKUs unique per customer
);
-- Row-level security (RLS)
ALTER TABLE products ENABLE ROW LEVEL SECURITY;
-- Policy: Users only see their customer's data
CREATE POLICY customer_isolation ON products
USING (customer_id = current_setting('app.current_customer_id')::UUID);
-- In application, set customer context per request
SET LOCAL app.current_customer_id = '<customer_uuid>';
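Caveat: table owners and superusers bypass RLS by default, so if the application connects as the owning role, force the policy as well:
-- Apply RLS even to the table owner
ALTER TABLE products FORCE ROW LEVEL SECURITY;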
Why this approach:
- ✅ Single deployment (easier ops)
- ✅ Cross-customer analytics possible
- ✅ Easier to scale vertically
- ❌ More complex security (must get RLS right)
Alternative: Separate database per customer
- ✅ Perfect isolation
- ✅ Simpler security
- ❌ Harder to maintain (100s of databases)
- ❌ No cross-customer analytics
Decision: Use row-level security for MVP, consider database-per-tenant at scale
3. Indexing Strategy
Rule of thumb: Index foreign keys, time columns, and frequent WHERE clauses
-- Foreign key indexes (ALWAYS)
CREATE INDEX idx_transactions_store ON transactions(store_id);
CREATE INDEX idx_transactions_product ON transactions(product_id);
CREATE INDEX idx_transactions_customer ON transactions(customer_id);
-- Composite indexes (for common query patterns)
CREATE INDEX idx_transactions_store_time ON transactions(store_id, timestamp DESC);
CREATE INDEX idx_inventory_store_product_time ON inventory_levels(store_id, product_id, timestamp DESC);
-- Partial indexes (for filtered queries)
CREATE INDEX idx_alerts_open ON alerts(store_id, created_at DESC)
WHERE status = 'open';
-- GIN index (for JSONB columns)
CREATE INDEX idx_metadata_gin ON anomalies USING GIN(metadata);
Index cost:
- ✅ Faster SELECT queries (10-1000x)
- ❌ Slower INSERTs (5-10%)
- ❌ More disk space (20-30%)
When to add indexes:
- After identifying slow queries (EXPLAIN ANALYZE)
- Foreign keys (relationships)
- Common WHERE/JOIN clauses
- ORDER BY columns (with DESC if sorting descending)
When NOT to index:
- ❌ Columns with low cardinality (boolean, status with 2-3 values)
- ❌ Tables with <10K rows (full scan is fast enough)
- ❌ Columns rarely queried
4. Query Optimization
Pattern: Use EXPLAIN ANALYZE to find bottlenecks
-- Example slow query
EXPLAIN ANALYZE
SELECT
p.name,
SUM(t.quantity) AS total_sold
FROM transactions t
JOIN products p ON t.product_id = p.product_id
WHERE t.timestamp >= NOW() - INTERVAL '30 days'
GROUP BY p.name
ORDER BY total_sold DESC
LIMIT 10;
-- Look for:
-- ❌ Seq Scan (full table scan) - needs index
-- ❌ Hash Join feeding a small, selective result - check whether an index-backed plan is available
-- ✅ Index Scan - good!
-- ✅ Bitmap Index Scan - good for OR conditions
Common optimizations:
1. Partition pruning (TimescaleDB automatic):
-- Query with time filter - only scans relevant partitions
SELECT * FROM transactions
WHERE timestamp >= '2026-02-01' AND timestamp < '2026-03-01';
-- TimescaleDB automatically skips other months
2. Aggregation pushdown:
# BAD: Compute on the application side
results = db.execute("SELECT * FROM transactions WHERE ...")
total = sum(r['quantity'] for r in results)

# GOOD: Let the database aggregate
total = db.execute(
    "SELECT SUM(quantity) FROM transactions WHERE ..."
).scalar()
3. Limit early:
-- BAD: Sort all rows, then limit
SELECT * FROM products ORDER BY created_at DESC LIMIT 10;
-- GOOD: Use an index to get the top 10 directly
CREATE INDEX idx_products_created_desc ON products(created_at DESC);
-- Now the query uses an index scan and only fetches 10 rows
4. Continuous aggregates (pre-computed rollups):
-- Create materialized view (refreshed automatically)
CREATE MATERIALIZED VIEW daily_sales
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 day', timestamp) AS day,
store_id,
product_id,
SUM(quantity) AS quantity_sold,
SUM(total_amount) AS revenue
FROM transactions
GROUP BY day, store_id, product_id;
-- Refresh policy (every hour)
SELECT add_continuous_aggregate_policy('daily_sales',
start_offset => INTERVAL '1 month',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');
-- Queries on daily_sales now read pre-computed rollups, so they are fast
SELECT day, SUM(revenue) FROM daily_sales
WHERE day >= '2026-02-01'
GROUP BY day;
5. Data Integrity Constraints
Use constraints to enforce business rules at database level
CREATE TABLE inventory_levels (
id BIGSERIAL PRIMARY KEY,
store_id UUID REFERENCES stores(store_id),
product_id UUID REFERENCES products(product_id),
timestamp TIMESTAMP NOT NULL,
quantity_on_hand INT NOT NULL,
quantity_allocated INT DEFAULT 0,
-- Constraints
CHECK (quantity_on_hand >= 0), -- Can't be negative
CHECK (quantity_allocated >= 0),
CHECK (quantity_allocated <= quantity_on_hand), -- Can't allocate more than on-hand
-- Computed column (auto-calculated, stored)
quantity_available INT GENERATED ALWAYS AS
(quantity_on_hand - quantity_allocated) STORED
);
-- Unique constraints
CREATE UNIQUE INDEX idx_reorder_points_current ON reorder_points(store_id, product_id)
WHERE effective_to IS NULL; -- Only one "current" reorder point per store/product
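The partial index supports an append-only history: to change a reorder point, close out the current row and insert a new one. A hypothetical sketch (reorder_quantity and effective_from are illustrative column names, not confirmed schema):
BEGIN;
-- Close out the current reorder point
UPDATE reorder_points
SET effective_to = NOW()
WHERE store_id = '<store_uuid>'
  AND product_id = '<product_uuid>'
  AND effective_to IS NULL;
-- Insert the new current row
INSERT INTO reorder_points (store_id, product_id, reorder_quantity, effective_from)
VALUES ('<store_uuid>', '<product_uuid>', 24, NOW());
COMMIT;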
Why constraints matter:
- •✅ Prevent bad data at source
- •✅ Self-documenting (schema shows business rules)
- •✅ Faster than application validation (database enforces)
- •❌ Harder to change later (migrations required)
6. Migration Best Practices
Tool: Alembic (Python) for version-controlled migrations
# migrations/versions/001_create_stores_table.py
from alembic import op
import sqlalchemy as sa

def upgrade():
    op.create_table(
        'stores',
        sa.Column('store_id', sa.UUID(), primary_key=True),
        sa.Column('customer_id', sa.UUID(), nullable=False),
        sa.Column('name', sa.String(255), nullable=False),
        sa.Column('created_at', sa.TIMESTAMP(), server_default=sa.func.now())
    )
    op.create_index('idx_stores_customer', 'stores', ['customer_id'])

def downgrade():
    op.drop_table('stores')
Migration rules:
- Always reversible (implement downgrade())
- Idempotent (can run multiple times safely; see the sketch after this list)
- Test on a copy of production data before deploying
- Small incremental changes (not 50 tables in one migration)
- Add columns with defaults (avoids table rewrite)
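For the idempotency rule, prefer guarded DDL so a partially applied migration can be re-run safely. A minimal sketch using standard PostgreSQL options:
-- Safe to re-run: IF NOT EXISTS makes these no-ops on a second pass
CREATE INDEX IF NOT EXISTS idx_stores_customer ON stores (customer_id);
ALTER TABLE products ADD COLUMN IF NOT EXISTS shelf_life_days INT;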
Safe column addition (no downtime):
-- Step 1: Add column (nullable, with default)
ALTER TABLE products ADD COLUMN shelf_life_days INT DEFAULT NULL;
-- Step 2: Backfill data (in batches, off-peak hours)
UPDATE products SET shelf_life_days = 7 WHERE category = 'produce';
UPDATE products SET shelf_life_days = 30 WHERE category = 'dairy';
-- Step 3: Add NOT NULL constraint (after backfill complete)
ALTER TABLE products ALTER COLUMN shelf_life_days SET NOT NULL;
Unsafe changes (require downtime or more care):
- Dropping columns (data loss)
- Changing column types (requires a table rewrite; see the sketch after this list)
- Adding NOT NULL without a default (fails on existing rows)
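If a type change is unavoidable, expand-and-contract is safer than an in-place ALTER. A hypothetical sketch (price/price_cents are illustrative columns, not part of the ShelfOps schema):
-- 1. Add the new column alongside the old one
ALTER TABLE products ADD COLUMN price_cents BIGINT;
-- 2. Backfill in batches during off-peak hours
UPDATE products SET price_cents = (price * 100)::BIGINT WHERE price_cents IS NULL;
-- 3. Switch application reads/writes to price_cents, then drop the old column
ALTER TABLE products DROP COLUMN price;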
7. Connection Pooling
Problem: Opening a new database connection is expensive (10-50 ms)
Solution: Connection pool (reuse connections)
# Using SQLAlchemy (Python)
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool
engine = create_engine(
    'postgresql://user:pass@localhost/shelfops',
    poolclass=QueuePool,
    pool_size=10,        # Number of persistent connections
    max_overflow=20,     # Additional connections under load
    pool_pre_ping=True,  # Check connection health before use
    pool_recycle=3600    # Recycle connections every hour
)
Pool sizing:
- pool_size: 10-20 for web applications (per instance)
- max_overflow: 2x pool_size
- Total connections per instance: pool_size + max_overflow = 30
Database max_connections:
-- Check current setting
SHOW max_connections;  -- Default: 100
-- Calculate: (num_app_instances × pool_size) + buffer
-- Example: 3 instances × 10 pool = 30, + 20 buffer = 50 connections
-- Under sustained load, max_overflow connections also count, so leave headroom
-- for (pool_size + max_overflow) per instance if bursts are expected
8. Monitoring Queries
View active queries:
-- See what's running now
SELECT
pid,
now() - query_start AS duration,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
-- Kill long-running query
SELECT pg_terminate_backend(<pid>);
Slow query log:
-- Enable slow query logging (queries > 100ms)
ALTER SYSTEM SET log_min_duration_statement = 100;  -- milliseconds
SELECT pg_reload_conf();
-- Logs appear in the PostgreSQL log file
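Alongside the slow query log, the pg_stat_statements extension (if loaded via shared_preload_libraries) aggregates timings per normalized query. A minimal sketch, assuming PostgreSQL 13+ column names:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 10 queries by total execution time
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;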
Index usage stats:
-- Find unused indexes (never scanned)
SELECT
schemaname,
relname AS tablename,
indexrelname AS indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Consider dropping unused indexes (after confirming they're not seasonal)
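Once an unused index is confirmed safe to remove, drop it without blocking writes (CONCURRENTLY cannot run inside a transaction block; the index name below is illustrative):
DROP INDEX CONCURRENTLY IF EXISTS idx_transactions_old_unused;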
Common Patterns for ShelfOps
Pattern 1: Latest Inventory Snapshot
-- Get most recent inventory level per store/product
SELECT DISTINCT ON (store_id, product_id)
store_id,
product_id,
timestamp,
quantity_on_hand
FROM inventory_levels
ORDER BY store_id, product_id, timestamp DESC;
-- Faster with lateral join (if querying specific stores)
SELECT
s.store_id,
p.product_id,
latest.timestamp,
latest.quantity_on_hand
FROM stores s
CROSS JOIN products p
CROSS JOIN LATERAL (
SELECT timestamp, quantity_on_hand
FROM inventory_levels i
WHERE i.store_id = s.store_id AND i.product_id = p.product_id
ORDER BY timestamp DESC
LIMIT 1
) latest;
Pattern 2: Sales Velocity (Rolling Average)
-- 7-day rolling average of daily sales per store/product
-- (aggregate to daily totals first, then window over days)
SELECT
    store_id,
    product_id,
    day,
    AVG(daily_quantity) OVER (
        PARTITION BY store_id, product_id
        ORDER BY day
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS avg_7d_sales
FROM (
    SELECT
        store_id,
        product_id,
        time_bucket('1 day', timestamp) AS day,
        SUM(quantity) AS daily_quantity
    FROM transactions
    WHERE timestamp >= NOW() - INTERVAL '30 days'
    GROUP BY store_id, product_id, day
) daily;
Pattern 3: Alert Deduplication
-- Don't create duplicate alerts for same store/product
INSERT INTO alerts (store_id, product_id, alert_type, message, ...)
VALUES (...)
ON CONFLICT (store_id, product_id, alert_type)
WHERE status = 'open'
DO UPDATE SET
message = EXCLUDED.message,
updated_at = NOW();
-- Requires unique index:
CREATE UNIQUE INDEX idx_alerts_dedup ON alerts(store_id, product_id, alert_type)
WHERE status = 'open';
Testing Database Code
import uuid

import pytest
import alembic.command
from alembic.config import Config
from sqlalchemy import create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import sessionmaker

alembic_cfg = Config("alembic.ini")  # path assumed; adjust to the project layout

@pytest.fixture
def db_session():
    """Create test database session"""
    # Use separate test database
    engine = create_engine('postgresql://localhost/shelfops_test')
    # Run migrations
    alembic.command.upgrade(alembic_cfg, "head")
    Session = sessionmaker(bind=engine)
    session = Session()
    yield session
    # Rollback after test
    session.rollback()
    session.close()

def test_inventory_constraint(db_session):
    """Test that negative inventory is rejected"""
    from models import InventoryLevel
    # This should fail
    with pytest.raises(IntegrityError):
        db_session.add(InventoryLevel(
            store_id=uuid.uuid4(),
            product_id=uuid.uuid4(),
            quantity_on_hand=-5  # Negative - constraint violation
        ))
        db_session.commit()
DO / DON'T
DO
- ✅ Use TimescaleDB hypertables for time-series data
- ✅ Index foreign keys
- ✅ Add constraints (CHECK, NOT NULL, UNIQUE)
- ✅ Use EXPLAIN ANALYZE for slow queries
- ✅ Create migrations for schema changes
- ✅ Use connection pooling
- ✅ Monitor query performance
DON'T
- ❌ Use SELECT * (specify columns)
- ❌ N+1 queries (use JOINs or eager loading; see the sketch after this list)
- ❌ Store sensitive data unencrypted
- ❌ Skip migrations (manual ALTER TABLE)
- ❌ Over-index (every column doesn't need an index)
- ❌ Ignore slow query logs
- ❌ Run migrations in production without testing on a copy first
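For the N+1 point, fetch related rows in one statement instead of looping in application code. A minimal SQL sketch over the tables defined earlier:
-- N+1 (bad): application loops, issuing one query per product
--   SELECT quantity_on_hand FROM inventory_levels WHERE product_id = $1;  -- repeated N times
-- Single query (good): one JOIN returns everything in one round trip
SELECT p.sku, i.quantity_on_hand, i.timestamp
FROM products p
JOIN inventory_levels i ON i.product_id = p.product_id
WHERE i.store_id = '<store_uuid>';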
Resources
- TimescaleDB docs: https://docs.timescale.com/
- PostgreSQL performance tips: https://wiki.postgresql.org/wiki/Performance_Optimization
- Alembic (migrations): https://alembic.sqlalchemy.org/
- SQLAlchemy (ORM): https://www.sqlalchemy.org/
Last Updated: 2026-02-09
Version: 1.0.0