AgentSkillsCN

Database Optimization

数据库优化

SKILL.md

Database Optimization

Overview

Best practices for designing efficient database schemas, writing optimized queries, and scaling database operations for high-performance applications.

When to Use This Skill

  • Writing complex SQL queries
  • Optimizing slow queries
  • Designing database schemas
  • Implementing indexing strategies
  • Scaling database operations
  • Troubleshooting N+1 query problems
  • Working with ORMs efficiently

Indexing Strategies

When to Add Indexes

  • Columns used in WHERE clauses
  • Foreign keys for JOIN operations
  • Columns used in ORDER BY, GROUP BY
  • Columns frequently searched

Index Types

B-Tree Index (Default)

sql
-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
CREATE INDEX idx_users_status_created ON users(status, created_at);

-- Use composite index for queries like:
SELECT * FROM users WHERE status = 'active' ORDER BY created_at;

Unique Index

sql
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
-- Enforces uniqueness and improves lookup performance

Partial/Filtered Index

sql
-- PostgreSQL
CREATE INDEX idx_active_users ON users(created_at) WHERE status = 'active';

-- Only indexes active users, smaller and faster

Full-Text Search Index

sql
-- PostgreSQL
CREATE INDEX idx_products_search ON products USING GIN(to_tsvector('english', name || ' ' || description));

-- Search query
SELECT * FROM products 
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('laptop');

Index Trade-offs

Pros:

  • Faster SELECT queries
  • Faster JOIN operations
  • Enforces uniqueness

Cons:

  • Slower INSERT/UPDATE/DELETE
  • Takes up disk space
  • Maintenance overhead

Best Practice: Index frequently read columns, be selective for write-heavy tables

Query Optimization

1. Use EXPLAIN to Analyze Queries

sql
EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
AND o.created_at > '2026-01-01';

-- Look for:
-- - Seq Scan (bad) vs Index Scan (good)
-- - High cost values
-- - Nested loops on large datasets

2. Avoid SELECT *

sql
-- ❌ Bad - Retrieves unnecessary data
SELECT * FROM users WHERE id = 123;

-- ✅ Good - Only fetch needed columns
SELECT id, name, email FROM users WHERE id = 123;

3. Use Appropriate JOIN Types

sql
-- INNER JOIN - Only matching records
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

-- LEFT JOIN - All from left table, matching from right
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

-- ⚠️ Avoid unnecessary JOINs when possible

4. Optimize WHERE Clauses

sql
-- ✅ Good - Uses index on status
SELECT * FROM users WHERE status = 'active';

-- ❌ Bad - Function prevents index usage
SELECT * FROM users WHERE UPPER(status) = 'ACTIVE';

-- ✅ Good - Store normalized data
SELECT * FROM users WHERE status = 'active'; -- Ensure data is lowercase

-- ❌ Bad - Leading wildcard prevents index
SELECT * FROM users WHERE email LIKE '%@example.com';

-- ✅ Good - Can use index
SELECT * FROM users WHERE email LIKE 'john%';

5. Use LIMIT for Large Result Sets

sql
-- ❌ Bad - Returns all results
SELECT * FROM orders ORDER BY created_at DESC;

-- ✅ Good - Pagination
SELECT * FROM orders 
ORDER BY created_at DESC 
LIMIT 20 OFFSET 40; -- Page 3

-- ✅ Better - Cursor-based pagination (for large offsets)
SELECT * FROM orders 
WHERE created_at < '2026-02-01'
ORDER BY created_at DESC 
LIMIT 20;

N+1 Query Problem

The Problem

javascript
// ❌ Bad - N+1 queries
const users = await User.findAll(); // 1 query

for (const user of users) {
  const orders = await Order.findAll({ where: { userId: user.id } }); // N queries
  user.orders = orders;
}
// Total: 1 + N queries

Solutions

1. Eager Loading (ORM)

javascript
// ✅ Good - 2 queries total
const users = await User.findAll({
  include: [{
    model: Order,
    as: 'orders'
  }]
});

// Or with Prisma
const users = await prisma.user.findMany({
  include: {
    orders: true
  }
});

2. Manual JOIN

sql
-- ✅ Good - Single query
SELECT 
  u.id, u.name, u.email,
  o.id as order_id, o.total, o.created_at as order_created_at
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

3. Batch Loading (DataLoader)

javascript
const DataLoader = require('dataloader');

const orderLoader = new DataLoader(async (userIds) => {
  const orders = await Order.findAll({
    where: { userId: userIds }
  });
  
  // Group orders by userId
  const ordersByUser = userIds.map(id => 
    orders.filter(o => o.userId === id)
  );
  
  return ordersByUser;
});

// Usage - automatically batches requests
const user1Orders = await orderLoader.load(user1.id);
const user2Orders = await orderLoader.load(user2.id);
// Only 1 query executed

Database Schema Design

Normalization

1NF (First Normal Form)

  • Atomic values (no arrays in cells)
  • Each row is unique

2NF (Second Normal Form)

  • 1NF + No partial dependencies
  • All non-key attributes depend on entire primary key

3NF (Third Normal Form)

  • 2NF + No transitive dependencies
  • Non-key attributes don't depend on other non-key attributes

When to Denormalize

sql
-- Normalized (3NF)
CREATE TABLE orders (
  id INT PRIMARY KEY,
  user_id INT,
  created_at TIMESTAMP
);

CREATE TABLE order_items (
  id INT PRIMARY KEY,
  order_id INT,
  product_id INT,
  quantity INT,
  price DECIMAL
);

-- Calculating total requires aggregation
SELECT o.id, SUM(oi.quantity * oi.price) as total
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id;

-- Denormalized - add total column
CREATE TABLE orders (
  id INT PRIMARY KEY,
  user_id INT,
  total DECIMAL, -- Denormalized for performance
  created_at TIMESTAMP
);

-- Update total when items change (use triggers or application logic)

When to Denormalize:

  • Read-heavy operations
  • Expensive aggregations
  • Reporting/analytics tables
  • Caching computed values

When NOT to Denormalize:

  • Write-heavy tables
  • Data frequently changes
  • Storage is a concern

Appropriate Data Types

sql
-- ✅ Good - Right-sized types
CREATE TABLE users (
  id BIGINT PRIMARY KEY,           -- For large tables
  email VARCHAR(255) NOT NULL,     -- Reasonable max length
  age SMALLINT,                    -- 0-255 is enough
  balance DECIMAL(10,2),           -- Precise for money
  is_active BOOLEAN,               -- Not INT
  created_at TIMESTAMP DEFAULT NOW()
);

-- ❌ Bad - Oversized types
CREATE TABLE users (
  id VARCHAR(1000),                -- Waste of space
  email TEXT,                      -- No max length validation
  age INT,                         -- Unnecessarily large
  balance FLOAT,                   -- Precision issues for money
  is_active VARCHAR(10),           -- Use BOOLEAN
  created_at VARCHAR(50)           -- Use TIMESTAMP
);

Transactions

ACID Properties

  • Atomicity: All or nothing
  • Consistency: Valid state transitions
  • Isolation: Concurrent transactions don't interfere
  • Durability: Committed data persists

Transaction Usage

javascript
// ✅ Good - Use transactions for related operations
await db.transaction(async (trx) => {
  const order = await trx('orders').insert({
    user_id: userId,
    total: 100
  }).returning('*');
  
  await trx('order_items').insert([
    { order_id: order.id, product_id: 1, quantity: 2 },
    { order_id: order.id, product_id: 2, quantity: 1 }
  ]);
  
  await trx('users').where({ id: userId }).decrement('balance', 100);
  
  // All succeed or all fail together
});

// ❌ Bad - No transaction
const order = await Order.create({ userId, total: 100 });
await OrderItem.insert([...]); // Could fail, leaving orphaned order
await User.update({ balance: balance - 100 }); // Could fail, inconsistent state

Isolation Levels

sql
-- Read Uncommitted (lowest isolation, highest performance)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Read Committed (default in PostgreSQL)
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Repeatable Read
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Serializable (highest isolation, lowest performance)
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Connection Pooling

Configuration

javascript
const { Pool } = require('pg');

const pool = new Pool({
  host: 'localhost',
  port: 5432,
  database: 'mydb',
  user: 'user',
  password: 'password',
  max: 20,          // Maximum connections
  min: 5,           // Minimum connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// ✅ Good - Use pool
app.get('/api/users', async (req, res) => {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users');
    res.json(result.rows);
  } finally {
    client.release(); // Important!
  }
});

// ❌ Bad - Creating new connection each time
app.get('/api/users', async (req, res) => {
  const client = new Client({ ... });
  await client.connect();
  const result = await client.query('SELECT * FROM users');
  await client.end();
  res.json(result.rows);
});

Caching Strategies

Application-Level Caching

javascript
const Redis = require('ioredis');
const redis = new Redis();

async function getUserById(id) {
  // Try cache first
  const cached = await redis.get(`user:${id}`);
  if (cached) {
    return JSON.parse(cached);
  }
  
  // Cache miss - query database
  const user = await User.findById(id);
  
  // Store in cache (TTL: 1 hour)
  await redis.setex(`user:${id}`, 3600, JSON.stringify(user));
  
  return user;
}

// Invalidate cache on update
async function updateUser(id, data) {
  const user = await User.update(id, data);
  await redis.del(`user:${id}`); // Invalidate cache
  return user;
}

Query Result Caching

sql
-- PostgreSQL - Materialized views
CREATE MATERIALIZED VIEW popular_products AS
SELECT p.id, p.name, COUNT(oi.id) as order_count
FROM products p
JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.id, p.name
ORDER BY order_count DESC;

-- Refresh periodically
REFRESH MATERIALIZED VIEW popular_products;

-- Query the cached view
SELECT * FROM popular_products LIMIT 10;

Database Partitioning

Range Partitioning

sql
-- PostgreSQL - Partition by date
CREATE TABLE orders (
  id BIGINT,
  user_id INT,
  total DECIMAL,
  created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

CREATE TABLE orders_2026 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');

-- Queries automatically use correct partition
SELECT * FROM orders WHERE created_at > '2026-01-01';

List Partitioning

sql
-- Partition by region
CREATE TABLE users (
  id BIGINT,
  name VARCHAR(255),
  region VARCHAR(10)
) PARTITION BY LIST (region);

CREATE TABLE users_us PARTITION OF users FOR VALUES IN ('US');
CREATE TABLE users_eu PARTITION OF users FOR VALUES IN ('EU', 'UK');
CREATE TABLE users_asia PARTITION OF users FOR VALUES IN ('CN', 'JP', 'IN');

Database Migrations

Best Practices

javascript
// Migration: add_email_index.js

exports.up = async function(knex) {
  // Check if index exists before creating
  const hasIndex = await knex.schema.hasColumn('users', 'email');
  if (hasIndex) {
    await knex.schema.alterTable('users', (table) => {
      table.index('email', 'idx_users_email');
    });
  }
};

exports.down = async function(knex) {
  await knex.schema.alterTable('users', (table) => {
    table.dropIndex('email', 'idx_users_email');
  });
};

// ✅ Always provide rollback (down)
// ✅ Make migrations idempotent
// ✅ Test migrations on staging first
// ⚠️ Be careful with data migrations on large tables

Safe Schema Changes

sql
-- ✅ Safe - Add nullable column
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- ✅ Safe - Add column with default (PostgreSQL 11+)
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';

-- ⚠️ Risky - Add NOT NULL column without default
ALTER TABLE users ADD COLUMN phone VARCHAR(20) NOT NULL;
-- Better: Add nullable first, populate, then add constraint

-- Step 1: Add nullable column
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Step 2: Populate with default value
UPDATE users SET phone = '' WHERE phone IS NULL;

-- Step 3: Add NOT NULL constraint
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;

ORM Best Practices

Efficient Queries with Prisma

javascript
// ✅ Good - Select only needed fields
const users = await prisma.user.findMany({
  select: {
    id: true,
    name: true,
    email: true
  }
});

// ✅ Good - Use findUnique for single records
const user = await prisma.user.findUnique({
  where: { id: 123 }
});

// ✅ Good - Batch operations
await prisma.order.createMany({
  data: orders, // Array of orders
  skipDuplicates: true
});

// ❌ Bad - Multiple individual creates
for (const order of orders) {
  await prisma.order.create({ data: order });
}

TypeORM Optimization

typescript
// ✅ Good - Use QueryBuilder for complex queries
const users = await userRepository
  .createQueryBuilder('user')
  .leftJoinAndSelect('user.orders', 'order')
  .where('user.status = :status', { status: 'active' })
  .andWhere('order.total > :minTotal', { minTotal: 100 })
  .getMany();

// ❌ Bad - Loading all entities
const users = await userRepository.find({
  relations: ['orders'] // Loads ALL orders
});

Monitoring & Maintenance

Slow Query Log

sql
-- PostgreSQL - Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1s
SELECT pg_reload_conf();

-- MySQL
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;

Database Statistics

sql
-- PostgreSQL - Check table sizes
SELECT 
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

-- Check index usage
SELECT 
  schemaname,
  tablename,
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0 -- Unused indexes
ORDER BY pg_relation_size(indexrelid) DESC;

VACUUM and ANALYZE

sql
-- PostgreSQL - Reclaim space and update statistics
VACUUM ANALYZE users;

-- Full vacuum (locks table)
VACUUM FULL users;

-- Auto-vacuum configuration
ALTER TABLE users SET (autovacuum_vacuum_scale_factor = 0.1);

Database Optimization Checklist

  • Indexes on frequently queried columns
  • Composite indexes for multi-column queries
  • No unused indexes (check pg_stat_user_indexes)
  • Proper data types (no oversized columns)
  • Foreign key constraints with indexes
  • Normalized schema (unless denormalization justified)
  • Connection pooling configured
  • Query optimization (use EXPLAIN ANALYZE)
  • No N+1 query problems
  • Batch operations for bulk inserts/updates
  • Transactions for related operations
  • Caching for expensive queries
  • Slow query logging enabled
  • Regular VACUUM and ANALYZE
  • Monitoring query performance
  • Prepared statements (SQL injection prevention)
  • Database backups automated
  • Partitioning for large tables
  • Read replicas for scaling reads
  • Regular security updates