AgentSkillsCN

database-audit

当您需要审计数据库 Schema、迁移流程、数据完整性、查询模式,或被问及数据库架构时,此技能同样不可或缺。它将全面覆盖 Schema 设计、索引策略(尤其是高流量表)、迁移操作、约束条件、查询优化,以及数据一致性保障。尤其适用于包含大量文本字段的表、规模庞大的数据集(如日志、活动记录、通知信息),以及那些缺乏有效索引的表。

SKILL.md
--- frontmatter
name: database-audit
description: "Use when auditing database schema, migrations, data integrity, query patterns, or when asked about database architecture. Covers schema design, indexing strategy (including high-volume tables), migrations, constraints, query optimization, and data consistency. Especially critical for tables with text-heavy columns, large datasets (logs, activity, notifications), and missing indexes."

Database Audit

Overview

The database is the foundation. If the schema is wrong, everything built on top is fragile.

Core principle: Schema should enforce business rules. Don't trust application code to maintain data integrity.

The Iron Laws

code
1. NO NULLABLE COLUMN WITHOUT DOCUMENTED REASON. NO MISSING INDEX ON FOREIGN KEYS.
2. EVERY QUERY THAT TOUCHES PRODUCTION MUST HIT AN INDEX — NEVER A FULL TABLE SCAN ON LARGE TABLES.
3. TEXT-HEAVY TABLES THAT GROW UNBOUNDED (logs, activity, notifications) MUST HAVE RETENTION, PARTITIONING, OR ARCHIVAL STRATEGY.

When to Use

  • Auditing database architecture
  • Reviewing migration files
  • Investigating data inconsistency
  • Before schema changes
  • Performance investigation (query-related)
  • During any codebase audit
  • When tables have many text columns (logs, activity feeds, notifications)
  • When tables are expected to grow unbounded (audit trails, event stores)
  • When query performance degrades over time (missing index symptoms)

When NOT to Use

  • Application-level performance only (use performance-audit)
  • API design concerns (use api-design-audit)
  • Code-level architecture (use architecture-audit)

Anti-Shortcut Rules

code
YOU CANNOT:
- Audit schema from ORM models alone — check the ACTUAL database schema or migration files
- Say "indexes are fine" without checking query patterns — indexes serve queries, not tables
- Skip nullable column analysis — every NULL column is a potential bug waiting to happen
- Accept missing FK constraints because "the app handles it" — apps crash, constraints don't
- Ignore migration rollback plans — every migration needs a reverse
- Trust column names match their purpose — verify types and constraints match semantics
- Skip data integrity checks — orphaned records, duplicates, and invalid states exist silently
- Assess schema health without understanding the query patterns — schema exists to serve queries
- Ignore large-table strategy — any table that grows unbounded WILL become a bottleneck
- Skip EXPLAIN/ANALYZE on critical queries — "it works" does not mean "it performs"
- Assume TEXT columns don't need indexing — if you query them, you index them
- Accept "we'll add indexes later" — by the time you notice, production is already slow

Common Rationalizations (Don't Accept These)

RationalizationReality
"The ORM handles foreign keys"ORMs create references, databases enforce them. Check for DB-level constraints.
"We don't need indexes yet, it's fast enough"It's fast with 100 rows. Production has millions.
"Nullable is easier, we'll fix it later"NULL propagates through joins and aggregations, causing subtle bugs.
"Money as FLOAT works fine for us"Until you have rounding errors in financial reports.
"We don't use migrations, we change the schema directly"Direct changes = no rollback, no history, no reproducibility.
"VARCHAR(255) is fine for everything"It's a red flag that types weren't considered.
"The log table doesn't need indexes"It does when you query it. And you WILL query it.
"We'll archive old data when it gets big"Define "big". Set the threshold NOW, automate it NOW.
"Full-text search is overkill"TEXT columns without search indexes cause LIKE '%query%' full scans.
"We only read recent rows"Without a time-based index, "recent" still scans the whole table.
"Notifications don't need optimization"Users have thousands. Unread counts on every page load. Do the math.

Iron Questions

code
1. Does every foreign key column have a database-level FK constraint? (not just ORM)
2. Does every foreign key column have an index?
3. Is every nullable column intentionally nullable? (documented reason?)
4. Is DECIMAL/NUMERIC used for all monetary values? (never FLOAT)
5. Does every table have timestamps (created_at, updated_at)?
6. Does every migration have a reversible rollback?
7. Are composite indexes in the correct order? (most selective first)
8. Are there tables with > 30 columns? (normalization issue)
9. Are there orphaned records? (FK data without parent)
10. Does the schema match the ORM models exactly?
11. Does EVERY query path hit an index? (run EXPLAIN on the top 20 queries)
12. Are high-volume tables (logs, activity, notifications) partitioned or have retention policies?
13. Are TEXT/JSONB columns that are queried/filtered indexed appropriately?
14. Does unread_count / badge_count hit an index or does it full-scan?
15. Are there queries doing LIKE '%text%' on unindexed TEXT columns?

The Audit Process

Phase 1: Schema Analysis

code
1. READ all migration files OR inspect actual schema
2. MAP all tables, columns, types, and constraints
3. IDENTIFY relationships (1:1, 1:N, M:N)
4. CHECK for orphaned tables (not referenced by any code)
5. CLASSIFY tables by growth pattern:
   - Static (config, settings) — rarely changes
   - Transactional (orders, trades) — grows with business
   - Unbounded (logs, activity, notifications) — ⚠️ grows forever

For each table, verify:

CheckQuestion
Primary keyDoes it exist? Is it appropriate (auto-increment, UUID, composite)?
Column typesAre they appropriate? (VARCHAR(255) for emails? DECIMAL for money?)
Nullable columnsIs NULL semantically meaningful or just lazy?
Default valuesDo they make business sense?
ConstraintsCHECK, UNIQUE, NOT NULL — are business rules enforced?
Timestampscreated_at, updated_at present where needed?
Soft deletesIf used, is deleted_at indexed?
Growth patternStatic, transactional, or unbounded?
RetentionIf unbounded, what's the archival/purge strategy?

Phase 2: Relationship Integrity

code
1. EVERY foreign key MUST have a database-level FK constraint
2. EVERY FK column MUST have an index
3. CHECK cascade behavior (ON DELETE, ON UPDATE)
4. CHECK for orphan records (FK without matching parent)

Cascade rules:

RelationshipON DELETEReason
Order → UserRESTRICTDon't delete users with orders
OrderItem → OrderCASCADEDeleting order removes items
Comment → UserSET NULLKeep comments, lose attribution
Session → UserCASCADEUser deletion clears sessions
Notification → UserCASCADEUser deletion clears notifications
ActivityLog → UserSET NULLKeep audit trail, lose attribution

Cascade decision tree:

QuestionIf YesIf No
Should child records survive parent deletion?SET NULL or RESTRICTCASCADE
Is the child meaningless without the parent?CASCADERESTRICT or SET NULL
Would deleting the parent cause data loss?RESTRICTCASCADE or SET NULL
Is the FK column nullable?SET NULL is an optionCASCADE or RESTRICT
Is it an audit/compliance table?SET NULL (preserve record)CASCADE or RESTRICT

Phase 3: Index Analysis — Core

code
1. EVERY foreign key column — indexed? (mandatory)
2. Frequent WHERE clause columns — indexed?
3. Frequent ORDER BY columns — indexed?
4. Composite queries — composite index in correct order?
5. Unused indexes — consuming write performance?
6. Missing covering indexes — queries reading table unnecessarily?

Index ordering rule for composite indexes:

code
Most selective column FIRST
= conditions before RANGE conditions

Example:
WHERE status = 'active' AND created_at > '2024-01-01'
→ INDEX (status, created_at) ✅
→ INDEX (created_at, status) ❌ (range before equality)

Index analysis checklist:

Column UsageNeeds Index?Notes
Foreign key✅ AlwaysRequired for JOIN performance
WHERE clause (frequent)✅ YesCheck query logs for frequency
JOIN condition✅ YesBoth sides of the join
ORDER BY (on large tables)✅ UsuallyPrevents filesort
GROUP BY✅ UsuallyCan improve aggregation
SELECT only❌ NoUnless using covering index
Boolean with low cardinality❌ Usually notUnless combined in composite
Soft-delete (deleted_at)✅ YesPartial index WHERE deleted_at IS NULL

Phase 4: Index Analysis — High-Volume Tables (CRITICAL)

This phase is mandatory for any table that grows unbounded: logs, activity feeds, notifications, audit trails, event stores, analytics, sessions, messages.

4.1 Identify High-Volume Tables

code
HIGH-VOLUME INDICATORS:
- Table name contains: log, activity, event, notification, audit, session, message, analytics, metric, tracking
- Table has a timestamp column used for ordering/filtering
- Table has a user_id/entity_id FK for per-user queries
- Table grows by 100+ rows/day per user
- Table has no DELETE or PURGE mechanism
- Table has TEXT/JSONB columns storing variable-length data

4.2 Common Query Patterns on High-Volume Tables

Every high-volume table is queried in predictable patterns. Each pattern MUST hit an index:

Query PatternExampleRequired IndexPriority
Recent by userWHERE user_id = ? ORDER BY created_at DESC LIMIT 20(user_id, created_at DESC)🔴 Critical
Unread countWHERE user_id = ? AND read = false(user_id, read) WHERE read = false🔴 Critical
Unread itemsWHERE user_id = ? AND read_at IS NULL ORDER BY created_at DESC(user_id, read_at, created_at DESC)🔴 Critical
Filter by typeWHERE user_id = ? AND type = ? ORDER BY created_at DESC(user_id, type, created_at DESC)🟠 High
Date rangeWHERE user_id = ? AND created_at BETWEEN ? AND ?(user_id, created_at)🟠 High
Global recentORDER BY created_at DESC LIMIT 50(created_at DESC)🟡 Medium
Search by contentWHERE body ILIKE '%search%'Full-text index (GIN/tsvector)🟠 High
Aggregate countSELECT COUNT(*) WHERE user_id = ? AND type = ?(user_id, type)🟡 Medium
Batch mark-readUPDATE WHERE user_id = ? AND read = false(user_id, read)🟠 High
Cleanup oldDELETE WHERE created_at < ? AND read = true(created_at) WHERE read = true🟡 Medium

4.3 The Notification/Activity Table Audit

This is the archetype of a table that silently kills performance. Run this checklist:

code
NOTIFICATION TABLE DEEP DIVE:

□ user_id column has FK constraint AND index?
□ created_at has index? (for ordering)
□ Composite index (user_id, created_at DESC) exists?
  → This is the MOST CRITICAL index — serves "show my notifications" query
□ read/read_at status indexed?
  → Partial index: WHERE read_at IS NULL (for unread count)
□ type/category column indexed?
  → Composite: (user_id, type, created_at DESC) for filtered views
□ TEXT columns (title, body, description):
  → If queried with LIKE: needs full-text index
  → If only displayed: no index needed, but watch column size
  → If filtered: GIN index on tsvector column
□ JSONB/JSON data columns:
  → If queried by key: GIN index on specific paths
  → If only stored: no index, but consider extracting query fields
□ action_url / link columns:
  → Rarely indexed (good) — only display, never queried
□ Row count estimate: how many rows per user per year?
  → < 1,000: manageable without special strategy
  → 1,000-100,000: needs good indexes + pagination
  → > 100,000: needs partitioning or archival
□ Retention policy exists?
  → Auto-delete after N days?
  → Archive to cold storage?
  → Soft-delete with cleanup job?

4.4 Text Column Indexing Strategy

Tables with many TEXT columns are performance traps. Audit each TEXT column:

TEXT Column PurposeIndex StrategyDDL Example
Displayed only (title, body)❌ No index
Filtered with =B-tree on hashCREATE INDEX ON t USING hash (column);
Filtered with LIKE 'prefix%'B-tree with text_pattern_opsCREATE INDEX ON t (column text_pattern_ops);
Searched with LIKE '%term%'GIN trigramCREATE EXTENSION pg_trgm; CREATE INDEX ON t USING gin (column gin_trgm_ops);
Full-text searchedGIN tsvectorCREATE INDEX ON t USING gin (to_tsvector('english', column));
JSON queried by keyGIN jsonb_path_opsCREATE INDEX ON t USING gin (column jsonb_path_ops);
Used in WHERE + ORDERComposite B-treeCREATE INDEX ON t (status, column);

The cardinal sin:

sql
-- 🔴 NEVER do this on a large table without an index:
SELECT * FROM notifications WHERE body LIKE '%payment%';
-- This is a FULL TABLE SCAN. On 10M rows, this takes seconds.

-- ✅ Instead, add a trigram index:
CREATE INDEX idx_notifications_body_trgm ON notifications USING gin (body gin_trgm_ops);
-- Now the same query uses the index: milliseconds.

4.5 Partial Indexes (Postgres)

Partial indexes are essential for high-volume tables. They index ONLY the rows that matter:

sql
-- Instead of indexing ALL notifications:
CREATE INDEX idx_notif_user_read ON notifications (user_id, read);
-- This indexes millions of read=true rows you'll never query

-- ✅ Use a partial index — only index unread:
CREATE INDEX idx_notif_unread ON notifications (user_id, created_at DESC)
WHERE read_at IS NULL;
-- 95% smaller. 10x faster for the query that matters.

-- More examples:
-- Active sessions only
CREATE INDEX idx_active_sessions ON sessions (user_id, created_at)
WHERE expired_at IS NULL;

-- Pending items only
CREATE INDEX idx_pending_orders ON orders (user_id, created_at)
WHERE status = 'pending';

-- Non-deleted records
CREATE INDEX idx_active_users ON users (email)
WHERE deleted_at IS NULL;

4.6 Table Partitioning Strategy

For tables exceeding 10M rows, partitioning is not optional:

sql
-- Time-based partitioning (most common for logs/activity)
CREATE TABLE activity_log (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    user_id BIGINT NOT NULL,
    action VARCHAR(100) NOT NULL,
    details JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE activity_log_2024_01 PARTITION OF activity_log
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE activity_log_2024_02 PARTITION OF activity_log
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Automate with pg_partman extension
Partitioning StrategyWhen to UseExample Tables
Range (time)Time-series data, logsactivity_log, audit_trail, metrics
List (category)Categorical data, tenantsnotifications (by type), orders (by region)
Hash (distribution)Even distribution neededsessions, cache entries

4.7 High-Volume Table Health Score

Score each high-volume table:

CriteriaPointsCheck
Primary query has index+3EXPLAIN shows Index Scan
Unread/status query has partial index+2Partial index on status condition
TEXT columns queried → has search index+2GIN/tsvector index exists
Retention policy defined+2Auto-cleanup or archival documented
Row count manageable (< 10M) OR partitioned+1Check table size
Total/10≥ 7 = healthy, 4-6 = at risk, < 4 = 🔴 critical

Phase 5: Migration Quality

code
1. EVERY migration has a rollback (down/revert)
2. Migrations are idempotent (safe to run twice)
3. Data migrations separate from schema migrations
4. No destructive changes without backup strategy
5. Migration naming follows convention (timestamp + description)

Migration anti-patterns:

Anti-PatternProblemFix
No rollbackCan't undo deploymentAlways write down()
Mixing schema + dataCan't partially roll backSeparate migrations
Column rename without aliasBreaks running codeAdd new → migrate → drop old
Dropping column without backupData lossBackup first, soft-delete, then hard-delete
Adding NOT NULL without defaultBreaks existing rowsAdd nullable → backfill → add constraint
Large table ALTER in productionLocks table, causes downtimeUse pt-online-schema-change or equivalent
Adding index on huge table without CONCURRENTLYLocks writesCREATE INDEX CONCURRENTLY

Phase 6: Data Type Accuracy

Data TypeCorrect UsageCommon Mistakes
Money/CurrencyDECIMAL(19,4)FLOAT (rounding errors), INT cents
EmailVARCHAR(254)VARCHAR(50) (too short), TEXT (too loose)
UUIDUUID native typeVARCHAR(36) (slower, larger)
BooleanBOOLEANTINYINT without CHECK, VARCHAR
IP AddressINET (Postgres)VARCHAR(15) (misses IPv6)
JSONJSONB (Postgres)TEXT (no validation, no queries)
CoordinatesPOINT/GEOGRAPHYTwo FLOAT columns
Status/EnumVARCHAR + CHECKMagic integers
PhoneVARCHAR(20)INTEGER (leading zeros), VARCHAR(10) (international)
URLTEXT with CHECKVARCHAR(255) (too short for real URLs)
Log messageTEXTVARCHAR(255) (truncates data)
Notification bodyTEXTVARCHAR with arbitrary limit
Search contentTSVECTOR + TEXT sourceRaw TEXT without search index

Phase 7: Query Pattern Analysis

code
1. IDENTIFY the top 20 most frequent queries (from app code, not gut feeling)
2. RUN EXPLAIN ANALYZE on each
3. VERIFY each hits an index (Index Scan, not Seq Scan)
4. CHECK for N+1 query patterns in ORM code
5. CHECK for queries that fetch all columns when only a few are needed
6. CHECK for pagination using OFFSET on large tables (use cursor instead)

Query Anti-Patterns on High-Volume Tables:

Anti-PatternProblemFix
SELECT COUNT(*) FROM large_tableFull table scanMaintain counter cache or use estimate
ORDER BY created_at OFFSET 10000Scans and discards 10,000 rowsCursor-based pagination with WHERE id > last_id
SELECT * FROM notifications WHERE user_id = ? (no LIMIT)Returns ALL rows for userAlways LIMIT + pagination
WHERE body LIKE '%search%' on unindexed TEXTFull table scanAdd trigram or full-text index
WHERE DATE(created_at) = '2024-01-01'Function prevents index useWHERE created_at >= '2024-01-01' AND created_at < '2024-01-02'
WHERE LOWER(email) = 'user@example.com'Function prevents index useExpression index: CREATE INDEX ON t (LOWER(email))
IN (SELECT ...) on large subqueryMaterializes entire subqueryUse EXISTS or JOIN instead
Multiple sequential queries in loopN+1 problemBatch query with WHERE id IN (...)

Phase 8: Consistency Between Code and Schema

code
1. DO models match migrations? (same columns, same types)
2. ARE there columns in the DB not in any model?
3. ARE there model properties not in the DB?
4. DO relationships in code match FK constraints in DB?
5. ARE there tables with no corresponding model?
6. DO enums/check constraints match application-level validation?

Output Format

markdown
# Database Audit: [Project Name]

## Schema Overview
- **Tables:** N (static: N, transactional: N, unbounded: N)
- **Total Columns:** N (text-heavy: N)
- **Foreign Keys:** N (with DB constraints: N, missing: N)
- **Indexes:** N (missing critical: N, unused: N)
- **Nullable Columns:** N (justified: N, unjustified: N)

## High-Volume Table Assessment
| Table | Rows (est.) | Growth Rate | Has Indexes | Has Retention | Health Score |
|-------|------------|-------------|-------------|---------------|-------------|
| notifications | 2.3M | 50K/month | ⚠️ Partial | ❌ None | 4/10 |
| activity_log | 8.1M | 200K/month | ❌ Missing | ❌ None | 2/10 |

## Findings by Severity

### 🔴 Critical
[Findings with evidence, EXPLAIN output, and DDL fix]

### 🟠 High
[...]

### 🟡 Medium
[...]

### 🟢 Low
[...]

## Missing Indexes
| Table | Column(s) | Query Pattern | DDL | Priority |
|-------|-----------|--------------|-----|----------|
| notifications | (user_id, created_at DESC) | User's recent | `CREATE INDEX ...` | 🔴 Critical |
| notifications | (user_id) WHERE read_at IS NULL | Unread count | `CREATE INDEX ...` | 🔴 Critical |
| activity_log | (user_id, created_at DESC) | User activity | `CREATE INDEX ...` | 🔴 Critical |

## Unused Indexes
| Table | Index Name | Size | Last Used | Recommendation |
|-------|-----------|------|-----------|----------------|

## High-Volume Table Recommendations
| Table | Issue | Recommendation | Effort |
|-------|-------|---------------|--------|
| notifications | No partial index for unread | Add partial index WHERE read_at IS NULL | S |
| activity_log | 8M+ rows, no partitioning | Partition by month on created_at | L |
| logs | No retention | Add cleanup job for rows > 90 days | M |

## Schema Recommendations
[Specific DDL statements for each fix]

## Verdict
[PASS / CONDITIONAL PASS / FAIL]

Red Flags — Escalate Immediately

  • No foreign key constraints at database level
  • Money stored as FLOAT
  • No indexes on FK columns
  • Tables with > 50 columns (normalization issue)
  • No migrations (schema managed manually)
  • Migrations without rollbacks
  • Orphaned records in production
  • Schema/model mismatch
  • High-volume table (logs, activity, notifications) with no indexes on user_id + created_at
  • TEXT columns queried with LIKE without search index
  • Tables with > 1M rows and no partitioning or retention strategy
  • Counting queries (unread badges) doing full table scans
  • Pagination using OFFSET on tables with > 100K rows
  • No partial indexes on status/boolean columns in high-volume tables

Integration

  • Part of: Full audit with architecture-audit
  • Complements: performance-audit for query optimization
  • Follow-up: refactoring-safely for schema changes
  • Pair with: security-audit for data exposure concerns
  • Combine with: ui-ux-redesign when slow queries cause poor UX
  • Triggers: observability-audit if no query monitoring exists