AgentSkillsCN

debugging-db-issues

调试常见的数据库问题,如连接池耗尽与事务错误

SKILL.md
--- frontmatter
name: debugging-db-issues
description: Debug common database issues like connection pool exhaustion and transaction errors

Debugging Database Issues Skill

Overview

Quick reference for diagnosing and fixing common PostgreSQL issues in brickston-ai.

Common Issues

1. Connection Pool Exhaustion

Error: psycopg_pool.TooManyRequests

Symptoms:

  • 500 errors on API endpoints
  • "pool exhausted" in logs
  • Intermittent timeouts

Diagnosis:

python
# Check pool status
from app.core.database import pool
print(f"Pool size: {pool.get_size()}")
print(f"Available: {pool.get_available()}")
print(f"In use: {pool.get_size() - pool.get_available()}")

Solutions:

python
# Option 1: Increase pool size in database.py
pool = AsyncConnectionPool(
    DATABASE_URL,
    min_size=5,
    max_size=20,  # Increase from default
    timeout=30,
)

# Option 2: Ensure connections are released
async with pool.connection() as conn:
    # Use connection
    pass  # Connection auto-released here

# Option 3: Add connection timeout
async with asyncio.timeout(10):
    async with pool.connection() as conn:
        # Use connection

2. Transaction Aborted

Error: current transaction is aborted, commands ignored until end of transaction block

Symptoms:

  • First query fails, all subsequent queries fail
  • Error cascades through the request

Diagnosis:

python
# Find the original error in logs
# Look for the FIRST error, not the "transaction aborted" message

Solutions:

python
# Option 1: Wrap in try/except with rollback
async def safe_query(db: AsyncSession):
    try:
        result = await db.execute(text("SELECT ..."))
        return result.mappings().all()
    except Exception as e:
        await db.rollback()  # Critical!
        raise

# Option 2: Use savepoints for partial rollback
async with db.begin_nested():  # Creates savepoint
    await db.execute(text("INSERT ..."))
    # If this fails, only this block rolls back

3. Relation Does Not Exist

Error: relation "table_name" does not exist

Symptoms:

  • Query fails for specific table
  • Works in production, fails in dev (or vice versa)

Diagnosis:

sql
-- Check if table exists
SELECT table_name FROM information_schema.tables 
WHERE table_schema = 'public';

-- Check migrations
SELECT version_num FROM alembic_version;

Solutions:

bash
# Run pending migrations
cd apps/api
alembic upgrade head

# Or check if using wrong database
echo $DATABASE_URL

4. Foreign Key Violation

Error: insert or update on table "child" violates foreign key constraint

Symptoms:

  • Insert/update fails
  • Referenced record doesn't exist

Diagnosis:

sql
-- Check if parent record exists
SELECT id FROM parent_table WHERE id = '<foreign_key_value>';

Solutions:

python
# Option 1: Insert parent first
await db.execute(text("INSERT INTO parent ..."))
await db.execute(text("INSERT INTO child ..."))

# Option 2: Use NOT VALID for migrations
op.create_foreign_key(
    'fk_name', 'child', 'parent',
    ['parent_id'], ['id'],
    postgresql_not_valid=True
)

5. Slow Queries

Diagnosis:

sql
-- Find slow queries
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Check missing indexes
EXPLAIN ANALYZE SELECT * FROM table WHERE column = 'value';

Solutions:

sql
-- Add index
CREATE INDEX CONCURRENTLY ix_table_column ON table(column);

Quick Debug Script

python
# scripts/debug_db.py
import asyncio
from sqlalchemy import text
from app.core.database import get_db

async def diagnose():
    async for db in get_db():
        # Test connection
        result = await db.execute(text("SELECT 1"))
        print(f"Connection OK: {result.scalar()}")
        
        # Check table exists
        tables = await db.execute(text("""
            SELECT table_name FROM information_schema.tables 
            WHERE table_schema = 'public'
        """))
        print(f"Tables: {[t[0] for t in tables]}")
        
        # Check migrations
        version = await db.execute(text("SELECT version_num FROM alembic_version"))
        print(f"Migration version: {version.scalar()}")

asyncio.run(diagnose())

Checklist

  • Check error logs for ORIGINAL error (not cascaded)
  • Verify connection pool settings
  • Ensure rollback on exceptions
  • Run pending migrations
  • Check for missing indexes on slow queries