Database Handling Specifications
Goals
- •Safely explore database schema without performance degradation.
- •Construct precise, efficient SQL queries that prevent system crashes (OOM & OOT).
- •Handle dialect-specific nuances (PostgreSQL, MySQL, SQLite, etc.).
- •Transform raw result sets into structured, validated data for analysis.
Inspection
- •Volume Estimation:
- •Before any
SELECT *, always runSELECT COUNT(*) FROM table_nameto understand the scale. - •If a table has >1,000,000 rows, strictly use indexed columns for filtering.
- •Before any
- •Sample Data:
- •Use
SELECT * FROM table_name LIMIT 5to see actual data formats.
- •Use
Querying
- •Safety Constraints:
- •Always use
LIMIT: Never execute a query without aLIMITclause unless the row count is confirmed to be small. - •Avoid
SELECT *: In production-scale tables, explicitly name columns to reduce I/O and memory usage.
- •Always use
- •Dialect & Syntax:
- •Case Sensitivity: If a column/table name contains uppercase or special characters, MUST quote it (e.g.,
"UserTable"in Postgres,`UserTable`in MySQL). - •Date/Time: Use standard ISO strings for date filtering; be mindful of timezone-aware vs. naive columns.
- •Case Sensitivity: If a column/table name contains uppercase or special characters, MUST quote it (e.g.,
- •Complex Queries:
- •For
JOINoperations, ensure joining columns are indexed to prevent full table scans. - •When performing
GROUP BY, ensure the result set size is manageable.
- •For
Data Retrieval & Transformation
- •Type Mapping:
- •Ensure SQL types (e.g.,
DECIMAL,BIGINT,TIMESTAMP) are correctly mapped to Python/JSON types without precision loss. - •Convert
NULLvalues to a consistent "missing" representation (e.g.,NoneorNaN).
- •Ensure SQL types (e.g.,
- •Chunked Fetching:
- •For medium-to-large exports, use
fetchmany(size)orOFFSET/LIMITpagination instead of fetching everything into memory at once.
- •For medium-to-large exports, use
- •Aggregations:
- •Prefer performing calculations (SUM, AVG, COUNT) at the database level rather than pulling raw data to the client for processing.
Error Handling & Recovery
- •Timeout Management: If a query takes too long, retry with more restrictive filters or optimized joins.
- •Syntax Errors: If a query fails, inspect the dialect-specific error message and re-verify the schema (it's often a misspelled column or missing quotes).
Anti-Pattern Prevention (Avoiding "Bad" SQL)
- •Index-Friendly Filters: Never wrap indexed columns in functions (e.g.,
DATE(),UPPER()) within theWHEREclause. - •Join Safety: Always verify join keys. Before joining, check if the key has high cardinality to avoid massive intermediate result sets.
- •Memory Safety:
- •Avoid
DISTINCTandUNION(which performs de-duplication) on multi-million row sets unless necessary; useUNION ALLif duplicates are acceptable. - •Avoid
ORDER BYon large non-indexed text fields.
- •Avoid
- •Wildcard Warning: Strictly avoid leading wildcards in
LIKEpatterns (e.g.,%term) on large text columns. - •No Function on Columns:
WHERE col = FUNC(val)is good;WHERE FUNC(col) = valis bad. - •Explicit Columns: Only fetch what is necessary.
- •Early Filtering: Push
WHEREconditions as close to the base tables as possible. - •CTE for Clarity: Use
WITHfor complex multi-step logic to improve maintainability and optimizer hints.
Best Practices
- •Always verify database structure before querying
- •Use appropriate sampling techniques for large datasets
- •Optimize queries for efficiency based on schema inspection
- •Self-review the draft SQL against the "Anti-Pattern Prevention" list.
- •Perform a silent mental 'EXPLAIN' on your query. If it smells like a full table scan on a large table, refactor it before outputting