sql-splitter Skill
This skill helps you use sql-splitter to manipulate SQL dump files safely and efficiently.
When to Use This Skill
Use sql-splitter when:
- •The user mentions SQL dump files (
.sql,.sql.gz,.sql.bz2,.sql.xz,.sql.zst) - •The user wants to migrate, restore, or work with database dump files
- •The user needs to validate, analyze, split, merge, convert, sample, shard, or query dumps
- •Working with MySQL, PostgreSQL, SQLite, or MSSQL dump formats
- •The user wants to run SQL analytics on a dump file without loading it into a database
When NOT to Use This Skill
Do not use sql-splitter when:
- •Running complex ad-hoc SQL queries against a live database (use
psql/mysql/sqlcmddirectly) - •No dump file exists; only a running database is available
- •The user needs interactive data editing rather than dump manipulation
- •Working with dialects beyond MySQL/PostgreSQL/SQLite/MSSQL
- •Working with MSSQL binary backup files (.bak) or DACPAC/BACPAC formats (only script-based .sql dumps are supported)
Command Reference
split
Split a dump into per-table files.
sql-splitter split dump.sql --output tables/ --progress sql-splitter split dump.sql --tables users,orders --output tables/ sql-splitter split dump.sql --schema-only --output schema/ sql-splitter split dump.sql --data-only --output data/
merge
Merge per-table files back into a single dump.
sql-splitter merge tables/ --output restored.sql sql-splitter merge tables/ --output restored.sql --transaction sql-splitter merge tables/ --exclude logs,cache --output restored.sql
analyze
Get statistics about a dump (read-only).
sql-splitter analyze dump.sql --progress sql-splitter analyze "dumps/*.sql" --fail-fast
convert
Convert between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs).
sql-splitter convert mysql.sql --to postgres --output pg.sql sql-splitter convert pg_dump.sql --to mysql --output mysql.sql sql-splitter convert dump.sql --from postgres --to sqlite --output sqlite.sql sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql sql-splitter convert mysql.sql --to mssql --output mssql.sql sql-splitter convert mysql.sql --to postgres --output - | psql "$PG_CONN"
validate
Check dump integrity (syntax, encoding, PK/FK).
sql-splitter validate dump.sql --strict --progress sql-splitter validate "dumps/*.sql" --json --fail-fast sql-splitter validate dump.sql --no-fk-checks --progress
sample
Create reduced datasets with FK preservation.
sql-splitter sample dump.sql --output sampled.sql --percent 10 --preserve-relations sql-splitter sample dump.sql --output sampled.sql --rows 1000 --preserve-relations sql-splitter sample dump.sql --output sampled.sql --percent 10 --tables users,orders sql-splitter sample dump.sql --output sampled.sql --percent 10 --seed 42
shard
Extract tenant-specific data.
sql-splitter shard dump.sql --tenant-value 123 --tenant-column tenant_id --output tenant.sql sql-splitter shard dump.sql --tenant-values "1,2,3" --tenant-column account_id --output shards/
diff
Compare two SQL dumps for schema and data changes.
sql-splitter diff old.sql new.sql --progress sql-splitter diff old.sql new.sql --schema-only sql-splitter diff old.sql new.sql --data-only sql-splitter diff old.sql new.sql --format json --output diff.json sql-splitter diff old.sql new.sql --format sql --output migration.sql sql-splitter diff old.sql new.sql --tables users,orders --progress sql-splitter diff old.sql new.sql --verbose # Show sample PKs sql-splitter diff old.sql new.sql --ignore-columns "*.updated_at" # Ignore columns sql-splitter diff old.sql new.sql --primary-key logs:ts+msg # Override PK sql-splitter diff old.sql new.sql --allow-no-pk # Tables without PK
redact
Anonymize PII in SQL dumps by replacing sensitive data with fake, hashed, or null values.
# Using YAML config file sql-splitter redact dump.sql --output safe.sql --config redact.yaml # Using CLI flags sql-splitter redact dump.sql --output safe.sql --null "*.ssn" --hash "*.email" --fake "*.name" # Mask credit cards (keep last 4 digits) sql-splitter redact dump.sql --output safe.sql --mask "****-****-****-XXXX=*.credit_card" # Generate config by analyzing input file sql-splitter redact dump.sql --generate-config --output redact.yaml # Reproducible with seed sql-splitter redact dump.sql --output safe.sql --config redact.yaml --seed 42 # Validate config only sql-splitter redact dump.sql --config redact.yaml --validate # With specific locale for fake data sql-splitter redact dump.sql --output safe.sql --fake "*.name" --locale de_de
Strategies:
- •
--null "pattern": Replace with NULL - •
--hash "pattern": SHA256 hash (deterministic, preserves FK integrity) - •
--fake "pattern": Generate realistic fake data - •
--mask "pattern=column": Partial masking - •
--constant "column=value": Fixed value replacement
Fake generators: email, name, first_name, last_name, phone, address, city, zip, company, ip, uuid, date, credit_card, ssn, lorem, and more.
graph
Generate Entity-Relationship Diagrams (ERD) from SQL dumps.
# Interactive HTML ERD with dark/light mode and panzoom sql-splitter graph dump.sql --output schema.html # Graphviz DOT format with ERD-style tables sql-splitter graph dump.sql --output schema.dot # Mermaid erDiagram syntax (paste into GitHub/GitLab) sql-splitter graph dump.sql --output schema.mmd --format mermaid # JSON with full schema details sql-splitter graph dump.sql --json # Filter tables sql-splitter graph dump.sql --tables "user*,order*" --exclude "log*" # Show only circular dependencies sql-splitter graph dump.sql --cycles-only # Focus on specific table and its dependencies sql-splitter graph dump.sql --table orders --transitive # Show tables that depend on users sql-splitter graph dump.sql --table users --reverse
order
Reorder SQL dump in topological FK order for safe imports.
# Rewrite in safe import order sql-splitter order dump.sql --output ordered.sql # Check for cycles without rewriting sql-splitter order dump.sql --check # Reverse order (for DROP operations) sql-splitter order dump.sql --reverse --output drop_order.sql
query
Run SQL analytics on dump files using embedded DuckDB (no database required).
# Single query sql-splitter query dump.sql "SELECT COUNT(*) FROM users" # Interactive REPL sql-splitter query dump.sql --interactive # Export to JSON/CSV sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 100" -f json -o results.json sql-splitter query dump.sql "SELECT * FROM users LIMIT 100" -f csv -o users.csv # With caching (400x faster on repeated queries) sql-splitter query dump.sql "SELECT ..." --cache # Disk mode for large dumps (>2GB auto-enabled) sql-splitter query huge.sql "SELECT ..." --disk # Filter tables to import (faster startup) sql-splitter query dump.sql "SELECT * FROM orders" --tables orders,users # Memory limit sql-splitter query dump.sql "SELECT ..." --memory-limit 4GB # Cache management sql-splitter query --list-cache sql-splitter query --clear-cache
REPL commands:
- •
.tables- List all tables - •
.schema [table]- Show schema - •
.describe <table>- Describe table - •
.count <table>- Count rows - •
.sample <table> [n]- Sample rows - •
.format <fmt>- Set output format (table, json, csv, tsv) - •
.export <file> <query>- Export query results - •
.exit- Exit REPL
Step-by-Step Patterns
Pattern 1: Validate Before Use
Before using any dump from an external source:
- •
Validate integrity
bashsql-splitter validate path/to/dump.sql.gz --strict --progress
- •
If validation fails, check:
- •Incorrect dialect? Try
--dialect=postgres,--dialect=mysql, or--dialect=mssql - •Encoding issues? Report specific errors to user
- •Truncated file? Check file size and completeness
- •For MSSQL: Ensure GO batch separators are on their own lines
- •Incorrect dialect? Try
- •
Analyze structure
bashsql-splitter analyze path/to/dump.sql.gz --progress
Pattern 2: Database Migration
For migrating between MySQL, PostgreSQL, SQLite, and MSSQL (12 conversion pairs):
- •
Validate source
bashsql-splitter validate source.sql.gz --strict --progress
- •
Convert dialect
bashsql-splitter convert source.sql.gz --to postgres --output target.sql --strict # or for MSSQL sql-splitter convert mssql_dump.sql --from mssql --to mysql --output mysql.sql
- •
Validate converted output
bashsql-splitter validate target.sql --dialect=postgres --strict
- •
Or stream directly
bashsql-splitter convert source.sql.gz --to postgres --output - | psql "$PG_CONN"
Pattern 3: Create Dev Dataset
For creating smaller realistic data for development:
- •
Analyze to understand sizes
bashsql-splitter analyze prod.sql.zst --progress
- •
Sample with FK preservation
bashsql-splitter sample prod.sql.zst \ --output dev_seed.sql \ --percent 10 \ --preserve-relations \ --progress
- •
Restore to dev database
bashpsql "$DEV_DB" < dev_seed.sql
Pattern 4: CI Validation Gate
For validating dumps in CI pipelines:
sql-splitter validate "dumps/*.sql.gz" --json --fail-fast --strict
Parse with jq:
sql-splitter validate "dumps/*.sql.gz" --json --fail-fast \ | jq '.results[] | select(.passed == false)'
Pattern 5: Per-Table Editing
When the user needs to edit specific tables:
- •
Split
bashsql-splitter split dump.sql --output tables/ --progress
- •
Edit the per-table files (
tables/users.sql, etc.) - •
Merge back
bashsql-splitter merge tables/ --output updated.sql --transaction
Pattern 6: Tenant Extraction
For multi-tenant databases:
- •
Identify tenant column (often
tenant_id,account_id,company_id) - •
Extract tenant data
bashsql-splitter shard dump.sql \ --tenant-value 12345 \ --tenant-column tenant_id \ --output tenant_12345.sql \ --progress
Pattern 7: Comparing Dumps for Changes
For detecting schema or data changes between two versions:
- •
Full comparison (schema + data)
bashsql-splitter diff old_dump.sql new_dump.sql --progress
- •
Schema-only comparison (fast, no data parsing)
bashsql-splitter diff old.sql new.sql --schema-only
- •
Generate migration script
bashsql-splitter diff old.sql new.sql --format sql --output migration.sql
- •
JSON output for automation
bashsql-splitter diff old.sql new.sql --format json | jq '.summary'
Pattern 8: Data Anonymization
For creating safe development/testing datasets:
- •
Generate redaction config by analyzing dump
bashsql-splitter redact dump.sql --generate-config --output redact.yaml
- •
Review and customize the generated config
- •
Apply redaction
bashsql-splitter redact dump.sql --output safe.sql --config redact.yaml --progress
- •
Or use inline patterns for quick redaction
bashsql-splitter redact dump.sql --output safe.sql \ --null "*.ssn,*.tax_id" \ --hash "*.email" \ --fake "*.name,*.phone"
- •
Validate the redacted output
bashsql-splitter validate safe.sql --strict
Pattern 9: Schema Visualization
For understanding complex database schemas:
- •
Generate interactive ERD
bashsql-splitter graph dump.sql --output schema.html # Opens in browser with dark/light mode, zoom/pan
- •
For documentation (Mermaid)
bashsql-splitter graph dump.sql --output docs/schema.mmd --format mermaid # Paste into GitHub/GitLab/Notion
- •
Focus on specific area
bash# What does orders depend on? sql-splitter graph dump.sql --table orders --transitive --output orders.html # What depends on users? sql-splitter graph dump.sql --table users --reverse --output users_deps.html
- •
Find circular dependencies
bashsql-splitter graph dump.sql --cycles-only
Pattern 10: Safe Import Order
For ensuring FK constraints don't fail during restore:
- •
Check for cycles
bashsql-splitter order dump.sql --check
- •
Reorder if needed
bashsql-splitter order dump.sql --output ordered.sql
- •
For DROP operations (reverse order)
bashsql-splitter order dump.sql --reverse --output drop_order.sql
Pattern 11: Ad-hoc SQL Analytics
For running SQL queries on dump files without loading into a database:
- •
Quick exploratory query
bashsql-splitter query dump.sql "SELECT COUNT(*) FROM users"
- •
Interactive exploration (REPL)
bashsql-splitter query dump.sql --interactive # sql> .tables # sql> SELECT * FROM orders LIMIT 10 # sql> .count users
- •
Export analysis results
bashsql-splitter query dump.sql "SELECT * FROM orders WHERE total > 1000" -f csv -o big_orders.csv
- •
Repeated queries with caching (400x speedup)
bash# First run imports and caches sql-splitter query dump.sql "SELECT COUNT(*) FROM orders" --cache # Subsequent runs use cache sql-splitter query dump.sql "SELECT SUM(total) FROM orders" --cache
- •
Complex analytics
bashsql-splitter query dump.sql " SELECT u.name, COUNT(o.id) as order_count, SUM(o.total) as total_spent FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name ORDER BY total_spent DESC LIMIT 10 " -f json
Common Flag Combinations
| Goal | Flags |
|---|---|
| CI validation | --strict --fail-fast --json |
| Safe exploration | --dry-run --progress |
| Reproducible sampling | --seed 42 --preserve-relations |
| Fast progress feedback | --progress |
| Compressed output | Pipe to gzip -c or zstd -c |
Error Handling
Dialect Detection Issues
If auto-detection fails, specify explicitly:
sql-splitter validate dump.sql --dialect=postgres sql-splitter validate mssql_dump.sql --dialect=mssql sql-splitter convert dump.sql --from mysql --to postgres --output out.sql sql-splitter convert dump.sql --from mssql --to mysql --output out.sql
Validation Failures
- •Parse
--jsonoutput for specific errors - •Check for encoding issues, missing FKs, duplicate PKs
- •Use
--no-fk-checksto skip expensive integrity checks
Large Files
- •sql-splitter uses constant ~50MB memory
- •Downstream tools may be bottlenecks
- •Test with
samplebefore full operations
Implementation Checklist
When using this skill:
- •Detect applicability: Check for
.sqlfiles or dump-related tasks - •Clarify intent: Validation? Conversion? Sampling? Splitting?
- •Choose pattern: Map goal to one of the patterns above
- •Propose plan: Explain steps before executing
- •Use safe flags:
--dry-runfirst, then--progressfor feedback - •Summarize results: Report success/failure with key stats