Natural Language to SQL (nl2sql)
Overview
This skill converts natural language query descriptions into safe, executable PostgreSQL SQL queries. It supports three test databases:
- •pg_mcp_test_small: Blog system (users, posts, comments, tags)
- •pg_mcp_test_medium: E-commerce platform (products, orders, customers, payments)
- •pg_mcp_test_large: ERP system (HR, Finance, Inventory, CRM, Projects)
Workflow
User Request
|
v
1. Identify Target Database
|
v
2. Load Database Reference
|
v
3. Generate SQL Query
|
v
4. Security Validation (CRITICAL)
|
v
5. Execute Query via psql
|
+---> If execution fails --> Deep analysis --> Regenerate SQL --> Step 5
|
v
6. Validate Results (score 0-10)
|
+---> If score < 7 --> Deep analysis --> Regenerate SQL --> Step 5
|
v
7. Return Results (SQL or Data based on user preference)
Database Selection
Based on the user's query context, select the appropriate database:
| Keywords/Context | Database | Schema |
|---|---|---|
| blog, posts, articles, authors, comments, tags, readers, writers | pg_mcp_test_small | blog |
| products, orders, customers, shopping, cart, payments, shipping, reviews, inventory, coupons | pg_mcp_test_medium | ecommerce |
| employees, HR, departments, salaries, leave, attendance, vendors, invoices, accounting, projects, tasks, tickets, CRM, leads, opportunities | pg_mcp_test_large | erp |
Reference Files
Load the appropriate reference file before generating SQL:
- •pg_mcp_test_small Reference - Blog schema
- •pg_mcp_test_medium Reference - E-commerce schema
- •pg_mcp_test_large Reference - ERP schema
SQL Generation Guidelines
Schema Qualification
IMPORTANT: Always use schema-qualified table names:
-- CORRECT SELECT * FROM blog.posts; SELECT * FROM ecommerce.orders; SELECT * FROM erp.employees; -- INCORRECT (will fail) SELECT * FROM posts; SELECT * FROM orders; SELECT * FROM employees;
Query Best Practices
- •Use appropriate JOINs: Leverage foreign key relationships documented in references
- •Use views when available: Views like
recent_posts,product_catalog,employee_directoryoften provide pre-joined data - •Apply LIMIT: Add
LIMIT 100by default unless user specifies otherwise - •Use proper date/time functions: PostgreSQL-specific functions like
date_trunc(),CURRENT_DATE, etc. - •Leverage indexes: Filter on indexed columns when possible for better performance
Example Patterns
Simple query:
SELECT id, title, status, created_at FROM blog.posts WHERE status = 'published' ORDER BY created_at DESC LIMIT 10;
Join query:
SELECT p.title, u.username, COUNT(c.id) as comment_count FROM blog.posts p JOIN blog.users u ON p.author_id = u.id LEFT JOIN blog.comments c ON p.id = c.post_id WHERE p.status = 'published' GROUP BY p.id, p.title, u.username ORDER BY comment_count DESC LIMIT 10;
Using views:
SELECT * FROM ecommerce.top_products LIMIT 10;
Aggregate query:
SELECT date_trunc('month', created_at) as month,
COUNT(*) as order_count,
SUM(total_amount) as total_revenue
FROM ecommerce.orders
WHERE status NOT IN ('cancelled', 'refunded')
GROUP BY date_trunc('month', created_at)
ORDER BY month DESC;
CRITICAL: Security Validation
Every generated SQL MUST pass these security checks before execution.
Forbidden Operations
The following are STRICTLY FORBIDDEN and must be rejected:
1. Write Operations
-- FORBIDDEN: Any data modification INSERT INTO ... UPDATE ... SET ... DELETE FROM ... TRUNCATE ... DROP ... ALTER ... CREATE ... GRANT ... REVOKE ...
2. Dangerous Functions
-- FORBIDDEN: System commands pg_sleep(...) pg_terminate_backend(...) pg_cancel_backend(...) pg_reload_conf() lo_import(...) lo_export(...) COPY ... FROM/TO ... -- FORBIDDEN: File system access pg_read_file(...) pg_read_binary_file(...) pg_ls_dir(...) pg_stat_file(...)
3. SQL Injection Patterns
-- FORBIDDEN: These patterns indicate injection attempts ; DROP TABLE ... '; -- 1=1 OR 1=1 UNION SELECT ... INTO OUTFILE ... INTO DUMPFILE ... LOAD_FILE(...)
4. Sensitive Data Access
-- FORBIDDEN: Direct access to sensitive columns SELECT password_hash FROM ... SELECT api_key FROM ... SELECT secret FROM ... SELECT token FROM ... SELECT credit_card FROM ...
5. Information Schema Abuse
-- FORBIDDEN: Excessive metadata queries that could aid attacks SELECT * FROM pg_shadow ... SELECT * FROM pg_authid ... SELECT usename, passwd FROM pg_user ...
Security Validation Checklist
Before executing, verify:
- • Query starts with SELECT (or WITH for CTEs followed by SELECT)
- • No semicolons except at the end (no statement chaining)
- • No forbidden keywords: INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, TRUNCATE, GRANT, REVOKE
- • No dangerous functions: pg_sleep, COPY, lo_import, lo_export
- • No injection patterns: UNION SELECT, OR 1=1, etc.
- • No access to password_hash, api_key, secret, token columns
- • LIMIT clause present (default to LIMIT 100)
- • Schema-qualified table names used
Validation Implementation
# Pseudo-code for validation
FORBIDDEN_KEYWORDS = [
'INSERT', 'UPDATE', 'DELETE', 'DROP', 'ALTER', 'CREATE',
'TRUNCATE', 'GRANT', 'REVOKE', 'COPY'
]
DANGEROUS_FUNCTIONS = [
'pg_sleep', 'pg_terminate_backend', 'pg_cancel_backend',
'pg_reload_conf', 'lo_import', 'lo_export',
'pg_read_file', 'pg_read_binary_file'
]
FORBIDDEN_PATTERNS = [
r';\s*DROP', r';\s*DELETE', r';\s*UPDATE', r';\s*INSERT',
r"'\s*;\s*--", r'\bOR\s+1\s*=\s*1\b', r'\bUNION\s+SELECT\b',
r'\bINTO\s+OUTFILE\b', r'\bINTO\s+DUMPFILE\b'
]
SENSITIVE_COLUMNS = [
'password_hash', 'password', 'api_key', 'secret',
'token', 'credit_card', 'ssn'
]
def validate_sql(sql):
sql_upper = sql.upper()
# Must be SELECT query
if not sql_upper.strip().startswith(('SELECT', 'WITH')):
return False, "Only SELECT queries allowed"
# Check forbidden keywords
for keyword in FORBIDDEN_KEYWORDS:
if keyword in sql_upper:
return False, f"Forbidden keyword: {keyword}"
# Check dangerous functions
for func in DANGEROUS_FUNCTIONS:
if func.upper() in sql_upper:
return False, f"Dangerous function: {func}"
# Check injection patterns
for pattern in FORBIDDEN_PATTERNS:
if re.search(pattern, sql_upper):
return False, f"Suspicious pattern detected"
# Check sensitive columns
for col in SENSITIVE_COLUMNS:
if col.lower() in sql.lower():
return False, f"Access to sensitive column: {col}"
return True, "Validation passed"
Query Execution
Execute validated SQL using psql:
PGPASSWORD='' psql -h localhost -p 5432 -U postgres -d <database_name> -c "<sql_query>"
Connection Details
| Parameter | Value |
|---|---|
| Host | localhost |
| Port | 5432 |
| User | postgres |
| Password | (empty string) |
Execution Command Template
# For pg_mcp_test_small (blog) PGPASSWORD='' psql -h localhost -p 5432 -U postgres -d pg_mcp_test_small -c "SELECT ..." # For pg_mcp_test_medium (ecommerce) PGPASSWORD='' psql -h localhost -p 5432 -U postgres -d pg_mcp_test_medium -c "SELECT ..." # For pg_mcp_test_large (erp) PGPASSWORD='' psql -h localhost -p 5432 -U postgres -d pg_mcp_test_large -c "SELECT ..."
Result Validation
After execution, analyze the results:
Scoring Criteria (0-10)
| Score | Meaning | Action |
|---|---|---|
| 10 | Perfect match - results exactly answer the question | Return results |
| 9 | Excellent - results fully address the question with minor extras | Return results |
| 8 | Very Good - results answer the question well | Return results |
| 7 | Good - results adequately answer the question | Return results |
| 6 | Acceptable but incomplete - missing some aspects | Regenerate SQL |
| 5 | Partial - only partially answers the question | Regenerate SQL |
| 4 | Poor - results don't well match the intent | Regenerate SQL |
| 3 | Very Poor - results mostly irrelevant | Regenerate SQL |
| 2 | Bad - wrong tables/columns used | Regenerate SQL |
| 1 | Failed - execution error or empty results when data expected | Regenerate SQL |
| 0 | Security violation detected | ABORT immediately |
Validation Questions
- •Relevance: Do the columns returned match what was asked?
- •Completeness: Are all requested data points included?
- •Correctness: Are joins and filters logically correct?
- •Meaningfulness: Do the results make sense in context?
- •Row Count: Is the number of rows appropriate?
If Score < 7: Deep Analysis
When score is below 7, perform deep analysis:
- •
Identify the issue:
- •Wrong table(s) selected?
- •Missing JOIN condition?
- •Incorrect WHERE filter?
- •Missing GROUP BY?
- •Wrong aggregate function?
- •
Review the reference documentation for correct table/column names
- •
Regenerate SQL with corrections
- •
Re-execute and re-validate
Output Format
Default: Return Query Results
Present results in a clear format:
## Query Results **Question**: [User's original question] **Database**: pg_mcp_test_medium (ecommerce) **Generated SQL**: ```sql SELECT ...
Results (showing first N rows):
| Column1 | Column2 | ... |
|---|---|---|
| value1 | value2 | ... |
Summary: [Brief interpretation of the results]
### Alternative: Return SQL Only When user specifically requests just the SQL: ```markdown ## Generated SQL **Question**: [User's original question] **Database**: pg_mcp_test_medium (ecommerce) ```sql SELECT ...
Notes: [Any important considerations about the query]
## Error Handling ### Execution Errors If psql returns an error: 1. **Parse the error message** (column not found, relation not found, syntax error, etc.) 2. **Identify the cause** (typo, wrong schema, missing table, etc.) 3. **Consult reference documentation** 4. **Regenerate corrected SQL** 5. **Re-execute** ### Common Errors and Fixes | Error | Cause | Fix | |-------|-------|-----| | `relation "posts" does not exist` | Missing schema | Use `blog.posts` instead of `posts` | | `column "xxx" does not exist` | Typo or wrong table | Check reference for correct column name | | `syntax error at or near` | SQL syntax issue | Review and fix syntax | | `ambiguous column` | Same column in multiple tables | Use table alias (e.g., `p.id`) | ## Examples ### Example 1: Blog Query **User**: "Show me the top 5 most viewed published posts" **Process**: 1. Database: pg_mcp_test_small (blog context) 2. Reference: blog.posts has view_count, status columns 3. SQL: ```sql SELECT id, title, view_count, published_at FROM blog.posts WHERE status = 'published' ORDER BY view_count DESC LIMIT 5;
- •Security: PASS (SELECT only, no sensitive columns)
- •Execute and validate
Example 2: E-commerce Analytics
User: "What are the total sales by month for 2024?"
Process:
- •Database: pg_mcp_test_medium (ecommerce context)
- •Reference: ecommerce.orders has total_amount, created_at
- •SQL:
SELECT
date_trunc('month', created_at) as month,
COUNT(*) as order_count,
SUM(total_amount) as total_sales
FROM ecommerce.orders
WHERE created_at >= '2024-01-01'
AND created_at < '2025-01-01'
AND status NOT IN ('cancelled', 'refunded')
GROUP BY date_trunc('month', created_at)
ORDER BY month;
- •Security: PASS
- •Execute and validate
Example 3: ERP HR Query
User: "List employees in the Engineering department with their managers"
Process:
- •Database: pg_mcp_test_large (HR/ERP context)
- •Reference: erp.employees, erp.departments
- •SQL:
SELECT
e.employee_number,
e.first_name || ' ' || e.last_name as employee_name,
e.email,
d.name as department,
m.first_name || ' ' || m.last_name as manager_name
FROM erp.employees e
JOIN erp.departments d ON e.department_id = d.id
LEFT JOIN erp.employees m ON e.manager_id = m.id
WHERE d.name ILIKE '%engineering%'
AND e.employment_status = 'active'
ORDER BY e.last_name
LIMIT 100;
- •Security: PASS
- •Execute and validate
Troubleshooting
Query Returns No Results
- •Check if filters are too restrictive
- •Verify enum values match exactly (case-sensitive)
- •Check date ranges
- •Verify table has data:
SELECT COUNT(*) FROM schema.table
Query Too Slow
- •Add LIMIT clause
- •Filter on indexed columns
- •Avoid SELECT * - specify needed columns
- •Use views when available
Ambiguous Requirements
If the user's request is unclear:
- •Ask clarifying questions
- •Make reasonable assumptions and state them
- •Provide multiple query options if applicable