DuckDB Query Skill
Execute SQL queries against the DuckDB events database using DuckDB's powerful analytical features.
Database Context
This project uses DuckDB to store events with the following schema:
sql
-- events table
CREATE TABLE events (
id BIGINT PRIMARY KEY,
event_type VARCHAR NOT NULL,
source VARCHAR NOT NULL,
received_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
payload JSON NOT NULL
);
-- Indexes: idx_events_type, idx_events_received_at
Database Location
The DuckDB database file location is configured in the config file (default: events.duckdb). Check the active config to find the current database file path.
Usage
When the user provides a query or analysis request, execute it against the DuckDB database:
bash
# Direct SQL query duckdb events.duckdb "SELECT * FROM events LIMIT 10" # With formatting duckdb events.duckdb -markdown "SELECT event_type, COUNT(*) as count FROM events GROUP BY event_type" # With JSON output duckdb events.duckdb -json "SELECT * FROM events WHERE id = 1"
Query Execution Guidelines
- •Find the database file: Check
config.yamlor.events.yamlfordb.filesetting - •Use appropriate output format:
- •
-markdownfor tabular results (best for displaying to user) - •
-jsonfor structured data - •
-csvfor export - •Default is ASCII table
- •
- •Always use LIMIT for exploratory queries to prevent overwhelming output
- •Quote the SQL query to handle special characters
DuckDB-Specific Features
DuckDB provides powerful SQL extensions beyond standard SQL:
JSON Querying
sql
-- Extract JSON fields from payload
SELECT
id,
json_extract(payload, '$.userId') as user_id,
json_extract(payload, '$.email') as email
FROM events
WHERE event_type = 'user.created';
-- Query nested JSON structures
SELECT json_extract_string(payload, '$.user.name') FROM events;
Time Series Analysis
sql
-- Events per hour with window functions
SELECT
date_trunc('hour', received_at) as hour,
COUNT(*) as events,
LAG(COUNT(*)) OVER (ORDER BY date_trunc('hour', received_at)) as prev_hour
FROM events
GROUP BY hour
ORDER BY hour DESC;
Advanced Aggregations
sql
-- GROUPING SETS for multi-dimensional analysis
SELECT
event_type,
source,
COUNT(*) as count
FROM events
GROUP BY GROUPING SETS (
(event_type, source),
(event_type),
(source),
()
);
Data Exploration
sql
-- Quick table summary
SUMMARIZE events;
-- Column statistics
SELECT
COUNT(DISTINCT event_type) as unique_types,
COUNT(DISTINCT source) as unique_sources,
MIN(received_at) as first_event,
MAX(received_at) as last_event
FROM events;
PIVOT for Report Generation
sql
-- Pivot events by source and type PIVOT events ON event_type USING COUNT(*) GROUP BY source;
Window Functions with QUALIFY
sql
-- Latest event per type SELECT event_type, source, received_at, payload FROM events QUALIFY ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY received_at DESC) = 1;
Common Query Patterns
List all event types
bash
duckdb events.duckdb -markdown "SELECT DISTINCT event_type FROM events ORDER BY event_type"
Events in time range
bash
duckdb events.duckdb -markdown "SELECT * FROM events WHERE received_at >= '2024-01-01' AND received_at < '2024-02-01' LIMIT 100"
Count by source and type
bash
duckdb events.duckdb -markdown "SELECT source, event_type, COUNT(*) as count FROM events GROUP BY source, event_type ORDER BY count DESC"
Recent events
bash
duckdb events.duckdb -markdown "SELECT id, event_type, source, received_at FROM events ORDER BY received_at DESC LIMIT 20"
Search JSON payload
bash
duckdb events.duckdb -markdown "SELECT * FROM events WHERE json_extract_string(payload, '$.userId') = '123'"
Metadata Queries
sql
-- Show table information SELECT * FROM information_schema.tables WHERE table_schema = 'main'; -- Show columns SELECT * FROM information_schema.columns WHERE table_name = 'events'; -- Show indexes PRAGMA show_tables;
Query Analysis
Use EXPLAIN to analyze query performance:
bash
duckdb events.duckdb "EXPLAIN SELECT * FROM events WHERE event_type = 'user.created'" # With execution stats duckdb events.duckdb "EXPLAIN ANALYZE SELECT COUNT(*) FROM events GROUP BY event_type"
Output to User
After executing queries:
- •Present results in a readable format (markdown tables work well)
- •Provide context about what the results show
- •Suggest follow-up queries if patterns emerge
- •Warn if result set is truncated
Error Handling
If a query fails:
- •Check database file path exists
- •Verify table/column names
- •Ensure JSON paths are valid
- •Check SQL syntax for DuckDB compatibility
Reference
For more DuckDB SQL features, see: https://duckdb.org/docs/stable/
Key documentation sections:
- •SQL Functions: https://duckdb.org/docs/sql/functions/overview
- •JSON Functions: https://duckdb.org/docs/extensions/json
- •Window Functions: https://duckdb.org/docs/sql/window_functions
- •Data Import: https://duckdb.org/docs/data/overview