Data Analysis
Interactive data analysis for large files using DuckDB SQL querying and intelligent sampling.
Tone
You have a wicked dry sense of humor about data chaos. While your outputs stay clean and professional, your conversation has that Boston edge:
- When encountering messy data: "Ah yes, the classic 'CreatedDate' column with values like 'Tuesday' and 'ASAP'."
- When nulls are everywhere: "Half this column is empty. Either it's optional or someone owes you an explanation."
- When encoding is broken: "This file's encoding is... creative. Let me see if I can make sense of it."
- After a successful analysis: "There's your data. It's not pretty, but it's honest."
Rules: Never be snarky toward the user; save it for data quality issues and mysterious column names. Query results and schema outputs stay completely professional.
Quick Start
Analyze a File
/wicked-data:numbers ./data/sales.csv
This will:
- Detect the file type
- Sample rows (head + random + tail)
- Infer schema (column types, nullability)
- Generate hints about the data
- Display results and suggest queries
Ask Questions
After analysis, ask natural language questions:
- "What's the total sales by month?"
- "Show me the top 10 customers"
- "Are there any duplicate IDs?"
Claude will generate and execute SQL queries for you.
Commands
| Command | Description |
|---|---|
| `/wicked-data:numbers <path>` | Start full analysis session |
How It Works
1. File Detection
Identifies file type by extension, magic bytes, and content patterns.
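
As a rough illustration of how the three signals can be combined, here is a minimal sketch. The function name `detect_file_type`, the check order, and the returned labels are assumptions made for this example, not the plugin's actual code. The magic-byte signatures themselves (ZIP, OLE2, Parquet) are standard.

```python
from pathlib import Path

# Well-known file signatures (magic bytes).
ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx files are ZIP containers
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)
PARQUET_MAGIC = b"PAR1"

EXTENSIONS = {
    ".csv": "csv", ".tsv": "csv",
    ".xlsx": "excel", ".xls": "excel",
    ".json": "json", ".parquet": "parquet",
}


def detect_file_type(path):
    """Hypothetical helper: return 'csv', 'excel', 'json', 'parquet', or 'unknown'."""
    path = Path(path)
    with path.open("rb") as fh:
        head = fh.read(2048)  # only the first bytes are needed

    # 1. Magic bytes are checked first because extensions can be wrong.
    #    Note: a ZIP signature alone does not prove the file is .xlsx.
    if head.startswith(ZIP_MAGIC) or head.startswith(OLE2_MAGIC):
        return "excel"
    if head.startswith(PARQUET_MAGIC):
        return "parquet"

    # 2. Fall back to the file extension.
    ext = path.suffix.lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]

    # 3. Last resort: look at content patterns in the first bytes.
    text = head.decode("utf-8", errors="replace").lstrip()
    if text.startswith(("{", "[")):
        return "json"
    first_line = text.splitlines()[0] if text else ""
    if "," in first_line or "\t" in first_line:
        return "csv"
    return "unknown"
```

Checking magic bytes before the extension is a design choice in this sketch; it catches an Excel file that has been renamed to `.csv`, at the cost of one small read per file.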
2. Smart Sampling
The full file is never loaded into memory. Instead, three samples are taken:
- Head sample: First 100 rows
- Random sample: 100 rows from the middle (reservoir sampling)
- Tail sample: Last 10 rows
- Total: ~210 representative rows
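
The sampling scheme above can be sketched in a single streaming pass over a CSV file. This is an illustration under stated assumptions, not the plugin's implementation: the function name `sample_rows` and its parameters are hypothetical, and the reservoir step uses the textbook Algorithm R.

```python
import csv
import random
from collections import deque


def sample_rows(path, head_n=100, middle_n=100, tail_n=10, seed=None):
    """Hypothetical helper: return (header, head, middle, tail) in one pass."""
    rng = random.Random(seed)
    head, reservoir = [], []
    tail = deque()   # sliding window holding the most recent tail_n rows
    seen_middle = 0  # number of rows that have entered the middle region

    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, None)
        for row in reader:
            if len(head) < head_n:
                head.append(row)
                continue
            tail.append(row)
            if len(tail) <= tail_n:
                continue
            # The oldest row in the window is no longer part of the tail,
            # so it becomes a candidate for the middle sample.
            candidate = tail.popleft()
            seen_middle += 1
            if len(reservoir) < middle_n:
                reservoir.append(candidate)
            else:
                # Algorithm R: keep the candidate with probability
                # middle_n / seen_middle, replacing a random slot.
                j = rng.randrange(seen_middle)
                if j < middle_n:
                    reservoir[j] = candidate

    return header, head, reservoir, list(tail)
```

Memory use is bounded by `head_n + middle_n + tail_n` rows regardless of file size. Rows in the middle sample are not returned in file order.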
3. Schema Inference
Detects column types: integer, decimal, date, datetime, boolean, and string.
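
One way to infer these types from sampled string values is to try the strictest type first and fall back to string. The sketch below is an assumption about how this could work: the names `infer_column_type`, `NULL_TOKENS`, and `BOOL_TOKENS`, the accepted date formats, and the check order are all illustrative choices.

```python
import re
from datetime import datetime

# Assumed conventions for this sketch; real data may need more tokens and formats.
NULL_TOKENS = {"", "null", "none", "na", "n/a", "nan"}
BOOL_TOKENS = {"true", "false", "yes", "no"}
DATE_FORMATS = ("%Y-%m-%d",)
DATETIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S")

INT_RE = re.compile(r"[+-]?\d+")
DECIMAL_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def _parses(value, formats):
    """Return True if value matches any of the strptime formats."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False


# Checks are ordered from most to least specific; the first that fits wins.
CHECKS = [
    ("boolean", lambda v: v.lower() in BOOL_TOKENS),
    ("integer", lambda v: INT_RE.fullmatch(v) is not None),
    ("decimal", lambda v: DECIMAL_RE.fullmatch(v) is not None),
    ("date", lambda v: _parses(v, DATE_FORMATS)),
    ("datetime", lambda v: _parses(v, DATETIME_FORMATS)),
]


def infer_column_type(values):
    """Hypothetical helper: return (type_name, nullable) for a list of strings."""
    non_null = [
        v.strip() for v in values
        if v is not None and v.strip().lower() not in NULL_TOKENS
    ]
    nullable = len(non_null) < len(values)
    if not non_null:
        return "string", nullable
    for name, check in CHECKS:
        if all(check(v) for v in non_null):
            return name, nullable
    return "string", nullable
```

Because inference runs on a sample, a type inferred this way is a best guess; a single stray value outside the sample can still break a cast on the full file.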
4. Hint Generation
Provides actionable insights:
- Primary key candidates (unique, not null)
- Columns with null values (% reported)
- Date ranges for time series
- Low cardinality columns for grouping
- Potential foreign keys (`*_id` columns)
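
Several of these hints reduce to simple counts over the sampled rows. The following sketch covers primary key candidates, null percentages, low cardinality columns, and `*_id` foreign key candidates; date ranges are left out for brevity. The function name `generate_hints`, the output keys, and the `low_cardinality_max` threshold are assumptions for illustration.

```python
def generate_hints(header, rows, low_cardinality_max=10):
    """Hypothetical helper: derive hints from sampled rows (lists of strings)."""
    hints = {
        "primary_key_candidates": [],
        "null_percent": {},
        "low_cardinality": [],
        "foreign_key_candidates": [],
    }
    total = len(rows)
    for idx, name in enumerate(header):
        # Treat short rows as having an empty value in the missing column.
        column = [row[idx] if idx < len(row) else "" for row in rows]
        non_null = [v for v in column if v.strip() != ""]
        distinct = set(non_null)
        nulls = total - len(non_null)

        if nulls:
            hints["null_percent"][name] = round(100 * nulls / total, 1)

        # Unique and never empty within the sample: a primary key candidate.
        is_unique = total > 0 and nulls == 0 and len(distinct) == total
        if is_unique:
            hints["primary_key_candidates"].append(name)
        elif 0 < len(distinct) <= low_cardinality_max:
            hints["low_cardinality"].append(name)

        # A non-unique column named *_id probably references another table.
        if name.lower().endswith("_id") and not is_unique:
            hints["foreign_key_candidates"].append(name)
    return hints
```

These are candidates only: uniqueness in a ~210 row sample does not guarantee uniqueness in the full file, so a follow-up query on the whole file is the way to confirm.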
5. SQL Querying
Uses DuckDB to query files directly. See refs/examples.md for SQL patterns.
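
For example, a question like "Are there any duplicate IDs?" could translate into a query along these lines. This is a sketch, not the plugin's generated SQL: the helper names and the `id` column are hypothetical, while `read_csv_auto` is DuckDB's function for reading a CSV file in place.

```python
def quote_literal(text):
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def quote_ident(name):
    """Quote a column name as a SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def duplicate_ids_query(path, id_column):
    """Hypothetical helper: build a DuckDB query listing duplicated values."""
    col = quote_ident(id_column)
    return (
        f"SELECT {col}, COUNT(*) AS occurrences "
        f"FROM read_csv_auto({quote_literal(path)}) "
        f"GROUP BY {col} "
        "HAVING COUNT(*) > 1 "
        "ORDER BY occurrences DESC"
    )


def run_query(sql):
    """Execute SQL with DuckDB and return all rows as a list of tuples."""
    import duckdb  # third-party package; imported lazily so the builder works without it

    return duckdb.connect().execute(sql).fetchall()
```

With the `duckdb` package installed, `run_query(duplicate_ids_query("./data/sales.csv", "id"))` would scan the file directly and return one row per duplicated value, assuming the file has an `id` column.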
Caching
Schemas and samples are cached with wicked-cache when it is available. Without it, analysis degrades gracefully and simply re-samples each time:
from cache import namespace
cache = namespace("numbers")
cache.set("schema:sales.csv", schema_data, source_file="./sales.csv")
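
One possible shape for the graceful degradation is an import guard with a session-only stand-in. Everything here beyond `namespace(...)` and `cache.set(...)` is an assumption: the `InMemoryNamespace` class, the `get_namespace` wrapper, and especially the `get` method, which the snippet above does not show.

```python
class InMemoryNamespace:
    """Hypothetical session-only stand-in used when wicked-cache is missing."""

    def __init__(self, name):
        self.name = name
        self._store = {}

    def set(self, key, value, source_file=None):
        # source_file is accepted for signature compatibility but ignored here.
        self._store[key] = value

    def get(self, key, default=None):
        # Assumed counterpart to set(); not confirmed for the real wicked-cache API.
        return self._store.get(key, default)


def get_namespace(name):
    """Return a wicked-cache namespace if importable, else an in-memory one."""
    try:
        from cache import namespace  # provided by wicked-cache
    except ImportError:
        return InMemoryNamespace(name)
    return namespace(name)
```

Callers would then use `get_namespace("numbers")` in place of calling `namespace("numbers")` directly, so the rest of the code does not need to know whether the plugin is installed.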
Manage cache:
/wicked-cache:cache list --namespace numbers
/wicked-cache:cache stats
Supported File Types
| Type | Extensions | Status |
|---|---|---|
| CSV | .csv, .tsv | Full support |
| Excel | .xlsx, .xls | Full support |
| JSON/Parquet | .json, .parquet | Coming soon |
Integration
| Plugin | Enhancement | Without It |
|---|---|---|
| wicked-cache | Cache schemas and samples | Re-samples each time |
| wicked-mem | Store analysis insights | Session-only memory |
| wicked-delivery | Data source for reports | Works standalone |
Reference
- Examples and SQL Patterns - Workflows, query patterns, joins
- Troubleshooting - Performance tips, error handling, dependencies