data-analysis
Overview
Use this skill to analyze tabular data in the workspace using DuckDB tools, optionally produce a Plotly visualization, and deliver a concise, evidence-based summary that references any generated artifacts.
Standards (required)
- Summary and insights must be valid Markdown, not raw paragraphs. Use headings (`###`) and bullet lists (`-`) so the report renders cleanly.
- Every insight must cite concrete evidence from queries (counts, averages, deltas, correlations, percent shares). No speculation.
- Name artifacts and charts clearly using Title_Case_With_Underscores and reference them explicitly in the summary.
- State limitations if data is incomplete or if a question cannot be answered from available tables.
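
The naming standard above can be checked mechanically. The sketch below is one possible reading of it, assuming that Title_Case_With_Underscores means underscore-separated words that each start with an uppercase letter or a digit. The helper names `is_title_case_with_underscores` and `to_title_case_with_underscores` are hypothetical and are not part of the skill's tools.

```python
import re

# Assumption: each underscore-separated word starts with an uppercase letter
# or a digit and contains only letters/digits (e.g. "Revenue_By_Region_2024").
_NAME_RE = re.compile(r"[A-Z0-9][A-Za-z0-9]*(?:_[A-Z0-9][A-Za-z0-9]*)*")


def is_title_case_with_underscores(name: str) -> bool:
    """Return True if `name` matches the assumed naming pattern."""
    return _NAME_RE.fullmatch(name) is not None


def to_title_case_with_underscores(text: str) -> str:
    """Convert free text such as 'revenue by region' to 'Revenue_By_Region'."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    # Uppercase only the first character so acronyms like "USA" survive.
    return "_".join(word[0].upper() + word[1:] for word in words)
```

A name produced by the converter can be passed through the checker before it is used as a chart title or artifact file name.
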
Default analysis pack (use unless user asks for a quick answer)
- Dataset snapshot: row count, key columns, and basic distribution for the main outcome.
- Segmentation: group by primary category (or tiers) and compute core metrics.
- Drivers: quantify top positive/negative relationships (correlation or ranked deltas).
- Interactions: at least one two-factor combination or tiered interaction (e.g., high/low bins).
- Outliers/variance: highlight spread changes across segments (box plot or variance stats).
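
As an illustration of the snapshot, segmentation, and interaction items above, here is a minimal sketch of the kind of SQL involved. Everything in it is an assumption made for the example: the `orders` table, its columns, the 0.10 discount threshold, and the toy rows are made up, and `sqlite3` from the standard library stands in for DuckDB only so the snippet runs without extra dependencies. In the skill itself, such queries would go through the provided query tool rather than a direct connection.

```python
import sqlite3

# sqlite3 stands in for DuckDB here; the table, columns, and rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, discount REAL, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("North", 0.00, 120.0),
        ("North", 0.20, 80.0),
        ("South", 0.05, 200.0),
        ("South", 0.25, 100.0),
        ("South", 0.30, 60.0),
    ],
)

# Dataset snapshot: row count and basic distribution of the main outcome.
SNAPSHOT_SQL = """
SELECT COUNT(*) AS n_rows, MIN(revenue) AS min_revenue,
       AVG(revenue) AS avg_revenue, MAX(revenue) AS max_revenue
FROM orders
LIMIT 1000
"""

# Segmentation: core metrics per primary category.
SEGMENT_SQL = """
SELECT region, COUNT(*) AS n_orders, AVG(revenue) AS avg_revenue
FROM orders
GROUP BY region
ORDER BY region
LIMIT 1000
"""

# Interaction: region crossed with a high/low discount bin.
INTERACTION_SQL = """
SELECT region,
       CASE WHEN discount >= 0.10 THEN 'high' ELSE 'low' END AS discount_tier,
       COUNT(*) AS n_orders, AVG(revenue) AS avg_revenue
FROM orders
GROUP BY region, discount_tier
ORDER BY region, discount_tier
LIMIT 1000
"""

snapshot = conn.execute(SNAPSHOT_SQL).fetchall()
segments = conn.execute(SEGMENT_SQL).fetchall()
interactions = conn.execute(INTERACTION_SQL).fetchall()
```

Each query names its columns explicitly and carries `LIMIT 1000`, and each result row gives a count and an average that an insight can cite directly.
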
Workflow
- Scope & schema
  - Use `get_table_schema` to inspect only the tables needed for the question.
  - Note relevant tables, key columns, joins, date fields, and metrics before querying.
- Query
  - Use `run_sql_query` to answer the question with focused SQL.
  - Always specify columns (no `SELECT *`) and include `LIMIT 1000`.
  - You may run up to 5 queries total; refine only when needed.
  - Prefer a segmentation query and at least one interaction/combination query when deeper analysis is requested.
  - After each query, briefly interpret results and decide whether another refinement is required.
- Chart (optional)
  - If a chart helps, use `generate_chart_config` on the latest `df`.
  - Plotly only. Assign a serializable `chart_config` and keep logic deterministic.
  - Plotly outputs will be saved as `.plotly.json` and `.plotly.html` in `charts/`.
  - Use descriptive `chart_title` names (Title_Case_With_Underscores).
  - Do not generate more than 3 charts.
- Artifacts (optional)
  - If artifacts are useful (CSV extracts, HTML previews, Markdown notes), write them to disk using the provided helpers.
  - Keep artifacts minimal and clearly named; explain why each artifact exists.
- Summary
  - Call `generate_summary` once after queries (and any chart).
  - Include tables used, filters, metrics, and artifacts created.
  - Provide at least two concrete insights (or explain if none exist).
  - Structure output like:
    - `### Summary` with 2-4 bullets
    - `### Key Insights` with 3-6 bullets
  - Mention any artifacts (charts/files) explicitly so the user knows what to open.
  - Note: `generate_summary` will also create a report file in `reports/` with the analysis steps and outputs.
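
Several rules in the workflow above are mechanical enough to sketch in code. The block below is an illustration under stated assumptions, not the skill's implementation: it does not call `run_sql_query`, `generate_chart_config`, or `generate_summary` (their signatures are not documented here), and the helpers `check_query`, `build_chart_config`, and `render_summary` are hypothetical names. The chart is written as a plain dict in Plotly's figure JSON layout (`data` plus `layout`), so no Plotly import is needed to show that it serializes.

```python
import json
import re

MAX_QUERIES = 5  # query budget from the Query step


def check_query(sql: str, queries_run: int) -> list[str]:
    """Return rule violations for a candidate query (empty list if none).

    Simple text heuristics only; e.g. `SELECT t.*` is not detected.
    """
    problems = []
    if queries_run >= MAX_QUERIES:
        problems.append("query budget of 5 exhausted")
    if re.search(r"\bSELECT\s+\*", sql, flags=re.IGNORECASE):
        problems.append("uses SELECT *")
    if not re.search(r"\bLIMIT\s+1000\b", sql, flags=re.IGNORECASE):
        problems.append("missing LIMIT 1000")
    return problems


def build_chart_config(chart_title: str, x: list, y: list) -> dict:
    """Build a bar chart as a plain, JSON-serializable dict."""
    chart_config = {
        "data": [{"type": "bar", "x": list(x), "y": list(y)}],
        "layout": {"title": {"text": chart_title}},
    }
    json.dumps(chart_config)  # raises TypeError if something is not serializable
    return chart_config


def render_summary(summary: list[str], insights: list[str]) -> str:
    """Render the two required Markdown sections, enforcing bullet counts."""
    if not 2 <= len(summary) <= 4:
        raise ValueError("### Summary needs 2-4 bullets")
    if not 3 <= len(insights) <= 6:
        raise ValueError("### Key Insights needs 3-6 bullets")
    lines = ["### Summary", *(f"- {item}" for item in summary), ""]
    lines += ["### Key Insights", *(f"- {item}" for item in insights)]
    return "\n".join(lines)
```

A check like `check_query(sql, queries_run=2)` can run before each query so that a violation is caught before any of the five-query budget is spent.
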
Guardrails
- Order is mandatory: schema -> SQL -> chart (optional) -> artifacts (optional) -> summary.
- No direct file I/O or Python reads of raw data files.
- Use only the data tools provided (via `data_agent_tools`).
- Explain intent briefly before each tool call; stop after `generate_summary` succeeds.
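
The ordering guardrail can be expressed as a small check over a log of stage names. This is a sketch under assumptions: the stage labels and the `validate_order` helper are hypothetical, and it reads the rule strictly, so a stage may repeat (for example several SQL queries) but the sequence never moves backwards, and the summary appears exactly once as the final step.

```python
# Stage names mirror the mandatory order; "chart" and "artifacts" are optional.
STAGES = ["schema", "sql", "chart", "artifacts", "summary"]
REQUIRED = {"schema", "sql", "summary"}


def validate_order(steps: list[str]) -> bool:
    """Check that a sequence of stage names respects the mandatory order."""
    if any(step not in STAGES for step in steps):
        return False
    ranks = [STAGES.index(step) for step in steps]
    # Each stage must come at or after the position of the previous one.
    never_backwards = all(a <= b for a, b in zip(ranks, ranks[1:]))
    # The summary is produced once, and nothing may follow it.
    summary_once_and_last = steps.count("summary") == 1 and steps[-1] == "summary"
    return never_backwards and summary_once_and_last and REQUIRED <= set(steps)
```

For example, `["schema", "sql", "sql", "chart", "summary"]` passes, while a log that runs SQL before inspecting the schema, or that continues after the summary, does not.
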