AgentSkillsCN

Analyze Data

为 CSV 和 JSON 文件提供即时的数据科学洞察,包括统计摘要、质量审核以及趋势分析。

SKILL.md
--- frontmatter
name: Analyze Data
version: 0.1.0
description: Instant data science insights for CSV and JSON files with statistical summaries, quality audits, and trend detection
author: skills.md
category: Data & Analytics
tags:
  - data
  - analytics
  - statistics
  - csv
  - json
  - insights
  - quality-assurance
credits: 5

Analyze Data

Instant data science insights for your CSV and JSON files. Automatically generates comprehensive reports including statistical summaries, data quality audits, correlation matrices, and trend detection without sending your data to external APIs.

This is a CLI skill. It requires the skills CLI to execute. Install it with npm install -g @hasna/skills, then run the commands below.

Features

  • Statistical Profiling: Calculates mean, median, mode, standard deviation, quartiles, and custom percentiles for all numeric columns.
  • Data Quality Audit: Detects missing values, duplicates, mixed data types, and outliers to assess dataset health.
  • Correlation Discovery: Identifies relationships between variables using Pearson correlation coefficients.
  • Trend Detection: Automatically spots time-series patterns, seasonality, and growth trends.
  • Visualization Ready: Recommends the best chart types (histograms, line charts, scatter plots) based on data distribution.
  • Multi-Format Reports: Exports findings as interactive HTML, structured JSON, or readable Markdown.
  • Scalable: Efficiently processes large files using stream processing and sampling options.

Usage

bash
skills run analyze-data -- <file-path> [options]

Options

OptionDescriptionDefault
--formatOutput format: markdown, json, or htmlmarkdown
--outputSave report to file path(prints to console)
--correlationsCalculate correlation matrix for numeric columnsfalse
--outliersDetect and report outliers using IQR methodfalse
--trendsAnalyze trends if time-series data is detectedfalse
--sampleAnalyze only the first N rows (for large files)(all rows)
--percentilesCustom percentiles to calculate (comma-separated)25,50,75,90,95
--verboseShow detailed processing logsfalse

Examples

Quick Health Check

bash
# Analyze a CSV file and print report to console
skills run analyze-data -- sales-data.csv

Deep Dive Analysis

bash
# Full analysis with correlations, outliers, and HTML report
skills run analyze-data -- dataset.csv \
  --correlations \
  --outliers \
  --format html \
  --output report.html

Large File Sampling

bash
# Quickly analyze the first 10,000 rows of a large dataset
skills run analyze-data -- huge-log-file.json \
  --sample 10000 \
  --verbose

JSON Export for Pipelines

bash
# Generate JSON stats for downstream processing
skills run analyze-data -- metrics.json \
  --format json \
  --output analysis.json

Report Sections

  1. Dataset Overview: File size, row/column counts, memory usage.
  2. Column Analysis: Data types, unique values, null counts, and sample data.
  3. Statistical Summary: Detailed metrics for numeric columns (mean, std dev, etc.).
  4. Data Quality: Quality score (0-100), missing data summary, and duplicate detection.
  5. Correlations: (Optional) Matrix of relationships between variables.
  6. Trends: (Optional) Time-series patterns and anomalies.
  7. Visualization Recommendations: Suggested charts for exploring the data.
  8. Key Insights: Auto-generated observations about the dataset.

Supported Formats

  • CSV: Standard comma-separated, TSV, and custom delimiters.
  • JSON: Arrays of objects or objects containing data arrays.

Performance

  • Small (<10MB): Instant analysis.
  • Medium (10-100MB): 2-10 seconds.
  • Large (>100MB): Use --sample for best performance.

Requirements

  • Bun runtime.
  • No external API keys required (runs locally).