data

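Core data engineering capabilities: dataset profiling, schema validation, and data quality assessment. Use when analyzing datasets, validating schemas, or assessing data quality.

Use when:
- "profile this dataset"
- "validate schema"
- "check data quality"
- "what's in this CSV/Excel file"
- "analyze this data"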

SKILL.md
---
name: data
description: |
  Core data engineering capabilities: dataset profiling, schema validation, and data quality assessment.
  Use when analyzing datasets, validating schemas, or assessing data quality.

  Use when:
  - "profile this dataset"
  - "validate schema"
  - "check data quality"
  - "what's in this CSV/Excel file"
  - "analyze this data"
---

Data Engineering Skill

Core data engineering operations for profiling, validation, and quality assessment.

Quick Start

Profile a Dataset

```bash
/wicked-data:data profile path/to/data.csv
```

This will:

  1. Detect file format
  2. Sample rows (head/random/tail)
  3. Infer schema and types
  4. Calculate quality metrics
  5. Generate recommendations
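
The first three steps can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the plugin's actual implementation: the helper names `detect_format`, `sample_rows`, and `infer_type` are hypothetical, format detection is reduced to a file-extension lookup, and only the integer, decimal, and string types are inferred.

```python
import csv
import io
import random
from pathlib import Path

def detect_format(path: str) -> str:
    """Guess the file format from the extension (a deliberate simplification)."""
    known = {".csv": "csv", ".tsv": "tsv", ".xlsx": "excel", ".xls": "excel"}
    return known.get(Path(path).suffix.lower(), "unknown")

def sample_rows(rows: list, n: int = 5, seed: int = 0) -> dict:
    """Return head, random, and tail samples of at most n rows each."""
    rng = random.Random(seed)
    return {
        "head": rows[:n],
        "random": rng.sample(rows, min(n, len(rows))),
        "tail": rows[-n:],
    }

def infer_type(values: list) -> str:
    """Pick the narrowest type that fits every non-empty string value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for name, cast in (("integer", int), ("decimal", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

# Tiny in-memory CSV so the sketch runs without a file on disk.
csv_text = "id,price,city\n1,9.50,Oslo\n2,,Lima\n3,12,Rome\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
schema = {col: infer_type([r[col] for r in rows]) for col in rows[0]}
print(detect_format("path/to/data.csv"), schema)
```

Trying the narrowest type first means a column is promoted to decimal or string only when a value fails to parse, and empty cells are ignored so that nulls do not widen the type.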

Validate Schema

```bash
/wicked-data:data validate --schema schema.json --data data.csv
```

Checks: Column presence, type conformance, constraint validation, nullability rules.

Assess Quality

```bash
/wicked-data:data quality data.csv
```

Reports on: Completeness (null rates), Uniqueness (duplicates), Validity (constraints), Consistency (cross-field checks).

Commands

| Command | Purpose |
|---------|---------|
| /wicked-data:data profile <path> | Profile dataset structure and quality |
| /wicked-data:data validate | Validate data against schema |
| /wicked-data:data quality <path> | Generate quality report |

Dataset Profiling

Uses the data_profiler.py script:

```bash
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/data_profiler.py" \
  --input data.csv --output profile.json
```

Output includes:

  • Row count and column count
  • Column types (inferred)
  • Null rates per column
  • Cardinality (distinct values)
  • Sample values
  • Statistical summaries for numeric columns
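
To make these fields concrete, here is a rough sketch of how such a profile could be computed for rows already loaded as dictionaries. The function `profile_columns` and the key names in its result are assumptions for illustration; the real layout of profile.json is not shown in this document and may differ. Type inference is left out to keep the example short.

```python
import statistics

def profile_columns(rows: list) -> dict:
    """Compute counts, null rates, cardinality, samples, and numeric stats."""
    columns = list(rows[0].keys()) if rows else []
    profile = {"row_count": len(rows), "column_count": len(columns), "columns": {}}
    for col in columns:
        values = [row[col] for row in rows]
        # Treat None and the empty string as null (an assumption).
        present = [v for v in values if v not in (None, "")]
        entry = {
            "null_rate": 1 - len(present) / len(values),
            "cardinality": len(set(present)),
            "sample_values": present[:3],
        }
        numeric = [v for v in present if isinstance(v, (int, float))]
        if present and len(numeric) == len(present):
            entry["stats"] = {
                "min": min(numeric),
                "max": max(numeric),
                "mean": statistics.mean(numeric),
            }
        profile["columns"][col] = entry
    return profile

rows = [
    {"id": 1, "amount": 10.0, "city": "Oslo"},
    {"id": 2, "amount": None, "city": "Lima"},
    {"id": 3, "amount": 30.0, "city": "Oslo"},
    {"id": 4, "amount": 20.0, "city": ""},
]
profile = profile_columns(rows)
print(profile["columns"]["amount"])
```

Statistical summaries are attached only when every non-null value in a column is numeric, so text columns carry just the null rate, cardinality, and samples.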

Schema Validation

Uses the schema_validator.py script. Define expected columns with:

  • name: Column name
  • type: integer, string, decimal, datetime, date
  • nullable: true/false
  • constraints: unique, min, max, pattern, enum

See examples for schema format.
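
As a rough illustration of how these four fields could drive validation, the sketch below checks rows read from a CSV (so every value is a string) against a schema dictionary. The schema layout, with a top-level `columns` list and a `constraints` mapping, and the `validate` function are assumptions made for this example; the format that schema_validator.py actually expects is defined in the examples and may differ.

```python
import re
from datetime import date, datetime
from decimal import Decimal, InvalidOperation

# One parser per type name listed above; each raises on a bad value.
PARSERS = {
    "integer": int,
    "string": str,
    "decimal": Decimal,
    "datetime": datetime.fromisoformat,
    "date": date.fromisoformat,
}

def validate(rows: list, schema: dict) -> list:
    """Return a list of human-readable error strings (empty if all pass)."""
    errors = []
    for spec in schema["columns"]:
        name, rules = spec["name"], spec.get("constraints", {})
        if rows and name not in rows[0]:
            errors.append(f"{name}: column missing")
            continue
        seen = set()
        for i, row in enumerate(rows, start=1):
            raw = row.get(name)
            if raw in (None, ""):
                if not spec.get("nullable", True):
                    errors.append(f"{name} row {i}: null not allowed")
                continue
            try:
                value = PARSERS[spec["type"]](raw)
            except (ValueError, InvalidOperation):
                errors.append(f"{name} row {i}: not a valid {spec['type']}")
                continue
            if rules.get("unique") and value in seen:
                errors.append(f"{name} row {i}: duplicate value")
            seen.add(value)
            if "min" in rules and value < rules["min"]:
                errors.append(f"{name} row {i}: below min")
            if "max" in rules and value > rules["max"]:
                errors.append(f"{name} row {i}: above max")
            if "pattern" in rules and not re.fullmatch(rules["pattern"], raw):
                errors.append(f"{name} row {i}: pattern mismatch")
            if "enum" in rules and raw not in rules["enum"]:
                errors.append(f"{name} row {i}: not in enum")
    return errors

schema = {"columns": [
    {"name": "id", "type": "integer", "nullable": False,
     "constraints": {"unique": True, "min": 1}},
    {"name": "status", "type": "string", "nullable": False,
     "constraints": {"enum": ["open", "closed"]}},
    {"name": "created", "type": "date", "nullable": True},
]}
rows = [
    {"id": "1", "status": "open", "created": "2024-01-05"},
    {"id": "1", "status": "done", "created": ""},
    {"id": "x", "status": "", "created": "2024-13-01"},
]
errors = validate(rows, schema)
print("\n".join(errors))
```

Collecting every error instead of stopping at the first one lets a single run report column presence, type, nullability, and constraint problems together.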

Quality Dimensions

| Dimension | Metric | Threshold |
|-----------|--------|-----------|
| Completeness | Null rate | <5% |
| Uniqueness | Duplicate rate | <1% |
| Validity | Type conformance | 100% |
| Consistency | Cross-field rules | 100% |
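
One way to read these thresholds is as pass/fail checks over four rates. The sketch below is an assumption-laden illustration, not the plugin's scoring logic: `quality_report` is a hypothetical helper, the null rate is computed over all cells rather than per column, a duplicate is a row identical to an earlier one, and the validity and consistency rules are supplied by the caller.

```python
def quality_report(rows: list, is_valid, is_consistent) -> tuple:
    """Score rows on the four dimensions and compare against the thresholds."""
    total = len(rows)
    cells = [value for row in rows for value in row.values()]
    null_rate = sum(value in (None, "") for value in cells) / len(cells)
    distinct = len({tuple(row.items()) for row in rows})
    duplicate_rate = (total - distinct) / total
    validity = sum(1 for row in rows if is_valid(row)) / total
    consistency = sum(1 for row in rows if is_consistent(row)) / total
    metrics = {
        "null_rate": null_rate,
        "duplicate_rate": duplicate_rate,
        "validity": validity,
        "consistency": consistency,
    }
    passed = {
        "completeness": null_rate < 0.05,      # null rate below 5%
        "uniqueness": duplicate_rate < 0.01,   # duplicate rate below 1%
        "validity": validity == 1.0,           # every row conforms
        "consistency": consistency == 1.0,     # every cross-field rule holds
    }
    return metrics, passed

rows = [
    {"id": "1", "start": "2024-01-01", "end": "2024-01-05"},
    {"id": "2", "start": "2024-02-01", "end": "2024-01-15"},  # ends before it starts
    {"id": "3", "start": "2024-03-01", "end": ""},            # missing end date
    {"id": "1", "start": "2024-01-01", "end": "2024-01-05"},  # exact duplicate
]
metrics, passed = quality_report(
    rows,
    is_valid=lambda r: r["id"].isdigit(),
    # ISO dates compare correctly as strings; a missing end is not a conflict.
    is_consistent=lambda r: not r["end"] or r["end"] >= r["start"],
)
print(metrics)
print(passed)
```

In this sample only validity passes: one of twelve cells is empty, one of four rows is a duplicate, and one row has an end date earlier than its start date.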

Integration

| Plugin | Enhancement |
|--------|-------------|
| wicked-data:numbers | Use for SQL-based profiling of large files |
| wicked-cache | Cache profiling results for repeat analysis |
| wicked-kanban | Document quality issues as tasks |
| wicked-mem | Store quality patterns across sessions |

Large Files

For files >1GB, use wicked-data:numbers for efficient SQL-based profiling:

```bash
/wicked-data:numbers large_file.csv
```
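
If the choice needs to be automated, a small size check is enough. The helper below is a hypothetical sketch: `choose_command` is not part of the plugin, and treating 1GB as 1024**3 bytes is an assumption, since the document does not say whether the limit is decimal or binary.

```python
import os

LARGE_FILE_BYTES = 1024 ** 3  # assumed meaning of "1GB"

def choose_command(path: str, size: int = None) -> str:
    """Pick the profiling command by file size; size can be passed for testing."""
    if size is None:
        size = os.path.getsize(path)
    if size > LARGE_FILE_BYTES:
        return f"/wicked-data:numbers {path}"
    return f"/wicked-data:data profile {path}"

print(choose_command("large_file.csv", size=5 * 1024 ** 3))
print(choose_command("data.csv", size=2048))
```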

Output Standards

All reports include:

  • Summary: High-level findings
  • Metrics: Quantitative measurements
  • Issues: Prioritized problems
  • Recommendations: Actionable next steps
  • Confidence: Assessment reliability
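
The five sections map naturally onto a small data structure. The following is only a sketch of one possible shape: the `Report` and `Issue` classes, the numeric priority scale, and the string confidence levels are all assumptions, since the document does not specify how reports are represented.

```python
from dataclasses import asdict, dataclass, field

@dataclass
class Issue:
    description: str
    priority: int  # 1 = most urgent (hypothetical scale)

@dataclass
class Report:
    summary: str                                         # high-level findings
    metrics: dict                                        # quantitative measurements
    issues: list = field(default_factory=list)           # prioritized problems
    recommendations: list = field(default_factory=list)  # actionable next steps
    confidence: str = "medium"                           # assessment reliability

    def to_dict(self) -> dict:
        """Convert to plain dicts, with issues ordered by priority."""
        data = asdict(self)
        data["issues"] = sorted(data["issues"], key=lambda i: i["priority"])
        return data

report = Report(
    summary="Null rate exceeds the completeness threshold in one column.",
    metrics={"null_rate": 0.08, "duplicate_rate": 0.0},
    issues=[
        Issue("2 rows have whitespace-padded ids", priority=2),
        Issue("amount is null in 8% of rows", priority=1),
    ],
    recommendations=["Backfill amount from the source system"],
    confidence="high",
)
print(report.to_dict()["issues"][0]["description"])
```

Sorting in `to_dict` keeps the most urgent issue first regardless of the order in which issues were recorded.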

Reference

For detailed examples and patterns:

  • Examples - Profile output, schema format, quality report