data-governance-lineage

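Data governance lineage workflows for quantitative research, implementation, and production controls. Use when tasks involve schema contracts, freshness tracking, and lineage completeness.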

SKILL.md
--- frontmatter
name: data-governance-lineage
description: "Data Governance Lineage workflows for quantitative research, implementation, and production controls. Use when tasks involve schema contracts, freshness tracking, and lineage completeness."

Data Governance Lineage

objective

Execute data governance lineage work with reproducible research, explicit controls, and deployable outputs.

workflow

  1. define source contracts, schema versions, and freshness objectives.
  2. ingest data with replay support and deterministic normalization.
  3. validate keys, timestamps, and point-in-time join behavior.
  4. monitor quality metrics continuously and quarantine degraded feeds.
  5. publish only when lineage, ownership, and quality thresholds are satisfied.
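
Steps 1 and 3 of the workflow can be illustrated with a small contract check. This is a minimal sketch, not the skill's own implementation: `SourceContract`, `validate_batch`, and the `event_time` column are hypothetical names chosen for the example, and rows are assumed to be plain dictionaries.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SourceContract:
    """Hypothetical contract; every field name here is illustrative."""
    source: str
    schema_version: str
    required_columns: dict    # column name -> expected Python type
    key_columns: tuple        # columns that must be unique together
    max_staleness: timedelta  # freshness objective


def validate_batch(contract, rows, now):
    """Return a list of violations; an empty list means the batch passes."""
    violations = []
    seen_keys = set()
    newest = None
    for i, row in enumerate(rows):
        # Schema check: every required column is present with the right type.
        for col, typ in contract.required_columns.items():
            if col not in row:
                violations.append(f"row {i}: missing column {col!r}")
            elif not isinstance(row[col], typ):
                violations.append(f"row {i}: {col!r} is not {typ.__name__}")
        # Key check: the key columns must not repeat within the batch.
        key = tuple(row.get(k) for k in contract.key_columns)
        if key in seen_keys:
            violations.append(f"row {i}: duplicate key {key}")
        seen_keys.add(key)
        # Track the newest event time (assumed column name: event_time).
        ts = row.get("event_time")
        if isinstance(ts, datetime) and (newest is None or ts > newest):
            newest = ts
    # Freshness check: the newest record must fall inside the objective.
    if newest is None or now - newest > contract.max_staleness:
        violations.append("freshness objective missed")
    return violations


contract = SourceContract(
    source="prices_vendor_a",  # hypothetical source name
    schema_version="1.0.0",
    required_columns={"symbol": str, "event_time": datetime, "close": float},
    key_columns=("symbol", "event_time"),
    max_staleness=timedelta(hours=1),
)
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [{"symbol": "AAA", "event_time": now - timedelta(minutes=5), "close": 10.0}]
problems = validate_batch(contract, rows, now)  # empty list for this batch
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic under replay, which is what step 2 asks for.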

required diagnostics

  • freshness, completeness, null-rate, and duplicate-rate trends.
  • schema drift and breaking-change frequency across sources.
  • point-in-time join integrity for features and labels.
  • backfill and replay consistency versus canonical snapshots.
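
The diagnostics above can be sketched in a few functions. The code below is an illustrative assumption, not the contents of scripts/data_governance_lineage_diagnostics.py: the function names, the dictionary-per-row layout, and the metric keys are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def quality_metrics(rows, key_columns, time_column, now):
    """Null rate, duplicate rate, and freshness lag for one batch of dict rows."""
    total = len(rows)
    if total == 0:
        return {"rows": 0, "null_rate": None,
                "duplicate_rate": None, "freshness_lag_seconds": None}
    cells = sum(len(r) for r in rows)
    nulls = sum(1 for r in rows for v in r.values() if v is None)
    # Rows sharing the same key tuple count as duplicates of each other.
    distinct = len({tuple(r.get(k) for k in key_columns) for r in rows})
    times = [r[time_column] for r in rows if r.get(time_column) is not None]
    lag = (now - max(times)).total_seconds() if times else None
    return {
        "rows": total,
        "null_rate": nulls / cells if cells else 0.0,
        "duplicate_rate": (total - distinct) / total,
        "freshness_lag_seconds": lag,
    }


def asof_feature(feature_times, feature_values, as_of):
    """Latest feature value stamped at or before as_of, else None.

    feature_times must be sorted ascending and aligned with feature_values.
    """
    idx = bisect_right(feature_times, as_of)
    return feature_values[idx - 1] if idx else None


def lookahead_rows(joined):
    """Rows of an existing join whose feature is newer than its label."""
    return [r for r in joined if r["feature_time"] > r["label_time"]]
```

`asof_feature` builds a point-in-time join one lookup at a time, while `lookahead_rows` audits a join that already exists; any row it returns means a label saw a feature value from its own future. Recording `quality_metrics` per batch over time gives the trend view the first bullet asks for.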

risk controls

  • enforce hard thresholds for freshness and data-quality metrics.
  • enforce quarantine and fallback paths for corrupted feeds.
  • enforce full lineage metadata before downstream release.
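
The three controls above can be combined into a single release gate. The following is a sketch under stated assumptions: the threshold values, the required lineage fields, and the names `gate` and `select_snapshot` are hypothetical and would need to be replaced with whatever the source contracts actually specify.

```python
from enum import Enum


class Decision(Enum):
    RELEASE = "release"
    QUARANTINE = "quarantine"


# Hypothetical hard limits; real values belong in the source contract.
THRESHOLDS = {
    "freshness_lag_seconds": 3600.0,
    "null_rate": 0.01,
    "duplicate_rate": 0.0,
}
# Hypothetical lineage fields that must be filled in before release.
REQUIRED_LINEAGE = ("source", "schema_version", "owner", "run_id")


def gate(metrics, lineage, thresholds=THRESHOLDS):
    """Return (decision, reasons); any single breach quarantines the batch."""
    reasons = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A missing metric is treated as a failure, not as a pass.
            reasons.append(f"{name}: metric missing")
        elif value > limit:
            reasons.append(f"{name}: {value} exceeds {limit}")
    for field in REQUIRED_LINEAGE:
        if not lineage.get(field):
            reasons.append(f"lineage: {field} missing")
    decision = Decision.QUARANTINE if reasons else Decision.RELEASE
    return decision, reasons


def select_snapshot(decision, candidate, last_good):
    """Fallback path: serve the last good snapshot while a feed is quarantined."""
    return candidate if decision is Decision.RELEASE else last_good
```

Treating a missing metric as a breach is a deliberate choice in this sketch: a hard threshold that passes silently when the measurement is absent is not a hard threshold.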

outputs

  • run python scripts/data_governance_lineage_diagnostics.py input.csv --output diagnostics.json and keep the resulting JSON artifact.
  • write an implementation memo using references/data-governance-lineage-playbook.md with assumptions, tests, limits, and rollout plan.

resources

  • use scripts/data_governance_lineage_diagnostics.py for deterministic diagnostics.
  • use references/data-governance-lineage-playbook.md for the domain-specific checklist and delivery structure.