Data Lineage

Track and visualize data flow end to end, from source to destination, across data pipelines.

SKILL.md
---
name: Data Lineage
description: Track and visualize data flow from source to destination across pipelines
category: data
version: 1.0.0
triggers:
  - schema-change
  - pipeline-modification
  - lineage-query
globs: "**/pipelines/**,**/etl/**,**/data/**"
---

Data Lineage Skill

Track and visualize data flow from source to destination across pipelines.

Trigger Conditions

  • An ETL pipeline changes or a new data source is added
  • Schema registry updates
  • User invokes with "trace data lineage" or "data flow map"
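
To illustrate how these conditions might be evaluated, here is a minimal Python sketch. It approximates the three `globs` patterns from the frontmatter by checking whether any directory component of a path is `pipelines`, `etl`, or `data`. The function names `path_triggers_skill` and `phrase_triggers_skill` are hypothetical, and real glob engines can differ in how they treat `**`.

```python
from pathlib import PurePosixPath

# Directory names taken from the skill's globs:
# **/pipelines/**, **/etl/**, **/data/**
WATCHED_DIRS = {"pipelines", "etl", "data"}

# Phrases listed under Trigger Conditions.
TRIGGER_PHRASES = ("trace data lineage", "data flow map")


def path_triggers_skill(path: str) -> bool:
    """Return True if the file sits under a watched directory (an approximation of the globs)."""
    directories = PurePosixPath(path).parts[:-1]  # drop the file name itself
    return any(part in WATCHED_DIRS for part in directories)


def phrase_triggers_skill(message: str) -> bool:
    """Return True if the user message contains one of the trigger phrases."""
    lowered = message.lower()
    return any(phrase in lowered for phrase in TRIGGER_PHRASES)
```

Matching on directory components rather than raw substrings keeps a file such as `docs/data.md` from firing the trigger by accident.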

Input Contract

  • Required: Data entity or pipeline to trace
  • Optional: Source/destination constraints, time range

Output Contract

  • Lineage graph (source → transformations → destination)
  • Impact analysis for schema changes
  • Data freshness status per dataset
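
One possible shape for this output, sketched in Python. The skill does not prescribe a schema, so the class names `LineageEdge` and `LineageReport`, their fields, and the `fresh`/`stale` labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LineageEdge:
    """One hop in the lineage graph (hypothetical shape)."""
    upstream: str
    downstream: str
    transformation: str = ""


@dataclass
class LineageReport:
    """Bundles the three outputs listed in the contract (hypothetical shape)."""
    entity: str
    edges: list[LineageEdge] = field(default_factory=list)
    impacted_consumers: list[str] = field(default_factory=list)
    freshness: dict[str, str] = field(default_factory=dict)


def freshness_status(last_updated: datetime, sla: timedelta, now: datetime) -> str:
    """Classify a dataset as 'fresh' or 'stale' against its freshness SLA."""
    return "fresh" if now - last_updated <= sla else "stale"
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the function, keeps the freshness check deterministic and easy to test.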

Tool Permissions

  • Read: Pipeline configs, schema registry, query logs, CDC configs
  • Write: Lineage documentation
  • Search: Data flow patterns across codebase

Execution Steps

  1. Identify the data entity or pipeline to trace
  2. Map sources, transformations, and destinations
  3. Build lineage graph with metadata
  4. Identify downstream dependencies
  5. Assess impact of proposed changes
  6. Document freshness SLAs per dataset
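
Steps 2 through 4 can be sketched as building an adjacency list and walking it breadth-first. This is a minimal illustration, not the skill's prescribed implementation; the dataset names and the helpers `build_graph` and `downstream_of` are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Map each upstream node to the list of nodes that read from it."""
    graph = defaultdict(list)
    for upstream, downstream in edges:
        graph[upstream].append(downstream)
    return graph


def downstream_of(graph, start):
    """Breadth-first walk returning every node reachable from start, in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # guard against cycles and diamond-shaped flows
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical pipeline used only to exercise the functions.
edges = [
    ("source_db", "staging"),
    ("staging", "warehouse"),
    ("staging", "feature_store"),
    ("warehouse", "dashboard"),
]
graph = build_graph(edges)
print(downstream_of(graph, "staging"))  # ['warehouse', 'feature_store', 'dashboard']
```

The `seen` set matters in practice: real pipelines often contain diamonds (two paths to the same destination), and without it a dataset would be reported more than once.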

Success Criteria

  • Complete lineage from source to all destinations
  • Downstream impact identified for schema changes
  • Freshness SLAs documented

Escalation Rules

  • Escalate if lineage cannot be traced (for example, because of hidden data flows)
  • Escalate if a schema change impacts more than 5 downstream consumers
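
The two rules above can be expressed as a small check. This is a sketch under assumptions: the function name `escalation_reasons` and the wording of the returned messages are hypothetical; only the threshold of 5 comes from the rule itself.

```python
# Threshold from the escalation rule: more than 5 impacted consumers.
MAX_CONSUMERS_WITHOUT_ESCALATION = 5


def escalation_reasons(lineage_complete: bool, impacted_consumers: list[str]) -> list[str]:
    """Return the reasons to escalate; an empty list means no escalation is needed."""
    reasons = []
    if not lineage_complete:
        reasons.append("lineage cannot be fully traced")
    if len(impacted_consumers) > MAX_CONSUMERS_WITHOUT_ESCALATION:
        reasons.append(
            f"schema change impacts {len(impacted_consumers)} downstream consumers"
        )
    return reasons
```

Returning a list of reasons instead of a boolean lets the caller report why escalation was triggered, and both rules can fire at once.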

Example Invocations

Input: "What happens downstream if we change the orders.amount column type?"

Output: Lineage: orders table → CDC stream → analytics warehouse (materialized view) → ML feature store (order_value feature) → reporting dashboard. Impact: 3 downstream consumers need a schema update. The analytics view will break immediately; the ML feature needs retraining.
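
The impact count in this example could be derived from a registry of consumers and the column types they expect. The sketch below assumes that the three consumers counted are the analytics view, the ML feature, and the dashboard. The `Consumer` class, the identifiers, the SQL types, and the unrelated `shipping_export` entry are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumer:
    """A downstream reader of one column (hypothetical registry entry)."""
    name: str
    reads_column: str
    expected_type: str


def impacted_by_type_change(consumers, column, new_type):
    """Return names of consumers that read `column` and expect a different type."""
    return [
        c.name
        for c in consumers
        if c.reads_column == column and c.expected_type != new_type
    ]


# Consumer names follow the example; the types and shipping_export are assumptions.
consumers = [
    Consumer("analytics_warehouse_view", "orders.amount", "DECIMAL(10,2)"),
    Consumer("ml_feature_store.order_value", "orders.amount", "DECIMAL(10,2)"),
    Consumer("reporting_dashboard", "orders.amount", "DECIMAL(10,2)"),
    Consumer("shipping_export", "orders.address", "TEXT"),
]

impacted = impacted_by_type_change(consumers, "orders.amount", "DOUBLE")
print(len(impacted), impacted)
```

Consumers that read other columns, such as the hypothetical `shipping_export`, are left out of the result, so the count reflects only readers of the changed column.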