Data Lineage Skill
Track and visualize data flow from source to destination across pipelines.
Trigger Conditions
- ETL pipeline changes or a new data source is added
- Schema registry updates
- User invokes the skill with "trace data lineage" or "data flow map"
Input Contract
- Required: Data entity or pipeline to trace
- Optional: Source/destination constraints, time range
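The input contract can be expressed as a typed request object. The sketch below is a minimal illustration that assumes Python 3.9+. `LineageRequest` and its field names are hypothetical and do not come from any defined schema. Only the entity is required, and the optional constraints default to `None`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LineageRequest:
    """Hypothetical request object mirroring the skill's input contract."""
    entity: str                                    # required: data entity or pipeline
    sources: Optional[frozenset[str]] = None       # optional source constraint
    destinations: Optional[frozenset[str]] = None  # optional destination constraint
    start: Optional[datetime] = None               # optional time range start
    end: Optional[datetime] = None                 # optional time range end

    def __post_init__(self) -> None:
        # Enforce the one required field and a well-ordered time range.
        if not self.entity.strip():
            raise ValueError("entity is required")
        if self.start is not None and self.end is not None and self.start > self.end:
            raise ValueError("start must not be after end")


# Placeholder request: trace orders.amount, restricted to one destination.
request = LineageRequest(entity="orders.amount", destinations=frozenset({"ml_feature_store"}))
```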
Output Contract
- Lineage graph (source → transformations → destination)
- Impact analysis for schema changes
- Data freshness status per dataset
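One possible shape for these outputs is sketched below under several assumptions. `LineageReport` and `freshness_status` are hypothetical names, and freshness is treated as a simple comparison of a dataset's age against its SLA window. The node names and the 1-hour SLA are placeholders loosely drawn from the example invocation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


def freshness_status(last_updated: datetime, sla: timedelta, now: datetime) -> str:
    """Return "fresh" if the dataset's age is within its SLA window, else "stale"."""
    return "fresh" if now - last_updated <= sla else "stale"


@dataclass
class LineageReport:
    # Directed (upstream, downstream) edges: source -> transformations -> destination.
    edges: list[tuple[str, str]] = field(default_factory=list)
    # Downstream consumers affected by a proposed schema change.
    impacted: list[str] = field(default_factory=list)
    # Dataset name -> "fresh" or "stale".
    freshness: dict[str, str] = field(default_factory=dict)


# Placeholder usage: "orders" was last updated 30 minutes ago against a
# hypothetical 1-hour SLA, so it is reported as fresh.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
report = LineageReport(edges=[("orders", "cdc_stream")])
report.freshness["orders"] = freshness_status(now - timedelta(minutes=30), timedelta(hours=1), now)
```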
Tool Permissions
- Read: Pipeline configs, schema registry, query logs, CDC configs
- Write: Lineage documentation
- Search: Data flow patterns across the codebase
Execution Steps
- Identify the data entity or pipeline to trace
- Map its sources, transformations, and destinations
- Build the lineage graph with metadata
- Identify downstream dependencies
- Assess the impact of proposed changes
- Document freshness SLAs per dataset
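The mapping, graph-building, and downstream-dependency steps above can be sketched as an adjacency list combined with a breadth-first traversal. This minimal sketch assumes lineage edges have already been extracted from pipeline and CDC configs as `(upstream, downstream)` pairs. `build_graph`, `downstream_of`, and the node names are hypothetical, and the edges simply follow the arrows in the example invocation.

```python
from collections import defaultdict, deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Adjacency list mapping each upstream node to its direct downstream nodes."""
    graph: dict[str, set[str]] = defaultdict(set)
    for upstream, downstream in edges:
        graph[upstream].add(downstream)
    return graph


def downstream_of(graph: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from start, excluding start itself."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            # Skip already-visited nodes (and start) so cycles terminate.
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical edges mirroring the example invocation's chain.
edges = [
    ("orders", "cdc_stream"),
    ("cdc_stream", "analytics_mv"),
    ("analytics_mv", "ml_feature_order_value"),
    ("ml_feature_order_value", "reporting_dashboard"),
]
graph = build_graph(edges)
print(sorted(downstream_of(graph, "orders")))
# ['analytics_mv', 'cdc_stream', 'ml_feature_order_value', 'reporting_dashboard']
```

Any traversal that tracks visited nodes finds the same reachable set. Breadth-first order additionally visits consumers in order of their distance from the changed entity, which can be a convenient ordering for an impact report.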
Success Criteria
- Complete lineage from source to all destinations
- Downstream impact identified for schema changes
- Freshness SLAs documented
Escalation Rules
- Escalate if lineage cannot be fully traced (e.g., hidden data flows)
- Escalate if a schema change impacts more than 5 downstream consumers
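The escalation rules reduce to two checks. This minimal sketch assumes the tracer reports two values: how many datasets had unresolvable upstream sources (`untraced_count`, a hypothetical input) and how many downstream consumers a schema change touches. The threshold of 5 comes from the rule above, and "more than 5" is read as strictly greater. `escalation_reasons` is a hypothetical name.

```python
def escalation_reasons(untraced_count: int, downstream_consumers: int,
                       consumer_threshold: int = 5) -> list[str]:
    """Return the escalation rules that fire; an empty list means no escalation."""
    reasons: list[str] = []
    # Rule 1: some lineage could not be traced (e.g., hidden data flows).
    if untraced_count > 0:
        reasons.append("lineage could not be fully traced")
    # Rule 2: "more than 5" is read as strictly greater than the threshold.
    if downstream_consumers > consumer_threshold:
        reasons.append(
            f"schema change impacts {downstream_consumers} downstream consumers "
            f"(threshold {consumer_threshold})"
        )
    return reasons


# A fully traced graph with 3 impacted consumers does not trigger escalation.
print(escalation_reasons(untraced_count=0, downstream_consumers=3))  # []
```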
Example Invocations
Input: "What happens downstream if we change the orders.amount column type?"
Output: Lineage: orders table → CDC stream → analytics warehouse (materialized view) → ML feature store (order_value feature) → reporting dashboard. Impact: 3 downstream consumers need a schema update. The analytics view will break immediately, and the ML feature needs retraining.