Data Governance Lineage
objective
Execute data governance lineage work with reproducible research, explicit controls, and deployable outputs.
workflow
- define source contracts, schema versions, and freshness objectives.
- ingest data with replay support and deterministic normalization.
- validate keys, timestamps, and point-in-time join behavior.
- monitor quality metrics continuously and quarantine degraded feeds.
- publish only when lineage, ownership, and quality thresholds are satisfied.
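
A minimal sketch of the first three workflow steps, assuming rows arrive as Python dicts. `SourceContract`, `normalize`, `check_contract`, and the `event_time` column are hypothetical names chosen for illustration; they are not part of the diagnostics script.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SourceContract:
    """Hypothetical source contract; field names are illustrative only."""
    source: str
    schema_version: str
    required_columns: tuple   # columns that must be present and non-empty
    key_columns: tuple        # columns that together identify a row
    max_staleness: timedelta  # freshness objective


def normalize(row):
    """Deterministic normalization: strip and lowercase column names, strip string values."""
    cleaned = {}
    for name in sorted(row):  # fixed iteration order keeps replays identical
        value = row[name]
        cleaned[name.strip().lower()] = value.strip() if isinstance(value, str) else value
    return cleaned


def check_contract(rows, contract, now):
    """Return a list of violations; an empty list means the batch conforms."""
    violations = []
    seen_keys = set()
    newest = None
    for i, row in enumerate(rows):
        missing = [c for c in contract.required_columns if row.get(c) in (None, "")]
        if missing:
            violations.append(f"row {i}: missing {missing}")
        key = tuple(row.get(c) for c in contract.key_columns)
        if key in seen_keys:
            violations.append(f"row {i}: duplicate key {key}")
        seen_keys.add(key)
        ts = row.get("event_time")  # assumed timestamp column
        if isinstance(ts, datetime) and (newest is None or ts > newest):
            newest = ts
    # A batch with no usable timestamp is treated as stale.
    if newest is None or now - newest > contract.max_staleness:
        violations.append("freshness objective missed")
    return violations
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps replays of the same batch reproducible.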
required diagnostics
- freshness, completeness, null-rate, and duplicate-rate trends.
- schema drift and breaking-change frequency across sources.
- point-in-time join integrity for features and labels.
- backfill and replay consistency versus canonical snapshots.
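
A few of the diagnostics above can be sketched in plain Python. This is an illustration under assumed column names (`entity_id`, `feature_time`, `label_time`, `value`), not the logic of the diagnostics script: `null_rate` and `duplicate_rate` are per-batch values to be tracked over time, `as_of_join` attaches to each label the latest feature observed at or before the label time, and `leaked_rows` flags joined rows that break that rule.

```python
from bisect import bisect_right
from collections import defaultdict


def null_rate(rows, column):
    """Share of rows where `column` is absent or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def duplicate_rate(rows, key_columns):
    """Share of rows whose key already appeared earlier in the batch."""
    if not rows:
        return 0.0
    seen, dupes = set(), 0
    for r in rows:
        key = tuple(r.get(c) for c in key_columns)
        if key in seen:
            dupes += 1
        seen.add(key)
    return dupes / len(rows)


def as_of_join(features, labels):
    """Point-in-time join: latest feature with feature_time <= label_time."""
    by_entity = defaultdict(list)
    for f in sorted(features, key=lambda f: f["feature_time"]):
        by_entity[f["entity_id"]].append(f)
    joined = []
    for lab in labels:
        history = by_entity.get(lab["entity_id"], [])
        times = [f["feature_time"] for f in history]
        idx = bisect_right(times, lab["label_time"])  # features at or before the label
        match = history[idx - 1] if idx else None
        joined.append({
            **lab,
            "feature_time": match["feature_time"] if match else None,
            "value": match["value"] if match else None,
        })
    return joined


def leaked_rows(joined):
    """Rows where the feature was observed after the label (look-ahead leakage)."""
    return [
        r for r in joined
        if r["feature_time"] is not None and r["feature_time"] > r["label_time"]
    ]
```

Running `leaked_rows` on a dataset joined elsewhere gives a direct integrity check: any non-empty result means a feature from the future reached a label.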
risk controls
- enforce hard thresholds for freshness and data-quality metrics.
- enforce quarantine and fallback paths for corrupted feeds.
- enforce full lineage metadata before downstream release.
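
One way to combine these controls into a single release gate is sketched below. The metric names, the lineage fields in `REQUIRED_LINEAGE_FIELDS`, and the rule that every threshold is an upper bound are assumptions made for the example; adapt them to the playbook's actual checklist.

```python
from dataclasses import dataclass, field

# Hypothetical lineage fields; the real required set is defined by the playbook.
REQUIRED_LINEAGE_FIELDS = ("source", "owner", "schema_version", "upstream_run_id")


@dataclass
class GateDecision:
    action: str                      # "publish" or "quarantine"
    reasons: list = field(default_factory=list)


def evaluate_release(metrics, thresholds, lineage):
    """Quarantine on any breached threshold, missing metric, or missing lineage field."""
    reasons = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A metric that was never computed fails closed.
            reasons.append(f"{name}: metric missing")
        elif value > limit:  # assumption: every threshold is an upper bound
            reasons.append(f"{name}: {value} exceeds {limit}")
    for name in REQUIRED_LINEAGE_FIELDS:
        if not lineage.get(name):
            reasons.append(f"lineage: {name} missing")
    return GateDecision("quarantine" if reasons else "publish", reasons)
```

A `quarantine` decision is where a fallback path would take over, for example continuing to serve the last snapshot that passed the gate.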
outputs
- run `python scripts/data_governance_lineage_diagnostics.py input.csv --output diagnostics.json` and keep the JSON artifact.
- write an implementation memo using `references/data-governance-lineage-playbook.md` with assumptions, tests, limits, and rollout plan.
resources
- use `scripts/data_governance_lineage_diagnostics.py` for deterministic diagnostics.
- use `references/data-governance-lineage-playbook.md` for the domain-specific checklist and delivery structure.