alternative-data-pipeline

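Alternative data pipeline workflows for quantitative research, implementation, and production controls. Use when tasks involve schema contracts, freshness tracking, and lineage completeness.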

SKILL.md

--- frontmatter

name: alternative-data-pipeline
description: "Alternative Data Pipeline workflows for quantitative research, implementation, and production controls. Use when tasks involve schema contracts, freshness tracking, and lineage completeness."

Alternative Data Pipeline

objective

Execute alternative data pipeline work with reproducible research, explicit controls, and deployable outputs.

workflow

•define source contracts, schema versions, and freshness objectives.
•ingest data with replay support and deterministic normalization.
•validate keys, timestamps, and point-in-time join behavior.
•monitor quality metrics continuously and quarantine degraded feeds.
•publish only when lineage, ownership, and quality thresholds are satisfied.
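
As an illustration of the point-in-time join validation step, here is a minimal sketch using only the standard library. The function name point_in_time_join and the tuple layouts are assumptions made for this example and are not taken from the skill's scripts. Each event receives the latest observation whose availability time is at or before the event time, so no future data leaks into the join.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(events, observations):
    """Attach to each event the latest observation known at event time.

    events: list of (key, event_time) tuples (hypothetical layout).
    observations: list of (key, available_at, value) tuples (hypothetical layout).
    Returns a list of (key, event_time, value_or_None).
    """
    # Group observations per key, ordered by the time they became available.
    by_key = {}
    for key, available_at, value in sorted(observations, key=lambda o: (o[0], o[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(available_at)
        values.append(value)

    joined = []
    for key, event_time in events:
        times, values = by_key.get(key, ([], []))
        # Count of observations with available_at <= event_time.
        idx = bisect_right(times, event_time)
        joined.append((key, event_time, values[idx - 1] if idx else None))
    return joined


observations = [
    ("AAPL", datetime(2024, 1, 1), 10.0),
    ("AAPL", datetime(2024, 1, 3), 12.0),
]
events = [
    ("AAPL", datetime(2024, 1, 2)),    # sees only the Jan 1 observation
    ("AAPL", datetime(2023, 12, 31)),  # nothing was available yet
]
joined = point_in_time_join(events, observations)
```

Joining on the availability timestamp rather than the observation's nominal date is the design choice that matters here: a record that was published late must not be visible to events that occurred before it arrived.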

required diagnostics

•freshness, completeness, null-rate, and duplicate-rate trends.
•schema drift and breaking-change frequency across sources.
•point-in-time join integrity for features and labels.
•backfill and replay consistency versus canonical snapshots.
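
The first group of diagnostics can be sketched as follows. This is not the implementation of scripts/alternative_data_pipeline_diagnostics.py, whose contents are not shown here; the function name feed_diagnostics, its parameters, and the returned keys are all assumptions chosen for illustration. It computes a cell-level null rate, a duplicate rate over a chosen key, and a freshness lag for one batch of rows.

```python
from datetime import datetime


def feed_diagnostics(rows, key_fields, timestamp_field, now):
    """Compute simple quality metrics for a batch of rows (list of dicts).

    All names here are hypothetical; adapt them to the real feed schema.
    """
    total = len(rows)
    if total == 0:
        return {"row_count": 0, "null_rate": None,
                "duplicate_rate": None, "freshness_seconds": None}

    # Null rate: share of cells that are None or an empty string.
    fields = list(rows[0].keys())
    null_cells = sum(1 for row in rows for f in fields if row.get(f) in (None, ""))

    # Duplicate rate: share of rows whose key was already seen in this batch.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(f) for f in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    # Freshness: seconds between `now` and the newest timestamp in the batch.
    timestamps = [row[timestamp_field] for row in rows
                  if row.get(timestamp_field) is not None]
    freshness = (now - max(timestamps)).total_seconds() if timestamps else None

    return {
        "row_count": total,
        "null_rate": null_cells / (total * len(fields)),
        "duplicate_rate": duplicates / total,
        "freshness_seconds": freshness,
    }


rows = [
    {"id": 1, "ts": datetime(2024, 1, 1, 0), "value": 1.0},
    {"id": 2, "ts": datetime(2024, 1, 1, 1), "value": None},
    {"id": 2, "ts": datetime(2024, 1, 1, 1), "value": 3.0},
    {"id": 3, "ts": datetime(2024, 1, 1, 2), "value": 4.0},
]
metrics = feed_diagnostics(rows, key_fields=("id", "ts"),
                           timestamp_field="ts", now=datetime(2024, 1, 1, 3))
```

Storing one such record per batch gives the time series needed to track these metrics as trends rather than as single readings.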

risk controls

•enforce hard thresholds for freshness and data-quality metrics.
•enforce quarantine and fallback paths for corrupted feeds.
•enforce full lineage metadata before downstream release.
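
A release gate combining these three controls might look like the sketch below. The Thresholds fields, the REQUIRED_LINEAGE_FIELDS tuple, the metric keys, and the function name release_decision are all hypothetical and would need to match whatever the real pipeline records. A feed is published only when every hard threshold passes and all lineage fields are present; otherwise it is quarantined with the reasons attached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical hard limits; real values depend on the feed's contract.
    max_freshness_seconds: float
    max_null_rate: float
    max_duplicate_rate: float


# Assumed minimum lineage metadata required before downstream release.
REQUIRED_LINEAGE_FIELDS = ("source", "schema_version", "owner", "ingested_at")


def release_decision(metrics, lineage, thresholds):
    """Return ("publish" | "quarantine", reasons) for one feed batch."""
    reasons = []
    if metrics["freshness_seconds"] > thresholds.max_freshness_seconds:
        reasons.append("stale")
    if metrics["null_rate"] > thresholds.max_null_rate:
        reasons.append("null rate too high")
    if metrics["duplicate_rate"] > thresholds.max_duplicate_rate:
        reasons.append("duplicate rate too high")

    # Lineage must be complete: every required field present and non-empty.
    missing = [f for f in REQUIRED_LINEAGE_FIELDS if not lineage.get(f)]
    if missing:
        reasons.append("missing lineage: " + ", ".join(missing))

    return ("quarantine" if reasons else "publish", reasons)


limits = Thresholds(max_freshness_seconds=3600, max_null_rate=0.05,
                    max_duplicate_rate=0.01)
decision, reasons = release_decision(
    {"freshness_seconds": 7200, "null_rate": 0.01, "duplicate_rate": 0.0},
    {"source": "vendor_x", "schema_version": "2", "owner": "", "ingested_at": "2024-01-01"},
    limits,
)
```

Returning the list of reasons alongside the decision keeps the quarantine path auditable; the fallback behavior (for example, serving the last good snapshot) is left to the caller.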

outputs

•run python scripts/alternative_data_pipeline_diagnostics.py input.csv --output diagnostics.json and keep the resulting JSON artifact.
•write an implementation memo using references/alternative-data-pipeline-playbook.md with assumptions, tests, limits, and rollout plan.

resources

•use scripts/alternative_data_pipeline_diagnostics.py for deterministic diagnostics.
•use references/alternative-data-pipeline-playbook.md for the domain-specific checklist and delivery structure.