Upstream Lineage: Sources
Trace the origins of data and answer "Where does this data come from?"
Lineage Investigation
Step 1: Identify the Target Type
Determine what we are tracing:
- •Table
- •Column
- •DAG
Step 2: Find the Producing DAG
- •List DAGs: use list_active_dags and list_paused_dags
- •Read DAG source: use get_dag_source_code
- •If a run exists, use analyse_dag_latest_run to see tasks and logs
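Once the DAG source has been read, the producing DAG can be narrowed down by looking for write statements that name the target table. The following is a minimal sketch, not part of any tool listed above: `find_producing_dags` and the `dag_sources` mapping (DAG id to source text) are hypothetical, and the pattern only covers a few common SQL write forms.

```python
import re


def find_producing_dags(dag_sources, target_table):
    """Return the ids of DAGs whose source appears to write to target_table.

    dag_sources is an assumed mapping of DAG id -> source code text.
    This is a heuristic: it only recognizes INSERT INTO, MERGE INTO and
    CREATE [OR REPLACE] TABLE followed directly by the table name.
    """
    pattern = re.compile(
        r"\b(?:INSERT\s+INTO|MERGE\s+INTO|CREATE\s+(?:OR\s+REPLACE\s+)?TABLE)\s+"
        + re.escape(target_table)
        + r"\b",
        re.IGNORECASE,
    )
    return sorted(
        dag_id for dag_id, source in dag_sources.items() if pattern.search(source)
    )
```

DAGs that only read from the table are not returned, which helps separate producers from downstream consumers. Writes issued through operators or templated table names will be missed and still need a manual read of the source.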
Step 3: Trace Data Sources
From the DAG code, identify source tables and systems:
- •SQL sources in FROM or JOIN clauses
- •External sources via operator hooks or connection IDs
- •Files in object storage
Use go_to_connections_view to inspect connection metadata.
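For the SQL case, a rough first pass over the query text can list candidate source tables. This is a sketch under stated assumptions: `extract_source_tables` is a hypothetical helper, and the regular expression is a heuristic that does not parse SQL. It will also pick up CTE names and expressions such as `EXTRACT(YEAR FROM ts)`, so the result should be checked against the DAG code.

```python
import re

# Matches an identifier (optionally schema-qualified) that follows FROM or JOIN.
# Subqueries such as "FROM (SELECT ...)" are skipped because "(" is not an identifier.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_source_tables(sql):
    """Return table names referenced after FROM or JOIN, in first-seen order."""
    seen = []
    for name in _TABLE_REF.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen
```

For example, `extract_source_tables("SELECT * FROM raw.orders o JOIN dim.customers c ON o.customer_id = c.id")` returns `["raw.orders", "dim.customers"]`.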
Step 4: Build the Lineage Chain
Example:
TARGET: analytics.orders_daily
  ^
  +-- DAG: etl_daily_orders
        ^
        +-- SOURCE: raw.orders
        |
        +-- SOURCE: dim.customers
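A chain like the example can be rendered from the pieces gathered in the earlier steps. The sketch below handles a single level (one target, one DAG, a flat list of sources); `render_lineage` is a hypothetical helper and the indentation is an illustrative choice.

```python
def render_lineage(target, dag_id, sources):
    """Render a one-level lineage chain as plain text.

    target: name of the table being traced
    dag_id: id of the DAG that produces it
    sources: list of upstream table or system names
    """
    lines = [
        f"TARGET: {target}",
        "  ^",
        f"  +-- DAG: {dag_id}",
        "        ^",
    ]
    for index, source in enumerate(sources):
        if index > 0:
            # Connector between sibling sources.
            lines.append("        |")
        lines.append(f"        +-- SOURCE: {source}")
    return "\n".join(lines)
```

Calling `render_lineage("analytics.orders_daily", "etl_daily_orders", ["raw.orders", "dim.customers"])` produces the same shape as the example. Deeper chains, where a source is itself produced by another DAG, would need a recursive variant.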
Step 5: Check Source Health
- •Use get_dag_runs or get_dag_history on upstream DAGs
- •For logs, use go_to_dag_log_view
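Run history can be reduced to a simple health label per upstream DAG. The sketch below assumes each run is a dict with a `state` string and an ISO 8601 `end_date` that includes a UTC offset. That record shape, the `summarize_source_health` name, and the 24-hour threshold are all assumptions for illustration, not the documented output of the tools above.

```python
from datetime import datetime, timedelta, timezone


def summarize_source_health(runs, now, max_age=timedelta(hours=24)):
    """Classify an upstream DAG as 'fresh', 'stale', or 'no successful runs'.

    runs: list of dicts like {"state": "success", "end_date": "2024-01-02T03:00:00+00:00"}
    now: timezone-aware datetime to measure age against
    max_age: how old the latest successful run may be before it counts as stale
    """
    successes = [run for run in runs if run["state"] == "success"]
    if not successes:
        return "no successful runs"
    latest = max(datetime.fromisoformat(run["end_date"]) for run in successes)
    return "fresh" if now - latest <= max_age else "stale"
```

Passing `now` explicitly, for example `datetime.now(timezone.utc)`, keeps the function deterministic and easy to check against a known run history.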
Lineage for Columns
- •Find the column in the target table schema
- •Search DAG source for references
- •Trace transformations and mappings
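The search step can be sketched as a whole-word scan of the DAG source text. `find_column_references` is a hypothetical helper; it only locates lines that mention the column, and tracing the transformation behind each hit still requires reading the surrounding SQL or Python.

```python
import re


def find_column_references(source, column):
    """Return (line_number, stripped_line) pairs where column appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(column) + r"\b")
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]
```

The word-boundary match means a search for `total_amount` does not match `total_amount_usd`, which keeps similarly named columns from polluting the result.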
Output: Lineage Report
Include:
- •Summary of sources
- •Lineage diagram
- •Source details (connections, freshness)
- •Transformation chain
- •Data quality implications
Related Skills
- •checking-freshness
- •debugging-dags
- •tracing-downstream-lineage
- •annotating-task-lineage
- •creating-openlineage-extractors