Operating rule: never guess DataFusion/DeltaLake/PyArrow/UDF APIs
When uncertain:
- Probe the local environment (versions + available methods).
- Search the repo for how we already use it.
- Open the relevant reference file below (only the section you need).
- Implement using existing local patterns unless the plan says otherwise.
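The probing step can be sketched in Python as below. This is a minimal sketch, not an established repo helper: the function names `installed_version` and `public_names` are hypothetical, and the distribution names (`datafusion`, `deltalake`, `pyarrow`) and the `SessionContext` attribute are assumptions about what is installed locally. Missing packages are reported as None rather than raising.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def public_names(module_name, attr=None):
    """Return sorted public attribute names of a module (or of one attribute on it).

    Returns None when the module cannot be imported or the attribute is missing,
    so the caller learns what is actually available instead of guessing.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return None
    if attr is not None:
        obj = getattr(obj, attr, None)
        if obj is None:
            return None
    return sorted(name for name in dir(obj) if not name.startswith("_"))


if __name__ == "__main__":
    # Distribution names are assumed to match the import names here.
    for dist in ("datafusion", "deltalake", "pyarrow"):
        print(f"{dist}: version={installed_version(dist)}")

    # Assumed example: list the methods on datafusion's SessionContext, if present.
    names = public_names("datafusion", "SessionContext")
    print("SessionContext methods:", names if names is not None else "not available")
```

Run this before writing any call whose signature is in doubt; if a method does not appear in the output, treat it as unavailable in this environment and check the reference files or existing repo usage.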
Reference map (open these files as needed)
- Core DataFusion Python surfaces (IO, catalog, SQL, DataFrame API): reference/datafusion.md
- "Best-in-class deployment gaps" (caching, stats, observability, planning knobs): reference/datafusion_addendum.md
- Planning deep dive (logical/physical plan pipeline, introspection, optimization rules): reference/datafusion_planning.md
- Rust UDF contracts (Scalar/UDAF/UDWF/Async/named args): reference/datafusion_rust_UDFs.md
- Schema management + schema pitfalls: reference/datafusion_schema.md
- DeltaLake ↔ DataFusion integration details: reference/deltalake_datafusion_integration.md
- Advanced Rust integration (PyO3 packaging, wheels, CI, native module distribution): reference/datafusion_deltalake_advanced_rust_integration.md
- DataFusionMixins trait (Delta snapshot schema + predicate parsing helpers): reference/deltalake_datafusionmixins.md
- Plan combination (composing DataFusion plans via joins/unions/CTEs, Delta integration, parameterized queries, plan serialization): reference/datafusion_plan_combination.md
- Rust LogicalPlan programmatic construction (LogicalPlanBuilder, Expr, schema/DFSchema, plan rewriting via TreeNode, extensibility, serialization): reference/Datafusion_logicplan_rust.md
- DataFusion tracing (Rust community extension: execution spans, metrics capture, partial-result previews, rule-phase instrumentation, OpenTelemetry export): reference/datafusion-tracing.md
- DeltaLake core (format/protocol, client APIs, 3-layer model): reference/deltalake.md