
dfdl_ref

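DataFusion + DeltaLake operations manual for this repository. DataFusion is the core query engine; DeltaLake provides the storage layer and integrates tightly with DataFusion through scan providers, schema bridging, and predicate pushdown. Always consult the reference docs and probe the local environment; never guess API surfaces.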

SKILL.md
--- frontmatter
name: dfdl_ref
description: DataFusion + DeltaLake operations manual for this repo. DataFusion is the core query engine; DeltaLake provides the storage layer and integrates tightly via scan providers, schema bridging, and predicate pushdown. Use lookup + local probes; do not guess APIs.
allowed-tools: Read, Grep, Glob, Bash

Operating rule: never guess DataFusion/DeltaLake/PyArrow/UDF APIs

When uncertain:

  1. Probe local environment (versions + available methods).
  2. Search the repo for how we already use it.
  3. Open the relevant reference file below (only the section you need).
  4. Implement using existing local patterns unless the plan says otherwise.
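
Steps 1 and 2 above can be sketched in Python as follows. This is a minimal illustration, not this repo's actual tooling: the helper names `probe_version`, `probe_methods`, and `find_usages` are hypothetical, and the distribution names `datafusion`, `deltalake`, and `pyarrow` are assumed to be the ones installed locally. The helpers return `None` or an empty list instead of raising when something is missing, so a probe never fails just because a package is absent.

```python
import importlib
from importlib import metadata
from pathlib import Path
from typing import List, Optional, Tuple


def probe_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def probe_methods(module_name: str, attr: Optional[str] = None) -> List[str]:
    """List public names on a module, or on one attribute of that module.

    Returns an empty list when the module or attribute is unavailable.
    """
    try:
        target = importlib.import_module(module_name)
    except ImportError:
        return []
    if attr is not None:
        target = getattr(target, attr, None)
        if target is None:
            return []
    return sorted(name for name in dir(target) if not name.startswith("_"))


def find_usages(
    root: str, needle: str, suffixes: Tuple[str, ...] = (".py", ".rs")
) -> List[Tuple[str, int, str]]:
    """Find lines containing `needle` in source files under `root`.

    Returns (path, line number, stripped line) tuples. Unreadable or
    non-UTF-8 files are skipped.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Step 1: versions and available methods (package names are assumptions).
    for dist in ("datafusion", "deltalake", "pyarrow"):
        print(dist, probe_version(dist))
    print(probe_methods("datafusion", "SessionContext"))
    # Step 2: how the repo already uses a given name.
    for hit in find_usages(".", "SessionContext"):
        print(hit)
```

With `Bash` among the allowed tools, a `grep -rn` over the repo serves the same purpose as `find_usages`. The Python form is shown only so that both probes live in one script.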

Reference map (open these files as needed)

  • Core DataFusion Python surfaces (IO, catalog, SQL, DataFrame API): reference/datafusion.md
  • "Best-in-class deployment gaps" (caching, stats, observability, planning knobs): reference/datafusion_addendum.md
  • Planning deep dive (logical/physical plan pipeline, introspection, optimization rules): reference/datafusion_planning.md
  • Rust UDF contracts (Scalar/UDAF/UDWF/Async/named args): reference/datafusion_rust_UDFs.md
  • Schema management + schema pitfalls: reference/datafusion_schema.md
  • DeltaLake ↔ DataFusion integration details: reference/deltalake_datafusion_integration.md
  • Advanced Rust integration (PyO3 packaging, wheels, CI, native module distribution): reference/datafusion_deltalake_advanced_rust_integration.md
  • DataFusionMixins trait (Delta snapshot schema + predicate parsing helpers): reference/deltalake_datafusionmixins.md
  • Plan combination (composing DataFusion plans via joins/unions/CTEs, Delta integration, parameterized queries, plan serialization): reference/datafusion_plan_combination.md
  • Rust LogicalPlan programmatic construction (LogicalPlanBuilder, Expr, schema/DFSchema, plan rewriting via TreeNode, extensibility, serialization): reference/Datafusion_logicplan_rust.md
  • DataFusion tracing (Rust community extension: execution spans, metrics capture, partial-result previews, rule-phase instrumentation, OpenTelemetry export): reference/datafusion-tracing.md
  • DeltaLake core (format/protocol, client APIs, 3-layer model): reference/deltalake.md