auto-cdc

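Apply Change Data Capture (CDC) in Spark Declarative Pipelines with the apply_changes API. Use this skill when the user needs to process CDC feeds from databases, handle upserts and deletes, maintain slowly changing dimensions (SCD Type 1 and Type 2), synchronize data from operational databases, or perform merge operations.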

SKILL.md
--- frontmatter
name: auto-cdc
description: Apply Change Data Capture (CDC) with the apply_changes API in Spark Declarative Pipelines. Use when the user needs to process CDC feeds from databases, handle upserts/deletes, maintain slowly changing dimensions (SCD Type 1 and Type 2), synchronize data from operational databases, or process merge operations.

Auto CDC (apply_changes) in Spark Declarative Pipelines

The apply_changes API processes Change Data Capture (CDC) feeds and automatically applies the resulting inserts, updates, and deletes to a target table.
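To make the insert/update/delete behavior concrete, here is a minimal plain-Python sketch of what applying a CDC feed to a target table means. It is not the apply_changes API and does not use Spark; the function name `apply_cdc_scd1` and the column names `id`, `op`, and `seq` are assumptions chosen for illustration only.

```python
from typing import Any

Row = dict[str, Any]


def apply_cdc_scd1(
    target: dict[Any, Row],
    events: list[Row],
    key: str = "id",        # hypothetical primary-key column
    sequence: str = "seq",  # hypothetical ordering column
    op: str = "op",         # hypothetical operation column
) -> dict[Any, Row]:
    """Apply CDC events to `target` in place (update-in-place semantics)."""
    last_seq: dict[Any, Any] = {}  # latest sequence value applied per key
    # Sort by the sequence column so out-of-order arrivals are applied in order.
    for event in sorted(events, key=lambda e: e[sequence]):
        k = event[key]
        if k in last_seq and event[sequence] <= last_seq[k]:
            continue  # duplicate of an event that was already applied
        last_seq[k] = event[sequence]
        if event[op] == "DELETE":
            target.pop(k, None)  # remove the row if it exists
        else:
            # INSERT and UPDATE both behave as an upsert; drop the CDC metadata.
            target[k] = {c: v for c, v in event.items() if c not in (op, sequence)}
    return target


events = [
    {"id": 1, "name": "Ada", "op": "INSERT", "seq": 1},
    {"id": 2, "name": "Bob", "op": "INSERT", "seq": 2},
    {"id": 1, "name": "Ada L.", "op": "UPDATE", "seq": 4},
    {"id": 2, "name": "Bob", "op": "DELETE", "seq": 3},     # arrives out of order
    {"id": 1, "name": "Ada L.", "op": "UPDATE", "seq": 4},  # exact duplicate
]
result = apply_cdc_scd1({}, events)
print(result)
```

After the run, only the row for key 1 remains, carrying its latest values; key 2 was inserted and then deleted.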

Key Concepts

Auto CDC in Spark Declarative Pipelines:

  • Automatically processes CDC operations (INSERT, UPDATE, DELETE)
  • Supports SCD Type 1 (update in place) and Type 2 (historical tracking)
  • Orders changes by a sequence column, so events that arrive out of order are still applied correctly
  • Deduplicates CDC records
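The difference between SCD Type 1 and Type 2 is easiest to see in the shape of the output. The sketch below illustrates Type 2 historical tracking in plain Python: each change closes the previous version of a row and opens a new one. It is an illustration of the concept rather than the pipeline implementation; `apply_cdc_scd2` and the `start_seq` / `end_seq` validity columns are hypothetical names, as are the `id`, `op`, and `seq` input columns.

```python
from typing import Any, Optional

Row = dict[str, Any]


def apply_cdc_scd2(
    events: list[Row],
    key: str = "id",        # hypothetical primary-key column
    sequence: str = "seq",  # hypothetical ordering column
    op: str = "op",         # hypothetical operation column
) -> list[Row]:
    """Build a version history: one output row per version of each key."""
    history: list[Row] = []
    open_rows: dict[Any, Row] = {}     # key -> currently open version
    seen: set[tuple[Any, Any]] = set()  # (key, sequence) pairs already applied
    # Process changes in sequence order, regardless of arrival order.
    for event in sorted(events, key=lambda e: e[sequence]):
        marker = (event[key], event[sequence])
        if marker in seen:
            continue  # deduplicate repeated CDC records
        seen.add(marker)
        current: Optional[Row] = open_rows.pop(event[key], None)
        if current is not None:
            current["end_seq"] = event[sequence]  # close the previous version
        if event[op] != "DELETE":
            row = {c: v for c, v in event.items() if c not in (op, sequence)}
            row["start_seq"] = event[sequence]
            row["end_seq"] = None  # None marks the currently active version
            history.append(row)
            open_rows[event[key]] = row
    return history


events = [
    {"id": 1, "city": "Oslo", "op": "INSERT", "seq": 1},
    {"id": 1, "city": "Rome", "op": "UPDATE", "seq": 3},
    {"id": 1, "city": "Rome", "op": "UPDATE", "seq": 3},  # duplicate record
    {"id": 1, "city": "Rome", "op": "DELETE", "seq": 5},
]
history = apply_cdc_scd2(events)
for version in history:
    print(version)
```

Under Type 1 the same feed would leave no row for key 1 at all, because the delete removes it; under Type 2 both earlier versions survive with their validity ranges closed.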

Language-Specific Implementations

For detailed implementation guides:

Note: The API is also known as applyChanges in some contexts.