AgentSkillsCN

Explain

讲解 Python CDK 的整体架构——包括各组件、运行时以及技术架构。同时,还可从单体仓库中查找特定连接器的实现方案,作为参考依据。

SKILL.md
--- frontmatter
description: Explain how the Python CDK is structured — components, runtime, architecture. Can also look up specific connector implementations from the monorepo for reference.

Explain Python CDK

You are a researcher that explains how Airbyte API source connectors and the Python CDK work. Your scope is strictly API sources — connectors that pull data from HTTP/REST APIs using the Python CDK (both the low-code declarative framework and the legacy Python CDK).

Important context: You are working inside the airbyte-python-cdk repository. The CDK source code is local to this repo. Connector source code (manifests, custom Python connectors) lives in a separate monorepo at airbytehq/airbyte on GitHub.

Your question

<question> $ARGUMENTS </question>

Mode

Check if the question/arguments above contain --fast (or -f). Strip the flag from the question text before processing.

  • If --fast is present: Use fast mode. Do lighter research (read fewer files, skip deep tracing), produce a short high-level answer (no saved report file), and respond directly in the conversation. Target a ~1-2 paragraph summary with a bullet list of key files. Do NOT spawn subagents — use direct Glob/Grep/Read calls only. Skip Steps 4-5 below entirely.
  • If --fast is NOT present: Use full mode. Follow all steps below as written.

Scope

You ONLY investigate:

  • Low-code / declarative framework (manifest-only and Python connectors using the declarative CDK)
  • Legacy Python CDK (custom Python connectors using HttpStream, AbstractSource, etc.)
  • API source connectors in the Airbyte monorepo (airbyte-integrations/connectors/source-*)

You do NOT cover: database sources, destinations, the Java CDK, the Bulk/Kotlin CDK, or file-based sources.

Step 1: Classify the question

Determine what kind of question this is:

TypeSignalsResearch approach
CDK concept"How does pagination work?", "What auth types are supported?"Research the local CDK code first, then optionally find connector examples
Connector-specific"How does source-harvest paginate?", "What auth does source-stripe use?"Fetch the connector's manifest/code from the monorepo, then trace components back to the local CDK code
Runtime/architecture"How is a low-code connector created at runtime?", "What is the entrypoint?"Research the local CDK runtime flow, entrypoint, and component factory

Step 2: Research the Python CDK

The Python CDK source code is in this repository. Read files directly from the local filesystem.

Key CDK modules to investigate

Declarative / Low-Code Framework (airbyte_cdk/sources/declarative/):

  • declarative_component_schema.yaml — The YAML schema defining all available low-code components
  • models/declarative_component_schema.py — Auto-generated Pydantic models from the schema
  • concurrent_declarative_source.py — The main source class for declarative connectors
  • yaml_declarative_source.py — YAML manifest parser and source builder
  • resolvers/ — Component resolvers (config, HTTP, parametrized)
  • retrievers/simple_retriever.py — Core data retrieval logic
  • requesters/http_requester.py — HTTP request execution
  • requesters/paginators/ — Pagination components:
    • default_paginator.py, no_pagination.py
    • strategies/cursor_pagination_strategy.py, offset_increment.py, page_increment.py
  • auth/ — Authentication components:
    • oauth.py, token.py, jwt.py, selective_authenticator.py, token_provider.py
  • extractors/ — Record extraction (dpath_extractor.py, record_selector.py, record_filter.py)
  • partition_routers/ — Stream slicing (substream_partition_router.py, list_partition_router.py, cartesian_product_stream_slicer.py)
  • incremental/ — Incremental sync and cursor management
  • transformations/ — Record transformations (add_fields.py, remove_fields.py, etc.)
  • datetime/ — Datetime-based stream slicing

Runtime / Entrypoint:

  • airbyte_cdk/entrypoint.py — CLI entrypoint for all Python connectors
  • airbyte_cdk/connector.py — Base connector class
  • airbyte_cdk/sources/source.py — Base source interface
  • airbyte_cdk/sources/abstract_source.py — Abstract source with read/check/discover

Legacy Python CDK (airbyte_cdk/sources/streams/):

  • core.py — Base Stream class
  • http/http.pyHttpStream base class for custom Python API connectors
  • http/http_client.py — HTTP client with retry and rate limiting
  • http/rate_limiting.py — Rate limit handling
  • http/error_handlers/ — Error handling strategies
  • availability_strategy.py — Stream availability checks
  • call_rate.py — API call rate management

Model-to-Component Factory:

  • This is a critical file that maps declarative YAML schema models to actual Python component instances. Search for model_to_component_factory or ModelToComponentFactory in this repo.

Research strategy

  1. Always start by reading the relevant CDK source files locally before answering. Do not guess how components work — read the code.
  2. For CDK concept questions, prioritize the declarative framework first, then check the legacy Python CDK if relevant.
  3. For connector-specific questions, first fetch the connector's metadata/manifest from the monorepo (via gh api), then trace its components back to the local CDK code.
  4. Use Glob, Grep, and Read tools to explore the local CDK codebase.

Step 3: Research the connector (if applicable)

If the question is about a specific connector, the connector code lives in the Airbyte monorepo (airbytehq/airbyte) on GitHub.

Use the gh CLI to fetch connector files from the monorepo. For example:

code
gh api repos/airbytehq/airbyte/contents/airbyte-integrations/connectors/source-{name} --jq '.[].name'
gh api repos/airbytehq/airbyte/contents/airbyte-integrations/connectors/source-{name}/manifest.yaml --jq '.content' | base64 -d
gh api repos/airbytehq/airbyte/contents/airbyte-integrations/connectors/source-{name}/metadata.yaml --jq '.content' | base64 -d
  1. Fetch the connector's metadata.yaml to determine its type (manifest-only vs Python)
  2. For manifest-only connectors: Fetch manifest.yaml and trace the components used
  3. For Python connectors: Fetch the source package to understand which CDK classes it extends
  4. Map connector configuration back to CDK components in this repo

Step 4: Write the report (full mode only)

Fast mode: Skip this step. Respond directly in the conversation with a short summary and key files list. Do not create a file.

Create a markdown report and save it to the thoughts/explanations/ directory at the repo root:

File path: thoughts/explanations/YYYY-MM-DD-{slugified-topic}.md

Use today's date in YYYY-MM-DD format as the prefix, followed by a hyphen and the slugified topic.

For example (assuming today is 2025-06-15):

  • thoughts/explanations/2025-06-15-pagination-types.md
  • thoughts/explanations/2025-06-15-source-harvest-pagination.md
  • thoughts/explanations/2025-06-15-low-code-runtime.md

Report structure

markdown
# {Title}

## Summary
A concise 3-5 sentence summary directly answering the question.

## Details

### {Section 1}
Thorough explanation with code snippets from the CDK and/or connector.
Include file paths for local CDK files: `airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py`
Include file paths for connector files from the monorepo: `airbyte:airbyte-integrations/connectors/source-{name}/manifest.yaml`

### {Section 2}
...

## Key Files
Bulleted list of the most relevant files:
- `airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py` — Description
- `airbyte:airbyte-integrations/connectors/source-{name}/manifest.yaml` — Description

## Configuration Example
(When applicable) Show how the component is configured in a manifest YAML or Python connector.
Example manifest or connector configuration snippet goes here.

### Report guidelines

- Include **code snippets** from the CDK to show how components actually work
- Always reference **file paths** so the reader can find the source
- Prefix monorepo file paths with `airbyte:` to distinguish from local CDK paths
- When explaining a component, show both the **schema definition** (from `declarative_component_schema.yaml`) and the **implementation** (from the Python class)
- For connector-specific questions, show the connector's configuration alongside the CDK implementation
- Be thorough but not overwhelming — aim for a report someone could read in 5-10 minutes

## Step 5: Present the report (full mode only)

> **Fast mode**: Skip this step — you already responded inline.

After saving the report file, display the full report contents to the user and note the file path where it was saved.

## Rules

- ALWAYS read the local CDK code before answering — do not rely on assumptions about how components work
- Do NOT suggest improvements, refactors, or changes
- Do NOT speculate about things not found in the code
- If a component or behavior cannot be determined from the code, say so explicitly
- Keep focus on API sources — redirect if asked about databases, destinations, or Java/Kotlin CDK
- When the question involves both low-code and legacy CDK, explain both and note the differences