dlt (data load tool)

Name: dlt
Rating: 78
Author: clawdata

Declarative ingestion pipelines — replace manual CSV loading with source connectors for APIs, databases, and SaaS platforms.

Commands

Task	Command
Initialise dlt pipeline	`clawdata dlt init <source> duckdb`
Run a pipeline	`clawdata dlt run <pipeline>`
List available sources	`clawdata dlt sources`
Check pipeline status	`clawdata dlt status <pipeline>`
Show schema	`clawdata dlt schema <pipeline>`

Supported Sources

dlt supports 100+ verified sources including:

Source	Example
REST APIs	GitHub, Notion, Slack, HubSpot
Databases	PostgreSQL, MySQL, MongoDB
SaaS	Stripe, Salesforce, Google Sheets
Files	CSV, JSON, Parquet (local or remote)

Example Pipeline

python

# pipelines/github_pipeline.py
import dlt
from dlt.sources.rest_api import rest_api_source

source = rest_api_source({
    "client": {"base_url": "https://api.github.com"},
    "resources": ["repos/{owner}/{repo}/issues"],
})

pipeline = dlt.pipeline(
    pipeline_name="github_issues",
    destination="duckdb",
    dataset_name="github",
)

pipeline.run(source)

When to use

•User wants to load data from an API → clawdata dlt init <source> duckdb
•User wants to replace manual CSV downloads → create a dlt pipeline
•User needs incremental loading from a database → dlt handles state management

Integration

dlt writes directly to DuckDB, so loaded data is immediately available for dbt transformations and SQL queries.