dbt Core Development (Research-Driven)
Constitution Alignment
This skill enforces project principles:
- •I. Technology Ownership: dbt owns ALL SQL compilation, dialect translation - NEVER parse SQL in Python
- •II. Plugin-First Architecture: dbt adapter is pluggable (DuckDB default, Snowflake, BigQuery alternatives)
- •IV. Contract-Driven Integration: dbt profiles generated from CompiledArtifacts contract
- •VIII. Observability By Default: dbt metadata (manifest.json) feeds lineage and governance
Philosophy
This skill does NOT prescribe specific SQL patterns or dbt project structures. Instead, it guides you to:
- •Research the current dbt Core version and capabilities
- •Discover existing dbt project patterns in the codebase
- •Validate your implementations against dbt documentation
- •Verify integration with Dagster orchestration and Iceberg storage
CRITICAL: dbt owns SQL. Never parse, validate, or transform SQL in Python. Let dbt handle all SQL dialect translation and execution.
Pre-Implementation Research Protocol
Step 1: Verify Runtime Environment
ALWAYS run this first:
dbt --version
python -c "import dbt.version; print(f'dbt-core {dbt.version.__version__}')"
Critical Questions to Answer:
- •What version is installed? (1.x series recommended)
- •What adapters are installed? (duckdb, snowflake, bigquery, etc.)
- •Does it match the documented requirements?
Step 2: Research SDK State (if unfamiliar)
When to research: If you encounter unfamiliar dbt features or need to validate patterns
Research queries (use WebSearch):
- •"dbt Core [feature] documentation 2025" (e.g., "dbt Core snapshots documentation 2025")
- •"dbt programmatic invocation dbtRunner 2025"
- •"dbt [adapter] configuration 2025" (e.g., "dbt duckdb configuration 2025")
Official documentation: https://docs.getdbt.com
Key documentation sections:
- •Models: https://docs.getdbt.com/docs/build/models
- •Tests: https://docs.getdbt.com/docs/build/data-tests
- •Snapshots: https://docs.getdbt.com/docs/build/snapshots
- •Programmatic invocations: https://docs.getdbt.com/reference/programmatic-invocations
Step 3: Discover Existing Patterns
BEFORE creating new dbt resources, search for existing implementations:
# Find dbt project directories find . -name "dbt_project.yml" # Find existing models find . -path "*/models/*.sql" -o -path "*/models/*.py" # Find existing tests find . -path "*/tests/*.sql" # Find macros find . -path "*/macros/*.sql" # Check profiles.yml location echo $DBT_PROFILES_DIR find . -name "profiles.yml"
Key questions:
- •What dbt project structure exists?
- •What naming conventions are used for models?
- •What adapters are configured in profiles.yml?
- •What macros or custom tests exist?
Step 4: Validate Against Architecture
Check architecture docs for integration requirements:
- •Read
/docs/for CompiledArtifacts contract (dbt paths, profiles) - •Understand compute targets (how they map to dbt profiles)
- •Verify Iceberg integration requirements
- •Check governance requirements (classification metadata in model
meta)
Implementation Guidance (Not Prescriptive)
dbt Project Structure
Core concept: dbt projects organize transformations into models, tests, snapshots, seeds, and macros
Research questions:
- •Where should the dbt project live? (monorepo package structure)
- •What directory structure makes sense? (staging, intermediate, marts)
- •How should models be organized? (by source system, by domain)
- •What naming conventions should be used?
SDK features to research:
- •
dbt_project.yml: Project configuration - •
profiles.yml: Connection profiles for compute targets - •Model directories:
models/staging/,models/marts/ - •Resource paths:
analysis-paths,test-paths,macro-paths
Models (SQL and Python)
Core concept: Models are SELECT statements that create tables/views in the data warehouse
Research questions:
- •Should this be a SQL or Python model?
- •What materialization? (table, view, incremental, ephemeral)
- •What dependencies exist? (use
{{ ref('model_name') }}) - •What sources? (use
{{ source('source_name', 'table_name') }})
SDK features to research:
- •SQL models:
.sqlfiles with Jinja templating - •Python models:
.pyfiles withdbt.ref()anddbt.source() - •Materializations:
{{ config(materialized='table') }} - •Incremental models:
is_incremental(), incremental strategies - •Model configurations: tags, meta, pre/post hooks
Tests
Core concept: Tests assert data quality on models, sources, seeds, and snapshots
Research questions:
- •What data quality assertions are needed?
- •Should I use generic tests (unique, not_null) or singular tests?
- •What custom tests are needed?
- •How should test severity be configured? (warn vs error)
SDK features to research:
- •Generic tests:
unique,not_null,accepted_values,relationships - •Singular tests: SQL files in
tests/directory - •Custom generic tests: Macros in
macros/with{% test %} - •Test configurations:
severity,error_if,warn_if
Snapshots
Core concept: Snapshots capture Type 2 slowly changing dimensions (historical records)
Research questions:
- •What tables need historical tracking?
- •What snapshot strategy? (timestamp, check)
- •How should snapshot metadata be named?
- •What should happen to hard deletes?
SDK features to research:
- •Snapshot strategies:
timestamp,check - •YAML configuration (v1.9+): cleaner snapshot setup
- •
snapshot_meta_column_names: Customize dbt_valid_from, dbt_valid_to - •
hard_deletes:ignore,invalidate,new_record
Sources
Core concept: Sources represent raw tables in external systems
Research questions:
- •What raw data sources exist?
- •How should source freshness be validated?
- •What source metadata is needed?
- •Should source tests be added?
SDK features to research:
- •
sources.yml: Source definitions - •
{{ source('source_name', 'table_name') }}: Referencing sources - •Source freshness:
loaded_at_field,freshnesschecks - •Source tests: Generic tests on source columns
Macros
Core concept: Macros are reusable SQL snippets (Jinja templates)
Research questions:
- •What SQL patterns are repeated?
- •Should I create a custom generic test?
- •Are there adapter-specific needs?
SDK features to research:
- •Macro syntax:
{% macro name(args) %} - •
{{ return() }}: Returning values from macros - •Adapter dispatch:
adapter.dispatch() - •Package macros:
dbt_utils, custom packages
Programmatic Invocation (Python API)
Core concept: dbtRunner allows calling dbt commands from Python (for Dagster integration)
Research questions:
- •What dbt commands need to be run from Python? (compile, run, test)
- •How should errors be handled?
- •Should manifest be cached for performance?
- •What context is needed (profiles dir, project dir, target)?
SDK features to research:
- •
dbtRunner: Entry point for programmatic invocation - •
dbtRunnerResult: Return object with results and exceptions - •
.invoke(cli_args): Execute dbt commands - •Manifest reuse: Avoid reparsing for performance
CRITICAL: dbt-core does NOT support safe parallel execution in the same process
Validation Workflow
Before Implementation
- •✅ Verified dbt Core version and adapter
- •✅ Searched for existing dbt project structure
- •✅ Read architecture docs for compute targets and profiles
- •✅ Identified data sources and transformation requirements
- •✅ Researched unfamiliar dbt features
During Implementation
- •✅ SQL models using Jinja (
{{ ref() }},{{ source() }}) - •✅ Proper materialization configured for performance
- •✅ Tests added for data quality (generic and singular)
- •✅ Metadata added to model
metafor governance (classifications) - •✅ Dependencies correctly specified
- •✅ NEVER parse or validate SQL in Python (dbt owns SQL)
After Implementation
- •✅ Run
dbt debugto verify connection - •✅ Run
dbt compileto verify SQL generation - •✅ Run
dbt run --select model_nameto test model - •✅ Run
dbt testto validate data quality - •✅ Check
target/manifest.jsonfor metadata - •✅ Verify integration with Dagster (if applicable)
Context Injection (For Future Claude Instances)
When this skill is invoked, you should:
- •
Verify runtime state (don't assume):
bashdbt --version dbt debug # Verify connection to target
- •
Discover existing patterns (don't invent):
bashfind . -name "dbt_project.yml" find . -path "*/models/*.sql"
- •
Research when uncertain (don't guess):
- •Use WebSearch for "dbt Core [feature] documentation 2025"
- •Check official docs: https://docs.getdbt.com
- •
Validate against architecture (don't assume requirements):
- •Read relevant architecture docs in
/docs/ - •Understand compute targets and how they map to dbt profiles
- •Check CompiledArtifacts contract for dbt paths
- •Read relevant architecture docs in
- •
NEVER parse SQL (dbt owns SQL):
- •Don't validate SQL syntax in Python
- •Don't transform SQL dialects manually
- •Let dbt handle all SQL compilation and execution
Quick Reference: Common Research Queries
Use these WebSearch queries when encountering specific needs:
- •Models: "dbt Core models materialization documentation 2025"
- •Incremental models: "dbt Core incremental models strategies 2025"
- •Tests: "dbt Core data tests custom tests 2025"
- •Snapshots: "dbt Core snapshots YAML configuration 2025"
- •Sources: "dbt Core sources freshness documentation 2025"
- •Macros: "dbt Core macros Jinja examples 2025"
- •Programmatic invocation: "dbt Core dbtRunner programmatic invocation 2025"
- •Adapters: "dbt [adapter] configuration 2025" (e.g., "dbt duckdb configuration 2025")
- •Python models: "dbt Core Python models documentation 2025"
- •Governance: "dbt Core model meta tags documentation 2025"
Integration Points to Research
CompiledArtifacts → dbt profiles.yml
Key question: How does CompiledArtifacts generate profiles.yml?
Research areas:
- •What compute targets are defined in CompiledArtifacts?
- •How do compute types map to dbt adapters? (duckdb, snowflake, bigquery)
- •Where should profiles.yml be written? (DBT_PROFILES_DIR)
- •What connection parameters are needed for each adapter?
dbt → Dagster Integration
Key question: How are dbt models represented as Dagster assets?
Research areas:
- •
dagster-dbtlibrary usage - •
load_assets_from_dbt_manifest()vsload_assets_from_dbt_project() - •dbtRunner for programmatic invocation
- •Manifest.json location (target/manifest.json)
- •Metadata extraction (model meta, column classifications)
dbt → Iceberg Storage
Key question: How do dbt models write to Iceberg tables?
Research areas:
- •dbt adapter for Iceberg (does one exist?)
- •External tables pattern (dbt creates views, separate process writes Iceberg)
- •Python models writing to PyIceberg
- •Post-hooks for Iceberg table creation
Governance Metadata (dbt meta)
Key question: How should classifications be tagged in dbt models?
Research areas:
- •Model-level meta tags
- •Column-level meta tags
- •Custom schema tests for governance
- •Metadata extraction in CompiledArtifacts
dbt Development Workflow
Local Development
# Install dbt with adapter pip install dbt-core dbt-duckdb # Verify installation dbt --version # Initialize project (if needed) dbt init my_project # Verify connection dbt debug # Compile models (generate SQL) dbt compile # Run models dbt run # Run specific model dbt run --select my_model # Test data quality dbt test # Generate documentation dbt docs generate dbt docs serve # View at localhost:8080
Programmatic Invocation (Python)
from dbt.cli.main import dbtRunner, dbtRunnerResult
# Initialize runner
dbt = dbtRunner()
# Run dbt command
cli_args = ["run", "--select", "tag:daily"]
res: dbtRunnerResult = dbt.invoke(cli_args)
# Check results
if res.success:
for r in res.result:
print(f"{r.node.name}: {r.status}")
else:
print(f"Error: {res.exception}")
References
- •dbt Documentation: Official documentation
- •dbt Core on PyPI: Package information
- •Programmatic Invocations: dbtRunner API
- •Models: Model documentation
- •Tests: Testing documentation
- •Snapshots: Snapshot documentation
- •GitHub Repository: dbt-core source
Remember: This skill provides research guidance, NOT prescriptive SQL patterns. Always:
- •Verify the dbt version and adapter compatibility
- •Discover existing dbt project structure and conventions
- •Research dbt capabilities when needed (use WebSearch liberally)
- •Validate against actual CompiledArtifacts contract requirements
- •NEVER parse or validate SQL in Python - dbt owns SQL
- •Test with
dbt runanddbt testbefore considering complete