AgentSkillsCN

dbt-skill

在使用 dbt(数据构建工具)时,无论是创建模型、编写测试、构建 CI/CD 流水线、进行物化处理、定义数据源、搭建暂存/中间/集市层、配置 Snowflake/BigQuery 仓库、实施增量策略、编写 Jinja 宏、保障数据质量、构建语义层,还是做出分析工程方面的决策时,均可使用此功能。

SKILL.md
--- frontmatter
name: dbt-skill
description: Use when working with dbt (data build tool) - creating models, writing tests, CI/CD pipelines, materializations, sources, staging/intermediate/marts layers, Snowflake/BigQuery warehouse configuration, incremental strategies, Jinja macros, data quality, semantic layer, or making analytics engineering decisions
license: Apache-2.0
metadata:
  author: Daniel Song
  version: 1.0.0

dbt Skill for Claude

Comprehensive dbt guidance covering project structure, modeling methodology, testing, CI/CD, and production patterns. Targets Snowflake and BigQuery as primary warehouses. Beginner-friendly with progressive scaling toward advanced capabilities.

When to Use This Skill

Activate this skill when:

  • Creating new dbt projects or adding models to existing ones
  • Choosing materializations (view, table, incremental, ephemeral, snapshot)
  • Structuring staging, intermediate, and marts layers
  • Setting up testing strategies (schema, generic, singular, unit)
  • Implementing CI/CD pipelines for dbt
  • Configuring sources and freshness monitoring
  • Writing Jinja macros or installing dbt packages
  • Reviewing or refactoring existing dbt projects
  • Making analytics engineering architecture decisions

Don't use this skill for:

  • Basic SQL syntax questions (Claude knows this)
  • Warehouse administration (user management, networking, billing)
  • Raw data pipeline configuration (Fivetran, Airbyte, Stitch)
  • BI tool configuration (Looker, Tableau, Power BI)

Core Principles

1. DRY via ref() and source()

Every model references upstream dependencies through ref() or source() — never hardcoded table names.

2. Single Source of Truth

Each concept is defined once. Staging models are the single entry point for raw data. Marts are the single interface for consumers.

3. Idempotent Transformations

Running dbt run twice produces the same result. Models are deterministic and reproducible.

4. Test Everything

Tests are not optional. Every model has at minimum a primary key uniqueness and not-null test.

5. Progressive Complexity

Start simple (views and tables), add complexity only when data volume or business requirements demand it.

Project Structure

code
dbt_project/
├── dbt_project.yml              # Project configuration
├── packages.yml                 # Package dependencies
├── profiles.yml                 # Connection profiles (local only, not committed)
├── models/
│   ├── staging/                 # 1:1 with source tables
│   │   ├── stripe/
│   │   │   ├── _stripe__models.yml
│   │   │   ├── _stripe__sources.yml
│   │   │   ├── stg_stripe__payments.sql
│   │   │   └── stg_stripe__customers.sql
│   │   └── shopify/
│   │       ├── _shopify__models.yml
│   │       ├── _shopify__sources.yml
│   │       └── stg_shopify__orders.sql
│   ├── intermediate/            # Business logic joins and transformations
│   │   ├── finance/
│   │   │   ├── _int_finance__models.yml
│   │   │   └── int_payments_pivoted.sql
│   │   └── marketing/
│   │       └── int_web_sessions_sessionized.sql
│   └── marts/                   # Business-facing tables
│       ├── finance/
│       │   ├── _finance__models.yml
│       │   ├── fct_orders.sql
│       │   └── dim_customers.sql
│       └── marketing/
│           ├── _marketing__models.yml
│           └── fct_web_sessions.sql
├── macros/                      # Reusable Jinja macros
├── tests/                       # Singular and generic tests
│   └── generic/                 # Custom generic tests
├── seeds/                       # CSV reference data
├── snapshots/                   # SCD Type 2 snapshots
└── analyses/                    # Ad-hoc analytical queries

dbt_project.yml Template

yaml
name: 'my_project'
version: '1.0.0'
config-version: 2

profile: 'my_project'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets: ["target", "dbt_packages"]

models:
  my_project:
    staging:
      +materialized: view           # Lightweight, always fresh
    intermediate:
      +materialized: ephemeral      # No warehouse cost, inlined as CTEs
    marts:
      +materialized: table          # Fast reads for BI tools

Modeling Methodology — Medallion + Kimball

Layer Decision Matrix

LayerMaterializationPurposeNamingTests
StagingviewClean and rename raw data, 1:1 with sourcestg_<source>__<entity>not_null, unique on PK
IntermediateephemeralBusiness logic, joins, pivotsint_<entity>_<verb>edTested via downstream
Martstable / incrementalBusiness-facing facts and dimensionsfct_<entity>, dim_<entity>Full test coverage
ReportstablePre-aggregated for specific dashboardsrpt_<entity>Acceptance tests

Layer Flow

code
Raw Data (sources)
    ↓
Staging (stg_) ── view ── clean, rename, cast, 1:1
    ↓
Intermediate (int_) ── ephemeral ── join, pivot, business logic
    ↓
Marts (fct_, dim_) ── table/incremental ── business entities
    ↓
BI Tools / Consumers

Staging Model Pattern

sql
-- stg_stripe__payments.sql
with source as (
    select * from {{ source('stripe', 'payments') }}
),

renamed as (
    select
        -- primary key
        id as payment_id,

        -- foreign keys
        order_id,
        customer_id,

        -- properties
        lower(payment_method) as payment_method,
        status as payment_status,

        -- numerics
        amount / 100.0 as amount_dollars,

        -- timestamps
        created_at,
        updated_at

    from source
)

select * from renamed

Intermediate Model Pattern

sql
-- int_payments_pivoted.sql
with payments as (
    select * from {{ ref('stg_stripe__payments') }}
),

pivoted as (
    select
        order_id,
        sum(case when payment_method = 'credit_card' then amount_dollars else 0 end) as credit_card_amount,
        sum(case when payment_method = 'bank_transfer' then amount_dollars else 0 end) as bank_transfer_amount,
        sum(amount_dollars) as total_amount
    from payments
    where payment_status = 'completed'
    group by order_id
)

select * from pivoted

Marts Model Pattern

sql
-- fct_orders.sql
with orders as (
    select * from {{ ref('stg_shopify__orders') }}
),

payments as (
    select * from {{ ref('int_payments_pivoted') }}
),

final as (
    select
        orders.order_id,
        orders.customer_id,
        orders.order_date,
        orders.order_status,
        payments.credit_card_amount,
        payments.bank_transfer_amount,
        payments.total_amount
    from orders
    left join payments on orders.order_id = payments.order_id
)

select * from final

Model Naming Conventions

Models

LayerPatternExample
Stagingstg_<source>__<entity>stg_stripe__payments
Intermediateint_<entity>_<verb>edint_payments_pivoted
Factsfct_<entity>fct_orders
Dimensionsdim_<entity>dim_customers
Reportsrpt_<entity>rpt_monthly_revenue

YAML Files

TypePatternExample
Model config_<layer>__models.yml_stripe__models.yml
Sources_<layer>__sources.yml_stripe__sources.yml

Use the leading underscore to sort config files above model files in the directory listing.

SQL Style Guide

Rules

  1. Leading commas — easier to comment out lines, cleaner diffs
  2. Lowercase keywordsselect, not SELECT
  3. CTEs over subqueries — always use with blocks
  4. Explicit columns — never select * in marts (acceptable in staging with source)
  5. Final CTE — name the last CTE final for consistency
  6. 4-space indentation — align for readability
  7. One column per line — in select statements

Complete Example

sql
with source as (
    select * from {{ source('shopify', 'orders') }}
),

renamed as (
    select
        id as order_id,
        user_id as customer_id,
        lower(status) as order_status,
        created_at as order_date,
        updated_at
    from source
),

final as (
    select
        order_id,
        customer_id,
        order_status,
        order_date,
        updated_at
    from renamed
    where order_status is not null
)

select * from final

Materialization Decision Matrix

SituationMaterializationWhy
Staging modelsviewAlways fresh, minimal storage cost
Intermediate logicephemeralZero cost, inlined as CTE
Marts < 100M rowstableSimple, fast reads
Marts > 100M rowsincrementalOnly process new/changed data
SCD Type 2 trackingsnapshotTrack historical changes
One-off analysisephemeralNo need to persist

Warehouse-Specific Configuration

FeatureSnowflakeBigQuery
Incremental defaultmergemerge
Recommended for eventsmicrobatch (1.9+)insert_overwrite
Clusteringcluster_by (automatic)cluster_by (manual)
PartitioningAutomatic micro-partitionspartition_by (required for large tables)
Transient tablestransient: true (no fail-safe)N/A
Dynamic tablesmaterialized: dynamic_tableN/A
Cost modelCredits (compute time)Bytes scanned (on-demand) / Slots (flat-rate)

For detailed incremental strategies, see: Incremental Models & Performance

Source Configuration

yaml
# _stripe__sources.yml
version: 2

sources:
  - name: stripe
    description: "Stripe payment data loaded by Fivetran"
    database: raw                     # Snowflake: database name
    schema: stripe                    # Snowflake: schema name
    loader: fivetran
    loaded_at_field: _fivetran_synced # For freshness checks
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: payments
        description: "One record per payment attempt"
        columns:
          - name: id
            description: "Primary key"
            data_tests:
              - unique
              - not_null
          - name: order_id
            description: "Foreign key to orders"
            data_tests:
              - not_null
              - relationships:
                  to: source('shopify', 'orders')
                  field: id

Warehouse Terminology

ConceptSnowflakeBigQuery
Top-level containerDatabaseProject
Schema groupingSchemaDataset
Freshness field_fivetran_synced_fivetran_synced or _PARTITIONTIME

Basic Testing Overview

Testing Pyramid

code
          /\
         /  \        Singular Tests (specific business rules)
        /____\
       /      \      Generic Tests (reusable patterns)
      /________\
     /          \    Schema Tests (YAML-configured)
    /____________\
   /              \  Source Freshness (automated)
  /________________\

What to Test at Each Layer

LayerTest TypeExamples
SourcesFreshness, existenceloaded_at_field, not_null on keys
StagingPrimary key integrityunique, not_null on PK
IntermediateTested via downstream models
MartsFull coverageAll keys, accepted values, relationships, row counts

Quick Test Example

yaml
# _finance__models.yml
version: 2

models:
  - name: fct_orders
    description: "One record per order with payment details"
    columns:
      - name: order_id
        description: "Primary key"
        data_tests:
          - unique
          - not_null
      - name: order_status
        data_tests:
          - accepted_values:
              values: ['completed', 'pending', 'cancelled', 'refunded']
      - name: total_amount
        data_tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0

For deep testing strategies, see: Testing & Quality Strategy

ref() and source() Patterns

Rules

  1. source() only in staging — staging models are the only gateway to raw data
  2. ref() everywhere else — all other models reference through ref()
  3. Never skip layers — marts must not ref() staging directly (go through intermediate)
  4. Never hardcode schema names — use source() and ref() exclusively

Correct DAG Flow

code
source('stripe', 'payments')  →  stg_stripe__payments
                                       ↓
                                 int_payments_pivoted
                                       ↓
                                   fct_orders

Anti-patterns

sql
-- WRONG: Hardcoded table reference
select * from raw.stripe.payments

-- WRONG: source() outside staging
-- (in a marts model)
select * from {{ source('stripe', 'payments') }}

-- WRONG: Skipping layers
-- (in a marts model referencing staging directly)
select * from {{ ref('stg_stripe__payments') }}

-- CORRECT: Follow the DAG
select * from {{ ref('int_payments_pivoted') }}

Common Commands Cheat Sheet

Build & Run

CommandPurpose
dbt runRun all models
dbt testRun all tests
dbt buildRun + test in DAG order (recommended)
dbt compileCompile SQL without executing
dbt run --select fct_ordersRun a single model
dbt build --select +fct_ordersBuild model and all ancestors
dbt build --select fct_orders+Build model and all descendants

Selectors

SelectorMeaning
-s model_nameSelect a single model
-s +model_nameModel + all upstream ancestors
-s model_name+Model + all downstream descendants
-s +model_name+Model + all ancestors + descendants
-s tag:financeAll models tagged finance
-s path:models/martsAll models in a directory
-s state:modified+Modified models + descendants (Slim CI)

Utilities

CommandPurpose
dbt depsInstall packages from packages.yml
dbt seedLoad CSV seeds into warehouse
dbt snapshotExecute snapshot models
dbt source freshnessCheck source freshness
dbt docs generateGenerate documentation site
dbt docs serveServe documentation locally
dbt debugTest database connection
dbt cleanRemove compiled artifacts

Warehouse Quick Reference

ConfigurationSnowflakeBigQuery
Profile targettype: snowflaketype: bigquery
Auth methodUser/password or key-pairOAuth or service account
Schema generationdatabase.schema.modelproject.dataset.model
Incremental defaultmerge (using unique_key)merge (using unique_key)
PartitioningAutomatic micro-partitionspartition_by: {field, data_type}
Clustering{{ config(cluster_by=['col']) }}{{ config(cluster_by=['col']) }}
Cost optimizationWarehouse auto-suspend, transient tablesPartition pruning, BI Engine caching
Connection testdbt debugdbt debug

Detailed Guides

This skill uses progressive disclosure — essential information is in this main file, detailed guides are available when needed:

Reference Files:

  • Testing & Quality Strategy — Deep dive into schema tests, generic tests, singular tests, unit tests, dbt-expectations, and layer-specific testing strategy
  • CI/CD & Deployment — Local dev workflows, Slim CI, GitHub Actions, dbt Cloud jobs, environment strategy, blue/green deployment, SQLFluff
  • Jinja, Macros & Packages — Jinja fundamentals, custom macros, essential packages (dbt-utils, dbt-expectations), debugging, warehouse-adaptive patterns
  • Incremental Models & Performance — Microbatch (1.9+), merge, delete+insert, insert_overwrite, Snowflake/BigQuery performance tuning, cost monitoring
  • Data Quality & Observability — Source freshness, Elementary package, anomaly detection, alerting, lineage, incident response
  • Semantic Layer & Governance — MetricFlow, model contracts, versions, access controls, dbt Mesh, maturity assessment

How to use: When you need detailed information on a topic, reference the appropriate guide. Claude will load it on demand to provide comprehensive guidance.

License

This skill is licensed under the Apache License 2.0. See the LICENSE file for full terms.

Copyright 2026 Daniel Song