Databricks Asset Bundles (DABs)

Overview

Databricks Asset Bundles provide infrastructure-as-code for deploying Databricks workflows, jobs, and DLT pipelines. This skill standardizes configuration patterns for serverless-first, production-ready deployments: a three-layer job hierarchy, widget-based parameter passing, and pre-deployment checks that catch common configuration errors.

When to Use This Skill

• Creating or configuring Databricks Asset Bundle YAML files
• Deploying serverless jobs, DLT pipelines, dashboards, alerts, apps, or workflows
• Setting up hierarchical job architectures (atomic/composite/orchestrator)
• Configuring dashboard resources with dataset_catalog/dataset_schema (CLI 0.281.0+)
• Setting up SQL Alerts v2 (schema differs significantly from other resources)
• Configuring Databricks Apps in DABs (env vars in app.yaml, not databricks.yml)
• Troubleshooting deployment errors or configuration issues
• Converting notebooks to use proper parameter passing patterns
• Validating bundle configurations before deployment

Critical Rules (Quick Reference)

🔴 MANDATORY: Serverless Environment Configuration (Environments V4)

EVERY JOB THAT CONTAINS A notebook_task MUST INCLUDE THIS, NO EXCEPTIONS:

yaml

resources:
  jobs:
    <job_name>:
      name: "[${bundle.target}] <Display Name>"
      
      # ✅ MANDATORY: Serverless environment with V4
      environments:
        - environment_key: "default"
          spec:
            environment_version: "4"  # 🔴 ALWAYS V4 - never omit or use older versions
      
      tasks:
        - task_key: <task_name>
          environment_key: default  # ✅ MANDATORY: Reference environment in EVERY task
          notebook_task:
            notebook_path: ../src/<script>.py

Validation: Before deploying ANY job YAML:

• environments: block exists at job level
• environment_version: "4" is set (NEVER omit, NEVER use older versions)
• Every task has environment_key: default
• NO job_clusters:, existing_cluster_id:, or new_cluster: defined (serverless only)
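The four checks above can be sketched as a small function. This is a minimal illustration, not the validate_bundle.py script listed under Scripts: it assumes the job definition has already been loaded from YAML into a plain dict, and the name check_serverless_job and both sample jobs are hypothetical.

```python
def check_serverless_job(job: dict) -> list[str]:
    """Return the checklist items that a single job definition violates."""
    problems = []

    # Check 1 and 2: a job-level environments block pinned to version "4".
    environments = job.get("environments") or []
    if not environments:
        problems.append("missing job-level environments block")
    for env in environments:
        version = str((env.get("spec") or {}).get("environment_version", ""))
        if version != "4":
            problems.append(f"environment_version is {version!r}, expected '4'")

    # Check 4 (job level): classic compute must not be declared.
    if "job_clusters" in job:
        problems.append("job_clusters is defined")

    for task in job.get("tasks", []):
        key = task.get("task_key", "<unnamed>")
        # Check 3: every task references the default environment.
        if task.get("environment_key") != "default":
            problems.append(f"task {key}: environment_key is not 'default'")
        # Check 4 (task level): no classic cluster references.
        for classic in ("existing_cluster_id", "new_cluster"):
            if classic in task:
                problems.append(f"task {key}: {classic} is defined")
    return problems


# Hypothetical sample data, shaped like the YAML above after loading.
good_job = {
    "environments": [
        {"environment_key": "default", "spec": {"environment_version": "4"}}
    ],
    "tasks": [
        {
            "task_key": "load",
            "environment_key": "default",
            "notebook_task": {"notebook_path": "../src/load.py"},
        }
    ],
}
bad_job = {
    "job_clusters": [{"job_cluster_key": "main"}],
    "tasks": [
        {
            "task_key": "load",
            "new_cluster": {"num_workers": 2},
            "notebook_task": {"notebook_path": "../src/load.py"},
        }
    ],
}

print(check_serverless_job(good_job))
for problem in check_serverless_job(bad_job):
    print(problem)
```

The function only inspects key names and values; it does not replace databricks bundle validate, which checks the configuration against the actual bundle schema.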

🔴 MANDATORY: Hierarchical Job Architecture

3-LAYER HIERARCHY - NO EXCEPTIONS:

• Layer 1: Atomic Jobs - Contain actual notebook_task references (single notebook per job)
• Layer 2: Composite Jobs - Reference atomic jobs via run_job_task (NO direct notebooks)
• Layer 3: Master Orchestrators - Reference composite/atomic jobs via run_job_task (NO direct notebooks)

Rule: Each notebook appears in EXACTLY ONE atomic job. Higher-level jobs reference lower-level jobs, never duplicate notebooks.
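One way to enforce the one-notebook-one-job rule is to collect every notebook_path across all jobs and flag any path that has more than one owner. The sketch below assumes the bundle's jobs are available as a dict keyed by job resource key; find_duplicate_notebooks and the tvf_redeploy_job entry are hypothetical. Paths are compared as raw strings, so two different relative paths that resolve to the same file would not be caught.

```python
from collections import defaultdict


def find_duplicate_notebooks(jobs: dict) -> list[str]:
    """Flag notebooks owned by more than one job, and jobs with several notebooks."""
    violations = []
    owners = defaultdict(list)  # notebook_path -> job keys that reference it

    for job_key, job in jobs.items():
        notebook_tasks = [t for t in job.get("tasks", []) if "notebook_task" in t]
        if len(notebook_tasks) > 1:
            violations.append(
                f"{job_key}: {len(notebook_tasks)} notebook tasks, expected 1"
            )
        for task in notebook_tasks:
            owners[task["notebook_task"]["notebook_path"]].append(job_key)

    for path, job_keys in sorted(owners.items()):
        if len(job_keys) > 1:
            violations.append(f"{path}: referenced by {', '.join(job_keys)}")
    return violations


# Hypothetical sample: the second atomic job wrongly reuses the same notebook.
sample_jobs = {
    "tvf_deployment_job": {
        "tasks": [
            {
                "task_key": "deploy_tvfs",
                "notebook_task": {
                    "notebook_path": "../src/semantic/tvfs/deploy_tvfs.py"
                },
            }
        ]
    },
    "tvf_redeploy_job": {
        "tasks": [
            {
                "task_key": "redeploy_tvfs",
                "notebook_task": {
                    "notebook_path": "../src/semantic/tvfs/deploy_tvfs.py"
                },
            }
        ]
    },
    "semantic_layer_setup_job": {
        "tasks": [
            {
                "task_key": "deploy_tvfs",
                "run_job_task": {"job_id": "${resources.jobs.tvf_deployment_job.id}"},
            }
        ]
    },
}

for violation in find_duplicate_notebooks(sample_jobs):
    print(violation)
```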

🔴 MANDATORY: Parameter Passing Pattern

ALWAYS use dbutils.widgets.get() for notebook_task, NEVER argparse:

python

# ✅ CORRECT: Databricks notebook
def get_parameters():
    catalog = dbutils.widgets.get("catalog")  # ✅ Works in notebook_task
    schema = dbutils.widgets.get("schema")
    return catalog, schema

yaml

# ✅ CORRECT: YAML configuration
notebook_task:
  notebook_path: ../src/script.py
  base_parameters:  # ✅ Dictionary format
    catalog: ${var.catalog}
    schema: ${var.schema}

Why: notebook_task delivers base_parameters to the notebook as widget values, not as command-line arguments. An argparse parser never sees those values, so parsing typically fails as soon as the notebook starts.
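The failure mode can be reproduced outside Databricks. The sketch below is an illustration under one stated assumption: an empty argument list stands in for what the parser receives inside a notebook_task, since the widget values never reach it. parse_cli_style is a hypothetical helper.

```python
import argparse


def parse_cli_style(argv: list[str]) -> argparse.Namespace:
    """Parse catalog and schema the CLI way, as a script-style job might."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--catalog", required=True)
    parser.add_argument("--schema", required=True)
    return parser.parse_args(argv)


# Assumption: no --catalog/--schema arguments are present in a notebook_task,
# because base_parameters are delivered as widgets instead.
try:
    parse_cli_style([])
    outcome = "parsed"
except SystemExit as exc:
    # argparse reports the missing required arguments and exits the process.
    outcome = f"SystemExit({exc.code})"

print(outcome)
```

Because argparse exits the interpreter rather than raising an ordinary exception, the task stops before any notebook logic runs. Reading the same values with dbutils.widgets.get("catalog") avoids the problem.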

🔴 MANDATORY: Task Type Pattern

ALWAYS use notebook_task, NEVER python_task:

yaml

# ✅ CORRECT
tasks:
  - task_key: my_task
    notebook_task:  # ✅ Use notebook_task
      notebook_path: ../src/script.py
      base_parameters:  # ✅ Dictionary format
        catalog: ${var.catalog}

# ❌ WRONG
tasks:
  - task_key: my_task
    python_task:  # ❌ Invalid task type!
      python_file: ../src/script.py
      parameters:  # ❌ CLI-style doesn't work!
        - "--catalog=value"

Core Patterns

Serverless Job Pattern

yaml

resources:
  jobs:
    <job_key>:
      name: "[${bundle.target}] <Job Display Name>"
      
      # ✅ MANDATORY: Serverless environment
      environments:
        - environment_key: "default"
          spec:
            environment_version: "4"
      
      tasks:
        - task_key: <task_key>
          environment_key: default  # ✅ MANDATORY
          notebook_task:
            notebook_path: ../src/<script>.py
            base_parameters:
              catalog: ${var.catalog}
      
      tags:
        environment: ${bundle.target}
        project: <project_name>
        layer: <bronze|silver|gold>

DLT Pipeline Pattern

yaml

resources:
  pipelines:
    <pipeline_key>:
      name: "[${bundle.target}] <Pipeline Display Name>"
      
      # ✅ MANDATORY: Root path for Lakeflow Pipelines Editor
      root_path: ../src/<layer>_pipeline
      
      # ✅ Direct Publishing Mode (Modern Pattern)
      catalog: ${var.catalog}
      schema: ${var.<layer>_schema}
      
      libraries:
        - notebook:
            path: ../src/<layer>/<notebook>.py
      
      configuration:
        catalog: ${var.catalog}
        bronze_schema: ${var.bronze_schema}
      
      serverless: true
      photon: true
      edition: ADVANCED
      
      tags:
        environment: ${bundle.target}
        layer: <layer>

Job Reference Pattern (Hierarchical Architecture)

yaml

# Layer 1: Atomic Job (contains notebook)
resources:
  jobs:
    tvf_deployment_job:
      name: "[${bundle.target}] TVF Deployment"
      environments:
        - environment_key: default
          spec:
            environment_version: "4"
      tasks:
        - task_key: deploy_tvfs
          environment_key: default
          notebook_task:  # ✅ Actual notebook reference
            notebook_path: ../../src/semantic/tvfs/deploy_tvfs.py
      tags:
        job_level: atomic

# Layer 2: Composite Job (references atomic jobs; defined in its own resource file, hence the second top-level resources: key)
resources:
  jobs:
    semantic_layer_setup_job:
      name: "[${bundle.target}] Semantic Layer Setup"
      tasks:
        - task_key: deploy_tvfs
          run_job_task:  # ✅ Reference job, NOT notebook
            job_id: ${resources.jobs.tvf_deployment_job.id}
        - task_key: deploy_metric_views
          depends_on:
            - task_key: deploy_tvfs
          run_job_task:
            job_id: ${resources.jobs.metric_view_deployment_job.id}
      tags:
        job_level: composite

Job Hierarchy Overview

Layer 1: Atomic Jobs

• Purpose: Single-purpose jobs with actual notebook references
• Pattern: Use notebook_task with notebook_path
• Tag: job_level: atomic
• Example: tvf_deployment_job, gold_setup_job

Layer 2: Composite Jobs

• Purpose: Domain-level coordination (e.g., semantic layer setup)
• Pattern: Use run_job_task to reference atomic jobs
• Tag: job_level: composite
• Example: semantic_layer_setup_job, monitoring_layer_setup_job

Layer 3: Master Orchestrators

• Purpose: Complete workflow coordination across layers
• Pattern: Use run_job_task to reference composite/atomic jobs
• Tag: job_level: orchestrator
• Example: master_setup_orchestrator, master_refresh_orchestrator

Key Principle: No notebook duplication. Each notebook appears in exactly ONE atomic job.
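The layer rules above lend themselves to a consistency check between each job's job_level tag and the tasks it contains. The following is a minimal sketch, assuming jobs are loaded into a dict keyed by resource key and that references use the ${resources.jobs.<key>.id} form shown earlier; check_job_levels and broken_composite_job are hypothetical names.

```python
import re

# Matches the job reference form used in run_job_task.job_id.
JOB_REF = re.compile(r"\$\{resources\.jobs\.([A-Za-z0-9_-]+)\.id\}")

# Which job_level values each higher layer may reference.
ALLOWED_CHILDREN = {
    "composite": {"atomic"},
    "orchestrator": {"atomic", "composite"},
}


def check_job_levels(jobs: dict) -> list[str]:
    """Compare each job's job_level tag with the kind of tasks it contains."""
    problems = []
    for key, job in jobs.items():
        level = job.get("tags", {}).get("job_level")
        tasks = job.get("tasks", [])
        if level == "atomic":
            if any("run_job_task" in t for t in tasks):
                problems.append(f"{key}: atomic job contains a run_job_task")
        elif level in ALLOWED_CHILDREN:
            for task in tasks:
                if "notebook_task" in task:
                    problems.append(f"{key}: {level} job contains a notebook_task")
                    continue
                ref = str(task.get("run_job_task", {}).get("job_id", ""))
                match = JOB_REF.fullmatch(ref)
                child = match.group(1) if match else None
                child_level = jobs.get(child, {}).get("tags", {}).get("job_level")
                if child_level not in ALLOWED_CHILDREN[level]:
                    problems.append(
                        f"{key}: task {task.get('task_key')!r} references "
                        f"{child!r} (job_level {child_level!r})"
                    )
        else:
            problems.append(f"{key}: missing or unknown job_level tag {level!r}")
    return problems


def ref(job_key: str) -> dict:
    """Build a run_job_task body pointing at another job in the bundle."""
    return {"job_id": "${resources.jobs." + job_key + ".id}"}


# Hypothetical sample: the last job is tagged composite but calls an orchestrator.
sample_jobs = {
    "tvf_deployment_job": {
        "tags": {"job_level": "atomic"},
        "tasks": [{"task_key": "deploy_tvfs", "notebook_task": {"notebook_path": "x.py"}}],
    },
    "semantic_layer_setup_job": {
        "tags": {"job_level": "composite"},
        "tasks": [{"task_key": "deploy_tvfs", "run_job_task": ref("tvf_deployment_job")}],
    },
    "master_setup_orchestrator": {
        "tags": {"job_level": "orchestrator"},
        "tasks": [{"task_key": "semantic", "run_job_task": ref("semantic_layer_setup_job")}],
    },
    "broken_composite_job": {
        "tags": {"job_level": "composite"},
        "tasks": [{"task_key": "run_all", "run_job_task": ref("master_setup_orchestrator")}],
    },
}

for problem in check_job_levels(sample_jobs):
    print(problem)
```

The check relies on the job_level tags being present and truthful; it does not infer a layer for untagged jobs.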

Path Resolution Rules

Relative paths depend on YAML file location:

• From resources/*.yml → Use ../src/
• From resources/<layer>/*.yml → Use ../../src/
• From resources/<layer>/<sublevel>/*.yml → Use ../../../src/

Rule: Always verify path depth matches directory structure.
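The depth rule can be computed rather than counted by hand: one "../" per directory level between the YAML file and the bundle root. This sketch assumes src/ sits directly under the bundle root, as the three cases above imply; src_prefix and the example file names are hypothetical.

```python
from pathlib import PurePosixPath


def src_prefix(yaml_path: str) -> str:
    """Return the relative prefix that reaches src/ from a resource YAML file.

    yaml_path is given relative to the bundle root, for example
    "resources/gold/jobs.yml".
    """
    # Each directory above the file needs one "../" to climb back to the root.
    depth = len(PurePosixPath(yaml_path).parent.parts)
    return "../" * depth + "src/"


for example in (
    "resources/jobs.yml",
    "resources/gold/jobs.yml",
    "resources/gold/setup/jobs.yml",
):
    print(example, "->", src_prefix(example))
```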

Reference Files

• Configuration Guide: Complete YAML configuration patterns, environment setup, variables (with warehouse_id lookup), targets, DLT pipelines (with glob libraries), dashboards (dataset_catalog/dataset_schema), SQL Alerts v2, volumes (grants not permissions), Apps, schedules, notifications, permissions, library dependencies
• Job Patterns: Hierarchical job architecture (atomic/composite/orchestrator), task types, parameter passing (dbutils.widgets.get vs argparse), orchestrator patterns, SQL tasks, multi-task dependencies
• Common Errors: Anti-patterns, deployment error prevention (14 common errors including dashboard hardcoded catalog, alert v2 schema mismatch, volume permissions, app env vars), troubleshooting guide, validation checklist, pre-deployment validation script

Scripts

• validate_bundle.py: Pre-deployment validation script to catch common configuration errors

Assets

• bundle-template.yaml: Starter template for a new Databricks Asset Bundle with serverless configuration

Quick Validation Checklist

Before deploying any bundle:

Jobs & Pipelines

Dashboards

• Uses dataset_catalog/dataset_schema params (no hardcoded catalogs in JSON)

SQL Alerts

• Uses evaluation (not condition), quartz_cron_schedule (not quartz_cron_expression)
• Schema verified with databricks bundle schema | grep -A 100 'sql.AlertV2'
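A name-based scan can catch the two legacy field names before deployment. The sketch below walks an alert definition loaded as nested dicts and lists; find_legacy_alert_keys is hypothetical, and the nesting of the sample alert is illustrative only, not the actual Alerts v2 schema. Confirm the real layout with the databricks bundle schema command above.

```python
# Legacy field name -> the name the checklist above expects instead.
LEGACY_TO_V2 = {
    "condition": "evaluation",
    "quartz_cron_expression": "quartz_cron_schedule",
}


def find_legacy_alert_keys(node, path: str = "alert") -> list[str]:
    """Recursively report legacy key names anywhere inside an alert definition."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if key in LEGACY_TO_V2:
                findings.append(f"{child_path}: use {LEGACY_TO_V2[key]} instead")
            findings.extend(find_legacy_alert_keys(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            findings.extend(find_legacy_alert_keys(item, f"{path}[{index}]"))
    return findings


# Hypothetical alert using both legacy names; the structure is made up.
legacy_alert = {
    "display_name": "Row count check",
    "condition": {"op": "GREATER_THAN"},
    "schedule": {"quartz_cron_expression": "0 0 8 * * ?"},
}

for finding in find_legacy_alert_keys(legacy_alert):
    print(finding)
```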

Volumes & Apps

• Volumes use grants (not permissions)
• App env vars in app.yaml (not databricks.yml)

Pre-Deploy

• Run pre-deployment validation script
• databricks bundle validate passes

Deployment Commands

bash

# Validate bundle configuration
databricks bundle validate

# Deploy to dev
databricks bundle deploy -t dev

# Deploy with auto-approve (skip confirmation prompts)
databricks bundle deploy -t dev --auto-approve

# Force deploy (override Git branch validation)
databricks bundle deploy -t dev --force

# Run a specific job (pass the job's resource key from the bundle YAML)
databricks bundle run -t dev <job_name>

# Start an app after deployment
databricks bundle run -t dev <app_resource_key>

# View app logs for debugging
databricks apps logs <app-name> --profile <profile-name>

# Deploy to production
databricks bundle deploy -t prod

# Destroy all resources (cleanup)
databricks bundle destroy -t dev
databricks bundle destroy -t dev --auto-approve
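For scripted deployments, the validate-then-deploy order from the checklist can be wrapped in a few lines. This is a sketch only: deploy_commands and run_all are hypothetical helpers, the flags are the ones listed above, and run_all assumes the databricks CLI is installed and authenticated.

```python
import subprocess


def deploy_commands(target: str, auto_approve: bool = False) -> list[list[str]]:
    """Build the CLI invocations for one deployment: validate first, then deploy."""
    deploy = ["databricks", "bundle", "deploy", "-t", target]
    if auto_approve:
        deploy.append("--auto-approve")
    return [["databricks", "bundle", "validate", "-t", target], deploy]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first non-zero exit code."""
    for command in commands:
        # check=True raises CalledProcessError, so a failed validate blocks deploy.
        subprocess.run(command, check=True)


# Print the plan without executing anything.
for command in deploy_commands("dev", auto_approve=True):
    print(" ".join(command))
```

Calling run_all(deploy_commands("dev")) would execute the plan; keeping command construction separate from execution makes the sequence easy to review or log first.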
