Databricks Asset Bundles (DABs)
Overview
Databricks Asset Bundles provide infrastructure-as-code for deploying Databricks workflows, jobs, and DLT pipelines. This skill standardizes configuration patterns for serverless-first, production-ready deployments with hierarchical job architecture, proper parameter passing, and comprehensive error prevention.
When to Use This Skill
- •Creating or configuring Databricks Asset Bundle YAML files
- •Deploying serverless jobs, DLT pipelines, dashboards, alerts, apps, or workflows
- •Setting up hierarchical job architectures (atomic/composite/orchestrator)
- •Configuring dashboard resources with
dataset_catalog/dataset_schema(CLI 0.281.0+) - •Setting up SQL Alerts v2 (schema differs significantly from other resources)
- •Configuring Databricks Apps in DABs (env vars in
app.yaml, notdatabricks.yml) - •Troubleshooting deployment errors or configuration issues
- •Converting notebooks to use proper parameter passing patterns
- •Validating bundle configurations before deployment
Critical Rules (Quick Reference)
🔴 MANDATORY: Serverless Environment Configuration (Environments V4)
EVERY JOB MUST INCLUDE THIS — NO EXCEPTIONS:
resources:
jobs:
<job_name>:
name: "[${bundle.target}] <Display Name>"
# ✅ MANDATORY: Serverless environment with V4
environments:
- environment_key: "default"
spec:
environment_version: "4" # 🔴 ALWAYS V4 - never omit or use older versions
tasks:
- task_key: <task_name>
environment_key: default # ✅ MANDATORY: Reference environment in EVERY task
notebook_task:
notebook_path: ../src/<script>.py
Validation: Before deploying ANY job YAML:
- •
environments:block exists at job level - •
environment_version: "4"is set (NEVER omit, NEVER use older versions) - • Every task has
environment_key: default - • NO
job_clusters:,existing_cluster_id:, ornew_cluster:defined (serverless only)
🔴 MANDATORY: Hierarchical Job Architecture
3-LAYER HIERARCHY - NO EXCEPTIONS:
- •Layer 1: Atomic Jobs - Contain actual
notebook_taskreferences (single notebook per job) - •Layer 2: Composite Jobs - Reference atomic jobs via
run_job_task(NO direct notebooks) - •Layer 3: Master Orchestrators - Reference composite/atomic jobs via
run_job_task(NO direct notebooks)
Rule: Each notebook appears in EXACTLY ONE atomic job. Higher-level jobs reference lower-level jobs, never duplicate notebooks.
🔴 MANDATORY: Parameter Passing Pattern
ALWAYS use dbutils.widgets.get() for notebook_task, NEVER argparse:
# ✅ CORRECT: Databricks notebook
def get_parameters():
catalog = dbutils.widgets.get("catalog") # ✅ Works in notebook_task
schema = dbutils.widgets.get("schema")
return catalog, schema
# ✅ CORRECT: YAML configuration
notebook_task:
notebook_path: ../src/script.py
base_parameters: # ✅ Dictionary format
catalog: ${var.catalog}
schema: ${var.schema}
Why: notebook_task passes parameters through widgets, not command-line arguments. Using argparse causes immediate failure.
🔴 MANDATORY: Task Type Pattern
ALWAYS use notebook_task, NEVER python_task:
# ✅ CORRECT
tasks:
- task_key: my_task
notebook_task: # ✅ Use notebook_task
notebook_path: ../src/script.py
base_parameters: # ✅ Dictionary format
catalog: ${var.catalog}
# ❌ WRONG
tasks:
- task_key: my_task
python_task: # ❌ Invalid task type!
python_file: ../src/script.py
parameters: # ❌ CLI-style doesn't work!
- "--catalog=value"
Core Patterns
Serverless Job Pattern
resources:
jobs:
<job_key>:
name: "[${bundle.target}] <Job Display Name>"
# ✅ MANDATORY: Serverless environment
environments:
- environment_key: "default"
spec:
environment_version: "4"
tasks:
- task_key: <task_key>
environment_key: default # ✅ MANDATORY
notebook_task:
notebook_path: ../src/<script>.py
base_parameters:
catalog: ${var.catalog}
tags:
environment: ${bundle.target}
project: <project_name>
layer: <bronze|silver|gold>
DLT Pipeline Pattern
resources:
pipelines:
<pipeline_key>:
name: "[${bundle.target}] <Pipeline Display Name>"
# ✅ MANDATORY: Root path for Lakeflow Pipelines Editor
root_path: ../src/<layer>_pipeline
# ✅ Direct Publishing Mode (Modern Pattern)
catalog: ${var.catalog}
schema: ${var.<layer>_schema}
libraries:
- notebook:
path: ../src/<layer>/<notebook>.py
configuration:
catalog: ${var.catalog}
bronze_schema: ${var.bronze_schema}
serverless: true
photon: true
edition: ADVANCED
tags:
environment: ${bundle.target}
layer: <layer>
Job Reference Pattern (Hierarchical Architecture)
# Layer 1: Atomic Job (contains notebook)
resources:
jobs:
tvf_deployment_job:
name: "[${bundle.target}] TVF Deployment"
environments:
- environment_key: default
spec:
environment_version: "4"
tasks:
- task_key: deploy_tvfs
environment_key: default
notebook_task: # ✅ Actual notebook reference
notebook_path: ../../src/semantic/tvfs/deploy_tvfs.py
tags:
job_level: atomic
# Layer 2: Composite Job (references atomic jobs)
resources:
jobs:
semantic_layer_setup_job:
name: "[${bundle.target}] Semantic Layer Setup"
tasks:
- task_key: deploy_tvfs
run_job_task: # ✅ Reference job, NOT notebook
job_id: ${resources.jobs.tvf_deployment_job.id}
- task_key: deploy_metric_views
depends_on:
- task_key: deploy_tvfs
run_job_task:
job_id: ${resources.jobs.metric_view_deployment_job.id}
tags:
job_level: composite
Job Hierarchy Overview
Layer 1: Atomic Jobs
- •Purpose: Single-purpose jobs with actual notebook references
- •Pattern: Use
notebook_taskwithnotebook_path - •Tag:
job_level: atomic - •Example:
tvf_deployment_job,gold_setup_job
Layer 2: Composite Jobs
- •Purpose: Domain-level coordination (e.g., semantic layer setup)
- •Pattern: Use
run_job_taskto reference atomic jobs - •Tag:
job_level: composite - •Example:
semantic_layer_setup_job,monitoring_layer_setup_job
Layer 3: Master Orchestrators
- •Purpose: Complete workflow coordination across layers
- •Pattern: Use
run_job_taskto reference composite/atomic jobs - •Tag:
job_level: orchestrator - •Example:
master_setup_orchestrator,master_refresh_orchestrator
Key Principle: No notebook duplication. Each notebook appears in exactly ONE atomic job.
Path Resolution Rules
Relative paths depend on YAML file location:
- •From
resources/*.yml→ Use../src/ - •From
resources/<layer>/*.yml→ Use../../src/ - •From
resources/<layer>/<sublevel>/*.yml→ Use../../../src/
Rule: Always verify path depth matches directory structure.
Reference Files
- •Configuration Guide: Complete YAML configuration patterns, environment setup, variables (with warehouse_id lookup), targets, DLT pipelines (with glob libraries), dashboards (dataset_catalog/dataset_schema), SQL Alerts v2, volumes (grants not permissions), Apps, schedules, notifications, permissions, library dependencies
- •Job Patterns: Hierarchical job architecture (atomic/composite/orchestrator), task types, parameter passing (dbutils.widgets.get vs argparse), orchestrator patterns, SQL tasks, multi-task dependencies
- •Common Errors: Anti-patterns, deployment error prevention (14 common errors including dashboard hardcoded catalog, alert v2 schema mismatch, volume permissions, app env vars), troubleshooting guide, validation checklist, pre-deployment validation script
Scripts
- •validate_bundle.py: Pre-deployment validation script to catch common configuration errors
Assets
- •bundle-template.yaml: Starter template for a new Databricks Asset Bundle with serverless configuration
Quick Validation Checklist
Before deploying any bundle:
Jobs & Pipelines
- • Serverless environment configured (
environments:block +environment_keyin tasks) - • Environments Version 4:
environment_version: "4"in everyenvironments.spec(MANDATORY) - • Using
notebook_task(NOTpython_task) - • Using
base_parametersdictionary format (NOT CLI-styleparameters) - • Notebooks use
dbutils.widgets.get()(NOTargparse) - • Variable references use
${var.<name>}format - • Hierarchical architecture: notebooks in atomic jobs only, composite/orchestrator use
run_job_task - • All jobs have
job_leveltag (atomic/composite/orchestrator) - • Path resolution matches directory structure
- • DLT pipelines have
root_pathdefined
Dashboards
- • Uses
dataset_catalog/dataset_schemaparams (no hardcoded catalogs in JSON)
SQL Alerts
- • Uses
evaluation(notcondition),quartz_cron_schedule(notquartz_cron_expression) - • Schema verified with
databricks bundle schema | grep -A 100 'sql.AlertV2'
Volumes & Apps
- • Volumes use
grants(notpermissions) - • App env vars in
app.yaml(notdatabricks.yml)
Pre-Deploy
- • Run pre-deployment validation script
- •
databricks bundle validatepasses
Deployment Commands
# Validate bundle configuration databricks bundle validate # Deploy to dev databricks bundle deploy -t dev # Deploy with auto-approve (skip confirmation prompts) databricks bundle deploy -t dev --auto-approve # Force deploy (overwrite remote changes) databricks bundle deploy -t dev --force # Run specific job databricks bundle run -t dev <job_name> # Start an app after deployment databricks bundle run -t dev <app_resource_key> # View app logs for debugging databricks apps logs <app-name> --profile <profile-name> # Deploy to production databricks bundle deploy -t prod # Destroy all resources (cleanup) databricks bundle destroy -t dev databricks bundle destroy -t dev --auto-approve