Observability Setup Orchestrator
End-to-end workflow for setting up Databricks observability — Lakehouse Monitoring, AI/BI Dashboards, and SQL Alerts — on top of a completed Gold layer and semantic layer.
Predecessor: semantic-layer-setup skill (the semantic layer should be complete; Gold tables are the minimum requirement)
Time Estimate: 3-5 hours for initial setup, 30 min per additional table/dashboard
What You'll Create:
- Lakehouse Monitors: data quality, drift, and custom business KPIs for Gold tables
- AI/BI Dashboards: Lakeview dashboards with monitoring widgets and business metrics
- SQL Alerts: config-driven alerting with severity-based routing
Decision Tree
| Question | Action |
|---|---|
| Setting up observability end-to-end? | Use this skill — it orchestrates everything |
| Only need Lakehouse Monitoring? | Read monitoring/01-lakehouse-monitoring-comprehensive/SKILL.md directly |
| Only need AI/BI Dashboards? | Read monitoring/02-databricks-aibi-dashboards/SKILL.md directly |
| Only need SQL Alerts? | Read monitoring/03-sql-alerting-patterns/SKILL.md directly |
Mandatory Skill Dependencies
CRITICAL: Before generating ANY code for observability, you MUST read and follow the patterns in these common skills. Do NOT generate these patterns from memory.
| Phase | MUST Read Skill (use Read tool on SKILL.md) | What It Provides |
|---|---|---|
| All phases | common/databricks-expert-agent | Core extraction principle: extract names from source, never hardcode |
| Monitor scripts | common/databricks-python-imports | Pure Python module patterns for helpers |
| Job deployment | common/databricks-asset-bundles | Job YAML, deployment patterns |
| Troubleshooting | common/databricks-autonomous-operations | Deploy → Poll → Diagnose → Fix → Redeploy loop when jobs fail |
Monitoring-Domain Dependencies
| Skill | Requirement | What It Provides |
|---|---|---|
| monitoring/01-lakehouse-monitoring-comprehensive | MUST read at Phase 1 | Monitor setup, custom metrics, graceful degradation |
| monitoring/02-databricks-aibi-dashboards | MUST read at Phase 2 | Dashboard JSON, widget patterns, deployment |
| monitoring/03-sql-alerting-patterns | MUST read at Phase 3 | Config-driven alerts, SDK deployment, severity routing |
🔴 Non-Negotiable Defaults
| Default | Value | Applied Where | NEVER Do This Instead |
|---|---|---|---|
| Monitor type | MonitorTimeSeries or MonitorSnapshot | Every Lakehouse Monitor | ❌ NEVER skip monitor type selection |
| Custom metrics | input_columns=[":table"] for table-level KPIs | Every custom business metric | ❌ NEVER use column-level when table-level is needed |
| Dashboard deployment | Lakeview JSON with dataset_catalog/dataset_schema | Every dashboard | ❌ NEVER hardcode catalog/schema in dashboard queries |
| Alert queries | Fully qualified table names (no parameters) | Every SQL alert query | ❌ NEVER use parameterized table names in alerts |
| Serverless | environments: block with environment_key | Every monitoring job | ❌ NEVER define job_clusters: |
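The serverless default in the last row can be checked mechanically before deployment. The following is a minimal sketch, assuming the job definition has already been parsed from the Asset Bundle YAML into a plain dict; the function name `serverless_violations` and the message wording are illustrative and not part of any referenced skill.

```python
def serverless_violations(job: dict) -> list[str]:
    """Return a list of problems that break the serverless default.

    `job` is assumed to be one parsed job definition (a plain dict).
    """
    problems = []

    # The defaults table forbids job_clusters on monitoring jobs.
    if "job_clusters" in job:
        problems.append("job_clusters is defined; use an environments block instead")

    # Collect every environment_key declared in the environments block.
    env_keys = {env.get("environment_key") for env in job.get("environments", [])}
    env_keys.discard(None)
    if not env_keys:
        problems.append("no environments block with an environment_key")

    # Any task that names an environment must name one that was declared.
    for task in job.get("tasks", []):
        key = task.get("environment_key")
        if key is not None and key not in env_keys:
            problems.append(
                f"task {task.get('task_key')!r} references unknown environment_key {key!r}"
            )
    return problems
```

An empty list means the job passes this particular check; it does not validate anything else about the job definition.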
Phased Implementation Workflow
Phase 0: Read Plan (5 minutes)
Before starting implementation, check for a planning manifest that defines what to build.
    import yaml
    from pathlib import Path

    manifest_path = Path("plans/manifests/observability-manifest.yaml")

    if manifest_path.exists():
        with open(manifest_path) as f:
            manifest = yaml.safe_load(f)

        # Extract implementation checklist from manifest
        monitors = manifest.get('lakehouse_monitors', [])
        dashboards = manifest.get('dashboards', [])
        alerts = manifest.get('alerts', [])
        print(f"Plan: {len(monitors)} monitors, {len(dashboards)} dashboards, {len(alerts)} alerts")

        # Each monitor has: table_name, monitor_type, custom_metrics, slicing_exprs
        # Each dashboard has: name, pages, widgets
        # Each alert has: alert_id, severity, query, threshold, schedule
    else:
        # Fallback: self-discovery from Gold tables
        print("No manifest found — falling back to Gold table self-discovery")
        # Discover Gold tables from catalog, create one monitor per table
If the manifest exists: Use it as the implementation checklist. Every monitor, dashboard, and alert is pre-defined with configuration details. Track completion against the manifest's summary counts.
If the manifest doesn't exist: Fall back to self-discovery. Inventory Gold tables, create one monitor per table (TimeSeries for facts, Snapshot for dimensions), and generate standard dashboards and alerts. This works but may miss custom business KPIs that the planning phase would have defined.
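The fallback rule (TimeSeries for facts, Snapshot for dimensions) could be sketched as follows. This assumes Gold tables follow a `fact_` / `dim_` naming prefix, which the source does not state; the function names and the `"Unclassified"` label are hypothetical.

```python
def choose_monitor_type(table_name: str) -> str:
    """Pick a monitor type from the table name.

    Assumption: fact tables start with 'fact_' and dimensions with 'dim_'.
    Anything else is left for a human to classify.
    """
    short_name = table_name.split(".")[-1].lower()
    if short_name.startswith("fact_"):
        return "TimeSeries"
    if short_name.startswith("dim_"):
        return "Snapshot"
    return "Unclassified"


def plan_monitors(gold_tables: list[str]) -> list[dict]:
    """Build one monitor entry per Gold table, fact tables first."""
    plan = [
        {"table_name": name, "monitor_type": choose_monitor_type(name)}
        for name in gold_tables
    ]
    priority = {"TimeSeries": 0, "Snapshot": 1, "Unclassified": 2}
    # sorted() is stable, so tables of the same type keep their input order.
    return sorted(plan, key=lambda row: priority[row["monitor_type"]])
```

Tables that match neither prefix are returned as `"Unclassified"` rather than guessed, so they can be reviewed by hand.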
Phase 1: Lakehouse Monitoring (1-2 hours)
MANDATORY: Read each skill below using the Read tool BEFORE writing any code for this phase:
| # | Skill Path | What It Provides |
|---|---|---|
| 1 | data_product_accelerator/skills/common/databricks-expert-agent/SKILL.md | Extract-don't-generate principle |
| 2 | data_product_accelerator/skills/monitoring/01-lakehouse-monitoring-comprehensive/SKILL.md | Monitor setup, custom metrics |
Steps:
- Inventory Gold tables that need monitoring (fact tables are highest priority)
- Choose monitor type per table (TimeSeries for facts, Snapshot for dimensions)
- Define custom business metrics using `input_columns=[":table"]` for table-level KPIs
- Create monitor setup script with graceful degradation (delete-then-create pattern)
- Deploy monitors and verify metric tables are populated
- Document monitor configuration in Genie Space instructions (if applicable)
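The table-level metric and delete-then-create steps above can be illustrated without touching a workspace. This is a sketch under stated assumptions: the `client` object with `delete` and `create` methods is a hypothetical stand-in (here an in-memory fake), and the dict keys in `table_level_metric` are illustrative, not the SDK's field names. The real patterns live in the lakehouse-monitoring skill.

```python
def table_level_metric(name: str, definition: str) -> dict:
    """Describe a table-level KPI; ':table' marks it as not tied to one column."""
    return {"name": name, "input_columns": [":table"], "definition": definition}


class InMemoryMonitorClient:
    """Hypothetical stand-in for a monitoring API, used only for illustration."""

    def __init__(self):
        self.monitors = {}

    def delete(self, table_name):
        del self.monitors[table_name]  # raises KeyError if no monitor exists

    def create(self, table_name, config):
        self.monitors[table_name] = config
        return config


def recreate_monitor(client, table_name: str, config: dict):
    """Delete any existing monitor, then create a fresh one.

    Degrades gracefully: a missing monitor is ignored on delete, and a
    failed create is reported and returns None instead of raising.
    """
    try:
        client.delete(table_name)
    except KeyError:
        pass  # nothing to delete on the first run
    try:
        return client.create(table_name, config)
    except Exception as exc:
        print(f"Skipping monitor for {table_name}: {exc}")
        return None
```

Returning `None` on failure lets a setup script continue with the remaining tables and report the skipped ones at the end.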
Phase 2: AI/BI Dashboards (1-2 hours)
MANDATORY: Read each skill below using the Read tool BEFORE writing any code for this phase:
| # | Skill Path | What It Provides |
|---|---|---|
| 1 | data_product_accelerator/skills/monitoring/02-databricks-aibi-dashboards/SKILL.md | Dashboard JSON, widget patterns |
Steps:
- Design dashboard layout: monitoring overview + business metrics sections
- Create queries using monitoring profile/drift tables
- Build widget configurations with proper number formatting
- Set `dataset_catalog` and `dataset_schema` for environment portability
- Deploy dashboard via Asset Bundle or API
- Validate all widgets render correctly
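One way to check the portability step is to scan dashboard dataset queries for hardcoded `catalog.schema.table` references. The sketch below assumes the dashboard JSON has a top-level `datasets` list whose entries carry `name` and `query` keys, and that portable queries omit the catalog prefix; both are assumptions to confirm against the dashboards skill. The regex is a heuristic, not a SQL parser.

```python
import re

# Heuristic: a three-part name directly after FROM or JOIN.
THREE_PART_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+(`?\w+`?\.`?\w+`?\.`?\w+`?)", re.IGNORECASE
)


def hardcoded_table_refs(dashboard: dict) -> dict[str, list[str]]:
    """Map dataset name -> three-part table references found in its query."""
    findings = {}
    for dataset in dashboard.get("datasets", []):
        hits = THREE_PART_REF.findall(dataset.get("query", ""))
        if hits:
            findings[dataset.get("name", "<unnamed>")] = hits
    return findings
```

An empty result means no dataset query matched the heuristic; queries that build table names dynamically would not be caught.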
Phase 3: SQL Alerts (1 hour)
MANDATORY: Read each skill below using the Read tool BEFORE writing any code for this phase:
| # | Skill Path | What It Provides |
|---|---|---|
| 1 | data_product_accelerator/skills/monitoring/03-sql-alerting-patterns/SKILL.md | Config-driven alerts, SDK deployment |
| 2 | data_product_accelerator/skills/common/databricks-asset-bundles/SKILL.md | Job YAML for alert deployment |
Steps:
- Create alert configuration table (Delta table-based, severity-driven)
- Define alert rules: threshold, percentage change, anomaly detection
- Deploy alerts via Databricks SDK (V2 dict-based or typed classes)
- Configure notification destinations per severity level
- Set up Quartz cron schedules for alert evaluation
- Validate alerts fire correctly with test data
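To make the rule types and severity routing concrete, here is a minimal sketch of threshold and percentage-change evaluation in plain Python. The `AlertRule` fields, the `rule_type` labels, and the `ROUTES` destination names are hypothetical; the actual configuration table schema and SDK deployment are defined in the SQL alerting skill. Anomaly detection is not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    alert_id: str
    severity: str   # "critical" or "warning"
    rule_type: str  # "threshold" or "pct_change"
    limit: float    # absolute limit, or allowed change in percent


# Hypothetical destination names, one per severity level.
ROUTES = {"critical": "pagerduty", "warning": "email"}


def is_triggered(rule: AlertRule, current: float, previous: Optional[float] = None) -> bool:
    """Evaluate one rule against the current (and optionally previous) value."""
    if rule.rule_type == "threshold":
        return current > rule.limit
    if rule.rule_type == "pct_change":
        if previous is None or previous == 0:
            return False  # no baseline to compare against
        change_pct = abs(current - previous) / abs(previous) * 100
        return change_pct > rule.limit
    raise ValueError(f"unknown rule_type: {rule.rule_type}")


def destination_for(rule: AlertRule) -> str:
    """Look up the notification destination for the rule's severity."""
    return ROUTES[rule.severity]
```

Running each rule against known test values, as in the last step above, is a quick way to confirm a rule fires before it is deployed.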
Post-Creation Validation
Common Skill Compliance
- Names extracted from Gold YAML (not generated) per `databricks-expert-agent`
- Asset Bundle YAML follows `databricks-asset-bundles` patterns
- Python imports follow `databricks-python-imports` patterns
Observability Specifics
- Lakehouse Monitors created for all critical Gold tables
- Custom business metrics use `input_columns=[":table"]` syntax
- Monitor setup uses graceful degradation (delete-then-create)
- Dashboard uses `dataset_catalog`/`dataset_schema` for portability
- Dashboard widgets align with query columns
- Alert queries use fully qualified table names (no parameters)
- Alert severity routing configured (critical → PagerDuty, warning → email)
- All monitoring jobs use serverless compute
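The alert-query item in this checklist lends itself to an automated check. The following sketch flags parameter markers and table references that are not `catalog.schema.table`. It assumes parameters appear as `{{ name }}` or `:name`, and it uses regular expressions rather than a SQL parser, so it is a heuristic; the function name is illustrative.

```python
import re

# Assumed parameter syntaxes: {{ name }} and :name (but not the '::' cast).
PARAM_MARKER = re.compile(r"\{\{.*?\}\}|(?<![:\w]):[A-Za-z_]\w*")
# Table name directly after FROM or JOIN (subqueries in parentheses are skipped).
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([`\w.]+)", re.IGNORECASE)


def alert_query_problems(query: str) -> list[str]:
    """Return reasons why a query breaks the 'fully qualified, no parameters' rule."""
    problems = []
    if PARAM_MARKER.search(query):
        problems.append("contains a parameter marker")
    for ref in TABLE_REF.findall(query):
        parts = ref.replace("`", "").split(".")
        if len(parts) != 3:
            problems.append(f"table reference {ref!r} is not catalog.schema.table")
    return problems
```

A colon inside a string literal could produce a false positive, so flagged queries still deserve a manual look.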
Pipeline Progression
Previous stage: semantic-layer-setup → Metric Views, TVFs, and Genie Spaces should exist
Next stage: After completing observability, proceed to:
- `ml/00-ml-pipeline-setup`: Set up ML models, experiments, and batch inference
Related Skills
| Skill | Relationship | Path |
|---|---|---|
| lakehouse-monitoring-comprehensive | Mandatory — Monitor setup | monitoring/01-lakehouse-monitoring-comprehensive/SKILL.md |
| databricks-aibi-dashboards | Mandatory — Dashboard patterns | monitoring/02-databricks-aibi-dashboards/SKILL.md |
| sql-alerting-patterns | Mandatory — Alert framework | monitoring/03-sql-alerting-patterns/SKILL.md |
| databricks-expert-agent | Mandatory — Extraction principle | common/databricks-expert-agent/SKILL.md |
| databricks-asset-bundles | Mandatory — Deployment | common/databricks-asset-bundles/SKILL.md |
| databricks-python-imports | Mandatory — Python patterns | common/databricks-python-imports/SKILL.md |