Semantic Layer Setup Orchestrator
End-to-end workflow for building the Databricks semantic layer — Metric Views, Table-Valued Functions, Genie Spaces, and optimization — on top of a completed Gold layer.
Predecessor: gold-layer-setup skill (Gold tables must exist before using this skill)
Time Estimate: 4-6 hours for initial setup, 1-2 hours per additional domain
What You'll Create:
- Metric Views: YAML-based semantic definitions for each Gold table
- Table-Valued Functions (TVFs): parameterized SQL functions for Genie
- Genie Spaces: configured with agent instructions, data assets, and benchmark questions
- Optimization results: accuracy ≥95%, repeatability ≥90%
Decision Tree
| Question | Action |
|---|---|
| Building semantic layer end-to-end? | Use this skill — it orchestrates everything |
| Only need Metric Views? | Read semantic-layer/01-metric-views-patterns/SKILL.md directly |
| Only need TVFs? | Read semantic-layer/02-databricks-table-valued-functions/SKILL.md directly |
| Only need Genie Space setup? | Read semantic-layer/03-genie-space-patterns/SKILL.md directly |
| Need Genie API automation? | Read semantic-layer/04-genie-space-export-import-api/SKILL.md directly |
| Need to optimize Genie accuracy? | Read semantic-layer/05-genie-space-optimization/SKILL.md directly |
Mandatory Skill Dependencies
CRITICAL: Before generating ANY code for the semantic layer, you MUST read and follow the patterns in these common skills. Do NOT generate these patterns from memory.
| Phase | MUST Read Skill (use Read tool on SKILL.md) | What It Provides |
|---|---|---|
| All phases | common/databricks-expert-agent | Core extraction principle: extract names from source, never hardcode |
| Metric Views | common/databricks-python-imports | Pure Python module patterns for helpers |
| Deployment | common/databricks-asset-bundles | Job YAML, deployment patterns |
| Troubleshooting | common/databricks-autonomous-operations | Deploy → Poll → Diagnose → Fix → Redeploy loop when jobs fail |
Semantic-Domain Dependencies
| Skill | Requirement | What It Provides |
|---|---|---|
| semantic-layer/01-metric-views-patterns | MUST read at Phase 1 | YAML syntax, validation, joins, window measures |
| semantic-layer/02-databricks-table-valued-functions | MUST read at Phase 2 | STRING params, Genie compatibility, null safety |
| semantic-layer/03-genie-space-patterns | MUST read at Phase 3 | 7-section deliverable, agent instructions, benchmark Qs |
| semantic-layer/04-genie-space-export-import-api | Optional at Phase 4 | Programmatic Genie Space deployment via REST API |
| semantic-layer/05-genie-space-optimization | MUST read at Phase 5 | Benchmark testing, 6 control levers, optimization loop |
🔴 Non-Negotiable Defaults
| Default | Value | Applied Where | NEVER Do This Instead |
|---|---|---|---|
| Metric View syntax | WITH METRICS LANGUAGE YAML | Every Metric View DDL | ❌ NEVER use non-YAML metric views |
| TVF parameters | All STRING type | Every TVF signature | ❌ NEVER use DATE, INT, or other non-STRING params (Genie incompatible) |
| Genie warehouse | Serverless SQL Warehouse | Every Genie Space | ❌ NEVER use Classic or Pro warehouse |
| Benchmark questions | Minimum 10 per Genie Space | Every Genie Space | ❌ NEVER deploy without benchmarks |
| Column comments | Required on all Gold tables | Before Genie Space creation | ❌ NEVER create Genie Space without column comments |
Phased Implementation Workflow
Phase 0: Read Plan (5 minutes)
Before starting implementation, check for a planning manifest that defines what to build.

    import yaml
    from pathlib import Path

    manifest_path = Path("plans/manifests/semantic-layer-manifest.yaml")

    if manifest_path.exists():
        with open(manifest_path) as f:
            # safe_load returns None for an empty file, so fall back to {}
            manifest = yaml.safe_load(f) or {}

        # Extract implementation checklist from manifest
        domains = manifest.get('domains', {})
        for domain_name, domain_config in domains.items():
            metric_views = domain_config.get('metric_views', [])
            tvfs = domain_config.get('tvfs', [])
            genie_spaces = domain_config.get('genie_spaces', [])
            print(f"Domain {domain_name}: {len(metric_views)} MVs, {len(tvfs)} TVFs, {len(genie_spaces)} Genie Spaces")

        # Use manifest as the implementation checklist
        # Each artifact has name, description, source tables, and business questions
    else:
        # Fallback: self-discovery from Gold tables
        print("No manifest found, falling back to Gold table self-discovery")
        # Discover Gold tables from catalog and infer metric views from fact tables

If the manifest exists: Use it as the implementation checklist. Every Metric View, TVF, and Genie Space is pre-defined with names, source tables, dimensions, measures, and business questions. Track completion against the manifest's summary counts.
If the manifest doesn't exist: Fall back to self-discovery. Inventory Gold tables, infer Metric Views from fact tables, and derive TVFs from common business queries. This works but may miss artifacts the planning phase would have caught.
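The self-discovery fallback could look roughly like the sketch below. It assumes the Gold table names have already been listed from the catalog and that fact tables share a `fact_` prefix; both the prefix and the `mv_` view naming are hypothetical conventions chosen for illustration, as is the helper name `infer_metric_view_candidates`.

```python
def infer_metric_view_candidates(table_names, fact_prefix="fact_"):
    """Map each fact table to a proposed Metric View name.

    Assumption: fact tables are identifiable by a shared prefix
    (hypothetical `fact_`); dimension tables are skipped.
    """
    candidates = {}
    for table in sorted(table_names):
        if table.startswith(fact_prefix):
            subject = table[len(fact_prefix):]
            candidates[table] = f"mv_{subject}"  # hypothetical naming scheme
    return candidates


# Illustrative table names, not taken from any real catalog
gold_tables = ["dim_customer", "fact_sales", "fact_returns", "dim_date"]
candidates = infer_metric_view_candidates(gold_tables)
for table, view in candidates.items():
    print(f"{table} -> {view}")
```

The proposed names are only a starting inventory; they still need review against the business questions a manifest would normally supply.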
Phase 1: Metric Views (1-2 hours)
MANDATORY: Read each skill below using the Read tool BEFORE writing any code for this phase:
| # | Skill Path | What It Provides |
|---|---|---|
| 1 | data_product_accelerator/skills/common/databricks-expert-agent/SKILL.md | Extract-don't-generate principle |
| 2 | data_product_accelerator/skills/semantic-layer/01-metric-views-patterns/SKILL.md | YAML syntax, validation, joins |
Steps:
- Inventory all Gold tables that need Metric Views
- For each table, create a Metric View YAML file with dimensions and measures
- Validate column references against actual Gold table schemas
- Deploy Metric Views using `CREATE VIEW ... WITH METRICS LANGUAGE YAML`
- Test each Metric View with sample queries
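The column-reference validation step can be approximated with a small check like the one below. This is a heuristic sketch, not the validation from the metric-views-patterns skill: it assumes each Metric View definition has been loaded into a dict with `dimensions` and `measures` lists of `name`/`expr` entries, and it extracts identifiers from each expression with a regular expression rather than a real SQL parser. The table and column names are made up for the example.

```python
import re

_STRING_LITERAL = re.compile(r"'[^']*'")
# Identifiers that are not immediately followed by "(" (i.e. not function calls)
_IDENTIFIER = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b(?!\s*\()")
_SQL_KEYWORDS = {
    "case", "when", "then", "else", "end", "and", "or", "not",
    "null", "is", "in", "distinct", "as", "true", "false",
}


def referenced_columns(expr):
    """Heuristically collect column-like identifiers from a SQL expression."""
    without_literals = _STRING_LITERAL.sub("", expr)
    names = _IDENTIFIER.findall(without_literals)
    return {name for name in names if name.lower() not in _SQL_KEYWORDS}


def find_unknown_columns(metric_view, table_columns):
    """Return {field name: [columns missing from the Gold table schema]}."""
    known = set(table_columns)
    unknown = {}
    fields = metric_view.get("dimensions", []) + metric_view.get("measures", [])
    for field in fields:
        missing = sorted(referenced_columns(field["expr"]) - known)
        if missing:
            unknown[field["name"]] = missing
    return unknown


# Hypothetical Metric View definition and Gold table schema
metric_view = {
    "dimensions": [{"name": "order_month", "expr": "DATE_TRUNC('MONTH', order_date)"}],
    "measures": [
        {"name": "total_revenue", "expr": "SUM(net_amount)"},
        {"name": "avg_discount", "expr": "AVG(discount_pct)"},
    ],
}
unknown = find_unknown_columns(metric_view, {"order_date", "net_amount"})
print(unknown)
```

Because the extraction is regex-based, qualified names and measures that reference other measures may be reported as unknown; treat hits as prompts for manual review rather than hard failures.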
Phase 2: Table-Valued Functions (1-2 hours)
MANDATORY: Read each skill below using the Read tool BEFORE writing any code for this phase:
| # | Skill Path | What It Provides |
|---|---|---|
| 1 | data_product_accelerator/skills/semantic-layer/02-databricks-table-valued-functions/SKILL.md | STRING params, null safety, Genie compat |
Steps:
- Identify business questions that require parameterized queries
- Design TVF signatures (ALL STRING parameters)
- Implement TVFs with null safety and SCD2 handling
- Add v3.0 bullet-point comments for Genie discoverability
- Deploy and validate with test queries
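The all-STRING rule for TVF signatures is easy to enforce before any SQL is written. The sketch below assumes a planned signature is described as a list of `name`/`type` dicts; the helper names and the `get_top_customers` function are hypothetical, and the rendered text covers only the opening of a `CREATE OR REPLACE FUNCTION` statement, not the body patterns from the TVF skill.

```python
def non_string_params(params):
    """Return the names of parameters whose declared type is not STRING."""
    return [p["name"] for p in params if p["type"].strip().upper() != "STRING"]


def render_signature(function_name, params):
    """Render the opening of the DDL, refusing any non-STRING parameter."""
    violations = non_string_params(params)
    if violations:
        raise ValueError(f"Non-STRING parameters: {', '.join(violations)}")
    rendered = ", ".join(f"{p['name']} STRING" for p in params)
    return f"CREATE OR REPLACE FUNCTION {function_name}({rendered})"


# A draft signature with one Genie-incompatible parameter
draft = [
    {"name": "start_date", "type": "STRING"},
    {"name": "top_n", "type": "INT"},
]
print(non_string_params(draft))

# After switching every parameter to STRING the signature renders cleanly
fixed = [{"name": p["name"], "type": "STRING"} for p in draft]
signature = render_signature("get_top_customers", fixed)
print(signature)
```

Values such as `top_n` then need to be cast inside the function body, which is where the null-safety patterns from the TVF skill apply.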
Phase 3: Genie Space Setup (1 hour)
MANDATORY: Read each skill below using the Read tool BEFORE writing any code for this phase:
| # | Skill Path | What It Provides |
|---|---|---|
| 1 | data_product_accelerator/skills/semantic-layer/03-genie-space-patterns/SKILL.md | 7-section deliverable, agent instructions |
Steps:
- Verify all Gold tables have column comments (Genie depends on these)
- Select data assets: Metric Views → TVFs → Gold Tables (priority order)
- Write General Instructions (≤20 lines)
- Create benchmark questions with exact SQL answers (minimum 10)
- Configure Serverless SQL Warehouse
- Deploy Genie Space
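The priority order for data assets (Metric Views, then TVFs, then Gold Tables) can be applied mechanically when assembling the asset list. This is a minimal sketch; the `kind` labels and asset names are hypothetical and do not reflect the Genie Space configuration schema.

```python
# Lower number = higher priority, following Metric Views → TVFs → Gold Tables
ASSET_PRIORITY = {"metric_view": 0, "tvf": 1, "gold_table": 2}


def order_data_assets(assets):
    """Sort assets by priority class, then alphabetically within a class."""
    return sorted(assets, key=lambda a: (ASSET_PRIORITY[a["kind"]], a["name"]))


# Illustrative assets in arbitrary order
assets = [
    {"kind": "gold_table", "name": "fact_sales"},
    {"kind": "tvf", "name": "get_top_customers"},
    {"kind": "metric_view", "name": "mv_sales"},
    {"kind": "gold_table", "name": "dim_customer"},
]
ordered = order_data_assets(assets)
for asset in ordered:
    print(asset["kind"], asset["name"])
```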
Phase 4: API Deployment (Optional, 30 min)
Read only if automating Genie Space deployment:
| # | Skill Path | What It Provides |
|---|---|---|
| 1 | data_product_accelerator/skills/semantic-layer/04-genie-space-export-import-api/SKILL.md | REST API, JSON schema, CI/CD |
Steps:
- Export existing Genie Space as JSON
- Parameterize with variable substitution
- Import to target environment via REST API
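The parameterization step can be sketched with the standard library alone. The example below assumes the exported Genie Space is available as a Python dict and uses `${name}` placeholders via `string.Template`; the field names (`title`, `tables`, `warehouse_id`) are invented for illustration and are not the export schema, and the REST calls for export and import are left out.

```python
import json
from string import Template


def parameterize(exported, replacements):
    """Serialize the export and swap concrete values for ${placeholders}.

    replacements maps an environment-specific value to a placeholder name.
    """
    text = json.dumps(exported, indent=2, sort_keys=True)
    text = text.replace("$", "$$")  # escape literal dollars for Template
    for value, placeholder in replacements.items():
        text = text.replace(value, "${" + placeholder + "}")
    return text


def render(template_text, variables):
    """Fill the placeholders for a target environment and parse the JSON."""
    return json.loads(Template(template_text).substitute(variables))


# Hypothetical export from a development workspace
exported = {
    "title": "Sales Genie",
    "tables": ["dev_catalog.gold.fact_sales"],
    "warehouse_id": "abc123",
}
template_text = parameterize(exported, {"dev_catalog": "catalog", "abc123": "warehouse_id"})
prod = render(template_text, {"catalog": "prod_catalog", "warehouse_id": "xyz789"})
print(prod)
```

Plain string replacement is blunt: a value that also appears inside unrelated text would be replaced too, so short or common values deserve a more targeted substitution.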
Phase 5: Optimization Loop (1-2 hours)
MANDATORY: Read each skill below using the Read tool BEFORE writing any code for this phase:
| # | Skill Path | What It Provides |
|---|---|---|
| 1 | data_product_accelerator/skills/semantic-layer/05-genie-space-optimization/SKILL.md | Benchmark testing, 6 control levers |
Steps:
- Run benchmark questions via Conversation API
- Evaluate accuracy and repeatability scores
- Apply control levers (UC metadata, Metric Views, TVFs, Instructions)
- Re-test until accuracy ≥95% and repeatability ≥90%
- Document optimization results
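The exit condition of the loop can be made explicit with a small scoring helper. The definitions used here are assumptions, since this document does not define the two scores: accuracy is taken as the share of all runs whose answer matches the expected answer, and repeatability as the average share of runs per question that agree with that question's most common answer. The optimization skill may define them differently.

```python
from collections import Counter

ACCURACY_TARGET = 0.95
REPEATABILITY_TARGET = 0.90


def score_benchmarks(runs, expected):
    """Score repeated benchmark runs.

    runs:     {question: [answer from each repeated run]}
    expected: {question: expected answer}
    """
    correct = 0
    total = 0
    stable_shares = []
    for question, answers in runs.items():
        total += len(answers)
        correct += sum(1 for answer in answers if answer == expected[question])
        most_common_count = Counter(answers).most_common(1)[0][1]
        stable_shares.append(most_common_count / len(answers))
    accuracy = correct / total
    repeatability = sum(stable_shares) / len(stable_shares)
    return accuracy, repeatability


# Illustrative results: q2 gives a different answer on its third run
runs = {"q1": ["A", "A", "A"], "q2": ["B", "B", "X"]}
expected = {"q1": "A", "q2": "B"}
accuracy, repeatability = score_benchmarks(runs, expected)
targets_met = accuracy >= ACCURACY_TARGET and repeatability >= REPEATABILITY_TARGET
print(f"accuracy={accuracy:.2%} repeatability={repeatability:.2%} targets_met={targets_met}")
```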
Post-Creation Validation
Common Skill Compliance
- Names extracted from Gold YAML (not generated) per `databricks-expert-agent`
- Asset Bundle YAML follows `databricks-asset-bundles` patterns
- Python imports follow `databricks-python-imports` patterns
Semantic Layer Specifics
- All Metric Views use `WITH METRICS LANGUAGE YAML` syntax
- All TVFs use STRING parameters only
- All TVFs have v3.0 bullet-point comments
- Genie Space has ≤20 line General Instructions
- Genie Space has ≥10 benchmark questions with exact SQL
- Genie Space uses Serverless SQL Warehouse
- All Gold tables have column comments before Genie Space creation
- Optimization targets met: accuracy ≥95%, repeatability ≥90%
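Several of the Genie Space checks above can be automated before deployment. The sketch below assumes the planned space is described by a plain dict with hypothetical keys (`general_instructions`, `benchmarks`, `warehouse_type`) and counts only non-blank instruction lines toward the 20-line limit; both choices are illustrative, not part of the Genie Space schema.

```python
MAX_INSTRUCTION_LINES = 20
MIN_BENCHMARKS = 10


def check_genie_space(space):
    """Return a list of human-readable checklist failures (empty if none)."""
    failures = []

    lines = [line for line in space["general_instructions"].splitlines() if line.strip()]
    if len(lines) > MAX_INSTRUCTION_LINES:
        failures.append(f"General Instructions has {len(lines)} lines (max {MAX_INSTRUCTION_LINES})")

    benchmarks = space["benchmarks"]
    if len(benchmarks) < MIN_BENCHMARKS:
        failures.append(f"Only {len(benchmarks)} benchmark questions (min {MIN_BENCHMARKS})")

    missing_sql = [b["question"] for b in benchmarks if not b.get("sql", "").strip()]
    if missing_sql:
        failures.append(f"Benchmarks without SQL: {missing_sql}")

    if space["warehouse_type"].strip().lower() != "serverless":
        failures.append(f"Warehouse type is {space['warehouse_type']!r}, expected serverless")

    return failures


# Hypothetical draft; "SELECT 1" is a placeholder, not a real benchmark answer
draft_space = {
    "general_instructions": "Use Metric Views first.\nPrefer TVFs for parameterized questions.",
    "benchmarks": [
        {"question": "Total revenue last month?", "sql": "SELECT 1"},
        {"question": "Top customers?", "sql": ""},
    ],
    "warehouse_type": "pro",
}
failures = check_genie_space(draft_space)
for failure in failures:
    print("FAIL:", failure)
```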
Pipeline Progression
Previous stage: planning/00-project-planning → Project plan for semantic layer, observability, ML, and GenAI agent phases should be complete
Next stage: After completing the semantic layer, proceed to:
- `monitoring/00-observability-setup`: Set up Lakehouse Monitoring, AI/BI Dashboards, and SQL Alerts
Related Skills
| Skill | Relationship | Path |
|---|---|---|
| metric-views-patterns | Mandatory: Metric View YAML | semantic-layer/01-metric-views-patterns/SKILL.md |
| databricks-table-valued-functions | Mandatory: TVF patterns | semantic-layer/02-databricks-table-valued-functions/SKILL.md |
| genie-space-patterns | Mandatory: Genie Space setup | semantic-layer/03-genie-space-patterns/SKILL.md |
| genie-space-export-import-api | Optional: API automation | semantic-layer/04-genie-space-export-import-api/SKILL.md |
| genie-space-optimization | Mandatory: Optimization loop | semantic-layer/05-genie-space-optimization/SKILL.md |
| databricks-expert-agent | Mandatory: Extraction principle | common/databricks-expert-agent/SKILL.md |
| databricks-asset-bundles | Mandatory: Deployment | common/databricks-asset-bundles/SKILL.md |
| databricks-python-imports | Mandatory: Python patterns | common/databricks-python-imports/SKILL.md |