Databricks Table-Valued Functions (TVFs) for Genie
Overview
Table-Valued Functions (TVFs) provide pre-built, parameterized queries that Genie can invoke for natural language access to your data. This skill covers the entire TVF lifecycle: planning, creation, deployment, and validation.
Why TVFs Matter:
- •Pre-built queries ensure consistent business logic across all consumers
- •LLM-friendly metadata helps Genie understand when to use each function
- •Proper SQL patterns prevent compile-time errors and data inflation bugs
- •Parameterized access gives users instant answers to common questions
Key Capabilities:
- •Plan TVFs from business questions using requirements gathering templates
- •Create TVFs with Genie-compatible parameter types (STRING for dates)
- •Validate schemas before writing SQL to prevent 100% of compilation errors
- •Structure comments (v3.0 bullet-point format) for optimal Genie query matching
- •Handle SCD2 dimensions with proper
is_currentfiltering - •Prevent cartesian products in aggregation CTEs
- •Use proper parameter ordering and LIMIT workarounds
- •Deploy TVFs via Asset Bundle jobs
- •Validate TVFs with post-deployment queries
What TVFs Provide:
- •✅ Parameterized queries with input parameters
- •✅ LLM-friendly metadata for Genie understanding
- •✅ Genie-compatible types (STRING for dates, not DATE)
- •✅ Top N patterns using WHERE rank <= param (not LIMIT param)
- •✅ Null safety with NULLIF for all divisions
- •❌ No DATE parameters (Genie doesn't support DATE type)
- •❌ No LIMIT with parameters (use ROW_NUMBER + WHERE instead)
When to Use This Skill
Use this skill when:
- •Planning which TVFs to create from business questions
- •Creating TVFs for Genie Spaces
- •Troubleshooting TVF compilation errors
- •Ensuring Genie compatibility
- •Validating schemas before writing SQL
- •Deploying TVFs via Asset Bundles
- •Preventing common SQL errors (parameter types, LIMIT clauses, cartesian products)
Quick Start (2-3 hours)
Goal: Create 10-15 pre-built, parameterized SQL queries for common business questions.
What You'll Create:
- •
table_valued_functions.sql— SQL file with 10-15 TVF definitions - •
tvf_job.yml— Asset Bundle job (SQL task) for deployment
Fast Track:
CREATE OR REPLACE FUNCTION get_top_stores_by_revenue(
start_date STRING COMMENT 'Start date (format: YYYY-MM-DD)',
end_date STRING COMMENT 'End date (format: YYYY-MM-DD)',
top_n INT DEFAULT 10 COMMENT 'Number of top stores to return'
)
RETURNS TABLE(
rank INT COMMENT 'Store rank by revenue',
store_name STRING COMMENT 'Store display name',
total_revenue DECIMAL(18,2) COMMENT 'Total revenue for period'
)
COMMENT '
• PURPOSE: Returns top N stores ranked by revenue for a date range
• BEST FOR: "What are the top 10 stores by revenue?" | "Show me best performing stores"
• RETURNS: Individual store rows (rank, store_name, total_revenue)
• PARAMS: start_date, end_date, top_n (default: 10)
• SYNTAX: SELECT * FROM get_top_stores_by_revenue(''2024-01-01'', ''2024-12-31'', 10)
'
RETURN ...;
Common Business Naming Patterns:
| Pattern | Template | Example |
|---|---|---|
| Top N | get_top_{entity}_by_{metric}(start_date, end_date, top_n) | get_top_stores_by_revenue(...) |
| Trending | get_{metric}_trend(start_date, end_date) | get_daily_sales_trend(...) |
| Comparison | get_{metric}_by_{dimension}(start_date, end_date) | get_sales_by_state(...) |
| Performance | get_{entity}_performance(entity_id, start_date, end_date) | get_store_performance(...) |
Critical SQL Rules:
- •STRING for date parameters (never DATE)
- •Required parameters first, DEFAULT parameters last
- •ROW_NUMBER + WHERE for Top N (never LIMIT with parameter)
Output: 10-15 TVFs callable by Genie and queryable via SQL
See references/tvf-planning-guide.md for question categorization and domain examples.
See references/tvf-examples.md for 5 complete, production-ready TVF implementations.
See assets/templates/tvf-requirements-template.md to plan your TVFs before coding.
Critical Rules
⚠️ CRITICAL: Schema Validation BEFORE Writing SQL
RULE #0: Always consult YAML schema definitions before writing any TVF SQL
100% of SQL compilation errors are caused by not consulting YAML schemas first.
Pre-Development Checklist:
- •Read YAML schema files (5 minutes)
- •Create SCHEMA_MAPPING.md (2 minutes)
- •Write TVF SQL using documented schema
- •Run validation script (30 sec)
- •Deploy
ROI: 71% time reduction (45 min → 13 min)
First-Time Success Rate: 0% → 95%+
See references/tvf-patterns.md for detailed schema validation workflow.
⚠️ Issue 1: Parameter Types for Genie Compatibility
RULE: Use STRING for date parameters, not DATE
Genie Spaces do not support DATE type parameters. Always use STRING with explicit format documentation.
❌ DON'T:
CREATE FUNCTION get_sales_by_date_range( start_date DATE COMMENT 'Start date', end_date DATE COMMENT 'End date' )
✅ DO:
CREATE FUNCTION get_sales_by_date_range( start_date STRING COMMENT 'Start date (format: YYYY-MM-DD)', end_date STRING COMMENT 'End date (format: YYYY-MM-DD)' ) ... WHERE transaction_date BETWEEN CAST(start_date AS DATE) AND CAST(end_date AS DATE)
⚠️ Issue 2: Parameter Ordering with DEFAULT Values
RULE: Parameters with DEFAULT must come AFTER parameters without DEFAULT
❌ DON'T:
CREATE FUNCTION get_top_stores( top_n INT DEFAULT 10, -- ❌ DEFAULT parameter first start_date STRING, -- ❌ Required parameter after DEFAULT end_date STRING )
✅ DO:
CREATE FUNCTION get_top_stores( start_date STRING, -- ✅ Required parameter first end_date STRING, -- ✅ Required parameter top_n INT DEFAULT 10 -- ✅ Optional parameter last )
⚠️ Issue 3: LIMIT Clauses Cannot Use Parameters
RULE: Use WHERE rank <= parameter instead of LIMIT parameter
LIMIT clauses require compile-time constants. Use WHERE with ROW_NUMBER() instead.
❌ DON'T:
SELECT * FROM store_metrics ORDER BY total_revenue DESC LIMIT top_n; -- ❌ Cannot use parameter here
✅ DO:
WITH ranked_stores AS (
SELECT ...,
ROW_NUMBER() OVER (ORDER BY total_revenue DESC) as rank
FROM store_metrics
)
SELECT * FROM ranked_stores
WHERE rank <= top_n -- ✅ Can use parameter in WHERE
ORDER BY rank;
⚠️ CRITICAL: Cartesian Product Bug in Aggregation CTEs
Never re-join a table that's already been aggregated in a CTE.
❌ BUGGY PATTERN:
WITH period_data AS ( SELECT SUM(revenue) as total_revenue FROM fact_table GROUP BY period ), final AS ( SELECT SUM(pd.total_revenue), SUM(ft.other_metric) -- 🔥 CARTESIAN! FROM period_data pd LEFT JOIN fact_table ft ON ... -- ❌ Re-joining source! )
✅ CORRECT PATTERN:
SELECT period, SUM(revenue) as total_revenue, SUM(other_metric) as other_metric FROM fact_table GROUP BY period; -- ✅ Single aggregation pass
See references/tvf-patterns.md for detailed cartesian product prevention patterns.
Quick Reference
Standardized TVF Comment Format (v3.0)
Use bullet-point format for ALL TVF comments:
COMMENT '
• PURPOSE: [One-line description of what the TVF does]
• BEST FOR: [Example questions separated by |]
• NOT FOR: [What to avoid - redirect to correct TVF] (optional)
• RETURNS: [PRE-AGGREGATED rows or Individual rows] (exact column list)
• PARAMS: [Parameter names with defaults]
• SYNTAX: SELECT * FROM tvf_name(''param1'', ''param2'')
• NOTE: [Important caveats - DO NOT wrap in TABLE(), etc.] (optional)
'
Complete TVF Pattern
CREATE OR REPLACE FUNCTION get_top_stores_by_revenue(
-- Required parameters first (no DEFAULT)
start_date STRING COMMENT 'Start date (format: YYYY-MM-DD)',
end_date STRING COMMENT 'End date (format: YYYY-MM-DD)',
-- Optional parameters last (with DEFAULT)
top_n INT DEFAULT 10 COMMENT 'Number of top stores to return'
)
RETURNS TABLE(
rank INT COMMENT 'Store rank by revenue',
store_number STRING COMMENT 'Store identifier',
store_name STRING COMMENT 'Store name',
total_revenue DECIMAL(18,2) COMMENT 'Total revenue for period',
total_units BIGINT COMMENT 'Total units sold'
)
COMMENT '
• PURPOSE: Returns the top N stores ranked by revenue for a date range
• BEST FOR: "What are the top 10 stores by revenue?" | "Show me best performing stores"
• RETURNS: Individual store rows (rank, store_number, store_name, total_revenue, total_units)
• PARAMS: start_date, end_date, top_n (default: 10)
• SYNTAX: SELECT * FROM get_top_stores_by_revenue(''2024-01-01'', ''2024-12-31'', 10)
• NOTE: Returns user_id for individual store analysis | Sorted by total_revenue DESC
'
RETURN
WITH store_metrics AS (
SELECT
store_number,
store_name,
SUM(net_revenue) as total_revenue,
SUM(net_units) as total_units
FROM fact_sales_daily
WHERE transaction_date BETWEEN CAST(start_date AS DATE) AND CAST(end_date AS DATE)
GROUP BY store_number, store_name
),
ranked_stores AS (
SELECT
ROW_NUMBER() OVER (ORDER BY total_revenue DESC) as rank,
store_number,
store_name,
total_revenue,
total_units
FROM store_metrics
)
SELECT * FROM ranked_stores
WHERE rank <= top_n -- ✅ Use WHERE instead of LIMIT
ORDER BY rank;
Core Patterns
Null Safety
Always use NULLIF() for division to prevent divide-by-zero errors:
-- ✅ DO: Null-safe division total_revenue / NULLIF(transaction_count, 0) as avg_transaction_value
SCD Type 2 Dimension Handling
Always filter for current records when joining SCD2 dimensions:
-- ✅ Correct: Filter for current version LEFT JOIN dim_store ds ON fsd.store_number = ds.store_number AND ds.is_current = true
Aggregate vs Individual Row TVFs
Aggregate TVF (Returns Pre-Aggregated Rows):
- •Returns fixed number of rows (e.g., 5 segment rows)
- •Data is PRE-AGGREGATED - no GROUP BY needed on top
- •Do NOT use in JOINs (no user_id or other keys to join on)
Individual Row TVF (Returns Detail Rows):
- •Returns variable number of rows based on data
- •Each row represents one entity (customer, property, host)
- •CAN be used in JOINs (has identifier columns)
See references/genie-integration.md for detailed examples.
TVF Creation Checklist
SQL Compliance
- • All date parameters are STRING type (not DATE)
- • Required parameters come before optional parameters
- • No parameters used in LIMIT clauses (use WHERE rank <= param)
- • All divisions use NULLIF to prevent divide-by-zero
- • SCD2 joins include
is_current = truefilter - • No cartesian products: CTEs don't re-join tables already aggregated
- • Single aggregation pass: Each source table read and aggregated only once
Genie Optimization (Standardized Comment Format)
- • Function COMMENT uses bullet-point format (• PURPOSE, • BEST FOR, etc.)
- • PURPOSE: One-line description of what TVF does
- • BEST FOR: 2+ example questions (pipe-separated)
- • NOT FOR / PREFERRED OVER: Redirect to correct asset when applicable
- • RETURNS: Specifies PRE-AGGREGATED or Individual rows + exact column list
- • PARAMS: Parameter names with defaults
- • SYNTAX: Exact copyable example with proper date format
- • NOTE: Caveats (DO NOT wrap in TABLE(), DO NOT add GROUP BY, etc.)
- • All parameters have descriptive COMMENT with format
- • All returned columns have COMMENT
- • Professional language (no "metric view is broken" phrases)
Testing
- • Function compiles without errors
- • Function executes with valid parameters
- • Function handles edge cases (empty results, null values)
- • Function tested in Genie Space (if applicable)
- • Results validated against metric view (ratio ≈ 1.0, not 254x)
Implementation Workflow
Phase 1: Planning (30 min)
- • Fill out requirements template (
assets/templates/tvf-requirements-template.md) - • List 10-15 common business questions from stakeholders
- • Categorize questions (revenue, product, entity, trend)
- • Map questions to TVF names and parameters using naming patterns
- • Identify required vs optional parameters
See references/tvf-planning-guide.md for question categories and domain examples.
Phase 2: SQL Development (1-2 hours)
- • Consult YAML schemas FIRST (Rule #0 — see
references/tvf-patterns.md) - • Create
table_valued_functions.sqlfollowing file organization pattern - • Implement each TVF following the template (
assets/templates/tvf-template.sql) - • Verify all date parameters are STRING type
- • Verify parameter ordering (required first)
- • Verify Top N uses ROW_NUMBER + WHERE (not LIMIT)
- • Verify all divisions use NULLIF
See references/tvf-examples.md for 5 complete TVF examples covering different patterns.
Phase 3: Metadata (30 min)
- • Add function-level COMMENT using v3.0 bullet-point format
- • Add 2+ example questions per function (BEST FOR)
- • Add COMMENT to every parameter (include format for STRING dates)
- • Add COMMENT to every returned column
- • Add NOT FOR / PREFERRED OVER cross-references where applicable
See references/genie-integration.md for comment format details and Genie misuse prevention.
Phase 4: Testing (30 min)
- • Compile each function (no syntax errors)
- • Execute with valid parameters
- • Test edge cases (empty results, null values)
- • Validate results against metric view (ratio ≈ 1.0)
- • Verify results match expectations
See scripts/validate_tvfs.sql for ready-to-run validation queries.
Phase 5: Deployment
- • Add to Asset Bundle job (
sql_taskwithwarehouse_id) - • Deploy:
databricks bundle deploy -t dev - • Run:
databricks bundle run gold_setup_job -t dev - • Test in Genie Space (if applicable)
See references/tvf-patterns.md (Asset Bundle Deployment section) for job YAML patterns.
Time Estimates:
| Phase | Duration | Activities |
|---|---|---|
| Phase 1: Planning | 30 min | Requirements gathering, question mapping |
| Phase 2: SQL Development | 1-2 hours | Write 10-15 TVFs with schema validation |
| Phase 3: Metadata | 30 min | v3.0 comments, parameter/column documentation |
| Phase 4: Testing | 30 min | Compile, execute, validate against metric views |
| Phase 5: Deployment | 30 min | Asset Bundle deploy, Genie Space testing |
| Total | 2-3 hours | For 10-15 TVFs |
Common Mistakes to Avoid
❌ Don't:
- •Use DATE parameters (use STRING with CAST)
- •Mix DEFAULT and non-DEFAULT parameters
- •Use parameters in LIMIT clauses
- •Re-join tables after aggregation (cartesian product)
- •Skip schema validation before writing SQL
✅ Do:
- •Validate schemas from YAML before coding
- •Use STRING for date parameters
- •Put required parameters before optional ones
- •Use WHERE rank <= param instead of LIMIT param
- •Single aggregation pass, no self-joins
Reference Files
- •
references/tvf-patterns.md— SQL patterns, parameter types, cartesian product prevention, schema validation, SQL file organization, Asset Bundle deployment - •
references/genie-integration.md— Genie compatibility, v3.0 comment format, misuse prevention, professional language standards - •
references/tvf-examples.md— 5 complete production-ready TVF examples (ranking, drilldown, product, geographic, temporal) - •
references/tvf-planning-guide.md— Question categorization, TVF planning tables, domain-specific examples (Retail, Healthcare, Finance, Hospitality)
Assets
- •
assets/templates/tvf-template.sql— Starter SQL template for new TVFs with v3.0 comment format - •
assets/templates/tvf-requirements-template.md— Requirements gathering template (fill in before coding)
Scripts
- •
scripts/validate_tvfs.sql— Post-deployment validation queries (list, describe, test, compare to metric views)
References
Official Documentation
Related Skills
- •
metric-views-patterns- Metric view YAML structure - •
genie-space-patterns- Genie Space setup
Version History
- •v2.0 (Feb 2026) - Merged comprehensive TVF creation workflow: Quick Start, requirements gathering template, 5 complete examples, planning guide with domain-specific examples, implementation workflow (5 phases), Asset Bundle deployment patterns, validation queries, common business naming patterns
- •v1.3 (Dec 16, 2025) - Standardized TVF comment format v3.0 for Genie optimization
- •v1.2 (Dec 15, 2025) - Critical bug prevention: Cartesian product in aggregations
- •v1.1 (Dec 2025) - Major enhancement: Schema-first development patterns
- •v1.0 (Oct 2025) - Initial rule based on 15 TVF deployment learnings