
genie-space-export-import-api

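Comprehensive patterns for the Databricks Genie Space Export/Import API: JSON schema, serialization format, and programmatic deployment. Use when programmatically creating, exporting, or importing Genie Spaces via the REST API, troubleshooting API deployment errors, or implementing CI/CD for Genie Spaces. Includes the complete GenieSpaceExport schema, API endpoints (List, Get, Create, Update, Delete), JSON format requirements, ID generation, variable substitution, inventory-driven generation patterns, and a production deployment checklist.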

SKILL.md
--- frontmatter
name: genie-space-export-import-api
description: Comprehensive patterns for Databricks Genie Space Export/Import API - JSON schema, serialization format, and programmatic deployment. Use when programmatically creating, exporting, or importing Genie Spaces via REST API, troubleshooting API deployment errors, or implementing CI/CD for Genie Spaces. Includes complete GenieSpaceExport schema, API endpoints (List, Get, Create, Update, Delete), JSON format requirements, ID generation, variable substitution, inventory-driven generation patterns, and production deployment checklists.
metadata:
  author: prashanth subrahmanyam
  version: "1.0"
  domain: semantic-layer
  role: worker
  pipeline_stage: 6
  pipeline_stage_name: semantic-layer
  called_by:
    - semantic-layer-setup
  standalone: true
  last_verified: "2026-02-07"
  volatility: high
  upstream_sources:
    - name: "ai-dev-kit"
      repo: "databricks-solutions/ai-dev-kit"
      paths:
        - "databricks-skills/databricks-genie/SKILL.md"
      relationship: "extended"
      last_synced: "2026-02-09"
      sync_commit: "97a3637"

Genie Space Export/Import API

Overview

This skill provides comprehensive patterns for programmatically creating, exporting, and importing Databricks Genie Spaces via the REST API. It covers the complete GenieSpaceExport JSON schema, API endpoints, common deployment errors, and production-ready workflows including variable substitution and asset inventory-driven generation.

When to Use This Skill

Use this skill when you need to:

  • Programmatically deploy Genie Spaces via REST API (CI/CD pipelines, environment promotion)
  • Export Genie Space configurations for version control, backup, or migration
  • Troubleshoot API deployment errors (BAD_REQUEST, INVALID_PARAMETER_VALUE, INTERNAL_ERROR)
  • Implement cross-workspace deployment with template variable substitution
  • Generate Genie Spaces from asset inventories to prevent non-existent table errors
  • Validate Genie Space JSON structure before deployment
  • Understand the complete GenieSpaceExport schema (config, data_sources, instructions, benchmarks)

Quick Reference

API Operations

| Operation | Method | Endpoint | Use Case |
| --- | --- | --- | --- |
| List Spaces | GET | /api/2.0/genie/spaces | Discover existing spaces |
| Get Space | GET | /api/2.0/genie/spaces/{space_id} | Export config, backup |
| Create Space | POST | /api/2.0/genie/spaces | New deployment, CI/CD |
| Update Space | PATCH | /api/2.0/genie/spaces/{space_id} | Modify config, add benchmarks |
| Delete Space | DELETE | /api/2.0/genie/spaces/{space_id} | Cleanup, teardown |
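
The five operations above can be captured in one small request builder. The following is a minimal sketch rather than an official client: `build_request` and `OPERATIONS` are hypothetical names, and bearer-token authentication is an assumption about how the workspace is accessed. The function only constructs the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request
from typing import Optional

# Method and path template for each operation in the table above.
OPERATIONS = {
    "list": ("GET", "/api/2.0/genie/spaces"),
    "get": ("GET", "/api/2.0/genie/spaces/{space_id}"),
    "create": ("POST", "/api/2.0/genie/spaces"),
    "update": ("PATCH", "/api/2.0/genie/spaces/{space_id}"),
    "delete": ("DELETE", "/api/2.0/genie/spaces/{space_id}"),
}

def build_request(host: str, token: str, operation: str,
                  space_id: Optional[str] = None,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one Genie Space operation."""
    method, path = OPERATIONS[operation]
    if "{space_id}" in path:
        if not space_id:
            raise ValueError(f"{operation} requires a space_id")
        path = path.format(space_id=space_id)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=host.rstrip("/") + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",  # assumption: token-based auth
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; keeping construction separate makes the method and URL easy to check in a dry run.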

API Limits

| Resource | Limit | Enforcement |
| --- | --- | --- |
| instructions.sql_functions | Max 50 | Truncate in generation script |
| benchmarks.questions | Max 50 | Truncate in generation script |
| data_sources.tables | No hard limit | Keep ~25-30 for performance |
| data_sources.metric_views | No hard limit | Keep ~5-10 per space |

Core Workflow

Initial Deployment:

  1. List spaces (check if already exists)
  2. Load configuration from JSON file
  3. Substitute template variables (${catalog}, ${gold_schema}, etc.)
  4. Create space with full configuration
  5. Get space to verify deployment
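
The initial-deployment steps above can be sketched as a single function. This is an illustrative outline under stated assumptions: `deploy_if_missing` and the `send(method, path, body)` transport callable are hypothetical, and the `spaces`, `title`, and `space_id` response keys are assumed rather than taken from this document.

```python
import json
from typing import Callable, Optional

# Hypothetical transport: send(method, path, body) performs the HTTP call
# and returns the parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]

def deploy_if_missing(title: str, warehouse_id: str, genie_config: dict,
                      variables: dict, send: Send) -> str:
    """Run the five initial-deployment steps and return the space ID."""
    # Step 1: list spaces and stop early if the title is already deployed.
    # The "spaces", "title", and "space_id" response keys are assumptions.
    listing = send("GET", "/api/2.0/genie/spaces", None)
    for space in listing.get("spaces", []):
        if space.get("title") == title:
            return space["space_id"]

    # Steps 2-3: serialize the loaded config and substitute template variables.
    serialized = json.dumps(genie_config)
    for name, value in variables.items():
        serialized = serialized.replace("${" + name + "}", value)

    # Step 4: create the space with the full configuration.
    created = send("POST", "/api/2.0/genie/spaces", {
        "title": title,
        "warehouse_id": warehouse_id,
        "serialized_space": serialized,  # JSON string, not a dict
    })

    # Step 5: read the space back to verify the deployment.
    verified = send("GET", f"/api/2.0/genie/spaces/{created['space_id']}", None)
    return verified["space_id"]
```

Injecting the transport keeps the flow testable without a workspace, and the early return makes a repeated pipeline run a no-op when the space already exists.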

Incremental Updates:

  1. Get current space configuration
  2. Modify specific sections (e.g., add benchmarks)
  3. Update space with PATCH (partial update)
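
For the incremental-update steps above, a possible way to prepare a PATCH body that only adds benchmarks is shown below. It is a sketch: `build_benchmark_patch` is a hypothetical helper, the question entries are passed through as opaque dicts because their full shape is not specified here, and sending only `serialized_space` in a partial update is an assumption.

```python
import json
import uuid

MAX_BENCHMARK_QUESTIONS = 50  # limit listed under API Limits

def build_benchmark_patch(current_serialized: str, new_questions: list) -> dict:
    """Append benchmark questions to an exported config and return a PATCH body."""
    config = json.loads(current_serialized)
    questions = config.setdefault("benchmarks", {}).setdefault("questions", [])
    for entry in new_questions:
        entry = dict(entry)                       # do not mutate the caller's dict
        entry.setdefault("id", uuid.uuid4().hex)  # every question needs an ID
        questions.append(entry)
    if len(questions) > MAX_BENCHMARK_QUESTIONS:
        raise ValueError(
            f"{len(questions)} benchmark questions exceed the limit of "
            f"{MAX_BENCHMARK_QUESTIONS}"
        )
    # Only the changed field is included, re-serialized as a JSON string.
    return {"serialized_space": json.dumps(config)}
```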

Migration/Backup:

  1. Get space with include_serialized_space=true
  2. Save JSON to version control
  3. Create space in new environment (with variable substitution)
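
The backup half of this flow might look like the sketch below. `export_path` and `save_export` are hypothetical helpers, and the assumption is that the Get response returns the configuration under a `serialized_space` key as a JSON string. Writing the decoded config with sorted keys keeps version-control diffs stable.

```python
import json
from pathlib import Path

def export_path(space_id: str) -> str:
    """Path and query string for a Get Space call that includes the config."""
    return f"/api/2.0/genie/spaces/{space_id}?include_serialized_space=true"

def save_export(response: dict, path: str) -> dict:
    """Decode serialized_space from a Get response and write it as pretty JSON."""
    # Assumption: the response carries the config as a JSON string.
    config = json.loads(response["serialized_space"])
    Path(path).write_text(
        json.dumps(config, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return config
```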

Key Patterns

1. JSON Structure Requirements

CRITICAL: The serialized_space field must be a JSON string (escaped), not a nested object:

python
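import json

# genie_config is the GenieSpaceExport dict (config, data_sources, instructions, benchmarks)
payload = {
    "title": "My Space",
    "warehouse_id": "abc123",
    # Serialize the dict to a JSON string before placing it in the payload
    "serialized_space": json.dumps(genie_config)  # ✅ String, not dict
}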
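
A pre-flight check can catch the two usual mistakes (passing the dict directly, or encoding it twice) before the request is sent. This is a minimal sketch; `check_payload` is a hypothetical helper, not part of any Databricks library.

```python
import json

def check_payload(payload: dict) -> None:
    """Raise if serialized_space is not a JSON string that decodes to an object."""
    raw = payload.get("serialized_space")
    if not isinstance(raw, str):
        raise TypeError(
            "serialized_space must be a JSON string; wrap the dict in json.dumps()"
        )
    parsed = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(parsed, dict):
        # A double-encoded config decodes to a str instead of a dict.
        raise ValueError("serialized_space must decode to a JSON object")
```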

2. ID Generation

All IDs must be 32-character hex strings (UUID without dashes):

python
import uuid

def generate_genie_id():
    return uuid.uuid4().hex  # "01f0ad0d629b11879bb8c06e03b919f8"

Required IDs:

  • config.sample_questions[].id
  • instructions.text_instructions[].id
  • instructions.sql_functions[].id
  • instructions.join_specs[].id
  • benchmarks.questions[].id
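
One way to cover all five locations in a single pass is sketched below. `ensure_ids` and `ID_PATHS` are hypothetical names. The sketch treats only lowercase hex as valid (an assumption based on the sample ID above) and replaces malformed IDs, which is only safe when nothing else references them.

```python
import re
import uuid

ID_PATTERN = re.compile(r"[0-9a-f]{32}")

# (section, key) pairs for the arrays whose entries require an ID.
ID_PATHS = [
    ("config", "sample_questions"),
    ("instructions", "text_instructions"),
    ("instructions", "sql_functions"),
    ("instructions", "join_specs"),
    ("benchmarks", "questions"),
]

def ensure_ids(config: dict) -> int:
    """Fill in missing or malformed IDs in place; return how many were assigned."""
    assigned = 0
    for section, key in ID_PATHS:
        for entry in config.get(section, {}).get(key, []):
            if not ID_PATTERN.fullmatch(str(entry.get("id", ""))):
                entry["id"] = uuid.uuid4().hex
                assigned += 1
    return assigned
```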

3. Array Format Requirements

CRITICAL: Fields that the schema defines as string arrays must be passed as arrays, even when they hold a single value:

json
{
  "config": {
    "sample_questions": [
      {
        "id": "...",
        "question": ["What is revenue?"]  // ✅ Array, not string
      }
    ]
  }
}
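
Hand-written configs often carry a bare string where an array is expected. A small normalizer, sketched here under the assumption that only `config.sample_questions` needs repair, wraps such values. `normalize_sample_questions` is a hypothetical helper.

```python
import uuid

def normalize_sample_questions(config: dict) -> dict:
    """Coerce sample_questions entries into {id, question: [..]} objects in place."""
    section = config.setdefault("config", {})
    fixed = []
    for entry in section.get("sample_questions", []):
        if isinstance(entry, str):
            # A plain string becomes a full object with a fresh ID.
            entry = {"id": uuid.uuid4().hex, "question": [entry]}
        elif isinstance(entry.get("question"), str):
            # A string question is wrapped in a single-element array.
            entry = {**entry, "question": [entry["question"]]}
        fixed.append(entry)
    section["sample_questions"] = fixed
    return config
```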

4. Template Variable Substitution

NEVER hardcode schema paths. Use template variables:

json
{
  "data_sources": {
    "tables": [
      {"identifier": "${catalog}.${gold_schema}.dim_store"}  // ✅ Template
    ]
  }
}

Substitute at runtime:

python
import json

def substitute_variables(data: dict, variables: dict) -> dict:
    """Replace every ${name} placeholder in data with variables[name]."""
    json_str = json.dumps(data)
    for name, value in variables.items():
        # Escape the value as a JSON string body so quotes or backslashes stay valid
        json_str = json_str.replace("${" + name + "}", json.dumps(str(value))[1:-1])
    return json.loads(json_str)
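
A substitution pass only replaces the variables it knows about, so any other `${...}` placeholder survives and reaches the API as a literal identifier. A quick scan before deployment, sketched below, surfaces those leftovers. `unresolved_variables` is a hypothetical helper.

```python
import json
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def unresolved_variables(config: dict) -> set:
    """Return the names of all ${name} placeholders still present in config."""
    return set(PLACEHOLDER.findall(json.dumps(config)))
```

Failing the deployment when this set is non-empty turns a confusing server-side error into a clear local one.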

5. Asset Inventory-Driven Generation

NEVER manually edit data_sources. Generate from verified inventory:

python
import json

# Load the verified asset inventory
with open('actual_assets_inventory.json') as f:
    inventory = json.load(f)

# Build data_sources.tables from the inventory
# (genie_config is the space config dict loaded earlier)
genie_config['data_sources']['tables'] = [
    {"identifier": table_id}
    for table_id in inventory['genie_space_mappings']['cost_intelligence']['tables']
]

Benefits:

  • ✅ Prevents "table doesn't exist" errors
  • ✅ Lets the generation script enforce API limits (for example, truncating to 50 entries)
  • ✅ Single source of truth for assets
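
A fuller generation step, including the 50-entry truncation, could look like the sketch below. `build_from_inventory` is a hypothetical helper, and the `metric_views` and `sql_functions` inventory keys are assumptions; only the `genie_space_mappings` and `tables` keys appear in this document.

```python
import uuid

MAX_SQL_FUNCTIONS = 50  # limit listed under API Limits

def build_from_inventory(inventory: dict, space_key: str) -> dict:
    """Generate data_sources and sql_functions for one space from the inventory."""
    mapping = inventory["genie_space_mappings"][space_key]
    # Assumed inventory keys: "metric_views" and "sql_functions".
    functions = mapping.get("sql_functions", [])[:MAX_SQL_FUNCTIONS]
    return {
        "data_sources": {
            "tables": [{"identifier": t} for t in mapping.get("tables", [])],
            "metric_views": [{"identifier": m} for m in mapping.get("metric_views", [])],
        },
        "instructions": {
            # Each function entry carries only an id and an identifier.
            "sql_functions": [
                {"id": uuid.uuid4().hex, "identifier": f} for f in functions
            ],
        },
    }
```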

6. Column Configs Warning

Including column_configs triggers Unity Catalog validation, which can fail with INTERNAL_ERROR for complex spaces:

json
{
  "data_sources": {
    "metric_views": [
      {
        "identifier": "catalog.schema.mv_sales"
        // ✅ Start without column_configs for reliable deployment
      }
    ]
  }
}

Trade-off:

  • Without column_configs: Reliable deployment, less LLM context
  • With column_configs: More LLM context, higher risk of INTERNAL_ERROR
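
One way to act on this trade-off is to keep the rich config in version control and derive a stripped copy for deployment, or as a fallback after an INTERNAL_ERROR. The sketch below assumes `column_configs` may appear on entries under both `tables` and `metric_views`; only the latter is confirmed here. `strip_column_configs` is a hypothetical helper.

```python
import copy

def strip_column_configs(config: dict) -> dict:
    """Return a deep copy of config with column_configs removed from data sources."""
    stripped = copy.deepcopy(config)  # leave the original config untouched
    for kind in ("tables", "metric_views"):
        for source in stripped.get("data_sources", {}).get(kind, []):
            source.pop("column_configs", None)
    return stripped
```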

7. Field Validation Rules

config.sample_questions:

  • ✅ Array of objects (not strings)
  • ✅ Each object: {id: string, question: string[]}
  • ❌ NO name, description fields

data_sources.metric_views:

  • ✅ identifier field (full 3-part UC name)
  • ✅ Optional: description, column_configs
  • ❌ NO id, name, full_name fields

instructions.sql_functions:

  • ✅ id field (32 hex chars) - REQUIRED
  • ✅ identifier field (full 3-part function name) - REQUIRED
  • ❌ NO other fields (name, signature, description)
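
The rules above translate fairly directly into a local validator. The sketch below is not the server's actual validation logic; `validate_fields` is a hypothetical helper that checks only the three sections listed here and treats a "3-part name" as any string with exactly two dots.

```python
import re

HEX_ID = re.compile(r"[0-9a-f]{32}")

def _is_three_part(identifier) -> bool:
    return isinstance(identifier, str) and len(identifier.split(".")) == 3

def validate_fields(config: dict) -> list:
    """Return a list of human-readable rule violations (empty when valid)."""
    problems = []
    for i, q in enumerate(config.get("config", {}).get("sample_questions", [])):
        where = f"config.sample_questions[{i}]"
        if not isinstance(q, dict):
            problems.append(f"{where}: must be an object, not a {type(q).__name__}")
            continue
        if not isinstance(q.get("question"), list):
            problems.append(f"{where}.question: must be an array")
        problems += [f"{where}: remove '{k}'" for k in ("name", "description") if k in q]

    for i, mv in enumerate(config.get("data_sources", {}).get("metric_views", [])):
        where = f"data_sources.metric_views[{i}]"
        if not _is_three_part(mv.get("identifier")):
            problems.append(f"{where}.identifier: need a 3-part name")
        problems += [f"{where}: remove '{k}'" for k in ("id", "name", "full_name") if k in mv]

    for i, fn in enumerate(config.get("instructions", {}).get("sql_functions", [])):
        where = f"instructions.sql_functions[{i}]"
        if not HEX_ID.fullmatch(str(fn.get("id", ""))):
            problems.append(f"{where}.id: need 32 hex characters")
        if not _is_three_part(fn.get("identifier")):
            problems.append(f"{where}.identifier: need a 3-part name")
        problems += [f"{where}: remove '{k}'" for k in sorted(set(fn) - {"id", "identifier"})]
    return problems
```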

Common Errors & Quick Fixes

| Error | Cause | Quick Fix |
| --- | --- | --- |
| BAD_REQUEST: Invalid JSON | sample_questions as strings | Convert to objects with id and question[] |
| BAD_REQUEST: Invalid JSON | metric_views with full_name | Use identifier instead |
| INTERNAL_ERROR: Failed to retrieve schema | Missing id in sql_functions | Add id field (32 hex chars) |
| INVALID_PARAMETER_VALUE: Expected array | question is string | Wrap in array: ["question"] |
| Exceeded maximum number (50) | Too many TVFs/benchmarks | Truncate to 50 in generation script |

See Troubleshooting Guide for detailed fix scripts.
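
In a CI job, the error table can be turned into a lookup that prints the likely fixes next to a failed response. This is only a sketch: `suggest_fixes` is a hypothetical helper, and it assumes the server's message contains the phrases shown in the table.

```python
# (message fragment, suggested fix) pairs taken from the error table.
FIXES = [
    ("Invalid JSON", "Convert sample_questions entries to objects with id and question[]"),
    ("Invalid JSON", "Rename full_name to identifier in metric_views"),
    ("Failed to retrieve schema", "Add a 32-character hex id to every sql_functions entry"),
    ("Expected array", 'Wrap question in an array: ["question"]'),
    ("Exceeded maximum number", "Truncate sql_functions and benchmarks to 50 entries"),
]

def suggest_fixes(error_message: str) -> list:
    """Return every quick fix whose message fragment appears in the error."""
    return [fix for fragment, fix in FIXES if fragment in error_message]
```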

Reference Files

  • API Reference: Complete API endpoint documentation, request/response schemas, authentication details, Databricks CLI usage
  • Workflow Patterns: Detailed GenieSpaceExport schema (config, data_sources, instructions, benchmarks), ID generation, serialization patterns, variable substitution, asset inventory-driven generation, complete examples
  • Troubleshooting: Common production errors with Python fix scripts, validation checklists, deployment checklist, error recovery patterns, field-level format requirements

Scripts

  • export_genie_space.py: Export Genie Space configurations

    bash
    python scripts/export_genie_space.py --host <workspace> --token <token> --list
    python scripts/export_genie_space.py --host <workspace> --token <token> --space-id <id> --output space.json
    
  • import_genie_space.py: Create/update Genie Spaces from JSON

    bash
    python scripts/import_genie_space.py --host <workspace> --token <token> create \
      --config space.json --title "My Space" --description "..." --warehouse-id <id>
    
    python scripts/import_genie_space.py --host <workspace> --token <token> update \
      --space-id <id> --title "Updated Title"
    

Production Deployment Checklist

  1. Validate JSON Structure

    bash
    python scripts/validate_against_reference.py
    
  2. Validate SQL Queries (if benchmarks present)

    bash
    databricks bundle run -t dev genie_benchmark_validation_job
    
  3. Deploy Genie Spaces

    bash
    databricks bundle deploy -t dev
    databricks bundle run -t dev genie_spaces_deployment_job
    
  4. Verify in UI

    • Navigate to Genie Spaces
    • Test sample questions
    • Verify data sources load correctly

Related Resources

Official Documentation

Related Skills

  • genie-space-patterns - UI-based Genie Space setup
  • metric-views-patterns - Metric view YAML creation
  • databricks-table-valued-functions - TVF patterns for Genie

Version History

  • v3.0 (January 2026) - Inventory-driven programmatic generation, template variables, 100% deployment success
  • v2.0 (January 2026) - Production deployment patterns, format validation, 8 common error fixes
  • v1.0 (January 2026) - Initial schema documentation and API patterns