Databricks Model Serving

Deploy MLflow models and AI agents to scalable REST API endpoints.

Quick Decision: What Are You Deploying?

Model Type	Pattern	Reference
Traditional ML (sklearn, xgboost)	`mlflow.sklearn.autolog()`	1-classical-ml.md
Custom Python model	`mlflow.pyfunc.PythonModel`	2-custom-pyfunc.md
GenAI Agent (LangGraph, tool-calling)	`ResponsesAgent`	3-genai-agents.md

Prerequisites

• Databricks Runtime (DBR) 16.1+ recommended (GenAI packages come pre-installed)
• Workspace with Unity Catalog enabled
• Model Serving enabled in the workspace

Reference Files

Topic	File	When to Read
Classical ML	1-classical-ml.md	sklearn, xgboost, autolog
Custom PyFunc	2-custom-pyfunc.md	Custom preprocessing, signatures
GenAI Agents	3-genai-agents.md	ResponsesAgent, LangGraph
Tools Integration	4-tools-integration.md	UC Functions, Vector Search
Development & Testing	5-development-testing.md	MCP workflow, iteration
Logging & Registration	6-logging-registration.md	mlflow.pyfunc.log_model
Deployment	7-deployment.md	Job-based async deployment
Querying Endpoints	8-querying-endpoints.md	SDK, REST, MCP tools
Package Requirements	9-package-requirements.md	DBR versions, pip

Quick Start: Deploy a GenAI Agent

Step 1: Install Packages (in notebook or via MCP)

python

%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic
dbutils.library.restartPython()

Or via MCP:

code

execute_databricks_command(code="%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic")

Step 2: Create Agent File

Create agent.py locally using the ResponsesAgent pattern (see 3-genai-agents.md).

Step 3: Upload to Workspace

code

upload_folder(
    local_folder="./my_agent",
    workspace_folder="/Workspace/Users/you@company.com/my_agent"
)
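
Before uploading, it can help to confirm that the local folder holds the files the later steps refer to. The sketch below is a minimal pre-flight check; the helper name `missing_agent_files` is hypothetical, and the file list (agent.py, test_agent.py, log_model.py) is only inferred from the steps in this guide, so adjust it to your own layout.

```python
from pathlib import Path

# File names inferred from Steps 2, 4 and 5 of this guide (an assumption).
EXPECTED_FILES = ("agent.py", "test_agent.py", "log_model.py")


def missing_agent_files(folder: str, expected=EXPECTED_FILES) -> list[str]:
    """Return the expected file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_agent_files("./my_agent")
    if missing:
        print("Not ready to upload, missing:", ", ".join(missing))
    else:
        print("All expected files found.")
```

Running the check locally first avoids discovering a missing file only after a remote run has started.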

Step 4: Test Agent

code

run_python_file_on_databricks(
    file_path="./my_agent/test_agent.py",
    cluster_id="<cluster_id>"
)

Step 5: Log Model

code

run_python_file_on_databricks(
    file_path="./my_agent/log_model.py",
    cluster_id="<cluster_id>"
)

Step 6: Deploy (Async via Job)

See 7-deployment.md for job-based deployment, which avoids tool-call timeouts.

Step 7: Query Endpoint

code

query_serving_endpoint(
    name="my-agent-endpoint",
    messages=[{"role": "user", "content": "Hello!"}]
)

Quick Start: Deploy a Classical ML Model

python

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# Enable autolog with auto-registration
mlflow.sklearn.autolog(
    log_input_examples=True,
    registered_model_name="main.models.my_classifier"
)

# Train - the model is logged and registered automatically
# (X_train and y_train are your own training features and labels)
model = LogisticRegression()
model.fit(X_train, y_train)

Then deploy the registered model via the UI or the SDK. See 1-classical-ml.md.
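
The `registered_model_name` above uses the three-level Unity Catalog form catalog.schema.model_name. A minimal sketch of a check you could run before training is shown here; `split_uc_model_name` is a hypothetical helper, and it deliberately ignores backtick-quoted identifiers that contain dots.

```python
def split_uc_model_name(full_name: str) -> tuple[str, str, str]:
    """Split 'catalog.schema.model_name' into its parts or raise ValueError."""
    parts = full_name.split(".")
    if len(parts) != 3 or any(not part.strip() for part in parts):
        raise ValueError(
            f"expected catalog.schema.model_name, got {full_name!r}"
        )
    catalog, schema, model = parts
    return catalog, schema, model


if __name__ == "__main__":
    print(split_uc_model_name("main.models.my_classifier"))
```

Failing fast on a malformed name is cheaper than finding out through a "Model not found" error at deployment time.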

MCP Tools

If MCP tools are not available, use the SDK/CLI examples in the files listed under Reference Files.

Development & Testing

Tool	Purpose
`upload_folder`	Upload agent files to workspace
`run_python_file_on_databricks`	Test agent, log model
`execute_databricks_command`	Install packages, quick tests

Deployment

Tool	Purpose
`create_job`	Create deployment job (one-time)
`run_job_now`	Kick off deployment (async)
`get_run`	Check deployment job status
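
A rough sketch of how the result of `get_run` might be interpreted follows. It assumes the tool returns a dict shaped like a Databricks Jobs API run, with `state.life_cycle_state` and `state.result_state`; this guide does not show the tool's actual return value, so treat the field names and the helper `deployment_run_outcome` as assumptions.

```python
# Life-cycle states after which a run will not change any more.
TERMINAL_LIFE_CYCLE_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def deployment_run_outcome(run: dict) -> str:
    """Classify a run dict as 'in_progress', 'succeeded' or 'failed'."""
    state = run.get("state", {})
    life_cycle = state.get("life_cycle_state")
    if life_cycle not in TERMINAL_LIFE_CYCLE_STATES:
        return "in_progress"
    if life_cycle == "TERMINATED" and state.get("result_state") == "SUCCESS":
        return "succeeded"
    return "failed"


if __name__ == "__main__":
    example = {"state": {"life_cycle_state": "RUNNING"}}
    print(deployment_run_outcome(example))
```

Only a run classified as succeeded means the deployment job finished cleanly; the endpoint itself may still need time to become READY.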

Querying

Tool	Purpose
`get_serving_endpoint_status`	Check if endpoint is READY
`query_serving_endpoint`	Send requests to endpoint
`list_serving_endpoints`	List all endpoints

Common Workflows

Check Endpoint Status After Deployment

code

get_serving_endpoint_status(name="my-agent-endpoint")

Returns:

json

{
    "name": "my-agent-endpoint",
    "state": "READY",
    "served_entities": [...]
}
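
Because deployment can take several minutes, the status check is usually repeated until the endpoint reports READY. The sketch below shows one way to write that loop. `wait_until_ready` is a hypothetical helper, `get_status` stands in for whatever call returns the JSON shape above (for example a thin wrapper around `get_serving_endpoint_status`), and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[str], dict],
    name: str,
    timeout_s: float = 1200.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll `get_status(name)` until its "state" is "READY" or time runs out."""
    deadline = clock() + timeout_s
    while True:
        status = get_status(name)
        if status.get("state") == "READY":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"{name} not READY after {timeout_s} s "
                f"(last state: {status.get('state')!r})"
            )
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real waiting.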

Query a Chat/Agent Endpoint

code

query_serving_endpoint(
    name="my-agent-endpoint",
    messages=[
        {"role": "user", "content": "What is Databricks?"}
    ],
    max_tokens=500
)
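
Without MCP tools, the same query can be sent over REST. The sketch below only builds the request, using the standard library. The workspace host and token are placeholders, `build_chat_request` is a hypothetical helper, and the body simply mirrors the `messages` and `max_tokens` arguments above; whether a given agent endpoint expects this exact body depends on how the agent was logged.

```python
import json
import urllib.request


def build_chat_request(
    host: str,
    endpoint_name: str,
    token: str,
    messages: list[dict],
    max_tokens: int = 500,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the endpoint's invocations URL."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    body = json.dumps({"messages": messages, "max_tokens": max_tokens})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        host="https://<workspace-host>",  # placeholder
        endpoint_name="my-agent-endpoint",
        token="<token>",  # placeholder
        messages=[{"role": "user", "content": "What is Databricks?"}],
    )
    print(req.full_url)
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and decoding the JSON response.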

Query a Traditional ML Endpoint

code

query_serving_endpoint(
    name="sklearn-classifier",
    dataframe_records=[
        {"age": 25, "income": 50000, "credit_score": 720}
    ]
)
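
For larger batches, the MLflow scoring format also accepts a column-oriented `dataframe_split` body, which avoids repeating the column names in every row. The sketch below converts row records into that shape for use in a REST request body; `records_to_split` is a hypothetical helper, and this guide does not say whether the `query_serving_endpoint` tool itself accepts `dataframe_split`.

```python
def records_to_split(records: list[dict]) -> dict:
    """Convert row dicts into a {"dataframe_split": {...}} request body."""
    if not records:
        raise ValueError("at least one record is required")
    columns = list(records[0])  # column order comes from the first record
    for index, row in enumerate(records):
        if set(row) != set(columns):
            raise ValueError(f"record {index} has different keys than record 0")
    data = [[row[column] for column in columns] for row in records]
    return {"dataframe_split": {"columns": columns, "data": data}}


if __name__ == "__main__":
    print(records_to_split([{"age": 25, "income": 50000, "credit_score": 720}]))
```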

Common Issues

Issue	Solution
Invalid output format	Use `self.create_text_output_item(text, id)` - NOT raw dicts!
Endpoint NOT_READY	Deployment takes ~15 min. Use `get_serving_endpoint_status` to poll.
Package not found	Specify exact versions in `pip_requirements` when logging model
Tool timeout	Use job-based deployment, not synchronous calls
Auth error on endpoint	Ensure `resources` are specified in `log_model` so that automatic authentication passthrough is configured
Model not found	Check Unity Catalog path: `catalog.schema.model_name`
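
For the "Package not found" row, a small check can flag requirements that lack an exact version before they are passed as `pip_requirements`. This is only a sketch: `unpinned_requirements` is a hypothetical helper, and it treats a requirement as pinned only when it contains `==`.

```python
def unpinned_requirements(requirements: list[str]) -> list[str]:
    """Return the requirement strings that do not pin an exact version."""
    unpinned = []
    for requirement in requirements:
        spec = requirement.split(";")[0].strip()  # drop environment markers
        if not spec or spec.startswith("#"):
            continue  # skip blank entries and comments
        if "==" not in spec:
            unpinned.append(spec)
    return unpinned


if __name__ == "__main__":
    reqs = ["mlflow==3.6.0", "databricks-langchain", "langgraph==0.3.4"]
    print(unpinned_requirements(reqs))
```

An empty result means every entry carries an exact version, which keeps the serving environment reproducible.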

Critical: ResponsesAgent Output Format

WRONG - raw dicts don't work:

python

return ResponsesAgentResponse(output=[{"role": "assistant", "content": "..."}])

CORRECT - use helper methods:

python

return ResponsesAgentResponse(
    output=[self.create_text_output_item(text="...", id="msg_1")]
)

Available helper methods:

• self.create_text_output_item(text, id) - text responses
• self.create_function_call_item(id, call_id, name, arguments) - tool calls
• self.create_function_call_output_item(call_id, output) - tool results
