AgentSkillsCN

model-serving

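Deploy and query Databricks Model Serving endpoints. Use this skill when you (1) deploy MLflow models or AI agents to endpoints, (2) create ChatAgent/ResponsesAgent agents, (3) integrate UC Functions or Vector Search tools, (4) query deployed endpoints, or (5) check endpoint status. It covers classical ML models, custom pyfunc models, and GenAI agents.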

SKILL.md
---
name: model-serving
description: "Deploy and query Databricks Model Serving endpoints. Use when (1) deploying MLflow models or AI agents to endpoints, (2) creating ChatAgent/ResponsesAgent agents, (3) integrating UC Functions or Vector Search tools, (4) querying deployed endpoints, (5) checking endpoint status. Covers classical ML models, custom pyfunc, and GenAI agents."
---

Databricks Model Serving

Deploy MLflow models and AI agents to scalable REST API endpoints.

Quick Decision: What Are You Deploying?

| Model Type | Pattern | Reference |
| --- | --- | --- |
| Traditional ML (sklearn, xgboost) | mlflow.sklearn.autolog() | 1-classical-ml.md |
| Custom Python model | mlflow.pyfunc.PythonModel | 2-custom-pyfunc.md |
| GenAI Agent (LangGraph, tool-calling) | ResponsesAgent | 3-genai-agents.md |

Prerequisites

  • DBR 16.1+ recommended (pre-installed GenAI packages)
  • Unity Catalog enabled workspace
  • Model Serving enabled

Reference Files

| Topic | File | When to Read |
| --- | --- | --- |
| Classical ML | 1-classical-ml.md | sklearn, xgboost, autolog |
| Custom PyFunc | 2-custom-pyfunc.md | Custom preprocessing, signatures |
| GenAI Agents | 3-genai-agents.md | ResponsesAgent, LangGraph |
| Tools Integration | 4-tools-integration.md | UC Functions, Vector Search |
| Development & Testing | 5-development-testing.md | MCP workflow, iteration |
| Logging & Registration | 6-logging-registration.md | mlflow.pyfunc.log_model |
| Deployment | 7-deployment.md | Job-based async deployment |
| Querying Endpoints | 8-querying-endpoints.md | SDK, REST, MCP tools |
| Package Requirements | 9-package-requirements.md | DBR versions, pip |

Quick Start: Deploy a GenAI Agent

Step 1: Install Packages (in notebook or via MCP)

python
%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic
dbutils.library.restartPython()

Or via MCP:

code
execute_databricks_command(code="%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic")
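The install command above pins mlflow and langgraph but leaves the other packages floating, while the Common Issues table recommends exact versions in pip_requirements. As a minimal sketch (the helper name `unpinned` is hypothetical, not part of any Databricks or MLflow API), a quick check can flag which requirement strings still lack an exact pin before you log the model:

```python
def unpinned(requirements):
    """Return the requirement strings that lack an exact '==' version pin."""
    return [req for req in requirements if "==" not in req]


# Package list copied from the install command above.
requirements = [
    "mlflow==3.6.0",
    "databricks-langchain",
    "langgraph==0.3.4",
    "databricks-agents",
    "pydantic",
]

print(unpinned(requirements))
# → ['databricks-langchain', 'databricks-agents', 'pydantic']
```

Anything the check reports is a candidate for pinning to the version actually installed on the cluster.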

Step 2: Create Agent File

Create agent.py locally using the ResponsesAgent pattern (see 3-genai-agents.md).

Step 3: Upload to Workspace

code
upload_folder(
    local_folder="./my_agent",
    workspace_folder="/Workspace/Users/you@company.com/my_agent"
)

Step 4: Test Agent

code
run_python_file_on_databricks(
    file_path="./my_agent/test_agent.py",
    cluster_id="<cluster_id>"
)

Step 5: Log Model

code
run_python_file_on_databricks(
    file_path="./my_agent/log_model.py",
    cluster_id="<cluster_id>"
)

Step 6: Deploy (Async via Job)

See 7-deployment.md for job-based deployment that does not time out.

Step 7: Query Endpoint

code
query_serving_endpoint(
    name="my-agent-endpoint",
    messages=[{"role": "user", "content": "Hello!"}]
)

Quick Start: Deploy a Classical ML Model

python
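import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data for illustration; substitute your own features and labels
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Enable autolog with auto-registration
mlflow.sklearn.autolog(
    log_input_examples=True,
    registered_model_name="main.models.my_classifier"
)

# Train - model is logged and registered automatically
model = LogisticRegression()
model.fit(X_train, y_train)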

Then deploy via UI or SDK. See 1-classical-ml.md.
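For the SDK or REST route, the endpoint definition boils down to a small JSON body. The sketch below builds the body for a create-endpoint request, assuming the Databricks serving-endpoints REST API (`POST /api/2.0/serving-endpoints`); the function name `endpoint_config`, the endpoint name, the model version `1`, and the `Small` workload size are illustrative assumptions, so check 1-classical-ml.md and 7-deployment.md for the values that fit your workspace.

```python
import json


def endpoint_config(endpoint_name, model_name, model_version):
    """Build a request body for creating a serving endpoint (illustrative helper)."""
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    # Three-level Unity Catalog name of the registered model
                    "entity_name": model_name,
                    "entity_version": str(model_version),
                    # Assumed sizing; adjust to your traffic
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,
                }
            ]
        },
    }


body = endpoint_config("sklearn-classifier", "main.models.my_classifier", 1)
print(json.dumps(body, indent=2))
```

Building the body as plain data keeps it easy to review or store alongside the job definition before anything is sent to the workspace.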


MCP Tools

If MCP tools are not available, use the SDK/CLI examples in the files listed under Reference Files.

Development & Testing

| Tool | Purpose |
| --- | --- |
| upload_folder | Upload agent files to workspace |
| run_python_file_on_databricks | Test agent, log model |
| execute_databricks_command | Install packages, quick tests |

Deployment

| Tool | Purpose |
| --- | --- |
| create_job | Create deployment job (one-time) |
| run_job_now | Kick off deployment (async) |
| get_run | Check deployment job status |

Querying

| Tool | Purpose |
| --- | --- |
| get_serving_endpoint_status | Check if endpoint is READY |
| query_serving_endpoint | Send requests to endpoint |
| list_serving_endpoints | List all endpoints |

Common Workflows

Check Endpoint Status After Deployment

code
get_serving_endpoint_status(name="my-agent-endpoint")

Returns:

json
{
    "name": "my-agent-endpoint",
    "state": "READY",
    "served_entities": [...]
}
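Because deployment can take several minutes, the status check is usually wrapped in a polling loop. The following is a minimal sketch, assuming the status response has the flat shape shown above with a top-level `state` field; `wait_until_ready` and its parameters are hypothetical, and `get_status` stands for whatever callable you use to fetch the status (an MCP tool wrapper, SDK call, or REST request).

```python
import time


def wait_until_ready(get_status, timeout_s=1200, interval_s=30, sleep=time.sleep):
    """Poll get_status() until it reports state == "READY" or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status.get("state") == "READY":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"endpoint not ready after {timeout_s}s; last state: {status.get('state')!r}"
            )
        sleep(interval_s)


# Usage with a fake status source, so the sketch runs without a workspace.
fake_statuses = iter([{"state": "NOT_READY"}, {"state": "NOT_READY"}, {"state": "READY"}])
result = wait_until_ready(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result["state"])
```

Injecting `get_status` and `sleep` keeps the loop independent of any particular client and makes it easy to exercise without waiting on a real deployment.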

Query a Chat/Agent Endpoint

code
query_serving_endpoint(
    name="my-agent-endpoint",
    messages=[
        {"role": "user", "content": "What is Databricks?"}
    ],
    max_tokens=500
)

Query a Traditional ML Endpoint

code
query_serving_endpoint(
    name="sklearn-classifier",
    dataframe_records=[
        {"age": 25, "income": 50000, "credit_score": 720}
    ]
)
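Without MCP tools, the same queries can be sent over REST. The sketch below builds (but does not send) a request to the endpoint's invocations URL using only the standard library; the host, the token placeholder, and the helper name `build_invocation_request` are assumptions for illustration. The payloads mirror the tool arguments shown above, and the exact body a given endpoint accepts depends on its model signature or agent interface.

```python
import json
import urllib.request


def build_invocation_request(host, token, endpoint_name, payload):
    """Build a POST request for a serving endpoint's invocations URL."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Payloads mirroring the MCP tool arguments above.
chat_payload = {
    "messages": [{"role": "user", "content": "What is Databricks?"}],
    "max_tokens": 500,
}
ml_payload = {
    "dataframe_records": [{"age": 25, "income": 50000, "credit_score": 720}]
}

request = build_invocation_request(
    "https://example.cloud.databricks.com",  # hypothetical workspace URL
    "<token>",  # placeholder; supply a real access token
    "sklearn-classifier",
    ml_payload,
)
print(request.full_url)
# To send it: urllib.request.urlopen(request)
```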

Common Issues

| Issue | Solution |
| --- | --- |
| Invalid output format | Use self.create_text_output_item(text, id), NOT raw dicts! |
| Endpoint NOT_READY | Deployment takes ~15 min. Poll with get_serving_endpoint_status. |
| Package not found | Specify exact versions in pip_requirements when logging the model |
| Tool timeout | Use job-based deployment, not synchronous calls |
| Auth error on endpoint | Ensure resources are specified in log_model so automatic auth passthrough works |
| Model not found | Check the Unity Catalog path: catalog.schema.model_name |
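For the "Model not found" case, a malformed name is a common cause and can be caught before any call is made. This is a minimal sketch with a hypothetical helper, `split_uc_model_name`, that only checks for the three-part catalog.schema.model_name form; it does not verify that the model exists, and it does not handle backtick-quoted identifiers that contain dots.

```python
def split_uc_model_name(full_name):
    """Split 'catalog.schema.model_name' into its three parts, or raise ValueError."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(part.strip() for part in parts):
        raise ValueError(f"expected catalog.schema.model_name, got {full_name!r}")
    catalog, schema, model = parts
    return catalog, schema, model


print(split_uc_model_name("main.models.my_classifier"))
# → ('main', 'models', 'my_classifier')
```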

Critical: ResponsesAgent Output Format

WRONG - raw dicts don't work:

python
return ResponsesAgentResponse(output=[{"role": "assistant", "content": "..."}])

CORRECT - use helper methods:

python
return ResponsesAgentResponse(
    output=[self.create_text_output_item(text="...", id="msg_1")]
)

Available helper methods:

  • self.create_text_output_item(text, id) - text responses
  • self.create_function_call_item(id, call_id, name, arguments) - tool calls
  • self.create_function_call_output_item(call_id, output) - tool results
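One way to see why the raw dict is rejected is to compare it with a structured output item. The sketch below assumes that `create_text_output_item` returns a Responses-style message item shaped roughly like the dict built by `text_output_item`; that shape and both function names are illustrative assumptions, not something stated above, so treat the helper methods themselves as the source of truth.

```python
def text_output_item(text, item_id):
    """Hypothetical stand-in mirroring the assumed shape of a text output item."""
    return {
        "type": "message",
        "id": item_id,
        "role": "assistant",
        "content": [{"type": "output_text", "text": text}],
    }


def missing_fields(item):
    """List the top-level fields of the assumed item shape that are absent from item."""
    required = ("type", "id", "role", "content")
    return [key for key in required if key not in item]


raw = {"role": "assistant", "content": "..."}
print(missing_fields(raw))
# → ['type', 'id']
print(missing_fields(text_output_item("...", "msg_1")))
# → []
```

Under this assumption, the raw dict also carries its content as a plain string rather than a list of typed parts, which is another reason to let the helper build the item.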

Resources