Create Model Server

Deploy a model-serving endpoint on OpenShift AI using KServe, with your choice of serving runtime.

Usage

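```typescript
import { createModelServer } from './skills/create-model-server';

// Deploy the sentiment model from S3 on the OpenVINO runtime,
// autoscaling between 2 and 5 replicas. The call resolves with the
// inference endpoint URL once the service is ready.
// (Top-level await requires running this file as an ES module.)
const endpointUrl = await createModelServer({
  projectName: "production-models",
  modelName: "sentiment-classifier",
  modelPath: "s3://models/sentiment/",
  runtime: "openvino",
  minReplicas: 2,
  maxReplicas: 5
});
console.log(`Inference endpoint: ${endpointUrl}`);
```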

Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectName` | Yes | Target OpenShift AI project |
| `modelName` | Yes | Name for the inference service |
| `modelPath` | Yes | S3 path to the model artifacts |
| `runtime` | No | Serving runtime: `openvino` (default), `vllm`, or `tgis` |
| `gpuCount` | No | Number of GPUs (default: 0, i.e. CPU inference) |
| `minReplicas` | No | Minimum number of replicas (default: 1) |
| `maxReplicas` | No | Maximum number of replicas (default: 3) |
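The defaults in the table can be made explicit in code. The sketch below is hypothetical: `CreateModelServerOptions` and `resolveOptions` are illustrative names, not the skill's actual exports, and the `s3://` test is only a syntactic check on `modelPath`, not the accessibility check the skill performs.

```typescript
// Hypothetical option type mirroring the parameter table; the skill's real type may differ.
type Runtime = "openvino" | "vllm" | "tgis";

interface CreateModelServerOptions {
  projectName: string;
  modelName: string;
  modelPath: string;
  runtime?: Runtime;
  gpuCount?: number;
  minReplicas?: number;
  maxReplicas?: number;
}

type ResolvedOptions = Required<CreateModelServerOptions>;

// Apply the documented defaults and reject obviously invalid input.
function resolveOptions(opts: CreateModelServerOptions): ResolvedOptions {
  for (const key of ["projectName", "modelName", "modelPath"] as const) {
    if (!opts[key]) throw new Error(`${key} is required`);
  }
  // Syntactic check only: an s3:// URI with a non-empty bucket name.
  if (!/^s3:\/\/[^/]+/.test(opts.modelPath)) {
    throw new Error(`modelPath must be an s3:// URI, got ${opts.modelPath}`);
  }
  const resolved: ResolvedOptions = {
    ...opts,
    runtime: opts.runtime ?? "openvino",
    gpuCount: opts.gpuCount ?? 0,
    minReplicas: opts.minReplicas ?? 1,
    maxReplicas: opts.maxReplicas ?? 3,
  };
  if (resolved.minReplicas < 0 || resolved.minReplicas > resolved.maxReplicas) {
    throw new Error("expected 0 <= minReplicas <= maxReplicas");
  }
  return resolved;
}
```

Resolving defaults in one place means the rest of the deployment code can treat every field as present, and a bad replica range is reported before anything is deployed.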

Supported Runtimes

  • openvino: Intel OpenVINO for optimized CPU inference
  • vllm: vLLM for high-throughput LLM inference (requires GPU)
  • tgis: Text Generation Inference Server (requires GPU)
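The GPU requirements listed above can be checked before deploying. This is a minimal sketch that assumes the skill treats `vllm` or `tgis` with `gpuCount: 0` as a configuration error; `RUNTIME_INFO` and `checkRuntime` are hypothetical names.

```typescript
type Runtime = "openvino" | "vllm" | "tgis";

// Hypothetical lookup table built from the runtime list above.
const RUNTIME_INFO: Record<Runtime, { description: string; requiresGpu: boolean }> = {
  openvino: { description: "Intel OpenVINO, optimized CPU inference", requiresGpu: false },
  vllm: { description: "vLLM, high-throughput LLM inference", requiresGpu: true },
  tgis: { description: "Text Generation Inference Server", requiresGpu: true },
};

// Throw if the chosen runtime needs a GPU but none was requested.
function checkRuntime(runtime: Runtime, gpuCount: number): void {
  if (RUNTIME_INFO[runtime].requiresGpu && gpuCount < 1) {
    throw new Error(`${runtime} requires a GPU; set gpuCount to at least 1`);
  }
}
```

For example, `checkRuntime("vllm", 0)` throws, while `checkRuntime("openvino", 0)` and `checkRuntime("tgis", 1)` pass.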

What This Skill Does

  1. Validates that the model path is accessible
  2. Deploys a KServe InferenceService
  3. Configures autoscaling between minReplicas and maxReplicas
  4. Returns the inference endpoint URL once the service is ready
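Steps 2 to 4 can be sketched as building a KServe InferenceService and then watching its status. The field names (`serving.kserve.io/v1beta1`, `spec.predictor.minReplicas`/`maxReplicas`, `model.storageUri`, `status.url`, and the `Ready` condition) follow KServe's v1beta1 API; the rest is assumption. `buildInferenceService`, `endpointIfReady`, and `waitForEndpoint` are hypothetical helpers, `modelFormat` is an extra input the parameter table does not list, the `runtime` value is assumed to match a ServingRuntime available in the project, GPUs are assumed to be NVIDIA (`nvidia.com/gpu`), and `getStatus` stands in for however the caller reads the resource from the cluster.

```typescript
interface InferenceServiceStatus {
  url?: string;
  conditions?: { type: string; status: string }[];
}

// Step 2-3: build the InferenceService manifest (hypothetical helper).
function buildInferenceService(o: {
  projectName: string; modelName: string; modelPath: string;
  modelFormat: string; // assumption: e.g. "onnx"; depends on the model artifacts
  runtime: string;     // assumption: name of a ServingRuntime in the project
  gpuCount: number; minReplicas: number; maxReplicas: number;
}) {
  // Request GPUs only when asked for; assumes NVIDIA's device-plugin resource name.
  const limits: Record<string, string> =
    o.gpuCount > 0 ? { "nvidia.com/gpu": String(o.gpuCount) } : {};
  return {
    apiVersion: "serving.kserve.io/v1beta1",
    kind: "InferenceService",
    metadata: { name: o.modelName, namespace: o.projectName },
    spec: {
      predictor: {
        minReplicas: o.minReplicas, // autoscaling lower bound
        maxReplicas: o.maxReplicas, // autoscaling upper bound
        model: {
          modelFormat: { name: o.modelFormat },
          runtime: o.runtime,
          storageUri: o.modelPath,
          resources: { limits },
        },
      },
    },
  };
}

// Step 4: the endpoint is usable once the "Ready" condition is "True".
function endpointIfReady(status: InferenceServiceStatus | undefined): string | undefined {
  const ready = status?.conditions?.some(c => c.type === "Ready" && c.status === "True");
  return ready ? status?.url : undefined;
}

// Poll a caller-supplied status reader until the service is ready or time runs out.
async function waitForEndpoint(
  getStatus: () => Promise<InferenceServiceStatus | undefined>,
  timeoutMs = 600_000,
  intervalMs = 5_000,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const url = endpointIfReady(await getStatus());
    if (url) return url;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error("InferenceService did not become ready in time");
}
```

Keeping the status reader as a parameter leaves the choice of Kubernetes client open and makes the readiness logic testable without a cluster.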