Create Model Server
Deploy a model serving endpoint using KServe with your choice of runtime.
Usage
typescript
import { createModelServer } from './skills/create-model-server';
await createModelServer({
projectName: "production-models",
modelName: "sentiment-classifier",
modelPath: "s3://models/sentiment/",
runtime: "openvino",
minReplicas: 2,
maxReplicas: 5
});
Parameters
| Parameter | Required | Description |
|---|---|---|
| projectName | Yes | Target OpenShift AI project |
| modelName | Yes | Name for the inference service |
| modelPath | Yes | S3 path to model artifacts |
| runtime | No | Serving runtime: openvino (default), vllm, tgis |
| gpuCount | No | Number of GPUs (default: 0 for CPU inference) |
| minReplicas | No | Minimum replicas (default: 1) |
| maxReplicas | No | Maximum replicas (default: 3) |
Supported Runtimes
- •openvino: Intel OpenVINO for optimized CPU inference
- •vllm: vLLM for high-throughput LLM inference (requires GPU)
- •tgis: Text Generation Inference Server (requires GPU)
What This Skill Does
- •Validates the model path is accessible
- •Deploys an InferenceService with KServe
- •Configures auto-scaling based on replica settings
- •Returns the inference endpoint URL when ready