Create Model Server

Name: Create Model Server
Rating: 78
Author: maxamillion

Deploy a model serving endpoint using KServe with your choice of runtime.

Usage

typescript

import { createModelServer } from './skills/create-model-server';

await createModelServer({
  projectName: "production-models",
  modelName: "sentiment-classifier",
  modelPath: "s3://models/sentiment/",
  runtime: "openvino",
  minReplicas: 2,
  maxReplicas: 5
});

Parameters

Parameter	Required	Description
projectName	Yes	Target OpenShift AI project
modelName	Yes	Name for the inference service
modelPath	Yes	S3 path to model artifacts
runtime	No	Serving runtime: openvino (default), vllm, tgis
gpuCount	No	Number of GPUs (default: 0 for CPU inference; vllm and tgis need at least 1)
minReplicas	No	Minimum replicas (default: 1)
maxReplicas	No	Maximum replicas (default: 3)
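
The table above can be read as an options type with defaults. The following is a minimal sketch, not the skill's actual source: the names `Runtime`, `ModelServerOptions`, and `withDefaults` are hypothetical, and only the parameter names and default values come from the table.

```typescript
// Hypothetical option type mirroring the parameter table; the skill's real
// type definitions are not shown in this document.
type Runtime = "openvino" | "vllm" | "tgis";

interface ModelServerOptions {
  projectName: string;   // required
  modelName: string;     // required
  modelPath: string;     // required
  runtime?: Runtime;     // default: "openvino"
  gpuCount?: number;     // default: 0
  minReplicas?: number;  // default: 1
  maxReplicas?: number;  // default: 3
}

// Fill in the documented default for every optional parameter left unset.
function withDefaults(opts: ModelServerOptions): Required<ModelServerOptions> {
  return {
    projectName: opts.projectName,
    modelName: opts.modelName,
    modelPath: opts.modelPath,
    runtime: opts.runtime ?? "openvino",
    gpuCount: opts.gpuCount ?? 0,
    minReplicas: opts.minReplicas ?? 1,
    maxReplicas: opts.maxReplicas ?? 3,
  };
}
```

Using `??` rather than object spread keeps the default in place even when a caller passes an explicit `undefined`.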

Supported Runtimes

• openvino: Intel OpenVINO for optimized CPU inference
• vllm: vLLM for high-throughput LLM inference (requires GPU)
• tgis: Text Generation Inference Server (requires GPU)
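
Because `gpuCount` defaults to 0 while two of the runtimes need a GPU, a caller may want to check the combination before deploying. This is a sketch under the assumption that the skill does not already perform this check; `checkGpuRequirement` and `GPU_RUNTIMES` are hypothetical names.

```typescript
type Runtime = "openvino" | "vllm" | "tgis";

// Runtimes that the list above marks as requiring a GPU.
const GPU_RUNTIMES: ReadonlySet<Runtime> = new Set<Runtime>(["vllm", "tgis"]);

// Returns an error message, or null when the combination is acceptable.
function checkGpuRequirement(runtime: Runtime, gpuCount: number): string | null {
  if (!Number.isInteger(gpuCount) || gpuCount < 0) {
    return `gpuCount must be a non-negative integer, got ${gpuCount}`;
  }
  if (GPU_RUNTIMES.has(runtime) && gpuCount < 1) {
    return `runtime "${runtime}" requires a GPU; set gpuCount to at least 1`;
  }
  return null;
}
```

Returning a message instead of throwing lets the caller decide whether to abort or just warn.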

What This Skill Does

• Validates that the model path is accessible
• Deploys a KServe InferenceService for the model
• Configures autoscaling between minReplicas and maxReplicas
• Returns the inference endpoint URL once the service is ready
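
To make the deploy and autoscaling steps concrete, here is a sketch of how the parameters might map onto a KServe `InferenceService` manifest. It is an illustration, not the skill's implementation: `buildInferenceService` and `ManifestInput` are hypothetical, the `modelFormat` value and the ServingRuntime name depend on what is installed on the cluster, and the `nvidia.com/gpu` resource name assumes NVIDIA GPUs. S3 credentials and the readiness wait are not covered.

```typescript
// Hypothetical input shape; field names follow the parameter table.
interface ManifestInput {
  projectName: string;  // OpenShift project, used as the namespace
  modelName: string;
  modelPath: string;    // e.g. an s3:// URI
  runtime: string;      // name of a ServingRuntime on the cluster (assumed)
  modelFormat: string;  // model format the runtime supports (assumed)
  gpuCount: number;
  minReplicas: number;
  maxReplicas: number;
}

// Build a plain object in the serving.kserve.io/v1beta1 InferenceService shape.
function buildInferenceService(input: ManifestInput) {
  if (input.minReplicas > input.maxReplicas) {
    throw new Error("minReplicas must not exceed maxReplicas");
  }
  // Request GPUs only when asked; CPU inference omits the resources block.
  const resources =
    input.gpuCount > 0
      ? { limits: { "nvidia.com/gpu": String(input.gpuCount) } }
      : undefined;
  return {
    apiVersion: "serving.kserve.io/v1beta1",
    kind: "InferenceService",
    metadata: { name: input.modelName, namespace: input.projectName },
    spec: {
      predictor: {
        // Replica bounds that the autoscaler works within.
        minReplicas: input.minReplicas,
        maxReplicas: input.maxReplicas,
        model: {
          modelFormat: { name: input.modelFormat },
          runtime: input.runtime,
          storageUri: input.modelPath,
          ...(resources ? { resources } : {}),
        },
      },
    },
  };
}
```

The resulting object can be serialized to YAML or JSON and applied to the cluster with whatever Kubernetes client the project already uses.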