Deploy RAG Pipeline

Name: Deploy Rag Pipeline
Rating: 78
Author: maxamillion

Deploy a complete RAG (Retrieval-Augmented Generation) pipeline on OpenShift AI.

Components

• Vector database for document embeddings
• Embedding model inference service
• LLM inference service with vLLM
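The sketch below shows, in memory, how these three components cooperate at query time. It assumes the embedding service has already turned the question and the documents into vectors. Stored chunks are ranked by cosine similarity to the query embedding (the vector database's role), and the best matches are prepended to the question as context for the LLM. `StoredChunk`, `retrieveTopK` and `buildPrompt` are hypothetical names, the embeddings are toy 3-dimensional vectors, and no services are called. A real vector database would use an approximate nearest-neighbour index rather than this linear scan.

```typescript
// Hypothetical record type: a document chunk plus its embedding, as a
// vector database would store it.
interface StoredChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vector length mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Retrieval step: rank stored chunks by similarity to the query embedding
// and keep the k best (a linear scan, for illustration only).
function retrieveTopK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Augmentation step: put the retrieved context in front of the question
// before the prompt is sent to the LLM.
function buildPrompt(question: string, context: StoredChunk[]): string {
  const contextText = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${contextText}\n\nQuestion: ${question}`;
}

// Usage with toy embeddings.
const store: StoredChunk[] = [
  { text: "Reset your password from the account page.", embedding: [0.9, 0.1, 0.0] },
  { text: "Invoices are emailed monthly.", embedding: [0.0, 0.2, 0.9] },
  { text: "Passwords must be at least 12 characters.", embedding: [0.8, 0.3, 0.1] },
];
const top = retrieveTopK([1, 0, 0], store, 2);
console.log(buildPrompt("How do I reset my password?", top));
```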

Usage

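```typescript
import { deployRagPipeline } from "./skills/deploy-rag-pipeline";

// Top-level await requires an ES module context.
await deployRagPipeline({
  projectName: "customer-support",             // OpenShift AI project to deploy into
  llmModel: "s3://models/llama3-8b/",          // LLM artifacts, served with vLLM
  embeddingModel: "s3://models/bge-small/",    // embedding model artifacts, served with OpenVINO
  documentSource: "s3://docs/knowledge-base/"  // source documents for retrieval
});
```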

Parameters

Parameter	Required	Description
projectName	Yes	Target OpenShift AI project (created if it does not exist)
llmModel	Yes	S3 URI of the LLM model artifacts
embeddingModel	Yes	S3 URI of the embedding model artifacts
documentSource	Yes	S3 URI of the source documents
llmGpuCount	No	Number of GPUs for the LLM inference service (default: 2)
embeddingReplicas	No	Number of embedding model replicas (default: 2)
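The parameter table maps naturally onto an options type with two optional fields and documented defaults. This is a minimal sketch of how such options might be typed and validated before any cluster call is made. `DeployRagPipelineOptions`, `resolveOptions` and the validation rules (the `s3://bucket/key` shape check, positive-integer counts) are assumptions for illustration, not the skill's actual API.

```typescript
// Hypothetical option type mirroring the parameter table.
interface DeployRagPipelineOptions {
  projectName: string;
  llmModel: string;
  embeddingModel: string;
  documentSource: string;
  llmGpuCount?: number;
  embeddingReplicas?: number;
}

// Same fields, with the optional ones filled in.
type ResolvedOptions = Required<DeployRagPipelineOptions>;

// Defaults as stated in the parameter table.
const DEFAULTS = { llmGpuCount: 2, embeddingReplicas: 2 };

// Assumed rule: paths must look like s3://<bucket>/<key>.
function isS3Uri(value: string): boolean {
  return /^s3:\/\/[^\/]+\/.*/.test(value);
}

// Applies defaults and rejects obviously invalid input.
function resolveOptions(opts: DeployRagPipelineOptions): ResolvedOptions {
  if (opts.projectName.trim() === "") {
    throw new Error("projectName is required");
  }
  for (const key of ["llmModel", "embeddingModel", "documentSource"] as const) {
    if (!isS3Uri(opts[key])) {
      throw new Error(`${key} must be an s3:// URI, got "${opts[key]}"`);
    }
  }
  const resolved: ResolvedOptions = {
    ...opts,
    llmGpuCount: opts.llmGpuCount ?? DEFAULTS.llmGpuCount,
    embeddingReplicas: opts.embeddingReplicas ?? DEFAULTS.embeddingReplicas,
  };
  if (!Number.isInteger(resolved.llmGpuCount) || resolved.llmGpuCount < 1) {
    throw new Error("llmGpuCount must be a positive integer");
  }
  if (!Number.isInteger(resolved.embeddingReplicas) || resolved.embeddingReplicas < 1) {
    throw new Error("embeddingReplicas must be a positive integer");
  }
  return resolved;
}

// Usage: the example from the Usage section, with defaults applied.
const opts = resolveOptions({
  projectName: "customer-support",
  llmModel: "s3://models/llama3-8b/",
  embeddingModel: "s3://models/bge-small/",
  documentSource: "s3://docs/knowledge-base/",
});
console.log(opts.llmGpuCount, opts.embeddingReplicas); // 2 2
```

Resolving defaults in one place keeps every later step working with a fully populated object, so no downstream code has to repeat the `?? 2` fallbacks.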

What This Skill Does

• Creates the target OpenShift AI project if it does not already exist
• Deploys the embedding model using the OpenVINO runtime
• Deploys an LLM using the vLLM runtime with GPU support
• Configures auto-scaling for the embedding model
• Sets up data connections to the S3 storage that holds the models
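As a rough picture of what these steps could produce on the cluster, the sketch below builds plain Kubernetes objects. It creates a namespace for the project, a Secret acting as the S3 data connection, and one KServe `InferenceService` (`serving.kserve.io/v1beta1`) per model, with auto-scaling bounds on the embedding service and GPU requests on the LLM. Much of this is assumed rather than taken from the skill: the object names (`embedding`, `llm`, `model-storage`), the ServingRuntime names (`kserve-ovms`, `vllm-runtime`), the model format names, the `opendatahub.io/*` label and annotation, the Secret keys, and the 2x scaling ceiling. The sketch also leaves out how the services obtain the S3 credentials, and nothing is applied to a cluster.

```typescript
// Minimal structural type for a Kubernetes object; enough for this sketch.
interface K8sObject {
  apiVersion: string;
  kind: string;
  type?: string;
  metadata: {
    name: string;
    namespace?: string;
    labels?: Record<string, string>;
    annotations?: Record<string, string>;
  };
  spec?: Record<string, unknown>;
  stringData?: Record<string, string>;
}

interface PlanInput {
  projectName: string;
  llmModel: string;
  embeddingModel: string;
  llmGpuCount: number;
  embeddingReplicas: number;
}

// Project: a namespace carrying the label the dashboard is assumed to look for.
const project = (name: string): K8sObject => ({
  apiVersion: "v1",
  kind: "Namespace",
  metadata: { name, labels: { "opendatahub.io/dashboard": "true" } },
});

// Data connection: S3 credentials as a Secret (keys assumed, values are placeholders).
const dataConnection = (namespace: string): K8sObject => ({
  apiVersion: "v1",
  kind: "Secret",
  type: "Opaque",
  metadata: {
    name: "model-storage",
    namespace,
    labels: { "opendatahub.io/dashboard": "true" },
    annotations: { "opendatahub.io/connection-type": "s3" },
  },
  stringData: {
    AWS_ACCESS_KEY_ID: "<access-key>",
    AWS_SECRET_ACCESS_KEY: "<secret-key>",
    AWS_S3_ENDPOINT: "<endpoint-url>",
  },
});

// Embedding model on an OpenVINO runtime, scaling between min and max replicas.
const embeddingService = (p: PlanInput): K8sObject => ({
  apiVersion: "serving.kserve.io/v1beta1",
  kind: "InferenceService",
  metadata: { name: "embedding", namespace: p.projectName },
  spec: {
    predictor: {
      minReplicas: p.embeddingReplicas,
      maxReplicas: p.embeddingReplicas * 2, // hypothetical scaling ceiling
      model: {
        modelFormat: { name: "openvino_ir" },
        runtime: "kserve-ovms", // hypothetical ServingRuntime name
        storageUri: p.embeddingModel,
      },
    },
  },
});

// LLM on a vLLM runtime, requesting llmGpuCount GPUs.
const llmService = (p: PlanInput): K8sObject => {
  const gpus = String(p.llmGpuCount);
  return {
    apiVersion: "serving.kserve.io/v1beta1",
    kind: "InferenceService",
    metadata: { name: "llm", namespace: p.projectName },
    spec: {
      predictor: {
        model: {
          modelFormat: { name: "vLLM" },
          runtime: "vllm-runtime", // hypothetical ServingRuntime name
          storageUri: p.llmModel,
          resources: { requests: { "nvidia.com/gpu": gpus }, limits: { "nvidia.com/gpu": gpus } },
        },
      },
    },
  };
};

// Objects in the order they would need to exist.
function planDeployment(p: PlanInput): K8sObject[] {
  return [project(p.projectName), dataConnection(p.projectName), embeddingService(p), llmService(p)];
}

const plan = planDeployment({
  projectName: "customer-support",
  llmModel: "s3://models/llama3-8b/",
  embeddingModel: "s3://models/bge-small/",
  llmGpuCount: 2,
  embeddingReplicas: 2,
});
console.log(JSON.stringify(plan, null, 2));
```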

Prerequisites

• S3-compatible storage containing the model artifacts and source documents
• GPU nodes available in the cluster for the LLM inference service
• Sufficient resource quota in the target project, including GPUs
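The GPU and quota prerequisites can be checked before deploying. One reasoning step matters here: a pod runs on a single node, so if the LLM is served from one pod, one node must offer all `llmGpuCount` GPUs; several smaller GPU nodes do not add up. The sketch below works on a hypothetical, trimmed-down view of node and ResourceQuota data (`NodeInfo`, `nodesThatFitLlm` and `quotaAllows` are illustrative names). It does not query a cluster, and it parses quantities with `parseInt`, which only suits whole-number resources such as GPUs.

```typescript
// Hypothetical, trimmed-down view of a Kubernetes Node.
interface NodeInfo {
  name: string;
  allocatable: Record<string, string>;
  unschedulable?: boolean;
}

// Schedulable nodes that each advertise at least gpusNeeded
// allocatable nvidia.com/gpu (a single pod cannot span nodes).
function nodesThatFitLlm(nodes: NodeInfo[], gpusNeeded: number): string[] {
  return nodes
    .filter((n) => !n.unschedulable)
    .filter((n) => Number.parseInt(n.allocatable["nvidia.com/gpu"] ?? "0", 10) >= gpusNeeded)
    .map((n) => n.name);
}

// ResourceQuota-style check: does current usage plus the new request
// stay within the hard limit? No limit on the resource means allowed.
function quotaAllows(
  hard: Record<string, string>,
  used: Record<string, string>,
  resource: string,
  requested: number,
): boolean {
  const limit = hard[resource];
  if (limit === undefined) return true;
  const inUse = Number.parseInt(used[resource] ?? "0", 10);
  return inUse + requested <= Number.parseInt(limit, 10);
}

// Usage with made-up node data.
const nodes: NodeInfo[] = [
  { name: "cpu-worker-1", allocatable: { cpu: "16" } },
  { name: "gpu-worker-1", allocatable: { "nvidia.com/gpu": "1" } },
  { name: "gpu-worker-2", allocatable: { "nvidia.com/gpu": "4" } },
];
console.log(nodesThatFitLlm(nodes, 2).join(", ")); // gpu-worker-2
```

With the default `llmGpuCount` of 2, only `gpu-worker-2` qualifies, even though the cluster has five GPUs in total.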