Deploy RAG Pipeline
Deploy a complete RAG (Retrieval-Augmented Generation) pipeline on OpenShift AI.
Components
- Vector database for document embeddings
- Embedding model inference service
- LLM inference service with vLLM
Usage
```typescript
import { deployRagPipeline } from './skills/deploy-rag-pipeline';

await deployRagPipeline({
  projectName: "customer-support",
  llmModel: "s3://models/llama3-8b/",
  embeddingModel: "s3://models/bge-small/",
  documentSource: "s3://docs/knowledge-base/"
});
```
Parameters
| Parameter | Required | Description |
|---|---|---|
| projectName | Yes | Target OpenShift AI project |
| llmModel | Yes | S3 path to LLM model |
| embeddingModel | Yes | S3 path to embedding model |
| documentSource | Yes | S3 path to documents |
| llmGpuCount | No | GPUs for LLM (default: 2) |
| embeddingReplicas | No | Embedding model replicas (default: 2) |
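The parameter table can be read as a typed options object. The sketch below is one possible way to express it, not the skill's actual export: `RagPipelineOptions` and `resolveOptions` are hypothetical names, and the validation rules (non-empty project name, `s3://bucket/...` paths, positive integer counts) are assumptions. Only the two defaults come from the table.

```typescript
// Hypothetical option type mirroring the parameter table above.
// The real skill may define its options differently.
interface RagPipelineOptions {
  projectName: string;        // Target OpenShift AI project
  llmModel: string;           // S3 path to the LLM
  embeddingModel: string;     // S3 path to the embedding model
  documentSource: string;     // S3 path to documents
  llmGpuCount?: number;       // Table default: 2
  embeddingReplicas?: number; // Table default: 2
}

type ResolvedRagPipelineOptions = Required<RagPipelineOptions>;

// Assumed shape of a valid S3 path: "s3://<bucket>/<optional key prefix>".
const S3_PATH = /^s3:\/\/[^/]+\//;

function resolveOptions(options: RagPipelineOptions): ResolvedRagPipelineOptions {
  if (options.projectName.trim() === "") {
    throw new Error("projectName must not be empty");
  }
  for (const key of ["llmModel", "embeddingModel", "documentSource"] as const) {
    if (!S3_PATH.test(options[key])) {
      throw new Error(`${key} must be an S3 path such as s3://bucket/prefix/`);
    }
  }

  // Apply the defaults documented in the table.
  const resolved: ResolvedRagPipelineOptions = {
    ...options,
    llmGpuCount: options.llmGpuCount ?? 2,
    embeddingReplicas: options.embeddingReplicas ?? 2,
  };

  // Assumption: both counts must be positive integers.
  for (const key of ["llmGpuCount", "embeddingReplicas"] as const) {
    if (!Number.isInteger(resolved[key]) || resolved[key] < 1) {
      throw new Error(`${key} must be a positive integer`);
    }
  }
  return resolved;
}
```

Resolving defaults up front keeps the rest of a deployment routine free of `undefined` checks and surfaces malformed paths before any cluster resources are touched.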
What This Skill Does
- Creates a new OpenShift AI project (if it doesn't exist)
- Deploys an embedding model using the OpenVINO runtime
- Deploys an LLM using the vLLM runtime with GPU support
- Configures auto-scaling for the embedding model
- Sets up data connections for model storage
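To make the two deployment steps more concrete, the sketch below builds the kind of manifests such a skill might submit. It assumes the models are served as KServe `InferenceService` resources, which the source does not confirm. The function names, the runtime names `vllm-runtime` and `ovms-runtime`, the model format names, and the choice of doubling the replica count as the scaling ceiling are all illustrative assumptions.

```typescript
// Minimal structural type covering only the InferenceService fields used here.
interface InferenceServiceManifest {
  apiVersion: "serving.kserve.io/v1beta1";
  kind: "InferenceService";
  metadata: { name: string; namespace: string };
  spec: {
    predictor: {
      minReplicas: number;
      maxReplicas: number;
      model: {
        modelFormat: { name: string };
        runtime: string; // Name of a ServingRuntime in the project (assumed)
        storageUri: string;
        resources?: { limits: Record<string, string> };
      };
    };
  };
}

// LLM service: a single replica that requests GPUs through resource limits.
function buildLlmService(
  project: string,
  storageUri: string,
  gpuCount: number,
): InferenceServiceManifest {
  return {
    apiVersion: "serving.kserve.io/v1beta1",
    kind: "InferenceService",
    metadata: { name: "rag-llm", namespace: project },
    spec: {
      predictor: {
        minReplicas: 1,
        maxReplicas: 1,
        model: {
          modelFormat: { name: "vLLM" }, // assumed format name
          runtime: "vllm-runtime",       // hypothetical runtime name
          storageUri,
          resources: { limits: { "nvidia.com/gpu": String(gpuCount) } },
        },
      },
    },
  };
}

// Embedding service: CPU-only, scaled between `replicas` and twice that
// number. The ceiling is an illustrative assumption.
function buildEmbeddingService(
  project: string,
  storageUri: string,
  replicas: number,
): InferenceServiceManifest {
  return {
    apiVersion: "serving.kserve.io/v1beta1",
    kind: "InferenceService",
    metadata: { name: "rag-embeddings", namespace: project },
    spec: {
      predictor: {
        minReplicas: replicas,
        maxReplicas: replicas * 2,
        model: {
          modelFormat: { name: "openvino_ir" }, // assumed format name
          runtime: "ovms-runtime",              // hypothetical runtime name
          storageUri,
        },
      },
    },
  };
}
```

For example, `buildLlmService("customer-support", "s3://models/llama3-8b/", 2)` yields a manifest whose predictor requests two `nvidia.com/gpu` devices, which is why GPU nodes must be available before the LLM can be scheduled.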
Prerequisites
- S3-compatible storage with model artifacts
- GPU nodes available in the cluster (for LLM)
- Sufficient quota in the target project