Deploy RAG Pipeline

Deploy a complete RAG (Retrieval-Augmented Generation) pipeline on OpenShift AI.

Components

  • Vector database for document embeddings
  • Embedding model inference service
  • LLM inference service with vLLM

Usage

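```typescript
import { deployRagPipeline } from './skills/deploy-rag-pipeline';

// Only the required parameters are passed here; llmGpuCount and
// embeddingReplicas fall back to their defaults (see Parameters).
await deployRagPipeline({
  projectName: "customer-support",              // target OpenShift AI project
  llmModel: "s3://models/llama3-8b/",           // S3 path to the LLM
  embeddingModel: "s3://models/bge-small/",     // S3 path to the embedding model
  documentSource: "s3://docs/knowledge-base/"   // S3 path to the documents
});
```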

Parameters

| Parameter | Required | Description |
|---|---|---|
| projectName | Yes | Target OpenShift AI project |
| llmModel | Yes | S3 path to LLM model |
| embeddingModel | Yes | S3 path to embedding model |
| documentSource | Yes | S3 path to documents |
| llmGpuCount | No | GPUs for LLM (default: 2) |
| embeddingReplicas | No | Embedding model replicas (default: 2) |
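
To make the required/optional split and the defaults concrete, here is a minimal sketch of how the options could be typed and normalized before deployment. The names `RagPipelineOptions` and `resolveOptions` are hypothetical and not part of the skill; the `s3://` prefix check is likewise an assumption based on the descriptions above and the usage example, not documented behavior.

```typescript
// Hypothetical option type mirroring the parameter list above;
// the skill's actual type definition may differ.
interface RagPipelineOptions {
  projectName: string;
  llmModel: string;
  embeddingModel: string;
  documentSource: string;
  llmGpuCount?: number;
  embeddingReplicas?: number;
}

// Defaults as documented in the parameter list.
const DEFAULT_LLM_GPU_COUNT = 2;
const DEFAULT_EMBEDDING_REPLICAS = 2;

// Validates the required fields and fills in the optional ones.
function resolveOptions(
  options: RagPipelineOptions
): Required<RagPipelineOptions> {
  if (options.projectName.trim() === "") {
    throw new Error("projectName is required");
  }
  // Assumption: every storage parameter is given as an s3:// URI.
  const s3Fields = ["llmModel", "embeddingModel", "documentSource"] as const;
  for (const field of s3Fields) {
    if (!options[field].startsWith("s3://")) {
      throw new Error(`${field} must be an S3 path, got "${options[field]}"`);
    }
  }
  return {
    ...options,
    // ?? also covers the case where a caller passes an explicit undefined.
    llmGpuCount: options.llmGpuCount ?? DEFAULT_LLM_GPU_COUNT,
    embeddingReplicas: options.embeddingReplicas ?? DEFAULT_EMBEDDING_REPLICAS,
  };
}
```

Calling `resolveOptions` with only the four required fields yields an object in which `llmGpuCount` and `embeddingReplicas` are both 2.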

What This Skill Does

  1. Creates a new OpenShift AI project (if it doesn't exist)
  2. Deploys an embedding model using OpenVINO runtime
  3. Deploys an LLM using the vLLM runtime with GPU support
  4. Configures auto-scaling for the embedding model
  5. Sets up data connections for model storage
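
The steps above can be pictured as an ordered plan. The sketch below is purely illustrative: `planRagDeployment`, `PlanInput`, and `PlannedStep` are hypothetical names, and the function only describes the steps as data without touching a cluster. How `embeddingReplicas` interacts with auto-scaling is not stated in this document, so the plan simply echoes the value.

```typescript
// Hypothetical input: the subset of options that the listed steps mention.
interface PlanInput {
  projectName: string;
  llmModel: string;
  embeddingModel: string;
  llmGpuCount: number;
  embeddingReplicas: number;
}

// One entry per step in the list above.
interface PlannedStep {
  order: number;
  action: string;
  detail: string;
}

// Builds a dry-run description of the deployment; performs no I/O.
function planRagDeployment(input: PlanInput): PlannedStep[] {
  const steps = [
    {
      action: "ensure-project",
      detail: `create project "${input.projectName}" if it does not exist`,
    },
    {
      action: "deploy-embedding-model",
      detail: `serve ${input.embeddingModel} on the OpenVINO runtime`,
    },
    {
      action: "deploy-llm",
      detail: `serve ${input.llmModel} on the vLLM runtime with ${input.llmGpuCount} GPU(s)`,
    },
    {
      action: "configure-autoscaling",
      detail: `auto-scale the embedding model (embeddingReplicas = ${input.embeddingReplicas})`,
    },
    {
      action: "create-data-connections",
      detail: "set up data connections for model storage",
    },
  ];
  // Number the steps in the order they are listed.
  return steps.map((step, index) => ({ order: index + 1, ...step }));
}
```

A plan like this can be logged or reviewed before anything is deployed, which is one way to confirm the GPU count and model paths match what the target project's quota allows.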

Prerequisites

  • S3-compatible storage with model artifacts
  • GPU nodes available in the cluster (for LLM)
  • Sufficient quota in the target project