RAG Pipeline Skill
Quick Start Workflow
When building or maintaining the RAG pipeline:
- Content Ingestion (One-time setup)
  - Read all Docusaurus markdown files from /docs
  - Chunk text (800 chars, 200 overlap)
  - Generate embeddings with OpenAI ada-002
  - Upsert to Qdrant with metadata
- Query Flow (Runtime)
  - Receive user question
  - Generate query embedding
  - Search Qdrant (top 5 results, score >= 0.7)
  - Build context from relevant chunks
  - Pass to OpenAI GPT-4 with context
  - Return answer + sources
- Continuous Improvement
  - Monitor search quality (are results relevant?)
  - Adjust chunk size if needed
  - Update score thresholds
  - Add filters for specific chapters
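
The chunking step of ingestion (800 characters with a 200-character overlap) can be sketched as a sliding window. This is a minimal illustration, not the project's actual ingestion script: the function name `chunk_text` is hypothetical, and it splits on raw character offsets without regard to sentence or code-block boundaries.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    # With the defaults, each window starts 600 characters after the previous one.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.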
Standard Architecture
User Question
↓
[Generate Embedding]
↓
[Search Qdrant]
↓
[Extract Top 5 Chunks]
↓
[Build Context String]
↓
[GPT-4 with Context]
↓
AI Answer + Sources
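
The diagram above can be sketched as a single function. This is an illustrative outline under stated assumptions: `Hit`, `answer_question`, and `NO_RESULTS` are hypothetical names, and the three callables (`embed`, `search`, `complete`) stand in for the OpenAI embedding call, the Qdrant search, and the GPT-4 completion so that the control flow can be read and tested without network access.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Hit:
    """One search result: chunk text, its source file, and similarity score."""
    text: str
    source: str
    score: float


NO_RESULTS = "I could not find anything relevant in the textbook."


def answer_question(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], list[Hit]],
    complete: Callable[[str], str],
    limit: int = 5,
    min_score: float = 0.7,
) -> dict:
    """Run the query flow: embed, search, filter, build context, ask the LLM."""
    vector = embed(question)
    # Keep at most `limit` hits that clear the score threshold.
    hits = [h for h in search(vector, limit) if h.score >= min_score][:limit]
    if not hits:
        # Skip the LLM call entirely when nothing relevant was retrieved.
        return {"answer": NO_RESULTS, "sources": []}

    # Label every chunk with its source so the model can cite it.
    context = "\n\n".join(f"[{h.source}]\n{h.text}" for h in hits)
    prompt = (
        "Answer using only this context:\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return {"answer": complete(prompt), "sources": [h.source for h in hits]}
```

Passing the three steps in as callables keeps the sketch independent of any particular client library version; in a real pipeline they would be thin wrappers around the OpenAI and Qdrant clients.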
Key Parameters
- Chunk size: 800 characters
- Overlap: 200 characters
- Embedding model: text-embedding-ada-002
- LLM model: gpt-4 or gpt-3.5-turbo
- Search limit: 5 chunks
- Score threshold: 0.7
- Context window: ~3000 tokens max
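
One way to keep these parameters in a single place is a small immutable config object, shown here as a sketch. The names `RagConfig` and `fit_to_budget` are hypothetical, and the 4-characters-per-token ratio is a rough assumption for English text, not a measured value; a real tokenizer would give exact counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Pipeline parameters, mirroring the list above."""
    chunk_size: int = 800
    chunk_overlap: int = 200
    embedding_model: str = "text-embedding-ada-002"
    llm_model: str = "gpt-4"
    search_limit: int = 5
    score_threshold: float = 0.7
    max_context_tokens: int = 3000


def fit_to_budget(
    chunks: list[str],
    config: RagConfig = RagConfig(),
    chars_per_token: int = 4,  # assumption: crude estimate, not a tokenizer
) -> list[str]:
    """Keep chunks in ranked order until the estimated token budget is spent."""
    budget_chars = config.max_context_tokens * chars_per_token
    kept, used = [], 0
    for chunk in chunks[: config.search_limit]:
        if used + len(chunk) > budget_chars:
            break  # adding this chunk would exceed the context budget
        kept.append(chunk)
        used += len(chunk)
    return kept
```

With the default values, five 800-character chunks come to roughly 1000 estimated tokens, well inside the ~3000-token limit, so the budget only starts to bind if the chunk size or search limit is raised.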
Best Practices
For Physical AI textbook RAG:
- Preserve code blocks when chunking
- Include chapter/section in metadata
- Cite sources in responses
- Cache embeddings for popular queries
- Log all queries for analytics
- Handle "no results" gracefully
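
As an illustration of the first practice, the sketch below packs markdown lines into chunks but never starts a new chunk while inside a fenced code block. It is one possible approach, not the method described in the referenced guides: `split_preserving_code` is a hypothetical name, it assumes fences are opened and closed with triple backticks, and it omits overlap for brevity. A code block longer than `chunk_size` is kept whole, so a chunk may exceed the target size.

```python
FENCE = "`" * 3  # built at runtime so the marker never appears literally here


def split_preserving_code(markdown: str, chunk_size: int = 800) -> list[str]:
    """Pack lines into chunks of about chunk_size chars without cutting a fenced block."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    in_code = False

    for line in markdown.splitlines(keepends=True):
        # Only start a new chunk between lines that sit outside a code block.
        if not in_code and current and size + len(line) > chunk_size:
            chunks.append("".join(current))
            current, size = [], 0

        current.append(line)
        size += len(line)

        # A fence line toggles the state: opening enters code, closing leaves it.
        if line.lstrip().startswith(FENCE):
            in_code = not in_code

    if current:
        chunks.append("".join(current))
    return chunks
```

Because lines are only appended and never rewritten, concatenating the returned chunks reproduces the original document exactly.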
Knowledge Base
Detailed guides available:
- Chunking Strategies → references/chunking.md
- Ingestion Script → references/ingestion-script.md
- Query Pipeline → references/query-pipeline.md
- Context Building → references/context-building.md
- Error Handling → references/error-handling.md