AgentSkillsCN

rag-engineer

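Expert at building Retrieval-Augmented Generation (RAG) systems. Proficient with embedding models, vector databases, chunking strategies, and retrieval optimization techniques for LLM applications. Use this skill when you need to: build a RAG system, implement vector search, optimize embeddings, run semantic search, or perform document retrieval.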

SKILL.md
---
name: rag-engineer
description: "Expert in building Retrieval-Augmented Generation systems. Masters embedding models, vector databases, chunking strategies, and retrieval optimization for LLM applications. Use when: building RAG, vector search, embeddings, semantic search, document retrieval."
source: vibeship-spawner-skills (Apache 2.0)
---

RAG Engineer

Role: RAG Systems Architect

I bridge the gap between raw documents and LLM understanding. I know that retrieval quality determines generation quality - garbage in, garbage out. I obsess over chunking boundaries, embedding dimensions, and similarity metrics because they make the difference between helpful and hallucinating.

Capabilities

  • Vector embeddings and similarity search
  • Document chunking and preprocessing
  • Retrieval pipeline design
  • Semantic search implementation
  • Context window optimization
  • Hybrid search (keyword + semantic)

Requirements

  • LLM fundamentals
  • Understanding of embeddings
  • Basic NLP concepts

Patterns

Semantic Chunking

Chunk by meaning, not arbitrary token counts

- Use sentence boundaries, not token limits
- Detect topic shifts with embedding similarity
- Preserve document structure (headers, paragraphs)
- Include overlap for context continuity
- Add metadata for filtering
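
As a rough illustration of the first two points, the sketch below splits text on sentence boundaries and starts a new chunk when the next sentence is dissimilar to the chunk built so far. The `embed` function is a toy bag-of-words stand-in (an assumption, not a real embedding model), and `threshold` and `max_sentences` are hypothetical parameters that would need tuning. Overlap and metadata are left out for brevity.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a placeholder for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.1, max_sentences: int = 5) -> list[str]:
    # Split on sentence boundaries rather than on a fixed token count.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        if current:
            similarity = cosine(embed(" ".join(current)), embed(sentence))
            # A low similarity is treated as a topic shift and closes the chunk.
            if similarity < threshold or len(current) >= max_sentences:
                chunks.append(" ".join(current))
                current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "Cats are small pets. Cats like to sleep. "
    "Databases store vectors. Vector databases index embeddings."
)
for chunk in semantic_chunks(text):
    print(chunk)
```

With a real embedding model the threshold would typically be much higher than the value used for this toy vectorizer.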

Hierarchical Retrieval

Multi-level retrieval for better precision

- Index at multiple chunk sizes (paragraph, section, document)
- First pass: coarse retrieval for candidates
- Second pass: fine-grained retrieval for precision
- Use parent-child relationships for context
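
One possible shape for the two passes is sketched below: score whole sections first, then score paragraphs only inside the sections that survive. The `embed` function is again a toy bag-of-words placeholder, and `coarse_to_fine`, `n_sections`, and `n_paragraphs` are illustrative names rather than part of any library.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a placeholder for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def coarse_to_fine(
    query: str,
    sections: dict[str, list[str]],
    n_sections: int = 1,
    n_paragraphs: int = 2,
) -> list[tuple[str, str]]:
    q = embed(query)
    # First pass: rank each section as a whole (title plus all paragraphs).
    top_sections = sorted(
        sections,
        key=lambda title: cosine(q, embed(title + " " + " ".join(sections[title]))),
        reverse=True,
    )[:n_sections]
    # Second pass: rank paragraphs, but only within the surviving sections.
    candidates = [(title, p) for title in top_sections for p in sections[title]]
    candidates.sort(key=lambda c: cosine(q, embed(c[1])), reverse=True)
    return candidates[:n_paragraphs]


sections = {
    "Billing": ["Invoices are sent monthly.", "Refunds are issued within 30 days."],
    "Security": ["Passwords are hashed with bcrypt.", "Sessions expire after one hour."],
}
print(coarse_to_fine("when are refunds issued", sections, n_paragraphs=1))
```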

Hybrid Search

Combine semantic and keyword search

- BM25/TF-IDF for keyword matching
- Vector similarity for semantic matching
- Reciprocal Rank Fusion (RRF) for merging the keyword and vector rankings (it uses rank positions, not raw scores)
- Weight tuning based on query type
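
A minimal sketch of Reciprocal Rank Fusion follows. Each document receives `1 / (k + rank)` from every ranked list it appears in, and the contributions are summed. The constant `k = 60` is the commonly used default; the document IDs are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ranking = ["a", "b", "c"]    # hypothetical keyword results
vector_ranking = ["b", "c", "d"]  # hypothetical semantic results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print([doc_id for doc_id, _ in fused])  # ['b', 'c', 'a', 'd']
```

Because only rank positions are used, BM25 scores and cosine similarities never need to be normalized onto a common scale.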

Advanced Retrieval Patterns (Expert)

1. Query Routing & Decomposition

Don't use one tool for everything.

  • Router: Classify query -> "Fact" (Search DB) vs "Summary" (LLM only) vs "Code" (Search Repo).
  • Decomposition: Break "Compare X and Y" into "Get X", "Get Y", "Compare".
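
The routing and decomposition ideas above could be sketched as follows. In practice the classifier would usually be an LLM call or a small trained model; the keyword rules here are a deliberately naive stand-in, and `route` and `decompose` are hypothetical helper names.

```python
import re


def route(query: str) -> str:
    """Classify a query as 'code', 'summary', or 'fact' using naive keyword rules."""
    q = query.lower()
    if re.search(r"\b(function|class|exception|traceback|stack trace)\b", q):
        return "code"      # search the repository
    if re.search(r"\b(summarize|summary|overview)\b", q):
        return "summary"   # LLM only
    return "fact"          # search the database


def decompose(query: str) -> list[str]:
    """Break 'Compare X and Y' into three sub-queries; pass other queries through."""
    match = re.match(r"compare (.+?) and (.+?)[?.]?$", query.strip(), flags=re.IGNORECASE)
    if not match:
        return [query]
    x, y = match.group(1), match.group(2)
    return [f"Get {x}", f"Get {y}", f"Compare {x} and {y}"]


print(route("Why does this function raise an exception?"))
print(decompose("Compare Pinecone and Weaviate"))
```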

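2. Hypothetical Document Embeddings (HyDE)

A question and its answer often sit far apart in embedding space, while two answers to the same question sit close together. HyDE searches with a generated answer instead of the raw question.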

  • Step 1: Ask LLM to hallucinate a fake answer to the user query.
  • Step 2: Embed that fake answer.
  • Step 3: Retrieve real documents close to the fake answer.
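
The three steps can be sketched with the LLM passed in as a plain callable. Everything here is illustrative: `generate` stands in for a real LLM call, `embed` is a toy bag-of-words vectorizer rather than a real embedding model, and the documents are invented.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a placeholder for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hyde_retrieve(
    query: str, docs: list[str], generate: Callable[[str], str], top_k: int = 2
) -> list[str]:
    hypothetical = generate(query)       # Step 1: draft a fake answer
    answer_vec = embed(hypothetical)     # Step 2: embed the fake answer
    ranked = sorted(docs, key=lambda d: cosine(answer_vec, embed(d)), reverse=True)
    return ranked[:top_k]                # Step 3: real docs nearest to it


def fake_llm(query: str) -> str:
    # Hard-coded stand-in for an LLM response.
    return "HNSW is a graph index: it builds a layered graph and searches nearest neighbor candidates."


docs = [
    "HNSW builds a layered proximity graph for approximate nearest neighbor search.",
    "BM25 ranks documents by term frequency and inverse document frequency.",
    "Our office is closed on public holidays.",
]
print(hyde_retrieve("How does HNSW work?", docs, generate=fake_llm, top_k=1))
```

The generated answer is used only for retrieval; it should not be shown to the user or passed to the final prompt as evidence.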

3. Recursive Retrieval (Parent-Child)

  • Index: Small chunks (sentences) for accurate matching.
  • Retrieve: The parent chunk (paragraph/page) for context.
  • Result: High precision match + High context generation.
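
A small sketch of the index/retrieve split described above: sentences are indexed as children, each remembering its parent, and a match returns the parent text. The `embed` function is a toy bag-of-words placeholder, and `build_index` and `retrieve_parents` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a placeholder for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(parents: dict[str, str]) -> list[tuple[str, str]]:
    """Split each parent into sentences; every child keeps its parent ID."""
    index = []
    for parent_id, text in parents.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if sentence.strip():
                index.append((sentence.strip(), parent_id))
    return index


def retrieve_parents(
    query: str, index: list[tuple[str, str]], parents: dict[str, str], top_k: int = 1
) -> list[str]:
    q = embed(query)
    # Match against small chunks for precision.
    ranked = sorted(index, key=lambda item: cosine(q, embed(item[0])), reverse=True)
    seen: set[str] = set()
    results: list[str] = []
    for _, parent_id in ranked:
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])  # return the parent for context
        if len(results) == top_k:
            break
    return results


parents = {
    "p1": "Refunds are available within 30 days. Contact support to start a refund.",
    "p2": "Shipping takes five business days. Tracking numbers are emailed.",
}
print(retrieve_parents("refund within 30 days", build_index(parents), parents))
```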

📏 Evaluation (RAGAS Framework)

Don't guess. Measure.

| Metric | Measures | High Means... |
|---|---|---|
| Faithfulness | Does answer come only from context? | No hallucinations. |
| Answer Relevance | Does answer address the query? | User is happy. |
| Context Precision | Did we retrieve relevant chunks? | High signal-to-noise. |
| Context Recall | Did we retrieve all needed chunks? | No missing info. |
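
To make the two retrieval metrics concrete, here is a simplified, label-based version of context precision and context recall. This is not the RAGAS implementation (which relies on LLM judgments and rank-aware scoring); it assumes a hand-labeled set of relevant chunk IDs per query, and the IDs below are invented.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for chunk_id in retrieved if chunk_id in relevant) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant chunks that were retrieved."""
    if not relevant:
        return 1.0
    return len(set(retrieved) & relevant) / len(relevant)


retrieved = ["c1", "c2", "c3", "c4"]  # what the retriever returned
relevant = {"c1", "c3", "c9"}         # hand-labeled ground truth
print(context_precision(retrieved, relevant))  # 0.5
print(context_recall(retrieved, relevant))     # 2 of 3 relevant chunks found
```

Tracking these separately from answer-level metrics shows whether a bad answer came from retrieval or from generation.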

🏭 Production RAG (War Stories)

  • Caching: Cache exact-match results for repeated queries, prioritizing "Head" queries (top 20%). Add a semantic cache so that similar queries also hit.
  • Indexing: HNSW (Fast, RAM heavy) vs IVF (Slower, Disk/SSD friendly). Use HNSW for <10M vectors.
  • Reranking: Vector search is "fuzzy". Always use a Cross-Encoder (Cohere/BGE-Reranker) to strictly sort top 50 results before sending to LLM.
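
The semantic caching idea from the first bullet might look like the sketch below: store each answered query's embedding and return the cached answer when a new query is close enough. `SemanticCache` and its `threshold` are hypothetical, the vectors are hand-written stand-ins for real embeddings, and the linear scan would be replaced by a vector index at scale.

```python
import math
from typing import Optional


class SemanticCache:
    """Return a cached answer when a new query embedding is near a stored one."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, query_vec: list[float]) -> Optional[str]:
        best_answer, best_score = None, -1.0
        for vec, answer in self.entries:  # linear scan; fine for a sketch
            score = self._cosine(query_vec, vec)
            if score > best_score:
                best_answer, best_score = answer, score
        return best_answer if best_score >= self.threshold else None

    def put(self, query_vec: list[float], answer: str) -> None:
        self.entries.append((list(query_vec), answer))


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "Refunds are issued within 30 days.")
print(cache.get([0.99, 0.05]))  # near-duplicate query: cache hit
print(cache.get([0.0, 1.0]))    # unrelated query: None
```

A threshold that is too low returns stale or wrong answers for merely related queries, so it is worth validating against logged traffic.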

Anti-Patterns

❌ Fixed Chunk Size

❌ Embedding Everything

❌ Ignoring Evaluation

⚠️ Sharp Edges

| Issue | Severity | Solution |
|---|---|---|
| Fixed-size chunking breaks sentences and context | high | Use semantic chunking that respects document structure |
| Pure semantic search without metadata pre-filtering | medium | Implement hybrid filtering |
| Using same embedding model for different content types | medium | Evaluate embeddings per content type |
| Using first-stage retrieval results directly | medium | Add reranking step |
| Cramming maximum context into LLM prompt | medium | Use relevance thresholds |
| Not measuring retrieval quality separately from generation | high | Separate retrieval evaluation |
| Not updating embeddings when source documents change | medium | Implement embedding refresh |
| Same retrieval strategy for all query types | medium | Implement hybrid search |
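
Two of the fixes above, adding a reranking step and applying a relevance threshold, combine naturally into one post-retrieval stage. In the sketch below the scorer is injected as a callable; `overlap_score` is a toy word-overlap function standing in for a cross-encoder, and `min_score` and `max_chunks` are hypothetical parameters.

```python
from typing import Callable


def overlap_score(query: str, chunk: str) -> float:
    """Toy relevance score: share of query words found in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def rerank_and_filter(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    min_score: float = 0.5,
    max_chunks: int = 3,
) -> list[str]:
    # Rerank: score every first-stage candidate against the query.
    scored = [(score(query, chunk), chunk) for chunk in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Threshold: drop weak chunks instead of filling the context window.
    return [chunk for s, chunk in scored if s >= min_score][:max_chunks]


candidates = ["password policy", "holiday schedule", "how to reset your password"]
print(rerank_and_filter("reset password", candidates, overlap_score))
```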

Related Skills

Works well with: ai-agents-architect, prompt-engineer, database-architect, backend