rag-expert

Use this skill when building or optimizing RAG systems. Keywords: RAG, retrieval, vector search, embeddings, chunking, semantic search.

SKILL.md
--- frontmatter
name: rag-expert
description: Use when building or optimizing RAG systems. Keywords: RAG, retrieval, vector search, embeddings, chunking, semantic search.

RAG Expert

Overview

Retrieval Quality determines Generation Quality.

This skill provides the architecture and code patterns for building production-grade Retrieval-Augmented Generation systems. It enforces "Garbage In, Garbage Out" discipline.

[!IMPORTANT] Key Principle: Never trust raw retrieval. Always rerank.


Quick Example (30 sec)

python
# 1. Semantic Chunking (Context is King)
# See templates/semantic_chunking.py for full implementation
chunks = semantic_chunking(documents)

# 2. Hybrid Search (Keywords + Vectors)
# See templates/hybrid_search.py for full implementation
retriever = setup_hybrid_search(chunks, embeddings, Chroma)

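# 3. Rerank & generate
# Retrieve candidates first (assumes the retriever exposes a LangChain-style invoke()),
# then always rerank the results before sending them to the LLM
retrieved_docs = retriever.invoke(query)
final_docs = reranker.compress_documents(retrieved_docs, query)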

Core Principles

1. Chunking Strategy

Don't slice by character count. Use semantic boundaries (sentences, paragraphs). Bad chunks = context loss = hallucination.

  • Reference: templates/semantic_chunking.py
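To make the boundary idea concrete, here is a minimal sketch, not the contents of templates/semantic_chunking.py. The function name `semantic_chunks`, the `max_chars` budget, and the regex-based sentence splitter are all assumptions for illustration. It splits on blank-line paragraph boundaries first, then packs whole sentences into chunks so that no sentence is ever cut in the middle.

```python
import re


def semantic_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Split on paragraph boundaries, then pack whole sentences into chunks.

    Hypothetical helper: a sentence longer than max_chars is kept whole
    rather than being cut mid-sentence.
    """
    chunks: list[str] = []
    # Paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Naive sentence splitter: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        current = ""
        for sentence in sentences:
            if not sentence:
                continue
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                # Adding this sentence would exceed the budget: close the chunk.
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            # Never merge across a paragraph boundary.
            chunks.append(current)
    return chunks


sample = "First sentence. Second sentence.\n\nNew paragraph here."
small = semantic_chunks(sample, max_chars=20)
large = semantic_chunks(sample, max_chars=500)
```

With a generous budget, the two sentences of the first paragraph stay together while the second paragraph still becomes its own chunk. A production splitter would likely need a proper sentence tokenizer and some overlap between chunks.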

2. Hybrid Search

Pure vector search is insufficient. Dense vectors often miss exact-match terms (IDs, names), while BM25 keyword search misses conceptual matches such as paraphrases. Use both.

  • Reference: templates/hybrid_search.py
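One common way to combine the two result lists is Reciprocal Rank Fusion (RRF). The sketch below is an assumption about how the fusion could be done, not necessarily what templates/hybrid_search.py does. The function name, the document IDs, and the constant `k = 60` (a conventional default) are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a keyword (BM25) retriever and a vector retriever.
bm25_ranking = ["inv-1042", "doc_b", "doc_a"]
vector_ranking = ["doc_a", "inv-1042", "doc_c"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

RRF only needs ranks, not raw scores, which avoids having to normalize BM25 scores against cosine similarities.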

3. Evaluation First

Measure retrieval separately from generation. If you improved the prompt but reduced retrieval precision, you failed.

  • Reference: templates/retrieval_eval.py
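As a starting point for measuring retrieval on its own, here is a minimal sketch of Precision@k and Recall@k. It is not the contents of templates/retrieval_eval.py. The function names and the labeled example are assumptions, and it presumes you have a set of known-relevant document IDs per query.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical labeled query: the retriever's ranking and the ground truth.
retrieved = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

p_at_2 = precision_at_k(retrieved, relevant, k=2)  # 1 hit in top 2
r_at_4 = recall_at_k(retrieved, relevant, k=4)     # 2 of 3 relevant found
```

Tracking both numbers across changes makes it visible when a prompt or chunking tweak quietly degrades retrieval.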

Excuse vs. Reality (Guardrails)

| Agent's Excuse | Reality (What You Must Do) |
| --- | --- |
| "I'll just use OpenAI embeddings by default" | Select the right model. Check references/embedding_models.md to choose by use case (code, multilingual, etc.). |
| "A chunk size of 1000 is standard" | It depends on the content. Use semantic chunking, not an arbitrary number. |
| "I don't need reranking, my vector store is good" | Wrong. Reranking almost always improves Precision@k. Do it. |
| "The user didn't ask for evaluation" | Implement basic metrics. Without Recall@k you are flying blind. |

Architecture Patterns

Hierarchical Retrieval

Index summaries for fast search, retrieve full chunks for generation.

  1. Summary Index -> Search
  2. Full Doc Store -> Retrieve
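The two-step flow above can be sketched as follows. This is a toy illustration under stated assumptions: the in-memory dictionaries, the `hierarchical_retrieve` name, and the token-overlap scorer are all hypothetical stand-ins, and a real system would search the summary index with embeddings or hybrid search instead.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens; a stand-in for a real embedding model."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of shared tokens."""
    return len(tokenize(query) & tokenize(text))


def hierarchical_retrieve(
    query: str,
    summary_index: dict[str, str],
    doc_store: dict[str, list[str]],
    top_docs: int = 1,
) -> list[str]:
    """Step 1: rank documents by summary. Step 2: return their full chunks."""
    ranked = sorted(
        summary_index,
        key=lambda doc_id: overlap_score(query, summary_index[doc_id]),
        reverse=True,
    )
    chunks: list[str] = []
    for doc_id in ranked[:top_docs]:
        chunks.extend(doc_store[doc_id])
    return chunks


# Hypothetical data: short summaries are searched, full chunks are returned.
summary_index = {
    "billing": "Invoices, refunds and payment methods",
    "auth": "Login, passwords and API tokens",
}
doc_store = {
    "billing": ["Refunds are issued within 5 days.", "Invoices are sent monthly."],
    "auth": ["Passwords can be reset from the login page.", "Tokens expire after 24h."],
}

context = hierarchical_retrieve("how do I reset passwords", summary_index, doc_store)
```

Searching the small summary index keeps the first step cheap, while generation still receives the full, unsummarized chunks.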

Contextual Reranking

Use an LLM or Cross-Encoder to rescore top-K results.

  • Input: Query + Top 50 docs
  • Output: Top 5 relevant docs (re-ordered)
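A minimal sketch of the rescoring step, assuming a pluggable `score_fn(query, doc)`. In practice that function would wrap a cross-encoder or an LLM call. The token-overlap scorer here is only a hypothetical stand-in so the example runs without any model, and the names `rerank` and `token_overlap` are illustrative.

```python
import re
from typing import Callable


def rerank(
    query: str,
    docs: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> list[str]:
    """Rescore every candidate against the query and keep the best top_n."""
    ordered = sorted(docs, key=lambda doc: score_fn(query, doc), reverse=True)
    return ordered[:top_n]


def token_overlap(query: str, doc: str) -> float:
    """Stand-in scorer: share of query tokens that also appear in the doc."""
    query_tokens = set(re.findall(r"\w+", query.lower()))
    doc_tokens = set(re.findall(r"\w+", doc.lower()))
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


# Hypothetical candidates as they might come back from first-stage retrieval.
candidates = [
    "Chunking splits documents into passages.",
    "Reranking rescores retrieved passages against the query.",
    "Embeddings map text to vectors.",
]

best = rerank("how does reranking use the query", candidates, token_overlap, top_n=1)
```

Keeping the scorer as a parameter lets you swap the toy function for a real model without touching the surrounding pipeline.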

Resources