RAG Retrieval Skill
Query your local document knowledge base using semantic search and get AI-powered answers.
Overview
This skill enables RAG (Retrieval-Augmented Generation) queries against your locally indexed documents. It uses semantic search to find relevant documents and generates answers using Claude Haiku.
Usage
code
/skill rag-retrieval "How to configure the API?"
Features
- •Semantic Search: Uses vector similarity to find relevant documents
- •Hybrid Retrieval: Combines vector search with keyword matching for better accuracy
- •Context-Aware Answers: Uses claude-haiku-4-5-20251001 to generate responses
- •Citation Support: Shows sources for generated answers
- •Performance Monitoring: Tracks query latency and accuracy
Arguments
- •
query(required): Your question or search query - •
--top-k(optional): Number of documents to retrieve (default: 5) - •
--threshold(optional): Minimum similarity score (default: 0.7) - •
--mode(optional): Search mode - "hybrid", "vector", or "keyword" (default: "hybrid")
Examples
Basic Query
code
/skill rag-retrieval "What is the authentication process?"
Retrieve More Context
code
/skill rag-retrieval "How to handle errors?" --top-k 10
Vector-Only Search
code
/skill rag-retrieval "API rate limits" --mode vector
Configuration
The skill uses the following configuration from config/default.yaml:
- •
retrieval.top_k: Default number of documents to retrieve - •
retrieval.hybrid_ratio: Balance between vector and keyword search (0.7 = 70% vector) - •
claude.model: LLM model for response generation - •
claude.max_tokens: Maximum response length
Performance
Typical latencies:
- •Vector search: <100ms
- •End-to-end response: <5 seconds
- •Indexing: ~0.5s per 100 documents
Requirements
- •Indexed documents in
data/vectors/ - •Valid Anthropic API key in environment
- •At least 2GB RAM for vector operations
Troubleshooting
No Results Found
- •Ensure documents are indexed:
python scripts/index.py --input data/documents - •Lower the similarity threshold:
--threshold 0.5 - •Try keyword mode if vector search fails
Slow Responses
- •Reduce top_k value for faster retrieval
- •Check if vector index is optimized for your document count
- •Monitor memory usage with
python -m src.monitoring.tcp_server
API Errors
- •Verify ANTHROPIC_API_KEY environment variable
- •Check API rate limits and quota
- •Review logs in
logs/rag_cli.log