SparkGen RAG
Manage the RAG (Retrieval-Augmented Generation) pipeline.
Dynamic Context
Before any action:
- Read config/ai_workflow.yaml (knowledge_bases: and rag: sections)
- List documents: ls documents/
- Check vector index: ls local_data/vectors/ 2>/dev/null
- If the server is running: curl -sf http://localhost:8000/v1/rag/knowledge-bases -H "X-API-Key: ${API_KEY:-dev-local-key}"
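The checks above can be bundled into a small pre-flight helper. This is a minimal sketch, not part of SparkGen itself: the function names `preflight` and `server_reachable` are hypothetical, and the paths and endpoint are the ones listed above, resolved against an assumed project root.

```python
import os
import urllib.request
from pathlib import Path


def preflight(root: str = ".") -> dict:
    """Report which of the files and directories listed above exist under root."""
    base = Path(root)
    docs = base / "documents"
    vectors = base / "local_data" / "vectors"
    return {
        # True when the workflow config file is present.
        "config": (base / "config" / "ai_workflow.yaml").is_file(),
        # Sorted file names in documents/, or an empty list if the directory is missing.
        "documents": sorted(p.name for p in docs.iterdir()) if docs.is_dir() else [],
        # True only when the vector directory exists and is non-empty.
        "vector_index": vectors.is_dir() and any(vectors.iterdir()),
    }


def server_reachable(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if the knowledge-bases endpoint answers with a 2xx status."""
    request = urllib.request.Request(
        f"{base_url}/v1/rag/knowledge-bases",
        # Mirrors ${API_KEY:-dev-local-key}: fall back when unset or empty.
        headers={"X-API-Key": os.environ.get("API_KEY") or "dev-local-key"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

Calling `preflight()` from the project root and `server_reachable()` afterwards gives the same information as the four manual checks, without failing when a directory is missing.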
Actions
Ingest Documents (/sparkgen-rag ingest [--kb name] [--source path])
```bash
python -m app.rag.ingest --kb "${KB:-default}" --source "${SOURCE:-./documents}"
```
This will:
- Read documents from the source directory
- Chunk them according to config/ai_workflow.yaml settings (size, overlap, strategy)
- Generate embeddings using the configured provider
- Store vectors in the configured backend (FAISS/Milvus)
- Report: documents processed, chunks created, time taken
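The chunking step, with its size and overlap settings, can be illustrated with a sliding-window splitter. This is a sketch only: the function `sliding_window_chunks` is hypothetical, and it assumes size and overlap are measured in characters, which the source does not state (the real pipeline may count tokens instead).

```python
def sliding_window_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # distance between the starts of consecutive chunks
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk consists only of already-covered overlap.
        if start + size >= len(text):
            break
    return chunks
```

With a size of 500 and an overlap of 50, each chunk starts 450 characters after the previous one, so text near a chunk boundary appears in both neighbours.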
Query (/sparkgen-rag query "<question>" [--mode standard|self_rag|graphrag] [--kb name])
If the server is running:
```bash
curl -s -X POST http://localhost:8000/v1/rag/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY:-dev-local-key}" \
  -d '{"question": "<question>", "mode": "<mode>", "knowledge_base": "<kb>"}'
```
Display: answer, sources with relevance scores, chunks retrieved.
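The same query can be issued from Python with only the standard library. The sketch below is an assumption-laden illustration: `build_query_request` and `run_query` are hypothetical helpers, the default mode `standard` and default knowledge base `default` are guesses based on the options listed above, and the response is returned as decoded JSON because the source does not specify its field names.

```python
import json
import os
import urllib.request

# Modes taken from the command signature above.
VALID_MODES = ("standard", "self_rag", "graphrag")


def build_query_request(question, mode="standard", kb="default",
                        base_url="http://localhost:8000"):
    """Build (but do not send) the POST request for /v1/rag/query."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    payload = {"question": question, "mode": mode, "knowledge_base": kb}
    return urllib.request.Request(
        f"{base_url}/v1/rag/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Mirrors ${API_KEY:-dev-local-key}: fall back when unset or empty.
            "X-API-Key": os.environ.get("API_KEY") or "dev-local-key",
        },
        method="POST",
    )


def run_query(question, **kwargs):
    """Send the query and return the decoded JSON body unchanged."""
    request = build_query_request(question, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before the server is involved.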
Evaluate (/sparkgen-rag eval [--kb name])
```bash
python -m app.rag.eval --kb "${KB:-default}" --config config/rag.yaml
```
Runs RAG quality evaluation. Reports:
- Accuracy score
- Faithfulness score
- Relevancy score
- Per-question breakdown
List Knowledge Bases (/sparkgen-rag kb-list)
Parse the knowledge_bases: section of config/ai_workflow.yaml and display:
| Name | Description | Source Paths | File Types | Vector Store | Chunks |
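One way to produce this table from already-parsed configuration is sketched below. The function `kb_table` is hypothetical; it assumes each knowledge base is a dict with the keys used in this document's configuration (`name`, `description`, `source_paths`, `file_types`, `vector_store.backend`). The chunk count is not part of the configuration, so the sketch reads it from an assumed `chunks` key supplied by the caller and prints `n/a` when it is absent.

```python
def kb_table(knowledge_bases: list[dict]) -> str:
    """Render knowledge-base entries as a pipe-delimited table."""
    header = ["Name", "Description", "Source Paths", "File Types", "Vector Store", "Chunks"]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for kb in knowledge_bases:
        row = [
            kb.get("name", ""),
            kb.get("description", ""),
            ", ".join(kb.get("source_paths", [])),
            ", ".join(kb.get("file_types", [])),
            kb.get("vector_store", {}).get("backend", ""),
            # "chunks" is an assumed, caller-supplied value; it is not in the config.
            str(kb.get("chunks", "n/a")),
        ]
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)
```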
Add Knowledge Base (/sparkgen-rag kb-add <name> --source <path> [--description text])
Add a new knowledge base to config/ai_workflow.yaml:
```yaml
- name: <name>
  description: "<description>"
  source_paths:
    - <path>
  file_types: [pdf, docx, txt, md]
  auto_ingest: false
  collection: <name>_vectors
  chunking:
    size: 500
    overlap: 50
    strategy: sliding_window
  vector_store:
    backend: faiss
    index_path: ./local_data/vectors/<name>
```
Then assign it to the relevant agents by adding its name to each agent's rag.knowledge_bases list.
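The entry template above can also be generated programmatically from the command's arguments. This is a sketch under assumptions: `build_kb_entry` and `add_kb` are hypothetical helpers, the nesting of `chunking` and `vector_store` inside the entry is inferred from the template, and reading or writing the YAML file itself (which needs a YAML library) is left out.

```python
def build_kb_entry(name: str, source: str, description: str = "") -> dict:
    """Build a knowledge-base entry with the default values from the template."""
    return {
        "name": name,
        "description": description,
        "source_paths": [source],
        "file_types": ["pdf", "docx", "txt", "md"],
        "auto_ingest": False,
        "collection": f"{name}_vectors",
        "chunking": {"size": 500, "overlap": 50, "strategy": "sliding_window"},
        "vector_store": {
            "backend": "faiss",
            "index_path": f"./local_data/vectors/{name}",
        },
    }


def add_kb(config: dict, entry: dict) -> dict:
    """Append entry to config['knowledge_bases'], refusing duplicate names."""
    existing = config.setdefault("knowledge_bases", [])
    if any(kb.get("name") == entry["name"] for kb in existing):
        raise ValueError(f"knowledge base {entry['name']!r} already exists")
    existing.append(entry)
    return config
```

Rejecting a duplicate name up front avoids two entries pointing at the same collection and index path.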
Config (/sparkgen-rag config)
Show current RAG configuration:
- Global RAG settings (enabled, default mode, top_k, citations)
- Reranker settings
- Self-RAG settings
- Per-agent RAG overrides
- Knowledge base details