SparkGen RAG

Manage the RAG (Retrieval-Augmented Generation) pipeline.

Dynamic Context

Before any action:

•Read config/ai_workflow.yaml — knowledge_bases: and rag: sections
•List documents: ls documents/
•Check vector index: ls local_data/vectors/ 2>/dev/null
•If the server is running: curl -sf http://localhost:8000/v1/rag/knowledge-bases -H "X-API-Key: ${API_KEY:-dev-local-key}"

Actions

Ingest Documents (`/sparkgen-rag ingest [--kb name] [--source path]`)

bash

python -m app.rag.ingest --kb "${KB:-default}" --source "${SOURCE:-./documents}"

This will:

•Read documents from the source directory
•Chunk them according to config/ai_workflow.yaml settings (size, overlap, strategy)
•Generate embeddings using the configured provider
•Store vectors in the configured backend (FAISS/Milvus)
•Report: documents processed, chunks created, time taken
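
The chunking step above can be sketched as follows. This is a minimal illustration of a sliding-window strategy, not the code in app.rag.ingest: the function name `sliding_window_chunks` is hypothetical, and it assumes size and overlap are counted in characters (the config may count tokens instead).

```python
def sliding_window_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters.

    Consecutive windows share `overlap` characters. Counting in characters
    is an assumption; the real pipeline may count tokens.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With size 500 and overlap 50, a 1200-character document yields three chunks of 500, 500, and 300 characters, and the last 50 characters of each chunk repeat at the start of the next.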

Query (`/sparkgen-rag query "<question>" [--mode standard|self_rag|graphrag] [--kb name]`)

If the server is running:

bash

curl -s -X POST http://localhost:8000/v1/rag/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${API_KEY:-dev-local-key}" \
  -d '{"question": "<question>", "mode": "<mode>", "knowledge_base": "<kb>"}'

Display: answer, sources with relevance scores, chunks retrieved.
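
When the question contains quotes or newlines, pasting it into the single-quoted JSON above produces an invalid body. The sketch below builds the same request with the standard library so the question is JSON-escaped. The helper names `build_query_request` and `run_query` are hypothetical, the default mode of "standard" is an assumption, and the response is returned as parsed JSON without assuming its field names.

```python
import json
import os
import urllib.request

# Modes listed in the command signature.
ALLOWED_MODES = {"standard", "self_rag", "graphrag"}


def build_query_request(question, mode="standard", kb="default",
                        base_url="http://localhost:8000"):
    """Build the POST request for /v1/rag/query without sending it."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # json.dumps escapes quotes and newlines in the question.
    body = json.dumps(
        {"question": question, "mode": mode, "knowledge_base": kb}
    ).encode("utf-8")
    # Mirrors ${API_KEY:-dev-local-key}: fall back when unset or empty.
    api_key = os.environ.get("API_KEY") or "dev-local-key"
    return urllib.request.Request(
        f"{base_url}/v1/rag/query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def run_query(question, **kwargs):
    """Send the request and return the parsed JSON response."""
    request = build_query_request(question, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `run_query('What does "top_k" control?', mode="self_rag", kb="default")` sends a well-formed body even though the question contains double quotes.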

Evaluate (`/sparkgen-rag eval [--kb name]`)

bash

python -m app.rag.eval --kb "${KB:-default}" --config config/rag.yaml

Runs RAG quality evaluation. Reports:

•Accuracy score
•Faithfulness score
•Relevancy score
•Per-question breakdown

List Knowledge Bases (`/sparkgen-rag kb-list`)

If the server is running, query the endpoint used under Dynamic Context (GET http://localhost:8000/v1/rag/knowledge-bases with the X-API-Key header). Otherwise, read the knowledge_bases: section of config/ai_workflow.yaml.

Add Knowledge Base (`/sparkgen-rag kb-add <name> --source <path> [--description text]`)

Add a new entry under knowledge_bases: in config/ai_workflow.yaml:

yaml

- name: <name>
  description: "<description>"
  source_paths:
    - <path>
  file_types: [pdf, docx, txt, md]
  auto_ingest: false
  collection: <name>_vectors
  chunking:
    size: 500
    overlap: 50
    strategy: sliding_window
  vector_store:
    backend: faiss
    index_path: ./local_data/vectors/<name>

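Then add the new name to the rag.knowledge_bases list of each agent that should use it. Because auto_ingest is false, run the ingest action for the new knowledge base before querying it.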
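
To show how the `<name>`, `<path>`, and `<description>` placeholders expand, here is a small sketch that builds the entry as a Python dict. The function `kb_entry` is hypothetical, and the name check (letters, digits, underscore, hyphen) is an assumption added because the name ends up in a collection name and a filesystem path. Writing the result back to the YAML file is left out.

```python
import re


def kb_entry(name: str, source: str, description: str = "") -> dict:
    """Return a knowledge base entry mirroring the YAML template above."""
    # Assumed restriction: the name is reused in a collection name and a path.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError(f"unsafe knowledge base name: {name!r}")
    return {
        "name": name,
        "description": description,
        "source_paths": [source],
        "file_types": ["pdf", "docx", "txt", "md"],
        "auto_ingest": False,
        "collection": f"{name}_vectors",
        "chunking": {"size": 500, "overlap": 50, "strategy": "sliding_window"},
        "vector_store": {
            "backend": "faiss",
            "index_path": f"./local_data/vectors/{name}",
        },
    }
```

Calling `kb_entry("legal", "./documents/legal", "Contracts")` gives a collection named `legal_vectors` and an index path of `./local_data/vectors/legal`.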

Config (`/sparkgen-rag config`)

Show current RAG configuration:

•Global RAG settings (enabled, default mode, top_k, citations)
•Reranker settings
•Self-RAG settings
•Per-agent RAG overrides
•Knowledge base details
