LlamaIndex Development
You are an expert in LlamaIndex for building RAG (Retrieval-Augmented Generation) applications, data indexing, and LLM-powered applications with Python.
Key Principles
- •Write concise, technical responses with accurate Python examples
- •Use functional, declarative programming; avoid classes where possible
- •Prioritize code quality, maintainability, and performance
- •Use descriptive variable names that reflect their purpose
- •Follow PEP 8 style guidelines
Code Organization
Directory Structure
code
project/ ├── data/ # Source documents and data ├── indexes/ # Persisted index storage ├── loaders/ # Custom document loaders ├── retrievers/ # Custom retriever implementations ├── query_engines/ # Query engine configurations ├── prompts/ # Custom prompt templates ├── transformations/ # Document transformations ├── callbacks/ # Custom callback handlers ├── utils/ # Utility functions ├── tests/ # Test files └── config/ # Configuration files
Naming Conventions
- •Use snake_case for files, functions, and variables
- •Use PascalCase for classes
- •Prefix private functions with underscore
- •Use descriptive names (e.g.,
create_vector_index,build_query_engine)
Document Loading
Using Document Loaders
python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import PDFReader, DocxReader
# Load from directory
documents = SimpleDirectoryReader(
input_dir="./data",
recursive=True,
required_exts=[".pdf", ".txt", ".md"]
).load_data()
# Load specific file types
pdf_reader = PDFReader()
documents = pdf_reader.load_data(file="document.pdf")
Custom Loaders
python
from llama_index.core.readers.base import BaseReader
from llama_index.core import Document
class CustomLoader(BaseReader):
def load_data(self, file_path: str) -> list[Document]:
# Custom loading logic
with open(file_path, 'r') as f:
content = f.read()
return [Document(
text=content,
metadata={"source": file_path}
)]
Text Splitting and Processing
Node Parsing
python
from llama_index.core.node_parser import (
SentenceSplitter,
SemanticSplitterNodeParser,
MarkdownNodeParser
)
# Simple sentence splitting
splitter = SentenceSplitter(
chunk_size=1024,
chunk_overlap=200
)
nodes = splitter.get_nodes_from_documents(documents)
# Semantic splitting (preserves meaning)
from llama_index.embeddings.openai import OpenAIEmbedding
semantic_splitter = SemanticSplitterNodeParser(
embed_model=OpenAIEmbedding(),
breakpoint_percentile_threshold=95
)
# Markdown-aware splitting
markdown_splitter = MarkdownNodeParser()
Best Practices for Chunking
- •Choose chunk size based on your embedding model's context window
- •Use overlap to maintain context between chunks
- •Preserve document structure when possible
- •Include metadata for filtering and retrieval
- •Use semantic splitting for better coherence
Vector Stores and Indexing
Creating Indexes
python
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
# In-memory index
index = VectorStoreIndex.from_documents(documents)
# With persistent vector store
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
documents,
storage_context=storage_context
)
Supported Vector Stores
- •Chroma (local development)
- •Pinecone (production, managed)
- •Weaviate (production, self-hosted or managed)
- •Qdrant (production, self-hosted or managed)
- •PostgreSQL with pgvector
- •MongoDB Atlas Vector Search
Index Persistence
python
from llama_index.core import StorageContext, load_index_from_storage # Persist index index.storage_context.persist(persist_dir="./storage") # Load index storage_context = StorageContext.from_defaults(persist_dir="./storage") index = load_index_from_storage(storage_context)
Query Engines
Basic Query Engine
python
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
similarity_top_k=5,
response_mode="compact"
)
response = query_engine.query("What is the main topic?")
print(response.response)
Response Modes
- •
refine: Iteratively refine answer through each node - •
compact: Combine chunks before sending to LLM - •
tree_summarize: Build tree and summarize - •
simple_summarize: Truncate and summarize - •
accumulate: Accumulate responses from each node
Advanced Query Engine
python
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
query_engine = RetrieverQueryEngine.from_args(
retriever=index.as_retriever(similarity_top_k=10),
node_postprocessors=[
SimilarityPostprocessor(similarity_cutoff=0.7)
],
response_mode="compact"
)
Retrievers
Custom Retrievers
python
from llama_index.core.retrievers import VectorIndexRetriever
# Basic retriever
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=10
)
# Retrieve nodes
nodes = retriever.retrieve("search query")
Hybrid Search
python
from llama_index.core.retrievers import QueryFusionRetriever
# Combine multiple retrieval strategies
retriever = QueryFusionRetriever(
[
index.as_retriever(similarity_top_k=5),
bm25_retriever, # Keyword-based
],
num_queries=4,
use_async=True
)
Embeddings
Embedding Models
python
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
# OpenAI embeddings
Settings.embed_model = OpenAIEmbedding(
model="text-embedding-3-small",
dimensions=512 # Optional dimension reduction
)
# Local embeddings
Settings.embed_model = HuggingFaceEmbedding(
model_name="BAAI/bge-small-en-v1.5"
)
LLM Configuration
Setting Up LLMs
python
from llama_index.llms.openai import OpenAI
from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings
# OpenAI
Settings.llm = OpenAI(
model="gpt-4o",
temperature=0.1
)
# Anthropic
Settings.llm = Anthropic(
model="claude-sonnet-4-20250514",
temperature=0.1
)
Agents
Building Agents
python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata
# Create tools from query engines
tools = [
QueryEngineTool(
query_engine=documents_query_engine,
metadata=ToolMetadata(
name="documents",
description="Search through documents"
)
),
QueryEngineTool(
query_engine=code_query_engine,
metadata=ToolMetadata(
name="codebase",
description="Search through code"
)
)
]
# Create agent
agent = ReActAgent.from_tools(
tools,
llm=llm,
verbose=True
)
response = agent.chat("Find information about X")
Performance Optimization
Caching
python
from llama_index.core import Settings from llama_index.core.llms import LLMCache # Enable LLM response caching Settings.llm = OpenAI(model="gpt-4o") Settings.llm_cache = LLMCache()
Async Operations
python
# Use async for better performance
response = await query_engine.aquery("question")
# Batch processing
responses = await asyncio.gather(*[
query_engine.aquery(q) for q in questions
])
Embedding Optimization
- •Batch embeddings when possible
- •Use smaller embedding dimensions when accuracy allows
- •Cache embeddings for repeated documents
- •Use local models for cost-sensitive applications
Error Handling
python
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler # Debug handler for troubleshooting debug_handler = LlamaDebugHandler() callback_manager = CallbackManager([debug_handler]) Settings.callback_manager = callback_manager
Testing
- •Unit test document loaders and transformations
- •Test retrieval quality with known queries
- •Validate index persistence and loading
- •Test query engine responses
- •Monitor retrieval metrics (precision, recall)
Dependencies
- •llama-index
- •llama-index-embeddings-openai
- •llama-index-llms-openai
- •llama-index-vector-stores-chroma
- •chromadb
- •python-dotenv
- •pydantic