AI/ML Engineering

Build production AI systems with modern patterns and tools.

Quick Reference

The 2026 AI Stack

Layer	Tool	Purpose
Prompting	DSPy	Programmatic prompt optimization
Orchestration	LangGraph	Stateful multi-agent workflows
RAG	LlamaIndex	Document ingestion and retrieval
Vectors	Qdrant / Pinecone	Embedding storage and search
Evaluation	RAGAS	RAG quality metrics
Experiment Tracking	MLflow / W&B	Logging, versioning, comparison
Serving	BentoML / vLLM	Model deployment
Protocol	MCP	Tool and context integration

DSPy: Programmatic Prompting

Hand-tuned prompt strings are brittle and hard to maintain. DSPy instead treats prompts as code that can be optimized against a metric:

python

import dspy

class QA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="1-5 words")

# Create module (assumes a language model was configured first via dspy.configure(lm=...))
qa = dspy.Predict(QA)

# Use it
result = qa(question="What is the capital of France?")
print(result.answer)  # "Paris"

Optimize with real data:

python

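from dspy.teleprompt import BootstrapFewShot

# Metric: DSPy calls it with the gold example, the prediction, and an optional trace
def exact_match(example, pred, trace=None):
    return example.answer.strip().lower() == pred.answer.strip().lower()

# train_data: a list of dspy.Example objects with question and answer fields
optimizer = BootstrapFewShot(metric=exact_match)
optimized_qa = optimizer.compile(qa, trainset=train_data)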
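
The idea behind bootstrapping few-shot examples can be illustrated without DSPy: run the program over the training set and keep the cases where its output passes the metric, then reuse those as demonstrations. The sketch below is a simplified, dependency-free illustration of that idea, not DSPy's implementation. `toy_program`, `bootstrap_demos`, and the dict-based examples are hypothetical stand-ins.

```python
def exact_match(example, prediction):
    """Metric: case-insensitive string equality against the gold answer."""
    return example["answer"].strip().lower() == prediction.strip().lower()


def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Collect question/answer pairs that the program already gets right.

    The collected pairs can later be placed in a prompt as few-shot examples.
    """
    demos = []
    for example in trainset:
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) >= max_demos:
            break
    return demos


def toy_program(question):
    """Hypothetical stand-in for an LLM call: a lookup table with one wrong entry."""
    canned = {
        "What is the capital of France?": "Paris",
        "What is the capital of Japan?": "Kyoto",  # deliberately wrong
        "What is the capital of Italy?": "rome",
    }
    return canned.get(question, "")


trainset = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is the capital of Japan?", "answer": "Tokyo"},
    {"question": "What is the capital of Italy?", "answer": "Rome"},
]

demos = bootstrap_demos(toy_program, trainset, exact_match)
print([d["question"] for d in demos])
```

Only the France and Italy examples survive, because the Japan prediction fails the metric. The metric acts as the filter that decides which traces are good enough to teach from.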

RAG Architecture (Production)

code

Query → Rewrite → Hybrid Retrieval → Rerank → Generate → Cite
         │              │                │
         v              v                v
    Query expansion  Dense + BM25   Cross-encoder
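
The diagram does not say how the dense and BM25 result lists are merged. One common option is reciprocal rank fusion (RRF), which needs only the rank positions and no score normalization. The sketch below assumes RRF as the fusion step; the function name, document IDs, and `k=60` default are illustrative choices, not something this guide prescribes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical dense-retriever results
bm25_hits = ["doc_b", "doc_d", "doc_a"]   # hypothetical BM25 results

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

With these toy inputs, `doc_b` ranks first because it sits near the top of both lists. The fused list would then be passed to the reranking stage.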

LlamaIndex + LangGraph Pattern:

python

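from typing import TypedDict

from llama_index.core import VectorStoreIndex
from langgraph.graph import StateGraph, START, END

# Shared state passed between graph nodes
class State(TypedDict):
    question: str
    context: str
    sources: list
    answer: str

# Data layer (LlamaIndex); docs is a list of already loaded documents
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()

# Control layer (LangGraph)
def retrieve(state):
    response = query_engine.query(state["question"])
    return {"context": response.response, "sources": response.source_nodes}

graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate_answer)  # generate_answer is defined elsewhere
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
app = graph.compile()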

MCP Integration

Model Context Protocol (MCP) is an open standard for exposing tools and context to LLM applications:

python

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-tools")

@mcp.tool()
async def search_docs(query: str) -> str:
    """Search the knowledge base."""
    # vector_store and format_results are application-specific helpers defined elsewhere
    results = await vector_store.search(query)
    return format_results(results)
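
On the wire, MCP messages are JSON-RPC 2.0. When a client invokes the `search_docs` tool, it sends a `tools/call` request and receives a result carrying content items. The sketch below builds both messages with the standard library only, to show their shape. The helper names, the query string, and the result text are made up for illustration.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def make_text_result(request_id, text):
    """Build the matching response carrying a single text content item."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }


request = make_tool_call(1, "search_docs", {"query": "refund policy"})
response = make_text_result(request["id"], "3 matching passages found")

print(json.dumps(request, indent=2))
print(json.dumps(response, indent=2))
```

In practice an MCP SDK builds and parses these messages, so application code rarely handles them directly. Seeing the raw shape mainly helps when debugging a transport or reading server logs.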

Embeddings (2026)

Model	Dimensions	Best For
text-embedding-3-large	3072	General purpose
BGE-M3	1024	Multilingual RAG
Qwen3-Embedding	Flexible	Custom domains
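
Dimension count drives storage cost directly. The sketch below estimates the raw size of the vectors alone, assuming float32 values and ignoring index overhead, metadata, and any quantization. The one-million-vector corpus size is a hypothetical figure chosen for illustration.

```python
def raw_index_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone (float32 uses 4 bytes per value)."""
    return num_vectors * dimensions * bytes_per_value


NUM_VECTORS = 1_000_000  # hypothetical corpus size

for model, dims in [("text-embedding-3-large", 3072), ("BGE-M3", 1024)]:
    gb = raw_index_bytes(NUM_VECTORS, dims) / 1e9
    print(f"{model}: {gb:.2f} GB")
```

Under these assumptions, one million 3072-dimensional vectors take about 12.29 GB, versus about 4.10 GB at 1024 dimensions. That threefold gap is worth weighing against any retrieval-quality gain from the larger model.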

Fine-Tuning with LoRA/QLoRA

python

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)

model = get_peft_model(base_model, config)
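# For QLoRA, load base_model in 4-bit first (e.g. with bitsandbytes);
# training can then fit in ~24 GB VRAM (e.g. an RTX 4090), depending on model size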
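
To see why LoRA is cheap to train, it helps to count the parameters it adds. Each adapted linear layer gains two low-rank matrices, so the added parameter count is `r * (d_in + d_out)`. The sketch below applies that formula to the `r=16` config, assuming a hypothetical model with hidden size 4096, 32 layers, and square `q_proj`/`v_proj` projections (models using grouped-query attention have a smaller `v_proj`).

```python
def lora_params(r, d_in, d_out):
    """Trainable parameters LoRA adds to one linear layer.

    The update is B @ A, with A of shape (r, d_in) and B of shape (d_out, r).
    """
    return r * (d_in + d_out)


R, ALPHA = 16, 32      # r and lora_alpha from the LoRA config
HIDDEN = 4096          # assumed hidden size
NUM_LAYERS = 32        # assumed number of transformer layers
TARGETS_PER_LAYER = 2  # q_proj and v_proj

per_module = lora_params(R, HIDDEN, HIDDEN)
total = per_module * TARGETS_PER_LAYER * NUM_LAYERS
scaling = ALPHA / R  # the low-rank update is scaled by lora_alpha / r

print(per_module, total, scaling)
```

Under these assumptions the adapter has 8,388,608 trainable parameters, a small fraction of a multi-billion-parameter base model. Raising `r` or adding more `target_modules` increases that count linearly.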

MLOps Pipeline

python

import mlflow

# MLflow tracking
mlflow.set_experiment("rag-v2")

with mlflow.start_run():
    mlflow.log_params({"chunk_size": 512, "model": "gpt-4"})
    mlflow.log_metrics({"faithfulness": 0.92, "relevance": 0.88})
    mlflow.log_artifact("prompts/qa.txt")

Evaluation with RAGAS

python

from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

results = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(results)  # {'faithfulness': 0.92, 'answer_relevancy': 0.88, ...}
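
To make the context precision metric less opaque, the sketch below computes a precision-at-k weighted score from a list of per-chunk relevance flags. It follows the definition as commonly described (mean of precision@k over the ranks that hold a relevant chunk) and is an approximation for intuition, not RAGAS's own code. RAGAS obtains the relevance judgments from an LLM, whereas here they are supplied by hand.

```python
def context_precision(relevance):
    """Average of precision@k over the ranks k that hold a relevant chunk.

    `relevance` is a list of 0/1 flags for the retrieved chunks, in rank order.
    """
    hits = 0
    weighted_sum = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted_sum += hits / k  # precision@k at a relevant rank
    return weighted_sum / hits if hits else 0.0


print(context_precision([1, 0, 1]))  # relevant chunks at ranks 1 and 3
print(context_precision([0, 0, 1]))  # the only relevant chunk is ranked last
```

The second call scores lower than the first even though both retrieve a relevant chunk. The metric rewards retrievers that place relevant chunks near the top.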

Vector Database Selection

DB	Best For	Pricing
Qdrant	Self-hosted, filtering	1GB free forever
Pinecone	Managed, zero-ops	Free tier available
Weaviate	Knowledge graphs	14-day trial
Milvus	Billion-scale	Self-hosted

Agents

• ai-engineer - LLM integration, RAG, MCP, production AI
• mlops-engineer - Model deployment, monitoring, pipelines
• data-scientist - Analysis, modeling, experimentation
• ml-researcher - Cutting-edge architectures, paper implementation
• cv-engineer - Computer vision, VLMs, image processing
