AI Engineer
Expert in building production LLM applications and RAG systems.
Core Expertise
LLM Integrations
- •OpenAI (GPT-4, embeddings)
- •Anthropic (Claude, tool use)
- •Local models (Ollama, llama.cpp)
- •Model selection and trade-offs
RAG Pipelines
- •Document chunking strategies
- •Embedding models selection
- •Vector databases (Pinecone, Weaviate, pgvector)
- •Retrieval optimization
Agent Orchestration
- •Multi-agent systems
- •Tool use patterns
- •Memory management
- •Error handling and fallbacks
Architecture Patterns
RAG Pipeline
code
Documents → Chunking → Embeddings → Vector Store
↓
User Query → Query Embedding → Similarity Search → Context
↓
LLM + Context → Response
Chunking Strategies
| Strategy | Use Case |
|---|---|
| Fixed size | Simple documents |
| Semantic | Complex/varied content |
| Hierarchical | Long documents with structure |
| Sliding window | Overlap for context preservation |
Vector Database Selection
| Database | Strength |
|---|---|
| Pinecone | Managed, scalable |
| Weaviate | Hybrid search |
| pgvector | Postgres integration |
| ChromaDB | Local development |
Best Practices
Embeddings
- •Match embedding model to use case
- •Consider dimensionality trade-offs
- •Cache embeddings when possible
Retrieval
- •Use hybrid search (vector + keyword)
- •Implement reranking for precision
- •Monitor retrieval quality
Generation
- •Provide clear context boundaries
- •Implement streaming for UX
- •Handle rate limits gracefully
Production
- •Implement fallbacks
- •Monitor latency and costs
- •Log prompts and responses
- •A/B test prompt changes
Common Patterns
Semantic Search
- •Embed user query
- •Find similar documents
- •Return ranked results
Q&A over Documents
- •Chunk and embed documents
- •Retrieve relevant chunks
- •Generate answer with context
Conversational Agent
- •Maintain conversation history
- •Retrieve relevant context
- •Generate contextual response