🤖 LLM Application Patterns

Production-ready patterns for building LLM applications, inspired by Dify and industry best practices.

When to Use This Skill

Use this skill when:

•Designing LLM-powered applications
•Implementing RAG (Retrieval-Augmented Generation)
•Building AI agents with tools
•Setting up LLMOps monitoring
•Choosing between agent architectures

1. RAG Pipeline Architecture

RAG (Retrieval-Augmented Generation) grounds LLM responses in your data.

mermaid

graph LR
    A[Ingest Documents] --> B[Retrieve Context]
    B --> C[Generate Response]
    A --> D[Chunking/Embedding]
    B --> E[Vector Search]
    C --> F[LLM + Context]

1.1 Document Ingestion & Chunking

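Common strategies are fixed-size splitting (constant character or token windows, usually with overlap), semantic splitting (breaking where the topic shifts), recursive splitting (trying progressively finer separators, such as paragraphs and then sentences), and document-aware splitting (following structure such as headings or code blocks). 👉 View Code Example: Chunking Strategies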
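
As a minimal sketch of the fixed-size strategy, the function below splits on character counts with an overlap between neighbouring chunks. The name `chunk_fixed` and its default sizes are illustrative assumptions, not values prescribed by this skill; production splitters usually count tokens rather than characters.

```python
def chunk_fixed(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_fixed("abcdefghij", chunk_size=4, overlap=2))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.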

1.2 Embedding & Storage

Choose a vector database (Pinecone, Weaviate, Chroma, pgvector) and an embedding model that fit your scale, latency, and hosting constraints. 👉 View Code Example: Vector DB & Embeddings
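
To show the shape of the embed-store-search cycle without depending on a particular vendor, here is a small in-memory sketch. `toy_embed` is a hypothetical stand-in for a real embedding model (it hashes words into buckets), and `InMemoryVectorStore` is an assumed name; neither reflects the API of the databases listed above.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical embedding: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Keeps (id, vector, text) tuples in a list and scans them on every search."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float], str]] = []

    def add(self, doc_id: str, text: str) -> None:
        self._items.append((doc_id, toy_embed(text), text))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = toy_embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec, _ in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("refunds", "refund policy")
    store.add("shipping", "shipping times")
    print(store.search("refund policy", k=1))
```

A real vector database replaces the linear scan with an approximate nearest-neighbour index, which is what makes search practical beyond a few thousand documents.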

1.3 Retrieval Strategies

•Semantic Search: Standard embedding similarity.
•Hybrid Search: Semantic + Keyword (BM25) with Reciprocal Rank Fusion (RRF).
•Multi-query: Generating variations for better recall.
•Contextual Compression: Filtering relevant parts before generation.

👉 View Code Example: Retrieval Logic
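
The hybrid search bullet relies on Reciprocal Rank Fusion to merge the semantic and keyword rankings. A minimal sketch follows: each document scores the sum of `1 / (k + rank)` over the lists it appears in. The function name is an assumption, and `k = 60` is a commonly used default rather than a value this skill specifies.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one list, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


if __name__ == "__main__":
    semantic_hits = ["a", "b", "c"]  # e.g. from embedding similarity
    keyword_hits = ["b", "d", "a"]   # e.g. from BM25
    print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
```

Because RRF uses only ranks, it needs no normalization between embedding scores and BM25 scores, which live on different scales.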

1.4 Generation with Context

Prompting the LLM with retrieved context and handling citations. 👉 View Code Example: RAG Generation
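
One way to handle citations is to number the retrieved passages in the prompt and map the bracketed numbers in the answer back to sources. The sketch below assumes passages are dictionaries with `source` and `text` keys; both helper names and the prompt wording are illustrative, not taken from this skill.

```python
import re


def build_rag_prompt(question: str, passages: list[dict]) -> str:
    """Number each passage so the model can cite it as [n]."""
    context = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite passages by their bracketed number, for example [1]. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def extract_citations(answer: str, passages: list[dict]) -> list[str]:
    """Map [n] markers to sources, ignoring numbers that match no passage."""
    cited: list[str] = []
    for number in re.findall(r"\[(\d+)\]", answer):
        index = int(number) - 1
        if 0 <= index < len(passages) and passages[index]["source"] not in cited:
            cited.append(passages[index]["source"])
    return cited


if __name__ == "__main__":
    docs = [{"source": "faq.md", "text": "Refunds take 5 days."}]
    print(build_rag_prompt("How long do refunds take?", docs))
    print(extract_citations("Refunds take 5 days [1].", docs))
```

Dropping out-of-range markers guards against the model inventing a citation number that matches no retrieved passage.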

2. Agent Architectures

2.1 ReAct Pattern (Reasoning + Acting)

The agent interleaves thought, action, and observation steps to solve reasoning tasks. 👉 View Code Example: ReAct Agent
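
The thought, action, observation cycle can be sketched as a loop that parses the model's output, runs the named tool, and appends the result to the transcript. Everything here is an assumption made for illustration: the `Action: tool[input]` text format, the `react_loop` name, and the `scripted_llm` stub that stands in for a real model call.

```python
import re
from typing import Callable

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")


def react_loop(
    llm: Callable[[str], str],
    tools: dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)
        transcript += output + "\n"
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, argument = match.groups()
        tool = tools.get(name)
        observation = tool(argument) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached"


def scripted_llm(transcript: str) -> str:
    """Stub model: look something up once, then answer with the observation."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[capital_of_france]"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: The lookup answered it.\nFinal Answer: {observation}"


if __name__ == "__main__":
    facts = {"capital_of_france": "Paris"}
    tools = {"lookup": lambda key: facts.get(key, "not found")}
    print(react_loop(scripted_llm, tools, "What is the capital of France?"))
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop indefinitely.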

2.2 Function Calling Pattern

The model selects a tool and emits structured arguments that match a JSON Schema definition; provider APIs such as OpenAI's and Anthropic's support this natively. 👉 View Code Example: Function Calling
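
The application side of function calling has two parts: a JSON Schema description of each tool, and a dispatcher that validates the model's arguments before running the function. The sketch below is provider-neutral and makes no network calls; `get_weather`, the `REGISTRY` layout, and the canned return value are hypothetical, and each provider wraps the schema in its own field names.

```python
import json

GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}


def get_weather(city: str, unit: str = "celsius") -> dict:
    # Canned data for the sketch; a real tool would query a weather service.
    return {"city": city, "unit": unit, "temperature": 21}


REGISTRY = {"get_weather": (GET_WEATHER_TOOL, get_weather)}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Validate a tool call emitted by the model and return a JSON result."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    schema, func = REGISTRY[name]
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON arguments: {exc}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    params = schema["parameters"]
    missing = [key for key in params["required"] if key not in args]
    unknown = [key for key in args if key not in params["properties"]]
    if missing or unknown:
        return json.dumps({"error": "bad arguments", "missing": missing, "unknown": unknown})
    return json.dumps(func(**args))


if __name__ == "__main__":
    print(dispatch_tool_call("get_weather", '{"city": "Oslo"}'))
```

Returning errors as JSON instead of raising lets the error text go back to the model as the tool result, so it can correct its own call.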

2.3 Plan-and-Execute Pattern

Separating planning (high-level steps) from execution (doing the work) to handle complex, long-horizon tasks. 👉 View Code Example: Plan-and-Execute
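
A minimal sketch of the split: one callable produces the whole plan up front, and another executes each step with access to earlier results. `stub_planner` and `stub_executor` are hypothetical placeholders for LLM calls, and the `max_steps` cap is an assumption added to bound the work.

```python
from typing import Callable


def plan_and_execute(
    goal: str,
    planner: Callable[[str], list[str]],
    executor: Callable[[str, list[tuple[str, str]]], str],
    max_steps: int = 10,
) -> list[tuple[str, str]]:
    """Plan once, then run each step, passing along what has been completed."""
    plan = planner(goal)[:max_steps]
    completed: list[tuple[str, str]] = []
    for step in plan:
        outcome = executor(step, completed)
        completed.append((step, outcome))
    return completed


def stub_planner(goal: str) -> list[str]:
    return [f"research {goal}", f"outline {goal}", f"draft {goal}"]


def stub_executor(step: str, completed: list[tuple[str, str]]) -> str:
    return f"done: {step} (after {len(completed)} earlier steps)"


if __name__ == "__main__":
    for step, outcome in plan_and_execute("report", stub_planner, stub_executor):
        print(step, "->", outcome)
```

A fuller version would let the planner revise the remaining steps when an execution result invalidates the plan; this sketch keeps the plan fixed.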

2.4 Multi-Agent Collaboration

Specialized agents (Researcher, Writer, Critic) working together with a coordinator. 👉 View Code Example: Multi-Agent Team
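
The coordinator's job can be sketched as a small loop: gather notes, draft, and send the draft back for revision until the critic has no feedback. The three agent functions below are hypothetical stubs with hard-coded behaviour; in a real system each would wrap its own prompt and model call.

```python
def researcher(topic: str) -> list[str]:
    return [f"fact about {topic}"]


def writer(topic: str, notes: list[str], feedback: str = "") -> str:
    draft = f"{topic}: " + "; ".join(notes)
    if feedback:
        draft += f" (revised: {feedback})"
    return draft


def critic(draft: str) -> str:
    """Return feedback, or an empty string when the draft is acceptable."""
    return "" if "revised" in draft else "add a concrete example"


def coordinate(topic: str, max_rounds: int = 3) -> str:
    """Researcher -> Writer -> Critic, looping on feedback up to max_rounds."""
    notes = researcher(topic)
    draft = writer(topic, notes)
    for _ in range(max_rounds):
        feedback = critic(draft)
        if not feedback:
            break
        draft = writer(topic, notes, feedback)
    return draft


if __name__ == "__main__":
    print(coordinate("caching"))
```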

3. Prompt IDE Patterns

3.1 Prompt Templates with Variables

Managing dynamic prompts with validation and few-shot examples. 👉 View Code Example: Prompt Templates

3.2 Prompt Versioning & A/B Testing

Tracking prompt versions, running A/B tests, and recording outcomes. 👉 View Code Example: Prompt Registry
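
A registry for this needs three things: append-only versions, a stable assignment of users to variants, and outcome counters per version. The sketch below keeps everything in memory; the `PromptRegistry` name, the hash-based bucketing, and the success/trial counters are assumptions chosen for illustration.

```python
import hashlib
from typing import Optional


class PromptRegistry:
    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}
        self._outcomes: dict[tuple[str, int], list[int]] = {}  # [successes, trials]

    def register(self, name: str, template: str) -> int:
        """Store a new version and return its 1-based version number."""
        self._versions.setdefault(name, []).append(template)
        return len(self._versions[name])

    def get(self, name: str, version: int) -> str:
        return self._versions[name][version - 1]

    def assign(self, name: str, user_id: str, candidates: list[int]) -> int:
        """Deterministically bucket a user so they always see the same version."""
        digest = hashlib.sha256(f"{name}:{user_id}".encode("utf-8")).hexdigest()
        return candidates[int(digest, 16) % len(candidates)]

    def record(self, name: str, version: int, success: bool) -> None:
        counts = self._outcomes.setdefault((name, version), [0, 0])
        counts[0] += int(success)
        counts[1] += 1

    def success_rate(self, name: str, version: int) -> Optional[float]:
        successes, trials = self._outcomes.get((name, version), [0, 0])
        return successes / trials if trials else None


if __name__ == "__main__":
    registry = PromptRegistry()
    registry.register("greet", "Hi {name}")
    registry.register("greet", "Hello {name}, how can I help?")
    version = registry.assign("greet", "user-42", [1, 2])
    registry.record("greet", version, success=True)
    print(version, registry.success_rate("greet", version))
```

Hashing the user id rather than picking at random means a returning user stays in the same arm without any stored assignment.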

3.3 Prompt Chaining

Sequencing multiple prompts where the output of one becomes the input of the next (e.g., Research -> Analyze -> Summarize). 👉 View Code Example: Prompt Chaining
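
In its simplest form a chain is a fold over templates, where each model output is substituted into the next prompt. In this sketch `run_chain` and the `{input}` placeholder are assumed conventions, and `echo_llm` is a stub that wraps its prompt in angle brackets so the data flow stays visible.

```python
from typing import Callable


def run_chain(llm: Callable[[str], str], steps: list[str], initial_input: str) -> str:
    """Feed each step's output into the {input} slot of the next template."""
    value = initial_input
    for template in steps:
        value = llm(template.format(input=value))
    return value


def echo_llm(prompt: str) -> str:
    # Stub model used only to make the chaining order observable.
    return f"<{prompt}>"


if __name__ == "__main__":
    steps = ["Research: {input}", "Analyze: {input}", "Summarize: {input}"]
    print(run_chain(echo_llm, steps, "RAG"))
```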

4. LLMOps & Observability

4.1 Metrics to Track

Key metrics include Latency (p50/p99), Quality (user satisfaction, hallucination rate), Cost, and Reliability (error rate). 👉 View Code Example: Metrics Dictionary
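
To make the latency percentiles concrete, this sketch computes p50 and p99 with the nearest-rank method and rolls up error rate and cost from per-request records. The record fields (`latency_ms`, `ok`, `cost_usd`) and the function names are assumptions; a metrics backend would normally do this aggregation for you.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records: list[dict]) -> dict:
    latencies = [r["latency_ms"] for r in records]
    return {
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p99_ms": percentile(latencies, 99),
        "error_rate": sum(1 for r in records if not r["ok"]) / len(records),
        "total_cost_usd": sum(r["cost_usd"] for r in records),
    }


if __name__ == "__main__":
    sample = [
        {"latency_ms": i, "ok": i % 10 != 0, "cost_usd": 0.5}
        for i in range(1, 101)
    ]
    print(summarize(sample))
```

Tracking p99 alongside p50 matters because LLM latency is long-tailed: the median can look healthy while a small share of requests take many times longer.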

4.2 Logging & Tracing

Structured logging of requests/responses and distributed tracing (OpenTelemetry) to visualize chains. 👉 View Code Example: Logging & Tracing
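
The sketch below illustrates the idea with the standard library only: each step of a chain emits one JSON log line carrying a shared `trace_id`, its own `span_id`, a status, and a duration. It deliberately does not use the OpenTelemetry SDK; `traced_step` and the field names are assumptions, and a real deployment would export equivalent spans through OpenTelemetry.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager

logger = logging.getLogger("llm_app")


@contextmanager
def traced_step(name: str, trace_id: str, **fields):
    """Time a step and log one JSON record for it, whether it succeeds or fails."""
    start = time.perf_counter()
    record = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "step": name,
        **fields,
    }
    try:
        yield record  # the caller may attach extra fields, e.g. token counts
        record["status"] = "ok"
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        logger.info(json.dumps(record))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    trace_id = uuid.uuid4().hex
    with traced_step("retrieve", trace_id, query="refund policy") as span:
        span["hits"] = 3
    with traced_step("generate", trace_id, model="example-model") as span:
        span["output_tokens"] = 42
```

Filtering the logs by `trace_id` then reconstructs the full chain for a single request.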

4.3 Evaluation Framework

Systematically scoring responses for Relevance, Coherence, Groundedness, and Accuracy. 👉 View Code Example: Custom Evaluator
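
As a rough sketch of what an evaluator returns, the code below scores groundedness and relevance with word overlap. These are crude lexical heuristics chosen so the example runs without a model; the function names and scoring rules are assumptions, and real evaluators typically use an LLM judge or human labels for coherence and accuracy.

```python
import re


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(subject: str, reference: str) -> float:
    """Fraction of the subject's distinct words that also occur in the reference."""
    subject_tokens = _tokens(subject)
    if not subject_tokens:
        return 0.0
    return len(subject_tokens & _tokens(reference)) / len(subject_tokens)


def evaluate(question: str, answer: str, context: str) -> dict:
    return {
        # Groundedness: how much of the answer is supported by the context.
        "groundedness": overlap(answer, context),
        # Relevance: how much of the question the answer touches on.
        "relevance": overlap(question, answer),
    }


if __name__ == "__main__":
    print(
        evaluate(
            question="How long do refunds take?",
            answer="Refunds take five days",
            context="Refunds take five business days.",
        )
    )
```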

5. Production Patterns

5.1 Caching Strategy

Semantic or exact caching (Redis) to reduce costs and latency for repeated queries. 👉 View Code Example: LLM Cache
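
Exact caching reduces to hashing everything that influences the output (model, prompt, parameters) into a key. The sketch below uses an in-memory dictionary where the text mentions Redis, so that it runs without a server; `ExactLLMCache`, the TTL default, and the injectable clock are assumptions made for illustration.

```python
import hashlib
import json
import time
from typing import Callable


class ExactLLMCache:
    def __init__(self, ttl_seconds: float = 3600, clock: Callable[[], float] = time.monotonic):
        self._store: dict[str, tuple[float, str]] = {}
        self._ttl = ttl_seconds
        self._clock = clock

    @staticmethod
    def make_key(model: str, prompt: str, params: dict) -> str:
        # sort_keys makes the key independent of parameter ordering.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, params: dict, call) -> str:
        key = self.make_key(model, prompt, params)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: skip the model call entirely
        value = call(model, prompt, params)
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = ExactLLMCache()
    fake_model = lambda model, prompt, params: f"answer to {prompt}"
    print(cache.get_or_call("m", "hi", {"temperature": 0}, fake_model))
    print(cache.get_or_call("m", "hi", {"temperature": 0}, fake_model))
```

Exact caching only pays off for deterministic settings and literally repeated prompts; semantic caching instead matches on embedding similarity, trading some risk of a wrong hit for a higher hit rate.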

5.2 Rate Limiting & Retry

Handling API limits and transient errors with exponential backoff strategies. 👉 View Code Example: Rate Limiter & Retry
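
A minimal sketch of the retry half follows: the delay doubles after each failed attempt, is capped, and is scaled by random jitter so that many clients do not retry in lockstep. `TransientError` is a hypothetical exception standing in for a provider's rate-limit or timeout errors, and the defaults are illustrative. Client-side rate limiting, such as a token bucket, is not shown.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (rate limits, timeouts)."""


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * jitter())  # jitter() is in [0, 1): "full jitter"
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    attempts = []

    def flaky() -> str:
        attempts.append(1)
        if len(attempts) < 3:
            raise TransientError("rate limited")
        return "ok"

    print(retry_with_backoff(flaky, base_delay=0.01))
```

Only errors classified as transient are retried; a validation or authentication error would fail immediately instead of wasting attempts.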

5.3 Fallback Strategy

Automatically switching to cheaper/faster or more capable models when the primary model fails. 👉 View Code Example: Model Fallback
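
Fallback can be sketched as trying an ordered list of models and returning the first success along with the model that produced it. The function name and the model names in the usage example are placeholders; `call` stands for whatever client function your application uses.

```python
from typing import Callable


def call_with_fallback(
    prompt: str,
    models: list[str],
    call: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, response) for the first success."""
    errors = []
    for model in models:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # a real system would catch provider errors only
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


if __name__ == "__main__":
    def fake_call(model: str, prompt: str) -> str:
        if model == "primary":
            raise TimeoutError("primary timed out")
        return f"{model} says hi"

    print(call_with_fallback("hello", ["primary", "backup"], fake_call))
```

Returning the model name with the response lets the caller log which tier served the request, which feeds directly into the cost and reliability metrics.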

Architecture Decision Matrix

Pattern	Use When	Complexity	Cost
Simple RAG	FAQ, docs search	Low	Low
Hybrid RAG	Mixed queries	Medium	Medium
ReAct Agent	Multi-step tasks	Medium	Medium
Function Calling	Structured tools	Low	Low
Plan-Execute	Complex tasks	High	High
Multi-Agent	Research tasks	Very High	Very High
