
llm-app-patterns

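A set of proven patterns for LLM application development, covering RAG pipelines, agent architectures, prompt IDEs, and LLMOps monitoring. Use it when designing AI applications, implementing RAG, building agents, or setting up LLM observability.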

SKILL.md
--- frontmatter
name: llm-app-patterns
description: Production-ready patterns for building LLM applications. Covers RAG pipelines, agent architectures, prompt IDEs, and LLMOps monitoring. Use when designing AI applications, implementing RAG, building agents, or setting up LLM observability.

🤖 LLM Application Patterns

Production-ready patterns for building LLM applications, inspired by Dify and industry best practices.

When to Use This Skill

Use this skill when:

  • Designing LLM-powered applications
  • Implementing RAG (Retrieval-Augmented Generation)
  • Building AI agents with tools
  • Setting up LLMOps monitoring
  • Choosing between agent architectures

1. RAG Pipeline Architecture

RAG (Retrieval-Augmented Generation) grounds LLM responses in your data.

mermaid
graph LR
    A[Ingest Documents] --> B[Retrieve Context]
    B --> C[Generate Response]
    A --> D[Chunking/Embedding]
    B --> E[Vector Search]
    C --> F[LLM + Context]

1.1 Document Ingestion & Chunking

Strategies include Fixed-size, Semantic, Recursive, and Document-aware splitting. 👉 View Code Example: Chunking Strategies
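
As a minimal sketch of the fixed-size strategy, the function below splits text into overlapping character windows. The name `chunk_fixed` and its default sizes are illustrative assumptions, not values taken from this skill; production splitters usually count tokens rather than characters.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    `size` and `overlap` are hypothetical defaults chosen for illustration.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.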

1.2 Embedding & Storage

Selecting the right Vector DB (Pinecone, Weaviate, Chroma, Pgvector) and Embedding Model. 👉 View Code Example: Vector DB & Embeddings
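
To make the storage step concrete, here is a toy in-memory store that ranks vectors by cosine similarity. It is a stand-in for a real vector database, not the API of Pinecone, Weaviate, Chroma, or Pgvector; the class name `InMemoryVectorStore` and its methods are assumptions for illustration, and the vectors would normally come from an embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical brute-force store: fine for a demo, not for large corpora."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._items.append((doc_id, list(vector)))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Score every stored vector, then keep the k most similar.
        scored = [(doc_id, cosine(query, vec)) for doc_id, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

A dedicated vector database replaces the linear scan with an approximate nearest-neighbour index, which is the main reason to adopt one as the corpus grows.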

1.3 Retrieval Strategies

  • Semantic Search: Standard embedding similarity.
  • Hybrid Search: Semantic + Keyword (BM25) with Reciprocal Rank Fusion (RRF).
  • Multi-query: Generating variations for better recall.
  • Contextual Compression: Filtering relevant parts before generation. 👉 View Code Example: Retrieval Logic
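
The hybrid search bullet above can be sketched with Reciprocal Rank Fusion, which merges ranked lists using only each document's rank. The function name is illustrative, and `k=60` is the constant commonly used for RRF rather than a value prescribed here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["d1", "d2", "d3"]  # hypothetical embedding-search results
keyword_hits = ["d3", "d1", "d4"]   # hypothetical BM25 results
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Because RRF ignores raw scores, it avoids having to normalize BM25 scores against cosine similarities.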

1.4 Generation with Context

Prompting the LLM with retrieved context and handling citations. 👉 View Code Example: RAG Generation
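
One way to handle citations is to number the retrieved chunks in the prompt and then check which numbers the answer actually cites. The sketch below assumes a `[n]` citation convention and hypothetical helper names; the prompt wording is an example, not a recommended template.

```python
import re


def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources as [n]. If the sources are insufficient, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


def extract_citations(answer: str, num_sources: int) -> list[int]:
    """Return the valid source numbers cited in the answer, sorted and deduplicated."""
    cited = {int(match) for match in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if 1 <= n <= num_sources)
```

Dropping citation numbers that fall outside the source list is a cheap guard against the model inventing references.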


2. Agent Architectures

2.1 ReAct Pattern (Reasoning + Acting)

The agent interleaves thought, action, and observation steps to solve reasoning tasks. 👉 View Code Example: ReAct Agent
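
The thought/action/observation loop can be sketched as follows. The `Action: tool[input]` and `Final Answer:` text conventions, the toy `calculator` tool, and the `scripted_llm` stub that replaces a real model are all assumptions made so the example runs offline.

```python
import re
from typing import Callable, Optional


def calculator(expression: str) -> str:
    """Toy tool (hypothetical): only handles 'a + b' with integers."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}


def react_loop(llm: Callable[[str], str], question: str, max_steps: int = 5) -> Optional[str]:
    """Alternate model replies with tool observations until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = re.search(r"Final Answer:\s*(.+)", reply)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", reply)
        if action:
            name, arg = action.groups()
            tool = TOOLS.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
            transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted without a final answer


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: acts once, then answers."""
    if "Observation:" not in transcript:
        return "Thought: I need to add the numbers.\nAction: calculator[2 + 3]"
    return "Thought: I have the result.\nFinal Answer: 5"


answer = react_loop(scripted_llm, "What is 2 + 3?")
```

The `max_steps` cap matters in practice, since a model that never emits a final answer would otherwise loop indefinitely.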

2.2 Function Calling Pattern

Using structured tool definitions (JSON schema) natively supported by LLMs (OpenAI, Anthropic). 👉 View Code Example: Function Calling
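
A provider-neutral sketch of this pattern: a tool is described with a JSON Schema, and the application dispatches the call the model requests. The exact request and response envelope differs between providers, so the `tool_call` dictionary shape, `get_weather`, and `dispatch` below are hypothetical and should be adapted to the SDK in use.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical tool that returns canned data instead of calling an API."""
    return {"city": city, "temperature": 21, "unit": unit}


WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {  # JSON Schema describing the arguments
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

REGISTRY = {"get_weather": (WEATHER_TOOL, get_weather)}


def dispatch(tool_call: dict) -> str:
    """Run the requested tool and return a JSON string to send back to the model."""
    name = tool_call["name"]
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    schema, fn = REGISTRY[name]
    args = json.loads(tool_call["arguments"])  # assumed to arrive as a JSON string
    missing = [p for p in schema["parameters"]["required"] if p not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    return json.dumps(fn(**args))
```

Returning errors as data rather than raising lets the model see what went wrong and retry with corrected arguments.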

2.3 Plan-and-Execute Pattern

Separating planning (high-level steps) from execution (doing the work) to handle complex, long-horizon tasks. 👉 View Code Example: Plan-and-Execute
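
The split between planning and execution can be sketched with two callables. In a real system both would be LLM calls; here `toy_planner` and `toy_executor` are hypothetical stubs so the control flow is visible and runnable.

```python
from typing import Callable


def plan_and_execute(
    planner: Callable[[str], list[str]],
    executor: Callable[[str, list[str]], str],
    goal: str,
) -> list[str]:
    """Ask the planner for steps once, then execute them in order.

    Each step receives the results of earlier steps as context.
    """
    steps = planner(goal)
    results: list[str] = []
    for step in steps:
        results.append(executor(step, list(results)))
    return results


def toy_planner(goal: str) -> list[str]:
    # Stand-in for an LLM call that decomposes the goal.
    return [f"research {goal}", f"outline {goal}", f"write {goal}"]


def toy_executor(step: str, previous: list[str]) -> str:
    # Stand-in for an LLM or tool call that performs one step.
    return f"done: {step} (saw {len(previous)} earlier results)"


results = plan_and_execute(toy_planner, toy_executor, "report")
```

A common extension, not shown here, is to re-plan when a step fails instead of following the initial plan to the end.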

2.4 Multi-Agent Collaboration

Specialized agents (Researcher, Writer, Critic) working together with a coordinator. 👉 View Code Example: Multi-Agent Team
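
A coordinator for the Researcher, Writer, and Critic roles might look like the sketch below. The agents are plain functions standing in for separately prompted LLM calls, and the rule that the critic returns `None` to approve a draft is an assumption made for illustration.

```python
from typing import Callable, Optional


def run_team(agents: dict[str, Callable], task: str, max_rounds: int = 3) -> str:
    """Coordinator: research once, then loop writer and critic until approval."""
    notes = agents["researcher"](task)
    draft = agents["writer"](task, notes, None)
    for _ in range(max_rounds):
        feedback = agents["critic"](draft)
        if feedback is None:  # critic approves the draft
            break
        draft = agents["writer"](task, notes, feedback)
    return draft


def researcher(task: str) -> str:
    return f"notes on {task}"


def writer(task: str, notes: str, feedback: Optional[str]) -> str:
    suffix = " (revised)" if feedback else ""
    return f"Draft about {task} using {notes}{suffix}"


def critic(draft: str) -> Optional[str]:
    # Toy rule: request exactly one revision.
    return None if "(revised)" in draft else "add more detail"


final_draft = run_team(
    {"researcher": researcher, "writer": writer, "critic": critic}, "vector databases"
)
```

The `max_rounds` bound keeps the writer and critic from cycling forever, which is one source of the high cost this pattern tends to carry.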


3. Prompt IDE Patterns

3.1 Prompt Templates with Variables

Managing dynamic prompts with validation and few-shot examples. 👉 View Code Example: Prompt Templates
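
Variable validation and few-shot examples can be sketched with the standard library alone. The `PromptTemplate` class below is a hypothetical illustration (it is not the class of the same name in any framework), and the `Input:`/`Output:` few-shot layout is an assumed convention.

```python
from string import Formatter


class PromptTemplate:
    """Template with {placeholders}, strict variable checks, and few-shot examples."""

    def __init__(self, template: str, examples: list[tuple[str, str]] = ()) -> None:
        self.template = template
        self.examples = list(examples)
        # Collect the placeholder names that appear in the template.
        self.variables = {
            name for _, name, _, _ in Formatter().parse(template) if name
        }

    def render(self, **values: str) -> str:
        missing = self.variables - set(values)
        extra = set(values) - self.variables
        if missing or extra:
            raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
        shots = "".join(f"Input: {i}\nOutput: {o}\n\n" for i, o in self.examples)
        return shots + self.template.format(**values)
```

Failing fast on a missing or unexpected variable surfaces prompt bugs at render time rather than as odd model output later.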

3.2 Prompt Versioning & A/B Testing

Tracking prompt versions, running A/B tests, and recording outcomes. 👉 View Code Example: Prompt Registry
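
A minimal registry sketch, assuming in-memory storage and hash-based assignment so that a given user always sees the same variant. `PromptRegistry` and its method names are hypothetical; a real deployment would persist versions and outcomes.

```python
import hashlib
from typing import Optional


class PromptRegistry:
    """Stores prompt versions, assigns users to variants, and records outcomes."""

    def __init__(self) -> None:
        self._versions: dict[str, dict[str, str]] = {}
        self._outcomes: dict[tuple[str, str], list[bool]] = {}

    def register(self, name: str, version: str, template: str) -> None:
        self._versions.setdefault(name, {})[version] = template

    def assign(self, name: str, user_id: str) -> str:
        """Deterministically pick a version: same user, same variant."""
        versions = sorted(self._versions.get(name, {}))
        if not versions:
            raise KeyError(f"no versions registered for {name!r}")
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest()
        return versions[int(digest, 16) % len(versions)]

    def record(self, name: str, version: str, success: bool) -> None:
        self._outcomes.setdefault((name, version), []).append(success)

    def success_rate(self, name: str, version: str) -> Optional[float]:
        outcomes = self._outcomes.get((name, version), [])
        return sum(outcomes) / len(outcomes) if outcomes else None
```

Comparing raw success rates is only a starting point; deciding a winner still calls for enough samples and a significance check.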

3.3 Prompt Chaining

Sequencing multiple prompts where the output of one becomes the input of the next (e.g., Research -> Analyze -> Summarize). 👉 View Code Example: Prompt Chaining
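
The Research -> Analyze -> Summarize sequence can be sketched as a loop that feeds each output into the next template. `run_chain` and the `{input}` placeholder are illustrative assumptions, and `echo_llm` replaces a real model so the data flow is easy to follow.

```python
from typing import Callable


def run_chain(
    llm: Callable[[str], str], steps: list[tuple[str, str]], initial_input: str
) -> tuple[str, list[tuple[str, str]]]:
    """Run prompts in order; each step's output becomes the next step's {input}."""
    current = initial_input
    trace: list[tuple[str, str]] = []
    for name, template in steps:
        prompt = template.format(input=current)
        current = llm(prompt)
        trace.append((name, current))  # keep intermediate outputs for debugging
    return current, trace


def echo_llm(prompt: str) -> str:
    # Stand-in for a model call: wraps the prompt so nesting is visible.
    return f"<{prompt}>"


STEPS = [
    ("research", "Research: {input}"),
    ("analyze", "Analyze: {input}"),
    ("summarize", "Summarize: {input}"),
]
final, trace = run_chain(echo_llm, STEPS, "topic")
```

Keeping the trace of intermediate outputs makes it possible to tell which link in the chain produced a bad result.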


4. LLMOps & Observability

4.1 Metrics to Track

Key metrics include latency (p50/p99), quality (user satisfaction, hallucination rate), cost, and reliability. 👉 View Code Example: Metrics Dictionary
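
For the latency figures, p50 and p99 can be computed with the nearest-rank method, as in this sketch. The function names and sample latencies are made up for illustration; monitoring systems often use interpolating or streaming estimators instead.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
    }


sample_ms = [120, 80, 200, 950, 150, 110, 95, 130, 105, 140]  # made-up data
summary = latency_summary(sample_ms)
```

In this sample the single slow request dominates p99 while leaving p50 untouched, which is why both are worth tracking.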

4.2 Logging & Tracing

Structured logging of requests/responses and distributed tracing (OpenTelemetry) to visualize chains. 👉 View Code Example: Logging & Tracing
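
The idea of structured, trace-correlated records can be sketched with the standard library. This is a simplified stand-in and not the OpenTelemetry API: the `span` context manager, the record fields, and the `sink` callback are all assumptions for illustration.

```python
import json
import time
import uuid
from contextlib import contextmanager
from typing import Callable


@contextmanager
def span(name: str, trace_id: str, sink: Callable[[str], None], **attrs):
    """Time a step and emit one JSON line tagged with the shared trace id."""
    start = time.perf_counter()
    record = {"trace_id": trace_id, "span": name, **attrs}
    try:
        yield record  # the caller may attach more fields
        record["status"] = "ok"
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        sink(json.dumps(record))


lines: list[str] = []           # stand-in for a log shipper
trace_id = uuid.uuid4().hex     # one id shared by every step of a request
with span("retrieve", trace_id, lines.append, query="what is RAG?") as rec:
    rec["num_docs"] = 3
with span("generate", trace_id, lines.append, model="example-model") as rec:
    rec["completion_tokens"] = 42
```

Sharing one `trace_id` across the retrieve and generate steps is what lets a tracing backend stitch them into a single request timeline.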

4.3 Evaluation Framework

Systematically scoring responses for Relevance, Coherence, Groundedness, and Accuracy. 👉 View Code Example: Custom Evaluator
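
As a rough sketch of automated scoring, the functions below approximate groundedness and relevance with token overlap. This is a deliberately crude heuristic chosen so the example runs without a model; real evaluators typically use an LLM judge or human labels, and the function names and formulas here are assumptions.

```python
import re


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & _tokens(context)) / len(answer_tokens)


def relevance(answer: str, question: str) -> float:
    """Fraction of question tokens that the answer mentions."""
    question_tokens = _tokens(question)
    if not question_tokens:
        return 0.0
    return len(question_tokens & _tokens(answer)) / len(question_tokens)


def evaluate(question: str, context: str, answer: str) -> dict[str, float]:
    return {
        "groundedness": groundedness(answer, context),
        "relevance": relevance(answer, question),
    }
```

Lexical overlap misses paraphrases and cannot judge coherence or accuracy, so scores like these are best treated as a cheap first filter.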


5. Production Patterns

5.1 Caching Strategy

Semantic or exact caching (Redis) to reduce costs and latency for repeated queries. 👉 View Code Example: LLM Cache
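
The exact-match variant can be sketched with a dictionary standing in for Redis. The `ExactCache` class, the whitespace normalization, and the one-hour TTL are illustrative assumptions; a semantic cache would instead compare query embeddings against a similarity threshold.

```python
import hashlib
import time
from typing import Callable


class ExactCache:
    """In-memory exact-match cache; a dict stands in for Redis in this sketch."""

    def __init__(self, ttl_seconds: float = 3600, clock: Callable[[], float] = time.monotonic):
        self._store: dict[str, tuple[float, str]] = {}
        self._ttl = ttl_seconds
        self._clock = clock

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        normalized = " ".join(prompt.split())  # collapse whitespace only
        return hashlib.sha256(f"{model}\n{normalized}".encode()).hexdigest()

    def get_or_call(self, model: str, prompt: str, call: Callable[[str], str]) -> tuple[str, bool]:
        """Return (response, was_cache_hit)."""
        key = self._key(model, prompt)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1], True
        value = call(prompt)  # cache miss: pay for the model call
        self._store[key] = (now, value)
        return value, False
```

Including the model name in the key prevents a response from one model being served for a request aimed at another.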

5.2 Rate Limiting & Retry

Handling API limits and transient errors with exponential backoff strategies. 👉 View Code Example: Rate Limiter & Retry
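
Exponential backoff can be sketched as below, here with full jitter so that many clients do not retry in lockstep. `TransientError` is a hypothetical exception standing in for whatever rate-limit or timeout error the client library raises, and the delay values are example defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limits, timeouts)."""


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn on TransientError, doubling the delay cap after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # full jitter
    raise ValueError("max_attempts must be at least 1")
```

Only errors known to be transient are retried; retrying on every exception would hide bugs such as malformed requests.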

5.3 Fallback Strategy

Automatically switching to cheaper/faster or more capable models when the primary model fails. 👉 View Code Example: Model Fallback
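
A fallback chain can be sketched as an ordered list of model callables tried in turn. The model names and the `call_with_fallback` helper are hypothetical; each callable would wrap a real client in practice.

```python
from typing import Callable


def call_with_fallback(
    models: list[tuple[str, Callable[[str], str]]], prompt: str
) -> tuple[str, str]:
    """Try each model in order and return (model_name, response) from the first success."""
    errors = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose for this sketch
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")  # simulate an outage


def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


used, response = call_with_fallback(
    [("primary-model", primary), ("backup-model", backup)], "hello"
)
```

Returning the name of the model that answered makes it possible to track how often the fallback path is taken.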


Architecture Decision Matrix

| Pattern | Use When | Complexity | Cost |
|---|---|---|---|
| Simple RAG | FAQ, docs search | Low | Low |
| Hybrid RAG | Mixed queries | Medium | Medium |
| ReAct Agent | Multi-step tasks | Medium | Medium |
| Function Calling | Structured tools | Low | Low |
| Plan-Execute | Complex tasks | High | High |
| Multi-Agent | Research tasks | Very High | Very High |

Resources