AI Agent Design Skill
Domain: AI/ML Architecture
Inheritance: inheritable
Version: 1.0.0
Last Updated: 2026-02-01
Overview
Comprehensive patterns for designing AI agents—autonomous systems that use LLMs to reason, plan, and execute multi-step tasks. Covers single-agent architectures, multi-agent orchestration, tool use, memory systems, and production deployment patterns.
Agent Architecture Fundamentals
What Is an AI Agent?
┌─────────────────────────────────────────────────────────────┐
│                          AI AGENT                           │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐      │
│  │ Perceive│ → │  Plan   │ → │   Act   │ → │  Learn  │      │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘      │
│       ↑                                         │           │
│       └─────────────────────────────────────────┘           │
│                        Feedback Loop                        │
└─────────────────────────────────────────────────────────────┘
Core Components:
- Perception: Receive and interpret inputs (user requests, environment state)
- Planning: Reason about goals, decompose tasks, select actions
- Action: Execute tools, make API calls, or generate outputs
- Learning: Update memory and refine strategies based on outcomes
Agent vs. Chatbot vs. Workflow
| Aspect | Chatbot | Workflow | Agent |
|---|---|---|---|
| Autonomy | Low | None | High |
| Planning | None | Predefined | Dynamic |
| Tool Use | Limited | Fixed sequence | Flexible |
| Memory | Session only | None | Persistent |
| Error Recovery | Retry/fail | Fail | Reason & adapt |
Single-Agent Patterns
ReAct Pattern (Reasoning + Acting)
The foundation of most modern agents:
┌──────────────────────────────────────────┐
│                ReAct Loop                │
├──────────────────────────────────────────┤
│  1. Thought: Reason about the task       │
│  2. Action: Choose and execute a tool    │
│  3. Observation: Process tool output     │
│  4. Repeat until task complete           │
└──────────────────────────────────────────┘
Example Trace:
User: What's the weather in Seattle and should I bring an umbrella?
Thought: I need to check Seattle weather to answer this question.
Action: weather_api(location="Seattle, WA")
Observation: {"temp": 52, "condition": "rain", "precipitation": 80}
Thought: It's raining with 80% precipitation chance. User should bring umbrella.
Action: respond("It's 52°F and raining in Seattle with 80% chance of
precipitation. Yes, definitely bring an umbrella!")
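The trace above can be sketched as a small loop. This is a minimal illustration, not a reference implementation: `scripted_policy` is a hypothetical stand-in for an LLM call, and `weather_api` returns canned data rather than calling a real service.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool with canned output; a real agent would call an actual API.
def weather_api(location: str) -> dict:
    return {"temp": 52, "condition": "rain", "precipitation": 80}

TOOLS: Dict[str, Callable[..., dict]] = {"weather_api": weather_api}

def scripted_policy(history: List[str]) -> Tuple[str, str, dict]:
    """Stand-in for an LLM call: returns (thought, action, arguments)."""
    if not any(line.startswith("Observation:") for line in history):
        return ("I need to check Seattle weather.", "weather_api",
                {"location": "Seattle, WA"})
    return ("It is raining, so an umbrella is needed.", "respond",
            {"text": "Yes, bring an umbrella."})

def react_loop(question: str, policy, max_steps: int = 5) -> str:
    """Alternate thought, action, and observation until the policy responds."""
    history = [f"User: {question}"]
    for _ in range(max_steps):
        thought, action, args = policy(history)
        history.append(f"Thought: {thought}")
        if action == "respond":  # terminal action ends the loop
            return args["text"]
        observation = TOOLS[action](**args)
        history.append(f"Action: {action}({args})")
        history.append(f"Observation: {observation}")
    return "Stopped: step limit reached."

answer = react_loop("Should I bring an umbrella in Seattle?", scripted_policy)
```

The `max_steps` bound is what keeps a confused policy from looping forever; the history list is the only state the policy sees.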
Plan-and-Execute Pattern
For complex, multi-step tasks:
┌─────────────────────────────────────────────────────────────┐
│                      Plan-and-Execute                       │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐                                            │
│  │   Planner   │  Create high-level plan                    │
│  └──────┬──────┘                                            │
│         ↓                                                   │
│  ┌─────────────┐                                            │
│  │  Executor   │  Execute each step                         │
│  └──────┬──────┘                                            │
│         ↓                                                   │
│  ┌─────────────┐                                            │
│  │  Replanner  │  Adjust plan based on results              │
│  └─────────────┘                                            │
└─────────────────────────────────────────────────────────────┘
When to Use:
- Tasks requiring multiple distinct phases
- When order of operations matters
- When partial failures need recovery
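A minimal sketch of the planner, executor, and replanner roles, assuming each is a plain callable. The names `plan_and_execute`, `toy_planner`, `toy_executor`, and `toy_replanner` are illustrative only; in practice the planner and replanner would be LLM calls.

```python
from typing import Callable, List

def plan_and_execute(
    goal: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str], bool],
    replanner: Callable[[str, List[str]], List[str]],
    max_replans: int = 2,
) -> List[str]:
    """Run steps in order; when a step fails, ask the replanner for new remaining steps."""
    remaining = planner(goal)
    completed: List[str] = []
    replans = 0
    while remaining:
        step = remaining.pop(0)
        if executor(step):
            completed.append(step)
        elif replans < max_replans:
            replans += 1
            remaining = replanner(step, remaining)
        else:
            raise RuntimeError(f"Step failed after {replans} replans: {step}")
    return completed

# Toy stand-ins: the executor fails on "draft" until the replanner simplifies it.
def toy_planner(goal: str) -> List[str]:
    return ["research", "draft", "publish"]

def toy_executor(step: str) -> bool:
    return step != "draft"

def toy_replanner(failed: str, remaining: List[str]) -> List[str]:
    return [f"{failed} (simplified)"] + remaining

done = plan_and_execute("write a post", toy_planner, toy_executor, toy_replanner)
```

Capping replans is a design choice in this sketch: without it, a step that can never succeed would cause endless replanning.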
Reflexion Pattern
Self-improvement through reflection:
┌─────────────────────────────────────────────────────────────┐
│                          Reflexion                          │
├─────────────────────────────────────────────────────────────┤
│  1. Attempt task                                            │
│  2. Evaluate outcome (success/failure)                      │
│  3. Generate reflection on what went wrong                  │
│  4. Store reflection in memory                              │
│  5. Retry with reflection context                           │
└─────────────────────────────────────────────────────────────┘
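The five steps above can be sketched as a retry loop that carries its reflections forward. This is an illustrative sketch: `attempt`, `evaluate`, and `reflect` are assumed callables (LLM calls or tests in a real system), and the toy versions below exist only to make the example runnable.

```python
from typing import Callable, List, Optional, Tuple

def reflexion(
    attempt: Callable[[List[str]], str],
    evaluate: Callable[[str], bool],
    reflect: Callable[[str], str],
    max_trials: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Retry a task, feeding stored reflections into each new attempt."""
    reflections: List[str] = []
    for _ in range(max_trials):
        result = attempt(reflections)        # steps 1 and 5: attempt with context
        if evaluate(result):                 # step 2: evaluate outcome
            return result, reflections
        reflections.append(reflect(result))  # steps 3 and 4: reflect and store
    return None, reflections

# Toy stand-ins: the attempt only succeeds once a reflection is available.
def toy_attempt(reflections: List[str]) -> str:
    return "sorted" if reflections else "unsorted"

def toy_evaluate(result: str) -> bool:
    return result == "sorted"

def toy_reflect(result: str) -> str:
    return f"Output was '{result}'; sort before returning."

result, notes = reflexion(toy_attempt, toy_evaluate, toy_reflect)
```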
Multi-Agent Patterns
Supervisor Pattern
Central coordinator delegates to specialized agents:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│                      ┌────────────┐                         │
│                      │ Supervisor │                         │
│                      └─────┬──────┘                         │
│             ┌──────────────┼──────────────┐                 │
│             ↓              ↓              ↓                 │
│        ┌──────────┐   ┌──────────┐   ┌──────────┐           │
│        │ Research │   │  Writer  │   │ Reviewer │           │
│        │  Agent   │   │  Agent   │   │  Agent   │           │
│        └──────────┘   └──────────┘   └──────────┘           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Use Cases:
- Content creation pipelines
- Research + analysis + reporting
- Code generation + review + testing
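A minimal sketch of the delegation loop, assuming workers are plain callables and the routing decision is a function. `fixed_order_router` is a hypothetical rule-based stand-in; a real supervisor would typically ask an LLM which worker to call next.

```python
from typing import Callable, Dict, List, Optional, Tuple

Log = List[Tuple[str, str]]

def run_supervisor(
    task: str,
    workers: Dict[str, Callable[[str, Log], str]],
    choose_next: Callable[[str, Log], Optional[str]],
    max_turns: int = 6,
) -> Log:
    """Ask the router which worker runs next until it returns None."""
    log: Log = []
    for _ in range(max_turns):
        name = choose_next(task, log)
        if name is None:
            break
        log.append((name, workers[name](task, log)))
    return log

def fixed_order_router(task: str, log: Log) -> Optional[str]:
    # Stand-in for an LLM routing decision: run each role once, in order.
    done = {name for name, _ in log}
    for name in ("research", "writer", "reviewer"):
        if name not in done:
            return name
    return None

WORKERS = {
    "research": lambda task, log: f"notes on {task}",
    "writer": lambda task, log: f"draft based on {log[-1][1]}",
    "reviewer": lambda task, log: f"approved: {log[-1][1]}",
}

log = run_supervisor("agent memory", WORKERS, fixed_order_router)
```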
Hierarchical Teams
Nested supervisor structure for complex organizations:
┌─────────────────────────────────────────────────────────────┐
│                    Top Supervisor                           │
│             ┌─────────────┴─────────────┐                   │
│             ↓                           ↓                   │
│     ┌───────────────┐           ┌───────────────┐           │
│     │ Research Lead │           │ Writing Lead  │           │
│     └───────┬───────┘           └───────┬───────┘           │
│        ┌────┴────┐                 ┌────┴────┐              │
│        ↓         ↓                 ↓         ↓              │
│    ┌───────┐ ┌───────┐         ┌───────┐ ┌───────┐          │
│    │Web    │ │Paper  │         │Draft  │ │Edit   │          │
│    │Search │ │Review │         │Writer │ │Writer │          │
│    └───────┘ └───────┘         └───────┘ └───────┘          │
└─────────────────────────────────────────────────────────────┘
Debate/Adversarial Pattern
Multiple agents argue to reach better conclusions:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│        ┌──────────┐       Argue        ┌──────────┐         │
│        │ Agent A  │  ◄──────────────►  │ Agent B  │         │
│        │  (Pro)   │                    │  (Con)   │         │
│        └────┬─────┘                    └────┬─────┘         │
│             │                               │               │
│             └───────────────┬───────────────┘               │
│                             ↓                               │
│                       ┌────────────┐                        │
│                       │   Judge    │ Synthesize best answer │
│                       └────────────┘                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Benefits:
- Reduces hallucination through verification
- Explores multiple perspectives
- Better reasoning on complex questions
Tool Use Patterns
Tool Definition Best Practices
{
  "name": "search_database",
  "description": "Search the product database. Returns matching products with prices. Use when the user asks about product availability or pricing.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search terms (product name, category, or SKU)"
      },
      "max_results": {
        "type": "integer",
        "default": 10,
        "minimum": 1,
        "maximum": 100,
        "description": "Maximum results to return (1-100)"
      },
      "filters": {
        "type": "object",
        "properties": {
          "min_price": { "type": "number" },
          "max_price": { "type": "number" },
          "in_stock": { "type": "boolean" }
        }
      }
    },
    "required": ["query"]
  }
}
Tool Design Principles:
- Clear names: Verb + noun (search_database, send_email)
- Rich descriptions: Include when to use and what it returns
- Constrained parameters: Enums, ranges, validation
- Sensible defaults: Reduce required decisions
- Error handling: Return structured errors, not exceptions
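To illustrate the last three principles, here is a sketch of a tool function that validates its parameters and returns a structured error instead of raising. The `CATALOG` data and the `{"ok": ..., "error": {...}}` envelope are assumptions made for this example, not a standard format.

```python
from typing import Any, Dict

# Hypothetical in-memory catalog standing in for a real product database.
CATALOG = [
    {"sku": "A1", "name": "red umbrella", "price": 18.0},
    {"sku": "B2", "name": "blue umbrella", "price": 25.0},
]

def search_database(query: Any, max_results: Any = 10) -> Dict[str, Any]:
    """Return {"ok": True, "results": [...]} or {"ok": False, "error": {...}}."""
    if not isinstance(query, str) or not query.strip():
        return {"ok": False, "error": {
            "code": "invalid_query",
            "message": "query must be a non-empty string"}}
    if not isinstance(max_results, int) or not 1 <= max_results <= 100:
        return {"ok": False, "error": {
            "code": "invalid_max_results",
            "message": "max_results must be an integer from 1 to 100"}}
    matches = [p for p in CATALOG if query.lower() in p["name"]]
    return {"ok": True, "results": matches[:max_results]}

good = search_database("umbrella")
bad = search_database("umbrella", max_results=500)
```

A structured error gives the model something it can read and act on in its next step, for example by correcting `max_results` and calling the tool again.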
Tool Selection Strategies
| Strategy | Description | When to Use |
|---|---|---|
| Direct | LLM chooses from all tools | < 10 tools |
| Categorized | Group tools, select category first | 10-50 tools |
| Retrieval | Embed tool descriptions, retrieve the most relevant per request | More than 50 tools |
| Routing | Specialized selector model | Production scale |
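The retrieval strategy in the table can be sketched with a toy similarity search. The bag-of-words `embed` function below is a deliberate simplification standing in for a real embedding model, and the tool names and descriptions are illustrative.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real system would use an embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve_tools(request: str, tools: Dict[str, str], k: int = 2) -> List[str]:
    """Return the k tool names whose descriptions best match the request."""
    query = embed(request)
    ranked = sorted(tools, key=lambda name: cosine(query, embed(tools[name])),
                    reverse=True)
    return ranked[:k]

TOOL_DESCRIPTIONS = {
    "search_database": "Search the product database for price and availability",
    "send_email": "Send an email message to a recipient",
    "get_weather": "Get the weather forecast for a location",
}

top = retrieve_tools("what is the price of this product", TOOL_DESCRIPTIONS, k=1)
```

Only the retrieved tools are then placed in the prompt, which keeps the tool list short regardless of how many tools are registered.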
Human-in-the-Loop Tools
┌─────────────────────────────────────────────────────────────┐
│                  Human-in-the-Loop Pattern                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│        Agent Action Request                                 │
│                  │                                          │
│                  ↓                                          │
│          ┌───────────────┐                                  │
│          │  Risk Check   │                                  │
│          └───────┬───────┘                                  │
│                  │                                          │
│            Low ──┴── High                                   │
│             │          │                                    │
│             ↓          ↓                                    │
│          Execute  ┌──────────┐                              │
│          Directly │  Human   │                              │
│                   │ Approval │                              │
│                   └────┬─────┘                              │
│                        │                                    │
│              Approve/Reject/Modify                          │
│                                                             │
└─────────────────────────────────────────────────────────────┘
High-Risk Actions Requiring Approval:
- Financial transactions
- Data deletion
- External communications
- Permission changes
- Irreversible operations
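A sketch of the risk gate, assuming a static set of high-risk action names and a synchronous `ask_human` callback. The action names, the `verdict` dictionary shape, and `cautious_reviewer` are all hypothetical; a production system would more likely queue the request and wait for a reviewer.

```python
from typing import Callable, Dict

# Assumed action names for illustration; map these to your own tool registry.
HIGH_RISK_ACTIONS = {"transfer_funds", "delete_records",
                     "send_external_email", "change_permissions"}

def execute_with_gate(
    action: str,
    params: Dict,
    run: Callable[[str, Dict], str],
    ask_human: Callable[[str, Dict], Dict],
) -> str:
    """Run low-risk actions directly; route high-risk ones through a human."""
    if action not in HIGH_RISK_ACTIONS:
        return run(action, params)
    decision = ask_human(action, params)
    if decision["verdict"] == "approve":
        return run(action, params)
    if decision["verdict"] == "modify":
        return run(action, decision["params"])  # human-edited parameters
    return f"rejected: {action}"

def run(action: str, params: Dict) -> str:
    return f"ran {action} with {params}"

def cautious_reviewer(action: str, params: Dict) -> Dict:
    # Toy reviewer: caps any amount above 100 instead of approving it as-is.
    if params.get("amount", 0) > 100:
        return {"verdict": "modify", "params": {**params, "amount": 100}}
    return {"verdict": "approve"}

low = execute_with_gate("search_database", {"query": "umbrella"}, run, cautious_reviewer)
high = execute_with_gate("transfer_funds", {"amount": 500}, run, cautious_reviewer)
```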
Agent Memory Systems
Memory Architecture
┌─────────────────────────────────────────────────────────────┐
│                        Agent Memory                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │                   Working Memory                    │   │
│   │  Current conversation + recent context (in prompt)  │   │
│   └─────────────────────────────────────────────────────┘   │
│                              │                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │                  Short-Term Memory                  │   │
│   │   Session state, intermediate results (key-value)   │   │
│   └─────────────────────────────────────────────────────┘   │
│                              │                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │                  Long-Term Memory                   │   │
│   │   Facts, preferences, history (vector DB + graph)   │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Memory Types
| Type | Storage | Retrieval | Use Case |
|---|---|---|---|
| Episodic | Vector DB | Semantic search | Past conversations, experiences |
| Semantic | Graph DB | Structured query | Facts, relationships, knowledge |
| Procedural | Code/prompts | Direct lookup | How to perform tasks |
| Working | Prompt context | Always present | Current task state |
Memory Management Patterns
Summarization: Compress old conversations
Full History → Summarize → Store Summary → Discard Full
Forgetting: Remove low-value memories
Memories → Score by (recency × importance × access_count) → Prune lowest
Consolidation: Merge related memories
Similar Memories → Cluster → Create consolidated memory → Archive originals
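The forgetting rule above can be sketched as a scoring function plus a prune step. Two choices here are assumptions rather than part of the pattern: recency is modeled as exponential decay with a 30-day half-life, and the access term is `1 + access_count` so that a memory that has never been read does not score zero.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Memory:
    text: str
    age_days: float
    importance: float  # 0.0 to 1.0, assumed to be assigned when the memory is written
    access_count: int

def score(memory: Memory, half_life_days: float = 30.0) -> float:
    """recency x importance x access term, with recency halving every half_life_days."""
    recency = 0.5 ** (memory.age_days / half_life_days)
    return recency * memory.importance * (1 + memory.access_count)

def prune(memories: List[Memory], keep: int) -> List[Memory]:
    """Keep the highest-scoring memories and drop the rest."""
    return sorted(memories, key=score, reverse=True)[:keep]

memories = [
    Memory("user prefers metric units", age_days=0, importance=0.9, access_count=3),
    Memory("user asked about umbrellas", age_days=30, importance=0.9, access_count=3),
    Memory("user said hello", age_days=0, importance=0.1, access_count=0),
]
kept = prune(memories, keep=2)
```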
Planning Strategies
Task Decomposition
Complex Task: "Build a marketing campaign for our new product"
                                  │
            ┌─────────────────────┼─────────────────────┐
            ↓                     ↓                     ↓
       ┌──────────┐          ┌──────────┐          ┌──────────┐
       │ Research │          │ Content  │          │  Launch  │
       │  Phase   │          │  Phase   │          │  Phase   │
       └────┬─────┘          └────┬─────┘          └────┬─────┘
            │                     │                     │
     ┌──────┴──────┐          ┌───┴───┐             ┌───┴───┐
     ↓             ↓          ↓       ↓             ↓       ↓
  Analyze       Survey     Create   Write       Schedule Monitor
Competitors      Users     Assets   Copy          Posts  Results
Goal-Oriented Planning
Current State: No marketing campaign
Goal State: Campaign live with 10K impressions
           │
           ↓
┌─────────────────────┐
│    Gap Analysis     │
│   What's missing?   │
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│  Action Generation  │
│ What can close gap? │
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│  Action Selection   │
│   Best next step?   │
└─────────────────────┘
Error Handling & Recovery
Graceful Degradation
┌─────────────────────────────────────────────────────────────┐
│                    Error Recovery Ladder                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Level 1: Retry                                             │
│    └── Same action, maybe with backoff                      │
│                                                             │
│  Level 2: Rephrase                                          │
│    └── Reformulate the action (different query)             │
│                                                             │
│  Level 3: Alternative                                       │
│    └── Use different tool for same goal                     │
│                                                             │
│  Level 4: Partial                                           │
│    └── Return partial results, note limitations             │
│                                                             │
│  Level 5: Escalate                                          │
│    └── Ask human for help                                   │
│                                                             │
│  Level 6: Abort                                             │
│    └── Cannot complete, explain why                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
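One way to express the ladder in code is an ordered list of strategies tried until one yields a result. This is a sketch under the assumption that each rung is a zero-argument callable that returns a result, returns `None`, or raises; the rung functions below are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Strategy = Tuple[str, Callable[[], Optional[str]]]

def run_ladder(strategies: List[Strategy]) -> Tuple[str, Optional[str]]:
    """Try each rung in order; fall through to 'abort' when none produces a result."""
    for level, attempt in strategies:
        try:
            result = attempt()
        except Exception:
            continue  # this rung failed; move to the next one
        if result is not None:
            return level, result
    return "abort", None

# Toy rungs: the retry times out, the rephrase finds nothing, the alternative works.
def retry() -> Optional[str]:
    raise TimeoutError("search_web timed out")

def rephrase() -> Optional[str]:
    return None

def alternative() -> Optional[str]:
    return "result from backup_search"

level, result = run_ladder([
    ("retry", retry),
    ("rephrase", rephrase),
    ("alternative", alternative),
])
```

Returning the level alongside the result lets the caller log how far down the ladder the agent had to go, which is useful when reporting partial results or limitations.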
Loop Detection
Agents can get stuck. Detect and break loops:
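def calculate_similarity(recent, previous):
    """Assumed helper: fraction of positions where both windows hold the same action."""
    matches = sum(a == b for a, b in zip(recent, previous))
    return matches / max(len(recent), 1)


def detect_loop(action_history, window=5, threshold=0.8):
    """Detect if the agent is repeating similar actions."""
    # Two full windows are needed before a comparison is meaningful
    if len(action_history) < window * 2:
        return False
    recent = action_history[-window:]
    previous = action_history[-window * 2:-window]
    # Compare the latest window of actions with the window before it
    similarity = calculate_similarity(recent, previous)
    return similarity > threshold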
Recovery Actions:
- Inject reflection prompt: "You seem to be repeating. What's different now?"
- Force tool change: Exclude recently used tools
- Replan: Discard current plan, start fresh
- Escalate: Ask user for clarification
Production Considerations
Observability
What to Log:
- Every LLM call (prompt, completion, tokens, latency)
- Tool calls (name, parameters, result, duration)
- State transitions (plan changes, memory updates)
- Errors and recovery attempts
Trace Structure:
Trace: user_request_abc123
├── parse_intent (50ms)
├── plan_generation (200ms)
├── step_1_research
│   ├── tool_call: search_web (150ms)
│   └── tool_call: summarize (100ms)
├── step_2_write
│   └── llm_call: generate_draft (300ms)
└── step_3_review
    └── llm_call: critique (200ms)
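A nested trace like the one above can be collected with a small context manager. The `Tracer` class is a hypothetical minimal sketch built on the standard library; production systems would more likely use a dedicated tracing library.

```python
import time
from contextlib import contextmanager
from typing import List, Tuple

class Tracer:
    """Records (depth, name, elapsed_ms) for nested spans, parents before children."""

    def __init__(self) -> None:
        self.spans: List[Tuple[int, str, float]] = []
        self._depth = 0

    @contextmanager
    def span(self, name: str):
        depth = self._depth
        index = len(self.spans)
        self.spans.append((depth, name, 0.0))  # reserve the slot in start order
        self._depth += 1
        start = time.perf_counter()
        try:
            yield
        finally:
            self._depth -= 1
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.spans[index] = (depth, name, elapsed_ms)

    def render(self) -> str:
        return "\n".join(f"{'  ' * d}{n} ({ms:.0f}ms)" for d, n, ms in self.spans)

tracer = Tracer()
with tracer.span("user_request"):
    with tracer.span("plan_generation"):
        pass
    with tracer.span("step_1_research"):
        with tracer.span("tool_call: search_web"):
            pass
```

Calling `tracer.render()` prints the spans as an indented tree; the durations depend on the machine, so none are shown here.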
Cost Control
| Strategy | Implementation |
|---|---|
| Token budgets | Set max tokens per task |
| Step limits | Maximum N actions per request |
| Tiered models | Stronger model for planning, cheaper model for execution (e.g., GPT-4 and GPT-3.5) |
| Caching | Cache tool results, LLM responses |
| Early termination | Stop when "good enough" |
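Token budgets and step limits from the table can be enforced with one small object that every LLM or tool call charges against. The `Budget` and `BudgetExceeded` names are illustrative, and the token counts are assumed to come from the model provider's usage metadata.

```python
from dataclasses import dataclass

class BudgetExceeded(Exception):
    """Raised when a task passes its token or step limit."""

@dataclass
class Budget:
    max_tokens: int
    max_steps: int
    tokens_used: int = 0
    steps_used: int = 0

    def charge(self, tokens: int) -> None:
        """Record one step and its token usage; raise once either limit is passed."""
        self.tokens_used += tokens
        self.steps_used += 1
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded("token budget exceeded")
        if self.steps_used > self.max_steps:
            raise BudgetExceeded("step limit exceeded")

budget = Budget(max_tokens=1000, max_steps=3)
budget.charge(400)
budget.charge(400)
try:
    budget.charge(400)  # third call pushes usage to 1200 tokens
    stopped = False
except BudgetExceeded:
    stopped = True
```

Catching `BudgetExceeded` at the top of the agent loop is a natural place to return partial results instead of failing silently.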
Safety Guardrails
┌─────────────────────────────────────────────────────────────┐
│                        Safety Layer                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Input Validation                                           │
│  ├── Prompt injection detection                             │
│  ├── PII/sensitive data filtering                           │
│  └── Request rate limiting                                  │
│                                                             │
│  Action Validation                                          │
│  ├── Tool parameter sanitization                            │
│  ├── Scope/permission checks                                │
│  └── Dangerous action blocking                              │
│                                                             │
│  Output Validation                                          │
│  ├── Content policy compliance                              │
│  ├── Hallucination detection                                │
│  └── Sensitive data redaction                               │
│                                                             │
└─────────────────────────────────────────────────────────────┘
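Two of the checks above, scope/permission checks and sensitive data redaction, are simple enough to sketch directly. The role name, the permission table, and the email regex are assumptions for illustration; the regex is intentionally loose and is not a complete PII detector.

```python
import re
from typing import Dict, Set

# Loose email pattern for illustration only; real PII detection needs more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical permission table mapping an agent role to the tools it may call.
PERMISSIONS: Dict[str, Set[str]] = {
    "support_agent": {"search_database", "send_email"},
}

def is_allowed(role: str, tool: str) -> bool:
    """Action validation: unknown roles get no tools at all."""
    return tool in PERMISSIONS.get(role, set())

def redact(text: str) -> str:
    """Output validation: replace email addresses before text leaves the agent."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)

clean = redact("Contact jane.doe@example.com today")
```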
Framework Comparison
| Framework | Strengths | Best For |
|---|---|---|
| LangChain | Comprehensive, many integrations | Rapid prototyping |
| LangGraph | Stateful, graph-based flows | Complex multi-agent |
| AutoGen | Multi-agent conversations | Research, code gen |
| CrewAI | Role-based teams | Business workflows |
| Semantic Kernel | Enterprise, .NET/Python | Microsoft stack |
| Agents SDK (OpenAI) | Simple, hosted | Quick single-agent |
Anti-Patterns
❌ Over-Autonomous Agent
Problem: Agent makes too many decisions without checkpoints
Solution: Add approval gates for significant actions
❌ Unbounded Loops
Problem: No termination conditions
Solution: Set max iterations, cost limits, time bounds
❌ Tool Explosion
Problem: Too many tools confuse the agent
Solution: Curate tools, use retrieval for large toolsets
❌ Memory Bloat
Problem: Accumulating context without pruning
Solution: Summarize, forget, consolidate
❌ Monolithic Agent
Problem: One agent does everything
Solution: Decompose into specialized sub-agents
Activation Triggers
- "agent", "autonomous", "multi-agent"
- "tool use", "function calling"
- "ReAct", "plan and execute"
- "agent memory", "agent planning"
- "orchestration", "supervisor agent"
- "LangChain", "LangGraph", "AutoGen", "CrewAI"
Quick Reference
Agent Design Checklist
- Define clear agent persona and capabilities
- Design minimal, well-described tool set
- Implement appropriate memory architecture
- Add human-in-the-loop for high-risk actions
- Set up observability (logging, tracing)
- Configure safety guardrails
- Test with adversarial inputs
- Plan for cost control and scaling
When to Use Agents
✅ Good Fit:
- Open-ended research tasks
- Multi-step workflows with decisions
- Tasks requiring tool orchestration
- Personalized, context-aware interactions
❌ Poor Fit:
- Simple Q&A (use RAG)
- Deterministic workflows (use code)
- High-stakes tasks with no human oversight
- Real-time, latency-critical applications
AI Agent Design skill — Building autonomous, reliable AI systems