Advanced Agents (OpenAI)
Overview
Use this skill to turn a vague “build an agent” request into a concrete architecture: tool contracts, state model, control loop, evals, and guardrails. For API details and up-to-date capabilities, consult $openai-docs.
Architecture Defaults
- Prefer a single-agent “router + tools” design first.
- Add specialized sub-agents only when you have clear boundaries and measurable wins.
- Separate:
  - Policy: system/developer prompt and tool rules.
  - Planning: break down tasks.
  - Execution: tool calls and transformations.
  - Verification: checks, validators, and fallback paths.
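As a rough illustration of the single-agent “router + tools” default, the sketch below registers two tools and routes a task to one of them. Everything named here (`Tool`, `lookup_order`, `summarize`, `route`, `run_agent`) is hypothetical, and the keyword rule in `route` is only a deterministic stand-in for the model's tool choice.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def lookup_order(query: str) -> str:
    # Hypothetical read-only tool; a real one would query a data store.
    return f"order status for {query!r}: shipped"


def summarize(text: str) -> str:
    # Hypothetical transformation tool: keep the first 40 characters.
    return text[:40]


TOOLS: Dict[str, Tool] = {
    t.name: t
    for t in (
        Tool("lookup_order", "Look up an order by id.", lookup_order),
        Tool("summarize", "Shorten a piece of text.", summarize),
    )
}


def route(task: str) -> str:
    # Assumption: a keyword rule stands in for the model picking a tool name.
    return "lookup_order" if "order" in task.lower() else "summarize"


def run_agent(task: str) -> str:
    tool_name = route(task)       # planning: choose the next step
    tool = TOOLS.get(tool_name)   # policy: only registered tools may run
    if tool is None:
        return "no suitable tool"
    result = tool.run(task)       # execution
    if not result:                # verification: reject empty results
        return "tool returned nothing"
    return result
```

Keeping policy, planning, execution, and verification as separate steps in one function makes it easier to later move any one of them into a sub-agent if a measurable win justifies it.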
Tool Design (Most Important)
- Make tool inputs/outputs strict and small.
- Prefer tools that are:
  - Idempotent where possible.
  - Easy to validate.
  - Explicit about permissions and scopes.
- Add “read-only” tools for retrieval/inspection and separate “mutating” tools for side effects.
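One possible shape for such a tool contract is sketched below, assuming a toy in-memory note store. The names (`NoteStore`, `GetNoteInput`, `SetNoteInput`, the `notes:read` and `notes:write` scopes) are invented for illustration and do not come from any particular SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class ToolError(Exception):
    """Raised when a call violates the tool contract."""


@dataclass(frozen=True)
class GetNoteInput:
    note_id: str


@dataclass(frozen=True)
class SetNoteInput:
    note_id: str
    body: str


def _require_scope(granted: Set[str], needed: str) -> None:
    if needed not in granted:
        raise ToolError(f"missing scope: {needed}")


def _validate_note_id(note_id: str) -> None:
    if not note_id.isalnum() or len(note_id) > 32:
        raise ToolError("note_id must be 1-32 alphanumeric characters")


@dataclass
class NoteStore:
    notes: Dict[str, str] = field(default_factory=dict)

    def get_note(self, args: GetNoteInput, scopes: Set[str]) -> str:
        # Read-only tool: never changes state.
        _require_scope(scopes, "notes:read")
        _validate_note_id(args.note_id)
        return self.notes.get(args.note_id, "")

    def set_note(self, args: SetNoteInput, scopes: Set[str]) -> bool:
        # Mutating tool, idempotent: repeating the same call leaves the
        # same state. Returns True only if the stored value changed.
        _require_scope(scopes, "notes:write")
        _validate_note_id(args.note_id)
        if len(args.body) > 1000:
            raise ToolError("body exceeds 1000 characters")
        changed = self.notes.get(args.note_id) != args.body
        self.notes[args.note_id] = args.body
        return changed
```

Splitting `get_note` from `set_note` lets a caller grant the read scope alone, and the boolean from `set_note` gives the verification step a cheap signal about whether a retry had any effect.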
Control Loop (Plan/Act/Check)
Use a tight loop:
- Parse the task and its constraints.
- Decide whether to ask a clarifying question.
- Plan steps (keep the plan short).
- Execute with tools.
- Verify against acceptance criteria and invariants.
- If verification fails: retry with a bounded number of attempts, or escalate to a human.
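The loop above can be sketched as a small function with a bounded retry count. The `plan`, `execute`, and `verify` callables are placeholders supplied by the caller, and `LoopResult` with its status strings is an assumed convention, not an established API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    status: str              # "done", "needs_clarification", or "escalated"
    output: Optional[str]
    attempts: int


def run_loop(
    task: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[List[str]], str],
    verify: Callable[[str], bool],
    max_attempts: int = 3,
) -> LoopResult:
    # Parse: an empty task cannot be planned, so ask for clarification.
    if not task.strip():
        return LoopResult("needs_clarification", None, 0)

    for attempt in range(1, max_attempts + 1):
        steps = plan(task)        # plan (short)
        output = execute(steps)   # act with tools
        if verify(output):        # check against acceptance criteria
            return LoopResult("done", output, attempt)

    # Bounded retries exhausted: hand off to a human.
    return LoopResult("escalated", None, max_attempts)
```

For example, `run_loop("do it", lambda t: [t], lambda steps: "ok", lambda out: out == "ok")` finishes on the first attempt, while an executor that never satisfies `verify` ends in the `"escalated"` status after `max_attempts` tries.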
State, Memory, and Context
- Keep short-term state explicit (a struct/object).
- For long-term memory, store only:
  - Stable user preferences.
  - Durable facts with provenance.
- Always separate:
  - User-provided data.
  - Retrieved documents.
  - Model-generated hypotheses.
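A minimal sketch of explicit state, assuming plain dataclasses are enough for the job. The type names (`AgentState`, `RetrievedDoc`, `Fact`) and the bracketed labels in `render_context` are illustrative choices only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RetrievedDoc:
    source: str
    text: str


@dataclass(frozen=True)
class Fact:
    claim: str
    provenance: str  # where the fact came from, e.g. a document id


@dataclass
class AgentState:
    # The three kinds of content live in separate fields so they are
    # never silently mixed.
    user_inputs: List[str] = field(default_factory=list)
    retrieved: List[RetrievedDoc] = field(default_factory=list)
    hypotheses: List[str] = field(default_factory=list)

    def render_context(self) -> str:
        # Label every line with its origin before it reaches the model.
        parts = [f"[user] {u}" for u in self.user_inputs]
        parts += [f"[retrieved:{d.source}] {d.text}" for d in self.retrieved]
        parts += [f"[hypothesis] {h}" for h in self.hypotheses]
        return "\n".join(parts)


def remember(memory: List[Fact], claim: str, provenance: str) -> bool:
    # Long-term memory accepts only facts that carry provenance.
    if not provenance.strip():
        return False
    memory.append(Fact(claim, provenance))
    return True
```

Rejecting facts without provenance at write time is one way to keep model-generated hypotheses from drifting into long-term memory as if they were durable facts.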
Retrieval (RAG) Guidance
- Retrieve fewer, higher-quality chunks.
- Require citations to retrieved content when answering factual questions.
- Add a “no answer” path when evidence is insufficient.
- Prefer lightweight re-ranking and document filters before increasing context size.
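The guidance above might look like the following in code, assuming the retriever already supplies a relevance score per chunk. The `Chunk` and `Answer` types, the 0.5 threshold, and the `top_k` of 3 are arbitrary illustrative choices, and the "generation" step simply quotes the best chunk.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    score: float  # assumed relevance score from a retriever, higher is better


@dataclass(frozen=True)
class Answer:
    text: Optional[str]   # None means "no answer"
    citations: List[str]


def select_chunks(
    chunks: List[Chunk],
    allowed_docs: Set[str],
    min_score: float = 0.5,
    top_k: int = 3,
) -> List[Chunk]:
    # Document filter and score threshold first, then keep only the top_k.
    kept = [c for c in chunks if c.doc_id in allowed_docs and c.score >= min_score]
    return sorted(kept, key=lambda c: c.score, reverse=True)[:top_k]


def answer_from(chunks: List[Chunk]) -> Answer:
    # "No answer" path: without evidence, do not guess.
    if not chunks:
        return Answer(None, [])
    # Placeholder for generation: quote the best chunk and cite it.
    best = chunks[0]
    return Answer(best.text, [best.doc_id])
```

Filtering before ranking keeps the context small, and returning `Answer(None, [])` gives callers an explicit signal to say "I don't know" instead of answering without citations.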
Safety and Prompt Injection
- Treat tool outputs and retrieved text as untrusted input.
- Never allow retrieved text to override system/developer rules.
- Enforce data boundaries:
  - Do not reveal secrets.
  - Do not execute arbitrary code unless explicitly allowed.
- For web/RAG agents, strip or annotate instructions found in documents.
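As a rough sketch of the "annotate" option, the helper below wraps untrusted text in a labeled envelope and flags instruction-like phrases. The regular expressions, the `<untrusted>` tag, and the `sk-` secret pattern are illustrative assumptions; pattern matching of this kind is a heuristic and not a complete defense against prompt injection.

```python
import re

# Assumption: a tiny phrase list stands in for a real instruction detector.
_INSTRUCTION_PATTERN = re.compile(
    r"\b(ignore (all |any )?(previous|prior) instructions|you must|system prompt)\b",
    re.IGNORECASE,
)

# Assumption: secrets look like "sk-" followed by 8+ alphanumerics.
_SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")


def wrap_untrusted(text: str, source: str) -> str:
    """Label retrieved or tool-produced text so it is treated as data."""
    flagged = bool(_INSTRUCTION_PATTERN.search(text))
    # Neutralize a closing tag inside the text so it cannot end the envelope.
    body = text.replace("</untrusted>", "[/untrusted]")
    header = (
        f'<untrusted source="{source}" '
        f'contains_instructions="{str(flagged).lower()}">'
    )
    return f"{header}\n{body}\n</untrusted>"


def redact_secrets(text: str) -> str:
    """Replace anything matching the assumed secret pattern."""
    return _SECRET_PATTERN.sub("[REDACTED]", text)
```

The system/developer prompt would then state that content inside the envelope is data to be quoted or summarized, never instructions to follow.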
Evaluation and Observability
- Start with a small golden set of 10-30 cases that match real usage.
- Track:
  - Task success/failure.
  - Tool-call success/failure rate.
  - Latency and cost.
  - Safety incidents.
- Log traces with:
  - User input (redacted as needed).
  - Tool calls and outputs.
  - Final answer.
  - Decision points (why a tool was called).
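A small harness along these lines could compute the tracked rates from a golden set. The `Case` and `Trace` shapes, the `ok` flag on each tool call, and the exact-match scoring are assumptions made for the sketch. Cost and safety-incident counters are omitted for brevity.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    prompt: str
    expected: str


@dataclass
class Trace:
    user_input: str        # redact before logging if it may hold PII
    tool_calls: List[dict]  # assumed shape: {"name": ..., "ok": bool, ...}
    final_answer: str
    decisions: List[str]   # why each tool was called
    latency_s: float


def run_evals(agent: Callable[[str], Trace], cases: List[Case]) -> Dict[str, float]:
    traces = [agent(c.prompt) for c in cases]
    # Exact-match scoring; real suites often need a looser comparison.
    passed = sum(t.final_answer == c.expected for t, c in zip(traces, cases))
    calls = [call for t in traces for call in t.tool_calls]
    ok_calls = sum(1 for call in calls if call.get("ok"))
    return {
        "task_success_rate": passed / len(cases) if cases else 0.0,
        "tool_call_success_rate": ok_calls / len(calls) if calls else 0.0,
        "mean_latency_s": sum(t.latency_s for t in traces) / len(traces) if traces else 0.0,
    }


def trace_to_json(trace: Trace) -> str:
    # One JSON object per trace is easy to append to a log file.
    return json.dumps(asdict(trace), sort_keys=True)
```

Running the same golden set after every prompt or tool change turns these rates into a regression signal, and the JSON traces explain which decision point changed when a case flips.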
Reference
See references/checklists.md for a concrete build checklist, tool contract template, and an eval plan outline.