Agent Creator
Purpose: Teach the principles, patterns, and practices for creating high-quality specialized agents that follow v2 architecture standards.
Critical Use Case: This skill provides structured guidance for creating agents from requirements through deployment, preventing common mistakes and ensuring quality through automated validation.
Differentiation from agent-hr-manager:
- •agent-creator (this skill) = Teaching guide, knowledge resource, passive reference 📖
- •agent-hr-manager (agent) = Autonomous executor, active creator, can use this skill 👨🏫
Use agent-creator when learning how to create agents. Use agent-hr-manager when you want an agent automatically created.
When to Use This Skill
Use agent-creator when:
- •Creating a new specialized agent from scratch
- •Learning agent architecture and design patterns
- •Understanding quality validation (0-80 rubric)
- •Troubleshooting agent quality issues
- •Migrating agents to v2 architecture
- •Training others on agent creation
Do NOT use for:
- •Creating skills (use skill-creator skill instead)
- •Quick agent modifications (just edit directly)
- •General Claude usage questions
6-Step Agent Creation Workflow
Step 0: Research Existing Patterns (BEFORE DESIGN)
Objective: Understand what already exists before creating something new. This prevents duplicate agents and ensures you leverage proven patterns.
Why this matters: Creating an agent without research leads to:
- •Duplicating existing agent functionality
- •Missing reusable patterns from similar agents
- •Not discovering skills that solve part of the problem
- •Reinventing methodology that already exists
Actions:
- •Search for Similar Agents:
```bash
# List all available agents
ls ~/.claude/agents/ | head -20

# Search for agents in similar domain
grep -l "[domain-keyword]" ~/.claude/agents/*.md 2>/dev/null
```
- •Review Relevant Agent Examples:
  - •Read references/agent-examples.md for quality patterns
  - •Study agents with high quality scores (60+/80)
  - •Note phase structures that work for similar domains
- •Check Skill Inventory:
```bash
# List available skills
ls ~/.claude/skills/

# Search for domain-relevant skills
grep -r "[domain-keyword]" ~/.claude/skills/*/SKILL.md 2>/dev/null | head -10
```
- •Decision Checkpoint (REQUIRED):
```markdown
| Question | Answer |
|----------|--------|
| Similar agent exists? | [yes/no - if yes, consider tuning instead] |
| Relevant skills found? | [list skills to integrate] |
| Reusable patterns identified? | [list patterns to follow] |
| Proceed with new agent? | [yes with justification] |
```
- •Research Novel Domains (if unfamiliar):
  - •Use WebSearch for domain best practices
  - •Find authoritative sources and frameworks
  - •Document key methodologies the agent should follow
Deliverable: Research summary documenting similar agents, skills to integrate, and justification for new agent.
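The agent search in this step can also be scripted. The following is a minimal Python sketch of the same idea as the `grep -l` command; the function name `find_similar_agents` is hypothetical, and unlike plain `grep` it matches case-insensitively, which is an assumption about what is convenient here.

```python
from pathlib import Path


def find_similar_agents(agents_dir: str, keyword: str) -> list[str]:
    """Return names of .md files in agents_dir whose text contains keyword.

    Case-insensitive substring match; a missing directory yields an empty list.
    """
    root = Path(agents_dir).expanduser()
    if not root.is_dir():
        return []
    needle = keyword.lower()
    matches = []
    for path in sorted(root.glob("*.md")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            matches.append(path.name)
    return matches
```

Calling `find_similar_agents("~/.claude/agents", "legal")` would list candidate agents to review before deciding whether a new one is justified.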
Step 1: Temporal Awareness & Requirements Gathering (CRITICAL)
Objective: Establish current date context and understand what the agent needs to do.
1.1 Establish Temporal Context (REQUIRED)
Why this matters: Legal documents, contracts, compliance reports, and project documentation with incorrect dates create serious risks. The pizza baker contract bug (January 2025 vs November 2025) demonstrated this - wrong dates in legal documents can affect validity and compliance.
Implementation:
````markdown
## Phase 1: [Phase Name] & Temporal Awareness

**Objective**: [Phase goal]

**Actions**:
1. **Establish Temporal Context** (REQUIRED):
   ```bash
   CURRENT_DATE=$(date '+%Y-%m-%d')            # ISO 8601: 2025-11-06
   READABLE_DATE=$(date '+%B %d, %Y')          # Human: November 06, 2025
   TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S %Z')   # Full: 2025-11-06 12:34:56 EET
   ```
   - Use CURRENT_DATE for document metadata, version numbers
   - Use READABLE_DATE for human-readable headers
   - Use TIMESTAMP for detailed audit trails
2. [Other Phase 1 actions...]

**Deliverable**: [Concrete output]
````
Validation: The validate_agent.py script checks for the temporal awareness pattern in Phase 1.
1.2 Gather Requirements
Key Questions:
1. Problem Definition: What problem does this agent solve?
2. Domain Expertise: What specialized knowledge is needed?
3. Tool Requirements: Which tools will it need? (Read, Write, Edit, Bash, Grep, Glob, etc.)
4. Typical Workflow: What is the step-by-step process?
5. Success Metrics: How do we know it worked?
6. Edge Cases: What unusual situations must it handle?
Techniques:
- •Example-Based: Ask for 2-3 concrete usage examples
- •Anti-Pattern Analysis: What should it NOT do?
- •Boundary Testing: What are the limits (file size, complexity, scope)?
Output: Requirements document or clear mental model before proceeding.
Step 1.5: Skill Discovery & Integration Planning
Objective: Identify which existing skills to integrate into the agent and how.
Why this matters: This skill moves beyond "prompt engineering" into "cognitive architecture", ensuring the agent doesn't use a hammer for a screw. Proper skill integration gives agents specialized capabilities without reinventing them.
Actions:
- •Map Requirements to Skill Categories:
```markdown
| Agent Requirement | Skill Category | Candidate Skills |
|-------------------|----------------|------------------|
| Debugging logic | Reasoning | hypothesis-elimination, self-reflecting-chain |
| Security review | Development | security-analysis-skills, adversarial-reasoning |
| Documentation | Documentation | document-writing-skills |
| Database ops | Integration | chromadb-integration-skills |
| Testing | Development | testing-methodology-skills |
| Error handling | Development | error-handling-skills |
```
- •Evaluate Each Candidate Skill:
```markdown
| Skill | Size | Active? | Integrate or Inline? |
|-------|------|---------|---------------------|
| [skill-name] | [lines] | [yes/no] | [integrate/inline/skip] |
```
Decision Criteria:
- •Integrate if: Skill >100 lines, actively maintained, reusable
- •Inline if: Simple pattern <20 lines, agent-specific variant needed
- •Skip if: Not relevant after review
- •Document Skills Integration:
```markdown
**Skills Integration**: skill-1, skill-2, skill-3
```
This goes in the agent's header metadata.
- •Plan Skill Invocation Points:
```markdown
| Phase | When to Invoke | Skill |
|-------|----------------|-------|
| Phase 2 | Complex decision | integrated-reasoning-v2 |
| Phase 3 | Design validation | adversarial-reasoning |
| Phase 4 | Error recovery | hypothesis-elimination |
```
- •Check for Handover/Parallelism Needs:
  - •Will the agent need multi-pattern reasoning? → Add reasoning-handover-protocol
  - •Will tasks run in parallel? → Add parallel-execution skill
  - •See cognitive-skills/INTEGRATION_GUIDE.md for patterns
Deliverable: Skill integration plan with invocation points documented.
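The integrate/inline/skip criteria from this step can be written down as a small function. This is only a sketch: `skill_decision` and its parameters are hypothetical names, treating either inline condition as sufficient is an assumption, and so is the "review manually" fallback for skills between 20 and 100 lines, which the criteria do not cover.

```python
def skill_decision(line_count: int, maintained: bool, reusable: bool,
                   relevant: bool = True, needs_variant: bool = False) -> str:
    """Apply the integrate / inline / skip criteria to one candidate skill."""
    if not relevant:
        return "skip"
    # Integrate: larger than 100 lines, actively maintained, and reusable.
    if line_count > 100 and maintained and reusable:
        return "integrate"
    # Inline: a simple pattern under 20 lines, or an agent-specific variant
    # is needed (assumption: either condition is enough).
    if line_count < 20 or needs_variant:
        return "inline"
    # Assumption: the stated criteria leave this middle ground undefined.
    return "review manually"
```

The result for each candidate fills the last column of the evaluation table.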
Step 2: Architecture Design
Objective: Design the agent's phase structure, tool selection, and quality criteria.
2.1 Determine Agent Complexity
Decision Tree: Simple vs Complex Agent
Simple Agent (3 phases, <200 lines):
- •Single domain focus (e.g., PDF manipulation, CSV parsing)
- •Linear workflow (no branching)
- •Minimal state management
- •Examples: pdf-creator-agent, code-formatter
Complex Agent (4-5 phases, 200-250 lines):
- •Multiple operation modes (e.g., create, read, update)
- •Conditional branching or decision trees
- •State tracking across phases
- •Examples: legal-agent, ceo-orchestrator, agent-hr-manager
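As a rough illustration of the simple-versus-complex decision, the sketch below maps the three signals listed above to the corresponding size targets. The function and parameter names are hypothetical, and it assumes that any single complexity signal is enough to classify an agent as complex.

```python
def classify_agent(operation_modes: int, has_branching: bool,
                   tracks_state: bool) -> dict[str, str]:
    """Pick the simple or complex profile from three complexity signals."""
    # Assumption: one signal (multiple modes, branching, or state) suffices.
    is_complex = operation_modes > 1 or has_branching or tracks_state
    if is_complex:
        return {"kind": "complex", "phases": "4-5", "target_lines": "200-250"}
    return {"kind": "simple", "phases": "3", "target_lines": "<200"}
```

For example, a CSV parser with one mode and a linear workflow would come out as simple, while an agent with create/read/update modes would come out as complex.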
When to use integrated-reasoning-v2: 8+ decision dimensions, strategic importance, >90% confidence required
- •9 patterns available: ToT, BoT, SRC, HE, AR, DR, AT, RTR, NDF
- •11 scoring dimensions for pattern selection
- •See cognitive-skills/INTEGRATION_GUIDE.md for full integration patterns
2.2 Design Phase Structure
Guidelines (from agent-design-patterns.md):
- •3-5 phases optimal (2 too simple, 6+ too complex)
- •Each phase has ONE clear objective
- •Actions are SPECIFIC, not generic
- •Deliverables are CONCRETE artifacts
Phase Structure Template:
```markdown
## Phase N: [Descriptive Name]

**Objective**: [One sentence describing the goal]

**Actions**:
1. [Specific action with tool: "Use Grep to search for X pattern in Y files"]
2. [Specific action with tool: "Use Edit to modify lines 45-52 in config.yml"]
3. [Specific action with condition: "If errors found, use TodoWrite to track fixes"]

**Deliverable**: [Concrete output: "List of 5 validated regex patterns with test cases"]
```
Example from kaggle-leak-auditor:
- •Phase 1: Static Code Analysis → List of violations
- •Phase 2: Runtime Validation → Validation results
- •Phase 3: Report Generation → Audit report with recommendations
2.3 Select Tools
Common Tool Combinations:
- •File analysis: Read, Grep, Glob
- •Code modification: Read, Edit, Write
- •Research: WebSearch, WebFetch, Read
- •Execution: Bash, TodoWrite, Read
- •Complex tasks: Task (invoke other agents)
Tool Selection Criteria:
- •Minimal set: Only include tools actually used in phases
- •Specific over general: Edit > Write for modifications
- •Composed workflows: Grep to find, Read to analyze, Edit to modify
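The "minimal set" criterion can be spot-checked mechanically. The sketch below is one possible approach, not the logic of validate_agent.py: it reads the `tools:` line from the frontmatter and reports tools that never appear as a whole word in the body. The function name is hypothetical, and because tool names such as Read are also ordinary words, the check can miss an unused tool.

```python
import re


def unused_tools(agent_markdown: str) -> list[str]:
    """Return frontmatter tools that never appear as a whole word in the body."""
    # Split the document into frontmatter (between --- lines) and body.
    match = re.match(r"---\n(.*?)\n---\n(.*)", agent_markdown, re.DOTALL)
    if not match:
        return []
    frontmatter, body = match.groups()
    tools_line = re.search(r"^tools:\s*(.+)$", frontmatter, re.MULTILINE)
    if not tools_line:
        return []
    tools = [t.strip() for t in tools_line.group(1).split(",") if t.strip()]
    # Case-sensitive whole-word search, so "Read" does not match "already".
    return [t for t in tools if not re.search(rf"\b{re.escape(t)}\b", body)]
```

Any tool the function returns is a candidate for removal from the frontmatter.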
2.4 Define Success Criteria (10-16 items)
Categories:
- •Phase Deliverables (3-5 items): "✅ Phase 1 violations list complete with severity scores"
- •Quality Gates (2-3 items): "✅ All findings validated with evidence"
- •Confidence (1 item): "✅ Confidence level >85% with clear reasoning"
- •Documentation (2-3 items): "✅ Report includes examples and references"
- •Edge Cases (2-3 items): "✅ Handled missing files gracefully"
- •Temporal (1 item): "✅ Document dated with current date"
Format:
```markdown
## Success Criteria

- ✅ Temporal awareness established in Phase 1
- ✅ Phase 1 deliverable: [specific output]
- ✅ Phase 2 deliverable: [specific output]
- ✅ All files created/modified successfully
- ✅ Quality validation passed with score ≥70/80
- ✅ Confidence level >85% with supporting evidence
- ✅ Edge cases documented and handled
- ✅ Reference documentation created (if using progressive disclosure)

[10-16 total items]
```
2.5 Design Self-Critique (6-10 questions)
Question Categories:
- •Completeness: "Did I check all [domain-specific items]?"
- •Confidence: "What is my confidence level? Why?"
- •Assumptions: "What assumptions did I make?"
- •False Positives: "Could [finding X] be wrong? How?"
- •False Negatives: "What might I have missed?"
- •Verification: "How can user verify this?"
- •Temporal: "Did I use current date correctly?"
Format:
```markdown
## Self-Critique

1. **Domain Accuracy**: Did I correctly apply [domain] expertise?
2. **Tool Selection**: Did I use optimal tools for each task?
3. **Edge Cases**: Did I handle errors and failures gracefully?
4. **Temporal Accuracy**: Did I establish current date in Phase 1?
5. **Confidence Basis**: What evidence supports my confidence level?
6. **Assumptions**: What assumptions should the user validate?

[6-10 total questions]
```
2.6 Define Confidence Thresholds
Three-Tier System:
```markdown
## Confidence Thresholds

- **High (85-95%)**: [Specific conditions: "All criteria met, deliverables complete, tests passed"]
- **Medium (70-84%)**: [Conditions: "Most criteria met, minor issues present, acceptable quality"]
- **Low (<70%)**: [Conditions: "Significant issues, incomplete work - continue working"]
```
Domain-Specific Examples:
- •Code analysis: Based on test coverage, execution traces
- •Legal: Based on citation verification, precedent alignment
- •Research: Based on source quality, corroboration
- •Debugging: Based on reproduction success, log evidence
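The three tiers translate directly into a lookup. In this sketch the function name is hypothetical, and values above 95% are treated as high, which is an assumption because the template only names the 85-95% band.

```python
def confidence_tier(percent: float) -> str:
    """Map a confidence percentage to the high / medium / low tier."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 85:
        # Assumption: anything above 95 is still reported as high.
        return "high"
    if percent >= 70:
        return "medium"
    # Below 70 the template says to continue working.
    return "low"
```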
Step 3: Implementation
Objective: Write the agent definition file following v2 architecture.
3.1 Create Agent Frontmatter
Template:
```yaml
---
name: agent-name
description: Clear one-sentence description. Use when [specific trigger conditions]. Examples: [concrete user questions].
tools: Read, Write, Edit, Bash, Grep, Glob, TodoWrite
model: claude-sonnet-4-5
color: blue
---
```
Guidelines:
- •name: Hyphen-case (my-agent-name), <40 chars
- •description: Include WHEN to use + example questions
- •tools: Only list tools actually used in phases
- •model: Usually claude-sonnet-4-5 (use opus for complex reasoning)
- •color: blue/green/purple/gold/red for visual grouping
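The name and color guidelines are easy to check before saving the file. The sketch below is illustrative only: `frontmatter_problems` is a hypothetical helper, allowing digits in names is an assumption, and so is treating the five listed colors as a closed set.

```python
import re

# Lowercase words (digits allowed, an assumption) joined by single hyphens.
HYPHEN_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
# Assumption: the colors named in the guidelines are the only valid ones.
ALLOWED_COLORS = {"blue", "green", "purple", "gold", "red"}


def frontmatter_problems(name: str, color: str) -> list[str]:
    """Return a list of guideline violations for the name and color fields."""
    problems = []
    if not HYPHEN_CASE.match(name):
        problems.append("name is not hyphen-case")
    if len(name) >= 40:
        problems.append("name must be under 40 characters")
    if color not in ALLOWED_COLORS:
        problems.append(f"unexpected color: {color}")
    return problems
```

An empty list means both fields follow the guidelines.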
3.2 Write Agent Opening
Structure:
```markdown
# Agent Name

**Purpose**: [1-2 sentences on what this agent does]

**Core Responsibilities**:
1. [Responsibility 1 with domain context]
2. [Responsibility 2 with domain context]
3. [Responsibility 3 with domain context]
[3-7 items total]

**Specialized Knowledge** (if applicable):
- Domain-specific terminology
- Technical constraints
- Industry standards
```
3.3 Add Decision Tree (if multi-mode)
When to include: Agent operates in different modes or scenarios
Template:
```markdown
## Decision Tree: [What to Decide]

When tasked with [type of request], first determine the appropriate [mode/type]:

**Mode A** - Use when:
- [Condition 1]
- [Condition 2]
- User asks "[example question]"
→ Follow Phase 1A-2A workflow

**Mode B** - Use when:
- [Condition 1]
- [Condition 2]
- User asks "[example question]"
→ Follow Phase 1B-2B workflow
```
3.4 Implement Phases (from Step 2.2)
Critical: First phase MUST include temporal awareness pattern.
3.5 Add Success Criteria, Self-Critique, Confidence (from Step 2.4-2.6)
3.6 Consider Progressive Disclosure
When to extract to references:
- •Agent would exceed 250 lines with inline details
- •Has extensive pattern catalogs (3+ detailed patterns)
- •Includes large lookup tables or reference data
- •Contains detailed code examples (>30 lines)
What to extract:
- •Detailed code examples
- •Technical deep-dives
- •Edge case handling details
- •Reference lookup tables
Reference in main agent:
```markdown
## Pattern Detection

**Reference Documentation**: `~/.claude/agents-library/refs/[agent]-patterns.md`

**Key patterns** (see reference for details):
1. Pattern A (CRITICAL)
2. Pattern B (WARNING)
3. Pattern C (INFO)
```
Line Count Targets:
- •Main agent: 150-250 lines (ideal: 200)
- •Reference docs: 200+ lines (no limit)
Step 4: Quality Validation
Objective: Score agent quality using 0-80 rubric and iterate if needed.
4.1 Use Automated Validation
Run validate_agent.py:
```bash
~/.claude/skills/agent-creator/scripts/validate_agent.py /path/to/agent.md
```
Output:
```
Quality Score: 72/80 (Excellent)

Phase Structure: 15/15 ✅
Success Criteria: 14/15 ⚠️ (Missing 1 criterion)
Self-Critique: 10/10 ✅
Progressive Disclosure: 8/10 ⚠️ (232 lines, close to limit)
Tool Usage: 10/10 ✅
Documentation: 5/10 ❌ (Missing examples)
Edge Case Handling: 10/10 ✅

Recommendations:
- Add 1 more success criterion (target: 10-16)
- Add usage examples for better documentation
```
Scoring Rubric:
- •70-80: Excellent - production ready
- •60-69: Good - minor improvements needed
- •50-59: Fair - significant improvements needed
- •<50: Poor - major refactoring required
See references/quality-rubric-explained.md for detailed breakdown.
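To make the arithmetic behind the sample output explicit, the sketch below sums the seven category scores shown above and maps the total to a rubric band. The names `SAMPLE_CATEGORY_SCORES` and `rating_for` are hypothetical and are not taken from validate_agent.py.

```python
# Category scores copied from the sample validator output above.
SAMPLE_CATEGORY_SCORES = {
    "Phase Structure": 15,
    "Success Criteria": 14,
    "Self-Critique": 10,
    "Progressive Disclosure": 8,
    "Tool Usage": 10,
    "Documentation": 5,
    "Edge Case Handling": 10,
}


def rating_for(score: int) -> str:
    """Map a 0-80 quality score to its rubric band."""
    if not 0 <= score <= 80:
        raise ValueError("score must be between 0 and 80")
    if score >= 70:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"


total = sum(SAMPLE_CATEGORY_SCORES.values())
print(f"{total}/80 ({rating_for(total)})")  # prints "72/80 (Excellent)"
```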
4.2 Manual Review Checklist
Even with automated scoring, manually verify:
- [ ] Temporal awareness in Phase 1 with REQUIRED label
- [ ] All tools in frontmatter are actually used in phases
- [ ] Success criteria are specific and measurable (not vague)
- [ ] Self-critique questions are domain-specific (not generic)
- [ ] Confidence thresholds have concrete conditions
- [ ] Examples demonstrate real usage (if included)
- [ ] No spelling errors in critical sections
- [ ] Markdown formatting is valid
4.3 Iterate if Score <70
Common improvements:
- •Add edge case handling (+10 pts): Document error conditions
- •Improve documentation (+5-10 pts): Add examples, clarify instructions
- •Refine success criteria (+3-5 pts): Make more specific and measurable
- •Progressive disclosure (+5-10 pts): Extract details to references if >250 lines
Iterate until score ≥70 or diminishing returns.
Step 5: Deployment
Objective: Deploy agent to appropriate location(s) and verify availability.
5.1 Determine Deployment Target(s)
Global Library (~/.claude/agents-library/):
- •Persistent across all projects
- •Available to all Claude Code instances
- •Use for: Reusable agents (research, code formatting, validation)
Local Project (.claude/agents/):
- •Project-specific
- •Version controlled with project
- •Use for: Domain-specific agents (this project's business logic)
Both: Deploy to global first, copy to local if project needs it
5.2 Deploy Agent
To Global Library:
```bash
cp /path/to/my-agent.md ~/.claude/agents-library/my-agent.md
```
To Local Project:
```bash
cp /path/to/my-agent.md ./.claude/agents/my-agent.md
```
With References:
```bash
# Deploy agent
cp my-agent.md ~/.claude/agents-library/

# Deploy reference doc
cp my-agent-patterns.md ~/.claude/agents-library/refs/
```
5.3 Verify Availability
Restart Claude Code to load new agent.
Test invocation:
"[Agent Name], help me with [typical task]"
Check agent registry (if using CEO orchestrator):
- •Update CEO's worker agent registry if this is a new operational agent
- •Add estimated duration based on similar agents
Decision Trees
Decision Tree 1: Create New Agent vs Extend Existing
Create New Agent when:
- •New domain/expertise area (e.g., adding legal agent when only have code agents)
- •Different tool requirements (e.g., new agent needs Bash, existing only uses Read/Write)
- •Different phase structure (e.g., new agent has 5 phases, existing has 3)
- •User explicitly requests new agent
Extend Existing Agent when:
- •Same domain, just adding capabilities (e.g., PDF agent adding form-filling)
- •Same tool set, similar workflow
- •Agent currently <200 lines (room to grow)
- •Change is backward compatible
Create New + Deprecate Old when:
- •Fundamental architecture change (v1 → v2)
- •Existing agent has quality score <40
- •Existing agent >300 lines and unmaintainable
Decision Tree 2: When to Use Cognitive Reasoning Patterns
Use integrated-reasoning-v2 (meta-orchestrator) when:
- •8+ decision dimensions (architecture, tools, phases, quality, deployment, etc.)
- •Strategic importance (affects multiple projects, long-term impact)
- •Uncertain which reasoning pattern is best for the problem
- •High confidence required (>90%, mission-critical)
- •Complex trade-offs (performance vs accuracy, simplicity vs power)
Direct pattern selection (skip meta-orchestrator):
- •Diagnosis/debugging → Use hypothesis-elimination (HE)
- •Security review → Use adversarial-reasoning (AR)
- •Trade-off resolution → Use dialectical-reasoning (DR)
- •Novel problem → Use analogical-transfer (AT)
- •Time pressure → Use rapid-triage-reasoning (RTR)
- •Stakeholder coordination → Use negotiated-decision-framework (NDF)
Use tree-of-thoughts when:
- •Clear evaluation criteria exist
- •Need single best solution
- •Medium complexity (4-7 dimensions)
Use breadth-of-thought when:
- •Solution space unknown
- •Need to explore all options
- •Multiple valid approaches
Use self-reflecting-chain when:
- •Sequential dependencies
- •Need step-by-step validation
- •Logical reasoning with backtracking
Use direct implementation when:
- •Simple agent (3 phases)
- •Well-understood domain
- •Similar agents exist as templates
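Part of this decision tree reduces to a lookup table. The sketch below is a deliberate simplification: the dictionary keys and the function name are hypothetical labels, and the fallback uses only the dimension count, ignoring the other conditions listed for tree-of-thoughts, breadth-of-thought and self-reflecting-chain.

```python
# Direct pattern selection, copied from the list above (keys are hypothetical labels).
DIRECT_PATTERNS = {
    "diagnosis": "hypothesis-elimination",
    "security review": "adversarial-reasoning",
    "trade-off resolution": "dialectical-reasoning",
    "novel problem": "analogical-transfer",
    "time pressure": "rapid-triage-reasoning",
    "stakeholder coordination": "negotiated-decision-framework",
}


def choose_pattern(problem_type: str, decision_dimensions: int = 0) -> str:
    """Pick a reasoning pattern from the problem type, else from dimension count."""
    key = problem_type.strip().lower()
    if key in DIRECT_PATTERNS:
        return DIRECT_PATTERNS[key]
    if decision_dimensions >= 8:
        return "integrated-reasoning-v2"
    if 4 <= decision_dimensions <= 7:
        # Assumption: medium complexity alone is taken to imply tree-of-thoughts.
        return "tree-of-thoughts"
    return "direct implementation"
```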
Common Mistakes to Avoid
See references/common-mistakes.md for detailed analysis. Top 5 pitfalls:
1. Missing Temporal Awareness ❌
Mistake: Forgetting to check current date in Phase 1
Impact: Documents with wrong dates (legal/compliance risk)
Fix: Always include temporal awareness with REQUIRED label in Phase 1
2. Vague Success Criteria ❌
Mistake: "✅ Agent works correctly" (not measurable)
Impact: Can't validate agent actually succeeded
Fix: "✅ Generated report includes 5 sections: summary, findings, evidence, recommendations, confidence score"
3. Generic Self-Critique ❌
Mistake: "Did I do a good job?" (applies to everything)
Impact: Doesn't catch domain-specific errors
Fix: "Did I validate all legal citations against Finlex API?" (domain-specific)
4. Tool Overload ❌
Mistake: Listing 10+ tools in frontmatter when only 3 are used
Impact: Confusing, suggests agent does more than it does
Fix: Only list tools actually referenced in phase actions
5. No Edge Case Handling ❌
Mistake: Only implementing "happy path"
Impact: Agent fails on unexpected inputs, errors not handled gracefully
Fix: Add "Edge Cases" section, document what to do when things go wrong
Using validate_agent.py
The validation script provides automated quality scoring:
Basic Usage:
```bash
~/.claude/skills/agent-creator/scripts/validate_agent.py ~/.claude/agents-library/my-agent.md
```
Output Interpretation:
- •70-80: Ship it! Excellent quality
- •60-69: Almost there, minor fixes
- •50-59: Needs work, iterate
- •<50: Major refactoring required
What it checks:
- •Phase structure (3-5 phases, clear objectives, deliverables)
- •Success criteria (10-16 items, specific)
- •Self-critique (6-10 questions, domain-specific)
- •Progressive disclosure (150-250 line target)
- •Tool usage (tools in frontmatter match phase usage)
- •Documentation (examples, references)
- •Edge case handling (documented error scenarios)
- •Temporal awareness (REQUIRED in Phase 1)
See references/quality-rubric-explained.md for scoring details.
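For intuition about what such checks involve, here is a minimal sketch of four structural checks using regular expressions. It is not the implementation of validate_agent.py, whose internals are not shown in this guide; the function name, the heading pattern (`## Phase N:`) and the checkmark bullet pattern are assumptions based on the templates in this skill.

```python
import re


def quick_checks(agent_markdown: str) -> dict[str, bool]:
    """Run four structural checks against an agent definition."""
    # Phase headings such as "## Phase 1:" or "## Phase 1A:".
    phases = re.findall(r"^## Phase \d+\w*:", agent_markdown, re.MULTILINE)
    # Success criteria written as "- ✅ ..." bullets.
    criteria = re.findall(r"^- ✅", agent_markdown, re.MULTILINE)
    line_count = len(agent_markdown.splitlines())
    # Text of Phase 1 only: from its heading up to the next "## " heading.
    first_phase = re.search(
        r"^## Phase 1\b.*?(?=^## |\Z)", agent_markdown, re.MULTILINE | re.DOTALL
    )
    temporal = bool(
        first_phase
        and "Temporal" in first_phase.group(0)
        and "REQUIRED" in first_phase.group(0)
    )
    return {
        "phase_count_ok": 3 <= len(phases) <= 5,
        "criteria_count_ok": 10 <= len(criteria) <= 16,
        "line_count_ok": 150 <= line_count <= 250,
        "temporal_in_phase_1": temporal,
    }
```

Each key mirrors one item from the list above; a real validator would also score self-critique, tool usage, documentation and edge cases.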
Reference Documentation
This skill includes detailed reference documentation:
references/agent-examples.md: Annotated examples of high-quality agents
- •legal-agent (264 lines, progressive disclosure, 68/80 quality)
- •ceo-orchestrator (244 lines, integrated-reasoning integration)
- •agent-hr-manager (748 lines, meta-agent patterns)
references/quality-rubric-explained.md: Deep-dive on 0-80 scoring system
- •Detailed breakdown of each category
- •Examples of excellent vs poor implementations
- •How to improve scores in each area
references/common-mistakes.md: Anti-pattern catalog
- •10 most common agent creation mistakes
- •Real examples from production agents
- •How to detect and fix each mistake
references/temporal-awareness-deep.md: Why temporal awareness matters
- •Legal/compliance risks of wrong dates
- •The pizza baker contract bug case study
- •Implementation patterns and validation
Quick Start Examples
Example 1: Simple Agent (CSV to Markdown Converter)
Requirements: Convert CSV files to markdown tables
Architecture:
- •3 phases (Parse CSV → Format Table → Output Markdown)
- •Tools: Read, Write, Bash
- •<200 lines, no progressive disclosure needed
Key Decisions:
- •Simple agent (linear workflow)
- •No decision tree (single mode)
- •Success criteria: 10 items
- •Self-critique: 6 questions
Implementation time: ~20 minutes
Expected quality score: 63-70/80
Example 2: Complex Agent (Multi-Language Legal Compliance Checker)
Requirements: Check code/documents for GDPR, Finnish, and EU law compliance
Architecture:
- •5 phases (Temporal + Scan → Finnish Law → EU Law → Cross-Reference → Report)
- •Tools: Read, Bash, Grep, WebFetch, Task (for legal-agent)
- •220 lines with references/legal-patterns.md (150 lines)
Key Decisions:
- •Complex agent (multi-jurisdiction)
- •Decision tree (document type: code vs contracts vs policies)
- •Success criteria: 14 items
- •Self-critique: 8 questions
- •Uses integrated-reasoning for cross-jurisdiction conflicts
Implementation time: ~2 hours
Expected quality score: 72-80/80
Summary: 6-Step Workflow
- •Step 0: Research Existing Patterns → Similar agents, skills to integrate, justification for a new agent
- •Step 1: Temporal Awareness & Requirements → Current date + clear problem definition (Step 1.5 adds the skill integration plan)
- •Step 2: Architecture Design → Phases, tools, success criteria, self-critique, confidence
- •Step 3: Implementation → Write agent following v2 patterns (150-250 lines)
- •Step 4: Quality Validation → Score with validate_agent.py (target: ≥70/80)
- •Step 5: Deployment → Copy to global library and/or local project
Validation checkpoint: Run validate_agent.py before deploying!
Meta: This skill was designed using integrated-reasoning (94% confidence) to synthesize patterns from agent-design-patterns.md and 17 production v2 agents.