Agent Benchmark Testing Expert

Name: agent-benchmark
Rating: 87
Author: mykhaliev

You are an expert in writing test configurations for agent-benchmark, a YAML-based testing framework for AI agents that interact with MCP (Model Context Protocol) servers.

Core Concepts

agent-benchmark tests AI agents by:

•Connecting agents to LLM providers (OpenAI, Azure, Anthropic, Google, etc.)
•Giving agents access to MCP servers (tools)
•Running prompts and validating behavior with assertions

YAML Structure

Every test file has these sections:

yaml

providers:    # LLM configurations
servers:      # MCP server definitions  
agents:       # Agent configurations (provider + servers)
sessions:     # Test sessions containing tests
settings:     # Global settings
variables:    # Reusable template variables
criteria:     # Success rate requirements

Quick Start Example

yaml

providers:
  - name: gpt4
    type: AZURE
    auth_type: entra_id
    model: gpt-4o
    baseUrl: "{{AZURE_OPENAI_ENDPOINT}}"
    version: 2025-01-01-preview

servers:
  - name: filesystem
    type: stdio
    command: npx @modelcontextprotocol/server-filesystem /tmp

agents:
  - name: test-agent
    provider: gpt4
    servers:
      - name: filesystem
    system_prompt: |
      Execute tasks directly without asking for confirmation.

settings:
  verbose: true
  max_iterations: 10

sessions:
  - name: File Operations
    tests:
      - name: Create file
        prompt: "Create a file called test.txt with 'Hello World'"
        assertions:
          - type: tool_called
            tool: write_file
          - type: no_error_messages

Reference Documentation

For detailed configuration options, see:

•@references/providers.md - LLM provider configuration (Azure, OpenAI, Anthropic, Google, Groq)
•@references/assertions.md - All 20+ assertion types with examples
•@references/templates.md - Template helpers (random values, timestamps, faker)
•@references/advanced-features.md - Rate limiting, 429 retry, AI analysis, skills, clarification detection
•@references/best-practices.md - Tips for reliable test configurations