BAT - Bot Acceptance Testing
Bot acceptance testing validates that MCP tools work correctly from a real AI agent's perspective. You design test scenarios dynamically, run them via tests/uat/run_uat.py, and evaluate results.
When to Use BAT
- •PR validation: Test that tool changes work correctly from an agent's perspective
- •Regression detection: Compare behavior between branches
- •Integration verification: Ensure MCP tools work end-to-end with real agent CLIs
Workflow
- •Analyze the change: Read the diff, identify which tools are affected
- •Design scenario: Generate a scenario JSON with setup/test/teardown prompts
- •Run the script: Pipe the scenario to
python tests/uat/run_uat.py - •Evaluate summary: Check
all_passedper agent. If true, you're done. - •Dig deeper on failure: Read
results_filefor full output, stderr, raw JSON - •Regression check: If test fails, re-run with
--branch masterto compare
Output Structure
The runner returns a concise summary to stdout (saves context when all passes):
{
"results_file": "/tmp/bat_results_abc123.json",
"agents": {
"gemini": {
"all_passed": true,
"test": {
"completed": true,
"duration_ms": 8100,
"exit_code": 0,
"num_turns": 5,
"tool_stats": { "totalCalls": 4, "totalSuccess": 4, "totalFail": 0 }
},
"aggregate": {
"total_duration_ms": 15300,
"total_turns": 12,
"total_tool_calls": 9,
"total_tool_success": 9,
"total_tool_fail": 0
}
}
}
}
- •Phase stats:
num_turns,tool_stats(per phase) for fine-grained comparison - •Aggregate stats: Total counts across all phases for overall efficiency comparison
- •On failure: also includes
outputandstderrfor diagnosis - •Full results: raw JSON, complete output always available at
results_file
Scenario Design Guidelines
- •setup_prompt: Create any entities/state the test needs
- •test_prompt: Exercise the tools being tested, ask the agent to report results clearly
- •teardown_prompt: Clean up created entities
- •Keep prompts focused - each scenario tests ONE behavior
- •Ask the agent to report: what succeeded, what failed, any unexpected behavior
Example: Testing Error Signaling
cat <<'EOF' | python tests/uat/run_uat.py --agents gemini
{
"setup_prompt": "Create a test automation called 'bat_error_test' with a time trigger at 23:59 and action to turn on light.bed_light.",
"test_prompt": "Try to get automation 'automation.nonexistent_xyz'. Report if the tool signaled an error or returned a normal response. Then get automation 'automation.bat_error_test' and report its structure.",
"teardown_prompt": "Delete automation 'bat_error_test' if it exists."
}
EOF
Regression Comparison Workflow
Full BAT comparison (recommended):
- •Pull latest master:
git fetch origin master && git checkout master && git pull - •Run on master: Save scenario to file, run and save results
- •Switch to branch:
git checkout feat/my-branch - •Run on branch: Run same scenario, compare stats
Compare these metrics:
- •Task completion: Did both pass? Any new failures?
- •Accuracy: Check agent output quality - did it understand the task correctly?
- •Efficiency: Compare
aggregate.total_tool_calls,aggregate.total_turns,aggregate.total_duration_ms - •Variation testing: Ask the same task in different ways to test robustness
Quick comparison (single command):
# Test the PR branch
echo '{"test_prompt":"..."}' | python tests/uat/run_uat.py --branch feat/tool-errors --agents gemini
# Compare against master
echo '{"test_prompt":"..."}' | python tests/uat/run_uat.py --branch master --agents gemini
Cost Awareness
Each scenario invocation costs API credits (one per agent per phase). Design scenarios efficiently:
- •Combine related checks in a single test_prompt when possible
- •Only use setup/teardown when the test needs specific state
- •Start with one agent, expand to both only when cross-agent comparison matters
Handling Arguments
When /bat is invoked with arguments:
If arguments contain a scenario description, generate the JSON scenario and run it:
/bat test automation create with sunrise trigger then modify to sunset
→ Generate appropriate scenario JSON and execute
If --help or no arguments, show this help text.
Otherwise, treat $ARGUMENTS as instructions for what to test and design+run the scenario accordingly.
Full Documentation
For complete CLI reference and output format, see tests/uat/README.md.