
agent-test

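Analyze behavioral test results for the aviation agent planner and apply the improvement workflow. Use when running planner tests, adding test cases to planner_test_cases.json, analyzing test failures, improving planner prompts, validating planner improvements, or working in the tests/aviation_agent/ test infrastructure.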

SKILL.md
---
name: agent-test
description: >
  Analyze aviation agent planner behavioral test results and apply improvement workflow.
  Use when: running planner tests, adding test cases to planner_test_cases.json,
  analyzing test failures, improving planner prompts, validating planner improvements,
  or working with tests/aviation_agent/ test infrastructure.
allowed-tools: Read, Edit, Write, Bash, Glob, Grep
---

Aviation Agent Test Improvement

Analyze planner behavioral test results and systematically enhance tool definitions, planner prompts, and test coverage.

Critical: Use Existing Infrastructure Only

DO NOT CREATE NEW SCRIPTS. All infrastructure exists:

  • Test Runner: tests/aviation_agent/test_planner_behavior.py
  • Test Cases: tests/aviation_agent/fixtures/planner_test_cases.json
  • CSV Results: tests/aviation_agent/results/

Quick Start

  1. Read existing test cases from tests/aviation_agent/fixtures/planner_test_cases.json
  2. Generate new test cases and append to JSON file
  3. Run tests:
    ```bash
    source ./venv/bin/activate
    export $(cat web/server/.env | grep -v '^#' | xargs)
    RUN_PLANNER_BEHAVIOR_TESTS=1 python -m pytest tests/aviation_agent/test_planner_behavior.py -v
    ```
  4. Report CSV file path and summary metrics
  5. If failures exist, analyze CSV and propose code changes

Three-Agent Workflow

For detailed instructions on each agent, see:

Tool Selection Patterns

| Query Type | Tool | Key Arguments |
|---|---|---|
| Routes (from X to Y) | find_airports_near_route | from_location, to_location, filters |
| Near location | find_airports_near_location | location_query, filters |
| Airport details | get_airport_details | icao_code |
| Country search | search_airports | query, filters |
| Notification requirements | get_notification_for_airport | icao, day_of_week |
| Rules question (ONE country) | answer_rules_question | country_code, question, tags |
| Rules browsing (list all) | browse_rules | country_code, tags, offset, limit |
| Rules comparison (2+ countries) | compare_rules_between_countries | countries, tags, category |
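
The table can also be read as a lookup from tool name to its key arguments. The following is a minimal sketch for illustration only, not a script to add to the repository. `KEY_ARGUMENTS` and `unexpected_arguments` are hypothetical names, and the real tool definitions in shared/aviation_agent/tools.py may accept arguments beyond those listed here.

```python
# Key arguments per tool, copied from the Tool Selection Patterns table.
# Assumption: the table is not necessarily the full signature of each tool.
KEY_ARGUMENTS = {
    "find_airports_near_route": {"from_location", "to_location", "filters"},
    "find_airports_near_location": {"location_query", "filters"},
    "get_airport_details": {"icao_code"},
    "search_airports": {"query", "filters"},
    "get_notification_for_airport": {"icao", "day_of_week"},
    "answer_rules_question": {"country_code", "question", "tags"},
    "browse_rules": {"country_code", "tags", "offset", "limit"},
    "compare_rules_between_countries": {"countries", "tags", "category"},
}


def unexpected_arguments(tool: str, arguments: dict) -> set:
    """Return argument names that the table does not list for `tool`."""
    if tool not in KEY_ARGUMENTS:
        raise KeyError(f"unknown tool: {tool}")
    return set(arguments) - KEY_ARGUMENTS[tool]


# get_airport_details takes icao_code, while get_notification_for_airport takes icao.
print(unexpected_arguments("get_airport_details", {"icao": "EGTF"}))  # {'icao'}
```

Mixing up `icao_code` and `icao` between the two airport tools is the kind of mismatch this check surfaces when drafting `expected_arguments` by hand.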

Available Filters

| Filter | Type | Description |
|---|---|---|
| fuel_type | 'avgas' or 'jet_a' | Preferred over legacy has_avgas/has_jet_a |
| has_avgas | boolean | Legacy - still works |
| has_jet_a | boolean | Legacy - still works |
| has_hard_runway | boolean | Paved/hard surface runways |
| has_procedures | boolean | IFR procedures available |
| point_of_entry | boolean | Customs/border crossing |
| country | string | ISO-2 country code |
| min_runway_length_ft | number | Minimum runway length |
| max_runway_length_ft | number | Maximum runway length |
| max_landing_fee | number | Maximum landing fee |
| max_hours_notice | number | Notification requirements |
| hotel | boolean | On-site hotel |
| restaurant | boolean | On-site restaurant |
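
When writing the `filters` object of a test case, the types in this table can be checked mechanically. The sketch below is illustrative only and is not part of the test infrastructure. `filter_problems` and the three constants are hypothetical names, and the two-character check for `country` is an assumption drawn from the "ISO-2" description.

```python
# Filter names grouped by the type given in the Available Filters table.
BOOLEAN_FILTERS = {
    "has_avgas", "has_jet_a", "has_hard_runway", "has_procedures",
    "point_of_entry", "hotel", "restaurant",
}
NUMERIC_FILTERS = {
    "min_runway_length_ft", "max_runway_length_ft",
    "max_landing_fee", "max_hours_notice",
}
FUEL_TYPES = {"avgas", "jet_a"}


def filter_problems(filters: dict) -> list:
    """Return one message per filter whose name or value does not fit the table."""
    problems = []
    for key, value in filters.items():
        if key in BOOLEAN_FILTERS:
            ok = isinstance(value, bool)
        elif key in NUMERIC_FILTERS:
            # bool is a subclass of int in Python, so exclude it explicitly.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        elif key == "fuel_type":
            ok = isinstance(value, str) and value in FUEL_TYPES
        elif key == "country":
            # Assumption: an ISO-2 code is a two-character string.
            ok = isinstance(value, str) and len(value) == 2
        else:
            problems.append(f"{key}: not a listed filter")
            continue
        if not ok:
            problems.append(f"{key}: unexpected value {value!r}")
    return problems


print(filter_problems({"fuel_type": "avgas", "min_runway_length_ft": 2000}))  # []
```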

Test Case Format

```json
{
  "question": "User question in natural language",
  "expected_tool": "tool_name_from_manifest",
  "expected_arguments": {
    "arg1": "value1",
    "filters": { "filter_key": true }
  },
  "description": "Why this tool/args combination is correct"
}
```
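
As a concrete illustration, the sketch below builds one test case in this format for a route query and serializes it. The question text, the ICAO codes, and the assumption that the location arguments accept ICAO codes are all illustrative and not taken from the fixture file. Check existing entries in planner_test_cases.json for the conventions actually in use before appending.

```python
import json

# Hypothetical example entry; the values are assumptions, not copied from the fixtures.
test_case = {
    "question": "Which airports between EGTF and LFMD have avgas?",
    "expected_tool": "find_airports_near_route",
    "expected_arguments": {
        "from_location": "EGTF",
        "to_location": "LFMD",
        # fuel_type is preferred over the legacy has_avgas flag.
        "filters": {"fuel_type": "avgas"},
    },
    "description": "Route query (from X to Y) with a fuel filter",
}

# Every test case carries these four top-level keys.
REQUIRED_KEYS = {"question", "expected_tool", "expected_arguments", "description"}
missing = REQUIRED_KEYS - set(test_case)
assert not missing, f"missing keys: {missing}"

# Serialize in the JSON form shown in the template.
print(json.dumps(test_case, indent=2))
```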

Critical Rules

  1. NEVER create new test scripts - Use existing test_planner_behavior.py
  2. NEVER create analysis scripts - Read CSV files directly
  3. ALWAYS edit existing files - Append to planner_test_cases.json
  4. ALWAYS use venv - source ./venv/bin/activate
  5. ALWAYS load environment - export $(cat web/server/.env | grep -v '^#' | xargs)
  6. ALWAYS run tests and report - Print CSV path and summary metrics

Output Format

After Running Tests

```
Tests completed
Results saved to: tests/aviation_agent/results/planner_test_results_YYYYMMDD_HHMMSS.csv

Summary:
- Total tests: 21
- Passed: 21 (100%)
- Failed: 0 (0%)
- Tool match: 21/21 (100%)
- Args match: 21/21 (100%)
```
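
The percentages in this summary are each count divided by the total number of tests. The sketch below only illustrates that arithmetic and the layout; it is not an analysis script, and the counts are still read directly from the CSV. `format_summary` is a hypothetical name, and rounding to whole percentages is an assumption.

```python
def format_summary(total: int, passed: int, tool_match: int, args_match: int) -> str:
    """Render the Summary block from counts read off the results CSV."""

    def pct(n: int) -> str:
        # Guard against an empty run so we never divide by zero.
        return f"{round(100 * n / total)}%" if total else "n/a"

    failed = total - passed
    lines = [
        "Summary:",
        f"- Total tests: {total}",
        f"- Passed: {passed} ({pct(passed)})",
        f"- Failed: {failed} ({pct(failed)})",
        f"- Tool match: {tool_match}/{total} ({pct(tool_match)})",
        f"- Args match: {args_match}/{total} ({pct(args_match)})",
    ]
    return "\n".join(lines)


print(format_summary(total=21, passed=21, tool_match=21, args_match=21))
```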

Key Files

  • Test Cases: tests/aviation_agent/fixtures/planner_test_cases.json
  • Test Runner: tests/aviation_agent/test_planner_behavior.py
  • Planner Prompt: shared/aviation_agent/planning.py
  • Tool Definitions: shared/aviation_agent/tools.py
  • Formatter: shared/aviation_agent/formatting.py