Skill: Test Writing

Purpose

Design and generate a validation test suite that assesses conceptual soundness, implementation correctness, numerical stability, and outcome reasonableness.

This skill converts identified model risks into executable tests.

Inputs

Required IR fields:

  • methodology outputs
  • ALW outputs
  • code evidence snippets

Skill data inputs:

  • test_matrix.yaml (required test categories and patterns)

Outputs

  • A test plan matrix (test name, purpose, category)
  • Generated pytest files with executable tests
  • Dataset requests (schema-only, no data values)
  • Acceptance criteria placeholders linked to OPM thresholds

Rules

Evidence & uncertainty (non-negotiable)

  • Every materially non-trivial claim (including why a test exists) must be supported by evidence IDs.
  • If a test cannot be specified from evidence, mark it "Not evidenced" and add an unknown entry stating what is missing.

Coverage & traceability

  • Tests must be aligned with identified assumptions and weaknesses (ALW).
  • Each ALW weakness should map to at least one proposed test, or include an explicit reason it cannot be tested.
  • For each test, cite: (a) the ALW item(s) it targets and (b) evidence motivating it.

Determinism & robustness

  • Prefer property-based and monotonicity tests where possible.
  • Set seeds for stochastic components; if that is not possible, explain why and use statistical assertions with tolerances.
  • Avoid brittle “golden output” snapshots unless the model is deterministic and numerically stable.
  • Separate correctness tests from performance/stability tests.
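
As an illustration of these rules, the sketch below shows a monotonicity property test, a seeded reproducibility test, and a statistical assertion with a tolerance. `discounted_value` and `simulate_mean` are hypothetical stand-ins for the model under test, not functions from any real model; the tests are plain `assert`-based functions that pytest would collect.

```python
import math
import random


def discounted_value(cashflow, rate, years):
    """Hypothetical stand-in for a deterministic model function."""
    return cashflow / (1.0 + rate) ** years


def simulate_mean(seed, n=10_000):
    """Hypothetical stochastic component: mean of n standard normal draws."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n


def test_value_decreases_as_rate_increases():
    # Property-style check over many randomly drawn (but seeded) inputs.
    rng = random.Random(0)
    for _ in range(200):
        cashflow = rng.uniform(1.0, 1e6)
        years = rng.randint(1, 30)
        low = rng.uniform(0.0, 0.2)
        high = low + rng.uniform(1e-4, 0.2)
        assert discounted_value(cashflow, high, years) < discounted_value(cashflow, low, years)


def test_seeded_simulation_is_reproducible():
    # Same seed, same draws, so exact equality is appropriate here.
    assert simulate_mean(seed=42) == simulate_mean(seed=42)


def test_simulation_mean_within_statistical_tolerance():
    n = 10_000
    # The standard error of the mean is 1/sqrt(n); five standard errors is a loose bound.
    assert abs(simulate_mean(seed=123, n=n)) < 5.0 / math.sqrt(n)
```

The tolerance in the last test is derived from the sampling error rather than picked by hand, which keeps the assertion meaningful if `n` changes.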

Data requests

  • If a test cannot be written without data, request schema only (no concrete values).
  • Explicitly state required fields, shapes, units, and acceptable ranges if evidenced.
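
A schema-only request might look like the sketch below. The dataset name, field names, dtypes, units, and the `is_schema_only` helper are all hypothetical; ranges that are not evidenced are left as `None` rather than guessed.

```python
# Hypothetical dataset request: it describes structure only and carries no records.
dataset_request = {
    "dataset": "loan_exposures",
    "needed_by": ["test_loss_within_bounds"],
    "shape": {"rows": "any", "columns": 3},
    "fields": [
        {"name": "exposure_id", "dtype": "string", "unit": None, "range": None},
        {"name": "balance", "dtype": "float64", "unit": "USD", "range": [0.0, None]},
        {"name": "default_probability", "dtype": "float64", "unit": "probability", "range": [0.0, 1.0]},
    ],
}

ALLOWED_TOP_LEVEL = {"dataset", "needed_by", "shape", "fields"}
ALLOWED_FIELD_KEYS = {"name", "dtype", "unit", "range"}


def is_schema_only(request):
    """Return True when the request contains only the allowed descriptive keys."""
    if set(request) - ALLOWED_TOP_LEVEL:
        return False
    return all(set(field) <= ALLOWED_FIELD_KEYS for field in request["fields"])
```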

Code quality

  • Generated code must be syntactically valid pytest and runnable in isolation.
  • Use tolerances consistent with numerical noise; avoid false precision.
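
A small example of why tolerances matter: summing `0.1` ten times in binary floating point does not give exactly `1.0`, so an exact comparison would fail even though the computation is correct. The sketch uses the standard library's `math.isclose`; the tolerance value is an illustrative choice, not a recommended threshold.

```python
import math


def test_sum_matches_within_tolerance():
    total = sum([0.1] * 10)
    # Exact equality fails because of binary floating-point rounding.
    assert total != 1.0
    # A relative tolerance well above rounding noise, well below any real error.
    assert math.isclose(total, 1.0, rel_tol=1e-12)
```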

JSON / schema contract

  • Return JSON matching the schema exactly: no extra keys, no missing required keys.
  • Use explicit null/sentinel only where allowed by the schema.
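
The contract can be enforced with a strict key comparison before the output is accepted. This is a sketch only: the key names below are assumptions chosen to mirror the Outputs list, and the real schema may differ.

```python
import json

# Hypothetical schema: required top-level keys and the subset that may be null.
REQUIRED_KEYS = {"test_plan", "pytest_files", "dataset_requests", "acceptance_criteria"}
NULLABLE_KEYS = {"acceptance_criteria"}


def contract_violations(payload_text):
    """Return a list of human-readable violations; an empty list means compliant."""
    payload = json.loads(payload_text)
    keys = set(payload)
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - keys)]
    problems += [f"extra key: {k}" for k in sorted(keys - REQUIRED_KEYS)]
    problems += [
        f"null not allowed: {k}"
        for k in sorted(keys & REQUIRED_KEYS)
        if payload[k] is None and k not in NULLABLE_KEYS
    ]
    return problems
```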

System Prompt

You are a model validation engineer writing tests for a financial model. Design tests that would catch real failures, not just pass happy paths.

User Prompt Template

Using the model IR and ALW:

  1. Propose a structured test plan across validation dimensions.
  2. Generate pytest test code where feasible.
  3. Identify required datasets by schema only.
  4. Define acceptance criteria placeholders.

Return JSON matching the schema exactly.

Post-run Checks

  • Generated files contain valid Python.
  • Test coverage maps to ALW items.