Skill: Test Writing
Purpose
Design and generate a validation test suite that assesses conceptual soundness, implementation correctness, numerical stability, and outcome reasonableness.
This skill converts model risk into executable tests.
Inputs
Required IR fields:
- methodology outputs
- ALW outputs
- code evidence snippets
Skill data inputs:
- test_matrix.yaml (required test categories and patterns)
Outputs
- A test plan matrix (test name, purpose, category)
- Generated pytest files with executable tests
- Dataset requests (schema-only, no data values)
- Acceptance criteria placeholders linked to OPM thresholds
Rules
Evidence & uncertainty (non-negotiable)
- Every non-trivial claim (including why a test exists) must be supported by evidence IDs.
- If a test cannot be specified from evidence, mark it "Not evidenced" and add an unknown stating what is missing.
Coverage & traceability
- Tests must be aligned with identified assumptions and weaknesses (ALW).
- Each ALW weakness should map to at least one proposed test, or include an explicit reason it cannot be tested.
- For each test, cite: (a) the ALW item(s) it targets and (b) evidence motivating it.
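The traceability rules above can be checked mechanically. The following is a minimal sketch, not a prescribed implementation: `PlannedTest`, `traceability_gaps`, and the ids `W1`..`W4` and `E3` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PlannedTest:
    # Hypothetical shape of one row in the test plan matrix.
    name: str
    category: str
    alw_ids: Tuple[str, ...]       # ALW item(s) the test targets
    evidence_ids: Tuple[str, ...]  # evidence motivating the test


def traceability_gaps(
    weakness_ids: Iterable[str],
    plan: Iterable[PlannedTest],
    untestable: Dict[str, str],
) -> List[str]:
    """Return human-readable gaps; an empty list means the plan is traceable."""
    plan = list(plan)
    gaps = []
    # Every test must cite both its ALW target(s) and its evidence.
    for test in plan:
        if not test.alw_ids:
            gaps.append(f"{test.name}: no ALW item cited")
        if not test.evidence_ids:
            gaps.append(f"{test.name}: no evidence cited")
    # Every weakness needs a test or a non-empty reason it cannot be tested.
    covered = {alw_id for test in plan for alw_id in test.alw_ids}
    for wid in sorted(set(weakness_ids) - covered):
        if not untestable.get(wid, "").strip():
            gaps.append(f"{wid}: no test and no stated reason")
    return gaps


plan = [
    PlannedTest("test_loss_monotone_in_pd", "conceptual", ("W1",), ("E3",)),
    PlannedTest("test_rounding_stability", "numerical", ("W2",), ()),
]
gaps = traceability_gaps(
    ["W1", "W2", "W3", "W4"], plan, {"W3": "requires production data"}
)
print(gaps)
```

Here `W3` is accepted because it carries an explicit reason, while `W4` and the test without evidence are reported.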
Determinism & robustness
- Prefer property-based and monotonicity tests where possible.
- Set seeds for stochastic components; if not possible, explain why and use statistical assertions + tolerances.
- Avoid brittle “golden output” snapshots unless the model is deterministic and numerically stable.
- Separate correctness tests from performance/stability tests.
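As an illustration of the determinism rules above, the sketch below shows a monotonicity test, a seeded reproducibility test, and a statistical assertion with an explicit tolerance. `expected_loss` and `simulate_default_rate` are hypothetical stand-ins for the model under test, and the five-standard-error tolerance is an assumption chosen for the example, not a threshold taken from the source.

```python
import math
import random


def expected_loss(pd: float, lgd: float, ead: float) -> float:
    # Hypothetical model function: probability of default * loss given default * exposure.
    return pd * lgd * ead


def simulate_default_rate(pd: float, n_trials: int, seed: int) -> float:
    # Hypothetical stochastic component: Monte Carlo estimate of the default rate.
    rng = random.Random(seed)  # local seeded generator, no global state
    defaults = sum(1 for _ in range(n_trials) if rng.random() < pd)
    return defaults / n_trials


def test_expected_loss_is_monotone_in_pd():
    # Property: raising pd (other inputs fixed, non-negative) never lowers the loss.
    rng = random.Random(0)
    for _ in range(200):
        lgd = rng.uniform(0.0, 1.0)
        ead = rng.uniform(0.0, 1e6)
        pd_low = rng.uniform(0.0, 1.0)
        pd_high = rng.uniform(pd_low, 1.0)
        assert expected_loss(pd_high, lgd, ead) >= expected_loss(pd_low, lgd, ead)


def test_simulation_is_reproducible_with_fixed_seed():
    first = simulate_default_rate(0.05, 10_000, seed=42)
    second = simulate_default_rate(0.05, 10_000, seed=42)
    assert first == second


def test_simulation_matches_pd_within_statistical_tolerance():
    pd, n_trials = 0.05, 20_000
    # Tolerance: five standard errors of a binomial proportion (assumed for this sketch).
    tolerance = 5 * math.sqrt(pd * (1 - pd) / n_trials)
    assert abs(simulate_default_rate(pd, n_trials, seed=42) - pd) <= tolerance
```

The functions follow pytest naming conventions, so a file containing them should be collectable as is. A property-based library could replace the hand-rolled sampling loop if one is available in the project.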
Data requests
- If a test cannot be written without data, request schema only (no concrete values).
- Explicitly state required fields, shapes, units, and acceptable ranges if evidenced.
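A schema-only request could be represented as below. This is a sketch under assumptions: `FieldSpec`, `DatasetRequest`, the dataset name `loan_exposures`, and the field names are hypothetical and not taken from any real IR. The request carries names, types, units, and ranges, but no data values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: str
    unit: Optional[str] = None
    allowed_range: Optional[Tuple[float, float]] = None  # state only if evidenced
    evidence_ids: Tuple[str, ...] = ()


@dataclass(frozen=True)
class DatasetRequest:
    dataset: str
    shape: str  # symbolic shape, e.g. "(n_loans, 3)", never actual rows
    fields: Tuple[FieldSpec, ...]


def request_problems(request: DatasetRequest) -> List[str]:
    """Flag ranges that are malformed or stated without supporting evidence."""
    problems = []
    for spec in request.fields:
        if spec.allowed_range is None:
            continue
        low, high = spec.allowed_range
        if low > high:
            problems.append(f"{spec.name}: range lower bound exceeds upper bound")
        if not spec.evidence_ids:
            problems.append(f"{spec.name}: range stated without evidence")
    return problems


request = DatasetRequest(
    dataset="loan_exposures",  # hypothetical dataset name
    shape="(n_loans, 3)",
    fields=(
        FieldSpec("pd", "float64", unit="probability",
                  allowed_range=(0.0, 1.0), evidence_ids=("E7",)),
        FieldSpec("ead", "float64", unit="USD"),
        FieldSpec("lgd", "float64", unit="fraction", allowed_range=(0.0, 1.0)),
    ),
)
print(request_problems(request))
```

In this example the `lgd` range is flagged because no evidence id supports it, which mirrors the "if evidenced" condition above.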
Code quality
- Generated code must be syntactically valid pytest and runnable in isolation.
- Use tolerances consistent with numerical noise; avoid false precision.
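To make the tolerance rule concrete, here is a small sketch that compares two algebraically equivalent computations with `math.isclose` rather than exact equality. `discount_factor` and `discount_factor_via_logs` are hypothetical functions, and the tolerances are assumptions sized to double-precision rounding for these inputs.

```python
import math


def discount_factor(rate: float, years: float) -> float:
    # Hypothetical implementation under test.
    return (1.0 + rate) ** -years


def discount_factor_via_logs(rate: float, years: float) -> float:
    # Independent reference formulation of the same quantity.
    return math.exp(-years * math.log1p(rate))


def test_exact_equality_is_too_strict_for_floats():
    # Classic rounding example: exact comparison fails, a tolerance succeeds.
    assert 0.1 + 0.2 != 0.3
    assert math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-12)


def test_discount_factor_agrees_with_reference():
    for rate in (0.0, 0.01, 0.05, 0.25):
        for years in (0.5, 1.0, 10.0, 30.0):
            assert math.isclose(
                discount_factor(rate, years),
                discount_factor_via_logs(rate, years),
                rel_tol=1e-12,  # a few orders above machine epsilon
                abs_tol=1e-15,  # guards comparisons near zero
            )
```

A tolerance far tighter than rounding error makes the test flaky, and one far looser hides real defects. In a pytest file, `pytest.approx` offers an equivalent comparison.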
JSON / schema contract
- Return JSON matching the schema exactly: no extra keys, no missing required keys.
- Use explicit null/sentinel only where allowed by the schema.
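The contract above could be enforced with a check along these lines. The key names in `REQUIRED`, `OPTIONAL`, and `NULLABLE` are hypothetical placeholders, since the actual schema is not shown here, and the check covers top-level keys only.

```python
import json
from typing import List, Set


def contract_violations(
    payload: dict, required: Set[str], optional: Set[str], nullable: Set[str]
) -> List[str]:
    """List top-level contract violations; an empty list means the payload conforms."""
    violations = []
    keys = set(payload)
    for key in sorted(required - keys):
        violations.append(f"missing required key: {key}")
    for key in sorted(keys - required - optional):
        violations.append(f"unexpected key: {key}")
    for key in sorted(keys & (required | optional)):
        if payload[key] is None and key not in nullable:
            violations.append(f"null not allowed: {key}")
    return violations


# Hypothetical key sets standing in for the real schema.
REQUIRED = {"test_plan", "pytest_files", "dataset_requests", "acceptance_criteria"}
OPTIONAL = {"unknowns"}
NULLABLE = {"acceptance_criteria"}

raw = '{"test_plan": [], "pytest_files": null, "dataset_requests": [], "notes": "x"}'
violations = contract_violations(json.loads(raw), REQUIRED, OPTIONAL, NULLABLE)
print(violations)
```

If the schema is available as JSON Schema, a dedicated validator would also cover nested structure and types, which this sketch does not attempt.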
System Prompt
You are a model validation engineer writing tests for a financial model. Design tests that would catch real failures, not just pass happy paths.
User Prompt Template
Using the model IR and ALW:
- Propose a structured test plan across validation dimensions.
- Generate pytest test code where feasible.
- Identify required datasets by schema only.
- Define acceptance criteria placeholders.
Return JSON matching the schema exactly.
Post-run Checks
- Generated files contain valid Python.
- Test coverage maps to ALW items.
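The first check above can be approximated with the standard `ast` module, as sketched below. The helper names are hypothetical. Parsing confirms syntax only, not that the tests import cleanly or pass, and `collected_test_names` looks only at top-level functions, not at test classes.

```python
import ast
from typing import List, Optional


def syntax_error_or_none(source: str) -> Optional[str]:
    """Return a description of the syntax problem, or None if the source parses."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError) as exc:
        return str(exc)
    return None


def collected_test_names(source: str) -> List[str]:
    """Names of top-level functions that follow the pytest `test_` naming convention."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_")
    ]


good = "def test_a():\n    assert 1 + 1 == 2\n\ndef helper():\n    return 0\n"
bad = "def test_b(:\n    pass\n"

print(syntax_error_or_none(good))
print(collected_test_names(good))
print(syntax_error_or_none(bad) is not None)
```

The extracted test names can then be compared against the test plan to confirm that every planned test, and therefore every cited ALW item, has generated code behind it.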