AgentSkillsCN

Conformance Test Design

一致性测试设计。

SKILL.md

Skill: Conformance Test Design (Golden Graph/Plan + Triad Matrix)

Intent

  • Enable an agent to design conformance tests that mechanically enforce correctness across the triad matrix ((component × binder × policy_pack)).
  • Ensure determinism, policy correctness, and contract stability are gated by tests rather than conventions.

Scope boundaries

  • In-scope: defining what must be tested (golden graphs/plans, policy outcomes, intent bundles), coverage expectations, and failure semantics.
  • Out-of-scope: implementing the test harness; backend-specific snapshot formats.

Primary concepts

  • Golden cases: canonical inputs with canonical expected outputs.
  • Triad matrix coverage: verifying behavior across the composition axes (component/binder/policy pack).
  • Conformance: verifying invariants, not just happy-path outcomes.

Required inputs/context

  • Contract references: current graph vocabulary, capability contracts, policy rule IDs, and structured output expectations.
  • Policy pack set: named packs and enforcement tiers used in the matrix (explicit inputs).
  • Determinism spec: stable ID, ordering, and serialization expectations.
  • Representative scenarios: minimal but complete examples spanning core capabilities, security intents, and common binder patterns.

Expected outputs

  • Structured outputs (minimum):
    • Test inventory: list of golden cases, matrix dimensions covered, and expected invariant checks.
    • Oracle definition: what constitutes pass/fail and what outputs must match (conceptual).
    • Failure classification: determinism failure vs policy drift vs contract incompatibility vs binder mismatch.
  • Human-readable outputs:
    • Triage guidance: what invariant was violated, likely causes, and what must be fixed (not “loosen the test”).

Acceptance criteria

  • Coverage: conformance suite covers required matrix combinations for touched subsystems.
  • Strictness: nondeterminism and policy drift are treated as hard failures where required.
  • Reproducibility: tests are stable and do not depend on time, network, or environment discovery.

Validation signals

  • Snapshot mismatch for golden outputs is a determinism or semantic drift signal (deny until resolved).
  • Outcome drift across identical inputs/pack selection is a policy drift signal (deny).
  • Matrix gaps for touched capabilities/binders are a coverage failure (deny for high-assurance changes).

Guardrails & forbidden behaviors

  • Forbidden: “fixing” regressions by weakening assertions without addressing the underlying invariant violation.
  • Forbidden: tests that encode backend/provider semantics as the oracle of correctness.
  • Escalation (HITL required):
    • Any intentional change to golden outputs that affects compliance/security posture.

Used by roles

  • Test & Conformance
  • Kernel / Graph Engineer
  • Binder Engineer
  • Policy Pack Authoring