AgentSkillsCN

subagent-testing

通过六边形(端口与适配器)模式,将领域逻辑与基础设施解耦。 触发条件:六边形、端口-适配器、基础设施独立、领域隔离、可测试性 适用场景:业务逻辑分离、基础设施需变更、可测试性至关重要 切勿在以下情况下使用:选择范式(应优先考虑架构范式)、简单的 CRUD 操作

SKILL.md
--- frontmatter
name: subagent-testing
description: TDD-style testing methodology for skills using fresh subagent instances
  to prevent priming bias and validate skill effectiveness. Use when validating skill
  improvements, testing skill effectiveness, preventing priming bias, measuring skill
  impact on behavior. Do not use when implementing skills (use skill-authoring instead),
  creating hooks (use hook-authoring instead).
version: 1.4.0
category: testing
tags:
- testing
- validation
- TDD
- subagents
- fresh-instances
token_budget: 30
progressive_loading: true

Subagent Testing - TDD for Skills

Test skills with fresh subagent instances to prevent priming bias and validate effectiveness.

Table of Contents

  1. Overview
  2. Why Fresh Instances Matter
  3. Testing Methodology
  4. Quick Start
  5. Detailed Testing Guide
  6. Success Criteria

Overview

Fresh instances prevent priming: Each test uses a new Claude conversation to verify the skill's impact is measured, not conversation history effects.

Why Fresh Instances Matter

The Priming Problem

Running tests in the same conversation creates bias:

  • Prior context influences responses
  • Skill effects get mixed with conversation history
  • Can't isolate skill's true impact

Fresh Instance Benefits

  • Isolation: Each test starts clean
  • Reproducibility: Consistent baseline state
  • Measurement: Clear before/after comparison
  • Validation: Proves skill effectiveness, not priming

Testing Methodology

Three-phase TDD-style approach:

Phase 1: Baseline Testing (RED)

Test without skill to establish baseline behavior.

Phase 2: With-Skill Testing (GREEN)

Test with skill loaded to measure improvements.

Phase 3: Rationalization Testing (REFACTOR)

Test skill's anti-rationalization guardrails.

Quick Start

bash
# 1. Create baseline tests (without skill)
# Use 5 diverse scenarios
# Document full responses

# 2. Create with-skill tests (fresh instances)
# Load skill explicitly
# Use identical prompts
# Compare to baseline

# 3. Create rationalization tests
# Test anti-rationalization patterns
# Verify guardrails work

Detailed Testing Guide

For complete testing patterns, examples, and templates:

Success Criteria

  • Baseline: Document 5+ diverse baseline scenarios
  • Improvement: ≥50% improvement in skill-related metrics
  • Consistency: Results reproducible across fresh instances
  • Rationalization Defense: Guardrails prevent ≥80% of rationalization attempts

See Also

  • skill-authoring: Creating effective skills
  • bulletproof-skill: Anti-rationalization patterns
  • test-skill: Automated skill testing command