eaa-hypothesis-verification

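Use when verifying claims through Docker experimentation. Apply the TBV principle to test claims before relying on them. Trigger on experiment setup or claim verification.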

SKILL.md

--- frontmatter

name: eaa-hypothesis-verification
description: Use when verifying claims through Docker experimentation. Applies TBV principle to test claims before relying on them. Trigger with experiment setup or claim verification.
version: 1.0.0
compatibility: Requires AI Maestro installed.
agent: test-engineer
context: fork

Hypothesis Verification Skill

Overview

Patterns for personally verifying claims through controlled Docker experimentation. Use this skill when you need to test whether a claim (from docs, researchers, or developers) is actually true.

TBV Principle: Everything is "To Be Verified" until you personally test it. Claims from any source require experimental confirmation before relying on them for decisions.

Prerequisites

•Docker installed and running
•Write access to experiment output directories
•Understanding of the claim to be verified
•Isolation environment for safe experimentation
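The first prerequisite can be checked before any experiment starts. The function below is a minimal sketch, not part of the skill itself. It assumes that the `docker` CLI is on `PATH`, and that a zero exit code from `docker info` means the daemon is reachable. The name `docker_available` and the timeout are illustrative.

```python
import shutil
import subprocess


def docker_available(timeout_s: float = 10.0) -> bool:
    """Return True if the docker CLI exists and the daemon answers `docker info`."""
    if shutil.which("docker") is None:
        return False  # CLI not installed or not on PATH
    try:
        # `docker info` exits non-zero when the daemon cannot be reached.
        result = subprocess.run(
            ["docker", "info"],
            capture_output=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker ready:", docker_available())
```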

Instructions

•Identify the claim to be verified (mark as TBV)
•Set up a Docker container for isolated testing
•Design the experiment with multiple approaches (Multiplicity Rule: 3+)
•Execute the experiments and collect measurements
•Document findings in the experimentation report
•Classify the result: VERIFIED, UNVERIFIED, or PARTIALLY VERIFIED
•Clean up containers and archive the prototype if it is valuable
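The design and execution steps can be sketched as a small timing harness. It refuses to run with fewer than three approaches, following the Multiplicity Rule, and records latency statistics for each approach. This is an illustrative sketch, not the skill's own tooling. `measure_approaches`, the statistics it keeps, and the default of 1000 iterations are assumptions.

```python
import statistics
import time
from typing import Callable, Dict, List

MIN_APPROACHES = 3  # Multiplicity Rule: always test 3+ approaches


def measure_approaches(
    approaches: Dict[str, Callable[[], object]],
    iterations: int = 1000,
) -> Dict[str, Dict[str, float]]:
    """Time each approach and return per-approach latency statistics in ms."""
    if len(approaches) < MIN_APPROACHES:
        raise ValueError(
            f"Multiplicity Rule: need {MIN_APPROACHES}+ approaches, got {len(approaches)}"
        )
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    results: Dict[str, Dict[str, float]] = {}
    for name, fn in approaches.items():
        samples_ms: List[float] = []
        for _ in range(iterations):
            start = time.perf_counter()
            fn()
            samples_ms.append((time.perf_counter() - start) * 1000.0)
        results[name] = {
            "mean_ms": statistics.mean(samples_ms),
            "median_ms": statistics.median(samples_ms),
            "max_ms": max(samples_ms),
        }
    return results


if __name__ == "__main__":
    stats = measure_approaches({
        "list_comp": lambda: [i * i for i in range(1000)],
        "map": lambda: list(map(lambda i: i * i, range(1000))),
        "generator": lambda: list(i * i for i in range(1000)),
    })
    for name, s in stats.items():
        print(f"{name}: median {s['median_ms']:.4f} ms")
```

Save the raw samples under `data/` if the report needs them. The harness only keeps the summary statistics.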

Checklist

Copy this checklist and track your progress:

Output

Artifact	Location	Purpose
Experimentation Report	`experiments/<claim-name>/REPORT.md`	Documents hypothesis, approaches tested, measurements, and classification
Status Classification	Report header	VERIFIED / UNVERIFIED / PARTIALLY VERIFIED / TBV
Measurement Data	`experiments/<claim-name>/data/`	Raw metrics, logs, benchmark results
Prototype Archive (if valuable)	`prototypes/<claim-name>/`	Working code with README explaining findings
Docker Cleanup Log	Terminal output	Confirms containers removed after experiment
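The output locations above can be created in one step. The sketch below builds `experiments/<claim-name>/` with a `data/` directory and a `REPORT.md` whose header starts at `Status: TBV`. The report section names are placeholders. The real template lives in output-templates.md, and `scaffold_experiment` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Placeholder skeleton; replace with the template from output-templates.md.
REPORT_SKELETON = """# Experiment: {claim_name}

Status: TBV

## Hypothesis
{claim}

## Approaches Tested

## Measurements
(raw data in data/)

## Classification
"""


def scaffold_experiment(root: Path, claim_name: str, claim: str) -> Path:
    """Create experiments/<claim-name>/ with REPORT.md and data/; return the directory."""
    exp_dir = Path(root) / "experiments" / claim_name
    (exp_dir / "data").mkdir(parents=True, exist_ok=True)
    report = exp_dir / "REPORT.md"
    if not report.exists():  # never overwrite an existing report
        report.write_text(
            REPORT_SKELETON.format(claim_name=claim_name, claim=claim),
            encoding="utf-8",
        )
    return exp_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = scaffold_experiment(Path(tmp), "redis-vs-dict", "Redis is 10x faster")
        print(sorted(p.name for p in path.iterdir()))
```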

Docker Experimentation

For Docker container setup and experiment infrastructure, see docker-experimentation.md:

1. Why Docker is Required
2. Container Structure Template
3. docker-compose.yml Template
4. Container Cleanup Procedure
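The isolation and cleanup steps can be sketched as plain `docker` argument lists. They are easy to log in the report and to pass to `subprocess.run`. This is a sketch, not the template from docker-experimentation.md. The `exp-<claim-name>` container name and the `eaa.experiment` label key are hypothetical conventions. `--network none` suits self-contained tests only; an experiment that talks to another container, such as Redis, needs a shared network instead.

```python
import shlex
from typing import List, Sequence

EXPERIMENT_LABEL = "eaa.experiment"  # hypothetical label key for finding containers later


def build_run_cmd(
    claim_name: str,
    image: str,
    command: Sequence[str],
    network: str = "none",
    memory: str = "512m",
) -> List[str]:
    """Build a `docker run` argv for an isolated, self-removing experiment container."""
    return [
        "docker", "run",
        "--rm",                                   # remove the container when it exits
        "--name", f"exp-{claim_name}",
        "--label", f"{EXPERIMENT_LABEL}={claim_name}",
        "--network", network,                     # 'none' = no network access
        "--memory", memory,                       # cap memory so runs stay comparable
        image,
        *command,
    ]


def build_cleanup_cmd(claim_name: str) -> List[str]:
    """Build a `docker rm -f` argv for the experiment container, if it still exists."""
    return ["docker", "rm", "-f", f"exp-{claim_name}"]


if __name__ == "__main__":
    run = build_run_cmd("redis-vs-dict", "python:3.12-slim", ["python", "-c", "print('hi')"])
    print(shlex.join(run))
    print(shlex.join(build_cleanup_cmd("redis-vs-dict")))
```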

Researcher vs Experimenter

For understanding the critical distinction between roles, see researcher-vs-experimenter.md:

1. The Researcher (What OTHERS say is true)
2. The Experimenter (What I can PROVE is true)
3. The TBV Principle (To Be Verified)
4. Workflow Integration: Researcher → Experimenter

Experiment Scenarios

For when to invoke the experimenter, see experiment-scenarios.md:

1. Case 1: Post-Research Validation
2. Case 2: Issue Reproduction in Isolation
3. Case 3: Architectural Bug Investigation
4. Case 4: New API/Tool Evaluation
5. Case 5: Fact-Checking Claims (Quick Verification)

Multiplicity Rule

For the evidence-based selection process, see multiplicity-rule.md:

1. The Multiplicity Process
2. Example: Implementing a Paper Algorithm
3. Iterative Selection Workflow
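Evidence-based selection means the winner comes from the measurements, not from preference. The sketch below picks the approach with the lowest value of a metric and reports how many times slower each alternative is. It assumes lower is better, as with latency. `select_best` is a hypothetical helper. The sample input reuses the averages quoted in Example 1 of this document.

```python
from typing import Dict, Tuple

MIN_APPROACHES = 3  # Multiplicity Rule


def select_best(
    results: Dict[str, Dict[str, float]], metric: str = "median_ms"
) -> Tuple[str, Dict[str, float]]:
    """Return the approach with the lowest `metric` and each approach's ratio to it.

    Assumes lower is better (latency-style metrics).
    """
    if len(results) < MIN_APPROACHES:
        raise ValueError(f"Multiplicity Rule: compare at least {MIN_APPROACHES} approaches")
    winner = min(results, key=lambda name: results[name][metric])
    best = results[winner][metric]
    if best <= 0:
        raise ValueError("metric must be positive to compute ratios")
    ratios = {name: stats[metric] / best for name, stats in results.items()}
    return winner, ratios


if __name__ == "__main__":
    # Averages from Example 1, used only as sample input.
    example = {
        "dict": {"mean_ms": 0.001},
        "redis": {"mean_ms": 0.15},
        "redis_pooled": {"mean_ms": 0.08},
    }
    winner, ratios = select_best(example, metric="mean_ms")
    print(winner, {name: round(r, 1) for name, r in ratios.items()})
```

In an iterative workflow, drop the clear losers, refine the remaining candidates, and measure again. Each round still compares at least three variants.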

Output Templates

For experiment documentation and prototype archiving, see output-templates.md:

1. Experiment Directory Structure
2. Experimentation Report Template
3. Prototype Archive Policy
4. Archive README Template

Quick Reference

Status Classifications

Status	Meaning	Safe to Rely On?
VERIFIED	Experimentally confirmed	YES
UNVERIFIED	Tested but failed to match claim	NO (dangerous)
PARTIALLY VERIFIED	True under specific conditions	YES (with conditions)
TBV	Not yet tested	NO (unknown risk)
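The status table can be encoded directly so that reports and tooling use the same labels. This is a sketch under one assumed rule. A claim is VERIFIED only if it held under every condition and every condition was tested. It is UNVERIFIED if it failed wherever it was tested, TBV if nothing has run, and PARTIALLY VERIFIED otherwise. `Status`, `SAFE_TO_RELY`, and `classify` are illustrative names.

```python
from enum import Enum
from typing import Optional, Sequence


class Status(Enum):
    VERIFIED = "VERIFIED"
    UNVERIFIED = "UNVERIFIED"
    PARTIALLY_VERIFIED = "PARTIALLY VERIFIED"
    TBV = "TBV"


# "Safe to Rely On?" column of the status table.
SAFE_TO_RELY = {
    Status.VERIFIED: True,
    Status.UNVERIFIED: False,
    Status.PARTIALLY_VERIFIED: True,  # only under the conditions recorded in the report
    Status.TBV: False,
}


def classify(outcomes: Sequence[Optional[bool]]) -> Status:
    """Map per-condition outcomes to a status.

    True = claim held under that condition, False = it did not, None = not yet tested.
    """
    tested = [o for o in outcomes if o is not None]
    if not tested:
        return Status.TBV  # TBV by default: nothing has been run
    if all(tested) and len(tested) == len(outcomes):
        return Status.VERIFIED  # held under every condition
    if not any(tested):
        return Status.UNVERIFIED  # failed wherever it was tested
    return Status.PARTIALLY_VERIFIED  # held only under some conditions
```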

Implementation vs Experimental Code

Implementation Code	Experimental Code
Permanent (committed)	Ephemeral (deleted after)
Production-ready	Throwaway testbed
Follows specifications	Generates specifications
One chosen solution	Multiple solutions compared
Part of delivery	Part of decision-making
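Because experimental code is ephemeral, deleting it should be gated on the report existing. The sketch below removes a testbed directory, or moves it into an archive location such as `prototypes/<claim-name>/` when it is worth keeping. It refuses to do either until the report file exists. `finalize_testbed` and its arguments are hypothetical, and writing the archive README is left to the Archive README Template.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def finalize_testbed(
    testbed: Path, report: Path, archive_dir: Optional[Path] = None
) -> Optional[Path]:
    """Delete (or archive) experimental code, but only after the report exists."""
    if not report.is_file():
        raise RuntimeError(f"Document findings in {report} before discarding the testbed")
    if archive_dir is None:
        shutil.rmtree(testbed)  # ephemeral by default
        return None
    if archive_dir.exists():
        raise FileExistsError(f"{archive_dir} already exists; refusing to overwrite")
    archive_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(testbed), str(archive_dir))  # keep a valuable prototype
    return archive_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        testbed = root / "testbed"
        testbed.mkdir()
        (root / "REPORT.md").write_text("findings", encoding="utf-8")
        print(finalize_testbed(testbed, root / "REPORT.md"), testbed.exists())
```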

Workflow Integration Points

Workflow	Trigger	Experimenter Action
BUILD	Architecture decision needs validation	Validates with testbeds
DEBUG	Root cause unclear or fix uncertain	Reproduces in isolation, tests fixes
REVIEW	Performance concerns or architectural questions	Benchmarks alternatives

IRON RULES Summary

•Multiplicity: Always test 3+ approaches
•Ephemeral code: Delete after findings documented
•Evidence-based: Conclusions backed by measurements
•Docker isolation: ALL experiments in containers
•Documentation: 50% of the output is the report
•TBV by default: Everything unverified until tested

Examples

Example 1: Verify API Performance Claim


Claim: "Redis caches API responses 10x faster than in-memory dict"
Status: TBV

1. Create Docker container with Redis and Python
2. Implement both approaches:
   - Approach A: In-memory dict cache
   - Approach B: Redis cache
   - Approach C: Redis with connection pooling
3. Run 1000 iterations, measure latency
4. Results:
   - Dict: 0.001ms avg
   - Redis: 0.15ms avg
   - Redis pooled: 0.08ms avg
5. Classification: UNVERIFIED (Redis is slower for simple cases)
6. Conditions: Redis pays off only in distributed scenarios (a cache shared across processes or hosts), which this experiment did not measure
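The three approaches in Example 1 can be sketched in Python as follows. The code is meant to run inside the experiment container. It assumes redis-py 4 or later and a Redis server at `localhost:6379`; in a docker-compose setup, use the Redis service name as the host. redis-py already pools connections inside every client, so this sketch models Approach B as "open a fresh client per lookup". That reading is an assumption. The key and payload are placeholders. Running the sketch does not reproduce the numbers above.

```python
import time
from typing import Callable

KEY = "resp:1"        # hypothetical cache key
PAYLOAD = b"payload"  # hypothetical cached API response


def bench(fn: Callable[[], object], iterations: int = 1000) -> float:
    """Mean latency in milliseconds over `iterations` calls."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) * 1000.0 / iterations


def dict_approach() -> Callable[[], object]:
    """Approach A: in-process dict cache."""
    cache = {KEY: PAYLOAD}
    return lambda: cache.get(KEY)


def redis_approach(host: str = "localhost", port: int = 6379) -> Callable[[], object]:
    """Approach B: Redis, with a fresh client (and connection) per lookup."""
    import redis  # redis-py; needs a reachable Redis server

    seed = redis.Redis(host=host, port=port)
    seed.set(KEY, PAYLOAD)
    seed.close()

    def lookup() -> object:
        client = redis.Redis(host=host, port=port)
        try:
            return client.get(KEY)
        finally:
            client.close()

    return lookup


def redis_pooled_approach(host: str = "localhost", port: int = 6379) -> Callable[[], object]:
    """Approach C: Redis, reusing connections from one shared ConnectionPool."""
    import redis

    client = redis.Redis(connection_pool=redis.ConnectionPool(host=host, port=port))
    client.set(KEY, PAYLOAD)
    return lambda: client.get(KEY)


if __name__ == "__main__":
    results = {"dict": bench(dict_approach())}
    try:
        results["redis"] = bench(redis_approach())
        results["redis_pooled"] = bench(redis_pooled_approach())
    except Exception as exc:  # redis-py missing or server unreachable
        print("Skipping Redis approaches:", exc)
    for name, ms in results.items():
        print(f"{name}: {ms:.4f} ms avg")
    if "redis" in results:
        # The claim holds only if Redis is at least 10x faster than the dict.
        print("Claim holds:", results["redis"] * 10 <= results["dict"])
```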

Example 2: Verify Library Compatibility