Prove It

When to use

•The user asserts certainty: “always”, “never”, “guaranteed”, “optimal”, “cannot fail”, “no downside”, “100%”.
•The user asks for a devil’s advocate or proof.
•The claim feels too clean for the domain.

Round cadence (mandatory)

•Default: autoloop (no approvals). Run exactly one gauntlet round per assistant turn and continue next turn until Oracle synthesis. If confidence remains low after Oracle synthesis, continue with additional rounds (11+) and publish an updated Oracle synthesis.
•After each round, publish:
  - Round Ledger
  - Knowledge Delta
•Do not ask for permission to continue. Pause only when you must ask the user a question or the user says "stop".
•Step mode (explicit): if the user asks to "pause" / "step" / "one round at a time", run one round then wait for "next".
•Fast mode (explicit): if the user explicitly requests "fast mode", run rounds 1-10 (ending with Oracle synthesis) in one assistant turn.

Mode invocation

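| Mode      | Default? | How to invoke                                                              | Cadence                                              |
|-----------|----------|----------------------------------------------------------------------------|------------------------------------------------------|
| Autoloop  | yes      | (no phrase)                                                                | 1 round/turn; auto-continue                          |
| Step mode | no       | "step mode" / "pause each round" / "pause" / "step" / "one round at a time" | 1 round/turn; wait for "next"                        |
| Fast mode | no       | "fast mode"                                                                | rounds 1-10 (ending with Oracle synthesis) in one turn |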
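
As a rough illustration of the table above, mode selection could be sketched as a phrase match on the user's message. The function name `select_mode`, the regex matching, and the choice to check "fast mode" before the step phrases are all assumptions made for this sketch; the source only lists the trigger phrases.

```python
import re

# Trigger phrases taken from the mode table; whole-word matching is an assumption
# so that words like "steps" do not switch modes by accident.
FAST_PATTERN = re.compile(r"\bfast mode\b")
STEP_PATTERN = re.compile(
    r"\b(step mode|pause each round|one round at a time|pause|step)\b"
)


def select_mode(user_message: str) -> str:
    """Return 'fast', 'step', or 'autoloop' for a user message (hypothetical helper)."""
    text = user_message.lower()
    if FAST_PATTERN.search(text):
        return "fast"
    if STEP_PATTERN.search(text):
        return "step"
    # No explicit phrase: fall back to the default cadence.
    return "autoloop"
```

Plain keyword matching will misfire on sentences that mention "pause" or "step" in another sense, so treat this as a starting point rather than a reliable classifier.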

Quick start

•Restate the claim and its scope.
•Default to autoloop. If the user explicitly requests "step mode" or "fast mode", use that instead.
•Run round 1 and publish the Round Ledger + Knowledge Delta.
•Continue per the selected mode until Oracle synthesis. If confidence remains low, run additional rounds and publish an updated Oracle synthesis.

Ten-round gauntlet

•Counterexamples: smallest concrete break.
•Logic traps: missing quantifiers/premises.
•Boundary cases: zero/one/max/empty/extreme scale.
•Adversarial inputs: worst-case distributions/abuse.
•Alternative paradigms: different model flips the conclusion.
•Operational constraints: latency/cost/compliance/availability.
•Probabilistic uncertainty: variance, tail risk, sampling bias.
•Comparative baselines: “better than what?”, which metric?
•Meta-test: fastest disproof experiment.
•Oracle synthesis: tightest surviving claim with boundaries. If confidence is still low, repeat the focuses of rounds 1-9 as needed (numbered as rounds 11+), then re-run Oracle synthesis.
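
One way to picture the round order is a lookup from round number to focus. This is a sketch under assumptions: the names `GAUNTLET` and `focus_for_round` are hypothetical, and cycling through focuses 1-9 in order for rounds 11+ is just one reading of "repeat rounds 1-9 as needed". The source does not give the Oracle re-run a round number, so the sketch leaves it to the caller.

```python
# The ten focuses, in the order listed above.
GAUNTLET = [
    "Counterexamples",
    "Logic traps",
    "Boundary cases",
    "Adversarial inputs",
    "Alternative paradigms",
    "Operational constraints",
    "Probabilistic uncertainty",
    "Comparative baselines",
    "Meta-test",
    "Oracle synthesis",
]


def focus_for_round(round_number: int) -> str:
    """Map a round number to its focus (hypothetical helper)."""
    if round_number < 1:
        raise ValueError("round numbers start at 1")
    if round_number <= 10:
        return GAUNTLET[round_number - 1]
    # Assumption: extra rounds (11+) cycle through focuses 1-9 in order.
    # Re-running Oracle synthesis is decided separately by the caller.
    return GAUNTLET[(round_number - 11) % 9]
```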

Round self-prompt bank (pick exactly 1)

Internal self-prompts for selecting round focus. Do not ask the user unless blocked.

•Counterexamples: What is the smallest input that breaks this?
•Logic traps: What unstated assumption must hold?
•Boundary cases: Which boundary is most likely in real use?
•Adversarial inputs: What does worst-case input look like?
•Alternative paradigms: What objective makes the opposite true?
•Operational constraints: Which dependency/policy is a hard stop?
•Probabilistic uncertainty: What distribution shift flips the result?
•Comparative baselines: Better than what, on which metric?
•Meta-test: What experiment would change your mind fastest?
•Oracle synthesis: What explicit boundaries keep this honest?

Core artifacts

Argument map

Claim:
Premises:
- P1:
- P2:
Hidden assumptions:
- A1:
Weak links:
- W1:
Disproof tests:
- T1:
Refined claim:

Round Ledger (update every turn)

Round: <1-10 (or 11+)>
Focus:
Claim scope:
New evidence:
New counterexample:
Remaining gaps:
Next round:
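
If the ledger is tracked programmatically, the fields above could be held in a small record and rendered back into the same layout. This is a minimal sketch; the class name `RoundLedger`, its attribute names, and the `render` method are hypothetical and not part of the source.

```python
from dataclasses import dataclass


@dataclass
class RoundLedger:
    """One Round Ledger entry; field names mirror the template above."""

    round_number: int
    focus: str
    claim_scope: str
    new_evidence: str = ""
    new_counterexample: str = ""
    remaining_gaps: str = ""
    next_round: str = ""

    def render(self) -> str:
        """Render the ledger in the template's 'Label: value' layout."""
        rows = [
            ("Round", str(self.round_number)),
            ("Focus", self.focus),
            ("Claim scope", self.claim_scope),
            ("New evidence", self.new_evidence),
            ("New counterexample", self.new_counterexample),
            ("Remaining gaps", self.remaining_gaps),
            ("Next round", self.next_round),
        ]
        # Empty fields render as a bare label, matching the blank template.
        return "\n".join(f"{label}: {value}".rstrip() for label, value in rows)
```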

Knowledge Delta (publish every turn)

- New:
- Updated:
- Invalidated:
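
The three delta buckets can be illustrated by comparing what was held before and after a round. The sketch below assumes knowledge is stored as a dictionary from a short key to a statement, and it reads "Invalidated" as "held before, dropped after"; both are assumptions made for illustration, and `knowledge_delta` is a hypothetical name.

```python
def knowledge_delta(before: dict, after: dict) -> dict:
    """Compare two knowledge snapshots (key -> statement) and bucket the changes."""
    return {
        # Keys that appear only after the round.
        "New": sorted(k for k in after if k not in before),
        # Keys held in both snapshots whose statement changed.
        "Updated": sorted(k for k in after if k in before and after[k] != before[k]),
        # Keys held before the round that did not survive it.
        "Invalidated": sorted(k for k in before if k not in after),
    }
```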

Claim boundary table

| Boundary type | Valid when | Invalid when | Assumptions | Stressors |
|---------------|-----------|--------------|-------------|-----------|
| Scale         |           |              |             |           |
| Data quality  |           |              |             |           |
| Environment   |           |              |             |           |
| Adversary     |           |              |             |           |

Next-tests plan

| Test | Data needed | Success threshold | Stop condition |
|------|-------------|-------------------|----------------|

Domain packs

Performance

Use when the claim is about speed, latency, throughput, or resources.

•Clarify: median vs tail latency vs throughput.
•Identify workload shape (spiky vs steady) and bottleneck resource.
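
To see why the median/tail distinction matters, here is a small sketch with made-up latency samples. The data, the nearest-rank percentile method, and the helper name `percentile` are all assumptions chosen for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 97 fast requests and 3 slow outliers.
latencies_ms = [10] * 97 + [500, 800, 1200]

median_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
print(f"median={median_ms} ms, p99={p99_ms} ms")
```

With these samples the median is 10 ms while the p99 is 800 ms, so a claim like "it is fast" can hold at the median and fail at the tail.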

Product

Use when the claim is about user impact, adoption, or behavior.

•Clarify user segment and success metric.
•State the baseline/counterfactual.
•Name the likely unintended behavior/tradeoff.

Oracle synthesis template (round 10 / as needed)

Original claim:
Refined claim:
Boundaries:
- Valid when:
- Invalid when:
Confidence trail:
- Evidence:
- Gaps:
Next tests:
- ...

Deliverable format (per turn)

•Round number + focus.
•Round Ledger + Knowledge Delta.
•At most one question for the user (only when blocked).
•In fast mode, run rounds 1-10 (ending with Oracle synthesis) in one turn, repeating the items above for each round.

Activation cues

•"always" / "never" / "guaranteed" / "optimal" / "cannot fail" / "no downside" / "100%"
•"prove it" / "devil's advocate" / "stress test" / "rigor"
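
The cues above could be checked with a simple scan of the user's message. This is a sketch only: `find_cues` is a hypothetical helper, and whole-phrase, case-insensitive matching is an assumption, so "rigorous" does not count as "rigor" here.

```python
import re

# Cue phrases copied from the two lists above.
CERTAINTY_CUES = ["always", "never", "guaranteed", "optimal",
                  "cannot fail", "no downside", "100%"]
REQUEST_CUES = ["prove it", "devil's advocate", "stress test", "rigor"]


def find_cues(message: str) -> list:
    """Return the cues found in the message, in list order (hypothetical helper)."""
    # Normalize case and curly apostrophes before matching.
    text = message.lower().replace("\u2019", "'")
    hits = []
    for cue in CERTAINTY_CUES + REQUEST_CUES:
        # Lookarounds instead of \b so that "100%" matches despite the trailing "%".
        pattern = r"(?<!\w)" + re.escape(cue) + r"(?!\w)"
        if re.search(pattern, text):
            hits.append(cue)
    return hits
```

A match is only a prompt to consider running the gauntlet; whether a claim "feels too clean for the domain" still needs judgment that keyword matching cannot supply.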