Plan Validation

Comprehensive validation of plans before execution. Combines adversarial challenge review with confidence scoring to catch flaws when they're cheap to fix.

Overview

Plan validation evaluates readiness using two complementary approaches:

•Challenge Review - Adversarial analysis to find issues
•Confidence Scoring - Quantitative assessment of plan quality

Part 1: The Five Challenges

Challenge every plan as if you're trying to break it.

1. REQUIREMENTS CHALLENGE

"Does this plan actually solve what the user asked for?"

•Restate the original goal in your own words
•Check each requirement against plan steps
•Identify requirements with NO corresponding step
•Identify steps that don't trace to any requirement

Red flag: If you can't map every requirement to a step, the plan is incomplete.
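The mapping check above can be sketched in a few lines. This is only an illustration: the requirement ids, the step ids, and the idea that each step lists the requirements it serves are assumptions, not part of any plan schema defined here.

```python
def trace_requirements(requirements, steps):
    """Return (uncovered requirements, untraced steps).

    requirements: list of requirement ids (hypothetical shape)
    steps: dict mapping a step id to the requirement ids it serves (hypothetical shape)
    """
    known = set(requirements)
    covered = {req for reqs in steps.values() for req in reqs}
    # Requirements with NO corresponding step
    uncovered = [r for r in requirements if r not in covered]
    # Steps that do not trace to any known requirement
    untraced = [s for s, reqs in steps.items() if not known.intersection(reqs)]
    return uncovered, untraced


requirements = ["R1", "R2", "R3"]
steps = {"step-1": ["R1"], "step-2": ["R2"], "step-3": []}
uncovered, untraced = trace_requirements(requirements, steps)
print(uncovered)  # ['R3']
print(untraced)   # ['step-3']
```

A non-empty first list means the plan is incomplete; a non-empty second list is a hint for the SIMPLICITY challenge.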

2. EDGE CASE CHALLENGE

"What inputs or conditions would break this?"

Consider:

•Empty/null/undefined inputs
•Extremely large inputs
•Concurrent access
•Network failures
•Permission denied
•Disk full
•Invalid state transitions

Red flag: If the plan covers only the "happy path," it will fail in production.

3. SIMPLICITY CHALLENGE

"Is this the simplest solution that works?"

Ask:

•Can any step be removed without breaking functionality?
•Are we adding abstraction that isn't needed yet?
•Are we solving problems the user didn't ask for?
•Could a junior developer understand this approach?

Red flag: If explaining "why" takes longer than explaining "what," it's too complex.

4. INTEGRATION CHALLENGE

"How does this interact with existing code?"

Check:

•Does it conflict with existing patterns in the codebase?
•Does it introduce inconsistency?
•Will it break existing tests?
•Does it respect existing error handling patterns?

Red flag: If the plan ignores codebase conventions, it will create maintenance burden.

5. FAILURE MODE CHALLENGE

"When this fails (not if), what happens?"

For each step, ask:

•What's the blast radius of failure?
•Can we recover, or is manual intervention needed?
•Will the user know something went wrong?
•Is there data loss risk?

Red flag: If any step can fail silently or cause data loss, add safeguards.

Part 2: Confidence Scoring

Calculate a confidence score across four dimensions (100 points total):

Requirements Completeness (25 points)

Factor	Points	Condition
All personas consulted	10	Core team members loaded and their questions asked
User confirmed requirements	10	`gathered_requirements.user_confirmations` is non-empty
Edge cases identified	5	`gathered_requirements.edge_cases` has 2+ items

Step Quality (25 points)

Factor	Points	Condition
All steps atomic	10	Each step has single `primary_tool` and clear purpose
All steps have validation	10	Every step has `success_criteria` with `validation_command`
Dependencies clear	5	`depends_on` defined and no circular dependencies
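One possible way to test the "Dependencies clear" condition is a topological sort over `depends_on`. The sketch below assumes each step is a dict with an `id` field, which is a hypothetical name; only `depends_on` comes from the table above. It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def dependencies_clear(steps):
    """True if every step defines depends_on, all references resolve, and no cycle exists."""
    ids = {step["id"] for step in steps}  # "id" is an assumed field name
    graph = {}
    for step in steps:
        deps = step.get("depends_on")
        if deps is None or not set(deps) <= ids:
            return False  # field missing, or it points at an unknown step
        graph[step["id"]] = set(deps)
    try:
        # Consuming static_order() raises CycleError if the graph has a cycle
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return False
    return True


plan_steps = [
    {"id": "a", "depends_on": []},
    {"id": "b", "depends_on": ["a"]},
]
print(dependencies_clear(plan_steps))  # True
```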

Context Coverage (25 points)

Factor	Points	Condition
Memory rules embedded	10	`embedded_context.memory_rules` has content
Patterns discovered	10	`embedded_context.discovered_patterns` has 3+ patterns
Constraints documented	5	`embedded_context.constraints` is non-empty

Risk Assessment (25 points)

Factor	Points	Condition
Failure modes identified	10	`challenge_results.failure_modes.notes` has items
Retry behavior defined	10	Every step has `retry_behavior` with hints
Rollback possible	5	Plan has rollback strategy or git snapshot
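The four tables add up as follows. This is a minimal sketch that assumes each condition has already been evaluated to a boolean; the factor keys are hypothetical names chosen to mirror the table rows, and the point values are copied from the tables above.

```python
# Points per factor, copied from the four scoring tables.
RUBRIC = {
    "requirements": {"personas_consulted": 10, "user_confirmed": 10, "edge_cases_identified": 5},
    "step_quality": {"steps_atomic": 10, "steps_validated": 10, "dependencies_clear": 5},
    "context": {"memory_rules_embedded": 10, "patterns_discovered": 10, "constraints_documented": 5},
    "risk": {"failure_modes_identified": 10, "retry_behavior_defined": 10, "rollback_possible": 5},
}


def confidence_score(checks):
    """Sum the points of every factor whose condition holds.

    checks: dict of factor name -> bool; missing factors count as not met.
    """
    breakdown = {
        dimension: sum(points for factor, points in factors.items() if checks.get(factor, False))
        for dimension, factors in RUBRIC.items()
    }
    return {"total": sum(breakdown.values()), "breakdown": breakdown}


checks = {factor: True for factors in RUBRIC.values() for factor in factors}
checks.update(edge_cases_identified=False, constraints_documented=False, retry_behavior_defined=False)
print(confidence_score(checks))
# {'total': 80, 'breakdown': {'requirements': 20, 'step_quality': 25, 'context': 20, 'risk': 15}}
```

Because every factor is worth 5 or 10 points, each dimension score and the total are always multiples of 5.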

Combined Output Format


╔══════════════════════════════════════════════════════════════╗
║  PLAN VALIDATION RESULTS                                     ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║  CHALLENGE RESULTS                                           ║
║  ─────────────────                                           ║
║  REQUIREMENTS: [PASS | GAPS FOUND]                           ║
║  EDGE CASES:   [PASS | RISKS FOUND]                          ║
║  SIMPLICITY:   [PASS | OVERCOMPLICATED]                      ║
║  INTEGRATION:  [PASS | CONFLICTS FOUND]                      ║
║  FAILURE MODES:[PASS | UNHANDLED FAILURES]                   ║
║                                                              ║
║  CONFIDENCE SCORE: 80%                                       ║
║  ────────────────────                                        ║
║  Requirements:    [████████░░] 80%                           ║
║  Step Quality:    [██████████] 100%                          ║
║  Context:         [████████░░] 80%                           ║
║  Risk:            [██████░░░░] 60%                           ║
║                                                              ║
║  VERDICT: [APPROVED | REVISE PLAN]                           ║
║  Recommendation: [action if needed]                          ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝
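The per-dimension bars in the box can be produced from the raw points. A small sketch, assuming 25 points per dimension and a 10-cell bar as in the layout above; the function name is illustrative.

```python
def render_bar(points, max_points=25, width=10):
    """Render a dimension score as a fixed-width bar plus percentage."""
    filled = round(width * points / max_points)
    percent = round(100 * points / max_points)
    return f"[{'█' * filled}{'░' * (width - filled)}] {percent}%"


print(render_bar(20))  # [████████░░] 80%
print(render_bar(15))  # [██████░░░░] 60%
```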

Thresholds and Recommendations

Score	Level	Recommendation
90-100%	HIGH	Proceed with confidence
70-89%	MEDIUM	Proceed with caution, consider addressing warnings
50-69%	LOW	Review warnings first, improve plan before building
<50%	CRITICAL	Do not build - improve plan significantly
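The threshold table translates directly into a lookup. The function name and the tuple return shape below are assumptions made for illustration; the boundaries and wording come from the table.

```python
def confidence_level(score):
    """Map a 0-100 confidence score to (level, recommendation)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "HIGH", "Proceed with confidence"
    if score >= 70:
        return "MEDIUM", "Proceed with caution, consider addressing warnings"
    if score >= 50:
        return "LOW", "Review warnings first, improve plan before building"
    return "CRITICAL", "Do not build - improve plan significantly"


print(confidence_level(80))  # ('MEDIUM', 'Proceed with caution, consider addressing warnings')
```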

When to Skip Full Validation

Skip for:

•Single-file changes under 50 lines
•Documentation-only changes
•Formatting/style changes
•Direct user instruction with explicit approach

Still do a quick REQUIREMENTS check even for simple tasks.

Revision Loop

If verdict is REVISE PLAN:

•Apply required changes to plan
•Re-run ONLY the failed challenges
•Repeat until APPROVED or the revision limit below is reached

Maximum 2 revision loops. If still failing, surface to user for decision.
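The loop and its limit could look like the sketch below. The `run_challenges` and `revise` callables are hypothetical hooks supplied by the caller, and `NEEDS_USER_DECISION` is an illustrative label for the "surface to user" case, not a verdict defined in this document.

```python
MAX_REVISIONS = 2  # limit stated above


def validate_with_revisions(plan, challenges, run_challenges, revise):
    """Run the challenges, revising the plan at most MAX_REVISIONS times.

    run_challenges(plan, names) -> list of challenge names that failed
    revise(plan, failed)        -> revised plan
    """
    failed = run_challenges(plan, challenges)
    revisions = 0
    while failed and revisions < MAX_REVISIONS:
        plan = revise(plan, failed)
        revisions += 1
        # Re-run ONLY the challenges that failed in the previous round
        failed = run_challenges(plan, failed)
    verdict = "APPROVED" if not failed else "NEEDS_USER_DECISION"
    return plan, verdict, failed
```

Returning the still-failing challenges alongside the verdict gives the user something concrete to decide on when the limit is hit.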

Improving Low Scores

Requirements < 70%

•Load more team members and ask their questions
•Get explicit user confirmation on requirements
•Identify more edge cases

Step Quality < 70%

•Break large steps into smaller atomic actions
•Add validation commands to all steps
•Define clear dependencies

Context < 70%

•Load and embed more Memory rules
•Analyze codebase for patterns
•Document environment constraints

Risk < 70%

•Run the challenge phase more thoroughly
•Add fix hints to retry behavior
•Define rollback strategy

Integration with Planner

The Planner should:

•Run challenges after plan construction
•Calculate confidence score
•Display combined results to user
•If score < 70% or challenges fail, ask user to confirm before proceeding
•Store results in the plan under plan.validation

json

{
  "validation": {
    "challenge_results": {
      "requirements": "pass",
      "edge_cases": "pass",
      "simplicity": "pass",
      "integration": "pass",
      "failure_modes": "unhandled_failures"
    },
    "confidence_score": {
      "total": 80,
      "breakdown": {
        "requirements": 20,
        "step_quality": 25,
        "context": 20,
        "risk": 15
      }
    },
    "warnings": [
      "Only 1 edge case identified",
      "No constraints documented",
      "Retry behavior not defined for every step"
    ],
    "verdict": "APPROVED",
    "recommendation": "PROCEED_WITH_CAUTION"
  }
}
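Given a record shaped like the one above, the Planner's "ask user to confirm" rule can be sketched as a single predicate. This assumes that any challenge result other than "pass" counts as a failed challenge, which the document implies but does not state outright.

```python
def needs_user_confirmation(validation):
    """True if the score is below 70 or any challenge did not pass."""
    score = validation["confidence_score"]["total"]
    failed = [name for name, result in validation["challenge_results"].items() if result != "pass"]
    return score < 70 or bool(failed)


validation = {
    "challenge_results": {"requirements": "pass", "failure_modes": "unhandled_failures"},
    "confidence_score": {"total": 80},
}
print(needs_user_confirmation(validation))  # True
```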

"Your goal is NOT to block progress. Your goal is to catch the issues that would waste hours of debugging later."