red-blue-validator

SKILL.md
---
name: red-blue-validator
version: "2.0"
description: >
  Iterative adversarial stress-testing through Red/Blue team dynamics. Red Team
  generates substantive, steel-manned attacks against propositions; Blue Team
  responds with defenses, mitigations, and hardening. Cycles continue until
  convergence criteria are met, producing a battle-tested proposition.

  PROACTIVELY activate for: (1) High-stakes decisions requiring stress-testing,
  (2) Strategy validation before major commitment, (3) Architecture decision
  hardening, (4) Proposal defense preparation, (5) Security posture review,
  (6) Investment due diligence with adversarial lens.

  Triggers: "red team this", "blue team", "stress test", "attack this plan",
  "find weaknesses", "adversarial review", "devil's advocate", "what could go wrong",
  "poke holes in this", "challenge this decision", "war game this"
---

Red/Blue Team Validator

"Find weaknesses before reality does."

Every proposition—whether a decision, strategy, architecture, or plan—has vulnerabilities. This skill systematically exposes them through iterative adversarial cycles. Red Team attacks with substantive, steel-manned challenges. Blue Team defends with mitigations and hardening. The cycle continues until convergence: a battle-tested proposition with documented defenses.


1. Purpose

Core Value Proposition

Static analysis misses what adversarial pressure reveals. Red/Blue validation simulates the attacks your proposition will face—from competitors, critics, reality itself—and forces you to build defenses before you need them. The output is not just a risk list, but a hardened proposition that has survived systematic assault.

Capabilities

| # | Capability | Phase | Value |
|---|---|---|---|
| 1 | Proposition intake with attack surface mapping | Pre-Round | Define what can be attacked |
| 2 | Experience pool loading (domain failure patterns) | Pre-Round | Avoid reinventing known failures |
| 3 | Multi-category attack generation | Round N: Red | Surface vulnerabilities systematically |
| 4 | Steel-manning attacks to maximum strength | Round N: Red | Ensure attacks are not strawmen |
| 5 | Severity scoring (CRITICAL/HIGH/MEDIUM/LOW) | Round N: Red | Prioritize responses |
| 6 | Defense generation (REFUTE/MITIGATE/ACCEPT/HARDEN) | Round N: Blue | Address each attack |
| 7 | Proposition hardening through iterative refinement | Round N: Blue | Strengthen against attacks |
| 8 | Convergence evaluation with explicit criteria | Round N: Eval | Know when to stop |
| 9 | RISK-ASSESSMENT synthesis (CONTRACT-08) | Post-Round | Standardized output |
| 10 | Hardened proposition generation | Post-Round | Battle-tested version |
| 11 | Attack/defense log compilation | Post-Round | Audit trail |
| 12 | Go/no-go recommendation | Post-Round | Decision support |

2. When to Use

Ideal Use Cases

| Scenario | Why Red/Blue Validation Matters |
|---|---|
| Pre-commitment decision review | Simulate objections before committing resources |
| Strategy validation | War-game competitive responses and market realities |
| Architecture decision hardening | Stress-test technical choices before implementation |
| Proposal defense preparation | Anticipate and prepare for stakeholder pushback |
| Investment due diligence | Adversarial review of financial projections and market assumptions |
| Security posture assessment | Systematic attack surface enumeration |
| Go/no-go decisions | High-stakes decisions need adversarial pressure |
| Policy/process validation | Find edge cases and failure modes |
| Product launch readiness | Anticipate market, competitive, and operational challenges |
| M&A target evaluation | Adversarial review of synergy claims |

Anti-Patterns (When NOT to Use)

| Anti-Pattern | Why It's Ineffective | Better Alternative |
|---|---|---|
| Low-stakes decisions | Over-engineering for trivial choices | Just decide and iterate |
| Time-critical emergencies | Fires need extinguishing, not philosophy | Act, then debrief |
| Already committed | Adversarial review after commitment creates conflict | Use for future decisions |
| Early exploration | Premature to attack ideas still forming | Use after initial validation |
| Confirmation theater | Going through motions without genuine adversarial intent | Either commit to true adversarial thinking or skip |
| Reversible decisions | Two-way doors don't need siege testing | Save intensity for one-way doors |

3. Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| subject_type | enum | yes | | decision \| strategy \| architecture \| plan \| policy \| investment \| security |
| max_rounds | integer | no | 3 | Maximum red/blue cycles (1-5) |
| attack_intensity | enum | no | standard | light \| standard \| aggressive |
| attack_categories | list | no | auto | Categories to probe (see catalog); auto selects by subject_type |
| convergence_mode | enum | no | no_new_critical | no_new_critical \| all_addressed \| round_limit |
| include_experience_pool | boolean | no | true | Load domain-specific failure patterns |
| steel_manning_level | enum | no | standard | minimal \| standard \| maximum |
| output_mode | enum | no | full | risk_assessment \| hardened_proposition \| full_log |
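To make the defaults concrete, here is a minimal Python sketch of parameter resolution. The `resolve_params` helper, the dict-based interface, and the explicit range check on `max_rounds` are illustrative assumptions, not part of the skill's documented API.

```python
# Defaults mirror the parameter table; subject_type has no default and is required.
DEFAULTS = {
    "max_rounds": 3,
    "attack_intensity": "standard",
    "attack_categories": "auto",
    "convergence_mode": "no_new_critical",
    "include_experience_pool": True,
    "steel_manning_level": "standard",
    "output_mode": "full",
}

def resolve_params(user_params):
    """Merge user-supplied parameters over the documented defaults."""
    if "subject_type" not in user_params:
        raise ValueError("subject_type is required")
    params = {**DEFAULTS, **user_params}
    if not 1 <= params["max_rounds"] <= 5:
        raise ValueError("max_rounds must be between 1 and 5")
    return params
```

For example, `resolve_params({"subject_type": "strategy", "max_rounds": 2})` keeps the explicit `max_rounds` and fills every other field from `DEFAULTS`.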

Parameter Effects Matrix

| Parameter | Red Phase Effect | Blue Phase Effect | Convergence Effect |
|---|---|---|---|
| attack_intensity: light | Top 3 attack categories | Quick defenses | max_rounds capped at 2 |
| attack_intensity: standard | Top 5 attack categories | Full defense protocol | Normal convergence |
| attack_intensity: aggressive | All applicable categories | Exhaustive defense | Requires no_new_critical |
| steel_manning_level: minimal | 1-pass attacks | | Faster rounds |
| steel_manning_level: standard | 2-pass steel-manning | | Normal rounds |
| steel_manning_level: maximum | 3-pass with ideological Turing test | | Thorough rounds |
| convergence_mode: no_new_critical | | | Stop when 0 new CRITICAL/HIGH |
| convergence_mode: all_addressed | | Must address all | Stop when no ACCEPT responses |
| convergence_mode: round_limit | | | Stop at max_rounds |

Auto-Selected Attack Categories by Subject Type

| Subject Type | Default Attack Categories |
|---|---|
| decision | ASSUMPTIONS, ALTERNATIVES, REVERSIBILITY, CONSEQUENCES, TIMING |
| strategy | COMPETITIVE, MARKET, EXECUTION, DEPENDENCIES, TIMELINE |
| architecture | SCALABILITY, SECURITY, DEPENDENCIES, OPERATIONAL, EDGE_CASES |
| plan | FEASIBILITY, RESOURCES, TIMELINE, DEPENDENCIES, RISKS |
| policy | EDGE_CASES, ENFORCEMENT, UNINTENDED_CONSEQUENCES, POLITICAL |
| investment | ECONOMIC, MARKET, EXECUTION, COMPETITIVE, ASSUMPTIONS |
| security | ATTACK_SURFACE, VULNERABILITIES, DEPENDENCIES, OPERATIONAL |
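The same mapping can be expressed as a small Python sketch. The `default_categories` function name and the precedence rule for an explicit `attack_categories` list are assumptions added for illustration.

```python
# Default attack categories per subject_type, transcribed from the table above.
DEFAULT_ATTACK_CATEGORIES = {
    "decision": ["ASSUMPTIONS", "ALTERNATIVES", "REVERSIBILITY", "CONSEQUENCES", "TIMING"],
    "strategy": ["COMPETITIVE", "MARKET", "EXECUTION", "DEPENDENCIES", "TIMELINE"],
    "architecture": ["SCALABILITY", "SECURITY", "DEPENDENCIES", "OPERATIONAL", "EDGE_CASES"],
    "plan": ["FEASIBILITY", "RESOURCES", "TIMELINE", "DEPENDENCIES", "RISKS"],
    "policy": ["EDGE_CASES", "ENFORCEMENT", "UNINTENDED_CONSEQUENCES", "POLITICAL"],
    "investment": ["ECONOMIC", "MARKET", "EXECUTION", "COMPETITIVE", "ASSUMPTIONS"],
    "security": ["ATTACK_SURFACE", "VULNERABILITIES", "DEPENDENCIES", "OPERATIONAL"],
}

def default_categories(subject_type, explicit=None):
    """Explicit attack_categories win; otherwise fall back to subject_type defaults."""
    return explicit if explicit else DEFAULT_ATTACK_CATEGORIES[subject_type]
```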

4. Checkpoints

This skill uses interactive checkpoints (see references/checkpoints.yaml) to resolve ambiguity:

  • subject_type_classification — When proposition type is ambiguous
  • attack_intensity_selection — When attack intensity not specified
  • convergence_mode_selection — When convergence criteria not specified
  • premature_convergence_check — When convergence met but warning signs present
  • infinite_loop_risk — When defenses generate more attacks than they resolve
  • output_mode_selection — When output format not specified

5. Iterative Workflow

Workflow Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                       RED/BLUE TEAM VALIDATOR                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ╔══════════════════════════════════════════════════════════════════════╗   │
│  ║                      PRE-ROUND SETUP                                  ║   │
│  ║  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                   ║   │
│  ║  │ Proposition │  │   Attack    │  │ Experience  │                   ║   │
│  ║  │   Intake    │─▶│  Surface    │─▶│    Pool     │                   ║   │
│  ║  │             │  │  Mapping    │  │   Loading   │                   ║   │
│  ║  └─────────────┘  └─────────────┘  └─────────────┘                   ║   │
│  ╚══════════════════════════════════════════════════════════════════════╝   │
│                                  │                                           │
│                                  ▼                                           │
│  ╔══════════════════════════════════════════════════════════════════════╗   │
│  ║                         ROUND N                                       ║   │
│  ║  ┌─────────────┐                    ┌─────────────┐                  ║   │
│  ║  │  RED TEAM   │                    │  BLUE TEAM  │                  ║   │
│  ║  │   ATTACK    │───── Attacks ─────▶│   DEFENSE   │                  ║   │
│  ║  │ (Generate & │                    │ (Respond &  │                  ║   │
│  ║  │ Steel-man)  │                    │  Harden)    │                  ║   │
│  ║  └─────────────┘                    └─────────────┘                  ║   │
│  ║         │                                  │                          ║   │
│  ║         └────────────┬─────────────────────┘                          ║   │
│  ║                      ▼                                                ║   │
│  ║              ┌─────────────┐                                          ║   │
│  ║              │ EVALUATION  │                                          ║   │
│  ║              │ & Converge? │                                          ║   │
│  ║              └─────────────┘                                          ║   │
│  ║                      │                                                ║   │
│  ║           ┌──────────┴──────────┐                                     ║   │
│  ║           ▼                     ▼                                     ║   │
│  ║    [NOT CONVERGED]        [CONVERGED]                                 ║   │
│  ║    → Round N+1            → Exit loop                                 ║   │
│  ╚══════════════════════════════════════════════════════════════════════╝   │
│                                  │                                           │
│                                  ▼                                           │
│  ╔══════════════════════════════════════════════════════════════════════╗   │
│  ║                     POST-ROUND SYNTHESIS                              ║   │
│  ║  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                   ║   │
│  ║  │    RISK     │  │  Hardened   │  │  Attack/    │                   ║   │
│  ║  │ ASSESSMENT  │  │Proposition  │  │ Defense Log │                   ║   │
│  ║  │(CONTRACT-08)│  │  Output     │  │             │                   ║   │
│  ║  └─────────────┘  └─────────────┘  └─────────────┘                   ║   │
│  ╚══════════════════════════════════════════════════════════════════════╝   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Pre-Round Setup

Purpose: Prepare the battlefield—understand what's being tested and load relevant knowledge.

Steps:

  1. Proposition Intake

    • Receive subject (decision, strategy, architecture, plan, etc.)
    • If verbal, request written summary or create one together
    • Extract key claims and assertions to be defended
    • Identify stakeholders and constraints
    • Note: Proposition should be specific enough to attack
  2. Attack Surface Mapping

    • Identify dimensions available for attack (from attack-vector-catalog)
    • Map proposition claims to attackable surfaces
    • Select attack categories based on subject_type or explicit attack_categories
    • See: references/attack-vector-catalog.md for categories
  3. Experience Pool Loading (if include_experience_pool: true)

    • Load domain-specific failure patterns
    • Reference historical failures in similar contexts
    • Prepare anti-patterns to probe
    • See: references/experience-pool-patterns.md for patterns
  4. Set Parameters

    • Confirm attack intensity, convergence mode, steel-manning level
    • Estimate expected rounds based on complexity

    CHECKPOINT: subject_type_classification

    • If subject_type not specified or ambiguous: AskUserQuestion
    • Present subject type options with attack category implications

    CHECKPOINT: attack_intensity_selection

    • If attack_intensity not specified: AskUserQuestion
    • Present intensity options with effort implications

    CHECKPOINT: convergence_mode_selection

    • If convergence_mode not specified: AskUserQuestion
    • Present convergence options with trade-offs

Quality Gate: Attack Surface Mapped

  • Proposition boundaries explicitly defined
  • Attack categories selected (minimum 3)
  • Experience pool loaded (if enabled)
  • Parameters confirmed

Output: Attack-ready proposition with mapped attack surface


Round N: Red Team Phase

Purpose: Generate substantive, steel-manned attacks on the proposition.

Reference: See references/red-team-techniques.md and references/steel-manning-protocol.md.

Steps:

  1. Attack Generation

    For each attack category in scope, generate attacks:

    | Technique | When to Use | Expected Yield |
    |---|---|---|
    | Pre-mortem | Always | 3-5 attacks |
    | Inversion | Strategy, Decision | 2-4 attacks |
    | Competitor Simulation | Strategy, Investment | 2-3 attacks |
    | Stress Test Amplification | Architecture, Plan | 2-4 attacks |
    | Devil's Advocate | Policy, Decision | 2-3 attacks |
    | Blind Spot Hunter | All | 1-3 attacks |
    | Historical Pattern Matching | All (with experience pool) | 2-4 attacks |
    | Black Hat Thinking | Security, Competitive | 3-5 attacks |
    • See: references/red-team-techniques.md for protocols
  2. Steel-Manning (per steel_manning_level)

    For each attack, strengthen to maximum potency:

    | Level | Passes | Protocol |
    |---|---|---|
    | minimal | 1 | Basic attack formulation |
    | standard | 2 | + "How can this be more damaging?" |
    | maximum | 3 | + Ideological Turing test: "Would a true opponent accept this?" |

    Steel-manning checklist:

    • Attack is specific, not vague
    • Attack has clear mechanism of harm
    • Attack includes realistic trigger conditions
    • Attack would concern a reasonable proponent
    • Attack is not easily dismissed
    • See: references/steel-manning-protocol.md for full protocol

  3. Severity Scoring

    Score each attack using SEVERITY-SCORING (RUBRIC-07):

    | Severity | Definition | Response Urgency |
    |---|---|---|
    | CRITICAL | Blocks primary objective; cannot proceed | Must address in Blue Phase |
    | HIGH | Significant impact; major rework required | Should address in Blue Phase |
    | MEDIUM | Degrades quality; should fix but can proceed | Address if time permits |
    | LOW | Minor issue; cosmetic | Document and monitor |

    Scoring dimensions:

    • Impact (0.5 weight): How damaging if attack succeeds?
    • Likelihood (0.3 weight): How likely is this attack vector?
    • Detectability (0.2 weight): How hard to see this coming?
  4. Attack Documentation

    For each attack:

    Attack ID: ATK-[round]-[number]
    Category: [From attack-vector-catalog]
    Target: [What aspect of proposition]
    Statement: [Clear attack formulation]
    Mechanism: [How this would cause harm]
    Severity: [CRITICAL | HIGH | MEDIUM | LOW]
    Steel-manning: [minimal | standard | maximum] - [notes]
    Experience pool match: [Pattern ID if applicable]
    
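The weighted scoring in step 3 can be sketched in Python. The source gives the weights (Impact 0.5, Likelihood 0.3, Detectability 0.2) and the four severity labels; the 0-10 input scale and the band cutoffs below are assumptions added for illustration.

```python
def severity_score(impact, likelihood, detectability):
    """Weighted composite of the three scoring dimensions (inputs assumed 0-10)."""
    return 0.5 * impact + 0.3 * likelihood + 0.2 * detectability

def severity_band(score):
    """Map a composite score to a severity label (cutoffs are illustrative)."""
    if score >= 8.0:
        return "CRITICAL"
    if score >= 6.0:
        return "HIGH"
    if score >= 3.5:
        return "MEDIUM"
    return "LOW"
```

A high-impact, likely, hard-to-detect attack such as `severity_score(9, 8, 7)` lands in the CRITICAL band under these cutoffs.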

Quality Gate: Attacks Substantive

  • Minimum 3 attacks generated
  • At least 2 different attack categories represented
  • Steel-manning applied per level
  • No trivial or easily dismissed attacks
  • Severities assigned with rationale

Output: Prioritized attack list for Blue Team


Round N: Blue Team Phase

Purpose: Respond to each attack with defenses, mitigations, or proposition hardening.

Reference: See references/blue-team-techniques.md for defense protocols.

Steps:

  1. Attack Triage

    Prioritize attacks by severity:

    • CRITICAL: Must address this round
    • HIGH: Should address this round
    • MEDIUM: Address if time/capacity permits
    • LOW: Document for monitoring
  2. Defense Generation

    For each attack, determine response type:

    | Response Type | When to Use | Effect |
    |---|---|---|
    | REFUTE | Attack is invalid; evidence proves it wrong | Attack dismissed |
    | MITIGATE | Attack is valid; add safeguards | Risk reduced |
    | ACCEPT | Attack is valid; insufficient mitigation possible | Residual risk documented |
    | HARDEN | Modify proposition to eliminate vulnerability | Proposition strengthened |

    Defense techniques:

    | Technique | Response Type | When to Use |
    |---|---|---|
    | Evidence-Based Refutation | REFUTE | When data contradicts attack |
    | Mitigation Design | MITIGATE | When attack is valid but manageable |
    | Contingency Planning | MITIGATE | When fallback is needed |
    | Monitoring/Detection | MITIGATE | When early warning helps |
    | Hardening Protocol | HARDEN | When proposition can be strengthened |
    | Risk Transfer | MITIGATE | When others can absorb risk |
    | Staged Commitment | MITIGATE | When phasing reduces exposure |
    | Kill Switch Design | MITIGATE | When reversibility is critical |
    • See: references/blue-team-techniques.md for detailed protocols
  3. Defense Documentation

    For each defense:

    Defense ID: DEF-[round]-[number]
    Attack Addressed: ATK-[round]-[number]
    Response Type: [REFUTE | MITIGATE | ACCEPT | HARDEN]
    Defense: [Specific response]
    Evidence/Rationale: [Why this defense works]
    Residual Risk: [ELIMINATED | REDUCED | UNCHANGED]
    Proposition Change: [If HARDEN, what changed]
    
  4. Proposition Hardening

    Apply all HARDEN responses to proposition:

    • Document each modification
    • Track changes between rounds
    • Maintain hardened proposition version
  5. Defense Quality Check

    For each defense, verify:

    • Defense actually addresses the attack (not adjacent issue)
    • REFUTE claims have supporting evidence
    • MITIGATE responses are actionable
    • ACCEPT responses have residual risk documented
    • HARDEN changes don't introduce new vulnerabilities
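The defense quality check can be mechanized; here is a sketch assuming simple dict records keyed by the ATK IDs from the documentation templates (the record shapes and the `check_blue_gate` name are illustrative, not part of the skill spec).

```python
def check_blue_gate(attacks, defenses):
    """Return gate violations: unanswered attacks, or CRITICAL attacks merely ACCEPTed."""
    responses = {d["attack_id"]: d["response_type"] for d in defenses}
    violations = []
    for atk in attacks:
        resp = responses.get(atk["id"])
        if resp is None:
            violations.append(f"{atk['id']}: no defense response")
        elif atk["severity"] == "CRITICAL" and resp == "ACCEPT":
            violations.append(f"{atk['id']}: CRITICAL attack cannot be ACCEPTed")
    return violations
```

An empty return value means the gate passed; each string describes one violation to fix before leaving the Blue Phase.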

Quality Gate: Attacks Addressed

  • Every attack has a defense response
  • CRITICAL attacks have REFUTE or MITIGATE (not ACCEPT)
  • HIGH attacks have REFUTE, MITIGATE, or documented ACCEPT with rationale
  • Hardening changes documented
  • No hand-waving defenses

Output: Defense log with updated (hardened) proposition


Round N: Evaluation Phase

Purpose: Determine if another round is needed or convergence achieved.

Reference: See references/convergence-criteria.md for detailed criteria.

Steps:

  1. Assess Round Quality

    Red Team assessment:

    • Were attacks substantive or rehashes of previous rounds?
    • Are there novel attack angles remaining?
    • Is Red Team finding diminishing returns?

    Blue Team assessment:

    • Were defenses genuine or hand-waving?
    • Are mitigations actionable?
    • Has proposition been strengthened?
  2. Apply Convergence Criteria

    | Mode | Stop When | Continue When |
    |---|---|---|
    | no_new_critical | Round produced 0 new CRITICAL or HIGH attacks | New CRITICAL or HIGH attacks found |
    | all_addressed | No ACCEPT responses remain (all REFUTE/MITIGATE/HARDEN) | Any ACCEPT responses remain |
    | round_limit | max_rounds reached | Below max_rounds |

    Override conditions (continue despite convergence):

    • Obvious attack categories not yet explored
    • Blue Team defenses appear inadequate
    • Stakeholder requests additional scrutiny

    Premature termination signs (don't stop too early):

    • Less than 2 rounds completed
    • CRITICAL attacks still have ACCEPT responses
    • Key attack categories unexplored
  3. Document Convergence Decision

    Round [N] Evaluation:
    - New CRITICAL attacks: [count]
    - New HIGH attacks: [count]
    - ACCEPT responses remaining: [count]
    - Convergence mode: [mode]
    - Decision: [CONTINUE | CONVERGED]
    - Rationale: [explanation]
    
  4. Proceed or Exit

    • If NOT CONVERGED: Increment round, return to Red Phase
    • If CONVERGED: Proceed to Post-Round Synthesis

    CHECKPOINT: premature_convergence_check

    • If convergence met but warning signs present: AskUserQuestion
    • Warning signs: <2 rounds, CRITICAL ACCEPTs remain, key categories unexplored

    CHECKPOINT: infinite_loop_risk

    • If new attacks from defenses exceed previous round: AskUserQuestion
    • May indicate fundamental proposition issues

Quality Gate: Convergence Evaluated

  • Explicit continue/stop decision documented
  • Rationale provided
  • Override conditions checked
  • Premature termination signs checked

Output: Convergence decision with rationale


Post-Round Synthesis

Purpose: Compile findings into actionable outputs.

Reference: See templates/ for output formats.

CHECKPOINT: output_mode_selection

  • If output_mode not specified: AskUserQuestion
  • Options: risk_assessment, hardened_proposition, full_log

Steps:

  1. Compile Attack/Defense Log

    Consolidate all rounds:

    • All attacks with responses
    • Round-by-round progression
    • Convergence trajectory
    • See: templates/attack-defense-log.md
  2. Derive RISK-ASSESSMENT (CONTRACT-08)

    Transform unresolved attacks into risks:

    | Attack Status | Risk Derivation |
    |---|---|
    | ACCEPT response | Direct risk: attack remains valid |
    | MITIGATE with residual | Risk: partially addressed |
    | MITIGATE with ELIMINATED | No risk (resolved) |
    | REFUTE | No risk (invalid attack) |
    | HARDEN | No risk (vulnerability removed) |

    Score each derived risk using SEVERITY-SCORING:

    • Include mitigations from Blue Team responses
    • See: templates/risk-assessment-output.md
  3. Generate Hardened Proposition

    Compile final version:

    • Original proposition + all HARDEN modifications
    • List of accepted residual risks
    • Battle-tested confidence score
    • Conditions for validity
    • Review triggers
    • See: templates/hardened-proposition-output.md
  4. Calculate Battle-Tested Confidence

    Score based on:

    • Rounds completed (more = higher confidence)
    • Attack quality (substantive attacks survived)
    • Defense quality (genuine defenses, not hand-waving)
    • Residual risk profile (fewer ACCEPT = higher confidence)

    | Score | Meaning |
    |---|---|
    | 80-100 | High confidence: withstood aggressive scrutiny |
    | 60-79 | Moderate confidence: key challenges addressed |
    | 40-59 | Low confidence: significant risks remain |
    | 0-39 | Very low confidence: fundamental issues unresolved |
  5. Generate Go/No-Go Recommendation

    | Recommendation | When |
    |---|---|
    | PROCEED | Low/very low residual risk; proposition battle-tested |
    | PROCEED_WITH_CAUTION | Moderate risk; mitigations in place |
    | SIGNIFICANT_CONCERNS | High risk; key attacks unresolved |
    | DO_NOT_PROCEED | Very high risk; fundamental flaws exposed |
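Steps 2 and 5 can be sketched end to end in Python: filter defense responses down to residual risks per the derivation rules, then map the residual severity profile to a recommendation. Representing "MITIGATE with residual" as a `residual` flag, and deciding purely from counts of residual CRITICAL/HIGH risks, are illustrative assumptions; the source describes the mapping qualitatively.

```python
def derive_risks(defenses):
    """Keep defenses that leave residual risk: ACCEPT, or MITIGATE flagged residual."""
    return [
        d for d in defenses
        if d["response_type"] == "ACCEPT"
        or (d["response_type"] == "MITIGATE" and d.get("residual", False))
    ]

def recommend(risks):
    """Map the residual risk profile to a go/no-go recommendation."""
    severities = [r["severity"] for r in risks]
    if "CRITICAL" in severities:
        return "DO_NOT_PROCEED"
    if "HIGH" in severities:
        return "SIGNIFICANT_CONCERNS"
    if risks:
        return "PROCEED_WITH_CAUTION"
    return "PROCEED"
```

REFUTE and HARDEN responses never reach `recommend`, matching the derivation rules: only residual risks drive the decision.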

Quality Gates:

  • RISK-ASSESSMENT complete with go/no-go
  • All attacks traced to risks or resolutions
  • Hardened proposition documented
  • Battle-tested confidence calculated
  • Attack/defense log compiled

Output: RISK-ASSESSMENT (CONTRACT-08), Hardened Proposition, Attack/Defense Log


6. Attack Vector Catalog

Ten categories of attacks, applicable across subject types:

6.1 ASSUMPTIONS

Definition: Attacks targeting hidden, unstated, or fragile assumptions.

| Attack Pattern | Target | Example |
|---|---|---|
| Hidden assumption exposure | Unstated beliefs | "You're assuming customers want this feature" |
| Load-bearing challenge | Critical assumptions | "If this assumption fails, the whole plan collapses" |
| Temporal decay | Time-sensitive assumptions | "This assumption won't hold in 2 years" |
| Behavioral assumptions | Human behavior predictions | "You're assuming the team will change behavior" |
| Counterfactual reversal | Any assumption | "What if the opposite is true?" |

Risk Level: HIGH (often invisible until failure)

6.2 DEPENDENCIES

Definition: Attacks targeting external or internal dependencies.

| Attack Pattern | Target | Example |
|---|---|---|
| External dependency failure | Third parties | "What if the vendor goes out of business?" |
| Technology obsolescence | Tech dependencies | "This framework may not be maintained in 3 years" |
| Team capability dependency | People | "This requires skills the team doesn't have" |
| Resource availability | Resources | "What if the budget is cut 30%?" |
| Single point of failure | Critical dependencies | "Everything depends on this one system" |

Risk Level: HIGH (external factors often uncontrollable)

6.3 EDGE_CASES

Definition: Attacks targeting boundary conditions and unusual scenarios.

| Attack Pattern | Target | Example |
|---|---|---|
| Boundary conditions | Limits | "What happens at 0? At max capacity?" |
| Scale extremes | Very large/small | "Does this work with 1 user? 1 million?" |
| Timing edge cases | Timing | "What if these events happen simultaneously?" |
| Data quality | Inputs | "What if the input data is garbage?" |
| Concurrency/race conditions | Parallel operations | "What if two users do this at the same time?" |

Risk Level: MEDIUM (often discoverable through testing)

6.4 SCALABILITY

Definition: Attacks targeting ability to grow or shrink.

| Attack Pattern | Target | Example |
|---|---|---|
| Horizontal scaling limits | Adding instances | "Can you just add more servers?" |
| Vertical scaling limits | Bigger instances | "What if you need 10x the memory?" |
| Cost scaling non-linearity | Economics | "Costs grow O(n²) with users" |
| Operational complexity | Team capacity | "Can the team manage 50 services?" |
| Data volume scaling | Storage/processing | "What happens with 10TB of data?" |

Risk Level: HIGH (often not discovered until growth happens)

6.5 SECURITY

Definition: Attacks targeting security posture and vulnerabilities.

| Attack Pattern | Target | Example |
|---|---|---|
| Attack surface exposure | Entry points | "Every API is an attack vector" |
| Data breach scenarios | Data protection | "What if this database is compromised?" |
| Authentication gaps | Identity | "How do you prevent unauthorized access?" |
| Authorization gaps | Permissions | "Can users access others' data?" |
| Compliance violations | Regulations | "Does this violate GDPR?" |

Risk Level: CRITICAL (security failures can be catastrophic)

6.6 COMPETITIVE

Definition: Attacks targeting competitive dynamics.

| Attack Pattern | Target | Example |
|---|---|---|
| Competitor response | Competitive reaction | "What will [competitor] do when they see this?" |
| Market timing | Windows | "The market window may close before launch" |
| Differentiation erosion | Uniqueness | "This feature can be copied in weeks" |
| Pricing pressure | Economics | "Competitor can undercut by 50%" |
| Acquisition/partnership disruption | Strategic moves | "What if competitor acquires your key partner?" |

Risk Level: HIGH (competitive dynamics are unpredictable)

6.7 OPERATIONAL

Definition: Attacks targeting day-to-day operations.

| Attack Pattern | Target | Example |
|---|---|---|
| Complexity explosion | Manageability | "This will be impossible to debug" |
| Incident scenarios | Failure recovery | "What's the MTTR when this breaks at 3 AM?" |
| Recovery time | Resilience | "Can you recover within SLA?" |
| Monitoring gaps | Observability | "How would you even know it's failing?" |
| On-call burden | Team health | "This will burn out the team" |

Risk Level: MEDIUM-HIGH (operational issues compound)

6.8 ECONOMIC

Definition: Attacks targeting financial viability.

| Attack Pattern | Target | Example |
|---|---|---|
| Unit economics failure | Per-unit costs | "Each customer costs more than they pay" |
| Cost structure vulnerability | Fixed costs | "Break-even requires 10x current volume" |
| Revenue model fragility | Income sources | "Revenue depends on one customer segment" |
| Funding/cash flow | Capital | "You'll run out of runway in 8 months" |
| Market size overestimation | TAM/SAM/SOM | "Your market is 1/10th the claimed size" |

Risk Level: HIGH (financial failure is existential)

6.9 ORGANIZATIONAL

Definition: Attacks targeting people and organization.

| Attack Pattern | Target | Example |
|---|---|---|
| Capability gaps | Skills | "No one on the team has done this before" |
| Key person dependency | Individuals | "If [person] leaves, this fails" |
| Cultural resistance | Adoption | "The organization will reject this change" |
| Political opposition | Stakeholders | "[Executive] will block this" |
| Change management | Transition | "Users will refuse to migrate" |

Risk Level: MEDIUM-HIGH (organizational dynamics are complex)

6.10 TEMPORAL

Definition: Attacks targeting timing and duration.

| Attack Pattern | Target | Example |
|---|---|---|
| Timeline compression | Deadlines | "What if the deadline is moved up 3 months?" |
| Timeline extension impact | Delays | "What if this takes twice as long?" |
| Market window closure | Timing | "The opportunity won't exist in 12 months" |
| Technology obsolescence | Tech lifecycle | "This technology will be obsolete" |
| Regulatory timeline | External deadlines | "Regulation changes in 6 months" |

Risk Level: HIGH (timing failures are often unrecoverable)


7. Convergence Criteria

Mode Definitions

| Mode | Definition | Best For |
|---|---|---|
| no_new_critical | Stop when round produces 0 new CRITICAL or HIGH attacks | Most use cases |
| all_addressed | Stop when no ACCEPT responses remain | High-stakes decisions |
| round_limit | Stop at max_rounds regardless | Time-constrained reviews |

Measurement Methods

no_new_critical:

  • Count CRITICAL attacks generated this round: must be 0
  • Count HIGH attacks generated this round: must be 0
  • Attacks that are variants of previous attacks don't count as "new"

all_addressed:

  • Count ACCEPT responses across all rounds
  • Must be 0 (all attacks have REFUTE, MITIGATE, or HARDEN)

round_limit:

  • Simply check current_round >= max_rounds
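The three measurements above reduce to a small predicate. The flat-count signature, the function name, and treating max_rounds as a hard cap in every mode are assumptions made for illustration.

```python
def converged(mode, new_critical, new_high, accept_remaining, current_round, max_rounds):
    """Return True when this round satisfies the selected convergence mode."""
    if current_round >= max_rounds:
        return True  # max_rounds is assumed to cap every mode
    if mode == "no_new_critical":
        return new_critical == 0 and new_high == 0
    if mode == "all_addressed":
        return accept_remaining == 0
    if mode == "round_limit":
        return False  # cap not yet reached
    raise ValueError(f"unknown convergence mode: {mode}")
```

Note that variant attacks should be excluded from `new_critical` and `new_high` before calling, per the "new" rule above.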

Override Conditions (Continue Despite Convergence)

  • Obvious attack categories not yet explored
  • Stakeholder requests additional rounds
  • Blue Team defenses appear superficial
  • Recent hardening changes may introduce new vulnerabilities

Premature Termination Signs (Don't Stop Too Early)

  • Less than 2 rounds completed
  • CRITICAL attacks still have ACCEPT responses
  • Attack quality improving (not diminishing) each round
  • Key experience pool patterns not yet probed

8. Output Specifications

8.1 Primary Output: RISK-ASSESSMENT

Compliant with CONTRACT-08 from artifact-contracts.yaml.

See: templates/risk-assessment-output.md for complete XML template.

Key extensions for adversarial validation:

  • <adversarial_summary> with attack/defense statistics
  • Risks traced to source attacks (ATK-X-Y)
  • Battle-tested confidence score
  • Defense quality assessment

8.2 Secondary Output: Hardened Proposition

See: templates/hardened-proposition-output.md for complete template.

Includes:

  • Original proposition vs. battle-tested version
  • All modifications with rationale
  • Accepted residual risks
  • Conditions for validity
  • Review triggers

8.3 Secondary Output: Attack/Defense Log

See: templates/attack-defense-log.md for complete template.

Includes:

  • Round-by-round attack and defense tables
  • Convergence evaluation per round
  • Severity distribution
  • Resolution statistics

9. Quality Gates Summary

| # | Gate | Criterion | Phase |
|---|---|---|---|
| 1 | Attack Surface Mapped | Proposition boundaries defined, categories selected | Pre-Round |
| 2 | Experience Pool Loaded | Domain patterns available (if enabled) | Pre-Round |
| 3 | Attacks Substantive | Attacks are non-trivial, steel-manned | Round N: Red |
| 4 | Attacks Diverse | At least 2 different categories represented | Round N: Red |
| 5 | Severities Assigned | All attacks have severity with rationale | Round N: Red |
| 6 | All Attacks Addressed | Every attack has a defense response | Round N: Blue |
| 7 | Critical Attacks Defended | CRITICAL/HIGH have REFUTE or MITIGATE | Round N: Blue |
| 8 | No Hand-Waving | Defenses are actionable, not vague | Round N: Blue |
| 9 | Convergence Evaluated | Explicit continue/stop decision | Round N: Eval |
| 10 | Risks Derived | Unresolved attacks become risks | Post-Round |
| 11 | Go/No-Go Issued | Clear recommendation | Post-Round |
| 12 | Hardened Proposition | Battle-tested version documented | Post-Round |

Gate Requirements by Intensity

| Gate | Light | Standard | Aggressive |
|---|---|---|---|
| Attack categories | 3 | 5 | All applicable |
| Minimum attacks | 5 | 10 | 15+ |
| Steel-manning level | minimal | standard | maximum |
| Convergence mode | round_limit (2) | no_new_critical | no_new_critical |
| Max rounds | 2 | 3 | 5 |

10. Behavioral Guidelines

Red Team Principles

  • Steel-man, don't strawman: Make attacks as strong as possible
  • Attack the proposition, not the proposer: Focus on ideas, not people
  • Be creative but realistic: Novel attacks should be plausible
  • Prioritize ruthlessly: CRITICAL issues first
  • Use the experience pool: Don't reinvent known failures
  • Ideological Turing test: Would a true critic accept this attack?

Blue Team Principles

  • Defend genuinely, don't dismiss: Every attack deserves honest consideration
  • Evidence over assertion: REFUTE claims need proof
  • Actionable mitigations: MITIGATE responses must be specific
  • Honest acceptance: If you can't defend, ACCEPT the risk
  • Harden proactively: Don't wait for attacks to strengthen
  • Avoid defensive denial: Admitting weakness is strength

Tone Calibration

| Intensity | Red Team Tone | Blue Team Tone |
|-----------|---------------|----------------|
| Light | Collaborative skeptic | Quick sanity check |
| Standard | Professional adversary | Thorough defense |
| Aggressive | Determined opponent | Comprehensive rebuttal |

10. Workflow Integration

Upstream Skills

| Skill | Provides | Use Case |
|-------|----------|----------|
| assumption-validator | Assumption inventory | Attack assumptions already surfaced |
| expert-panel-deliberation | Multi-perspective input | Diverse attack/defense viewpoints |
| research-interviewer | KNOWLEDGE-CORPUS | Domain knowledge for attacks |

Downstream Skills

| Skill | Receives | Use Case |
|-------|----------|----------|
| expert-panel-deliberation | RISK-ASSESSMENT | Panel review of risks |
| generate-ideas | Attack gaps | Generate alternatives for failed propositions |

Skill Chaining Example

```
assumption-validator      → RISK-ASSESSMENT (assumption-derived)
                               ↓
red-blue-validator        → RISK-ASSESSMENT (adversarial-derived)
                               ↓
expert-panel-deliberation → Final recommendation with multi-expert review
```

11. References

| Document | Purpose |
|----------|---------|
| references/attack-vector-catalog.md | 10 attack categories with specific attacks |
| references/red-team-techniques.md | 8 attack generation techniques |
| references/blue-team-techniques.md | 8 defense techniques |
| references/steel-manning-protocol.md | Protocol for maximizing attack strength |
| references/convergence-criteria.md | Detailed criteria for stopping |
| references/experience-pool-patterns.md | 50+ failure patterns by domain |

Core Library References

| Library | Element | Usage |
|---------|---------|-------|
| core/skill-patterns.yaml | PATTERN-06: ADVERSARIAL-VALIDATE | Workflow pattern |
| core/artifact-contracts.yaml | CONTRACT-08: RISK-ASSESSMENT | Output format |
| core/scoring-rubrics.yaml | RUBRIC-07: SEVERITY-SCORING | Attack severity |
| core/technique-taxonomy.yaml | CAT-UR, CAT-PP | Adversarial techniques |

12. Templates

| Template | Purpose |
|----------|---------|
| templates/risk-assessment-output.md | CONTRACT-08 compliant RISK-ASSESSMENT with adversarial extensions |
| templates/attack-defense-log.md | Round-by-round attack/defense documentation |
| templates/hardened-proposition-output.md | Battle-tested proposition with modifications |

13. Examples

Example 1: Architecture Decision — Microservices Migration

yaml
input:
  subject: "Migrate payment processing from monolith to microservices"
  subject_type: architecture
  max_rounds: 3
  attack_intensity: standard
  convergence_mode: no_new_critical
  include_experience_pool: true
  steel_manning_level: standard

flow:
  pre_round:
    proposition: "Decompose payment monolith into 5 microservices over 12 months"
    attack_surface:
      - ASSUMPTIONS: Team capability, timeline, complexity estimates
      - DEPENDENCIES: Infrastructure, vendor APIs, data consistency
      - SCALABILITY: Service coordination overhead
      - OPERATIONAL: Debugging distributed systems
      - EDGE_CASES: Partial failures, network partitions
    experience_pool_loaded:
      - "Distributed monolith anti-pattern"
      - "Service boundary misalignment"
      - "Operational complexity explosion"

  round_1:
    red_team:
      attacks:
        - ATK-1-1: "Team has zero production microservices experience"
          Category: ORGANIZATIONAL
          Severity: CRITICAL
          Steel-manned: "Even with training, production microservices require
                        tacit knowledge that only comes from operating them"

        - ATK-1-2: "12-month timeline ignores learning curve and unknowns"
          Category: TEMPORAL
          Severity: HIGH
          Steel-manned: "Industry benchmarks show microservices migrations
                        typically take 2-3x initial estimates"

        - ATK-1-3: "Distributed transactions will break payment consistency"
          Category: EDGE_CASES
          Severity: CRITICAL
          Steel-manned: "Payment systems require ACID guarantees that
                        eventual consistency cannot provide"

        - ATK-1-4: "Debugging distributed payment failures at 3 AM"
          Category: OPERATIONAL
          Severity: HIGH
          Steel-manned: "When payments fail across service boundaries,
                        MTTR could exceed SLA without distributed tracing expertise"

      new_critical: 2
      new_high: 2

    blue_team:
      defenses:
        - DEF-1-1: Response to ATK-1-1
          Type: MITIGATE
          Defense: "Hire 2 senior engineers with microservices experience.
                   Engage architecture consultancy for first 6 months."
          Residual: REDUCED

        - DEF-1-2: Response to ATK-1-2
          Type: HARDEN
          Defense: "Extend timeline to 18 months. Add 3-month buffer for unknowns."
          Proposition Change: "12 months" → "18 months with 3-month buffer"
          Residual: ELIMINATED

        - DEF-1-3: Response to ATK-1-3
          Type: HARDEN
          Defense: "Keep payment processing in single service with ACID guarantees.
                   Only extract non-critical services to microservices."
          Proposition Change: "5 microservices" → "3 microservices + 1 payment service"
          Residual: ELIMINATED

        - DEF-1-4: Response to ATK-1-4
          Type: MITIGATE
          Defense: "Implement distributed tracing (Jaeger) before migration.
                   Establish on-call runbooks. Require observability as launch gate."
          Residual: REDUCED

    evaluation:
      new_critical: 2
      new_high: 2
      convergence_mode: no_new_critical
      decision: CONTINUE
      rationale: "New critical attacks found; continue to Round 2"

  round_2:
    red_team:
      attacks:
        - ATK-2-1: "Hiring 2 senior engineers in 6 months is optimistic"
          Category: ORGANIZATIONAL
          Severity: HIGH
          Steel-manned: "Market for microservices expertise is extremely
                        competitive; 6-month hiring timeline may slip"

        - ATK-2-2: "Distributed tracing adds operational complexity itself"
          Category: OPERATIONAL
          Severity: MEDIUM
          Steel-manned: "Jaeger requires infrastructure, maintenance,
                        and expertise to operate"

        - ATK-2-3: "Service boundary around payments may be wrong"
          Category: ASSUMPTIONS
          Severity: MEDIUM
          Steel-manned: "Without event storming, service boundaries
                        are guesses that may need rework"

      new_critical: 0
      new_high: 1

    blue_team:
      defenses:
        - DEF-2-1: Response to ATK-2-1
          Type: MITIGATE
          Defense: "Begin hiring immediately. Have contingency: extend
                   consultancy or use contractor bridge if hiring slips."
          Residual: REDUCED

        - DEF-2-2: Response to ATK-2-2
          Type: ACCEPT
          Defense: "Accept additional complexity as cost of observability.
                   Allocate 0.5 FTE for observability platform."
          Residual: UNCHANGED (but monitored)

        - DEF-2-3: Response to ATK-2-3
          Type: HARDEN
          Defense: "Conduct event storming workshop before finalizing
                   service boundaries. Add 4 weeks for domain modeling."
          Proposition Change: Add event storming phase
          Residual: ELIMINATED

    evaluation:
      new_critical: 0
      new_high: 1
      convergence_mode: no_new_critical
      decision: CONTINUE (HIGH attack found)

  round_3:
    red_team:
      attacks:
        - ATK-3-1: "Event storming may reveal the migration is unnecessary"
          Category: ASSUMPTIONS
          Severity: MEDIUM
          Steel-manned: "Domain modeling might show modular monolith
                        is sufficient for scaling needs"

        - ATK-3-2: "Consultancy dependency creates knowledge transfer risk"
          Category: DEPENDENCIES
          Severity: MEDIUM
          Steel-manned: "If consultants leave, tacit knowledge leaves with them"

      new_critical: 0
      new_high: 0

    blue_team:
      defenses:
        - DEF-3-1: Response to ATK-3-1
          Type: ACCEPT
          Defense: "Valid point. Event storming is a gate; if it reveals
                   microservices aren't needed, we pivot to modular monolith."
          Residual: UNCHANGED (but this is a feature, not a bug)

        - DEF-3-2: Response to ATK-3-2
          Type: MITIGATE
          Defense: "Require knowledge transfer sessions, documentation
                   deliverables, and pair programming in consultancy contract."
          Residual: REDUCED

    evaluation:
      new_critical: 0
      new_high: 0
      convergence_mode: no_new_critical
      decision: CONVERGED
      rationale: "No new CRITICAL or HIGH attacks. Proposition has been hardened."

  post_round:
    hardened_proposition:
      original: "Migrate payment processing from monolith to 5 microservices
                over 12 months"

      battle_tested: "Migrate to 3 microservices + 1 payment service over
                     18 months (with 3-month buffer), after event storming
                     confirms boundaries. Requires: 2 hired senior engineers,
                     architecture consultancy, distributed tracing infrastructure,
                     0.5 FTE observability platform maintenance."

      modifications:
        - MOD-1: 5 services → 3 + 1 payment (Response to ATK-1-3)
        - MOD-2: 12 months → 18 months + buffer (Response to ATK-1-2)
        - MOD-3: Added event storming prerequisite (Response to ATK-2-3)
        - MOD-4: Added observability infrastructure requirement (Response to ATK-1-4)

      accepted_residual_risks:
        - "Hiring timeline may slip (mitigated by contingency)"
        - "Observability platform adds operational overhead (accepted)"
        - "Event storming may reveal migration unnecessary (feature)"
        - "Consultancy knowledge transfer requires active management"

      battle_tested_confidence: 72
      confidence_rationale: "Proposition survived 3 rounds of substantive attacks.
                           Critical issues addressed through hardening.
                           Residual risks are manageable and monitored."

    risk_assessment:
      total_risks: 4
      critical_risks: 0
      high_risks: 1 (hiring timeline)
      moderate_risks: 3
      risk_profile: MODERATE
      go_no_go: PROCEED_WITH_CAUTION

      recommendation: |
        PROCEED_WITH_CAUTION — The original proposition had critical flaws
        (team capability, payment consistency, timeline). The hardened
        proposition addresses these through:
        - Scoped migration (keeping payments transactional)
        - Extended timeline with buffer
        - Event storming validation gate
        - Observability prerequisites

        Key risks to monitor:
        1. Hiring: Start immediately; have contingency ready
        2. Consultancy knowledge transfer: Contract requirements
        3. Event storming outcome: Be prepared to pivot if boundaries don't hold
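
The three `evaluation` blocks in Example 1 apply one stopping rule each round. A minimal sketch of that rule (the function name and the `ROUND_LIMIT_REACHED` outcome are illustrative; note that Example 1 continues on a new HIGH attack even under `no_new_critical`, so new HIGH attacks are treated as blocking here):

```python
def evaluate_round(new_critical: int, new_high: int,
                   round_no: int, max_rounds: int,
                   mode: str = "no_new_critical") -> str:
    """Post-round decision mirroring the `evaluation` blocks above."""
    if mode == "no_new_critical" and new_critical == 0 and new_high == 0:
        return "CONVERGED"
    if round_no >= max_rounds:
        return "ROUND_LIMIT_REACHED"
    return "CONTINUE"

# Replaying Example 1 (max_rounds: 3):
#   round 1: 2 CRITICAL, 2 HIGH -> CONTINUE
#   round 2: 0 CRITICAL, 1 HIGH -> CONTINUE
#   round 3: 0 CRITICAL, 0 HIGH -> CONVERGED
```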

Example 2: Strategy Validation — Market Expansion

yaml
input:
  subject: "Expand to European market in Q3 with existing product"
  subject_type: strategy
  max_rounds: 2
  attack_intensity: standard
  convergence_mode: no_new_critical

flow:
  round_1:
    red_team:
      attacks:
        - ATK-1-1: "GDPR compliance timeline is unrealistic"
          Severity: CRITICAL
        - ATK-1-2: "Competitor X already dominates EU market"
          Severity: HIGH
        - ATK-1-3: "Pricing model assumes US willingness-to-pay"
          Severity: HIGH
        - ATK-1-4: "No local sales team or market knowledge"
          Severity: MEDIUM

    blue_team:
      defenses:
        - DEF-1-1: HARDEN - Push launch to Q4; engage GDPR consultancy
        - DEF-1-2: MITIGATE - Focus on underserved segments competitor ignores
        - DEF-1-3: HARDEN - Conduct pricing research; plan EU-specific pricing
        - DEF-1-4: MITIGATE - Partner with EU distributor initially

    evaluation:
      new_critical: 1
      decision: CONTINUE

  round_2:
    red_team:
      attacks:
        - ATK-2-1: "EU distributor takes 40% margin"
          Severity: MEDIUM
        - ATK-2-2: "Underserved segments may be underserved for good reason"
          Severity: MEDIUM

    blue_team:
      defenses:
        - DEF-2-1: ACCEPT - Cost of market entry; build direct sales in Year 2
        - DEF-2-2: MITIGATE - Validate segment with EU market research

    evaluation:
      new_critical: 0
      new_high: 0
      decision: CONVERGED

  post_round:
    hardened_proposition: |
      Launch EU expansion in Q4 (not Q3) with:
      - GDPR compliance verified by consultancy
      - EU-specific pricing based on market research
      - Initial distribution through EU partner
      - Focus on [specific underserved segments]
      - Year 2: Build direct sales capability

    risk_assessment:
      profile: MODERATE
      go_no_go: PROCEED_WITH_CAUTION

Example 3: Investment Decision — Series B Funding

yaml
input:
  subject: "Accept Series B term sheet at $50M valuation"
  subject_type: investment
  max_rounds: 2
  attack_intensity: standard
  convergence_mode: no_new_critical

flow:
  pre_round:
    proposition: "Accept $15M Series B at $50M pre-money valuation from [VC Firm]"
    attack_surface:
      - ECONOMIC: Valuation, dilution, runway
      - ASSUMPTIONS: Growth projections, market size
      - DEPENDENCIES: VC firm reputation, board dynamics
      - COMPETITIVE: Market timing, competitor funding

  round_1:
    red_team:
      attacks:
        - ATK-1-1: "Valuation assumes 3x YoY growth; current trajectory is 1.8x"
          Category: ASSUMPTIONS
          Severity: HIGH
          Steel-manned: "At 1.8x growth, next round valuation math doesn't work;
                        down round likely in 18 months"

        - ATK-1-2: "15-month runway at current burn; need to hit milestones or raise bridge"
          Category: ECONOMIC
          Severity: HIGH
          Steel-manned: "Milestones require growth acceleration you haven't demonstrated"

        - ATK-1-3: "[VC Firm] has reputation for replacing founders at Series C"
          Category: DEPENDENCIES
          Severity: MEDIUM
          Steel-manned: "3 of their last 5 Series B companies had founder transitions"

        - ATK-1-4: "Competitor just raised $40M; will outspend on customer acquisition"
          Category: COMPETITIVE
          Severity: HIGH
          Steel-manned: "Their CAC advantage compounds; market share gap widens"

    blue_team:
      defenses:
        - DEF-1-1: Response to ATK-1-1
          Type: HARDEN
          Defense: "Negotiate milestone-based valuation adjustment; lower initial
                   valuation with ratchet up if growth targets hit"
          Proposition Change: Add milestone ratchet provision

        - DEF-1-2: Response to ATK-1-2
          Type: MITIGATE
          Defense: "Negotiate 18-month runway minimum; reduce burn by 20% through
                   hiring pause; extend runway to 20 months"
          Residual: REDUCED

        - DEF-1-3: Response to ATK-1-3
          Type: MITIGATE
          Defense: "Negotiate founder-friendly protective provisions; 2-year
                   employment agreements; board composition safeguards"
          Residual: REDUCED

        - DEF-1-4: Response to ATK-1-4
          Type: ACCEPT
          Defense: "Competitive pressure is real but unavoidable. Focus on
                   capital-efficient growth and product differentiation over
                   CAC war. This is market reality, not term sheet issue."
          Residual: UNCHANGED

    evaluation:
      new_critical: 0
      new_high: 3
      decision: CONTINUE

  round_2:
    red_team:
      attacks:
        - ATK-2-1: "Milestone ratchet creates misaligned incentives; may optimize
                   for metrics over business health"
          Category: ASSUMPTIONS
          Severity: MEDIUM

        - ATK-2-2: "Hiring pause delays product roadmap; competitive gap widens"
          Category: TEMPORAL
          Severity: MEDIUM

    blue_team:
      defenses:
        - DEF-2-1: Response to ATK-2-1
          Type: MITIGATE
          Defense: "Structure milestones around leading indicators (retention,
                   NPS) not just growth metrics"
          Residual: REDUCED

        - DEF-2-2: Response to ATK-2-2
          Type: ACCEPT
          Defense: "Trade-off accepted; survival > speed. Revisit hiring
                   after 6-month runway checkpoint."
          Residual: UNCHANGED

    evaluation:
      new_critical: 0
      new_high: 0
      decision: CONVERGED

  post_round:
    hardened_proposition: |
      Accept Series B with modifications:
      - Milestone-based valuation: $45M base + $10M ratchet if 2.5x growth
      - 18-month minimum runway commitment
      - Founder protective provisions (2-year agreements, board balance)
      - Hiring pause for 6 months; revisit at runway checkpoint
      - Milestones tied to retention/NPS, not just growth

    risk_assessment:
      total_risks: 4
      risk_profile: MODERATE
      go_no_go: PROCEED_WITH_CAUTION

      recommendation: |
        PROCEED_WITH_CAUTION — Accept modified term sheet. Key risks:
        1. Competitive pressure (accepted as market reality)
        2. Growth trajectory uncertainty (mitigated by ratchet)
        3. Founder/board dynamics (mitigated by provisions)

        Negotiate the hardened terms before signing. Walk away if
        milestone ratchet or protective provisions rejected.
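
In all three examples the final `go_no_go` follows from the residual severity counts. A hedged sketch of that mapping (the thresholds, and the names other than MODERATE and PROCEED_WITH_CAUTION, are illustrative assumptions; RUBRIC-07 in core/scoring-rubrics.yaml would be the authoritative scoring source):

```python
def go_no_go(critical: int, high: int, moderate: int) -> tuple[str, str]:
    """Map residual risk counts to a (risk_profile, recommendation) pair."""
    if critical > 0:
        return "SEVERE", "NO_GO"
    if high > 0 or moderate > 2:
        return "MODERATE", "PROCEED_WITH_CAUTION"
    return "LOW", "GO"

# Example 1: 0 critical, 1 high, 3 moderate -> MODERATE / PROCEED_WITH_CAUTION
```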

14. Quick Start

Minimal Invocation

```
Red team this: [paste proposition]
```

Standard Invocation

```
subject_type: decision
attack_intensity: standard
convergence_mode: no_new_critical

Proposition: [description or document]
```

Full Parameter Invocation

```
subject_type: architecture
max_rounds: 4
attack_intensity: aggressive
attack_categories: [ASSUMPTIONS, SCALABILITY, SECURITY, OPERATIONAL]
convergence_mode: all_addressed
include_experience_pool: true
steel_manning_level: maximum
output_mode: full

Proposition: [detailed description]

Context:
- Stakes: [why this matters]
- Constraints: [limitations]
- Stakeholders: [who cares]
```
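
A run can sanity-check an invocation before the first round. A minimal sketch; the allowed values are collected from this document's examples and parameter tables, and the validator itself is an illustrative assumption:

```python
# Allowed parameter values, gathered from this skill's examples.
ALLOWED = {
    "subject_type": {"decision", "strategy", "architecture", "investment"},
    "attack_intensity": {"light", "standard", "aggressive"},
    "convergence_mode": {"round_limit", "no_new_critical", "all_addressed"},
    "steel_manning_level": {"minimal", "standard", "maximum"},
}

def validate_invocation(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the invocation is usable."""
    problems = [
        f"{key}={params[key]!r} not in {sorted(allowed)}"
        for key, allowed in ALLOWED.items()
        if key in params and params[key] not in allowed
    ]
    if not 1 <= params.get("max_rounds", 3) <= 5:  # 5 is the aggressive maximum
        problems.append("max_rounds should be between 1 and 5")
    return problems
```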