Skill: ALW (Assumptions, Limitations, Weaknesses)
Purpose
Surface and formalize the assumptions embedded in the model, the limitations of its design and scope, and the weaknesses that could lead to model failure or misuse.
This skill makes implicit risk explicit.
Inputs
Required IR fields:
- •methodology outputs
- •code evidence snippets
- •commentary_md
Skill data inputs:
- •alw_taxonomy.yaml (common assumption/limitation categories)
Outputs
Structured lists of:
- •Assumptions (model, data, market, numerical)
- •Limitations (scope, coverage, realism)
- •Weaknesses (failure modes, sensitivities, brittleness)
- •Ranked failure modes with brief impact descriptions
Rules
Evidence & uncertainty (non-negotiable)
- •Every material, non-trivial claim must cite the evidence ids that support it.
- •If a claim cannot be supported, write “Not evidenced” and record it in unknowns as:
  - •question
  - •why it matters
  - •what evidence would resolve it
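The evidence rule above can be sketched as a small helper. This is a minimal illustration, not part of the skill itself: the names Claim, Unknown, finalize_claim, and the field names why_it_matters and resolving_evidence are hypothetical, chosen only to mirror the three items listed for an unknown.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional

NOT_EVIDENCED = "Not evidenced"


@dataclass
class Unknown:
    # Hypothetical field names mirroring the three required items.
    question: str
    why_it_matters: str
    resolving_evidence: str


@dataclass
class Claim:
    text: str
    evidence_ids: List[str] = field(default_factory=list)


def finalize_claim(claim: Claim, unknown: Optional[Unknown], unknowns: list) -> dict:
    """Keep a supported claim as-is; otherwise mark it and log an unknown."""
    if claim.evidence_ids:
        return {"text": claim.text, "evidence_ids": list(claim.evidence_ids)}
    if unknown is None:
        # An unsupported claim must always come with an unknowns entry.
        raise ValueError("unsupported claim needs an Unknown record")
    unknowns.append(asdict(unknown))
    return {"text": NOT_EVIDENCED, "evidence_ids": []}
```

The helper refuses to drop an unsupported claim silently: either evidence ids are present, or an entry lands in unknowns.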
Taxonomy & specificity
- •Distinguish assumptions (conditions the model takes as given) from limitations (boundaries on what the model covers or can represent); they are not the same.
- •Tag each ALW item using categories from alw_taxonomy.yaml (or use custom:<reason>).
- •Avoid generic boilerplate; tailor to this model and its interfaces.
- •Absence of evidence is itself a weakness: capture it explicitly as “Not evidenced”.
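One way to enforce the tagging rule is a small normalizer, sketched below. The contents of alw_taxonomy.yaml are not shown in this document, so the sketch takes the category names as a plain set; the function name normalize_tag and the example category names are assumptions made for illustration.

```python
CUSTOM_PREFIX = "custom:"


def normalize_tag(tag: str, taxonomy: set) -> str:
    """Accept a known taxonomy category or a custom:<reason> tag with a reason."""
    tag = tag.strip()
    if tag in taxonomy:
        return tag
    if tag.startswith(CUSTOM_PREFIX):
        reason = tag[len(CUSTOM_PREFIX):].strip()
        if reason:
            return CUSTOM_PREFIX + reason
    # Anything else is neither a taxonomy category nor a justified custom tag.
    raise ValueError(f"tag not in taxonomy and not custom:<reason>: {tag!r}")


# Hypothetical category names; the real ones would be loaded from alw_taxonomy.yaml.
example_taxonomy = {"assumption.data", "limitation.scope"}
print(normalize_tag("custom: vendor feed latency", example_taxonomy))
```

Rejecting an empty reason keeps custom tags from becoming a way around the taxonomy.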
Actionable weaknesses format
- •Weaknesses must be actionable and include:
  - •trigger
  - •failure mechanism
  - •impact
  - •detection (how it would be noticed)
  - •mitigation idea (may be “unknown” if not evidenced)
- •Ranked failure modes must state the ranking basis (impact × likelihood × detectability, at least qualitatively).
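The weakness format and ranking basis above could be represented roughly as follows. This is a sketch under stated assumptions: the three-level qualitative scale, the numeric mapping, the *_level field names, and the convention that harder-to-detect failures score higher are all illustrative choices, not something the skill prescribes.

```python
from dataclasses import dataclass

# Assumed qualitative scale; the skill only asks for a stated, qualitative basis.
LEVELS = {"low": 1, "medium": 2, "high": 3}
RANKING_BASIS = "impact x likelihood x detection difficulty (qualitative, 1-3 each)"


@dataclass
class Weakness:
    trigger: str
    failure_mechanism: str
    impact: str
    detection: str
    mitigation: str = "unknown"           # allowed when not evidenced
    impact_level: str = "medium"
    likelihood_level: str = "medium"
    detection_difficulty: str = "medium"  # high = hard to notice

    def score(self) -> int:
        return (LEVELS[self.impact_level]
                * LEVELS[self.likelihood_level]
                * LEVELS[self.detection_difficulty])


def rank_failure_modes(weaknesses):
    """Sort by descending score and attach the ranking basis to every entry."""
    ordered = sorted(weaknesses, key=lambda w: w.score(), reverse=True)
    return [
        {"rank": i, "trigger": w.trigger, "impact": w.impact,
         "score": w.score(), "ranking_basis": RANKING_BASIS}
        for i, w in enumerate(ordered, start=1)
    ]
```

Carrying the basis on each ranked entry means a reader never has to guess how the order was produced.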
JSON / schema contract
- •Return JSON matching the schema exactly: no extra keys, no missing required keys.
- •Use an explicit null or sentinel value only where the schema allows it.
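The contract above can be checked mechanically. The actual schema is not reproduced in this document, so the sketch below takes the required, optional, and nullable key sets as parameters; the function name check_contract and the key names in the usage example are hypothetical. It checks top-level keys only and uses the standard library rather than a full schema validator.

```python
import json


def check_contract(payload: str, required: set,
                   optional: frozenset = frozenset(),
                   nullable: frozenset = frozenset()) -> list:
    """Return a list of contract violations for the top-level JSON object."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    keys = set(data)
    errors = []
    for key in sorted(required - keys):
        errors.append(f"missing required key: {key}")
    for key in sorted(keys - required - optional):
        errors.append(f"unexpected key: {key}")
    for key in sorted(keys):
        # null is only acceptable where the schema explicitly allows it
        if data[key] is None and key not in nullable:
            errors.append(f"null not allowed for key: {key}")
    return errors


# Hypothetical key names, for illustration only.
required_keys = {"assumptions", "limitations", "weaknesses"}
print(check_contract('{"assumptions": [], "limitations": []}', required_keys))
```

An empty list means the payload satisfies the three rules checked here; nested structure would still need validating against the real schema.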
System Prompt
You are performing a model risk analysis to identify assumptions, limitations, and weaknesses. Your goal is to reduce surprise and support safe use of the model.
User Prompt Template
From the IR and methodology:
- •Identify explicit and implicit assumptions.
- •Identify structural and practical limitations.
- •Identify weaknesses and plausible failure modes.
- •Rank the most material weaknesses by potential impact, and state the ranking basis.
Return JSON matching the schema exactly.
Post-run Checks
- •Each category (A/L/W) is populated or justified as empty.
- •Failure modes are specific, not generic.
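The first post-run check lends itself to automation; whether a failure mode is specific rather than generic likely still needs human review. The sketch below assumes the result is a dict with one list per category and a hypothetical empty_justifications mapping for categories left empty on purpose. Neither key name comes from the schema.

```python
# Assumed top-level key names for the three ALW categories.
CATEGORIES = ("assumptions", "limitations", "weaknesses")


def check_categories(result: dict) -> list:
    """Flag any category that is empty and has no recorded justification."""
    justifications = result.get("empty_justifications") or {}
    problems = []
    for name in CATEGORIES:
        items = result.get(name) or []
        reason = str(justifications.get(name, "")).strip()
        if not items and not reason:
            problems.append(f"{name}: empty with no justification")
    return problems


example = {
    "assumptions": [{"text": "placeholder item"}],
    "limitations": [],
    "weaknesses": [],
    "empty_justifications": {"limitations": "scope fully covered by evidence"},
}
print(check_categories(example))
```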