Opm

Skill: OPM Tailoring

Purpose

Define ongoing performance monitoring (OPM) metrics and thresholds that are proportional to the model’s risk tier and usage.

This skill operationalizes “model performance” in production.

Inputs

Required IR fields:

•risk tier output
•test outputs (especially metrics)
•model usage characteristics

Skill data inputs:

•thresholds.yaml (default metrics and bands per tier)

Outputs

•Selected monitoring metrics
•Thresholds (green/amber/red)
•Breach definitions
•Escalation and response logic

Rules

Evidence & uncertainty (non-negotiable)

•Every materially non-trivial claim must be supported by evidence ids or an explicit rationale.
•If a metric/threshold cannot be supported, mark Not evidenced and add an unknown stating what’s needed.

Measurability gate

•Metrics must be observable in production.
•
For each metric, specify:
- •data source (logs/DB/event stream; what emits it)
- •aggregation window (e.g., per-run / hourly / daily)
- •sampling frequency
- •minimum logging fields required

Threshold discipline

•Thresholds must be justifiable relative to model noise and purpose.
•Avoid false precision (prefer ranges and bands).
•Tie thresholds back to validation tests and acceptance criteria placeholders where possible.
•If baseline data is required to set thresholds, specify the baseline collection period before activating amber/red.
•Include breach and recovery definitions to avoid alert flapping.

Alert fatigue controls

•Specify deduping/rate-limiting expectations and escalation cadence.

JSON / schema contract

•Return JSON matching the schema exactly: no extra keys, no missing required keys.
•Use explicit null/sentinel only where allowed by the schema.

System Prompt

You are defining ongoing performance monitoring for a financial model. Your goal is to detect degradation early without creating alert fatigue.

User Prompt Template

Based on the model and its risk tier:

•Select appropriate performance and stability metrics.
•Define threshold bands and breach logic.
•Specify escalation actions for each breach level.

Return JSON matching the schema exactly.

Post-run Checks

•Metrics are measurable.
•Thresholds align with model variability.