Outcome & Usage Monitor

Overview

Design monitoring frameworks for AI capabilities in production. Track whether each capability is delivering its promised value, detect usage anomalies before problems become critical, and generate reports suited to each stakeholder.

Core principle: Monitor value delivered, not just activity. Catch drift, override patterns, and emerging issues before they impact business outcomes.

When to Use

•AI capability is live in production
•Need to track business case realization
•Stakeholders requesting value reports
•Detecting adoption or quality issues
•Setting up ongoing monitoring dashboards

Output Format

yaml

monitoring_framework:
  capability: "[Name]"
  live_since: "[Duration]"

  monitoring_cadence:
    real_time: ["[Metric]"]
    daily: ["[Metric]"]
    weekly: ["[Metric]"]
    monthly: ["[Metric]"]
    quarterly: ["[Metric]"]

value_realization:
  business_case_tracking:
    - metric: "[Original promised metric]"
      promised: "[Target value]"
      measurement_method: |
        [Specific calculation with formula]
      current_assessment: "[Actual value] ([above/below] target)"
      variance_explanation: "[Why different from expected]"
      action: "[Required follow-up]"

  value_attribution:
    ai_contribution:
      auto_processed: "[Volume] ([%]) - zero human touch"
      assisted_decisions: "[Volume] ([%]) - human with AI proposal"
      calculation: |
        Time saved = ([Auto volume] × [Avg manual time]) +
                    ([Assisted volume] × [Time saved per assisted])
    human_contribution:
      escalations: "[Volume] ([%]) - required judgment"
      overrides: "[Volume] ([%]) - correcting AI"
    net_value_formula: |
      Net Value = [Time saved × Hourly cost] +
                  [Failures avoided × Failure cost] -
                  [System operating costs] -
                  [Override investigation costs]
    monthly_value_delivered: "$[Calculated amount]"

usage_patterns:
  adoption_metrics:
    total_users: "[Count]"
    active_users: "[Count]"
    adoption_rate: "[%]"
    low_usage_users:
      - user: "[ID]"
        volume: "[% of average]"
        flag: "[Investigate reason]"

  override_analysis:
    overall_rate: "[%] (baseline: [%])"
    trend: "[Increasing | Stable | Decreasing]"

    by_analyst:
      - analyst: "[ID]"
        rate: "[%]"
        comparison: "[X]× average"
        flag: "[Above/Below threshold]"

    by_reason:
      - reason: "[Category]"
        percentage: "[%]"
        trend: "[Direction]"
        action: "[If increasing, what to do]"

    by_time:
      - pattern: "[Time pattern]"
        observation: "[What's happening]"
        hypothesis: "[Why]"

    anomaly_flag: |
      [Pattern description if concerning]
      [Correlation with other factors]

  escalation_patterns:
    rate: "[%]"
    trend: "[Direction]"
    resolution_time: "[Average]"
    top_reasons:
      - "[Reason 1]"
      - "[Reason 2]"

quality_metrics:
  accuracy:
    current: "[%]"
    target: "[%]"
    status: "[On target | Warning | Critical]"
    trend_data:
      - period: "[Month]"
        accuracy: "[%]"
        note: "[Context if anomaly]"

  drift_detection:
    confidence_distribution:
      expected: "[Description of normal]"
      current: "[Current observation]"
      drift_index: "[Calculated value]"
      status: "[Healthy | Monitor | Investigate]"

    format_changes:
      - source: "[Counterparty/System]"
        date: "[When detected]"
        impact: "[What happened]"
        response: "[How resolved]"
        current_status: "[Resolved | Open]"

  error_categorization:
    - category: "[Error type]"
      rate: "[%]"
      top_causes: ["[Cause 1]", "[Cause 2]"]
      trend: "[Direction]"

risk_indicators:
  leading_indicators:
    - indicator: "[Metric name]"
      why_leading: "[Why this predicts problems]"
      threshold: "[Alert trigger]"
      current: "[Current value]"
      status: "[Healthy | Warning | Critical]"
      action_if_triggered: "[What to do]"

  lagging_indicators:
    - indicator: "[Metric name]"
      threshold: "[Alert trigger]"
      current: "[Current value]"
      status: "[Healthy | Warning | Critical]"

  alert_configuration:
    critical:
      - trigger: "[Condition]"
        notify: ["[Role 1]", "[Role 2]"]
        action: "[Immediate response]"
    warning:
      - trigger: "[Condition]"
        notify: ["[Role]"]
        action: "[Investigation]"

stakeholder_reports:
  executive_summary:
    audience: "[Role - e.g., Operations VP]"
    frequency: "[Quarterly]"
    format: "1-page summary"
    content:
      value_delivered: "$[Amount] this [period]"
      business_case_status: |
        [Metric 1]: [Actual] vs [Target] - [Status]
        [Metric 2]: [Actual] vs [Target] - [Status]
      key_highlights:
        - "[Achievement 1]"
        - "[Achievement 2]"
      attention_items:
        - "[Issue 1]"
        - "[Issue 2]"
      next_period_focus:
        - "[Priority 1]"

  risk_dashboard:
    audience: "[Risk Committee]"
    frequency: "[Monthly]"
    format: "RAG status dashboard"
    content:
      overall_rating: "[GREEN | AMBER | RED]"
      risk_categories:
        - category: "[Risk type]"
          status: "[RAG]"
          trend: "[↑ | → | ↓]"
          key_indicator: "[Metric: Value]"
      incidents_this_period:
        - date: "[Date]"
          description: "[What happened]"
          impact: "[Business impact]"
          resolution: "[How resolved]"
      override_summary:
        rate: "[%]"
        trend: "[Direction]"
        investigation_status: "[Status]"

  operations_dashboard:
    audience: "[Operations Manager]"
    frequency: "[Weekly]"
    format: "Operational metrics"
    content:
      volume_summary: "[Volume] processed"
      routing_breakdown:
        auto_match: "[%]"
        review: "[%]"
        escalation: "[%]"
      analyst_performance:
        - analyst: "[ID]"
          volume: "[Count]"
          override_rate: "[%]"
      queue_health: "[Status]"

  audit_evidence:
    audience: "[Internal Audit]"
    frequency: "[Quarterly]"
    format: "Control evidence package"
    content:
      control_matrix:
        - control: "[Control name]"
          design_effectiveness: "[Effective | Gap]"
          operating_effectiveness: "[Effective | Gap]"
          evidence: "[Where to find]"
      sample_testing:
        population: "[Total decisions]"
        sample_size: "[Count]"
        accuracy_validated: "[%]"
      audit_trail_verification:
        completeness: "[%]"
        retrievability: "[Time to retrieve]"

continuous_improvement:
  retraining_triggers:
    - trigger: "[Condition - e.g., Accuracy below 95% for 7 days]"
      detection: "[How detected - automated alert]"
      action: "[Model Operations review]"
      owner: "[Role]"

    - trigger: "[Condition - e.g., New format causes >20 errors]"
      detection: "[Error categorization]"
      action: "[Format training requested]"
      owner: "[Role]"

    - trigger: "[Condition - e.g., Drift index exceeds 0.2]"
      detection: "[Automated drift monitoring]"
      action: "[Scheduled retraining]"
      owner: "[Role]"

  feedback_loop:
    sources:
      - source: "[Override reasons]"
        aggregation: "[Monthly]"
        owner: "[Role]"
      - source: "[Escalation reasons]"
        aggregation: "[Weekly]"
        owner: "[Role]"
      - source: "[User feedback channel]"
        aggregation: "[Continuous]"
        owner: "[Role]"
    review_cadence: "[Monthly]"
    improvement_owner: "[Product Owner]"

  enhancement_pipeline:
    in_progress:
      - enhancement: "[Description]"
        priority: "[High | Medium | Low]"
        status: "[Status]"
        expected_impact: "[Metric improvement]"
    next_quarter:
      - "[Enhancement 1]"
      - "[Enhancement 2]"
    backlog:
      - "[Future enhancement]"

  issue_tracking:
    open_issues:
      - issue: "[Description]"
        priority: "[Priority]"
        owner: "[Role]"
        status: "[Investigation | In progress | Blocked]"
        target_resolution: "[Date]"
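The retraining triggers in the template are meant to fire mechanically, not when someone happens to notice a bad week. The sketch below checks the three example conditions against a window of daily snapshots.

Several parts of it are hypothetical: the `DailySnapshot` schema, the `retraining_triggers` function, and the choice to count new-format errors over the same 7-day window. The thresholds are the template's example values, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class DailySnapshot:
    """One day of monitoring data (hypothetical schema, not from a real tool)."""
    accuracy: float         # share of decisions validated as correct, e.g. 0.962
    new_format_errors: int  # errors attributed to a newly seen input format
    drift_index: float      # drift score vs. the baseline window


def retraining_triggers(history: list[DailySnapshot],
                        accuracy_floor: float = 0.95,
                        window_days: int = 7,
                        format_error_limit: int = 20,
                        drift_limit: float = 0.2) -> list[str]:
    """Return the retraining triggers fired by the most recent data.

    Defaults mirror the template's example conditions; replace them with
    each capability's own thresholds.
    """
    fired = []
    recent = history[-window_days:]
    # Accuracy below the floor on every day of a full window.
    if len(recent) == window_days and all(d.accuracy < accuracy_floor for d in recent):
        fired.append("accuracy_below_floor")
    # Assumption: new-format errors are summed over the same window.
    if sum(d.new_format_errors for d in recent) > format_error_limit:
        fired.append("new_format_errors")
    # Drift is judged on the latest day only.
    if history and history[-1].drift_index > drift_limit:
        fired.append("drift_index_exceeded")
    return fired


week = [DailySnapshot(accuracy=0.94, new_format_errors=3, drift_index=0.12)
        for _ in range(7)]
print(retraining_triggers(week))  # ['accuracy_below_floor', 'new_format_errors']
```

Each fired trigger would then be routed to the `action` and `owner` listed for it under `retraining_triggers`.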

Value Attribution Framework

Don't just report activity; attribute value to AI versus human effort:

yaml

value_attribution:
  ai_contribution:
    auto_processed:
      volume: 3600  # 72% of 5000
      human_time_per_item: "0 minutes"
      time_saved: "3600 × 5 min = 300 hours/day"

    assisted_decisions:
      volume: 1200  # 24% of 5000
      time_saved_per_item: "3 minutes (vs 5 min manual)"
      time_saved: "1200 × 3 min = 60 hours/day"

  human_contribution:
    escalations:
      volume: 200  # 4% of 5000
      time_per_item: "15 minutes"
      human_hours: "200 × 15 min = 50 hours/day"

    overrides:
      volume: 576  # 12% of the 4800 non-escalated AI decisions (auto + assisted)
      time_per_item: "8 minutes"
      human_hours: "576 × 8 min = 77 hours/day"

  net_calculation:
    gross_savings: "360 hours/day"
    human_cost: "127 hours/day"
    net_savings: "233 hours/day"
    monthly_value: "233 hours/day × $50/hr × 22 days = $256K/month"
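The worked example can be reproduced in a few lines. Doing so turns its assumptions (manual baseline, minutes per item, hourly cost, working days) into explicit parameters instead of buried arithmetic.

This is a sketch under stated assumptions. The names `DailyVolumes` and `net_value` are hypothetical, and the defaults are the example's own figures. Like the example, it leaves out the failures-avoided and system-operating-cost terms of the full net value formula.

```python
from dataclasses import dataclass


@dataclass
class DailyVolumes:
    auto_processed: int  # zero human touch
    assisted: int        # human acts on an AI proposal
    escalations: int     # human judgment required
    overrides: int       # human corrects an AI decision


def net_value(v: DailyVolumes,
              manual_minutes: float = 5,
              assisted_saving_minutes: float = 3,
              escalation_minutes: float = 15,
              override_minutes: float = 8,
              hourly_cost: float = 50,
              working_days: int = 22) -> dict[str, float]:
    """Gross time saved minus human time spent, in hours/day and $/month."""
    gross_hours = (v.auto_processed * manual_minutes
                   + v.assisted * assisted_saving_minutes) / 60
    human_hours = (v.escalations * escalation_minutes
                   + v.overrides * override_minutes) / 60
    net_hours = gross_hours - human_hours
    return {
        "gross_hours_per_day": gross_hours,
        "human_hours_per_day": human_hours,
        "net_hours_per_day": net_hours,
        "monthly_value_usd": net_hours * hourly_cost * working_days,
    }


result = net_value(DailyVolumes(auto_processed=3600, assisted=1200,
                                escalations=200, overrides=576))
print({name: round(x, 1) for name, x in result.items()})
```

Without intermediate rounding this gives 360 gross hours, 126.8 human hours and 233.2 net hours per day, or about $256.5K per month. The example above rounds override time to 77 hours, which gives 233 hours and $256K.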

Leading vs Lagging Indicators

Distinguish early warning signs from confirmed problems:

yaml

risk_indicators:
  leading_indicators:  # Early warning - act before problem
    - indicator: "Override rate trend"
      why_leading: "Rising overrides precede accuracy drops"
      threshold: ">15%"
      current: "12%"
      status: "Warning"
      action: "Investigate before accuracy impact"

    - indicator: "Confidence distribution shift"
      why_leading: "Shift precedes error rate increase"
      threshold: "Drift index >0.15"
      current: "0.08"
      status: "Healthy"

    - indicator: "New format detection"
      why_leading: "New formats cause errors before flagged"
      threshold: "Any new format"
      current: "1 detected this week"
      status: "Warning"

  lagging_indicators:  # Confirmed problems - already happened
    - indicator: "Monthly accuracy"
      threshold: "<95%"
      current: "97.2%"
      status: "Healthy"

    - indicator: "Settlement failures from matching"
      threshold: ">5/month"
      current: "2/month"
      status: "Healthy"
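Each indicator's status can be derived from its threshold instead of being set by hand. The sketch below assumes two levels per indicator:

- `critical` is the alert trigger from the `threshold` field.
- `warning` is a hypothetical earlier trigger that the example above does not define.

The warning values used here are illustrative only, and `Indicator` and `status` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    current: float
    critical: float               # alert trigger (the "threshold" field)
    warning: float                # hypothetical earlier trigger
    higher_is_worse: bool = True  # False for metrics like accuracy


def _breached(value: float, limit: float, higher_is_worse: bool) -> bool:
    """True if value is on the bad side of limit."""
    return value > limit if higher_is_worse else value < limit


def status(ind: Indicator) -> str:
    """Map the current value onto Healthy / Warning / Critical."""
    if _breached(ind.current, ind.critical, ind.higher_is_worse):
        return "Critical"
    if _breached(ind.current, ind.warning, ind.higher_is_worse):
        return "Warning"
    return "Healthy"


indicators = [
    Indicator("Drift index", current=0.08, critical=0.15, warning=0.10),
    Indicator("Monthly accuracy (%)", current=97.2, critical=95.0, warning=96.0,
              higher_is_worse=False),
    Indicator("Settlement failures/month", current=2, critical=5, warning=3),
]
for ind in indicators:
    print(f"{ind.name}: {status(ind)}")
```

The second level is what lets a leading indicator raise a Warning before its alert trigger is crossed. For a lagging indicator, `warning` can simply equal `critical`.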

Anomaly Detection Rules

Systematic detection, not ad-hoc observation:

yaml

anomaly_detection:
  statistical_rules:
    - metric: "Override rate by analyst"
      baseline: "Peer mean + 2 standard deviations"
      flag_when: "Analyst exceeds baseline"
      current_flags:
        - analyst: "Analyst_07"
          rate: "28%"
          peer_mean: "15%"
          deviation: "2.4σ"

    - metric: "Confidence score distribution"
      baseline: "30-day rolling average"
      flag_when: "Jensen-Shannon divergence >0.15"
      current: "0.08"

  pattern_rules:
    - pattern: "Same error repeating"
      flag_when: ">5 identical errors in 1 hour"
      current: "None"

    - pattern: "Accuracy drop by counterparty"
      flag_when: "Single counterparty <85% for 24h"
      current_flags:
        - counterparty: "CP_Alpha"
          accuracy: "82%"
          duration: "36 hours"

  correlation_rules:
    - correlation: "Volume vs accuracy"
      flag_when: "Accuracy drops with volume spike"
      hypothesis: "Model capacity issue"

    - correlation: "Time of day vs override rate"
      flag_when: "Override rate higher end-of-day"
      hypothesis: "Analyst fatigue or rush"
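The two statistical rules can be sketched with the Python standard library. Two choices here are assumptions.

- **Peer baseline:** each analyst is compared with the mean and sample standard deviation of the other analysts (leave-one-out). In a small team, an outlier included in its own baseline inflates σ enough to hide itself.
- **Log base:** the Jensen-Shannon divergence uses base-2 logarithms, so it lies between 0 and 1. The rule above does not specify a base, and the 0.15 threshold means something different under another one.

The function names, sample rates and histograms are hypothetical illustration data.

```python
import math
from statistics import mean, stdev


def flag_analysts(override_rates: dict[str, float], k: float = 2.0) -> dict[str, float]:
    """Flag analysts whose override rate exceeds their peers' mean + k standard deviations.

    Returns {analyst: deviation in peer standard deviations} for flagged analysts.
    """
    flagged = {}
    for analyst, rate in override_rates.items():
        peers = [r for a, r in override_rates.items() if a != analyst]
        if len(peers) < 2:  # stdev needs at least two data points
            continue
        mu, sigma = mean(peers), stdev(peers)
        if sigma > 0 and rate > mu + k * sigma:
            flagged[analyst] = (rate - mu) / sigma
    return flagged


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence between two histograms (base-2 logs, so 0 <= JSD <= 1)."""
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: list[float], b: list[float]) -> float:
        # Zero terms contribute nothing; b > 0 wherever a > 0 because b is the mixture.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Illustrative data only.
rates = {"Analyst_01": 0.10, "Analyst_02": 0.12, "Analyst_03": 0.11,
         "Analyst_04": 0.13, "Analyst_07": 0.28}
print({a: round(z, 1) for a, z in flag_analysts(rates).items()})

baseline = [5, 10, 20, 30, 35]  # confidence-bucket counts, 30-day window
today = [8, 12, 22, 28, 30]
drift = js_divergence(baseline, today)
print(f"drift index {drift:.3f}:", "FLAG" if drift > 0.15 else "ok")
```

If SciPy is used instead, note that `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon distance, which is the square root of the divergence. A threshold written for one does not transfer to the other.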

Stakeholder-Specific Outputs

Different audiences need different views:

yaml

stakeholder_outputs:
  operations_vp_quarterly:
    format: "1-page executive summary"
    must_include:
      - "Net value delivered ($)"
      - "Business case actual vs target"
      - "3 key achievements"
      - "2 attention items with actions"
    must_not_include:
      - "Technical metrics"
      - "Individual analyst data"
      - "System architecture details"

  risk_committee_monthly:
    format: "RAG dashboard"
    must_include:
      - "Overall risk rating with trend"
      - "Risk category breakdown"
      - "Incidents with resolution status"
      - "Override pattern analysis"
    must_not_include:
      - "Value/ROI metrics"
      - "Enhancement roadmap"

  internal_audit_quarterly:
    format: "Control evidence package"
    must_include:
      - "Control matrix with effectiveness"
      - "Sample testing results"
      - "Audit trail verification"
      - "Exception documentation"
    must_not_include:
      - "Business value metrics"
      - "Operational performance"

  model_operations_daily:
    format: "Technical dashboard"
    must_include:
      - "Real-time volume/accuracy"
      - "Confidence distribution"
      - "Drift indicators"
      - "Active alerts"
    must_not_include:
      - "Business value"
      - "Risk committee concerns"
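The `must_include` and `must_not_include` lists can double as a pre-send check on each draft report. This sketch makes three assumptions:

- A report is represented as a set of section labels that match the spec strings exactly.
- `STAKEHOLDER_SPECS` and `check_report` are hypothetical names.
- Only two of the four audiences are transcribed.

```python
STAKEHOLDER_SPECS = {
    "operations_vp_quarterly": {
        "must_include": {"Net value delivered ($)", "Business case actual vs target",
                         "3 key achievements", "2 attention items with actions"},
        "must_not_include": {"Technical metrics", "Individual analyst data",
                             "System architecture details"},
    },
    "risk_committee_monthly": {
        "must_include": {"Overall risk rating with trend", "Risk category breakdown",
                         "Incidents with resolution status", "Override pattern analysis"},
        "must_not_include": {"Value/ROI metrics", "Enhancement roadmap"},
    },
}


def check_report(audience: str, sections: set[str]) -> dict[str, set[str]]:
    """Return required sections that are missing and excluded sections that are present."""
    spec = STAKEHOLDER_SPECS[audience]
    return {
        "missing": spec["must_include"] - sections,
        "forbidden": spec["must_not_include"] & sections,
    }


draft = {"Net value delivered ($)", "Business case actual vs target",
         "3 key achievements", "Individual analyst data"}
print(check_report("operations_vp_quarterly", draft))
```

A draft is ready to send only when both returned sets are empty.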

Common Mistakes

Mistake	Why It's Wrong	Do This Instead
Activity metrics only	"5000 processed" ≠ value	Calculate value delivered
No attribution	Can't measure AI contribution	Explicit AI vs human split
One dashboard for all	Different audiences, different needs	Stakeholder-specific views
Lagging only	Problems already happened	Leading indicators with thresholds
Ad-hoc anomaly detection	Misses patterns	Statistical detection rules
Prose reports	Not actionable	Structured YAML, RAG status
No retraining triggers	Model degrades silently	Automated trigger conditions

Red Flags in Your Output

If your monitoring framework has these, it's not ready:

•Same report for all stakeholders
•No value attribution formula
•Activity metrics without business value
•Alerts without threshold definitions
•No leading indicators
•No retraining triggers
•Override analysis without by-analyst breakdown
•Anomaly detection without statistical baseline
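Several of these red flags can be caught automatically once the framework is loaded into a dictionary, for example with PyYAML's `yaml.safe_load`. The sketch below checks six of the eight flags against the key names used in the output format above. `red_flags` and `_get` are hypothetical helpers.

Two flags are left out because they need judgment about what the metrics actually measure:

- "activity metrics without business value"
- "anomaly detection without statistical baseline"

```python
def _get(d, *keys):
    """Walk nested dicts; return None if any key is missing."""
    for key in keys:
        if not isinstance(d, dict) or key not in d:
            return None
        d = d[key]
    return d


def red_flags(fw: dict) -> list[str]:
    """Return the red flags found in a monitoring framework dict."""
    flags = []
    if not _get(fw, "value_realization", "value_attribution", "net_value_formula"):
        flags.append("No value attribution formula")
    leading = _get(fw, "risk_indicators", "leading_indicators") or []
    lagging = _get(fw, "risk_indicators", "lagging_indicators") or []
    if not leading:
        flags.append("No leading indicators")
    if any(not ind.get("threshold") for ind in leading + lagging):
        flags.append("Alerts without threshold definitions")
    if not _get(fw, "continuous_improvement", "retraining_triggers"):
        flags.append("No retraining triggers")
    if not _get(fw, "usage_patterns", "override_analysis", "by_analyst"):
        flags.append("Override analysis without by-analyst breakdown")
    reports = _get(fw, "stakeholder_reports") or {}
    contents = [repr(r.get("content")) for r in reports.values()]
    # Fewer than two reports, or two reports with identical content.
    if len(reports) < 2 or len(set(contents)) < len(contents):
        flags.append("Same report for all stakeholders")
    return flags


framework = {
    "value_realization": {"value_attribution": {"net_value_formula": "Net Value = [Time saved × Hourly cost] - [System operating costs]"}},
    "risk_indicators": {
        "leading_indicators": [{"indicator": "Override rate trend", "threshold": ">15%"}],
        "lagging_indicators": [{"indicator": "Monthly accuracy"}],  # threshold missing
    },
    "continuous_improvement": {"retraining_triggers": [{"trigger": "Accuracy below 95% for 7 days"}]},
    "usage_patterns": {"override_analysis": {"overall_rate": "12%"}},  # no by_analyst
    "stakeholder_reports": {
        "executive_summary": {"content": {"value_delivered": "$256K"}},
        "risk_dashboard": {"content": {"overall_rating": "AMBER"}},
    },
}
print(red_flags(framework))
# ['Alerts without threshold definitions', 'Override analysis without by-analyst breakdown']
```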

Financial Services Context

Financial services monitoring requires:

Regulatory Compliance

•Audit trail completeness verification
•7-year retention confirmation
•Decision explainability documentation
•Regulatory query response capability
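Audit trail completeness and retention can be confirmed with a simple reconciliation. Two conditions must hold:

- Every decision the system made has an audit record.
- The oldest retained record reaches back to go-live or to the start of the 7-year window, whichever is later.

This is a minimal sketch, assuming decision and audit IDs can be extracted as sets. `audit_trail_check` and `years_before` are hypothetical helpers, not part of any specific platform.

```python
from datetime import date


def years_before(d: date, years: int) -> date:
    """Same calendar date `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def audit_trail_check(decision_ids: set[str], audited_ids: set[str],
                      oldest_record: date, go_live: date, today: date,
                      retention_years: int = 7) -> dict:
    """Reconcile decisions against audit records and confirm the retention window."""
    # No decisions means nothing is missing.
    completeness = (len(decision_ids & audited_ids) / len(decision_ids)
                    if decision_ids else 1.0)
    required_from = max(go_live, years_before(today, retention_years))
    return {
        "completeness": completeness,
        "missing": sorted(decision_ids - audited_ids),
        # The oldest retained record must be at least as old as the required start.
        "retention_ok": oldest_record <= required_from,
    }


result = audit_trail_check({"D1", "D2", "D3", "D4"}, {"D1", "D2", "D3"},
                           oldest_record=date(2020, 1, 6), go_live=date(2020, 1, 6),
                           today=date(2024, 6, 30))
print(result)  # {'completeness': 0.75, 'missing': ['D4'], 'retention_ok': True}
```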

Risk Committee Reporting

•Monthly RAG dashboards
•Incident tracking with resolution
•Override pattern analysis
•Drift monitoring
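A monthly RAG dashboard needs a rule for deriving the overall rating from the category ratings. This sketch assumes a simple worst-of convention, which is not taken from the framework itself: any RED category makes the overall rating RED. `overall_rating` is a hypothetical helper.

```python
RAG_SEVERITY = {"GREEN": 0, "AMBER": 1, "RED": 2}


def overall_rating(category_status: dict[str, str]) -> str:
    """Roll category RAG statuses up to one rating using a worst-of rule."""
    if not category_status:
        raise ValueError("at least one risk category is required")
    # Unknown labels raise KeyError instead of being silently ignored.
    return max(category_status.values(), key=lambda s: RAG_SEVERITY[s])


print(overall_rating({"Model accuracy": "GREEN",
                      "Override patterns": "AMBER",
                      "Format drift": "GREEN"}))  # AMBER
```

Worst-of is deliberately conservative. A committee that weights categories differently would replace `max` with its own scoring.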

Value Attribution

•Clear business case tracking
•FTE efficiency calculation
•Settlement failure reduction measurement
•Net value delivered

Continuous Improvement

•Automated retraining triggers
•Feedback loop from overrides
•Prioritized enhancement pipeline
•Issue tracking with owners
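The feedback loop from overrides starts with aggregation. Count override reasons per period, then compare each reason's share with the previous period; this is the value the `by_reason` trend field in the template records.

This is a minimal sketch with three assumptions:

- Override events are available as `(period, reason)` pairs.
- Periods are sortable `YYYY-MM` strings.
- A change within a hypothetical 2-percentage-point tolerance counts as Stable.

`override_reason_trends` is a hypothetical name.

```python
from collections import Counter, defaultdict


def override_reason_trends(events: list[tuple[str, str]],
                           tolerance: float = 0.02) -> dict[str, str]:
    """Compare each reason's share of overrides in the last two periods."""
    by_period: dict[str, Counter] = defaultdict(Counter)
    for period, reason in events:
        by_period[period][reason] += 1
    periods = sorted(by_period)  # "YYYY-MM" strings sort chronologically
    if len(periods) < 2:
        return {}
    prev, curr = by_period[periods[-2]], by_period[periods[-1]]
    prev_total, curr_total = sum(prev.values()), sum(curr.values())
    trends = {}
    for reason in sorted(set(prev) | set(curr)):
        change = curr[reason] / curr_total - prev[reason] / prev_total
        if change > tolerance:
            trends[reason] = "Increasing"
        elif change < -tolerance:
            trends[reason] = "Decreasing"
        else:
            trends[reason] = "Stable"
    return trends


events = ([("2024-04", "New format")] * 2 + [("2024-04", "Wrong counterparty")] * 8
          + [("2024-05", "New format")] * 5 + [("2024-05", "Wrong counterparty")] * 5)
print(override_reason_trends(events))
# {'New format': 'Increasing', 'Wrong counterparty': 'Decreasing'}
```

Reasons flagged as Increasing are the ones whose `action` entry under `by_reason` needs filling in.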

Monitoring Framework Checklist

Before finalizing: