outcome-usage-monitor

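Use when designing monitoring frameworks for AI capabilities in production. Recommended for use after deployment. This skill produces value realization tracking, usage pattern analysis, risk indicators, and stakeholder-specific reports.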

SKILL.md
---
name: outcome-usage-monitor
description: Use when designing monitoring frameworks for AI capabilities in production. Use after deployment. Produces value realization tracking, usage pattern analysis, risk indicators, and stakeholder-specific reporting.
---

Outcome & Usage Monitor

Overview

Design monitoring frameworks for AI capabilities in production. Track whether each capability delivers the value promised in its business case, detect usage anomalies before they become critical, and generate reports tailored to each stakeholder.

Core principle: Monitor value delivered, not just activity. Catch drift, override patterns, and emerging issues before they impact business outcomes.

When to Use

  • An AI capability is live in production
  • You need to track business case realization
  • Stakeholders are requesting value reports
  • You need to detect adoption or quality issues
  • You are setting up ongoing monitoring dashboards

Output Format

yaml
monitoring_framework:
  capability: "[Name]"
  live_since: "[Go-live date or duration in production]"

  monitoring_cadence:
    real_time: ["[Metric]"]
    daily: ["[Metric]"]
    weekly: ["[Metric]"]
    monthly: ["[Metric]"]
    quarterly: ["[Metric]"]

value_realization:
  business_case_tracking:
    - metric: "[Original promised metric]"
      promised: "[Target value]"
      measurement_method: |
        [Specific calculation with formula]
      current_assessment: "[Actual value] ([above/below] target)"
      variance_explanation: "[Why different from expected]"
      action: "[Required follow-up]"

  value_attribution:
    ai_contribution:
      auto_processed: "[Volume] ([%]) - zero human touch"
      assisted_decisions: "[Volume] ([%]) - human with AI proposal"
      calculation: |
        Time saved = ([Auto volume] × [Avg manual time]) +
                    ([Assisted volume] × [Time saved per assisted])
    human_contribution:
      escalations: "[Volume] ([%]) - required judgment"
      overrides: "[Volume] ([%]) - correcting AI"
    net_value_formula: |
      Net Value = [Time saved × Hourly cost] +
                  [Failures avoided × Failure cost] -
                  [System operating costs] -
                  [Override investigation costs]
    monthly_value_delivered: "$[Calculated amount]"

usage_patterns:
  adoption_metrics:
    total_users: "[Count]"
    active_users: "[Count]"
    adoption_rate: "[%]"
    low_usage_users:
      - user: "[ID]"
        volume: "[% of average]"
        flag: "[Investigate reason]"

  override_analysis:
    overall_rate: "[%] (baseline: [%])"
    trend: "[Increasing | Stable | Decreasing]"

    by_analyst:
      - analyst: "[ID]"
        rate: "[%]"
        comparison: "[X]× average"
        flag: "[Above/Below threshold]"

    by_reason:
      - reason: "[Category]"
        percentage: "[%]"
        trend: "[Direction]"
        action: "[If increasing, what to do]"

    by_time:
      - pattern: "[Time pattern]"
        observation: "[What's happening]"
        hypothesis: "[Why]"

    anomaly_flag: |
      [Pattern description if concerning]
      [Correlation with other factors]

  escalation_patterns:
    rate: "[%]"
    trend: "[Direction]"
    resolution_time: "[Average]"
    top_reasons:
      - "[Reason 1]"
      - "[Reason 2]"

quality_metrics:
  accuracy:
    current: "[%]"
    target: "[%]"
    status: "[On target | Warning | Critical]"
    trend_data:
      - period: "[Month]"
        accuracy: "[%]"
        note: "[Context if anomaly]"

  drift_detection:
    confidence_distribution:
      expected: "[Description of normal]"
      current: "[Current observation]"
      drift_index: "[Calculated value]"
      status: "[Healthy | Monitor | Investigate]"

    format_changes:
      - source: "[Counterparty/System]"
        date: "[When detected]"
        impact: "[What happened]"
        response: "[How resolved]"
        current_status: "[Resolved | Open]"

  error_categorization:
    - category: "[Error type]"
      rate: "[%]"
      top_causes: ["[Cause 1]", "[Cause 2]"]
      trend: "[Direction]"

risk_indicators:
  leading_indicators:
    - indicator: "[Metric name]"
      why_leading: "[Why this predicts problems]"
      threshold: "[Alert trigger]"
      current: "[Current value]"
      status: "[Healthy | Warning | Critical]"
      action_if_triggered: "[What to do]"

  lagging_indicators:
    - indicator: "[Metric name]"
      threshold: "[Alert trigger]"
      current: "[Current value]"
      status: "[Healthy | Warning | Critical]"

  alert_configuration:
    critical:
      - trigger: "[Condition]"
        notify: ["[Role 1]", "[Role 2]"]
        action: "[Immediate response]"
    warning:
      - trigger: "[Condition]"
        notify: ["[Role]"]
        action: "[Investigation]"

stakeholder_reports:
  executive_summary:
    audience: "[Role - e.g., Operations VP]"
    frequency: "[Quarterly]"
    format: "1-page summary"
    content:
      value_delivered: "$[Amount] this [period]"
      business_case_status: |
        [Metric 1]: [Actual] vs [Target] - [Status]
        [Metric 2]: [Actual] vs [Target] - [Status]
      key_highlights:
        - "[Achievement 1]"
        - "[Achievement 2]"
      attention_items:
        - "[Issue 1]"
        - "[Issue 2]"
      next_period_focus:
        - "[Priority 1]"

  risk_dashboard:
    audience: "[Risk Committee]"
    frequency: "[Monthly]"
    format: "RAG status dashboard"
    content:
      overall_rating: "[GREEN | AMBER | RED]"
      risk_categories:
        - category: "[Risk type]"
          status: "[RAG]"
          trend: "[↑ | → | ↓]"
          key_indicator: "[Metric: Value]"
      incidents_this_period:
        - date: "[Date]"
          description: "[What happened]"
          impact: "[Business impact]"
          resolution: "[How resolved]"
      override_summary:
        rate: "[%]"
        trend: "[Direction]"
        investigation_status: "[Status]"

  operations_dashboard:
    audience: "[Operations Manager]"
    frequency: "[Weekly]"
    format: "Operational metrics"
    content:
      volume_summary: "[Volume] processed"
      routing_breakdown:
        auto_match: "[%]"
        review: "[%]"
        escalation: "[%]"
      analyst_performance:
        - analyst: "[ID]"
          volume: "[Count]"
          override_rate: "[%]"
      queue_health: "[Status]"

  audit_evidence:
    audience: "[Internal Audit]"
    frequency: "[Quarterly]"
    format: "Control evidence package"
    content:
      control_matrix:
        - control: "[Control name]"
          design_effectiveness: "[Effective | Gap]"
          operating_effectiveness: "[Effective | Gap]"
          evidence: "[Where to find]"
      sample_testing:
        population: "[Total decisions]"
        sample_size: "[Count]"
        accuracy_validated: "[%]"
      audit_trail_verification:
        completeness: "[%]"
        retrievability: "[Time to retrieve]"

continuous_improvement:
  retraining_triggers:
    - trigger: "[Condition - e.g., Accuracy below 95% for 7 days]"
      detection: "[How detected - automated alert]"
      action: "[Model Operations review]"
      owner: "[Role]"

    - trigger: "[Condition - e.g., New format causes >20 errors]"
      detection: "[Error categorization]"
      action: "[Format training requested]"
      owner: "[Role]"

    - trigger: "[Condition - e.g., Drift index exceeds 0.2]"
      detection: "[Automated drift monitoring]"
      action: "[Scheduled retraining]"
      owner: "[Role]"

  feedback_loop:
    sources:
      - source: "[Override reasons]"
        aggregation: "[Monthly]"
        owner: "[Role]"
      - source: "[Escalation reasons]"
        aggregation: "[Weekly]"
        owner: "[Role]"
      - source: "[User feedback channel]"
        aggregation: "[Continuous]"
        owner: "[Role]"
    review_cadence: "[Monthly]"
    improvement_owner: "[Product Owner]"

  enhancement_pipeline:
    in_progress:
      - enhancement: "[Description]"
        priority: "[High | Medium | Low]"
        status: "[Status]"
        expected_impact: "[Metric improvement]"
    next_quarter:
      - "[Enhancement 1]"
      - "[Enhancement 2]"
    backlog:
      - "[Future enhancement]"

  issue_tracking:
    open_issues:
      - issue: "[Description]"
        priority: "[Priority]"
        owner: "[Role]"
        status: "[Investigation | In progress | Blocked]"
        target_resolution: "[Date]"
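
The first retraining trigger in the template ("Accuracy below 95% for 7 days") is a sustained-breach condition rather than a single bad reading. A minimal sketch of how such a check might look, assuming daily accuracy is available as a list of percentages ordered oldest to newest. The function name `sustained_breach` and its parameters are hypothetical and not part of the skill.

```python
from typing import Sequence


def sustained_breach(daily_accuracy: Sequence[float],
                     floor: float = 95.0,
                     days: int = 7) -> bool:
    """Return True when the most recent `days` readings are all below `floor`.

    Hypothetical helper: the 95% floor and 7-day window mirror the example
    trigger in the template and should be tuned per capability.
    """
    if len(daily_accuracy) < days:
        return False  # not enough history to call the breach sustained
    return all(value < floor for value in daily_accuracy[-days:])


# One good day followed by seven days under the floor fires the trigger.
history = [96.0] + [94.5] * 7
if sustained_breach(history):
    print("Retraining trigger fired: route to Model Operations review")
```

Requiring every day in the window to breach avoids retraining on a single noisy day; a looser rule (for example, a rolling mean below the floor) is an equally valid design choice.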

Value Attribution Framework

Don't just report activity. Split the value delivered between AI and human work:

yaml
value_attribution:
  ai_contribution:
    auto_processed:
      volume: 3600  # 72% of 5000
      human_time_per_item: "0 minutes"
      time_saved: "3600 × 5 min = 300 hours/day"

    assisted_decisions:
      volume: 1200  # 24% of 5000
      time_saved_per_item: "3 minutes (vs 5 min manual)"
      time_saved: "1200 × 3 min = 60 hours/day"

  human_contribution:
    escalations:
      volume: 200  # 4% of 5000
      time_per_item: "15 minutes"
      human_hours: "200 × 15 min = 50 hours/day"

    overrides:
      volume: 576  # 12% of 4800 reviewed
      time_per_item: "8 minutes"
      human_hours: "576 × 8 min = 77 hours/day"

  net_calculation:
    gross_savings: "360 hours/day"
    human_cost: "127 hours/day"
    net_savings: "233 hours/day"
    monthly_value: "233 hours/day × $50/hr × 22 days ≈ $256K/month"
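
The arithmetic in the worked example above can be reproduced in a few lines, which makes it easy to rerun when volumes change. This is a sketch only: the `Bucket` class and `net_value` function are hypothetical names, and the $50/hr rate and 22 working days are taken from the example, not fixed by the skill.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Bucket:
    """One row of the attribution table (hypothetical structure)."""
    volume: int          # items per day
    minutes_each: float  # minutes saved (AI) or spent (human) per item

    @property
    def hours(self) -> float:
        return self.volume * self.minutes_each / 60


def net_value(ai: List[Bucket], human: List[Bucket],
              hourly_cost: float, working_days: int) -> Dict[str, float]:
    """Net daily hours saved and the monthly value they represent."""
    gross = sum(b.hours for b in ai)       # time saved by the AI
    cost = sum(b.hours for b in human)     # time spent on escalations/overrides
    net = gross - cost
    return {
        "gross_hours": gross,
        "human_hours": cost,
        "net_hours": net,
        "monthly_value": net * hourly_cost * working_days,
    }


result = net_value(
    ai=[Bucket(3600, 5), Bucket(1200, 3)],     # auto-processed, assisted
    human=[Bucket(200, 15), Bucket(576, 8)],   # escalations, overrides
    hourly_cost=50,
    working_days=22,
)
print(result)
```

Without intermediate rounding the net comes to 233.2 hours/day and roughly $256.5K/month; the YAML example rounds override hours up to 77 first, which gives 233 hours/day and about $256K.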

Leading vs Lagging Indicators

Distinguish early warning signs from confirmed problems:

yaml
risk_indicators:
  leading_indicators:  # Early warning - act before problem
    - indicator: "Override rate trend"
      why_leading: "Rising overrides precede accuracy drops"
      threshold: ">15%"
      current: "12%"
      status: "Warning"
      action_if_triggered: "Investigate before accuracy impact"

    - indicator: "Confidence distribution shift"
      why_leading: "Shift precedes error rate increase"
      threshold: "Drift index >0.15"
      current: "0.08"
      status: "Healthy"

    - indicator: "New format detection"
      why_leading: "New formats cause errors before flagged"
      threshold: "Any new format"
      current: "1 detected this week"
      status: "Warning"

  lagging_indicators:  # Confirmed problems - already happened
    - indicator: "Monthly accuracy"
      threshold: "<95%"
      current: "97.2%"
      status: "Healthy"

    - indicator: "Settlement failures from matching"
      threshold: ">5/month"
      current: "2/month"
      status: "Healthy"
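
In the example above, the override rate is marked "Warning" at 12% even though the alert threshold is ">15%". One reading is that a warning band sits just below the threshold. The source does not define such a band, so the `warn_at` parameter in this sketch is an assumption, as are the function name and the specific `warn_at` values used.

```python
def indicator_status(current: float, threshold: float, warn_at: float,
                     higher_is_worse: bool = True) -> str:
    """Map a reading to Healthy / Warning / Critical.

    `threshold` is the alert trigger from the indicator definition.
    `warn_at` is an assumed early-warning level (not defined in the source).
    Set higher_is_worse=False for metrics such as accuracy.
    """
    if not higher_is_worse:
        # Flip signs so that "larger means worse" holds in both directions.
        current, threshold, warn_at = -current, -threshold, -warn_at
    if current > threshold:
        return "Critical"
    if current >= warn_at:
        return "Warning"
    return "Healthy"


# Readings taken from the example indicators; warn_at values are illustrative.
print(indicator_status(12, threshold=15, warn_at=12))        # override rate %
print(indicator_status(0.08, threshold=0.15, warn_at=0.12))  # drift index
print(indicator_status(97.2, threshold=95, warn_at=96,
                       higher_is_worse=False))               # monthly accuracy %
```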

Anomaly Detection Rules

Systematic detection, not ad-hoc observation:

yaml
anomaly_detection:
  statistical_rules:
    - metric: "Override rate by analyst"
      baseline: "Mean + 2 standard deviations"
      flag_when: "Analyst exceeds baseline"
      current_flags:
        - analyst: "Analyst_07"
          rate: "28%"
          baseline: "15%"  # team mean; the flag line is mean + 2σ
          deviation: "2.4σ"

    - metric: "Confidence score distribution"
      baseline: "30-day rolling average"
      flag_when: "Jensen-Shannon divergence >0.15"
      current: "0.08"

  pattern_rules:
    - pattern: "Same error repeating"
      flag_when: ">5 identical errors in 1 hour"
      current: "None"

    - pattern: "Accuracy drop by counterparty"
      flag_when: "Single counterparty <85% for 24h"
      current_flags:
        - counterparty: "CP_Alpha"
          accuracy: "82%"
          duration: "36 hours"

  correlation_rules:
    - correlation: "Volume vs accuracy"
      flag_when: "Accuracy drops with volume spike"
      hypothesis: "Model capacity issue"

    - correlation: "Time of day vs override rate"
      flag_when: "Override rate higher end-of-day"
      hypothesis: "Analyst fatigue or rush"
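
The two statistical rules above can be sketched with the standard library alone. The code below is illustrative: the function names and the sample rates are hypothetical, and the source does not say which logarithm base its Jensen-Shannon threshold assumes, so base 2 (which bounds the divergence between 0 and 1) is an assumption here.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence


def flag_outlier_analysts(rates: Dict[str, float], k: float = 2.0) -> List[str]:
    """Return analysts whose override rate exceeds mean + k sample std devs.

    Needs at least two analysts, since stdev() is undefined for one value.
    """
    values = list(rates.values())
    cutoff = mean(values) + k * stdev(values)
    return [analyst for analyst, rate in rates.items() if rate > cutoff]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (base 2) between two histograms.

    Inputs are bin counts or probabilities over the same bins; both are
    normalised to sum to 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("histograms must use the same bins")
    if sum(p) <= 0 or sum(q) <= 0:
        raise ValueError("histograms must have positive total mass")
    p = [x / sum(p) for x in p]
    q = [x / sum(q) for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical override rates (%) for a ten-analyst team.
peer_rates = {
    "Analyst_01": 14, "Analyst_02": 15, "Analyst_03": 16, "Analyst_04": 13,
    "Analyst_05": 17, "Analyst_06": 15, "Analyst_07": 28, "Analyst_08": 14,
    "Analyst_09": 16, "Analyst_10": 15,
}
print(flag_outlier_analysts(peer_rates))

# Hypothetical confidence-score histograms: 30-day baseline vs. today.
drift = js_divergence([10, 20, 70], [12, 18, 70])
print("Investigate" if drift > 0.15 else "Healthy")
```

Some libraries report the Jensen-Shannon distance (the square root of the divergence) instead, so confirm which quantity a tool returns before reusing the 0.15 threshold. On small teams a single outlier also inflates the standard deviation, which can hide it from a mean + 2σ rule.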

Stakeholder-Specific Outputs

Different audiences need different views:

yaml
stakeholder_outputs:
  operations_vp_quarterly:
    format: "1-page executive summary"
    must_include:
      - "Net value delivered ($)"
      - "Business case actual vs target"
      - "3 key achievements"
      - "2 attention items with actions"
    must_not_include:
      - "Technical metrics"
      - "Individual analyst data"
      - "System architecture details"

  risk_committee_monthly:
    format: "RAG dashboard"
    must_include:
      - "Overall risk rating with trend"
      - "Risk category breakdown"
      - "Incidents with resolution status"
      - "Override pattern analysis"
    must_not_include:
      - "Value/ROI metrics"
      - "Enhancement roadmap"

  internal_audit_quarterly:
    format: "Control evidence package"
    must_include:
      - "Control matrix with effectiveness"
      - "Sample testing results"
      - "Audit trail verification"
      - "Exception documentation"
    must_not_include:
      - "Business value metrics"
      - "Operational performance"

  model_operations_daily:
    format: "Technical dashboard"
    must_include:
      - "Real-time volume/accuracy"
      - "Confidence distribution"
      - "Drift indicators"
      - "Active alerts"
    must_not_include:
      - "Business value"
      - "Risk committee concerns"
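
The `must_include` and `must_not_include` lists above lend themselves to an automated check before a report goes out. A minimal sketch, assuming each report draft can be reduced to a list of section titles that match the rule strings exactly; `check_report` is a hypothetical helper, not something the skill defines.

```python
from typing import Dict, Iterable, List


def check_report(sections: Iterable[str],
                 must_include: Iterable[str],
                 must_not_include: Iterable[str]) -> Dict[str, List[str]]:
    """Compare a draft's section titles against one audience's rules."""
    present = set(sections)
    return {
        "missing": sorted(set(must_include) - present),
        "forbidden": sorted(present & set(must_not_include)),
    }


# Rules copied from the risk_committee_monthly entry above.
risk_committee_rules = {
    "must_include": [
        "Overall risk rating with trend",
        "Risk category breakdown",
        "Incidents with resolution status",
        "Override pattern analysis",
    ],
    "must_not_include": ["Value/ROI metrics", "Enhancement roadmap"],
}

# A hypothetical draft that omits two sections and leaks a value metric.
draft = [
    "Overall risk rating with trend",
    "Risk category breakdown",
    "Value/ROI metrics",
]
print(check_report(draft, **risk_committee_rules))
```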

Common Mistakes

| Mistake | Why It's Wrong | Do This Instead |
| --- | --- | --- |
| Activity metrics only | "5000 processed" ≠ value | Calculate value delivered |
| No attribution | Can't measure AI contribution | Explicit AI vs human split |
| One dashboard for all | Different audiences, different needs | Stakeholder-specific views |
| Lagging only | Problems already happened | Leading indicators with thresholds |
| Ad-hoc anomaly detection | Misses patterns | Statistical detection rules |
| Prose reports | Not actionable | Structured YAML, RAG status |
| No retraining triggers | Model degrades silently | Automated trigger conditions |

Red Flags in Your Output

If your monitoring framework has these, it's not ready:

  • Same report for all stakeholders
  • No value attribution formula
  • Activity metrics without business value
  • Alerts without threshold definitions
  • No leading indicators
  • No retraining triggers
  • Override analysis without by-analyst breakdown
  • Anomaly detection without statistical baseline

Financial Services Context

Financial services monitoring requires:

Regulatory Compliance

  • Audit trail completeness verification
  • 7-year retention confirmation
  • Decision explainability documentation
  • Regulatory query response capability

Risk Committee Reporting

  • Monthly RAG dashboards
  • Incident tracking with resolution
  • Override pattern analysis
  • Drift monitoring

Value Attribution

  • Clear business case tracking
  • FTE efficiency calculation
  • Settlement failure reduction measurement
  • Net value delivered

Continuous Improvement

  • Retraining triggers automated
  • Feedback loop from overrides
  • Enhancement pipeline prioritized
  • Issue tracking with owners

Monitoring Framework Checklist

Before finalizing:

  • Value attribution formula provided (not just activity)
  • Leading indicators identified (with thresholds)
  • Lagging indicators included (confirmed problems)
  • Anomaly detection rules defined (statistical)
  • Override analysis structured (by-analyst, by-reason, by-time)
  • Stakeholder reports differentiated (VP ≠ Risk ≠ Audit)
  • Retraining triggers automated (conditions → actions)
  • Feedback loop defined (sources → aggregation → review)
  • Alert configuration complete (critical vs warning)
  • YAML format for automation/integration
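
Because the framework is structured YAML, parts of this checklist can be checked mechanically. The sketch below assumes the YAML has already been loaded into a Python dict (by whatever parser you use) with the top-level keys from the Output Format section. It covers only a few checklist items, and `readiness_gaps` is a hypothetical name.

```python
from typing import Any, Dict, List

# Top-level sections taken from the Output Format template.
REQUIRED_SECTIONS = (
    "monitoring_framework", "value_realization", "usage_patterns",
    "quality_metrics", "risk_indicators", "stakeholder_reports",
    "continuous_improvement",
)


def readiness_gaps(framework: Dict[str, Any]) -> List[str]:
    """List checklist items that the loaded framework fails (partial check)."""
    gaps = [f"missing section: {name}"
            for name in REQUIRED_SECTIONS if name not in framework]

    risk = framework.get("risk_indicators") or {}
    if not risk.get("leading_indicators"):
        gaps.append("no leading indicators")
    for kind in ("leading_indicators", "lagging_indicators"):
        for item in risk.get(kind) or []:
            if not item.get("threshold"):
                name = item.get("indicator", "?")
                gaps.append(f"indicator without threshold: {name}")

    improvement = framework.get("continuous_improvement") or {}
    if not improvement.get("retraining_triggers"):
        gaps.append("no retraining triggers")
    return gaps


# An empty framework fails every check covered here.
for gap in readiness_gaps({}):
    print(gap)
```

Items that need judgment, such as whether stakeholder reports are genuinely differentiated, still call for a human review.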