evaluation

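Evaluate biomedical machine learning models with appropriate metrics and confidence intervals. Use in the following scenarios: (1) computing classification metrics (AUC-ROC, balanced accuracy, sensitivity, specificity, F1) with confidence intervals; (2) evaluating segmentation models (Dice coefficient, IoU, Hausdorff distance, surface Dice); (3) performing survival analysis (C-index, Kaplan-Meier, Cox PH, time-dependent AUC); (4) statistically comparing models (Wilcoxon test, paired t-test).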

SKILL.md

--- frontmatter

name: evaluation
description: >
  Evaluate biomedical ML models with appropriate metrics and confidence
  intervals. Use when: (1) Computing classification metrics (AUC-ROC,
  balanced accuracy, sensitivity, specificity, F1) with confidence intervals,
  (2) Evaluating segmentation models (Dice, IoU, Hausdorff, surface Dice),
  (3) Survival analysis (C-index, Kaplan-Meier, Cox PH, time-dependent AUC),
  (4) Statistical comparison between models (Wilcoxon, paired t-test).

Evaluation Metrics

References

File	Apply When
references/evaluation-metrics.md	Classification metrics, confidence intervals, statistical comparisons
references/segmentation-metrics.md	Dice, IoU, Hausdorff (HD95), surface Dice, multi-class segmentation
references/survival-metrics.md	C-index, Kaplan-Meier, log-rank, Cox PH, time-dependent AUC, bootstrap CIs

Note: segmentation-metrics.md references compute_ci() from evaluation-metrics.md.
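
The signature of compute_ci() is not shown in this file, so the following is only a minimal sketch of how a confidence interval for a metric could be obtained, assuming a percentile bootstrap over cases. The names bootstrap_ci and sensitivity, their parameters, and the toy labels are hypothetical illustrations and are not taken from the reference files.

```python
import numpy as np


def sensitivity(y_true, y_pred):
    """True positive rate; returns NaN when the sample has no positives."""
    positives = y_true == 1
    if positives.sum() == 0:
        return float("nan")
    return float((y_pred[positives] == 1).mean())


def bootstrap_ci(y_true, y_pred, metric_fn, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (hypothetical helper, not the skill's compute_ci).

    Resamples cases with replacement, recomputes the metric on each
    resample, and returns (point_estimate, lower, upper).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample case indices with replacement
        stats[b] = metric_fn(y_true[idx], y_pred[idx])
    # nanpercentile skips resamples where the metric was undefined
    lower, upper = np.nanpercentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric_fn(y_true, y_pred), float(lower), float(upper)


# Toy binary labels and predictions, made up for illustration only
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
point, lo, hi = bootstrap_ci(y_true, y_pred, sensitivity)
print(f"sensitivity = {point:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because the helper takes the metric as a function argument, the same resampling loop could wrap a per-case segmentation score such as Dice; whether the actual compute_ci() works this way is an assumption.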