evaluation

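Evaluate biomedical machine learning models with appropriate metrics and confidence intervals. Use in the following scenarios: (1) computing classification metrics (AUC-ROC, balanced accuracy, sensitivity, specificity, F1 score) with confidence intervals; (2) evaluating segmentation models (Dice coefficient, IoU, Hausdorff distance, surface Dice); (3) performing survival analysis (C-index, Kaplan-Meier, Cox PH, time-dependent AUC); (4) statistically comparing models (Wilcoxon test, paired t-test).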

SKILL.md
--- frontmatter
name: evaluation
description: >
  Evaluate biomedical ML models with appropriate metrics and confidence
  intervals. Use when: (1) Computing classification metrics (AUC-ROC,
  balanced accuracy, sensitivity, specificity, F1) with confidence intervals,
  (2) Evaluating segmentation models (Dice, IoU, Hausdorff, surface Dice),
  (3) Running survival analysis (C-index, Kaplan-Meier, Cox PH, time-dependent
  AUC), (4) Comparing models statistically (Wilcoxon, paired t-test).