Chemometrics Shared Foundations
When to Use What
| Task | Use |
|---|---|
| Choose cross-validation strategy | references/validation-strategies.md |
| Evaluate regression or classification model | references/performance-metrics.md |
| Determine if sample size is sufficient | references/sample-size-guidance.md |
| Detect or prevent overfitting | references/overfitting-prevention.md |
| Write methods section or prepare for publication | references/reporting-standards.md |
| Follow chemometrics project workflow | references/workflow.md |
Quick Reference: CV Decision Tree

    What is your sample size?
    +-- n < 20:         LOOCV (high variance; consider repeated random splits)
    +-- 20 <= n < 50:   LOOCV or 5-Fold CV (repeat 3-10x)
    +-- 50 <= n < 200:  5-Fold or 10-Fold CV (repeat 3-10x)
    +-- n >= 200:       10-Fold CV or Hold-Out (70/30 or 80/20)

    Special cases:
      Time series    -> TimeSeriesSplit (no future leakage)
      Batches/groups -> GroupKFold (keep groups together)
      Imbalanced     -> StratifiedKFold (preserve class ratios)
      Spatial data   -> Spatial CV (geographic splits)
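
The decision tree can be sketched as a small lookup function. This is a minimal illustration, not a definitive rule set: the function name `recommend_cv`, its flag parameters, and the precedence order among the special cases (time series, then groups, then spatial, then imbalance) are assumptions that the tree itself does not specify.

```python
def recommend_cv(n_samples, time_series=False, grouped=False,
                 spatial=False, imbalanced=False):
    """Return a CV suggestion as a string, following the decision tree.

    Hypothetical helper: the precedence among special cases is an
    assumption made for this sketch.
    """
    if n_samples < 2:
        raise ValueError("cross-validation needs at least 2 samples")

    # Special cases override the sample-size rules.
    if time_series:
        return "TimeSeriesSplit"
    if grouped:
        return "GroupKFold"
    if spatial:
        return "Spatial CV"
    if imbalanced:
        return "StratifiedKFold"

    # Sample-size rules, with the same thresholds as the tree.
    if n_samples < 20:
        return "LOOCV"
    if n_samples < 50:
        return "LOOCV or 5-Fold CV (repeat 3-10x)"
    if n_samples < 200:
        return "5-Fold or 10-Fold CV (repeat 3-10x)"
    return "10-Fold CV or Hold-Out (70/30 or 80/20)"


if __name__ == "__main__":
    print(recommend_cv(35))
    print(recommend_cv(500, grouped=True))
```

The returned strings are labels only. Mapping them to concrete splitter objects (for example, from a library such as scikit-learn) is left to the caller.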
Quick Reference: Metrics
- Regression: RMSEP (primary), R-squared, RPD, Bias, SEP
- Classification: Sensitivity, Specificity, F1-score (primary), Accuracy, ROC AUC
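
As a rough sketch of how these metrics relate, the following computes them with the standard library only. Several conventions here are assumptions, since definitions vary between authors: bias is taken as the mean of predicted minus reference, SEP is the bias-corrected standard deviation of the residuals with n - 1 in the denominator, and RPD is the reference standard deviation divided by SEP (some authors divide by RMSEP instead). ROC AUC is omitted because it needs continuous scores rather than confusion-matrix counts. The function names are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """RMSEP, R-squared, bias, SEP and RPD for a prediction set."""
    n = len(y_true)
    if n < 2 or n != len(y_pred):
        raise ValueError("need two equal-length sequences with n >= 2")

    residuals = [p - t for t, p in zip(y_true, y_pred)]  # predicted - reference
    bias = sum(residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmsep = math.sqrt(ss_res / n)
    # SEP: spread of the residuals after removing the bias.
    sep = math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))

    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("reference values are constant; R-squared undefined")
    r2 = 1.0 - ss_res / ss_tot

    sd_y = math.sqrt(ss_tot / (n - 1))
    rpd = sd_y / sep if sep > 0 else math.inf  # assumption: RPD = SD / SEP
    return {"rmsep": rmsep, "r2": r2, "bias": bias, "sep": sep, "rpd": rpd}


def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, F1 and accuracy from confusion counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


if __name__ == "__main__":
    print(regression_metrics([1, 2, 3, 4], [2, 2, 3, 3]))
    print(classification_metrics(tp=8, fp=2, tn=9, fn=1))
```

When bias is zero, SEP and RMSEP differ only by the n versus n - 1 denominator, so reporting both mainly helps when a systematic offset is suspected.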
RPD Interpretation (Saeys et al. 2005)
| RPD | Quality |
|---|---|
| > 2.5 | Excellent quantitative |
| 2.0-2.5 | Good quantitative |
| 1.8-2.0 | Fair (screening) |
| 1.4-1.8 | Very rough screening |
| < 1.4 | Unreliable |
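
For automated reports, the table can be encoded as a simple threshold lookup. This is a minimal sketch: the function name `interpret_rpd` is hypothetical, and because the table does not say which band owns a boundary value, the code assumes each lower bound is inclusive (so exactly 2.5 falls in "Good quantitative").

```python
def interpret_rpd(rpd):
    """Map an RPD value to the quality label from the table.

    Assumption: lower bounds are inclusive, except the top band,
    which is strictly greater than 2.5 as written in the table.
    """
    if rpd > 2.5:
        return "Excellent quantitative"
    if rpd >= 2.0:
        return "Good quantitative"
    if rpd >= 1.8:
        return "Fair (screening)"
    if rpd >= 1.4:
        return "Very rough screening"
    return "Unreliable"


if __name__ == "__main__":
    for value in (3.1, 2.2, 1.9, 1.5, 1.0):
        print(value, "->", interpret_rpd(value))
```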
R-squared Interpretation
| R-squared | Quality |
|---|---|
| > 0.9 | Excellent |
| 0.8-0.9 | Good |
| 0.7-0.8 | Acceptable |
| < 0.7 | Poor (most applications) |
See Also
- ML method selection: ../chemometrics-ml-selection/SKILL.md
- MS metabolomics: ../chemometrics-ms-metabolomics/SKILL.md
- Hybrid modeling: ../chemometrics-hybrid-modeling/SKILL.md
- Model validation: ../chemometrics-validation/SKILL.md