Evaluate Model Skill
This skill allows you to compare a new candidate model against the currently deployed production model to verify improvements in accuracy, precision, recall, and F1 score.
When to use
- •After training a new model.
- •Before deploying a model to production.
- •To verify if a model regression has occurred.
Requirements
- •A trained PyTorch model file (
.pt). - •The production model file (usually
cherry_system/cherry_detection/resource/cherry_classification.pt). - •Validation dataset located at
../cherry_classification/data(or specified path). - •Python environment with
torch,torchvision,numpy,tqdm.
Usage
Run the compare_models.py script from the repository root.
bash
python training/scripts/compare_models.py \ --new-model <path/to/new_model.pt> \ --prod-model <path/to/production_model.pt> \ [--unnormalized] \ [--architecture <resnet50|mobilenet_v3_large|efficientnet_b0>]
Options
- •
--new-model: Path to the new candidate model. - •
--prod-model: Path to the baseline model (usuallycherry_system/cherry_detection/resource/cherry_classification.pt). - •
--unnormalized: Important! Use this flag if the new model was trained on 0-255 raw images (no ImageNet normalization). The production system typically uses unnormalized images. - •
--architecture: Model architecture (default:resnet50). - •
--device:cpuorcuda(default: auto-detect).
Example
Evaluate a new unnormalized ResNet50 model:
bash
python training/scripts/compare_models.py \ --new-model training/experiments/resnet50_augmented_unnormalized/model_best_fixed.pt \ --prod-model cherry_system/cherry_detection/resource/cherry_classification.pt \ --unnormalized
Interpreting Results
The script prints a comparison table.
- •Accuracy > 93% is the typical target.
- •Ensure F1 Score does not regress.
- •Check per-class metrics if specific class performance (e.g., Pit recall) is critical.