Benchmark Latency Skill
This skill allows you to measure the pure compute latency of a model and inspect its weights for potential issues like denormal values that can degrade CPU performance.
When to use
- •After training a new model, before deployment.
- •If a model feels "slow" during inference.
- •To verify real-time constraints (< 30ms per frame).
Requirements
- •A trained PyTorch model file (
.pt). - •Python environment with
torch.
Usage 1: Measure Latency
Run benchmark_latency.py to get timing statistics.
bash
python training/scripts/benchmark_latency.py \ --model-path <path/to/model.pt> \ --device <cpu|cuda> \ [--batch-size 1]
Example
bash
python training/scripts/benchmark_latency.py \ --model-path training/experiments/resnet50_augmented_unnormalized/model_best_fixed.pt \ --device cpu
Usage 2: Inspect Weights (Health Check)
Run inspect_model.py to check for "denormal" values and architectural details. Denormal values (extremely small floats) can cause massive CPU slowdowns (10x+).
bash
python training/scripts/inspect_model.py <path/to/model.pt> [--device cpu]
Interpreting Inspection Output
- •Denormal Count: Should be 0.
- •If
WARNING: Denormals detected, runfix_denormals.py.
- •If
- •Single Inference Time: Should match expected baseline (~16ms for ResNet50 on standard CPU).
Usage 3: Fix Denormals
If inspect_model.py reports denormals, use this script to fix the weights.
bash
python training/scripts/fix_denormals.py <input_model.pt> <output_fixed_model.pt>