run-regression
Run regression tests against gold standard segmentations.
Usage
code
/run-regression [--gold <name>] [--all]
Where:
- •--gold <name> - Test a specific gold standard
- •--all - Test all gold standards (default)
What This Skill Does
- •Loads gold standard(s) from GoldStandards/
- •Runs segmentation with recorded parameters
- •Computes the Dice coefficient and 95th-percentile Hausdorff distance against the gold standard
- •Flags regressions if metrics fall below thresholds
- •Generates regression report
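The two metrics named above can be illustrated with a minimal pure-Python sketch. It is not the code used by scripts/run_tests.py. It assumes each segmentation is given as a set of voxel coordinates already scaled to millimetres, and it takes the 95% Hausdorff distance as the larger of the two directed 95th-percentile nearest-neighbour distances (definitions vary between toolkits). The function names are hypothetical.

```python
import math


def dice(a, b):
    """Dice coefficient between two sets of voxel coordinates."""
    if not a and not b:
        return 1.0  # assumption: two empty segmentations agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def _percentile_95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def _directed(src, dst):
    """Distance from every point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hausdorff_95(a, b):
    """Symmetric 95th-percentile Hausdorff distance (brute force, O(n*m))."""
    if not a or not b:
        raise ValueError("both segmentations must be non-empty")
    return max(_percentile_95(_directed(a, b)),
               _percentile_95(_directed(b, a)))


test_seg = {(0, 0, 0), (1, 0, 0)}
gold_seg = {(1, 0, 0), (2, 0, 0)}
print(dice(test_seg, gold_seg))          # 0.5
print(hausdorff_95(test_seg, gold_seg))  # 1.0
```

The brute-force distance search is only practical for small point sets; real segmentations would normally use surface voxels and a distance transform.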
Regression Thresholds
Default thresholds (can be configured):
- •Dice coefficient: >= 0.80
- •Hausdorff 95%: <= 10.0mm
Regressions are flagged when:
- •Dice drops below threshold
- •Hausdorff exceeds threshold
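As a rough sketch of the rule above, the check could look like the following. The `Thresholds` class and `is_regression` function are hypothetical names, and the sketch assumes that failing either condition on its own is enough to flag a regression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Default thresholds from this document (hypothetical container)."""
    dice_min: float = 0.80
    hausdorff_95_max_mm: float = 10.0


def is_regression(dice, hausdorff_95_mm, thresholds=Thresholds()):
    """True if Dice is below its minimum or Hausdorff 95% is above its maximum."""
    return (dice < thresholds.dice_min
            or hausdorff_95_mm > thresholds.hausdorff_95_max_mm)


print(is_regression(0.89, 4.2))   # False
print(is_regression(0.75, 12.3))  # True
```

Values exactly on a threshold (Dice 0.80, Hausdorff 10.0 mm) pass, matching the `>=` and `<=` comparisons listed above.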
Execution Steps
Step 1: Read .env file
Read SLICER_PATH, the path to the Slicer executable, from the .env file.
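One way to read the value without extra dependencies is sketched below. It assumes the .env file uses plain `KEY=value` lines with optional quotes and `#` comments; `read_env_value` is a hypothetical helper and the example path is made up.

```python
def read_env_value(text, key):
    """Return the value for key in KEY=value text, or None if absent."""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None


# Example content; the path is a placeholder, not a real install location.
sample = '# local settings\nSLICER_PATH="/opt/Slicer/Slicer"\n'
print(read_env_value(sample, "SLICER_PATH"))  # /opt/Slicer/Slicer
```

In practice the text would come from reading the .env file in the project root, for example with `pathlib.Path(".env").read_text()`.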
Step 2: Launch Regression Tests
bash
<SLICER_PATH> --python-script scripts/run_tests.py --exit regression
Or for interactive mode (stays open):
bash
<SLICER_PATH> --python-script scripts/run_tests.py regression
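The same two invocations can be assembled from Python, which may be convenient in automation. This is a sketch only: `build_command` and `run_regression` are hypothetical helpers that reproduce the argument order shown above, and whether the process exit code reflects the test outcome is not stated in this document.

```python
import subprocess


def build_command(slicer_path, exit_when_done=True):
    """Build the argument list for the regression run shown above."""
    cmd = [slicer_path, "--python-script", "scripts/run_tests.py"]
    if exit_when_done:
        cmd.append("--exit")  # omit to keep Slicer open (interactive mode)
    cmd.append("regression")
    return cmd


def run_regression(slicer_path, exit_when_done=True):
    """Launch Slicer and return the process exit code."""
    return subprocess.run(build_command(slicer_path, exit_when_done)).returncode


# Placeholder path; use the SLICER_PATH value from .env in practice.
print(build_command("/opt/Slicer/Slicer"))
```

Passing the command as a list avoids shell quoting problems when the Slicer path contains spaces.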
Step 3: Check Results
Results are saved to:
code
test_runs/<timestamp>_regression/
├── results.json    # Pass/fail per gold standard
├── metrics.json    # Detailed metrics
└── screenshots/    # Visual comparison
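To find the most recent run directory programmatically, something like the following could work. It assumes the timestamp prefix sorts chronologically as plain text (for example a year-first format), which this document does not specify; `latest_regression_run` is a hypothetical helper.

```python
from pathlib import Path


def latest_regression_run(root="test_runs"):
    """Return the newest <timestamp>_regression directory, or None."""
    runs = sorted(p for p in Path(root).glob("*_regression") if p.is_dir())
    return runs[-1] if runs else None


run_dir = latest_regression_run()
if run_dir is None:
    print("no regression runs found")
else:
    print(run_dir / "results.json")
```

If the timestamp format does not sort lexicographically, sorting by directory modification time would be a safer choice.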
Output Format
results.json
json
{
"summary": {
"total_tests": 2,
"passed": 1,
"failed": 1
},
"tests": [
{
"name": "regression_gold",
"gold_standards": [
{
"name": "MRBrainTumor1_tumor",
"dice": 0.89,
"hausdorff_95": 4.2,
"passed": true
},
{
"name": "MRHead_ventricle",
"dice": 0.75,
"hausdorff_95": 12.3,
"passed": false,
"regression": true
}
]
}
]
}
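A short sketch of how this structure might be consumed: the function below walks the `tests` and `gold_standards` arrays and collects the names of entries whose `passed` field is false. `failed_gold_standards` is a hypothetical helper, and the inline sample is a trimmed copy of the example above rather than real output.

```python
import json


def failed_gold_standards(results):
    """Return the names of gold standards that did not pass."""
    failed = []
    for test in results.get("tests", []):
        for gold in test.get("gold_standards", []):
            if not gold.get("passed", False):
                failed.append(gold["name"])
    return failed


SAMPLE = json.loads("""
{
  "tests": [
    {
      "name": "regression_gold",
      "gold_standards": [
        {"name": "MRBrainTumor1_tumor", "dice": 0.89,
         "hausdorff_95": 4.2, "passed": true},
        {"name": "MRHead_ventricle", "dice": 0.75,
         "hausdorff_95": 12.3, "passed": false, "regression": true}
      ]
    }
  ]
}
""")

print(failed_gold_standards(SAMPLE))  # ['MRHead_ventricle']
```

For a real run, load the file with `json.load` from the results.json path instead of the inline sample. A CI job could exit with a non-zero status whenever the returned list is non-empty.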
When to Run
- •Before commits: Ensure no regressions
- •After algorithm changes: Verify improvements don't break existing cases
- •After parameter tuning: Confirm the tuned parameters still meet the thresholds on every gold standard
- •CI/CD pipeline: Automated regression detection
Interpreting Results
PASS
code
MRBrainTumor1_tumor:
  Dice: 0.89
  Hausdorff 95%: 4.2mm
  PASS
The algorithm reproduces the gold standard within thresholds.
REGRESSION
code
MRHead_ventricle:
  Dice: 0.75
  Hausdorff 95%: 12.3mm
  ** REGRESSION **
Investigate:
- •Was an algorithm changed recently?
- •Were parameters modified?
- •Is the gold standard still appropriate?
Tips
- •Run regression tests after any algorithm change
- •If a flagged regression turns out to be correct behavior (the gold standard was wrong), update the gold standard
- •Record the cause of each regression in the git commit message
- •Consider adding more gold standards for different tissue types