Safe Refactoring for Scientific Code
Overview
Change code structure without changing behavior. Zero tolerance for behavioral changes during refactoring.
Core principle: Establish a baseline, refactor, then verify that results match the baseline (snapshots exactly; golden-regression outputs to within 1e-14).
Announce at start: "I'm using the safe-refactoring skill to restructure this code."
When to Use This Skill
Use for:
- Improving code readability without changing logic
- Extracting reusable functions
- Renaming variables/functions for clarity
- Reorganizing code structure
- Performance optimization (without changing numerical behavior)
Don't use for:
- Changing behavior or algorithms (use scientific-tdd instead)
- Adding new features (use scientific-tdd instead)
- Fixing bugs (use scientific-tdd or fix directly with tests)
Process Checklist
Copy to TodoWrite:
Safe Refactoring Progress:
- [ ] Run full test suite (establish baseline)
- [ ] Run snapshot tests (establish baseline)
- [ ] Capture coverage report
- [ ] Perform refactoring
- [ ] Run full test suite (must match baseline exactly)
- [ ] Run snapshot tests (must match baseline exactly)
- [ ] Compare coverage (should stay same or improve)
- [ ] Run quality checks (ruff + black)
- [ ] Verify no numerical differences
- [ ] Commit refactoring
Strict Rules
ZERO tolerance for:
- Any test that passed before and fails after
- Any test that failed before and passes after (suggests the test was broken)
- Any snapshot differences (even floating-point noise)
- Decreased test coverage
- Any behavioral changes
If any of these occur: Revert and investigate why.
Detailed Steps
Step 1: Run Full Test Suite (Baseline)
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -v 2>&1 | tee /tmp/baseline_tests.txt
Record:
- Total tests: grep "passed" /tmp/baseline_tests.txt
- Any failures (if refactoring existing code with known issues)
- Test execution time
Expected: All tests pass (or document any known failures)
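The counts to record can be read from the saved log instead of by eye. The following is a minimal sketch, assuming the log ends with pytest's usual summary line (for example "427 passed, 2 skipped in 12.34s"). The helper name summary_counts is hypothetical and not part of pytest.

```python
import re

# Outcome labels that pytest prints in its final summary line.
_SUMMARY_ITEM = re.compile(
    r"(\d+) (passed|failed|skipped|xfailed|xpassed|deselected|errors?|warnings?)"
)


def summary_counts(log_text: str) -> dict:
    """Return outcome counts from the last summary line of a pytest log."""
    for line in reversed(log_text.splitlines()):
        items = _SUMMARY_ITEM.findall(line)
        # The summary line is the one that also reports the elapsed time.
        if items and " in " in line:
            return {label: int(count) for count, label in items}
    return {}  # no summary line found (hypothetical fallback)
```

Calling summary_counts(open("/tmp/baseline_tests.txt").read()) on the Step 1 log would give a dictionary such as {"passed": 427} that can be stored alongside the baseline.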
Step 2: Run Snapshot Tests (Baseline)
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -m snapshot -v 2>&1 | tee /tmp/baseline_snapshots.txt
CRITICAL: Snapshots must match exactly after refactoring.
Expected: All snapshot tests pass
Step 3: Capture Coverage Report
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest --cov=non_local_detector --cov-report=term --cov-report=json:coverage_baseline.json
Record: Coverage percentage for files being refactored
Why: Coverage should not decrease during refactoring (ideally improves)
Step 4: Perform Refactoring
Refactoring techniques:
- Extract function:

    # Before
    def complex_function():
        # ... 50 lines of code
        result = x * 2 + y
        # ... more code
        return final_result

    # After
    def complex_function():
        # ... code
        result = _calculate_intermediate(x, y)
        # ... code
        return final_result

    def _calculate_intermediate(x, y):
        return x * 2 + y

- Rename for clarity:

    # Before
    def f(x):
        return x * 2

    # After
    def calculate_doubled_value(value):
        return value * 2

- Reorganize structure:

    # Before: All in one file
    # After: Separated into modules
    # - core_logic.py
    # - utilities.py
    # - validation.py

- Optimize performance (numerically equivalent):

    # Before
    for i in range(n):
        result[i] = f(x[i])

    # After (JAX)
    result = jax.vmap(f)(x)
During refactoring:
- Make small, incremental changes
- Test after each change if possible
- Keep numerical operations identical
- Maintain exact same algorithms
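"Keep numerical operations identical" matters because floating-point addition is not associative. A rewrite that is algebraically equivalent can still change the last bits of a result. The sketch below uses two hypothetical helpers that sum the same three numbers in opposite order.

```python
def sum_left_to_right(values):
    """Original code: accumulate in list order."""
    total = 0.0
    for value in values:
        total += value
    return total


def sum_right_to_left(values):
    """'Refactored' code: same terms, accumulated in reverse order."""
    total = 0.0
    for value in reversed(values):
        total += value
    return total


values = [0.1, 0.2, 0.3]
before = sum_left_to_right(values)   # (0.1 + 0.2) + 0.3
after = sum_right_to_left(values)    # (0.3 + 0.2) + 0.1

print(before == after)               # False
print(abs(before - after) < 1e-14)   # True: the gap is on the order of 1e-16
```

A reordering like this would break an exact-match snapshot while staying inside a 1e-14 tolerance, so it is worth knowing which kind of check each test applies.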
Step 5: Run Full Test Suite (Verify Match)
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -v 2>&1 | tee /tmp/refactored_tests.txt
Compare to baseline (the elapsed time in the summary line will differ between runs; nothing else should):
diff /tmp/baseline_tests.txt /tmp/refactored_tests.txt
MUST verify:
- Same number of tests run
- Same tests pass
- Same tests fail (if any)
- Similar execution time (within 20%)
If differences:
- Any new test failures: REVERT IMMEDIATELY
- Any new test passes: Investigate (test was broken?)
- Different test count: Investigate (tests missing or duplicated?)
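A raw diff also flags lines that change on every run, such as timings. A per-test comparison of the two logs checks the "MUST verify" items directly. This is a minimal sketch, assuming the standard pytest -v line format ("path::test_name OUTCOME [ NN%]"). The names parse_outcomes and compare_outcomes are hypothetical, and output from plugins such as pytest-xdist may need a different pattern.

```python
import re

# Matches verbose pytest result lines such as:
#   tests/test_core.py::test_decode[case-1] PASSED   [ 12%]
_RESULT_LINE = re.compile(
    r"^(?P<test_id>\S+::\S+)\s+(?P<outcome>PASSED|FAILED|ERROR|SKIPPED|XFAIL|XPASS)\b",
    re.MULTILINE,
)


def parse_outcomes(log_text):
    """Map each test ID in a `pytest -v` log to its reported outcome."""
    return {m["test_id"]: m["outcome"] for m in _RESULT_LINE.finditer(log_text)}


def compare_outcomes(baseline_log, refactored_log):
    """Return (missing, added, changed) between two `pytest -v` logs."""
    before = parse_outcomes(baseline_log)
    after = parse_outcomes(refactored_log)
    missing = sorted(set(before) - set(after))   # tests that disappeared
    added = sorted(set(after) - set(before))     # tests that appeared
    changed = {
        test_id: (before[test_id], after[test_id])
        for test_id in sorted(set(before) & set(after))
        if before[test_id] != after[test_id]
    }
    return missing, added, changed
```

For a pure refactoring, all three return values should be empty. Anything in changed corresponds to a "new test failure" or "new test pass" in the list above.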
Step 6: Run Snapshot Tests (Verify Match)
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -m snapshot -v 2>&1 | tee /tmp/refactored_snapshots.txt
CRITICAL: Must match baseline EXACTLY.
Expected: All snapshot tests pass, no differences
If snapshot differences:
- DO NOT UPDATE SNAPSHOTS
- Investigate why behavior changed
- This is NOT a refactoring if behavior changed
- Revert and reconsider approach
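To illustrate what an exact snapshot match means for floating-point output, here is a small, self-contained sketch. It is not the project's snapshot mechanism. The functions to_exact_record and check_snapshot, and the file name posterior.json, are hypothetical. Floats are stored as hex strings so that a single differing bit fails the comparison.

```python
import json
import tempfile
from pathlib import Path


def to_exact_record(values):
    """Encode floats as hex strings so every bit of each value is preserved."""
    return [float(v).hex() for v in values]


def check_snapshot(snapshot_path, values):
    """Store the snapshot on first run; afterwards require a bit-exact match."""
    record = to_exact_record(values)
    path = Path(snapshot_path)
    if not path.exists():
        path.write_text(json.dumps(record))
        return True
    # The stored snapshot is never overwritten here.
    return json.loads(path.read_text()) == record


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "posterior.json"
    check_snapshot(snapshot, [0.1, 0.2, 0.3])                   # stores the baseline
    same = check_snapshot(snapshot, [0.1, 0.2, 0.3])            # True
    drifted = check_snapshot(snapshot, [0.1, 0.2, 0.1 + 0.2])   # False: 0.1 + 0.2 != 0.3
```

The drifted case differs from the baseline only in the last bit of one value, which is exactly the kind of change this step is meant to catch.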
Step 7: Compare Coverage
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest --cov=non_local_detector --cov-report=term --cov-report=json:coverage_refactored.json
Compare:
# If you have jq installed
jq '.totals.percent_covered' coverage_baseline.json
jq '.totals.percent_covered' coverage_refactored.json
Expected:
- Coverage stays same or improves
- Never decreases
If coverage decreased:
- Some code paths no longer tested
- Investigate and fix or revert
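The total percentage can hide a drop in one file that is offset by a gain in another. The sketch below compares the two JSON reports file by file. It assumes the coverage.py JSON layout, with a top-level "files" mapping whose entries carry "summary" and "percent_covered". The helper names are hypothetical.

```python
import json


def load_report(path):
    """Load a coverage JSON report from disk."""
    with open(path) as handle:
        return json.load(handle)


def coverage_regressions(baseline, refactored):
    """Return {file: (before, after)} for files whose coverage percentage dropped.

    Both arguments are dicts loaded from coverage JSON reports.
    """
    after_files = refactored["files"]
    regressions = {}
    for path, before in baseline["files"].items():
        if path not in after_files:
            continue  # file moved or renamed; inspect these by hand
        old = before["summary"]["percent_covered"]
        new = after_files[path]["summary"]["percent_covered"]
        if new < old:
            regressions[path] = (old, new)
    return regressions
```

With the files from Steps 3 and 7, coverage_regressions(load_report("coverage_baseline.json"), load_report("coverage_refactored.json")) should return an empty dictionary. Files that were moved during reorganization are skipped and need a manual look.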
Step 8: Run Quality Checks
/Users/edeno/miniconda3/envs/spectral_connectivity/bin/ruff check src/
/Users/edeno/miniconda3/envs/spectral_connectivity/bin/ruff format src/
/Users/edeno/miniconda3/envs/non_local_detector/bin/black src/
Expected: All checks pass
Fix any issues: Refactoring is a good opportunity to improve code quality
Step 9: Verify No Numerical Differences
For mathematical code, verify numerical equivalence:
# Run golden regression
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest \
    src/non_local_detector/tests/test_golden_regression.py -v
Expected: Exact match (or differences < 1e-14)
If differences > 1e-14:
- This is NOT a pure refactoring
- Behavior has changed
- Use numerical-validation skill instead
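The 1e-14 threshold can be checked directly on saved outputs. This is a minimal sketch using only the standard library. It assumes the outputs are flat sequences of floats and that the threshold is an absolute difference, as the text implies. The function names are hypothetical, and real arrays would more likely be compared with an array library.

```python
import math


def max_abs_difference(baseline, refactored):
    """Largest absolute elementwise difference between two equal-length sequences."""
    if len(baseline) != len(refactored):
        raise ValueError("outputs have different lengths")
    worst = 0.0
    for old, new in zip(baseline, refactored):
        if old == new or (math.isnan(old) and math.isnan(new)):
            continue  # identical values (including matching inf or NaN)
        diff = abs(old - new)
        if math.isnan(diff):
            return math.inf  # a NaN appeared or disappeared
        worst = max(worst, diff)
    return worst


def is_numerically_unchanged(baseline, refactored, tolerance=1e-14):
    """True when every element differs by less than the tolerance."""
    return max_abs_difference(baseline, refactored) < tolerance
```

An absolute tolerance like this suits values of order one. For outputs that are very large or very small, a relative comparison may be the more meaningful check.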
Step 10: Commit Refactoring
Only commit if ALL checks pass:
git add <refactored_files> <test_files>
git commit -m "refactor: improve <component> code structure

- Extract <function> for reusability
- Rename <variable> for clarity
- Reorganize <module> structure

No behavioral changes:
- All tests pass (N tests)
- Snapshots unchanged
- Coverage: X% → Y%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>"
Performance Optimization Refactoring
When optimizing for performance:
- Capture performance baseline:

    pytest --durations=10 > /tmp/baseline_durations.txt

- Make optimization
- Verify numerical equivalence (use numerical-validation skill)
- Measure performance improvement:

    pytest --durations=10 > /tmp/optimized_durations.txt

- Document improvement:

    Optimization: Use JAX vmap instead of for loop
    Speedup: 3.2x (450ms → 140ms)
    Numerical difference: < 1e-14 (verified)
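For a single function, the speedup figure in such a note can be measured with the standard library's timeit module. The sketch below uses hypothetical stand-in functions (double_loop, double_comprehension) rather than the project's code, and checks output equality in the same call.

```python
import timeit


def measure_speedup(baseline_fn, optimized_fn, args, repeat=5, number=10):
    """Return (speedup, outputs_equal) for two implementations on the same input."""
    outputs_equal = baseline_fn(*args) == optimized_fn(*args)
    # Take the minimum over repeats: it is the least disturbed by system noise.
    t_base = min(timeit.repeat(lambda: baseline_fn(*args), repeat=repeat, number=number))
    t_opt = min(timeit.repeat(lambda: optimized_fn(*args), repeat=repeat, number=number))
    return t_base / t_opt, outputs_equal


def double_loop(xs):
    """Baseline: explicit loop."""
    result = []
    for x in xs:
        result.append(x * 2)
    return result


def double_comprehension(xs):
    """Optimized: same arithmetic, expressed as a comprehension."""
    return [x * 2 for x in xs]


speedup, equal = measure_speedup(double_loop, double_comprehension, (list(range(1000)),))
print(f"Speedup: {speedup:.1f}x, outputs equal: {equal}")
```

The measured ratio varies by machine and run. For JAX functions, timing would also need to account for compilation and asynchronous dispatch, which this sketch does not handle.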
Integration with Other Skills
- Before refactoring: Consider if change actually needs new behavior (use scientific-tdd instead)
- With numerical-validation: If refactoring mathematical code, use numerical-validation to verify equivalence
- With jax skill: When optimizing JAX code, use jax skill for best practices
Example Workflow
Task: Extract position decoding logic into reusable function
1. Baseline:
   - Run pytest: 427 passed, 0 failed
   - Run snapshots: 15 passed, 0 failed
   - Coverage: 69%
2. Refactor:
   - Extract _decode_position_from_posterior() function
   - Update 3 call sites to use new function
   - No logic changes, just extraction
3. Verify:
   - Run pytest: 427 passed, 0 failed ✓
   - Run snapshots: 15 passed, 0 failed ✓
   - Coverage: 69% (unchanged) ✓
4. Quality:
   - Ruff: All checks pass ✓
   - Black: Formatted ✓
5. Commit: "refactor: extract position decoding into reusable function"
Red Flags
STOP and revert if:
- Any test changes status (pass → fail or fail → pass)
- Any snapshot differences appear
- Coverage decreases
- Numerical differences > 1e-14
- You're tempted to update snapshots
- You're adding new logic (use scientific-tdd instead)
Safe to proceed if:
- All tests match baseline exactly
- No snapshot changes
- Coverage same or better
- Code quality improves
- No new functionality added
Common Mistakes
"It's just a small behavioral change"
- No such thing in refactoring
- Any behavioral change = not refactoring
- Use scientific-tdd for behavioral changes
"I'll update the snapshots since the new output is better"
- That's not refactoring, it's changing behavior
- Refactoring = zero snapshot changes
- Use scientific-tdd if output should change
"Tests are slow, I'll skip them"
- Never skip tests during refactoring
- Tests are your safety net
- Without tests, you can't verify it's a refactoring
"Coverage went down but the code is better"
- Better code shouldn't lose coverage
- Investigate why coverage decreased
- Fix or revert