preprocess-debug

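Debug failures in the preprocessing pipeline. Guides the user through reading checkpoint files, checking each step's artifacts, interpreting quality-control metrics, examining visualization PNGs, and identifying which step failed and why. Use when a preprocessing run produces unexpected results, crashes, or generates poor-quality output.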

SKILL.md
--- frontmatter
name: preprocess-debug
description: "Debug preprocessing pipeline failures. Guides through reading checkpoint files, checking step artifacts, interpreting QC metrics, examining visualization PNGs, and identifying which step failed and why. Use when a preprocessing run produces unexpected results, crashes, or generates poor-quality outputs."

Preprocess Debug — Preprocessing Failure Diagnosis

When to Use

  • "Why did preprocessing fail for patient X?"
  • "The registration output looks wrong"
  • "Skull stripping removed too much / too little"
  • "Intensity normalization produced weird values"
  • "Which step is causing the problem?"
  • Any preprocessing quality issue or crash investigation

Diagnostic Workflow

Step 1: Identify the failing step

List the visualization directory and find the last step that produced a PNG. The first missing PNG marks the step that failed:

```bash
ls -la {viz_root}/MenGrowth-XXXX/MenGrowth-XXXX-YYY/
# step1_data_harmonization_t1c.png  ← exists
# step2_bias_field_correction_t1c.png  ← exists
# step3_resampling_t1c.png  ← MISSING → Step 3 failed
```
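The same check can be scripted. The sketch below assumes the PNG naming pattern `step<N>_<step_name>_<modality>.png` inferred from the listing above; the function name `last_completed_step` is hypothetical and not part of the pipeline.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from the listing above:
# step<N>_<step_name>_<modality>.png
STEP_PNG = re.compile(r"^step(\d+)_(.+)_([^_]+)\.png$")


def last_completed_step(viz_dir):
    """Return (number, name) of the highest-numbered step with a PNG, or None."""
    found = {}
    for png in Path(viz_dir).glob("step*.png"):
        match = STEP_PNG.match(png.name)
        if match:
            found[int(match.group(1))] = match.group(2)
    if not found:
        return None
    last = max(found)
    return last, found[last]
```

The step that follows the returned one in the configured step order is the first suspect. A return value of `None` means no step wrote a PNG at all.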

Check logs for error messages (look for ERROR or RuntimeError).
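A minimal sketch of the log scan, assuming the run writes a plain-text log file. The log location is not specified here, so the path is left to the caller, and `find_error_lines` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Markers named above; "ITK ERROR" lines also match because they contain ERROR.
ERROR_PATTERN = re.compile(r"ERROR|RuntimeError")


def find_error_lines(log_path, pattern=ERROR_PATTERN):
    """Return (line_number, text) for every log line matching the pattern."""
    hits = []
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits
```

The first hit is usually the most informative one, since later errors are often consequences of it.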

Step 2: Check step artifacts

Each step may produce artifacts in {artifacts}/MenGrowth-XXXX/MenGrowth-XXXX-YYY/:

| Artifact | Produced by | What to check |
|---|---|---|
| `t1c_bias_field.nii.gz` | Bias field correction | Should be a smooth, low-frequency field |
| `t1c_brain_mask.nii.gz` | Skull stripping | Load in a viewer and verify that the mask covers brain + tumor |
| `*.h5`, `*.mat` transforms | Registration | Verify transforms exist and are non-zero |
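A quick presence check for these artifacts might look like the sketch below. It uses the file names from the table and reads "non-zero" as non-zero file size, which is an assumption. It does not inspect image or transform contents, and `check_artifacts` is a hypothetical helper name.

```python
from pathlib import Path

# File names and transform extensions are taken from the table above.
EXPECTED_FILES = ("t1c_bias_field.nii.gz", "t1c_brain_mask.nii.gz")
TRANSFORM_GLOBS = ("*.h5", "*.mat")


def check_artifacts(artifact_dir):
    """Map each expected artifact to 'ok', 'empty', or 'missing'."""
    root = Path(artifact_dir)
    report = {}
    for name in EXPECTED_FILES:
        path = root / name
        if not path.is_file():
            report[name] = "missing"
        else:
            report[name] = "ok" if path.stat().st_size > 0 else "empty"
    # All transform files are reported together under one key.
    transforms = [p for pattern in TRANSFORM_GLOBS for p in root.glob(pattern)]
    if not transforms:
        report["transforms"] = "missing"
    elif any(p.stat().st_size == 0 for p in transforms):
        report["transforms"] = "empty"
    else:
        report["transforms"] = "ok"
    return report
```

A "missing" entry is only a problem if the step that produces that artifact was expected to have run.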

Step 3: Examine visualization PNGs

Each step's PNG shows before/after comparison. Look for:

| Step | What to check in visualization |
|---|---|
| Data harmonization | Orientation correct? Background removed? |
| Bias field correction | Intensity gradients reduced? |
| Resampling | Resolution changed? No aliasing artifacts? |
| Cubic padding | Volume centered? Correct padding extent? |
| Registration | Modalities aligned? Atlas overlay reasonable? |
| Skull stripping | Mask boundary at brain surface? Tumor included? |
| Intensity normalization | Histogram shifted to expected range? |
| Longitudinal registration | Timepoints aligned? No excessive deformation? |

Step 4: Check specific failure patterns

Registration failures

  • "No reference modality found": Check that reference_modality_priority matches available modalities
  • Poor alignment: Try engine: "antspyx" instead of "nipype" (or vice versa)
  • Divergence: Check if use_center_of_mass_init: true is set
  • ANTs crash: Look for ITK ERROR in logs; check if input volumes have valid affine matrices
  • Quality warning: If correlation dissimilarity > quality_warning_threshold, registration may have failed

Skull stripping failures

  • Over-stripping (tumor removed): HD-BET is more robust to pathology than SynthStrip; try method: "hdbet" with hdbet_mode: "accurate"
  • Under-stripping (skull remaining): Increase SynthStrip border parameter or switch methods
  • GPU OOM: Set hdbet_device: "cpu" or synthstrip_device: "cpu"

Intensity normalization failures

  • All zeros output: Check that brain mask exists and covers tissue
  • Extreme values: Use clip_range: [-5.0, 5.0] for z-score
  • NaN values: Input may contain NaN — check with nib.load(path).get_fdata() for NaN/Inf
  • Wrong mask source: Check logs for "Using brain mask from:" vs "using nonzero voxels as fallback"
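The mask-source check can be automated by searching the log for the two messages quoted above. This sketch assumes a plain-text log at a caller-supplied path, and `mask_sources` is a hypothetical helper name.

```python
from pathlib import Path

# Log phrases quoted in the list above.
MASK_MESSAGE = "Using brain mask from:"
FALLBACK_MESSAGE = "using nonzero voxels as fallback"


def mask_sources(log_path):
    """Return 'mask' or 'fallback' for each log line that names a mask source."""
    sources = []
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if MASK_MESSAGE in line:
                sources.append("mask")
            elif FALLBACK_MESSAGE in line:
                sources.append("fallback")
    return sources
```

Any "fallback" entry means normalization statistics were computed over nonzero voxels instead of the brain mask, which is worth ruling out before changing normalization settings.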

Data harmonization failures

  • Wrong orientation: Check reorient_to matches expected convention (RAS vs LPS)
  • Background not removed: Try method: "otsu_foreground" instead of default
  • NRRD parse error: Check if input file is valid NRRD with nrrd.read(path)
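Before calling `nrrd.read(path)`, a cheap pre-check is to confirm that the file starts with the NRRD magic string (`NRRD0001` through `NRRD0005` in the format specification). The sketch below uses only the standard library and does not validate the rest of the header. `has_nrrd_magic` is a hypothetical helper name.

```python
def has_nrrd_magic(path):
    """Return True if the file begins with the NRRD magic prefix 'NRRD000'."""
    with open(path, "rb") as handle:
        return handle.read(7) == b"NRRD000"
```

A file that fails this check is likely mislabeled or truncated, in which case `nrrd.read` is expected to fail as well.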

Resampling failures

  • ECLARE environment not found: Verify conda_environment_eclare matches installed env
  • GPU OOM with ECLARE: Reduce batch_size or switch to method: "bspline"
  • Extreme anisotropy: Use method: "composite" for volumes with > 5mm slice thickness

Step 5: Check the temp-file mechanism

If a step crashes mid-write, a .tmp.nii.gz file may be left behind:

```bash
find {output_root} -name "*.tmp*"
```

These should be deleted before re-running.
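The cleanup can also be done from Python. This sketch mirrors the `*.tmp*` name pattern above but only considers regular files. `find_temp_files` is a hypothetical helper name, and deletion is opt-in so the list can be reviewed first.

```python
from pathlib import Path


def find_temp_files(output_root, delete=False):
    """List files matching '*.tmp*' under output_root; optionally delete them."""
    leftovers = sorted(p for p in Path(output_root).rglob("*.tmp*") if p.is_file())
    if delete:
        for path in leftovers:
            path.unlink()
    return leftovers
```

Calling it once with `delete=False` and then again with `delete=True` gives a chance to confirm that nothing unexpected matches the pattern.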

Key Debugging Code Locations

| Issue | File to read |
|---|---|
| Step handler logic | `mengrowth/preprocessing/src/steps/{step_name}.py` |
| Brain mask resolution | Search for `brain_mask` in the step handler |
| Registration quality metrics | `mengrowth/preprocessing/src/registration/diagnostic_parser.py` |
| Checkpoint state | `mengrowth/preprocessing/src/checkpoint.py` |
| QC metrics interpretation | `mengrowth/preprocessing/src/config.py` → `QCMetricsConfig` |
| Step execution order | YAML config `steps:` list |
| Config validation errors | `mengrowth/preprocessing/src/config.py` → `__post_init__()` methods |

Quick Config Fixes

| Problem | Config change |
|---|---|
| Step taking too long | Increase `shrink_factor`, reduce iterations |
| Registration not converging | Switch engine, adjust `sampling_percentage`, add transform stages |
| Mask too aggressive | Adjust `fill_value`, `border`, or switch skull stripping method |
| Step crashing on GPU | Set device to `"cpu"` in step config |
| Want to skip a step | Remove it from the `steps:` list or set `method: null` in step config |
| Need to re-run from step N | Remove all outputs from step N onward, or use checkpoint resume |