Explore — MenGrowth Codebase Navigation
## When to Use
- "Where is X implemented?"
- "How does phase/step N work?"
- "Why was patient/file X rejected?"
- "Where would I add a new quality check / modality / plot / preprocessing step?"
- "What calls what?" or "Trace the data flow for ..."
- Any task that requires understanding the codebase before modifying it
## Quick Module Map
```
mengrowth/
  cli/
    curate_dataset.py ← Curation pipeline orchestrator
    preprocess.py ← Preprocessing pipeline CLI
  preprocessing/
    config.py ← Curation config dataclasses (grep for @dataclass)
    utils/
      reorganize_raw_data.py ← Phase 1 (scan_source_baseline, scan_source_controls, copy_and_organize)
      filter_raw_data.py ← Phase 2 (filter_dataset) + Phase 4 (reid_patients_and_studies)
      quality_filtering.py ← Phase 3 (run_quality_filtering, validate_file, _validate_patient)
      metadata.py ← MetadataManager, PatientMetadata, Excel parsing
    quality_analysis/
      analyzer.py ← Phase 5 (QualityAnalyzer.run, scan_dataset, analyze_patient)
      metrics.py ← Per-image metric functions (SimpleITK-based)
      visualize.py ← Phase 6 (QualityVisualizer, generate_html_report)
    src/
      config.py ← Preprocessing config (StepRegistry, StepMetadata, all step configs)
      preprocess.py ← PreprocessingOrchestrator (step execution engine)
      base.py ← BasePreprocessingStep ABC (execute + visualize)
      steps/
        step_registry.py ← Re-exports StepRegistry from config
        data_harmonization.py ← Step 1 handler
        bias_field_correction.py ← Step 2 handler
        resampling.py ← Step 3 handler
        cubic_padding.py ← Step 4 handler
        registration.py ← Step 5 handler
        skull_stripping.py ← Step 6 handler
        intensity_normalization.py ← Step 7 handler
        longitudinal_registration.py ← Step 8 handler
      data_harmonization/ ← NRRD→NIfTI (io.py), reorientation (orient.py), head masking
      bias_field_correction/ ← N4 via SimpleITK (n4_sitk.py)
      resampling/ ← BSpline, ECLARE, Composite
      registration/ ← ANTs registration (nipype + antspyx implementations)
      skull_stripping/ ← HD-BET (hdbet.py), SynthStrip (synthstrip.py)
      normalization/ ← Z-score, KDE, FCM, WhiteStripe, PercentileMinMax, LSQ
      checkpoint.py ← Checkpoint/resume support
```
## Exploration Strategies
### Strategy 1: "Where is feature X?"
| Looking for... | Go to |
|---|---|
| A CLI flag or argument (curation) | cli/curate_dataset.py → argparse section |
| A CLI flag or argument (preprocessing) | cli/preprocess.py → argparse section |
| A curation threshold or default | preprocessing/config.py → search the dataclass name |
| A preprocessing config or default | preprocessing/src/config.py → search the dataclass name |
| A quality check (A1–E1) | preprocessing/utils/quality_filtering.py → search by check ID or function |
| Modality synonym mapping | preprocessing/config.py → RawDataConfig.standardize_modality() |
| Patient ID normalization | preprocessing/utils/reorganize_raw_data.py → extract_patient_id() |
| Re-identification logic | preprocessing/utils/filter_raw_data.py → reid_patients_and_studies() |
| A curation plot or visualization | preprocessing/quality_analysis/visualize.py |
| Clinical metadata field | preprocessing/utils/metadata.py → PatientMetadata dataclass |
| Rejection tracking | Search for rejected_files or rejection_reason across utils/ |
| Step execution level | preprocessing/src/config.py → STEP_METADATA dict |
| Step handler registration | preprocessing/src/preprocess.py → _register_step_handlers() |
| Brain mask resolution | Step handler's _resolve_mask_path() or search for brain_mask |
| Registration diagnostics | preprocessing/src/registration/diagnostic_parser.py |
| Normalization methods | preprocessing/src/normalization/ → one file per method |
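When a grep pattern matches call sites as well as definitions, a short AST walk can narrow the hits to the places where a name is actually defined. A sketch using only the standard-library `ast` module; `find_definitions` is a hypothetical helper, not part of the MenGrowth codebase.

```python
import ast
from pathlib import Path


def find_definitions(root: Path, name: str) -> list[tuple[str, int, str]]:
    """Return sorted (relative_path, line, kind) for each def/class named `name` under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that do not parse as Python
        rel = path.relative_to(root).as_posix()
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
                hits.append((rel, node.lineno, "def"))
            elif isinstance(node, ast.ClassDef) and node.name == name:
                hits.append((rel, node.lineno, "class"))
    return sorted(hits)
```

For example, `find_definitions(Path("mengrowth"), "extract_patient_id")` should point at `reorganize_raw_data.py` if the table above is current. Nested functions are included, which grep on `^def` would miss.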
### Strategy 2: "How does data flow through phase/step N?"
Curation phases:
| Phase | Entry Function | Key Subfunctions |
|---|---|---|
| 1 | reorganize_raw_data() | scan_source_baseline(), scan_source_controls(), copy_and_organize() |
| 2 | filter_dataset() | normalize_study_sequences(), remove_non_required_sequences() |
| 3 | run_quality_filtering() | _validate_patient() → validate_file(), study-level, patient-level |
| 4 | reid_patients_and_studies() | Inline in filter_raw_data.py |
| 5 | QualityAnalyzer.run() | scan_dataset(), analyze_patient(), compute_*_metrics() |
| 6 | QualityVisualizer.generate_all() | Individual plot methods, generate_html_report() |
Preprocessing steps:
| Step | Handler Location | Key Implementation |
|---|---|---|
| 1 | steps/data_harmonization.py | data_harmonization/io.py, orient.py, head_masking/ |
| 2 | steps/bias_field_correction.py | bias_field_correction/n4_sitk.py |
| 3 | steps/resampling.py | resampling/bspline.py, eclare.py, composite.py |
| 4 | steps/cubic_padding.py | Inline padding logic |
| 5 | steps/registration.py | registration/multi_modal_coregistration.py, intra_study_to_atlas.py |
| 6 | steps/skull_stripping.py | skull_stripping/hdbet.py, synthstrip.py |
| 7 | steps/intensity_normalization.py | normalization/zscore.py, kde.py, etc. |
| 8 | steps/longitudinal_registration.py | registration/longitudinal_registration.py |
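The entry functions in these tables are natural starting points for tracing. A sketch that lists the names a given function calls directly, in source order, using the standard-library `ast` module. `direct_calls` is hypothetical, and method calls are reported by attribute name only (`self.scan_dataset()` becomes `scan_dataset`), so it cannot tell two same-named methods apart.

```python
import ast


def direct_calls(source: str, func_name: str) -> list[str]:
    """Names called inside function `func_name`, deduplicated, in source order."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name:
            calls = [n for n in ast.walk(node) if isinstance(n, ast.Call)]
            calls.sort(key=lambda c: (c.lineno, c.col_offset))  # ast.walk is breadth-first
            names: list[str] = []
            for call in calls:
                if isinstance(call.func, ast.Name):          # foo(...)
                    name = call.func.id
                elif isinstance(call.func, ast.Attribute):   # obj.foo(...) or mod.foo(...)
                    name = call.func.attr
                else:                                        # e.g. handlers[key](...)
                    continue
                if name not in names:
                    names.append(name)
            return names
    raise ValueError(f"function {func_name!r} not found")
```

Usage would look like `direct_calls(Path("mengrowth/preprocessing/utils/filter_raw_data.py").read_text(), "filter_dataset")`; repeat on each returned name to walk one level deeper.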
### Strategy 3: "Why was patient/file X rejected?"
- Open `quality/rejected_files.csv`.
- Filter by `patient_id` or `filename`.
- Check the `stage` column:
  - `0` → Phase 1 (reorganization): file excluded by glob pattern, unrecognized modality, or duplicate
  - `1` → Phase 2 (completeness): missing sequences or insufficient studies
  - `2` → Phase 3 (quality): failed a blocking check
- If `stage=2`, cross-reference with `quality/quality_issues.csv` for the specific `check_name`.
- To find the check implementation, search `quality_filtering.py` for the `check_name` value.
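The lookup above can be scripted with the standard-library `csv` module. A sketch assuming the column names listed in the File Format Reference and assuming the `action` column holds the literal string `block` for blocking checks; `explain_rejection` is a hypothetical helper. Quality issues are matched by patient only, since study- and patient-level checks may not map to a single file.

```python
import csv
from pathlib import Path

# Stage codes in rejected_files.csv, as described in the list above
STAGE_TO_PHASE = {
    "0": "Phase 1 (reorganization)",
    "1": "Phase 2 (completeness)",
    "2": "Phase 3 (quality)",
}


def explain_rejection(quality_dir: Path, patient_id: str) -> list[dict]:
    """Collect rejection rows for one patient, attaching blocking checks for Phase 3 rows."""
    with open(quality_dir / "rejected_files.csv", newline="", encoding="utf-8") as f:
        rejected = [r for r in csv.DictReader(f) if r["patient_id"] == patient_id]

    issues: list[dict] = []
    issues_path = quality_dir / "quality_issues.csv"
    if issues_path.exists():
        with open(issues_path, newline="", encoding="utf-8") as f:
            issues = [r for r in csv.DictReader(f) if r["patient_id"] == patient_id]

    report = []
    for row in rejected:
        entry = {
            "filename": row["filename"],
            "phase": STAGE_TO_PHASE.get(row["stage"], f"unknown stage {row['stage']}"),
            "reason": row["rejection_reason"],
            "checks": [],
        }
        if row["stage"] == "2":
            # Assumed: blocking checks are recorded with action == "block"
            entry["checks"] = sorted({i["check_name"] for i in issues if i["action"] == "block"})
        report.append(entry)
    return report
```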
### Strategy 4: "Where do I add a new ___?"
| Adding... | Steps |
|---|---|
| Quality check | 1. Add a config @dataclass in config.py under QualityFilteringConfig. 2. Add a function in quality_filtering.py that returns a ValidationResult. 3. Wire it into validate_file() (file level) or into the study- or patient-level checks. |
| Preprocessing step | 1. Create handler in src/steps/. 2. Add StepMetadata entry in src/config.py. 3. Add config @dataclass in src/config.py. 4. Register handler in preprocess.py → _register_step_handlers(). |
| Modality | Add synonyms in configs/raw_data.yaml → modality_synonyms. If required, add to FilteringConfig.sequences. |
| Source directory | Add scanner function in reorganize_raw_data.py. Call from reorganize_raw_data(). |
| Plot | Add method to QualityVisualizer. Add toggle in PlotConfig. Call from generate_all(). |
| Metric | Add function in metrics.py. Wire into analyze_patient() in analyzer.py. |
| Normalization method | Add class in src/normalization/. Import in preprocess.py. Add to config dispatch. |
| Registration variant | Add implementation in src/registration/. Update factory in factory.py. |
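Several rows above reduce to the same move: register a named implementation so the orchestrator can find it. A generic sketch of that register-then-run idea, which also shows why forgetting step 4 of the preprocessing-step recipe should fail loudly. `StepInfo`, `REGISTRY`, `register_step`, `run_pipeline`, and the `level` strings are hypothetical stand-ins, not the actual `StepRegistry`/`StepMetadata` API in `src/config.py`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StepInfo:
    name: str
    level: str  # granularity the step runs at, e.g. "modality" or "study" (assumed values)
    handler: Callable[[dict], dict]


REGISTRY: dict[str, StepInfo] = {}


def register_step(name: str, level: str):
    """Decorator that records a handler under `name`; duplicate names are an error."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        if name in REGISTRY:
            raise ValueError(f"step {name!r} already registered")
        REGISTRY[name] = StepInfo(name, level, fn)
        return fn
    return decorator


@register_step("cubic_padding", level="modality")
def pad(state: dict) -> dict:
    return {**state, "padded": True}  # placeholder for real work


def run_pipeline(step_names: list[str], state: dict) -> dict:
    # Fail before doing any work if a configured step was never registered
    missing = [s for s in step_names if s not in REGISTRY]
    if missing:
        raise KeyError(f"unregistered steps: {missing}")
    for name in step_names:
        state = REGISTRY[name].handler(state)
    return state
```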
## Key Patterns to Recognize
### Configuration Pattern
Every tunable parameter is a field in a @dataclass with a default value, parsed from YAML:
```python
from dataclasses import dataclass

@dataclass
class SomeCheckConfig:
    enabled: bool = True
    threshold: float = 5.0
    action: str = "block"
```
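A sketch of how a YAML section might be turned into such a dataclass. In practice the dict would come from something like PyYAML's `yaml.safe_load`; here it is passed in directly so the example has no dependencies. `from_mapping` is hypothetical, and whether the real loader rejects unknown keys is an assumption worth checking in `config.py`.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class SomeCheckConfig:
    enabled: bool = True
    threshold: float = 5.0
    action: str = "block"


def from_mapping(cls, data: Optional[dict[str, Any]]):
    """Build a config dataclass from a YAML-derived dict; missing keys keep their defaults."""
    data = data or {}
    known = {f.name for f in fields(cls)}
    unknown = sorted(set(data) - known)
    if unknown:
        # Surfaces typos such as "treshold" instead of silently ignoring them (assumed policy)
        raise ValueError(f"{cls.__name__}: unknown keys {unknown}")
    return cls(**data)
```

For example, `from_mapping(SomeCheckConfig, {"threshold": 8.0})` keeps `enabled=True` and `action="block"` from the defaults.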
### Validation Result Pattern (Curation)
Every quality check returns:
```python
# Schematic: action is either "warn" or "block"
ValidationResult(passed=bool, check_name=str, message=str, action="warn"|"block", details=dict)
```
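A minimal sketch of the pattern end to end: a result container, one check, and the warn/block decision. The dataclass definition, the `min_slices` check, and `is_rejected` are illustrative assumptions; only the five field names come from the pattern above.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class ValidationResult:
    # Field names follow the pattern above; types and defaults are assumptions
    passed: bool
    check_name: str
    message: str
    action: Literal["warn", "block"] = "warn"
    details: dict = field(default_factory=dict)


def check_min_slices(num_slices: int, min_slices: int = 20, action: str = "block") -> ValidationResult:
    """Hypothetical check: flag volumes with too few slices."""
    ok = num_slices >= min_slices
    return ValidationResult(
        passed=ok,
        check_name="min_slices",
        message="ok" if ok else f"{num_slices} slices < {min_slices}",
        action=action,
        details={"num_slices": num_slices, "min_slices": min_slices},
    )


def is_rejected(results: list[ValidationResult]) -> bool:
    """Reject only if a blocking check failed; failed "warn" checks do not reject."""
    return any(not r.passed and r.action == "block" for r in results)
```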
### Step Execution Pattern (Preprocessing)
```python
StepExecutionContext(patient_id, study_dir, modality, paths, orchestrator, step_name, step_config)
```
Handlers receive this context and dispatch to implementation classes (e.g., ZScoreNormalizer, BSplineResampler).
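A toy sketch of the handler-to-implementation dispatch described above. `Context` is a trimmed, hypothetical stand-in for `StepExecutionContext`, the `method` config key is assumed, and the two normalizers work on a list of floats rather than an image, so they only mirror the shape of classes like `ZScoreNormalizer`.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    # Hypothetical, trimmed stand-in for StepExecutionContext
    patient_id: str
    modality: str
    step_name: str
    step_config: dict = field(default_factory=dict)


class ZScore:
    """Toy normalizer on a list of floats (the real classes operate on images)."""

    def __init__(self, **params):
        self.params = params

    def run(self, values: list[float]) -> list[float]:
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
        return [(v - mean) / std for v in values]


class MinMax:
    """Toy normalizer that rescales values to [0, 1]."""

    def __init__(self, **params):
        self.params = params

    def run(self, values: list[float]) -> list[float]:
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        return [(v - lo) / span for v in values]


IMPLEMENTATIONS = {"zscore": ZScore, "minmax": MinMax}


def intensity_normalization_handler(ctx: Context, values: list[float]) -> list[float]:
    params = dict(ctx.step_config)           # copy so the shared config is not mutated
    method = params.pop("method", "zscore")  # assumed config key
    try:
        impl_cls = IMPLEMENTATIONS[method]
    except KeyError:
        raise ValueError(f"{ctx.step_name}: unknown method {method!r}") from None
    return impl_cls(**params).run(values)
```

Keeping the method-to-class table in one place means adding a method touches a single dict rather than the handler's control flow.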
### Parallel Execution Pattern
- CPU-bound → `ProcessPoolExecutor` at patient granularity
- I/O-bound → `ThreadPoolExecutor` at file granularity
- Results are sorted by ID after collection for determinism
- All arguments must be picklable (pure dataclasses, Paths, primitives)
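A sketch of the four rules above in one helper. `PatientJob`, `process_patient`, and `run_parallel` are hypothetical; pass `ThreadPoolExecutor` as `executor_cls` for I/O-bound, file-level work.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor, as_completed
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PatientJob:
    # Only primitives and Paths, so jobs pickle cleanly for worker processes
    patient_id: str
    patient_dir: Path


def process_patient(job: PatientJob) -> tuple[str, int]:
    """Placeholder for real per-patient work; returns (patient_id, len(dir name))."""
    return job.patient_id, len(job.patient_dir.name)


def run_parallel(jobs, worker=process_patient, executor_cls=ProcessPoolExecutor, max_workers=4):
    for job in jobs:
        pickle.dumps(job)  # fail fast in the parent if any job cannot be pickled
    results = []
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, job) for job in jobs]
        for fut in as_completed(futures):
            results.append(fut.result())
    # as_completed yields in completion order, so sort by ID for deterministic output
    return sorted(results, key=lambda r: r[0])
```

With `ProcessPoolExecutor`, call `run_parallel` from under an `if __name__ == "__main__":` guard so worker processes that re-import the module do not start the pool again.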
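### Temp-File Pattern (Preprocessing)
```python
# output_path is the pathlib.Path of the final file
temp_path = output_path.with_suffix('.tmp.nii.gz')  # "x.nii.gz" -> "x.nii.tmp.nii.gz"
# ... write to temp_path ...
temp_path.replace(output_path)  # unlike rename(), also overwrites an existing file on Windows
```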
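The snippet above leaves a stray temp file behind if the writer raises. A sketch of the same pattern wrapped in a context manager that cleans up on failure; `atomic_output` is a hypothetical helper, not a function from the codebase.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def atomic_output(output_path: Path):
    """Yield a temp path next to output_path; move it into place only on success."""
    # Same directory as the target, so the final replace stays on one filesystem
    temp_path = output_path.with_suffix(".tmp.nii.gz")
    try:
        yield temp_path
        temp_path.replace(output_path)
    except BaseException:
        temp_path.unlink(missing_ok=True)  # never leave half-written files behind
        raise
```

A writer would then use it as `with atomic_output(path) as tmp:` followed by, for example, `sitk.WriteImage(image, str(tmp))`, so readers of `path` never see a partially written volume.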
## Grep Cheatsheet
```bash
# Find a quality check by ID
grep -n "check_name.*snr\|B1" mengrowth/preprocessing/utils/quality_filtering.py

# Find all blocking checks
grep -n 'action.*=.*"block"' mengrowth/preprocessing/utils/quality_filtering.py

# Find all curation config dataclasses
grep -n "@dataclass" mengrowth/preprocessing/config.py

# Find all preprocessing config dataclasses
grep -n "@dataclass" mengrowth/preprocessing/src/config.py

# Find step execution levels
grep -n "StepMetadata" mengrowth/preprocessing/src/config.py

# Find where a patient gets rejected
grep -rn "mark_excluded\|rejection_reason" mengrowth/preprocessing/utils/

# Find parallel execution points
grep -rn "ProcessPoolExecutor\|ThreadPoolExecutor" mengrowth/

# Find all CLI flags
grep -n "add_argument" mengrowth/cli/curate_dataset.py mengrowth/cli/preprocess.py

# Find brain mask resolution
grep -rn "brain_mask" mengrowth/preprocessing/src/

# Find normalization implementations
grep -rn "class.*Normalizer" mengrowth/preprocessing/src/normalization/
```
## File Format Reference
| File | Location | Schema |
|---|---|---|
| rejected_files.csv | quality/ | source_path, filename, patient_id, study_name, rejection_reason, source_type, stage |
| quality_issues.csv | quality/ | patient_id, study_id, modality, file_path, check_name, action, message, level, details |
| id_mapping.json | dataset/ | `{P*: {new_id: MenGrowth-*, studies: {old_idx: new_id}}}` |
| quality_metrics.json | quality/ | Hierarchical: patient → study → modality → metric values |
| per_study_metrics.csv | qc_analysis/ | One row per (study, sequence) with all computed metrics |
| metadata_enriched.csv | dataset/ | patient_id, age, sex, ..., included, exclusion_reason, MenGrowth_ID |
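A sketch for translating IDs with `id_mapping.json`, following the schema in the table. `load_id_mapping`, `to_new_ids`, and `to_old_patient` are hypothetical helpers, and the reading of a missing key as "never reached Phase 4" is an assumption.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def load_id_mapping(path: Path) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_new_ids(mapping: dict, old_patient: str, old_study: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Translate an original patient ID (and optionally a study index) to the new IDs."""
    entry = mapping[old_patient]  # KeyError likely means the patient never reached Phase 4
    # JSON object keys are strings, so the study index must be passed as a str
    new_study = entry["studies"][old_study] if old_study is not None else None
    return entry["new_id"], new_study


def to_old_patient(mapping: dict, new_patient: str) -> str:
    """Reverse lookup: new MenGrowth ID -> original patient ID."""
    for old_id, entry in mapping.items():
        if entry["new_id"] == new_patient:
            return old_id
    raise KeyError(new_patient)
```

The ID values you would pass in are placeholders here; check a real `id_mapping.json` for the exact `MenGrowth-*` format.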