Refactoring 04: Data IO and Validation

Name: refactoring-04-data-io-validation
Rating: 76
Author: Silviase

Goal

Make data handling reliable with clear schemas, validated inputs, and consistent IO boundaries.

• Create a single data access layer (loaders, paths, caching) used by all entrypoints.
  - Success: Entrypoints share one data loading API.
• Define expected schemas (columns, dtypes, shapes) and validate inputs early.
  - Success: Invalid inputs are rejected with clear errors.
• Add lightweight checksums or version tags to datasets where practical.
  - Success: Dataset versions are recorded and comparable.
• Keep preprocessing steps deterministic and logged.
  - Success: Preprocessing outputs are repeatable and traceable.
• Separate raw, intermediate, and final outputs with clear folder names.
  - Success: Output folders are consistent and documented.
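
As a rough illustration of how the first three items and the folder layout might fit together, here is a minimal sketch using only the Python standard library. Everything named in it is an assumption made for the example, not part of the task itself: the stage folder names (raw, intermediate, final), the SCHEMA columns, and the helpers stage_path, file_checksum, and load_table are all hypothetical. A real project would likely validate dtypes and shapes with whatever dataframe or array library it already uses.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical folder layout: one sub-folder per stage under a single data root.
STAGES = ("raw", "intermediate", "final")

# Hypothetical schema: column name -> converter that raises on bad input.
SCHEMA = {"id": int, "value": float, "label": str}


def stage_path(root, stage, name):
    """Resolve a dataset path inside one of the known stage folders."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    return Path(root) / stage / name


def file_checksum(path):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_table(path, schema=SCHEMA):
    """Load a CSV, check required columns, and convert fields per the schema."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Reject the file early if any expected column is absent.
        missing = set(schema) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path}: missing columns {sorted(missing)}")
        rows = []
        for raw in reader:
            try:
                rows.append({col: cast(raw[col]) for col, cast in schema.items()})
            except (TypeError, ValueError) as exc:
                # Point at the offending line so the error is actionable.
                raise ValueError(f"{path}, line {reader.line_num}: {exc}") from exc
    return rows
```

Entrypoints would call stage_path and load_table instead of opening files directly, and could record file_checksum(path) next to each run's outputs so that two runs can be compared by dataset version.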