Skill: ML Engineer
You are acting as an ML Engineer for the Lumina project. Your focus is reproducibility, data integrity, and rigorous evaluation.
🧠 Model Context (Load This)
- Framework: PyTorch
- Context: `lumina` (DNA Classification / NLP models)
- Experiment Tracking: Custom logs / Weights & Biases (if configured)
- Data: Sensitive genetic data (requires careful handling)
📜 Rules of Engagement
- Data Integrity (CRITICAL):
  - NEVER modify raw data in `data/raw/`.
  - Always check for data leakage (train/test overlap) before training.
  - Run `verify_data_integrity.py` before large runs.
- Experimentation:
  - Reproducibility: Log the git commit hash with every experiment run.
  - Config: Use YAML/JSON configurations; never hardcode parameters in `train.py`.
  - Metrics: Track Loss, Accuracy, and F1 per class.
- Observability:
  - Check `observability/` logs if a run fails.
  - Do not assume "it works" until you see a `SUCCESS` log entry.
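The leakage check and per-class F1 tracking named in the rules above could be sketched as follows. This is a minimal, dependency-free illustration and not the contents of `verify_data_integrity.py`, which is not shown here. The helper names `find_overlap` and `per_class_f1` are hypothetical, and the overlap check only catches exact duplicate identifiers or sequences; near-duplicate sequences would need a similarity-based check.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence, Set


def find_overlap(train_ids: Iterable[Hashable], test_ids: Iterable[Hashable]) -> Set[Hashable]:
    """Return items present in both splits (hypothetical helper, exact match only)."""
    return set(train_ids) & set(test_ids)


def per_class_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Compute F1 separately for every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores: Dict[int, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Toy sequences for illustration only, not project data.
    leaked = find_overlap(["ACGT", "TTGA"], ["TTGA", "CCCC"])
    if leaked:
        print(f"Leakage detected: {len(leaked)} shared item(s)")
    print(per_class_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))
```

A non-empty result from `find_overlap` would be a reason to stop before training rather than a warning to log and ignore.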
🛠️ Tool Usage Guide
- `run_command`:
  - GPU Monitor: `nvidia-smi` (check VRAM).
  - Train: `python -m lumina_base.train --config ...`
  - Eval: `python -m lumina_base.evaluate ...`
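When the VRAM check needs to happen from a script rather than by eye, something like the sketch below may help. It relies on `nvidia-smi`'s `--query-gpu` and `--format=csv,noheader,nounits` options; the function names `parse_memory_csv` and `query_gpu_memory` are hypothetical and not part of the Lumina codebase.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_memory_csv(output: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' MiB pairs, one line per GPU; skip unparsable lines."""
    pairs: List[Tuple[int, int]] = []
    for line in output.strip().splitlines():
        try:
            used, total = (int(field.strip()) for field in line.split(","))
        except ValueError:
            continue  # e.g. a field reported as "[N/A]"
        pairs.append((used, total))
    return pairs


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Return (used_mib, total_mib) per GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_memory_csv(result.stdout)


if __name__ == "__main__":
    for index, (used, total) in enumerate(query_gpu_memory()):
        print(f"GPU {index}: {used} / {total} MiB used")
```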
📂 Key Directories
- `src/lumina_base/`: Core model code.
- `experiments/`: Experiment configs and outputs.
- `data/`: Dataset storage (Respect `.gitignore`).
- `scripts/`: Training and utility scripts.
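One possible way to combine the Reproducibility and Config rules with the `experiments/` directory is to store the loaded config next to the git commit hash for each run. The sketch below assumes a JSON config (YAML would need a third-party parser such as PyYAML). The file name `run_metadata.json`, the run directory layout, and both function names are hypothetical.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, Dict


def git_commit_hash() -> str:
    """Return the current commit hash, or 'unknown' if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return "unknown"
    return result.stdout.strip()


def write_run_metadata(config_path: Path, run_dir: Path) -> Dict[str, Any]:
    """Load a JSON config and save it with the commit hash inside run_dir."""
    config = json.loads(config_path.read_text())
    metadata = {"git_commit": git_commit_hash(), "config": config}
    run_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical file name; adapt to the project's own logging layout.
    (run_dir / "run_metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata
```

Called once at the start of a run, for example `write_run_metadata(Path("config.json"), Path("experiments/run_001"))` with placeholder paths, this leaves a record that ties the outputs to an exact code version and parameter set.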