ML Model Builder

Name: ml-model-builder
Rating: 92
Author: js1972

Overview

Guide Codex to gather ML requirements, build a baseline, iterate to a stronger model, and save code and artifacts under artefacts/.

•Before running any Python code or installing dependencies, create a venv in the current working directory:
- •python3 -m venv .venv
•Always run Python and pip from the venv:
- •.venv/bin/python, .venv/bin/pip
•Assume implicit approval to install dependencies into the venv. Do not install system-wide packages.
•Keep dependencies minimal and report what was installed.
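
The venv rules above can be sketched as a small helper. This is a minimal illustration, not part of the skill itself: the function names (`venv_paths`, `ensure_venv`, `install`) are hypothetical, the `bin/` layout assumes a POSIX system, and `sys.executable -m venv` stands in for `python3 -m venv`.

```python
import subprocess
import sys
from pathlib import Path


def venv_paths(root: Path) -> dict[str, Path]:
    """Return the venv's python and pip paths (POSIX layout assumed)."""
    bin_dir = root / ".venv" / "bin"
    return {"python": bin_dir / "python", "pip": bin_dir / "pip"}


def ensure_venv(root: Path) -> dict[str, Path]:
    """Create .venv under root if it is missing, then return its executables."""
    paths = venv_paths(root)
    if not paths["python"].exists():
        # Equivalent to running `python3 -m venv .venv` in root.
        subprocess.run([sys.executable, "-m", "venv", str(root / ".venv")], check=True)
    return paths


def install(root: Path, packages: list[str]) -> list[str]:
    """Install packages into the venv only; return the list for reporting."""
    paths = ensure_venv(root)
    if packages:
        # Always the venv's pip, never a system-wide one.
        subprocess.run([str(paths["pip"]), "install", *packages], check=True)
    return list(packages)
```

Returning the package list from `install` gives the caller something concrete to report back to the user.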

•Ask the minimum required inputs before training:
- •dataset location(s) (local path(s) or URL(s))
- •task type
- •evaluation metric (or accept default)
- •split strategy
•Ask for the time column and any entity/group identifier, if present, to choose an appropriate split and CV strategy (see references/defaults.md).
•Ask for a random seed (default in references/defaults.md).
•Ask about task-specific requirements (see references/defaults.md).
•Ask for any domain-specific feature ideas and confirm whether to apply standard feature engineering (date parts, lags, transforms).
•Ask whether to run explainability (SHAP) and whether to change training bounds.
•If multiple dataset locations are provided, ask how to combine them and whether to add a source column.
•Run a quick LLM suitability check (see references/defaults.md). If it triggers, recommend an LLM-based approach and ask whether to proceed with classical ML anyway.
•Confirm defaults when the user does not specify values.
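
One way to track which questions are still open is a small record of the answers collected so far. The sketch below is illustrative only: the `Requirements` class, its field names, and the question wording are assumptions, and the actual default values live in references/defaults.md, which is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Requirements:
    """Answers collected so far; None means the user has not specified a value."""
    dataset_locations: list[str] = field(default_factory=list)
    task_type: Optional[str] = None
    split_strategy: Optional[str] = None
    metric: Optional[str] = None  # falls back to a default if left unset
    seed: Optional[int] = None    # falls back to a default if left unset


def questions_to_ask(req: Requirements) -> list[str]:
    """List the questions that still block training."""
    questions = []
    if not req.dataset_locations:
        questions.append("Where is the dataset (local path or URL)?")
    if req.task_type is None:
        questions.append("What is the task type?")
    if req.split_strategy is None:
        questions.append("Which split strategy should be used?")
    if len(req.dataset_locations) > 1:
        questions.append(
            "How should the datasets be combined, and should a source column be added?"
        )
    return questions


def defaults_to_confirm(req: Requirements) -> list[str]:
    """Name the optional fields that will use a default and so need confirmation."""
    return [name for name in ("metric", "seed") if getattr(req, name) is None]
```

For example, a request that names two files and a task type but nothing else would still need the split strategy and the combination rule, and would need the metric and seed defaults confirmed.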

•Support local CSV/Parquet or HTTP(S) URL to CSV/Parquet only.
•Support multiple files; default to row-wise concat if schemas align.
•Validate that the target column exists (if applicable) and identify feature types.
•If target is derived (threshold or date difference), record the rule in artefacts/config.json.
•Apply default feature engineering where appropriate (see references/defaults.md) and allow the user to override.
•Run a quick leakage guard: flag features that are identical to the target, whose names contain the target name, or that are derived directly from the target.
•Do not attempt authenticated cloud buckets.
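
The source restrictions and the leakage guard above could look roughly like this. It is a standard-library sketch under stated assumptions: `is_supported_source` and `leakage_flags` are hypothetical names, columns are passed as a plain dict of lists rather than a DataFrame, local paths are assumed to be POSIX-style, and the "derived directly from the target" check is left out because it needs domain knowledge that a simple rule cannot supply.

```python
from urllib.parse import urlparse

SUPPORTED_SUFFIXES = (".csv", ".parquet")


def is_supported_source(location: str) -> bool:
    """Accept local or HTTP(S) CSV/Parquet; reject other schemes such as s3://."""
    parsed = urlparse(location)
    if parsed.scheme in ("http", "https"):
        path = parsed.path  # ignores any query string after the file name
    elif parsed.scheme == "":
        path = location     # plain local path
    else:
        return False        # cloud buckets and other schemes are out of scope
    return path.lower().endswith(SUPPORTED_SUFFIXES)


def leakage_flags(columns: dict[str, list], target: str) -> dict[str, str]:
    """Flag features equal to the target or whose name contains the target name."""
    flags = {}
    target_values = list(columns[target])
    for name, values in columns.items():
        if name == target:
            continue
        if list(values) == target_values:
            flags[name] = "identical to target"
        elif target.lower() in name.lower():
            flags[name] = "name contains target name"
    return flags
```

Flagged columns are candidates for review rather than automatic removal, since a name match alone does not prove leakage.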

•Improve preprocessing and model selection.
•Include non-sklearn models when appropriate (e.g., XGBoost, LightGBM, CatBoost). Install them into the venv if needed.
•Expand feature engineering for the iteration stage if it improves the metric and does not introduce leakage.
•Use the agreed metric to pick the best model.
•Respect bounds and stop early after repeated non-improving trials.
•Keep a clear audit trail in artefacts/config.json.
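
The bounded iteration with early stopping described above can be sketched as a simple loop. The function name `bounded_search`, the `max_trials` and `patience` parameters, and their values are illustrative assumptions; the real training bounds are defined in references/defaults.md. `evaluate` is a hypothetical callable that trains one candidate and returns its score on the agreed metric.

```python
from typing import Callable, Iterable


def bounded_search(
    candidates: Iterable[str],
    evaluate: Callable[[str], float],
    max_trials: int = 20,
    patience: int = 5,
    higher_is_better: bool = True,
) -> dict:
    """Try candidates in order, stopping at max_trials or after
    `patience` consecutive trials that fail to beat the best score."""
    best_name, best_score = None, None
    stale = 0
    stopped_early = False
    trials = []  # audit trail, one entry per trial
    for trial, name in enumerate(candidates, start=1):
        if trial > max_trials:
            break
        score = evaluate(name)
        if best_score is None:
            improved = True
        elif higher_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        trials.append({"trial": trial, "candidate": name, "score": score, "improved": improved})
        if improved:
            best_name, best_score, stale = name, score, 0
        else:
            stale += 1
            if stale >= patience:
                stopped_early = True
                break
    return {
        "best": best_name,
        "score": best_score,
        "stopped_early": stopped_early,
        "trials": trials,
    }
```

The returned `trials` list is the kind of record that could be written into artefacts/config.json as the audit trail, and `stopped_early` feeds the final summary.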

•Create artefacts/ if it does not exist.
•Save:
- •train.py
- •infer.py
- •model artifact
- •metrics.json
- •config.json
- •optional SHAP output if requested
•Ensure infer.py uses the same preprocessing as train.py.
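
Writing the JSON outputs listed above might look like the following. The function name `save_run_outputs` and the dictionary contents are placeholders: the actual artifact naming and JSON structure are specified in references/artifacts.md, which is not reproduced here.

```python
import json
from pathlib import Path


def save_run_outputs(root: Path, metrics: dict, config: dict) -> dict[str, Path]:
    """Create artefacts/ under root if needed and write metrics.json and config.json."""
    out_dir = root / "artefacts"
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {
        "metrics": out_dir / "metrics.json",
        "config": out_dir / "config.json",
    }
    # sort_keys keeps the files stable across runs, which makes diffs readable.
    paths["metrics"].write_text(json.dumps(metrics, indent=2, sort_keys=True))
    paths["config"].write_text(json.dumps(config, indent=2, sort_keys=True))
    return paths
```

For the preprocessing requirement, one common approach (an assumption here, not something the skill mandates) is to fit preprocessing and model as a single scikit-learn `Pipeline` and persist that one object, so infer.py loads exactly what train.py fitted.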
•Provide a concise final summary to the user:
- •best model, key metric(s), baseline vs final score
- •split strategy and key preprocessing choices
- •training bounds used and whether early stopping occurred
- •artifact paths
•Also save the same summary to results.md in the project root.
•Include a short, non-expert explanation of each reported metric in the final summary and in results.md.
•Include a data profile summary in results.md (row/column counts, missingness, target distribution or summary).
•Include a brief feature engineering summary in results.md.
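
As a rough illustration of the summary requirements above, the same text can be rendered once and used both for the user-facing message and for results.md. The function names, argument names, and line layout are assumptions, and the sketch covers only a subset of the required items (model, metric, explanation, artifact paths).

```python
from pathlib import Path


def render_summary(
    best_model: str,
    metric: str,
    baseline: float,
    final: float,
    explanation: str,
    artifact_paths: list[str],
) -> str:
    """Build a short summary with a plain-language note on the metric."""
    lines = [
        f"Best model: {best_model}",
        f"{metric}: baseline {baseline:.3f}, final {final:.3f}",
        f"What {metric} means: {explanation}",
        "Artifacts:",
    ]
    lines += [f"- {path}" for path in artifact_paths]
    return "\n".join(lines) + "\n"


def write_results(root: Path, summary: str) -> Path:
    """Save the summary to results.md in the project root."""
    path = root / "results.md"
    path.write_text(summary)
    return path
```

Rendering once and writing the same string avoids the user-facing summary and results.md drifting apart.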

•Defaults, task requirements, baseline and iteration guidance: references/defaults.md
•Artifact naming and JSON structure: references/artifacts.md
•Example prompts and expected clarifications: references/examples.md