Notebook ML Architect
Expert guidance for production-quality ML notebooks.
Quick Reference
| Operation | Use Case |
|---|---|
| audit | Analyze notebook for anti-patterns, leakage, reproducibility issues |
| refactor | Transform notebook into modular Python pipeline |
| template | Generate new notebook from EDA/classification/experiment template |
| report | Create markdown summary from executed notebook |
| convert | Extract Python script from notebook |
Audit Workflow
When auditing a notebook:
- Read the notebook using the Read tool
- Check structure against `ml-workflow-guide.md`
- Detect anti-patterns using `anti-patterns.md`
- Check for data leakage using `leakage-checklist.md`
- Run the analysis script if deeper inspection is needed:

```bash
python scripts/analyze_notebook.py <notebook.ipynb>
```
Audit Checklist
- Execution order: code cells executed top to bottom in sequence (no gaps, nothing out of order)
- Random seeds: set early (`np.random.seed`, `torch.manual_seed`, `random.seed`)
- Imports at top: all imports in the first code cell(s)
- No hardcoded paths: use relative paths or config variables
- Train/test split: clear separation before any modeling
- No data leakage: preprocessing fitted after the split, no peeking at test data
- Modularization: functions/classes for reusable logic
- Dependencies documented: `requirements.txt` or `environment.yml` referenced
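The execution-order item can be checked mechanically, because an `.ipynb` file is JSON in which each code cell stores an `execution_count`. A minimal standard-library sketch; `load_notebook` and `check_execution_order` are hypothetical helpers, not part of `scripts/analyze_notebook.py`:

```python
import json
from typing import Any


def load_notebook(path: str) -> dict[str, Any]:
    """Read an .ipynb file, which is plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_execution_order(nb: dict[str, Any]) -> list[str]:
    """Flag code cells whose execution counts are missing, skip numbers, or go backwards.

    A clean "Restart & Run All" produces counts 1, 2, 3, ... in cell order.
    """
    problems: list[str] = []
    previous = 0
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a string or a list of lines; join handles both.
        if not "".join(cell.get("source", "")).strip():
            continue  # skip empty cells; Jupyter frontends generally leave them without a count
        count = cell.get("execution_count")
        if count is None:
            problems.append(f"cell {index}: never executed")
            continue
        if count != previous + 1:
            problems.append(f"cell {index}: execution_count {count}, expected {previous + 1}")
        previous = count
    return problems
```

Usage: `check_execution_order(load_notebook("my_experiment.ipynb"))` returns an empty list for a notebook saved right after a clean restart-and-run-all.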
Severity Levels
- CRITICAL: Data leakage, missing train/test split, unreproducible results
- HIGH: No seeds, hardcoded paths, execution order issues
- MEDIUM: Missing modularization, no dependency docs
- LOW: Naming conventions, missing comments, style issues
Refactoring Guide
Transform notebooks into production pipelines:
Step 1: Identify Sections
Look for markdown headers that indicate logical sections:
- Data loading
- Preprocessing
- Feature engineering
- Model definition
- Training
- Evaluation
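Since sections are marked by markdown headers, grouping code cells under the header that precedes them can be automated. A minimal standard-library sketch; `split_into_sections` is a hypothetical helper, and it assumes a section starts at the first `#`-style header in a markdown cell:

```python
import re
from typing import Any

# A markdown header line such as "## Data loading"; group 2 is the title.
HEADER = re.compile(r"^(#{1,6})\s+(.*\S)", re.MULTILINE)


def cell_source(cell: dict[str, Any]) -> str:
    """Notebook JSON stores source either as one string or as a list of lines."""
    src = cell.get("source", "")
    return src if isinstance(src, str) else "".join(src)


def split_into_sections(nb: dict[str, Any]) -> dict[str, list[str]]:
    """Map each markdown header title to the non-empty code cells that follow it."""
    sections: dict[str, list[str]] = {"preamble": []}
    current = "preamble"  # code before the first header lands here
    for cell in nb.get("cells", []):
        text = cell_source(cell)
        if cell.get("cell_type") == "markdown":
            match = HEADER.search(text)
            if match:
                current = match.group(2)
                sections.setdefault(current, [])  # repeated titles are merged
        elif cell.get("cell_type") == "code" and text.strip():
            sections[current].append(text)
    return sections
```

Each resulting key is a candidate module or function boundary for the next steps.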
Step 2: Extract Functions
Convert repeated or complex cell code into functions:
```python
import pandas as pd

# Before: inline code
df = pd.read_csv('data.csv')
df = df.dropna()
df['feature'] = df['a'] * df['b']

# After: function
def load_and_prepare_data(path: str) -> pd.DataFrame:
    """Load a CSV, drop rows with missing values, and add an interaction feature."""
    df = pd.read_csv(path)
    df = df.dropna()
    df['feature'] = df['a'] * df['b']
    return df
```
Step 3: Create Module Structure
```
project/
├── data.py       # Data loading and preprocessing
├── features.py   # Feature engineering
├── model.py      # Model definition
├── train.py      # Training loop
├── evaluate.py   # Evaluation metrics
├── config.py     # Configuration parameters
└── main.py       # Pipeline entry point
```
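For `config.py`, one option is a frozen dataclass that holds parameters and builds paths relative to the project root, which also addresses the hardcoded-path audit item. A sketch only; every field name and default below is an illustrative assumption:

```python
from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class Config:
    """Hypothetical pipeline settings shared by data.py, train.py, and so on."""

    project_root: Path = Path(".")
    data_file: str = "data/raw/data.csv"  # relative to project_root, never absolute
    seed: int = 42
    test_size: float = 0.2
    learning_rate: float = 0.01
    epochs: int = 100

    @property
    def data_path(self) -> Path:
        return self.project_root / self.data_file


DEFAULT_CONFIG = Config()

# Per-experiment overrides create a new object; the default stays unchanged.
fast_config = replace(DEFAULT_CONFIG, epochs=5)
```

Freezing the dataclass means a notebook cell cannot silently mutate settings that later cells depend on, which is one source of hidden state.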
Step 4: Use the Convert Script

```bash
python scripts/convert_to_script.py notebook.ipynb output.py --group-by-sections
```
Template Generation
Generate new notebooks from templates:
Available Templates
- EDA Template (`assets/templates/eda_template.ipynb`)
  - Data loading, basic info, missing values, distributions, correlations
- Classification Template (`assets/templates/classification_template.ipynb`)
  - Full supervised learning pipeline with evaluation metrics
- Experiment Template (`assets/templates/experiment_template.ipynb`)
  - Parameterized notebook for experiment tracking
Using Templates
Copy a template into your project and customize it:

```bash
cp ~/.claude/skills/notebook-ml-architect/assets/templates/classification_template.ipynb ./my_experiment.ipynb
```

Alternatively, generate the notebook programmatically and apply the modifications in code.
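Programmatic generation can be as simple as editing the notebook JSON. The sketch below prepends a code cell tagged `parameters`, the tag papermill looks for when injecting parameters. The helper names and parameter values are illustrative; it uses only the standard library, although `nbformat` is the more robust choice when it is installed:

```python
import json
import uuid
from typing import Any


def add_parameters_cell(nb: dict[str, Any], params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `nb` with a papermill-style parameters cell at the top."""
    cell: dict[str, Any] = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["parameters"]},
        "outputs": [],
        "source": [f"{name} = {value!r}\n" for name, value in params.items()],
    }
    if nb.get("nbformat_minor", 0) >= 5:
        cell["id"] = uuid.uuid4().hex[:8]  # nbformat 4.5+ requires a cell id
    new_nb = json.loads(json.dumps(nb))  # deep copy so the template is untouched
    new_nb["cells"] = [cell] + new_nb.get("cells", [])
    return new_nb


def generate_from_template(template_path: str, output_path: str, params: dict[str, Any]) -> None:
    """Read a template notebook, add parameters, and write the new notebook."""
    with open(template_path, encoding="utf-8") as f:
        template = json.load(f)
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(add_parameters_cell(template, params), f, indent=1)
```

For example, `generate_from_template("assets/templates/experiment_template.ipynb", "my_experiment.ipynb", {"learning_rate": 0.01, "epochs": 100})`.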
Reproducibility Checklist
Required Elements
- Random Seeds: Use the reproducibility header snippet:
  ```python
  # Copy from assets/snippets/reproducibility_header.py
  ```
- Environment Capture:
  ```python
  import sys

  print(f"Python: {sys.version}")
  for pkg in ['numpy', 'pandas', 'sklearn', 'torch']:
      try:
          mod = __import__(pkg)
          print(f"{pkg}: {mod.__version__}")
      except ImportError:
          pass  # package not installed in this environment
  ```
- Dependency File:
  ```bash
  pip freeze > requirements.txt
  # Or for conda:
  conda env export > environment.yml
  ```
- Data Versioning:
  - Record data source, download date, preprocessing steps
  - Use relative paths from project root
  - Consider DVC for large datasets
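The contents of `assets/snippets/reproducibility_header.py` are not reproduced here. As an assumption about what such a header typically contains, a minimal version seeds Python, NumPy, and (if installed) PyTorch from a single constant; `set_seed` is a hypothetical name:

```python
import os
import random

import numpy as np

SEED = 42


def set_seed(seed: int = SEED) -> None:
    """Seed the common sources of randomness in an ML notebook."""
    # Only affects subprocesses; this interpreter's hash seed was fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; nothing more to seed
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op when CUDA is unavailable
    # Trade some GPU speed for deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed()
```

Libraries that take their own seed argument (for example scikit-learn's `random_state`) still need it passed explicitly.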
MCP Tool Usage
Context7 - Library API Lookups
When you need accurate API information:
1. Call `resolve-library-id` with the library name.
2. Call `get-library-docs` with the returned ID and topic.
Examples:
- sklearn `train_test_split` parameters
- papermill `execute_notebook` options
- nbformat cell structure
Exa Search - Current Best Practices
When you need up-to-date recommendations:
- Use `web_search_exa` for discovery
- Use `crawling_exa` to pull full content from good URLs
- Use `deep_search_exa` for focused queries
Examples:
- "PyTorch reproducibility best practices 2024"
- "How to handle class imbalance"
- "MLflow notebook integration"
GitHub Search - Real-World Patterns
When you need to see how others do it:
Call `searchGitHub` with:
- `query`: a specific code pattern
- `language`: `["Python"]`
- `path`: `".ipynb"` for notebooks
Examples:
- Production notebook seeding patterns
- Evaluation metric implementations
- Config management in notebooks
Script Reference
analyze_notebook.py
Parse a notebook and extract its structure:

```bash
python scripts/analyze_notebook.py <notebook.ipynb> [--output json|text]
```
Output includes:
- Cell counts by type
- Import statements
- Function/class definitions
- Detected issues
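The internals of `scripts/analyze_notebook.py` are not shown here. To illustrate how cell counts, imports, and definitions can be extracted, the standard-library `ast` module can parse each code cell. Cells containing IPython magics (`%`) or shell escapes (`!`) are not valid Python, so this hypothetical `summarize_code` helper records and skips them:

```python
import ast
from typing import Any


def summarize_code(nb: dict[str, Any]) -> dict[str, Any]:
    """Count cells by type and collect imports and top-level definitions."""
    summary: dict[str, Any] = {
        "cell_counts": {},
        "imports": [],
        "definitions": [],
        "unparsed_cells": [],
    }
    for index, cell in enumerate(nb.get("cells", [])):
        kind = cell.get("cell_type", "unknown")
        summary["cell_counts"][kind] = summary["cell_counts"].get(kind, 0) + 1
        if kind != "code":
            continue
        src = cell.get("source", "")
        src = src if isinstance(src, str) else "".join(src)
        try:
            tree = ast.parse(src)
        except SyntaxError:
            summary["unparsed_cells"].append(index)  # e.g. "%matplotlib inline"
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                summary["imports"].extend(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom):
                summary["imports"].append(node.module or ".")  # "." for bare relative imports
        for node in tree.body:  # top level only, not nested helpers
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                summary["definitions"].append(node.name)
    return summary
```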
run_notebook.py
Execute a notebook with parameters:

```bash
python scripts/run_notebook.py input.ipynb output.ipynb \
  --params '{"learning_rate": 0.01, "epochs": 100}' \
  --timeout 3600
```
convert_to_script.py
Extract Python code from a notebook:

```bash
python scripts/convert_to_script.py notebook.ipynb output.py \
  --include-markdown \
  --group-by-sections \
  --add-main
```
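To show what conversion involves, here is a minimal sketch that writes code cells verbatim and, optionally, markdown cells as `#` comments, roughly the behavior `--include-markdown` describes. It is an illustration, not the implementation of `convert_to_script.py`; the function names are hypothetical, and `jupyter nbconvert --to script` is the standard tool for plain conversion:

```python
import json
from typing import Any


def notebook_to_script(nb: dict[str, Any], include_markdown: bool = True) -> str:
    """Concatenate code cells; optionally keep markdown cells as comments."""
    chunks: list[str] = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        src = src if isinstance(src, str) else "".join(src)
        if not src.strip():
            continue
        if cell.get("cell_type") == "code":
            # Comment out IPython magics and shell escapes, which are not valid Python.
            lines = [
                f"# {line}" if line.lstrip().startswith(("%", "!")) else line
                for line in src.splitlines()
            ]
            chunks.append("\n".join(lines))
        elif include_markdown and cell.get("cell_type") == "markdown":
            chunks.append("\n".join(f"# {line}".rstrip() for line in src.splitlines()))
    return "\n\n\n".join(chunks) + "\n"


def convert_file(notebook_path: str, script_path: str, include_markdown: bool = True) -> None:
    """Read a notebook file and write the extracted script."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(script_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(nb, include_markdown))
```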
Common Issues and Fixes
Data Leakage
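Problem: Preprocessing on the full dataset before the split

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# BAD
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # Fits on all data, including future test rows
X_train, X_test = train_test_split(X_scaled)
```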
Fix: Split first, fit on train only
```python
# GOOD
X_train, X_test = train_test_split(X)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # Fit on training data only
X_test = scaler.transform(X_test)  # Transform only
```
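The same principle applies inside cross-validation: each fold's scaler should see only that fold's training rows. One common way to get this is to wrap preprocessing and model in a scikit-learn `Pipeline`, so `cross_val_score` refits the whole pipeline per fold. A sketch on synthetic data; `LogisticRegression` and the sizes are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42
X, y = make_classification(n_samples=500, n_features=10, random_state=SEED)

# Hold out a test set first; it is not touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED
)

# The scaler is refit inside each CV fold, so validation folds never inform it.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)

# Final fit on the whole training set, then a single evaluation on the test set.
pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(f"CV accuracy: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```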
Hidden State
Problem: Variables from previous runs affect results
```python
# Cell 1, run multiple times
results.append(model.score(X_test, y_test))  # results grows each run
```
Fix: Initialize state in the same cell

```python
results = []  # Always start fresh
results.append(model.score(X_test, y_test))
```
Missing Seeds
Problem: Different results each run
```python
X_train, X_test, y_train, y_test = train_test_split(X, y)  # Different split each run
```
Fix: Set seeds explicitly
```python
SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=SEED)
```
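A quick way to confirm that a split is reproducible is to run it twice with the same seed and compare the results. A sketch with small NumPy arrays standing in for real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 42
X = np.arange(100).reshape(50, 2)  # stand-in features
y = np.arange(50) % 2  # stand-in binary labels

first = train_test_split(X, y, test_size=0.2, random_state=SEED)
second = train_test_split(X, y, test_size=0.2, random_state=SEED)

# With the same seed, all four arrays (X_train, X_test, y_train, y_test) match.
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_split)
```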