SKILL: Jupyter Notebook Patterns
Notebook Structure
Cells should follow this order:
- •Setup — imports, configuration, constants
- •Data Loading — load raw data with validation
- •EDA — exploratory data analysis, distributions, correlations
- •Preprocessing — cleaning, feature engineering
- •Modeling — model training and evaluation
- •Visualization — charts and figures
- •Conclusions — findings summary, next steps
Parameterized Execution with Papermill
python
# In notebook cell tagged with "parameters": # Click: View -> Cell Toolbar -> Tags -> add "parameters" tag dataset = "data/train.csv" # papermill will override this output_dir = "outputs" n_estimators = 100 random_state = 42
bash
# Execute with papermill papermill input.ipynb output.ipynb \ -p dataset "data/test.csv" \ -p n_estimators 200 \ -p random_state 0
Programmatic Notebook Creation
python
import nbformat as nbf
nb = nbf.v4.new_notebook()
nb.cells = [
nbf.v4.new_markdown_cell("# Analysis: {Title}"),
nbf.v4.new_code_cell("import pandas as pd\nimport numpy as np"),
nbf.v4.new_code_cell("df = pd.read_csv('data.csv')\ndf.head()"),
]
with open('analysis.ipynb', 'w') as f:
nbf.write(nb, f)
Export
bash
# To HTML (with outputs) jupyter nbconvert --to html --execute notebook.ipynb # To PDF jupyter nbconvert --to pdf --execute notebook.ipynb # Execute in place jupyter nbconvert --to notebook --execute --inplace notebook.ipynb
Git Hygiene
bash
# Clear outputs before commit jupyter nbconvert --to notebook --ClearOutputPreprocessor.enabled=True \ --inplace notebook.ipynb # Or use nbstripout (installs as git filter) pip install nbstripout nbstripout --install
Rules
- •Restart kernel and run all cells before committing.
- •Clear all outputs before git commit.
- •Tag parameter cells for papermill.
- •Each notebook should be self-contained and reproducible.
- •Include
random_stateparameter for reproducibility. - •Use relative paths for data files.