MLOps Coding - Productionizing Skill

Goal

To convert experimental code (notebooks/scripts) into a high-quality, distributable Python package. This skill enforces the src/ layout, a Hybrid Paradigm (OOP structure + Functional purity), and Strict Configuration to ensure scalability, security, and maintainability.

Prerequisites

•Language: Python
•Manager: uv
•Context: Moving from notebooks/ to src/.

Instructions

1. Packaging Structure (`src` Layout)

Adopt the src layout to prevent import errors and separate source from tooling.

•

Directory Tree:

text

my-project/
├── pyproject.toml       # Dependencies & Metadata
├── uv.lock
├── README.md
└── src/
    └── my_package/      # Main package directory
        ├── __init__.py
        ├── io/          # Side-effects (Datasets, APIs)
        ├── domain/      # Pure business logic (Models, Features)
        └── application/ # Orchestration (Training loops, Inference)

•
Configuration: Use pyproject.toml for all build metadata and dependencies.

2. Modularity & Paradigm (Hybrid Style)

Balance structure with predictability.

•
Domain Layer (Pure):
- •Rule: Code here must be deterministic and free of side effects (no I/O).
- •Use Case: Feature transformations, Model architecture definitions.
- •Style: Functional (pure functions) or Immutable Objects (dataclasses).
•
I/O Layer (Impure):
- •Rule: Isolate external interactions here.
- •Use Case: Loading data from S3, saving models to disk, logging to MLflow.
- •Style: OOP (Classes to manage connections/state).
•
Application Layer (Orchestration):
- •Rule: Wire Domain and I/O together.
- •Use Case: Tuning, Training, Inference, Evaluation, etc.

3. Application Entrypoints

Create standard, installable CLI tools.

•
Define Script: Create src/my_package/scripts.py with a main() function.

•

Register: Add to pyproject.toml:

toml

[project.scripts]
my-tool = "my_package.scripts:main"

•
CLI Execution:
- •Dev: uv run my-tool (No install needed).
- •Prod: pip install . -> my-tool (Installed on PATH).
•
Guard: Always use if __name__ == "__main__": in scripts to prevent execution on import.

4. Configuration Management

Decouple settings from code using OmegaConf (Parsing) and Pydantic (Validation).

•

Define Schema (Pydantic):

•Create a class that defines expected types and defaults.

python

from pydantic import BaseModel

class TrainingConfig(BaseModel):
    batch_size: int = 32
    learning_rate: float = 0.001
    use_gpu: bool = False

•

Parse & Validate (OmegaConf):

•Load YAML, merge with CLI args, and validate against the schema.

python

import omegaconf

# 1. Load YAML
conf = omegaconf.OmegaConf.load("config.yaml")
# 2. Merge with CLI (optional)
cli_conf = omegaconf.OmegaConf.from_cli()
merged = omegaconf.OmegaConf.merge(conf, cli_conf)
# 3. Validate -> Returns a validated Pydantic object
cfg: TrainingConfig = TrainingConfig(**omegaconf.OmegaConf.to_container(merged))

•
Secrets: Use Environment Variables (os.getenv), never commit them.

5. Documentation & Quality

Make code usable and maintainable.

•

Docstrings: Use Google Style docstrings for all modules, classes, and functions.

python

def calculate_metric(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Calculates the accuracy score.

    Args:
        y_true: Ground truth labels.
        y_pred: Predicted labels.

    Returns:
        The accuracy as a float between 0 and 1.
    """

•
Type Hints: Use standard python typing (typing, list[str]) everywhere.

6. Best Practices Summary

•Config != Code: Never hardcode paths or hyperparams; use the Pydantic + OmegaConf pattern.
•Entrypoints are APIs: Design your CLI ([project.scripts]) as the public interface for your automation tools.
•Immutable Core: Keep your domain logic side-effect free; push I/O to the edges.

Self-Correction Checklist

• No Side Effects on Import: Does import my_package run any code? (It shouldn't).
• Src Layout: Is code inside src/?
• Config Safety: Are secrets excluded from pyproject.toml and YAML?
• Typing: Are function signatures fully type-hinted?
• Entrypoints: Is the CLI registered in pyproject.toml?

MLOps Coding - Productionizing Skill

Goal

Prerequisites

Instructions

1. Packaging Structure (src Layout)

2. Modularity & Paradigm (Hybrid Style)

3. Application Entrypoints

4. Configuration Management

5. Documentation & Quality

6. Best Practices Summary

Self-Correction Checklist

1. Packaging Structure (`src` Layout)