MLflow Development Skill
Expert guidance for ML lifecycle management with MLflow, including GenAI/LLM tracking and MCP integration.
Core Concepts
MLflow is an open-source platform for managing the ML lifecycle with four main components:
- Tracking: Log parameters, metrics, and artifacts
- Models: Package and deploy models
- Registry: Centralize model storage and versioning
- Projects: Package reproducible runs
Quick Start
Configuration
import mlflow
# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
# Set experiment
mlflow.set_experiment("my-experiment")
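A hard-coded tracking URI is fine for local work, but the same script often runs against different servers. The sketch below is one possible way to resolve the URI from the `MLFLOW_TRACKING_URI` environment variable and fall back to the local server used above. The helper names `resolve_tracking_uri` and `configure` are hypothetical, not part of MLflow.

```python
import os
from typing import Mapping, Optional

# Same local server as in the configuration snippet above.
DEFAULT_TRACKING_URI = "http://localhost:5000"


def resolve_tracking_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Return MLFLOW_TRACKING_URI if it is set and non-empty, else the local default."""
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or DEFAULT_TRACKING_URI


def configure(experiment_name: str = "my-experiment") -> str:
    """Point MLflow at the resolved URI and select the experiment."""
    # Imported lazily so the resolver can be used without MLflow installed.
    import mlflow

    uri = resolve_tracking_uri()
    mlflow.set_tracking_uri(uri)
    mlflow.set_experiment(experiment_name)
    return uri
```

Calling `configure()` once at program start keeps the rest of the code free of environment-specific settings.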
Autologging
MLflow provides automatic logging for major frameworks:
ML Frameworks
import mlflow
# Scikit-learn
mlflow.sklearn.autolog()
# PyTorch
mlflow.pytorch.autolog()
# TensorFlow/Keras
mlflow.tensorflow.autolog()
GenAI/LLM Providers
import mlflow
# OpenAI
mlflow.openai.autolog()
# Anthropic
mlflow.anthropic.autolog()
# LangChain
mlflow.langchain.autolog()
What gets logged automatically:
- Model parameters and hyperparameters
- Training metrics
- Model artifacts and dependencies
- Tokens, latency, and cost (for LLMs)
- Tool calls and function invocations
Manual Tracking
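import mlflow
num_epochs = 100  # also logged below as the "epochs" parameter
with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_params({"batch_size": 32, "epochs": num_epochs})
    # Log a single metric value
    mlflow.log_metric("train_loss", 0.5)
    # Log a metric at each step (epoch)
    for epoch in range(num_epochs):
        train_loss = train_model()  # placeholder: your training step, returning a float
        mlflow.log_metric("train_loss", train_loss, step=epoch)
    # Log model (`model` is your fitted scikit-learn estimator).
    # `name=` is the MLflow 3 parameter; MLflow 2.x uses `artifact_path=`.
    mlflow.sklearn.log_model(model, name="model")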
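Training configurations are often nested, while `mlflow.log_params` takes a flat dictionary. The following is a minimal sketch of flattening a nested config into dot-separated keys before logging. `flatten_params` is a hypothetical helper, and the dot separator is an assumption chosen because MLflow accepts periods in parameter names.

```python
from typing import Any, Dict, Mapping


def flatten_params(config: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested mappings into a single dict with dot-separated keys."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        full_key = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into nested sections, carrying the key path along.
            flat.update(flatten_params(value, full_key))
        else:
            flat[full_key] = value
    return flat


config = {"optimizer": {"name": "adam", "lr": 0.01}, "epochs": 100}
flat_config = flatten_params(config)
```

Inside an active run, `mlflow.log_params(flat_config)` would then record `optimizer.name`, `optimizer.lr`, and `epochs` as separate parameters.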
GenAI Tracing
Basic Tracing
import mlflow
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
@mlflow.trace
def my_llm_app(question: str) -> str:
    """Traced LLM application"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
# Trace is automatically logged
result = my_llm_app("What is MLflow?")
# Access trace
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=trace_id)
Production Context
# Add production context to the active trace.
# Call this inside a traced function (for example, one decorated with @mlflow.trace).
mlflow.update_current_trace(
    tags={
        "mlflow.trace.session": session_id,
        "mlflow.trace.user": user_id,
        "environment": "production",
    }
)
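Building the tag dictionary in one place makes it harder to forget a field or pass an empty ID. The sketch below assembles the same three tags used above and validates the inputs. `build_trace_tags` and its `extra` parameter are hypothetical, not an MLflow API.

```python
from typing import Dict, Optional


def build_trace_tags(
    session_id: str,
    user_id: str,
    environment: str = "production",
    extra: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Assemble trace tags, rejecting empty session or user IDs."""
    if not session_id or not user_id:
        raise ValueError("session_id and user_id must be non-empty")
    tags = {
        "mlflow.trace.session": str(session_id),
        "mlflow.trace.user": str(user_id),
        "environment": environment,
    }
    if extra:
        # Tag values are stored as strings, so normalize any extras.
        tags.update({key: str(value) for key, value in extra.items()})
    return tags


tags = build_trace_tags("session-123", "user-456", extra={"app_version": "1.4.0"})
```

Inside a traced function, the result can be passed as `mlflow.update_current_trace(tags=tags)`.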
Model Registry
Register and Deploy Models
import mlflow
from mlflow import MlflowClient
client = MlflowClient()
# Register during training (`model` is your fitted scikit-learn estimator)
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        name="model",
        registered_model_name="MyModel",
    )
# Set alias for deployment
client.set_registered_model_alias(
    name="MyModel",
    alias="champion",
    version=1,
)
# Load model by alias
model = mlflow.pyfunc.load_model("models:/MyModel@champion")
Model Versioning
# Load specific version
model = mlflow.pyfunc.load_model("models:/MyModel/2")
# Load by stage (stages are deprecated since MLflow 2.9; prefer aliases in new code)
model = mlflow.pyfunc.load_model("models:/MyModel/Production")
# Transition model stage (deprecated along with stages)
client.transition_model_version_stage(
    name="MyModel",
    version=2,
    stage="Production",
)
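The three URI forms shown in this section (`models:/Name@alias`, `models:/Name/<version>`, `models:/Name/<stage>`) differ only in the selector after the model name. The sketch below is an illustrative parser for exactly those three forms. It is not MLflow's own URI parsing, and `ModelRef` and `parse_model_uri` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    name: str
    version: Optional[int]
    alias: Optional[str]
    stage: Optional[str]


def parse_model_uri(uri: str) -> ModelRef:
    """Split a registry URI into name plus exactly one of version, alias, or stage."""
    prefix = "models:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a model registry URI: {uri!r}")
    rest = uri[len(prefix):]
    if "@" in rest:
        name, alias = rest.split("@", 1)
        ref = ModelRef(name, None, alias, None)
    elif "/" in rest:
        name, selector = rest.split("/", 1)
        if selector.isdigit():
            ref = ModelRef(name, int(selector), None, None)
        else:
            # Any non-numeric selector is treated as a stage name in this sketch.
            ref = ModelRef(name, None, None, selector)
    else:
        raise ValueError(f"missing version, alias, or stage: {uri!r}")
    if not ref.name or (ref.version is None and not ref.alias and not ref.stage):
        raise ValueError(f"incomplete model registry URI: {uri!r}")
    return ref
```

A parser like this can be useful for logging which selector a service was started with, for example to warn when a deployment still relies on a stage rather than an alias.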
MCP Integration
MLflow ships an MCP (Model Context Protocol) server that exposes trace operations to MCP clients:
Installation
# Install as a uv tool (recommended)
uv tool install "mlflow[genai,mcp]>=2.19.0"
# Or add to project
uv add "mlflow[genai,mcp]>=2.19.0"
Starting MLflow MCP Server
# Run directly (after uv tool install)
mlflow mcp run
# With custom tracking URI
MLFLOW_TRACKING_URI=sqlite:///mlruns.db mlflow mcp run
Claude Code Configuration
.mcp.json (project configuration):
{
  "mcpServers": {
    "mlflow": {
      "command": "mlflow",
      "args": ["mcp", "run"],
      "env": {
        "MLFLOW_TRACKING_URI": "sqlite:///mlruns.db"
      }
    }
  }
}
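If a project already has a `.mcp.json` with other servers, editing it by hand risks clobbering them. The following sketch shows one way to add or update only the `mlflow` entry while leaving the rest of the file untouched. The function names are hypothetical, and the entry mirrors the configuration above.

```python
import json
from pathlib import Path
from typing import Any, Dict


def mlflow_mcp_entry(tracking_uri: str = "sqlite:///mlruns.db") -> Dict[str, Any]:
    """Build the server entry shown in the .mcp.json example above."""
    return {
        "command": "mlflow",
        "args": ["mcp", "run"],
        "env": {"MLFLOW_TRACKING_URI": tracking_uri},
    }


def upsert_mcp_config(path: Path, tracking_uri: str = "sqlite:///mlruns.db") -> Dict[str, Any]:
    """Insert or replace the 'mlflow' server, preserving other configured servers."""
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})["mlflow"] = mlflow_mcp_entry(tracking_uri)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Running `upsert_mcp_config(Path(".mcp.json"))` from the project root would create the file if it is missing.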
Environment Variables
| Variable | Required | Description |
|---|---|---|
| MLFLOW_TRACKING_URI | Yes | MLflow tracking server URL or sqlite path |
| MLFLOW_EXPERIMENT_ID | No | Default experiment ID |
| DATABRICKS_HOST | For Databricks | Workspace URL |
| DATABRICKS_TOKEN | For Databricks | Personal access token |
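A quick pre-flight check can catch a missing variable before the MCP server is launched. This is a minimal sketch based only on the table above. `missing_env_vars` and the `use_databricks` flag are hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Variables marked "Yes" and "For Databricks" in the table above.
REQUIRED = ("MLFLOW_TRACKING_URI",)
DATABRICKS_REQUIRED = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_env_vars(
    env: Optional[Mapping[str, str]] = None, use_databricks: bool = False
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    needed = REQUIRED + (DATABRICKS_REQUIRED if use_databricks else ())
    return [name for name in needed if not env.get(name)]
```

A launcher script could call `missing_env_vars()` and exit with a clear message if the returned list is non-empty.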
MCP Tools Available
The MLflow MCP server exposes these tools:
| Tool | Purpose |
|---|---|
| search_traces | Search traces with filters (experiment_id, tags, timestamps) |
| get_trace | Get detailed trace info including spans, inputs, outputs |
| log_feedback | Log feedback scores (accuracy, quality, custom) |
| log_expectation | Log expected values for trace evaluation |
| evaluate_traces | Run automated evaluation with scorers |
| list_scorers | List available evaluation scorers |
| register_llm_judge | Create custom LLM-based scorer |
| set_trace_tag / delete_trace_tag | Manage trace metadata |
| delete_traces | Clean up traces by criteria |
Workflow Example:
1. search_traces → Find traces to evaluate
2. evaluate_traces → Run built-in scorers (Correctness, Safety, etc.)
3. log_feedback → Add human feedback
4. get_trace → Inspect detailed results
Best Practices
- Organize Experiments: Use hierarchical naming and tags
- Version Everything: Use Git versioning for GenAI apps
- Add Production Context: Always include session, user, and environment info
- Monitor Costs: Track token usage and estimated costs
- Regular Evaluation: Run evaluations with multiple scorers
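For the cost-monitoring practice, an estimate can be derived from token counts and a price table. The sketch below shows the arithmetic only. The prices are made-up placeholders, not real provider pricing, and `TokenPrice` and `estimate_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Price in USD per one million tokens (placeholder values, not real pricing)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Estimate the USD cost of one call from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * price.input_per_million + output_tokens * price.output_per_million
    return total / 1_000_000


example_price = TokenPrice(input_per_million=1.0, output_per_million=2.0)
cost = estimate_cost(input_tokens=1000, output_tokens=500, price=example_price)
```

Within an active run, the result could be recorded with `mlflow.log_metric("estimated_cost_usd", cost)` so that cost appears next to the other metrics.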
Cross-Skill Integration
Marimo Dashboard for Experiments
Build interactive experiment dashboards:
import marimo as mo
import mlflow
# Cell 1: experiment selector (display it so the user can pick a value)
experiments = mlflow.search_experiments()
exp_select = mo.ui.dropdown(
    options={e.name: e.experiment_id for e in experiments},
    label="Select Experiment",
)
exp_select
# Cell 2: display runs for the selected experiment
mo.stop(exp_select.value is None, mo.md("Select an experiment to list its runs."))
runs_df = mlflow.search_runs(experiment_ids=[exp_select.value])
mo.ui.table(runs_df, selection="single", label="Experiment Runs")
PINA Training Tracking
Track physics-informed neural network training:
import mlflow
from pina import Trainer
from pina.callbacks import MetricTracker
mlflow.set_experiment("pina-experiments")
# `solver` is your configured PINA solver
with mlflow.start_run():
    mlflow.log_params({"layers": [64, 64], "activation": "Tanh"})
    trainer = Trainer(solver, max_epochs=1000, callbacks=[MetricTracker()])
    trainer.train()
    # Log PINA metrics (values may be tensors, so convert to float)
    for key, value in trainer.callback_metrics.items():
        mlflow.log_metric(key, float(value))
    mlflow.pytorch.log_model(solver.model, name="pinn")
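Values in `trainer.callback_metrics` are typically tensors rather than plain floats, and some entries may not be numeric at all. The sketch below shows one possible way to normalize such a mapping before logging. `to_loggable_metrics` is a hypothetical helper, and dropping non-finite values is a design choice of this sketch, not an MLflow requirement.

```python
import math
from typing import Any, Dict, Mapping


def to_loggable_metrics(metrics: Mapping[str, Any]) -> Dict[str, float]:
    """Convert metric values to floats, skipping anything non-numeric or non-finite."""
    loggable: Dict[str, float] = {}
    for key, value in metrics.items():
        try:
            # float() works for Python numbers and single-element tensors.
            number = float(value)
        except (TypeError, ValueError, RuntimeError):
            continue
        if math.isfinite(number):
            loggable[key] = number
    return loggable


cleaned = to_loggable_metrics({"train_loss": 0.5, "epoch": 3, "note": "n/a", "bad": float("nan")})
```

The cleaned mapping can then be logged in one call with `mlflow.log_metrics(cleaned)` inside the active run.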
Using context7 for Documentation
Query up-to-date MLflow documentation directly:
# context7 Library IDs (no resolve needed):
# - /mlflow/mlflow (official docs, 9559 snippets)
# - /websites/mlflow (website docs, 36205 snippets)
# Example: query-docs("/mlflow/mlflow", "mlflow.trace decorator usage")
When to Use This Skill
✅ Use MLflow when:
- Tracking ML experiments and hyperparameters
- Building and deploying LLM/GenAI applications
- Managing model lifecycle and versioning
- Comparing model performance across iterations
- Needing production observability for GenAI apps
- Collaborating on ML projects
❌ Don't use MLflow when:
- Writing simple one-off scripts without iteration
- Experiment tracking and model versioning are not needed
- Already using platform-specific tools (e.g., SageMaker Experiments)
Reference Documentation
For detailed guides, see the references folder:
- GenAI Tracking: Complete LLM tracking guide with all providers, Git versioning, token tracking, and trace querying
- Framework Integrations: LangChain, LlamaIndex, DSPy, CrewAI, and ML framework integration examples
- Evaluation: GenAI evaluation, built-in scorers, custom scorers, and model comparison
- Production Deployment: FastAPI integration, async logging, Kubernetes deployment, and monitoring
- Model Registry: Complete model registry guide with versioning, aliases, stages, and deployment patterns
Example Templates
Ready-to-use templates in the examples folder:
- langchain_tracking.py: LangChain with MLflow autologging, covering chains, agents, and production patterns
- fastapi_tracing.py: FastAPI production tracing with streaming, batch processing, and health checks
- hyperparameter_tuning.py: Grid search, random search, Optuna integration, and nested runs
Resources
- Documentation: https://mlflow.org/docs/latest/
- GitHub: https://github.com/mlflow/mlflow
- GenAI Guide: https://mlflow.org/docs/latest/genai/
- Tracking API: https://mlflow.org/docs/latest/tracking.html
- Model Registry: https://mlflow.org/docs/latest/model-registry.html