5.6. Model Registries
What is a model registry?
A model registry is a centralized repository designed to manage the lifecycle of machine learning models. It acts as a version control system for models, tracking their journey from training and experimentation to staging and production deployment. This makes it an indispensable tool for collaborative and scalable MLOps.
Why is a model registry essential?
A model registry provides critical governance and operational capabilities:
- Centralized Storage and Versioning: It offers a single source of truth for all models, allowing you to store, version, and retrieve them systematically. If a new model version performs poorly, you can instantly roll back to a previous one.
- Complete Lineage Tracking: It records the entire history of a model, including the data it was trained on, the code used, hyperparameters, and performance metrics. This ensures full reproducibility and auditability.
- Streamlined Deployment: It simplifies the transition of models from development to production environments, often integrating with CI/CD pipelines to automate deployment workflows.
- Clear Governance and Promotion: It establishes a formal process for promoting models through different stages (e.g., from "Staging" to "Production"), ensuring that only validated and approved models are deployed.
Which model registry solution should you use?
The right choice depends on your existing ecosystem and requirements.
- Cloud-Based Platforms: Major cloud providers offer tightly integrated solutions, such as Google Vertex AI, AWS SageMaker, and Azure ML.
- Third-Party Solutions: Platforms like Weights & Biases and Neptune AI provide comprehensive experiment tracking and model management features.
- Open-Source: MLflow Model Registry is a popular, framework-agnostic option that you can host yourself.
To begin with MLflow, install it in your project:
uv add mlflow
Then, verify the installation and start the tracking server:
uv run mlflow doctor
uv run mlflow server
What is the difference between an MLflow model and a registered model?
An MLflow Model is the output of a training run, logged during an MLflow experiment using a command like mlflow.sklearn.log_model(). Think of it as a saved artifact.
A Registered Model is a more formal entity. When you "register" an MLflow Model, you give it a unique name in the registry. This registered model then acts as a container for all its different versions, allowing you to manage its lifecycle, assign aliases, and track its deployment status.
How do you integrate the MLflow Registry into your project?
Integrating the MLflow Model Registry involves four main steps: initializing, saving, registering, and loading.
1. Initializing
First, configure MLflow to know where to store its data. For local development, you can point both the tracking and registry URIs to a local directory.
import mlflow

# Set the location for MLflow to store experiment runs and artifacts
mlflow.set_tracking_uri("./mlruns")
# Set the location for the model registry (defaults to the tracking URI)
mlflow.set_registry_uri("./mlruns")

# Create a registered model name (only needs to be done once)
client = mlflow.tracking.MlflowClient()
try:
    client.create_registered_model("bikes")
except mlflow.exceptions.MlflowException:
    pass  # Model already exists
2. Saving
Next, log your model during a training run. You can do this manually or use autologging for convenience.
import mlflow

with mlflow.start_run(run_name="training") as run:
    model = ...  # Your model training logic
    # Log the model, which returns its metadata
    model_info = mlflow.sklearn.log_model(model, "models")
3. Registering
Once the model is logged, register it to a specific name in the registry. This creates a new version.
# model_info is the output from the log_model call in the previous step
model_version = mlflow.register_model(
    model_uri=model_info.model_uri,
    name="bikes",
)
print(f"Model version {model_version.version} registered.")
4. Loading
Finally, load a specific model version from the registry for inference or testing.
# Load version 1 of the "bikes" model
model_uri = "models:/bikes/1"
model = mlflow.sklearn.load_model(model_uri)
predictions = model.predict(data)
How do you define a model's input/output schema?
A model signature explicitly defines the schema of a model's inputs and outputs. This is crucial for validation and creating a clear contract for how the model should be used.
import mlflow
from mlflow.models.signature import infer_signature

# X_train and y_train are your training features and targets
signature = infer_signature(X_train, y_train)

mlflow.sklearn.log_model(
    model,
    artifact_path="models",
    signature=signature,
    input_example=X_train.head(5),  # Log an example for UI visualization
)
How do you access models in the registry programmatically?
You can interact with the registry using the MlflowClient to search for models, retrieve versions, and manage their stages.
import mlflow

client = mlflow.tracking.MlflowClient()

# Search for all versions of the "bikes" model
model_versions = client.search_model_versions("name='bikes'")
for mv in model_versions:
    print(f"Version: {mv.version}, Stage: {mv.current_stage}, URI: {mv.source}")
How do you promote a model to production?
Aliases are the recommended way to manage model deployments. An alias is a mutable, named pointer to a specific model version. Instead of hardcoding version numbers in your applications, you point to an alias like "champion" or "production".
Assigning an Alias
You can assign an alias to a model version that has been tested and is ready for deployment.
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.set_registered_model_alias(
    name="bikes",
    alias="champion",
    version=1,  # The version number you want to promote
)
Loading a Model Using an Alias
Your production application can then load the model using this stable alias.
import mlflow
model_uri = "models:/bikes@champion"
champion_model = mlflow.pyfunc.load_model(model_uri=model_uri)
predictions = champion_model.predict(inputs)
This way, updating the production model is as simple as reassigning the "champion" alias to a new version, with no changes needed in the client application.
How do you roll back a problematic model?
If the "champion" model is not performing as expected, rolling back is straightforward. Simply reassign the alias to a previously known stable version.
# Reassign the alias to a previous, stable version (e.g., version 2)
client.set_registered_model_alias(name="bikes", alias="champion", version=2)
Your application will automatically pick up the old version the next time it loads the model from the models:/bikes@champion URI, effectively rolling back the change.
How can you package custom logic with your model?
MLflow's PyFunc flavor lets you create a custom model class, enabling you to bundle preprocessing or post-processing logic with your model. This ensures that your custom logic is always executed alongside the model.
import mlflow
import mlflow.pyfunc


class PreprocessingModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load the trained model artifact bundled with this PyFunc model
        self.model = mlflow.sklearn.load_model(context.artifacts["model_path"])

    def predict(self, context, model_input):
        # Apply custom preprocessing: impute missing values with column means
        processed_input = model_input.apply(lambda col: col.fillna(col.mean()))
        # Return predictions from the underlying model
        return self.model.predict(processed_input)


# Save the custom model with the original model as an artifact
mlflow.pyfunc.save_model(
    path="custom_model",
    python_model=PreprocessingModel(),
    artifacts={"model_path": "path/to/your/sklearn/model"},
)