7.0. Reproducibility

What is reproducibility in MLOps?

Reproducibility in MLOps is the ability to re-create the exact same results of a machine learning experiment or model, given the same code, data, and environment. This is a fundamental requirement for validating findings, debugging models, and ensuring consistent behavior over time. Achieving reproducibility builds trust and transparency, enabling independent verification and accelerating development by providing a stable foundation.

What is the difference between reproducibility and replicability?

While often used interchangeably, these terms have distinct meanings in a scientific context:

  • Reproducibility means obtaining the same results using the same code and data. It is a direct validation of the experimental process. If you run the same script on the same dataset, you should get the exact same model artifact or evaluation metric.
  • Replicability means obtaining consistent results and conclusions across different studies that aim to answer the same scientific question, often with different code, data, or experimental setups. It validates the scientific finding itself.

In MLOps, the primary focus is on reproducibility, as it forms the basis for reliable and auditable systems.

Why is reproducibility crucial in MLOps?

Reproducibility is a cornerstone of scientific rigor and operational excellence in machine learning. Its importance stems from several key factors:

  • Trust and Validation: It proves that results are not due to chance or a specific, unrecorded setup. This builds confidence in the model's reliability.
  • Debugging and Iteration: When a model's performance degrades, a reproducible workflow allows you to trace the exact changes that caused the issue, enabling rapid fixes.
  • Collaboration: Team members can confidently build upon each other's work, knowing that the results are verifiable and stable.
  • Regulatory Compliance: In industries like finance and healthcare, regulatory bodies often require a complete audit trail. Reproducibility provides a transparent and verifiable record of how a model was built and validated.
  • Knowledge Transfer: It ensures that insights from past experiments are preserved and can be accurately revisited for future projects.

How can you implement reproducibility in your MLOps projects?

Achieving reproducibility requires a systematic approach that combines specific tools and best practices:

  • Environment Management: Use tools like Docker or uv to create isolated and consistent environments. This ensures that the Python version, system libraries, and all dependencies are identical across every run.
  • Code Versioning: Employ Git to track every change to your codebase. A specific Git commit hash should correspond to a unique version of your model training script and supporting modules.
  • Data Versioning: Track the datasets used for training and evaluation. Tools like DVC or MLflow Data allow you to version datasets, ensuring you can always access the exact data used in an experiment.
  • Randomness Control: Machine learning involves inherent randomness (e.g., weight initialization, data shuffling). Control this by setting fixed random seeds in your code for all libraries that have stochastic elements.
  • Experiment Tracking: Use tools like MLflow to meticulously log all experiment details, including parameters, metrics, artifacts (like models), and the versions of code and data used (a short sketch combining seeding and tracking follows this list).
  • Automated Pipelines: Define your entire workflow—from data preprocessing to model training and evaluation—as code in an automated pipeline. This ensures every step is executed in the correct order and with the correct configuration.
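
As a simple illustration of several of these practices working together, the sketch below fixes a random seed, trains a model, and records parameters, metrics, and the model artifact with MLflow. The dataset, parameter values, and experiment name are illustrative assumptions rather than part of any specific project.

import random

import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

SEED = 42  # fixed seed so reruns produce the same splits and model

# Randomness control: seed every stochastic component used here
random.seed(SEED)
np.random.seed(SEED)

# Illustrative synthetic dataset (replace with your versioned data)
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)

# Experiment tracking: log parameters, metrics, and the model artifact
mlflow.set_experiment("reproducibility-demo")  # hypothetical experiment name
with mlflow.start_run(run_name="baseline"):
    params = {"n_estimators": 100, "max_depth": 5, "random_state": SEED}
    model = RandomForestRegressor(**params).fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    mlflow.log_params(params)
    mlflow.log_metric("test_mse", mse)
    mlflow.sklearn.log_model(model, "model")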

How can you control randomness in AI/ML frameworks?

By setting a specific seed, you ensure that random number generators produce the same sequence of numbers every time. This leads to consistent results across different executions.

Here is how you can fix randomness for several popular machine learning frameworks.

Python

import random

random.seed(42)

NumPy

import numpy as np

np.random.seed(42)
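
If you use the newer NumPy Generator API, you can also create a seeded generator explicitly instead of relying on the legacy global state (the variable name below is just illustrative):

rng = np.random.default_rng(42)  # local generator with a fixed seed
sample = rng.normal(size=3)      # same values on every run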

Scikit-learn

Many Scikit-learn models and functions accept a random_state parameter.

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(random_state=42)

PyTorch

import torch

torch.manual_seed(42)

For CUDA operations, you should also set the seed for all GPUs:

if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)

For full reproducibility, you may need to disable certain non-deterministic algorithms in cuDNN, though this can impact performance:

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
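
Recent PyTorch releases also provide a single switch that forces deterministic implementations and raises an error when an operation has none (some CUDA operations additionally require the CUBLAS_WORKSPACE_CONFIG environment variable to be set):

torch.use_deterministic_algorithms(True)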

TensorFlow

import tensorflow as tf

tf.random.set_seed(42)
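
When a project uses several of these frameworks at once, it is common to gather the calls above into a single helper that is invoked at the start of every run. A minimal sketch (the function name and default seed are illustrative):

import random

import numpy as np


def set_global_seed(seed: int = 42) -> None:
    """Seed the common sources of randomness used by the project."""
    random.seed(seed)
    np.random.seed(seed)

    # Seed optional frameworks only if they are installed
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

    try:
        import tensorflow as tf

        tf.random.set_seed(seed)
    except ImportError:
        pass


set_global_seed(42)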

How can you build deterministic Python packages for reproducibility?

A deterministic build ensures that the same source code always produces a bit-for-bit identical package (wheel). This is vital for guaranteeing that a deployed application is exactly the same as the one tested.

To achieve this with uv, you can use a justfile to define a constrained build process:

# run package tasks
[group('package')]
package: package-build

# build package constraints
[group('package')]
package-constraints constraints="constraints.txt":
    uv pip compile pyproject.toml --generate-hashes --output-file={{constraints}}

# build python package
[group('package')]
package-build constraints="constraints.txt": clean-build package-constraints
    uv build --build-constraint={{constraints}} --require-hashes --wheel

The --build-constraint flag forces uv to use the exact dependency versions specified in constraints.txt, while --require-hashes validates that each package matches its expected hash, preventing any variation.
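
To check that two builds of the same commit really are bit-for-bit identical, you can compare the checksums of the produced wheels. A minimal sketch (the dist/ directory and the idea of running it after each build are assumptions):

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


# Print a digest per wheel; rebuild and compare the output across runs
for wheel in sorted(Path("dist").glob("*.whl")):
    print(wheel.name, sha256_of(wheel))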

How can you use MLflow Projects to enforce reproducibility?

MLflow Projects provides a standard format for packaging and running data science code, making it highly reusable and reproducible. By defining a project in an MLproject file, you specify its environment, dependencies, and entry points, ensuring consistent execution anywhere.

Defining an MLflow Project

Create an MLproject file in your project's root directory with the following YAML structure. This example is from the mlops-python-package template:

# https://mlflow.org/docs/latest/projects.html

name: bikes
python_env: python_env.yaml
entry_points:
  main:
    parameters:
      conf_file: path
    command: "PYTHONPATH=src python -m bikes {conf_file}"

  • name: Defines the project's name.
  • python_env: Points to the environment definition file (here python_env.yaml, which pins the Python version and pip dependencies).
  • entry_points: Defines runnable workflows. The main entry point here runs the bikes module, passing a configuration file as a parameter.

Executing an MLflow Project

Run the project from the command line:

mlflow run --experiment-name=bikes --run-name=Training -P conf_file=confs/training.yaml .

This command executes the project located in the current directory (.), automatically sets up the specified environment, and passes the training configuration file to the main entry point.
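
The same project can also be launched programmatically through the MLflow Projects API, which is convenient from orchestration or test code. A minimal sketch mirroring the command above (argument values are the same illustrative ones):

import mlflow.projects

# Programmatic equivalent of the `mlflow run` command shown above
submitted = mlflow.projects.run(
    uri=".",                                      # project in the current directory
    entry_point="main",
    parameters={"conf_file": "confs/training.yaml"},
    experiment_name="bikes",
    synchronous=True,                             # wait for the run to finish
)
print(submitted.run_id)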

Benefits of Using MLflow Projects

  • Automated Environment Setup: MLflow automatically creates the correct environment before running the code.
  • Standardized Execution: Ensures the project runs the same way on any machine.
  • Simplified Sharing: Makes it easy to share projects with colleagues, knowing they can run them without manual setup.
  • Enhanced Collaboration: Provides a common framework for teams to build and execute ML workflows.
