5.5. AI/ML Experiments
What is an AI/ML experiment?
An AI/ML experiment is a structured process where data scientists and machine learning engineers systematically apply various algorithms, tune parameters, and manipulate datasets to develop predictive models. The goal is to find the most effective model that solves a given problem or enhances the performance of existing solutions. These experiments are iterative, involving trials of different configurations to evaluate their impact on model accuracy, efficiency, and other relevant metrics.
Why do you need AI/ML experiments?
- Improve reproducibility: Reproducibility is crucial in AI/ML projects to ensure that experiments can be reliably repeated and verified by others. Experiment tracking helps in documenting the setup, code, data, and outcomes, making it easier to replicate results.
- Find the best hyperparameters: Hyperparameter tuning is a fundamental step in enhancing model performance. AI/ML experiments allow you to systematically test different hyperparameter configurations to identify the ones that yield the best results.
- Assign tags to organize your experiments: Tagging helps in categorizing experiments based on various criteria such as the type of model, dataset used, or the objective of the experiment. This organization aids in navigating and analyzing experiments efficiently.
- Track the performance improvement during a run: Continuous monitoring of model performance metrics during experiment runs helps in understanding the impact of changes and guiding further adjustments.
- Integrate with several AI/ML frameworks: Experiment tracking tools support integration with popular AI/ML frameworks, streamlining the experimentation process across different environments and tools.
AI/ML experimentation is distinct in MLOps due to the inherent complexity and non-deterministic nature of machine learning tasks. Leveraging experiment tracking tools equips teams with a structured approach to manage this complexity, akin to how scientists document their research findings.
Which AI/ML experiment solution should you use?
There are various AI/ML experiment tracking solutions available, each offering unique features. Major cloud providers like Google Cloud (Vertex AI), Azure (Azure ML), and AWS (SageMaker) offer integrated MLOps platforms that include experiment tracking capabilities. There are also dedicated vendor tools, such as Weights & Biases and Neptune AI, that specialize in experiment tracking. Among the open-source options, MLflow stands out as a robust and versatile choice for tracking experiments, integrating with a wide range of ML libraries and frameworks.
To install MLflow, execute:
poetry add mlflow
To verify the installation and start the MLflow server:
poetry run mlflow doctor
poetry run mlflow server
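By default, the server stores run metadata in the local ./mlruns folder. If you prefer a database-backed store (which the model registry requires), a common variation is the following; this is an illustrative example, not a requirement:
poetry run mlflow server --backend-store-uri sqlite:///mlflow.db --host 127.0.0.1 --port 5000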
For Docker Compose users, the following configuration can launch an MLflow server from a docker-compose.yml file:
services:
  mlflow:
    image: ghcr.io/mlflow/mlflow:v2.11.0
    ports:
      - 5000:5000
    environment:
      - MLFLOW_HOST=0.0.0.0
    command: mlflow server
Run docker compose up to start the server.
Further deployment details for broader access can be found in MLflow's documentation.
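Once a server is running, point the MLflow client at it instead of a local folder. A minimal sketch, with the host and port matching the Docker Compose example above:
import mlflow

# Connect the client to the tracking server exposed on port 5000.
mlflow.set_tracking_uri("http://localhost:5000")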
How should you configure MLflow in your project?
Configuring MLflow in your project enables efficient tracking of experiments. Initially, set the tracking and registry URIs to a local folder such as ./mlruns, and specify the experiment name. Enabling autologging will automatically record metrics, parameters, and models without manual instrumentation.
import mlflow
mlflow.set_tracking_uri("./mlruns")
mlflow.set_registry_uri("./mlruns")
mlflow.set_experiment(experiment_name="Bike Sharing Demand Prediction")
mlflow.autolog()
To begin tracking, wrap your code in an MLflow run context, specifying details like the run name and description:
with mlflow.start_run(
    run_name="Demand Forecast Model Training",
    description="Training with enhanced feature set",
    log_system_metrics=True,
) as run:
    # Your model training code here
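As an illustration of what goes inside the run context, here is a minimal, self-contained sketch using scikit-learn with autologging; the synthetic dataset and RandomForestRegressor are placeholders, not the actual bike sharing training code:
import mlflow
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

mlflow.set_experiment("Bike Sharing Demand Prediction")
mlflow.autolog()  # records parameters, metrics, and the model automatically

# Synthetic data standing in for the bike sharing features and target.
X, y = make_regression(n_samples=500, n_features=8, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run(
    run_name="Demand Forecast Model Training",
    description="Training with enhanced feature set",
) as run:
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)  # autologging captures hyperparameters and training metrics
    test_score = model.score(X_test, y_test)
    mlflow.log_metric("test_r2", test_score)  # manual metric alongside the autologged ones
    print(f"Run {run.info.run_id} finished with test R2 = {test_score:.3f}")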
Which information can you track in an AI/ML experiment?
MLflow's autologging capability simplifies experiment tracking by automatically recording a wide range of information. You can complement autologging by manually logging additional information (a combined sketch follows the list below):
- Parameters with mlflow.log_param() for individual key-value pairs, or mlflow.log_params() for multiple parameters.
- Metrics using mlflow.log_metric() for single key-value metrics, capturing the evolution of metrics over time, or [mlflow.log_metrics()](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_metrics) for multiple metrics.
- Input datasets and context with mlflow.log_input(), including tags for detailed categorization.
- Tags for the active run through mlflow.set_tag() for single tags or mlflow.set_tags() for multiple tags.
- Artifacts such as files or directories with mlflow.log_artifact() or mlflow.log_artifacts() for logging multiple files.
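The sketch below combines these manual logging calls in a single run; the parameter names, metric values, and toy DataFrame are purely illustrative:
import pandas as pd

import mlflow

with mlflow.start_run(run_name="Manual Logging Example"):
    # Parameters: one at a time or several at once
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_params({"n_estimators": 200, "max_depth": 8})

    # Metrics: pass a step to capture their evolution over time
    for epoch, loss in enumerate([0.9, 0.7, 0.6]):
        mlflow.log_metric("train_loss", loss, step=epoch)

    # Input dataset with a context label
    df = pd.DataFrame({"feature": [1, 2, 3], "target": [0, 1, 0]})
    dataset = mlflow.data.from_pandas(df, name="toy-dataset")
    mlflow.log_input(dataset, context="training")

    # Tags for the active run
    mlflow.set_tag("model_family", "tree-based")
    mlflow.set_tags({"stage": "prototype", "owner": "data-team"})

    # Artifacts: any local file or directory
    df.to_csv("toy_dataset.csv", index=False)
    mlflow.log_artifact("toy_dataset.csv")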
How can you compare AI/ML experiments in your project?
Comparing AI/ML experiments is crucial for identifying the most effective models and configurations. MLflow offers two primary methods for comparing experiments: through its web user interface (UI) and programmatically. Here's how you can use both methods:
Comparing Experiments via the MLflow Web UI
- Launch the MLflow Tracking Server: Start the MLflow tracking server if it isn't running already.
- Navigate to the Experiments Page: Open the experiments page where all your experiments are listed.
- Select Experiments to Compare: Find the experiments you're interested in comparing and use the checkboxes to select them. You can select multiple experiments for comparison.
- Use the Compare Button: After selecting the experiments, click on the "Compare" button. This will take you to a comparison view where you can see the runs side-by-side.
- Analyze the Results: The comparison view will display key metrics, parameters, and other logged information for each run. Use this information to analyze the performance and characteristics of each model or configuration.
Comparing Experiments Programmatically
Comparing experiments programmatically offers more flexibility and can be integrated into your analysis or reporting tools.
- Search Runs: Use the mlflow.search_runs() function to query the experiments you want to compare. You can filter runs by experiment IDs, metrics, parameters, and tags. For example:
import mlflow
# Assuming you know the experiment IDs or names
experiment_ids = ["1", "2"]
runs_df = mlflow.search_runs(experiment_ids)
- Filter and Sort: Once you have the dataframe with runs, you can use pandas operations to filter, sort, and manipulate the data to focus on the specific metrics or parameters you're interested in comparing.
- Visualize the Comparison: For a more intuitive comparison, consider visualizing the results using libraries such as Matplotlib or Seaborn. For example, plotting the performance metrics of different runs can help in visually assessing which configurations performed better.
import matplotlib.pyplot as plt
from mlflow.tracking import MlflowClient

# Example: Comparing validation accuracy of different runs.
# search_runs() only returns the latest value of each metric, so fetch
# the full metric history per run to plot the curve across epochs.
client = MlflowClient()
plt.figure(figsize=(10, 6))
for run_id in runs_df["run_id"]:
    history = client.get_metric_history(run_id, "validation_accuracy")
    steps = [m.step for m in history]
    values = [m.value for m in history]
    plt.plot(steps, values, label=f"Run {run_id[:7]}")
plt.title("Comparison of Validation Accuracy Across Runs")
plt.xlabel("Epoch")
plt.ylabel("Validation Accuracy")
plt.legend()
plt.show()
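If you prefer to filter and order at query time rather than in pandas, mlflow.search_runs() also accepts a filter string and an ordering clause. A minimal sketch, where the metric name validation_accuracy is illustrative:
import mlflow

best_runs = mlflow.search_runs(
    experiment_ids=["1", "2"],
    filter_string="metrics.validation_accuracy > 0.8",
    order_by=["metrics.validation_accuracy DESC"],
    max_results=10,
)
print(best_runs.head())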
These methods enable you to conduct thorough comparisons between different experiments, helping guide your decisions on model improvements and selections.
What are some tips and tricks for using AI/ML experiments?
To maximize the efficacy of AI/ML experiments:
- Align logged information with relevant business metrics to ensure experiments are focused on meaningful outcomes.
- Use nested runs to structure experiments hierarchically, facilitating organized exploration of parameter spaces.
with mlflow.start_run() as parent_run:
    param = [0.01, 0.02, 0.03]
    # Create a child run for each parameter setting
    for p in param:
        with mlflow.start_run(nested=True) as child_run:
            mlflow.log_param("p", p)
            ...
            mlflow.log_metric("val_loss", val_loss)
- Employ tagging extensively to enhance the searchability and categorization of experiments.
- Track detailed progress by logging steps and timestamps, providing insights into the evolution of model performance.
mlflow.log_metric(key="train_loss", value=train_loss, step=epoch, timestamp=now)  # timestamp in milliseconds since the Unix epoch
- Regularly log models to the model registry for versioning and to facilitate deployment processes.
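For instance, a model can be logged and registered in a single call. This is a minimal sketch assuming a scikit-learn model and a tracking server with a database-backed store (the plain file store does not support the registry); the model name is illustrative:
import mlflow
from sklearn.linear_model import LinearRegression

with mlflow.start_run():
    model = LinearRegression().fit([[0], [1], [2]], [0, 1, 2])
    # Log the model and register it under a versioned name in the registry.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="bike-sharing-demand-forecaster",
    )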