Skip to content

5.1. Task Automation

What is task automation?

Task automation is the practice of using software to execute repetitive, manual command-line tasks, minimizing human intervention. This practice boosts efficiency, ensures consistency, and reduces errors.

A classic example is the make utility, which automates software builds through a Makefile. By defining tasks like configure, build, and install, developers can run a single command to prepare a project:

make configure build install

This simple command streamlines a complex sequence of operations, making the development process faster and more reliable.

Why is task automation crucial in MLOps?

In MLOps, where reproducibility and consistency are paramount, task automation is not just a convenience—it's a necessity.

  • Ensures Reproducibility: Automating tasks like data preprocessing, model training, and evaluation guarantees that every step is executed identically, which is critical for reproducible results.
  • Promotes Collaboration: It allows teams to share a standardized set of commands for common actions (e.g., just test, just deploy), ensuring everyone follows the same procedures and reducing environment-specific errors.
  • Reduces Human Error: Manual command entry is prone to typos. Automation eliminates these mistakes, leading to more reliable builds, tests, and deployments.
  • Improves Efficiency: By automating routine actions, you adhere to the Don't Repeat Yourself (DRY) principle, freeing up AI/ML engineers to focus on more complex challenges.

Which task runner should you use?

While Make is a powerful and widely adopted tool, its syntax can be cryptic (e.g., $*, $%, :=) and its strict formatting rules, like requiring tabs instead of spaces, present a steep learning curve.

A modern, more intuitive alternative is Just, a command runner written in Rust. It offers a cleaner syntax, is easier to learn, and integrates seamlessly with modern development practices.

Consider this example from the MLOps Python Package template for building a Python wheel file:

# Defines a group for all package-related tasks.
[group('package')]
# The main 'package' task, which depends on 'package-build'.
package: package-build

# Task to compile and lock dependencies.
[group('package')]
package-constraints constraints="constraints.txt":
    uv pip compile pyproject.toml --generate-hashes --output-file={{constraints}}

# Task to build the Python package wheel.
[group('package')]
# Depends on 'clean-build' and 'package-constraints' to run first.
package-build constraints="constraints.txt": clean-build package-constraints
    uv build --build-constraint={{constraints}} --require-hashes --wheel

This justfile is far more readable than its Makefile equivalent. Developers can execute the primary task with a simple command:

# Execute the build task
just package

How do you configure a task automation system?

Setting up Just is simple. First, add it to your project's development dependencies:

uv add --group dev just

Next, create a justfile at the root of your repository to define your project's tasks and settings. This file serves as the central entry point for all automated actions.

# Main configuration file for Just. See docs: https://just.systems/man/en/

# REQUIRES: Ensure necessary command-line tools are available.
docker := require("docker")
find := require("find")
rm := require("rm")
uv := require("uv")

# SETTINGS: Configure Just's behavior.
set dotenv-load := true # Automatically load environment variables from a .env file.

# VARIABLES: Define global constants for your tasks.
PACKAGE := "bikes"
REPOSITORY := "bikes"
SOURCES := "src"
TESTS := "tests"

# DEFAULTS: Define the default action when 'just' is run without arguments.
default:
    @just --list # Display a list of available tasks.

# IMPORTS: Modularize tasks by importing them from other files.
import 'tasks/check.just'
import 'tasks/clean.just'
import 'tasks/commit.just'
import 'tasks/doc.just'
import 'tasks/docker.just'
import 'tasks/format.just'
import 'tasks/install.just'
import 'tasks/mlflow.just'
import 'tasks/package.just'
import 'tasks/project.just'

This setup provides a robust foundation for organizing and managing your project's automation scripts. For more details, refer to the official Just documentation.

How should you organize your tasks in an MLOps project?

A well-organized task structure is key to a maintainable project. For MLOps, it's best practice to group related tasks into separate files within a tasks/ directory. This modular approach keeps your justfile clean and makes tasks easy to find and manage.

Example structure from the MLOps Python Package template:

  • tasks/
    • check.just: Code quality, typing, formatting, and security checks.
    • clean.just: Remove temporary files and build artifacts.
    • commit.just: Git-related commit hooks and actions.
    • docker.just: Build and manage Docker containers.
    • install.just: Set up the development environment.
    • package.just: Build and distribute the Python package.
    • ...and so on.

Each file contains related tasks. For example, tasks/check.just might define a suite of validation tasks:

# Meta-task to run all checks sequentially.
[group('check')]
check: check-code check-type check-format check-security check-coverage

# Check code quality with Ruff.
[group('check')]
check-code:
    uv run ruff check {{SOURCES}} {{TESTS}}

# Check test coverage with Pytest.
[group('check')]
check-coverage numprocesses="auto" cov_fail_under="80":
    uv run pytest --numprocesses={{numprocesses}} --cov={{SOURCES}} --cov-fail-under={{cov_fail_under}} {{TESTS}}

# Check code formatting with Ruff.
[group('check')]
check-format:
    uv run ruff format --check {{SOURCES}} {{TESTS}}

# Check for security vulnerabilities with Bandit.
[group('check')]
check-security:
    uv run bandit --recursive --configfile=pyproject.toml {{SOURCES}}

# Check type hints with Mypy.
[group('check')]
check-type:
    uv run mypy {{SOURCES}} {{TESTS}}

This structure allows you to run individual tasks, a subset of tasks, or an entire group with simple commands:

# Run only the code quality checker
just check-code

# Run both the code and format checkers
just check-code check-format

# Run all validation tasks defined in the 'check' meta-task
just check

What are some best practices for writing automation tasks?

To maximize the benefits of task automation, follow these best practices:

  • Keep Tasks Atomic: Each task should have a single, well-defined purpose (e.g., check-typing instead of check-and-format). This makes them easier to debug, reuse, and combine.
  • Use Variables and .env Files: Abstract hardcoded values like paths, image names, or version numbers into variables. Use .env files for environment-specific configuration.
  • Create Meta-Tasks: Combine smaller, atomic tasks into larger workflows. The check task, which runs a series of other checks, is a perfect example. This simplifies complex sequences of actions.
  • Parameterize Your Tasks: Design tasks to accept arguments, which increases their flexibility. For instance, allowing the test runner to accept a specific file to test.
  • Document Your Tasks: Use comments (#) in your justfile to explain what each task does, its dependencies, and any parameters it accepts. Just automatically includes these comments in its --list output.

Additional Resources