5.1. Task Automation
What is task automation?
Task automation refers to the process of automating repetitive and manual command-line tasks using software tools. This enables tasks to be performed with minimal human intervention, increasing efficiency and accuracy. A common example of task automation in software development is the use of make
, a utility that automates the execution of predefined tasks like configure
, build
, and install
within a project repository. By executing a simple command:
make configure build install
developers can streamline the compilation and installation process of software projects, saving time and reducing the likelihood of errors.
Why do you need task automation?
Task automation is essential for several reasons:
- Don't repeat yourself: Automating tasks helps in avoiding the repetition of similar tasks, ensuring that you spend your time on tasks that require your unique skills and insights.
- Share common actions: It enables teams to share a common set of tasks, ensuring consistency and reliability across different environments and among different team members.
- Avoid typing mistakes: Automation reduces the chances of errors that can occur when manually typing commands or performing repetitive tasks, leading to more reliable outcomes.
Embracing task automation is a step towards improving efficiency for programmers. The initial effort in setting up automation pays off by saving time and reducing errors, making it a valuable practice in software development.
Which tools should you use to automate your tasks?
While Make
is a ubiquitous and powerful tool for task automation, its syntax can be challenging due to its use of unique symbols (e.g., $*, $%, :=, ...) and strict formatting rules, such as the requirement for tabs instead of spaces. This complexity can make Make
intimidating for newcomers.
For those seeking a more approachable alternative, PyInvoke
offers a simpler, Python-based syntax for defining and running tasks. Here is an example showcasing how to build a Python package (wheel file) using PyInvoke:
"""Package tasks for pyinvoke."""
from invoke.context import Context
from invoke.tasks import task
from . import cleans
BUILD_FORMAT = "wheel"
@task(pre=[cleans.dist])
def build(ctx: Context, format: str = BUILD_FORMAT) -> None:
"""Build a python package with the given format."""
ctx.run(f"poetry build --format={format}")
@task(pre=[build], default=True)
def all(_: Context) -> None:
"""Run all package tasks."""
This example illustrates how tasks can be easily defined and automated using Python, making it accessible for those already familiar with the language. Developers can then execute the task from their terminal:
# execute the build task
inv build
How can you configure your task automation system?
Configuring your task automation system with PyInvoke is straightforward. It can be installed as a Python dependency through:
poetry add -G dev invoke
Then, to configure PyInvoke for your project, create an invoke.yaml
file in your repository:
run:
echo: true
project:
name: bikes
This configuration file allows you to define general settings under run
and project-specific variables under project
. Detailed documentation and more configuration options can be found on PyInvoke's website.
How should you organize your tasks in your project folder?
For an MLOps project, it's advisable to organize tasks into categories and place them within a tasks/
directory at the root of your repository. This directory can include files for different task categories such as cleaning, commits, container management, and more. Here's an example structure:
- tasks
- tasks/__init__.py
- tasks/checks.py
- tasks/cleans.py
- tasks/commits.py
- tasks/containers.py
- tasks/dags.py
- tasks/docs.py
- tasks/formats.py
- tasks/installs.py
- tasks/mlflow.py
- tasks/packages.py
In the tasks/__init__.py
file, you should import and add all task modules to a collection:
"""Task collections for the project."""
from invoke import Collection
from . import checks, cleans, commits, containers, dags, docs, formats, installs, mlflow, packages
ns = Collection()
ns.add_collection(checks)
ns.add_collection(cleans)
ns.add_collection(commits)
ns.add_collection(containers)
ns.add_collection(dags, default=True)
ns.add_collection(docs)
ns.add_collection(formats)
ns.add_collection(installs)
ns.add_collection(mlflow)
ns.add_collection(packages)
Each module, like checks
, can define multiple tasks. For example:
"""Check tasks for pyinvoke."""
from invoke.context import Context
from invoke.tasks import task
@task
def poetry(ctx: Context) -> None:
"""Check poetry config files."""
ctx.run("poetry check --lock")
@task
def format(ctx: Context) -> None:
"""Check the formats with ruff."""
ctx.run("poetry run ruff format --check src/ tasks/ tests/")
@task
def type(ctx: Context) -> None:
"""Check the types with mypy."""
ctx.run("poetry run mypy src/ tasks/ tests/")
@task
def code(ctx: Context) -> None:
"""Check the codes with ruff."""
ctx.run("poetry run ruff check src/ tasks/ tests/")
@task
def test(ctx: Context) -> None:
"""Check the tests with pytest."""
ctx.run("poetry run pytest --numprocesses='auto' tests/")
@task
def security(ctx: Context) -> None:
"""Check the security with bandit."""
ctx.run("poetry run bandit --recursive --configfile=pyproject.toml src/")
@task
def coverage(ctx: Context) -> None:
"""Check the coverage with coverage."""
ctx.run("poetry run pytest --numprocesses='auto' --cov=src/ --cov-fail-under=80 tests/")
@task(pre=[poetry, format, type, code, security, coverage], default=True)
def all(_: Context) -> None:
"""Run all check tasks."""
These tasks can then be invoked from the command line as needed, providing a structured and efficient way to manage and execute project-related tasks.
# run the code checker
inv checks.code
# run the code and format checker
inv checks.code checks.format
# run all the check tasks in the module
inv checks