Skip to content

5.3. CI/CD Workflows

What is CI/CD?

CI/CD stands for Continuous Integration and Continuous Deployment or Continuous Delivery. It's a method used in software development that focuses on automating the process of integrating code changes from multiple contributors into a single software project, as well as automating the delivery or deployment of the software to production environments. Continuous Integration involves automatically testing and building the software every time a team member commits changes to version control. Continuous Deployment or Delivery takes these integrated changes and automatically deploys them to production environments, ensuring a seamless flow from code commit to deployment.

What is a CI/CD workflow?

A CI/CD workflow refers to the automated sequence of steps that software undergoes from development to deployment. This workflow typically includes stages such as building the application, running automated tests to verify code quality and functionality, and deploying the code to a production or staging environment. The goal of a CI/CD workflow is to enable developers to release new changes to customers quickly and with high confidence by automating the release process.

Why do you need CI/CD workflows?

  • Guardrail the code base: CI/CD workflows act as a safeguard for the codebase, ensuring that all changes meet quality standards and pass all tests before merging. This protects the main branch from breaking changes, keeping the software in a deployable state.
  • Automate publication tasks: They streamline the process of getting software from version control into the hands of users by automating the build, test, and deployment tasks. This automation reduces the manual work involved in software releases and speeds up the delivery process.
  • Report code quality to others: CI/CD systems often integrate with other tools to provide visibility into the health of the codebase. They can generate reports on code quality, test coverage, and other metrics, making it easier to maintain high standards and improve the software over time.

By automating these processes, a CI/CD system reduces the load on development teams, ensures a consistent level of quality, and facilitates faster, more reliable software releases.

Which solution should you use for your CI/CD?

There are many CI/CD solutions available, each with its own set of features and integrations. GitHub Actions is one of the most popular and accessible solutions, especially for projects hosted on GitHub. It integrates deeply with GitHub repositories, offering a seamless experience for building, testing, and deploying software directly from GitHub.

To get started with GitHub Actions for your CI/CD workflows, you'll need to:

  1. Create a .github/workflows directory in your repository if it doesn't already exist.
  2. Define your workflow in a YAML file within this directory. The file should specify triggers for the workflow, jobs to run, and steps within those jobs.
  3. Use predefined actions from the GitHub Marketplace or define your own steps to install dependencies, run tests, build your software, and deploy it.

Which workflows should you set up for your MLOps project?

For MLOps projects, setting up both Verification and Publication workflows is crucial to ensure the quality and seamless deployment of machine learning models and related software.

Verification Workflow

This workflow runs on pull requests to the main branch, verifying code quality and functionality before merging:

name: Check
on:
  pull_request:
    branches:
      - main
concurrency:
  cancel-in-progress: true
  group: ${{ github.workflow }}-${{ github.ref }}
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: poetry install --with checks
      - run: poetry run invoke checks

Here is a breakdown of each attribute in the workflow:

  • name: Sets the name of the workflow as "Check", identifying it in the GitHub Actions UI.
  • on: Specifies the event that triggers the workflow, in this case, a pull request.
  • concurrency: Manages how workflow runs are handled concurrently. If cancel-in-progress is set to true, any in-progress runs of the workflow will be canceled when a new run is triggered.
  • jobs: Defines the jobs to be run as part of the workflow.
    • checks: Identifies a job within the workflow, named "checks".
      • runs-on: Specifies the type of virtual host machine to run the job on.
      • steps: Lists the steps to be executed as part of the job.
        • - uses: actions/checkout@v4: Utilizes the checkout action to access the repository code within the job.
        • - uses: ./.github/actions/setup: Applies a custom action located in the repository to set up the environment.
        • - run: poetry install --with checks: Executes the command to install dependencies specified under the "checks" group with Poetry.
        • - run: poetry run invoke checks: Runs a command using Poetry to execute predefined checks.

Publication Workflow

This workflow is triggered by a release event and is responsible for publishing the software or model:

name: Publish
on:
  release:
    types: [published]
env:
  DOCKER_IMAGE: ghcr.io/fmind/mlops-python-package
concurrency:
  cancel-in-progress: true
  group: publish-workflow
jobs:
  pages:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: poetry install --with docs
      - run: poetry run invoke docs
      - uses: JamesIves/github-pages-deploy-action@v4
        with:
          folder: docs/
          branch: gh-pages
  packages:
    permissions:
      packages: write
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup
      - run: poetry run invoke packages
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          push: true
          context: .
          cache-to: type=gha
          cache-from: type=gha
          tags: |
            ${{ env.DOCKER_IMAGE }}:latest
            ${{ env.DOCKER_IMAGE }}:${{ github.ref_name }}

This workflow details steps for deploying documentation and building, tagging, and pushing Docker images, ensuring that releases are systematically handled.

How can you avoid repeating some steps between CI/CD workflows?

To avoid repetition and maintain consistency across workflows, common steps can be abstracted into reusable actions. By creating a setup action under .github/actions, you encapsulate common setup tasks:

name: Setup
description: Setup for project workflows
runs:
  using: composite
  steps:
    - run: pipx install invoke poetry
      shell: bash
    - uses: actions/setup-python@v5
      with:
        python-version: 3.12
        cache: poetry

This composite action can then be referenced in multiple workflows, ensuring a DRY (Don't Repeat Yourself) approach to CI/CD configuration.

You can also find more actions to use in your workflows from GitHub Marketplace: https://github.com/marketplace?type=actions

What are some tips and tricks for using CI/CD workflows for MLOps?

  • Automate Regular Tasks: Identify and automate manual tasks to increase efficiency and reduce the potential for human error.
  • Master GitHub Actions Syntax: Understanding the capabilities and triggers available in GitHub Actions allows for more dynamic and responsive workflows.
  • Use Concurrency Wisely: The concurrency attribute helps to manage workflow runs by canceling or queuing multiple runs, optimizing resource usage.
  • Leverage GitHub CLI: The GitHub Command Line Interface can streamline workflow executions: gh workflow run ....
  • Implement Branch Protection: Enforcing branch protection rules ensures that merges can only occur after successful workflow completions, maintaining code quality and stability.

CI-CD Workflow additional resources