5.4. Software Containers
What are software containers?
Software containers are lightweight, stand-alone packages that include everything needed to run a piece of software, including the code, runtime, system tools, system libraries, and settings. Containers are isolated from each other and the host system, yet they share the OS kernel, making them more efficient and faster than traditional virtual machines. Containers ensure consistency across different development and deployment environments, addressing the "it works on my machine" headache.
Why do you need software containers?
- Package system dependencies: Containers encapsulate all dependencies required by an application, such as specific versions of libraries and other software, ensuring that it can run in any environment without issues.
- Share reusable software artifacts: Containers allow for the creation and sharing of reusable software artifacts. For instance, you can create a container image with a configured environment that can be shared with other developers, ensuring everyone works with the same setup.
- Make your package runnable from any system: Containers abstract away the underlying operating system, allowing applications to run uniformly across different environments. This is particularly useful in heterogeneous environments where systems may be running different versions of operating systems or have different configurations.
While Python packages can be challenging to install across different systems, Docker ensures consistency between operating systems and environments. Running a Docker image is often more straightforward and faster than managing Python packages directly on the system.
Which tool should you use for creating containers?
The most common tool for creating containers is Docker. Docker simplifies the process of building, running, and managing containers. It uses Dockerfile to automate the deployment of applications in lightweight and portable containers.
To install Docker, follow these steps:
- Visit the Docker website and download the Docker Desktop application for your operating system.
- Follow the installation instructions provided on the website.
- Once installed, open a terminal or command prompt and verify the installation by running
docker --version
.
Docker offers some paid services, such as Docker Hub private repositories and Docker Enterprise Edition, which provide additional features like enhanced security, user management, and automated image builds. It's important to contact your IT administrators to see if these solutions are available in your organization and to explore potential alternatives.
Which tool should you use for hosting containers?
Containers are software packages that must be hosted on a container registry. You can use either Docker Hub or GitHub Packages to host your containers.
To use GitHub Packages, follow these steps:
- Create a GitHub account and sign in.
- Navigate to your repository where you want to host your container.
- In the repository, go to "Packages" and follow the instructions to publish your container image.
You must then login to the system from the command-line and tag with image by adding ghcr.io/[user]
at the beginning, such in: ghcr.io/fmind/mlops-python-package
.
Which image should you create for your MLOps projects?
To start using Docker for your MLOps projects, you can define a simple image in a Dockerfile
in your project repository as follows:
# https://docs.docker.com/engine/reference/builder/
FROM python:3.12
COPY dist/*.whl .
RUN pip install *.whl
CMD ["bikes", "--help"]
You can then build and push this image with the following commands:
poetry build # build the wheel file
docker build --tag=bikes:latest .
docker run --rm bikes:latest
Which tips and tricks should you use to optimize your container workflow?
- Enable Multi-platform with docker buildx: Use Docker Buildx, a Docker CLI plugin, to build your images once and use them across multiple platforms. This is especially useful for creating images that need to run on both AMD64 and ARM architectures.
- Enable cache on your local laptop and CI/CD: Utilize Docker's caching mechanisms to speed up your builds. Configure your Dockerfile and CI/CD pipelines to reuse layers from previously built images, significantly reducing build time and bandwidth consumption.
- Clean up cache directories on your container image: After installing dependencies, ensure to remove cache directories and temporary files within your Dockerfile. This reduces the image size, making deployments faster and more efficient.
- Run container image linters like hadolint: Use tools like Hadolint to analyze your Dockerfile for best practices and common mistakes. This helps in optimizing your Dockerfile and ensuring security and efficiency.
- Install CUDA drivers if you need to use GPU: For projects requiring GPU acceleration, include CUDA drivers and libraries in your container images. This allows your applications to leverage GPU resources for compute-intensive tasks, essential for many machine learning and deep learning workflows.