3.0. Package
What is a Python package?
A Python package is a structured directory of Python modules that can be easily distributed and installed. In MLOps, packaging is the foundation for creating reproducible, maintainable, and shareable machine learning systems.
Packages are typically distributed as wheels (`.whl` files), a pre-built format that makes installation faster and more reliable than installing from source code.
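A wheel is essentially a zip archive of your package's modules plus installation metadata. As a quick, illustrative check, you can list a wheel's contents with Python's standard `zipfile` module; the filename below is hypothetical and depends on your package name, version, and build tags:

```python
import zipfile

# Hypothetical wheel path: the real filename depends on your package name,
# version, and tags (e.g. "py3-none-any" for a pure-Python wheel).
wheel_path = "dist/bikes-3.0.0-py3-none-any.whl"

# A .whl file is a plain zip archive: package modules plus *.dist-info metadata.
with zipfile.ZipFile(wheel_path) as wheel:
    for name in wheel.namelist():
        print(name)
```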
Why create a Python package for an ML project?
Packaging your ML project is a critical step in moving from research to production. It provides several key advantages:
- Reproducibility: It bundles your code and its specific dependencies, ensuring that it runs consistently across different environments.
- Modularity: It encourages you to organize code into reusable components (e.g., for data processing, feature engineering, or model training), which can be shared across projects.
- Simplified Deployment: It allows you to distribute your project as a versioned library for other services to use or as a standalone application with defined entrypoints.
- Clear Structure: It enforces a standardized project structure, making it easier for new team members to understand and contribute to the codebase.
Which tool should you use to build a Python package?
While the Python packaging ecosystem has many tools, as humorously noted in this xkcd comic, the modern standard is `uv`. It is an extremely fast and comprehensive tool that handles dependency management, virtual environments, and package building.
Key `uv` commands for packaging include:
- `uv sync`: Installs the base dependencies listed in `pyproject.toml` (see the sketch after this list).
- `uv sync --all-groups`: Installs all dependencies, including optional groups for development, testing, and documentation.
- `uv build --wheel`: Builds your package into a `.whl` file, which appears in the `dist/` directory.
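Once `uv sync` has installed the project into its environment, you can confirm what was installed from Python itself. This is a minimal sketch using the standard-library `importlib.metadata` module, assuming the `bikes` package described later in this section is installed:

```python
from importlib.metadata import requires, version

# Assumes the example "bikes" package has been installed (e.g. with `uv sync`).
print(version("bikes"))  # e.g. "3.0.0", matching the version in pyproject.toml

# requires() lists the declared dependencies, or returns None when there are none.
print(requires("bikes") or "no dependencies declared")
```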
For developers exploring other options, tools like PDM, Hatch, and Pipenv also offer robust packaging and dependency management features.
Should you use Conda for production ML projects?
Conda is popular among data scientists for its ability to manage both Python and non-Python dependencies. However, for production MLOps, it presents challenges like slow performance and a complex dependency resolver.
For production environments, the industry-standard approach is to use `uv` for managing Python dependencies defined in `pyproject.toml` and Docker for creating isolated, reproducible environments that include system-level dependencies. This combination provides superior performance, compatibility, and control.
How can you install new dependencies with uv?
Please refer to this section of the course.
What metadata is essential for a Python package?
The `pyproject.toml` file is the heart of your package, defining its identity, dependencies, and build configuration.
Here is an example with explanations for each section:
```toml
# https://docs.astral.sh/uv/reference/settings/
# https://packaging.python.org/en/latest/guides/writing-pyproject-toml/
# Core project metadata used by PyPI and installation tools.
[project]
name = "bikes"
version = "3.0.0"
description = "Predict the number of bikes available."
authors = [{ name = "Médéric HURIER", email = "github@fmind.dev" }]
readme = "README.md"
requires-python = ">=3.13"
dependencies = [] # List your production dependencies here
license = { file = "LICENSE.txt" }
keywords = ["mlops", "python", "package"]
# URLs that appear on your package's PyPI page.
[project.urls]
Homepage = "https://github.com/fmind/bikes"
Documentation = "https://fmind.github.io/bikes"
Repository = "https://github.com/fmind/bikes"
"Bug Tracker" = "https://github.com/fmind/bikes/issues"
Changelog = "https://github.com/fmind/bikes/blob/main/CHANGELOG.md"
# Defines command-line scripts. 'bikes' will be a command that runs the 'main' function.
[project.scripts]
bikes = 'bikes.scripts:main'
# Configures uv to install optional dependency groups by default during development.
[tool.uv]
default-groups = ["checks", "commits", "dev", "docs", "notebooks"]
# Specifies the build tool (Hatchling, in this case) to create the package.
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```
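The `[project.scripts]` table above turns `main` in `bikes/scripts.py` into a `bikes` command on installation. That module is not shown in this section, but a minimal, hypothetical sketch of such an entry point could look like this:

```python
# src/bikes/scripts.py -- hypothetical module matching the 'bikes.scripts:main' entry point.
import argparse


def main(argv: list[str] | None = None) -> int:
    """Command-line entry point for the package."""
    parser = argparse.ArgumentParser(
        prog="bikes", description="Predict the number of bikes available."
    )
    parser.add_argument("configs", nargs="*", help="configuration files for the job")
    args = parser.parse_args(argv)
    print(f"Running with configs: {args.configs}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

After `uv sync`, the command is available inside the project environment, for example via `uv run bikes`.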
Where should you structure the source code for your package?
Always place your package's source code inside a `src` directory. This is known as the `src` layout and is a best practice for several reasons:
- Prevents Import Conflicts: It ensures that your installed package is used during testing, not the local source files. This prevents bugs where the code works locally but fails after installation.
- Clean Separation: It keeps your importable package code separate from project root files like `pyproject.toml`, tests, and documentation.
To create this structure, run:
```bash
mkdir -p src/bikes
touch src/bikes/__init__.py
```
The `__init__.py` file tells Python to treat the `src/bikes` directory as a package.
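The file may remain empty; as a hedged illustration, it can also expose a few package-level names, as in this hypothetical example:

```python
# src/bikes/__init__.py -- illustrative contents; an empty file also works.
"""Predict the number of bikes available."""

__version__ = "3.0.0"  # optional; often kept in sync with pyproject.toml
```

Once the package is installed (for example with `uv sync`), any script or test can simply `import bikes`, without depending on the current working directory.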
Should you publish your Python package, and where?
The decision to publish depends on your audience:
- Public Packages: If you want to share your work with the open-source community, publish it to the Python Package Index (PyPI). This makes it installable by anyone using `pip` or `uv`.
- Private Packages: For internal company projects or proprietary code, use a private artifact registry. Popular choices include AWS CodeArtifact, GCP Artifact Registry, or GitHub Packages.
To publish your package to a configured repository, use the command:
```bash
uv publish
```