5.2. Pre-Commit Hooks
What are pre-commit hooks?
Pre-commit hooks are automated checks that run before a commit or push is completed in your version control system. They serve as a first line of defense to ensure code quality and compliance with coding standards before the code is shared with the team or integrated into the main codebase. These hooks can run a variety of tasks, from syntax checks and code formatting to more complex operations like static analysis.
Why do you need pre-commit hooks?
- Share clean code: Pre-commit hooks help maintain a high code quality by enforcing coding standards and identifying issues early.
- Fix common issues: By catching common issues before commit, pre-commit hooks save time and effort in debugging and fixing problems that could have been easily avoided.
- Avoid failed CI/CD jobs: Running checks before committing reduces the likelihood of CI/CD jobs failing due to minor issues that could have been caught early.
Compared to CI/CD workflows, pre-commit hooks have the advantage of running locally on your computer, which makes them faster and easier to debug. You should consider balancing your validation jobs between pre-commit hooks and CI/CD systems based on their unique benefits, leveraging pre-commit hooks for quick, local checks and CI/CD pipelines for more comprehensive tests.
Which tool should you use to setup pre-commit hooks?
The most widely used tool for managing pre-commit hooks is pre-commit
. It provides a flexible framework for configuring and managing the hooks you want to run before a commit. The tool supports a wide range of hooks and can be easily integrated into any MLOps project.
To install pre-commit
as part of your Poetry project, use the following command:
poetry add -G commits pre-commit
Then, create a .pre-commit-config.yaml
file in your project directory with a configuration similar to the following, which outlines which hooks to run:
# https://pre-commit.com
# https://pre-commit.com/hooks.html
default_language_version:
python: python3.12
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-added-large-files
- id: check-case-conflict
- id: check-merge-conflict
- id: check-toml
- id: check-yaml
- id: debug-statements
- id: end-of-file-fixer
- id: mixed-line-ending
- id: trailing-whitespace
To install the pre-commit hooks, use the following commands:
poetry run pre-commit install --hook-type pre-push
poetry run pre-commit install --hook-type commit-msg
You can now automatically run your hooks before each commit or push, or trigger them manually with:
poetry run pre-commit run
Which hooks should you use for an MLOps project?
For an MLOps project utilizing Python, it's beneficial to configure your pre-commit-config.yaml
to include hooks that validate both the quality and format of your code. Here’s an example configuration that includes the ruff
and ruff-format
hooks for code quality and formatting, respectively, along with other useful checks:
default_language_version:
python: python3.12
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-added-large-files
- id: check-case-conflict
- id: check-merge-conflict
- id: check-toml
- id: check-yaml
- id: debug-statements
- id: end-of-file-fixer
- id: mixed-line-ending
- id: trailing-whitespace
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.3.3
hooks:
- id: ruff
- id: ruff-format
Additional hooks are available at the pre-commit website, offering a wide range of checks for different needs.
How can you improve your commit messages?
Commit messages play a crucial role in software development, offering insights into what changes have been made and why. To enhance the quality of your commit messages and ensure consistency across contributions, Commitizen, a Python tool, can be extremely helpful. It not only formats commit messages but also helps in converting these messages into a comprehensive CHANGELOG. Here's how you can leverage Commitizen to streamline your commit messages:
To get started with Commitizen, you can install it using the following command:
poetry add -G commits commitizen
This command adds Commitizen to your project as a development dependency, ensuring that it is available for formatting commit messages.
Once installed, Commitizen offers several commands to assist with your commits:
# display a guide to help you format your commit messages
poetry run cz info
# bump your package version according to semantic versioning
poetry run cz bump
# interactively create a new commit message following best practices
poetry run cz commit
These commands are designed to guide you through creating structured and informative commit messages, bumping your project version appropriately, and even updating your CHANGELOG automatically.
To configure Commitizen to fit your project needs, you can set it up in your pyproject.toml
file as shown below:
[tool.commitizen]
name = "cz_conventional_commits" # Uses the conventional commits standard
tag_format = "v$version" # Customizes the tag format
version_scheme = "pep440" # Follows the PEP 440 version scheme
version_provider = "poetry" # Uses poetry for version management
update_changelog_on_bump = true # Automatically updates the CHANGELOG when the version is bumped
Integrating Commitizen into your pre-commit workflow ensures that all commits adhere to a consistent format, which is crucial for collaborative projects. You can add it to your .pre-commit-config.yaml
like this:
- repo: https://github.com/commitizen-tools/commitizen
rev: v3.18.3 # The version of Commitizen you're using
hooks:
- id: commitizen # Ensures your commit messages follow the conventional format
- id: commitizen-branch
stages: [push] # Optionally, enforce commit message checks on push
By incorporating Commitizen into your development workflow, you not only standardize commit messages across your project but also create a more readable and navigable project history. This practice is invaluable for team collaboration and can significantly improve the maintenance and understanding of your project over time.
Is there a way to bypass a hook validation?
There are occasions when bypassing a pre-commit hook is necessary, such as when you need to make a quick fix or are confident the commit does not introduce any issues. To bypass the pre-commit hooks, you can use the following commands:
git commit -m "My message" --no-verify
git push --no-verify
What are some best practices to set up hooks?
Implementing pre-commit hooks effectively involves considering both the development workflow and the specific needs of your project. Here are some best practices:
- Prioritize fast-executing hooks to maintain an interactive development workflow without significant delays.
- Review available hooks to tailor your pre-commit configuration to your project's technology stack and needs.
- Collaborate on hook selection by discussing with your team which hooks to enable, ensuring a consensus and uniform coding standards.
- Use fixed hook versions to avoid unexpected behavior from updates, aligning them with your project's versions.
- Run complex checks locally, such as unit tests, to catch issues before they reach the CI/CD pipeline, balancing the immediacy of pre-commit hooks with the thoroughness of CI/CD checks.