5.2. Pre-Commit Hooks

What are pre-commit hooks?

Pre-commit hooks are automated scripts that run against your code before you commit it to version control. Think of them as a quality gatekeeper for your codebase. They act as a first line of defense, enforcing standards and catching issues on your local machine before the code is shared with your team or integrated into the main branch. These hooks can perform a wide range of tasks, from simple code formatting and syntax checks to more complex static analysis.

Why are pre-commit hooks essential?

Pre-commit hooks are a cornerstone of modern development workflows for several key reasons:

Enforce Consistent Standards: They automatically enforce coding standards (like formatting and linting), ensuring that all code contributed to the project is clean and consistent.
Prevent Simple Mistakes: They catch common errors, such as lingering debug statements, syntax errors, or accidentally added secrets, before they are committed, saving significant time on debugging and code reviews.
Reduce CI/CD Failures: By running checks locally, you can identify and fix issues that would otherwise cause a CI/CD pipeline to fail. This tightens the feedback loop, making you more productive.

While CI/CD workflows are crucial for comprehensive, server-side validation (like running a full test suite), pre-commit hooks offer the advantage of immediate feedback. They run locally, making them faster and easier to debug. A best practice is to use pre-commit hooks for rapid local checks and reserve more time-consuming and resource-intensive jobs for your CI/CD pipeline.

How do you set up pre-commit hooks?

A widely used tool for this task is the pre-commit framework. It offers a robust and flexible way to configure and manage hooks in any project.

First, add pre-commit to your project's development dependencies (here, in a dependency group named commit):

uv add --group commit pre-commit

Next, create a .pre-commit-config.yaml file in your project's root directory. This file defines which hooks to run. Here is a basic configuration to get you started:

# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
default_language_version:
  python: python3.13
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-added-large-files # Prevents committing large files
      - id: check-case-conflict # Checks for files that would conflict on case-insensitive filesystems
      - id: check-merge-conflict # Checks for files that contain merge conflict strings
      - id: check-toml # Checks TOML files for syntax errors
      - id: check-yaml # Checks YAML files for syntax errors
      - id: debug-statements # Checks for debugger imports and calls
      - id: end-of-file-fixer # Ensures files end in a newline
      - id: mixed-line-ending # Replaces mixed line endings with a consistent one
      - id: trailing-whitespace # Trims trailing whitespace

Finally, install the hooks into your local .git directory. This command makes your Git repository aware of the hooks.

# Install hooks to run automatically before each commit
uv run pre-commit install

# (Optional) Install hooks for other git actions
uv run pre-commit install --hook-type pre-push
uv run pre-commit install --hook-type commit-msg

Now, the configured hooks will run automatically on every git commit, checking only the files staged for that commit. You can also run them manually against all files at any time:

# Run all hooks on all files
uv run pre-commit run --all-files
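
To make the behavior of a fixer hook such as trailing-whitespace more concrete, here is a minimal sketch in Python. It is not the implementation shipped in pre-commit-hooks; the function names strip_trailing_whitespace and main are hypothetical. It assumes the convention that the framework passes file names as arguments and that a non-zero return value signals a failed hook.

```python
from pathlib import Path


def strip_trailing_whitespace(data: bytes) -> bytes:
    """Remove trailing spaces and tabs from every line, keeping line endings."""
    fixed = []
    for line in data.splitlines(keepends=True):
        body = line.rstrip(b"\r\n")      # line content without its ending
        ending = line[len(body):]        # "\n", "\r\n", "\r", or empty
        fixed.append(body.rstrip(b" \t") + ending)
    return b"".join(fixed)


def main(filenames: list[str]) -> int:
    """Fix each file in place; return 1 if any file was changed, else 0."""
    exit_code = 0
    for name in filenames:
        path = Path(name)
        original = path.read_bytes()
        fixed = strip_trailing_whitespace(original)
        if fixed != original:
            path.write_bytes(fixed)
            print(f"Fixing {name}")
            exit_code = 1  # a changed file fails the hook, so the commit stops
    return exit_code
```

A script built on this sketch would pass its command-line arguments to main and exit with the returned code. After a failed run, you review the fixes, stage them with git add, and commit again.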

What is a good set of hooks for an MLOps project?

For a Python-based MLOps project, a robust hook configuration should address code quality, formatting, and security. The following .pre-commit-config.yaml is a solid starting point, incorporating Ruff for linting and formatting and Bandit for security scanning.

# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks

default_language_version:
  python: python3.13
repos:
  # Standard checks for file integrity and syntax
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: 'v5.0.0'
    hooks:
      - id: check-added-large-files
      - id: check-case-conflict
      - id: check-merge-conflict
      - id: check-toml
      - id: check-yaml
      - id: debug-statements
      - id: end-of-file-fixer
      - id: mixed-line-ending
      - id: trailing-whitespace

  # Ultra-fast Python linter and formatter
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: 'v0.9.9' # Use a recent version
    hooks:
      - id: ruff # Lints the code for errors and style issues
      - id: ruff-format # Formats the code

  # Security scanner for finding common vulnerabilities in Python code
  - repo: https://github.com/PyCQA/bandit
    rev: '1.8.3' # Use a recent version
    hooks:
      - id: bandit
        args: ["-c", "pyproject.toml"] # Point to the Bandit config in pyproject.toml
        additional_dependencies: ["bandit[toml]"]

This configuration provides a strong foundation. You can find many additional hooks listed at https://pre-commit.com/hooks.html to tailor the setup to your project's specific needs, such as hooks for notebooks, Dockerfiles, or Terraform files.
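
Checker hooks such as debug-statements typically inspect the parsed code rather than searching the raw text. The sketch below illustrates that idea with Python's ast module. It is an assumption-laden approximation, not the real hook: the function name find_debug_statements and the DEBUG_MODULES set are hypothetical choices made for this example.

```python
import ast

# Hypothetical list of debugger modules to flag; the real hook has its own list.
DEBUG_MODULES = {"pdb", "ipdb", "pudb"}


def find_debug_statements(source: str) -> list[tuple[int, str]]:
    """Return (line number, description) pairs for debugger imports and breakpoint() calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in DEBUG_MODULES:
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in DEBUG_MODULES:
                findings.append((node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id == "breakpoint":
                findings.append((node.lineno, "breakpoint()"))
    return sorted(findings)
```

Because the check works on the syntax tree, a string literal that merely contains the text "import pdb" is not reported, which a plain text search would get wrong.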

How can you standardize commit messages?

Clear and consistent commit messages are vital for a healthy project history. They explain the "what" and "why" of changes. Commitizen is a Python tool that enforces a consistent commit message format, like the Conventional Commits standard. It also automates version bumping and CHANGELOG generation.

First, add commitizen to your development dependencies:

uv add --group commit commitizen

Next, configure it in your pyproject.toml file:

[tool.commitizen]
name = "cz_conventional_commits"  # Use the Conventional Commits standard
tag_format = "v$version"          # Customize the git tag format
version_scheme = "pep440"         # Follow PEP 440 for versioning
version_provider = "pep621"       # Get the version from pyproject.toml
update_changelog_on_bump = true   # Auto-update CHANGELOG.md on version bump

Now, you can use commitizen's commands:

# Interactively create a properly formatted commit message
uv run cz commit

# Bump the version and update the changelog based on commit history
uv run cz bump

# Display information about the configured commit convention
uv run cz info
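
To give a feel for how a version bump can be derived from commit history, here is a simplified sketch of the Conventional Commits rules: a breaking change raises the major version, feat raises the minor version, and fix raises the patch version. This is not commitizen's implementation; its actual rules are configurable and may differ. The names increment_for and bump are hypothetical, and the sketch assumes plain MAJOR.MINOR.PATCH version strings.

```python
import re
from typing import Optional

# Header shape: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]+\))?(?P<breaking>!)?: .+")


def increment_for(message: str) -> Optional[str]:
    """Map one commit message to MAJOR, MINOR, PATCH, or None (no bump)."""
    header = message.splitlines()[0] if message else ""
    match = HEADER.match(header)
    if match is None:
        return None
    if match["breaking"] or "BREAKING CHANGE:" in message:
        return "MAJOR"
    if match["type"] == "feat":
        return "MINOR"
    if match["type"] == "fix":
        return "PATCH"
    return None


def bump(version: str, messages: list[str]) -> str:
    """Apply the largest increment found in the messages to the version."""
    major, minor, patch = (int(part) for part in version.split("."))
    levels = [increment_for(message) for message in messages]
    if "MAJOR" in levels:
        return f"{major + 1}.0.0"
    if "MINOR" in levels:
        return f"{major}.{minor + 1}.0"
    if "PATCH" in levels:
        return f"{major}.{minor}.{patch + 1}"
    return version
```

For example, under these assumptions a history containing one feat commit and one fix commit moves version 1.2.3 to 1.3.0, because the largest increment wins.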

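To enforce this standard automatically, integrate commitizen with your pre-commit hooks. Add the following under repos in your .pre-commit-config.yaml. These hooks only fire if the commit-msg and pre-push hook types are installed (see the optional pre-commit install --hook-type commands above):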

  - repo: https://github.com/commitizen-tools/commitizen
    rev: 'v3.27.0' # Use a recent version
    hooks:
      # Checks if the commit message follows the conventional format.
      # This runs during the `commit-msg` stage.
      - id: commitizen
      # Checks that every commit message on the current branch (not yet on the
      # remote's default branch) follows the format.
      # This is useful to run during the `pre-push` stage.
      - id: commitizen-branch
        stages: [pre-push]

Using commitizen ensures every commit contributes to a readable, navigable, and professional project history, which is invaluable for collaboration and long-term maintenance.

What is the difference between `pre-commit`, `pre-push`, and `commit-msg` hooks?

The pre-commit framework can manage hooks at different stages of the Git workflow. Understanding the most common ones is key to using them effectively:

pre-commit: This is the most common hook. It runs before you even type a commit message. Its purpose is to inspect the snapshot of the files you are about to commit. This is the ideal stage for running fast checks like linters, formatters, and syntax checkers. If any of these checks fail, the commit is aborted, allowing you to fix the issues before committing.
commit-msg: This hook runs after the pre-commit hook and before the commit is finalized. It receives one argument: the path to a file containing the proposed commit message. Its primary use case is to validate the commit message itself, for example, to ensure it follows a specific format (like Conventional Commits, enforced by commitizen). If the hook fails, the commit is aborted.
pre-push: This hook runs before you push your commits to a remote repository. It's your last line of defense on the client side. Because it runs less frequently than pre-commit, it's a suitable place for longer-running checks that you might not want to run on every single commit, such as running a lightweight test suite or checking the commit messages on the branch.
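
The commit-msg stage can be illustrated with a small hand-written hook. The sketch below reads the message from the file path Git supplies and checks the first line against a Conventional Commits style pattern. It is a rough approximation, not what commitizen runs: the TYPES tuple is an assumed list of commonly used types, and the names is_conventional and main are hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumed set of allowed commit types; adjust to your team's convention.
TYPES = ("build", "chore", "ci", "docs", "feat", "fix",
         "perf", "refactor", "revert", "style", "test")
HEADER = re.compile(rf"^({'|'.join(TYPES)})(\([\w\-./]+\))?!?: \S.*$")


def is_conventional(message: str) -> bool:
    """Check the first non-comment line of a commit message."""
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    header = lines[0] if lines else ""
    return HEADER.match(header) is not None


def main(argv: list[str]) -> int:
    """argv[1] is the path to the file holding the proposed commit message."""
    message = Path(argv[1]).read_text(encoding="utf-8")
    if is_conventional(message):
        return 0
    print("Commit message does not follow the Conventional Commits format.",
          file=sys.stderr)
    return 1  # a non-zero exit code aborts the commit
```

A hook script based on this sketch would call main with sys.argv and exit with the returned code. In practice the commitizen hook shown earlier covers this check, so a custom script is only worth writing for rules that tool does not support.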

How can you bypass a hook?

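On rare occasions, you may need to bypass a hook, for example to commit a work-in-progress that you don't intend to push. The --no-verify flag skips the pre-commit and commit-msg hooks for a single git commit, or the pre-push hook for a single git push. To skip only specific hooks run by the pre-commit framework, set the SKIP environment variable to a comma-separated list of hook ids instead (for example, SKIP=bandit git commit).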

# Bypass hooks for a single commit
git commit -m "WIP: work in progress" --no-verify

# Bypass hooks for a single push
git push --no-verify

Use this option with caution. Bypassing hooks should be the exception, not the rule, as it defeats the purpose of having automated quality checks.

What are the best practices for using hooks?

To implement pre-commit hooks effectively, follow these guidelines:

Keep it Fast: Prioritize hooks that execute quickly (ideally in seconds). Slow hooks create friction and tempt developers to bypass them. Linters and formatters are great; running a full test suite is usually too slow for a pre-commit hook.
Pin Your Dependencies: Always pin hook versions (rev) in your .pre-commit-config.yaml. This ensures that all developers on the team use the exact same version of the tools, preventing inconsistencies and "it works on my machine" issues.
Collaborate on Configuration: The hook configuration should be a team decision. Discuss and agree upon the standards you want to enforce to ensure buy-in and consistency across the project.
Balance Local vs. CI/CD: Use pre-commit for quick, local feedback. Reserve comprehensive, time-consuming checks (like integration tests, end-to-end tests, or complex builds) for your CI/CD pipeline. The pre-push hook can be a good middle ground for semi-slow checks.
Start Simple, Iterate: Begin with a small, essential set of hooks (e.g., trailing-whitespace, ruff, ruff-format). You can always add more specialized hooks later as the project's needs evolve.
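
The advice to pin hook versions can itself be checked automatically. As a rough sketch, the function below scans the text of a .pre-commit-config.yaml for rev values that look like moving references instead of fixed tags. The FLOATING set and the name unpinned_revs are hypothetical, and the regular expression is a shortcut that assumes one rev per line rather than a full YAML parse.

```python
import re

# Matches lines such as "rev: v5.0.0", "rev: 'v0.9.9'", or "rev: main # comment".
REV_LINE = re.compile(r"""^\s*rev:\s*['"]?([^'"#\s]+)['"]?""", re.MULTILINE)

# Assumed names of moving references that should not be used as a rev.
FLOATING = {"main", "master", "HEAD", "latest", "develop"}


def unpinned_revs(config_text: str) -> list[str]:
    """Return every rev value in the config text that is a moving reference."""
    return [rev for rev in REV_LINE.findall(config_text) if rev in FLOATING]
```

An empty result means every rev found looks pinned under these assumptions. A check like this could run in CI to catch an unpinned rev during code review.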