3.3. Entrypoints
What are package entrypoints?
Package entrypoints are a formal way to make your Python package's functionality accessible from the command line. Think of them as creating a "front door" to your application, allowing users and other programs to run specific functions as if they were native system commands.
By defining an entrypoint in your package's configuration, you are creating a stable, public interface. This is essential for professional-grade tools because it:
- Simplifies Execution: Users can run your tool with a simple command (e.g., `bikes-run-training`) instead of a long Python invocation (`python -m bikes.scripts.training --args...`).
- Enhances Usability: It provides a clean, standard way to interact with your package, complete with argument parsing and help messages.
- Enables Automation: Other systems, like CI/CD pipelines or workflow orchestrators (e.g., Apache Airflow), can reliably call your tool, making it a building block in larger MLOps workflows.
What is the difference between a script and an entrypoint?
While both can be executed, they differ in how they are installed, discovered, and integrated:
| Feature | Standalone Script | Package Entrypoint |
| --- | --- | --- |
| Execution | `python path/to/script.py` | `my-command` |
| Installation | Not formally installed; must know the file's path. | Installed into the environment's `PATH` via `pip` or `uv`. |
| Dependencies | Managed manually or through an external `requirements.txt`. | Explicitly defined within the package's `pyproject.toml`. |
| Discoverability | Low; requires knowledge of the project's internal structure. | High; becomes a discoverable command in the user's shell. |
| Use Case | Quick, informal tasks; internal project utilities. | Reusable, distributable tools intended for end-users or automation. |
In short, running a file directly is suitable for development, but defining an entrypoint is the standard for creating robust, distributable command-line applications.
How do you create a command-line script?
A command-line script is a Python file designed to be executed from the terminal. The foundation of a good script involves a parser for arguments, a main function for logic, and a guard for execution.
1. Create a CLI Parser
A parser handles command-line arguments, converting them into variables your script can use. Python's built-in `argparse` module is a powerful choice.
The example below sets up a parser with several common argument types:
- Positional arguments (`files`): Required inputs that are order-dependent.
- Optional arguments (`--extras`): Flags that provide additional options.
- Boolean flags (`--schema`): Switches that trigger an action when present.
```python
import argparse

# Initialize the parser with a description
parser = argparse.ArgumentParser(description="Run an AI/ML job from YAML/JSON configs.")

# Define arguments
parser.add_argument("files", nargs="*", help="One or more configuration files for the job.")
parser.add_argument("-e", "--extras", nargs="*", default=[], help="Additional key=value config strings.")
parser.add_argument("-s", "--schema", action="store_true", help="Print the settings schema and exit.")
```
For more modern or streamlined CLI development, consider libraries like Typer, Click, or Fire.
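As an illustration, here is a rough sketch of a similar interface written with Typer. It assumes Typer is installed and is not part of the `bikes` package; note that `files` is declared as required here for brevity, unlike the optional `nargs="*"` argument above.

```python
import typer

def main(
    files: list[str] = typer.Argument(..., help="One or more configuration files for the job."),
    extras: list[str] = typer.Option([], "--extras", "-e", help="Additional key=value config strings."),
    schema: bool = typer.Option(False, "--schema", "-s", help="Print the settings schema and exit."),
) -> None:
    """Run an AI/ML job from YAML/JSON configs."""
    if schema:
        # Print schema details and exit with code 0.
        typer.echo("Schema details here...")
        raise typer.Exit()
    typer.echo(f"Loading configs from: {files}")
    typer.echo(f"Applying extras: {extras}")

if __name__ == "__main__":
    typer.run(main)
```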
2. Create a Main Function
The `main` function contains your script's core logic. It's a best practice for it to accept command-line arguments and return an integer exit code: `0` for success and a non-zero value for errors. This is critical for automation, as other scripts can check the exit code to see if your tool succeeded.
```python
def main(argv: list[str] | None = None) -> int:
    """Parses arguments and executes the main application logic."""
    args = parser.parse_args(argv)
    if args.schema:
        # A simple action: print schema details and exit successfully.
        print("Schema details here...")
        return 0
    # Main application logic would go here.
    print(f"Loading configs from: {args.files}")
    print(f"Applying extras: {args.extras}")
    # ... execute job ...
    return 0
```
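Because `main()` accepts an explicit `argv` list, you can also exercise it directly from Python, for example in a unit test (a small illustration based on the script above):

```python
# Call main() with an explicit argument list instead of reading sys.argv.
exit_code = main(["config.yml", "--extras", "key=value"])
assert exit_code == 0

# The --schema flag short-circuits and still reports success.
assert main(["--schema"]) == 0
```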
3. Expose the Main Function
To allow the script to be both runnable and importable, use the `if __name__ == "__main__"` guard. This ensures the `main()` function is called only when the file is executed directly.
```python
if __name__ == "__main__":
    # This block runs only when the script is executed directly.
    # For example: python your_script.py
    raise SystemExit(main())
```
Wrapping `main()` in `raise SystemExit(...)` ensures the script exits with the integer return code from `main`.
How do you declare entrypoints in `pyproject.toml`?
To transform your script into a formal entrypoint, you declare it in your `pyproject.toml` file under the `[project.scripts]` section. This tells packaging tools like `uv` to create an executable command during installation.
```toml
[project.scripts]
bikes = "bikes.scripts:main"
```
Here’s the breakdown of the syntax `command = "path:function"`:
- `bikes`: This is the command that will be created. Users will type `bikes` in their terminal.
- `bikes.scripts:main`: This is the location of the function to execute.
  - `bikes.scripts`: The Python module path (i.e., `bikes/scripts.py`).
  - `:main`: The specific function to call within that module.
When a user installs your package, `uv` or `pip` automatically generates a small wrapper script in the environment's `bin/` directory. This wrapper imports and runs your specified function, effectively placing your tool on the system `PATH`.
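The generated wrapper is roughly equivalent to the following (a simplified sketch; the exact file that `pip` or `uv` writes differs slightly between tools and platforms):

```python
#!/path/to/your/environment/bin/python
# Simplified sketch of the console-script wrapper created at installation time.
import sys

from bikes.scripts import main

if __name__ == "__main__":
    # Exit with whatever integer code main() returns.
    sys.exit(main())
```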
How do you execute an entrypoint?
You can run your entrypoint in two primary contexts:
1. During Development
While developing, you don't need to constantly build and install your package. `uv` provides a convenient way to run your entrypoints directly from your source code:
```bash
# The 'uv run' command executes an entrypoint from the current project
$ uv run bikes config.yml --extras key=value
```
2. After Installation
For end-users or production environments, the standard workflow is to build and install the package:
```bash
# 1. Build the package into a wheel file in the dist/ directory
uv build --wheel

# 2. Install the package from the generated wheel file
pip install dist/bikes-*.whl

# 3. Run the entrypoint as a native command
bikes config.yml --extras key=value
```
How are entrypoints used in MLOps workflows?
Entrypoints are fundamental to MLOps because they create standardized, automatable components. For example, you can define entrypoints for training a model, validating data, or deploying a service. These can then be orchestrated by other systems.
Consider a workflow in Apache Airflow that runs a job on Databricks. Instead of embedding complex logic in Airflow, you can simply call your package's entrypoint. This decouples the orchestration (Airflow) from the implementation (your Python package).
```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id='databricks_run_training_pipeline',
    start_date=datetime(2023, 1, 1),
    schedule_interval='@daily',
    catchup=False,
) as dag:
    # This task tells Databricks to install and run our 'bikes' package.
    train_model_task = DatabricksSubmitRunOperator(
        task_id='train_production_model',
        json={
            "python_wheel_task": {
                "package_name": "bikes",  # The package to install
                "entry_point": "bikes",   # The entrypoint to run
                "parameters": [           # Arguments passed to the entrypoint
                    "configs/production.yml",
                    "--environment",
                    "production",
                ],
            },
        },
    )
```
The `python_wheel_task` in Databricks is configured to run the `bikes` entrypoint, demonstrating a clean separation of concerns.
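Outside dedicated orchestrators, any Python-based automation can call the installed command the same way and branch on its exit code (an illustrative snippet, assuming the `bikes` package is installed in the environment):

```python
import subprocess

# Invoke the installed 'bikes' entrypoint exactly as a shell would.
result = subprocess.run(["bikes", "configs/production.yml", "--extras", "key=value"])

# A non-zero exit code signals failure to the caller.
if result.returncode != 0:
    raise RuntimeError(f"bikes job failed with exit code {result.returncode}")
```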
What are best practices for designing entrypoints?
Well-designed entrypoints are robust, predictable, and easy to use.
- Inputs:
  - Configuration Files (YAML, JSON, etc.): Use for complex or static settings that don't change often, such as model hyperparameters or dataset paths.
  - Command-Line Arguments: Use for dynamic values that override defaults or control runtime behavior, like `--verbose` for logging levels or `--date` for a specific run date.
- Outputs & Behavior:
  - Return Meaningful Exit Codes: Always return `0` on success and a non-zero integer on failure. This is the universal signal for success or failure in shell environments and is crucial for automation.
  - Produce Structured Logs: Instead of plain `print()` statements, use a logging library. Emitting structured logs (e.g., in JSON format) makes them machine-readable, which is invaluable for monitoring and alerting systems. A minimal sketch combining this practice with exit codes follows this list.
  - Be Idempotent: Where possible, design your entrypoint so that running it multiple times with the same inputs produces the same result.
  - Respect the Single Responsibility Principle: Create distinct entrypoints for distinct tasks (e.g., `bikes-train`, `bikes-predict`, `bikes-validate-data`) rather than one massive entrypoint with many modes.
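As referenced above, here is a minimal sketch of a `main()` that applies the exit-code and structured-logging practices; the JSON-style log format and logger name are illustrative choices, not part of the `bikes` package:

```python
import logging
import sys

logger = logging.getLogger("bikes")

def main(argv: list[str] | None = None) -> int:
    # Emit log records as JSON-like lines so downstream systems can parse them.
    logging.basicConfig(
        level=logging.INFO,
        format='{"time": "%(asctime)s", "level": "%(levelname)s", "message": "%(message)s"}',
    )
    try:
        logger.info("job started")
        # ... parse argv, load configs, run the job ...
        logger.info("job finished")
        return 0  # success: the universal signal that automation relies on
    except Exception:
        logger.exception("job failed")
        return 1  # non-zero: signals failure to shells and orchestrators

if __name__ == "__main__":
    sys.exit(main())
```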