3.3. Entrypoints
What are package entrypoints?
Package entrypoints are a formal way to make your Python package's functionality accessible from the command line. Think of them as creating a "front door" to your application, allowing users and other programs to run specific functions as if they were native system commands.
By defining an entrypoint in your package's configuration, you are creating a stable, public interface. This is essential for professional-grade tools because it:
- Simplifies Execution: Users can run your tool with a simple command (e.g., `bikes-run-training`) instead of a long Python invocation (`python -m bikes.scripts.training --args...`).
- Enhances Usability: It provides a clean, standard way to interact with your package, complete with argument parsing and help messages.
- Enables Automation: Other systems, like CI/CD pipelines or workflow orchestrators (e.g., Apache Airflow), can reliably call your tool, making it a building block in larger MLOps workflows.
What is the difference between a script and an entrypoint?
While both can be executed, they differ in how they are installed, discovered, and integrated:
| Feature | Standalone Script | Package Entrypoint |
| --- | --- | --- |
| Execution | `python path/to/script.py` | `my-command` |
| Installation | Not formally installed; must know the file's path. | Installed into the environment's `PATH` via `pip` or `uv`. |
| Dependencies | Managed manually or through an external `requirements.txt`. | Explicitly defined within the package's `pyproject.toml`. |
| Discoverability | Low; requires knowledge of the project's internal structure. | High; becomes a discoverable command in the user's shell. |
| Use Case | Quick, informal tasks; internal project utilities. | Reusable, distributable tools intended for end-users or automation. |
In short, running a file directly is suitable for development, but defining an entrypoint is the standard for creating robust, distributable command-line applications.
How do you create a command-line script?
A command-line script is a Python file designed to be executed from the terminal. The foundation of a good script involves a parser for arguments, a main function for logic, and a guard for execution.
1. Create a CLI Parser
A parser handles command-line arguments, converting them into variables your script can use. Python's built-in `argparse` module is a powerful choice.
The example below sets up a parser with several common argument types:
- Positional arguments (`files`): Required inputs that are order-dependent.
- Optional arguments (`--extras`): Flags that provide additional options.
- Boolean flags (`--schema`): Switches that trigger an action when present.
```python
import argparse

# Initialize the parser with a description
parser = argparse.ArgumentParser(description="Run an AI/ML job from YAML/JSON configs.")

# Define arguments
parser.add_argument("files", nargs="*", help="One or more configuration files for the job.")
parser.add_argument("-e", "--extras", nargs="*", default=[], help="Additional key=value config strings.")
parser.add_argument("-s", "--schema", action="store_true", help="Print the settings schema and exit.")
```
For more modern or streamlined CLI development, consider libraries like Typer, Click, or Fire.
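As an illustration, here is a rough sketch of a similar interface written with Typer. It assumes Typer is installed and is not part of the `bikes` package; note that `files` is declared as required here for brevity, unlike the optional `nargs="*"` argument above.

```python
import typer

def main(
    files: list[str] = typer.Argument(..., help="One or more configuration files for the job."),
    extras: list[str] = typer.Option([], "--extras", "-e", help="Additional key=value config strings."),
    schema: bool = typer.Option(False, "--schema", "-s", help="Print the settings schema and exit."),
) -> None:
    """Run an AI/ML job from YAML/JSON configs."""
    if schema:
        # Print schema details and exit with code 0.
        typer.echo("Schema details here...")
        raise typer.Exit()
    typer.echo(f"Loading configs from: {files}")
    typer.echo(f"Applying extras: {extras}")

if __name__ == "__main__":
    typer.run(main)
```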
2. Create a Main Function
The `main` function contains your script's core logic. It's a best practice for it to accept command-line arguments and return an integer exit code: `0` for success and a non-zero value for errors. This is critical for automation, as other scripts can check the exit code to see if your tool succeeded.
```python
def main(argv: list[str] | None = None) -> int:
    """Parses arguments and executes the main application logic."""
    args = parser.parse_args(argv)
    if args.schema:
        # A simple action: print schema details and exit successfully.
        print("Schema details here...")
        return 0
    # Main application logic would go here.
    print(f"Loading configs from: {args.files}")
    print(f"Applying extras: {args.extras}")
    # ... execute job ...
    return 0
```
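Because `main()` accepts an explicit `argv` list, you can also exercise it directly from Python, for example in a unit test (a small illustration based on the script above):

```python
# Call main() with an explicit argument list instead of reading sys.argv.
exit_code = main(["config.yml", "--extras", "key=value"])
assert exit_code == 0

# The --schema flag short-circuits and still reports success.
assert main(["--schema"]) == 0
```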
3. Expose the Main Function
To allow the script to be both runnable and importable, use the `if __name__ == "__main__"` guard. This ensures the `main()` function is called only when the file is executed directly.
```python
if __name__ == "__main__":
    # This block runs only when the script is executed directly.
    # For example: python your_script.py
    raise SystemExit(main())
```
Wrapping `main()` in `raise SystemExit(...)` ensures the script exits with the integer return code from `main`.
How do you declare entrypoints in `pyproject.toml`?
To transform your script into a formal entrypoint, you declare it in your `pyproject.toml` file under the `[project.scripts]` section. This tells packaging tools like `uv` to create an executable command during installation.
```toml
[project.scripts]
bikes = "bikes.scripts:main"
```
Here’s the breakdown of the syntax `command = "path:function"`:
- `bikes`: This is the command that will be created. Users will type `bikes` in their terminal.
- `bikes.scripts:main`: This is the location of the function to execute.
  - `bikes.scripts`: The Python module path (i.e., `bikes/scripts.py`).
  - `:main`: The specific function to call within that module.
When a user installs your package, `uv` or `pip` automatically generates a small wrapper script in the environment's `bin/` directory. This wrapper imports and runs your specified function, effectively placing your tool on the system `PATH`.
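The generated wrapper is roughly equivalent to the following (a simplified sketch; the exact file that `pip` or `uv` writes differs slightly between tools and platforms):

```python
#!/path/to/your/environment/bin/python
# Simplified sketch of the console-script wrapper created at installation time.
import sys

from bikes.scripts import main

if __name__ == "__main__":
    # Exit with whatever integer code main() returns.
    sys.exit(main())
```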
How do you execute an entrypoint?
You can run your entrypoint in two primary contexts:
1. During Development
While developing, you don't need to constantly build and install your package. `uv` provides a convenient way to run your entrypoints directly from your source code:
```bash
# The 'uv run' command executes an entrypoint from the current project
$ uv run bikes config.yml --extras key=value
```
2. After Installation
For end-users or production environments, the standard workflow is to build and install the package:
```bash
# 1. Build the package into a wheel file in the dist/ directory
uv build --wheel

# 2. Install the package from the generated wheel file
pip install dist/bikes-*.whl

# 3. Run the entrypoint as a native command
bikes config.yml --extras key=value
```
How are entrypoints used in MLOps workflows?
Entrypoints are fundamental to MLOps because they create standardized, automatable components. For example, you can define entrypoints for training a model, validating data, or deploying a service. These can then be orchestrated by other systems.
Consider a workflow in Apache Airflow that runs a job on Databricks. Instead of embedding complex logic in Airflow, you can simply call your package's entrypoint. This decouples the orchestration (Airflow) from the implementation (your Python package).
```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id='databricks_run_training_pipeline',
    start_date=datetime(2023, 1, 1),
    schedule_interval='@daily',
    catchup=False,
) as dag:
    # This task tells Databricks to install and run our 'bikes' package.
    train_model_task = DatabricksSubmitRunOperator(
        task_id='train_production_model',
        json={
            "python_wheel_task": {
                "package_name": "bikes",  # The package to install
                "entry_point": "bikes",   # The entrypoint to run
                "parameters": [           # Arguments passed to the entrypoint
                    "configs/production.yml",
                    "--environment",
                    "production",
                ],
            },
        },
    )
```
The `python_wheel_task` in Databricks is configured to run the `bikes` entrypoint, demonstrating a clean separation of concerns.
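Outside dedicated orchestrators, any Python-based automation can call the installed command the same way and branch on its exit code (an illustrative snippet, assuming the `bikes` package is installed in the environment):

```python
import subprocess

# Invoke the installed 'bikes' entrypoint exactly as a shell would.
result = subprocess.run(["bikes", "configs/production.yml", "--extras", "key=value"])

# A non-zero exit code signals failure to the caller.
if result.returncode != 0:
    raise RuntimeError(f"bikes job failed with exit code {result.returncode}")
```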
What are best practices for designing entrypoints?
Well-designed entrypoints are robust, predictable, and easy to use.
- Inputs:
  - Configuration Files (YAML, JSON, etc.): Use for complex or static settings that don't change often, such as model hyperparameters or dataset paths.
  - Command-Line Arguments: Use for dynamic values that override defaults or control runtime behavior, like `--verbose` for logging levels or `--date` for a specific run date.
- Outputs & Behavior:
  - Return Meaningful Exit Codes: Always return `0` on success and a non-zero integer on failure. This is the universal signal for success or failure in shell environments and is crucial for automation.
  - Produce Structured Logs: Instead of plain `print()` statements, use a logging library. Emitting structured logs (e.g., in JSON format) makes them machine-readable, which is invaluable for monitoring and alerting systems. A minimal sketch combining this practice with exit codes follows this list.
  - Be Idempotent: Where possible, design your entrypoint so that running it multiple times with the same inputs produces the same result.
  - Respect the Single Responsibility Principle: Create distinct entrypoints for distinct tasks (e.g., `bikes-train`, `bikes-predict`, `bikes-validate-data`) rather than one massive entrypoint with many modes.
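As referenced above, here is a minimal sketch of a `main()` that applies the exit-code and structured-logging practices; the JSON-style log format and logger name are illustrative choices, not part of the `bikes` package:

```python
import logging
import sys

logger = logging.getLogger("bikes")

def main(argv: list[str] | None = None) -> int:
    # Emit log records as JSON-like lines so downstream systems can parse them.
    logging.basicConfig(
        level=logging.INFO,
        format='{"time": "%(asctime)s", "level": "%(levelname)s", "message": "%(message)s"}',
    )
    try:
        logger.info("job started")
        # ... parse argv, load configs, run the job ...
        logger.info("job finished")
        return 0  # success: the universal signal that automation relies on
    except Exception:
        logger.exception("job failed")
        return 1  # non-zero: signals failure to shells and orchestrators

if __name__ == "__main__":
    sys.exit(main())
```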