3.1. Modules
What are Python modules?
Python modules are files containing Python code that serve as a fundamental organizational unit in Python programming. These modules encapsulate definitions such as functions, classes, variables, and constants, making it easier to organize, reuse, and share code across different parts of a program.
Modules in Python are more than just physical files; they represent a namespace for Python objects. Importing a module allows you to access these objects in your current script or interactive session.
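As a small illustration of the namespace idea, the sketch below imports the standard-library math module in two common ways. The variable names root and area are only illustrative.

```python
import math                # bind the module object to the name `math`
from math import pi, sqrt  # bind selected objects directly in the current namespace

# Objects can be reached through the module's namespace...
root = math.sqrt(16)

# ...or used directly once they have been imported by name.
area = pi * sqrt(4) ** 2

print(root)
print(area)
```

The first form keeps it obvious where each name comes from, which tends to help in larger files.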
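To discover the physical location of a module, use its __file__ attribute. Built-in modules that are compiled into the interpreter (math on some platforms, for example) have no __file__, so this example uses json, which is written in pure Python:
import json
print(json.__file__)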
You can also enumerate the objects inside a module with the dir() function:
import math
print(dir(math))
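The two inspections above can be combined into a pair of small helpers. This is a minimal sketch: module_origin and public_names are hypothetical helper names, while importlib.util.find_spec and dir() are standard. Using the module spec avoids relying on __file__, which built-in modules do not define.

```python
import importlib.util
import math


def module_origin(name):
    """Return where the import system would load `name` from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def public_names(module):
    """Return the names in a module that do not start with an underscore."""
    return [n for n in dir(module) if not n.startswith("_")]


print(module_origin("json"))   # a path ending in json/__init__.py
print(module_origin("sys"))    # "built-in": there is no file on disk
print(public_names(math)[:5])  # first few public names of the math module
```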
Why do you need Python modules?
Python modules are essential for managing complexity in your projects. They provide a way to segment your code into distinct namespaces, making your projects more organized, readable, and maintainable. For example, in a machine learning project, you might have separate modules for models (models.py), data processing (datasets.py), and utility functions (utils.py). This separation helps in understanding, testing, and collaborating on large codebases.
Modules become indispensable as your project grows beyond a simple script. While a project with fewer than 100 lines of code might not need separate modules, larger projects benefit greatly from a modular structure.
How should you create a Python module?
Creating a Python module is as simple as creating a .py file within your project package. For example, in a project structured with a src directory, you might organize your modules as follows:
$ touch src/bikes/models.py
$ touch src/bikes/datasets.py
This creates two modules, models.py and datasets.py, under the bikes package. Each module can then contain specific functionalities related to your project, such as defining data models or handling dataset loading and preprocessing.
How should you import your Python module?
Python resolves imports by searching the directories listed in sys.path, much as a Unix shell searches the directories in PATH for an executable. When importing a module, Python searches these directories in order and imports the first match.
To see what directories are in your search path, you can use:
import sys
print(sys.path)
After installing your package locally (e.g., using poetry install), the directory containing your package is added to sys.path, allowing you to import its modules by name without specifying their full path.
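The search behavior described above can be observed with a throwaway module. This is a sketch for demonstration only: greetings_demo and hello are hypothetical names, and editing sys.path by hand stands in for what a proper package installation would do for you.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary directory holding a hypothetical module named `greetings_demo`.
workdir = Path(tempfile.mkdtemp())
(workdir / "greetings_demo.py").write_text(
    'def hello(name):\n    return f"Hello, {name}!"\n'
)

# The module becomes importable once its directory is on sys.path.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()  # make sure the import system notices the new file

greetings_demo = importlib.import_module("greetings_demo")
message = greetings_demo.hello("bikes")

print(message)
print(greetings_demo.__file__)  # points inside the temporary directory
```

In a real project, prefer installing the package over modifying sys.path at runtime, so that imports behave the same for every script and test.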
How should you organize your Python modules?
Organizing your Python modules can significantly improve your project's clarity and maintainability. Here are a few strategies for structuring your modules:
- Flat Layout: Organize modules by major concept or component, using nouns for names. Examples include:
models.py
datasets.py
services.py
splitters.py
- IO and Domain Separation: Separate modules based on whether they interact with the external world (I/O) or hold internal logic (domain), an approach inspired by the IO monad in Haskell and by Domain-Driven Design. For instance:
- IO Layer:
io/services.py
io/datasets.py
- Domain Layer:
domain/models.py
domain/schemas.py
- High-Level Tasks:
training.py
tuning.py
inference.py
The latter approach distinguishes between the unpredictable nature of I/O operations and the more controlled domain logic, with high-level tasks integrating these layers.
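The IO and domain separation can be sketched in a single file, with comments marking where each part would live. All names here (Ride, total_rides, read_rides, run_report) are hypothetical and chosen only for illustration; they are not part of the project described above.

```python
import csv
import io
from dataclasses import dataclass
from typing import TextIO


# --- Domain layer (would live in domain/models.py): pure logic, no I/O ---
@dataclass(frozen=True)
class Ride:
    station: str
    count: int


def total_rides(rides: list[Ride]) -> int:
    """Sum the ride counts; depends only on its arguments."""
    return sum(ride.count for ride in rides)


# --- IO layer (would live in io/datasets.py): talks to the outside world ---
def read_rides(source: TextIO) -> list[Ride]:
    """Parse CSV text with `station` and `count` columns into Ride objects."""
    reader = csv.DictReader(source)
    return [Ride(station=row["station"], count=int(row["count"])) for row in reader]


# --- High-level task (would live in a module such as inference.py) ---
def run_report(source: TextIO) -> int:
    """Combine the IO and domain layers."""
    return total_rides(read_rides(source))


# An in-memory file stands in for a real dataset.
sample = io.StringIO("station,count\nA,3\nB,4\n")
result = run_report(sample)
print(result)  # 7
```

Because total_rides never touches files or the network, it can be tested with plain Python objects, while failures in read_rides point to the data source.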
What are the risks of using Python modules?
A notable risk when using Python modules is that importing them can trigger side effects. Side effects are operations that run when a module is imported, beyond defining its functions, classes, and variables, and they can lead to unexpected behavior or bugs if not handled carefully.
# A module with a potentially harmful operation
# lib.py
import os
os.system("rm -rf /") # This command is extremely dangerous!
# main.py
import lib # Importing lib.py could lead to data loss
To minimize this risk, restrict side effects to specific entry points and ensure modules primarily contain definitions such as functions and classes. Avoid executing code with side effects at the module level, so that imports stay clean and predictable.
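One common way to follow this advice is to place side effects in a function that runs only when the file is executed as a script. The sketch below assumes a hypothetical module with illustrative names (select_temporary_files, main); importing it defines the functions and does nothing else.

```python
def select_temporary_files(paths: list[str]) -> list[str]:
    """Return the paths ending in .tmp; nothing is deleted or modified."""
    return [path for path in paths if path.endswith(".tmp")]


def main() -> None:
    # Side effects (here, printing) are confined to this entry point.
    print(select_temporary_files(["a.tmp", "b.txt"]))


# This block runs when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main()
```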