4.2. Testing
What are software tests?
Software tests are a set of automated procedures used to ensure that software behaves as intended. They play a critical role in maintaining software reliability, functionality, and in preventing regressions.
Tests can be categorized based on their complexity and scope (a short pytest sketch after this list illustrates the first two categories):
- Unit Test: Focuses on individual components or functions, verifying that each part works in isolation. For example, testing a single function that calculates the sum of two numbers.
- Regression Test: Ensures that previously developed and tested software still performs correctly after it has been changed or interfaced with other software. This type of test is crucial for identifying unintended side effects of updates.
- End-to-End Test: Simulates real user scenarios from start to finish, ensuring the system as a whole operates as expected. It's the most comprehensive form of testing, covering the interaction between various parts of the software and external systems.
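As a minimal sketch, assuming a hypothetical add function, the first two categories can look like this with pytest; an end-to-end test would instead drive the whole application through its public interface:
# hypothetical example: unit and regression tests for a tiny function
def add(x: int, y: int) -> int:
    return x + y

def test_add_unit() -> None:
    # unit test: check the function in isolation
    assert add(2, 3) == 5

def test_add_regression() -> None:
    # regression test: pin down a behavior validated in a previous version,
    # so later changes cannot silently break it
    assert add(-1, 1) == 0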
Why are tests important in Python projects?
Testing is indispensable in Python development for various reasons:
- Quality Assurance: Confirms that the software fulfills its intended requirements and functions correctly.
- Regression Prevention: Aids in preventing regressions, where updates unintentionally alter or break existing features.
- Refactoring Confidence: Empowers developers to refactor and enhance code with the assurance that they won't unknowingly disrupt existing functionality.
- Documentation: Acts as practical documentation that clarifies how the code is meant to operate.
While developers often use print statements for debugging, tests offer more durable assurances. Once a behavior is validated through tests, it can be re-verified automatically at each code change, unlike print statements, which require manual inspection by the developer. Testing is especially critical in a dynamic language like Python, where no compilation step validates the program before execution.
Which tool should you use to run Python tests?
Although Python includes the unittest module, it tends to be verbose. We suggest using pytest, a contemporary framework that simplifies writing small, clear tests and scales well for complex application and library testing:
# content of tests/test_sample.py
def inc(x):
    return x + 1

def test_answer():
    assert inc(3) == 4  # assert the function matches the intended behavior
To run pytest on your code base for behavior validation:
# pytest installation (one-time)
uv add pytest
# pytest execution
uv run pytest tests/
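During development, you often want to run only a subset of the suite. The options below are standard pytest flags, applied here to the sample test file above:
# run a single test file
uv run pytest tests/test_sample.py
# run only tests whose name matches an expression
uv run pytest -k "test_answer" tests/
# stop at the first failure
uv run pytest -x tests/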
You can enhance pytest with additional plugins:
- pytest-cov: This plugin generates coverage reports, helping identify untested parts of the codebase:
# generate a coverage report
pytest --cov=src/ tests/
- pytest-xdist: Enables parallel test execution, utilizing all available CPU cores to speed up the testing process (see the combined example after this list):
# run pytest using all computer cores
pytest -n auto tests/
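Both plugins can be installed with uv and combined in a single invocation; the paths follow the project layout used in this section:
# install both plugins (one-time)
uv add pytest-cov pytest-xdist
# run tests in parallel and collect coverage
uv run pytest -n auto --cov=src/ tests/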
How should you configure your code base for testing?
To prevent committing pytest cache files to git, add .pytest_cache to your .gitignore file.
VS Code natively supports pytest. Activate this feature in your *.code-workspace file with the following settings:
{
    "settings": {
        "python.testing.pytestEnabled": true,
        "python.testing.pytestArgs": [
            "tests"
        ]
    }
}
Adjust the pytest configuration globally in your pyproject.toml file:
[tool.coverage.run]
branch = true # measure branch coverage (e.g., if/else paths)
source = ["src"] # set the default source folder
omit = ["__main__.py"] # exclude certain files from the coverage report

[tool.pytest.ini_options]
addopts = "--verbosity=2" # increase the verbosity level
pythonpath = ["src"] # set the default python path
How should you structure your tests?
Organize your tests in a dedicated tests folder, mirroring the structure of your project. For each module in your project, create a corresponding test file in the tests folder, naming it after the module but prefixed with test_. For instance, tests for a module named models.py should be in a file named test_models.py.
src/
    bikes/
        models.py
        metrics.py
        datasets.py
tests/
    test_models.py
    test_metrics.py
    test_datasets.py
Begin each test function with test_ to clearly indicate its purpose. This organization helps in maintaining clarity and ease of navigation within the test suite. You can also structure each test case in three steps: Given, When, Then:
def test_inputs_schema(inputs_reader: datasets.Reader) -> None:
    # given
    schema = schemas.InputsSchema
    # when
    data = inputs_reader.read()
    # then
    assert schema.check(data) is not None, "Inputs data should be valid!"
How can you define reusable test components?
pytest introduces the concept of fixtures, powerful tools for setting up objects that can be reused across multiple tests. You can define fixtures either in the module where they are used or in a tests/conftest.py file to share them throughout your tests. This is particularly useful in MLOps code bases, where you might not want to retrain models or reload datasets before each test. Here's how you can define and use fixtures:
# in conftest.py
import os
import pathlib

import pytest

@pytest.fixture(scope="session")
def tests_path() -> str:
    """Return the path of the tests folder."""
    file_path = os.path.abspath(__file__)
    parent_directory = os.path.dirname(file_path)
    return parent_directory

@pytest.fixture(scope="function")
def tmp_outputs_path(tmp_path: pathlib.Path) -> str:
    """Return a tmp path for the outputs dataset."""
    return os.path.join(tmp_path, "outputs.parquet")
These fixtures, especially when utilized in a shared conftest.py, facilitate setting up a common testing environment across the code base, ensuring efficiency and consistency. Fixtures with a session scope are created once and shared across the whole test session, while function-scoped fixtures are recreated before each test.
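As an illustration, a test receives these fixtures simply by declaring parameters with matching names; the test below is a hypothetical consumer of the two fixtures defined above:
# in tests/test_datasets.py (illustrative)
import os

def test_fixture_injection(tests_path: str, tmp_outputs_path: str) -> None:
    # given: pytest injects the fixtures by matching parameter names
    # when
    tests_folder_exists = os.path.isdir(tests_path)
    # then
    assert tests_folder_exists, "Tests folder should exist!"
    assert tmp_outputs_path.endswith("outputs.parquet"), "Output path should target a parquet file!"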
How can you avoid repetition in your test scenarios?
To minimize repetition in tests and cover a variety of scenarios, pytest.mark.parametrize allows you to run the same test function with different sets of arguments. This approach is akin to calling the function multiple times with various inputs:
import pytest

@pytest.mark.parametrize(
    "name, interval, greater_is_better",
    [
        ("mean_squared_error", [0, float("inf")], False),
        ("mean_absolute_error", [0, float("inf")], False),
    ],
)
def test_sklearn_metric(name: str, interval: list, greater_is_better: bool) -> None:
    # Example test body
    assert name in ["mean_squared_error", "mean_absolute_error"]
    assert isinstance(interval, list)
    assert isinstance(greater_is_better, bool)
In this example, test_sklearn_metric will execute twice, verifying the function behaves correctly across both scenarios. This demonstrates how pytest.mark.parametrize effectively broadens test coverage with minimal code duplication.
How can you validate the output and exceptions of your program?
pytest offers several out-of-the-box fixtures for thorough testing, including output capture and exception testing. Here's how you can validate program output and handle exceptions:
import json

import pytest

def test_json_print(capsys: pytest.CaptureFixture[str]) -> None:
    # Example setup (replace with actual command execution)
    print(json.dumps({"key": "value"}))  # Simulated program output
    captured = capsys.readouterr()
    # Validate
    assert captured.err == "", "Captured error should be empty!"
    assert json.loads(captured.out), "Captured output should be valid JSON!"

def test_main_no_configs() -> None:
    # given
    argv: list[str] = []  # arguments that would be passed to the program's entry point
    # when
    with pytest.raises(RuntimeError) as error:
        # Replace with the actual function call that should raise an error
        raise RuntimeError("No configs provided.")
    # then
    assert "No configs provided." in str(error.value), "Error message should mention the missing configs!"
These examples illustrate capturing program output to verify it is valid JSON and ensuring the program raises the expected exception under certain conditions, enhancing the robustness of your testing strategy.
Is it easy to define tests for MLOps code bases?
Testing MLOps code bases poses unique challenges compared to other types of code. You must deal with complex data structures, such as dataframes, and the inherent randomness in machine learning models. Furthermore, ML processes like model tuning can be time-consuming, requiring careful design of your tests to facilitate quicker iterations.
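One common tactic, sketched below under the assumption that your code relies on random and numpy for randomness, is to fix the seeds in an automatically applied fixture so that model-related tests become deterministic:
# in conftest.py (illustrative)
import random

import numpy as np
import pytest

@pytest.fixture(autouse=True)
def fix_random_seeds() -> None:
    """Fix the random seeds so ML tests are reproducible."""
    random.seed(42)
    np.random.seed(42)
For long-running steps such as model tuning, you can also register a custom marker (e.g., slow) under [tool.pytest.ini_options] and deselect it during quick iterations with pytest -m "not slow".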
Gaining proficiency in writing effective tests for ML code bases comes with experience. As you become more familiar with the nuances of your code and ML workflows, you'll develop strategies for efficient and meaningful testing. Ensuring your ML components are not the weakest link in your software applications is crucial, and a solid testing foundation will bolster confidence in your development process.
What are best practices for writing unit tests?
- Write Readable and Clear Tests: Ensure your tests are straightforward and easy to understand, with descriptive names and necessary comments.
- Keep Tests Independent: Each test should run independently of others, allowing any order of execution.
- Use Fixtures for Setup and Teardown: Utilize fixtures for consistent setup and cleanup, reducing redundancy across tests.
- Regularly Run Your Tests: Integrate testing into your development and CI/CD workflows to catch issues early.
- Aim for High Test Coverage: Cover as much of your code as possible, especially critical paths, to ensure reliability.
- Keep Tests Fast: Optimize test speed to maintain efficiency in your development cycle.
- Review and Update Tests Regularly: As your codebase evolves, ensure your tests remain relevant and reflective of current functionalities.
- Test for Different Scenarios and Edge Cases: Beyond happy paths, test for potential failure modes and edge conditions to ensure robustness.
- Strive for at least 80% Code Coverage: This target helps ensure that the majority of your codebase is verified by tests, safeguarding against regressions and encouraging a culture of quality (see the configuration sketch below).
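If you want to enforce this threshold automatically, coverage.py supports a fail_under option (pytest-cov exposes the same check as the --cov-fail-under flag); the snippet below extends the pyproject.toml configuration shown earlier:
[tool.coverage.report]
fail_under = 80 # fail the coverage report if total coverage drops below 80%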