
4.2. Testing

What are software tests?

Software tests are a set of automated procedures used to ensure that software behaves as intended. They play a critical role in maintaining software reliability, functionality, and in preventing regressions.

Tests can be categorized based on their complexity and scope:

  • Unit Test: Focuses on individual components or functions, verifying that each part works in isolation. For example, testing a single function that calculates the sum of two numbers.
  • Regression Test: Ensures that previously developed and tested software still performs correctly after it has been changed or interfaced with other software. This type of test is crucial for identifying unintended side effects of updates.
  • End-to-End Test: Simulates real user scenarios from start to finish, ensuring the system as a whole operates as expected. It's the most comprehensive form of testing, covering the interaction between various parts of the software and external systems.

Why are tests important in Python projects?

Testing is indispensable in Python development for various reasons:

  • Quality Assurance: Confirms that the software fulfills its intended requirements and functions correctly.
  • Regression Prevention: Aids in preventing regressions, where updates unintentionally alter or break existing features.
  • Refactoring Confidence: Empowers developers to refactor and enhance code with the assurance that they won't unknowingly disrupt existing functionality.
  • Living Documentation: Acts as practical documentation that clarifies how the code is meant to operate.

While developers often use print statements for debugging, tests offer more durable assurances: once a behavior is validated by a test, it is re-verified automatically on every code change, whereas print statements require manual inspection. Testing is especially critical in dynamic languages like Python, where no compiler validates the program before execution.

Which tool should you use to run Python tests?

Although Python includes the unittest module, it tends to be verbose. We suggest using pytest, a contemporary framework that simplifies writing small, clear tests, and scales well for complex application and library testing:

# content of tests/test_sample.py
def inc(x):
    return x + 1

def test_answer():
    assert inc(3) == 4 # assert the function matches the intended behavior

To run pytest on your code base for behavior validation:

# pytest installation (one-time)
poetry add pytest
# pytest execution
poetry run pytest tests/

You can enhance pytest with additional plugins (an installation example follows this list):

  • pytest-cov: This plugin generates coverage reports, helping identify untested parts of the codebase:
# generate a coverage report
pytest --cov=src/ tests/
  • pytest-xdist: Enables parallel test execution, utilizing all available CPU cores to speed up the testing process:
# run pytest using all computer cores
pytest -n auto tests/
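
Both plugins can be installed alongside pytest with Poetry, for example:

# plugins installation (one-time)
poetry add pytest-cov pytest-xdist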

How should you configure your code base for testing?

To prevent committing pytest cache files to git, add .pytest_cache to your .gitignore file.
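
For instance, the relevant .gitignore entries could look like this (the .coverage file is produced by coverage.py when generating coverage reports):

# .gitignore
.pytest_cache/
.coverage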

VS Code natively supports pytest. Activate this feature in your *.code-workspace file with the following settings:

{
    "settings": {
        "python.testing.pytestEnabled": true,
        "python.testing.pytestArgs": [
            "tests"
        ]
    }
}

Adjust the pytest and coverage configuration globally in your pyproject.toml file:

[tool.coverage.run]
branch = true  # enable branch coverage (e.g., if/else paths)
source = ["src"]  # set the default source folder
omit = ["__main__.py"]  # exclude certain files from coverage report

[tool.pytest.ini_options]
addopts = "--verbosity=2"  # increase the verbosity level
pythonpath = ["src"]  # set the default python path

How should you structure your tests?

Organize your tests in a dedicated tests folder, mirroring the structure of your project. For each module in your project, create a corresponding test file in the tests folder, naming it after the module but prefixed with test_. For instance, tests for a module named models.py should be in a file named test_models.py.

src/
    bikes/
        models.py
        metrics.py
        datasets.py
tests/
    test_models.py
    test_metrics.py
    test_datasets.py

Begin each test function name with test_ so that pytest discovers it automatically and its purpose is clear. This organization helps maintain clarity and ease of navigation within the test suite. You can also structure each test case into 3 steps: Given, When, Then:

# datasets and schemas are modules from your own project package
# inputs_reader is a fixture providing a dataset reader (see the next section on fixtures)
def test_inputs_schema(inputs_reader: datasets.Reader) -> None:
    # given
    schema = schemas.InputsSchema
    # when
    data = inputs_reader.read()
    # then
    assert schema.check(data) is not None, "Inputs data should be valid!"

How can you define reusable test components?

pytest introduces the concept of fixtures, powerful tools for setting up objects that can be reused across multiple tests. You can define fixtures either in the module where they're used or in a tests/conftest.py file to share them throughout your tests. This is particularly useful in MLOps code bases, where you might not want to retrain models or reload datasets before each test. Here's how you can define and use fixtures:

# in conftest.py
import os
from pathlib import Path

import pytest

@pytest.fixture(scope="session")
def tests_path() -> str:
    """Return the path of the tests folder."""
    file_path = os.path.abspath(__file__)
    parent_directory = os.path.dirname(file_path)
    return parent_directory

@pytest.fixture(scope="function")
def tmp_outputs_path(tmp_path: Path) -> str:
    """Return a tmp path for the outputs dataset."""
    # tmp_path is a built-in pytest fixture providing a unique temporary directory
    return os.path.join(tmp_path, "outputs.parquet")

These fixtures, especially when shared through conftest.py, set up a common testing environment across the code base, ensuring efficiency and consistency. Session-scoped fixtures are created once and reused for the whole test session, while function-scoped fixtures are recreated before each test.
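
As a sketch, a test can use these fixtures simply by declaring them as parameters; the file-writing logic below is a hypothetical stand-in for your real dataset code:

# in test_datasets.py
import os

def test_outputs_are_saved(tmp_outputs_path: str) -> None:
    """pytest injects the fixture by matching the parameter name."""
    # given: some content standing in for a real outputs dataset
    content = "predictions"
    # when: the code under test writes its outputs to the temporary path
    with open(tmp_outputs_path, "w") as file:
        file.write(content)
    # then: the file exists, and the temporary folder is discarded after the test
    assert os.path.exists(tmp_outputs_path), "Outputs file should be created!"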

How can you avoid repetition in your test scenarios?

To minimize repetition in tests and cover a variety of scenarios, pytest.mark.parametrize allows you to run the same test function with different sets of arguments. This approach is akin to calling the function multiple times with various inputs:

import pytest

@pytest.mark.parametrize(
    "name, interval, greater_is_better",
    [
        ("mean_squared_error", [0, float("inf")], False),
        ("mean_absolute_error", [0, float("inf")], False),
    ],
)
def test_sklearn_metric(name: str, interval: list, greater_is_better: bool) -> None:
    # Example test body
    assert name in ["mean_squared_error", "mean_absolute_error"]
    assert isinstance(interval, list)
    assert isinstance(greater_is_better, bool)

In this example, test_sklearn_metric executes twice, once per parameter set, verifying the function behaves correctly across both scenarios. This demonstrates how pytest.mark.parametrize broadens test coverage with minimal code duplication.

How can you validate the output and exceptions of your program?

pytest offers several out-of-the-box fixtures for thorough testing, including output capture and exception testing. Here's how you can validate program output and handle exceptions:

import json
import pytest

def test_json_print(capsys) -> None:
    # Example setup (replace with actual command execution)
    print(json.dumps({"key": "value"}))  # Simulated program output
    captured = capsys.readouterr()
    # Validate
    assert captured.err == "", "Captured error should be empty!"
    assert json.loads(captured.out), "Captured output should be a valid JSON!"

def test_main_no_configs() -> None:
    # given
    argv = []  # no config files are passed to the program
    # when
    with pytest.raises(RuntimeError) as error:
        # replace with the actual entry point, e.g., a main(argv) call that should raise
        raise RuntimeError("No configs provided.")  # simulated failure
    # then
    assert "No configs provided." in str(error.value), "Expected RuntimeError was not raised!"

These examples illustrate capturing program output to verify it's a valid JSON and ensuring the program raises the expected exception under certain conditions, enhancing the robustness of your testing strategy.

Is it easy to define tests for MLOps code bases?

Testing MLOps code bases poses unique challenges compared to other types of code. You must deal with complex data structures, such as dataframes, and the inherent randomness in machine learning models. Furthermore, ML processes like model tuning can be time-consuming, requiring careful design of your tests to facilitate quicker iterations.
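
For instance, here is a minimal sketch of how you might keep an ML-flavored test fast and deterministic: fix the random seed and use a tiny synthetic dataframe (numpy and pandas are assumed available; the "training" step is a simple least-squares fit used purely for illustration):

import numpy as np
import pandas as pd

def test_training_is_reproducible() -> None:
    # given: a small synthetic dataset generated with a fixed seed
    rng = np.random.default_rng(seed=42)
    data = pd.DataFrame({"x": rng.normal(size=20)})
    data["y"] = 2 * data["x"] + rng.normal(scale=0.1, size=20)
    # when: "training" twice on the same data (a least-squares fit stands in for a model)
    slope_a = np.polyfit(data["x"], data["y"], deg=1)[0]
    slope_b = np.polyfit(data["x"], data["y"], deg=1)[0]
    # then: results are identical and close to the known ground truth
    assert slope_a == slope_b, "Training should be reproducible!"
    assert 1.5 < slope_a < 2.5, "Learned slope should be close to the true value!"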

Gaining proficiency in writing effective tests for ML code bases comes with experience. As you become more familiar with the nuances of your code and ML workflows, you'll develop strategies for efficient and meaningful testing. Ensuring your ML components are not the weakest link in your software applications is crucial, and a solid testing foundation will bolster confidence in your development process.

What are best practices for writing unit tests?

  1. Write Readable and Clear Tests: Ensure your tests are straightforward and easy to understand, with descriptive names and necessary comments.
  2. Keep Tests Independent: Each test should run independently of others, allowing any order of execution.
  3. Use Fixtures for Setup and Teardown: Utilize fixtures for consistent setup and cleanup, reducing redundancy across tests.
  4. Regularly Run Your Tests: Integrate testing into your development and CI/CD workflows to catch issues early.
  5. Aim for High Test Coverage: Cover as much of your code as possible, especially critical paths, to ensure reliability.
  6. Keep Tests Fast: Optimize test speed to maintain efficiency in your development cycle.
  7. Review and Update Tests Regularly: As your codebase evolves, ensure your tests remain relevant and reflective of current functionalities.
  8. Test for Different Scenarios and Edge Cases: Beyond happy paths, test for potential failure modes and edge conditions to ensure robustness.
  9. Strive for at least 80% Code Coverage: This target helps ensure that the majority of your codebase is verified by tests, safeguarding against regressions and encouraging a culture of quality (see the command example after this list).
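
For example, pytest-cov can enforce such a threshold and fail the run when coverage drops below it:

# fail the test run if coverage is below 80%
poetry run pytest --cov=src/ --cov-fail-under=80 tests/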

Testing additional resources