.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
Airflow Unit Tests
==================
All unit tests for Apache Airflow are run using `pytest <http://doc.pytest.org/en/latest/>`_.
**The outline for this document in GitHub is available via the button in the top-right corner (icon with 3 dots and 3 lines).**
Writing Unit Tests
------------------
Follow these guidelines when writing unit tests:
* For standard unit tests that do not require integration with external systems, ensure all communications are simulated (mocked).
* All Airflow tests are run with ``pytest``. Ensure your IDE or runners (see below) are configured to use ``pytest`` by default.
* For tests, use standard Python "asserts" and ``pytest`` decorators/context managers for testing rather than ``unittest`` ones. See `pytest docs <http://doc.pytest.org/en/latest/assert.html>`__ for details.
* Use the ``pytest.mark.parametrize`` marker for tests that have variations in parameters. See `pytest docs <https://docs.pytest.org/en/latest/how-to/parametrize.html>`__ for details.
* Use ``pytest.warns`` to capture warnings instead of the ``recwarn`` fixture. We aim for zero warnings in our tests; therefore, we run pytest with ``--disable-warnings`` and utilize a custom warning capture system.
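The guidelines above can be sketched in one small test module. This is only an illustration: ``normalize_conn_id`` is a hypothetical function invented for the example and is not part of Airflow.

```python
import warnings

import pytest


def normalize_conn_id(conn_id: str) -> str:
    """Hypothetical function under test (not an Airflow API)."""
    if not conn_id.strip():
        raise ValueError("conn_id must not be empty")
    if conn_id != conn_id.lower():
        warnings.warn("Upper-case connection ids are deprecated", DeprecationWarning, stacklevel=2)
    return conn_id.strip().lower()


# One test function covers several inputs; explicit ids keep the test names readable.
@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        pytest.param("my_conn", "my_conn", id="already-normalized"),
        pytest.param("  my_conn ", "my_conn", id="surrounding-whitespace"),
    ],
)
def test_normalize_conn_id(raw, expected):
    # Plain ``assert`` instead of ``unittest``-style ``assertEqual``.
    assert normalize_conn_id(raw) == expected


def test_normalize_conn_id_rejects_empty():
    with pytest.raises(ValueError, match="must not be empty"):
        normalize_conn_id("   ")


def test_normalize_conn_id_warns_on_upper_case():
    # The warning is captured explicitly, so it does not leak into the test run.
    with pytest.warns(DeprecationWarning, match="deprecated"):
        assert normalize_conn_id("My_Conn") == "my_conn"
```

No external system is involved here, so nothing needs to be mocked; the same layout applies when the function under test calls a mocked client.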
Handling warnings
.................
By default, specific warnings are prohibited in new tests:
* ``airflow.exceptions.AirflowProviderDeprecationWarning``
Any test triggering this warning without capturing it will fail.
.. code-block:: console
[Breeze:3.10.19] root@91e633d08aa8:/opt/airflow# pytest tests/models/test_dag.py::TestDag::test_clear_dag
...
FAILED tests/models/test_dag.py::TestDag::test_clear_dag[None-None] - airflow.exceptions.RemovedInAirflow3Warning: Calling `DAG.create_dagrun()` without an explicit data interval is deprecated
**NOTE:** As of Airflow 3.0, the test file ``tests/models/test_dag.py`` has been relocated to ``airflow-core/tests/unit/models/test_dag.py``.
To avoid this, ensure that:
* You do not use deprecated methods, classes, or arguments in your test cases.
* Your changes do not affect other components. For example, deprecating a part of Airflow Core or a Community Supported Provider might trigger new deprecation warnings. In this case, changes should also be made in all affected components in a backward-compatible way.
* You use the ``pytest.warns`` context manager (see the `pytest docs <https://docs.pytest.org/en/latest/how-to/capture-warnings.html#warns>`__) to catch warnings when testing deprecated components. (Yes, we still need to test legacy/deprecated features until they are completely removed.)
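.. code-block:: python

    import pytest

    from airflow.exceptions import AirflowProviderDeprecationWarning


    def test_deprecated_argument():
        # ``SomeDeprecatedClass`` is a placeholder for the deprecated component under test.
        with pytest.warns(AirflowProviderDeprecationWarning, match="expected warning pattern"):
            SomeDeprecatedClass(foo="bar", spam="egg")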
Mocking time-related functionality in tests
-------------------------------------------
Mocking sleep calls
...................
To speed up test execution and avoid unnecessary delays, you should mock sleep calls in tests or set the sleep time to 0.
If the method you are testing includes a call to ``time.sleep()`` or ``asyncio.sleep()``, mock these calls.
How to mock ``sleep()`` depends on how it is imported:
* If ``time.sleep`` is imported as ``import time``:
.. code-block:: python

    from unittest import mock


    @mock.patch("time.sleep", return_value=None)
    def test_your_test(mock_sleep):  # the patch decorator passes the mock as an argument
        pass
* If ``sleep`` is imported directly using ``from time import sleep``:
.. code-block:: python

    from unittest import mock


    @mock.patch("path.to.the.module.sleep", return_value=None)
    def test_your_test(mock_sleep):  # the patch decorator passes the mock as an argument
        pass
For methods that use ``asyncio`` for async sleep calls, the process is identical.
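A minimal sketch of the async case, assuming the code under test calls ``asyncio.sleep`` through the ``asyncio`` module. ``poll_until_ready`` is a hypothetical helper written for this example; it is not an Airflow function.

```python
import asyncio
from unittest import mock


async def poll_until_ready(check, interval=5.0, attempts=3):
    """Hypothetical helper: call ``check`` and sleep between failed attempts."""
    for _ in range(attempts):
        if check():
            return True
        await asyncio.sleep(interval)
    return False


def test_poll_until_ready_does_not_really_sleep():
    # ``check`` fails twice and then succeeds, so two sleeps are expected.
    check = mock.Mock(side_effect=[False, False, True])
    # ``AsyncMock`` makes the patched ``asyncio.sleep`` awaitable and instant.
    with mock.patch("asyncio.sleep", new_callable=mock.AsyncMock) as mock_sleep:
        result = asyncio.run(poll_until_ready(check))
    assert result is True
    assert mock_sleep.await_count == 2
    mock_sleep.assert_awaited_with(5.0)
```

If the module under test uses ``from asyncio import sleep`` instead, the patch target becomes the name inside that module, as with ``time.sleep`` above.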
**NOTE:** There are certain cases where the method's correct functioning depends on actual time passing.
In those cases, the test with the mock will fail. It is acceptable to leave it unmocked in such scenarios.
Use your judgment and prefer mocking whenever possible.
Controlling date and time
.........................
Some features rely on the current date and time, e.g., a function that generates timestamps or measures the passing of time.
To test such features reliably, we use the ``time-machine`` library to control the system's time:
.. code-block:: python

    from datetime import datetime

    import time_machine


    @time_machine.travel(datetime(2025, 3, 27, 21, 58, 1, 2345), tick=False)
    def test_log_message(self):
        """
        The tested code uses datetime.now() to generate a timestamp.
        Freezing time ensures the timestamp is predictable and testable.
        """
By setting ``tick=False``, time is frozen at the specified moment and does not advance during the test.
If you want time to progress from a fixed starting point, you can set ``tick=True``.
Airflow configuration for unit tests
------------------------------------
Some unit tests require special configuration set as the ``default``. This is done automatically by
adding ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` to the environment variables in a Pytest auto-use
fixture. This, in turn, makes Airflow load test configuration from the file
``airflow/config_templates/unit_tests.cfg``. Test configuration from there replaces the original
defaults from ``airflow/config_templates/config.yml``. If you want to add some test-only configuration
as a default for all tests, add the value to ``unit_tests.cfg``.
You can also override the values in individual tests by patching environment variables following
the usual ``AIRFLOW__SECTION__KEY`` pattern or using the ``conf_vars`` context manager.
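As a rough sketch of the environment-variable approach, the test below patches ``AIRFLOW__CORE__PARALLELISM`` (the ``parallelism`` option of the ``[core]`` section) for the duration of one block. To stay self-contained it only inspects ``os.environ``; in a real test, the code that reads the Airflow configuration would run inside the ``with`` block.

```python
import os
from unittest import mock

# Environment variable for the ``parallelism`` option in the ``[core]`` section.
ENV_KEY = "AIRFLOW__CORE__PARALLELISM"


def test_parallelism_override():
    original = os.environ.get(ENV_KEY)
    with mock.patch.dict(os.environ, {ENV_KEY: "4"}):
        # Code under test that reads the Airflow configuration would run here.
        assert os.environ[ENV_KEY] == "4"
    # ``patch.dict`` restores the previous state when the block exits.
    assert os.environ.get(ENV_KEY) == original
```

Because the override is scoped to the ``with`` block, it cannot leak into other tests that run in the same process.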
Airflow unit test types
-----------------------
Airflow tests in the CI environment are split into several test types. You can narrow down which
test types you want to use in various ``breeze testing`` sub-commands in two ways:
* By specifying the ``--test-type`` when running a single test type in ``breeze testing core-tests``, ``breeze testing providers-tests``, or ``breeze testing integration-tests`` commands.
* By specifying a space-separated list of test types via the ``--parallel-test-types`` or ``--excluded-parallel-test-types`` options when running tests in parallel.
The defined test types are:
* ``Always`` - Tests that should always be executed (always sub-folder).
* ``API`` - Tests for the Airflow API (api, api_internal, api_fastapi sub-folders).
* ``CLI`` - Tests for the Airflow CLI (cli folder).
* ``Core`` - Tests for core Airflow functionality (core, executors, jobs, models, ti_deps, utils sub-folders).
* ``Operators`` - Tests for operators (operators folder).
* ``WWW`` - Tests for the Airflow webserver (www folder).
* ``Providers`` - Tests for all Airflow Providers (providers folder).
* ``Other`` - All other tests remaining after the above tests are selected.
We also have types that run "all" tests (ignoring folders, but looking at ``pytest`` markers with filters applied):
* ``All-Postgres`` - Tests that require a Postgres database. Only run when the backend is Postgres (``backend("postgres")`` marker).
* ``All-MySQL`` - Tests that require a MySQL database. Only run when the backend is MySQL (``backend("mysql")`` marker).
* ``All-Quarantined`` - Tests that are flaky and need to be fixed (``quarantined`` marker).
* ``All`` - All tests are run (this is the default).
We also have ``Integration`` tests that run with external software via the ``--integration`` flag in the ``breeze`` environment (via ``breeze testing integration-tests``).
* ``Integration`` - Tests that require external integration images running in docker-compose.
This structure exists for two reasons:
1. To allow selectively running only a subset of test types for some PRs.
2. To allow efficient parallel execution of tests on Self-Hosted runners.
For case 2: We can utilize the memory and CPUs available on both CI and local development machines to run
tests in parallel. However, we cannot use the pytest xdist plugin for this. Instead, we split the tests into test
types and run each type with its own database instance and separate container. The tests in each type run with exclusive access to their database, and tests within a type run sequentially.
This is necessary because these tests rely on shared databases and update/reset/cleanup data during execution.
DB and non-DB tests
-------------------
There are two kinds of unit tests in Airflow: DB and non-DB tests. This chapter describes the differences
between these two types.
Airflow non-DB tests
....................
Non-DB tests are run once for each tested Python version with the ``none`` database backend (which
causes any database access to fail). These tests are run with the ``pytest-xdist`` plugin in parallel, which
means we can efficiently utilize multi-processor machines (including ``self-hosted`` runners with
8 CPUs, where we run tests with maximum parallelism).
It is usually straightforward to run these tests in a local virtualenv because they do not require any
database setup. They also run much faster than DB tests. You can run them with the ``pytest`` command
or with ``breeze`` (which has all dependencies automatically installed). You can also select specific tests, folders, or modules for Pytest to collect/run.
The example below shows how to run all tests, parallelizing them with ``pytest-xdist`` (by specifying the ``tests`` folder):
.. code-block:: bash
pytest airflow-core/tests --skip-db-tests -n auto
The ``--skip-db-tests`` flag will only run tests that are not marked as DB tests.
You can also use the ``breeze`` command to run all the tests (they will run in a separate container,
with the selected Python version and without access to any database). Adding the ``--use-xdist`` flag will run all
tests in parallel using the ``pytest-xdist`` plugin.
You can run parallel commands via ``breeze testing core-tests`` or ``breeze testing providers-tests``
by adding the parallel flags:
.. code-block:: bash
breeze testing core-tests --skip-db-tests --backend none --use-xdist
You can pass a list of test types to execute via ``--parallel-test-types`` or exclude them via ``--excluded-parallel-test-types``:
.. code-block:: bash
breeze testing providers-tests --run-in-parallel --skip-db-tests --backend none --parallel-test-types "Providers[google] Providers[amazon]"
Additionally, you can enter an interactive shell with ``breeze`` and run tests from there to iterate. Source files in ``breeze`` are mounted as volumes, so you can modify them locally and
rerun in Breeze as needed (``-n auto`` will parallelize tests using the ``pytest-xdist`` plugin):
.. code-block:: bash
breeze shell --backend none --python 3.10
> pytest airflow-core/tests --skip-db-tests -n auto
Airflow DB tests
................
Some Airflow tests require a database connection. These tests store and read data
from the Airflow DB using Airflow's core code. It is crucial to run these tests against all real databases
that Airflow supports to check if SQLAlchemy queries and the database schema are correct.
These tests should be marked with the ``@pytest.mark.db_test`` decorator at one of the following levels:
* Test method level
* Test class level
* Module level (using ``pytestmark = pytest.mark.db_test`` at the top of the module)
DB tests are run against multiple supported databases, database versions, and Python versions. To save time, not all combinations are
tested, but enough variations are covered to detect potential problems.
By default, DB tests use SQLite and the "airflow.db" database created in the ``${AIRFLOW_HOME}`` folder. You do not need to do anything to create or initialize the database.
However, if you need to clean and restart the DB, you can run tests with the ``--with-db-init`` flag to re-initialize it. You can also set the ``AIRFLOW__DATABASE__SQL_ALCHEMY_CONN`` environment
variable to point to a supported database (Postgres, MySQL, etc.), and the tests will use that database. You
might need to run ``airflow db reset`` to initialize the database in that case.
It is perfectly fine to run "non-DB" tests when you have a database configured. However, if you want to run *only*
DB tests (as done in our CI for ``Database`` runs), you can use the ``--run-db-tests-only`` flag to filter
out non-DB tests. (You can specify the whole ``tests`` directory or any specific folder/file selection).
.. code-block:: bash
pytest airflow-core/tests --run-db-tests-only
You can also run DB tests within the ``breeze`` dockerized environment. You can choose the backend with the
``--backend`` flag. The default is ``sqlite``, but you can also use ``postgres`` or ``mysql``.
You can also select the backend version and Python version. Breeze will list the available test types via ``--help`` and provide auto-complete.
For example, you can run the DB tests with the ``postgres`` backend via ``breeze testing core-tests`` or
``breeze testing providers-tests`` by adding the parallel flags manually (add ``--python 3.10`` to pin the Python version):
.. code-block:: bash
breeze testing core-tests --run-db-tests-only --backend postgres --run-in-parallel
You can pass a list of test types to execute via ``--parallel-test-types`` or exclude them via ``--excluded-parallel-test-types``:
.. code-block:: bash
breeze testing providers-tests --run-in-parallel --run-db-tests-only --parallel-test-types "Providers[google] Providers[amazon]"
If you want to iterate on tests, you can enter the interactive shell and run tests iteratively—either by package/module/test or by test type, whatever ``pytest`` supports.
.. code-block:: bash
breeze shell --backend postgres --python 3.10
> pytest airflow-core/tests --run-db-tests-only
As explained before, you cannot run DB tests in parallel using the ``pytest-xdist`` plugin. However, ``breeze`` supports splitting all tests into test-types to run in separate containers with separate databases using the ``--run-in-parallel`` flag.
.. code-block:: bash
breeze testing core-tests --run-db-tests-only --backend postgres --python 3.10 --run-in-parallel
Examples of marking test as DB test
...................................
You can apply the marker on the method/function/class level with the ``@pytest.mark.db_test`` decorator or
at the module level with ``pytestmark = pytest.mark.db_test`` at the top of the module.
It is up to the author to decide whether to mark the test, class, or module as a "DB-test". Generally, the fewer DB tests, the better. If we can clearly separate DB parts from non-DB parts, we should.
However, it is acceptable if a few non-DB tests are marked as DB tests because they are part of a class or module that is "mostly-DB".
Sometimes, when a class can be clearly split into DB and non-DB parts, it is better to split the class
into two separate classes and mark only the DB class as a DB test.
Method level:
.. code-block:: python
import pytest
@pytest.mark.db_test
def test_add_tagging(self, sentry, task_instance): ...
Class level:
.. code-block:: python
import pytest
@pytest.mark.db_test
class TestDatabricksHookAsyncAadTokenSpOutside: ...
Module level (at the top of the module):
.. code-block:: python
import pytest
from airflow.models.baseoperator import BaseOperator
from airflow.models.dag import DAG
from airflow.ti_deps.dep_context import DepContext
from airflow.ti_deps.deps.task_concurrency_dep import TaskConcurrencyDep
pytestmark = pytest.mark.db_test
Best practices for DB tests
...........................
Usually, when adding new tests, you create tests similar to the existing ones. In most cases,
you do not have to worry about the test type. It will be automatically selected for you because the Test Class or module you are adding to is already marked with the ``db_test`` marker.
You should strive to write "pure" non-DB unit tests. However, sometimes it is better to plug into the existing framework of DagRuns, Dags, Connections, and Variables to use the Database directly rather than mocking all DB access. This decision is up to you.
However, if you choose to write DB tests, you must ensure the ``db_test`` marker is added either to the test method, class (with decorator), or whole module (with pytestmark).
If your test accesses the database but is not marked properly, the Non-DB test run in CI will fail with this message:
.. code-block:: text

    Your test accessed the DB but `_AIRFLOW_SKIP_DB_TESTS` is set.
    Either make sure your test does not use database or mark your test with `@pytest.mark.db_test`.
How to verify if a DB test is correctly classified
..................................................
If you want to verify if your DB test is correctly classified, you can run the test or group
of tests with the ``--skip-db-tests`` flag.
You can run all (or a subset of) test types to ensure all problems are fixed:
.. code-block:: bash
breeze testing core-tests --skip-db-tests tests/your_test.py
For the whole test suite:
.. code-block:: bash
breeze testing core-tests --skip-db-tests
For selected test types (e.g., only ``Providers/API/CLI`` code):
.. code-block:: bash
breeze testing providers-tests --skip-db-tests --parallel-test-types "Providers[google] Providers[amazon]"
You can also enter the interactive shell with the ``--skip-db-tests`` flag and run tests iteratively:
.. code-block:: bash
breeze shell --skip-db-tests
> pytest tests/your_test.py
How to make your test not depend on DB
......................................
This is tricky and there is no single solution. Sometimes we can mock out methods that require
DB access or objects that normally require a database. Sometimes we can decide to test just a single method
of a class rather than a complex set of steps. Generally speaking, it is better to have as many "pure"
unit tests (requiring no DB) as possible compared to DB tests. They are usually faster and more reliable.
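One way to mock out the DB-bound part is sketched below. ``ConnectionResolver`` and its methods are hypothetical and exist only for this example; the point is that the single method that would touch the database is patched, so the remaining logic is tested as a pure unit test.

```python
from unittest import mock


class ConnectionResolver:
    """Hypothetical class: only ``_load_from_db`` touches the database."""

    def _load_from_db(self, conn_id: str) -> dict:
        raise RuntimeError("database access is not available in non-DB tests")

    def get_uri(self, conn_id: str) -> str:
        row = self._load_from_db(conn_id)
        return f"{row['scheme']}://{row['host']}:{row['port']}"


def test_get_uri_without_db():
    fake_row = {"scheme": "postgres", "host": "db.example.com", "port": 5432}
    # Replace the DB-bound method on the class; the URI-building logic still runs for real.
    with mock.patch.object(ConnectionResolver, "_load_from_db", return_value=fake_row) as load:
        uri = ConnectionResolver().get_uri("my_conn")
    assert uri == "postgres://db.example.com:5432"
    load.assert_called_once_with("my_conn")
```

A test written this way needs no ``db_test`` marker, because the unpatched ``_load_from_db`` is never reached.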
Special cases
.............
There are some tricky test cases that require special handling. Here are some of them:
Parameterized tests stability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Parameterized tests require a stable order of parameters if they are run via ``xdist``. This is because parameterized
tests are distributed among multiple processes and handled separately. In some cases, parameterized tests
have an undefined/random order (or parameters are not hashable, e.g., a set of enums). In such cases,
the xdist execution will fail, and you will get an error mentioning "Known Limitations of xdist".
You can see details about the limitation `here <https://pytest-xdist.readthedocs.io/en/latest/known-limitations.html>`_.
The error in this case will look similar to:
.. code-block::
Different tests were collected between gw0 and gw7. The difference is:
The fix is to sort the parameters in ``parametrize``. For example, instead of this:
.. code-block:: python

    @pytest.mark.parametrize("status", ALL_STATES)
    def test_method(status): ...

do this:

.. code-block:: python

    @pytest.mark.parametrize("status", sorted(ALL_STATES))
    def test_method(status): ...
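Plain ``sorted()`` only works when the parameters can be compared with each other. As a hedged illustration, members of a plain ``Enum`` cannot be ordered with ``<``, so a sort key is needed. ``Color`` is a hypothetical enum used only for this sketch.

```python
from enum import Enum


class Color(Enum):
    """Hypothetical enum without a ``str`` mixin, so its members are not orderable."""

    RED = "red"
    GREEN = "green"
    BLUE = "blue"


# A set has no guaranteed iteration order across worker processes.
ALL_COLORS = {Color.RED, Color.GREEN, Color.BLUE}

# Sorting by member name gives every xdist worker the same parameter order.
STABLE_COLORS = sorted(ALL_COLORS, key=lambda color: color.name)
```

``STABLE_COLORS`` can then be passed to ``pytest.mark.parametrize`` in place of the raw set.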
Similarly, if your parameters are defined as a result of ``utcnow()`` or another dynamic method, you should
avoid that or assign unique IDs for those parametrized tests. Instead of this:
.. code-block:: python
@pytest.mark.parametrize(
"url, expected_dag_run_ids",
[
(
f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_gte="
f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}",
[],
),
(
f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_lte="
f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}",
["TEST_DAG_RUN_ID_1", "TEST_DAG_RUN_ID_2"],
),
],
)
def test_end_date_gte_lte(url, expected_dag_run_ids): ...
Do this:
.. code-block:: python
@pytest.mark.parametrize(
"url, expected_dag_run_ids",
[
pytest.param(
f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_gte="
f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}",
[],
id="end_date_gte",
),
pytest.param(
f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_lte="
f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}",
["TEST_DAG_RUN_ID_1", "TEST_DAG_RUN_ID_2"],
id="end_date_lte",
),
],
)
def test_end_date_gte_lte(url, expected_dag_run_ids): ...
Problems with Non-DB test collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sometimes, even if the whole module is marked as ``@pytest.mark.db_test``, parsing the file and collecting
tests will fail when ``--skip-db-tests`` is used because some imports or objects created in the
module read from the database.
Moving such initialization code to inside the tests or pytest fixtures (and passing objects needed by tests as fixtures rather than importing them from the module) usually helps. Similarly, you might
use DB-bound objects (like Connection) in your ``parametrize`` specification—this will also fail pytest
collection. Move the creation of such objects to inside the tests.
Example: Moving object creation from top-level to inside tests. This code will break test collection even if
the test is marked as a DB test:
.. code-block:: python
TI = TaskInstance(
task=BashOperator(task_id="test", bash_command="true", dag=DAG(dag_id="id"), start_date=datetime.now()),
run_id="fake_run",
state=State.RUNNING,
)
class TestCallbackRequest:
@pytest.mark.parametrize(
"input,request_class",
[
(CallbackRequest(full_filepath="filepath", msg="task_failure"), CallbackRequest),
(
TaskCallbackRequest(
full_filepath="filepath",
simple_task_instance=SimpleTaskInstance.from_ti(ti=TI),
is_failure_callback=True,
),
TaskCallbackRequest,
),
(
DagCallbackRequest(
full_filepath="filepath",
dag_id="fake_dag",
run_id="fake_run",
is_failure_callback=False,
),
DagCallbackRequest,
),
(
SlaCallbackRequest(
full_filepath="filepath",
dag_id="fake_dag",
),
SlaCallbackRequest,
),
],
)
def test_from_json(self, input, request_class): ...
Instead, do this (which will not break collection). The ``TaskInstance`` is not initialized when the module is parsed;
it will only be initialized when the test gets executed:
.. code-block:: python
pytestmark = pytest.mark.db_test
class TestCallbackRequest:
@pytest.mark.parametrize(
"input,request_class",
[
(CallbackRequest(full_filepath="filepath", msg="task_failure"), CallbackRequest),
(
None, # to be generated when test is run
TaskCallbackRequest,
),
(
DagCallbackRequest(
full_filepath="filepath",
dag_id="fake_dag",
run_id="fake_run",
is_failure_callback=False,
),
DagCallbackRequest,
),
(
SlaCallbackRequest(
full_filepath="filepath",
dag_id="fake_dag",
),
SlaCallbackRequest,
),
],
)
def test_from_json(self, input, request_class):
if input is None:
ti = TaskInstance(
task=BashOperator(
task_id="test", bash_command="true", dag=DAG(dag_id="id"), start_date=datetime.now()
),
run_id="fake_run",
state=State.RUNNING,
)
input = TaskCallbackRequest(
full_filepath="filepath",
simple_task_instance=SimpleTaskInstance.from_ti(ti=ti),
is_failure_callback=True,
)
Sometimes it is difficult to rewrite the tests, so you might add conditional handling and mock out some
database-bound methods or objects to avoid hitting the database during test collection. The code below
will hit the Database while parsing the tests because this is what ``Variable.setdefault`` does when
the parametrize specification is parsed—even if the test is marked as a DB test.
.. code-block:: python
from airflow.models.variable import Variable
pytestmark = pytest.mark.db_test
initial_db_init()
@pytest.mark.parametrize(
"env, expected",
[
pytest.param(
{"plain_key": "plain_value"},
"{'plain_key': 'plain_value'}",
id="env-plain-key-val",
),
pytest.param(
{"plain_key": Variable.setdefault("plain_var", "banana")},
"{'plain_key': 'banana'}",
id="env-plain-key-plain-var",
),
pytest.param(
{"plain_key": Variable.setdefault("secret_var", "monkey")},
"{'plain_key': '***'}",
id="env-plain-key-sensitive-var",
),
pytest.param(
{"plain_key": "{{ var.value.plain_var }}"},
"{'plain_key': '{{ var.value.plain_var }}'}",
id="env-plain-key-plain-tpld-var",
),
],
)
def test_rendered_task_detail_env_secret(patch_app, admin_client, request, env, expected): ...
You can make the code conditional and mock out ``Variable`` to avoid hitting the database.
.. code-block:: python
from airflow.models.variable import Variable
pytestmark = pytest.mark.db_test
if os.environ.get("_AIRFLOW_SKIP_DB_TESTS") == "true":
# Handle collection of the test by non-db case
Variable = mock.MagicMock() # type: ignore[misc] # noqa: F811
else:
initial_db_init()
@pytest.mark.parametrize(
"env, expected",
[
pytest.param(
{"plain_key": "plain_value"},
"{'plain_key': 'plain_value'}",
id="env-plain-key-val",
),
pytest.param(
{"plain_key": Variable.setdefault("plain_var", "banana")},
"{'plain_key': 'banana'}",
id="env-plain-key-plain-var",
),
pytest.param(
{"plain_key": Variable.setdefault("secret_var", "monkey")},
"{'plain_key': '***'}",
id="env-plain-key-sensitive-var",
),
pytest.param(
{"plain_key": "{{ var.value.plain_var }}"},
"{'plain_key': '{{ var.value.plain_var }}'}",
id="env-plain-key-plain-tpld-var",
),
],
)
def test_rendered_task_detail_env_secret(patch_app, admin_client, request, env, expected): ...
You can also use a fixture to create an object that needs the database.
.. code-block:: python

    import pytest

    from airflow.models import Connection

    pytestmark = pytest.mark.db_test


    @pytest.fixture()
    def get_connection1():
        return Connection()


    @pytest.fixture()
    def get_connection2():
        return Connection(host="apache.org", extra={})


    @pytest.mark.parametrize(
        "conn",
        [
            "get_connection1",
            "get_connection2",
        ],
    )
    def test_as_json_from_connection(request, conn: str):
        # Resolve the fixture by name at run time, so no Connection is built during collection.
        conn = request.getfixturevalue(conn)
        ...
Running Unit tests
------------------
Running Unit Tests from PyCharm IDE
...................................
To run unit tests from the PyCharm IDE, create a `local virtualenv <../07_local_virtualenv.rst>`_,
select it as the default project's environment, then configure your test runner:
.. image:: images/pycharm/configure_test_runner.png
:align: center
:alt: Configuring test runner
and run unit tests as follows:
.. image:: images/pycharm/running_unittests.png
:align: center
:alt: Running unit tests
**NOTE:** You can run unit tests in the standalone local virtualenv
(with no Breeze installed) if they do not have dependencies such as
Postgres/MySQL/Hadoop/etc.
Running Unit Tests from PyCharm IDE using Breeze
................................................
Ideally, all unit tests should be run using the standardized Breeze environment. While not
as convenient as the one-click "play button" in PyCharm, the IDE can be configured to do
this in two clicks.
1. Add Breeze as an "External Tool":
a. From the settings menu, navigate to ``Tools > External Tools``.
b. Click the plus symbol to open the ``Create Tool`` popup and fill it out:
.. image:: images/pycharm/pycharm_create_tool.png
    :align: center
    :alt: Creating the Breeze external tool
2. Add the tool to the context menu:
a. From the settings menu, navigate to ``Appearance & Behavior > Menus & Toolbars > Project View Popup Menu``.
b. Click on the list of entries where you would like it to be added. Right above or below ``Project View Popup Menu Run Group`` may be a good choice; you can drag and drop this list to rearrange the placement later.
c. Click the plus symbol at the top of the popup window.
d. Find your ``External Tool`` in the new ``Choose Actions to Add`` popup and click OK. If you followed the image above, it will be at ``External Tools > External Tools > Breeze``.
**Note:** This only adds the option to that specific menu. If you would like to add it to the context menu
when right-clicking on a tab at the top of the editor, for example, follow the steps above again
and place it in the ``Editor Tab Popup Menu``.
.. image:: images/pycharm/pycharm_add_to_context.png
    :align: center
    :alt: Adding Breeze to the context menu
3. To run tests in Breeze, right-click on the file or directory in the ``Project View`` and click Breeze.
Running Unit Tests from Visual Studio Code
..........................................
To run unit tests from Visual Studio Code:
1. Using the ``Extensions`` view, install the Python extension. Reload if required.
.. image:: images/vscode_install_python_extension.png
:align: center
:alt: Installing Python extension
2. Using the ``Testing`` view, click on ``Configure Python Tests`` and select the ``pytest`` framework.
.. image:: images/vscode_configure_python_tests.png
:align: center
:alt: Configuring Python tests
.. image:: images/vscode_select_pytest_framework.png
:align: center
:alt: Selecting pytest framework
3. Open ``/.vscode/settings.json`` and add ``"python.testing.pytestArgs": ["tests"]`` to enable test discovery.
.. image:: images/vscode_add_pytest_settings.png
:align: center
:alt: Enabling tests discovery
4. Now you are able to run and debug tests from both the ``Testing`` view and test files.
.. image:: images/vscode_run_tests.png
:align: center
:alt: Running tests
Running Unit Tests in local virtualenv
......................................
To run unit, integration, and system tests from Breeze and your
virtualenv, you can use the `pytest <http://doc.pytest.org/en/latest/>`_ framework.
The custom ``pytest`` plugin runs ``airflow db init`` and ``airflow db reset`` the first
time you launch the tests, so you can count on the database being initialized. Currently,
tests that are not supported in the local virtualenv **may either fail
or produce an error message**.
There are many available options for selecting a specific test in ``pytest``. Details can be found
in the official documentation, but here are a few basic examples:
.. code-block:: bash
pytest airflow-core/tests/unit/core -k "TestCore and not check"
This runs the ``TestCore`` class but skips tests in this class that include 'check' in their names.
To speed up test collection, pass the specific test file instead of the whole folder:
.. code-block:: bash
pytest airflow-core/tests/unit/core/test_core.py -k "TestCore and not bash"
The ``-k`` flag is also useful for running a single test:
.. code-block:: bash
pytest airflow-core/tests/unit/core/test_core.py -k "test_check_operators"
This can also be done by specifying a full path to the test:
.. code-block:: bash
pytest airflow-core/tests/unit/core/test_core.py::TestCore::test_dag_params_and_task_params
To run the whole test class, enter:
.. code-block:: bash
pytest airflow-core/tests/unit/core/test_core.py::TestCore
You can use all available ``pytest`` flags. For example, to increase the log level
for debugging purposes, enter:
.. code-block:: bash
pytest --log-cli-level=DEBUG airflow-core/tests/unit/core/test_core.py::TestCore
Running Tests using Breeze interactive shell
............................................
You can run tests interactively using regular pytest commands inside the Breeze shell. This has the
advantage that the Breeze container has all the dependencies installed that are needed to run the tests,
and it will ask you to rebuild the image if needed (e.g., if new dependencies should be installed).
By using the interactive shell and iterating over tests, you can re-run tests one-by-one
or group-by-group immediately after modifying them.
Entering the shell is as easy as:
.. code-block:: bash
breeze
This should drop you into the container.
You can also use other switches (like ``--backend``) to configure the environment for your
tests (e.g., to switch to a different database backend - see ``--help`` for more details).
Once inside the container, you can run regular pytest commands. For example:
.. code-block:: bash
pytest --log-cli-level=DEBUG airflow-core/tests/unit/core/test_core.py::TestCore
Running Tests using Breeze from the Host
........................................
If you only want to run tests without dropping into the shell, use the ``breeze testing`` command.
Choose a subcommand such as ``core-tests`` or ``providers-tests``, then add test targets and pytest flags
after it. Because you often want to run the tests against a clean database, you usually want to add the
``--db-reset`` flag to the breeze command. The Breeze image usually has all the dependencies needed, and
Breeze asks you to rebuild the image if needed.
.. code-block:: bash
breeze testing providers-tests providers/http/tests/http/hooks/test_http.py airflow-core/tests/unit/core/test_core.py --db-reset --log-cli-level=DEBUG
You can run the whole core test suite without adding the test target:
.. code-block:: bash
breeze testing core-tests --db-reset
You can run the whole providers test suite without adding the test target:
.. code-block:: bash
breeze testing providers-tests --db-reset
You can also specify individual tests or a group of tests:
.. code-block:: bash
breeze testing core-tests --db-reset airflow-core/tests/unit/core/test_core.py::TestCore
You can also limit execution to a specific group of tests:
.. code-block:: bash
breeze testing core-tests --test-type Other
In the case of Providers tests, you can run tests for all providers:
.. code-block:: bash
breeze testing providers-tests --test-type Providers
You can limit the set of providers you would like to run tests for:
.. code-block:: bash
breeze testing providers-tests --test-type "Providers[airbyte,http]"
You can also run all providers but exclude specific ones:
.. code-block:: bash
breeze testing providers-tests --test-type "Providers[-amazon,google]"
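The bracket syntax in the ``--test-type`` examples above can be read as a small grammar: a plain list
selects providers, and a list starting with ``-`` excludes them. The sketch below only illustrates that
reading. ``parse_providers_selector`` is a hypothetical helper, not Breeze's actual parser, and Breeze
may treat edge cases differently.

.. code-block:: python

    from __future__ import annotations


    def parse_providers_selector(test_type: str) -> tuple[str, list[str]]:
        """Split a ``Providers[...]`` test type into a mode and a provider list.

        Returns ("all", []), ("include", [...]) or ("exclude", [...]).
        """
        if test_type == "Providers":
            return "all", []
        if not (test_type.startswith("Providers[") and test_type.endswith("]")):
            raise ValueError(f"Not a Providers test type: {test_type!r}")
        inner = test_type[len("Providers[") : -1]
        # A leading "-" turns the whole list into an exclusion list.
        mode = "exclude" if inner.startswith("-") else "include"
        names = [name.strip() for name in inner.lstrip("-").split(",") if name.strip()]
        return mode, names

Under this reading, ``Providers[airbyte,http]`` yields an ``include`` list with two providers, while
``Providers[-amazon,google]`` yields an ``exclude`` list containing both ``amazon`` and ``google``.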
Sometimes you need to inspect docker-compose after the tests command completes,
for example, when the test environment could not be properly set due to
failed health checks. This can be achieved with the ``--skip-docker-compose-down``
flag:
.. code-block:: bash
breeze testing core-tests --skip-docker-compose-down
Running full Airflow unit test suite in parallel
................................................
If you run ``breeze testing core-tests --run-in-parallel`` or
``breeze testing providers-tests --run-in-parallel``, tests are executed in parallel
on your development machine, using as many cores as are available to the Docker engine.
If your Docker environment has limited memory (less than 8 GB), then ``Integration``, ``Provider``,
and ``Core`` tests are run sequentially, with the Docker setup cleaned between test runs
to minimize memory usage.
This approach gives a large speedup in full test execution. On a machine with 8 CPUs
(16 cores), 64 GB of RAM, and a fast SSD, the full suite of tests can complete in about
5 minutes (!), compared to more than 30 minutes when run sequentially.
.. note::
On macOS, you might have fewer CPUs and less memory available to run tests than you have on the host,
simply because your Docker engine runs in a Linux virtual machine under the hood. If you want to take
advantage of parallel test runs, you might want to increase the CPU and memory resources available
to your Docker engine. See the `Resources <https://docs.docker.com/docker-for-mac/#resources>`_ chapter
in the ``Docker for Mac`` documentation on how to do it.
You can also limit the parallelism by specifying the maximum number of parallel jobs via the
``MAX_PARALLEL_TEST_JOBS`` variable. If you set it to "1", all test types will be run sequentially.
.. code-block:: bash
MAX_PARALLEL_TEST_JOBS="1" ./scripts/ci/testing/ci_run_airflow_testing.sh
.. note::
If you stop such a test run with Ctrl-C, some Docker containers might keep running and you might have
to clean them up. You can do this by running the command below. It kills all running Docker
containers, so do not use it if you want to keep some
containers running:
.. code-block:: bash
docker kill $(docker ps -q)
Running Backend-Specific Tests
..............................
Tests that use a specific backend are marked with a custom pytest marker ``pytest.mark.backend``.
The marker has a single parameter - the name of the backend. It corresponds to the ``--backend`` switch of
the Breeze environment (one of ``mysql``, ``sqlite``, or ``postgres``). Backend-specific tests only run when
the Breeze environment is running with the correct backend. If you specify more than one backend
in the marker, the test runs for all specified backends.
Example of the ``postgres`` only test:
.. code-block:: python
@pytest.mark.backend("postgres")
def test_copy_expert(self): ...
Example of the ``postgres,mysql`` test (skipped with the ``sqlite`` backend):
.. code-block:: python
@pytest.mark.backend("postgres", "mysql")
def test_celery_executor(self): ...
You can use the custom ``--backend`` switch in pytest to only run tests specific to that backend.
Here is an example of running only postgres-specific backend tests:
.. code-block:: bash
pytest --backend postgres
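The two rules described above (the marker restricts a test to certain backends, and the ``--backend``
switch restricts a run to tests marked for that backend) can be sketched as a single decision function.
This is a minimal illustration, not the code in Airflow's pytest plugin. ``backend_skip_reason`` is a
hypothetical name, and treating unmarked tests as skipped when ``--backend`` is given is an assumption
based on the phrase "only run tests specific to that backend".

.. code-block:: python

    from __future__ import annotations


    def backend_skip_reason(
        marker_backends: tuple[str, ...] | None,
        environment_backend: str,
        selected_backend: str | None = None,
    ) -> str | None:
        """Return a skip reason, or None when the test should run.

        marker_backends: arguments of ``pytest.mark.backend`` (None if the test is unmarked).
        environment_backend: backend the environment runs with (``--backend`` in Breeze).
        selected_backend: value of the pytest ``--backend`` switch, if given.
        """
        if marker_backends is not None and environment_backend not in marker_backends:
            return f"requires one of {marker_backends}, environment uses {environment_backend}"
        if selected_backend is not None:
            if marker_backends is None or selected_backend not in marker_backends:
                return f"not marked for the selected backend {selected_backend}"
        return None

For example, a test marked with ``("postgres", "mysql")`` gets a skip reason when the environment
backend is ``sqlite``, and returns ``None`` (it runs) when the environment backend is ``postgres``.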
Running Long-running tests
..........................
Some tests run for a long time. Such tests are marked with the ``@pytest.mark.long_running`` annotation.
These tests are skipped by default. You can enable them with the ``--include-long-running`` flag. You
can also decide to run *only* those tests with the ``-m long_running`` flag.
Running Quarantined tests
.........................
Some of our tests are quarantined. This means that the test will be run in isolation and re-run several times.
Also, when quarantined tests fail, the whole test suite will not fail. Quarantined tests are usually flaky tests that need attention and fixing.
These tests are marked with the ``@pytest.mark.quarantined`` annotation.
They are skipped by default. You can enable them with the ``--include-quarantined`` flag. You
can also decide to run *only* those tests with the ``-m quarantined`` flag.
Compatibility Provider unit tests against older Airflow releases
----------------------------------------------------------------
Why we run provider compatibility tests
.......................................
Our CI runs provider tests for providers with previous compatible Airflow releases. This allows checking
if the providers still work when installed on older Airflow versions.
The back-compatibility tests are based on the configuration specified in the
``PROVIDERS_COMPATIBILITY_TESTS_MATRIX`` constant in the ``./dev/breeze/src/airflow_breeze/global_constants.py``
file - which specifies:
* Python version
* Airflow version
* Which providers should be removed for the tests (exclusions)
* Whether to run tests for this Airflow/Python version
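To make the four fields above concrete, here is a sketch of what such a matrix could look like and how
it might be consumed. The key names, versions, and provider names are illustrative assumptions and are
not copied from ``global_constants.py``. ``EXAMPLE_MATRIX`` and ``providers_under_test`` are
hypothetical names.

.. code-block:: python

    from __future__ import annotations

    # Hypothetical entries: key names and values are illustrative assumptions only.
    EXAMPLE_MATRIX = [
        {
            "python-version": "3.10",
            "airflow-version": "2.9.1",
            "remove-providers": "fab git",
            "run-tests": "true",
        },
        {
            "python-version": "3.10",
            "airflow-version": "2.10.0",
            "remove-providers": "",
            "run-tests": "false",
        },
    ]


    def providers_under_test(all_providers: list[str], entry: dict[str, str]) -> list[str]:
        """Drop the providers excluded for this Airflow/Python combination."""
        removed = set(entry["remove-providers"].split())
        return [provider for provider in all_providers if provider not in removed]


    # Only entries that are switched on take part in the compatibility run.
    enabled_entries = [entry for entry in EXAMPLE_MATRIX if entry["run-tests"] == "true"]

In this made-up matrix only the first entry is enabled, and for it the providers ``fab`` and ``git``
would be left out of the test run.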
These tests can be used to test the compatibility of providers with past (and future!) releases of Airflow.
For example, it could be used to run the latest provider versions with released or main
Airflow 3 if they are developed independently.
The tests use the current source version of the ``tests`` folder and current ``providers``, so care should be
taken that the tests implemented for providers in the sources allow running against previous versions
of Airflow and against Airflow installed from the PyPI package rather than from sources.
Running the compatibility tests locally
.......................................
Running tests locally is easy with the appropriate ``breeze`` command. In CI, the command
is slightly different as it runs using providers built using wheel packages, but it is faster
to run it locally and easier to iterate if you need to fix a provider using provider sources mounted
directly to the container.
1. Make sure to build the latest Breeze CI image:
.. code-block:: bash
breeze ci-image build --python 3.10
2. Enter the breeze environment by selecting the appropriate Airflow version and choosing
the ``providers-and-tests`` option for the ``--mount-sources`` flag.
.. code-block:: bash
breeze shell --use-airflow-version 2.9.1 --mount-sources providers-and-tests
3. You can then run tests as usual:
.. code-block:: bash
pytest providers/<provider>/tests/.../test.py
4. Iterate with the tests and providers. Both providers and tests are mounted from local sources, so
changes you make locally in both tests and provider sources are immediately reflected inside the
breeze container, and you can re-run the tests inside the ``breeze`` container without restarting it
(which makes it faster to iterate).
.. note::
Since providers are installed from sources rather than from packages, plugins from providers are not
recognized by ProvidersManager for Airflow < 2.10, and tests that expect plugins to work might not work.
In such cases, you should follow the ``CI`` way of running the tests (see below).
Implementing compatibility for provider tests for older Airflow versions
........................................................................
When you implement tests for providers, you should ensure that they are compatible with older Airflow versions.
Note that some tests, if written without attention to compatibility, might not work with older
versions of Airflow. This is due to refactoring, renaming, and tests relying on Airflow internals that
are not part of the public API. We deal with this in one of the following ways:
1) If the whole provider is supposed to only work for a later Airflow version, we remove the whole provider
by excluding it from the compatibility test configuration (see below).
2) Some compatibility shims are defined in ``devel-common/src/tests_common/test_utils/compat.py`` - and
they can be used to make the tests compatible. For example, importing ``ParseImportError`` after the
exception has been renamed from ``ImportError``. This would fail in Airflow 2.9, but we have a fallback
import in ``compat.py`` that falls back to the old import automatically. So, all tests testing/expecting
``ParseImportError`` should import it from the ``tests_common.test_utils.compat`` module. There are a few
other compatibility shims defined there, and you can add more if needed in a similar way.
3) If only some tests are not compatible and use features that are available only in a newer Airflow version,
we can mark those tests with the appropriate ``AIRFLOW_V_3_X_PLUS`` boolean constant defined
in ``version_compat.py``. For example:
.. code-block:: python
from tests_common.test_utils.version_compat import AIRFLOW_V_3_0_PLUS
@pytest.mark.skipif(not AIRFLOW_V_3_0_PLUS, reason="The tests should be skipped for Airflow < 3.0")
def some_test_that_only_works_for_airflow_3_0_plus():
pass
4) Sometimes, the tests should only be run when Airflow is installed from the sources in main.
In this case, you can add the conditional ``skipif`` marker for ``RUNNING_TESTS_AGAINST_AIRFLOW_PACKAGES``
to the test. For example:
.. code-block:: python
from tests_common import RUNNING_TESTS_AGAINST_AIRFLOW_PACKAGES
@pytest.mark.skipif(
RUNNING_TESTS_AGAINST_AIRFLOW_PACKAGES, reason="Plugin initialization is done early in case of packages"
)
def test_plugin():
pass
5) Sometimes Pytest collection fails because certain imports used by the tests either do not exist,
fail with a RuntimeError about compatibility ("minimum Airflow version is required"), or raise ``AirflowOptionalProviderFeatureException``. In such cases, you should wrap the imports in the
``ignore_provider_compatibility_error`` context manager adding the ``__file__``
module name as a parameter. This will stop failing pytest collection and automatically skip the whole
module from unit tests.
For example:
.. code-block:: python
with ignore_provider_compatibility_error("2.8.0", __file__):
from airflow.providers.common.io.xcom.backend import XComObjectStorageBackend
6) In some cases, to enable pytest collection on older Airflow versions, you might need to convert
a top-level import into a local import so that the Pytest parser does not fail on collection.
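Approaches 2) and 3) above both rest on small helpers: a fallback import and a version flag. The sketch
below shows the general shape of each using only the standard library. It is not the code in
``compat.py`` or ``version_compat.py``. ``import_first_available`` and ``version_at_least`` are
hypothetical names, the ``json`` module is a stand-in for a renamed Airflow class, and the version
parsing assumes plain ``MAJOR.MINOR.PATCH`` strings.

.. code-block:: python

    from __future__ import annotations

    import importlib
    import re
    from typing import Any


    def import_first_available(*candidates: tuple[str, str]) -> Any:
        """Return the first attribute importable from the given (module, name) pairs."""
        for module_name, attribute in candidates:
            try:
                return getattr(importlib.import_module(module_name), attribute)
            except (ImportError, AttributeError):
                continue  # Not available here, try the next (older) location.
        raise ImportError(f"None of {candidates} could be imported")


    def version_at_least(installed: str, minimum: str) -> bool:
        """Compare the numeric release parts of two version strings."""

        def release(version: str) -> tuple[int, ...]:
            # "3.0.0.dev0" -> (3, 0, 0): suffixes are ignored, short versions are zero-padded.
            parts = [int(part) for part in re.findall(r"\d+", version)[:3]]
            return tuple(parts + [0] * (3 - len(parts)))

        return release(installed) >= release(minimum)


    # Stand-in example: the first location does not exist, so the fallback is used.
    JSONDecodeError = import_first_available(
        ("json.new_location", "JSONDecodeError"),
        ("json", "JSONDecodeError"),
    )
    # A flag in the spirit of AIRFLOW_V_3_0_PLUS, computed for a made-up installed version.
    EXAMPLE_V_3_0_PLUS = version_at_least("2.9.1", "3.0.0")

Tests then import the name from the shim module rather than from its original location, and use the
boolean flag in a ``skipif`` marker as shown in approach 3).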
Running provider compatibility tests in CI
..........................................
In CI, these tests are run in a slightly more complex way because we want to run them against the built
providers rather than those mounted from sources.
In case of canary runs, we add the ``--clean-airflow-installation`` flag that removes all packages before
installing the older Airflow version, and then installs development dependencies
from the latest Airflow. This avoids cases where a provider depends on a new dependency added in the latest
version of Airflow. This clean removal and re-installation takes quite some time, so to speed up the tests in regular PRs, we only do this in canary runs.
The exact way CI tests are run can be reproduced locally by building providers from a selected tag/commit and
using them to install and run tests against the selected Airflow version.
Here is how to reproduce it:
1. Make sure to build the latest Breeze CI image:
.. code-block:: bash
breeze ci-image build --python 3.10
2. Build providers from latest sources:
.. code-block:: bash
rm dist/*
breeze release-management prepare-provider-distributions --include-not-ready-providers \
--skip-tag-check --distribution-format wheel
3. Prepare provider constraints:
.. code-block:: bash
breeze release-management generate-constraints --airflow-constraints-mode constraints-source-providers --answer yes
4. Remove providers that are not compatible with the Airflow version installed by default. You can look up
the incompatible providers in the ``PROVIDERS_COMPATIBILITY_TESTS_MATRIX`` constant in the
``./dev/breeze/src/airflow_breeze/global_constants.py`` file.
5. Enter the breeze environment, installing the selected Airflow version and the providers prepared from main:
.. code-block:: bash
breeze shell --use-distributions-from-dist --distribution-format wheel --use-airflow-version 2.9.1 \
--install-airflow-with-constraints --providers-skip-constraints --mount-sources tests
In case you want to reproduce a canary run, you need to add the ``--clean-airflow-installation`` flag:
.. code-block:: bash
breeze shell --use-distributions-from-dist --distribution-format wheel --use-airflow-version 2.9.1 \
--install-airflow-with-constraints --providers-skip-constraints --mount-sources tests --clean-airflow-installation
6. You can then run tests as usual:
.. code-block:: bash
pytest providers/<provider>/tests/.../test.py
7. Iterate with the tests.
The tests are run using:
* Airflow installed from PyPI
* Tests coming from the current Airflow sources (mounted inside the breeze image)
* Providers built from the current Airflow sources and placed in dist
This means that you can modify tests and re-run them immediately because the test sources are mounted from the host.
However, if you want to modify provider code, you need to exit breeze, rebuild the provider package, and
restart breeze using the command above.
Rebuilding a single provider package can be done using this command:
.. code-block:: bash
breeze release-management prepare-provider-distributions \
--skip-tag-check --distribution-format wheel <provider>
Lowest direct dependency resolution tests
-----------------------------------------
We have special tests that run with the lowest direct resolution of dependencies for Airflow and providers.
This is run to check whether we are using a feature that is not available in an
older version of some dependencies.
Tests with lowest-direct dependency resolution for Airflow
..........................................................
You can test minimum dependencies installed by Airflow by running (for example, to run "Core" tests):
.. code-block:: bash
breeze testing core-tests --force-lowest-dependencies --test-type "Core"
You can also iterate on the tests and versions of the dependencies by entering the breeze shell and
running the tests from there, after manually downgrading the dependencies:
.. code-block:: bash
breeze shell # enter the container
cd airflow-core
uv sync --resolution lowest-direct
or run the ``--force-lowest-dependencies`` switch directly from the breeze command line:
.. code-block:: bash
breeze shell --force-lowest-dependencies --test-type "Core"
The way it works: after you enter the breeze container, you run ``uv sync`` in the airflow-core
folder to downgrade the dependencies to the lowest version that is compatible
with the dependencies specified in airflow-core dependencies. You will see it in the output of the breeze
command as a sequence of downgrades like this:
.. code-block:: diff
- aiohttp==3.9.5
+ aiohttp==3.9.2
- anyio==4.4.0
+ anyio==3.7.1
Tests with lowest-direct dependency resolution for a Provider
.............................................................
Similarly, we can test if the provider tests are working for the lowest dependencies of a specific provider.
These tests can be easily run locally with breeze (replace PROVIDER_ID with the id of the provider):
.. code-block:: bash
breeze testing providers-tests --force-lowest-dependencies --test-type "Providers[PROVIDER_ID]"
If you find that the tests are failing for some dependencies, make sure to add the minimum version for
the dependency in the ``pyproject.toml`` file of the appropriate provider and re-run it.
You can also iterate on the tests and versions of the dependencies by entering the breeze shell,
manually downgrading dependencies for the provider, and running the tests after that:
.. code-block:: bash
breeze shell
cd providers/PROVIDER_ID
uv sync --resolution lowest-direct
or run the ``--force-lowest-dependencies`` switch directly from the breeze command line:
.. code-block:: bash
breeze shell --force-lowest-dependencies --test-type "Providers[google]"
Similarly to "Core" tests, the dependencies will be downgraded to the lowest version that is
compatible with the dependencies specified in the provider dependencies, and you will see the list of
downgrades in the output of the breeze command. Note that this will include combined downgrades of both
Airflow and selected provider dependencies, so the list will be longer than in "Core" tests
and longer than **just** the dependencies of the provider. For example, for a ``google`` provider, part of the
downgraded dependencies will contain both Airflow and Google Provider dependencies:
.. code-block:: diff
- flask-login==0.6.3
+ flask-login==0.6.2
- flask-session==0.5.0
+ flask-session==0.4.0
- flask-wtf==1.2.1
+ flask-wtf==1.1.0
- fsspec==2023.12.2
+ fsspec==2023.10.0
- gcloud-aio-bigquery==7.1.0
+ gcloud-aio-bigquery==6.1.2
- gcloud-aio-storage==9.2.0
You can also (if your local virtualenv can install the dependencies for the provider)
reproduce the same set of dependencies in your local virtual environment by running:
.. code-block:: bash
cd airflow-core
uv sync --resolution lowest-direct
for Airflow core, and:
.. code-block:: bash
cd providers/PROVIDER_ID
uv sync --resolution lowest-direct
for the providers.
How to fix failing lowest-direct dependency resolution tests
............................................................
When your tests pass in regular test runs but fail in "lowest-direct" dependency resolution tests, the
cause is usually one of the following:
* A lower bound is missing in the ``pyproject.toml`` file (in ``airflow-core`` or the corresponding provider).
This is usually easy to fix, although it can take a little time to track down. Especially if you
just started using a new feature of a library, check the release notes for the minimum
version of the library that provides it and set it as ``>=VERSION`` in the ``pyproject.toml``.
* Airflow-core or the provider needs additional providers or additional dependencies in the dev
dependency group. Sometimes tests need another provider to be installed that is not
normally a required dependency of the provider being tested. For providers, those dependencies
should be added after the ``# Additional devel dependencies`` comment. Adding the
dependencies there means that when ``uv sync`` is run, the packages and their dependencies are installed.
.. code-block:: toml
[dependency-groups]
dev = [
"apache-airflow",
"apache-airflow-task-sdk",
"apache-airflow-devel-common",
"apache-airflow-providers-common-sql",
"apache-airflow-providers-fab",
# Additional devel dependencies (do not remove this line and add extra development dependencies)
# Need to exclude 1.3.0 due to missing aarch64 binaries, fixed with 1.3.1++
"deltalake>=1.1.3,!=1.3.0",
"apache-airflow-providers-microsoft-azure",
]
Sometimes it might get a bit tricky to know the minimum version of the library you should be using.
In this case, you can easily find it by looking at the error and list of downgraded packages and
guessing which one is causing the problem. You can then look at the release notes of the
library and find the minimum version. Alternatively, you can use the technique known as bisecting, which allows
you to quickly figure out the right version without knowing the root cause of the problem.
Assume you suspect library "foo", which was downgraded from 1.0.0 to 0.1.0, is causing the problem. The bisecting
technique looks like this:
* Run ``uv sync --resolution lowest-direct`` (the ``foo`` library is downgraded to 0.1.0). Your test should
fail.
* Upgrade only the ``foo`` library to 1.0.0, re-run the failing test (with ``pytest <test>``),
and confirm that it passes.
* Downgrade the ``foo`` library to 0.1.0 again, re-run the failing test (with ``pytest <test>``), and
confirm that it fails.
* Look at the list of versions available for the library between 0.1.0 and 1.0.0 (for example, via
`<https://pypi.org/project/foo/#history>`_ link - where ``foo`` is your library).
* Find a middle version between 1.0.0 and 0.1.0 and upgrade the library to this version - see if the
test passes or fails. If it passes, continue finding the middle version between the current version
and the lower version. If it fails, continue finding the middle version between the current version and
the higher version.
* Continue this way until you find the lowest version that passes the test.
* Set this version in the ``pyproject.toml`` file, run ``uv sync --resolution lowest-direct``, and see if the test
passes. If it does, you are done. If it does not, repeat the process.
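The bisecting steps above amount to a binary search over the released versions. The following sketch
shows the search itself. ``find_lowest_passing`` is a hypothetical helper, and ``passes`` stands for
whatever you do by hand for each candidate (install that version of the library and re-run the failing
test). It assumes that the newest version passes and that every version newer than a passing one also
passes.

.. code-block:: python

    from __future__ import annotations

    from typing import Callable


    def find_lowest_passing(versions: list[str], passes: Callable[[str], bool]) -> str:
        """Binary-search ``versions`` (sorted oldest to newest) for the lowest passing one."""
        low, high = 0, len(versions) - 1
        if not passes(versions[high]):
            raise ValueError("The newest version fails, so this library is not the culprit")
        while low < high:
            middle = (low + high) // 2
            if passes(versions[middle]):
                high = middle  # Passing: keep looking between the lower bound and this version.
            else:
                low = middle + 1  # Failing: keep looking between this version and the upper bound.
        return versions[low]

With six candidate versions this needs at most four test runs, including the initial check of the
newest version, instead of one run per version.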
You can also skip some tests when ``--force-lowest-dependencies`` is used in Breeze by adding the marker below. This is sometimes needed if your "core" or "provider" tests depend on
all or many providers to be installed (for example, tests loading multiple examples or connections):
.. code-block:: python
from tests_common.pytest_plugin import skip_if_force_lowest_dependencies_marker
@skip_if_force_lowest_dependencies_marker
def test_my_test_that_should_be_skipped():
assert 1 == 1
You can also locally set the ``FORCE_LOWEST_DEPENDENCIES`` environment variable to ``true`` before
running ``pytest`` to skip these tests when running locally.
Other Settings
--------------
Enable masking secrets in tests
...............................
By default, masking secrets in tests is disabled because it might have side effects
on other tests that intend to check ``logging/stdout/stderr`` values.
If you need to test masking secrets in test cases,
you have to apply ``pytest.mark.enable_redact`` to the specific test case, class, or module.
.. code-block:: python
@pytest.mark.enable_redact
def test_masking(capsys):
mask_secret("eggs")
RedactedIO().write("spam eggs and potatoes")
assert "spam *** and potatoes" in capsys.readouterr().out
Skip test on unsupported platform / environment
...............................................
You can apply the marker ``pytest.mark.platform(name)`` to a specific test case, class, or module
to prevent it from running on an unsupported platform.
- ``linux``: Run test only on Linux platform.
- ``breeze``: Run test only inside the Breeze container. This might be useful if you run
potentially dangerous things in tests or if it expects to use common Breeze utilities.
Warnings capture system
.......................
By default, all warnings captured during the test runs are saved into ``tests/warnings.txt``.
If required, you can change the path by providing ``--warning-output-path`` as a pytest CLI argument
or by setting the environment variable ``CAPTURE_WARNINGS_OUTPUT``.
.. code-block:: console
[Breeze:3.10.19] root@3f98e75b1ebe:/opt/airflow# pytest airflow-core/tests/unit/core/ --warning-output-path=/foo/bar/spam.egg
...
========================= Warning summary. Total: 28, Unique: 12 ==========================
airflow: total 11, unique 1
runtest: total 11, unique 1
other: total 7, unique 1
runtest: total 7, unique 1
tests: total 10, unique 10
runtest: total 10, unique 10
Warnings saved into /foo/bar/spam.egg file.
================================= short test summary info =================================
You might also disable warning capture by providing ``--disable-capture-warnings`` as a pytest CLI argument
or by setting the `global warnings filter <https://docs.python.org/3/library/warnings.html#the-warnings-filter>`__
to **ignore**, e.g., set the ``PYTHONWARNINGS`` environment variable to ``ignore``.
.. code-block:: bash
pytest airflow-core/tests/unit/core/ --disable-capture-warnings
Keep tests using environment variables
......................................
By default, all environment variables related to Airflow (starting with ``AIRFLOW__``) are cleared before running tests
to avoid potential side effects. However, in some scenarios, you might want to disable this mechanism and keep the
environment variables you defined to configure your Airflow environment. For example, you might want to run tests
against a specific database configured through the environment variable ``AIRFLOW__DATABASE__SQL_ALCHEMY_CONN``,
or run tests using a specific executor configured through ``AIRFLOW__CORE__EXECUTOR``.
To keep using environment variables you defined in your environment, you need to provide ``--keep-env-variables`` as
a pytest CLI argument.
.. code-block:: bash
pytest airflow-core/tests/unit/core/ --keep-env-variables
This parameter is also available in Breeze.
.. code-block:: bash
breeze testing core-tests --keep-env-variables
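The clearing behaviour described in this section can be pictured as a filter over the environment.
This is a minimal sketch, not the actual fixture code. ``clear_airflow_variables`` is a hypothetical
helper that works on a plain dictionary so that it has no side effects on the real ``os.environ``.

.. code-block:: python

    from __future__ import annotations


    def clear_airflow_variables(
        environ: dict[str, str], keep_env_variables: bool = False
    ) -> dict[str, str]:
        """Return a copy of ``environ`` without ``AIRFLOW__`` variables unless they are kept."""
        if keep_env_variables:
            return dict(environ)
        return {key: value for key, value in environ.items() if not key.startswith("AIRFLOW__")}

With the default setting, a variable such as ``AIRFLOW__CORE__EXECUTOR`` is dropped while unrelated
variables such as ``PATH`` are left alone. Passing ``keep_env_variables=True`` mirrors the effect of
the ``--keep-env-variables`` flag.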
Disable database cleanup before each test module
................................................
By default, the database is cleared of all items before running tests. This is to avoid potential conflicts with
existing resources in the database when running tests using the database. However, in some scenarios, you might want to
disable this mechanism and keep the database as is. For example, you might want to run tests in parallel against the
same database. In that case, you need to disable the database cleanup, otherwise, the tests will conflict with
each other (one test will delete the resources that another one is creating).
To disable the database cleanup, you need to provide ``--no-db-cleanup`` as a pytest CLI argument.
.. code-block:: bash
pytest airflow-core/tests/unit/core/ --no-db-cleanup
This parameter is also available in Breeze.
.. code-block:: bash
breeze testing core-tests --no-db-cleanup airflow-core/tests/unit/core/
Code Coverage
-------------
Airflow's CI process automatically uploads the code coverage report to codecov.io.
For the most recent coverage report of the main branch, visit: https://codecov.io/gh/apache/airflow.
Generating Local Coverage Reports:
..................................
If you wish to obtain coverage reports for specific areas of the codebase on your local machine, follow these steps:
a. Initiate a breeze shell.
b. Execute one of the commands below based on the desired coverage area:
- **Core:** ``python scripts/cov/core_coverage.py``
- **REST API:** ``python scripts/cov/restapi_coverage.py``
- **CLI:** ``python scripts/cov/cli_coverage.py``
- **Other:** ``python scripts/cov/other_coverage.py``
c. After execution, run the following commands from the repository root
(inside the Breeze shell):
.. code-block:: bash
cd htmlcov/
python -m http.server 5555
The Breeze container maps port ``5555`` inside the container to
``25555`` on the host, so you can open the coverage report at
http://localhost:25555 in your browser.
.. note::
You no longer need to start the Airflow web server to view the
coverage report. The lightweight HTTP server above is sufficient and
avoids an extra service. If port 25555 on the host is already in use,
adjust the container-to-host mapping with
``BREEZE_PORTS_EXTRA="<host_port>:5555" breeze start-airflow``.
Modules Not Fully Covered:
..........................
Each coverage command provides a list of modules that aren't fully covered. If you wish to enhance coverage for a particular module:
a. Work on the module to improve its coverage.
b. Once coverage reaches 100%, you can safely remove the module from the list of modules that are not fully covered.
This list is inside each command's source code.
Tracking SQL statements
-----------------------
You can run tests with SQL statements tracking. To do this, use the ``--trace-sql`` option and pass the
columns to be displayed as an argument. Each query will be displayed on a separate line.
Supported values:
* ``num`` - Displays the query number.
* ``time`` - Displays the query execution time.
* ``trace`` - Displays the simplified (one-line) stack trace.
* ``sql`` - Displays the SQL statements.
* ``parameters`` - Displays SQL statement parameters.
If you only provide ``num``, then only the final number of queries will be displayed.
By default, pytest does not display output for successful tests. If you still want to see them, you must
pass the ``--capture=no`` option.
If you run the following command:
.. code-block:: bash
pytest --trace-sql=num,sql,parameters --capture=no \
airflow-core/tests/unit/jobs/test_scheduler_job.py -k test_process_dags_queries_count_05
You will see database queries for the given test on the screen.
SQL query tracking does not work properly if your test runs subprocesses. Only queries from the main process
are tracked.
-----
For other kinds of tests, look at the `Testing document <../09_testing.rst>`__