blob: 401b383ab3bc94e1fe78f8d55474d079c3e5e738 [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
Airflow Integration Tests
=========================
Integration tests in Airflow check the interactions between Airflow components and external services
that could run as separate Docker containers, without connecting to an external API on the internet.
These tests require ``airflow`` Docker image and extra images with integrations (such as ``celery``, ``mongodb``, etc.).
The integration tests are all stored in the ``tests/integration`` folder, and similarly to the unit tests they all run
using `pytest <http://doc.pytest.org/en/latest/>`_, but they are skipped by default unless ``--integration`` flag is passed to pytest.
**The outline for this document in GitHub is available at top-right corner button (with 3-dots and 3 lines).**
Enabling Integrations
---------------------
Airflow integration tests cannot be run in the local virtualenv. They can only run in the Breeze
environment and in the CI, with their respective integrations enabled. See `CI <../../dev/breeze/doc/ci/README.md>`_ for
details about Airflow CI.
When you initiate a Breeze environment, by default, all integrations are disabled. This enables only unit tests
to be executed in Breeze. You can enable an integration by passing the ``--integration <INTEGRATION>``
switch when starting Breeze, either with ``breeze shell`` or with ``breeze start-airflow``. As there's no need to simulate
a full setup of Airflow during integration tests, using ``breeze shell`` (or simply ``breeze``) to run them is
sufficient. You can specify multiple integrations by repeating the ``--integration`` switch, or by using the ``--integration all-testable`` switch
that enables all testable integrations. You may use ``--integration all`` switch to enable all integrations that
includes also non-testable integrations such as openlineage.
NOTE: Every integration requires a separate container with the corresponding integration image.
These containers take precious resources on your PC, mainly the memory. The started integrations are not stopped
until you stop the Breeze environment with the ``breeze down`` command.
While you can run ``breeze`` or ``breeze shell` with one or multiple integrations enabled, some integrations
are pure "provider" integrations, and some are "core" integration. Which means that some of the integrations
have tests defined in the "providers" tests (because they are testing provider features), but other
integrations are testing core features of Airflow (Fore example executors with CeleryExecutor).
The following integrations are available ("Core tests ?" column indicates if the integration is a
core or provider type of test.
.. BEGIN AUTO-GENERATED INTEGRATION LIST
+--------------+-------------------------------------------------------+
| Identifier | Description |
+==============+=======================================================+
| cassandra | Integration required for Cassandra hooks. |
+--------------+-------------------------------------------------------+
| celery | Integration required for Celery executor tests. |
+--------------+-------------------------------------------------------+
| drill | Integration required for drill operator and hook. |
+--------------+-------------------------------------------------------+
| kafka | Integration required for Kafka hooks. |
+--------------+-------------------------------------------------------+
| kerberos | Integration that provides Kerberos authentication. |
+--------------+-------------------------------------------------------+
| keycloak | Integration for manual testing of multi-team Airflow. |
+--------------+-------------------------------------------------------+
| mongo | Integration required for MongoDB hooks. |
+--------------+-------------------------------------------------------+
| mssql | Integration required for mssql hooks. |
+--------------+-------------------------------------------------------+
| openlineage | Integration required for Openlineage hooks. |
+--------------+-------------------------------------------------------+
| otel | Integration required for OTEL/opentelemetry hooks. |
+--------------+-------------------------------------------------------+
| pinot | Integration required for Apache Pinot hooks. |
+--------------+-------------------------------------------------------+
| qdrant | Integration required for Qdrant tests. |
+--------------+-------------------------------------------------------+
| redis | Integration required for Redis tests. |
+--------------+-------------------------------------------------------+
| statsd | Integration required for Statsd hooks. |
+--------------+-------------------------------------------------------+
| tinkerpop | Integration required for gremlin operator and hook. |
+--------------+-------------------------------------------------------+
| trino | Integration required for Trino hooks. |
+--------------+-------------------------------------------------------+
| ydb | Integration required for YDB tests. |
+--------------+-------------------------------------------------------+
.. END AUTO-GENERATED INTEGRATION LIST'
To start a shell with ``mongo`` integration enabled, enter:
.. code-block:: bash
breeze --integration mongo
You could add multiple ``--integration`` options as the types of the integrations that you want to enable.
For example, to start a shell with both ``mongo`` and ``cassandra`` integrations enabled, enter:
.. code-block:: bash
breeze --integration mongo --integration cassandra
To start all testable integrations, enter:
.. code-block:: bash
breeze --integration all-testable
To start all integrations, enter:
.. code-block:: bash
breeze --integration all
Note that Kerberos is a special kind of integration. Some tests run differently when
Kerberos integration is enabled (they retrieve and use a Kerberos authentication token) and differently when the
Kerberos integration is disabled (they neither retrieve nor use the token). Therefore, one of the test jobs
for the CI system should run all tests with the Kerberos integration enabled to test both scenarios.
Running Integration Tests
-------------------------
All integration tests are marked with a custom pytest marker ``pytest.mark.integration``.
The marker has a single parameter - the name of integration.
Example of the ``celery`` integration test:
.. code-block:: python
@pytest.mark.integration("celery")
def test_real_ping(self):
hook = RedisHook(redis_conn_id="redis_default")
redis = hook.get_conn()
assert redis.ping(), "Connection to Redis with PING works."
The markers can be specified at the test level or the class level (then all tests in this class
require an integration). You can add multiple markers with different integrations for tests that
require more than one integration.
If such a marked test does not have a required integration enabled, it is skipped.
The skip message clearly says what is needed to use the test.
To run all tests with a certain integration, use the custom pytest flag ``--integration``.
You can pass several integration flags if you want to enable several integrations at once.
**NOTE:** If an integration is not enabled in Breeze or CI,
the affected test will be skipped.
To run only ``mongo`` integration tests:
.. code-block:: bash
pytest --integration mongo tests/integration
To run integration tests for ``mongo`` and ``celery``:
.. code-block:: bash
pytest --integration mongo --integration celery tests/integration
Here is an example of the collection limited to the ``providers/apache`` sub-directory:
.. code-block:: bash
pytest --integration cassandra tests/integrations/providers/apache
Running Integration Tests from the Host
---------------------------------------
You can also run integration tests using Breeze from the host. Depending on the type of integration,
you can rum "providers" or "core" integration tests. You can consult the table above to see which
integration is "core" and which is "provider" one, also by running the
``breeze providers-integration-tests --help`` or ``breeze core-integration-tests --help`` command
you can see the list of available integrations for each type of test.
Runs all core integration tests:
.. code-block:: bash
breeze testing core-integration-tests --db-reset --integration all-testable
Runs all providers integration tests:
.. code-block:: bash
breeze testing providers-integration-tests --db-reset --integration all-testable
Runs mongo providers integration tests:
.. code-block:: bash
breeze testing providers-integration-tests --db-reset --integration mongo
Runs kerberos core integration tests:
.. code-block:: bash
breeze testing core-integration-tests --db-reset --integration kerberos
Writing Integration Tests
-------------------------
Before creating the integration tests, you'd like to make the integration itself (i.e., the service) available for use.
For that, you'll first need to create a Docker Compose YAML file under ``scripts/ci/docker-compose``, named
``integration-<INTEGRATION>.yml``. The file should define one service for the integration, and another one
for the Airflow instance that depends on it. It is recommended to stick to the following guidelines:
1. Name the ``services::<INTEGRATION>::container_name`` as the service's name and give it an appropriate description under
``services::<INTEGRATION>::labels:breeze.description``, so it would be easier to detect it in Docker for debugging
purposes.
2. Use an official stable release of the service with a pinned version. When there are number of possibilities for an
image, you should probably pick the latest version that is supported by Airflow.
3. Set the ``services::<INTEGRATION>::restart`` to "on-failure".
4. For integrations that require persisting data (for example, databases), define a volume at ``volumes::<VOLUME_NAME>``
and mount the volume to the data path on the container by listing it under ``services:<INTEGRATION>::volumes``
(see example).
5. Check what ports should be exposed to use the service - carefully validate that these ports are not in use by other
integrations (consult the community what to do if such case happens). To avoid conflicts with host's ports, it is a
good practice to prefix the corresponding host port with a number (usually 2), parametrize it and to list the parameter
under ``# Initialise base variables`` section in ``dev/breeze/src/airflow_breeze/global_constants.py``.
6. In some cases you might need to change the entrypoint of the service's container, for example, by setting
``stdin_open: true``.
7. In the Airflow service definition, ensure that it depends on the integration's service (``depands_on``) and set
the env. var. ``INTEGRATION-<INTEGRATION>`` to true.
8. If you need to mount a file (for example, a configuration file), you could put it at ``scripts/ci/docker-compose``
(or a subfolder of this path) and list it under ``services::<INTEGRATION>::volumes``.
For example, ``integration-drill.yml`` looks as follows:
.. code-block:: yaml
version: "3.8"
services:
drill:
container_name: drill
image: "apache/drill:1.21.1-openjdk-17"
labels:
breeze.description: "Integration required for drill operator and hook."
volumes:
- drill-db-volume:/data
- ./drill/drill-override.conf:/opt/drill/conf/drill-override.conf
restart: "on-failure"
ports:
- "${DRILL_HOST_PORT}:8047"
stdin_open: true
airflow:
depends_on:
- drill
environment:
- INTEGRATION_DRILL=true
volumes:
drill-db-volume:
In the example above, ``DRILL_HOST_PORT = "28047"`` has been added to ``dev/breeze/src/airflow_breeze/global_constants.py``.
Then, you'll also need to set the host port as an env. var. for Docker commands in ``dev/breeze/src/airflow_breeze/params/shell_params.py``
under the property ``env_variables_for_docker_commands``.
For the example above, the following statement was added:
.. code-block:: python
_set_var(_env, "DRILL_HOST_PORT", None, DRILL_HOST_PORT)
The final setup for the integration would be adding a netcat to check that upon setting the integration, it is possible
to access the service in the internal port.
For that, you'll need to add the following in ``scripts/in_container/check_environment.sh`` under "Checking backend and integrations".
The code block for ``drill`` in this file looks as follows:
.. code-block:: bash
if [[ ${INTEGRATION_DRILL} == "true" ]]; then
check_service "drill" "run_nc drill 8047" 50
fi
Then, create the integration test file under ``tests/integration`` - remember to prefix the file name with ``test_``,
and to use the ``@pytest.mark.integration`` decorator. It is recommended to define setup and teardown methods
(``setup_method`` and ``teardown_method``, respectively) - you could look at existing integration tests to learn more.
Before pushing to GitHub, make sure to run static checks (``breeze static-checks --only-my-changes``) to apply linters
on the Python logic, as well as to update the commands images under ``dev/breeze/docs/images``.
When writing integration tests for components that also require Kerberos, you could enforce auto-enabling the latter by
updating ``compose_file()`` method in ``airflow_breeze.params.shell_params.ShellParams``. For example, to ensure that
Kerberos is active for ``trino`` integration tests, the following code has been introduced:
.. code-block:: python
if "trino" in integrations and "kerberos" not in integrations:
get_console().print(
"[warning]Adding `kerberos` integration as it is implicitly needed by trino",
)
compose_file_list.append(DOCKER_COMPOSE_DIR / "integration-kerberos.yml")
-----
For other kinds of tests look at `Testing document <../09_testing.rst>`__