TESTING.rst - airflow - Git at Google

  .. Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

  ..   http://www.apache.org/licenses/LICENSE-2.0

  .. Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.

 .. contents:: :local:

 Airflow Test Infrastructure
 ===========================

 * **Unit tests** are Python tests that do not require any additional integrations.
   Unit tests are available both in the `Breeze environment <BREEZE.rst>`__
   and local virtualenv.

 * **Integration tests** are available in the Breeze development environment
   that is also used for Airflow Travis CI tests. Integration test are special tests that require
   additional services running - such as Postgres/Mysql/Kerberos etc. Those tests are not yet
   clearly marked as integration tests but soon they will be clearly separated by pytest annotations.

 * **System tests** are automatic tests that use external systems like
   Google Cloud Platform. These tests are intended for an end-to-end DAG execution.
   Note that automated execution of these tests is still
   `work in progress <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+System+Tests+for+external+systems#app-switcher>`_.

 This document is about running python tests, before the tests are run we also use
 `static code checks <STATIC_CODE_CHECKS.rst>`__ which allow to catch typical errors in code
 before tests are executed.

 Airflow Unit Tests
 ==================

 All tests for Apache Airflow are run using `pytest <http://doc.pytest.org/en/latest/>`_ .

 Writing unit tests
 ------------------

 There are a few guidelines that you should follow when writing unit tests:

 * Standard unit tests that do not require integrations with external systems should mock all communication
 * All our tests are run with pytest make sure you set your IDE/runners (see below) to use pytest by default
 * For new tests we should use standard "asserts" of python and pytest decorators/context managers for testing
   rather than unittest ones. Look at `Pytest docs <http://doc.pytest.org/en/latest/assert.html>`_ for details.
 * We use parameterized framework for tests that have variations in parameters
 * We plan to convert all unittests to standard "asserts" semi-automatically but this will be done later
   in Airflow 2.0 development phase. That will include setUp/tearDown/context managers and decorators

 Running Unit Tests from IDE
 ---------------------------

 To run unit tests from the IDE, create the `local virtualenv <LOCAL_VIRTUALENV.rst>`_,
 select it as the default project's environment, then configure your test runner:

 .. image:: images/configure_test_runner.png
     :align: center
     :alt: Configuring test runner

 and run unit tests as follows:

 .. image:: images/running_unittests.png
     :align: center
     :alt: Running unit tests

 Note that you can run the unit tests in the standalone local virtualenv
 (with no Breeze installed) if they do not have dependencies such as
 Postgres/MySQL/Hadoop/etc.


 Running Unit Tests
 --------------------------------
 To run unit, integration and system tests from the Breeze and your
 virtualenv you can use `pytest <http://doc.pytest.org/en/latest/>`_ framework.

 Custom pytest plugin run ``airflow db init`` and ``airflow db reset`` the first
 time you launch them, so you can count on the database being initialized. Currently,
 when you run tests not supported **in the local virtualenv, the tests may either fail
 or provide an error message**.

 There are many available options for selecting specific test in pytest. Details could be found
 in official documentation but here are few basic examples:

 .. code-block:: bash

     pytest -k "TestCore and not check"

 This will run ``TestCore`` class but will skip tests of this class that includes 'check' in their names.
 For better performance (due to test collection) you should do:

 .. code-block:: bash

     pytest tests/tests_core.py -k "TestCore and not bash".

 This flag is useful when used like this:

 .. code-block:: bash

     pytest tests/tests_core.py -k "test_check_operators"

 to run single test. This can also be done by specifying full path to the test:

 .. code-block:: bash

     pytest tests/test_core.py::TestCore::test_check_operators

 To run whole test class:

 .. code-block:: bash

     pytest tests/test_core.py::TestCore

 You can use all available pytest flags, for example to increase log level
 for debugging purposes:

 .. code-block:: bash

     pytest --log-level=DEBUG tests/test_core.py::TestCore

 **Note:** We do not provide a clear distinction between tests
 (Unit/Integration/System tests), but we are working on it.


 Running Tests for a Specified Target using Breeze from the host
 ---------------------------------------------------------------

 If you wish to only run tests and not to drop into shell, you can do this by providing the
 ``-t``, ``--test-target`` flag. You can add extra pytest flags after ``--`` in the command line.

 .. code-block:: bash

      ./breeze --test-target tests/hooks/test_druid_hook.py -- --logging-level=DEBUG

 You can run the whole test suite with a special '.' test target:

 .. code-block:: bash

     ./breeze --test-target .

 You can also specify individual tests or a group of tests:

 .. code-block:: bash

     ./breeze --test-target tests/test_core.py::TestCore


 Airflow Integration Tests
 =========================

 Some of the tests in Airflow are Integration tests. Those tests require not only airflow-testing docker
 image but also extra images with integrations (such as redis/mongodb etc.).


 Enabling integrations
 ---------------------

 Running Airflow integration tests cannot be run in local virtualenv. They can only run in Breeze
 environment with enabled integrations and in Travis CI.

 When you are in Breeze environment, by default all integrations are disabled - this way only true unit tests
 can be executed in Breeze. You can enable the integration by passing ``--integration <INTEGRATION>``
 switch when starting Breeze. You can specify multiple integrations by repeating the ``--integration`` switch
 or by using ``--integration all`` switch which enables all integrations.

 Note, that every integration requires separate container with the corresponding integration image,
 so they take precious resources on your PC - mainly memory. The integrations started are not stopped
 until you stop the Breeze environment with ``--stop-environment`` switch.

 The following integrations are available:

 .. list-table:: Airflow Test Integrations
    :widths: 15 80
    :header-rows: 1

    * - Integration
      - Description
    * - cassandra
      - Integration required for Cassandra hooks
    * - kerberos
      - Integration that provides Kerberos authentication
    * - mongo
      - Integration required for MongoDB hooks
    * - openldap
      - Integration required for OpenLDAP hooks
    * - rabbitmq
      - Integration required for Celery executor tests
    * - redis
      - Integration required for Celery executor tests

 Below command starts mongo integration only:

 .. code-block:: bash

     ./breeze --integration mongo

 Below command starts mongo and cassandra integrations:

 .. code-block:: bash

     ./breeze --integration mongo --integration cassandra

 Below command starts all integrations:

 .. code-block:: bash

     ./breeze --integration all

 In the CI environment integrations can be enabled by specifying ``ENABLED_INTEGRATIONS`` variable
 storing space-separated list of integrations to start. Thanks to that we can run integration and
 integration-less tests separately in different jobs which is desired from the memory usage point of view.

 Note that Kerberos is a special kind of integration. There are some tests that run differently when
 Kerberos integration is enabled (they retrieve and use Kerberos authentication token) and differently when the
 Kerberos integration is disabled (they do not retrieve nor use the token). Therefore one of the test job
 for the CI system should run all tests with kerberos integration enabled to test both scenarios.

 Running integration tests
 -------------------------

 All tests that are using an integration are marked with custom pytest marker ``pytest.mark.integration``.
 The marker has single parameter - name of the integration.

 Example redis-integration test:

 .. code-block:: python

     @pytest.mark.integration("redis")
     def test_real_ping(self):
         hook = RedisHook(redis_conn_id='redis_default')
         redis = hook.get_conn()

         self.assertTrue(redis.ping(), 'Connection to Redis with PING works.')

 The markers can be specified at the test level or at the class level (then all tests in this class
 require the integration). You can add multiple markers with different integrations for tests that
 require more than one integration.

 The behaviour of such marked tests is that it is skipped in case required integration is not enabled.
 The skip message will clearly say what's needed in order to use that tests.

 You can run all tests that are using certain integration with the custom pytest flag ``--integrations``,
 where you can pass integrations as comma separated values. You can also specify ``all`` in order to start
 tests for all integrations. Note that if an integration is not enabled in Breeze or CI.

 Example that runs only ``mongo`` integration tests:

 .. code-block:: bash

     pytest --integrations mongo

 Example that runs integration tests fot ``mogo`` and ``rabbitmq``:

 .. code-block:: bash

     pytest --integrations mongo,rabbitmq

 Example that runs all integration tests:

 .. code-block:: bash

     pytest --integrations all

 Note that collecting all tests takes quite some time, so if you know where your tests are located you can
 speed up test collection significantly by providing the folder where the tests are located.

 Here is an example of collection limited only to apache providers directory:

 .. code-block:: bash

     pytest --integrations cassandra tests/providers/apache/

 Running backend-specific tests
 ------------------------------

 Some tests that are using a specific backend are marked with custom pytest marker ``pytest.mark.backend``.
 The marker has single parameter - name of the backend. It correspond with the ``--backend`` switch of
 the Breeze environment (one of ``mysql``, ``sqlite``, ``postgres``). Those tests will only run when
 the Breeze environment is running with the right backend. You can specify more than one backend
 in the marker - then the test will run for all those backends specified.

 Example postgres-only test:

 .. code-block:: python

     @pytest.mark.backend("postgres")
     def test_copy_expert(self):
         ...


 Example postgres,mysql test (they are skipped with sqlite backend):

 .. code-block:: python

     @pytest.mark.backend("postgres", "mysql")
     def test_celery_executor(self):
         ...


 You can use custom ``--backend`` switch in pytest to only run tests specific for that backend.
 Here is an example of only running postgres-specific backend tests:

 .. code-block:: bash

     pytest --backend postgres

 Running Tests with Kubernetes
 -----------------------------

 Starting Kubernetes Cluster when starting Breeze
 ................................................

 In order to run Kubernetes in Breeze you can start Breeze with ``--start-kind-cluster`` switch. This will
 automatically create a Kind Kubernetes cluster in the same ``docker`` engine that is used to run Breeze
 Setting up the Kubernetes cluster takes some time so the cluster continues running
 until the cluster is stopped with ``--stop-kind-cluster`` switch or until ``--recreate-kind-cluster``
 switch is used rather than ``--start-kind-cluster``. Starting breeze with kind cluster automatically
 sets ``runtime`` to ``kubernetes`` (see below).

 The cluster name follows the pattern ``airflow-python-X.Y.Z-vA.B.C`` where X.Y.Z is Python version
 and A.B.C is kubernetes version. This way you can have multiple clusters setup and running at the same
 time for different python versions and different kubernetes versions.

 The Control Plane is available from inside the docker image via ``<CLUSTER_NAME>-control-plane:6443``
 host:port, the worker of the kind cluster is available at  <CLUSTER_NAME>-worker
 and webserver port for the worker is 30809.

 The Kubernetes Cluster is started but in order to deploy airflow to Kubernetes cluster you need to:

 1. Build the image.
 2. Load it to Kubernetes cluster.
 3. Deploy airflow application.

 It can be done with single script: ``./scripts/ci/in_container/kubernetes/deploy_airflow_to_kubernetes.sh``

 You can, however, work separately on the image in Kubernetes and deploying the Airflow app in the cluster.

 Building Airflow Images and Loading them to Kubernetes cluster
 ..............................................................

 This is done using ``./scripts/ci/in_container/kubernetes/docker/rebuild_airflow_image.sh`` script:

 1. Latest ``apache/airflow:master-pythonX.Y-ci`` images are rebuilt using latest sources.
 2. New Kubernetes image based on the  ``apache/airflow:master-pythonX.Y-ci`` is built with
    necessary scripts added to run in kubernetes. The image is tagged with
    ``apache/airflow:master-pythonX.Y-ci-kubernetes`` tag.
 3. The image is loaded to the kind cluster using ``kind load`` command

 Deploying Airflow Application in the Kubernetes cluster
 .......................................................

 This is done using ``./scripts/ci/in_container/kubernetes/app/deploy_app.sh`` script:

 1. Kubernetes resources are prepared by processing template from ``template`` directory, replacing
    variables with the right images and locations:
    - configmaps.yaml
    - airflow.yaml
 2. The existing resources are used without replacing any variables inside:
    - secrets.yaml
    - postgres.yaml
    - volumes.yaml
 3. All the resources are applied in the Kind cluster
 4. The script will wait until all the applications are ready and reachable

 After the deployment is finished you can run Kubernetes tests immediately in the same way as other tests.
 The Kubernetes tests are in ``tests/integration/kubernetes`` folder.

 You can run all the integration tests for Kubernetes with ``pytest tests/integration/kubernetes``.


 Running runtime-specific tests
 ------------------------------

 Some tests that are using a specific runtime are marked with custom pytest marker ``pytest.mark.runtime``.
 The marker has single parameter - name of the runtime. For the moment the only supported runtime is
 ``kubernetes``. This runtime is set when you run Breeze with ``--start-kind-cluster`` option.
 Those tests will only run when the selectd runtime is started.


 .. code-block:: python

     @pytest.mark.runtime("kubernetes")
     class TestKubernetesExecutor(unittest.TestCase):


 You can use custom ``--runtime`` switch in pytest to only run tests specific for that backend.

 Here is an example of only running kubernetes-runtime backend tests:

 .. code-block:: bash

     pytest --runtime kubernetes

 Note! For convenience and faster search, all runtime tests are stored in ``tests.runtime`` package. You
 can speed up collection of tests in this case by:

 .. code-block:: bash

     pytest --runtime kubernetes tests/runtime

 Travis CI Testing Framework
 ===========================

 Airflow test suite is based on Travis CI framework as running all of the tests
 locally requires significant setup. You can set up Travis CI in your fork of
 Airflow by following the
 `Travis CI Getting Started guide <https://docs.travis-ci.com/user/getting-started/>`__.

 Consider using Travis CI framework if you submit multiple pull requests
 and want to speed up your builds.

 There are two different options available for running Travis CI, and they are
 set up on GitHub as separate components:

 -   **Travis CI GitHub App** (new version)
 -   **Travis CI GitHub Services** (legacy version)

 Travis CI GitHub App (new version)
 ----------------------------------

 1.  Once `installed <https://github.com/apps/travis-ci/installations/new/permissions?target_id=47426163>`__,
     configure the Travis CI GitHub App at
     `Configure Travis CI <https://github.com/settings/installations>`__.

 2.  Set repository access to either "All repositories" for convenience, or "Only
     select repositories" and choose ``USERNAME/airflow`` in the drop-down menu.

 3.   Access Travis CI for your fork at `<https://travis-ci.com/USERNAME/airflow>`__.

 Travis CI GitHub Services (legacy version)
 ------------------------------------------

 **NOTE:** The apache/airflow project is still using the legacy version.

 Travis CI GitHub Services version uses an Authorized OAuth App.

 1.  Once installed, configure the Travis CI Authorized OAuth App at
     `Travis CI OAuth APP <https://github.com/settings/connections/applications/88c5b97de2dbfc50f3ac>`__.

 2.  If you are a GitHub admin, click the **Grant** button next to your
     organization; otherwise, click the **Request** button. For the Travis CI
     Authorized OAuth App, you may have to grant access to the forked
     ``ORGANIZATION/airflow`` repo even though it is public.

 3.  Access Travis CI for your fork at
     `<https://travis-ci.org/ORGANIZATION/airflow>`_.

 Creating New Projects in Travis CI
 ----------------------------------

 If you need to create a new project in Travis CI, use travis-ci.com for both
 private repos and open source.

 The travis-ci.org site for open source projects is now legacy and you should not use it.

 ..
     There is a second Authorized OAuth App available called **Travis CI for Open Source** used
     for the legacy travis-ci.org service. Don't use it for new projects!

 More information:

 -  `Open Source on travis-ci.com <https://docs.travis-ci.com/user/open-source-on-travis-ci-com/>`__.
 -  `Legacy GitHub Services to GitHub Apps Migration Guide <https://docs.travis-ci.com/user/legacy-services-to-github-apps-migration-guide/>`__.
 -  `Migrating Multiple Repositories to GitHub Apps Guide <https://docs.travis-ci.com/user/travis-migrate-to-apps-gem-guide/>`__.

 Airflow System Tests
 ====================

 The System tests for Airflow are not yet fully implemented. They are Work In Progress of the
 `AIP-4 Support for System Tests for external systems <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+System+Tests+for+external+systems>`__.
 These tests need to communicate with external services/systems that are available
 if you have appropriate credentials configured for your tests.
 The tests derive from ``tests.system_test_class.SystemTests`` class.

 The system tests execute a specified
 example DAG file that runs the DAG end-to-end.

 An example of such a system test is
 ``airflow.tests.providers.google.operators.test_natural_language_system.CloudNaturalLanguageExampleDagsTest``.

 For now you can execute the system tests and follow messages printed to get them running. Soon more information on
 running the tests will be available.


 Local and Remote Debugging in IDE
 =================================

 One of the great benefits of using the local virtualenv and Breeze is an option to run
 local debugging in your IDE graphical interface.

 When you run example DAGs, even if you run them using unit tests within IDE, they are run in a separate
 container. This makes it a little harder to use with IDE built-in debuggers.
 Fortunately, IntelliJ/PyCharm provides an effective remote debugging feature (but only in paid versions).
 See additional details on
 `remote debugging <https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html>`_.

 You can set up your remote debugging session as follows:

 .. image:: images/setup_remote_debugging.png
     :align: center
     :alt: Setup remote debugging

 Note that on macOS, you have to use a real IP address of your host rather than default
 localhost because on macOS the container runs in a virtual machine with a different IP address.

 Make sure to configure source code mapping in the remote debugging configuration to map
 your local sources to the ``/opt/airflow`` location of the sources within the container:

 .. image:: images/source_code_mapping_ide.png
     :align: center
     :alt: Source code mapping

 DAG testing
 ===========

 To ease and speed up process of developing DAGs you can use
 py:class:`~airflow.executors.debug_executor.DebugExecutor` - a single process executor
 for debugging purposes. Using this executor you can run and debug DAGs from your IDE.

 **IDE setup steps:**

 1. Add ``main`` block at the end of your DAG file to make it runnable.
 It will run a backfill job:

 .. code-block:: python

   if __name__ == '__main__':
     dag.clear(reset_dag_runs=True)
     dag.run()


 2. Setup ``AIRFLOW__CORE__EXECUTOR=DebugExecutor`` in run configuration of your IDE. In
    this step you should also setup all environment variables required by your DAG.

 3. Run and debug the DAG file.

 Additionally ``DebugExecutor`` can be used in a fail-fast mode that will make
 all other running or scheduled tasks fail immediately. To enable this option set
 ``AIRFLOW__DEBUG__FAIL_FAST=True`` or adjust ``fail_fast`` option in your ``airflow.cfg``.


 BASH unit testing (BATS)
 ========================

 We have started to add tests to cover Bash scripts we have in our codeabase.
 The tests are placed in ``tests\bats`` folder.
 They require BAT CLI to be installed if you want to run them in your
 host or via docker image.

 BATS CLI installation
 ---------------------

 You can find installation guide as well as information on how to write
 the bash tests in [BATS installation](https://github.com/bats-core/bats-core#installation)

 Running BATS tests in the host
 ------------------------------

 Running all tests:

 ```
 bats -r tests/bats/
 ```

 Running single test:

 ```
 bats tests/bats/your_test_file.bats
 ```

 Running BATS tests via docker
 -----------------------------

 Running all tests:

 ```
 docker run -it --workdir /airflow -v $(pwd):/airflow  bats/bats:latest -r /airflow/tests/bats
 ```

 Running single test:

 ```
 docker run -it --workdir /airflow -v $(pwd):/airflow  bats/bats:latest /airflow/tests/bats/your_test_file.bats
 ```

 BATS usage
 ----------

 You can read more about using BATS CLI and writing tests in:
 [BATS usage](https://github.com/bats-core/bats-core#usage)