
  .. Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

  ..   http://www.apache.org/licenses/LICENSE-2.0

  .. Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.

 .. contents:: :local:

 Airflow Test Infrastructure
 ===========================

 * **Unit tests** are Python tests that do not require any additional integrations.
   Unit tests are available both in the `Breeze environment <BREEZE.rst>`__
   and local virtualenv.

 * **Integration tests** are available in the Breeze development environment
   that is also used for Airflow Travis CI tests. Integration tests are special tests that require
   additional services running, such as Postgres, MySQL, Kerberos, etc. Currently, these tests are not
   marked as integration tests but soon they will be clearly separated by ``pytest`` annotations.

 * **System tests** are automatic tests that use external systems like
   Google Cloud Platform. These tests are intended for an end-to-end DAG execution.
   The tests can be executed on both the current version of Apache Airflow and any of the older
   versions from the 1.10.* series.

 This document is about running Python tests. Before the tests are run, use
 `static code checks <STATIC_CODE_CHECKS.rst>`__ that enable catching typical errors in the code.

 Airflow Unit Tests
 ==================

 All tests for Apache Airflow are run using `pytest <http://doc.pytest.org/en/latest/>`_ .

 Writing Unit Tests
 ------------------

 Follow these guidelines when writing unit tests:

 * For standard unit tests that do not require integrations with external systems, make sure to simulate all communications.
 * All Airflow tests are run with ``pytest``. Make sure to set your IDE/runners (see below) to use ``pytest`` by default.
 * For new tests, use standard "asserts" of Python and ``pytest`` decorators/context managers for testing
   rather than ``unittest`` ones. See `Pytest docs <http://doc.pytest.org/en/latest/assert.html>`_ for details.
 * Use a parameterized framework for tests that have variations in parameters.

 **NOTE:** We plan to convert all unit tests to standard "asserts" semi-automatically but this will be done later
 in Airflow 2.0 development phase. That will include setUp/tearDown/context managers and decorators.
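
 The guidelines above can be sketched as follows. This is a minimal illustration rather than code
 from the Airflow test suite: ``fetch_status`` and its ``client`` argument are hypothetical names
 introduced only for this example. The external client is simulated with ``unittest.mock``, checks
 use plain ``assert``, and parameter variations are expressed with ``pytest.mark.parametrize``.

 .. code-block:: python

     from unittest import mock

     import pytest


     def fetch_status(client, job_id):
         """Hypothetical code under test: asks an external service for a job status."""
         if not job_id:
             raise ValueError("job_id must not be empty")
         return client.get_status(job_id).upper()


     @pytest.mark.parametrize(
         "raw_status, expected",
         [("running", "RUNNING"), ("done", "DONE")],
     )
     def test_fetch_status(raw_status, expected):
         # Simulate the external service instead of calling it.
         client = mock.Mock()
         client.get_status.return_value = raw_status

         assert fetch_status(client, "job-1") == expected
         client.get_status.assert_called_once_with("job-1")


     def test_fetch_status_rejects_empty_id():
         # pytest context manager instead of unittest's assertRaises.
         with pytest.raises(ValueError, match="must not be empty"):
             fetch_status(mock.Mock(), "")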

 Running Unit Tests from IDE
 ---------------------------

 To run unit tests from the IDE, create the `local virtualenv <LOCAL_VIRTUALENV.rst>`_,
 select it as the default project's environment, then configure your test runner:

 .. image:: images/configure_test_runner.png
     :align: center
     :alt: Configuring test runner

 and run unit tests as follows:

 .. image:: images/running_unittests.png
     :align: center
     :alt: Running unit tests

 **NOTE:** You can run the unit tests in the standalone local virtualenv
 (with no Breeze installed) if they do not have dependencies such as
 Postgres/MySQL/Hadoop/etc.


 Running Unit Tests
 --------------------------------
 To run unit, integration, and system tests from Breeze and your
 virtualenv, you can use the `pytest <http://doc.pytest.org/en/latest/>`_ framework.

 The custom ``pytest`` plugin runs ``airflow db init`` and ``airflow db reset`` the first
 time you launch the tests, so you can count on the database being initialized. Currently,
 when you run tests not supported **in the local virtualenv, they may either fail
 or provide an error message**.

 There are many available options for selecting a specific test in ``pytest``. Details can be found
 in the official documentation but here are a few basic examples:

 .. code-block:: bash

     pytest -k "TestCore and not check"

 This runs the ``TestCore`` class but skips tests of this class that include 'check' in their names.
 For better performance (test collection is faster when it is limited to one file), run:

 .. code-block:: bash

     pytest tests/test_core.py -k "TestCore and not bash"

 This flag is useful when used to run a single test like this:

 .. code-block:: bash

     pytest tests/test_core.py -k "test_check_operators"

 This can also be done by specifying a full path to the test:

 .. code-block:: bash

     pytest tests/test_core.py::TestCore::test_check_operators

 To run the whole test class, enter:

 .. code-block:: bash

     pytest tests/test_core.py::TestCore

 You can use all available ``pytest`` flags. For example, to increase the log level
 for debugging purposes, enter:

 .. code-block:: bash

     pytest --log-level=DEBUG tests/test_core.py::TestCore


 Running Tests for a Specified Target Using Breeze from the Host
 ---------------------------------------------------------------

 If you wish to only run tests and not drop into the shell, apply the
 ``-t``, ``--test-target`` flag. You can add extra pytest flags after ``--`` in the command line.

 .. code-block:: bash

     ./breeze test-target tests/hooks/test_druid_hook.py -- --log-level=DEBUG

 You can run the whole test suite with a special '.' test target:

 .. code-block:: bash

     ./breeze test-target .

 You can also specify individual tests or a group of tests:

 .. code-block:: bash

     ./breeze test-target tests/test_core.py::TestCore


 Airflow Integration Tests
 =========================

 Some of the tests in Airflow are integration tests. These tests require not only the ``airflow-testing`` Docker
 image but also extra images with integrations (such as ``redis``, ``mongodb``, etc.).


 Enabling Integrations
 ---------------------

 Airflow integration tests cannot be run in the local virtualenv. They can only run in the Breeze
 environment with enabled integrations and in Travis CI.

 When you are in the Breeze environment, by default all integrations are disabled. This enables only true unit tests
 to be executed in Breeze. You can enable the integration by passing the ``--integration <INTEGRATION>``
 switch when starting Breeze. You can specify multiple integrations by repeating the ``--integration`` switch
 or by using the ``--integration all`` switch that enables all integrations.

 **NOTE:** Every integration requires a separate container with the corresponding integration image.
 These containers take precious resources on your PC, mainly memory. The started integrations are not stopped
 until you stop the Breeze environment with the ``stop`` command and restart it
 via the ``restart`` command.

 The following integrations are available:

 .. list-table:: Airflow Test Integrations
    :widths: 15 80
    :header-rows: 1

    * - Integration
      - Description
    * - cassandra
      - Integration required for Cassandra hooks
    * - kerberos
      - Integration that provides Kerberos authentication
    * - mongo
      - Integration required for MongoDB hooks
    * - openldap
      - Integration required for OpenLDAP hooks
    * - rabbitmq
      - Integration required for Celery executor tests
    * - redis
      - Integration required for Celery executor tests

 To start the ``mongo`` integration only, enter:

 .. code-block:: bash

     ./breeze --integration mongo

 To start ``mongo`` and ``cassandra`` integrations, enter:

 .. code-block:: bash

     ./breeze --integration mongo --integration cassandra

 To start all integrations, enter:

 .. code-block:: bash

     ./breeze --integration all

 In the CI environment, integrations can be enabled by specifying the ``ENABLED_INTEGRATIONS`` variable
 storing a space-separated list of integrations to start. Thanks to that, we can run integration and
 integration-less tests separately in different jobs, which is desired from the memory usage point of view.

 Note that Kerberos is a special kind of integration. There are some tests that run differently when
 Kerberos integration is enabled (they retrieve and use a Kerberos authentication token) and differently when the
 Kerberos integration is disabled (they neither retrieve nor use the token). Therefore, one of the test jobs
 for the CI system should run all tests with the Kerberos integration enabled to test both scenarios.

 Running Integration Tests
 -------------------------

 All tests using an integration are marked with a custom pytest marker ``pytest.mark.integration``.
 The marker has a single parameter - the name of an integration.

 Example of the ``redis`` integration test:

 .. code-block:: python

     @pytest.mark.integration("redis")
     def test_real_ping(self):
         hook = RedisHook(redis_conn_id='redis_default')
         redis = hook.get_conn()

         self.assertTrue(redis.ping(), 'Connection to Redis with PING works.')

 The markers can be specified at the test level or at the class level (then all tests in this class
 require an integration). You can add multiple markers with different integrations for tests that
 require more than one integration.
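
 For illustration, a class-level marker combined with an additional test-level marker could look
 like the sketch below. The class and test names are hypothetical and the test bodies are
 placeholders; only the marker usage is the point here.

 .. code-block:: python

     import pytest


     @pytest.mark.integration("redis")
     class TestCeleryWithRedis:
         def test_needs_redis_only(self):
             # Inherits the class-level marker: requires the ``redis`` integration.
             ...

         @pytest.mark.integration("rabbitmq")
         def test_needs_redis_and_rabbitmq(self):
             # Both the class-level and the test-level markers apply to this test.
             ...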

 If such a marked test does not have a required integration enabled, it is skipped.
 The skip message clearly says what is needed to use the test.
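
 One way such skipping could be implemented in a ``conftest.py`` is sketched below. This is not
 Airflow's actual plugin code. It assumes that the enabled integrations are visible to the test
 process through the ``ENABLED_INTEGRATIONS`` environment variable as a space-separated list, and
 ``missing_integrations`` is a helper name made up for this example.

 .. code-block:: python

     import os

     import pytest


     def missing_integrations(required, enabled):
         """Return the required integrations that are not enabled, preserving order."""
         return [name for name in required if name not in enabled]


     def pytest_configure(config):
         # Register the marker so that pytest does not warn about an unknown mark.
         config.addinivalue_line(
             "markers", "integration(name): mark test as requiring the named integration"
         )


     def pytest_runtest_setup(item):
         # Collect names from all ``integration`` markers on the test,
         # including markers inherited from the class level.
         required = [marker.args[0] for marker in item.iter_markers(name="integration")]
         # Assumption: a space-separated list of enabled integrations.
         enabled = os.environ.get("ENABLED_INTEGRATIONS", "").split()
         missing = missing_integrations(required, enabled)
         if missing:
             pytest.skip(
                 "The test requires the following integrations to be enabled: "
                 + ", ".join(missing)
             )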

 To run all tests with a certain integration, use the custom pytest flag ``--integrations``,
 where you can pass integrations as comma-separated values. You can also specify ``all`` to start
 tests for all integrations.

 **NOTE:** If an integration is not enabled in Breeze or Travis CI,
 the affected test will be skipped.

 To run only ``mongo`` integration tests:

 .. code-block:: bash

     pytest --integrations mongo

 To run integration tests for ``mongo`` and ``rabbitmq``:

 .. code-block:: bash

     pytest --integrations mongo,rabbitmq

 To run all integration tests:

 .. code-block:: bash

     pytest --integrations all

 Note that collecting all tests takes some time. So, if you know where your tests are located, you can
 speed up the test collection significantly by providing the folder where the tests are located.

 Here is an example of the collection limited to the ``providers/apache`` directory:

 .. code-block:: bash

     pytest --integrations cassandra tests/providers/apache/

 Running Backend-Specific Tests
 ------------------------------

 Tests that are using a specific backend are marked with a custom pytest marker ``pytest.mark.backend``.
 The marker has a single parameter - the name of a backend. It corresponds to the ``--backend`` switch of
 the Breeze environment (one of ``mysql``, ``sqlite``, or ``postgres``). Backend-specific tests only run when
 the Breeze environment is running with the right backend. If you specify more than one backend
 in the marker, the test runs for all specified backends.

 Example of the ``postgres`` only test:

 .. code-block:: python

     @pytest.mark.backend("postgres")
     def test_copy_expert(self):
         ...


 Example of the ``postgres,mysql`` test (they are skipped with the ``sqlite`` backend):

 .. code-block:: python

     @pytest.mark.backend("postgres", "mysql")
     def test_celery_executor(self):
         ...


 You can use the custom ``--backend`` switch in pytest to only run tests specific for that backend.
 Here is an example of running only postgres-specific backend tests:

 .. code-block:: bash

     pytest --backend postgres
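
 A command-line switch of this kind could be wired up in a ``conftest.py`` roughly as shown below.
 This is a sketch under stated assumptions, not Airflow's own implementation: it only covers
 filtering by the ``--backend`` switch (not detection of the backend Breeze is running with), and
 ``should_skip_for_backend`` is a hypothetical helper name.

 .. code-block:: python

     import pytest


     def should_skip_for_backend(marker_backends, selected_backend):
         """Decide whether a test should be skipped for the selected backend.

         marker_backends: names from ``pytest.mark.backend`` (empty if the test is unmarked).
         selected_backend: value of ``--backend``, or None when the switch is not used.
         """
         if selected_backend is None:
             return False  # No filtering requested.
         return selected_backend not in marker_backends


     def pytest_addoption(parser):
         parser.addoption(
             "--backend",
             action="store",
             default=None,
             help="only run tests marked for this backend",
         )


     def pytest_runtest_setup(item):
         # A marker may list several backends, so flatten all marker arguments.
         marker_backends = [
             name for marker in item.iter_markers(name="backend") for name in marker.args
         ]
         selected = item.config.getoption("--backend")
         if should_skip_for_backend(marker_backends, selected):
             pytest.skip("The test is not marked for the '{}' backend".format(selected))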

 Running Tests with Kubernetes
 -----------------------------

 Starting Kubernetes Cluster when Starting Breeze
 ................................................

 To run Kubernetes in Breeze, you can start Breeze with the ``--kind-cluster-start`` switch. This
 automatically creates a Kind Kubernetes cluster in the same ``docker`` engine that is used to run Breeze.
 Setting up the Kubernetes cluster takes some time, so the cluster continues running
 until it is stopped with the ``--kind-cluster-stop`` switch or until the ``--kind-cluster-recreate``
 switch is used rather than ``--kind-cluster-start``. Starting Breeze with the Kind Cluster automatically
 sets ``runtime`` to ``kubernetes`` (see below).

 The cluster name follows the pattern ``airflow-python-X.Y.Z-vA.B.C`` where X.Y.Z is a Python version
 and A.B.C is a Kubernetes version. This way you can have multiple clusters set up and running at the same
 time for different Python versions and different Kubernetes versions.

 The Control Plane is available from inside the Docker image via ``<CLUSTER_NAME>-control-plane:6443``
 host:port, the worker of the Kind Cluster is available at ``<CLUSTER_NAME>-worker``,
 and the webserver port for the worker is 30809.
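
 The naming scheme can be made concrete with a small helper. The function names and the version
 numbers below are illustrative assumptions; only the name pattern, the control plane port, and
 the webserver port come from the description above.

 .. code-block:: python

     def kind_cluster_name(python_version, kubernetes_version):
         """Build a cluster name following ``airflow-python-X.Y.Z-vA.B.C``."""
         return "airflow-python-{}-v{}".format(python_version, kubernetes_version)


     def kind_endpoints(cluster_name):
         """Return the host:port pairs reachable from inside the Docker image."""
         return {
             "control_plane": "{}-control-plane:6443".format(cluster_name),
             "worker_webserver": "{}-worker:30809".format(cluster_name),
         }


     # Example versions only; use the versions your Breeze session was started with.
     name = kind_cluster_name("3.6.10", "1.15.3")
     print(name)  # airflow-python-3.6.10-v1.15.3
     print(kind_endpoints(name))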

 After the Kubernetes Cluster is started, you need to deploy Airflow to the cluster:

 1. Build the image.
 2. Load it to the Kubernetes cluster.
 3. Deploy the Airflow application.

 It can be done with a single script: ``./scripts/ci/in_container/deploy_airflow_to_kubernetes.sh``.

 You can, however, work separately on building the image for Kubernetes and on deploying the Airflow app in the cluster.

 Building and Loading Airflow Images to Kubernetes Cluster
 ..............................................................

 Use the script ``./scripts/ci/in_container/kubernetes/docker/rebuild_airflow_image.sh`` that does the following:

 1. Rebuilds the latest ``apache/airflow:master-pythonX.Y-ci`` images using the latest sources.
 2. Builds a new Kubernetes image based on ``apache/airflow:master-pythonX.Y-ci``, adding
    the scripts necessary to run in Kubernetes. The image is tagged as
    ``apache/airflow:master-pythonX.Y-ci-kubernetes``.
 3. Loads the image to the Kind Cluster using the ``kind load`` command.

 Deploying the Airflow Application in the Kubernetes Cluster
 ...........................................................

 Use the script ``./scripts/ci/in_container/kubernetes/app/deploy_app.sh`` that does the following:

 1. Prepares Kubernetes resources by processing a template from the ``template`` directory and replacing
    variables with the right images and locations:

    - configmaps.yaml
    - airflow.yaml

 2. Uses the existing resources without replacing any variables inside:

    - secrets.yaml
    - postgres.yaml
    - volumes.yaml

 3. Applies all the resources to the Kind Cluster.
 4. Waits for all the applications to be ready and reachable.

 After the deployment is finished, you can run Kubernetes tests immediately in the same way as other tests.
 The Kubernetes tests are available in the ``tests/runtime/kubernetes`` folder.

 You can run all the integration tests for Kubernetes with ``pytest tests/runtime/kubernetes``.


 Running Runtime-Specific Tests
 ------------------------------

 Tests using a specific runtime are marked with a custom pytest marker ``pytest.mark.runtime``.
 The marker has a single parameter - the name of a runtime. At the moment the only supported runtime is
 ``kubernetes``. This runtime is set when you run Breeze with one of the ``--kind-cluster-*`` flags.
 Runtime-specific tests run only when the selected runtime is started.


 .. code-block:: python

     @pytest.mark.runtime("kubernetes")
     class TestKubernetesExecutor(unittest.TestCase):
         ...


 You can use the custom ``--runtime`` switch in pytest to only run tests specific for that runtime.

 To run only Kubernetes runtime tests, enter:

 .. code-block:: bash

     pytest --runtime kubernetes

 **NOTE:** For convenience and faster search, all runtime tests are stored in the ``tests.runtime`` package. In this case, you
 can speed up the collection of tests by running:

 .. code-block:: bash

     pytest --runtime kubernetes tests/runtime

 Travis CI Testing Framework
 ===========================

 The Airflow test suite is based on the Travis CI framework, as running all of the tests
 locally requires significant setup. You can set up Travis CI in your fork of
 Airflow by following the
 `Travis CI Getting Started guide <https://docs.travis-ci.com/user/getting-started/>`__.

 Consider using the Travis CI framework if you submit multiple pull requests
 and want to speed up your builds.

 There are two different options available for running Travis CI, and they are
 set up on GitHub as separate components:

 -   **Travis CI GitHub App** (new version)
 -   **Travis CI GitHub Services** (legacy version)

 Travis CI GitHub App (new version)
 ----------------------------------

 1.  Once `installed <https://github.com/apps/travis-ci/installations/new/permissions?target_id=47426163>`__,
     configure the Travis CI GitHub App at
     `Configure Travis CI <https://github.com/settings/installations>`__.

 2.  Set repository access to either "All repositories" for convenience, or "Only
     select repositories" and choose ``USERNAME/airflow`` in the drop-down menu.

 3.  Access Travis CI for your fork at `<https://travis-ci.com/USERNAME/airflow>`__.

 Travis CI GitHub Services (legacy version)
 ------------------------------------------

 **NOTE:** The apache/airflow project is still using the legacy version.

 Travis CI GitHub Services version uses an Authorized OAuth App.

 1.  Once installed, configure the Travis CI Authorized OAuth App at
     `Travis CI OAuth APP <https://github.com/settings/connections/applications/88c5b97de2dbfc50f3ac>`__.

 2.  If you are a GitHub admin, click the **Grant** button next to your
     organization; otherwise, click the **Request** button. For the Travis CI
     Authorized OAuth App, you may have to grant access to the forked
     ``ORGANIZATION/airflow`` repo even though it is public.

 3.  Access Travis CI for your fork at
     `<https://travis-ci.org/ORGANIZATION/airflow>`_.

 Creating New Projects in Travis CI
 ----------------------------------

 If you need to create a new project in Travis CI, use travis-ci.com for both
 private repos and open source.

 The travis-ci.org site for open source projects is now legacy and you should not use it.

 ..
     There is a second Authorized OAuth App available called **Travis CI for Open Source** used
     for the legacy travis-ci.org service. Don't use it for new projects!

 More information:

 -  `Open Source on travis-ci.com <https://docs.travis-ci.com/user/open-source-on-travis-ci-com/>`__.
 -  `Legacy GitHub Services to GitHub Apps Migration Guide <https://docs.travis-ci.com/user/legacy-services-to-github-apps-migration-guide/>`__.
 -  `Migrating Multiple Repositories to GitHub Apps Guide <https://docs.travis-ci.com/user/travis-migrate-to-apps-gem-guide/>`__.

 Airflow System Tests
 ====================

 System tests need to communicate with external services/systems that are available
 if you have appropriate credentials configured for your tests.
 The system tests derive from the ``tests.test_utils.system_tests_class.SystemTest`` class. They should also
 be marked with ``@pytest.mark.system(SYSTEM)`` where ``SYSTEM`` designates the system
 to be tested (for example, ``google.cloud``). These tests are skipped by default.
 You can execute the system tests by providing the ``--systems SYSTEMS`` flag to ``pytest``.

 The system tests execute a specified example DAG file that runs the DAG end-to-end.

 See more details about adding new system tests below.

 Environment for System Tests
 ----------------------------

 **Prerequisites:** You may need to set some variables to run system tests. If you need to
 initialize environment variables in Breeze, you can add them to the
 ``files/airflow-breeze-config/variables.env`` file. It is automatically
 sourced when entering the Breeze environment. You can also add extra
 initialization commands to this file if you want to execute something
 every time you enter Breeze.

 There are several typical operations you might want to perform, such as:

 * generating a file with a random value used across the whole Breeze session (this is useful if
   you want to use this random number in names of resources that you create in your service)
 * generating variables that will be used as names of your resources
 * decrypting any variables and resources you keep encrypted in your configuration files
 * installing additional packages that are needed in case you are doing tests with the 1.10.* Airflow series
   (see below)

 An example ``variables.env`` file is shown here (this is part of the ``variables.env`` file that is used to
 run Google Cloud system tests):

 .. code-block:: bash

   # Build variables. This file is sourced by Breeze.
   # Also it is sourced during continuous integration build in Cloud Build

   # Auto-export all variables
   set -a

   echo
   echo "Reading variables"
   echo

   # Generate random number that will be used across your session
   RANDOM_FILE="/random.txt"

   if [[ ! -f "${RANDOM_FILE}" ]]; then
       echo "${RANDOM}" > "${RANDOM_FILE}"
   fi

   RANDOM_POSTFIX=$(cat "${RANDOM_FILE}")

   # install any packages from dist folder if they are available
   if [[ ${RUN_AIRFLOW_1_10:=} == "true" ]]; then
       pip install /dist/apache_airflow_providers_{google,postgres,mysql}*.whl || true
   fi

 To execute system tests, specify the ``--systems SYSTEMS``
 flag where ``SYSTEMS`` is a comma-separated list of systems to run the system tests for.


 Forwarding Authentication from the Host
 ----------------------------------------------------

 For system tests, you can also forward authentication from the host to your Breeze container. You can specify
 the ``--forward-credentials`` flag when starting Breeze. Then, it will also forward the most commonly used
 credentials stored in your ``home`` directory. Use this feature with care as it makes your personal credentials
 visible to anything that you have installed inside the Docker container.

 Currently forwarded credentials are:
   * all credentials stored in ``${HOME}/.config`` (for example, GCP credentials)
   * credentials stored in ``${HOME}/.gsutil`` for ``gsutil`` tool from GCS
   * credentials stored in ``${HOME}/.boto`` and ``${HOME}/.s3`` (for AWS authentication)
   * credentials stored in ``${HOME}/.docker`` for docker
   * credentials stored in ``${HOME}/.kube`` for kubectl
   * credentials stored in ``${HOME}/.ssh`` for SSH


 Adding a New System Test
 --------------------------

 We are working on automating system tests execution (AIP-4) but for now system tests are skipped when
 tests are run in our CI system. However, to enable the test automation, we encourage you to add system
 tests whenever an operator/hook/sensor is added/modified in a given system.

 * To add your own system tests, derive them from the
   ``tests.test_utils.system_tests_class.SystemTest`` class and mark with the
   ``@pytest.mark.system(SYSTEM_NAME)`` marker. The system name should follow the path defined in
   the ``providers`` package (for example, the system tests from ``tests.providers.google.cloud``
   package should be marked with ``@pytest.mark.system("google.cloud")``).

 * If your system tests need some credential files to be available for an
   authentication with external systems, make sure to keep these credentials in the
   ``files/airflow-breeze-config/keys`` directory. Mark your tests with
   ``@pytest.mark.credential_file(<FILE>)`` so that they are skipped if such a credential file is not there.
   The tests should read the right credentials and authenticate on their own. The credentials are read
   in Breeze from the ``/files`` directory. The local "files" folder is mounted to the "/files" folder in Breeze.

 * If your system tests are long-lasting ones (i.e., require more than 20-30 minutes
   to complete), mark them with the ``@pytest.mark.long_running`` marker.
   Such tests are skipped by default unless you specify the ``--long-running`` flag to pytest.

 * The system test itself (python class) does not have any logic. Such a test runs
   the DAG specified by its ID. This DAG should contain the actual DAG logic
   to execute. Make sure to define the DAG in ``providers/<SYSTEM_NAME>/example_dags``. These example DAGs
   are also used to take some snippets of code out of them when documentation is generated. So, having these
   DAGs runnable is a great way to make sure the documentation is describing a working example. Inside
   your test class/test method, simply use ``self.run_dag(<DAG_ID>,<DAG_FOLDER>)`` to run the DAG. Then,
   the system class will take care of running the DAG. Note that the DAG_FOLDER should be
   a subdirectory of the ``tests.test_utils.AIRFLOW_MAIN_FOLDER`` + ``providers/<SYSTEM_NAME>/example_dags``.


 A simple example of a system test is available in:

 ``tests/providers/google/cloud/operators/test_compute_system.py``.

 It runs two DAGs defined in ``airflow.providers.google.cloud.example_dags.example_compute.py`` and
 ``airflow.providers.google.cloud.example_dags.example_compute_igm.py``.
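
 The overall shape of such a test can be sketched as follows. To keep the snippet self-contained,
 ``SystemTest`` is replaced by a local stand-in that only records the ``run_dag`` call; in a real
 test you would import the class from ``tests.test_utils.system_tests_class`` instead. The
 credential file name, the DAG ID, and the test class name are hypothetical.

 .. code-block:: python

     import unittest

     import pytest

     # Hypothetical values used only for illustration.
     CREDENTIAL_FILE = "example_key.json"
     EXAMPLE_DAG_FOLDER = "providers/google/cloud/example_dags"


     class SystemTest(unittest.TestCase):
         """Stand-in for ``tests.test_utils.system_tests_class.SystemTest``.

         The real class runs the DAG; this stub only records what was requested.
         """

         def run_dag(self, dag_id, dag_folder):
             self.executed = (dag_id, dag_folder)


     @pytest.mark.system("google.cloud")
     @pytest.mark.credential_file(CREDENTIAL_FILE)
     class TestExampleSystem(SystemTest):
         def test_run_example_dag(self):
             # No logic here: the example DAG contains what is actually executed.
             self.run_dag("example_dag_id", EXAMPLE_DAG_FOLDER)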

 Preparing backport packages for System Tests for Airflow 1.10.* series
 ----------------------------------------------------------------------

 To run system tests with an old Airflow version, you need to prepare backport packages. This
 can be done by running ``./scripts/ci/ci_prepare_backport_packages.sh <PACKAGES TO BUILD>``. For
 example, the command below builds the google, postgres, and mysql packages:

 .. code-block:: bash

   ./scripts/ci/ci_prepare_backport_packages.sh google postgres mysql

 Those packages will be prepared in the ``./dist`` folder. This folder is mapped to the ``/dist`` folder
 when you enter Breeze, so it is easy to automate installing those packages for testing.


 Installing backported packages for Airflow 1.10.* series
 --------------------------------------------------------

 The tests can be executed against the master version of Airflow but they also work
 with older versions. This is especially useful to test back-ported operators
 from Airflow 2.0 to 1.10.* versions.

 To run the tests for Airflow 1.10.* series, you need to run Breeze with
 ``--install-airflow-version=<VERSION>`` to install a different version of Airflow.
 If ``current`` is specified (default), then the current version of Airflow is used.
 Otherwise, the released version of Airflow is installed.

 The ``--install-airflow-version=<VERSION>`` switch makes sure that the current (from sources) version of
 Airflow is removed and the released version of Airflow from ``PyPI`` is installed. Note that test sources
 are not removed and they can be used to run tests (unit tests and system tests) against the
 freshly installed version.

 You should automate installing the backport packages in your own
 ``./files/airflow-breeze-config/variables.env`` file. Make the installation depend on
 the ``RUN_AIRFLOW_1_10`` variable being equal to "true" so that
 backport packages are only installed when you install Airflow 1.10.*.
 The backport packages are available in the ``/dist`` directory if they were prepared as described
 in the previous section.

 Typically the command in your ``variables.env`` file will be similar to:

 .. code-block:: bash

   # install any packages from dist folder if they are available
   if [[ ${RUN_AIRFLOW_1_10:=} == "true" ]]; then
       pip install /dist/apache_airflow_providers_{google,postgres,mysql}*.whl || true
   fi

 The command above will automatically install backported google, postgres and mysql packages if they
 were prepared before entering Breeze.


 Running system tests for backported packages in Airflow 1.10.* series
 ---------------------------------------------------------------------

 Once you have installed the 1.10.* Airflow version with ``--install-airflow-version`` and prepared and
 installed the required packages via ``variables.env``, it should be as easy as running
 ``pytest --systems=<SYSTEM_NAME> TEST_NAME``. Note that the default timeout for running
 system tests is set to 8 minutes. Some system tests might take much longer to run, so you might
 want to add ``-o faulthandler_timeout=2400`` (2400s = 40 minutes, for example) to your
 pytest command.

 Typical system test session
 ---------------------------

 Here is a typical session for running system tests:

 1. Prepare backport packages

 .. code-block:: bash

   ./scripts/ci/ci_prepare_backport_packages.sh google postgres mysql

 2. Enter Breeze, installing Airflow 1.10.*, forwarding credentials, and installing
    backported packages (you need an appropriate line in ``./files/airflow-breeze-config/variables.env``)

 .. code-block:: bash

    ./breeze --install-airflow-version 1.10.9 --python 3.6 --db-reset --forward-credentials restart

 This will:

 * install Airflow 1.10.9
 * restart the whole environment (i.e. recreate the metadata database from scratch)
 * run Breeze with Python 3.6
 * reset the Airflow database
 * forward your local credentials to Breeze

 3. Run the tests:

 .. code-block:: bash

    pytest -o faulthandler_timeout=2400 --systems=google tests/providers/google/cloud/operators/test_compute_system.py


 Iteration with System Tests if your resources are slow to create
 ----------------------------------------------------------------

 When you want to iterate on system tests, you might want to create slow resources first.

 If you need to set up some external resources for your tests (for example, compute instances in Google Cloud),
 you should set them up and tear them down in the setUp/tearDown methods of your tests.
 Since those resources might be slow to create, you might want to add some helpers that
 set them up and tear them down separately via manual operations. This way you can iterate on
 the tests without waiting for setUp and tearDown with every test.

 In this case, you should build in a mechanism to skip setUp and tearDown if you created
 the resources manually. A somewhat complex example of that can be found in
 ``tests.providers.google.cloud.operators.test_cloud_sql_system.py`` and the helper is
 available in ``tests.providers.google.cloud.operators.test_cloud_sql_system_helper.py``.

 When the helper is run with ``--action create``, it creates Cloud SQL instances, which are very slow
 to create and set up, so that you can iterate on running the system tests without
 losing time on creating them every time. A temporary file is created to prevent
 setting up and tearing down the instances when running the test.
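
 A much simplified version of this skip mechanism is sketched below. It is not the code from the
 Cloud SQL helper: the marker file location, the class name, and the ``create_resources`` and
 ``delete_resources`` placeholders are assumptions made for this example.

 .. code-block:: python

     import os
     import tempfile
     import unittest

     # Hypothetical marker file that a helper's "create" action would leave behind.
     MARKER_FILE = os.path.join(tempfile.gettempdir(), "example_resources_created_manually")


     def resources_managed_manually():
         """Return True when the helper created the resources and left the marker file."""
         return os.path.exists(MARKER_FILE)


     class TestWithSlowResources(unittest.TestCase):
         def setUp(self):
             if resources_managed_manually():
                 return  # Reuse what the helper created.
             self.create_resources()

         def tearDown(self):
             if resources_managed_manually():
                 return  # Keep the resources for the next iteration.
             self.delete_resources()

         def create_resources(self):
             # Placeholder for slow setup such as creating cloud instances.
             print("creating resources")

         def delete_resources(self):
             # Placeholder for the matching cleanup.
             print("deleting resources")

         def test_something(self):
             ...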

 This example also shows how you can use the random number generated at the entry of Breeze if you
 have it in your ``variables.env`` (see the previous sections). In the case of Cloud SQL, you cannot reuse the
 same instance name for a week, so we generate a random number that is used across the whole session
 and store it in the ``/random.txt`` file so that the names are unique during tests.
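
 Reading the stored number from Python could look like the sketch below. The function names and
 the ``testpostgres`` prefix are hypothetical; only the ``/random.txt`` location comes from the
 ``variables.env`` example.

 .. code-block:: python

     def read_random_postfix(random_file="/random.txt"):
         """Read the session-wide random number written when entering Breeze."""
         with open(random_file) as file:
             return file.read().strip()


     def unique_instance_name(prefix, random_file="/random.txt"):
         """Append the session postfix so that names do not clash with earlier sessions."""
         return "{}-{}".format(prefix, read_random_postfix(random_file))


     # Inside Breeze, with /random.txt present, usage would be:
     # instance_name = unique_instance_name("testpostgres")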


 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Important !!!!!!!!!!!!!!!!!!!!!!!!!!!!

 Do not forget to delete manually created resources before leaving the
 Breeze session. They are usually expensive to run.

 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Important !!!!!!!!!!!!!!!!!!!!!!!!!!!!

 Note that in case you have to update your backported operators or system tests (they are part of
 the backport packages) you need to rebuild the packages outside of Breeze and
 ``pip uninstall/pip install`` those packages in order to get them installed. This is not needed
 if you run system tests with the ``current`` Airflow version, so it is better to iterate with the
 system tests with the ``current`` version and fix all problems there and only afterwards run
 the tests with Airflow 1.10.*.

 A typical session then looks as follows:

 1. Prepare backport packages

 .. code-block:: bash

   ./scripts/ci/ci_prepare_backport_packages.sh google postgres mysql

 2. Enter Breeze, installing Airflow 1.10.*, forwarding credentials, and installing
    backported packages (you need an appropriate line in ``./files/airflow-breeze-config/variables.env``)

 .. code-block:: bash

    ./breeze --install-airflow-version 1.10.9 --python 3.6 --db-reset --forward-credentials restart

 3. Run create action in helper (to create slowly created resources):

 .. code-block:: bash

     python tests/providers/google/cloud/operators/test_cloud_sql_system_helper.py --action create

 4. Run the tests:

 .. code-block:: bash

    pytest -o faulthandler_timeout=2400 --systems=google tests/providers/google/cloud/operators/test_compute_system.py

 5. If you are running backport package tests, you need to rebuild and reinstall a package
    every time you change the operators/hooks or example_dags. The example below shows reinstallation
    of the google package:

 On the host:

 .. code-block:: bash

   ./scripts/ci/ci_prepare_backport_packages.sh google

 In the container:

 .. code-block:: bash

   pip uninstall apache-airflow-providers-google
   pip install /dist/apache_airflow_providers_google-*.whl

 Steps 4 and 5 can be repeated multiple times without leaving the container.

 6. Run delete action in helper:

 .. code-block:: bash

     python tests/providers/google/cloud/operators/test_cloud_sql_system_helper.py --action delete


 Local and Remote Debugging in IDE
 =================================

 One of the great benefits of using the local virtualenv and Breeze is an option to run
 local debugging in your IDE graphical interface.

 When you run example DAGs, even if you run them using unit tests within the IDE, they are run in a separate
 container. This makes it a little harder to use the IDE's built-in debuggers.
 Fortunately, IntelliJ/PyCharm provides an effective remote debugging feature (but only in paid versions).
 See additional details on
 `remote debugging <https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html>`_.

 You can set up your remote debugging session as follows:

 .. image:: images/setup_remote_debugging.png
     :align: center
     :alt: Setup remote debugging

 Note that on macOS, you have to use a real IP address of your host rather than default
 localhost because on macOS the container runs in a virtual machine with a different IP address.

 Make sure to configure source code mapping in the remote debugging configuration to map
 your local sources to the ``/opt/airflow`` location of the sources within the container:

 .. image:: images/source_code_mapping_ide.png
     :align: center
     :alt: Source code mapping

 Remote Debugging on Travis CI
 =============================

 You can also connect your IDE to Travis CI. To do this, you must pass a public IP address to the
 ``pydevd.settrace`` method. If you do not have a public IP address, you can start a virtual machine at a
 cloud service provider for the duration of the debugging session. You also need to install the appropriate
 debugger in Travis CI. To do so, modify the ``scripts/ci/in_container/entrypoint.sh`` file to add a
 ``pip install`` command after the license header.
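
 The call that attaches a process to the IDE can be sketched as below. This assumes that the
 ``pydevd-pycharm`` package matching your IDE version is installed in the CI environment; the
 helper names and the default port 5555 are chosen for this example only.

 .. code-block:: python

     def settrace_arguments(host, port=5555):
         """Build keyword arguments for ``pydevd_pycharm.settrace``."""
         return {
             "host": host,
             "port": port,
             "stdoutToServer": True,
             "stderrToServer": True,
         }


     def connect_to_ide(host, port=5555):
         """Attach the running process to the debug server listening on host:port."""
         # Imported lazily: the debugger package is only installed when debugging.
         import pydevd_pycharm

         pydevd_pycharm.settrace(**settrace_arguments(host, port))

 You would place a call such as ``connect_to_ide("<PUBLIC_IP>")`` at the point in the code where
 the debugging session should start.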

 Setup VM on GCP with SSH forwarding
 -----------------------------------

 Below are the steps you need to take to set up your virtual machine in the Google Cloud Platform.

 1. The next steps assume that you have configured environment variables with the name of the network and
    virtual machine, the project ID, and the zone where the virtual machine will be created:

     .. code-block:: bash

       PROJECT_ID="<PROJECT_ID>"
       GCP_ZONE="europe-west3-a"
       GCP_NETWORK_NAME="airflow-debugging"
       GCP_INSTANCE_NAME="airflow-debugging-ci"

 2. Configure the network and firewall for your machine.
    The firewall must allow access to port 22 for SSH traffic and to the port used by the debugger.
    This example uses port 5555 for the debugger.

     .. code-block:: bash

       gcloud compute --project="${PROJECT_ID}" networks create "${GCP_NETWORK_NAME}" \
         --subnet-mode=auto

       gcloud compute --project="${PROJECT_ID}" firewall-rules create "${GCP_NETWORK_NAME}-allow-ssh" \
         --network "${GCP_NETWORK_NAME}" \
         --allow tcp:22 \
         --source-ranges 0.0.0.0/0

       gcloud compute --project="${PROJECT_ID}" firewall-rules create "${GCP_NETWORK_NAME}-allow-debugger" \
         --network "${GCP_NETWORK_NAME}" \
         --allow tcp:5555 \
         --source-ranges 0.0.0.0/0

 3. Once you have a network, you can create a virtual machine. To save costs, you can create a `Preemptible
    virtual machine <https://cloud.google.com/preemptible-vms>`__ that is automatically stopped after
    at most 24 hours.

     .. code-block:: bash

       gcloud beta compute --project="${PROJECT_ID}" instances create "${GCP_INSTANCE_NAME}" \
         --zone="${GCP_ZONE}" \
         --machine-type=f1-micro \
         --subnet="${GCP_NETWORK_NAME}" \
         --image=debian-10-buster-v20200210 \
         --image-project=debian-cloud \
         --preemptible

     To check the public IP address of the machine, run the following command:

     .. code-block:: bash

       gcloud compute --project="${PROJECT_ID}" instances describe "${GCP_INSTANCE_NAME}" \
         --zone="${GCP_ZONE}" \
         --format='value(networkInterfaces[].accessConfigs[0].natIP.notnull().list())'

 4. The SSH daemon's default configuration does not allow traffic forwarding to public addresses.
    To change it, set the ``GatewayPorts`` option in the ``/etc/ssh/sshd_config`` file to ``Yes``
    and restart the SSH daemon.

     .. code-block:: bash

       gcloud beta compute --project="${PROJECT_ID}" ssh "${GCP_INSTANCE_NAME}" \
         --zone="${GCP_ZONE}" -- \
         sudo sed -i "s/#\?\s*GatewayPorts no/GatewayPorts Yes/" /etc/ssh/sshd_config

       gcloud beta compute --project="${PROJECT_ID}" ssh "${GCP_INSTANCE_NAME}" \
         --zone="${GCP_ZONE}" -- \
         sudo service sshd restart

 5. To start port forwarding, run the following command:

     .. code-block:: bash

       gcloud beta compute --project="${PROJECT_ID}" ssh "${GCP_INSTANCE_NAME}" \
         --zone="${GCP_ZONE}" -- \
         -N \
         -R 0.0.0.0:5555:localhost:5555 \
         -v
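
 To check from another machine that the forwarded port on the VM accepts connections, a small
 script like the following could be used. It is only a sketch: the function name ``is_port_open``
 is made up for this example, and a successful connection confirms only that the VM accepts TCP
 connections on the port, not that the IDE debug server is listening behind it.

 .. code-block:: python

     import socket


     def is_port_open(host, port, timeout=5.0):
         """Return True if a TCP connection to host:port can be established."""
         try:
             with socket.create_connection((host, port), timeout=timeout):
                 return True
         except OSError:
             # Covers refused connections, timeouts and unreachable hosts.
             return False

 For example, ``is_port_open("<VM_PUBLIC_IP>", 5555)`` should return ``True`` while the port
 forwarding command above is running and the firewall rule for port 5555 is in place.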

 When you have finished using the virtual machine, remember to delete it:

     .. code-block:: bash

       gcloud beta compute --project="${PROJECT_ID}" instances delete "${GCP_INSTANCE_NAME}" \
         --zone="${GCP_ZONE}"

 You can use the GCP service for free, if you use the `Free Tier <https://cloud.google.com/free>`__.

 DAG Testing
 ===========

 To ease and speed up the process of developing DAGs, you can use
 :py:class:`~airflow.executors.debug_executor.DebugExecutor`, which is a single process executor
 for debugging purposes. Using this executor, you can run and debug DAGs from your IDE.

 To set up the IDE:

 1. Add a ``main`` block at the end of your DAG file to make it runnable.
    It will run a backfill job:

 .. code-block:: python

   if __name__ == '__main__':
     dag.clear(reset_dag_runs=True)
     dag.run()


 2. Set up ``AIRFLOW__CORE__EXECUTOR=DebugExecutor`` in the run configuration of your IDE.
    Make sure to also set up all environment variables required by your DAG.

 3. Run and debug the DAG file.

 Additionally, ``DebugExecutor`` can be used in a fail-fast mode that will make
 all other running or scheduled tasks fail immediately. To enable this option, set
 ``AIRFLOW__DEBUG__FAIL_FAST=True`` or adjust the ``fail_fast`` option in your ``airflow.cfg``.
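
 If you prefer to start the DAG file from a script rather than from an IDE run configuration,
 the same environment variables can be prepared in Python. The sketch below is illustrative:
 the function name ``debug_executor_env`` is hypothetical, and only the two variable names come
 from the settings described above.

 .. code-block:: python

     import os


     def debug_executor_env(fail_fast=False, base_env=None):
         """Return environment variables that make Airflow use the DebugExecutor."""
         # Copy the environment so that the current process is not modified.
         env = dict(os.environ if base_env is None else base_env)
         env["AIRFLOW__CORE__EXECUTOR"] = "DebugExecutor"
         if fail_fast:
             env["AIRFLOW__DEBUG__FAIL_FAST"] = "True"
         return env

 The result can be passed as the ``env`` argument of ``subprocess.run`` when launching the
 DAG file with the Python interpreter.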

 Also, with the Airflow CLI command ``airflow dags test``, you can execute one complete run of a DAG:

 .. code-block:: bash

     # airflow dags test [dag_id] [execution_date]
     airflow dags test example_branch_operator 2018-01-01
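
 When the command is launched from a script, it can help to validate the arguments first.
 The helper below is a hypothetical sketch; it assumes that the execution date is given in the
 ``YYYY-MM-DD`` form used in the example above and only builds the argument list.

 .. code-block:: python

     import datetime


     def dags_test_command(dag_id, execution_date):
         """Build the ``airflow dags test`` command for one DAG run."""
         # Fail early on malformed dates such as "2018-13-01".
         datetime.datetime.strptime(execution_date, "%Y-%m-%d")
         return ["airflow", "dags", "test", dag_id, execution_date]

 The returned list can be handed to ``subprocess.run`` inside an environment where the
 ``airflow`` CLI is available.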

 By default, the ``/files/dags`` folder is mounted from your local ``<AIRFLOW_SOURCES>/files/dags``, and this is
 the directory that the Airflow scheduler and webserver scan for DAGs. You can place your DAGs there
 to test them.

 The DAGs can be run in the master version of Airflow but they also work
 with older versions.

 To run the tests for Airflow 1.10.* series, you need to run Breeze with
 ``--install-airflow-version=<VERSION>`` to install a different version of Airflow.
 If ``current`` is specified (default), then the current version of Airflow is used.
 Otherwise, the released version of Airflow is installed.

 You should also consider running it with the ``restart`` command when you change the installed version.
 This cleans up the database so that you start with a clean DB rather than one created by a previous version.
 So typically you'd run it like ``breeze --install-airflow-version=1.10.9 restart``.

 BASH Unit Testing (BATS)
 ========================

 We have started adding tests to cover Bash scripts we have in our codebase.
 The tests are placed in the ``tests/bats`` folder.
 To run them, you need either the BATS CLI installed on your host
 or the BATS Docker image.

 Installing BATS CLI
 ---------------------

 You can find an installation guide as well as information on how to write
 the bash tests in `BATS Installation <https://github.com/bats-core/bats-core#installation>`_.

 Running BATS Tests on the Host
 ------------------------------

 To run all tests:

 .. code-block:: bash

     bats -r tests/bats/

 To run a single test:

 .. code-block:: bash

     bats tests/bats/your_test_file.bats

 Running BATS Tests via Docker
 -----------------------------

 To run all tests:

 .. code-block:: bash

     docker run -it --workdir /airflow -v $(pwd):/airflow bats/bats:latest -r /airflow/tests/bats

 To run a single test:

 .. code-block:: bash

     docker run -it --workdir /airflow -v $(pwd):/airflow bats/bats:latest /airflow/tests/bats/your_test_file.bats

 Using BATS
 ----------

 You can read more about using BATS CLI and writing tests in
 `BATS Usage <https://github.com/bats-core/bats-core#usage>`_.