| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| .. contents:: :local: |
| |
| Airflow Test Infrastructure |
| =========================== |
| |
| * **Unit tests** are Python tests that do not require any additional integrations. |
| Unit tests are available both in the `Breeze environment <BREEZE.rst>`__ |
| and local virtualenv. |
| |
| * **Integration tests** are available in the Breeze development environment |
| that is also used for Airflow CI tests. Integration tests are special tests that require |
| additional services running, such as Postgres, MySQL, Kerberos, etc. |
| |
| * **System tests** are automatic tests that use external systems like |
| Google Cloud. These tests are intended for an end-to-end DAG execution. |
| The tests can be executed on both the current version of Apache Airflow and any older |
| versions from 1.10.* series. |
| |
This document is about running Python tests. Before the tests are run, use the
`static code checks <STATIC_CODE_CHECKS.rst>`__ that catch typical errors in the code.
| |
| Airflow Unit Tests |
| ================== |
| |
| All tests for Apache Airflow are run using `pytest <http://doc.pytest.org/en/latest/>`_ . |
| |
| Writing Unit Tests |
| ------------------ |
| |
Follow these guidelines when writing unit tests:
| |
* For standard unit tests that do not require integrations with external systems, make sure to mock all external communication.
| * All Airflow tests are run with ``pytest``. Make sure to set your IDE/runners (see below) to use ``pytest`` by default. |
| * For new tests, use standard "asserts" of Python and ``pytest`` decorators/context managers for testing |
| rather than ``unittest`` ones. See `pytest docs <http://doc.pytest.org/en/latest/assert.html>`_ for details. |
* Use ``pytest.mark.parametrize`` for tests that have variations in parameters.
* Use ``pytest.warns`` to capture warnings rather than the ``recwarn`` fixture. We aim for zero warnings in our
  tests, so we run pytest with ``--disable-warnings``; the ``pytest-capture-warnings`` plugin we use instead
  overrides the ``recwarn`` fixture behaviour. The sketch below illustrates these conventions.
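
For illustration, here is a minimal, hypothetical sketch of these conventions (the functions under
test are made up for the example):

.. code-block:: python

    import warnings

    import pytest


    def double(value: int) -> int:
        # hypothetical function under test
        return value * 2


    def legacy_helper() -> None:
        # hypothetical deprecated function under test
        warnings.warn("use the new helper instead", DeprecationWarning)


    @pytest.mark.parametrize("value, expected", [(1, 2), (2, 4)])
    def test_double(value, expected):
        # plain Python assert rather than unittest's assertEqual
        assert double(value) == expected


    def test_legacy_helper_warns():
        # pytest.warns context manager rather than the recwarn fixture
        with pytest.warns(DeprecationWarning):
            legacy_helper()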
| |
| |
| Airflow configuration for unit tests |
| ------------------------------------ |
| |
Some of the unit tests require special configuration to be set as the default. This is done automatically by
adding ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` to the environment variables in a pytest auto-used
fixture. This in turn makes Airflow load the test configuration from the file
``airflow/config_templates/unit_tests.cfg``. The test configuration from there replaces the original
defaults from ``airflow/config_templates/config.yml``. If you want to add test-only configuration
as the default for all tests, you should add the value to this file.
| |
You can also, of course, override the values in an individual test by patching environment variables following
the usual ``AIRFLOW__SECTION__KEY`` pattern or by using the ``conf_vars`` context manager.
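
For example, a minimal sketch of both approaches (the configuration key is illustrative; the
``conf_vars`` helper lives in Airflow's test utilities):

.. code-block:: python

    import os
    from unittest import mock

    from airflow.configuration import conf
    from tests.test_utils.config import conf_vars


    def test_override_via_env_var():
        # patch an environment variable following the AIRFLOW__SECTION__KEY pattern
        with mock.patch.dict(os.environ, {"AIRFLOW__CORE__LOAD_EXAMPLES": "False"}):
            assert conf.getboolean("core", "load_examples") is False


    def test_override_via_conf_vars():
        # conf_vars takes a mapping of (section, key) tuples to values
        with conf_vars({("core", "load_examples"): "False"}):
            assert conf.getboolean("core", "load_examples") is False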
| |
| Airflow test types |
| ------------------ |
| |
Airflow tests in the CI environment are split into several test types. You can narrow down which
test types you want to use in various ``breeze testing`` sub-commands in two ways:

* by specifying ``--test-type`` when you run a single test type in the ``breeze testing tests`` command
* by specifying a space-separated list of test types via the ``--parallel-test-types`` or
  ``--exclude-parallel-test-types`` options when you run tests in parallel (in several testing commands)
| |
The following test types are defined:
| |
| * ``Always`` - those are tests that should be always executed (always sub-folder) |
| * ``API`` - Tests for the Airflow API (api, api_connexion, api_experimental and api_internal sub-folders) |
| * ``CLI`` - Tests for the Airflow CLI (cli folder) |
| * ``Core`` - for the core Airflow functionality (core, executors, jobs, models, ti_deps, utils sub-folders) |
* ``Operators`` - tests for the operators (the ``operators`` folder, with the exception of the Virtualenv Operator
  and External Python Operator tests, which have their own test types). They are excluded via the
  ``virtualenv_operator`` and ``external_python_operator`` markers that those tests are marked with.
| * ``WWW`` - Tests for the Airflow webserver (www folder) |
| * ``Providers`` - Tests for all Providers of Airflow (providers folder) |
* ``PlainAsserts`` - tests that require disabling the ``assert-rewrite`` feature of pytest (usually because
  of a buggy/complex implementation in an imported library) (``plain_asserts`` marker)
| * ``Other`` - all other tests remaining after the above tests are selected |
| |
There are also Virtualenv/ExternalPython operator test types that are excluded from the ``Operators`` test type
and run as separate test types. Those are:
| |
| * ``PythonVenv`` - tests for PythonVirtualenvOperator - selected directly as TestPythonVirtualenvOperator |
| * ``ExternalPython`` - tests for ExternalPythonOperator - selected directly as TestExternalPythonOperator |
| * ``BranchExternalPython`` - tests for BranchExternalPythonOperator - selected directly as TestBranchExternalPythonOperator |
| |
There are also test types that run "all" tests: they do not look at the folders but at the ``pytest`` markers
the tests are marked with, and apply some filters:
| |
| * ``All-Postgres`` - tests that require Postgres database. They are only run when backend is Postgres (``backend("postgres")`` marker) |
| * ``All-MySQL`` - tests that require MySQL database. They are only run when backend is MySQL (``backend("mysql")`` marker) |
| * ``All-Quarantined`` - tests that are flaky and need to be fixed (``quarantined`` marker) |
| * ``All`` - all tests are run (this is the default) |
| |
| |
We also have an ``Integration`` test type that runs integration tests against external software,
enabled via the ``--integration`` flag in the ``breeze`` environment - run via ``breeze testing integration-tests``.
| |
| * ``Integration`` - tests that require external integration images running in docker-compose |
| |
This is done for two reasons:

1. in order to selectively run only a subset of the test types for some PRs
2. in order to allow efficient parallel test execution of the tests on self-hosted runners
| |
For case 2, we can utilize the memory and CPUs available on both CI and local development machines to run
tests in parallel, but we cannot use the pytest-xdist plugin for that. Instead, we split the tests into test
types and run each test type with its own instance of the database in a separate container, where the tests
in each type have exclusive access to their database and run sequentially within the type. This follows
from the nature of those tests: they rely on shared databases, and they update/reset/clean up data in the
databases while they are executing.
| |
| |
| DB and non-DB tests |
| ------------------- |
| |
| There are two kinds of unit tests in Airflow - DB and non-DB tests. |
| |
Some of the tests of Airflow (around 7000 of them as of October 2023)
require a database to connect to in order to run. Those tests store and read data from the Airflow DB using
Airflow's core code, and it's crucial to run the tests against all real databases that Airflow supports in order
to check whether the SQLAlchemy queries and the database schema are correct.
| |
Those tests should be marked with the ``@pytest.mark.db_test`` decorator on one of the levels:

* the test method can be marked with the ``@pytest.mark.db_test`` decorator
* the test class can be marked with the ``@pytest.mark.db_test`` decorator
* the test module can be marked with ``pytestmark = pytest.mark.db_test`` at the top level of the module
| |
| Airflow's CI runs different test kinds separately. |
| |
The DB tests are run against the multiple databases Airflow supports, multiple versions of those databases,
and multiple Python versions. In order to save time, not all combinations are tested,
but enough varied combinations are tested to detect potential problems.
| |
| As of October 2023, Airflow has ~9000 Non-DB tests and around 7000 DB tests. |
| |
| Airflow non-DB tests |
| -------------------- |
| |
The non-DB tests are run once for each tested Python version with the ``none`` database backend (which
causes any database access to fail). Those tests are run in parallel with the ``pytest-xdist`` plugin, which
means that we can efficiently utilize multi-processor machines (including the ``self-hosted`` runners with
8 CPUs that we use to run the tests with maximum parallelism).
| |
It's usually straightforward to run those tests in a local virtualenv because they do not require any
setup or a running database. They also run much faster than DB tests. You can run them with the ``pytest`` command
or with ``breeze``, which has all the dependencies needed to run all tests automatically installed. Of course,
you can also select just a specific test, folder, or module for pytest to collect/run tests from.
The example below shows how to run all tests, parallelizing them with ``pytest-xdist``
(by specifying the ``tests`` folder):
| |
| .. code-block:: bash |
| |
| pytest tests --skip-db-tests -n auto |
| |
| |
| The ``--skip-db-tests`` flag will only run tests that are not marked as DB tests. |
| |
| |
You can also run the ``breeze`` command to run all the tests (they will run in a separate container, with
the selected Python version and without access to any database). Adding the ``--use-xdist`` flag will run all
tests in parallel using the ``pytest-xdist`` plugin.
| |
We also have a dedicated, opinionated ``breeze testing non-db-tests`` command that runs non-DB tests
(it is also used in CI to run the non-DB tests, so you do not have to specify extra flags for
parallel running). You can run all the non-DB tests
(or just a subset of them with ``--parallel-test-types`` or ``--exclude-parallel-test-types``) in parallel:
| |
| .. code-block:: bash |
| |
| breeze testing non-db-tests |
| |
You can pass a list of test types to execute via ``--parallel-test-types`` or use ``--exclude-parallel-test-types``
to exclude them from the default set:
| |
| .. code-block:: bash |
| |
| breeze testing non-db-tests --parallel-test-types "Providers API CLI" |
| |
| |
| .. code-block:: bash |
| |
| breeze testing non-db-tests --exclude-parallel-test-types "Providers API CLI" |
| |
| You can also run the same commands via ``breeze testing tests`` - by adding the necessary flags manually: |
| |
| .. code-block:: bash |
| |
| breeze testing tests --skip-db-tests --backend none --use-xdist |
| |
You can also enter the interactive shell with ``breeze`` and run tests from there if you want to iterate
on the tests. Source files in ``breeze`` are mounted as volumes, so you can modify them locally and
rerun them in Breeze as you like (``-n auto`` will parallelize tests using the ``pytest-xdist`` plugin):
| |
| .. code-block:: bash |
| |
| breeze shell --backend none --python 3.8 |
| > pytest tests --skip-db-tests -n auto |
| |
| |
| Airflow DB tests |
| ---------------- |
| |
Airflow DB tests require a database to run. It can be any of the supported Airflow databases, and the tests can
be run either using a local virtualenv or Breeze.
| |
| |
| |
By default, the DB tests use sqlite and the "airflow.db" database created and populated in the
``${AIRFLOW_HOME}`` folder. You do not need to do anything to get the database created and initialized,
but if you need to clean and restart the db, you can run the tests with the ``--with-db-init`` flag - then the
database will be re-initialized. You can also set the ``AIRFLOW__DATABASE__SQL_ALCHEMY_CONN`` environment
variable to point to a supported database (Postgres, MySQL, etc.) and the tests will use that database. You
might need to run ``airflow db reset`` to initialize the database in that case.
| |
| The "non-DB" tests are perfectly fine to run when you have database around but if you want to just run |
| DB tests (as happens in our CI for the ``Database`` runs) you can use ``--run-db-tests-only`` flag to filter |
| out non-DB tests (and obviously you can specify not only on the whole ``tests`` directory but on any |
| folders/files/tests selection, ``pytest`` supports). |
| |
| .. code-block:: bash |
| |
| pytest tests/ --run-db-tests-only |
| |
You can also run DB tests in the ``breeze`` dockerized environment. You can choose the backend to use with the
``--backend`` flag. The default is ``sqlite``, but you can also use others such as ``postgres`` or ``mysql``.
You can also select the backend version and the Python version to use. You can specify the ``test-type`` to run -
breeze will list the test types you can run with ``--help`` and provide auto-complete for them.
| |
We also have a dedicated, opinionated ``breeze testing db-tests`` command that runs DB tests
(it is also used in CI to run the DB tests, so you do not have to specify extra flags for
parallel running). You can run all the DB tests
(or just a subset of them with ``--parallel-test-types`` or ``--exclude-parallel-test-types``) in parallel:
| |
| .. code-block:: bash |
| |
breeze testing db-tests --backend postgres
| |
You can pass a list of test types to execute via ``--parallel-test-types`` or use ``--exclude-parallel-test-types``
to exclude them from the default set:
| |
| .. code-block:: bash |
| |
| breeze testing db-tests --parallel-test-types "Providers API CLI" |
| |
| |
| .. code-block:: bash |
| |
| breeze testing db-tests --exclude-parallel-test-types "Providers API CLI" |
| |
| You can also run the same commands via ``breeze testing tests`` - by adding the necessary flags manually: |
| |
| .. code-block:: bash |
| |
| breeze testing tests --run-db-tests-only --backend postgres --run-tests-in-parallel |
| |
| |
Also, if you want to iterate on the tests, you can enter the interactive shell and run the tests iteratively -
either by package/module/test or by test type - whatever ``pytest`` supports.
| |
| .. code-block:: bash |
| |
| breeze shell --backend postgres --python 3.8 |
| > pytest tests --run-db-tests-only |
| |
| As explained before, you cannot run DB tests in parallel using ``pytest-xdist`` plugin, but ``breeze`` has |
| support to split all the tests into test-types to run in separate containers and with separate databases |
| and you can run the tests using ``--run-tests-in-parallel`` flag (which is automatically enabled when |
| you use ``breeze testing db-tests`` command): |
| |
| .. code-block:: bash |
| |
| breeze testing tests --run-db-tests-only --backend postgres --python 3.8 --run-tests-in-parallel |
| |
| |
| Best practices for DB tests |
| =========================== |
| |
Usually when you add new tests, you add tests "similar" to the ones that are already there. In most cases,
therefore, you do not have to worry about the test type - it will be selected automatically for you by the
fact that the test class you add the tests to, or the whole module, is marked with the ``db_test`` marker.
| |
You should strive to write "pure" unit tests that do not use the database, but sometimes it's just easier to plug
into the existing framework of DagRuns, Dags, Connections, and Variables and use the database directly rather
than having to mock the DB access, for example. It's up to you to decide.
| |
However, if you choose to write DB tests, you have to make sure you add the ``db_test`` marker - either to
the test method or class (with a decorator) or to the whole module (with ``pytestmark`` at the top level of the module).
| |
In most cases when you add tests to existing modules or classes, you follow similar tests, so you do not
have to do anything. However, in some cases you need to decide whether your test should be marked as a DB test
or whether it should be changed to not use the database at all.
| |
If your test accesses the database but is not marked properly, the non-DB test run in CI will fail with this message:
| |
.. code-block:: text
| |
| "Your test accessed the DB but `_AIRFLOW_SKIP_DB_TESTS` is set. |
| Either make sure your test does not use database or mark your test with `@pytest.mark.db_test`. |
| |
| Marking test as DB test |
| ----------------------- |
| |
| You can apply the marker on method/function/class level with ``@pytest.mark.db_test`` decorator or |
| at the module level with ``pytestmark = pytest.mark.db_test`` at the top level of the module. |
| |
It's up to the author to decide whether to mark the test, class, or module as a "DB test" - generally, the
fewer DB tests, the better, and if we can clearly separate the DB parts from the non-DB parts, we should.
However, it's also OK if a few tests are marked as DB tests even though they are not, when they are part of a
class or module that is "mostly DB".
| |
Sometimes, when your class can be clearly split into DB and non-DB parts, it's better to split the class
into two separate classes and mark only the DB class as a DB test, as shown in the sketch below.
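
A minimal, hypothetical sketch of such a split (the class and test names are made up for the example):

.. code-block:: python

    import pytest


    class TestMyOperatorLogic:
        """Pure unit tests - no database access, so no marker is needed."""

        def test_template_fields_are_rendered(self):
            ...


    @pytest.mark.db_test
    class TestMyOperatorWithDagRun:
        """These tests create DagRuns/TaskInstances, so they need the database."""

        def test_execute_updates_task_instance(self):
            ...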
| |
| Method level: |
| |
| .. code-block:: python |
| |
| import pytest |
| |
| |
| @pytest.mark.db_test |
| def test_add_tagging(self, sentry, task_instance): |
| ... |
| |
| Class level: |
| |
| |
| .. code-block:: python |
| |
| import pytest |
| |
| |
| @pytest.mark.db_test |
| class TestDatabricksHookAsyncAadTokenSpOutside: |
| ... |
| |
| Module level (at the top of the module): |
| |
| .. code-block:: python |
| |
| import pytest |
| |
| from airflow.models.baseoperator import BaseOperator |
| from airflow.models.dag import DAG |
| from airflow.ti_deps.dep_context import DepContext |
| from airflow.ti_deps.deps.task_concurrency_dep import TaskConcurrencyDep |
| |
| pytestmark = pytest.mark.db_test |
| |
| |
| How to verify if DB test is correctly classified |
| ------------------------------------------------ |
| |
If you want to see whether your DB test is correctly classified, you can run the test or group
of tests with the ``--skip-db-tests`` flag.
| |
You can run all (or a subset of) test types if you want to make sure all of the problems are fixed:
| |
| .. code-block:: bash |
| |
| breeze testing tests --skip-db-tests tests/your_test.py |
| |
| For the whole test suite you can run: |
| |
| .. code-block:: bash |
| |
| breeze testing non-db-tests |
| |
For selected test types (in this example, the tests will run for Providers/API/CLI code only):
| |
| .. code-block:: bash |
| |
| breeze testing non-db-tests --parallel-test-types "Providers API CLI" |
| |
| |
| How to make your test not depend on DB |
| -------------------------------------- |
| |
This is tricky and there is no single solution. Sometimes we can mock out the methods that require
DB access or the objects that normally require a database. Sometimes we can decide to test just a single method
of a class rather than a more complex set of steps. Generally speaking, it's good to have as many "pure"
unit tests that require no DB as possible compared to DB tests. They are usually faster and more
reliable as well.
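
For example, here is a hypothetical sketch of mocking out a DB-bound call so the test stays a pure unit test
(the function under test is made up for the example):

.. code-block:: python

    from unittest import mock

    from airflow.models import Variable


    def get_greeting() -> str:
        # hypothetical code under test that normally reads the metadata DB
        return f"Hello, {Variable.get('username')}!"


    @mock.patch("airflow.models.Variable.get", return_value="Alice")
    def test_get_greeting_without_db(mock_get):
        # Variable.get is mocked out, so no database access happens
        assert get_greeting() == "Hello, Alice!"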
| |
| |
| Special cases |
| ------------- |
| |
| There are some tricky test cases that require special handling. Here are some of them: |
| |
| |
| Parameterized tests stability |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
Parameterized tests require a stable order of parameters if they are run via xdist, because the parameterized
tests are distributed among multiple processes and handled separately. In some cases, the parameterized tests
have an undefined/random order (or the parameters are not hashable - for example, a set of enums). In such cases,
the xdist execution of the tests will fail and you will get an error mentioning "Known Limitations of xdist".
You can see details about the limitation `here <https://pytest-xdist.readthedocs.io/en/latest/known-limitations.html>`_.
| |
| The error in this case will look similar to: |
| |
| .. code-block:: |
| |
| Different tests were collected between gw0 and gw7. The difference is: |
| |
| |
The fix for that is to sort the parameters in ``parametrize``. For example, instead of this:
| |
| .. code-block:: python |
| |
| @pytest.mark.parametrize("status", ALL_STATES) |
| def test_method(): |
| ... |
| |
| |
do this:
| |
| |
| .. code-block:: python |
| |
| @pytest.mark.parametrize("status", sorted(ALL_STATES)) |
| def test_method(): |
| ... |
| |
Similarly, if your parameters are defined as the result of ``utcnow()`` or another dynamic method, you should
avoid that, or assign unique IDs to those parametrized tests. Instead of this:
| |
| .. code-block:: python |
| |
| @pytest.mark.parametrize( |
| "url, expected_dag_run_ids", |
| [ |
| ( |
| f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_gte=" |
| f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}", |
| [], |
| ), |
| ( |
| f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_lte=" |
| f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}", |
| ["TEST_DAG_RUN_ID_1", "TEST_DAG_RUN_ID_2"], |
| ), |
| ], |
| ) |
| def test_end_date_gte_lte(url, expected_dag_run_ids): |
| ... |
| |
| Do this: |
| |
| .. code-block:: python |
| |
| @pytest.mark.parametrize( |
| "url, expected_dag_run_ids", |
| [ |
| pytest.param( |
| f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_gte=" |
| f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}", |
| [], |
| id="end_date_gte", |
| ), |
| pytest.param( |
| f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_lte=" |
| f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}", |
| ["TEST_DAG_RUN_ID_1", "TEST_DAG_RUN_ID_2"], |
| id="end_date_lte", |
| ), |
| ], |
| ) |
| def test_end_date_gte_lte(url, expected_dag_run_ids): |
| ... |
| |
| |
| |
| Problems with Non-DB test collection |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
Sometimes, even if a whole module is marked with ``pytest.mark.db_test``, merely parsing the file and collecting
tests will fail when ``--skip-db-tests`` is used, because some of the imports or objects created in the
module will read the database.

Usually what helps is to move such initialization code inside the tests or into pytest fixtures (and pass the
objects needed by tests as fixtures rather than importing them from the module). Similarly, if you
use DB-bound objects (like Connection) in your ``parametrize`` specification, this will also fail pytest
collection. Move the creation of such objects inside the tests:
| |
Here is an example of moving object creation from the top level to inside tests. This code will break collection
of tests even if the module is marked as a DB test:
| |
| |
| .. code-block:: python |
| |
| pytestmark = pytest.mark.db_test |
| |
| TI = TaskInstance( |
| task=BashOperator(task_id="test", bash_command="true", dag=DAG(dag_id="id"), start_date=datetime.now()), |
| run_id="fake_run", |
| state=State.RUNNING, |
| ) |
| |
| |
| class TestCallbackRequest: |
| @pytest.mark.parametrize( |
| "input,request_class", |
| [ |
| (CallbackRequest(full_filepath="filepath", msg="task_failure"), CallbackRequest), |
| ( |
| TaskCallbackRequest( |
| full_filepath="filepath", |
| simple_task_instance=SimpleTaskInstance.from_ti(ti=TI), |
| processor_subdir="/test_dir", |
| is_failure_callback=True, |
| ), |
| TaskCallbackRequest, |
| ), |
| ( |
| DagCallbackRequest( |
| full_filepath="filepath", |
| dag_id="fake_dag", |
| run_id="fake_run", |
| processor_subdir="/test_dir", |
| is_failure_callback=False, |
| ), |
| DagCallbackRequest, |
| ), |
| ( |
| SlaCallbackRequest( |
| full_filepath="filepath", |
| dag_id="fake_dag", |
| processor_subdir="/test_dir", |
| ), |
| SlaCallbackRequest, |
| ), |
| ], |
| ) |
| def test_from_json(self, input, request_class): |
| ... |
| |
| |
Instead, the following will not break collection. The TaskInstance is not initialized when the module is parsed;
it will only be initialized when the test gets executed, because we moved its initialization from the
top level / ``parametrize`` to inside the test:
| |
| .. code-block:: python |
| |
| pytestmark = pytest.mark.db_test |
| |
| |
| class TestCallbackRequest: |
| @pytest.mark.parametrize( |
| "input,request_class", |
| [ |
| (CallbackRequest(full_filepath="filepath", msg="task_failure"), CallbackRequest), |
| ( |
| None, # to be generated when test is run |
| TaskCallbackRequest, |
| ), |
| ( |
| DagCallbackRequest( |
| full_filepath="filepath", |
| dag_id="fake_dag", |
| run_id="fake_run", |
| processor_subdir="/test_dir", |
| is_failure_callback=False, |
| ), |
| DagCallbackRequest, |
| ), |
| ( |
| SlaCallbackRequest( |
| full_filepath="filepath", |
| dag_id="fake_dag", |
| processor_subdir="/test_dir", |
| ), |
| SlaCallbackRequest, |
| ), |
| ], |
| ) |
| def test_from_json(self, input, request_class): |
| if input is None: |
| ti = TaskInstance( |
| task=BashOperator( |
| task_id="test", bash_command="true", dag=DAG(dag_id="id"), start_date=datetime.now() |
| ), |
| run_id="fake_run", |
| state=State.RUNNING, |
| ) |
| |
| input = TaskCallbackRequest( |
| full_filepath="filepath", |
| simple_task_instance=SimpleTaskInstance.from_ti(ti=ti), |
| processor_subdir="/test_dir", |
| is_failure_callback=True, |
| ) |
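
A fixture-based variant is also possible - a sketch following the same excerpt conventions as the examples
above, where the object is created only when a test requests the fixture, not at collection time:

.. code-block:: python

    pytestmark = pytest.mark.db_test


    @pytest.fixture
    def task_instance():
        # constructed lazily, when a test requests it - not during collection
        return TaskInstance(
            task=BashOperator(
                task_id="test", bash_command="true", dag=DAG(dag_id="id"), start_date=datetime.now()
            ),
            run_id="fake_run",
            state=State.RUNNING,
        )


    def test_from_json(task_instance):
        request = TaskCallbackRequest(
            full_filepath="filepath",
            simple_task_instance=SimpleTaskInstance.from_ti(ti=task_instance),
            processor_subdir="/test_dir",
            is_failure_callback=True,
        )
        ...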
| |
| |
Sometimes it is difficult to rewrite the tests, so you might add conditional handling and mock out some
database-bound methods or objects to avoid hitting the database during test collection. The code below
will hit the database while parsing the tests, because this is what ``Variable.setdefault`` does when the
``parametrize`` specification is being parsed - even if the test is marked as a DB test.
| |
| |
| .. code-block:: python |
| |
    import pytest

    from airflow.models.variable import Variable
| |
| pytestmark = pytest.mark.db_test |
| |
| initial_db_init() |
| |
| |
| @pytest.mark.parametrize( |
| "env, expected", |
| [ |
| pytest.param( |
| {"plain_key": "plain_value"}, |
| "{'plain_key': 'plain_value'}", |
| id="env-plain-key-val", |
| ), |
| pytest.param( |
| {"plain_key": Variable.setdefault("plain_var", "banana")}, |
| "{'plain_key': 'banana'}", |
| id="env-plain-key-plain-var", |
| ), |
| pytest.param( |
| {"plain_key": Variable.setdefault("secret_var", "monkey")}, |
| "{'plain_key': '***'}", |
| id="env-plain-key-sensitive-var", |
| ), |
| pytest.param( |
| {"plain_key": "{{ var.value.plain_var }}"}, |
| "{'plain_key': '{{ var.value.plain_var }}'}", |
| id="env-plain-key-plain-tpld-var", |
| ), |
| ], |
| ) |
| def test_rendered_task_detail_env_secret(patch_app, admin_client, request, env, expected): |
| ... |
| |
| |
| You can make the code conditional and mock out the Variable to avoid hitting the database. |
| |
| |
| .. code-block:: python |
| |
    import os
    from unittest import mock

    import pytest

    from airflow.models.variable import Variable
| |
| pytestmark = pytest.mark.db_test |
| |
| |
| if os.environ.get("_AIRFLOW_SKIP_DB_TESTS") == "true": |
| # Handle collection of the test by non-db case |
| Variable = mock.MagicMock() # type: ignore[misc] # noqa: F811 |
| else: |
| initial_db_init() |
| |
| |
| @pytest.mark.parametrize( |
| "env, expected", |
| [ |
| pytest.param( |
| {"plain_key": "plain_value"}, |
| "{'plain_key': 'plain_value'}", |
| id="env-plain-key-val", |
| ), |
| pytest.param( |
| {"plain_key": Variable.setdefault("plain_var", "banana")}, |
| "{'plain_key': 'banana'}", |
| id="env-plain-key-plain-var", |
| ), |
| pytest.param( |
| {"plain_key": Variable.setdefault("secret_var", "monkey")}, |
| "{'plain_key': '***'}", |
| id="env-plain-key-sensitive-var", |
| ), |
| pytest.param( |
| {"plain_key": "{{ var.value.plain_var }}"}, |
| "{'plain_key': '{{ var.value.plain_var }}'}", |
| id="env-plain-key-plain-tpld-var", |
| ), |
| ], |
| ) |
| def test_rendered_task_detail_env_secret(patch_app, admin_client, request, env, expected): |
| ... |
| |
| |
| |
Running Unit Tests
| ================== |
| |
| Running Unit Tests from PyCharm IDE |
| ----------------------------------- |
| |
| To run unit tests from the PyCharm IDE, create the `local virtualenv <LOCAL_VIRTUALENV.rst>`_, |
| select it as the default project's environment, then configure your test runner: |
| |
| .. image:: images/pycharm/configure_test_runner.png |
| :align: center |
| :alt: Configuring test runner |
| |
| and run unit tests as follows: |
| |
| .. image:: images/pycharm/running_unittests.png |
| :align: center |
| :alt: Running unit tests |
| |
| **NOTE:** You can run the unit tests in the standalone local virtualenv |
| (with no Breeze installed) if they do not have dependencies such as |
| Postgres/MySQL/Hadoop/etc. |
| |
| Running Unit Tests from PyCharm IDE using Breeze |
| ------------------------------------------------ |
| |
| Ideally, all unit tests should be run using the standardized Breeze environment. While not |
| as convenient as the one-click "play button" in PyCharm, the IDE can be configured to do |
| this in two clicks. |
| |
| 1. Add Breeze as an "External Tool": |
| |
| a. From the settings menu, navigate to Tools > External Tools |
| b. Click the little plus symbol to open the "Create Tool" popup and fill it out: |
| |
| .. image:: images/pycharm/pycharm_create_tool.png |
| :align: center |
| :alt: Installing Python extension |
| |
| 2. Add the tool to the context menu: |
| |
| a. From the settings menu, navigate to Appearance & Behavior > Menus & Toolbars > Project View Popup Menu |
b. Click on the list of entries where you would like it to be added. Right above or below "Project View Popup Menu Run Group" may be a good choice; you can drag and drop this list to rearrange the placement later as desired.
| c. Click the little plus at the top of the popup window |
| d. Find your "External Tool" in the new "Choose Actions to Add" popup and click OK. If you followed the image above, it will be at External Tools > External Tools > Breeze |
| |
**Note:** This only adds the option to that one menu. If you would like to add it to the context menu
when right-clicking on a tab at the top of the editor, for example, follow the steps above again
and place it in the "Editor Tab Popup Menu"
| |
| .. image:: images/pycharm/pycharm_add_to_context.png |
| :align: center |
| :alt: Installing Python extension |
| |
| 3. To run tests in Breeze, right click on the file or directory in the Project View and click Breeze. |
| |
| |
| Running Unit Tests from Visual Studio Code |
| ------------------------------------------ |
| |
| To run unit tests from the Visual Studio Code: |
| |
| 1. Using the ``Extensions`` view install Python extension, reload if required |
| |
| .. image:: images/vscode_install_python_extension.png |
| :align: center |
| :alt: Installing Python extension |
| |
| 2. Using the ``Testing`` view click on ``Configure Python Tests`` and select ``pytest`` framework |
| |
| .. image:: images/vscode_configure_python_tests.png |
| :align: center |
| :alt: Configuring Python tests |
| |
| .. image:: images/vscode_select_pytest_framework.png |
| :align: center |
| :alt: Selecting pytest framework |
| |
| 3. Open ``/.vscode/settings.json`` and add ``"python.testing.pytestArgs": ["tests"]`` to enable tests discovery |
| |
| .. image:: images/vscode_add_pytest_settings.png |
| :align: center |
| :alt: Enabling tests discovery |
| |
| 4. Now you are able to run and debug tests from both the ``Testing`` view and test files |
| |
| .. image:: images/vscode_run_tests.png |
| :align: center |
| :alt: Running tests |
| |
| Running Unit Tests in local virtualenv |
| -------------------------------------- |
| |
To run unit, integration, and system tests from Breeze or your local
virtualenv, you can use the `pytest <http://doc.pytest.org/en/latest/>`_ framework.

A custom ``pytest`` plugin runs ``airflow db init`` and ``airflow db reset`` the first
time you launch tests, so you can count on the database being initialized. Currently,
when you run tests that are not supported in the local virtualenv, **they may either fail
or provide an error message**.
| |
| There are many available options for selecting a specific test in ``pytest``. Details can be found |
| in the official documentation, but here are a few basic examples: |
| |
| .. code-block:: bash |
| |
| pytest tests/core -k "TestCore and not check" |
| |
This runs the ``TestCore`` class but skips tests in this class that include 'check' in their names.
For better performance (due to shorter test collection), run:
| |
| .. code-block:: bash |
| |
| pytest tests/core/test_core.py -k "TestCore and not bash" |
| |
The ``-k`` flag is also useful for running a single test like this:
| |
| .. code-block:: bash |
| |
| pytest tests/core/test_core.py -k "test_check_operators" |
| |
| This can also be done by specifying a full path to the test: |
| |
| .. code-block:: bash |
| |
| pytest tests/core/test_core.py::TestCore::test_check_operators |
| |
| To run the whole test class, enter: |
| |
| .. code-block:: bash |
| |
| pytest tests/core/test_core.py::TestCore |
| |
| You can use all available ``pytest`` flags. For example, to increase a log level |
| for debugging purposes, enter: |
| |
| .. code-block:: bash |
| |
| pytest --log-cli-level=DEBUG tests/core/test_core.py::TestCore |
| |
| |
| Running Tests using Breeze interactive shell |
| -------------------------------------------- |
| |
You can run tests interactively using regular pytest commands inside the Breeze shell. This has the
advantage that the Breeze container has all the dependencies installed that are needed to run the tests,
and it will ask you to rebuild the image if needed, when new dependencies should be installed.

By using the interactive shell and iterating over the tests, you can re-run tests one-by-one
or group-by-group right after you modify them.
| |
| Entering the shell is as easy as: |
| |
| .. code-block:: bash |
| |
| breeze |
| |
| This should drop you into the container. |
| |
| You can also use other switches (like ``--backend`` for example) to configure the environment for your |
| tests (and for example to switch to different database backend - see ``--help`` for more details). |
| |
| Once you enter the container, you might run regular pytest commands. For example: |
| |
| .. code-block:: bash |
| |
| pytest --log-cli-level=DEBUG tests/core/test_core.py::TestCore |
| |
| |
| Running Tests using Breeze from the Host |
| ---------------------------------------- |
| |
If you wish to only run tests and not drop into the shell, apply the
``tests`` command. You can add extra targets and pytest flags after the ``--`` separator. Note that
you often want to run the tests with a clean/reset db, so you usually want to add the ``--db-reset`` flag
to the breeze command. The Breeze image usually has all the dependencies needed, and it
will ask you to rebuild the image if needed, when new dependencies should be installed.
| |
| .. code-block:: bash |
| |
| breeze testing tests tests/providers/http/hooks/test_http.py tests/core/test_core.py --db-reset --log-cli-level=DEBUG |
| |
| You can run the whole test suite without adding the test target: |
| |
| .. code-block:: bash |
| |
| breeze testing tests --db-reset |
| |
| You can also specify individual tests or a group of tests: |
| |
| .. code-block:: bash |
| |
| breeze testing tests --db-reset tests/core/test_core.py::TestCore |
| |
You can also limit the tests to execute to a specific group of tests:
| |
| .. code-block:: bash |
| |
| breeze testing tests --test-type Core |
| |
In the case of Providers tests, you can run tests for all providers:
| |
| .. code-block:: bash |
| |
| breeze testing tests --test-type Providers |
| |
You can limit the set of providers you would like to run tests for:
| |
| .. code-block:: bash |
| |
| breeze testing tests --test-type "Providers[airbyte,http]" |
| |
You can also run all providers but exclude those you would like to skip:
| |
| .. code-block:: bash |
| |
| breeze testing tests --test-type "Providers[-amazon,google]" |
| |
| |
| Inspecting docker compose after test commands |
| --------------------------------------------- |
| |
Sometimes you need to inspect the docker compose setup after a test command completes,
for example when the test environment could not be set up properly due to
failed healthchecks. This can be achieved with the ``--skip-docker-compose-down``
flag:
| |
| .. code-block:: bash |
| |
breeze testing tests --skip-docker-compose-down
| |
| |
| Running full Airflow unit test suite in parallel |
| ------------------------------------------------ |
| |
If you run ``breeze testing tests --run-in-parallel``, tests run in parallel
on your development machine - maxing out the number of parallel runs at the number of cores you
have available in your Docker engine.
| |
In case you do not have enough memory available to your Docker engine (8 GB), the ``Integration``, ``Providers``,
and ``Core`` test types are executed sequentially, with the docker setup cleaned in between.
| |
This allows for a massive speedup in full test execution. On an 8-CPU machine with 16 cores, 64 GB of memory,
and a fast SSD disk, the whole suite of tests completes in about 5 minutes (!). The same suite of tests takes
more than 30 minutes on the same machine when tests are run sequentially.
| |
| .. note:: |
| |
On macOS, you might have fewer CPUs and less memory available to run the tests than you have in the host,
simply because your Docker engine runs in a Linux virtual machine under the hood. If you want to make
use of the parallelism and memory usage for the CI tests, you might want to increase the resources available
to your Docker engine. See the `Resources <https://docs.docker.com/docker-for-mac/#resources>`_ chapter
in the ``Docker for Mac`` documentation on how to do it.
| |
You can also limit the parallelism by specifying the maximum number of parallel jobs via the
``MAX_PARALLEL_TEST_JOBS`` variable. If you set it to "1", all the test types will be run sequentially.
| |
| .. code-block:: bash |
| |
| MAX_PARALLEL_TEST_JOBS="1" ./scripts/ci/testing/ci_run_airflow_testing.sh |
| |
| .. note:: |
| |
If you would like to clean up after the execution of such tests, you might have to clean up
some of the docker containers left running if you used Ctrl-C to stop the execution. You can easily do it by
running this command (it will kill all running docker containers, so do not use it if you want to keep some
docker containers running):
| |
| .. code-block:: bash |
| |
| docker kill $(docker ps -q) |
| |
| Running Backend-Specific Tests |
| ------------------------------ |
| |
| Tests that are using a specific backend are marked with a custom pytest marker ``pytest.mark.backend``. |
| The marker has a single parameter - the name of a backend. It corresponds to the ``--backend`` switch of |
| the Breeze environment (one of ``mysql``, ``sqlite``, or ``postgres``). Backend-specific tests only run when |
| the Breeze environment is running with the right backend. If you specify more than one backend |
| in the marker, the test runs for all specified backends. |
| |
| Example of the ``postgres`` only test: |
| |
| .. code-block:: python |
| |
| @pytest.mark.backend("postgres") |
| def test_copy_expert(self): |
| ... |
| |
| |
| Example of the ``postgres,mysql`` test (they are skipped with the ``sqlite`` backend): |
| |
| .. code-block:: python |
| |
| @pytest.mark.backend("postgres", "mysql") |
| def test_celery_executor(self): |
| ... |
| |
| |
| You can use the custom ``--backend`` switch in pytest to only run tests specific for that backend. |
| Here is an example of running only postgres-specific backend tests: |
| |
| .. code-block:: bash |
| |
| pytest --backend postgres |
| |
| Running Long-running tests |
| -------------------------- |
| |
Some of the tests run for a long time. Such tests are marked with the ``@pytest.mark.long_running`` annotation.
Those tests are skipped by default. You can enable them with the ``--include-long-running`` flag. You
can also decide to run only those tests by adding the ``-m long_running`` flag.
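
For example, marking a test as long-running (the test itself is hypothetical):

.. code-block:: python

    import pytest


    @pytest.mark.long_running
    def test_backfill_large_date_range():
        # hypothetical slow test - skipped unless --include-long-running is passed
        ...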
| |
| Running Quarantined tests |
| ------------------------- |
| |
Some of our tests are quarantined. This means that the test will be run in isolation and
re-run several times, and that a failure of quarantined tests will not fail the whole test suite. The
quarantined tests are usually flaky tests that need some attention and a fix.
| |
Those tests are marked with the ``@pytest.mark.quarantined`` annotation.
Those tests are skipped by default. You can enable them with the ``--include-quarantined`` flag. You
can also decide to run only those tests by adding the ``-m quarantined`` flag.
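
For example, marking a flaky test as quarantined (the test itself is hypothetical):

.. code-block:: python

    import pytest


    @pytest.mark.quarantined
    def test_occasionally_fails_on_busy_ci_workers():
        # hypothetical flaky test - skipped unless --include-quarantined is passed
        ...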
| |
| Running Tests with provider packages |
| ------------------------------------ |
| |
Airflow 2.0 introduced the concept of splitting the monolithic Airflow package into separate
provider packages. The main "apache-airflow" package contains the bare Airflow implementation,
and in addition we have 70+ providers that we can install to get integrations with
external services. Those providers live in the same monorepo as Airflow, but we build separate
packages for them, and the main "apache-airflow" package does not contain the providers.
| |
Most of the development in Breeze happens by iterating on sources, and when you run
your tests during development, you usually do not want to build packages and install them separately.
Therefore, by default, when you enter Breeze, Airflow and all providers are available directly from
sources rather than installed from packages. This matters, for example, for testing the "provider discovery"
mechanism that reads provider information from the package metadata.
| |
| When Airflow is run from sources, the metadata is read from provider.yaml |
| files, but when Airflow is installed from packages, it is read via the package entrypoint |
| ``apache_airflow_provider``. |
| |
| By default, all packages are prepared in wheel format. To install Airflow from packages you |
| need to run the following steps: |
| |
| 1. Prepare provider packages |
| |
| .. code-block:: bash |
| |
| breeze release-management prepare-provider-packages [PACKAGE ...] |
| |
If you run this command without specifying packages, you will prepare all packages. However, you can specify
the providers that you would like to build if you just want to build a few provider packages.
The packages are prepared in the ``dist`` folder. Note that this command cleans up the ``dist`` folder
before running, so you should run it before generating the ``apache-airflow`` package.
| |
| 2. Prepare airflow packages |
| |
| .. code-block:: bash |
| |
| breeze release-management prepare-airflow-package |
| |
This prepares the airflow .whl package in the ``dist`` folder.
| |
3. Enter breeze, installing both airflow and providers from the dist packages
| |
| .. code-block:: bash |
| |
| breeze --use-airflow-version wheel --use-packages-from-dist --skip-mounting-local-sources |
| |
| Airflow Docker Compose Tests |
| ============================ |
| |
| Running Docker Compose Tests with Breeze |
| ---------------------------------------- |
| |
| We also test in CI whether the Docker Compose that we expose in our documentation via |
| `Running Airflow in Docker <https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html>`_ |
| works as expected. Those tests are run in CI ("Test docker-compose quick start") |
| and you can run them locally as well. |
| |
| The way the tests work: |
| |
| 1. They first build the Airflow production image |
| 2. Then they take the Docker Compose file of ours and use the image to start it |
3. Then they perform some simple DAG trigger tests which check whether Airflow is up and can process
   an example DAG
| |
| This is done in a local environment, not in the Breeze CI image. It uses ``COMPOSE_PROJECT_NAME`` set to |
| ``quick-start`` to avoid conflicts with other docker compose deployments you might have. |
| |
| The complete test can be performed using Breeze. The prerequisite to that |
| is to have ``docker-compose`` (Docker Compose v1) or ``docker compose`` plugin (Docker Compose v2) |
| available on the path. |
| |
| Running complete test with breeze: |
| |
| .. code-block:: bash |
| |
| breeze prod-image build --python 3.8 |
| breeze testing docker-compose-tests |
| |
In case the test fails, it will dump the logs from the running containers to the console and
shut down the Docker Compose deployment. In case you want to debug the Docker Compose deployment
created for the test, you can pass the ``--skip-docker-compose-deletion`` flag to Breeze or
set the ``SKIP_DOCKER_COMPOSE_DELETION`` environment variable to "true", and the deployment
will not be deleted after the test.
| |
You can also specify the maximum timeout for the containers with the ``--wait-for-containers-timeout`` flag
(it can also be set via the ``WAIT_FOR_CONTAINERS_TIMEOUT`` environment variable).
You can add the ``-s`` option to the command to pass it to the underlying pytest command
and see the output of the test as it happens.
| |
The test can also be run manually with the ``pytest docker_tests/test_docker_compose_quick_start.py``
command, provided that you have a local airflow venv with the ``dev`` extra set and the
``DOCKER_IMAGE`` environment variable set to the image you want to test. The variable defaults
to ``ghcr.io/apache/airflow/main/prod/python3.8:latest``, which is built by default
when you run ``breeze prod-image build --python 3.8``. When running manually, the
``--skip-docker-compose-deletion`` and ``--wait-for-containers-timeout`` switches can only be set via
their environment variables.
| |
| If you want to debug the deployment using ``docker compose`` commands after ``SKIP_DOCKER_COMPOSE_DELETION`` |
| was used, you should set ``COMPOSE_PROJECT_NAME`` to ``quick-start`` because this is what the test uses: |
| |
| .. code-block:: bash |
| |
| export COMPOSE_PROJECT_NAME=quick-start |
| |
You can also add ``--project-name quick-start`` to the ``docker compose`` commands you run.
When the test is re-run, it will automatically stop the previous deployment and start a new one.
| |
| Running Docker Compose deployment manually |
| ------------------------------------------ |
| |
You can also run the docker-compose deployment manually (independently of the pytest tests), with the image
you built using the prod image build command above:
| |
| .. code-block:: bash |
| |
| export AIRFLOW_IMAGE_NAME=ghcr.io/apache/airflow/main/prod/python3.8:latest |
| |
| and follow the instructions in the |
| `Running Airflow in Docker <https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html>`_ |
| but make sure to use the docker-compose file from the sources in |
| ``docs/apache-airflow/stable/howto/docker-compose/`` folder. |
| |
| Then, the usual ``docker compose`` and ``docker`` commands can be used to debug such running instances. |
The test performs a simple API call to trigger a DAG and waits for it, but you can follow our
| documentation to connect to such running docker compose instances and test it manually. |
| |
| Airflow Integration Tests |
| ========================= |
| |
Some of the tests in Airflow are integration tests. These tests require the ``airflow`` Docker
image and extra images with integrations (such as ``celery``, ``mongodb``, etc.).
| The integration tests are all stored in the ``tests/integration`` folder. |
| |
| Enabling Integrations |
| --------------------- |
| |
| Airflow integration tests cannot be run in the local virtualenv. They can only run in the Breeze |
| environment with enabled integrations and in the CI. See `CI <CI.rst>`_ for details about Airflow CI. |
| |
| When you are in the Breeze environment, by default, all integrations are disabled. This enables only true unit tests |
| to be executed in Breeze. You can enable the integration by passing the ``--integration <INTEGRATION>`` |
| switch when starting Breeze. You can specify multiple integrations by repeating the ``--integration`` switch |
| or using the ``--integration all-testable`` switch that enables all testable integrations and |
| ``--integration all`` switch that enables all integrations. |
| |
NOTE: Every integration requires a separate container with the corresponding integration image.
These containers take precious resources on your PC, mainly memory. The started integrations are not stopped
until you stop the Breeze environment with the ``stop`` command, and they are started again with the ``start`` command.
| |
| The following integrations are available: |
| |
| .. list-table:: Airflow Test Integrations |
| :widths: 15 80 |
| :header-rows: 1 |
| |
| * - Integration |
| - Description |
| * - cassandra |
| - Integration required for Cassandra hooks |
| * - kerberos |
| - Integration that provides Kerberos authentication |
| * - mongo |
| - Integration required for MongoDB hooks |
| * - pinot |
| - Integration required for Apache Pinot hooks |
| * - celery |
| - Integration required for Celery executor tests |
| * - trino |
| - Integration required for Trino hooks |
| |
| To start the ``mongo`` integration only, enter: |
| |
| .. code-block:: bash |
| |
| breeze --integration mongo |
| |
| To start ``mongo`` and ``cassandra`` integrations, enter: |
| |
| .. code-block:: bash |
| |
| breeze --integration mongo --integration cassandra |
| |
| To start all testable integrations, enter: |
| |
| .. code-block:: bash |
| |
| breeze --integration all-testable |
| |
| To start all integrations, enter: |
| |
| .. code-block:: bash |
| |
breeze --integration all
| |
| Note that Kerberos is a special kind of integration. Some tests run differently when |
| Kerberos integration is enabled (they retrieve and use a Kerberos authentication token) and differently when the |
| Kerberos integration is disabled (they neither retrieve nor use the token). Therefore, one of the test jobs |
| for the CI system should run all tests with the Kerberos integration enabled to test both scenarios. |
| |
| Running Integration Tests |
| ------------------------- |
| |
| All tests using an integration are marked with a custom pytest marker ``pytest.mark.integration``. |
The marker has a single parameter - the name of the integration.
| |
| Example of the ``celery`` integration test: |
| |
| .. code-block:: python |
| |
| @pytest.mark.integration("celery") |
| def test_real_ping(self): |
| hook = RedisHook(redis_conn_id="redis_default") |
| redis = hook.get_conn() |
| |
| assert redis.ping(), "Connection to Redis with PING works." |
| |
| The markers can be specified at the test level or the class level (then all tests in this class |
| require an integration). You can add multiple markers with different integrations for tests that |
| require more than one integration. |
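
For example, a hypothetical test requiring two integrations at once:

.. code-block:: python

    import pytest


    @pytest.mark.integration("celery")
    @pytest.mark.integration("mongo")
    def test_sync_celery_results_to_mongo():
        # hypothetical test - runs only when both integrations are enabled
        ...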
| |
| If such a marked test does not have a required integration enabled, it is skipped. |
| The skip message clearly says what is needed to use the test. |
| |
| To run all tests with a certain integration, use the custom pytest flag ``--integration``. |
| You can pass several integration flags if you want to enable several integrations at once. |
| |
| **NOTE:** If an integration is not enabled in Breeze or CI, |
| the affected test will be skipped. |
| |
| To run only ``mongo`` integration tests: |
| |
| .. code-block:: bash |
| |
| pytest --integration mongo tests/integration |
| |
| To run integration tests for ``mongo`` and ``celery``: |
| |
| .. code-block:: bash |
| |
| pytest --integration mongo --integration celery tests/integration |
| |
| |
| Here is an example of the collection limited to the ``providers/apache`` sub-directory: |
| |
| .. code-block:: bash |
| |
pytest --integration cassandra tests/integration/providers/apache
| |
| Running Integration Tests from the Host |
| --------------------------------------- |
| |
| You can also run integration tests using Breeze from the host. |
| |
| Runs all integration tests: |
| |
| .. code-block:: bash |
| |
| breeze testing integration-tests --db-reset --integration all-testable |
| |
Runs all ``mongo`` integration tests:
| |
| .. code-block:: bash |
| |
| breeze testing integration-tests --db-reset --integration mongo |
| |
| Helm Unit Tests |
| =============== |
| |
On the Airflow project, we have decided to stick with pythonic testing for our Helm chart. This makes our chart
easier to test, easier to modify, and able to run with the same testing infrastructure. To add Helm unit tests,
add them in ``helm_tests``.
| |
| .. code-block:: python |
| |
| class TestBaseChartTest: |
| ... |
| |
To render the chart, create a YAML string with the nested dictionary of options you wish to test. You can then
| use our ``render_chart`` function to render the object of interest into a testable Python dictionary. Once the chart |
| has been rendered, you can use the ``render_k8s_object`` function to create a k8s model object. It simultaneously |
| ensures that the object created properly conforms to the expected resource spec and allows you to use object values |
| instead of nested dictionaries. |
| |
| Example test here: |
| |
| .. code-block:: python |
| |
| from tests.charts.common.helm_template_generator import render_chart, render_k8s_object |
| |
| git_sync_basic = """ |
| dags: |
| gitSync: |
| enabled: true |
| """ |
| |
| |
| class TestGitSyncScheduler: |
| def test_basic(self): |
| helm_settings = yaml.safe_load(git_sync_basic) |
| res = render_chart( |
| "GIT-SYNC", |
| helm_settings, |
| show_only=["templates/scheduler/scheduler-deployment.yaml"], |
| ) |
| dep: k8s.V1Deployment = render_k8s_object(res[0], k8s.V1Deployment) |
| assert "dags" == dep.spec.template.spec.volumes[1].name |
| |
| |
To execute all Helm tests using the breeze command and utilize parallel pytest runs, you can run the
following command (but it takes quite a long time even on a multi-processor machine):
| |
| .. code-block:: bash |
| |
| breeze testing helm-tests |
| |
You can also execute tests from a selected package only. Tests in ``helm_tests`` are grouped by packages,
so rather than running all tests, you can run only tests from a selected package. For example:
| |
| .. code-block:: bash |
| |
| breeze testing helm-tests --helm-test-package basic |
| |
will run all tests from the ``helm_tests/basic`` package.
| |
| |
You can also run Helm tests individually via the usual ``breeze`` command. Just enter breeze and run the
tests with pytest as you would do with regular unit tests (you can add the ``-n auto`` option to run the Helm
tests in parallel - unlike most of our regular unit tests, which require a database, the Helm tests are
perfectly safe to run in parallel, and if you have multiple processors, you can gain significant
speedups when using parallel runs):
| |
| .. code-block:: bash |
| |
| breeze |
| |
| This enters breeze container. |
| |
| .. code-block:: bash |
| |
| pytest helm_tests -n auto |
| |
| This runs all chart tests using all processors you have available. |
| |
| .. code-block:: bash |
| |
| pytest helm_tests/test_airflow_common.py -n auto |
| |
This will run all tests from the ``test_airflow_common.py`` file using all processors you have available.
| |
| .. code-block:: bash |
| |
| pytest helm_tests/test_airflow_common.py |
| |
This will run all tests from the ``test_airflow_common.py`` file sequentially.
| |
| |
| Kubernetes tests |
| ================ |
| |
Airflow has tests that are run against a real Kubernetes cluster. We are using
`Kind <https://kind.sigs.k8s.io/>`_ to create and run the cluster. We integrated the tools to
start/stop/deploy and run the cluster tests into our repository and the Breeze development environment.
| |
| KinD has a really nice ``kind`` tool that you can use to interact with the cluster. Run ``kind --help`` to |
| learn more. |
| |
| K8S test environment |
| ------------------------ |
| |
| Before running ``breeze k8s`` cluster commands you need to set up the environment. This is done |
| by the ``breeze k8s setup-env`` command. Breeze makes sure to download the tools that |
| are needed to run k8s tests (Helm, Kind, and Kubectl, in the right versions) and sets up a |
| Python virtualenv that is needed to run the tests. All those tools and the environment are set up in |
| the ``.build/.k8s-env`` folder. You can activate this environment yourself as usual by sourcing the |
| ``bin/activate`` script, but since we support multiple clusters in the same installation, |
| it is best if you use ``breeze k8s shell`` with the right parameters specifying which cluster |
| to use. |
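| |
| For example, a minimal manual activation of that environment (run from the root of your Airflow sources) is: |
| |
| .. code-block:: bash |
| |
| source .build/.k8s-env/bin/activate |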
| |
| Multiple cluster support |
| ------------------------ |
| |
| The main feature of the ``breeze k8s`` command group is that it allows you to manage multiple KinD clusters - one |
| per combination of Python and Kubernetes version. This is used during CI, where we can run the same |
| tests against those different clusters - even in parallel. |
| |
| The cluster name follows the pattern ``airflow-python-X.Y-vA.B.C``, where X.Y is a major/minor Python version |
| and A.B.C is a Kubernetes version. Example cluster name: ``airflow-python-3.8-v1.24.0``. |
| |
| Most of the commands can be executed in parallel for multiple images/clusters by adding the ``--run-in-parallel`` |
| flag when creating clusters or deploying Airflow. Similarly, checking the status, dumping logs, and deleting clusters |
| can be run with the ``--all`` flag, in which case they are executed sequentially for all locally created clusters. |
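| |
| For example (a sketch based on the flags described above; which clusters are selected depends on your local |
| configuration): |
| |
| .. code-block:: bash |
| |
| # create clusters for several Python/Kubernetes combinations in parallel |
| breeze k8s create-cluster --run-in-parallel |
| |
| # then check status and delete all locally created clusters (sequentially) |
| breeze k8s status --all |
| breeze k8s delete-cluster --all |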
| |
| Per-cluster configuration files |
| ------------------------------- |
| |
| Once you start a cluster, its configuration is stored in a dynamically created folder - a separate |
| folder for each python/kubernetes_version combination. The folder is ``.build/.k8s-clusters/<CLUSTER_NAME>``. |
| |
| There are two files there: |
| |
| * the kubectl config file, stored in the ``.kubeconfig`` file - our scripts set the ``KUBECONFIG`` variable to it |
| * the KinD cluster configuration, stored in the ``.kindconfig.yaml`` file - our scripts set the ``KINDCONFIG`` variable to it |
| |
| The ``KUBECONFIG`` file is automatically used when you enter any of the ``breeze k8s`` commands that use |
| ``kubectl``, or when you run ``kubectl`` in the k8s shell. The ``KINDCONFIG`` file is used when the cluster is |
| started, but you (and the ``k8s`` commands) can also inspect it to learn, for example, which port is forwarded to the |
| webserver running in the cluster. |
| |
| The files are deleted by the ``breeze k8s delete-cluster`` command. |
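| |
| For example, a quick way to find the forwarded webserver port without entering the shell (using the example |
| cluster name from above): |
| |
| .. code-block:: bash |
| |
| grep -A 4 extraPortMappings .build/.k8s-clusters/airflow-python-3.8-v1.24.0/.kindconfig.yaml |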
| |
| Managing Kubernetes Cluster |
| --------------------------- |
| |
| For your testing, you manage the KinD cluster with the ``k8s`` breeze command group. The available |
| commands are: |
| |
| .. image:: ./images/breeze/output_k8s.svg |
| :width: 100% |
| :alt: Breeze k8s |
| |
| The command group allows you to set up the environment and to start, stop, recreate, and check the status of the |
| KinD Kubernetes cluster, as well as configure it (via the ``create-cluster`` and ``configure-cluster`` commands). |
| Those commands can be run for all/selected clusters, and they can be executed in parallel with the ``--run-in-parallel`` flag. |
| |
| In order to deploy Airflow, the PROD image of Airflow needs to be extended, with example DAGs and pod |
| template files added to the image. This is done via the ``build-k8s-image`` and ``upload-k8s-image`` commands. |
| This can also be done for all/selected images/clusters in parallel via the ``--run-in-parallel`` flag. |
| |
| Then Airflow (using the Helm Chart) can be deployed to the cluster via the ``deploy-airflow`` command. |
| This can also be done for all/selected images/clusters in parallel via the ``--run-in-parallel`` flag. You can |
| pass extra options when deploying Airflow to configure your deployment. |
| |
| You can check the status, dump logs, and finally delete a cluster via the ``status``, ``logs``, and ``delete-cluster`` |
| commands. This can also be done for all created clusters at once via the ``--all`` flag. |
| |
| You can interact with the cluster via the ``shell`` and ``k9s`` commands. |
| |
| You can run the set of k8s tests via the ``tests`` command. You can also run the tests in parallel on all/selected |
| clusters with the ``--run-in-parallel`` flag. |
| |
| |
| Running tests with Kubernetes Cluster |
| ------------------------------------- |
| |
| You can either run all tests or select which tests to run. You can also enter an interactive virtualenv |
| to run the tests manually one by one. |
| |
| |
| Running Kubernetes tests via breeze: |
| |
| .. code-block:: bash |
| |
| breeze k8s tests |
| breeze k8s tests TEST TEST [TEST ...] |
| |
| Optionally add ``--executor``: |
| |
| .. code-block:: bash |
| |
| breeze k8s tests --executor CeleryExecutor |
| breeze k8s tests --executor CeleryExecutor TEST TEST [TEST ...] |
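| |
| The ``TEST`` arguments are passed through to ``pytest``, so - assuming the test module shown later in this |
| document - a concrete (hedged) invocation could look like: |
| |
| .. code-block:: bash |
| |
| breeze k8s tests test_kubernetes_executor.py |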
| |
| Entering shell with Kubernetes Cluster |
| -------------------------------------- |
| |
| This shell is prepared for running Kubernetes tests interactively. It has the ``kubectl`` and ``kind`` CLI tools |
| available in the path, and it has the virtualenv environment activated, which allows you to run tests via pytest. |
| |
| The virtualenv is available in ``.build/.k8s-env/``, and |
| the binaries are available in the ``.build/.k8s-env/bin`` path. |
| |
| .. code-block:: bash |
| |
| breeze k8s shell |
| |
| Optionally add ``--executor``: |
| |
| .. code-block:: bash |
| |
| breeze k8s shell --executor CeleryExecutor |
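| |
| Inside the shell, ``KUBECONFIG`` already points at the right cluster (see the per-cluster configuration files |
| above), so you can inspect the deployed components directly - for example (namespace as created in the |
| deployment steps below): |
| |
| .. code-block:: bash |
| |
| kubectl get pods --namespace airflow |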
| |
| |
| K9s CLI - debug Kubernetes in style! |
| ------------------------------------ |
| |
| Breeze has built-in integration with the fantastic k9s CLI tool, which allows you to debug the Kubernetes |
| installation effortlessly and in style. K9s provides a terminal (but windowed) CLI that helps you to: |
| |
| - easily observe what's going on in the Kubernetes cluster |
| - observe the resources defined (pods, secrets, custom resource definitions) |
| - enter a shell for the running Pods/Containers, |
| - see the log files and more. |
| |
| You can read more about k9s at `https://k9scli.io/ <https://k9scli.io/>`_. |
| |
| Here is a screenshot of the k9s tool in operation: |
| |
| .. image:: images/testing/k9s.png |
| :align: center |
| :alt: K9S tool |
| |
| |
| You can enter the k9s tool via breeze (after you have deployed Airflow): |
| |
| .. code-block:: bash |
| |
| breeze k8s k9s |
| |
| You can exit k9s by pressing Ctrl-C. |
| |
| Typical testing pattern for Kubernetes tests |
| -------------------------------------------- |
| |
| The typical session for tests with Kubernetes looks as follows: |
| |
| |
| 1. Prepare the environment: |
| |
| .. code-block:: bash |
| |
| breeze k8s setup-env |
| |
| The first time you run it, it should result in creating the virtualenv and installing good versions |
| of kind, kubectl, and helm. All of them are installed in ``.build/.k8s-env`` (binaries are available in its |
| ``bin`` sub-folder). |
| |
| .. code-block:: text |
| |
| Initializing K8S virtualenv in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env |
| Reinstalling PIP version in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env |
| Installing necessary packages in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env |
| The ``kind`` tool is not downloaded yet. Downloading 0.14.0 version. |
| Downloading from: https://github.com/kubernetes-sigs/kind/releases/download/v0.14.0/kind-darwin-arm64 |
| The ``kubectl`` tool is not downloaded yet. Downloading 1.24.3 version. |
| Downloading from: https://storage.googleapis.com/kubernetes-release/release/v1.24.3/bin/darwin/arm64/kubectl |
| The ``helm`` tool is not downloaded yet. Downloading 3.9.2 version. |
| Downloading from: https://get.helm.sh/helm-v3.9.2-darwin-arm64.tar.gz |
| Extracting the darwin-arm64/helm to /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin |
| Moving the helm to /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin/helm |
| |
| |
| This prepares the virtual environment for tests and downloads the right versions of the tools |
| to ``.build/.k8s-env``. |
| |
| 2. Create the KinD cluster: |
| |
| .. code-block:: bash |
| |
| breeze k8s create-cluster |
| |
| This should result in KinD creating the K8S cluster: |
| |
| .. code-block:: text |
| |
| Config created in /Users/jarek/IdeaProjects/airflow/.build/.k8s-clusters/airflow-python-3.8-v1.24.2/.kindconfig.yaml: |
| |
| # Licensed to the Apache Software Foundation (ASF) under one |
| # or more contributor license agreements. See the NOTICE file |
| # distributed with this work for additional information |
| # regarding copyright ownership. The ASF licenses this file |
| # to you under the Apache License, Version 2.0 (the |
| # "License"); you may not use this file except in compliance |
| # with the License. You may obtain a copy of the License at |
| # |
| # http://www.apache.org/licenses/LICENSE-2.0 |
| # |
| # Unless required by applicable law or agreed to in writing, |
| # software distributed under the License is distributed on an |
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| # KIND, either express or implied. See the License for the |
| # specific language governing permissions and limitations |
| # under the License. |
| --- |
| kind: Cluster |
| apiVersion: kind.x-k8s.io/v1alpha4 |
| networking: |
| ipFamily: ipv4 |
| apiServerAddress: "127.0.0.1" |
| apiServerPort: 48366 |
| nodes: |
| - role: control-plane |
| - role: worker |
| extraPortMappings: |
| - containerPort: 30007 |
| hostPort: 18150 |
| listenAddress: "127.0.0.1" |
| protocol: TCP |
| |
| |
| |
| Creating cluster "airflow-python-3.8-v1.24.2" ... |
| ✓ Ensuring node image (kindest/node:v1.24.2) 🖼 |
| ✓ Preparing nodes 📦 📦 |
| ✓ Writing configuration 📜 |
| ✓ Starting control-plane 🕹️ |
| ✓ Installing CNI 🔌 |
| ✓ Installing StorageClass 💾 |
| ✓ Joining worker nodes 🚜 |
| Set kubectl context to "kind-airflow-python-3.8-v1.24.2" |
| You can now use your cluster with: |
| |
| kubectl cluster-info --context kind-airflow-python-3.8-v1.24.2 |
| |
| Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/ |
| |
| KinD Cluster API server URL: http://localhost:48366 |
| Connecting to localhost:18150. Num try: 1 |
| Error when connecting to localhost:18150 : ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) |
| |
| Airflow webserver is not available at port 18150. Run `breeze k8s deploy-airflow --python 3.8 --kubernetes-version v1.24.2` to (re)deploy airflow |
| |
| KinD cluster airflow-python-3.8-v1.24.2 created! |
| |
| NEXT STEP: You might now configure your cluster by: |
| |
| breeze k8s configure-cluster |
| |
| 3. Configure the cluster for Airflow - this will recreate the namespaces and upload the test resources for Airflow: |
| |
| .. code-block:: bash |
| |
| breeze k8s configure-cluster |
| |
| .. code-block:: text |
| |
| Configuring airflow-python-3.8-v1.24.2 to be ready for Airflow deployment |
| Deleting K8S namespaces for kind-airflow-python-3.8-v1.24.2 |
| Error from server (NotFound): namespaces "airflow" not found |
| Error from server (NotFound): namespaces "test-namespace" not found |
| Creating namespaces |
| namespace/airflow created |
| namespace/test-namespace created |
| Created K8S namespaces for cluster kind-airflow-python-3.8-v1.24.2 |
| |
| Deploying test resources for cluster kind-airflow-python-3.8-v1.24.2 |
| persistentvolume/test-volume created |
| persistentvolumeclaim/test-volume created |
| service/airflow-webserver-node-port created |
| Deployed test resources for cluster kind-airflow-python-3.8-v1.24.2 |
| |
| |
| NEXT STEP: You might now build your k8s image by: |
| |
| breeze k8s build-k8s-image |
| |
| 4. Check the status of the cluster: |
| |
| .. code-block:: bash |
| |
| breeze k8s status |
| |
| This should show the status of the current KinD cluster: |
| |
| .. code-block:: text |
| |
| ======================================================================================================================== |
| Cluster: airflow-python-3.8-v1.24.2 |
| |
| * KUBECONFIG=/Users/jarek/IdeaProjects/airflow/.build/.k8s-clusters/airflow-python-3.8-v1.24.2/.kubeconfig |
| * KINDCONFIG=/Users/jarek/IdeaProjects/airflow/.build/.k8s-clusters/airflow-python-3.8-v1.24.2/.kindconfig.yaml |
| |
| Cluster info: airflow-python-3.8-v1.24.2 |
| |
| Kubernetes control plane is running at https://127.0.0.1:48366 |
| CoreDNS is running at https://127.0.0.1:48366/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy |
| |
| To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'. |
| |
| Storage class for airflow-python-3.8-v1.24.2 |
| |
| NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE |
| standard (default) rancher.io/local-path Delete WaitForFirstConsumer false 83s |
| |
| Running pods for airflow-python-3.8-v1.24.2 |
| |
| NAME READY STATUS RESTARTS AGE |
| coredns-6d4b75cb6d-rwp9d 1/1 Running 0 71s |
| coredns-6d4b75cb6d-vqnrc 1/1 Running 0 71s |
| etcd-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s |
| kindnet-ckc8l 1/1 Running 0 69s |
| kindnet-qqt8k 1/1 Running 0 71s |
| kube-apiserver-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s |
| kube-controller-manager-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s |
| kube-proxy-6g7hn 1/1 Running 0 69s |
| kube-proxy-dwfvp 1/1 Running 0 71s |
| kube-scheduler-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s |
| |
| KinD Cluster API server URL: http://localhost:48366 |
| Connecting to localhost:18150. Num try: 1 |
| Error when connecting to localhost:18150 : ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) |
| |
| Airflow webserver is not available at port 18150. Run `breeze k8s deploy-airflow --python 3.8 --kubernetes-version v1.24.2` to (re)deploy airflow |
| |
| |
| Cluster healthy: airflow-python-3.8-v1.24.2 |
| |
| 5. Build the image based on the PROD Airflow image. You need to build the PROD image first (the command will |
| guide you if you have not - either by running the build separately or by passing the ``--rebuild-base-image`` flag): |
| |
| .. code-block:: bash |
| |
| breeze k8s build-k8s-image |
| |
| .. code-block:: text |
| |
| Building the K8S image for Python 3.8 using airflow base image: ghcr.io/apache/airflow/main/prod/python3.8:latest |
| |
| [+] Building 0.1s (8/8) FINISHED |
| => [internal] load build definition from Dockerfile 0.0s |
| => => transferring dockerfile: 301B 0.0s |
| => [internal] load .dockerignore 0.0s |
| => => transferring context: 35B 0.0s |
| => [internal] load metadata for ghcr.io/apache/airflow/main/prod/python3.8:latest 0.0s |
| => [1/3] FROM ghcr.io/apache/airflow/main/prod/python3.8:latest 0.0s |
| => [internal] load build context 0.0s |
| => => transferring context: 3.00kB 0.0s |
| => CACHED [2/3] COPY airflow/example_dags/ /opt/airflow/dags/ 0.0s |
| => CACHED [3/3] COPY airflow/kubernetes_executor_templates/ /opt/airflow/pod_templates/ 0.0s |
| => exporting to image 0.0s |
| => => exporting layers 0.0s |
| => => writing image sha256:c0bdd363c549c3b0731b8e8ce34153d081f239ee2b582355b7b3ffd5394c40bb 0.0s |
| => => naming to ghcr.io/apache/airflow/main/prod/python3.8-kubernetes:latest |
| |
| NEXT STEP: You might now upload your k8s image by: |
| |
| breeze k8s upload-k8s-image |
| |
| |
| 6. Upload the image to the KinD cluster - this uploads your image to make it available to the KinD cluster. |
| |
| .. code-block:: bash |
| |
| breeze k8s upload-k8s-image |
| |
| .. code-block:: text |
| |
| K8S Virtualenv is initialized in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env |
| Good version of kind installed: 0.14.0 in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin |
| Good version of kubectl installed: 1.25.0 in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin |
| Good version of helm installed: 3.9.2 in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin |
| Stable repo is already added |
| Uploading Airflow image ghcr.io/apache/airflow/main/prod/python3.8-kubernetes to cluster airflow-python-3.8-v1.24.2 |
| Image: "ghcr.io/apache/airflow/main/prod/python3.8-kubernetes" with ID "sha256:fb6195f7c2c2ad97788a563a3fe9420bf3576c85575378d642cd7985aff97412" not yet present on node "airflow-python-3.8-v1.24.2-worker", loading... |
| Image: "ghcr.io/apache/airflow/main/prod/python3.8-kubernetes" with ID "sha256:fb6195f7c2c2ad97788a563a3fe9420bf3576c85575378d642cd7985aff97412" not yet present on node "airflow-python-3.8-v1.24.2-control-plane", loading... |
| |
| NEXT STEP: You might now deploy airflow by: |
| |
| breeze k8s deploy-airflow |
| |
| |
| 7. Deploy Airflow to the cluster - this will use the Airflow Helm Chart to deploy Airflow to the cluster. |
| |
| .. code-block:: bash |
| |
| breeze k8s deploy-airflow |
| |
| .. code-block:: text |
| |
| Deploying Airflow for cluster airflow-python-3.8-v1.24.2 |
| Deploying kind-airflow-python-3.8-v1.24.2 with airflow Helm Chart. |
| Copied chart sources to /private/var/folders/v3/gvj4_mw152q556w2rrh7m46w0000gn/T/chart_edu__kir/chart |
| Deploying Airflow from /private/var/folders/v3/gvj4_mw152q556w2rrh7m46w0000gn/T/chart_edu__kir/chart |
| NAME: airflow |
| LAST DEPLOYED: Tue Aug 30 22:57:54 2022 |
| NAMESPACE: airflow |
| STATUS: deployed |
| REVISION: 1 |
| TEST SUITE: None |
| NOTES: |
| Thank you for installing Apache Airflow 2.3.4! |
| |
| Your release is named airflow. |
| You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser: |
| |
| Airflow Webserver: kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow |
| Default Webserver (Airflow UI) Login credentials: |
| username: admin |
| password: admin |
| Default Postgres connection credentials: |
| username: postgres |
| password: postgres |
| port: 5432 |
| |
| You can get Fernet Key value by running the following: |
| |
| echo Fernet Key: $(kubectl get secret --namespace airflow airflow-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode) |
| |
| WARNING: |
| Kubernetes workers task logs may not persist unless you configure log persistence or remote logging! |
| Logging options can be found at: https://airflow.apache.org/docs/helm-chart/stable/manage-logs.html |
| (This warning can be ignored if logging is configured with environment variables or secrets backend) |
| |
| ########################################################### |
| # WARNING: You should set a static webserver secret key # |
| ########################################################### |
| |
| You are using a dynamically generated webserver secret key, which can lead to |
| unnecessary restarts of your Airflow components. |
| |
| Information on how to set a static webserver secret key can be found here: |
| https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#webserver-secret-key |
| Deployed kind-airflow-python-3.8-v1.24.2 with airflow Helm Chart. |
| |
| Airflow for Python 3.8 and K8S version v1.24.2 has been successfully deployed. |
| |
| The KinD cluster name: airflow-python-3.8-v1.24.2 |
| The kubectl cluster name: kind-airflow-python-3.8-v1.24.2. |
| |
| |
| KinD Cluster API server URL: http://localhost:48366 |
| Connecting to localhost:18150. Num try: 1 |
| Established connection to webserver at http://localhost:18150/health and it is healthy. |
| Airflow Web server URL: http://localhost:18150 (admin/admin) |
| |
| NEXT STEP: You might now run tests or interact with airflow via shell (kubectl, pytest etc.) or k9s commands: |
| |
| |
| breeze k8s tests |
| |
| breeze k8s shell |
| |
| breeze k8s k9s |
| |
| |
| 8. Run Kubernetes tests |
| |
| Note that the tests are executed in the production container, not in the CI container. |
| There is no need for the tests to run inside the Airflow CI container image, as they only |
| communicate with the Kubernetes-run Airflow deployed via the production image. |
| Those Kubernetes tests require a virtualenv to be created locally with Airflow installed. |
| The required virtualenv is created automatically when the scripts are run. |
| |
| 8a) You can run all the tests: |
| |
| .. code-block:: bash |
| |
| breeze k8s tests |
| |
| .. code-block:: text |
| |
| Running tests with kind-airflow-python-3.8-v1.24.2 cluster. |
| Command to run: pytest kubernetes_tests |
| ========================================================================================= test session starts ========================================================================================== |
| platform darwin -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin/python |
| cachedir: .pytest_cache |
| rootdir: /Users/jarek/IdeaProjects/airflow/kubernetes_tests |
| plugins: anyio-3.6.1, instafail-0.4.2, xdist-2.5.0, forked-1.4.0, timeouts-1.2.1, cov-3.0.0 |
| setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s |
| collected 55 items |
| |
| test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag PASSED [ 1%] |
| test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag_with_scheduler_failure PASSED [ 3%] |
| test_kubernetes_pod_operator.py::TestKubernetesPodOperatorSystem::test_already_checked_on_failure PASSED [ 5%] |
| test_kubernetes_pod_operator.py::TestKubernetesPodOperatorSystem::test_already_checked_on_success ... |
| |
| 8b) You can enter an interactive shell to run tests one-by-one |
| |
| This enters the virtualenv in the ``.build/.k8s-env`` folder: |
| |
| .. code-block:: bash |
| |
| breeze k8s shell |
| |
| Once you enter the environment, you receive this information: |
| |
| .. code-block:: text |
| |
| Entering interactive k8s shell. |
| |
| (kind-airflow-python-3.8-v1.24.2:KubernetesExecutor)> |
| |
| In a separate terminal you can open the k9s CLI: |
| |
| .. code-block:: bash |
| |
| breeze k8s k9s |
| |
| Use it to observe what's going on in your cluster. |
| |
| 9. Debugging in IntelliJ/PyCharm |
| |
| It is very easy to run/debug Kubernetes tests with IntelliJ/PyCharm. Unlike the regular tests, they are |
| in the ``kubernetes_tests`` folder, and if you followed the previous steps and entered the shell using the |
| ``breeze k8s shell`` command, you can set up your IDE very easily to run (and debug) your |
| tests using the standard IntelliJ Run/Debug feature. You just need a few steps: |
| |
| 9a) Add the virtualenv as interpreter for the project: |
| |
| .. image:: images/testing/kubernetes-virtualenv.png |
| :align: center |
| :alt: Kubernetes testing virtualenv |
| |
| The virtualenv is created in your "Airflow" source directory in the |
| ``.build/.k8s-env`` folder, and you have to find the ``python`` binary and choose |
| it when selecting the interpreter. |
| |
| 9b) Choose pytest as test runner: |
| |
| .. image:: images/testing/pytest-runner.png |
| :align: center |
| :alt: Pytest runner |
| |
| 9c) Run/Debug tests using standard "Run/Debug" feature of IntelliJ |
| |
| .. image:: images/testing/run-test.png |
| :align: center |
| :alt: Run/Debug tests |
| |
| |
| NOTE! The first time you run it, it will likely fail with |
| ``kubernetes.config.config_exception.ConfigException``: |
| ``Invalid kube-config file. Expected key current-context in kube-config``. You need to add the ``KUBECONFIG`` |
| environment variable, copying it from the result of ``breeze k8s tests``: |
| |
| .. code-block:: bash |
| |
| echo ${KUBECONFIG} |
| |
| /home/jarek/code/airflow/.build/.kube/config |
| |
| .. image:: images/testing/kubeconfig-env.png |
| :align: center |
| :alt: Run/Debug tests |
| |
| |
| The configuration for Kubernetes is stored in your "Airflow" source directory in the ``.build/.kube/config`` file, |
| and this is where the ``KUBECONFIG`` env variable should point to. |
| |
| You can iterate on the tests while you are in the virtualenv. All the tests requiring the Kubernetes cluster |
| are in the ``kubernetes_tests`` folder. You can then add extra ``pytest`` parameters (for example, ``-s`` will |
| immediately print the output of test logs and print statements to the terminal). You should have |
| ``kubernetes_tests`` as your working directory. |
| |
| .. code-block:: bash |
| |
| pytest test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag_with_scheduler_failure -s |
| |
| You can modify the tests or the KubernetesPodOperator and re-run them without re-deploying |
| Airflow to the KinD cluster. |
| |
| 10. Dumping logs |
| |
| Sometimes you want to see the logs of the cluster. This can be done with ``breeze k8s logs``. |
| |
| .. code-block:: bash |
| |
| breeze k8s logs |
| |
| 11. Redeploying Airflow |
| |
| Sometimes running tests leaves side effects behind. You can run ``breeze k8s deploy-airflow --upgrade`` |
| to redeploy Airflow without recreating the whole cluster. |
| |
| .. code-block:: bash |
| |
| breeze k8s deploy-airflow --upgrade |
| |
| If needed, you can also delete the cluster manually (within the virtualenv activated by |
| ``breeze k8s shell``): |
| |
| .. code-block:: bash |
| |
| kind get clusters |
| kind delete clusters <NAME_OF_THE_CLUSTER> |
| |
| KinD also has useful commands to inspect your running cluster: |
| |
| .. code-block:: bash |
| |
| kind --help |
| |
| 12. Stop the KinD cluster when you are done: |
| |
| .. code-block:: bash |
| |
| breeze k8s delete-cluster |
| |
| .. code-block:: text |
| |
| Deleting KinD cluster airflow-python-3.8-v1.24.2! |
| Deleting cluster "airflow-python-3.8-v1.24.2" ... |
| KinD cluster airflow-python-3.8-v1.24.2 deleted! |
| |
| |
| Running complete k8s tests |
| -------------------------- |
| |
| You can also run the complete k8s tests with: |
| |
| .. code-block:: bash |
| |
| breeze k8s run-complete-tests |
| |
| This will create the cluster, build the images, deploy Airflow, run the tests, and finally delete the cluster - |
| all as a single command. This is the way it is run in our CI; you can also run such complete tests in parallel. |
| |
| Manually testing release candidate packages |
| =========================================== |
| |
| Breeze can be used to test new release candidates of packages - both Airflow and providers. You can easily |
| configure the Breeze CI image to install and start Airflow for both Airflow and provider packages - both |
| packages that are built from sources and packages that are downloaded from PyPI when they are released |
| there as release candidates. |
| |
| The way to test it is rather straightforward: |
| |
| 1) Make sure that the packages - both ``airflow`` and ``providers`` - are placed in the ``dist`` folder |
| of your Airflow source tree. You can either build them there or download them from PyPI (see the next chapter). |
| |
| 2) Run the ``breeze shell`` or ``breeze start-airflow`` commands with the following flags added: |
| ``--mount-sources remove`` and ``--use-packages-from-dist``. The first one removes the ``airflow`` |
| source tree from the container when starting it; the second one installs the ``airflow`` and ``providers`` |
| packages from the ``dist`` folder when entering breeze. |
| |
| Testing pre-release packages |
| ---------------------------- |
| |
| There are two ways to get Airflow packages into the ``dist`` folder - by building them from sources or by |
| downloading them from PyPI. |
| |
| .. note:: |
| |
| Make sure you run ``rm dist/*`` before you start building packages or downloading them from PyPI, because |
| the packages already present there are not removed automatically. |
| |
| In order to build apache-airflow from sources, you need to run the following command: |
| |
| .. code-block:: bash |
| |
| breeze release-management prepare-airflow-package |
| |
| In order to build providers from sources, you need to run the following command: |
| |
| .. code-block:: bash |
| |
| breeze release-management prepare-provider-packages <PROVIDER_1> <PROVIDER_2> ... <PROVIDER_N> |
| |
| The packages are built in the ``dist`` folder, and the command will summarize which packages are available in the |
| ``dist`` folder after it finishes. |
| |
| If you want to download the packages from PyPI, you need to run the following command: |
| |
| .. code-block:: bash |
| |
| pip download apache-airflow-providers-<PROVIDER_NAME>==X.Y.Zrc1 --dest dist --no-deps |
| |
| You can use it for both release and pre-release packages. |
| |
| Examples of testing pre-release packages |
| ---------------------------------------- |
| |
| The few examples below explain how you can test pre-release packages, and how to combine them with locally built |
| and released packages. |
| |
| The following example downloads the ``apache-airflow`` package and the ``celery`` and ``kubernetes`` provider packages from PyPI, and |
| eventually starts Airflow with the Celery Executor. It also loads example DAGs and default connections: |
| |
| .. code:: bash |
| |
| rm dist/* |
| pip download apache-airflow==2.7.0rc1 --dest dist --no-deps |
| pip download apache-airflow-providers-cncf-kubernetes==7.4.0rc1 --dest dist --no-deps |
| pip download apache-airflow-providers-celery==3.3.0rc1 --dest dist --no-deps |
| breeze start-airflow --mount-sources remove --use-packages-from-dist --executor CeleryExecutor --load-default-connections --load-example-dags |
| |
| |
| The following example downloads the ``celery`` and ``kubernetes`` provider packages from PyPI, builds the |
| ``apache-airflow`` package from the main sources, and eventually starts Airflow with the Celery Executor. |
| It also loads example DAGs and default connections: |
| |
| .. code:: bash |
| |
| rm dist/* |
| breeze release-management prepare-airflow-package |
| pip download apache-airflow-providers-cncf-kubernetes==7.4.0rc1 --dest dist --no-deps |
| pip download apache-airflow-providers-celery==3.3.0rc1 --dest dist --no-deps |
| breeze start-airflow --mount-sources remove --use-packages-from-dist --executor CeleryExecutor --load-default-connections --load-example-dags |
| |
| The following example builds the ``celery`` and ``kubernetes`` provider packages from the main sources, downloads |
| the 2.6.3 version of the ``apache-airflow`` package from PyPI, and eventually starts Airflow using the default |
| executor for the backend chosen (no example DAGs, no default connections): |
| |
| .. code:: bash |
| |
| rm dist/* |
| pip download apache-airflow==2.6.3 --dest dist --no-deps |
| breeze release-management prepare-provider-packages celery cncf.kubernetes |
| breeze start-airflow --mount-sources remove --use-packages-from-dist |
| |
| You can mix and match packages from PyPI (final or pre-release candidates) with locally built packages. You |
| can also choose which providers to install this way, since the ``--mount-sources remove`` flag makes sure that the |
| installed Airflow does not contain all the providers - only those that you explicitly downloaded or built in the |
| ``dist`` folder. This way you can test all the combinations of Airflow + Providers you might need. |
| |
| |
| Airflow System Tests |
| ==================== |
| |
| System tests need to communicate with external services/systems that are available |
| if you have appropriate credentials configured for your tests. |
| The system tests derive from the ``tests.test_utils.system_tests_class.SystemTest`` class. They should also |
| be marked with the ``@pytest.mark.system(SYSTEM)`` marker, where ``system`` designates the system |
| to be tested (for example, ``google.cloud``). These tests are skipped by default. |
| |
| You can execute the system tests by providing the ``--system SYSTEM`` flag to ``pytest``. You can |
| specify several ``--system`` flags if you want to execute tests for several systems. |
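| |
| For example (``google`` is the system used in the examples below; ``amazon`` is just an illustrative second |
| system marker): |
| |
| .. code-block:: bash |
| |
| # run tests for a single system |
| pytest --system=google tests/providers/google/ |
| # the flag can be repeated to cover several systems at once |
| pytest --system=google --system=amazon tests/providers/ |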
| |
| The system tests execute a specified example DAG file that runs the DAG end-to-end. |
| |
| See more details about adding new system tests below. |
| |
| Environment for System Tests |
| ---------------------------- |
| |
| **Prerequisites:** You may need to set some variables to run system tests. If you need to |
| add some initialization of environment variables to Breeze, you can add a |
| ``variables.env`` file at ``files/airflow-breeze-config/variables.env``. It will be automatically |
| sourced when entering the Breeze environment. You can also add some additional |
| initialization commands in this file if you want to execute something |
| every time you enter Breeze. |
| |
| There are several typical operations you might want to perform, such as: |
| |
| * generating a file with a random value used across the whole Breeze session (this is useful if |
| you want to use this random number in the names of resources that you create in your service) |
| * generating variables that will be used as the names of your resources |
| * decrypting any variables and resources you keep encrypted in your configuration files |
| * installing additional packages that are needed in case you are doing tests with the 1.10.* Airflow series |
| (see below) |
| |
| An example ``variables.env`` file is shown here (this is part of the ``variables.env`` file that is used to |
| run Google Cloud system tests): |
| |
| .. code-block:: bash |
| |
| # Build variables. This file is sourced by Breeze. |
| # Also it is sourced during continuous integration build in Cloud Build |
| |
| # Auto-export all variables |
| set -a |
| |
| echo |
| echo "Reading variables" |
| echo |
| |
| # Generate random number that will be used across your session |
| RANDOM_FILE="/random.txt" |
| |
| if [[ ! -f "${RANDOM_FILE}" ]]; then |
| echo "${RANDOM}" > "${RANDOM_FILE}" |
| fi |
| |
| RANDOM_POSTFIX=$(cat "${RANDOM_FILE}") |
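| |
| A hypothetical continuation (the variable name here is invented for illustration) would then use this postfix |
| when naming the resources your tests create: |
| |
| .. code-block:: bash |
| |
| # INSTANCE_NAME is a hypothetical example - use whatever names your tests expect |
| INSTANCE_NAME="airflow-system-test-${RANDOM_POSTFIX}" |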
| |
| |
| To execute system tests, specify the ``--system SYSTEM`` |
| flag, where ``SYSTEM`` is the system to run the system tests for. The flag can be repeated. |
| |
| |
| Forwarding Authentication from the Host |
| --------------------------------------- |
| |
| For system tests, you can also forward authentication from the host to your Breeze container. You can specify |
| the ``--forward-credentials`` flag when starting Breeze. Then, it will also forward the most commonly used |
| credentials stored in your ``home`` directory. Use this feature with care as it makes your personal credentials |
| visible to anything that you have installed inside the Docker container. |
| |
| Currently forwarded credentials are: |
| |
| * credentials stored in ``${HOME}/.aws`` for ``aws`` - Amazon Web Services client |
| * credentials stored in ``${HOME}/.azure`` for ``az`` - Microsoft Azure client |
| * credentials stored in ``${HOME}/.config`` for ``gcloud`` - Google Cloud client (among others) |
| * credentials stored in ``${HOME}/.docker`` for ``docker`` client |
| * credentials stored in ``${HOME}/.snowsql`` for ``snowsql`` - SnowSQL (Snowflake CLI client) |
| |
| Adding a New System Test |
| -------------------------- |
| |
| We are working on automating system tests execution (AIP-4), but for now, system tests are skipped when |
| tests are run in our CI system. To enable future test automation, however, we encourage you to add system |
| tests whenever an operator/hook/sensor is added or modified in a given system. |
| |
| * To add your own system tests, derive them from the |
| ``tests.test_utils.system_tests_class.SystemTest`` class and mark them with the |
| ``@pytest.mark.system(SYSTEM_NAME)`` marker. The system name should follow the path defined in |
| the ``providers`` package (for example, the system tests from the ``tests.providers.google.cloud`` |
| package should be marked with ``@pytest.mark.system("google.cloud")``). |
| |
| * If your system tests need some credential files to be available for |
| authentication with external systems, make sure to keep these credentials in the |
| ``files/airflow-breeze-config/keys`` directory. Mark your tests with |
| ``@pytest.mark.credential_file(<FILE>)`` so that they are skipped if such a credential file is not there. |
| The tests should read the right credentials and authenticate on their own. The credentials are read |
| in Breeze from the ``/files`` directory. The local "files" folder is mounted to the "/files" folder in Breeze. |
| |
| * If your system tests are long-running ones (i.e., require more than 20-30 minutes |
| to complete), mark them with the ``@pytest.mark.long_running`` marker. |
| Such tests are skipped by default unless you specify the ``--long-running`` flag to pytest |
| (see the example after this list). |
| |
| * The system test itself (a Python class) does not have any logic. Such a test runs |
| the DAG specified by its ID. This DAG should contain the actual DAG logic |
| to execute. Make sure to define the DAG in ``providers/<SYSTEM_NAME>/example_dags``. These example DAGs |
| are also used to take some snippets of code out of them when documentation is generated. So, having these |
| DAGs runnable is a great way to make sure the documentation describes a working example. Inside |
| your test class/test method, simply use ``self.run_dag(<DAG_ID>, <DAG_FOLDER>)`` to run the DAG. Then, |
| the system class will take care of running the DAG. Note that the ``DAG_FOLDER`` should be |
| a subdirectory of ``tests.test_utils.AIRFLOW_MAIN_FOLDER``, namely ``providers/<SYSTEM_NAME>/example_dags``. |
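| |
| For example, combining the markers and flags above (a sketch using the Cloud SQL system tests referenced in |
| the next chapter): |
| |
| .. code-block:: bash |
| |
| pytest --system=google --long-running tests/providers/google/cloud/operators/test_cloud_sql_system.py |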
| |
| |
| A simple example of a system test is available in: |
| |
| ``tests/providers/google/cloud/operators/test_compute_system.py``. |
| |
| It runs two DAGs defined in ``airflow.providers.google.cloud.example_dags.example_compute.py``. |
| |
| |
| The typical system test session |
| ------------------------------- |
| |
| Here is the typical session that you need to run the system tests: |
| |
| 1. Enter breeze |
| |
| .. code-block:: bash |
| |
| breeze down |
| breeze --python 3.8 --db-reset --forward-credentials |
| |
| This will: |
| |
| * stop the whole environment (i.e. recreate the metadata database from scratch) |
| * run Breeze with Python 3.8, resetting the Airflow database and forwarding your local credentials to Breeze |
| |
| 2. Run the tests: |
| |
| .. code-block:: bash |
| |
| pytest -o faulthandler_timeout=2400 \ |
| --system=google tests/providers/google/cloud/operators/test_compute_system.py |
| |
| Iteration with System Tests if your resources are slow to create |
| ---------------------------------------------------------------- |
| |
| When you want to iterate on system tests, you might want to create the slow resources first. |
| |
| If you need to set up some external resources for your tests (for example, compute instances in Google Cloud), |
| you should set them up and tear them down in the setUp/tearDown methods of your tests. |
| Since those resources might be slow to create, you might want to add some helpers that |
| set them up and tear them down separately via manual operations. This way you can iterate on |
| the tests without waiting for setUp and tearDown with every test. |
| |
| In this case, you should build in a mechanism to skip setUp and tearDown in case you manually |
| created the resources. A somewhat complex example of that can be found in |
| ``tests.providers.google.cloud.operators.test_cloud_sql_system.py``, and the helper is |
| available in ``tests.providers.google.cloud.operators.test_cloud_sql_system_helper.py``. |
| |
| The helper is run with ``--action create`` to create the Cloud SQL instances, which are very slow |
| to create and set up, so that you can iterate on running the system tests without |
| losing the time for creating them every time. A temporary file is created to prevent the tests from |
| setting up and tearing down the instances when running. |
| |
| This example also shows how you can use the random number generated at the entry of Breeze if you |
| have it in your ``variables.env`` (see the previous chapter). In the case of Cloud SQL, you cannot reuse the |
| same instance name for a week, so we generate a random number that is used across the whole session |
| and store it in the ``/random.txt`` file so that the names are unique during tests. |
| |
| |
| !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Important !!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| |
| Do not forget to delete manually created resources before leaving the |
| Breeze session. They are usually expensive to run. |
| |
| !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Important !!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| |
| 1. Enter breeze |
| |
| .. code-block:: bash |
| |
| breeze down |
| breeze --python 3.8 --db-reset --forward-credentials |
| |
| 2. Run the create action in the helper (to create the slow-to-create resources): |
| |
| .. code-block:: bash |
| |
| python tests/providers/google/cloud/operators/test_cloud_sql_system_helper.py --action create |
| |
| 3. Run the tests: |
| |
| .. code-block:: bash |
| |
| pytest -o faulthandler_timeout=2400 \ |
| --system=google tests/providers/google/cloud/operators/test_compute_system.py |
| |
| 4. Run the delete action in the helper: |
| |
| .. code-block:: bash |
| |
| python tests/providers/google/cloud/operators/test_cloud_sql_system_helper.py --action delete |
| |
| |
| Local and Remote Debugging in IDE |
| ================================= |
| |
| One of the great benefits of using the local virtualenv and Breeze is the option to run |
| local debugging in your IDE's graphical interface. |
| |
| When you run example DAGs, even if you run them using unit tests within the IDE, they are run in a separate |
| container. This makes it a little harder to use with IDE built-in debuggers. |
| Fortunately, IntelliJ/PyCharm provides an effective remote debugging feature (but only in paid versions). |
| See additional details on |
| `remote debugging <https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html>`_. |
| |
| You can set up your remote debugging session as follows: |
| |
| .. image:: images/setup_remote_debugging.png |
| :align: center |
| :alt: Setup remote debugging |
| |
| Note that on macOS, you have to use a real IP address of your host rather than the default |
| localhost because on macOS the container runs in a virtual machine with a different IP address. |
| |
| Make sure to configure source code mapping in the remote debugging configuration to map |
| your local sources to the ``/opt/airflow`` location of the sources within the container: |
| |
| .. image:: images/source_code_mapping_ide.png |
| :align: center |
| :alt: Source code mapping |
| |
| Setup VM on GCP with SSH forwarding |
| ----------------------------------- |
| |
| Below are the steps you need to take to set up your virtual machine in Google Cloud. |
| |
| 1. The next steps assume that you have configured environment variables with the names of the network and |
| the virtual machine, the project ID, and the zone where the virtual machine will be created: |
| |
| .. code-block:: bash |
| |
| PROJECT_ID="<PROJECT_ID>" |
| GCP_ZONE="europe-west3-a" |
| GCP_NETWORK_NAME="airflow-debugging" |
| GCP_INSTANCE_NAME="airflow-debugging-ci" |
| |
| 2. It is necessary to configure the network and firewall for your machine. |
| The firewall must allow access to port 22 for SSH traffic and to one other port for the debugger. |
| In this example, we will use port 5555 for the debugger. |
| |
| .. code-block:: bash |
| |
| gcloud compute --project="${PROJECT_ID}" networks create "${GCP_NETWORK_NAME}" \ |
| --subnet-mode=auto |
| |
| gcloud compute --project="${PROJECT_ID}" firewall-rules create "${GCP_NETWORK_NAME}-allow-ssh" \ |
| --network "${GCP_NETWORK_NAME}" \ |
| --allow tcp:22 \ |
| --source-ranges 0.0.0.0/0 |
| |
| gcloud compute --project="${PROJECT_ID}" firewall-rules create "${GCP_NETWORK_NAME}-allow-debugger" \ |
| --network "${GCP_NETWORK_NAME}" \ |
| --allow tcp:5555 \ |
| --source-ranges 0.0.0.0/0 |
| |
| 3. If you have a network, you can create a virtual machine. To save costs, you can create a `Preemptible |
| virtual machine <https://cloud.google.com/preemptible-vms>`__ that is automatically deleted after up |
| to 24 hours. |
| |
| .. code-block:: bash |
| |
| gcloud beta compute --project="${PROJECT_ID}" instances create "${GCP_INSTANCE_NAME}" \ |
| --zone="${GCP_ZONE}" \ |
| --machine-type=f1-micro \ |
| --subnet="${GCP_NETWORK_NAME}" \ |
| --image=debian-11-bullseye-v20220120 \ |
| --image-project=debian-cloud \ |
| --preemptible |
| |
| To check the public IP address of the machine, you can run the command: |
| |
| .. code-block:: bash |
| |
| gcloud compute --project="${PROJECT_ID}" instances describe "${GCP_INSTANCE_NAME}" \ |
| --zone="${GCP_ZONE}" \ |
| --format='value(networkInterfaces[].accessConfigs[0].natIP.notnull().list())' |
| |
| 4. The SSH daemon's default configuration does not allow traffic forwarding to public addresses. |
| To change it, modify the ``GatewayPorts`` option in the ``/etc/ssh/sshd_config`` file to ``Yes`` |
| and restart the SSH daemon. |
| |
| .. code-block:: bash |
| |
| gcloud beta compute --project="${PROJECT_ID}" ssh "${GCP_INSTANCE_NAME}" \ |
| --zone="${GCP_ZONE}" -- \ |
| sudo sed -i "s/#\?\s*GatewayPorts no/GatewayPorts Yes/" /etc/ssh/sshd_config |
| |
| gcloud beta compute --project="${PROJECT_ID}" ssh "${GCP_INSTANCE_NAME}" \ |
| --zone="${GCP_ZONE}" -- \ |
| sudo service sshd restart |
| |
| 5. To start port forwarding, run the following command: |
| |
| .. code-block:: bash |
| |
| gcloud beta compute --project="${PROJECT_ID}" ssh "${GCP_INSTANCE_NAME}" \ |
| --zone="${GCP_ZONE}" -- \ |
| -N \ |
| -R 0.0.0.0:5555:localhost:5555 \ |
| -v |
| |
| When you have finished using the virtual machine, remember to delete it: |
| |
| .. code-block:: bash |
| |
| gcloud beta compute --project="${PROJECT_ID}" instances delete "${GCP_INSTANCE_NAME}" \ |
| --zone="${GCP_ZONE}" |
| |
| You can use the GCP service for free if you use the `Free Tier <https://cloud.google.com/free>`__. |
| |
| DAG Testing |
| =========== |
| |
| To ease and speed up the process of developing DAGs, you can use |
| :py:class:`~airflow.executors.debug_executor.DebugExecutor`, which is a single-process executor |
| for debugging purposes. Using this executor, you can run and debug DAGs from your IDE. |
| |
| To set up the IDE: |
| |
| 1. Add a ``main`` block at the end of your DAG file to make it runnable. |
| It will run a backfill job: |
| |
| .. code-block:: python |
| |
| if __name__ == "__main__": |
| dag.clear() |
| dag.run() |
| |
| |
| 2. Set up ``AIRFLOW__CORE__EXECUTOR=DebugExecutor`` in the run configuration of your IDE. |
| Make sure to also set up all environment variables required by your DAG. |
| |
| 3. Run and debug the DAG file. |
| |
| Additionally, ``DebugExecutor`` can be used in fail-fast mode, which will make |
| all other running or scheduled tasks fail immediately. To enable this option, set |
| ``AIRFLOW__DEBUG__FAIL_FAST=True`` or adjust the ``fail_fast`` option in your ``airflow.cfg``. |
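| |
| A minimal way to exercise this from the command line, assuming your DAG file contains the ``main`` block |
| shown above (the file path here is hypothetical): |
| |
| .. code-block:: bash |
| |
| export AIRFLOW__CORE__EXECUTOR=DebugExecutor |
| export AIRFLOW__DEBUG__FAIL_FAST=True |
| # point this at your own DAG file |
| python files/dags/my_dag.py |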
| |
| Also, with the Airflow CLI command ``airflow dags test``, you can execute one complete run of a DAG: |
| |
| .. code-block:: bash |
| |
| # airflow dags test [dag_id] [execution_date] |
| airflow dags test example_branch_operator 2018-01-01 |
| |
| By default, the ``/files/dags`` folder is mounted from your local ``<AIRFLOW_SOURCES>/files/dags``, and this is |
| the directory that the Airflow scheduler and webserver scan for DAGs. You can place your DAGs there |
| to test them, as shown in the example below. |
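| |
| For example (a sketch - ``example_bash_operator.py`` is one of the example DAGs bundled with the Airflow |
| sources): |
| |
| .. code-block:: bash |
| |
| cp airflow/example_dags/example_bash_operator.py files/dags/ |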
| |
| The DAGs can be run in the main version of Airflow, but they also work |
| with older versions. |
| |
| To run the tests for Airflow 1.10.* series, you need to run Breeze with |
| ``--use-airflow-pypi-version=<VERSION>`` to re-install a different version of Airflow. |
| |
| You should also consider running it with the ``restart`` command when you change the installed version. |
| This will clean up the database so that you start with a clean DB, not a DB installed by a previous version. |
| So typically you'd run it like this: ``breeze --use-airflow-pypi-version=1.10.9 restart``. |
| |
| Tracking SQL statements |
| ======================= |
| |
| You can run tests with SQL statement tracking. To do this, use the ``--trace-sql`` option and pass the |
| columns to be displayed as an argument. Each query will be displayed on a separate line. |
| Supported values: |
| |
| * ``num`` - displays the query number; |
| * ``time`` - displays the query execution time; |
| * ``trace`` - displays the simplified (one-line) stack trace; |
| * ``sql`` - displays the SQL statements; |
| * ``parameters`` - displays the SQL statement parameters. |
| |
| If you only provide ``num``, then only the final number of queries will be displayed. |
| |
| By default, pytest does not display output for successful tests; if you still want to see it, you must |
| pass the ``--capture=no`` option. |
| |
| If you run the following command: |
| |
| .. code-block:: bash |
| |
| pytest --trace-sql=num,sql,parameters --capture=no \ |
| tests/jobs/test_scheduler_job.py -k test_process_dags_queries_count_05 |
| |
| On the screen you will see the database queries for the given test. |
| |
| SQL query tracking does not work properly if your test runs subprocesses. Only queries from the main process |
| are tracked. |