.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.
Local Virtual Environment (virtualenv)
======================================
The easiest way to run tests for Airflow is to use a local virtualenv. While Breeze is the recommended
way to run tests - because it provides a reproducible environment and is easy to set up - it is not
always the best option, as it runs your tests inside a Docker container. This can make it
harder to debug the tests and to use your IDE to run them.

That's why we recommend using a local virtualenv for development and testing.
.. contents:: :local:
Installation in local virtualenv
--------------------------------
Required Software Packages
..........................
Use system-level package managers like yum, apt-get for Linux, or
Homebrew for macOS to install required software packages:
* Python (One of: 3.8, 3.9, 3.10, 3.11, 3.12)
* MySQL 5.7+
* libxml
* helm (only for helm chart tests)
Refer to the `Dockerfile.ci <../Dockerfile.ci>`__ for a comprehensive list
of required packages.
.. note::

   - The MySQL client 2.2.0 needs ``pkgconf`` as a prerequisite; refer `here <http://pkgconf.org/>`_ to install pkgconf
   - macOS on ARM architecture requires graphviz for venv setup; refer `here <https://graphviz.org/download/>`_ to install graphviz
   - The helm chart tests need helm to be installed as a prerequisite; refer `here <https://helm.sh/docs/intro/install/>`_ to install and set up helm
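As a rough illustration only (the exact package names below are assumptions and differ between
distributions and versions - always check your platform's repositories), installing some of the
prerequisites could look like this:

.. code:: bash

    # Debian/Ubuntu (illustrative package names)
    sudo apt-get install -y libxml2-dev pkgconf

    # macOS with Homebrew (illustrative package names)
    brew install libxml2 pkgconf graphviz helm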
.. note::

   As of version 2.8, Airflow follows PEP 517/518 and uses a ``pyproject.toml`` file to define build dependencies
   and the build process, and it requires relatively modern versions of packaging tools to get Airflow built from
   local sources or ``sdist`` packages, as PEP 517 compliant build hooks are used to determine dynamic build
   dependencies. In the case of ``pip`` it means that at least version 22.1.0 (released in 2022) is needed
   to build or install Airflow from sources. This does not affect the ability to install Airflow from
   released wheel packages.
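A quick way to check, and if needed upgrade, ``pip`` in your virtualenv (a simple sketch of the
requirement described in the note above):

.. code:: bash

    # verify which pip version the virtualenv uses
    pip --version

    # upgrade pip if it is older than 22.1.0
    pip install --upgrade "pip>=22.1.0"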
Installing Airflow
..................
The simplest way to install Airflow in a local virtualenv is to use ``pip``:

.. code:: bash

    pip install -e ".[devel,<OTHER EXTRAS>]" # for example: pip install -e ".[devel,google,postgres]"
This will install Airflow in 'editable' mode - where the Airflow sources are used directly from the source
tree rather than copied to the installation directory. You need to run this command in the virtualenv you
want to install Airflow in - and you need to have the virtualenv activated.
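For example, a minimal sketch of creating and activating a virtualenv before the editable install
(the directory name ``.venv`` is just an illustration; any virtualenv manager works):

.. code:: bash

    # create a virtualenv with the built-in venv module
    python -m venv .venv

    # activate it - the editable install below goes into this environment
    source .venv/bin/activate
    pip install -e ".[devel]"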
While you can use any virtualenv manager, we recommend using `Hatch <https://hatch.pypa.io/latest/>`__
as your development environment front-end; Airflow already uses the Hatch backend ``hatchling``.
Hatchling is installed automatically when you build Airflow, but since the Airflow build system uses
a PEP 517/518 compliant ``pyproject.toml`` file, you can use any front-end build system that supports
``PEP 517`` and ``PEP 518``. You can also use ``pip`` to install Airflow in editable mode.
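As a sketch of using another PEP 517/518 front-end (assuming you only need distribution packages
rather than an editable install), the generic ``build`` tool maintained by the PyPA can also build
Airflow packages from the ``pyproject.toml`` definition:

.. code:: bash

    # 'build' is a minimal PEP 517 front-end
    pip install build

    # builds sdist and wheel packages into the dist/ directory
    python -m build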
Extras (optional dependencies)
..............................
You can also install extra packages (like ``[ssh]``, etc) via
``pip install -e ".[devel,EXTRA1,EXTRA2 ...]"``. However, some of them may
have additional install and setup requirements for your local system.
For example, if you have trouble installing the MySQL client on macOS and get
an error like the following:
.. code:: text
ld: library not found for -lssl
you should set ``LIBRARY_PATH`` before running ``pip install``:
.. code:: bash
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/opt/openssl/lib/
You are STRONGLY encouraged to also install and use `pre-commit hooks <08_static_code_checks.rst#pre-commit-hooks>`_
for your local virtualenv development environment. Pre-commit hooks can speed up your
development cycle a lot.
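A minimal sketch of setting them up inside the activated virtualenv (the full workflow is described
in the static code checks document linked above):

.. code:: bash

    # install the pre-commit tool into your virtualenv
    pip install pre-commit

    # install the git hook scripts so the checks run automatically on each commit
    pre-commit install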
The full list of extras is available in `pyproject.toml <../pyproject.toml>`_ and can also be retrieved using hatch.
.. note::

   Only ``pip`` installation is currently officially supported.

   Make sure you have the latest ``pip`` installed - see the `pip documentation <https://pip.pypa.io/en/stable/#>`_.

   While there are some successes with using other tools like `poetry <https://python-poetry.org/>`_ or
   `pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
   ``pip`` - especially when it comes to constraint vs. requirements management.
   Installing via ``Poetry`` or ``pip-tools`` is not currently supported.

   There are known issues with ``bazel`` that might lead to circular dependencies when using it to install
   Airflow. Please switch to ``pip`` if you encounter such problems. The ``bazel`` community is working on fixing
   the problem in `this PR <https://github.com/bazelbuild/rules_python/pull/1166>`_, so newer
   versions of ``bazel`` may handle it.

   If you wish to install Airflow using those tools, you should use the constraint files and convert
   them to the appropriate format and workflow that your tool requires.
Using Hatch
-----------
Airflow uses `hatch <https://hatch.pypa.io/>`_ as its build and development tool of choice. It is one of the popular
build tools and environment managers for Python, maintained by the Python Packaging Authority.
It is an optional tool that is only really needed when you want to build packages from sources, but
it is also very convenient to manage your Python versions and virtualenvs.
The Airflow project contains some pre-defined virtualenv definitions in ``pyproject.toml`` that can be
easily used by hatch to create your local venvs. This is not necessary for you to develop and test
Airflow, but it is a convenient way to manage your local Python versions and virtualenvs.
Installing Hatch
................
You can install hatch in a number of ways (including GUI installers).
Example using ``pipx``:
.. code:: bash
pipx install hatch
We recommend using ``pipx``, as it lets you manage installed Python apps easily and later
upgrade ``hatch`` as needed with:
.. code:: bash
pipx upgrade hatch
Using Hatch to manage your Python versions
..........................................
You can also use hatch to install and manage airflow virtualenvs and development
environments. For example, you can install Python 3.10 with this command:
.. code:: bash
hatch python install 3.10
or install all Python versions that are used in Airflow:
.. code:: bash
hatch python install all
Manage your virtualenvs with Hatch
..................................
Airflow has some pre-defined virtualenvs that you can use to develop and test airflow.
You can see the list of available envs with:
.. code:: bash
hatch env show
This is what it shows currently:
+-------------+---------+---------------------------------------------------------------+
| Name | Type | Description |
+=============+=========+===============================================================+
| default | virtual | Default environment with Python 3.8 for maximum compatibility |
+-------------+---------+---------------------------------------------------------------+
| airflow-38 | virtual | Environment with Python 3.8. No devel installed. |
+-------------+---------+---------------------------------------------------------------+
| airflow-39 | virtual | Environment with Python 3.9. No devel installed. |
+-------------+---------+---------------------------------------------------------------+
| airflow-310 | virtual | Environment with Python 3.10. No devel installed. |
+-------------+---------+---------------------------------------------------------------+
| airflow-311 | virtual | Environment with Python 3.11. No devel installed |
+-------------+---------+---------------------------------------------------------------+
| airflow-312 | virtual | Environment with Python 3.12. No devel installed |
+-------------+---------+---------------------------------------------------------------+
The default env (if you have not selected one explicitly) is ``default`` and it is a Python 3.8
virtualenv for maximum compatibility. After entering the environment, you can install the ``devel`` set
of dependencies in it by running:

.. code:: bash

    pip install -e ".[devel]"
The other environments are just bare-bones Python virtualenvs with Airflow core requirements only,
without any extras installed and without any tools. They are much faster to create than the default
environment, and you can manually install either the appropriate extras or the tools that you need for
testing or development.

You can create the default environment with:

.. code:: bash

    hatch env create
You can create a specific environment by passing its name to the create command:
.. code:: bash
hatch env create airflow-310
You can install extras in the environment by running a pip command:
.. code:: bash
hatch -e airflow-310 run -- pip install -e ".[devel,google]"
You can enter the environment by running a shell of your choice (for example, zsh) where you
can run any commands:
.. code:: bash
hatch -e airflow-310 shell
Once you are in the environment (indicated usually by an updated prompt), you can just install
the extra dependencies you need:
.. code:: bash
[~/airflow] [airflow-310] pip install -e ".[devel,google]"
You can also see where hatch created the virtualenvs and use it in your IDE or activate it manually:
.. code:: bash
hatch env find airflow-310
You will get a path similar to:
.. code::
/Users/jarek/Library/Application Support/hatch/env/virtual/apache-airflow/TReRdyYt/apache-airflow
There you will find the ``python`` binary and the ``activate`` script in the ``bin`` sub-folder of this directory,
and you can configure your IDE to use this Python virtualenv if you want to use that environment in your IDE.

You can also set the default environment name with the ``HATCH_ENV`` environment variable.
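For example, a small sketch (the ``airflow-310`` environment name comes from the table above) of
activating the environment manually and setting the default environment:

.. code:: bash

    # activate the hatch-created virtualenv in your current shell
    source "$(hatch env find airflow-310)/bin/activate"

    # make airflow-310 the default environment for subsequent hatch commands
    export HATCH_ENV=airflow-310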
You can clean up the environments by running:
.. code:: bash
hatch env prune
More information about hatch can be found in `Hatch: Environments <https://hatch.pypa.io/latest/environment/>`__
Using Hatch to build your packages
..................................
You can use hatch to build an installable package from the Airflow sources. Such a package will
include all metadata that is configured in ``pyproject.toml`` and will be installable with pip.

The packages will include dependencies for the providers that are always
installed when Airflow is installed from PyPI. By default, both ``wheel`` and ``sdist`` packages are built.
.. code:: bash
hatch build
You can also build only ``wheel`` or ``sdist`` packages:
.. code:: bash
hatch build -t wheel
hatch build -t sdist
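The built artifacts typically land in the ``dist/`` directory (hatch's standard default, not an
Airflow-specific setting), so you can inspect them with:

.. code:: bash

    # list the built sdist and wheel packages
    ls dist/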
Local and Remote Debugging in IDE
---------------------------------
One of the great benefits of using the local virtualenv and Breeze is the option to run
local debugging in your IDE's graphical interface.

When you run example DAGs, even if you run them using unit tests within the IDE, they are run in a separate
container. This makes it a little harder to use the IDE's built-in debuggers.
Fortunately, IntelliJ/PyCharm provides an effective remote debugging feature (but only in paid versions).
See additional details on
`remote debugging <https://www.jetbrains.com/help/pycharm/remote-debugging-with-product.html>`_.
You can set up your remote debugging session as follows:
.. image:: images/setup_remote_debugging.png
:align: center
:alt: Setup remote debugging
Note that on macOS, you have to use a real IP address of your host rather than the default
localhost because on macOS the container runs in a virtual machine with a different IP address.
Make sure to configure source code mapping in the remote debugging configuration to map
your local sources to the ``/opt/airflow`` location of the sources within the container:
.. image:: images/source_code_mapping_ide.png
:align: center
:alt: Source code mapping
Developing Providers
--------------------
In Airflow 2.0, we split Apache Airflow into separate packages - one main
``apache-airflow`` package with the core of Airflow, and 70+ packages for all providers (external services
and software Airflow can communicate with).
When you install Airflow from sources using an editable install, you can develop both the main version
of Airflow and the providers together, which is pretty convenient, because you can use the same environment for both.
Running ``pip install -e .`` will install Airflow in editable mode, and all provider code will also be
available in the same environment. However, most providers need some additional dependencies.
You can install the dependencies of the provider you want to develop by installing Airflow in editable
mode with the provider id as an extra (with ``-`` instead of ``.``). You can see the list of provider extras in the
`extras reference <../docs/apache-airflow/extra-packages-ref.rst>`_.
For example, if you want to develop the Google provider, you can install it with:
.. code:: bash
pip install -e ".[devel,google]"
If a provider has a name composed of several segments, you can use ``-`` to separate them. You can also
install multiple extra dependencies at a time:
.. code:: bash
pip install -e ".[devel,apache-beam,dbt-cloud]"
The dependencies for providers are configured in the ``airflow/providers/PROVIDERS_FOLDER/provider.yaml`` files -
separately for each provider. There you can find two types of dependencies - production runtime
dependencies, and sometimes ``devel-dependencies`` which are needed to run tests. While the ``provider.yaml``
file is the single source of truth for the dependencies, eventually they need to find their way to Airflow's
``pyproject.toml``. This is done by running:
.. code:: bash
pre-commit run update-providers-dependencies --all-files
This will update ``pyproject.toml`` with the dependencies from the ``provider.yaml`` files, and from there
they will be used automatically when you install Airflow in editable mode.
If you want to add another dependency to a provider, you should add it to the corresponding ``provider.yaml``,
run the command above and commit the changes to ``pyproject.toml``. Then running
``pip install -e ".[devel,PROVIDER_EXTRA]"`` will install the new dependencies. Tools like ``hatch`` can also
install the dependencies automatically when you create or switch to a development environment.
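Putting the steps together, a sketch of the whole workflow (the google provider and ``$EDITOR`` are
used purely as an illustration) could look like this:

.. code:: bash

    # 1. add the new dependency to the provider's provider.yaml
    $EDITOR airflow/providers/google/provider.yaml

    # 2. regenerate pyproject.toml from the provider.yaml files
    pre-commit run update-providers-dependencies --all-files

    # 3. re-install Airflow with the provider extra to pick up the new dependency
    pip install -e ".[devel,google]"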
Installing recommended version of dependencies
----------------------------------------------
Whatever virtualenv solution you use, when you want to make sure you are using the same
versions of the dependencies as in main, you can install the recommended versions of the dependencies by using
the ``constraint-python<PYTHON_MAJOR_MINOR_VERSION>.txt`` files as the ``constraint`` file. This might be useful
to avoid the "works-for-me" syndrome, where you use different versions of dependencies than the ones
that are used in main, in CI tests and by other contributors.
There are different constraint files for different Python versions. For example, this command will install
all basic devel requirements and the requirements of the google provider as last successfully tested for Python 3.8:
.. code:: bash
pip install -e ".[devel,google]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.8.txt"
Make sure to use the latest main for such an installation; those constraints are "development constraints" and they
are refreshed several times a day to make sure they are up to date with the latest changes in the main branch.

Note that this might not always work as expected, because the constraints are not always updated
immediately after the dependencies are updated. Sometimes there is a very recent change (a few hours old, rarely more
than a day) which is still running in the ``canary`` build, and the constraints will not be updated until the canary
build succeeds. Usually what works in this case is running your install command without constraints.
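For example (reusing the install command from above), the fallback is simply to drop the
``--constraint`` option:

.. code:: bash

    # retry without constraints if the constraint files lag behind a very recent change in main
    pip install -e ".[devel,google]"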
You can upgrade just Airflow, without paying attention to the providers' dependencies, by using
the ``constraints-no-providers`` constraint files. This allows you to keep the installed provider dependencies
and upgrade just the dependencies of the pure Airflow core to the latest supported versions.
.. code:: bash
pip install -e ".[devel]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.8.txt"
These are examples of the development options available with the local virtualenv in your IDE:
* local debugging;
* Airflow source view;
* auto-completion;
* documentation support;
* unit tests.
This document describes minimum requirements and instructions for using a standalone version of the local virtualenv.
Running Tests
-------------
Running tests is described in `Testing documentation <09_testing.rst>`_.
While most of the tests are typical unit tests that do not require external components, there are a number
of integration tests. You can technically use a local virtualenv to run those tests, but it requires you to
set up all necessary dependencies for all the providers you are going to test, and also to set up
databases - and sometimes other external components (for integration tests).
So, generally it should be easier to use the `Breeze <../dev/breeze/doc/README.rst>`__ development environment
(especially for Integration tests).
Connecting to database
----------------------
When analyzing a problem, it is helpful to be able to query the database directly. You can do it using
the built-in Airflow command (however, you need a CLI client tool for your configured database to be installed):
.. code:: bash
airflow db shell
The command will explain what CLI tool is needed for the database you have configured.
-----------
As the next step, it is important to learn about `Static code checks <08_static_code_checks.rst>`__ that are
used to automate code quality checks. Your code must pass the static code checks to get merged.