.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

Airflow dependencies
====================
Airflow is not a standard Python project. Most Python projects fall into one of two types -
application or library. As described in
`this StackOverflow question <https://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions>`_,
the decision whether to pin (freeze) dependency versions for a Python project depends on the type. For
applications, dependencies should be pinned, but for libraries, they should be open.

For applications, pinning the dependencies makes future installations more stable - otherwise, new
releases of (even transitive) dependencies might break the installation. For libraries, the dependencies
should be open, to allow several different libraries with the same requirements to be installed at the
same time.

The problem is that Apache Airflow is a bit of both - an application to install and a library to be used
when you are developing your own operators and DAGs.

This - seemingly unsolvable - puzzle is solved by having pinned constraints files.
.. contents::
   :local:

Pinned constraint files
-----------------------
.. note::

   Only ``pip`` installation is officially supported.

   While it is possible to install Airflow with tools like `poetry <https://python-poetry.org/>`_ or
   `pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
   ``pip`` - especially when it comes to constraint vs. requirements management.
   Installing via ``Poetry`` or ``pip-tools`` is not currently supported.

   There are known issues with ``bazel`` that might lead to circular dependencies when using it to
   install Airflow. Please switch to ``pip`` if you encounter such problems. The ``Bazel`` community
   added support for cycles in `this PR <https://github.com/bazelbuild/rules_python/pull/1166>`_, so
   newer versions of ``bazel`` might handle it.

   If you wish to install Airflow using these tools, you should use the constraint files and convert
   them to the appropriate format and workflow that your tool requires.
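
   For example, with ``pip-tools`` you could feed the published constraints file into its resolver.
   This is only a sketch, not a supported workflow - the Airflow version and extras below are
   illustrative, and it assumes a ``pip-tools`` version that honors ``-c`` lines inside
   ``requirements.in``:

   .. code-block:: bash

       # Illustrative sketch only - not an officially supported workflow.
       # Download the published constraints file for the Airflow/Python version you target.
       curl -Lo constraints.txt \
         "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.5/constraints-3.8.txt"

       # Reference it from requirements.in so that pip-compile resolves versions
       # consistently with the published Airflow constraints.
       printf '%s\n' '-c constraints.txt' 'apache-airflow[google]==2.2.5' > requirements.in
       pip-compile requirements.in  # writes a fully pinned requirements.txt
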
By default, when you install the ``apache-airflow`` package, the dependencies are as open as possible
while still allowing the package to be installed. This means that the ``apache-airflow`` package might
fail to install when a direct or transitive dependency is released that breaks the installation. In that
case, when installing ``apache-airflow``, you might need to provide additional constraints (for example
``pip install "apache-airflow==1.10.2" "Werkzeug<1.0.0"``).
There are several sets of constraints we keep:

* ``constraints`` - these are constraints generated by matching the current airflow version from sources
  and providers that are installed from PyPI. They are the constraints used by users who want to
  install airflow with pip. They are named ``constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt``.

* ``constraints-source-providers`` - these are constraints generated by using providers installed from
  current sources. While adding new providers, their dependencies might change, so this set reflects
  the current constraints for airflow and providers from the current main sources. These constraints
  are used by the CI system to keep a "stable" set of constraints. They are named
  ``constraints-source-providers-<PYTHON_MAJOR_MINOR_VERSION>.txt``.

* ``constraints-no-providers`` - these are constraints generated from only Apache Airflow, without any
  providers. If you want to manage airflow separately and then add providers individually, you can
  use them. They are named ``constraints-no-providers-<PYTHON_MAJOR_MINOR_VERSION>.txt``.
The first two can be used as constraint files when installing Apache Airflow in a repeatable way -
for example, from the PyPI package:

.. code-block:: bash

    pip install "apache-airflow[google,amazon,async]==2.2.5" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.5/constraints-3.8.txt"
The last one can be used to install Airflow in "minimal" mode - i.e. when bare Airflow is installed
without extras.

When you install airflow from sources (in editable mode) you should use ``constraints-source-providers``
instead (this accounts for the case when some providers have not yet been released and have conflicting
requirements):

.. code-block:: bash

    pip install -e ".[devel]" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.8.txt"

This also works with extras - for example:

.. code-block:: bash

    pip install ".[ssh]" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.8.txt"
There are different sets of fixed constraint files for different Python major/minor versions, and you
should use the file that matches the Python version you are running.
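
If you script the installation, you can derive the right constraints URL from the interpreter itself.
This is only a sketch - the Airflow version in the URL is illustrative:

.. code-block:: bash

    # Pick the constraints file matching the current interpreter (sketch only;
    # replace 2.2.5 with the Airflow version you are installing).
    PYTHON_VERSION="$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
    pip install "apache-airflow==2.2.5" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.5/constraints-${PYTHON_VERSION}.txt"
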
If you want to upgrade just the Airflow dependencies, without paying attention to providers, you can do
it using the ``constraints-no-providers`` constraint files as well:

.. code-block:: bash

    pip install . --upgrade \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.8.txt"
The ``constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt`` and ``constraints-no-providers-<PYTHON_MAJOR_MINOR_VERSION>.txt``
files are automatically regenerated by a CI job every time ``pyproject.toml`` is updated and pushed,
provided the tests are successful.
Optional dependencies (extras)
------------------------------
There are a number of extras that can be specified when installing Airflow. Those extras can be specified
after the usual pip install - for example ``pip install -e ".[ssh]"`` for an editable installation. Note
that there are several kinds of extras: ``regular`` extras (used when you install airflow as a user),
``devel`` extras (available only in ``editable`` mode, necessary if you want to run airflow locally for
testing) and ``doc`` extras (which install tools needed to build the documentation).
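
For example (a sketch - the ``ssh`` extra is just an illustration):

.. code-block:: bash

    # Regular extra in a user installation from PyPI.
    pip install "apache-airflow[ssh]"

    # The same extra in an editable installation from sources.
    pip install -e ".[ssh]"
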
This is the full list of these extras:
Core extras
...........
Those extras are available as regular core airflow extras - they install optional features of Airflow.
.. START CORE EXTRAS HERE

aiobotocore, apache-atlas, apache-webhdfs, async, cgroups, cloudpickle, deprecated-api,
github-enterprise, google-auth, graphviz, kerberos, ldap, leveldb, otel, pandas, password, pydantic,
rabbitmq, s3fs, saml, sentry, statsd, uv, virtualenv

.. END CORE EXTRAS HERE
Provider extras
...............
Those extras are available as regular Airflow extras; they install provider packages in standard builds,
or the dependencies that are necessary to enable the feature in an editable build.
.. START PROVIDER EXTRAS HERE

airbyte, alibaba, amazon, apache.beam, apache.cassandra, apache.drill, apache.druid, apache.flink,
apache.hdfs, apache.hive, apache.iceberg, apache.impala, apache.kafka, apache.kylin, apache.livy,
apache.pig, apache.pinot, apache.spark, apprise, arangodb, asana, atlassian.jira, celery, cloudant,
cncf.kubernetes, cohere, common.io, common.sql, databricks, datadog, dbt.cloud, dingding, discord,
docker, elasticsearch, exasol, fab, facebook, ftp, github, google, grpc, hashicorp, http, imap,
influxdb, jdbc, jenkins, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo,
mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, pagerduty,
papermill, pgvector, pinecone, postgres, presto, qdrant, redis, salesforce, samba, segment,
sendgrid, sftp, singularity, slack, smtp, snowflake, sqlite, ssh, tableau, tabular, telegram,
teradata, trino, vertica, weaviate, yandex, zendesk

.. END PROVIDER EXTRAS HERE
Devel extras
.............
The ``devel`` extras are not available in the released packages. They are only available when you install
Airflow from sources in an ``editable`` installation - i.e. the one you usually use to contribute to
Airflow. They provide tools such as ``pytest`` and ``mypy`` for general purpose development and testing;
an example installation is sketched after the list below.
.. START DEVEL EXTRAS HERE

devel, devel-all-dbs, devel-ci, devel-debuggers, devel-devscripts, devel-duckdb, devel-hadoop,
devel-mypy, devel-sentry, devel-static-checks, devel-tests

.. END DEVEL EXTRAS HERE
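
A minimal sketch of such an installation (the exact combination of extras is just an illustration):

.. code-block:: bash

    # Editable installation from sources with devel extras, pinned with the
    # source-providers constraints (adjust the Python version in the URL).
    pip install -e ".[devel,devel-tests]" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.8.txt"
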
Bundle extras
.............
Those extras are bundles dynamically generated from other extras.
.. START BUNDLE EXTRAS HERE
all, all-core, all-dbs, devel-all, devel-ci
.. END BUNDLE EXTRAS HERE
Doc extras
...........
The ``doc`` extras are not available in the released packages. They are only available when you install
Airflow from sources in an ``editable`` installation - i.e. the one you usually use to contribute to
Airflow. They provide tools needed when you want to build Airflow documentation (note that you also need
the ``devel`` extras installed for airflow and providers in order to build documentation for airflow and
provider packages respectively). The ``doc`` extra is enough to build regular documentation, while
``doc-gen`` is needed to generate the ER diagram we have describing our database, as sketched after the
list below.
.. START DOC EXTRAS HERE

doc, doc-gen

.. END DOC EXTRAS HERE
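
For example, to be able to build both the regular documentation and the ER diagram (a sketch - the
combination of extras is illustrative):

.. code-block:: bash

    # Editable installation with devel plus both doc extras.
    pip install -e ".[devel,doc,doc-gen]"
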
Deprecated extras
.................
The ``deprecated`` extras are kept from Airflow 1 and will be removed in future versions.
.. START DEPRECATED EXTRAS HERE

atlas, aws, azure, cassandra, crypto, druid, gcp, gcp-api, hdfs, hive, kubernetes, mssql, pinot, s3,
spark, webhdfs, winrm

.. END DEPRECATED EXTRAS HERE
-----

You can now check how to update Airflow's `metadata database <13_metadata_database_updates.rst>`__ if you
need to update the structure of the DB.