INSTALL / BUILD instructions for Apache Airflow

Basic installation of Airflow from sources and development environment setup
============================================================================

This is a generic installation method that requires a minimum of standard tools to develop Airflow and
test it in a local virtual environment (using a standard CPython installation and `pip`).

Depending on your system you might need different prerequisites, but the following
systems/prerequisites are known to work:

Linux (Debian Bookworm):

    sudo apt install -y --no-install-recommends apt-transport-https apt-utils ca-certificates \
      curl dumb-init freetds-bin krb5-user libgeos-dev \
      ldap-utils libsasl2-2 libsasl2-modules libxmlsec1 locales libffi8 libldap-2.5-0 libssl3 netcat-openbsd \
      lsb-release openssh-client python3-selinux rsync sasl2-bin sqlite3 sudo unixodbc

You might need to install MariaDB development headers to build some of the dependencies:

    sudo apt-get install libmariadb-dev libmariadbclient-dev

On macOS (Mojave/Catalina) you might need to install the XCode command line tools, Homebrew, and these packages:

    brew install sqlite mysql postgresql

`pip` is one of the build/packaging front-ends that can be used to install Airflow. It is the one
that we recommend (see below) for reproducible installation of specific versions of Airflow.

As of version 2.8, Airflow follows PEP 517/518 and uses a `pyproject.toml` file to define build dependencies
and the build process, and it requires relatively modern versions of packaging tools to build Airflow from
local sources or sdist packages, as PEP 517 compliant build hooks are used to determine dynamic build
dependencies. For `pip` this means that at least version 22.1.0 (released in 2022) is needed to build or
install Airflow from sources. This does not affect the ability to install Airflow from released wheel packages.
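
If you are not sure which `pip` you have, you can check the version and upgrade it before building
from sources:

    pip --version
    python -m pip install --upgrade "pip>=22.1.0"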

Downloading and installing Airflow from sources
-----------------------------------------------

While you can get Airflow sources in various ways (including cloning https://github.com/apache/airflow/), the
canonical way to download them is to fetch the tarball published at https://downloads.apache.org, where you
can also verify the checksum and signatures of the downloaded file. You can then un-tar the sources and move
into the directory that was un-tarred.
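
For example, a download and verification session might look like the sketch below. The `2.8.1` version
is only an illustration - substitute the release you actually want, and note that the exact file names
and the un-tarred directory name may differ between releases:

    # Download the sources, checksum and signature (2.8.1 is an example version)
    curl -LO "https://downloads.apache.org/airflow/2.8.1/apache-airflow-2.8.1-source.tar.gz"
    curl -LO "https://downloads.apache.org/airflow/2.8.1/apache-airflow-2.8.1-source.tar.gz.sha512"
    curl -LO "https://downloads.apache.org/airflow/2.8.1/apache-airflow-2.8.1-source.tar.gz.asc"

    # Compute the checksum and compare it with the content of the .sha512 file
    shasum -a 512 apache-airflow-2.8.1-source.tar.gz

    # Verify the signature (requires the release keys from https://downloads.apache.org/airflow/KEYS)
    gpg --verify apache-airflow-2.8.1-source.tar.gz.asc apache-airflow-2.8.1-source.tar.gz

    # Un-tar and enter the source tree
    tar -xzf apache-airflow-2.8.1-source.tar.gz
    cd apache-airflow-2.8.1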

When you download source packages from https://downloads.apache.org, you download the sources of Airflow and
all providers separately. However, when you clone the GitHub repository at https://github.com/apache/airflow/
you get all sources in one place. This is the most convenient way to develop Airflow and Providers together;
otherwise you have to separately install Airflow and Providers from sources in the same environment, which
is not as convenient.
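
If you choose the GitHub route, the usual starting point is:

    git clone https://github.com/apache/airflow.git
    cd airflow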

## Creating virtualenv

Airflow pulls in quite a lot of dependencies in order to connect to other services. You generally want to
test or run Airflow from a virtualenv to make sure those dependencies are separated from your system-wide
versions. Using the system-installed Python is strongly discouraged, as the versions of Python shipped with
operating systems often have a number of limitations and are not up to date. It is recommended to install
Python using https://www.python.org/downloads/ or other tools. See below for a description of `Hatch`,
Airflow's tool of choice for building its packages, which can also manage Python versions and virtualenvs
for you.

Once you have a suitable Python version installed, you can create a virtualenv and activate it:

    python3 -m venv PATH_TO_YOUR_VENV
    source PATH_TO_YOUR_VENV/bin/activate

## Installing airflow locally

Installing Airflow locally can be done using `pip` - note that this will install the "development" version of
Airflow, where all providers are installed from local sources (if they are available), not from PyPI.
It will also not include the providers that are pre-installed when Airflow is installed from PyPI. If you
install from the sources of just Airflow, you need to install each provider you want to develop separately.
If you install from the cloned GitHub repository, all the current providers are available after installing
Airflow.

    pip install .

If you develop Airflow and iterate on it, you should install it in editable mode (with the `-e` flag); then
you do not need to re-install it after each change to the sources. This is useful if you want to develop and
iterate on Airflow and Providers together, which works when you install sources from the cloned GitHub
repository.

Note that you might want to install the `devel` extra when you install Airflow for development in an editable
environment, as it contains the minimum set of tools and dependencies needed to run unit tests.

    pip install -e ".[devel]"

You can also install optional packages that are needed to run certain tests. For example, in a local
installation you can install all prerequisites for the Google provider, tests, and
all Hadoop providers with this command:

    pip install -e ".[google,devel-tests,devel-hadoop]"

or you can install all packages needed to run tests for core, providers, and all extensions of Airflow:

    pip install -e ".[devel-all]"

You can see the list of all available extras below.

# Using Hatch to manage your Python, virtualenvs and build packages

Airflow uses [hatch](https://hatch.pypa.io/) as its build and development tool of choice. It is one of the
popular build tools and environment managers for Python, maintained by the Python Packaging Authority.
It is an optional tool that is only really needed when you want to build packages from sources, but
it is also very convenient for managing your Python versions and virtualenvs.

The Airflow project contains some pre-defined virtualenv definitions in ``pyproject.toml`` that can be
easily used by hatch to create your local venvs. This is not necessary for you to develop and test
Airflow, but it is a convenient way to manage your local Python versions and virtualenvs.

Installing Hatch
----------------

You can install hatch in various ways (including GUI installers).

Example using `pipx`:

    pipx install hatch

We recommend using `pipx`, as you can easily manage installed Python apps with it and later use it
to upgrade `hatch` as needed with:

    pipx upgrade hatch

## Using Hatch to manage your Python versions

You can also use hatch to install and manage Airflow virtualenvs and development
environments. For example, you can install Python 3.10 with this command:

    hatch python install 3.10

or install all Python versions that are used in Airflow:

    hatch python install all
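
The same `hatch python` command group also offers a `show` sub-command, so you can list the
interpreters hatch manages with:

    hatch python show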

## Using Hatch to manage your virtualenvs

Airflow has some pre-defined virtualenvs that you can use to develop and test Airflow.
You can see the list of available envs with:

    hatch env show

This is what it shows currently:

┏━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name        ┃ Type    ┃ Description                                                   ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ default     │ virtual │ Default environment with Python 3.8 for maximum compatibility │
├─────────────┼─────────┼───────────────────────────────────────────────────────────────┤
│ airflow-38  │ virtual │ Environment with Python 3.8. No devel installed.              │
├─────────────┼─────────┼───────────────────────────────────────────────────────────────┤
│ airflow-39  │ virtual │ Environment with Python 3.9. No devel installed.              │
├─────────────┼─────────┼───────────────────────────────────────────────────────────────┤
│ airflow-310 │ virtual │ Environment with Python 3.10. No devel installed.             │
├─────────────┼─────────┼───────────────────────────────────────────────────────────────┤
│ airflow-311 │ virtual │ Environment with Python 3.11. No devel installed.             │
├─────────────┼─────────┼───────────────────────────────────────────────────────────────┤
│ airflow-312 │ virtual │ Environment with Python 3.12. No devel installed.             │
└─────────────┴─────────┴───────────────────────────────────────────────────────────────┘

The default env (if you have not selected one explicitly) is `default`, and it is a Python 3.8
virtualenv for maximum compatibility, with the `devel` extra installed - this devel extra contains the
minimum set of dependencies and tools that should be used during unit testing of core Airflow and running
all `airflow` CLI commands - without support for providers or databases.
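
To enter the default environment, you can simply run `hatch shell` without selecting an environment
explicitly:

    hatch shell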

The other environments are just bare-bones Python virtualenvs with Airflow core requirements only,
without any extras installed and without any tools. They are much faster to create than the default
environment, and you can manually install either the appropriate extras or directly the tools that you
need for testing or development.

You can create the default environment with:

    hatch env create

You can create a specific environment by passing its name to the create command:

    hatch env create airflow-310

You can install extras in the environment by running a pip command:

    hatch -e airflow-310 run -- pip install -e ".[devel,google]"

And you can enter the environment by running a shell of your choice (for example zsh), where you
can run any commands:

    hatch -e airflow-310 shell

Once you are in the environment (usually indicated by an updated prompt), you can just install
the extra dependencies you need:

    [~/airflow] [airflow-310] pip install -e ".[devel,google]"

You can exit the environment by just exiting the shell.

You can also see where hatch created the virtualenv and use it in your IDE or activate it manually:

    hatch env find airflow-310

You will get a path similar to:

    /Users/jarek/Library/Application Support/hatch/env/virtual/apache-airflow/TReRdyYt/apache-airflow

You will then find the `python` binary and the `activate` script in the `bin` sub-folder of this directory,
and you can configure your IDE to use this virtualenv if you want to work in that environment there.
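
For example, on Linux or macOS you can activate the environment manually in your current shell by
combining the two:

    source "$(hatch env find airflow-310)/bin/activate"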

You can also set the default environment name with the HATCH_ENV environment variable.
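
For example:

    export HATCH_ENV=airflow-310
    hatch shell   # now enters airflow-310 instead of the default environment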

You can clean all the envs by running:

    hatch env prune
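
or remove a single environment with:

    hatch env remove airflow-310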

More information about hatch can be found at https://hatch.pypa.io/1.9/environment/

## Using Hatch to build your packages

You can use hatch to build an installable package from the Airflow sources. Such a package will
include all the metadata that is configured in `pyproject.toml` and will be installable with `pip`.

The built packages will include dependencies on the providers that are always pre-installed when
Airflow is installed from PyPI. By default, both `wheel` and `sdist` packages are built.

    hatch build

You can also build only `wheel` or `sdist` packages:

    hatch build -t wheel
    hatch build -t sdist
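
By default, hatch places the built packages in the `dist` directory, so you can inspect the result with:

    ls dist/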

## Installing recommended version of dependencies

Whatever virtualenv solution you use, when you want to make sure you are using the same
versions of dependencies as in main, you can install the recommended versions of the dependencies by using
constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt files as a `constraint` file. This might be useful
to avoid the "works-for-me" syndrome, where you use different versions of dependencies than the ones
used in main, in CI tests, and by other contributors.

There are different constraint files for different Python versions. For example, this command will install
all basic devel requirements and the requirements of the Google provider as last successfully tested for
Python 3.8:

    pip install -e ".[devel,google]" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.8.txt"

You can upgrade just Airflow, without paying attention to the providers' dependencies, by using
the 'constraints-no-providers' constraint files. This allows you to keep the installed provider dependencies
and upgrade just the core Airflow dependencies to the latest supported ones.

    pip install -e ".[devel]" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.8.txt"

Airflow extras
==============

Airflow has a number of extras that you can install to get additional dependencies. Some of them install
providers, while others enable features whose packages are not installed by default.

You can read more about those extras in the extras reference:
https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html

The list of available extras is below.

Core extras
-----------

Those extras are available as regular core Airflow extras - they install optional features of Airflow.

# START CORE EXTRAS HERE

aiobotocore, apache-atlas, apache-webhdfs, async, cgroups, cloudpickle, deprecated-api,
github-enterprise, google-auth, graphviz, kerberos, ldap, leveldb, otel, pandas, password, pydantic,
rabbitmq, s3fs, saml, sentry, statsd, uv, virtualenv

# END CORE EXTRAS HERE

Provider extras
---------------

Those extras are available as regular Airflow extras; they install provider packages in standard builds,
or the dependencies that are necessary to enable the feature in an editable build.

# START PROVIDER EXTRAS HERE

airbyte, alibaba, amazon, apache.beam, apache.cassandra, apache.drill, apache.druid, apache.flink,
apache.hdfs, apache.hive, apache.impala, apache.kafka, apache.kylin, apache.livy, apache.pig,
apache.pinot, apache.spark, apprise, arangodb, asana, atlassian.jira, celery, cloudant,
cncf.kubernetes, cohere, common.io, common.sql, databricks, datadog, dbt.cloud, dingding, discord,
docker, elasticsearch, exasol, fab, facebook, ftp, github, google, grpc, hashicorp, http, imap,
influxdb, jdbc, jenkins, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo,
mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, pagerduty,
papermill, pgvector, pinecone, postgres, presto, qdrant, redis, salesforce, samba, segment,
sendgrid, sftp, singularity, slack, smtp, snowflake, sqlite, ssh, tableau, tabular, telegram,
teradata, trino, vertica, weaviate, yandex, zendesk

# END PROVIDER EXTRAS HERE

Devel extras
------------

The `devel` extras are not available in the released packages. They are only available when you install
Airflow from sources in an `editable` installation - i.e. the one you usually use when contributing to
Airflow. They provide tools such as `pytest` and `mypy` for general purpose development and testing.

# START DEVEL EXTRAS HERE

devel, devel-all-dbs, devel-ci, devel-debuggers, devel-devscripts, devel-duckdb, devel-hadoop,
devel-mypy, devel-sentry, devel-static-checks, devel-tests

# END DEVEL EXTRAS HERE

Bundle extras
-------------

Those extras are bundles dynamically generated from other extras.

# START BUNDLE EXTRAS HERE

all, all-core, all-dbs, devel-all, devel-ci

# END BUNDLE EXTRAS HERE

Doc extras
----------

Doc extras are used to install the dependencies that are needed to build documentation. They are only
available during editable installs.

# START DOC EXTRAS HERE

doc, doc-gen

# END DOC EXTRAS HERE

Deprecated extras
-----------------

The `deprecated` extras are deprecated extras from Airflow 1 that will be removed in future versions.

# START DEPRECATED EXTRAS HERE

atlas, aws, azure, cassandra, crypto, druid, gcp, gcp-api, hdfs, hive, kubernetes, mssql, pinot, s3,
spark, webhdfs, winrm

# END DEPRECATED EXTRAS HERE

Compiling front end assets
--------------------------

Sometimes you may see that the front-end assets are missing and the web UI looks broken. This happens when
the front-end assets have not been compiled. Compilation is done automatically when you create a virtualenv
with hatch, but if you want to do it manually, you can do it after installing node and yarn by running:

    yarn install --frozen-lockfile
    yarn run build

Currently we are running yarn with node 18.6.0, but you should check the node version in
our `.pre-commit-config.yaml` file.
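
For example, a quick (if unofficial) way to locate the pinned version, assuming the file mentions
node explicitly, is:

    grep -n "node" .pre-commit-config.yaml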

Installing yarn is described at https://classic.yarnpkg.com/en/docs/install

Also, in case you use `breeze` or have `pre-commit` installed, you can build the assets with:

    pre-commit run --hook-stage manual compile-www-assets --all-files

or

    breeze compile-www-assets

Both commands will install node and yarn, if needed, into a dedicated pre-commit node environment and
then build the assets.

Finally, you can also clean and recompile assets with the ``custom`` build target when running hatch build:

    hatch build -t custom -t wheel -t sdist

This will also update the `git_version` file in the airflow package, which should contain the git commit
hash of the build. It is used to display the commit hash in the UI.
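
After a build from a git checkout, you can inspect the recorded hash (the `airflow/git_version` path
assumes you run this from the repository root):

    cat airflow/git_version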