| |
| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| Provider packages |
| ----------------- |
| |
| Apache Airflow 2 is built in a modular way. The "Core" of Apache Airflow provides core scheduler |
| functionality which allows you to write some basic tasks, but the capabilities of Apache Airflow can |
| be extended by installing additional packages, called ``providers``. |
| |
| Providers can contain operators, hooks, sensors, and transfer operators to communicate with a |
| multitude of external systems, but they can also extend Airflow core with new capabilities. |
| |
| You can install those provider packages separately in order to interface with a given service. The providers |
| for ``Apache Airflow`` are designed in such a way that you can write your own providers easily. The |
| ``Apache Airflow Community`` develops and maintains more than 60 provider packages, but you are free to |
| develop your own providers - the providers you build have exactly the same capabilities as the providers |
| written by the community, so you can release and share those providers with others. |
| |
| The full list of community managed providers is available at |
| `Providers Index <https://airflow.apache.org/docs/#providers-packages-docs-apache-airflow-providers-index-html>`_. |
| |
| You can also see an index of all community providers' operators and hooks in |
| :doc:`/operators-and-hooks-ref/index`. |
| |
| Extending Airflow core functionality |
| ------------------------------------ |
| |
| Providers give you the capability of extending core Airflow with extra capabilities. The Airflow core |
| provides basic and solid scheduling functionality, and the providers extend its capabilities. Here we |
| describe all the custom capabilities. |
| |
| Airflow automatically discovers which providers add those additional capabilities and, once you install |
| a provider package and restart Airflow, those capabilities become automatically available to Airflow users. |
| |
| The summary of the core functionalities that can be extended is available in |
| :doc:`/core-extensions/index`. |
| |
| Auth backends |
| ''''''''''''' |
| |
| The providers can add custom authentication backends, which allow you to configure how your |
| web server authenticates your users, integrating it with public or private authentication services. |
| |
| You can see all the authentication backends available via community-managed providers in |
| :doc:`/core-extensions/auth-backends`. |
| |
| Custom connections |
| '''''''''''''''''' |
| |
| The providers can add custom connection types, extending connection form and handling custom form field |
| behaviour for the connections defined by the provider. |
| |
| You can see all custom connections available via community-managed providers in |
| :doc:`/core-extensions/connections`. |
| |
| Extra links |
| ''''''''''' |
| |
| The providers can add extra custom links to operators delivered by the provider. Those links are visible in |
| the task details view. |
| |
| You can see all the extra links available via community-managed providers in |
| :doc:`/core-extensions/extra-links`. |
| |
| |
| Logging |
| ''''''' |
| |
| The providers can add additional task logging capabilities. By default ``Apache Airflow`` saves logs for |
| tasks locally and makes them available to the Airflow UI via an internal HTTP server. Via providers, however, |
| you can add extra logging capabilities, where Airflow logs can be written to a remote service and |
| retrieved from that service. |
| |
| You can see all task loggers available via community-managed providers in |
| :doc:`/core-extensions/logging`. |
| |
| |
| Secret backends |
| ''''''''''''''' |
| |
| Airflow has the capability of reading connections, variables and configuration from Secret Backends rather |
| than from its own Database. |
| |
| You can see all secret backends available via community-managed providers in |
| :doc:`/core-extensions/secrets-backends`. |
| |
| |
| Installing and upgrading providers |
| ---------------------------------- |
| |
| Separate provider packages offer possibilities that were not available in Airflow 1.10: |
| |
| 1. You can upgrade to the latest version of particular providers without needing to upgrade Apache Airflow core. |
| |
| 2. You can downgrade to a previous version of a particular provider in case the new version introduces |
|    some problems, without impacting the main Apache Airflow core package. |
| |
| 3. You can release and upgrade/downgrade provider packages incrementally, independently of each other. This |
|    means that you can incrementally validate each provider package update in your environment, |
|    following the usual tests you have in your environment. |
| |
| |
| Types of providers |
| ------------------ |
| |
| Providers have the same capabilities - no matter if they are provided by the community or if they are |
| third-party providers. This chapter explains how community managed providers are versioned and released |
| and how you can create your own providers. |
| |
| Community maintained providers |
| '''''''''''''''''''''''''''''' |
| |
| From the point of view of the community, Airflow is delivered in multiple, separate packages. |
| The core of the Airflow scheduling system is delivered as the ``apache-airflow`` package and there are more than |
| 60 provider packages which can be installed separately as so-called ``Airflow Provider packages``. |
| Those packages are available as ``apache-airflow-providers`` packages (for example, there is an |
| ``apache-airflow-providers-amazon`` and an ``apache-airflow-providers-google`` package). |
| |
| Community maintained providers are released and versioned separately from the Airflow releases. We are |
| following the `Semver <https://semver.org/>`_ versioning scheme for the packages. Some versions of the |
| provider packages might depend on particular versions of Airflow, but the general approach we have is that |
| unless there is a good reason, new versions of providers should work with recent versions of Airflow 2.x. |
| Details will vary per provider. If a particular version of a particular provider constrains the |
| Airflow version that can be used, the constraint is expressed as a dependency limitation in the provider |
| package. |
| |
| Each community provider has a corresponding extra which can be used when installing Airflow to install the |
| provider together with ``Apache Airflow``. For example, you can install Airflow with these extras: |
| ``apache-airflow[google,amazon]`` (with correct constraints - see :doc:`apache-airflow:installation/index`) and you |
| will install the appropriate versions of the ``apache-airflow-providers-amazon`` and |
| ``apache-airflow-providers-google`` packages together with ``Apache Airflow``. |
| |
| Some of the community providers have cross-provider dependencies as well. Those are not required |
| dependencies; they might simply enable certain features (for example, transfer operators often create |
| a dependency between different providers). Again, the general approach here is that the providers are backwards |
| compatible, including cross-dependencies. Any breaking changes and requirements on particular versions of other |
| provider packages are automatically documented in the release notes of every provider. |
| |
| .. note:: |
|     For Airflow 1.10 we also provided ``apache-airflow-backport-providers`` packages that could be installed |
|     with those versions. Those were the same providers as for 2.0 but automatically back-ported to work for |
|     Airflow 1.10. The last release of backport providers was done on March 17, 2021 and the backport |
|     providers will no longer be released, since Airflow 1.10 has reached End-Of-Life as of June 17, 2021. |
| |
| If you want to contribute to ``Apache Airflow``, you can see how to build and extend community |
| managed providers in :doc:`howto/create-update-providers`. |
| |
| .. _providers:community-maintained-providers: |
| |
| Custom provider packages |
| '''''''''''''''''''''''' |
| |
| You can develop and release your own providers. Your custom operators, hooks, sensors, transfer operators |
| can be packaged together in a standard airflow package and installed using the same mechanisms. |
| Moreover they can also use the same mechanisms to extend the Airflow Core with auth backends, |
| custom connections, logging, secret backends and extra operator links as described in the previous chapter. |
| |
| How to create your own provider |
| ------------------------------- |
| |
| As mentioned in the `Providers <http://airflow.apache.org/docs/apache-airflow-providers/index.html>`_ |
| documentation, custom providers can extend Airflow core - they can add extra links to operators as well |
| as custom connections. You can build your own providers and install them as packages if you would like |
| to use the mechanism for your own, custom providers. |
| |
| Adding a provider to Airflow is just a matter of building a Python package and adding the right meta-data to |
| the package. We use the standard Python mechanism of |
| `entry points <https://docs.python.org/3/library/importlib.metadata.html#entry-points>`_. Your package |
| needs to define an ``apache_airflow_provider`` entry point which has to point to a callable |
| implemented by your package that returns a dictionary containing the list of discoverable capabilities |
| of your package. The dictionary has to follow the |
| `json-schema specification <https://github.com/apache/airflow/blob/main/airflow/provider_info.schema.json>`_. |
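
As a rough illustration of the mechanism described above, the following sketch uses the standard library's
``importlib.metadata`` to collect the provider info of every installed package that registers an
``apache_airflow_provider`` entry point. The helper name ``discover_provider_info`` is hypothetical and is not
part of Airflow, and Airflow's own discovery code may do more than this (for example, validating the result).

.. code-block:: python

    from importlib.metadata import entry_points


    def discover_provider_info(group="apache_airflow_provider"):
        """Return {package-name: provider info dict} for installed providers."""
        all_entry_points = entry_points()
        if hasattr(all_entry_points, "select"):
            # Python 3.10+: entry points can be filtered with select()
            selected = all_entry_points.select(group=group)
        else:
            # Python 3.8/3.9: entry_points() returns a dict keyed by group name
            selected = all_entry_points.get(group, [])

        discovered = {}
        for entry_point in selected:
            # load() imports the module and returns the callable it points to
            get_provider_info = entry_point.load()
            info = get_provider_info()
            discovered[info["package-name"]] = info
        return discovered


    if __name__ == "__main__":
        for package_name in sorted(discover_provider_info()):
            print(package_name)

Running the module in an environment without any providers prints nothing, because no package registers
an entry point in that group.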
| |
| Most of the schema provides extension points for the documentation (which you might also want to use for |
| your own purposes), but the important fields from the extensibility point of view are the following: |
| |
| Displaying package information in CLI/API: |
| |
| * ``package-name`` - Name of the package for the provider. |
| |
| * ``name`` - Human-friendly name of the provider. |
| |
| * ``description`` - Additional description of the provider. |
| |
| * ``versions`` - List of versions of the package (in reverse-chronological order). The first version in the |
|   list is the current package version. It is taken from the version of the installed package, not from the |
|   ``provider_info`` information. |
| |
| Exposing customized functionality to the Airflow's core: |
| |
| * ``extra-links`` - this field should contain the list of all operator class names that add extra links |
|   capability. See :doc:`apache-airflow:howto/define_extra_link` for a description of how to add extra link |
|   capability to your operators. |
| |
| * ``connection-types`` - this field should contain the list of all connection types together with hook |
|   class names implementing those custom connection types (providing custom extra fields and |
|   custom field behaviour). This field is available as of Airflow 2.2.0 and it replaces the deprecated |
|   ``hook-class-names``. See :doc:`apache-airflow:howto/connection` for more details. |
| |
| * ``hook-class-names`` (deprecated) - this field should contain the list of all hook class names that provide |
|   custom connection types with custom extra fields and field behaviour. The ``hook-class-names`` array |
|   is deprecated as of Airflow 2.2.0 (for optimization reasons) and will be removed in Airflow 3. If your |
|   providers are targeting Airflow 2.2.0+ you do not have to include the ``hook-class-names`` array. If |
|   you also want to target earlier versions of Airflow 2, you should include both the ``hook-class-names`` and |
|   ``connection-types`` arrays. See :doc:`apache-airflow:howto/connection` for more details. |
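
To make the fields above more concrete, here is a minimal sketch of a provider info callable that fills in
both the ``connection-types`` array and the deprecated ``hook-class-names`` array. The package, class and
connection type names (``myproviderpackage``, ``SourceHook``, ``SourceLink``, ``source``) are hypothetical,
and the ``connection-type`` / ``hook-class-name`` keys are written as we understand the schema linked above,
which remains the authoritative definition.

.. code-block:: python

    def get_provider_info():
        """Return the metadata dictionary that describes this (hypothetical) provider."""
        return {
            "package-name": "my-package-name",
            "name": "My Provider",
            "description": "Hooks and operators for the hypothetical Source service.",
            # Operator extra link classes added by this provider (hypothetical class).
            "extra-links": ["myproviderpackage.links.source.SourceLink"],
            # Preferred since Airflow 2.2.0: the connection type plus the hook implementing it.
            "connection-types": [
                {
                    "connection-type": "source",
                    "hook-class-name": "myproviderpackage.hooks.source.SourceHook",
                },
            ],
            # Deprecated: only needed when also targeting Airflow versions before 2.2.0.
            "hook-class-names": ["myproviderpackage.hooks.source.SourceHook"],
        }

Listing the same hook in both arrays is what lets one package work before and after Airflow 2.2.0.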
| |
| |
| When your providers are installed you can query the installed providers and their capabilities with the |
| ``airflow providers`` command. This way you can verify if your providers are properly recognized and whether |
| they define the extensions properly. See :doc:`apache-airflow:cli-and-env-variables-ref` for details of available CLI |
| sub-commands. |
| |
| When you write your own provider, consider following the |
| `Naming conventions for provider packages <https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#naming-conventions-for-provider-packages>`_. |
| |
| |
| FAQ for Airflow and Providers |
| ----------------------------- |
| |
| Upgrading Airflow 2.0 and Providers |
| ''''''''''''''''''''''''''''''''''' |
| |
| **When upgrading to a new Airflow version such as 2.0, but possibly 2.0.1 and beyond, is the best practice |
| to also upgrade provider packages at the same time?** |
| |
| It depends on your use case. If you have automated or semi-automated verification of your installation |
| that can confirm that a new version of Airflow runs correctly with all provider packages, then definitely go for it. |
| If you rely more on manual testing, it is advised that you upgrade in stages. Depending on your choice |
| you can either upgrade all used provider packages first and then upgrade Airflow Core, or the other way |
| round. The first approach, upgrading all providers first, is probably safer, as you can do it |
| incrementally, replacing provider by provider in your environment. |
| |
| Customizing Provider Packages |
| ''''''''''''''''''''''''''''' |
| |
| **I have an older version of my provider package which we have lightly customized and is working |
| fine with my MSSQL installation. I am upgrading my Airflow version. Do I need to upgrade my provider, |
| or can I keep it as it is?** |
| |
| It depends on the scope of customization. There is no need to upgrade the provider packages to later |
| versions unless you want to upgrade to an Airflow version that introduces backwards-incompatible changes. |
| Generally speaking, with Airflow 2 we are following the `Semver <https://semver.org/>`_ approach where |
| we introduce backwards-incompatible changes only in major releases, so all your modifications (as long |
| as you have not used internal Airflow classes) should work for all Airflow 2.* versions. |
| |
| |
| Creating your own providers |
| ''''''''''''''''''''''''''' |
| |
| **When I write my own provider, do I need to do anything special to make it available to others?** |
| |
| You do not need to do anything special besides creating the ``apache_airflow_provider`` entry point |
| returning properly formatted meta-data: a dictionary with ``extra-links`` and ``connection-types`` fields |
| (and the deprecated ``hook-class-names`` field if you are also targeting versions of Airflow before 2.2.0). |
| |
| Anyone who runs airflow in an environment that has your Python package installed will be able to use the |
| package as a provider package. |
| |
| **What do I need to do to turn a package into a provider?** |
| |
| You need to do the following to turn an existing Python package into a provider (see below for examples): |
| |
| * Add the ``apache_airflow_provider`` entry point in the ``setup.cfg`` - this tells Airflow where to get |
|   the required provider metadata. |
| * Create the function that you refer to in the first step as part of your package: this function returns a |
|   dictionary that contains all meta-data about your provider package. |
| * If you want Airflow to link to the documentation of your provider in the providers page, make sure |
|   to add "project-url/documentation" `metadata <https://peps.python.org/pep-0621/#example>`_ to your package. |
|   This will also add a link to your documentation in PyPI. |
| * Note that the dictionary should be compliant with the ``airflow/provider_info.schema.json`` JSON-schema |
|   specification. The community-managed providers have more fields there that are used to build |
|   documentation, but the runtime information only requires the few fields which are defined |
|   in the schema: |
| |
| .. exampleinclude:: /../../airflow/provider_info.schema.json |
|     :language: json |
| |
| Example ``setup.cfg``: |
| |
| .. code-block:: cfg |
| |
|     [options.entry_points] |
|     # the function get_provider_info is defined in myproviderpackage.somemodule |
|     apache_airflow_provider= |
|       provider_info=myproviderpackage.somemodule:get_provider_info |
| |
| Example ``myproviderpackage/somemodule.py``: |
| |
| .. code-block:: Python |
| |
|     def get_provider_info(): |
|         return { |
|             "package-name": "my-package-name", |
|             "name": "name", |
|             "description": "a description", |
|             "hook-class-names": [ |
|                 "myproviderpackage.hooks.source.SourceHook", |
|             ], |
|         } |
| |
| **How do provider packages work under the hood?** |
| |
| When running Airflow with your provider package, there will be (at least) three components to your airflow installation: |
| |
| * The installation itself (for example, a ``venv`` where you installed airflow with ``pip install apache-airflow``) |
| together with the related files (e.g. ``dags`` folder) |
| * The ``apache-airflow`` package |
| * Your own ``myproviderpackage`` package that is independent of ``apache-airflow`` or your airflow installation, which |
| can be a local Python package (that you install via ``pip install -e /path/to/my-package``), a normal pip package |
| (``pip install myproviderpackage``), or any other type of Python package |
| |
| In the ``myproviderpackage`` package you need to add the entry point and provide the appropriate metadata as described above. |
| If you have done that, airflow does the following at runtime: |
| |
| * Loop through ALL packages installed in your environment / ``venv`` |
| * For each package, check whether its installed metadata declares an entry point in the ``apache_airflow_provider`` group |
|   (this is what the ``[options.entry_points]`` section of ``setup.cfg`` produces when the package is built), and if it does, |
|   get the value for ``provider_info``, e.g. ``myproviderpackage.somemodule:get_provider_info`` |
| * That value works like an import statement: ``myproviderpackage.somemodule:get_provider_info`` translates to something like |
|   ``from myproviderpackage.somemodule import get_provider_info``, and the ``get_provider_info`` that is being imported should be a |
|   callable, i.e. a function |
| * This function should return a dictionary with metadata |
| * If you have custom connection types as part of your package, that metadata will include a field called ``hook-class-names`` |
|   which should be a list of strings naming your custom hooks. Those strings should also be in an import-like format, e.g. |
|   ``myproviderpackage.hooks.source.SourceHook`` means that there is a class ``SourceHook`` in ``myproviderpackage/hooks/source.py``. |
|   Airflow then imports these hooks and looks for the functions ``get_ui_field_behaviour`` and ``get_connection_form_widgets`` |
|   (both optional) as well as the attributes ``conn_type`` and ``hook_name`` to create the custom connection type in the Airflow UI |
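
The two import-like string formats mentioned in the steps above can be resolved with a few lines of standard
library code. This is only a sketch of the idea, not Airflow's implementation: the helper names
``load_entry_point_value`` and ``load_dotted_path`` are hypothetical, and standard-library names stand in for
a real provider so that the example runs anywhere.

.. code-block:: python

    from importlib import import_module


    def load_entry_point_value(value):
        """Resolve 'package.module:attribute', the format of an entry point value."""
        module_path, _, attribute = value.partition(":")
        return getattr(import_module(module_path), attribute)


    def load_dotted_path(dotted_path):
        """Resolve 'package.module.ClassName', the format used for hook class names."""
        module_path, _, class_name = dotted_path.rpartition(".")
        return getattr(import_module(module_path), class_name)


    # Equivalent to "from json import dumps"
    dumps = load_entry_point_value("json:dumps")
    # Equivalent to "from collections import OrderedDict"
    ordered_dict_class = load_dotted_path("collections.OrderedDict")

With a real provider installed, ``load_entry_point_value("myproviderpackage.somemodule:get_provider_info")``
would return the callable whose result is the metadata dictionary.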
| |
| **Should I name my provider specifically or should it be created in ``airflow.providers`` package?** |
| |
| We have quite a number (>60) of providers managed by the community and we are going to maintain them |
| together with Apache Airflow. All those providers have a well-defined structure and follow the |
| naming conventions we defined, and they are all in the ``airflow.providers`` package. If your intention is |
| to contribute your provider, then you should follow those conventions and make a PR to Apache Airflow |
| to contribute it. But you are free to use any package name as long as there are no conflicts with other |
| names, so preferably choose a package name that is in your "domain". |
| |
| **Is there a convention for a connection id and type?** |
| |
| Very good question. Glad that you asked. We usually follow the convention ``<NAME>_default`` for the connection |
| id and just ``<NAME>`` for the connection type. A few examples: |
| |
| * ``google_cloud_default`` id and ``google_cloud_platform`` type |
| * ``aws_default`` id and ``aws`` type |
| |
| You should follow this convention. It is important to use a connection type name |
| that is unique to your provider. If two providers try to add a connection with the same type, |
| only one of them will succeed. |
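
A hook following this convention might declare its names as in the sketch below. ``SourceHook`` and the
``source`` connection type are hypothetical, and the class is a plain stand-in so that the example is
self-contained; a real hook would subclass Airflow's ``BaseHook``. The keys returned by
``get_ui_field_behaviour`` are assumptions about what the connection form expects, so consult
:doc:`apache-airflow:howto/connection` for the authoritative description.

.. code-block:: python

    class SourceHook:
        """Plain-class stand-in for a hook that defines a custom connection type."""

        conn_name_attr = "source_conn_id"
        default_conn_name = "source_default"  # connection id: <NAME>_default
        conn_type = "source"  # connection type: <NAME>, unique to this provider
        hook_name = "Source"  # label shown for the connection type in the UI

        @staticmethod
        def get_ui_field_behaviour():
            """Describe which standard form fields to hide, relabel or hint."""
            return {
                "hidden_fields": ["port", "schema"],
                "relabeling": {"login": "API key ID", "password": "API key secret"},
                "placeholders": {"host": "https://source.example.com"},
            }

Deriving the default connection id from the connection type keeps the two names easy to match up when
several providers are installed side by side.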
| |
| **Can I contribute my own provider to Apache Airflow?** |
| |
| Of course, but it's better to check on the developers' mailing list whether such a contribution will be accepted by |
| the Community before investing time to make the provider compliant with community requirements. |
| The Community only accepts providers that are generic enough, well documented, fully covered by tests |
| and capable of being tested by people in the community. So we might not always be in a |
| position to accept such contributions. |
| |
| |
| Once you think that your provider meets the expectations above, you can read |
| :doc:`howto/create-update-providers` to check all prerequisites for a new |
| community Provider and discuss it at the `Devlist <http://airflow.apache.org/community/>`_. |
| |
| However, in case you have your own, specific provider, which you can maintain on your own or by your |
| team, you are free to publish the providers in whatever form you find appropriate. The custom and |
| community-managed providers have exactly the same capabilities. |
| |
| **Can I advertise my own provider to Apache Airflow users and share it with others as package in PyPI?** |
| |
| Absolutely! We have an `Ecosystem <https://airflow.apache.org/ecosystem/>`_ area on our website where |
| we share non-community managed extensions and work for Airflow. Feel free to make a PR to the page; |
| we will evaluate and merge it when we see that such a provider can be useful for the community of |
| Airflow users. |
| |
| **Can I charge for the use of my provider?** |
| |
| This is something that is outside of our control and domain. As an Apache project, we are |
| commercial-friendly and there are many businesses built around Apache Airflow and many other |
| Apache projects. As a community, we provide all the software for free and this will never |
| change. What 3rd-party developers do is not under the control of the Apache Airflow community. |
| |
| Using Backport Providers in Airflow 1.10 |
| '''''''''''''''''''''''''''''''''''''''' |
| |
| **I have an Airflow version (1.10.12) running and it is stable. However, because of a Cloud provider change, |
| I would like to upgrade the provider package. If I don't need to upgrade the Airflow version anymore, |
| how do I know that this provider version is compatible with my Airflow version?** |
| |
| We have Backport Providers that are compatible with 1.10, but they stopped being released on |
| March 17, 2021. Since then, no new changes to providers for Airflow 2.0 are going to be |
| released as backport packages. It is high time to upgrade to Airflow 2.0. |
| |
| When it comes to compatibility of providers with different Airflow 2 versions, each |
| provider package will keep its own dependencies, and while we expect those providers to be generally |
| backwards-compatible, particular versions of particular providers might introduce dependencies on |
| specific Airflow versions. |
| |
| |
| .. toctree:: |
|     :hidden: |
|     :maxdepth: 2 |
| |
|     Providers <self> |
|     Packages <packages-ref> |
|     Operators and hooks <operators-and-hooks-ref/index> |
|     Core Extensions <core-extensions/index> |
|     Update community providers <howto/create-update-providers> |
|     Installing from sources <installing-from-sources> |
|     Installing from PyPI <installing-from-pypi> |