This year’s survey has come and gone, and with it we’ve got a new batch of data for everyone! We collected 210 responses over two weeks. We continue to see growth in both contributions and downloads over the last two years, and expect that trend will continue through 2022.
The raw response data will be made available here soon. In the meantime, feel free to email john.thomas@astronomer.io for a copy.
Answer | No. | % |
Data Engineer | 114 | 54% |
Solutions Architect | 27 | 13% |
Developer | 25 | 12% |
DevOps | 12 | 6% |
Data Scientist | 8 | 4% |
Support Engineer | 5 | 2% |
Data Analyst | 3 | 1% |
Business Analyst | 2 | 1% |
Other | 14 | 7% |
According to the survey, more than half of Airflow users are Data Engineers (54%). The remaining respondents break down mainly into Solutions Architects (13%), Developers (12%), DevOps (6%) and Data Scientists (4%). The 2022 results are similar to those from 2019 and 2020, with a slight increase in the share of Solutions Architects.
Answer | No. | % |
Every day | 154 | 73% |
At least once per week | 36 | 17% |
At least once per month | 11 | 5% |
Less than once per month | 9 | 4% |
Users who took the survey actively use Airflow as part of their current role: 73% of respondents use it every day and 17% at least once a week.
Answer | No. | % |
201-5000 | 85 | 41% |
5000+ | 49 | 23% |
51-200 | 46 | 22% |
11-50 | 20 | 10% |
1-10 | 9 | 4% |
Airflow is particularly popular in larger companies: 64% of respondents (compared to 52.7% in 2020) work for companies with more than 200 employees (41% in companies of 201-5000 employees and 23% in companies of 5000+).
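For readers who want to check the aggregate, the 64% figure is simply the sum of the two largest company-size buckets. A minimal Python sketch that recomputes it from the table (the variable names are ours, not part of any survey tooling):

```python
# Company-size answers copied from the table above: bucket -> (responses, share in %).
company_size = {
    "201-5000": (85, 41),
    "5000+": (49, 23),
    "51-200": (46, 22),
    "11-50": (20, 10),
    "1-10": (9, 4),
}

total = sum(n for n, _ in company_size.values())  # respondents who answered this question
over_200 = ("201-5000", "5000+")

# Share of respondents in companies with more than 200 employees,
# once from the raw counts and once from the rounded percentages.
share_from_counts = 100 * sum(company_size[b][0] for b in over_200) / total
share_from_pct = sum(company_size[b][1] for b in over_200)

print(total, round(share_from_counts, 1), share_from_pct)
```

Summing raw counts and summing rounded percentages can differ by a few tenths of a point, which is why both are computed.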
Answer | No. | % |
6-20 | 80 | 38% |
1-5 | 61 | 29% |
51-200 | 49 | 24% |
200+ | 18 | 9% |
Airflow is generally used by small to medium-sized teams. 62% of survey participants have between 6 and 200 Airflow users in their company (38% have between 6 and 20 users, 24% between 51 and 200 users).
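The same bucket arithmetic applies to team size. The sketch below (with a hypothetical helper `share`) recomputes the 62% figure and, for comparison, the share of companies with six or more Airflow users once the 200+ bucket is included as well:

```python
# Number of Airflow users per company, copied from the table above: bucket -> responses.
team_size = {"6-20": 80, "1-5": 61, "51-200": 49, "200+": 18}
total = sum(team_size.values())


def share(*buckets: str) -> float:
    """Percentage of respondents whose answer falls in any of the given buckets."""
    return 100 * sum(team_size[b] for b in buckets) / total


six_to_200 = share("6-20", "51-200")
six_or_more = share("6-20", "51-200", "200+")
print(f"{six_to_200:.1f}% {six_or_more:.1f}%")
```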
Answer | % 2019 | % 2020 | % 2022 |
Very Likely | 45.4% | 61.6% | 65.9% |
Likely | 40.3% | 30.4% | 26.9% |
Neutral | 10.7% | 5.4% | 6.3% |
Unlikely | 2.6% | 1.5% | 0.5% |
Very Unlikely | 1% | 1% | 0.5% |
According to the survey, more Airflow users (65.9%) are very likely to recommend Apache Airflow than in 2020 and 2019. Willingness to recommend Airflow continues to trend upward: 93% of surveyed users are likely or very likely to recommend it (92% in 2020 and 85.7% in 2019), and only 1% are unlikely to recommend it (3.6% in 2019 and 2.5% in 2020).
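The combined figures in this paragraph come from adding the "Very Likely" and "Likely" rows, and the "Unlikely" and "Very Unlikely" rows, for each survey year. A small sketch using the percentages from the table (the helper name `combined` is ours):

```python
# Share (%) per answer and survey year, copied from the table above.
recommend = {
    "Very Likely":   {2019: 45.4, 2020: 61.6, 2022: 65.9},
    "Likely":        {2019: 40.3, 2020: 30.4, 2022: 26.9},
    "Neutral":       {2019: 10.7, 2020: 5.4,  2022: 6.3},
    "Unlikely":      {2019: 2.6,  2020: 1.5,  2022: 0.5},
    "Very Unlikely": {2019: 1.0,  2020: 1.0,  2022: 0.5},
}


def combined(answers, year):
    """Sum the shares of several answers for one survey year, rounded to 0.1."""
    return round(sum(recommend[a][year] for a in answers), 1)


for year in (2019, 2020, 2022):
    likely = combined(("Very Likely", "Likely"), year)
    unlikely = combined(("Unlikely", "Very Unlikely"), year)
    print(year, likely, unlikely)
```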
Answer | No. | % |
Documentation | 189 | 90.4% |
Airflow website (Blog, etc.) | 142 | 67.9% |
Stack Overflow | 126 | 60.3% |
Github Issues | 104 | 49.8% |
Slack | 96 | 45.9% |
Airflow Summit Videos | 88 | 42.1% |
GitHub Discussions | 76 | 36.4% |
Astronomer Registry | 51 | 24.4% |
Airflow Community Webinars | 41 | 19.6% |
Airflow Mailing List | 34 | 16.3% |
Airflow documentation is a critical source of information: more than 90% of survey participants use it. Its importance has grown since 2020, when about 75% of respondents used the documentation. Moreover, more than 60% of users get information from the Airflow website (67.9%) and Stack Overflow (60.3%), which is also a big increase compared to the 36% level in 2020. Interestingly, Slack usage decreased from 63.05% in 2020 to 45.9% in 2022.
Answer | No. | % |
51-250 | 66 | 31.7% |
11-50 | 64 | 30.8% |
5-10 | 25 | 12.0% |
251-500 | 20 | 9.6% |
<5 | 14 | 6.7% |
1000+ | 10 | 4.8% |
501-1000 | 9 | 4.3% |
62.5% of the Airflow users surveyed have between 11 and 250 DAGs in their largest Airflow instance.
Answer | No. | % |
1 | 52 | 25.2% |
2 | 46 | 22.3% |
4-7 | 40 | 19.4% |
3 | 37 | 18.0% |
20+ | 19 | 9.2% |
8-10 | 7 | 3.4% |
11-20 | 5 | 2.4% |
85% of the Airflow users surveyed have between 1 and 7 active Airflow instances, and nearly 50% have only 1 or 2.
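As a quick check, the 85% and "nearly 50%" figures can be recomputed from the response counts. A sketch, with bucket labels copied from the table:

```python
# Active Airflow instances per respondent, copied from the table above: bucket -> responses.
instances = {"1": 52, "2": 46, "3": 37, "4-7": 40, "8-10": 7, "11-20": 5, "20+": 19}
total = sum(instances.values())

one_to_seven = sum(instances[b] for b in ("1", "2", "3", "4-7"))
one_or_two = instances["1"] + instances["2"]

print(f"{100 * one_to_seven / total:.1f}% {100 * one_or_two / total:.1f}%")
```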
Answer | No. | % |
11-25 | 51 | 24.5% |
26-50 | 41 | 19.7% |
51-100 | 35 | 16.8% |
<10 | 29 | 13.9% |
101-250 | 23 | 11.1% |
501-1000 | 9 | 4.3% |
1000-2500 | 8 | 3.8% |
251-500 | 8 | 3.8% |
2500-5000 | 4 | 1.9% |
75% of the surveyed Airflow users have between 1 and 100 tasks per DAG.
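Because the buckets can be ordered by size, a running total gives the cumulative share of respondents at or below each bucket. A minimal sketch with Python's `itertools.accumulate`, assuming the bucket order below (smallest to largest):

```python
from itertools import accumulate

# Tasks per DAG, copied from the table above and ordered from smallest to largest bucket.
tasks_per_dag = [
    ("<10", 29),
    ("11-25", 51),
    ("26-50", 41),
    ("51-100", 35),
    ("101-250", 23),
    ("251-500", 8),
    ("501-1000", 9),
    ("1000-2500", 8),
    ("2500-5000", 4),
]

labels, counts = zip(*tasks_per_dag)
running = list(accumulate(counts))  # cumulative number of responses
total = running[-1]

# Cumulative share of respondents whose answer falls in this bucket or a smaller one.
cumulative = {label: 100 * c / total for label, c in zip(labels, running)}

for label, pct in cumulative.items():
    print(f"up to {label:>9}: {pct:5.1f}%")
```

The "up to 51-100" line corresponds to the 75% quoted above.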
Answer | No. | % |
1 | 113 | 55.1% |
2 | 61 | 29.8% |
3 | 18 | 8.8% |
4+ | 13 | 6.3% |
More than half of respondents (55.1%) run a single scheduler in their largest Airflow instance. However, it's worth noting that the remaining 44.9% have chosen to run two or more schedulers.
Answer | No. | % |
Celery | 107 | 52.7% |
Kubernetes | 80 | 39.4% |
Local | 49 | 24.1% |
Sequential | 21 | 10.3% |
CeleryKubernetes | 14 | 6.9% |
Celery (52.7%) and Kubernetes (39.4%) are the most commonly used executors. The CeleryKubernetes executor (6.9%) is also starting to gain adoption among Airflow users.
Answer | No. | % |
2-5 | 64 | 44.8% |
10+ | 28 | 19.6% |
1 | 26 | 18.2% |
6-10 | 25 | 17.5% |
Among Celery executor users who responded to the survey, close to half (44.8%) have between 2 and 5 workers in their largest Airflow instance. Notably, nearly a fifth (19.6%) have more than 10 workers.
Answer | No. | % |
1.10.14 or older | 13 | 6.3% |
1.10.15 | 19 | 9.2% |
2.0.x | 23 | 11.1% |
2.1.x | 24 | 11.6% |
2.2.x | 79 | 38.2% |
2.3.x | 49 | 23.7% |
It's good to see that close to 85% of respondents use an Airflow 2 version, while 9.2% still use 1.10.15 and the remaining 6.3% are on older Airflow 1.10 versions.
The good news is that most users still on Airflow 1 plan to migrate to Airflow 2 quite soon; in their opinion, capacity constraints are what currently keeps them from undertaking such a significant effort. However, the survey comments also show that some users are skeptical about migrating to Airflow 2, citing negative opinions about the new scheduler or about compatibility with the Helm chart.
As for upgrading to the newest Airflow 2 release, respondents are committed and are especially waiting for the features related to dynamic DAGs. However, some users reported that they first need to resolve some of their dependencies, or that they prefer to wait a little longer for the community to test the new version before moving on.
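The version split quoted above can be grouped by major version by looking at the leading digit of each answer. A minimal sketch (the grouping rule is ours; note that the two 1.10.x buckets together come to about 15.5%):

```python
# Airflow version in use, copied from the version table above: answer -> responses.
versions = {
    "1.10.14 or older": 13,
    "1.10.15": 19,
    "2.0.x": 23,
    "2.1.x": 24,
    "2.2.x": 79,
    "2.3.x": 49,
}
total = sum(versions.values())

# Group answers by major version using the text before the first dot.
by_major: dict[str, int] = {}
for label, count in versions.items():
    major = label.split(".", 1)[0]
    by_major[major] = by_major.get(major, 0) + count

for major, count in sorted(by_major.items()):
    print(f"Airflow {major}: {100 * count / total:.1f}%")
```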
Answer | No. | % |
External monitoring service | 81 | 40.7% |
Information from metadatabase | 71 | 35.7% |
Statsd | 54 | 27.1% |
I do not use monitoring | 47 | 23.6% |
Other | 14 | 7% |
Compared to 2020, more users are monitoring Airflow in some way. External monitoring services (40.7%) and information from the metadatabase (35.7%) have started to play a more important role in Airflow monitoring, although 23.6% of respondents still do not use any monitoring.
Answer | No. | % |
On virtual machines (for example using AWS EC2) | 63 | 30.6% |
Using a managed service like Astronomer, Google Composer or AWS MWAA | 54 | 26.2% |
On Kubernetes (using Apache Airflow’s helm chart) | 46 | 22.3% |
On premises | 43 | 20.9% |
On Kubernetes (using custom deployments) | 39 | 18.9% |
On Kubernetes (using another helm chart) | 21 | 10.2% |
Other | 13 | 6.5% |
More than half of Airflow users who responded (51.4%) deploy Airflow on Kubernetes. This is about 20 percent more than in 2020. The remaining top deployment methods are on virtual machines (30.6%) and via managed services (26.2%).
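One caveat when reading this table: the shares add up to more than 100%, which suggests respondents could select several deployment methods. If so, adding the three Kubernetes options gives an upper bound on the share of distinct respondents running Airflow on Kubernetes, while the largest single Kubernetes option gives a lower bound. A small sketch of that reasoning, with shortened option labels:

```python
# Deployment method shares (%), copied from the table above with shortened labels.
deployment = {
    "Virtual machines": 30.6,
    "Managed service": 26.2,
    "Kubernetes (official helm chart)": 22.3,
    "On premises": 20.9,
    "Kubernetes (custom deployments)": 18.9,
    "Kubernetes (another helm chart)": 10.2,
    "Other": 6.5,
}

# A total above 100% means some respondents picked more than one answer.
total_share = round(sum(deployment.values()), 1)

k8s = [share for option, share in deployment.items() if option.startswith("Kubernetes")]
# Distinct Kubernetes users lie between full overlap (largest single option)
# and no overlap at all (plain sum of the options, capped at 100%).
lower_bound = max(k8s)
upper_bound = min(round(sum(k8s), 1), 100.0)

print(total_share, lower_bound, upper_bound)
```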
Answer | No. | % |
Using a synchronizing process (Git sync, GCS fuse, etc) | 100 | 49% |
Bake them into the docker image | 51 | 25% |
Shared files system | 30 | 14.7% |
Other | 16 | 7.9% |
I don’t know | 7 | 3.4% |
According to the survey responses, the most popular way of distributing DAGs is a synchronizing process (Git sync, GCS fuse, etc.): about half of respondents (49%) use one to distribute DAGs from developer environments to the cloud.
Answer | No. | % |
No, we use vanilla airflow | 165 | 81.3% |
Yes, we have a separate fork | 13 | 6.4% |
Yes, we use a 3rd-party fork | 12 | 5.9% |
Yes, we’ve backpropagated bug fixes to an older version | 13 | 6.4% |
More Airflow users (81.3%) run Airflow without any customisations (compared to 75.9% in 2020). The 18.7% who do customise Airflow mainly do so to separate development and production workflows, to backport bug fixes, to apply security fixes, or to run a backfill command in a Kubernetes pod.
Answer | No. | % |
PostgreSQL 13 | 86 | 43.9% |
PostgreSQL 12 | 74 | 37.8% |
MySQL 8 | 22 | 11.2% |
MySQL 5 | 9 | 4.6% |
MariaDB | 4 | 2.0% |
MsSQL | 1 | 0.5% |
According to the survey responses, the most popular metadata databases are PostgreSQL 13 (43.9%) and PostgreSQL 12 (37.8%). This represents a sharp increase from 2020, with PostgreSQL's total share up from 68.9% to 81.7% and a corresponding decrease in MySQL, down from 23% to about 16%. This is an interesting result given the community discussions about not adding support for more database backends, or even settling on a single supported database.
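The PostgreSQL and MySQL totals can be recomputed by grouping the answers by database family. A sketch using the raw counts (the helper `family_share` is hypothetical); note that raw counts give 81.6% for PostgreSQL, slightly below the 81.7% obtained by adding the rounded percentages:

```python
# Metadata database in use, copied from the table above: answer -> responses.
metadata_db = {
    "PostgreSQL 13": 86,
    "PostgreSQL 12": 74,
    "MySQL 8": 22,
    "MySQL 5": 9,
    "MariaDB": 4,
    "MsSQL": 1,
}
total = sum(metadata_db.values())


def family_share(prefix: str) -> float:
    """Percentage of respondents whose metadata database name starts with `prefix`."""
    return 100 * sum(n for db, n in metadata_db.items() if db.startswith(prefix)) / total


print(f"PostgreSQL: {family_share('PostgreSQL'):.1f}%, MySQL: {family_share('MySQL'):.1f}%")
```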
Answer | No. | % |
Using existing dedicated operators / hooks | 70 | 34.5% |
Using Bash/Python operators | 58 | 28.6% |
Using custom operators / hooks | 50 | 24.6% |
Using KubernetesPodOperator | 25 | 12.3% |
According to the survey responses, the most popular ways of connecting Airflow to external services are existing dedicated operators/hooks (34.5%), Bash/Python operators (28.6%) and custom operators/hooks (24.6%). The KubernetesPodOperator (12.3%) is less common. This ranking of integration methods is similar to the one from 2020.
Answer | No. | % |
Amazon Web Services | 112 | 55.4% |
Google Cloud Platform / Google APIs | 79 | 39.1% |
Internal company systems | 75 | 37.1% |
Hadoop / Spark / Flink / Other Apache software | 57 | 28.2% |
Microsoft Azure | 17 | 8.4% |
Other | 21 | 10.5% |
I do not use external services in my Airflow DAGs | 14 | 6.9% |
Unsurprisingly, Amazon Web Services is the leading external service used with Airflow (55.4% vs 59.6% in 2020), followed by Google Cloud Platform (39.1% vs 47.7% in 2020), internal company systems (37.1% vs 55.6% in 2020) and other Apache products (28.2% vs 35.47% in 2020).
Answer | No. | % |
every 12 months | 46 | 22.9% |
every 6 months | 49 | 24.4% |
once a quarter | 47 | 23.4% |
Whenever there is a newer version | 59 | 29.4% |
Upgrade frequencies for Airflow environments are spread almost evenly among respondents: 29.4% upgrade whenever there is a newer version, 24.4% every 6 months, 23.4% once a quarter and 22.9% every 12 months.
Answer | No. | % |
When I need it | 83 | 42.8% |
Never - always use the providers that come with Airflow | 68 | 35.1% |
I did not know I can upgrade providers separately | 32 | 16.5% |
I upgrade providers when they are released | 11 | 5.7% |
According to the survey responses, Airflow users most often upgrade providers when they need to (42.8%) or prefer to stay with the providers that come with Airflow (35.1%). Surprisingly, 16.5% of respondents were not aware that they can upgrade providers separately from Airflow core.
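Provider packages are published on PyPI as separate `apache-airflow-providers-*` distributions, which is what makes it possible to upgrade them independently of Airflow core (Airflow 2 also ships an `airflow providers list` command). As a minimal sketch using only the Python standard library, the following lists the provider distributions installed in the current environment with their versions. It assumes providers were installed as regular Python packages (for example with pip); the function name is ours:

```python
from importlib.metadata import distributions

PREFIX = "apache-airflow-providers-"


def installed_providers() -> dict[str, str]:
    """Map each installed Airflow provider distribution to its installed version."""
    providers: dict[str, str] = {}
    for dist in distributions():
        # Normalise the project name so "apache_airflow_providers_x" also matches.
        name = (dist.metadata.get("Name") or "").lower().replace("_", "-")
        if name.startswith(PREFIX):
            providers[name] = dist.version
    return providers


for name, version in sorted(installed_providers().items()):
    print(f"{name}=={version}")
```

Comparing this output with the latest releases on PyPI is one way to spot providers that could be upgraded without touching Airflow itself.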
Answer | No. | % |
XCom | 141 | 69.8% |
Saving and retrieving from Storage | 99 | 49% |
TaskFlow | 37 | 18.3% |
Other | 5 | 2.5% |
We don’t | 29 | 14.4% |
According to the survey responses, XCom (69.8%) is the most popular method of passing inputs and outputs between tasks, although saving and retrieving inputs and outputs from storage still plays an important role (49%). Interestingly, close to 15% of respondents say they do not pass any inputs or outputs between tasks.
Answer | No. | % |
No, but I will use such feature if fully supported in Airflow | 95 | 47.5% |
I’m not familiar with data lineage | 58 | 29% |
No, data lineage isn’t a concern for my usage | 26 | 13% |
Yes, I send lineage to an Open Source lineage repository | 15 | 7.5% |
Yes, I send lineage to an Enterprise lineage repository | 7 | 3.5% |
Yes, I send lineage to a custom internal lineage repository | 9 | 4.5% |
When asked which lineage backend they use, respondents indicated that, while lineage is still a fairly new topic, there is clear interest in the feature. Most do not currently use a lineage solution: 47.5% would use one if it were fully supported in Airflow, 29% are not familiar with data lineage, and 13% say data lineage is not a concern for them.
Answer | No. | % |
Original Airflow Graphical User Interface | 189 | 94% |
CLI | 98 | 48.8% |
API | 80 | 39.8% |
Custom (own created) Airflow Graphical User Interface | 12 | 6% |
GCP Composer | 1 | 0.5% |
Clearly, the Airflow web UI is important: 94% of respondents use it as part of their current role. The CLI (48.8%) and the API (39.8%) are used at similar rates, but far less often than the web UI.
Answer | No. | % |
Monitoring Runs | 188 | 95.9% |
Accessing Task Logs | 176 | 89.8% |
Manually triggering DAGs | 167 | 85.2% |
Clearing Tasks | 162 | 82.7% |
Marking Tasks as successful | 119 | 60.7% |
Other | 6 | 3% |
The Airflow web UI is used heavily for monitoring (Monitoring Runs, 95.9%) and troubleshooting (Accessing Task Logs, 89.8%; Manually triggering DAGs, 85.2%; Clearing Tasks, 82.7%; Marking Tasks as successful, 60.7%).
Answer | No. | % |
Backfilling | 63 | 56.8% |
Manually triggering DAGs | 52 | 46.8% |
Clearing Tasks | 26 | 23.4% |
Monitoring Runs | 25 | 22.5% |
Accessing Task Logs | 21 | 18.9% |
Marking Tasks as successful | 11 | 9.9% |
Other | 17 | 15.3% |
Unlike the web UI, the Airflow CLI is used mainly for Backfilling (56.8%) and Manually triggering DAGs (46.8%).
Answer | No. | % |
List of DAGs | 178 | 89.4% |
Task Logs | 162 | 81.4% |
DAG Runs | 160 | 80.4% |
Graph view | 147 | 73.9% |
Grid/Tree View | 138 | 69.3% |
Run Details | 117 | 58.8% |
DAG details | 111 | 55.8% |
Task Instances | 102 | 51.3% |
Task Duration | 91 | 45.7% |
Code | 90 | 45.2% |
Task Tries | 60 | 30.2% |
Gantt | 48 | 21.4% |
Landing Times | 27 | 13.6% |
Other | 4 | 2% |
The ranking of UI views shows that most Airflow users use the web UI mainly for monitoring and/or troubleshooting, with the top 3 views being List of DAGs (89.4%), Task Logs (81.4%) and DAG Runs (80.4%). The results are very similar to those from 2020 and 2019.
Answer | No. | % |
I see them from time to time | 99 | 48.3% |
I regularly follow what’s being discussed but don’t participate | 53 | 25.9% |
I didn't know I could | 41 | 20.0% |
I actively participate in the discussions | 12 | 5.9% |
Answer | No. | % |
I know I can but I do not contribute | 116 | 57.1% |
Very rarely when it relates to what I need | 44 | 21.7% |
I do not know I could | 30 | 14.8% |
I regularly contribute by discussing, reviewing and submitting PR | 13 | 6.4% |
Results related to contributing to Airflow are very similar to those about participating in the Airflow community discussions. Most respondents (57.1%) know they can contribute but do not, and a further 21.7% contribute only very rarely. 14.8% of users were not aware they could contribute. Once again, this is a clear indicator that there is much more to be done to engage our community as active contributors and raise the current 6.4% of users who contribute regularly.
Answer | No. | % |
I have no time to contribute even if would like to | 65 | 38.9% |
I don’t know how to start | 63 | 37.7% |
I don’t have a need to contribute | 19 | 11.4% |
I didn’t know I could | 12 | 7.2% |
My employer has policy that makes it difficult to contribute | 8 | 4.8% |
According to the survey results, the biggest blocker to contributing to Airflow is limited time (38.9%), but a surprisingly important one is not knowing how to start (37.7%); a further 7.2% did not know it was possible to contribute at all.
Answer | No. | % |
Web UI | 100 | 49.5% |
Logging, monitoring and alerting | 97 | 48.0% |
Examples, how-to, onboarding documentation | 74 | 36.6% |
Technical documentation | 74 | 36.6% |
Scheduler performance | 56 | 27.7% |
Reliability | 52 | 25.7% |
DAG authoring | 48 | 23.8% |
REST API | 43 | 21.3% |
Authentication and authorization | 41 | 20.3% |
External integration e.g. AWS, GCP, Apache products | 41 | 20.3% |
Better support for various deployments (Docker-compose/Nomad/Others) | 39 | 19.3% |
Everything works fine for me | 19 | 9.4% |
I don’t know | 4 | 2.0% |
The results are quite self-explanatory. According to the survey results, the top area for improvement is still the Airflow web UI (49.5%), closely followed by more telemetry for logging, monitoring and alerting (48%). However, all those efforts should go hand in hand with better technical documentation (36.6%) and more examples, how-to guides and onboarding material for new users (36.6%).
Answer | No. | % |
DAG Versioning | 129 | 66.2% |
Dependency management and Data-driven scheduling | 83 | 42.6% |
More dynamic task structure | 82 | 42.1% |
Multi-Tenancy | 74 | 37.9% |
Signal-based scheduling | 67 | 34.4% |
Better Security (Isolation) | 65 | 33.3% |
Submitting new DAGs externally via API | 53 | 27.2% |
Composable Operators | 46 | 23.6% |
Support for native cloud executors (AWS/GCP/Azure etc.) | 44 | 22.6% |
Better support for Machine Learning | 38 | 19.5% |
Remote CLI | 36 | 18.5% |
Support for hybrid executors | 22 | 11.3% |
According to the survey results, DAG Versioning (66.2%) is the clear winner among requested new features, which is not surprising since it could positively impact Airflow users' daily work. It is followed by three other ideas: Dependency management and Data-driven scheduling (42.6%), More dynamic task structure (42.1%) and Multi-Tenancy (37.9%). Another interesting point is that only 11.3% think that support for hybrid executors is needed in Airflow.