<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Apache Airflow – Home</title>
<link>/</link>
<description>Recent content in Home on Apache Airflow</description>
<generator>Hugo -- gohugo.io</generator>
<atom:link href="/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Blog: Apache Airflow 2.9.0: Dataset and UI Improvements</title>
<link>/blog/airflow-2.9.0/</link>
<pubDate>Mon, 08 Apr 2024 00:00:00 +0000</pubDate>
<guid>/blog/airflow-2.9.0/</guid>
<description>
&lt;p&gt;I’m happy to announce that Apache Airflow 2.9.0 has been released! This time around we have new features for data-aware scheduling and a bunch of UI-related improvements.&lt;/p&gt;
&lt;p&gt;Apache Airflow 2.9.0 contains over 550 commits, which include 38 new features, 70 improvements, 31 bug fixes, and 18 documentation changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;📦 PyPI: &lt;a href=&#34;https://pypi.org/project/apache-airflow/2.9.0/&#34;&gt;https://pypi.org/project/apache-airflow/2.9.0/&lt;/a&gt; &lt;br&gt;
📚 Docs: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.9.0/&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.9.0/&lt;/a&gt; &lt;br&gt;
🛠 Release Notes: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.9.0/release_notes.html&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.9.0/release_notes.html&lt;/a&gt; &lt;br&gt;
🐳 Docker Image: &amp;ldquo;docker pull apache/airflow:2.9.0&amp;rdquo; &lt;br&gt;
🚏 Constraints: &lt;a href=&#34;https://github.com/apache/airflow/tree/constraints-2.9.0&#34;&gt;https://github.com/apache/airflow/tree/constraints-2.9.0&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Airflow 2.9.0 is also the first release that supports Python 3.12. However, Pendulum 2 does not support Python 3.12, so you’ll need to use &lt;a href=&#34;https://pendulum.eustace.io/blog/announcing-pendulum-3-0-0.html&#34;&gt;Pendulum 3&lt;/a&gt; if you upgrade to Python 3.12.&lt;/p&gt;
&lt;h2 id=&#34;new-data-aware-scheduling-options&#34;&gt;New data-aware scheduling options&lt;/h2&gt;
&lt;h3 id=&#34;logical-operators-and-conditional-expressions-for-dag-scheduling&#34;&gt;Logical operators and conditional expressions for DAG scheduling&lt;/h3&gt;
&lt;p&gt;When Datasets were added in Airflow 2.4, DAGs only had scheduling support for logical AND combinations of Datasets. Put simply, you could schedule against more than one Dataset, but a DAG run would only be created once all the Datasets had been updated since the last run. Now in Airflow 2.9, we support logical OR and even arbitrary combinations of AND and OR.&lt;/p&gt;
&lt;p&gt;As an example, you can schedule a DAG to run whenever &lt;code&gt;dataset_1&lt;/code&gt; or &lt;code&gt;dataset_2&lt;/code&gt; is updated:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;schedule&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dataset_1&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;|&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dataset_2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can have arbitrary combinations:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;schedule&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;((&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dataset_1&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;|&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dataset_2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dataset_3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can read more about this new functionality in the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.9.0/authoring-and-scheduling/datasets.html#advanced-dataset-scheduling-with-conditional-expressions&#34;&gt;data-aware scheduling docs&lt;/a&gt;.&lt;/p&gt;
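&lt;p&gt;To make the semantics concrete, here is a minimal, self-contained sketch of how a condition like the one above could be evaluated against the set of Datasets updated since the last run. It is not Airflow code: the classes &lt;code&gt;Ref&lt;/code&gt;, &lt;code&gt;AnyOf&lt;/code&gt; and &lt;code&gt;AllOf&lt;/code&gt; are hypothetical stand-ins invented for this illustration.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a single Dataset, identified by its URI."""

    uri: str

    def ready(self, updated):
        # A single dataset is satisfied once it appears in the updated set.
        return self.uri in updated


@dataclass(frozen=True)
class AnyOf:
    """Logical OR: satisfied when at least one part is satisfied."""

    parts: tuple

    def ready(self, updated):
        return any(part.ready(updated) for part in self.parts)


@dataclass(frozen=True)
class AllOf:
    """Logical AND: satisfied only when every part is satisfied."""

    parts: tuple

    def ready(self, updated):
        return all(part.ready(updated) for part in self.parts)


# Models the combined example: (dataset_1 OR dataset_2) AND dataset_3
condition = AllOf((AnyOf((Ref("dataset_1"), Ref("dataset_2"))), Ref("dataset_3")))

print(condition.ready({"dataset_1", "dataset_3"}))  # OR branch and dataset_3 both satisfied
print(condition.ready({"dataset_1", "dataset_2"}))  # dataset_3 has not been updated yet
```

&lt;p&gt;In this model a run is only due once the whole expression evaluates to true, which is why updating both &lt;code&gt;dataset_1&lt;/code&gt; and &lt;code&gt;dataset_2&lt;/code&gt; is still not enough without &lt;code&gt;dataset_3&lt;/code&gt;.&lt;/p&gt;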
&lt;h3 id=&#34;combining-dataset-and-time-based-schedules&#34;&gt;Combining Dataset and Time-Based Schedules&lt;/h3&gt;
&lt;p&gt;Airflow 2.9 comes with a new timetable, &lt;code&gt;DatasetOrTimeSchedule&lt;/code&gt;, that allows you to schedule DAGs based on both dataset events and a timetable. Now you have the best of both worlds.&lt;/p&gt;
&lt;p&gt;For example, to run whenever &lt;code&gt;dag1_dataset&lt;/code&gt; updates and also at midnight UTC:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;schedule&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DatasetOrTimeSchedule&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;timetable&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;CronTriggerTimetable&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;0 0 * * *&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;timezone&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;UTC&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;datasets&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dag1_dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;],&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;dataset-event-rest-api-endpoints&#34;&gt;Dataset Event REST API endpoints&lt;/h3&gt;
&lt;p&gt;New REST API endpoints have been introduced for creating, listing, and deleting dataset events. This makes it possible for external systems to notify Airflow about dataset updates and unlocks management of event queues for more sophisticated use cases.&lt;/p&gt;
&lt;p&gt;See the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.9.0/stable-rest-api-ref.html#tag/Dataset&#34;&gt;Dataset API docs&lt;/a&gt; for more details.&lt;/p&gt;
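&lt;p&gt;As a rough illustration of how an external system might notify Airflow, the sketch below builds (but does not send) an HTTP request that creates a dataset event. The endpoint path &lt;code&gt;/api/v1/datasets/events&lt;/code&gt;, the payload fields &lt;code&gt;dataset_uri&lt;/code&gt; and &lt;code&gt;extra&lt;/code&gt;, the host and the dataset URI are all assumptions made for this example; check them against the Dataset API docs linked above. Authentication is left out entirely.&lt;/p&gt;

```python
import json
from urllib.request import Request


def build_dataset_event_request(base_url, dataset_uri, extra=None):
    """Build, without sending, a POST request that would create a dataset event.

    The path and body shape are assumptions; verify them in the API reference.
    """
    body = {"dataset_uri": dataset_uri, "extra": extra or {}}
    return Request(
        url=base_url.rstrip("/") + "/api/v1/datasets/events",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and dataset URI, used only for illustration.
request = build_dataset_event_request(
    "http://localhost:8080",
    "s3://mybucket/key",
    extra={"source": "external-system"},
)
print(request.get_method(), request.full_url)
```

&lt;p&gt;Sending the request, for example with &lt;code&gt;urllib.request.urlopen&lt;/code&gt;, would additionally require whatever credentials your deployment expects.&lt;/p&gt;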
&lt;h3 id=&#34;dataset-ui-enhancements&#34;&gt;Dataset UI Enhancements&lt;/h3&gt;
&lt;p&gt;The DAG&amp;rsquo;s graph view has been enhanced to display both the datasets it is scheduled on and those in the task outlets, providing a comprehensive overview of the datasets consumed and produced by the DAG.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;datasets-in-graph.png&#34; alt=&#34;Datasets in the graph view&#34;&gt;&lt;/p&gt;
&lt;p&gt;The main datasets view now allows you to filter for both DAGs and datasets:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;dataset-view-filtering.png&#34; alt=&#34;Dataset view filtering&#34;&gt;&lt;/p&gt;
&lt;p&gt;When viewing a Dataset, you can now create a manual dataset event through the UI by clicking the play button shown in the top right here:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;create-manual-dataset-event.png&#34; alt=&#34;Creating manual Dataset event&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;custom-names-for-dynamic-task-mapping&#34;&gt;Custom names for Dynamic Task Mapping&lt;/h2&gt;
&lt;p&gt;Gone are the days of clicking into index numbers and hunting for the dynamically mapped task you wanted to see! This has been a requested feature ever since task mapping was added in Airflow 2.3, and we are happy it’s finally here.&lt;/p&gt;
&lt;p&gt;You can provide a &lt;code&gt;map_index_template&lt;/code&gt; to mapped operators:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;n&#34;&gt;BashOperator&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;partial&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;task_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;hello&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;bash_command&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;echo Hello $NAME&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;map_index_template&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;{{ task.env[&amp;#39;NAME&amp;#39;] }}&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;expand&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;env&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;NAME&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;John&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;},&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;NAME&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Bob&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;},&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;NAME&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Fred&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}],&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That template will be rendered after each task finishes running and will populate the name in the UI:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;dynamic-task-mapping-custom-names.png&#34; alt=&#34;Dynamic Task Mapping custom names&#34;&gt;&lt;/p&gt;
&lt;p&gt;More details on this, including a TaskFlow example, are available in the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.9.0/authoring-and-scheduling/dynamic-task-mapping.html#named-mapping&#34;&gt;dynamic task mapping docs&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;object-storage-as-xcom-backend&#34;&gt;Object Storage as XCom Backend&lt;/h2&gt;
&lt;p&gt;You can now configure Object Storage to be used as an XCom backend, making it much easier to get XCom results into an object store. Deployment managers can configure the object store of their choice, a size threshold to route some results to the Airflow metadata database and some to the object store, and even a compression method to apply before the data is stored.&lt;/p&gt;
&lt;p&gt;The following configuration will store anything above 1MB in S3 and will compress it using gzip:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[core]
xcom_backend = airflow.providers.common.io.xcom.backend.XComObjectStoreBackend
[common.io]
xcom_objectstorage_path = s3://conn_id@mybucket/key
xcom_objectstorage_threshold = 1048576
xcom_objectstorage_compression = gzip
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;See the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.9.0/core-concepts/xcoms.html#object-storage-xcom-backend&#34;&gt;docs on the object storage xcom backend&lt;/a&gt; for more details.&lt;/p&gt;
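&lt;p&gt;The routing that the threshold setting describes can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the provider implementation: &lt;code&gt;route_xcom&lt;/code&gt; is a hypothetical helper, the destination labels are made up, and whether a value exactly at the threshold goes to the object store is an assumption here.&lt;/p&gt;

```python
import gzip

THRESHOLD_BYTES = 1048576  # mirrors xcom_objectstorage_threshold in the config above


def route_xcom(serialized, threshold=THRESHOLD_BYTES):
    """Return a (destination, payload) pair for a serialized XCom value.

    Values above the threshold are gzip-compressed and marked for the object
    store; everything else stays uncompressed for the metadata database.
    """
    if len(serialized) > threshold:
        return "object_store", gzip.compress(serialized)
    return "metadata_db", serialized


small = route_xcom(b"x" * 10)
large = route_xcom(b"x" * (THRESHOLD_BYTES + 1))
print(small[0], large[0])
```

&lt;p&gt;Keeping small values in the metadata database avoids an object store round trip for the common case, while large payloads are compressed before they leave the worker.&lt;/p&gt;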
&lt;h2 id=&#34;display-names-for-dags-and-tasks&#34;&gt;Display names for DAGs and Tasks&lt;/h2&gt;
&lt;p&gt;Get your emojis ready! You can now set a display name for DAGs and tasks, separate from the &lt;code&gt;dag_id&lt;/code&gt; and &lt;code&gt;task_id&lt;/code&gt;. This allows you to have localized display names in the UI, or just use a bunch of emojis.&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;dag_display_name&lt;/code&gt; and &lt;code&gt;task_display_name&lt;/code&gt;, you can break away from the ASCII handcuffs:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;not_a_fun_dag_id&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dag_display_name&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;📣 Best DAG ever 🎉&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;BashOperator&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;task_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;some_task&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;task_display_name&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;🥳 Fun task!&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&#34;display-names.png&#34; alt=&#34;Display names for DAGs and tasks&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;task-log-grouping&#34;&gt;Task log grouping&lt;/h2&gt;
&lt;p&gt;Airflow now has support for arbitrary grouping of task logs.&lt;/p&gt;
&lt;p&gt;By default, pre-execute and post-execute logs are grouped and collapsed, making it easier to see your task logs:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;pre-post-logs-grouped.png&#34; alt=&#34;Pre and post execute logs are grouped&#34;&gt;&lt;/p&gt;
&lt;p&gt;You can also use this feature in your task code to make your logs easier to follow:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;nd&#34;&gt;@task&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;big_hello&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;::group::Setup our big Hello&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;greeting&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;c&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Hello Airflow 2.9&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;greeting&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;+=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;c&lt;/span&gt;
        &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Adding &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;c&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt; to our greeting. Current greeting: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;greeting&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;::endgroup::&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;greeting&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That custom group is collapsed by default:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;custom-log-grouping.png&#34; alt=&#34;Custom log grouping collapsed by default&#34;&gt;&lt;/p&gt;
&lt;p&gt;And it can be expanded if you want to dig into the details:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;custom-log-grouping-expanded.png&#34; alt=&#34;Custom log grouping expanded&#34;&gt;&lt;/p&gt;
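&lt;p&gt;If you group logs in many places, pairing the opening and closing markers by hand gets error-prone. One possible convenience, sketched below, is a small context manager that always emits the closing marker, even when the wrapped code raises. &lt;code&gt;log_group&lt;/code&gt; and &lt;code&gt;build_greeting&lt;/code&gt; are hypothetical helpers written for this example and are not part of Airflow; only the &lt;code&gt;::group::&lt;/code&gt; and &lt;code&gt;::endgroup::&lt;/code&gt; markers come from the feature described above.&lt;/p&gt;

```python
from contextlib import contextmanager


@contextmanager
def log_group(title):
    """Wrap everything printed inside the block in group markers."""
    print(f"::group::{title}")
    try:
        yield
    finally:
        # Always close the group, even if the body raises.
        print("::endgroup::")


def build_greeting(text):
    """Build a greeting character by character, logging each step in a group."""
    greeting = ""
    with log_group("Setup our big Hello"):
        for c in text:
            greeting += c
            print(f"Adding {c} to our greeting. Current greeting: {greeting}")
    print(greeting)
    return greeting


build_greeting("Hello Airflow 2.9")
```

&lt;p&gt;Inside a task, the body of &lt;code&gt;build_greeting&lt;/code&gt; would simply live in the function decorated with &lt;code&gt;@task&lt;/code&gt;.&lt;/p&gt;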
&lt;h2 id=&#34;ui-modernization&#34;&gt;UI Modernization&lt;/h2&gt;
&lt;p&gt;In addition to all the UI improvements mentioned above, we have a bunch more improvements in Airflow 2.9!&lt;/p&gt;
&lt;p&gt;The rest of the DAG-level views have been moved into React and the grid view interface, allowing for a more cohesive experience. This includes the calendar, task duration, run duration (which replaces landing times), and the audit log. These weren’t just “moved”; each was improved as well.&lt;/p&gt;
&lt;p&gt;Here is the new run duration view, which replaces landing times. Users can toggle between landing times and simple run duration:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;run-duration.png&#34; alt=&#34;Run duration&#34;&gt;&lt;/p&gt;
&lt;p&gt;And the new task duration view. Users can toggle queued time on/off and see the median value across the displayed runs:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;task-duration.png&#34; alt=&#34;Task duration&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;additional-new-features&#34;&gt;Additional new features&lt;/h2&gt;
&lt;p&gt;Here are just a few interesting new features since there are too many to list in full:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All create/update/delete actions in the REST API are now recorded in the audit log&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.9.0/administration-and-deployment/logging-monitoring/callbacks.html#callback-types&#34;&gt;New &lt;code&gt;on_skipped_callback&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.9.0/core-concepts/dags.html#dag-auto-pausing-experimental&#34;&gt;Auto pause DAGs after n consecutive failures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Support for &lt;a href=&#34;https://matomo.org/&#34;&gt;Matomo&lt;/a&gt; as an &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.9.0/administration-and-deployment/logging-monitoring/tracking-user-activity.html&#34;&gt;analytics tool&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.9.0/howto/operator/bash.html&#34;&gt;New &lt;code&gt;@task.bash&lt;/code&gt; TaskFlow decorator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Support regex in dag_id for the DAG pause and resume CLI commands&lt;/li&gt;
&lt;li&gt;&lt;code&gt;airflow tasks test&lt;/code&gt; now works with deferrable operators&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;contributors&#34;&gt;Contributors&lt;/h2&gt;
&lt;p&gt;Thanks to everyone who contributed to this release, including Amogh Desai, Andrey Anshin, Brent Bovenzi, Daniel Standish, Ephraim Anierobi, Hussein Awala, Jarek Potiuk, Jed Cunningham, Jens Scheffler, Tzu-ping Chung, Vincent Beck, Wei Lee, and over 120 others!&lt;/p&gt;
&lt;p&gt;I’d especially like to thank our release manager, Ephraim, for getting this release out the door.&lt;/p&gt;
&lt;p&gt;I hope you enjoy using Apache Airflow 2.9.0!&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Vulnerability in long deprecated OpenID authentication method in Flask AppBuilder</title>
<link>/blog/fab-oid-vulnerability/</link>
<pubDate>Mon, 26 Feb 2024 00:00:00 +0000</pubDate>
<guid>/blog/fab-oid-vulnerability/</guid>
<description>
&lt;h1 id=&#34;vulnerability-in-long-deprecated-openid-authentication-method-in-flask-appbuilder&#34;&gt;Vulnerability in long deprecated OpenID authentication method in Flask AppBuilder&lt;/h1&gt;
&lt;p&gt;Recently &lt;a href=&#34;https://www.linkedin.com/in/islam-rzayev&#34;&gt;Islam Rzayev&lt;/a&gt; made us aware of a vulnerability in the
long-deprecated OpenID authentication method in Flask AppBuilder. This vulnerability allowed a malicious user
to take over the identity of any Airflow UI user by forging a specially crafted request and implementing
their own OpenID service. While this is an old, deprecated and rarely used authentication method, we still
took the issue seriously.&lt;/p&gt;
&lt;p&gt;This issue ONLY affects users who have &lt;code&gt;AUTH_TYPE&lt;/code&gt; set to &lt;code&gt;AUTH_OID&lt;/code&gt; in their
&lt;code&gt;webserver_config.py&lt;/code&gt; file. This is a very old and deprecated authentication method that is unlikely to be used by anyone.&lt;/p&gt;
&lt;p&gt;We advise even the small number of our users who still use this
authentication method to take immediate action and either upgrade to Apache Airflow 2.8.2 or switch to
another authentication method (or apply the workaround we provide if they cannot do either of the above
immediately).&lt;/p&gt;
&lt;p&gt;It is important to stress, because many users might be confused by the name, that OpenID is NOT the same as
OpenID Connect. They are completely different protocols: while OpenID Connect (also known as OIDC) is
a modern, widely used protocol, OpenID is a legacy protocol that was deprecated more than 10 years
ago and has since been abandoned by almost everyone in the community, including all the example services in
Flask AppBuilder that supported it, so it is highly unlikely that anyone is still using it.&lt;/p&gt;
&lt;p&gt;Because this configuration is so unlikely, the &lt;a href=&#34;https://www.cve.org/CVERecord?id=CVE-2024-25128&#34;&gt;Flask AppBuilder CVE&lt;/a&gt;
is rated &amp;ldquo;Moderate&amp;rdquo; rather than &amp;ldquo;Critical&amp;rdquo;. It affects a very small (if any) number of users and it&amp;rsquo;s not likely
to be a target for an attack. However, we advise users who still use &lt;code&gt;AUTH_OID&lt;/code&gt; to apply remediation.&lt;/p&gt;
&lt;p&gt;This vulnerability is fixed in Flask AppBuilder 4.3.11, and Apache Airflow 2.8.2 uses that version of Flask
AppBuilder. We advise users who still use this authentication method to either switch to another
authentication method or upgrade to Apache Airflow 2.8.2. If they cannot do either
quickly, they should apply the workaround provided below.&lt;/p&gt;
&lt;h2 id=&#34;impact&#34;&gt;Impact&lt;/h2&gt;
&lt;p&gt;When Flask-AppBuilder has &lt;code&gt;AUTH_TYPE&lt;/code&gt; set to &lt;code&gt;AUTH_OID&lt;/code&gt;, it allows an attacker to forge an HTTP
request that could deceive the backend into using any requested OpenID service. This vulnerability
could grant an attacker unauthorised privileged access if a custom OpenID service is deployed
by the attacker and accessible by the backend.&lt;/p&gt;
&lt;p&gt;This vulnerability is only exploitable when the application is using OpenID (not OpenID Connect also known
as OIDC). Currently, this protocol is regarded as legacy, with significantly reduced usage.&lt;/p&gt;
&lt;h2 id=&#34;possible-remediation&#34;&gt;Possible remediation&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Change your authentication method. Almost no commercial services support &lt;code&gt;AUTH_OID&lt;/code&gt;:
it was deprecated 10 years ago and abandoned by nearly everyone in the community 4 years
ago. Your best option is to choose a different authentication method.&lt;/li&gt;
&lt;li&gt;Upgrade to Apache Airflow 2.8.2 (which also upgrades to Flask-AppBuilder 4.3.11, which contains a fix)&lt;/li&gt;
&lt;li&gt;If upgrading is not possible, apply the workaround below&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;workarounds&#34;&gt;Workarounds&lt;/h2&gt;
&lt;p&gt;If upgrading or changing the authentication method is not possible, add the following to
your &lt;code&gt;webserver_config.py&lt;/code&gt; file to fix the issue:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;os&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;flask&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;flash&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;redirect&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;flask_appbuilder.security.forms&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LoginForm_oid&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;flask_appbuilder.security.views&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AuthOIDView&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;flask_appbuilder.views&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;expose&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.www.security&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AirflowSecurityManager&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;basedir&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;os&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;abspath&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;os&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dirname&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;vm&#34;&gt;__file__&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;FixedOIDView&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;AuthOIDView&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;nd&#34;&gt;@expose&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;/login/&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;methods&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;GET&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;POST&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;login&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;flag&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;form&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LoginForm_oid&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
        &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;form&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;validate_on_submit&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
            &lt;span class=&#34;n&#34;&gt;identity_url&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;
            &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;provider&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;appbuilder&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sm&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;openid_providers&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
                &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;provider&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;url&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;==&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;form&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;openid&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
                    &lt;span class=&#34;n&#34;&gt;identity_url&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;form&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;openid&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;
            &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;identity_url&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;is&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
                &lt;span class=&#34;n&#34;&gt;flash&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;invalid_login_message&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;warning&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
                &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;redirect&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;appbuilder&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get_url_for_login&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
        &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;super&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;login&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;flag&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;flag&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;FixedAirflowSecurityManager&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;AirflowSecurityManager&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;authoidview&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FixedOIDView&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;SECURITY_MANAGER_CLASS&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FixedAirflowSecurityManager&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;credits&#34;&gt;Credits&lt;/h2&gt;
&lt;p&gt;Big thanks to &lt;a href=&#34;https://www.linkedin.com/in/islam-rzayev&#34;&gt;Islam Rzayev&lt;/a&gt; for finding and responsibly reporting the issue, and to &lt;a href=&#34;https://github.com/dpgaspar&#34;&gt;Daniel Gaspar&lt;/a&gt; for
very close cooperation on this one and for coordinating the disclosure with &lt;a href=&#34;https://superset.apache.org/&#34;&gt;Apache Superset&lt;/a&gt;,
where Flask AppBuilder is also used.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Apache Airflow 2.8.0 is here</title>
<link>/blog/airflow-2.8.0/</link>
<pubDate>Fri, 15 Dec 2023 00:00:00 +0000</pubDate>
<guid>/blog/airflow-2.8.0/</guid>
<description>
&lt;p&gt;I am thrilled to announce the release of Apache Airflow 2.8.0, featuring a host of significant enhancements and new features that will greatly benefit our community.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;📦 PyPI: &lt;a href=&#34;https://pypi.org/project/apache-airflow/2.8.0/&#34;&gt;https://pypi.org/project/apache-airflow/2.8.0/&lt;/a&gt; &lt;br&gt;
📚 Docs: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.8.0/&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.8.0/&lt;/a&gt; &lt;br&gt;
🛠 Release Notes: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.8.0/release_notes.html&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.8.0/release_notes.html&lt;/a&gt; &lt;br&gt;
🐳 Docker Image: &amp;ldquo;docker pull apache/airflow:2.8.0&amp;rdquo; &lt;br&gt;
🚏 Constraints: &lt;a href=&#34;https://github.com/apache/airflow/tree/constraints-2.8.0&#34;&gt;https://github.com/apache/airflow/tree/constraints-2.8.0&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;airflow-object-storage-aip-58&#34;&gt;Airflow Object Storage (AIP-58)&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;This feature is experimental and subject to change.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Airflow now offers a generic abstraction layer over various object stores like S3, GCS, and Azure Blob Storage, enabling the use of different storage systems in DAGs without code modification.&lt;/p&gt;
&lt;p&gt;In addition, it allows you to use most of the standard Python modules, like shutil, that can work with file-like objects.&lt;/p&gt;
&lt;p&gt;Here is an example of how to use the new feature to open a file:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.decorators&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;task&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.io.path&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ObjectStoragePath&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;base&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ObjectStoragePath&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;s3://my-bucket/&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;conn_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;aws_default&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# conn_id is optional&lt;/span&gt;
&lt;span class=&#34;nd&#34;&gt;@task&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;read_file&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ObjectStoragePath&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;str&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;open&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
        &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;read&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The above example is just the tip of the iceberg. The new feature allows you to configure an alternative backend for a scheme or protocol.&lt;/p&gt;
&lt;p&gt;Here is an example of how to configure a custom backend for the &lt;code&gt;dbfs&lt;/code&gt; scheme:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.io.path&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ObjectStoragePath&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.io.store&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;attach&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;fsspec.implementations.dbfs&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DBFSFileSystem&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;attach&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;protocol&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;dbfs&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;fs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DBFSFileSystem&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;instance&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;myinstance&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;token&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;mytoken&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;base&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ObjectStoragePath&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;dbfs://my-location/&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For more information: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/objectstorage.html&#34;&gt;Airflow Object Storage&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The support for a specific object storage system depends on the installed providers,
with out-of-the-box support for the file scheme.&lt;/p&gt;
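&lt;p&gt;As a minimal sketch of the &lt;code&gt;shutil&lt;/code&gt; point above: &lt;code&gt;shutil.copyfileobj&lt;/code&gt; only needs two file-like objects. The &lt;code&gt;io.BytesIO&lt;/code&gt; buffers below are stand-ins chosen for illustration so the snippet runs without Airflow; in a DAG they would presumably be the objects returned by &lt;code&gt;ObjectStoragePath.open()&lt;/code&gt;.&lt;/p&gt;

```python
import io
import shutil

# Stand-ins (assumption for illustration): in a DAG these would be file
# objects obtained from ObjectStoragePath.open() on a source and a target.
source = io.BytesIO(b"id,value\n1,42\n")
target = io.BytesIO()

# copyfileobj streams from one file-like object to another in chunks,
# so it does not care which storage system sits behind each object.
shutil.copyfileobj(source, target)

copied = target.getvalue()
print(copied.decode())
```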
&lt;h2 id=&#34;ship-logs-from-other-components-to-task-logs&#34;&gt;Ship logs from other components to Task logs&lt;/h2&gt;
&lt;p&gt;This feature seamlessly integrates task-related messages from various Airflow components, including the Scheduler and
Executors, into the task logs. This integration allows users to easily track error messages and other relevant
information within a single log view.&lt;/p&gt;
&lt;p&gt;Previously, if a task was terminated by the scheduler before it started, timed out after queuing for too long, or became a zombie, nothing was recorded in the task log. With this enhancement,
an error message can be sent to the task log in such situations, making it visible in the UI.&lt;/p&gt;
&lt;p&gt;This feature can be toggled. For more information, see &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#enable-task-context-logger&#34;&gt;“enable_task_context_logger” in the logging configuration documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;listener-hooks-for-datasets&#34;&gt;Listener hooks for Datasets&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Please note that listeners are still experimental and subject to change.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This feature enables users to subscribe to Dataset creation and update events using listener hooks.
It’s particularly useful to trigger external processes based on a Dataset being created or updated.&lt;/p&gt;
&lt;h2 id=&#34;using-extra-index-urls-with-pythonvirtualenvoperator-and-caching&#34;&gt;Using Extra Index URLs with PythonVirtualenvOperator and Caching&lt;/h2&gt;
&lt;p&gt;This feature lets you pass extra index URLs to &lt;code&gt;PythonVirtualenvOperator&lt;/code&gt; (and the corresponding decorator), so that virtualenvs can install packages from additional, possibly private, Python package repositories.&lt;/p&gt;
&lt;p&gt;You can also cache virtualenvs in a specified directory and reuse them in subsequent runs. This
can be achieved by setting &lt;code&gt;venv_cache_path&lt;/code&gt; to a file system folder on your worker.&lt;/p&gt;
&lt;p&gt;For more information: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html#pythonvirtualenvoperator&#34;&gt;PythonVirtualenvOperator&lt;/a&gt;&lt;/p&gt;
&lt;h1 id=&#34;web-ui-improvements&#34;&gt;Web UI improvements&lt;/h1&gt;
&lt;p&gt;There are a number of improvements to the Web UI in this release, including:&lt;/p&gt;
&lt;h2 id=&#34;add-multiselect-to-run-state-in-grid-view&#34;&gt;Add multiselect to run state in grid view&lt;/h2&gt;
&lt;p&gt;The grid view now supports multiselect for run states. This allows you to select multiple states to filter the dag runs shown in the grid view.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;multiselect-states.png&#34; alt=&#34;Multiselect on the run state&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;improved-visibility-of-task-status-in-the-graph-view&#34;&gt;Improved visibility of task status in the Graph view&lt;/h2&gt;
&lt;p&gt;You can now see the status of a task in the graph view through the border color of the task. This makes it easier to see the status of a task at a glance.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;task_status_visibility.png&#34; alt=&#34;Task status visibility&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;raw-html-code-in-dag-docs-and-dag-params-descriptions-is-disabled-by-default&#34;&gt;Raw HTML code in DAG docs and DAG params descriptions is disabled by default&lt;/h2&gt;
&lt;p&gt;As part of our continuous quest to make Airflow more secure by default, we have disabled raw HTML code in DAG docs and DAG params descriptions by default.
We care about your security, and &amp;ldquo;secure by default&amp;rdquo; is a principle we follow strongly.&lt;/p&gt;
&lt;p&gt;Other notable UI improvements include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Simplify DAG trigger UI&lt;/li&gt;
&lt;li&gt;Hide logical date and run id in trigger UI form&lt;/li&gt;
&lt;li&gt;Move external logs links to top of react logs page&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additional new features and improvements can be found in the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.8.0/release_notes.html#airflow-2-8-0-2023-12-14&#34;&gt;Airflow 2.8.0 release notes&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&#34;contributors&#34;&gt;Contributors&lt;/h1&gt;
&lt;p&gt;Thanks to everyone who contributed to this release, including Amogh Desai, Andrey Anshin, Bolke de Bruin, Daniel Dyląg, Daniel Standish, Ephraim Anierobi, Hussein Awala, Jarek Potiuk, Jed Cunningham, Jens Scheffler, mhenc, Miroslav Šedivý, Pankaj Koti, Tzu-ping Chung, Vincent, and everyone else who committed, all 110 of you! You are what makes Airflow the successful project that it is!&lt;/p&gt;
&lt;p&gt;I hope you enjoy using Apache Airflow 2.8.0!&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Apache Airflow 2.7.0 is here</title>
<link>/blog/airflow-2.7.0/</link>
<pubDate>Fri, 18 Aug 2023 00:00:00 +0000</pubDate>
<guid>/blog/airflow-2.7.0/</guid>
<description>
&lt;p&gt;I’m happy to announce that Apache Airflow 2.7.0 has been released! Some notable features have been added that we are excited for the community to use.&lt;/p&gt;
&lt;p&gt;Apache Airflow 2.7.0 contains over 500 commits, which include 40 new features, 49 improvements, 53 bug fixes, and 15 documentation changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;📦 PyPI: &lt;a href=&#34;https://pypi.org/project/apache-airflow/2.7.0/&#34;&gt;https://pypi.org/project/apache-airflow/2.7.0/&lt;/a&gt; &lt;br&gt;
📚 Docs: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.7.0/&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.7.0/&lt;/a&gt; &lt;br&gt;
🛠 Release Notes: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.7.0/release_notes.html&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.7.0/release_notes.html&lt;/a&gt; &lt;br&gt;
🐳 Docker Image: &amp;ldquo;docker pull apache/airflow:2.7.0&amp;rdquo; &lt;br&gt;
🚏 Constraints: &lt;a href=&#34;https://github.com/apache/airflow/tree/constraints-2.7.0&#34;&gt;https://github.com/apache/airflow/tree/constraints-2.7.0&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Airflow 2.7.0 is a release that focuses on security. The Airflow security team, working together with security researchers, identified a number of areas where security needed strengthening. This resulted in, among other things, an improved description of the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/security/security_model/&#34;&gt;Airflow security model&lt;/a&gt;, a better explanation of our &lt;a href=&#34;https://github.com/apache/airflow/security/policy&#34;&gt;security policy&lt;/a&gt;, and the disabling by default of certain potentially dangerous features, such as connection testing (#32052).&lt;/p&gt;
&lt;p&gt;Airflow 2.7.0 is also the first release that drops support for end-of-life Python 3.7. This allows Airflow users and maintainers to make use of features and improvements in Python 3.8, and unlocks newer versions of our dependencies.&lt;/p&gt;
&lt;h2 id=&#34;setup-and-teardown-aip-52&#34;&gt;Setup and Teardown (AIP-52)&lt;/h2&gt;
&lt;p&gt;Airflow now has first class support for the concept of setup and teardown tasks. These tasks have special behavior in that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Teardown tasks will still run, no matter what state the upstream tasks end up in&lt;/li&gt;
&lt;li&gt;Teardown tasks failing won’t, by default, cause the DAG run to fail&lt;/li&gt;
&lt;li&gt;Automatically clear setup/teardown tasks when clearing a dependent task&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can read more about setup and teardown in the &lt;a href=&#34;/blog/introducing_setup_teardown/&#34;&gt;Introducing Setup and Teardown tasks blog post&lt;/a&gt;, or in the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.7.0/howto/setup-and-teardown.html&#34;&gt;setup and teardown docs&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;cluster-activity-ui&#34;&gt;Cluster Activity UI&lt;/h2&gt;
&lt;p&gt;There is a new top level page in Airflow, the Cluster Activity page. This gives an overview of the cluster, including component health, dag and task state counts, and more!&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;cluster_activity.png&#34; alt=&#34;New cluster activity page&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;graph-and-gantt-views-moved-into-the-grid-view-ui&#34;&gt;Graph and gantt views moved into the Grid view UI&lt;/h2&gt;
&lt;p&gt;The graph and gantt views have been rewritten and moved into the now familiar grid view. This makes it easier to jump between task details, logs, graph, and gantt views without losing your place in a complicated DAG.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;graph_in_grid.png&#34; alt=&#34;Graph in grid view&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;enable-deferrable-mode-for-all-deferable-tasks-with-1-config-setting&#34;&gt;Enable deferrable mode for all deferrable tasks with 1 config setting&lt;/h2&gt;
&lt;p&gt;Airflow 2.7.0 comes with a new config option, &lt;code&gt;default_deferrable&lt;/code&gt;, which allows admins to enable deferrable mode for all deferrable tasks without requiring any DAG modifications. Simply set it in your config and enjoy async tasks!&lt;/p&gt;
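&lt;p&gt;One way to set this option without editing &lt;code&gt;airflow.cfg&lt;/code&gt; is through an environment variable. The sketch below assumes two things not stated in this post: that the option lives in the &lt;code&gt;[operators]&lt;/code&gt; section, and that Airflow reads overrides from variables named &lt;code&gt;AIRFLOW__{SECTION}__{KEY}&lt;/code&gt;. The helper &lt;code&gt;airflow_env_var&lt;/code&gt; is a hypothetical name used only for illustration.&lt;/p&gt;

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for a config option.

    Assumes the AIRFLOW__{SECTION}__{KEY} naming convention.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Assumption: default_deferrable belongs to the [operators] section.
name = airflow_env_var("operators", "default_deferrable")

# Set it in the environment of the process that will start Airflow.
os.environ[name] = "True"
print(name)
```

&lt;p&gt;The variable has to be present in the environment of every Airflow component that should pick up the setting.&lt;/p&gt;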
&lt;h2 id=&#34;openlineage-built-in-integration&#34;&gt;OpenLineage built-in integration&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://openlineage.io/&#34;&gt;OpenLineage&lt;/a&gt; provides a spec standardizing operational lineage collection and distribution across the data ecosystem that projects – open source or proprietary – implement.&lt;/p&gt;
&lt;p&gt;With 2.7.0, OpenLineage changes from a plugin implementation maintained in the OpenLineage project to a built-in feature of Airflow. As a plugin, OpenLineage depended on Airflow and operators’ internals, making it brittle. Built-in OpenLineage support in Airflow makes publishing operational lineage through the OpenLineage ecosystem easier and more reliable. It has been implemented by moving the &lt;a href=&#34;https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow&#34;&gt;openlineage-airflow&lt;/a&gt; package from the OpenLineage project to an &lt;code&gt;apache-airflow-providers-openlineage&lt;/code&gt; provider in the base Airflow Docker image, where it can be easily enabled by configuration. Also, lineage extraction logic that was included in &lt;a href=&#34;https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow/openlineage/airflow/extractors&#34;&gt;Extractors&lt;/a&gt; in that package has been moved into each corresponding provider package along with unit tests, eliminating the need for Extractors in most cases. For this purpose, a new optional API for Operators (&lt;code&gt;get_openlineage_facets_on_{start(), complete(ti), failure(ti)}&lt;/code&gt;, documented &lt;a href=&#34;https://openlineage.io/docs/integrations/airflow/default-extractors&#34;&gt;here&lt;/a&gt;) can be used. Having the extraction logic in each provider ensures the stability of the lineage contract in each operator and makes adding lineage coverage to custom operators easier.&lt;/p&gt;
&lt;h2 id=&#34;some-executors-moved-into-providers&#34;&gt;Some executors moved into providers&lt;/h2&gt;
&lt;p&gt;Some of the executors that were shipped in core Airflow have moved into their respective providers for Airflow 2.7.0. The great benefit of this is to allow faster bug-fix releases as providers are released independently from core.
The following executors have been moved and require certain minimum provider versions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In order to use Celery executors, install the &lt;a href=&#34;https://pypi.org/project/apache-airflow-providers-celery/&#34;&gt;celery provider version 3.3.0+&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In order to use the Kubernetes executor, install the &lt;a href=&#34;https://pypi.org/project/apache-airflow-providers-cncf-kubernetes/&#34;&gt;kubernetes provider version 7.4.0+&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;In order to use the Dask executor, install any version of the &lt;a href=&#34;https://pypi.org/project/apache-airflow-providers-daskexecutor/&#34;&gt;daskexecutor provider&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you use the official docker images, all of these providers come preinstalled.&lt;/p&gt;
&lt;h2 id=&#34;additional-new-features&#34;&gt;Additional new features&lt;/h2&gt;
&lt;p&gt;Here are just a few interesting new features, since there are too many to list in full:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pools can now consider tasks in the deferred state as running (#32709)&lt;/li&gt;
&lt;li&gt;chain_linear, like chain but allowing sequential tasks (#31927)&lt;/li&gt;
&lt;li&gt;Grid view now supports keyboard shortcuts! (#30950)&lt;/li&gt;
&lt;li&gt;Mark task groups as success or failed (#30478)&lt;/li&gt;
&lt;li&gt;Fail_stop, allowing all remaining and running tasks to be failed on the first failure in a DAG (#29406)&lt;/li&gt;
&lt;/ul&gt;
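&lt;p&gt;To illustrate the &lt;code&gt;chain_linear&lt;/code&gt; item in the list above, here is a plain-Python model of the dependency edges each helper would create. It is not Airflow’s implementation. It assumes, based on my reading of the Airflow docs, that &lt;code&gt;chain&lt;/code&gt; pairs two adjacent lists element by element, while &lt;code&gt;chain_linear&lt;/code&gt; links every task in one list to every task in the next. The function names &lt;code&gt;chain_edges&lt;/code&gt; and &lt;code&gt;chain_linear_edges&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
from itertools import product


def as_list(item):
    """Treat a single task name as a one-element list."""
    return list(item) if isinstance(item, (list, tuple)) else [item]


def chain_edges(*items):
    """Model of chain: adjacent lists are paired element by element."""
    edges = []
    for up, down in zip(items, items[1:]):
        ups, downs = as_list(up), as_list(down)
        both_lists = isinstance(up, (list, tuple)) and isinstance(down, (list, tuple))
        if both_lists:
            if len(ups) != len(downs):
                raise ValueError("adjacent lists must have the same length")
            edges.extend(zip(ups, downs))
        else:
            edges.extend(product(ups, downs))
    return edges


def chain_linear_edges(*items):
    """Model of chain_linear: every upstream task feeds every downstream task."""
    edges = []
    for up, down in zip(items, items[1:]):
        edges.extend(product(as_list(up), as_list(down)))
    return edges


paired = chain_edges("a", ["b", "c"], ["d", "e"], "f")
linear = chain_linear_edges("a", ["b", "c"], ["d", "e"], "f")
print(paired)
print(linear)
```

&lt;p&gt;In this model the only difference is between the two lists: the paired version yields b to d and c to e, while the linear version also adds b to e and c to d, so the second list starts only after the whole first list has finished.&lt;/p&gt;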
&lt;h2 id=&#34;contributors&#34;&gt;Contributors&lt;/h2&gt;
&lt;p&gt;Thanks to everyone who contributed to this release, including Akash Sharma, Amogh Desai, Brent Bovenzi, D. Ferruzzi, Daniel Standish, Ephraim Anierobi, Hussein Awala, Jarek Potiuk, Jed Cunningham, Karthikeyan Singaravelan, Maciej Obuchowski, Niko Oliveira, Pankaj Koti, Pankaj Singh, Pierre Jeambrun, Tzu-ping Chung, Utkarsh Sharma, Vincent Beck, and over 74 others!&lt;/p&gt;
&lt;p&gt;I’d especially like to thank our release manager, Ephraim, for getting this release out the door.&lt;/p&gt;
&lt;p&gt;I hope you enjoy using Apache Airflow 2.7.0!&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Introducing Setup and Teardown tasks</title>
<link>/blog/introducing_setup_teardown/</link>
<pubDate>Fri, 18 Aug 2023 00:00:00 +0000</pubDate>
<guid>/blog/introducing_setup_teardown/</guid>
<description>
&lt;p&gt;In data pipelines, commonly we need to create infrastructure resources, like a cluster or GPU nodes in an existing cluster, before doing the actual “work” and delete them after the work is done. Airflow 2.7 adds “setup” and “teardown” tasks to better support this type of pipeline. This blog post aims to highlight the key features so you know what’s possible. For full documentation on how to use setup and teardown tasks, see the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.7.0/howto/setup-and-teardown.html&#34;&gt;setup and teardown docs&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;why-setup-and-teardown&#34;&gt;Why setup and teardown?&lt;/h2&gt;
&lt;p&gt;Before we dig into examples, let me state at a high level what setup and teardown bring to the table.&lt;/p&gt;
&lt;h3 id=&#34;more-expressive-dependencies&#34;&gt;More expressive dependencies&lt;/h3&gt;
&lt;p&gt;Before setup and teardown, upstream and downstream relationships could only mean one thing: “this comes before that”. With setup and teardown, in effect we can say “this requires that”. And what it means in practice is, if you clear your task, and it requires a setup, that setup will be cleared too. And if that setup has a teardown, that will run again as well.&lt;/p&gt;
&lt;h3 id=&#34;separating-the-work-from-the-infra&#34;&gt;Separating the work from the infra&lt;/h3&gt;
&lt;p&gt;Sometimes the part of the dag you care about is not, say, the cleanup task. For example, suppose you have a dag that loads some data and then deletes temp files. As long as the data loads, you want your dag to be marked successful. By default, this is how teardown tasks work; that is, they are ignored when determining dag run state.&lt;/p&gt;
&lt;h2 id=&#34;simple-case&#34;&gt;Simple case&lt;/h2&gt;
&lt;p&gt;A simple example is one setup / teardown pair, and one normal or “work” task.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;simple.png&#34; alt=&#34;Simple setup and teardown example&#34;&gt;&lt;/p&gt;
&lt;p&gt;Setups and teardowns are indicated by the up and down arrows, respectively. From that we can see that &lt;code&gt;create_cluster&lt;/code&gt; is a setup task and &lt;code&gt;delete_cluster&lt;/code&gt; is a teardown. The link between a setup and a teardown is always dotted to highlight the special relationship.&lt;/p&gt;
&lt;p&gt;Some things to observe:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;create_cluster&lt;/code&gt; fails, neither &lt;code&gt;run_query&lt;/code&gt; nor &lt;code&gt;delete_cluster&lt;/code&gt; will run.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;create_cluster&lt;/code&gt; succeeds and &lt;code&gt;run_query&lt;/code&gt; fails, then &lt;code&gt;delete_cluster&lt;/code&gt; will still run.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;create_cluster&lt;/code&gt; is skipped, &lt;code&gt;run_query&lt;/code&gt; and &lt;code&gt;delete_cluster&lt;/code&gt; will be skipped.&lt;/li&gt;
&lt;li&gt;By default, if &lt;code&gt;run_query&lt;/code&gt; succeeds, and &lt;code&gt;delete_cluster&lt;/code&gt; fails, then the dag run will still be marked successful. (This behavior can be overridden).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;authoring-with-task-groups&#34;&gt;Authoring with task groups&lt;/h2&gt;
&lt;p&gt;When we set something downstream of a task group, any teardowns in the task group are ignored. This reflects the assumption that in general, we probably don’t want to stop dag execution just because a teardown fails. So, let’s wrap the above dag in a task group and see what happens:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;task-group-arrow.png&#34; alt=&#34;Setup and teardown in task groups&#34;&gt;&lt;/p&gt;
&lt;p&gt;And here’s how we linked those groups in the code:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TaskGroup&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;do_emr&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;do_emr&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;create_cluster_task&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;create_cluster&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;run_query&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;create_cluster_task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;delete_cluster&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;create_cluster_task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TaskGroup&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;load&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;create_config_task&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;create_configuration&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;load_data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;create_config_task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;delete_configuration&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;create_config_task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;do_emr&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In this code, each group has a teardown, and we just arrow the first group to the second. As advertised, &lt;code&gt;delete_cluster&lt;/code&gt;, a teardown task, is ignored. This has two important consequences: one, even if it fails, the &lt;code&gt;load&lt;/code&gt; group will still run; and two, &lt;code&gt;delete_cluster&lt;/code&gt; and &lt;code&gt;create_configuration&lt;/code&gt; can run in parallel (generally speaking, we’d imagine you don’t want to wait for teardown operations to complete before continuing onto other tasks in the dag). Of course you can override this behavior by adding an arrow between &lt;code&gt;delete_cluster&lt;/code&gt; and &lt;code&gt;create_configuration&lt;/code&gt;. Further, the success of this dag will depend only on whether the &lt;code&gt;load_data&lt;/code&gt; task completes successfully.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;There’s a lot of detail we’re omitting here about exactly how to write dags with setup and teardown tasks, and for that please head over to the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.7.0/howto/setup-and-teardown.html&#34;&gt;setup and teardown docs&lt;/a&gt;. But hopefully this post gives you enough of an idea of what is possible with setup and teardown tasks that you can begin to see where they can improve your data pipelines in Airflow.&lt;/p&gt;
&lt;p&gt;Curious to know what else is new in Airflow 2.7? Head over to the main &lt;a href=&#34;/blog/airflow-2.7.0/&#34;&gt;Airflow 2.7 blog post&lt;/a&gt; to find out!&lt;/p&gt;
&lt;h2 id=&#34;acknowledgements&#34;&gt;Acknowledgements&lt;/h2&gt;
&lt;p&gt;Setup and Teardown was the product of AIP-52. Thanks to everyone who contributed to it, including those that read and voted on the AIP. Special thanks to Ash Berlin-Taylor, Brent Bovenzi, Daniel Standish, Ephraim Anierobi, Jed Cunningham, Rahul Vats, and Vikram Koka.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: what&#39;s new in Apache Airflow 2.6.0</title>
<link>/blog/airflow-2.6.0/</link>
<pubDate>Sun, 30 Apr 2023 00:00:00 +0000</pubDate>
<guid>/blog/airflow-2.6.0/</guid>
<description>
&lt;p&gt;I am excited to announce that Apache Airflow 2.6.0 has been released, bringing many minor features and improvements to the community.&lt;/p&gt;
&lt;p&gt;Apache Airflow 2.6.0 contains over 500 commits, which include 42 new features, 58 improvements, 38 bug fixes, and 17 documentation changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;📦 PyPI: &lt;a href=&#34;https://pypi.org/project/apache-airflow/2.6.0/&#34;&gt;https://pypi.org/project/apache-airflow/2.6.0/&lt;/a&gt; &lt;br&gt;
📚 Docs: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.6.0/&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.6.0/&lt;/a&gt; &lt;br&gt;
🛠 Release Notes: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.6.0/release_notes.html&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.6.0/release_notes.html&lt;/a&gt; &lt;br&gt;
🐳 Docker Image: &amp;ldquo;docker pull apache/airflow:2.6.0&amp;rdquo; &lt;br&gt;
🚏 Constraints: &lt;a href=&#34;https://github.com/apache/airflow/tree/constraints-2.6.0&#34;&gt;https://github.com/apache/airflow/tree/constraints-2.6.0&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As the changelog is quite large, the following are some notable new features that shipped in this release.&lt;/p&gt;
&lt;h2 id=&#34;trigger-logs-can-now-be-viewed-in-webserver&#34;&gt;Trigger logs can now be viewed in webserver&lt;/h2&gt;
&lt;p&gt;Trigger logs have now been added to task logs. They appear right alongside the rest of the logs from your task.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;trigger_logging.png&#34; alt=&#34;Trigger logs shown in task log&#34;&gt;&lt;/p&gt;
&lt;p&gt;Adding this feature required changes across the entire Airflow logging stack, so be sure to update your providers if you are using remote logging.&lt;/p&gt;
&lt;h2 id=&#34;grid-view-improvements&#34;&gt;Grid view improvements&lt;/h2&gt;
&lt;p&gt;The grid view has received a number of minor improvements in this release.&lt;/p&gt;
&lt;p&gt;Most notably, there is now a graph tab in the grid view. This offers a more integrated graph representation of the DAG, where choosing a task in either the grid or graph will highlight the same task in both views.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;graph.png&#34; alt=&#34;The new graph view&#34;&gt;&lt;/p&gt;
&lt;p&gt;You can also filter upstream and downstream from a single task. For example, in the screenshot above, &lt;code&gt;describe_integrity&lt;/code&gt; is the selected task. If you choose to filter downstream, this is the result:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;filter_downstream.png&#34; alt=&#34;The new graph view can be filtered to show downstream tasks only&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;trigger-ui-based-on-dag-level-params&#34;&gt;Trigger UI based on DAG level params&lt;/h2&gt;
&lt;p&gt;A user-friendly form is now shown to users triggering runs for DAGs with DAG level params.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;trigger_dag_form.png&#34; alt=&#34;Form shown for params in UI when triggering a DAG&#34;&gt;&lt;/p&gt;
&lt;p&gt;See the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.6.0/core-concepts/params.html#use-params-to-provide-a-trigger-ui-form&#34;&gt;Params docs&lt;/a&gt; for more details.&lt;/p&gt;
&lt;h2 id=&#34;consolidation-of-handling-stuck-queued-tasks&#34;&gt;Consolidation of handling stuck queued tasks&lt;/h2&gt;
&lt;p&gt;Airflow now has a single configuration, &lt;code&gt;[scheduler] task_queued_timeout&lt;/code&gt;, to handle tasks that get stuck in queued for too long. With a simpler implementation than the outgoing code handling these tasks, tasks stuck in queued will no longer slip through the cracks and stay stuck.&lt;/p&gt;
&lt;p&gt;For more details, see the &lt;a href=&#34;https://medium.com/apache-airflow/unsticking-airflow-stuck-queued-tasks-are-no-more-in-2-6-0-6f40a1a22835&#34;&gt;Unsticking Airflow: Stuck Queued Tasks are No More in 2.6.0&lt;/a&gt; Medium post.&lt;/p&gt;
&lt;h2 id=&#34;cluster-policy-hooks-can-come-from-plugins&#34;&gt;Cluster Policy hooks can come from plugins&lt;/h2&gt;
&lt;p&gt;Cluster policy hooks (e.g. &lt;code&gt;dag_policy&lt;/code&gt;) can now come from Airflow plugins in addition to Airflow local settings. Allowing multiple hooks to be defined makes it easier for more than one team to run hooks in a single Airflow instance.&lt;/p&gt;
&lt;p&gt;See the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.6.0/administration-and-deployment/cluster-policies.html&#34;&gt;cluster policy docs&lt;/a&gt; for more details.&lt;/p&gt;
&lt;h2 id=&#34;notification-support-added&#34;&gt;Notification support added&lt;/h2&gt;
&lt;p&gt;The notifications framework allows you to send messages to external systems when a task instance/DAG run changes state. For example, you can easily post a message to Slack:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
    &lt;span class=&#34;s2&#34;&gt;&amp;#34;slack_notifier_example&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;start_date&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;datetime&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;2023&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;on_success_callback&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;send_slack_notification&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
            &lt;span class=&#34;n&#34;&gt;text&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;The DAG {{ dag.dag_id }} succeeded&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
            &lt;span class=&#34;n&#34;&gt;channel&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;#general&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
            &lt;span class=&#34;n&#34;&gt;username&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Airflow&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
        &lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
    &lt;span class=&#34;p&#34;&gt;],&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As of today, Slack is the only system supported out of the box. However, watch this space as more integrations will be added soon.&lt;/p&gt;
&lt;p&gt;You can also create notifiers for your own use; refer to the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.6.0/howto/notifications.html&#34;&gt;notifier how-to docs&lt;/a&gt; for more details.&lt;/p&gt;
&lt;h2 id=&#34;thanks-to-the-contributors&#34;&gt;Thanks to the contributors&lt;/h2&gt;
&lt;p&gt;Thanks to everyone who contributed to this release, including Andrey Anshin, Ash Berlin-Taylor, Brent Bovenzi, Daniel Standish, Ephraim Anierobi, Hussein Awala, Jarek Potiuk, Jed Cunningham, Josh Fell, Michael Petro, Niko Oliveira, Pierre Jeambrun, Tzu-ping Chung, Victor Chiapaikeo, and over 120 others!&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d especially like to thank our release manager, Ephraim, for getting this release out the door.&lt;/p&gt;
&lt;p&gt;I hope you enjoy using Apache Airflow 2.6.0!&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Apache Airflow 2.5.0: Tick-Tock</title>
<link>/blog/airflow-2.5.0/</link>
<pubDate>Fri, 02 Dec 2022 00:00:00 +0000</pubDate>
<guid>/blog/airflow-2.5.0/</guid>
<description>
&lt;p&gt;Apache Airflow 2.5 has just been released, barely two and a half months after 2.4!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;📦 PyPI: &lt;a href=&#34;https://pypi.org/project/apache-airflow/2.5.0/&#34;&gt;https://pypi.org/project/apache-airflow/2.5.0/&lt;/a&gt; &lt;br&gt;
📚 Docs: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.5.0/&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.5.0/&lt;/a&gt; &lt;br&gt;
🛠️ Release Notes: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.5.0/release_notes.html&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.5.0/release_notes.html&lt;/a&gt; &lt;br&gt;
🐳 Docker Image: docker pull apache/airflow:2.5.0 &lt;br&gt;
🚏 Constraints: &lt;a href=&#34;https://github.com/apache/airflow/tree/constraints-2.5.0&#34;&gt;https://github.com/apache/airflow/tree/constraints-2.5.0&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This quicker release cadence is a departure from our previous habit of releasing every five-to-seven months and was a deliberate effort to listen to you, our users, and get the changes and improvements into your workflows earlier.&lt;/p&gt;
&lt;h2 id=&#34;usability-improvements-to-the-datasets-ui&#34;&gt;Usability improvements to the Datasets UI&lt;/h2&gt;
&lt;p&gt;When we released Dataset aware scheduling in September we knew that the tools we gave to manage the Datasets were very much a Minimum Viable Product, and in the last two months the committers and contributors have been hard at work at making the UI much more usable when it comes to Datasets.&lt;/p&gt;
&lt;p&gt;But we aren&amp;rsquo;t done yet - keep an eye out for more improvements coming over the next couple of releases too.&lt;/p&gt;
&lt;h2 id=&#34;greatly-improved-airflow-dags-test-command&#34;&gt;Greatly improved &lt;code&gt;airflow dags test&lt;/code&gt; command&lt;/h2&gt;
&lt;p&gt;This airflow subcommand has been rethought and re-optimized to make it much easier to test your DAGs locally - the major changes are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Task logs are visible right there in the console, instead of hidden away inside the task log files.&lt;/li&gt;
&lt;li&gt;It is about an order of magnitude quicker to run the tasks than before (i.e. it gets to running the task code much sooner).&lt;/li&gt;
&lt;li&gt;Everything runs in one process, so you can put a breakpoint in your IDE, configure it to run &lt;code&gt;airflow dags test &amp;lt;mydag&amp;gt;&lt;/code&gt;, and then debug your code!&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;auto-tailing-task-logs-in-the-grid-view&#34;&gt;Auto tailing task logs in the Grid view&lt;/h2&gt;
&lt;p&gt;Hopefully the headline says enough. It&amp;rsquo;s lovely, go check it out.&lt;/p&gt;
&lt;h2 id=&#34;more-improvments-to-dynamic-task-mapping&#34;&gt;More improvements to Dynamic Task Mapping&lt;/h2&gt;
&lt;p&gt;In a similar vein to the improvements to the Dataset (UI), we have continued to iterate on and improve the feature we first added in Airflow 2.3, Dynamic Task Mapping, and 2.5 includes &lt;a href=&#34;https://github.com/apache/airflow/pulls?q=is%3Apr+author%3Auranusjr+is%3Aclosed+milestone%3A%22Airflow+2.5.0%22&#34;&gt;dozens of improvements&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;thanks-to-the-contributors&#34;&gt;Thanks to the contributors&lt;/h2&gt;
&lt;p&gt;Andrey Anshin, Ash Berlin-Taylor, blag, Bolke de Bruin, Brent Bovenzi, Chenglong Yan, Daniel Standish, Dov Benyomin Sohacheski, Elad Kalif, Ephraim Anierobi, Jarek Potiuk, Jed Cunningham, Jorrick Sleijster, Michael Petro, Niko, Pierre Jeambrun, Tzu-ping Chung and many more, over 75 of you. Thank you!&lt;/p&gt;
&lt;p&gt;And a special thank you to Ephraim who tirelessly worked behind the scenes as release manager!&lt;/p&gt;
&lt;p&gt;A much shorter change log than 2.4, but I think you&amp;rsquo;ll agree, some great changes.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Apache Airflow 2.4.0: That Data Aware Release</title>
<link>/blog/airflow-2.4.0/</link>
<pubDate>Mon, 19 Sep 2022 00:00:00 +0000</pubDate>
<guid>/blog/airflow-2.4.0/</guid>
<description>
&lt;p&gt;Apache Airflow 2.4.0 contains over 650 &amp;ldquo;user-facing&amp;rdquo; commits (excluding commits to providers or chart) and over 870 total. That includes 46 new features, 39 improvements, 52 bug fixes, and several documentation changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;📦 PyPI: &lt;a href=&#34;https://pypi.org/project/apache-airflow/2.4.0/&#34;&gt;https://pypi.org/project/apache-airflow/2.4.0/&lt;/a&gt; &lt;br&gt;
📚 Docs: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.4.0/&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.4.0/&lt;/a&gt; &lt;br&gt;
🛠️ Release Notes: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html&lt;/a&gt; &lt;br&gt;
🐳 Docker Image: docker pull apache/airflow:2.4.0 &lt;br&gt;
🚏 Constraints: &lt;a href=&#34;https://github.com/apache/airflow/tree/constraints-2.4.0&#34;&gt;https://github.com/apache/airflow/tree/constraints-2.4.0&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;data-aware-scheduling-aip-48&#34;&gt;Data-aware scheduling (AIP-48)&lt;/h2&gt;
&lt;p&gt;This one is big. Airflow now has the ability to schedule DAGs based on other tasks updating datasets.&lt;/p&gt;
&lt;p&gt;What does this mean, exactly? This is a great new feature that lets DAG authors create smaller, more self-contained DAGs, which chain together into a larger data-based workflow. If you are currently using &lt;code&gt;ExternalTaskSensor&lt;/code&gt; or &lt;code&gt;TriggerDagRunOperator&lt;/code&gt; you should take a look at datasets &amp;ndash; in most cases you can replace them with something that will speed up the scheduling!&lt;/p&gt;
&lt;p&gt;But enough talking, let&amp;rsquo;s have a short example. First let&amp;rsquo;s write a simple DAG with a task called &lt;code&gt;my_task&lt;/code&gt; that produces a dataset called &lt;code&gt;my-dataset&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Dataset&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.decorators&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;task&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;uri&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;my-dataset&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dag_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;producer&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;nd&#34;&gt;@task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;outlets&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;my_task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
        &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;my_task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Datasets are defined by a URI. Now, we can create a second DAG (&lt;code&gt;dataset-consumer&lt;/code&gt;) that gets scheduled whenever this dataset changes:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Dataset&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;uri&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;my-dataset&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dag_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;dataset-consumer&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;schedule&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]):&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With these two DAGs, the instant &lt;code&gt;my_task&lt;/code&gt; finishes, Airflow will create the DAG run for the &lt;code&gt;dataset-consumer&lt;/code&gt; workflow.&lt;/p&gt;
&lt;p&gt;We know that what exists right now won&amp;rsquo;t fit all use cases that people might wish for datasets, and in the coming minor releases (2.5, 2.6, etc.) we will expand and improve upon this foundation.&lt;/p&gt;
&lt;p&gt;Datasets represent the abstract concept of a dataset, and (for now) do not have any direct read or write capability - in this release we are adding the foundational feature that we will build upon in the future - and it&amp;rsquo;s part of our goal to have smaller releases to get new features in your hands sooner!&lt;/p&gt;
&lt;p&gt;For more information on datasets, see the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.4.0/concepts/datasets.html&#34;&gt;documentation on Data-aware scheduling&lt;/a&gt;. That includes details on how datasets are identified (URIs), how you can depend on multiple datasets, and how to think about what a dataset is (hint: don&amp;rsquo;t include &amp;ldquo;date partitions&amp;rdquo; in a dataset, it&amp;rsquo;s higher level than that).&lt;/p&gt;
&lt;h2 id=&#34;easier-management-of-conflicting-python-dependencies-using-the-new-externalpythonoperator&#34;&gt;Easier management of conflicting python dependencies using the new ExternalPythonOperator&lt;/h2&gt;
&lt;p&gt;As much as we wish all Python libraries could be used happily together, that sadly isn&amp;rsquo;t the world we live in, and sometimes there are conflicts when trying to install multiple Python libraries in an Airflow install &amp;ndash; right now we hear this a lot with &lt;code&gt;dbt-core&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To make this easier we have introduced &lt;code&gt;@task.external_python&lt;/code&gt; (and the matching &lt;code&gt;ExternalPythonOperator&lt;/code&gt;) that lets you run a Python function as an Airflow task in a pre-configured virtual env, or even a whole different Python version. For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;nd&#34;&gt;@task&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;external_python&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;python&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;/opt/venvs/task_deps/bin/python&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;my_task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data_interval_start&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;data_interval_end&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;Looking at data between &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data_interval_start&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; and &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data_interval_end&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;There are a few subtleties as to what you need installed in the virtual env depending on which context variables you access, so be sure to read the &lt;a href=&#34;http://airflow.apache.org/docs/apache-airflow/2.4.0/howto/operator/python.html#externalpythonoperator&#34;&gt;how-to on using the ExternalPythonOperator&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;more-improvements-to-dynamic-task-mapping-aip-42&#34;&gt;More improvements to Dynamic Task Mapping (AIP-42)&lt;/h2&gt;
&lt;p&gt;You asked, we listened. Dynamic task mapping now includes support for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;expand_kwargs&lt;/code&gt;: To assign multiple parameters to a non-TaskFlow operator.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;zip&lt;/code&gt;: To combine multiple things without cross-product.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;map&lt;/code&gt;: To transform the parameters just before the task is run.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information on dynamic task mapping, see the new sections of the doc on &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.4.0/concepts/dynamic-task-mapping.html#transforming-mapped-data&#34;&gt;Transforming Mapped Data&lt;/a&gt;, &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.4.0/concepts/dynamic-task-mapping.html#combining-upstream-data-aka-zipping&#34;&gt;Combining upstream data (aka &amp;ldquo;zipping&amp;rdquo;)&lt;/a&gt;, and &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.4.0/concepts/dynamic-task-mapping.html#assigning-multiple-parameters-to-a-non-taskflow-operator&#34;&gt;Assigning multiple parameters to a non-TaskFlow operator&lt;/a&gt;.&lt;/p&gt;
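&lt;p&gt;As a rough plain-Python analogy for the three additions above (this sketch does not use Airflow, and the file names, table names, and &lt;code&gt;load&lt;/code&gt; helper are hypothetical), the following contrasts a cross-product with &lt;code&gt;zip&lt;/code&gt;-style pairing, shows how a list of keyword-argument dicts yields one call per dict in the spirit of &lt;code&gt;expand_kwargs&lt;/code&gt;, and applies a &lt;code&gt;map&lt;/code&gt;-style transformation to each element:&lt;/p&gt;

```python
from itertools import product

# Hypothetical inputs, for illustration only.
files = ["a.csv", "b.csv"]
tables = ["table_a", "table_b"]

# Expanding over two arguments independently gives every combination.
cross = list(product(files, tables))

# zip-style pairing matches items by position instead of combining them all.
zipped = list(zip(files, tables))

# expand_kwargs-style: each dict carries all keyword arguments for one call.
kwargs_list = [{"path": f, "table": t} for f, t in zipped]


def load(path, table):
    """Hypothetical stand-in for the work a single mapped task would do."""
    return f"{path} into {table}"


results = [load(**kw) for kw in kwargs_list]

# map-style: transform each element just before it is consumed.
upper = [name.upper() for name in tables]

print(len(cross), len(zipped))
print(results)
print(upper)
```

&lt;p&gt;In Airflow itself each element would become one mapped task instance rather than one list entry; see the linked documentation for the actual API.&lt;/p&gt;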
&lt;h2 id=&#34;auto-register-dags-used-in-a-context-manager-no-more-as-dag-needed&#34;&gt;Auto-register DAGs used in a context manager (no more &lt;code&gt;as dag:&lt;/code&gt; needed)&lt;/h2&gt;
&lt;p&gt;This one is a small quality of life improvement, and I don&amp;rsquo;t want to admit how many times I forgot the &lt;code&gt;as dag:&lt;/code&gt;, or worse, had &lt;code&gt;as dag:&lt;/code&gt; repeated.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;
&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dag_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;example&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dag&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;span class=&#34;nd&#34;&gt;@dag&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;dag_maker&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;dag2&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dag_maker&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;can become&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;
&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dag_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;example&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;span class=&#34;nd&#34;&gt;@dag&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;my_dag&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;my_dag&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you want to disable the behaviour for any reason, set &lt;code&gt;auto_register=False&lt;/code&gt; on the DAG:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# This dag will not be picked up by Airflow as it&amp;#39;s not assigned to a variable&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dag_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;example&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;auto_register&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;additional-improvements&#34;&gt;Additional improvements&lt;/h2&gt;
&lt;p&gt;With over 650 commits the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html#airflow-2-4-0-2022-09-19&#34;&gt;full list of features, fixes and changes&lt;/a&gt; is too big to go into here (check out the release notes for a full list), but some noteworthy or interesting small features include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Auto-refresh on the home page&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;@task.short_circuit&lt;/code&gt; TaskFlow decorator&lt;/li&gt;
&lt;li&gt;Add roles delete command to cli&lt;/li&gt;
&lt;li&gt;Add support for &lt;code&gt;TaskGroup&lt;/code&gt; in &lt;code&gt;ExternalTaskSensor&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;@task.kubernetes&lt;/code&gt; taskflow decorator&lt;/li&gt;
&lt;li&gt;Add experimental &lt;code&gt;parsing_context&lt;/code&gt; to enable optimization of Dynamic DAG handling in workers&lt;/li&gt;
&lt;li&gt;Consolidate to one &lt;code&gt;schedule&lt;/code&gt; param&lt;/li&gt;
&lt;li&gt;Allow showing non-sensitive config values in Admin -&amp;gt; Configuration (rather than all or nothing)&lt;/li&gt;
&lt;li&gt;Operator name separate from class (no more &lt;code&gt;_PythonDecoratedOperator&lt;/code&gt; when using TaskFlow)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;contributors&#34;&gt;Contributors&lt;/h2&gt;
&lt;p&gt;Thanks to everyone who contributed to this release, including Andrey Anshin, Ash Berlin-Taylor, Bartłomiej Hirsz, Brent Bovenzi, Chenglong Yan, D. Ferruzzi, Daniel Standish, Drew Hubl, Elad Kalif, Ephraim Anierobi, Jarek Potiuk, Jed Cunningham, Josh Fell, Mark Norman Francis, Niko, Tzu-ping Chung, Vincent, Wojciech Januszek, chethanuk-plutoflume, pierrejeambrun, and everyone else who committed, all 152 of you! You are what makes Airflow the successful project that it is!&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Airflow Survey 2022</title>
<link>/blog/airflow-survey-2022/</link>
<pubDate>Fri, 17 Jun 2022 00:00:00 +0000</pubDate>
<guid>/blog/airflow-survey-2022/</guid>
<description>
&lt;h1 id=&#34;airflow-user-survey-2022&#34;&gt;Airflow User Survey 2022&lt;/h1&gt;
&lt;p&gt;This year’s survey has come and gone, and with it we’ve got a new batch of data for everyone! We collected 210 responses over two weeks. We continue to see growth in both contributions and downloads over the last two years, and expect that trend will continue through 2022.&lt;/p&gt;
&lt;p&gt;The raw response data will be made available here soon; in the meantime, feel free to email &lt;a href=&#34;mailto:john.thomas@astronomer.io&#34;&gt;john.thomas@astronomer.io&lt;/a&gt; for a copy.&lt;/p&gt;
&lt;h2 id=&#34;tldr&#34;&gt;TL;DR&lt;/h2&gt;
&lt;h3 id=&#34;overview-of-the-user&#34;&gt;Overview of the user&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;As in previous years, more than half of the Airflow users are Data Engineers (54%). Solutions Architects (13%), Developers (12%), DevOps (6%) and Data Scientists (4%) are also active Airflow users! There was a slight increase in the representation of Solutions Architect roles compared to results from &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt; and &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;2019&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Airflow is popular in bigger companies: 64% of Airflow users work for companies with 200+ employees, which is an 11 percent increase compared to &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;62% of the survey participants have more than 6 Airflow users in their company.&lt;/li&gt;
&lt;li&gt;More Airflow users (65.9%) are willing to recommend Apache Airflow compared to the survey results in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt; and &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;2019&lt;/a&gt;. There is a general positive trend in willingness to recommend Airflow: 93% of surveyed Airflow users are willing to recommend Airflow (85.7% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;2019&lt;/a&gt; and 92% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;), and only 1% of users are not likely to recommend it (3.6% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;2019&lt;/a&gt; and 3.5% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Airflow documentation is a critical source of information, with more than 90% of survey participants using the documentation (a 15% increase compared to results from &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;). Airflow documentation is also one of the top areas to improve! Interestingly, Stack Overflow is also an important source of information, with about 60% of users reporting that they use it (a 24% increase compared to results from &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;deployments&#34;&gt;Deployments&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;85% of the Airflow users have between 1 and 7 active Airflow instances. 62.5% of the Airflow users have between 11 and 250 DAGs in their largest Airflow instance. 75% of the surveyed Airflow users have between 1 and 100 tasks per DAG.&lt;/li&gt;
&lt;li&gt;Close to 85% of users use one of the Airflow 2 versions, 9.2% of users still use 1.10.15, while the remaining 6.3% are still using older Airflow 1 versions. The good news is that the majority of users on Airflow 1 are planning migration to Airflow 2 quite soon, with resources and capacity being the main blockers.&lt;/li&gt;
&lt;li&gt;In comparison to results from &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;, more users were interested in monitoring in general and specifically in using tools such as external monitoring services (40.7%, up from 29.6%) and information from metabase (35.7%, up from 25.1%).&lt;/li&gt;
&lt;li&gt;Celery (52.7%) and Kubernetes (39.4%) are the most common executors used.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;usage&#34;&gt;Usage&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;81.3% of Airflow users who responded to the survey don’t have any customisation of Airflow.&lt;/li&gt;
&lt;li&gt;XCom (69.8%) is the most popular method to pass inputs and outputs between tasks; however, saving and retrieving inputs and outputs from storage still plays an important role (49%).&lt;/li&gt;
&lt;li&gt;Lineage is still quite a new topic for Airflow users: most of them don’t use lineage solutions but might be interested if supported by Airflow (47.5%), while others are not familiar with data lineage (29%) or say that data lineage is not their concern (13%).&lt;/li&gt;
&lt;li&gt;The Airflow web UI is used heavily for Monitoring Runs (95.9%), Accessing Task Logs (89.8%), Manually triggering DAGs (85.2%), Clearing Tasks (82.7%) and Marking Tasks as successful (60.7%). The top 3 views used are: List of DAGs, Task Logs and DAG Runs, which is very similar to results from &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt; and &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;2019&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;community-and-contribution&#34;&gt;Community and contribution&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Most Airflow users (57.1%) are aware they could contribute but do not, and an additional 21.7% contribute very rarely. 14.8% of users were not aware they could contribute. There is much more to be done to engage our community to be more active contributors and raise the current 6.4% of users who actively contribute, especially considering that one important blocker for contribution is lack of knowledge on how to start (37.7%).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;the-future-of-airflow&#34;&gt;The future of Airflow&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;The top area for improvement is still the Airflow web UI (49.5%), closely followed by more telemetry for logging, monitoring and alerting purposes (48%). However, all those efforts should go hand in hand with improved documentation (36.6%) and resources about using Airflow, especially when we take into account the need to onboard new users (36.6%).&lt;/li&gt;
&lt;li&gt;DAG Versioning (66.2%) is the winner for new features in Airflow, and it’s not a surprise, as this feature may positively impact the daily work of Airflow users. It is followed by three other ideas: Dependency management and Data-driven scheduling (42.6%), More dynamic task structure (42.1%) and Multi-Tenancy (37.9%).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;overview-of-the-user-1&#34;&gt;Overview of the user&lt;/h2&gt;
&lt;h3 id=&#34;what-best-describes-your-current-occupation-single-choice&#34;&gt;What best describes your current occupation? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image1.png&#34; alt=&#34;alt_text&#34; title=&#34;user_occupations&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Engineer&lt;/td&gt;
&lt;td&gt;114&lt;/td&gt;
&lt;td&gt;54%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Solutions Architect&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;13%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DevOps&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Scientist&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support Engineer&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Analyst&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business Analyst&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;According to the survey, more than half of Airflow users are Data Engineers (54%). Roles of the remaining Airflow users might be broken down into Solutions Architects (13%), Developers (12%), DevOps (6%) and Data Scientists (4%). The 2022 results are similar to &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;those from 2019&lt;/a&gt; and &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt; with a slight increase in the representation of Solutions Architect roles.&lt;/p&gt;
&lt;h3 id=&#34;how-often-do-you-interact-with-airflow-single-choice&#34;&gt;How often do you interact with Airflow? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image2.png&#34; alt=&#34;alt_text&#34; title=&#34;interaction_frequency&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Every day&lt;/td&gt;
&lt;td&gt;154&lt;/td&gt;
&lt;td&gt;73%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;At least once per week&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;17%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;At least once per month&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Less than once per month&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Users who took the survey are actively using Airflow as part of their current role. 73% of Airflow users who responded use it on a daily basis, 17% weekly.&lt;/p&gt;
&lt;h3 id=&#34;how-many-people-work-at-your-company-single-choice&#34;&gt;How many people work at your company? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image3.png&#34; alt=&#34;alt_text&#34; title=&#34;company_size&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;201-5000&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;41%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5000+&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;23%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;51-200&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;22%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11-50&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-10&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Airflow is popular in bigger companies: 64% of Airflow users who responded (compared to 52.7% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;) work for companies with more than 200 employees (41% in companies of size 201-5000 and 23% in companies of size 5000+).&lt;/p&gt;
&lt;h3 id=&#34;how-many-people-at-your-company-use-airflow-single-choice&#34;&gt;How many people at your company use Airflow? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image4.png&#34; alt=&#34;alt_text&#34; title=&#34;airflow_usage&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6-20&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;38%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-5&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;td&gt;29%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;51-200&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;24%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;200+&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Airflow is generally used by small to medium-sized teams. 62% of the survey participants have either 6 to 20 or 51 to 200 Airflow users in their company (38% have between 6 and 20 users, 24% between 51 and 200 users).&lt;/p&gt;
&lt;h3 id=&#34;how-likely-are-you-to-recommend-apache-airflow-single-choice&#34;&gt;How likely are you to recommend Apache Airflow? (single choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;% 2019&lt;/td&gt;
&lt;td&gt;% 2020&lt;/td&gt;
&lt;td&gt;% 2022&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very Likely&lt;/td&gt;
&lt;td&gt;45.4%&lt;/td&gt;
&lt;td&gt;61.6%&lt;/td&gt;
&lt;td&gt;65.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Likely&lt;/td&gt;
&lt;td&gt;40.3%&lt;/td&gt;
&lt;td&gt;30.4%&lt;/td&gt;
&lt;td&gt;26.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Neutral&lt;/td&gt;
&lt;td&gt;10.7%&lt;/td&gt;
&lt;td&gt;5.4%&lt;/td&gt;
&lt;td&gt;6.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unlikely&lt;/td&gt;
&lt;td&gt;2.6%&lt;/td&gt;
&lt;td&gt;1.5%&lt;/td&gt;
&lt;td&gt;0.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very Unlikely&lt;/td&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;td&gt;0.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;According to the survey, more Airflow users (65.9%) are very likely to recommend Apache Airflow compared to the survey results in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt; and &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;2019&lt;/a&gt;. There is a generally positive trend in willingness to recommend Airflow: 93% of surveyed Airflow users are likely or very likely to recommend it (92% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt; and 85.7% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;2019&lt;/a&gt;), and only 1% of users are not likely to recommend it (3.6% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;2019&lt;/a&gt; and 2.5% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;).&lt;/p&gt;
&lt;h3 id=&#34;what-is-your-source-of-information-about-airflow-multiple-choice&#34;&gt;What is your source of information about Airflow? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documentation&lt;/td&gt;
&lt;td&gt;189&lt;/td&gt;
&lt;td&gt;90.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Airflow website (Blog, etc.)&lt;/td&gt;
&lt;td&gt;142&lt;/td&gt;
&lt;td&gt;67.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stack Overflow&lt;/td&gt;
&lt;td&gt;126&lt;/td&gt;
&lt;td&gt;60.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Github Issues&lt;/td&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;td&gt;49.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slack&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;45.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Airflow Summit Videos&lt;/td&gt;
&lt;td&gt;88&lt;/td&gt;
&lt;td&gt;42.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Discussions&lt;/td&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;36.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Airflow Community Webinars&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;19.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Astronomer Registry&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;24.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Airflow Mailing List&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;16.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Airflow documentation is a critical source of information, with more than 90% of survey participants using the documentation. It is of increasing importance compared to results from &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;, when documentation was at about the 75% level. Moreover, more than 60% of users are getting information from the Airflow website (67.9%) and Stack Overflow (60.3%), which is also a big increase compared to the 36% level in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;. What’s interesting is that Slack usage decreased from 63.05% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt; to 45.9% in 2022.&lt;/p&gt;
&lt;h2 id=&#34;deployments-1&#34;&gt;Deployments&lt;/h2&gt;
&lt;h3 id=&#34;how-many-active-dags-do-you-have-in-your-largest-airflow-instance-single-choice&#34;&gt;How many active DAGs do you have in your largest Airflow instance? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image5.png&#34; alt=&#34;alt_text&#34; title=&#34;active_dags&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;51-250&lt;/td&gt;
&lt;td&gt;66&lt;/td&gt;
&lt;td&gt;31.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11-50&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;30.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5-10&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;12.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;251-500&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;9.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;lt;5&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;6.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1000+&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;501-1000&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;4.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;62.5% of the Airflow users surveyed have between 11 and 250 DAGs in their largest Airflow instance.&lt;/p&gt;
&lt;h3 id=&#34;how-many-active-airflow-instances-do-you-have-single-choice&#34;&gt;How many active Airflow instances do you have? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image6.png&#34; alt=&#34;alt_text&#34; title=&#34;image_tooltip&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;25.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;22.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4-7&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;19.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;18.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20+&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;9.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8-10&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;3.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11-20&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;85% of the Airflow users surveyed have between 1 and 7 active Airflow instances, and nearly 50% have only 1 or 2.&lt;/p&gt;
&lt;h3 id=&#34;what-is-the-maximum-number-of-tasks-that-you-have-used-in-a-single-dagsingle-choice&#34;&gt;What is the maximum number of tasks that you have used in a single DAG? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image7.png&#34; alt=&#34;alt_text&#34; title=&#34;maximum tasks&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11-25&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;24.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;26-50&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;19.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;51-100&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;16.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;lt;10&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;13.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;101-250&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;11.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;501-1000&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;4.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1000-2500&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;3.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;251-500&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;3.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2500-5000&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;75% of the surveyed Airflow users have between 1 and 100 tasks per DAG.&lt;/p&gt;
&lt;h3 id=&#34;how-many-schedulers-do-you-have-in-your-largest-airflow-instance-single-choice&#34;&gt;How many schedulers do you have in your largest Airflow instance? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image8.png&#34; alt=&#34;alt_text&#34; title=&#34;max_schedulers&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;113&lt;/td&gt;
&lt;td&gt;55.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;td&gt;29.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;8.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4+&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;6.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;More than half of Airflow users who responded to the survey (55.1%) have 1 scheduler in their largest Airflow instance. However, it’s important to note that the remaining 44.9% run 2 or more schedulers.&lt;/p&gt;
&lt;h3 id=&#34;what-executor-type-do-you-use-multiple-choice&#34;&gt;What executor type do you use? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Celery&lt;/td&gt;
&lt;td&gt;107&lt;/td&gt;
&lt;td&gt;52.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;39.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;24.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sequential&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;10.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CeleryKubernetes&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;6.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Celery (52.7%) and Kubernetes (39.4%) are the most common executors used. The CeleryKubernetes executor (6.9%) has also started to gain attention and usage among Airflow users.&lt;/p&gt;
&lt;h3 id=&#34;if-you-use-the-celery-executor-how-many-workers-do-you-have-in-your-largest-airflow-instance-single-choice&#34;&gt;If you use the Celery executor, how many workers do you have in your largest Airflow instance? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image9.png&#34; alt=&#34;alt_text&#34; title=&#34;max_workers&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2-5&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;44.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10+&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;19.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;18.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6-10&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;17.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Amongst Celery executor users who responded to the survey, close to half (44.8%) have between 2 and 5 workers in their largest Airflow instance. It’s notable that nearly a fifth (19.6%) have more than 10 workers.&lt;/p&gt;
&lt;h3 id=&#34;which-version-of-airflow-do-you-currently-use-single-choice&#34;&gt;Which version of Airflow do you currently use? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image10.png&#34; alt=&#34;alt_text&#34; title=&#34;airflow_version&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.14 or older&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;6.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.15&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;9.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.0.x&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;11.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.1.x&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;11.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.2.x&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;38.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.3.x&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;23.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;It&amp;rsquo;s good to see that close to 85% of users who responded to the survey use one of the Airflow 2 versions, 9.2% of users still use 1.10.15, and the remaining 6.3% are still using older Airflow 1.10 versions.&lt;/p&gt;
&lt;p&gt;The good news is that the majority of users on Airflow 1 are planning to migrate to Airflow 2 quite soon; for now, capacity constraints keep them from undertaking what they see as a significant effort. However, the survey’s comments also show that some users are generally skeptical about migrating to Airflow 2, citing negative opinions about the new scheduler or compatibility with the helm chart.&lt;/p&gt;
&lt;p&gt;As for plans to migrate to the newest version of Airflow 2, users who responded to the survey are committed and are waiting especially for the features related to dynamic DAGs. However, some users also reported that they first need to resolve some dependencies, or that they prefer to wait a little longer for the community to test the new version before they decide to move on.&lt;/p&gt;
&lt;h3 id=&#34;what-metrics-do-you-use-to-monitor-airflow-multiple-choice&#34;&gt;What metrics do you use to monitor Airflow? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External monitoring service&lt;/td&gt;
&lt;td&gt;81&lt;/td&gt;
&lt;td&gt;40.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Information from metadatabase&lt;/td&gt;
&lt;td&gt;71&lt;/td&gt;
&lt;td&gt;35.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Statsd&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;td&gt;27.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I do not use monitoring&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;23.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In comparison to results from &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;, more users are monitoring Airflow in some way. External monitoring services (40.7%) and information from the metadatabase (35.7%) started to play a more important role in Airflow monitoring.&lt;/p&gt;
&lt;h3 id=&#34;how-do-you-deploy-airflow-multiple-choice&#34;&gt;How do you deploy Airflow? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On virtual machines (for example using AWS EC2)&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;30.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using a managed service like Astronomer, Google Composer or AWS MWAA&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;td&gt;26.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On Kubernetes (using Apache Airflow’s helm chart)&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;22.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On premises&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;20.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On Kubernetes (using custom deployments)&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;18.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On Kubernetes (using another helm chart)&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;10.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;6.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;More than half of Airflow users who responded (51.4%) deploy Airflow on Kubernetes. This is about 20 percent more than in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;. The remaining top deployment methods are on virtual machines (30.6%) and via managed services (26.2%).&lt;/p&gt;
&lt;h3 id=&#34;how-do-you-distribute-your-dags-from-your-developer-environment-to-the-cloud-single-choice&#34;&gt;How do you distribute your DAGs from your developer environment to the cloud? (single choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using a synchronizing process (Git sync, GCS fuse, etc)&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;49%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bake them into the docker image&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared files system&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;14.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;7.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I don’t know&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;3.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;According to the survey responses, the most popular way of distributing DAGs is a synchronizing process: about half of Airflow users (49%) use one to distribute DAGs from developer environments to the cloud.&lt;/p&gt;
&lt;h2 id=&#34;usage-1&#34;&gt;Usage&lt;/h2&gt;
&lt;h3 id=&#34;do-you-have-any-customisation-of-airflow-single-choice&#34;&gt;Do you have any customisation of Airflow? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image11.png&#34; alt=&#34;alt_text&#34; title=&#34;customization&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No, we use vanilla airflow&lt;/td&gt;
&lt;td&gt;165&lt;/td&gt;
&lt;td&gt;81.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, we have a separate fork&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;6.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, we use a 3rd-party fork&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;5.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, we’ve backpropagated bug fixes to an older version&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;6.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;More Airflow users (81.3%) don’t have any customisation of Airflow (compared to 75.9% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;). Those Airflow users who have customisations (18.7%) decided to introduce them mainly to separate development and production workflows, to backport bug fixes, to apply security fixes, or to run a backfill command on a Kubernetes pod.&lt;/p&gt;
&lt;h3 id=&#34;which-metadata-database-do-you-use-single-choice&#34;&gt;Which Metadata Database do you use? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image12.png&#34; alt=&#34;alt_text&#34; title=&#34;database&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PostgreSQL 13&lt;/td&gt;
&lt;td&gt;86&lt;/td&gt;
&lt;td&gt;43.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PostgreSQL 12&lt;/td&gt;
&lt;td&gt;74&lt;/td&gt;
&lt;td&gt;37.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MySQL 8&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;11.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MySQL 5&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;4.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MariaDB&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MsSQL&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;According to the survey responses, the most popular metadata databases are PostgreSQL 13 (43.9%) and PostgreSQL 12 (37.8%). This represents a sharp increase from 2020, up from 68.9% to 81.7% total on PostgreSQL, with a corresponding decrease in MySQL, down from 23% to 15%. This is an interesting result, taking into account the community discussion about not adding support for more database backends or even deciding on single database support.&lt;/p&gt;
&lt;h3 id=&#34;whats-the-primary-method-by-which-you-integrate-with-providers-and-external-services-in-your-airflow-dags-single-choice&#34;&gt;What&amp;rsquo;s the primary method by which you integrate with providers and external services in your Airflow DAGs? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image13.png&#34; alt=&#34;alt_text&#34; title=&#34;providers_interface&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using existing dedicated operators / hooks&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;34.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using Bash/Python operators&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;td&gt;28.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using custom operators / hooks&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;24.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using KubernetesPodOperator&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;12.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;According to the survey responses, the following ways of using Airflow to connect to external services are the most popular: Using existing dedicated operators / hooks (34.5%), Using Bash/Python operators (28.6%), Using custom operators / hooks (24.6%). Using KubernetesPodOperator (12.3%) is less popular among survey respondents. The ranking of methods for integrating with providers and external services is similar to the one from &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;what-providers-do-you-use-in-your-airflow-dags-multiple-choice&#34;&gt;What providers do you use in your Airflow DAGs? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Web Services&lt;/td&gt;
&lt;td&gt;112&lt;/td&gt;
&lt;td&gt;55.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Cloud Platform / Google APIs&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;39.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internal company systems&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;37.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hadoop / Spark / Flink / Other Apache software&lt;/td&gt;
&lt;td&gt;57&lt;/td&gt;
&lt;td&gt;28.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft Azure&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;8.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;10.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I do not use external services in my Airflow DAGs&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;6.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;It’s not surprising that Amazon Web Services (55.4% vs 59.6% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/&#34;&gt;2020&lt;/a&gt;) remains the leading Airflow provider. The next three positions go to Google Cloud Platform (39.1% vs 47.7% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/&#34;&gt;2020&lt;/a&gt;), Internal company systems (37.1% vs 55.6% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/&#34;&gt;2020&lt;/a&gt;), and other Apache products (28.2% vs 35.47% in &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/&#34;&gt;2020&lt;/a&gt;).&lt;/p&gt;
&lt;h3 id=&#34;how-frequently-do-you-upgrade-airflow-environments-single-choice&#34;&gt;How frequently do you upgrade Airflow environments? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image14.png&#34; alt=&#34;alt_text&#34; title=&#34;upgrade_frequency&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;every 12 months&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;22.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;every 6 months&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;24.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;once a quarter&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;23.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Whenever there is a newer version&lt;/td&gt;
&lt;td&gt;59&lt;/td&gt;
&lt;td&gt;29.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The different upgrade frequencies for Airflow environments are almost equally popular among the Airflow users who responded to the survey.&lt;/p&gt;
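&lt;p&gt;As a quick check on the table above, the percentages can be recomputed from the raw counts. The sketch below assumes that, because this is a single-choice question, the four answer counts add up to the number of respondents; the variable names are illustrative only.&lt;/p&gt;

```python
# Answer counts copied from the upgrade-frequency table.
counts = {
    "every 12 months": 46,
    "every 6 months": 49,
    "once a quarter": 47,
    "Whenever there is a newer version": 59,
}

# Assumption: single choice, so the counts sum to the respondent total.
total = sum(counts.values())

# Share of each answer, rounded to one decimal place as in the table.
shares = {answer: round(100 * n / total, 1) for answer, n in counts.items()}

for answer, share in shares.items():
    print(f"{answer}: {share}%")
```

&lt;p&gt;Under that assumption the total is 201 respondents, and the recomputed shares match the table (22.9%, 24.4%, 23.4% and 29.4%).&lt;/p&gt;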
&lt;h3 id=&#34;do-you-upgrade-providers-separately-from-the-core-single-choice&#34;&gt;Do you upgrade providers separately from the core? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image15.png&#34; alt=&#34;alt_text&#34; title=&#34;providers_upgrade&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;When I need it&lt;/td&gt;
&lt;td&gt;83&lt;/td&gt;
&lt;td&gt;42.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Never - always use the providers that come with Airflow&lt;/td&gt;
&lt;td&gt;68&lt;/td&gt;
&lt;td&gt;35.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I did not know I can upgrade providers separately&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;16.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I upgrade providers when they are released&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;5.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;According to the survey responses, Airflow users most often upgrade providers when they need to (42.8%) or prefer to stay with the providers that come with Airflow (35.1%). It’s surprising that 16.5% of Airflow users who responded to the survey were not aware that they can upgrade their providers separately from core Airflow.&lt;/p&gt;
&lt;h3 id=&#34;how-do-you-pass-inputs-and-outputs-between-tasks-multiple-choice&#34;&gt;How do you pass inputs and outputs between tasks? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Xcom&lt;/td&gt;
&lt;td&gt;141&lt;/td&gt;
&lt;td&gt;69.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Saving and retrieving from Storage&lt;/td&gt;
&lt;td&gt;99&lt;/td&gt;
&lt;td&gt;49%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TaskFlow&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;18.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;We don’t&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;14.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;According to the survey responses, Xcom (69.8%) is the most popular method of passing inputs and outputs between tasks, although saving to and retrieving from storage still plays an important role (49%). Interestingly, close to 15% of Airflow users who responded to the survey say they do not pass any inputs or outputs between tasks.&lt;/p&gt;
&lt;h3 id=&#34;do-you-use-a-data-lineage-backend-multiple-choice&#34;&gt;Do you use a data lineage backend? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No, but I will use such feature if fully supported in Airflow&lt;/td&gt;
&lt;td&gt;95&lt;/td&gt;
&lt;td&gt;47.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I’m not familiar with data lineage&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;td&gt;29%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No, data lineage isn’t a concern for my usage&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;13%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, I send lineage to an Open Source lineage repository&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;7.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, I send lineage to an Enterprise lineage repository&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;3.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, I send lineage to a custom internal lineage repository&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;4.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;When asked which lineage backend they use, respondents indicated that, while lineage itself is quite a new topic, there is interest in the feature as a whole. Most Airflow users responded that they don’t currently use a lineage solution but might be interested in the future if it is supported by Airflow (47.5%), that they are not familiar with data lineage (29%), or that data lineage is not a concern for them (13%).&lt;/p&gt;
&lt;h3 id=&#34;which-interfaces-of-airflow-do-you-use-as-part-of-your-current-role-multiple-choice&#34;&gt;Which interfaces of Airflow do you use as part of your current role? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Original Airflow Graphical User Interface&lt;/td&gt;
&lt;td&gt;189&lt;/td&gt;
&lt;td&gt;94%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;td&gt;48.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;39.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom (own created) Airflow Graphical User Interface&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCP Composer&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;It’s clear that the Airflow web UI is important: 94% of users who responded to the survey say they use it as part of their current role. The CLI (48.8%) and the API (39.8%) are used at similar rates, but both are much less common than the web UI.&lt;/p&gt;
&lt;h3 id=&#34;if-gui-marked-what-do-you-use-the-gui-for-multiple-choice&#34;&gt;(If GUI Marked) What do you use the GUI for? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring Runs&lt;/td&gt;
&lt;td&gt;188&lt;/td&gt;
&lt;td&gt;95.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accessing Task Logs&lt;/td&gt;
&lt;td&gt;176&lt;/td&gt;
&lt;td&gt;89.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manually triggering DAGs&lt;/td&gt;
&lt;td&gt;167&lt;/td&gt;
&lt;td&gt;85.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clearing Tasks&lt;/td&gt;
&lt;td&gt;162&lt;/td&gt;
&lt;td&gt;82.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marking Tasks as successful&lt;/td&gt;
&lt;td&gt;119&lt;/td&gt;
&lt;td&gt;60.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The Airflow web UI is used heavily for monitoring (Monitoring Runs, 95.9%) and for troubleshooting: Accessing Task Logs (89.8%), Manually triggering DAGs (85.2%), Clearing Tasks (82.7%) and Marking Tasks as successful (60.7%).&lt;/p&gt;
&lt;h3 id=&#34;if-cli-marked-what-do-you-use-the-cli-for-multiple-choice&#34;&gt;(if CLI Marked) What do you use the CLI For? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backfilling&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;56.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manually triggering DAGs&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;46.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clearing Tasks&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;23.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring Runs&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;22.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accessing Task Logs&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;18.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Marking Tasks as successful&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;9.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;15.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In contrast to the web UI, the Airflow CLI is used mainly for Backfilling (56.8%) and Manually triggering DAGs (46.8%).&lt;/p&gt;
&lt;h3 id=&#34;in-airflow-which-ui-views-are-important-for-you-multiple-choice&#34;&gt;In Airflow, which UI views are important for you? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;List of DAGs&lt;/td&gt;
&lt;td&gt;178&lt;/td&gt;
&lt;td&gt;89.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task Logs&lt;/td&gt;
&lt;td&gt;162&lt;/td&gt;
&lt;td&gt;81.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAG Runs&lt;/td&gt;
&lt;td&gt;160&lt;/td&gt;
&lt;td&gt;80.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph view&lt;/td&gt;
&lt;td&gt;147&lt;/td&gt;
&lt;td&gt;73.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grid/Tree View&lt;/td&gt;
&lt;td&gt;138&lt;/td&gt;
&lt;td&gt;69.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run Details&lt;/td&gt;
&lt;td&gt;117&lt;/td&gt;
&lt;td&gt;58.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAG details&lt;/td&gt;
&lt;td&gt;111&lt;/td&gt;
&lt;td&gt;55.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task Instances&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;51.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task Duration&lt;/td&gt;
&lt;td&gt;91&lt;/td&gt;
&lt;td&gt;45.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code&lt;/td&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;45.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task Tries&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;30.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gantt&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;21.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Landing Times&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;13.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The ranking of UI views by importance shows that the majority of Airflow users use the web UI mostly for monitoring and/or troubleshooting, where the top 3 views are List of DAGs (89.4%), Task Logs (81.4%) and DAG Runs (80.4%). The results are very similar to those from &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user&#34;&gt;2020&lt;/a&gt; and &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;2019&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;community-and-contribution-1&#34;&gt;Community and contribution&lt;/h2&gt;
&lt;h3 id=&#34;are-you-participating-in-the-airflow-community-discussions-single-choice&#34;&gt;Are you participating in the Airflow community discussions? (single choice)&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image16.png&#34; alt=&#34;alt_text&#34; title=&#34;discussions_engagement&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I see them from time to time&lt;/td&gt;
&lt;td&gt;99&lt;/td&gt;
&lt;td&gt;48.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I regularly follow what&amp;rsquo;s being discussed but don&amp;rsquo;t participate&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;25.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I didn&amp;rsquo;t know I could&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;20.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I actively participate in the discussions&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;5.9%&lt;/td&gt;
&lt;/tr&gt;
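&lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;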
&lt;tr&gt;
&lt;td&gt;I know I can but I do not contribute&lt;/td&gt;
&lt;td&gt;116&lt;/td&gt;
&lt;td&gt;57.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very rarely when it relates to what I need&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;21.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I do not know I could&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;14.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I regularly contribute by discussing, reviewing and submitting PR&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;6.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The results on contributing to Airflow are very similar to those on participating in the Airflow community discussions. Most Airflow users who responded to the survey know they can contribute but do not (57.1%), or contribute very rarely (21.7%). 14.8% of users were not aware they could contribute. Once again, it’s a clear indicator that there is much more to be done to encourage our community to become more active contributors and to raise the current 6.4% of users who contribute regularly.&lt;/p&gt;
&lt;h3 id=&#34;if-you-do-not-contribute---why&#34;&gt;If you do not contribute - why?&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;images/image18.png&#34; alt=&#34;alt_text&#34; title=&#34;contribution_reasons&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I have no time to contribute even if would like to&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;38.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I don’t know how to start&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;37.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I don’t have a need to contribute&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;11.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I didn’t know I could&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;7.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;My employer has policy that makes it difficult to contribute&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;4.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;According to the survey results, the most important blocker to contributing to Airflow is limited time (38.9%). Surprisingly, an almost equally important blocker is not knowing how to start (37.7%). Smaller groups see no need to contribute (11.4%) or did not know that contributing is possible (7.2%).&lt;/p&gt;
&lt;h2 id=&#34;the-future-of-airflow-1&#34;&gt;The future of Airflow&lt;/h2&gt;
&lt;h3 id=&#34;in-your-opinion-what-could-be-improved-in-airflow-multiple-choice&#34;&gt;In your opinion, what could be improved in Airflow? (multiple choice)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web UI&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;49.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging, monitoring and alerting&lt;/td&gt;
&lt;td&gt;97&lt;/td&gt;
&lt;td&gt;48.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Examples, how-to, onboarding documentation&lt;/td&gt;
&lt;td&gt;74&lt;/td&gt;
&lt;td&gt;36.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Technical documentation&lt;/td&gt;
&lt;td&gt;74&lt;/td&gt;
&lt;td&gt;36.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduler performance&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;27.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;25.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAG authoring&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;23.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REST API&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;21.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authentication and authorization&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;20.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External integration e.g. AWS, GCP, Apache products&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;20.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Better support for various deployments (Docker-compose/Nomad/Others)&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;19.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everything works fine for me&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;9.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I don’t know&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The results are quite self-explanatory. According to the survey results, the top area for improvement is still the Airflow web UI (49.5%), closely followed by more telemetry for logging, monitoring and alerting purposes (48%). However, all of those efforts should go hand in hand with improved technical documentation (36.6%) and better resources on using Airflow, especially the examples, how-tos and onboarding documentation that new users need (36.6%).&lt;/p&gt;
&lt;h3 id=&#34;which-features-would-you-like-to-see-in-airflow&#34;&gt;Which features would you like to see in Airflow?&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;No.&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAG Versioning&lt;/td&gt;
&lt;td&gt;129&lt;/td&gt;
&lt;td&gt;66.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependency management and Data-driven scheduling&lt;/td&gt;
&lt;td&gt;83&lt;/td&gt;
&lt;td&gt;42.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;More dynamic task structure&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;td&gt;42.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Tenancy&lt;/td&gt;
&lt;td&gt;74&lt;/td&gt;
&lt;td&gt;37.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Signal-based scheduling&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;34.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Better Security (Isolation)&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;33.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Submitting new DAGs externally via API&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;27.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composable Operators&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;23.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support for native cloud executors (AWS/GCP/Azure etc.)&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;22.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Better support for Machine Learning&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;td&gt;19.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remote CLI&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;18.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support for hybrid executors&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;11.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;According to the survey results, DAG Versioning (66.2%) is the clear winner among new features for Airflow, which is no surprise, as this feature may positively impact the daily work of Airflow users. It is followed by three other ideas: Dependency management and Data-driven scheduling (42.6%), More dynamic task structure (42.1%) and Multi-Tenancy (37.9%). Another interesting point from this question is that only 11.3% think that support for hybrid executors is needed in Airflow.&lt;/p&gt;
&lt;h2 id=&#34;data&#34;&gt;Data&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re interested in taking a look at the raw data yourself, it&amp;rsquo;s available here: &lt;a href=&#34;/data/survey-responses/airflow-user-survey-responses-2022.csv.zip&#34;&gt;Airflow User Survey 2022.csv&lt;/a&gt;&lt;/p&gt;
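&lt;p&gt;For readers who want to tally the raw data themselves, here is a minimal sketch using only the Python standard library. It assumes the downloaded archive contains a single UTF-8 CSV file with one header row. The function name &lt;code&gt;count_answers&lt;/code&gt; is hypothetical, and the column headers are not described in this post, so the &lt;code&gt;column&lt;/code&gt; argument must be taken from the file itself.&lt;/p&gt;

```python
import csv
import io
import zipfile
from collections import Counter


def count_answers(zip_path, column):
    """Count how often each answer appears in one column of the zipped CSV."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds a single CSV file.
        csv_name = archive.namelist()[0]
        with archive.open(csv_name) as raw:
            reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8"))
            # Skip rows where the respondent left this question empty.
            return Counter(row[column] for row in reader if row.get(column))
```

&lt;p&gt;Multiple-choice questions may store several answers in one cell. If so, the cell would need to be split before counting, which depends on the file&amp;rsquo;s layout.&lt;/p&gt;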
</description>
</item>
<item>
<title>Blog: Airflow Summit 2022</title>
<link>/blog/airflow_summit_2022/</link>
<pubDate>Mon, 16 May 2022 00:00:00 +0000</pubDate>
<guid>/blog/airflow_summit_2022/</guid>
<description>
&lt;p&gt;The biggest Airflow Event of the Year returns May 23–27! Airflow Summit 2022 will bring together the global
community of Apache Airflow practitioners and data leaders.&lt;/p&gt;
&lt;h3 id=&#34;whats-on-the-agenda&#34;&gt;What’s on the Agenda&lt;/h3&gt;
&lt;p&gt;During the free conference, you will hear about Apache Airflow best practices, trends in building data
pipelines, data governance, Airflow and machine learning, and the future of Airflow. There will also be
a series of presentations on non-code contributions driving the open-source project.&lt;/p&gt;
&lt;h3 id=&#34;how-to-attend&#34;&gt;How to Attend&lt;/h3&gt;
&lt;p&gt;This year’s edition will include a variety of online sessions across different time zones.
Additionally, you can take part in local in-person events organized worldwide for data
communities to watch the event and network.&lt;/p&gt;
&lt;h3 id=&#34;interested&#34;&gt;Interested?&lt;/h3&gt;
&lt;p&gt;🪶 &lt;a href=&#34;https://www.crowdcast.io/e/airflowsummit2022/register?utm_campaign=Astronomer_marketing&amp;amp;utm_source=Astronomer%20website&amp;amp;utm_medium=website&amp;amp;utm_term=Airflow%20Summit&#34;&gt;Register for Airflow Summit 2022&lt;/a&gt; today&lt;/p&gt;
&lt;p&gt;🤝 &lt;a href=&#34;https://airflowsummit.org/in-person-events/&#34;&gt;Check out the in-person events&lt;/a&gt; planned for Airflow Summit 2022.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Apache Airflow 2.3.0 is here</title>
<link>/blog/airflow-2.3.0/</link>
<pubDate>Sat, 30 Apr 2022 00:00:00 +0000</pubDate>
<guid>/blog/airflow-2.3.0/</guid>
<description>
&lt;p&gt;Apache Airflow 2.3.0 contains over 700 commits since 2.2.0 and includes 50 new features, 99 improvements, 85 bug fixes, and several doc changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;📦 PyPI: &lt;a href=&#34;https://pypi.org/project/apache-airflow/2.3.0/&#34;&gt;https://pypi.org/project/apache-airflow/2.3.0/&lt;/a&gt; &lt;br&gt;
📚 Docs: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.3.0/&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.3.0/&lt;/a&gt; &lt;br&gt;
🛠️ Release Notes: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.3.0/release_notes.html&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.3.0/release_notes.html&lt;/a&gt; &lt;br&gt;
🐳 Docker Image: docker pull apache/airflow:2.3.0 &lt;br&gt;
🚏 Constraints: &lt;a href=&#34;https://github.com/apache/airflow/tree/constraints-2.3.0&#34;&gt;https://github.com/apache/airflow/tree/constraints-2.3.0&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As the changelog is quite large, the following are some notable new features that shipped in this release.&lt;/p&gt;
&lt;h2 id=&#34;dynamic-task-mappingaip-42&#34;&gt;Dynamic Task Mapping(AIP-42)&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s now first-class support for dynamic tasks in Airflow. What this means is that you can generate tasks dynamically at runtime. Much like using a &lt;code&gt;for&lt;/code&gt; loop
to create a list of tasks, here you can create the same tasks without having to know the exact number of tasks ahead of time.&lt;/p&gt;
&lt;p&gt;You can have a &lt;code&gt;task&lt;/code&gt; generate the list to iterate over, which is not possible with a &lt;code&gt;for&lt;/code&gt; loop.&lt;/p&gt;
&lt;p&gt;Here is an example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;datetime&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;datetime&lt;/span&gt;

&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.decorators&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;task&lt;/span&gt;

&lt;span class=&#34;nd&#34;&gt;@task&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;make_list&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;# This can also be from an API call, checking a database, -- almost anything you like, as long as the&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;# resulting list/dictionary can be stored in the current XCom backend.&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;a&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;b&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;},&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;str&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;

&lt;span class=&#34;nd&#34;&gt;@task&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;consumer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;arg&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;arg&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dag_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;dynamic-map&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;start_date&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;datetime&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;2022&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;4&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dag&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;consumer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;expand&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;arg&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;make_list&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;More information can be found here: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.3.0/concepts/dynamic-task-mapping.html&#34;&gt;Dynamic Task Mapping&lt;/a&gt;&lt;/p&gt;
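&lt;p&gt;To build intuition for what the example above does, the following plain-Python sketch mimics the effect of &lt;code&gt;expand&lt;/code&gt; without using Airflow at all. It is not how Airflow implements mapping; it only illustrates the idea that one copy of the consumer runs per element of a list that is known only at run time. The names &lt;code&gt;mapped_results&lt;/code&gt; and &lt;code&gt;index&lt;/code&gt; are made up for this illustration.&lt;/p&gt;

```python
def make_list():
    # Stand-in for the upstream task; its result is only known at run time.
    return [1, 2, {"a": "b"}, "str"]


def consumer(arg):
    # Stand-in for the mapped task; it receives exactly one element.
    return f"consumer got {arg!r}"


# Roughly what expand(arg=make_list()) amounts to: one call per element,
# each identified by its position in the list.
mapped_results = {index: consumer(item) for index, item in enumerate(make_list())}

for index, result in mapped_results.items():
    print(index, result)
```

&lt;p&gt;In Airflow itself, each of these calls would be a separate task instance that the scheduler can run, retry and display independently.&lt;/p&gt;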
&lt;h2 id=&#34;grid-view-replaces-tree-view&#34;&gt;Grid View replaces Tree View&lt;/h2&gt;
&lt;p&gt;Grid view replaces tree view in Airflow 2.3.0.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Screenshots&lt;/strong&gt;:
&lt;img src=&#34;grid-view.png&#34; alt=&#34;The new grid view&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;purge-history-from-metadata-database&#34;&gt;Purge history from metadata database&lt;/h2&gt;
&lt;p&gt;Airflow 2.3.0 introduces a new &lt;code&gt;airflow db clean&lt;/code&gt; command that can be used to purge old data from the metadata database.&lt;/p&gt;
&lt;p&gt;Use this command when you want to reduce the size of the metadata database.&lt;/p&gt;
&lt;p&gt;More information can be found here: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.3.0/usage-cli.html#purge-history-from-metadata-database&#34;&gt;Purge history from metadata database&lt;/a&gt;&lt;/p&gt;
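&lt;p&gt;As an illustration, the sketch below builds an &lt;code&gt;airflow db clean&lt;/code&gt; command line that keeps a fixed number of days of history. The helper &lt;code&gt;build_db_clean_command&lt;/code&gt; and the 90-day retention period are hypothetical. The flags &lt;code&gt;--clean-before-timestamp&lt;/code&gt;, &lt;code&gt;--dry-run&lt;/code&gt; and &lt;code&gt;--yes&lt;/code&gt; are given as we understand the CLI reference, so check them against &lt;code&gt;airflow db clean --help&lt;/code&gt; for your version before relying on them.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def build_db_clean_command(retention_days, now=None, dry_run=True):
    """Build an `airflow db clean` command that keeps `retention_days` of history."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    command = [
        "airflow", "db", "clean",
        # Rows older than this timestamp are candidates for deletion.
        "--clean-before-timestamp", cutoff.isoformat(timespec="seconds"),
    ]
    # Preview with --dry-run first; --yes skips the confirmation prompt.
    command.append("--dry-run" if dry_run else "--yes")
    return command


print(" ".join(build_db_clean_command(retention_days=90)))
```

&lt;p&gt;The returned list can be passed to &lt;code&gt;subprocess.run&lt;/code&gt;. Defaulting to a dry run is a deliberate choice here, because the real command deletes data.&lt;/p&gt;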
&lt;h2 id=&#34;localkubernetesexecutor&#34;&gt;LocalKubernetesExecutor&lt;/h2&gt;
&lt;p&gt;There is a new executor named LocalKubernetesExecutor. This executor helps you run some tasks using LocalExecutor and run another set of tasks using the KubernetesExecutor in the same deployment based on the task&amp;rsquo;s queue.&lt;/p&gt;
&lt;p&gt;More information can be found here: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.3.0/executor/local_kubernetes.html&#34;&gt;LocalKubernetesExecutor&lt;/a&gt;&lt;/p&gt;
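&lt;p&gt;The queue-based routing described above can be pictured with a few lines of plain Python. This is a conceptual sketch, not Airflow&amp;rsquo;s implementation. The function &lt;code&gt;pick_executor&lt;/code&gt; and the task names are hypothetical, and the queue name &lt;code&gt;kubernetes&lt;/code&gt; is an assumed value that should be checked against the executor configuration of your deployment.&lt;/p&gt;

```python
# Assumption: "kubernetes" is the queue name reserved for the KubernetesExecutor
# (configurable in Airflow; verify the value used by your deployment).
KUBERNETES_QUEUE = "kubernetes"


def pick_executor(task_queue, kubernetes_queue=KUBERNETES_QUEUE):
    """Return which of the two wrapped executors would handle a task."""
    if task_queue == kubernetes_queue:
        return "KubernetesExecutor"
    return "LocalExecutor"


# Hypothetical tasks mapped to the queue they were assigned.
tasks = {"light_transform": "default", "heavy_training_job": "kubernetes"}
for task_id, queue in tasks.items():
    print(task_id, pick_executor(queue))
```

&lt;p&gt;In a DAG, the queue is chosen per task, so only the tasks that need pod-level isolation or resources pay the cost of starting a pod.&lt;/p&gt;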
&lt;h2 id=&#34;dagprocessormanager-as-standalone-process-aip-43&#34;&gt;DagProcessorManager as standalone process (AIP-43)&lt;/h2&gt;
&lt;p&gt;As of 2.3.0, you can run the DagProcessorManager as a standalone process. Because DagProcessorManager runs user code, separating it from the scheduler process and running it as an independent process on a different host is a good idea.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;airflow dag-processor&lt;/code&gt; CLI command starts the DagProcessorManager in its own process. Before you can run the DagProcessorManager as a standalone process, you need to set the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#standalone_dag_processor&#34;&gt;[scheduler] standalone_dag_processor&lt;/a&gt; option to &lt;code&gt;True&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;More information can be found here: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.3.0/cli-and-env-variables-ref.html#dag-processor&#34;&gt;dag-processor CLI command&lt;/a&gt;&lt;/p&gt;
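&lt;p&gt;One way to set this option without editing &lt;code&gt;airflow.cfg&lt;/code&gt; is through an environment variable, since Airflow maps configuration options to variables named &lt;code&gt;AIRFLOW__SECTION__KEY&lt;/code&gt;. The sketch below only constructs that name and a modified environment. The helper &lt;code&gt;airflow_env_var&lt;/code&gt; is hypothetical, and starting the actual processes is left out.&lt;/p&gt;

```python
import os


def airflow_env_var(section, key):
    """Name of the environment variable that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Copy the current environment and enable the standalone DAG processor in it.
env = dict(os.environ)
env[airflow_env_var("scheduler", "standalone_dag_processor")] = "True"

print(airflow_env_var("scheduler", "standalone_dag_processor"))
# prints AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR
```

&lt;p&gt;The same environment would need to be visible to both the scheduler and the &lt;code&gt;airflow dag-processor&lt;/code&gt; process so that they agree on who parses the DAG files.&lt;/p&gt;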
&lt;h2 id=&#34;json-serialization-for-connections&#34;&gt;JSON serialization for connections&lt;/h2&gt;
&lt;p&gt;You can now create connections using the &lt;code&gt;json&lt;/code&gt; serialization format.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;airflow connections add &lt;span class=&#34;s1&#34;&gt;&amp;#39;my_prod_db&amp;#39;&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;span class=&#34;se&#34;&gt;&lt;/span&gt; --conn-json &lt;span class=&#34;s1&#34;&gt;&amp;#39;{
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; &amp;#34;conn_type&amp;#34;: &amp;#34;my-conn-type&amp;#34;,
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; &amp;#34;login&amp;#34;: &amp;#34;my-login&amp;#34;,
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; &amp;#34;password&amp;#34;: &amp;#34;my-password&amp;#34;,
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; &amp;#34;host&amp;#34;: &amp;#34;my-host&amp;#34;,
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; &amp;#34;port&amp;#34;: 1234,
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; &amp;#34;schema&amp;#34;: &amp;#34;my-schema&amp;#34;,
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; &amp;#34;extra&amp;#34;: {
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; &amp;#34;param1&amp;#34;: &amp;#34;val1&amp;#34;,
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; &amp;#34;param2&amp;#34;: &amp;#34;val2&amp;#34;
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; }
&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt; }&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can also use the &lt;code&gt;json&lt;/code&gt; serialization format when setting the connection in environment variables.&lt;/p&gt;
&lt;p&gt;More information can be found here: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.3.0/howto/connection.html&#34;&gt;JSON serialization for connections&lt;/a&gt;&lt;/p&gt;
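&lt;p&gt;As a minimal sketch of the environment-variable form, the Python snippet below serializes the same connection with the standard library and exports it. It assumes the &lt;code&gt;AIRFLOW_CONN_{CONN_ID}&lt;/code&gt; naming convention for connection variables; the helper name &lt;code&gt;connection_env_var&lt;/code&gt; is hypothetical and not part of Airflow.&lt;/p&gt;

```python
import json
import os


def connection_env_var(conn_id):
    """Return the environment variable name for ``conn_id``.

    Hypothetical helper: it only applies the assumed AIRFLOW_CONN_ prefix
    convention and upper-cases the connection id.
    """
    return f"AIRFLOW_CONN_{conn_id.upper()}"


# Same fields as the CLI example; every value is a placeholder.
connection = {
    "conn_type": "my-conn-type",
    "login": "my-login",
    "password": "my-password",
    "host": "my-host",
    "port": 1234,
    "schema": "my-schema",
    "extra": {"param1": "val1", "param2": "val2"},
}

env_name = connection_env_var("my_prod_db")
# Export the JSON document so processes started from here inherit it.
os.environ[env_name] = json.dumps(connection)
print(env_name)
```

&lt;p&gt;Building the value with &lt;code&gt;json.dumps&lt;/code&gt; rather than by hand avoids quoting mistakes, which matter most for the nested &lt;code&gt;extra&lt;/code&gt; object.&lt;/p&gt;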
&lt;h2 id=&#34;airflow-db-downgrade-and-offline-generation-of-sql-scripts&#34;&gt;Airflow &lt;code&gt;db downgrade&lt;/code&gt; and Offline generation of SQL scripts&lt;/h2&gt;
&lt;p&gt;Airflow 2.3.0 introduced a new command &lt;code&gt;airflow db downgrade&lt;/code&gt; that will downgrade the database to your chosen version.&lt;/p&gt;
&lt;p&gt;You can also generate the downgrade/upgrade SQL scripts for your database and run them manually, or just view the SQL queries that the downgrade/upgrade command would run.&lt;/p&gt;
&lt;p&gt;More information can be found here: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.3.0/usage-cli.html#downgrading-airflow&#34;&gt;Airflow &lt;code&gt;db downgrade&lt;/code&gt; and Offline generation of SQL scripts&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;reuse-of-decorated-tasks&#34;&gt;Reuse of decorated tasks&lt;/h2&gt;
&lt;p&gt;You can now reuse decorated tasks across your DAG files. A decorated task has an &lt;code&gt;override&lt;/code&gt; method that allows you to override its arguments.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s an example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;nd&#34;&gt;@task&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;add_task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;x&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;y&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Task args: x=&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;x&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;, y=&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;y&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;x&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;+&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;y&lt;/span&gt;
&lt;span class=&#34;nd&#34;&gt;@dag&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;start_date&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;datetime&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;2022&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;mydag&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;start&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;add_task&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;override&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;task_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;start&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;i&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;start&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;add_task&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;override&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;task_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;add_start_&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;start&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;More information can be found here: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.3.0/tutorial_taskflow_api.html#reusing-a-decorated-task&#34;&gt;Reuse of decorated tasks&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;other-small-features&#34;&gt;Other small features&lt;/h2&gt;
&lt;p&gt;This isn’t a comprehensive list, but some noteworthy or interesting small features include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support for different timeout values for DAG file parsing&lt;/li&gt;
&lt;li&gt;&lt;code&gt;airflow dags reserialize&lt;/code&gt; command to reserialize dags&lt;/li&gt;
&lt;li&gt;Events Timetable&lt;/li&gt;
&lt;li&gt;SmoothOperator - Operator that does literally nothing except logging a YouTube link to
Sade&amp;rsquo;s &amp;ldquo;Smooth Operator&amp;rdquo;. Enjoy!&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;contributors&#34;&gt;Contributors&lt;/h2&gt;
&lt;p&gt;Thanks to everyone who contributed to this release: Ash Berlin-Taylor, Brent Bovenzi, Daniel Standish, Elad, Ephraim Anierobi, Jarek Potiuk, Jed Cunningham, Josh Fell, Kamil Breguła, Kanthi, Kaxil Naik, Khalid Mammadov, Malthe Borch, Ping Zhang, Tzu-ping Chung and many others who keep making Airflow better for everyone.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: What&#39;s new in Apache Airflow 2.2.0</title>
<link>/blog/airflow-2.2.0/</link>
<pubDate>Mon, 11 Oct 2021 00:00:00 +0000</pubDate>
<guid>/blog/airflow-2.2.0/</guid>
<description>
&lt;p&gt;I’m proud to announce that Apache Airflow 2.2.0 has been released. It contains over 600 commits since 2.1.4 and includes 30 new features, 84 improvements, 85 bug fixes, and many internal and doc changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;📦 PyPI: &lt;a href=&#34;https://pypi.org/project/apache-airflow/2.2.0/&#34;&gt;https://pypi.org/project/apache-airflow/2.2.0/&lt;/a&gt; &lt;br&gt;
📚 Docs: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.2.0/&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.2.0/&lt;/a&gt; &lt;br&gt;
🛠️ Changelog: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/2.2.0/changelog.html&#34;&gt;https://airflow.apache.org/docs/apache-airflow/2.2.0/changelog.html&lt;/a&gt; &lt;br&gt;
🐳 Docker Image: docker pull apache/airflow:2.2.0 &lt;br&gt;
🚏 Constraints: &lt;a href=&#34;https://github.com/apache/airflow/tree/constraints-2.2.0&#34;&gt;https://github.com/apache/airflow/tree/constraints-2.2.0&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As the changelog is quite large, the following are some notable new features that shipped in this release.&lt;/p&gt;
&lt;h2 id=&#34;custom-timetables-aip-39&#34;&gt;Custom Timetables (AIP-39)&lt;/h2&gt;
&lt;p&gt;Airflow has historically used cron expressions and timedeltas to represent when a DAG should run. This worked for a lot of use cases, but not all. For example, running daily on Monday-Friday, but not on weekends, wasn’t possible.&lt;/p&gt;
&lt;p&gt;To provide more scheduling flexibility, determining when a DAG should run is now done with Timetables. Of course, backwards compatibility has been maintained: cron expressions and timedeltas are still fully supported. Timetables are also pluggable, so you can add your own custom timetable to fit your needs! For example, you could write a timetable to schedule a DagRun&lt;/p&gt;
&lt;p&gt;&lt;code&gt;execution_date&lt;/code&gt; has long been confusing to new Airflowers, so as part of this change a new concept named &lt;code&gt;data_interval&lt;/code&gt; has been added to Airflow to replace it. It is the period of data that a task should operate on. The following are now available:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;logical_date&lt;/code&gt; (aka &lt;code&gt;execution_date&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;data_interval_start&lt;/code&gt; (same value as &lt;code&gt;execution_date&lt;/code&gt; for cron)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;data_interval_end&lt;/code&gt; (aka &lt;code&gt;next_execution_date&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you write your own timetables, keep in mind they should be idempotent and fast as they are used in the scheduler to create DagRuns.&lt;/p&gt;
&lt;p&gt;More information can be found at: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/howto/timetable.html&#34;&gt;Customizing DAG Scheduling with Timetables&lt;/a&gt;&lt;/p&gt;
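&lt;p&gt;To make the Monday-Friday case concrete, here is a plain-Python sketch of the date arithmetic such a timetable would need. It is not the Airflow Timetable interface: the function name &lt;code&gt;next_workday_interval&lt;/code&gt; and its signature are assumptions made for illustration, and it only shows how a data interval can skip the weekend.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def next_workday_interval(last_end):
    """Return (data_interval_start, data_interval_end) for the next workday.

    ``last_end`` is the end of the previous daily interval, at midnight UTC.
    Plain-Python illustration only; this is not Airflow's Timetable interface.
    """
    start = last_end
    # weekday() counts Monday as 0, so 5 and 6 are Saturday and Sunday.
    while start.weekday() in (5, 6):
        start += timedelta(days=1)
    return start, start + timedelta(days=1)


# The previous interval covered Friday 8 October 2021 and ended on Saturday.
friday_end = datetime(2021, 10, 9, tzinfo=timezone.utc)
start, end = next_workday_interval(friday_end)
print(start.date(), end.date())
```

&lt;p&gt;The function is a pure computation on its input, which matches the advice that timetable logic should be idempotent and fast.&lt;/p&gt;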
&lt;h2 id=&#34;deferrable-tasks-aip-40&#34;&gt;Deferrable Tasks (AIP-40)&lt;/h2&gt;
&lt;p&gt;Deferrable tasks allow operators or sensors to defer themselves until a light-weight async check passes, at which point they can resume executing. Most importantly, this returns the worker slot, and any resources used by it, to Airflow while the task is deferred. This makes simple things like monitoring a job in an external system or watching for an event much cheaper.&lt;/p&gt;
&lt;p&gt;To support this feature, a new component has been added to Airflow, the triggerer, which is the daemon process that runs the asyncio event loop.&lt;/p&gt;
&lt;p&gt;Airflow 2.2.0 ships with two deferrable sensors, &lt;code&gt;DateTimeSensorAsync&lt;/code&gt; and &lt;code&gt;TimeDeltaSensorAsync&lt;/code&gt;, both of which are drop-in replacements for the corresponding existing sensors.&lt;/p&gt;
&lt;p&gt;More information can be found at:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/concepts/deferring.html&#34;&gt;Deferrable Operators &amp;amp; Triggers&lt;/a&gt;&lt;/p&gt;
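&lt;p&gt;The reason an event loop makes waiting cheap can be shown with the standard library alone. The sketch below is not the Airflow trigger API; &lt;code&gt;wait_until&lt;/code&gt; and &lt;code&gt;main&lt;/code&gt; are hypothetical names, and the example only illustrates how one asyncio loop can hold many pending time-based waits at once.&lt;/p&gt;

```python
import asyncio
from datetime import datetime, timedelta, timezone


async def wait_until(moment):
    """Suspend this coroutine until ``moment`` (timezone-aware) has passed."""
    remaining = (moment - datetime.now(timezone.utc)).total_seconds()
    # sleep() suspends only this coroutine; the loop keeps serving the others.
    await asyncio.sleep(max(remaining, 0.0))
    return moment


async def main():
    now = datetime.now(timezone.utc)
    targets = [now + timedelta(seconds=0.05 * n) for n in range(1, 4)]
    # A single event loop multiplexes all three waits concurrently.
    return await asyncio.gather(*(wait_until(t) for t in targets))


fired = asyncio.run(main())
print(len(fired))
```

&lt;p&gt;All three waits share one thread, so the total runtime is close to the longest wait rather than the sum of all of them.&lt;/p&gt;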
&lt;h2 id=&#34;custom-task-decorators-and-taskdocker&#34;&gt;Custom &lt;code&gt;@task&lt;/code&gt; decorators and &lt;code&gt;@task.docker&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Airflow 2.2.0 allows providers to create custom &lt;code&gt;@task&lt;/code&gt; decorators in the TaskFlow interface.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;@task.docker&lt;/code&gt; decorator is one such decorator that allows you to run a function in a docker container. Airflow handles getting the code into the container and returning xcom - you just worry about your function. This is particularly useful when you have conflicting dependencies between Airflow itself and tasks you need to run.&lt;/p&gt;
&lt;p&gt;More information on creating custom &lt;code&gt;@task&lt;/code&gt; decorators can be found at: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/howto/create-custom-decorator.html&#34;&gt;Creating Custom @task Decorators&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;More information on the &lt;code&gt;@task.docker&lt;/code&gt; decorator can be found at: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html#using-the-taskflow-api-with-docker-or-virtual-environments&#34;&gt;Using the Taskflow API with Docker or Virtual Environments&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;validation-of-dag-params&#34;&gt;Validation of DAG params&lt;/h2&gt;
&lt;p&gt;You can now apply validation on DAG params by passing a &lt;code&gt;Param&lt;/code&gt; object for each param. The &lt;code&gt;Param&lt;/code&gt; object supports the full &lt;a href=&#34;https://json-schema.org/draft/2020-12/json-schema-validation.html&#34;&gt;json-schema validation specifications&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Currently this only works with manually triggered DAGs, but it does set the stage for future params-related functionality.&lt;/p&gt;
&lt;p&gt;More information can be found at: &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/concepts/params.html&#34;&gt;Params&lt;/a&gt;&lt;/p&gt;
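&lt;p&gt;As a rough illustration of what schema-based validation of a param means, the sketch below checks a value against two JSON Schema keywords by hand. It does not use the &lt;code&gt;Param&lt;/code&gt; class or a JSON Schema library; &lt;code&gt;validate_param&lt;/code&gt; and &lt;code&gt;retries_schema&lt;/code&gt; are hypothetical names, and only &lt;code&gt;type&lt;/code&gt; and &lt;code&gt;enum&lt;/code&gt; are handled.&lt;/p&gt;

```python
def validate_param(value, schema):
    """Validate ``value`` against a tiny subset of JSON Schema keywords.

    Only "type" (integer or string) and "enum" are handled here.
    """
    json_types = {"integer": int, "string": str}
    expected = json_types[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, expected):
        raise ValueError(f"{value!r} is not of type {schema['type']}")
    if "enum" in schema and value not in schema["enum"]:
        raise ValueError(f"{value!r} is not one of {schema['enum']}")
    return value


# Hypothetical schema for a param that accepts a few retry counts.
retries_schema = {"type": "integer", "enum": [1, 3, 5]}
print(validate_param(3, retries_schema))

try:
    validate_param("three", retries_schema)
except ValueError as error:
    print(f"rejected: {error}")
```

&lt;p&gt;Rejecting a bad value before any task runs is the point of validating at trigger time: the failure surfaces immediately instead of partway through a run.&lt;/p&gt;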
&lt;h2 id=&#34;other-small-features&#34;&gt;Other small features&lt;/h2&gt;
&lt;p&gt;This isn’t a comprehensive list, but some noteworthy or interesting small features include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Testing Connections from the UI - test that the credentials for your Connection actually work&lt;/li&gt;
&lt;li&gt;Duplicating Connections from the UI&lt;/li&gt;
&lt;li&gt;DAGs “Next run” info is shown in the UI, including when the run will actually start&lt;/li&gt;
&lt;li&gt;&lt;code&gt;airflow standalone&lt;/code&gt; command runs all of the Airflow components directly without docker - great for local development&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;contributors&#34;&gt;Contributors&lt;/h2&gt;
&lt;p&gt;Thanks to everyone who contributed to this release: Andrew Godwin, Ash Berlin-Taylor, Brent Bovenzi, Elad Kalif, Ephraim Anierobi, James Timmins, Jarek Potiuk, Jed Cunningham, Josh Fell, Kamil Breguła, Kaxil Naik, Malthe Borch, Sam Wheating, Sumit Maheshwari, Tzu-ping Chung and many others.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Airflow Summit 2021</title>
<link>/blog/airflow_summit_2021/</link>
<pubDate>Sun, 21 Mar 2021 00:00:00 +0000</pubDate>
<guid>/blog/airflow_summit_2021/</guid>
<description>
&lt;h2 id=&#34;airflow-summit-2021-is-here&#34;&gt;Airflow Summit 2021 is here!&lt;/h2&gt;
&lt;p&gt;The summit will be held online, July 8-16, 2021. Join us from all over the world to find
out how Airflow is being used by leading companies, what its roadmap is, and how you can
participate in its development.&lt;/p&gt;
&lt;h2 id=&#34;useful-information&#34;&gt;Useful information:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The official website: &lt;a href=&#34;https://airflowsummit.org&#34;&gt;https://airflowsummit.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The call for proposals is open until &lt;strong&gt;12 April 2021&lt;/strong&gt;. To submit your talk, go to &lt;a href=&#34;https://sessionize.com/airflow-summit-2021/&#34;&gt;https://sessionize.com/airflow-summit-2021/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;If you have any questions, reach out to us via &lt;a href=&#34;mailto:info@airflowsummit.org&#34;&gt;info@airflowsummit.org&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
</item>
<item>
<title>Blog: Airflow Survey 2020</title>
<link>/blog/airflow-survey-2020/</link>
<pubDate>Tue, 09 Mar 2021 00:00:00 +0000</pubDate>
<guid>/blog/airflow-survey-2020/</guid>
<description>
&lt;h1 id=&#34;apache-airflow-survey-2020&#34;&gt;Apache Airflow Survey 2020&lt;/h1&gt;
&lt;p&gt;The world of data processing tools is growing steadily. Apache Airflow already seems to be considered a
crucial component of this complex ecosystem. We observe steady growth in the number of users as well as in
the number of active contributors, so listening to and understanding our community is of high importance.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s worth noting that the 2020 survey was still mostly about the 1.10.X versions of Apache Airflow, and
many of the reported drawbacks may have been addressed in the 2.0 version that was released in December 2020. Whether this
is true, we will learn next year!&lt;/p&gt;
&lt;h2 id=&#34;overview-of-the-user&#34;&gt;Overview of the user&lt;/h2&gt;
&lt;p&gt;&lt;img src=&#34;What_best_describes_your_current_occupation.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What best describes your current occupation? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data Engineer&lt;/td&gt;
&lt;td&gt;115&lt;/td&gt;
&lt;td&gt;56.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;13.79&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DevOps&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;8.37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Solutions Architect&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;6.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Scientist&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;5.91&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4.93&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Analyst&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support Engineer&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1.48&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These results are not a surprise, as Airflow is a tool dedicated to data-related tasks. The majority of
our users are data engineers, scientists or analysts. The 2020 results are similar to &lt;a href=&#34;https://airflow.apache.org/blog/airflow-survey/&#34;&gt;those from 2019&lt;/a&gt;, with
a slight visible increase in ML use cases.&lt;/p&gt;
&lt;p&gt;Additionally, 79% of users use Airflow on a daily basis and 16% interact with it at least once a week.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How many people work in your company? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;200+&lt;/td&gt;
&lt;td&gt;107&lt;/td&gt;
&lt;td&gt;52.71&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;51-200&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;21.67&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11-50&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;18.23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-10&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;7.39&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;How many people in your company use Airflow? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1-5&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;td&gt;41.38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6-20&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;36.95&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21-50&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;11.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50+&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;10.34&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Airflow is software that is used and trusted by big companies. We can also see that Airflow works
well for teams of different sizes. However, in some cases users may use multiple Airflow instances.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Are you considering moving to other workflow engines? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No, we are happy with Airflow&lt;/td&gt;
&lt;td&gt;174&lt;/td&gt;
&lt;td&gt;85.71&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;14.29&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Nearly 1 out of 7 users is considering migrating to other workflow engines. Their decision is usually
justified by the need for an &lt;strong&gt;easier workflow writing experience&lt;/strong&gt; (12.32%), a &lt;strong&gt;better UI/UX&lt;/strong&gt; and a &lt;strong&gt;faster scheduler&lt;/strong&gt;
(8.37% each).&lt;/p&gt;
&lt;p&gt;While the first point may be addressed by the &lt;a href=&#34;http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskflow-api&#34;&gt;TaskFlow API&lt;/a&gt; in Airflow 2.0, the other two are definitely addressed
in the new major version, and the early feedback from 2.0 users seems to confirm it.&lt;/p&gt;
&lt;p&gt;The alternative engines considered by users are mainly Prefect and Argo. Some participants also mentioned
Luigi, Kubeflow or custom solutions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Are you or your team actively participating in Airflow development - contributing? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;I wish we could&lt;/td&gt;
&lt;td&gt;99&lt;/td&gt;
&lt;td&gt;48.77&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;59&lt;/td&gt;
&lt;td&gt;29.06&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;22.17&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This is a really heart-warming result. It means that about 1 out of 5 users actively contributes to our project!
But it would be good to learn whether something other than time is stopping people who wish to contribute
from doing so. If there are other obstacles, we would definitely like to learn about them so we can improve.
That said, if you know of something we can improve, please reach out via Slack, the dev list or GitHub
discussions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How likely are you to recommend Apache Airflow? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;2020 %&lt;/th&gt;
&lt;th&gt;2019 %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Very Likely&lt;/td&gt;
&lt;td&gt;125&lt;/td&gt;
&lt;td&gt;61.58&lt;/td&gt;
&lt;td&gt;45.45%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Likely&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;30.54&lt;/td&gt;
&lt;td&gt;40.26%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Neutral&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;5.42&lt;/td&gt;
&lt;td&gt;10.71%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unlikely&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1.48&lt;/td&gt;
&lt;td&gt;2.60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very unlikely&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0.99&lt;/td&gt;
&lt;td&gt;0.97%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Here is good news! It seems that people are more willing to recommend Apache Airflow than the year before.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What is your source of information about Airflow? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Documentation&lt;/td&gt;
&lt;td&gt;154&lt;/td&gt;
&lt;td&gt;75.86&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Airflow website&lt;/td&gt;
&lt;td&gt;139&lt;/td&gt;
&lt;td&gt;68.47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slack&lt;/td&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;td&gt;63.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Github&lt;/td&gt;
&lt;td&gt;127&lt;/td&gt;
&lt;td&gt;62.56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stack Overflow&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;35.47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Airflow Summit Videos&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;21.67&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The dev mailing list&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;16.26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Awesome Apache Airflow repository&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;10.34&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;7.39&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Here we see that the Airflow documentation is a crucial source of information. What&amp;rsquo;s interesting is that more
than 60% of users get information from GitHub and Slack channels.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;Where_are_you_based.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;airflow-uses-cases&#34;&gt;Airflow use cases&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Do you have any customisation of Airflow? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No, we use vanilla Airflow&lt;/td&gt;
&lt;td&gt;154&lt;/td&gt;
&lt;td&gt;75.86&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, we have small patches (no fork)&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;16.75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, we have separate fork&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;7.39&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;When onboarding new members to Airflow, what is the biggest problem? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No guide on best practises on developing DAGs&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;50.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;There is no easy option to launch Airflow&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;31.53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small number of tutorials on different aspects of using Airflow&lt;/td&gt;
&lt;td&gt;57&lt;/td&gt;
&lt;td&gt;28.08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documentation is not clear enough&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;26.11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;There is no easy option to deploy DAGs to an Airflow instance&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;25.62&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No problems&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;16.75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small number of blogs regarding Airflow&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;14.78&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Which interface(s) of Airflow do you use as part of your current role? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Original Airflow Graphical User Interface&lt;/td&gt;
&lt;td&gt;199&lt;/td&gt;
&lt;td&gt;98.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;td&gt;88&lt;/td&gt;
&lt;td&gt;43.35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;23.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom (own created) Airflow Graphical User Interface&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;5.91&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1.48&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Do you combine multiple DAGs? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Yes, by triggering another DAG&lt;/td&gt;
&lt;td&gt;87&lt;/td&gt;
&lt;td&gt;42.86&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No, I don&amp;rsquo;t combine multiple DAGs&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;38.92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, through SubDAG&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;19.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;8.87&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;How do you integrate with external services? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Using existing dedicated operators / hooks&lt;/td&gt;
&lt;td&gt;147&lt;/td&gt;
&lt;td&gt;72.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using Bash / Python operator&lt;/td&gt;
&lt;td&gt;140&lt;/td&gt;
&lt;td&gt;68.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using own custom operators / hooks&lt;/td&gt;
&lt;td&gt;138&lt;/td&gt;
&lt;td&gt;67.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;5.91&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;What external services do you use in your Airflow DAGs? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Web Services&lt;/td&gt;
&lt;td&gt;121&lt;/td&gt;
&lt;td&gt;59.61&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internal company systems&lt;/td&gt;
&lt;td&gt;113&lt;/td&gt;
&lt;td&gt;55.67&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Cloud Platform / Google APIs&lt;/td&gt;
&lt;td&gt;97&lt;/td&gt;
&lt;td&gt;47.78&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hadoop / Spark / Flink / Other Apache software&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;35.47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft Azure&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;10.34&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;9.36&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I do not use external services in my Airflow DAGs&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2.46&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;img src=&#34;What_external_services_do_you_use_in_your_Airflow_DAGs.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Do you use Airflow Plugins? If yes, what do you use them for? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Adding new operators/sensors and hooks&lt;/td&gt;
&lt;td&gt;119&lt;/td&gt;
&lt;td&gt;58.62&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I don&amp;rsquo;t use Airflow plugins&lt;/td&gt;
&lt;td&gt;69&lt;/td&gt;
&lt;td&gt;33.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding AppBuilder views &amp;amp; menu items&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;13.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding new executors&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;8.37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding OperatorExtraLinks&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;6.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;| Other&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Do you use Airflow&amp;rsquo;s data lineage feature? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No, I will use such feature if fully supported in Airflow&lt;/td&gt;
&lt;td&gt;105&lt;/td&gt;
&lt;td&gt;51.72&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No, data lineage isn’t a concern for my usage.&lt;/td&gt;
&lt;td&gt;68&lt;/td&gt;
&lt;td&gt;33.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, I use another data lineage product&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;11.82&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, I use custom implementation&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2.46&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, I use Airflow&amp;rsquo;s experimental data lineage feature&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.49&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;When asked which lineage product they use, respondents gave answers ranging from custom tools
to well-known products such as Amundsen, Atlas or dbt.&lt;/p&gt;
&lt;h2 id=&#34;deployment&#34;&gt;Deployment&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;How many active DAGs do you have in your largest Airflow instance? (open question)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Number of DAGs&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&amp;lt; 20&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21-40&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;41-60&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;61-100&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;101-200&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;201-300&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;301-999&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1000+&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;What is the maximum number of tasks that you have used in one DAG? (open question)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Number of tasks&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&amp;lt; 10&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11-20&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21-30&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;31-40&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;41-50&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;51-100&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;101-200&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;201-500&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;501+&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Which version of Airflow do you use currently? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1.10.14&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;td&gt;27.09&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.0.0+&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;22.17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.12&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;13.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.10&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;12.81&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.11&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;6.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.5 or older&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4.93&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.9&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;3.94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.13&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;3.45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.6&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.7&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.10.8&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1.48&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This was probably one of the most important questions in the survey. While it&amp;rsquo;s good to see
that more than 60% of users run one of the three most common versions, all of them recent (1.10.14, 2.0.0+ or 1.10.12),
it&amp;rsquo;s worrying that the rest are using versions that are old or have known security vulnerabilities.&lt;/p&gt;
&lt;p&gt;Additionally, more than 20% of users are already using 2.0.0+ versions, which is encouraging.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What meta-database do you use? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Postgres 12&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;17.73&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Postgres 9.6&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;16.26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Postgres 11&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;15.27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MySQL 5.7&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;13.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MySQL 8.0&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;9.85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Postgres 10&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;9.85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;9.36&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Postgres 13&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;8.87&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This means that about 69% of users choose Postgres as their meta-database.
MySQL is the choice of nearly 24% of users. The other responses included MySQL variants
such as MariaDB and cloud-hosted databases such as Cloud SQL (used by Google Composer) or AWS Aurora.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s good to know that users tend to avoid SQLite in production deployments!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What executor type do you use? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;What_executor_type_do_you_use.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;2020&lt;/th&gt;
&lt;th&gt;2019&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Celery&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;49.26%&lt;/td&gt;
&lt;td&gt;44.81%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;23.65%&lt;/td&gt;
&lt;td&gt;16.88%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;19.7%&lt;/td&gt;
&lt;td&gt;27.60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sequential&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4.93%&lt;/td&gt;
&lt;td&gt;7.14%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2.46%&lt;/td&gt;
&lt;td&gt;3.57%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Compared to the previous year, more users now use the Celery and Kubernetes
executors, while LocalExecutor usage dropped by nearly 8 percentage points. This may
suggest that users&#39; deployments are growing, and they need more scalable solutions.&lt;/p&gt;
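&lt;p&gt;As a rough check of the year-over-year comparison, the sketch below recomputes the change in percentage points for each executor. The figures are copied from the table above; the names &lt;code&gt;executor_share&lt;/code&gt; and &lt;code&gt;point_change&lt;/code&gt; are made up for this illustration.&lt;/p&gt;

```python
# Executor shares in percent, copied from the table above as (2020, 2019).
executor_share = {
    "Celery": (49.26, 44.81),
    "Kubernetes": (23.65, 16.88),
    "Local": (19.70, 27.60),
    "Sequential": (4.93, 7.14),
    "Other": (2.46, 3.57),
}


def point_change(share_2020, share_2019):
    """Return the year-over-year change in percentage points."""
    return round(share_2020 - share_2019, 2)


# Hypothetical helper mapping: executor name to its change in points.
changes = {name: point_change(*shares) for name, shares in executor_share.items()}

for name, delta in changes.items():
    # A positive value means the executor gained share compared to 2019.
    print(f"{name}: {delta:+.2f} points")
```

&lt;p&gt;Reporting the difference in percentage points rather than as a relative percentage keeps the comparison independent of how large each share was in 2019.&lt;/p&gt;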
&lt;p&gt;Among CeleryExecutor users, 78% use Redis as a broker, 19% use RabbitMQ, and the rest
use other brokers or are not sure which broker is used in their deployments.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What metrics do you use to monitor Airflow? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;I do not use monitoring&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;32.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External monitoring service&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;29.56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Information from metadatabase&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;25.12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Statsd&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;24.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;15.27&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The other responses mostly named specific monitoring tools,
including DataDog and the Prometheus exporter.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How do you deploy Airflow? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;On virtual machines (for example using AWS EC2)&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;31.53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using a managed service like Astronomer, Google Composer or AWS MWAA&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;17.24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On Kubernetes (using custom deployments)&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;14.29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On premises&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;13.79&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On Kubernetes (using another helm chart)&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;9.85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On Kubernetes (using Apache Airflow&amp;rsquo;s helm chart)&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;8.37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;5.91&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Nearly 33% of users deploy Airflow using some kind of Kubernetes deployment. This is about
10 percent more than in 2019. There&amp;rsquo;s also a slight increase in the use of Airflow via
managed services (14.61% in 2019).&lt;/p&gt;
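&lt;p&gt;The Kubernetes figure is the sum of the three Kubernetes rows in the table above. A minimal sketch of that arithmetic, using only numbers quoted in this section (the variable names are illustrative):&lt;/p&gt;

```python
# Shares in percent of the three Kubernetes rows in the deployment table above.
kubernetes_share = {
    "custom deployments": 14.29,
    "another helm chart": 9.85,
    "Apache Airflow's helm chart": 8.37,
}

# Combined share of all Kubernetes-based deployments.
total_kubernetes = round(sum(kubernetes_share.values()), 2)
print(f"Kubernetes-based deployments: {total_kubernetes}%")

# Managed services: 17.24% in 2020 versus 14.61% in 2019.
managed_change = round(17.24 - 14.61, 2)
print(f"Managed services change: {managed_change:+} points")
```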
&lt;p&gt;&lt;strong&gt;Do you use containerisation for deployment? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Yes, using helm chart / kubernetes&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;td&gt;28.57&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No, I don’t use containerisation&lt;/td&gt;
&lt;td&gt;57&lt;/td&gt;
&lt;td&gt;28.08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, single docker image&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;24.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, using docker compose&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;19.21&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Among users who do not use Kubernetes-based deployments, 58% use containerisation. About
42% of those users use docker-compose for deployments.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How do you distribute your DAGs? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Using a synchronizing process (Git sync, GCS fuse, etc)&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;38.92&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bake them into the docker image&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;27.59&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared file system&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;16.75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;9.85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I don’t know&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;6.9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The most popular way of distributing DAGs seems to be using a synchronizing process. About
40% of users use this process together with Kubernetes deployments.&lt;/p&gt;
&lt;h2 id=&#34;future-of-airflow&#34;&gt;Future of Airflow&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;In your opinion, what could be improved in Airflow? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Web UI&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;49.26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Examples, how-to, onboarding documentation&lt;/td&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;44.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging, monitoring and alerting&lt;/td&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;44.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Technical documentation&lt;/td&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;44.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduler performance&lt;/td&gt;
&lt;td&gt;83&lt;/td&gt;
&lt;td&gt;40.89&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAG authoring&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;31.53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authentication and authorization&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;td&gt;28.57&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REST API&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;25.12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;21.67&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;20.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External integration e.g. AWS, GCP, Apache products&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;17.73&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;13.79&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;9.85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Everything works fine for me&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;6.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I don’t know&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1.97&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Which features would most interest you? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DAG versioning&lt;/td&gt;
&lt;td&gt;109&lt;/td&gt;
&lt;td&gt;53.69&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Builtin statistics&lt;/td&gt;
&lt;td&gt;71&lt;/td&gt;
&lt;td&gt;34.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Improved data lineage&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;32.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduling at the start of the interval&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;31.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stateless workers&lt;/td&gt;
&lt;td&gt;59&lt;/td&gt;
&lt;td&gt;29.06&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;More options to configure schedules (time units, increments)&lt;/td&gt;
&lt;td&gt;57&lt;/td&gt;
&lt;td&gt;28.08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-tenant deployment&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;24.14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAG fetcher (AIP-5)&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;19.21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generic transfer operator&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;16.75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;16.26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I have everything I need&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;5.42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nothing&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;5.42&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Will you consider migrating to Airflow 2.0? (single choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Yes, as soon as possible&lt;/td&gt;
&lt;td&gt;81&lt;/td&gt;
&lt;td&gt;39.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, once it’s mature (for example after 2.1)&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;35.47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I am already using Airflow 2.0+&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;19.21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I don&amp;rsquo;t know yet&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;3.94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No, I do not plan to migrate&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1.48&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;What are the features of Airflow 2.0 you are most excited about? (multiple choice)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;General performance improvements&lt;/td&gt;
&lt;td&gt;133&lt;/td&gt;
&lt;td&gt;65.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refreshed WebUI&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;50.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduler HA&lt;/td&gt;
&lt;td&gt;99&lt;/td&gt;
&lt;td&gt;48.77&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Official docker image&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;td&gt;41.38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;@task decorator&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;27.59&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Official helm chart&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;25.12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Providers packages&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;20.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Configurable XCom backends&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;16.26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CeleryKubernetesExecutor&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;15.27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;5.91&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;p&gt;From an open-source point of view, it is good to see that many people would love to contribute to Apache Airflow.
This means that there are resources that if unleashed may make our community even stronger. From a product perspective, it is important to know that users are usually using the latest versions of our software and
are willing to upgrade to new ones.&lt;/p&gt;
&lt;p&gt;Finally, there are still some things to improve: documentation, onboarding guides and plug-and-play Airflow
deployments. However, we hope that as adoption grows, more people will be willing
to share their experience and tools.&lt;/p&gt;
&lt;h2 id=&#34;data&#34;&gt;Data&lt;/h2&gt;
&lt;p&gt;If you think I missed something or you simply want to look for insights on your own, the data is available for you here: &lt;a href=&#34;/data/survey-responses/airflow-user-survey-responses-2020.csv.zip&#34;&gt;Airflow User Survey 2020.csv&lt;/a&gt;&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Apache Airflow 2.0 is here!</title>
<link>/blog/airflow-two-point-oh-is-here/</link>
<pubDate>Thu, 17 Dec 2020 00:00:00 +0000</pubDate>
<guid>/blog/airflow-two-point-oh-is-here/</guid>
<description>
&lt;p&gt;I am proud to announce that Apache Airflow 2.0.0 has been released.&lt;/p&gt;
&lt;p&gt;The full changelog is about 3,000 lines long (already excluding everything backported to 1.10), so for now I&amp;rsquo;ll simply share some of the major features in 2.0.0 compared to 1.10.14:&lt;/p&gt;
&lt;h2 id=&#34;a-new-way-of-writing-dags-the-taskflow-api-aip-31&#34;&gt;A new way of writing DAGs: the TaskFlow API (AIP-31)&lt;/h2&gt;
&lt;p&gt;(Known in 2.0.0alphas as Functional DAGs.)&lt;/p&gt;
&lt;p&gt;DAGs are now much nicer to author, especially when using PythonOperator. Dependencies are handled more clearly and XCom is nicer to use.&lt;/p&gt;
&lt;p&gt;Read more here:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html&#34;&gt;TaskFlow API Tutorial&lt;/a&gt; &lt;br&gt;
&lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows&#34;&gt;TaskFlow API Documentation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A quick teaser of what DAGs can now look like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.decorators&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dag&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;task&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.utils.dates&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;days_ago&lt;/span&gt;
&lt;span class=&#34;nd&#34;&gt;@dag&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;default_args&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;owner&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;airflow&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;},&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;schedule_interval&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;start_date&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;days_ago&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;tutorial_taskflow_api_etl&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
    &lt;span class=&#34;nd&#34;&gt;@task&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;extract&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
        &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;1001&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;301.27&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;1002&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;433.21&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;1003&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;502.22&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;nd&#34;&gt;@task&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;transform&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;order_data_dict&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;dict&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;dict&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
        &lt;span class=&#34;n&#34;&gt;total_order_value&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;
        &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;value&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;order_data_dict&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;values&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
            &lt;span class=&#34;n&#34;&gt;total_order_value&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;+=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;value&lt;/span&gt;
        &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;total_order_value&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;total_order_value&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
    &lt;span class=&#34;nd&#34;&gt;@task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;total_order_value&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
        &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Total order value is: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;%.2f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;%&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;total_order_value&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;order_data&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;extract&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;order_summary&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;transform&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;order_data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
    &lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;order_summary&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;total_order_value&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;tutorial_etl_dag&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;tutorial_taskflow_api_etl&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;fully-specified-rest-api-aip-32&#34;&gt;Fully specified REST API (AIP-32)&lt;/h2&gt;
&lt;p&gt;We now have a fully supported, no-longer-experimental API with a comprehensive OpenAPI specification.&lt;/p&gt;
&lt;p&gt;Read more here:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html&#34;&gt;REST API Documentation&lt;/a&gt;.&lt;/p&gt;
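&lt;p&gt;To give a feel for the stable API, here is a minimal sketch that builds (but does not send) a request for the DAG list endpoint, &lt;code&gt;/api/v1/dags&lt;/code&gt;. It assumes the basic-auth API backend is enabled; the helper name &lt;code&gt;build_list_dags_request&lt;/code&gt;, the webserver address and the credentials are hypothetical placeholders, not part of Airflow.&lt;/p&gt;

```python
import base64
import urllib.request


def build_list_dags_request(base_url, username, password, limit=10):
    """Build (but do not send) a GET request for the stable API's DAG list."""
    # HTTP Basic auth: base64 of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/dags?limit={limit}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


# Hypothetical local webserver and credentials, for illustration only.
request = build_list_dags_request("http://localhost:8080", "admin", "admin")
print(request.get_method(), request.full_url)

# To actually call the API, pass `request` to urllib.request.urlopen(...)
# against a running webserver and parse the JSON body of the response.
```

&lt;p&gt;Building the request separately from sending it makes the URL and headers easy to inspect before pointing the code at a real deployment.&lt;/p&gt;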
&lt;h2 id=&#34;massive-scheduler-performance-improvements&#34;&gt;Massive Scheduler performance improvements&lt;/h2&gt;
&lt;p&gt;As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we significantly improved the performance of the Airflow Scheduler. It now starts tasks much, MUCH quicker.&lt;/p&gt;
&lt;p&gt;Over at Astronomer.io we&amp;rsquo;ve &lt;a href=&#34;https://www.astronomer.io/blog/airflow-2-scheduler&#34;&gt;benchmarked the scheduler, and it&amp;rsquo;s fast&lt;/a&gt; (we had to triple-check the numbers as we didn&amp;rsquo;t quite believe them at first!)&lt;/p&gt;
&lt;h2 id=&#34;scheduler-is-now-ha-compatible-aip-15&#34;&gt;Scheduler is now HA compatible (AIP-15)&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s now possible and supported to run more than a single scheduler instance. This is super useful for both resiliency (in case a scheduler goes down) and scheduling performance.&lt;/p&gt;
&lt;p&gt;To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5 and MariaDB won&amp;rsquo;t work with more than one scheduler, I&amp;rsquo;m afraid).&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s no config or other set up required to run more than one scheduler—just start up a scheduler somewhere else (ensuring it has access to the DAG files) and it will cooperate with your existing schedulers through the database.&lt;/p&gt;
&lt;p&gt;For more information, read the &lt;a href=&#34;http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler&#34;&gt;Scheduler HA documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;task-groups-aip-34&#34;&gt;Task Groups (AIP-34)&lt;/h2&gt;
&lt;p&gt;SubDAGs were commonly used for grouping tasks in the UI, but they had many drawbacks in their execution behaviour (primarily that they only executed a single task at a time!). To improve this experience, we’ve introduced &amp;ldquo;Task Groups&amp;rdquo;: a method for organizing tasks which provides the same grouping behaviour as a SubDAG without any of the execution-time drawbacks.&lt;/p&gt;
&lt;p&gt;SubDAGs will still work for now, but we think that any previous use of SubDAGs can now be replaced with task groups. If you find an example where this isn&amp;rsquo;t the case, please let us know by opening an issue on GitHub.&lt;/p&gt;
&lt;p&gt;For more information, check out the &lt;a href=&#34;http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup&#34;&gt;Task Group documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;refreshed-ui&#34;&gt;Refreshed UI&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve given the Airflow UI &lt;a href=&#34;https://github.com/apache/airflow/pull/11195&#34;&gt;a visual refresh&lt;/a&gt; and updated some of the styling.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;airflow-2.0-ui.gif&#34; alt=&#34;Airflow 2.0&amp;rsquo;s new UI&#34;&gt;&lt;/p&gt;
&lt;p&gt;We have also added an option to auto-refresh task states in Graph View so you no longer need to continuously press the refresh button :).&lt;/p&gt;
&lt;p&gt;Check out &lt;a href=&#34;http://airflow.apache.org/docs/apache-airflow/stable/ui.html&#34;&gt;the screenshots in the docs&lt;/a&gt; for more.&lt;/p&gt;
&lt;h2 id=&#34;smart-sensors-for-reduced-load-from-sensors-aip-17&#34;&gt;Smart Sensors for reduced load from sensors (AIP-17)&lt;/h2&gt;
&lt;p&gt;If you make heavy use of sensors in your Airflow cluster, you might find that sensor execution takes up a significant proportion of your cluster even with &amp;ldquo;reschedule&amp;rdquo; mode. To improve this, we&amp;rsquo;ve added a new mode called &amp;ldquo;Smart Sensors&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;This feature is in &amp;ldquo;early-access&amp;rdquo;: it&amp;rsquo;s been well-tested by Airbnb and is &amp;ldquo;stable&amp;rdquo;/usable, but we reserve the right to make backwards incompatible changes to it in a future release (if we have to. We&amp;rsquo;ll try very hard not to!)&lt;/p&gt;
&lt;p&gt;Read more about it in the &lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html&#34;&gt;Smart Sensors documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;simplified-kubernetesexecutor&#34;&gt;Simplified KubernetesExecutor&lt;/h2&gt;
&lt;p&gt;For Airflow 2.0, we have re-architected the KubernetesExecutor in a fashion that is simultaneously faster, easier to understand, and more flexible for Airflow users. Users will now be able to access the full Kubernetes API to create a .yaml &lt;code&gt;pod_template_file&lt;/code&gt; instead of specifying parameters in their airflow.cfg.&lt;/p&gt;
&lt;p&gt;We have also replaced the &lt;code&gt;executor_config&lt;/code&gt; dictionary with the &lt;code&gt;pod_override&lt;/code&gt; parameter, which takes a Kubernetes V1Pod object for a 1:1 setting override. These changes have removed over three thousand lines of code from the KubernetesExecutor, which makes it run faster and creates fewer potential errors.&lt;/p&gt;
&lt;p&gt;Read more here:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file&#34;&gt;Docs on pod_template_file&lt;/a&gt; &lt;br&gt;
&lt;a href=&#34;https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override&#34;&gt;Docs on pod_override&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;airflow-core-and-providers-splitting-airflow-into-60-packages&#34;&gt;Airflow core and providers: Splitting Airflow into 60+ packages:&lt;/h2&gt;
&lt;p&gt;Airflow 2.0 is not a monolithic &amp;ldquo;one to rule them all&amp;rdquo; package. We’ve split Airflow into core and 61 (for now) provider packages. Each provider package is for either a particular external service (Google, Amazon, Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol (HTTP/FTP). Now you can create a custom Airflow installation from &amp;ldquo;building&amp;rdquo; blocks and choose only what you need, plus add whatever other requirements you might have. Some of the common providers are installed automatically (ftp, http, imap, sqlite) as they are commonly used. Other providers are automatically installed when you choose appropriate extras when installing Airflow.&lt;/p&gt;
&lt;p&gt;The provider architecture should make it much easier to get a fully customized, yet consistent runtime with the right set of Python dependencies.&lt;/p&gt;
&lt;p&gt;But that’s not all: you can write your own custom providers and add things like custom connection types, customizations of the Connection Forms, and extra links to your operators in a manageable way. You can build your own provider and install it as a Python package and have your customizations visible right in the Airflow UI.&lt;/p&gt;
&lt;p&gt;Our very own Jarek Potiuk has written about &lt;a href=&#34;https://higrys.medium.com/airflow-2-0-providers-1bd21ba3bd93&#34;&gt;providers in much more detail&lt;/a&gt; on his blog.&lt;/p&gt;
&lt;p&gt;Docs on the &lt;a href=&#34;http://airflow.apache.org/docs/apache-airflow-providers/&#34;&gt;providers concept and writing custom providers&lt;/a&gt; &lt;br&gt;
Docs on &lt;a href=&#34;http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html&#34;&gt;all providers packages available&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;security&#34;&gt;Security&lt;/h2&gt;
&lt;p&gt;As part of the Airflow 2.0 effort, there has been a conscious focus on security and reducing areas of exposure. This is represented across different functional areas in different forms. For example, in the new REST API, all operations now require authorization. Similarly, in the configuration settings, the Fernet key is now required to be specified.&lt;/p&gt;
&lt;h2 id=&#34;configuration&#34;&gt;Configuration&lt;/h2&gt;
&lt;p&gt;Configuration in the form of the airflow.cfg file has been rationalized further into distinct sections, specifically around &amp;ldquo;core&amp;rdquo;. Additionally, a significant number of configuration options have been deprecated or moved to individual component-specific configuration files, such as the pod-template-file for Kubernetes execution-related configuration.&lt;/p&gt;
&lt;h2 id=&#34;thanks-to-all-of-you&#34;&gt;Thanks to all of you&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve tried to make as few breaking changes as possible and to provide a deprecation path in the code, especially in the case of anything called in the DAG. That said, please read through UPDATING.md to check what might affect you. For example: We have re-organized the layout of operators (they now all live under airflow.providers.*) but the old names should continue to work - you&amp;rsquo;ll just notice a lot of DeprecationWarnings that need to be fixed up.&lt;/p&gt;
&lt;p&gt;Thank you so much to all the contributors who got us to this point, in no particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang, James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others who keep making Airflow better for everyone.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Journey with Airflow as an Outreachy Intern</title>
<link>/blog/experience-with-airflow-as-an-outreachy-intern/</link>
<pubDate>Sun, 30 Aug 2020 00:00:00 +0000</pubDate>
<guid>/blog/experience-with-airflow-as-an-outreachy-intern/</guid>
<description>
&lt;p&gt;&lt;a href=&#34;https://www.outreachy.org/&#34;&gt;Outreachy&lt;/a&gt; is a program which organises three-month paid internships with FOSS
projects for people who are typically underrepresented in those projects.&lt;/p&gt;
&lt;h3 id=&#34;contribution-period&#34;&gt;Contribution Period&lt;/h3&gt;
&lt;p&gt;The first thing I had to do was choose a project under an organisation. After going through all the projects
I chose “Extending the REST API of Apache Airflow”, because I had a good idea of what REST APIs are, so I
thought it would be easier to get started with the contributions. The next step was to set up Airflow’s dev
environment, which, thanks to &lt;a href=&#34;https://github.com/apache/airflow/blob/master/BREEZE.rst&#34;&gt;Breeze&lt;/a&gt;, was a breeze.
Since I had never contributed to FOSS before, this part was overwhelming, but there were plenty of issues
labelled “good first issues” with detailed descriptions, and some even had code snippets, which luckily nudged
me in the right direction. These things about Airflow and the positive vibes from the community were the reasons
why I chose to stick with Airflow as my Outreachy project.&lt;/p&gt;
&lt;h3 id=&#34;internship-period&#34;&gt;Internship Period&lt;/h3&gt;
&lt;p&gt;My first PR was followed by many new experiences, one of them being that I introduced a
&lt;a href=&#34;https://github.com/apache/airflow/pull/7680#issuecomment-619763051&#34;&gt;bug&lt;/a&gt; in it ;).
Nonetheless, it made me familiar with the feedback loop, and the feedback on my subsequent
&lt;a href=&#34;https://github.com/apache/airflow/pulls?q=is%3Apr+author%3AOmairK+&#34;&gt;PRs&lt;/a&gt; was the focal point of the overall
learning experience I went through, which boosted my confidence to contribute more and move out of my comfort zone.
I wanted to learn more about the things that happen under Airflow’s hood, so I started filtering out recent PRs
dealing with different components, and I would go through the code changes along with the discussion, which helped me
get a better understanding of the whole workflow. &lt;a href=&#34;https://lists.apache.org/list.html?dev@airflow.apache.org&#34;&gt;Airflow’s mailing list&lt;/a&gt;
was also a great source of knowledge.&lt;/p&gt;
&lt;p&gt;The API related PRs that I worked on helped me with some of the important concepts like:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9329&#34;&gt;Pool CRUD endpoints&lt;/a&gt; where pools limit the execution parallelism.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9597&#34;&gt;Tasks&lt;/a&gt; determine the actual work that has to be carried out.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9473&#34;&gt;DAG&lt;/a&gt; which represents the structure for a collection
of tasks. It keeps track of tasks, their dependencies and the sequence in which they have to run.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9473&#34;&gt;Dag Runs&lt;/a&gt; that are the instantiation of DAG(s) in time.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Through actively and passively participating in discussions I learnt that even if there is a difference of opinion
one can always learn from the different approaches, and &lt;a href=&#34;https://github.com/apache/airflow/pull/8721&#34;&gt;this PR&lt;/a&gt; with
more than 300 comments is proof of it. I also started reviewing small PRs, which gave me the amazing opportunity
to interact with new people. Throughout my internship I learnt a lot about different frameworks and technologies,
but the biggest takeaway for me was that code is read more often than it&amp;rsquo;s written, and I started writing code with
that in mind.&lt;/p&gt;
&lt;h3 id=&#34;wrapping-up&#34;&gt;Wrapping Up&lt;/h3&gt;
&lt;p&gt;So, with my project of extending Airflow’s REST API as well as the Outreachy internship coming to an end, I would like
to thank my mentors &lt;a href=&#34;https://github.com/potiuk&#34;&gt;Jarek Potiuk&lt;/a&gt;, &lt;a href=&#34;https://github.com/kaxil&#34;&gt;Kaxil Naik&lt;/a&gt; and
&lt;a href=&#34;https://github.com/mik-laj&#34;&gt;Kamil Breguła&lt;/a&gt; for the patience and the time they invested in mentoring me and
the Airflow community for making me feel so welcome. I plan to stick around and contribute to give back to the
community that has made my summer one to remember.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Apache Airflow 1.10.12</title>
<link>/blog/airflow-1.10.12/</link>
<pubDate>Tue, 25 Aug 2020 00:00:00 +0000</pubDate>
<guid>/blog/airflow-1.10.12/</guid>
<description>
&lt;p&gt;Airflow 1.10.12 contains 113 commits since 1.10.11 and includes 5 new features, 23 improvements, 23 bug fixes,
and several doc changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href=&#34;https://pypi.org/project/apache-airflow/1.10.12/&#34;&gt;https://pypi.org/project/apache-airflow/1.10.12/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href=&#34;https://airflow.apache.org/docs/1.10.12/&#34;&gt;https://airflow.apache.org/docs/1.10.12/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Changelog&lt;/strong&gt;: &lt;a href=&#34;http://airflow.apache.org/docs/1.10.12/changelog.html&#34;&gt;http://airflow.apache.org/docs/1.10.12/changelog.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Airflow 1.10.11 has breaking changes with respect to
KubernetesExecutor &amp;amp; KubernetesPodOperator, so I recommend that users upgrade directly to Airflow 1.10.12 instead&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Some of the noteworthy new features (user-facing) are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/8560&#34;&gt;Allow defining custom XCom class&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9645&#34;&gt;Get Airflow configs with sensitive data from Secret Backends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/10282&#34;&gt;Add AirflowClusterPolicyViolation support to Airflow local settings&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;allow-defining-custom-xcom-class&#34;&gt;Allow defining Custom XCom class&lt;/h3&gt;
&lt;p&gt;Up to Airflow 1.10.11, XCom data was only stored in the Airflow metadata database. From Airflow 1.10.12, users
can define custom XCom classes. This allows users to transfer larger data between tasks.
An example would be to store XCom data in an S3 or GCS bucket if the size of the data that needs to be stored is larger
than &lt;code&gt;XCom.MAX_XCOM_SIZE&lt;/code&gt; (48 KB).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PR&lt;/strong&gt;: &lt;a href=&#34;https://github.com/apache/airflow/pull/8560&#34;&gt;https://github.com/apache/airflow/pull/8560&lt;/a&gt;&lt;/p&gt;
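&lt;p&gt;To make the idea concrete, here is a minimal, framework-free sketch of the routing decision such a custom class might make. It deliberately does not use Airflow&amp;rsquo;s XCom API: the &lt;code&gt;FAKE_BUCKET&lt;/code&gt; dictionary stands in for an S3 or GCS bucket, and &lt;code&gt;store_value&lt;/code&gt;, &lt;code&gt;load_value&lt;/code&gt; and &lt;code&gt;MAX_INLINE_BYTES&lt;/code&gt; are hypothetical names chosen for illustration.&lt;/p&gt;

```python
import json
import uuid

# Assumption: a 48 KB threshold, mirroring the limit quoted above.
MAX_INLINE_BYTES = 48 * 1024

# Hypothetical stand-in for an S3 or GCS bucket.
FAKE_BUCKET = {}


def store_value(value):
    """Return the record that would be written to the metadata database."""
    payload = json.dumps(value).encode("utf-8")
    if len(payload) > MAX_INLINE_BYTES:
        # Too large: park the payload externally and keep only a reference.
        key = "xcom/" + uuid.uuid4().hex
        FAKE_BUCKET[key] = payload
        return {"external": True, "key": key}
    return {"external": False, "data": payload.decode("utf-8")}


def load_value(record):
    """Resolve a stored record back into the original value."""
    if record["external"]:
        return json.loads(FAKE_BUCKET[record["key"]].decode("utf-8"))
    return json.loads(record["data"])


small = store_value({"rows": 3})
large = store_value(list(range(20000)))
print(small["external"], large["external"])
```

&lt;p&gt;Only the large value is routed to the external store; the downstream task calls the same &lt;code&gt;load_value&lt;/code&gt; helper either way, so the storage location stays an implementation detail.&lt;/p&gt;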
&lt;h3 id=&#34;get-airflow-configs-with-sensitive-data-from-secret-backends&#34;&gt;Get Airflow configs with sensitive data from Secret Backends&lt;/h3&gt;
&lt;p&gt;Users can now get the following Airflow configs from a Secrets Backend such as HashiCorp Vault:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;sql_alchemy_conn&lt;/code&gt; in [core] section&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fernet_key&lt;/code&gt; in [core] section&lt;/li&gt;
&lt;li&gt;&lt;code&gt;broker_url&lt;/code&gt; in [celery] section&lt;/li&gt;
&lt;li&gt;&lt;code&gt;flower_basic_auth&lt;/code&gt; in [celery] section&lt;/li&gt;
&lt;li&gt;&lt;code&gt;result_backend&lt;/code&gt; in [celery] section&lt;/li&gt;
&lt;li&gt;&lt;code&gt;password&lt;/code&gt; in [atlas] section&lt;/li&gt;
&lt;li&gt;&lt;code&gt;smtp_password&lt;/code&gt; in [smtp] section&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bind_password&lt;/code&gt; in [ldap] section&lt;/li&gt;
&lt;li&gt;&lt;code&gt;git_password&lt;/code&gt; in [kubernetes] section&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This further improves Airflow&amp;rsquo;s secret management story: from Airflow 1.10.12, users no longer need to hardcode
a &lt;strong&gt;sensitive&lt;/strong&gt; config value in airflow.cfg, nor do they need to set it through an environment variable.&lt;/p&gt;
&lt;p&gt;For example, the metadata database connection string can be set in airflow.cfg like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-ini&#34; data-lang=&#34;ini&#34;&gt;&lt;span class=&#34;k&#34;&gt;[core]&lt;/span&gt;
&lt;span class=&#34;na&#34;&gt;sql_alchemy_conn_secret&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;sql_alchemy_conn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This retrieves the config option from the configured Secrets Backend.&lt;/p&gt;
&lt;p&gt;As you can see, you just need to add a &lt;code&gt;_secret&lt;/code&gt; suffix to the actual config option,
and the value needs to be the &lt;strong&gt;key&lt;/strong&gt; that the Secrets Backend will look up.&lt;/p&gt;
&lt;p&gt;Similarly, &lt;code&gt;_secret&lt;/code&gt; config options can also be set using a corresponding environment variable. For example:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;export AIRFLOW__CORE__SQL_ALCHEMY_CONN_SECRET=sql_alchemy_conn
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;More details: &lt;a href=&#34;http://airflow.apache.org/docs/1.10.12/howto/set-config.html&#34;&gt;http://airflow.apache.org/docs/1.10.12/howto/set-config.html&lt;/a&gt;&lt;/p&gt;
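&lt;p&gt;The environment variable name follows the &lt;code&gt;AIRFLOW__{SECTION}__{KEY}&lt;/code&gt; pattern, with &lt;code&gt;_SECRET&lt;/code&gt; appended for the secret variant. The small helper below is only a sketch of that naming rule; &lt;code&gt;config_env_var&lt;/code&gt; is a hypothetical function, not part of Airflow.&lt;/p&gt;

```python
def config_env_var(section, key, secret=False):
    """Build the environment variable name for an Airflow config option."""
    suffix = "_SECRET" if secret else ""
    return "AIRFLOW__{}__{}{}".format(section.upper(), key.upper(), suffix)


print(config_env_var("core", "sql_alchemy_conn", secret=True))
# AIRFLOW__CORE__SQL_ALCHEMY_CONN_SECRET
print(config_env_var("celery", "broker_url", secret=True))
# AIRFLOW__CELERY__BROKER_URL_SECRET
```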
&lt;h3 id=&#34;add-airflowclusterpolicyviolation-support-to-airflow_local_settingspy&#34;&gt;Add AirflowClusterPolicyViolation support to airflow_local_settings.py&lt;/h3&gt;
&lt;p&gt;Users can use Cluster Policies to apply cluster-wide checks on Airflow
tasks. You can raise &lt;a href=&#34;http://airflow.apache.org/docs/1.10.12/_api/airflow/exceptions/index.html#airflow.exceptions.AirflowClusterPolicyViolation&#34;&gt;AirflowClusterPolicyViolation&lt;/a&gt;
in a policy or task mutation hook to prevent a DAG from being
imported or prevent a task from being executed if the task is not compliant with
your check.&lt;/p&gt;
&lt;p&gt;These checks are intended to help teams using Airflow to protect against common
beginner errors that may get past a code reviewer, rather than as technical
security controls.&lt;/p&gt;
&lt;p&gt;For example, to reject tasks that have no owner or that still use the default owner from the &lt;code&gt;[operators]&lt;/code&gt; config section:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.configuration&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;conf&lt;/span&gt;
&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;airflow.exceptions&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AirflowClusterPolicyViolation&lt;/span&gt;

&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;task_must_have_owners&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;task&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;c1&#34;&gt;# Reject tasks with no owner or with the configured default owner.&lt;/span&gt;
    &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;not&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;task&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;owner&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;or&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;task&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;owner&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;lower&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;==&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;conf&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;operators&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;default_owner&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
        &lt;span class=&#34;k&#34;&gt;raise&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AirflowClusterPolicyViolation&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
            &lt;span class=&#34;s1&#34;&gt;&amp;#39;Task must have non-None non-default owner. Current value: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{}&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;format&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;task&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;owner&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;More details: &lt;a href=&#34;http://airflow.apache.org/docs/1.10.12/concepts.html#cluster-policies-for-custom-task-checks&#34;&gt;http://airflow.apache.org/docs/1.10.12/concepts.html#cluster-policies-for-custom-task-checks&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;launch-pods-via-yaml-files-when-using-kubernetesexecutor-and-kubernetespodoperator&#34;&gt;Launch Pods via YAML files when using KubernetesExecutor and KubernetesPodOperator&lt;/h3&gt;
&lt;p&gt;As of 1.10.12, users can launch pods via YAML files instead of passing various configurations.&lt;/p&gt;
&lt;p&gt;To allow greater flexibility we have deprecated Airflow&amp;rsquo;s Pod class and instead now use classes and
objects from the official Kubernetes API. The Pod class will still work but raises a deprecation
warning. This feature involved a pretty extensive rewrite of all of our pod creation code.&lt;/p&gt;
&lt;p&gt;Initially, we were going to hold off on these features until Airflow 2.0. However, we soon
realized that exposing these features in 1.10.x is crucial in preparing users for the 2.0 release to come.&lt;/p&gt;
&lt;p&gt;Details: &lt;a href=&#34;https://github.com/apache/airflow/pull/6230&#34;&gt;https://github.com/apache/airflow/pull/6230&lt;/a&gt; (&lt;a href=&#34;https://github.com/apache/airflow/commit/7aa0f472b57985a952a3e3d0a38f1b2535d93413&#34;&gt;Backport commit&lt;/a&gt;)&lt;/p&gt;
&lt;h2 id=&#34;updating-guide&#34;&gt;Updating Guide&lt;/h2&gt;
&lt;p&gt;If you are updating Apache Airflow from a previous version to &lt;code&gt;1.10.12&lt;/code&gt;, please take note of the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Run &lt;code&gt;airflow upgradedb&lt;/code&gt; after &lt;code&gt;pip install -U apache-airflow==1.10.12&lt;/code&gt; as &lt;code&gt;1.10.12&lt;/code&gt; contains 1 database migration.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;As of Airflow 1.10.12, using the &lt;code&gt;airflow.contrib.kubernetes.Pod&lt;/code&gt; class in the &lt;code&gt;pod_mutation_hook&lt;/code&gt; is
deprecated. Instead we recommend that users treat the pod parameter as a &lt;code&gt;kubernetes.client.models.V1Pod&lt;/code&gt; object.
This means that users now have access to the full Kubernetes API when mutating Airflow pods.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Previously, when tasks skipped by SkipMixin (such as &lt;code&gt;BranchPythonOperator&lt;/code&gt;, &lt;code&gt;BaseBranchOperator&lt;/code&gt; and
&lt;code&gt;ShortCircuitOperator&lt;/code&gt;) were cleared, they executed. Since 1.10.12, when such skipped tasks are cleared,
they are skipped again by the newly introduced &lt;code&gt;NotPreviouslySkippedDep&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;special-note&#34;&gt;Special Note&lt;/h2&gt;
&lt;h3 id=&#34;python-2&#34;&gt;Python 2&lt;/h3&gt;
&lt;p&gt;Python 2 reached its end of life in January 2020. Airflow Master no longer supports Python 2.
Airflow 1.10.* will be the last series to support Python 2.&lt;/p&gt;
&lt;p&gt;We strongly recommend that users use Python &amp;gt;= 3.6.&lt;/p&gt;
&lt;h3 id=&#34;use-airflow-rbac-ui&#34;&gt;Use Airflow RBAC UI&lt;/h3&gt;
&lt;p&gt;Airflow 1.10.12 ships with two UIs: the default non-RBAC Flask-admin based UI and the Flask-AppBuilder based UI.&lt;/p&gt;
&lt;p&gt;The Flask-AppBuilder (FAB) based UI allows Role-based Access Control and has more advanced features compared to
the legacy Flask-admin based UI. This UI can be enabled by setting &lt;code&gt;rbac=True&lt;/code&gt; in &lt;code&gt;[webserver]&lt;/code&gt; section in
your &lt;code&gt;airflow.cfg&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The Flask-admin based UI is deprecated and new features won&amp;rsquo;t be ported to it. This UI will still be the default
for the 1.10.* series but will no longer be available from Airflow 2.0.&lt;/p&gt;
&lt;h3 id=&#34;we-have-moved-to-github-issues&#34;&gt;We have moved to GitHub Issues&lt;/h3&gt;
&lt;p&gt;The Airflow Project has moved from &lt;a href=&#34;https://issues.apache.org/jira/projects/AIRFLOW/issues&#34;&gt;JIRA&lt;/a&gt; to
&lt;a href=&#34;https://github.com/apache/airflow/issues&#34;&gt;GitHub&lt;/a&gt; for tracking issues.&lt;/p&gt;
&lt;p&gt;So if you find any bugs in Airflow 1.10.12, please create a GitHub Issue for them.&lt;/p&gt;
&lt;h2 id=&#34;list-of-contributors&#34;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;According to git shortlog, the following people contributed to the 1.10.12 release. Thank you to all contributors!&lt;/p&gt;
&lt;p&gt;Alexander Sutcliffe, Andy, Aneesh Joseph, Ash Berlin-Taylor, Aviral Agrawal, BaoshanGu, Beni Ben zikry,
Daniel Imberman, Daniel Standish, Danylo Baibak, Ephraim Anierobi, Felix Uellendall, Greg Neiheisel,
Hartorn, Jacob Ferriero, Jannik F, Jarek Potiuk, Jinhui Zhang, Kamil Breguła, Kaxil Naik, Kurganov,
Luis Magana, Max Arrich, Pete DeJoy, Sumit Maheshwari, Tomek Urbaszek, Vicken Simonian, Vinnie Guimaraes,
William Tran, Xiaodong Deng, YI FU, Zikun Zhu, dewaldabrie, pulsar314, retornam, yuqian90&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Apache Airflow For Newcomers</title>
<link>/blog/apache-airflow-for-newcomers/</link>
<pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate>
<guid>/blog/apache-airflow-for-newcomers/</guid>
<description>
&lt;p&gt;Apache Airflow is a platform to programmatically author, schedule, and monitor workflows.
A workflow is a sequence of tasks that processes a set of data. You can think of a workflow as the
path that describes how tasks go from being undone to done. Scheduling, on the other hand, is the
process of planning, controlling, and optimizing when a particular task should be done.&lt;/p&gt;
&lt;h3 id=&#34;authoring-workflow-in-apache-airflow&#34;&gt;Authoring Workflow in Apache Airflow&lt;/h3&gt;
&lt;p&gt;Airflow makes it easy to author workflows using Python scripts. A &lt;a href=&#34;https://en.wikipedia.org/wiki/Directed_acyclic_graph&#34;&gt;Directed Acyclic Graph&lt;/a&gt;
(DAG) represents a workflow in Airflow. It is a collection of tasks organized in a way that shows each task&amp;rsquo;s
relationships and dependencies. You can have as many DAGs as you want, and Airflow will execute
them according to the tasks&amp;rsquo; relationships and dependencies. If task B depends on the successful
execution of another task A, Airflow will run task A first and only run task B after task A succeeds.
This dependency is very easy to express in Airflow. For example, the above scenario is expressed as:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;n&#34;&gt;task_A&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;task_B&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is equivalent to:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;n&#34;&gt;task_A&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;set_downstream&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;task_B&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;img src=&#34;Simple_dag.png&#34; alt=&#34;Simple Dag&#34;&gt;&lt;/p&gt;
&lt;p&gt;This tells Airflow that it needs to execute task A before task B. Tasks can have far more complex
relationships to each other than expressed above, and Airflow figures out how and when to execute the tasks following
their relationships and dependencies.
&lt;img src=&#34;semicomplex.png&#34; alt=&#34;Complex Dag&#34;&gt;&lt;/p&gt;
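&lt;p&gt;To see why &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; and &lt;code&gt;set_downstream&lt;/code&gt; are interchangeable, here is a toy sketch of how an operator overload can delegate to a method. &lt;code&gt;ToyTask&lt;/code&gt; is a hypothetical stand-in written for this post, not Airflow&amp;rsquo;s real operator class.&lt;/p&gt;

```python
class ToyTask:
    """Toy stand-in for an Airflow task; not the real BaseOperator."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def set_downstream(self, other):
        self.downstream.append(other)

    def __rshift__(self, other):
        # task_A >> task_B delegates to set_downstream and returns the
        # right-hand task, so that chains like a >> b >> c work.
        self.set_downstream(other)
        return other


task_A, task_B, task_C = ToyTask("A"), ToyTask("B"), ToyTask("C")
task_A >> task_B >> task_C
print([t.task_id for t in task_A.downstream], [t.task_id for t in task_B.downstream])
# ['B'] ['C']
```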
&lt;p&gt;Before we discuss the architecture of Airflow that makes scheduling, executing, and monitoring
workflows easy, let us discuss the &lt;a href=&#34;https://github.com/apache/airflow/blob/master/BREEZE.rst&#34;&gt;Breeze environment&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;breeze-environment&#34;&gt;Breeze Environment&lt;/h3&gt;
&lt;p&gt;The Breeze environment is the development environment for Airflow where you can run tests, build images,
build documentation and do many other things. There is excellent
&lt;a href=&#34;https://github.com/apache/airflow/blob/master/BREEZE.rst&#34;&gt;documentation and a video&lt;/a&gt; on the Breeze environment.
Please check them out. You enter the Breeze environment by running the &lt;code&gt;./breeze&lt;/code&gt; script. You can run all
the commands mentioned here in the Breeze environment.&lt;/p&gt;
&lt;h3 id=&#34;scheduler&#34;&gt;Scheduler&lt;/h3&gt;
&lt;p&gt;The scheduler is the component that monitors DAGs and triggers those tasks whose dependencies have
been met. It watches over the DAG folder, checking the tasks in each DAG, and triggers them once they
are ready. It accomplishes this by spawning a process that runs periodically (every minute or so),
reads the metadata database to check the status of each task and decides what needs to be done.
The metadata database is where the status of every task is recorded. The status can be one of running,
success, failed, etc.&lt;/p&gt;
&lt;p&gt;A task is said to be ready when its dependencies have been met. The dependencies include all the data
necessary for the task to be executed. It should be noted that the scheduler won&amp;rsquo;t trigger your tasks until
the period they cover has ended. If a DAG&amp;rsquo;s &lt;code&gt;schedule_interval&lt;/code&gt; is &lt;code&gt;@daily&lt;/code&gt;, the scheduler triggers its tasks
at the end of the day and not at the beginning. This is to ensure that the data needed for the tasks
is ready. It is also possible to trigger tasks manually in the UI.&lt;/p&gt;
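&lt;p&gt;The timing rule for the &lt;code&gt;@daily&lt;/code&gt; case can be written down as simple date arithmetic. This is only an illustration of the rule described above, not the scheduler&amp;rsquo;s actual code; &lt;code&gt;earliest_trigger_time&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def earliest_trigger_time(period_start, interval):
    """A run covering [period_start, period_start + interval) starts once the period is over."""
    return period_start + interval


# The daily run that covers 17 August is triggered once 17 August has ended.
period_start = datetime(2020, 8, 17)
print(earliest_trigger_time(period_start, timedelta(days=1)))
# 2020-08-18 00:00:00
```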
&lt;p&gt;In the &lt;a href=&#34;https://github.com/apache/airflow/blob/master/BREEZE.rst&#34;&gt;Breeze environment&lt;/a&gt;, the scheduler is started by running the command &lt;code&gt;airflow scheduler&lt;/code&gt;. It uses
the configured production environment. The configuration can be specified in &lt;code&gt;airflow.cfg&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;executor&#34;&gt;Executor&lt;/h3&gt;
&lt;p&gt;Executors are responsible for running tasks. They work with the scheduler to get information about
what resources are needed to run a task as the task is queued.&lt;/p&gt;
&lt;p&gt;By default, Airflow uses the &lt;a href=&#34;https://airflow.apache.org/docs/stable/executor/sequential.html#sequential-executor&#34;&gt;SequentialExecutor&lt;/a&gt;.
However, this executor is limited and it is the only executor that can be used with SQLite.&lt;/p&gt;
&lt;p&gt;There are many other &lt;a href=&#34;https://airflow.apache.org/docs/stable/executor/index.html&#34;&gt;executors&lt;/a&gt;;
they differ in the resources they have and how they choose to use them. The available executors
are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sequential Executor&lt;/li&gt;
&lt;li&gt;Debug Executor&lt;/li&gt;
&lt;li&gt;Local Executor&lt;/li&gt;
&lt;li&gt;Dask Executor&lt;/li&gt;
&lt;li&gt;Celery Executor&lt;/li&gt;
&lt;li&gt;Kubernetes Executor&lt;/li&gt;
&lt;li&gt;Scaling Out with Mesos (community contributed)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The CeleryExecutor scales better than the SequentialExecutor. It uses several
workers to execute a job in a distributed way. If a worker node is ever down, the CeleryExecutor assigns its
tasks to another worker node. This ensures high availability.&lt;/p&gt;
&lt;p&gt;The CeleryExecutor works closely with the scheduler, which adds a message to the queue, and the Celery broker,
which delivers the message to a Celery worker to execute.
You can find more information about the CeleryExecutor and how to configure it in the
&lt;a href=&#34;https://airflow.apache.org/docs/stable/executor/celery.html#celery-executor&#34;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;webserver&#34;&gt;Webserver&lt;/h3&gt;
&lt;p&gt;The webserver is the web interface (UI) for Airflow. The UI is feature-rich. It makes it easy to
monitor and troubleshoot DAGs and Tasks.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;airflow-ui.png&#34; alt=&#34;airflow UI&#34;&gt;&lt;/p&gt;
&lt;p&gt;There are many actions you can perform in the UI. You can trigger a task and monitor its execution,
including the duration of the task. The UI makes it possible to view the task&amp;rsquo;s dependencies in a
tree view and graph view. You can also view task logs in the UI.&lt;/p&gt;
&lt;p&gt;The web UI is started with the command &lt;code&gt;airflow webserver&lt;/code&gt; in the Breeze environment.&lt;/p&gt;
&lt;h3 id=&#34;backend&#34;&gt;Backend&lt;/h3&gt;
&lt;p&gt;By default, Airflow uses the SQLite backend for storing the configuration information, DAG states,
and other useful information. This should not be used in production as SQLite can cause data
loss.&lt;/p&gt;
&lt;p&gt;You can use PostgreSQL or MySQL as a backend for Airflow. It is easy to change to PostgreSQL or MySQL.&lt;/p&gt;
&lt;p&gt;The command &lt;code&gt;./breeze --backend mysql&lt;/code&gt; selects MySQL as the backend when starting the Breeze environment.&lt;/p&gt;
&lt;h3 id=&#34;operators&#34;&gt;Operators&lt;/h3&gt;
&lt;p&gt;Operators determine what gets done by a task. Airflow has a lot of built-in operators. Each operator
does a specific task. There&amp;rsquo;s the BashOperator, which executes a bash command, the PythonOperator, which
calls a Python function, the AwsBatchOperator, which executes a job on AWS Batch, and &lt;a href=&#34;https://airflow.apache.org/docs/stable/concepts.html#operators&#34;&gt;many more&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&#34;sensors&#34;&gt;Sensors&lt;/h4&gt;
&lt;p&gt;Sensors can be described as special operators that are used to monitor a long-running task.
Just like Operators, there are many predefined sensors in Airflow. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AthenaSensor: Asks for the state of the Query until it reaches a failure state or success state.&lt;/li&gt;
&lt;li&gt;AzureCosmosDocumentSensor: Checks for the existence of a document which matches the given query in CosmosDB&lt;/li&gt;
&lt;li&gt;GoogleCloudStorageObjectSensor: Checks for the existence of a file in Google Cloud Storage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A list of most of the available sensors can be found in this &lt;a href=&#34;https://airflow.apache.org/docs/stable/_api/airflow/contrib/sensors/index.html?highlight=sensors#module-airflow.contrib.sensors&#34;&gt;module&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;contributing-to-airflow&#34;&gt;Contributing to Airflow&lt;/h3&gt;
&lt;p&gt;Airflow is an open source project, and everyone is welcome to contribute. It is easy to get started thanks
to the excellent &lt;a href=&#34;https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst&#34;&gt;documentation on how to get started&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I joined the community about 12 weeks ago through the &lt;a href=&#34;https://www.outreachy.org/&#34;&gt;Outreachy Program&lt;/a&gt; and have
completed about &lt;a href=&#34;https://github.com/apache/airflow/pulls/ephraimbuddy&#34;&gt;40 PRs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It has been an amazing experience! Thanks to my mentors &lt;a href=&#34;https://github.com/potiuk&#34;&gt;Jarek&lt;/a&gt; and
&lt;a href=&#34;https://github.com/kaxil&#34;&gt;Kaxil&lt;/a&gt;, and the community members especially &lt;a href=&#34;https://github.com/mik-laj&#34;&gt;Kamil&lt;/a&gt;
and &lt;a href=&#34;https://github.com/turbaszek&#34;&gt;Tomek&lt;/a&gt; for all their support. I&amp;rsquo;m grateful!&lt;/p&gt;
&lt;p&gt;Thank you so much, &lt;a href=&#34;https://github.com/leahecole&#34;&gt;Leah E. Cole&lt;/a&gt;, for your wonderful reviews.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Implementing Stable API for Apache Airflow</title>
<link>/blog/implementing-stable-api-for-apache-airflow/</link>
<pubDate>Sun, 19 Jul 2020 00:00:00 +0000</pubDate>
<guid>/blog/implementing-stable-api-for-apache-airflow/</guid>
<description>
&lt;p&gt;My &lt;a href=&#34;https://outreachy.org&#34;&gt;Outreachy internship&lt;/a&gt; is coming to an end, which makes this a good time to look back and
reflect on the progress so far.&lt;/p&gt;
&lt;p&gt;The goal of my project is to Extend and Improve the Apache Airflow REST API. In this post,
I will be sharing my progress so far.&lt;/p&gt;
&lt;p&gt;We started a bit late implementing the REST API because it took time for the OpenAPI 3.0
specification we were to use for the project to be merged. Thanks to &lt;a href=&#34;https://github.com/mik-laj&#34;&gt;Kamil&lt;/a&gt;,
who paved the way for us to start implementing the REST API endpoints. Below are the endpoints I
implemented and the challenges I encountered, including how I overcame them.&lt;/p&gt;
&lt;h3 id=&#34;implementing-the-read-only-connection-endpoints&#34;&gt;Implementing The Read-Only Connection Endpoints&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/apache/airflow/pull/9095&#34;&gt;read-only connection endpoints&lt;/a&gt; were the first endpoints I implemented. Looking back,
I can see how much I have improved.&lt;/p&gt;
&lt;p&gt;I started by implementing the database schema for the Connection table using &lt;a href=&#34;https://marshmallow.readthedocs.io/en/2.x-line/&#34;&gt;Marshmallow 2&lt;/a&gt;.
We had to use Marshmallow 2 because Flask-AppBuilder was still using it and Flask-AppBuilder
is deeply integrated into Apache Airflow. This meant I had to unlearn the Marshmallow 3 that I had
been studying before this realization, but thankfully, &lt;a href=&#34;https://marshmallow.readthedocs.io/en/stable/index.html&#34;&gt;Marshmallow 3&lt;/a&gt; isn&amp;rsquo;t too
different, so I was able to start using Marshmallow 2 in no time.&lt;/p&gt;
&lt;p&gt;This first PR would have been much more difficult had there been no reference
endpoint to look at. &lt;a href=&#34;https://github.com/mik-laj&#34;&gt;Kamil&lt;/a&gt; implemented a &lt;a href=&#34;https://github.com/apache/airflow/pull/9045&#34;&gt;draft PR&lt;/a&gt; from which I took inspiration.
Thanks to this, it was easy for me to write the unit tests. It was also in this endpoint that
I learned to use &lt;a href=&#34;https://github.com/wolever/parameterized&#34;&gt;parameterized&lt;/a&gt; in unit tests :D.&lt;/p&gt;
&lt;h3 id=&#34;implementing-the-read-only-dagruns-endpoints&#34;&gt;Implementing The Read-Only DagRuns Endpoints&lt;/h3&gt;
&lt;p&gt;This &lt;a href=&#34;https://github.com/apache/airflow/pull/9153&#34;&gt;endpoint&lt;/a&gt; came with many challenges, especially around filtering with &lt;code&gt;datetimes&lt;/code&gt;.
This was because the &lt;code&gt;connexion&lt;/code&gt; library we were using to build the REST API was not validating
the date-time format in the OpenAPI 3.0 specification, which, I eventually found out, was intentional.
Connexion dropped &lt;code&gt;strict-rfc3339&lt;/code&gt; because the latter&amp;rsquo;s license is not compatible with
the Apache 2.0 license.&lt;/p&gt;
&lt;p&gt;I implemented a workaround for this by defining a function called &lt;code&gt;conn_parse_datetime&lt;/code&gt; in the
API utils module. This was later refactored, and thankfully, &lt;a href=&#34;https://github.com/mik-laj&#34;&gt;Kamil&lt;/a&gt;
implemented a decorator that allowed us to have cleaner code in the views while using this function.&lt;/p&gt;
&lt;p&gt;Then we tried using &lt;code&gt;rfc3339-validator&lt;/code&gt;, whose license is compatible with the Apache 2.0 license, but
later discarded it because our custom date parser let us handle durations and
not just date-times.&lt;/p&gt;
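&lt;p&gt;To illustrate the kind of check such a workaround has to perform, here is a minimal sketch using only the Python standard library. It is not the actual &lt;code&gt;conn_parse_datetime&lt;/code&gt; implementation: the function name &lt;code&gt;parse_rfc3339&lt;/code&gt;, the error messages, and the choice to normalize to UTC are all assumptions made for this example.&lt;/p&gt;

```python
from datetime import datetime, timezone


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 style date-time string into an aware UTC datetime.

    Hypothetical helper for illustration only; raises ValueError on bad input.
    """
    normalized = value.strip()
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    if normalized.endswith(("Z", "z")):
        normalized = normalized[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(normalized)
    except ValueError as err:
        raise ValueError(f"Invalid date-time: {value!r}") from err
    # A date-time filter without an offset is ambiguous, so reject it.
    if parsed.tzinfo is None:
        raise ValueError(f"Missing UTC offset in date-time: {value!r}")
    return parsed.astimezone(timezone.utc)
```

&lt;p&gt;A view could call a helper like this on each date-time query parameter and turn the &lt;code&gt;ValueError&lt;/code&gt; into a 400 response, which is roughly the validation step the library did not perform for us.&lt;/p&gt;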
&lt;h3 id=&#34;other-endpoints&#34;&gt;Other Endpoints&lt;/h3&gt;
&lt;p&gt;I implemented several other endpoints. One peculiar issue I faced was that Marshmallow 2
does not raise an error when extra fields are in the request body. I implemented a &lt;code&gt;validate_unknown&lt;/code&gt;
method on the schema to handle this. Thankfully, Flask-AppBuilder moved to Marshmallow 3, so
we quickly updated Flask-AppBuilder in Apache Airflow and started using Marshmallow 3 too.&lt;/p&gt;
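&lt;p&gt;The idea behind rejecting extra fields can be sketched in plain Python. This is not the Marshmallow hook from the PR: the helper &lt;code&gt;find_unknown_fields&lt;/code&gt;, the signature of &lt;code&gt;validate_unknown&lt;/code&gt; shown here, and the error message are assumptions chosen for illustration.&lt;/p&gt;

```python
from typing import Any, Dict, Iterable, List


def find_unknown_fields(payload: Dict[str, Any], allowed: Iterable[str]) -> List[str]:
    """Return the keys of the request body that the schema does not declare."""
    allowed_set = set(allowed)
    return sorted(key for key in payload if key not in allowed_set)


def validate_unknown(payload: Dict[str, Any], allowed: Iterable[str]) -> None:
    """Raise ValueError if the request body carries undeclared fields.

    Hypothetical stand-in for a schema-level validation method.
    """
    unknown = find_unknown_fields(payload, allowed)
    if unknown:
        raise ValueError("Extra arguments passed: " + ", ".join(unknown))
```

&lt;p&gt;With a check like this, a request body containing a misspelled field fails loudly instead of being silently accepted with the field dropped.&lt;/p&gt;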
&lt;p&gt;Here are some PRs I contributed that are related to the REST API:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9227&#34;&gt;Add event log endpoints&lt;/a&gt;
The event log helps users get information on operations performed in the UI.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9266&#34;&gt;Add CRUD endpoints for connection&lt;/a&gt;
This PR performs DELETE, PATCH and POST operations on &lt;code&gt;Connection&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9331&#34;&gt;Add log endpoint&lt;/a&gt;
This PR enables users to get Task Instance log entries.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9431&#34;&gt;Move limit &amp;amp; offset to kwargs in views plus work on a configurable maximum limit&lt;/a&gt;
This helped us keep the code in the views neat and added a configurable maximum limit on query results.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9648&#34;&gt;Update FlaskAppBuilder to v3&lt;/a&gt;
This enabled Airflow to start using v3 of Flask-AppBuilder and also made it possible for the API to use
a modern database serializer/deserializer.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/9771&#34;&gt;Add migration guide from the experimental REST API to the stable REST API&lt;/a&gt;
This would enable users to start using the stable REST API in less time.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;follow-ups&#34;&gt;Follow-Ups&lt;/h3&gt;
&lt;p&gt;There is still a lot of work to be done on the REST API, including writing helpful documentation.
I am still following up on these and hopefully, we will complete the REST API before the internship ends.&lt;/p&gt;
&lt;p&gt;I am very grateful to my mentors, &lt;a href=&#34;https://github.com/potiuk&#34;&gt;Jarek&lt;/a&gt; and &lt;a href=&#34;https://github.com/kaxil&#34;&gt;Kaxil&lt;/a&gt;, for their
patience with me and for surviving my never-ending questions. &lt;a href=&#34;https://github.com/mik-laj&#34;&gt;Kamil&lt;/a&gt; and &lt;a href=&#34;https://github.com/turbaszek&#34;&gt;Tomek&lt;/a&gt;
have been very supportive, and I appreciate their help and amazing code reviews.&lt;/p&gt;
&lt;p&gt;Thanks to &lt;a href=&#34;https://github.com/leahecole&#34;&gt;Leah E. Cole&lt;/a&gt; and &lt;a href=&#34;https://github.com/mschickensoup&#34;&gt;Karolina Rosół&lt;/a&gt; for their
wonderful reviews. I&amp;rsquo;m grateful.&lt;/p&gt;
&lt;p&gt;Thanks for reading!&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Apache Airflow 1.10.10</title>
<link>/blog/airflow-1.10.10/</link>
<pubDate>Thu, 09 Apr 2020 00:00:00 +0000</pubDate>
<guid>/blog/airflow-1.10.10/</guid>
<description>
&lt;p&gt;Airflow 1.10.10 contains 199 commits since 1.10.9 and includes 11 new features, 43 improvements, 44 bug fixes, and several doc changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href=&#34;https://pypi.org/project/apache-airflow/1.10.10/&#34;&gt;https://pypi.org/project/apache-airflow/1.10.10/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href=&#34;https://airflow.apache.org/docs/1.10.10/&#34;&gt;https://airflow.apache.org/docs/1.10.10/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Changelog&lt;/strong&gt;: &lt;a href=&#34;http://airflow.apache.org/docs/1.10.10/changelog.html&#34;&gt;http://airflow.apache.org/docs/1.10.10/changelog.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some of the noteworthy new features (user-facing) are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/8046&#34;&gt;Allow user to choose timezone to use in the RBAC UI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/7832&#34;&gt;Add Production Docker image support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html&#34;&gt;Allow Retrieving Airflow Connections &amp;amp; Variables from various Secrets backend&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://airflow.apache.org/docs/1.10.10/dag-serialization.html&#34;&gt;Stateless Webserver using DAG Serialization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/7880&#34;&gt;Tasks with Dummy Operators are no longer sent to executor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/7312&#34;&gt;Allow passing DagRun conf when triggering dags via UI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;allow-user-to-chose-timezone-to-use-in-the-rbac-ui&#34;&gt;Allow user to choose timezone to use in the RBAC UI&lt;/h3&gt;
&lt;p&gt;By default the Web UI will show times in UTC. It is possible to change the timezone shown by using the menu in the top
right (click on the clock to activate it):&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Screenshot&lt;/strong&gt;:
&lt;img src=&#34;rbac-ui-timezone.gif&#34; alt=&#34;Allow user to choose timezone to use in the RBAC UI&#34;&gt;&lt;/p&gt;
&lt;p&gt;Details: &lt;a href=&#34;https://airflow.apache.org/docs/1.10.10/timezone.html#web-ui&#34;&gt;https://airflow.apache.org/docs/1.10.10/timezone.html#web-ui&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: This feature is only available for the RBAC UI (enabled using &lt;code&gt;rbac=True&lt;/code&gt; in &lt;code&gt;[webserver]&lt;/code&gt; section in your &lt;code&gt;airflow.cfg&lt;/code&gt;).&lt;/p&gt;
&lt;h3 id=&#34;add-production-docker-image-support&#34;&gt;Add Production Docker image support&lt;/h3&gt;
&lt;p&gt;There are brand new production images (alpha quality) available for Airflow 1.10.10. You can pull them from the
&lt;a href=&#34;https://hub.docker.com/r/apache/airflow&#34;&gt;Apache Airflow Dockerhub&lt;/a&gt; repository and start using them.&lt;/p&gt;
&lt;p&gt;More information about using production images can be found in &lt;a href=&#34;https://github.com/apache/airflow/blob/master/IMAGES.rst#using-the-images&#34;&gt;https://github.com/apache/airflow/blob/master/IMAGES.rst#using-the-images&lt;/a&gt;. It will soon be updated with
information on how to use the images with the official Helm chart.&lt;/p&gt;
&lt;p&gt;To pull the images you can run one of the following commands:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;docker pull apache/airflow:1.10.10-python2.7&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;docker pull apache/airflow:1.10.10-python3.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;docker pull apache/airflow:1.10.10-python3.6&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;docker pull apache/airflow:1.10.10-python3.7&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;docker pull apache/airflow:1.10.10&lt;/code&gt; (uses Python 3.6)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;allow-retrieving-airflow-connections--variables-from-various-secrets-backend&#34;&gt;Allow Retrieving Airflow Connections &amp;amp; Variables from various Secrets backend&lt;/h3&gt;
&lt;p&gt;From Airflow 1.10.10, users can get Airflow Variables from Environment Variables.&lt;/p&gt;
&lt;p&gt;Details: &lt;a href=&#34;https://airflow.apache.org/docs/1.10.10/concepts.html#storing-variables-in-environment-variables&#34;&gt;https://airflow.apache.org/docs/1.10.10/concepts.html#storing-variables-in-environment-variables&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A new concept of Secrets Backend has been introduced to retrieve Airflow Connections and Variables.&lt;/p&gt;
&lt;p&gt;From Airflow 1.10.10, users can retrieve Connections &amp;amp; Variables using the same syntax (no DAG code change is required)
from a secrets backend defined in &lt;code&gt;airflow.cfg&lt;/code&gt;. If no backend is defined, Airflow falls back to Environment Variables
and then the Metadata DB.&lt;/p&gt;
&lt;p&gt;Check &lt;a href=&#34;https://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html#configuration&#34;&gt;https://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html#configuration&lt;/a&gt; for details on how to
configure the Secrets backend.&lt;/p&gt;
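&lt;p&gt;The lookup order described above can be pictured as a chain of sources tried one after another. The sketch below is a simplified model in plain Python, not Airflow code: &lt;code&gt;get_variable&lt;/code&gt;, &lt;code&gt;make_dict_lookup&lt;/code&gt;, and the dictionaries standing in for a custom backend and the Metadata DB are hypothetical, and the assumed order is custom backend first, then Environment Variables (using the &lt;code&gt;AIRFLOW_VAR_{KEY}&lt;/code&gt; naming), then the Metadata DB.&lt;/p&gt;

```python
import os
from typing import Callable, Dict, List, Optional

# A lookup takes a variable key and returns its value, or None if not found.
Lookup = Callable[[str], Optional[str]]


def from_environment(key: str) -> Optional[str]:
    """Look up a Variable in an env var named AIRFLOW_VAR_{KEY} (upper-cased)."""
    return os.environ.get("AIRFLOW_VAR_" + key.upper())


def make_dict_lookup(store: Dict[str, str]) -> Lookup:
    """Build a lookup backed by a dict (stand-in for a backend or the DB)."""
    return store.get


def get_variable(key: str, lookups: List[Lookup]) -> Optional[str]:
    """Return the first non-None value found, trying the lookups in order."""
    for lookup in lookups:
        value = lookup(key)
        if value is not None:
            return value
    return None
```

&lt;p&gt;Calling &lt;code&gt;get_variable&lt;/code&gt; with the list of a custom backend lookup, &lt;code&gt;from_environment&lt;/code&gt;, and a Metadata DB lookup shows why no DAG code change is needed: the caller asks for a key and does not care which source answered.&lt;/p&gt;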
&lt;p&gt;As of 1.10.10, Airflow supports the following Secret Backends:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hashicorp Vault&lt;/li&gt;
&lt;li&gt;GCP Secrets Manager&lt;/li&gt;
&lt;li&gt;AWS Parameters Store&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Details: &lt;a href=&#34;https://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html&#34;&gt;https://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Example configuration to use Hashicorp Vault as the backend:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-ini&#34; data-lang=&#34;ini&#34;&gt;&lt;span class=&#34;k&#34;&gt;[secrets]&lt;/span&gt;
&lt;span class=&#34;na&#34;&gt;backend&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;airflow.contrib.secrets.hashicorp_vault.VaultBackend&lt;/span&gt;
&lt;span class=&#34;na&#34;&gt;backend_kwargs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{&amp;#34;url&amp;#34;: &amp;#34;http://127.0.0.1:8200&amp;#34;, &amp;#34;connections_path&amp;#34;: &amp;#34;connections&amp;#34;, &amp;#34;variables_path&amp;#34;: &amp;#34;variables&amp;#34;, &amp;#34;mount_point&amp;#34;: &amp;#34;airflow&amp;#34;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;stateless-webserver-using-dag-serialization&#34;&gt;Stateless Webserver using DAG Serialization&lt;/h3&gt;
&lt;p&gt;The Webserver can now run without access to DAG Files when DAG Serialization is turned on.
The 2 limitations we had in 1.10.7-1.10.9 (
&lt;a href=&#34;https://airflow.apache.org/docs/1.10.7/dag-serialization.html#limitations&#34;&gt;https://airflow.apache.org/docs/1.10.7/dag-serialization.html#limitations&lt;/a&gt;)
have been resolved.&lt;/p&gt;
&lt;p&gt;The main advantage of this is a reduction in Webserver startup time when there is a large number of DAGs.
Without DAG Serialization, all the DAGs are loaded in the DagBag during
Webserver startup.&lt;/p&gt;
&lt;p&gt;With DAG Serialization, an empty DagBag is created and
DAGs are loaded from the DB only when needed (i.e. when a particular DAG is
clicked on in the home page).&lt;/p&gt;
&lt;p&gt;Details: &lt;a href=&#34;http://airflow.apache.org/docs/1.10.10/dag-serialization.html&#34;&gt;http://airflow.apache.org/docs/1.10.10/dag-serialization.html&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;tasks-using-dummy-operators-are-no-longer-sent-to-executor&#34;&gt;Tasks using Dummy Operators are no longer sent to executor&lt;/h3&gt;
&lt;p&gt;Dummy Operators do not actually do any work and are mostly used for organizing/grouping tasks along
with BranchPythonOperator.&lt;/p&gt;
&lt;p&gt;Previously, when using the Kubernetes Executor, the executor would spin up a whole worker pod to execute a dummy task.
With Airflow 1.10.10, tasks using Dummy Operators are scheduled &amp;amp; evaluated by the Scheduler but not sent to the
Executor. This should significantly improve execution time and resource usage.&lt;/p&gt;
&lt;h3 id=&#34;allow-passing-dagrun-conf-when-triggering-dags-via-ui&#34;&gt;Allow passing DagRun conf when triggering dags via UI&lt;/h3&gt;
&lt;p&gt;When triggering a DAG from the CLI or the REST API, it is possible to pass configuration for the DAG run as a JSON blob.&lt;/p&gt;
&lt;p&gt;From Airflow 1.10.10, when a user clicks on the Trigger Dag button, a new screen is shown that confirms the trigger request and allows the user to pass a JSON configuration
blob.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Screenshot&lt;/strong&gt;:
&lt;img src=&#34;trigger-dag-conf.png&#34; alt=&#34;Allow passing DagRun conf when triggering dags via UI&#34;&gt;&lt;/p&gt;
&lt;p&gt;Details: &lt;a href=&#34;https://github.com/apache/airflow/pull/7312&#34;&gt;https://github.com/apache/airflow/pull/7312&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;updating-guide&#34;&gt;Updating Guide&lt;/h2&gt;
&lt;p&gt;If you are updating Apache Airflow from a previous version to &lt;code&gt;1.10.10&lt;/code&gt;, please take note of the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Run &lt;code&gt;airflow upgradedb&lt;/code&gt; after &lt;code&gt;pip install -U apache-airflow==1.10.10&lt;/code&gt; as &lt;code&gt;1.10.10&lt;/code&gt; contains 3 database migrations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you have used the &lt;code&gt;none_failed&lt;/code&gt; trigger rule in your DAG, change it to the new &lt;code&gt;none_failed_or_skipped&lt;/code&gt; trigger rule to keep the previous behavior.
As previously implemented, the &lt;code&gt;none_failed&lt;/code&gt; trigger rule skipped the current task if all parents of the task
had also been skipped. This was not in line with what was documented for that trigger rule. We have changed the implementation to match
the documentation, so if you need the old behavior, use &lt;code&gt;none_failed_or_skipped&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;More details in &lt;a href=&#34;https://github.com/apache/airflow/pull/7464&#34;&gt;https://github.com/apache/airflow/pull/7464&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting an Airflow Variable to an empty string now returns an empty string; it previously returned &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; from airflow.models import Variable
&amp;gt;&amp;gt;&amp;gt; Variable.set(&#39;test_key&#39;, &#39;&#39;)
&amp;gt;&amp;gt;&amp;gt; Variable.get(&#39;test_key&#39;)
&#39;&#39;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The above code previously returned &lt;code&gt;None&lt;/code&gt;; it now returns &amp;lsquo;&amp;rsquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When a task is marked as &lt;code&gt;success&lt;/code&gt; by a user in the Airflow UI, the function defined in &lt;code&gt;on_success_callback&lt;/code&gt; will be called.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
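&lt;p&gt;The difference between the two trigger rules mentioned in the list above can be modeled in a few lines. This is a simplified sketch and not the scheduler&amp;rsquo;s implementation: the state strings, the &lt;code&gt;"run"&lt;/code&gt;, &lt;code&gt;"skip"&lt;/code&gt; and &lt;code&gt;"blocked"&lt;/code&gt; return values, and the function signatures are assumptions made for illustration.&lt;/p&gt;

```python
from typing import Iterable

# Parent states that count as a failure for both rules (assumed names).
FAILED_STATES = {"failed", "upstream_failed"}


def none_failed(parent_states: Iterable[str]) -> str:
    """New 1.10.10 behavior: run as long as no parent failed.

    Parents that were skipped do not prevent the task from running.
    """
    states = list(parent_states)
    if FAILED_STATES.intersection(states):
        return "blocked"
    return "run"


def none_failed_or_skipped(parent_states: Iterable[str]) -> str:
    """Old none_failed behavior: additionally skip when every parent was skipped."""
    states = list(parent_states)
    if FAILED_STATES.intersection(states):
        return "blocked"
    if states and all(state == "skipped" for state in states):
        return "skip"
    return "run"
```

&lt;p&gt;The two functions only disagree when every parent was skipped, which is exactly the case to check in your DAGs before upgrading.&lt;/p&gt;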
&lt;h2 id=&#34;special-note--deprecations&#34;&gt;Special Note / Deprecations&lt;/h2&gt;
&lt;h3 id=&#34;python-2&#34;&gt;Python 2&lt;/h3&gt;
&lt;p&gt;Python 2 reached its end of life in January 2020. Airflow Master no longer supports Python 2.
Airflow 1.10.* will be the last series to support Python 2.&lt;/p&gt;
&lt;p&gt;We strongly recommend that users use Python &amp;gt;= 3.6.&lt;/p&gt;
&lt;h3 id=&#34;use-airflow-rbac-ui&#34;&gt;Use Airflow RBAC UI&lt;/h3&gt;
&lt;p&gt;Airflow 1.10.10 ships with 2 UIs: the default, non-RBAC Flask-admin based UI and the Flask-AppBuilder based UI.&lt;/p&gt;
&lt;p&gt;The Flask-AppBuilder (FAB) based UI allows Role-based Access Control and has more advanced features compared to
the legacy Flask-admin based UI. This UI can be enabled by setting &lt;code&gt;rbac=True&lt;/code&gt; in the &lt;code&gt;[webserver]&lt;/code&gt; section of your &lt;code&gt;airflow.cfg&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The Flask-admin based UI is deprecated and new features won&amp;rsquo;t be ported to it. This UI will still be the default
for the 1.10.* series but will no longer be available from Airflow 2.0.&lt;/p&gt;
&lt;h3 id=&#34;running-airflow-on-macos&#34;&gt;Running Airflow on MacOS&lt;/h3&gt;
&lt;p&gt;If you are running Airflow on MacOS and get the following error in the Scheduler logs,
run &lt;code&gt;export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES&lt;/code&gt; in your scheduler environment:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;objc[1873]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[1873]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This error occurs because of security measures added in Mac OS High Sierra and above that restrict multiprocessing &amp;amp; multithreading.&lt;/p&gt;
&lt;h3 id=&#34;we-have-moved-to-github-issues&#34;&gt;We have moved to GitHub Issues&lt;/h3&gt;
&lt;p&gt;The Airflow Project has moved from &lt;a href=&#34;https://issues.apache.org/jira/projects/AIRFLOW/issues&#34;&gt;JIRA&lt;/a&gt; to
&lt;a href=&#34;https://github.com/apache/airflow/issues&#34;&gt;GitHub&lt;/a&gt; for tracking issues.&lt;/p&gt;
&lt;p&gt;So if you find any bugs in Airflow 1.10.10 please create a GitHub Issue for it.&lt;/p&gt;
&lt;h2 id=&#34;list-of-contributors&#34;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;According to git shortlog, the following people contributed to the 1.10.10 release. Thank you to all contributors!&lt;/p&gt;
&lt;p&gt;ANiteckiP, Alex Guziel, Alex Lue, Anita Fronczak, Ash Berlin-Taylor, Benji Visser, Bhavika Tekwani, Brad Dettmer, Chris McLennon, Cooper Gillan, Daniel Imberman, Daniel Standish, Felix Uellendall, Jarek Potiuk, Jiajie Zhong, Jithin Sukumar, Kamil Breguła, Kaxil Naik, Kengo Seki, Kris, Kumpan Anton, Lokesh Lal, Louis Guitton, Louis Simoneau, Luyao Yang, Noël Bardelot, Omair Khan, Philipp Großelfinger, Ping Zhang, RasPavel, Ray, Robin Edwards, Ry Walker, Saurabh, Sebastian Brandt, Tomek Kzukowski, Tomek Urbaszek, Van-Duyet Le, Xiaodong Deng, Xinbin Huang, Yu Qian, Zacharya, atrbgithub, cong-zhu, retornam&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Apache Airflow 1.10.8 &amp; 1.10.9</title>
<link>/blog/airflow-1.10.8-1.10.9/</link>
<pubDate>Sun, 23 Feb 2020 00:00:00 +0000</pubDate>
<guid>/blog/airflow-1.10.8-1.10.9/</guid>
<description>
&lt;p&gt;Airflow 1.10.8 contains 160 commits since 1.10.7 and includes 4 new features, 42 improvements, 36 bug fixes, and several doc changes.&lt;/p&gt;
&lt;p&gt;We released 1.10.9 on the same day because one of the Flask dependencies (Werkzeug) released 1.0, which broke Airflow 1.10.8.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Details&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href=&#34;https://pypi.org/project/apache-airflow/1.10.9/&#34;&gt;https://pypi.org/project/apache-airflow/1.10.9/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href=&#34;https://airflow.apache.org/docs/1.10.9/&#34;&gt;https://airflow.apache.org/docs/1.10.9/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Changelog (1.10.8)&lt;/strong&gt;: &lt;a href=&#34;http://airflow.apache.org/docs/1.10.8/changelog.html#airflow-1-10-8-2020-01-07&#34;&gt;http://airflow.apache.org/docs/1.10.8/changelog.html#airflow-1-10-8-2020-01-07&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Changelog (1.10.9)&lt;/strong&gt;: &lt;a href=&#34;http://airflow.apache.org/docs/1.10.9/changelog.html#airflow-1-10-9-2020-02-10&#34;&gt;http://airflow.apache.org/docs/1.10.9/changelog.html#airflow-1-10-9-2020-02-10&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some of the noteworthy new features (user-facing) are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/6489&#34;&gt;Add tags to DAGs and use it for filtering in the UI (RBAC only)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://airflow.apache.org/docs/1.10.9/executor/debug.html&#34;&gt;New Executor: DebugExecutor for Local debugging from your IDE&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/7281&#34;&gt;Allow passing conf in &amp;ldquo;Add DAG Run&amp;rdquo; (Triggered Dags) view&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/pull/7038&#34;&gt;Allow dags to run for future execution dates for manually triggered DAGs (only if &lt;code&gt;schedule_interval=None&lt;/code&gt;)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://airflow.apache.org/docs/1.10.9/configurations-ref.html&#34;&gt;Dedicated page in documentation for all configs in airflow.cfg&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;add-tags-to-dags-and-use-it-for-filtering-in-the-ui&#34;&gt;Add tags to DAGs and use it for filtering in the UI&lt;/h3&gt;
&lt;p&gt;To filter DAGs (e.g. by team), you can add tags to each DAG. The filter is saved in a cookie and can be reset with the reset button.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;p&gt;In your DAG file, pass a list of the tags you want to add to the DAG object:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;n&#34;&gt;dag&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;DAG&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;dag_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;example_dag_tag&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;schedule_interval&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0 0 * * *&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;tags&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;example&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Screenshot&lt;/strong&gt;:
&lt;img src=&#34;airflow-dag-tags.png&#34; alt=&#34;Add filter by DAG tags&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: This feature is only available for the RBAC UI (enabled using &lt;code&gt;rbac=True&lt;/code&gt; in &lt;code&gt;[webserver]&lt;/code&gt; section in your &lt;code&gt;airflow.cfg&lt;/code&gt;).&lt;/p&gt;
&lt;h2 id=&#34;special-note--deprecations&#34;&gt;Special Note / Deprecations&lt;/h2&gt;
&lt;h3 id=&#34;python-2&#34;&gt;Python 2&lt;/h3&gt;
&lt;p&gt;Python 2 reached its end of life in January 2020. Airflow Master no longer supports Python 2.
Airflow 1.10.* will be the last series to support Python 2.&lt;/p&gt;
&lt;p&gt;We strongly recommend that users use Python &amp;gt;= 3.6.&lt;/p&gt;
&lt;h3 id=&#34;use-airflow-rbac-ui&#34;&gt;Use Airflow RBAC UI&lt;/h3&gt;
&lt;p&gt;Airflow 1.10.9 ships with 2 UIs: the default, non-RBAC Flask-admin based UI and the Flask-AppBuilder based UI.&lt;/p&gt;
&lt;p&gt;The Flask-AppBuilder (FAB) based UI allows Role-based Access Control and has more advanced features compared to
the legacy Flask-admin based UI. This UI can be enabled by setting &lt;code&gt;rbac=True&lt;/code&gt; in the &lt;code&gt;[webserver]&lt;/code&gt; section of your &lt;code&gt;airflow.cfg&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The Flask-admin based UI is deprecated and new features won&amp;rsquo;t be ported to it. This UI will still be the default
for the 1.10.* series but will no longer be available from Airflow 2.0.&lt;/p&gt;
&lt;h2 id=&#34;list-of-contributors&#34;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;According to git shortlog, the following people contributed to the 1.10.8 and 1.10.9 releases. Thank you to all contributors!&lt;/p&gt;
&lt;p&gt;Anita Fronczak, Ash Berlin-Taylor, BasPH, Bharat Kashyap, Bharath Palaksha, Bhavika Tekwani, Bjorn Olsen, Brian Phillips, Cooper Gillan, Daniel Cohen, Daniel Imberman, Daniel Standish, Gabriel Eckers, Hossein Torabi, Igor Khrol, Jacob, Jarek Potiuk, Jay, Jiajie Zhong, Jithin Sukumar, Kamil Breguła, Kaxil Naik, Kousuke Saruta, Mustafa Gök, Noël Bardelot, Oluwafemi Sule, Pete DeJoy, QP Hou, Qian Yu, Robin Edwards, Ry Walker, Steven van Rossum, Tomek Urbaszek, Xinbin Huang, Yuen-Kuei Hsueh, Yu Qian, Zacharya, ZxMYS, rconroy293, tooptoop4&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Experience in Google Season of Docs 2019 with Apache Airflow</title>
<link>/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/</link>
<pubDate>Fri, 20 Dec 2019 00:00:00 +0000</pubDate>
<guid>/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/</guid>
<description>
&lt;p&gt;I came across &lt;a href=&#34;https://developers.google.com/season-of-docs&#34;&gt;Google Season of Docs&lt;/a&gt; (GSoD) almost by accident, thanks to my extensive HackerNews and Twitter addiction. I was familiar with the Google Summer of Code but not with this program.
It turns out it was the inaugural phase. I read the details, and the process felt a lot like GSoC except that this was about documentation.&lt;/p&gt;
&lt;h2 id=&#34;about-me&#34;&gt;About Me&lt;/h2&gt;
&lt;p&gt;I have been writing tech articles on Medium as well as my blog for the past 1.5 years. Blogging helps me test my understanding of concepts, as untangling the toughest of ideas into simple sentences requires a considerable time investment.&lt;/p&gt;
&lt;p&gt;Also, I have been working as a Software Developer for the past three years, which involves writing documentation for my projects as well. I completed my B.Tech at IIT Roorkee. During my time in college, I applied for GSoC once but didn’t make it onto the final list of selected candidates.&lt;/p&gt;
&lt;p&gt;I saw GSoD as an excellent opportunity to improve my technical writing skills using feedback from the open-source community. I contributed some bug fixes and features to Apache Superset and Apache Druid, but this would be my first contribution as a technical writer.&lt;/p&gt;
&lt;h2 id=&#34;searching-for-the-organization&#34;&gt;Searching for the organization&lt;/h2&gt;
&lt;p&gt;More than 40 organizations were participating in GSoD. However, two stood out as the right choice for me at first glance. The first one was &lt;a href=&#34;https://airflow.apache.org/&#34;&gt;Apache Airflow&lt;/a&gt; because I had already used Airflow extensively and had also contributed some custom operators to the forked version at my previous company.&lt;/p&gt;
&lt;p&gt;The second one was &lt;a href=&#34;http://cassandra.apache.org/&#34;&gt;Apache Cassandra&lt;/a&gt;, with which I had also worked extensively but hadn’t made any code or doc changes.&lt;/p&gt;
&lt;p&gt;Considering my overall experience, I decided to go with Airflow.&lt;/p&gt;
&lt;h2 id=&#34;project-selection&#34;&gt;Project selection&lt;/h2&gt;
&lt;p&gt;After selecting the org, the next step was to choose the project. Again, my previous experience played a role here, and I ended up picking &lt;strong&gt;How to create a workflow&lt;/strong&gt;. The aim of the project was to write documentation that would help users create complex as well as custom DAGs.&lt;br&gt;
The final deliverables were a bit different, though. More on that later.&lt;/p&gt;
&lt;p&gt;After submitting my application, I got busy with my job until one day I saw an email from Google confirming my selection as a Technical Writer for the project.&lt;/p&gt;
&lt;h2 id=&#34;community-bonding&#34;&gt;Community Bonding&lt;/h2&gt;
&lt;p&gt;Getting selected is just the beginning. I got an invite to the Airflow Slack channel where most of the discussions happened.
My mentor was &lt;a href=&#34;https://github.com/ashb&#34;&gt;Ash Berlin-Taylor&lt;/a&gt; from Apache Airflow. I started talking to my mentor to get a general sense of what deliverables were expected. The deliverables were documented in &lt;a href=&#34;https://cwiki.apache.org/confluence/display/AIRFLOW/Season+of+Docs+2019&#34;&gt;Confluence&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A page for how to create a DAG that also includes:
&lt;ul&gt;
&lt;li&gt;Revamping the page related to scheduling a DAG&lt;/li&gt;
&lt;li&gt;Adding tips for specific DAG conditions, such as rerunning a failed task&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A page for developing custom operators that includes:
&lt;ul&gt;
&lt;li&gt;Describing mechanisms that are important when creating an operator, such as template fields, UI color, hooks, connection, etc.&lt;/li&gt;
&lt;li&gt;Describing the responsibility between the operator and the hook&lt;/li&gt;
&lt;li&gt;Considerations for dealing with shared resources (such as connections and hooks)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A page that describes how to define the relationships between tasks. The page should include information about:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&amp;gt;&amp;gt; &amp;lt;&amp;lt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;set upstream / set downstream&lt;/li&gt;
&lt;li&gt;helper methods, e.g. chain&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A page that describes the communication between tasks that also includes:
&lt;ul&gt;
&lt;li&gt;Revamping the page related to macros and XCOM&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My mentor set the expectation early on that the deliverables were more like guidelines than strict rules.
If I wanted to, I could also choose to work on something else related to the project that was not among the deliverables.
After connecting with the mentor, I started engaging with the overall Airflow community. The people in the community were helpful, especially &lt;a href=&#34;https://github.com/mik-laj&#34;&gt;Kamil Bregula&lt;/a&gt;. Kamil helped me get started with the guidelines to follow while writing the documentation for Airflow.&lt;/p&gt;
&lt;h2 id=&#34;doc-development&#34;&gt;Doc Development&lt;/h2&gt;
&lt;p&gt;I picked DAG runs as my first deliverable. I chose this topic because some parts of it were already documented but needed some additional text.
I split the existing Scheduling &amp;amp; Triggers page into two new pages.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Schedulers&lt;/li&gt;
&lt;li&gt;DAG Runs&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Most of the details unrelated to schedulers were moved to the DAG Runs page, and then missing points, such as how to re-run a task or DAG, were added.
Once I was satisfied with my version, I asked my mentor and Kamil to review it. For the first version, I shared the text in a Google Docs file in which the reviewers added comments.
However, the document started getting messy, and it became difficult to track the changes. The time had come to raise a proper Pull Request.&lt;/p&gt;
&lt;p&gt;This was when I faced my first challenge. The documentation of Apache Airflow is written in RST (reStructuredText) syntax, with which I was entirely unfamiliar. I had mostly worked in Markdown.
I spent the next couple of days understanding the syntax. Fortunately, it was quite easy to get acquainted with.
I raised the &lt;a href=&#34;https://github.com/apache/airflow/pull/6295&#34;&gt;Pull Request&lt;/a&gt; and waited for the comments. When the comments finally arrived a few days later, they were mostly related to two things - grammar and formatting. There were also comments about what I had missed or misinterpreted.&lt;/p&gt;
&lt;h3 id=&#34;using-correct-grammar&#34;&gt;Using correct grammar&lt;/h3&gt;
&lt;p&gt;After discussing with Kamil, I decided to follow &lt;a href=&#34;https://developers.google.com/style/&#34;&gt;Google’s Developer Documentation Guidelines&lt;/a&gt;. These guidelines cover almost everything you need to consider while writing good documentation, such as always using the active voice.
Secondly, I installed the Grammarly app. After writing a doc, I would run it through Grammarly to check for errors. Then I corrected the errors, made some more changes, and ran it through Grammarly again. I repeated this process until I arrived at a version of the doc that was grammatically correct but did not read as if it had been written by an AI.&lt;/p&gt;
&lt;h3 id=&#34;formatting&#34;&gt;Formatting&lt;/h3&gt;
&lt;p&gt;Formatting involves writing notes and tips, marking the Airflow components correctly in the text, and making sure a user who is skimming through the docs doesn’t miss the critical text.
This required a bit of trial and error. I studied the current pattern in the Airflow docs and made changes, pushed commits, incorporated new review comments, and so on.&lt;/p&gt;
&lt;p&gt;In the end, all the reviewers approved the PR, but it was not merged until two months later. This was because we were unsure whether some more pages, such as &lt;strong&gt;Concepts&lt;/strong&gt;, should also be split up, resulting in a better-structured document. In the end, we decided to delay it until we had discussed it with the broader community.&lt;/p&gt;
&lt;p&gt;My &lt;a href=&#34;https://github.com/apache/airflow/pull/6348&#34;&gt;second PR&lt;/a&gt; was a completely new document on how to create your own custom operator. Since I was now familiar with most of the syntax, I raised the PR directly without going via Google Docs. I received a lot of comments again, but this time they were more about what I had written than how I had written it,
e.g., describing in detail how to use &lt;strong&gt;template fields&lt;/strong&gt; and cleaning up my code examples. The smaller number of comments about grammar &amp;amp; formatting errors showed I had made progress.
The PR was accepted within two weeks and gave me a huge confidence boost.&lt;/p&gt;
&lt;p&gt;After my second PR, I was in a bit of a deadlock. My last remaining deliverable was related to &lt;strong&gt;Macros&lt;/strong&gt;, but the scope wasn’t clear. I talked to my mentor, and he told me he didn’t mind if I went off-track to work on something else while the community figured out what changes were needed.
We discussed a lot of ideas. In the end, I decided to go with a Best Practices guide inspired by my mentor’s &lt;a href=&#34;https://drive.google.com/file/d/1E4zle8-fv5S1rrlcNUzjiEV19OMYvwoY/view?usp=sharing&#34;&gt;talk on Apache Airflow&lt;/a&gt; at a meetup. Having faced challenges while running Airflow in production myself, I was highly motivated to write something like this so that other developers don’t suffer.
The first draft was ready within two weeks. I called it &lt;strong&gt;Running Airflow in Production&lt;/strong&gt;. However, after adding a few more pieces to the document, I realized it was better to call it a &lt;strong&gt;Best Practices&lt;/strong&gt; guide, something most open-source projects have.&lt;/p&gt;
&lt;p&gt;People were enthusiastic about this &lt;a href=&#34;https://github.com/apache/airflow/pull/6515&#34;&gt;pull request&lt;/a&gt; since a lot of them faced the challenges described in the doc. I had hit the nail on the head. After some deliberation over the next 1-2 weeks, my PR got accepted.&lt;/p&gt;
&lt;p&gt;I then returned to my first PR and started making changes in response to the new review comments. After this, I discussed with my mentor the specific elements that were bugging him, such as getting people to understand how the schedule interval works in as few words as possible.
After a lot of trial and error, we arrived at a version with which both of us could make peace.&lt;/p&gt;
&lt;h2 id=&#34;final-evaluation&#34;&gt;Final Evaluation&lt;/h2&gt;
&lt;p&gt;On 12th September, I received an email from Google about the successful completion of the project. This meant my mentor liked my work. The Airflow community also appreciated the contributions.&lt;/p&gt;
&lt;p&gt;My documents were finally published on the Airflow website:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://airflow.readthedocs.io/en/latest/dag-run.html&#34;&gt;DAG Runs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://airflow.readthedocs.io/en/latest/scheduler.html&#34;&gt;Scheduler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://airflow.readthedocs.io/en/latest/howto/custom-operator.html&#34;&gt;Creating a custom operator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://airflow.readthedocs.io/en/latest/best-practices.html&#34;&gt;Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I also started getting invited to review other developers’ PRs. I am looking forward to making more contributions to the project in the coming year.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Airflow Survey 2019</title>
<link>/blog/airflow-survey/</link>
<pubDate>Wed, 11 Dec 2019 00:00:00 +0000</pubDate>
<guid>/blog/airflow-survey/</guid>
<description>
&lt;h1 id=&#34;apache-airflow-survey-2019&#34;&gt;Apache Airflow Survey 2019&lt;/h1&gt;
&lt;p&gt;Apache Airflow is &lt;a href=&#34;https://www.astronomer.io/blog/why-airflow/&#34;&gt;growing faster than ever&lt;/a&gt;.
Thus, receiving and adjusting to our users’ feedback is a must. We created a
&lt;a href=&#34;https://forms.gle/XAzR1pQBZiftvPQM7&#34;&gt;survey&lt;/a&gt; and got &lt;strong&gt;308&lt;/strong&gt; responses.
Let’s see who Airflow users are, how they play with it, and what they miss.&lt;/p&gt;
&lt;h1 id=&#34;overview-of-the-user&#34;&gt;Overview of the user&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;What best describes your current occupation?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data Engineer&lt;/td&gt;
&lt;td&gt;194&lt;/td&gt;
&lt;td&gt;62.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;11.04%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architect&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;7.47%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Scientist&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;6.17%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Analyst&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;4.22%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DevOps&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;4.22%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IT Administrator&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0.65%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Machine Learning Engineer&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0.65%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manager&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0.65%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operations&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0.65%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chief Data Officer&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.32%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering Manager&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.32%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intern&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.32%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Product owner&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.32%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quant&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.32%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;In your day to day job, what do you use Airflow for?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data processing (ETL)&lt;/td&gt;
&lt;td&gt;298&lt;/td&gt;
&lt;td&gt;96.75%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Artificial Intelligence and Machine Learning Pipelines&lt;/td&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;29.22%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automating DevOps operations&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;20.78%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;According to the survey, most Airflow users are “data” people. Moreover,
28.57% use Airflow for both ETL and ML pipelines, which suggests that those two fields
are connected. Only five respondents use Airflow solely for DevOps operations.
That means that the other 59 people who use Airflow for DevOps also use it for
ETL / ML purposes.&lt;/p&gt;
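&lt;p&gt;The overlap arithmetic above can be checked with a short Python sketch. It uses only the figures reported in this post (308 responses, 64 DevOps users, 5 DevOps-only respondents, 28.57% using both ETL and ML); the variable and function names are illustrative and not part of the survey tooling.&lt;/p&gt;

```python
# Figures taken from the survey tables and text above.
TOTAL_RESPONSES = 308
DEVOPS_USERS = 64          # "Automating DevOps operations"
DEVOPS_ONLY = 5            # respondents who use Airflow for DevOps only
ETL_AND_ML_SHARE = 28.57   # percent using Airflow for both ETL and ML


def share(count, total=TOTAL_RESPONSES):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# DevOps users who also use Airflow for ETL and/or ML work.
devops_with_data_work = DEVOPS_USERS - DEVOPS_ONLY

# Convert the reported share back into a head count.
etl_and_ml_users = round(ETL_AND_ML_SHARE / 100 * TOTAL_RESPONSES)

print(devops_with_data_work)  # 59
print(etl_and_ml_users)       # 88
print(share(DEVOPS_USERS))    # 20.78
```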
&lt;p&gt;&lt;strong&gt;How many active DAGs do you have in your largest Airflow instance?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0-20&lt;/td&gt;
&lt;td&gt;115&lt;/td&gt;
&lt;td&gt;37.34%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21-40&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;21.10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;41-60&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;14.29%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;61-100&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;9.09%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;101-200&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;9.09%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;201-300&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;2.27%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;301-999&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;2.60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1000+&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;4.22%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The majority of users do not exceed 100 active DAGs per Airflow instance. However,
as we can see, some users run thousands of DAGs, with a reported maximum of 5000.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What is the maximum number of tasks that you have used in one DAG?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0-10&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;td&gt;19.81%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11-20&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;19.48%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21-30&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;10.06%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;31-40&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;6.82%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;41-50&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;8.44%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;51-100&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;11.69%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;101-200&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;9.09%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;201-500&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;6.82%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;501+&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;7.79%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The highest reported number of tasks in a single DAG was 10 000 (!). The number of tasks
depends on the purpose of a DAG, so it’s rather hard to say if users have “simple”
or “complicated” workflows.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When onboarding new members to Airflow, what is the biggest problem?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No guide on best practises on developing DAGs&lt;/td&gt;
&lt;td&gt;160&lt;/td&gt;
&lt;td&gt;51.95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small number of tutorials on different aspects of using Airflow&lt;/td&gt;
&lt;td&gt;57&lt;/td&gt;
&lt;td&gt;18.51%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documentation is not clear enough&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;13.64%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small number of blogs regarding Airflow&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;1.95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;13.96%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This is an important result. Using Airflow is all about writing and scheduling DAGs.
The lack of a guide or any other complete resource on best practices for developing DAGs is a big
problem. Diving deeper into the “other” answers, we can find that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Airflow’s “magic” (scheduler, executors, schedule times) is hard to understand&lt;/li&gt;
&lt;li&gt;DAG testing is not easy to do and to explain&lt;/li&gt;
&lt;li&gt;Airflow UI needs some love.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How likely are you to recommend Apache Airflow?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Very Likely&lt;/td&gt;
&lt;td&gt;140&lt;/td&gt;
&lt;td&gt;45.45%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Likely&lt;/td&gt;
&lt;td&gt;124&lt;/td&gt;
&lt;td&gt;40.26%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Neutral&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;10.71%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unlikely&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;2.60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very unlikely&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0.97%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This means that more than 85% of people who use Airflow like it. It seems Airflow does
its job nicely. However, we have to remember that this survey is likely biased - it’s
more likely that you respond to the survey if you like the tool you use. Should we
focus then on those 11 people who did not like Airflow? It’s a good question.&lt;/p&gt;
&lt;h2 id=&#34;airflow-usage&#34;&gt;Airflow usage&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Which interface(s) of Airflow do you use as part of your current role?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Original Airflow Graphical User Interface&lt;/td&gt;
&lt;td&gt;297&lt;/td&gt;
&lt;td&gt;96.43%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;td&gt;126&lt;/td&gt;
&lt;td&gt;40.91%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Original Airflow Graphical User Interface, CLI&lt;/td&gt;
&lt;td&gt;117&lt;/td&gt;
&lt;td&gt;37.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;19.48%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Original Airflow Graphical User Interface, CLI, API&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;10.39%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom (own created) Airflow Graphical User Interface&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;8.12%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;It’s clear that usage of the CLI goes hand in hand with using the Airflow web UI. Our
survey included some UX-related questions to help us understand how users
use the Airflow webserver.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What do you use the Graphical User Interface for?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;plot1.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What do you use CLI for?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;plot2.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In Airflow, which UI view(s) are important for you?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;plot3.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;Here we see that the majority use the web UI mostly for monitoring purposes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Monitoring DAGs&lt;/li&gt;
&lt;li&gt;Accessing logs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An interesting result is that many people seem not to use backfilling, as
it can only be done through the CLI.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What executor type do you use?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Celery&lt;/td&gt;
&lt;td&gt;138&lt;/td&gt;
&lt;td&gt;44.81%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;27.60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;16.88%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sequential&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;7.14%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;3.57%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Most “Other” responses came from people who use several executor types or are
migrating from one executor to another. What can be observed is an increase in usage
of the Local and Kubernetes executors compared with the results of an earlier &lt;a href=&#34;https://ash.berlintaylor.com/writings/2019/02/airflow-user-survey-2019/&#34;&gt;survey done
by Ash&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Do you use Kubernetes-based deployments for Airflow?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No - we do not plan to use Kubernetes near term&lt;/td&gt;
&lt;td&gt;88&lt;/td&gt;
&lt;td&gt;28.57%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes - setup on our own via Helm Chart or similar&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;21.10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Not yet - but we use Kubernetes in our organization and we could move&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;td&gt;19.81%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes - via managed service in the cloud (Composer / Astronomer etc.)&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;14.61%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Not yet - but we plan to deploy Kubernetes in our organization soon&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;td&gt;13.64%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;2.27%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The most interesting thing is that nearly 30% of users do not use Kubernetes
and do not plan to move to it. This means we should keep other deployment options in
mind when working on Airflow 2.0. On the other hand, almost 70% of users either already
use Kubernetes or see it as a viable option.&lt;/p&gt;
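&lt;p&gt;As a rough check of the “nearly 30%” and “almost 70%” figures, the following Python sketch groups the counts from the table above. The answer labels are shortened for readability, and the grouping of every “Yes” and “Not yet” answer into one bucket is our reading of the paragraph rather than something the survey defines.&lt;/p&gt;

```python
# Response counts from the Kubernetes deployment table above
# (labels abbreviated for this sketch).
TOTAL_RESPONSES = 308
responses = {
    "No - no plans near term": 88,
    "Yes - own setup (Helm Chart or similar)": 65,
    "Not yet - organization already uses Kubernetes": 61,
    "Yes - managed service": 45,
    "Not yet - Kubernetes planned soon": 42,
    "Other": 7,
}

# Everyone who already runs Airflow on Kubernetes or could move to it.
uses_or_could_use = sum(
    count
    for answer, count in responses.items()
    if answer.startswith(("Yes", "Not yet"))
)
no_plans = responses["No - no plans near term"]

print(round(100 * no_plans / TOTAL_RESPONSES, 2))           # 28.57
print(round(100 * uses_or_could_use / TOTAL_RESPONSES, 2))  # 69.16
```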
&lt;p&gt;&lt;strong&gt;Do you combine multiple DAGs?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No, I don&amp;rsquo;t combine multiple DAGs&lt;/td&gt;
&lt;td&gt;127&lt;/td&gt;
&lt;td&gt;41.23%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, through SubDAG&lt;/td&gt;
&lt;td&gt;73&lt;/td&gt;
&lt;td&gt;23.70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Yes, by triggering another DAG&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;23.38%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;11.69%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In the other category, 9 people explicitly mentioned using &lt;code&gt;ExternalTaskSensor&lt;/code&gt;,
and I think it could be treated as running subDAGs by triggering other DAGs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Do you use Airflow Plugins? If yes, what do you use it for?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Adding new operators/sensors and hooks&lt;/td&gt;
&lt;td&gt;187&lt;/td&gt;
&lt;td&gt;60.71%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I don&amp;rsquo;t use Airflow plugins&lt;/td&gt;
&lt;td&gt;109&lt;/td&gt;
&lt;td&gt;35.39%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding AppBuilder views &amp;amp; menu items&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;10.06%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding new executor&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;5.84%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding OperatorExtraLinks&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;2.27%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The high percentage - 60% for “Adding new operators/sensors and hooks” - is quite a
surprising result for some of us, especially since you do not actually need to use the
plugin mechanism to add any of those. They are standard Python objects, and you can
simply put your hooks/operators/sensors code in a directory listed in the &lt;code&gt;PYTHONPATH&lt;/code&gt; environment variable and
they will work. It seems that this may be a result of the lack of a best practices guide.&lt;/p&gt;
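&lt;p&gt;The import mechanism this relies on can be sketched with the Python standard library alone. In the example below, &lt;code&gt;my_operators&lt;/code&gt; and &lt;code&gt;HelloOperator&lt;/code&gt; are hypothetical names; a real operator would subclass Airflow’s &lt;code&gt;BaseOperator&lt;/code&gt;, which is left out here so the sketch runs without Airflow installed. Adding a directory to &lt;code&gt;sys.path&lt;/code&gt; at runtime stands in for listing it in &lt;code&gt;PYTHONPATH&lt;/code&gt;.&lt;/p&gt;

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module holding a custom "operator". A real one would
# subclass Airflow's BaseOperator; a plain class keeps this runnable.
MODULE_SOURCE = '''
class HelloOperator:
    def __init__(self, name):
        self.name = name

    def execute(self, context=None):
        return "Hello " + self.name
'''

with tempfile.TemporaryDirectory() as code_dir:
    Path(code_dir, "my_operators.py").write_text(MODULE_SOURCE)

    # Same effect as listing code_dir in the PYTHONPATH environment variable.
    sys.path.insert(0, code_dir)
    importlib.invalidate_caches()
    try:
        # No plugin registration needed: it is an ordinary module import.
        my_operators = importlib.import_module("my_operators")
        greeting = my_operators.HelloOperator(name="Airflow").execute()
    finally:
        sys.path.remove(code_dir)

print(greeting)  # Hello Airflow
```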
&lt;p&gt;Plugins are more useful for adding views and menu items - yet only 10% use them for that.
OperatorExtraLinks are an even more useful (though relatively new) feature, so it’s not
entirely surprising they are hardly used.&lt;/p&gt;
&lt;p&gt;It was also somewhat surprising that anyone at all uses plugins to add their own
executors. We considered removing that option recently - but now we have to rethink
our approach.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What metrics do you use to monitor Airflow?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There were a lot of different responses. Some use Prometheus and other services,
others do not use any monitoring. One of the interesting responses linked to this
solution for &lt;a href=&#34;https://github.com/mastak/airflow_operators_metrics&#34;&gt;airflow_operators_metrics&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;external-services&#34;&gt;External services&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What external services do you use in your Airflow DAGs?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Web Services&lt;/td&gt;
&lt;td&gt;160&lt;/td&gt;
&lt;td&gt;51.95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internal company systems&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;td&gt;48.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hadoop / Spark / Flink / Other Apache software&lt;/td&gt;
&lt;td&gt;119&lt;/td&gt;
&lt;td&gt;38.64%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Cloud Platform / Google APIs&lt;/td&gt;
&lt;td&gt;112&lt;/td&gt;
&lt;td&gt;36.36%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft Azure&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;9.09%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I do not use external services in my Airflow DAGs&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;5.84%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;It’s not surprising that Amazon Web Services leads the way, as it is considered the most mature
cloud provider. Internal systems and other Apache products in the next two positions are
understandable, given that the majority use Airflow for ETL processes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What external services do you use in your Airflow DAGs? (Mixed providers)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Google Cloud Platform / Google APIs, Amazon Web Services&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;14.29%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Web Services, Microsoft Azure&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;1.62%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Cloud Platform / Google APIs, Microsoft Azure&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This result is not surprising because companies usually prefer to stick with one cloud
provider.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How do you integrate with external services?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Using Bash / Python operator&lt;/td&gt;
&lt;td&gt;220&lt;/td&gt;
&lt;td&gt;71.43%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using existing, dedicated operators / hooks&lt;/td&gt;
&lt;td&gt;217&lt;/td&gt;
&lt;td&gt;70.45%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using own, custom operators / hooks&lt;/td&gt;
&lt;td&gt;216&lt;/td&gt;
&lt;td&gt;70.13%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We had some anecdotal evidence that people use more Python/Bash operators than the
dedicated ones - but it looks like all ways of using Airflow to connect to external
services are equally popular.&lt;/p&gt;
&lt;h2 id=&#34;what-can-be-improved&#34;&gt;What can be improved&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;In your opinion, what could be improved in Airflow?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scheduler performance&lt;/td&gt;
&lt;td&gt;189&lt;/td&gt;
&lt;td&gt;61.36%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web UI&lt;/td&gt;
&lt;td&gt;180&lt;/td&gt;
&lt;td&gt;58.44%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging, monitoring and alerting&lt;/td&gt;
&lt;td&gt;145&lt;/td&gt;
&lt;td&gt;47.08%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Examples, how-to, onboarding documentation&lt;/td&gt;
&lt;td&gt;143&lt;/td&gt;
&lt;td&gt;46.43%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Technical documentation&lt;/td&gt;
&lt;td&gt;137&lt;/td&gt;
&lt;td&gt;44.48%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;112&lt;/td&gt;
&lt;td&gt;36.36%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REST API&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;31.17%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authentication and authorization&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;td&gt;28.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External integration e.g. AWS, GCP, Apache product&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;15.91%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;13.31%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I don’t know&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;1.62%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The results are fairly self-explanatory. Improved performance of Airflow, a better
UI, and more telemetry are desirable. But this should go hand in hand with improved
documentation and resources about using Airflow, especially when we
take into account the problem of onboarding new users.&lt;/p&gt;
&lt;p&gt;Another interesting point from that question is that only 16% think that operators
should be extended and improved. This suggests that we should focus on improving
Airflow core instead of adding more and more integrations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What would be the most interesting feature for you?&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;No.&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Production-ready Airflow docker image&lt;/td&gt;
&lt;td&gt;175&lt;/td&gt;
&lt;td&gt;56.82%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Declarative way of writing DAGs / automated DAGs generation&lt;/td&gt;
&lt;td&gt;155&lt;/td&gt;
&lt;td&gt;50.32%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Horizontal Autoscaling&lt;/td&gt;
&lt;td&gt;122&lt;/td&gt;
&lt;td&gt;39.61%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Asynchronous Operators&lt;/td&gt;
&lt;td&gt;97&lt;/td&gt;
&lt;td&gt;31.49%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stateless web server&lt;/td&gt;
&lt;td&gt;81&lt;/td&gt;
&lt;td&gt;26.30%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knative Executor&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;15.58%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;I already have all I need&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;4.22%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The production Docker image wins, and that’s no surprise. We all know that deploying
Airflow is not a plug-and-play process, and that’s why the official image is being
worked on by Jarek Potiuk. An unexpected result is that half of the users would like to
have a declarative way of creating DAGs. That seems to be something that is “against Airflow”,
as we always emphasize the possibility of writing workflows in pure Python. Stories
about DAG generators are not new and confirm that there’s a need for a way to
declare DAGs.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: New Airflow website</title>
<link>/blog/announcing-new-website/</link>
<pubDate>Wed, 11 Dec 2019 00:00:00 +0000</pubDate>
<guid>/blog/announcing-new-website/</guid>
<description>
&lt;p&gt;The brand &lt;a href=&#34;https://airflow.apache.org/&#34;&gt;new Airflow website&lt;/a&gt; has arrived! Those who have been following the process know that the journey to update &lt;a href=&#34;https://airflow.readthedocs.io/en/1.10.6/&#34;&gt;the old Airflow website&lt;/a&gt; started at the beginning of the year.
Sponsorship from the Cloud Composer team at Google allowed us to
collaborate with &lt;a href=&#34;https://www.polidea.com/&#34;&gt;Polidea&lt;/a&gt; and their design studio &lt;a href=&#34;https://utilodesign.com/&#34;&gt;Utilo&lt;/a&gt;, and to deliver an awesome website.&lt;/p&gt;
&lt;p&gt;Documentation of open source projects is key to engaging new contributors in the maintenance,
development, and adoption of software. We want the Apache Airflow community to have
the best possible experience contributing to and using the project. We also took this opportunity to make the project
more accessible, and in doing so, increase its reach.&lt;/p&gt;
&lt;p&gt;In the past three and a half months, we have updated everything: created a more efficient landing page,
enhanced information architecture, and improved UX &amp;amp; UI. Most importantly, the website can now
be translated into many languages. This is our effort to foster a more inclusive community around
Apache Airflow, and we look forward to seeing contributions in Spanish, Chinese, Russian, and other languages as well!&lt;/p&gt;
&lt;p&gt;We built our website on Docsy, a platform that is easy to use and contribute to. Follow
&lt;a href=&#34;https://github.com/apache/airflow-site/blob/master/README.md&#34;&gt;these steps&lt;/a&gt; to set up your environment and
to create your first pull request. You may also use
the new website for your own open source project as a template.
All of our &lt;a href=&#34;https://github.com/apache/airflow-site/tree/master&#34;&gt;code is open and hosted on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Share your questions, comments, and suggestions with us, to help us improve the website.
We hope that this new design makes finding documentation about Airflow easier,
and that its improved accessibility increases adoption and use of Apache Airflow around the world.&lt;/p&gt;
&lt;p&gt;Happy browsing!&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: ApacheCon Europe 2019 — Thoughts and Insights by Airflow Committers</title>
<link>/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/</link>
<pubDate>Fri, 22 Nov 2019 00:00:00 +0000</pubDate>
<guid>/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/</guid>
<description>
&lt;p&gt;Is it possible to create an organization that delivers tens of projects used by millions, where nearly no one is paid for doing their job, and that has still been carrying on fruitfully for more than 20 years? The Apache Software Foundation proves it is possible. For the last two decades, the ASF has been crafting a model called the Apache Way, a way of organizing and leading tech open source projects. Thanks to this approach, which is strongly based on the “community over code” motto, we can enjoy such awesome projects as Apache Spark, Flink, Beam, or Airflow (and many more).&lt;/p&gt;
&lt;p&gt;After this year’s ApacheCon, Polidea’s engineers talked with Committers of Apache projects, including Aizhamal Nurmamat kyzy, Felix Uellendall, and Fokko Driesprong, about what makes the ASF such an amazing organization.&lt;/p&gt;
&lt;p&gt;You can read the &lt;a href=&#34;https://higrys.medium.com/apachecon-europe-2019-thoughts-and-insights-by-airflow-committers-9ff5f6938c99&#34;&gt;insights after the ApacheCon 2019&lt;/a&gt;.&lt;/p&gt;
</description>
</item>
<item>
<title>Blog: Documenting using local development environment</title>
<link>/blog/documenting-using-local-development-environments/</link>
<pubDate>Fri, 22 Nov 2019 00:00:00 +0000</pubDate>
<guid>/blog/documenting-using-local-development-environments/</guid>
<description>
&lt;h2 id=&#34;documenting-local-development-environment-of-apache-airflow&#34;&gt;Documenting local development environment of Apache Airflow&lt;/h2&gt;
&lt;p&gt;From September to November 2019, I participated in a wonderful initiative, &lt;a href=&#34;https://developers.google.com/season-of-docs&#34;&gt;Google Season of Docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I had the pleasure of contributing to the Apache Airflow open source project as a technical writer.
My initial assignment was an extension to the GitHub-based Contribution guide.&lt;/p&gt;
&lt;p&gt;From the very first days, I was closely involved in project communications
via email and Slack, and had regular 1:1s with my mentor, Jarek Potiuk.&lt;/p&gt;
&lt;p&gt;Jarek’s enthusiasm for easing the onboarding experience for
Airflow contributors was infectious. I share this goal and did my best to improve the structure,
language and DX. As a result, Jarek and I extended the current contributor’s docs and
ended up with the Contributing guide, which guides users through the project
infrastructure and provides a workflow example based on a real-life use case;
the Testing guide, with an overview of the complex testing infrastructure for Apache Airflow;
and two guides dedicated to the Breeze dev environment and the local virtual environment
(my initial assignment).&lt;/p&gt;
&lt;p&gt;I’m deeply grateful to my mentor and the Airflow developers for their feedback,
patience and help while I was working through new challenges
(I had never worked on an open source project before),
and for their support of all my ideas! I think a key success factor for any contributor
is a responsive, supportive and motivated team, and I was lucky to join such
a team for 3 months.&lt;/p&gt;
&lt;p&gt;Documents I worked on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/blob/master/BREEZE.rst&#34;&gt;Breeze development environment documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/blob/master/LOCAL_VIRTUALENV.rst&#34;&gt;Local virtualenv environment documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst&#34;&gt;Contributing guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/apache/airflow/blob/master/TESTING.rst&#34;&gt;Testing guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
</item>
<item>
<title>Blog: It&#39;s a &#34;Breeze&#34; to develop Apache Airflow</title>
<link>/blog/its-a-breeze-to-develop-apache-airflow/</link>
<pubDate>Fri, 22 Nov 2019 00:00:00 +0000</pubDate>
<guid>/blog/its-a-breeze-to-develop-apache-airflow/</guid>
<description>
&lt;h2 id=&#34;the-story-behind-the-airflow-breeze-tool&#34;&gt;The story behind the Airflow Breeze tool&lt;/h2&gt;
&lt;p&gt;Initially, we started contributing to this fantastic open-source project [Apache Airflow] with a team of three, which then grew to five. When we kicked it off a year ago, I realized pretty soon where the biggest bottlenecks and areas for improvement in terms of productivity were. Even with the help of our client, who provided us with a “homegrown” development environment, it took us literally days to set it up and learn some basics.&lt;/p&gt;
&lt;p&gt;That is how the journey to increased productivity in Apache Airflow began. The result? The Airflow Breeze open-source tool. Jarek Potiuk, an Airflow Committer, will tell you all about it.&lt;/p&gt;
&lt;p&gt;You can learn &lt;a href=&#34;https://higrys.medium.com/its-a-breeze-to-develop-apache-airflow-bf306d3e3505&#34;&gt;how and why it’s a &amp;ldquo;Breeze&amp;rdquo; to Develop Apache Airflow&lt;/a&gt;.&lt;/p&gt;
</description>
</item>
</channel>
</rss>