Add blog with Airflow Survey results (#219)
diff --git a/README.md b/README.md
index 9b01e17..7bb8845 100644
--- a/README.md
+++ b/README.md
@@ -55,7 +55,7 @@
 
 In order to build site, run script `<ROOT DIRECTORY>/site.sh build-site`.
 
-In order to preview landing pages, run script `<ROOT DIRECTORY>/site.sh preview`.
+In order to preview landing pages, run script `<ROOT DIRECTORY>/site.sh preview-landing-pages`.
 
 In order to work with documentation theme, please refer to
 [Sphinx Airflow theme's readme file](sphinx_airflow_theme/README.md).
diff --git a/landing-pages/create-index.js b/landing-pages/create-index.js
index 495b5fc..d3ff461 100644
--- a/landing-pages/create-index.js
+++ b/landing-pages/create-index.js
@@ -45,8 +45,14 @@
   const fileNames = await fs.readdir(postsDirectoryPath);
   const posts = await Promise.all(
     fileNames.map(async(fileName) => {
+      let filePath;
+      if ((await fs.stat(`${postsDirectoryPath}/${fileName}`)).isFile()) {
+        filePath = `${postsDirectoryPath}/${fileName}`;
+      } else {
+        filePath = `${postsDirectoryPath}/${fileName}/index.md`;
+      }
       const fileContent = await fs.readFile(
-        `${postsDirectoryPath}/${fileName}`,
+        filePath,
         "utf8"
       );
       const {content, data} = await parse(fileContent);
diff --git a/landing-pages/site/assets/scss/_markdown-content.scss b/landing-pages/site/assets/scss/_markdown-content.scss
index a07593b..8195243 100644
--- a/landing-pages/site/assets/scss/_markdown-content.scss
+++ b/landing-pages/site/assets/scss/_markdown-content.scss
@@ -33,4 +33,34 @@
   pre span {
     @extend .monotext--brownish-grey;
   }
+
+  img {
+    width: 100%;
+  }
+
+  table {
+    border-collapse: collapse;
+    width: 100%;
+  }
+
+ th {
+    background: #ccc;
+  }
+
+  th, td {
+    border: 1px solid #ccc;
+    padding: 8px;
+  }
+
+  tr:nth-child(even) {
+    background: #efefef;
+  }
+
+  tr:hover {
+    background: #d1d1d1;
+  }
+
+  li {
+    color: #707070;
+  }
 }
diff --git a/landing-pages/site/config.toml b/landing-pages/site/config.toml
index edbee9b..f17b38b 100644
--- a/landing-pages/site/config.toml
+++ b/landing-pages/site/config.toml
@@ -174,3 +174,4 @@
 
 [permalinks]
 tags = "/blog/tags/:slug/"
+posts = "/:year/:month/:title/"
diff --git a/landing-pages/site/content/en/blog/airflow-survey/index.md b/landing-pages/site/content/en/blog/airflow-survey/index.md
new file mode 100644
index 0000000..6211241
--- /dev/null
+++ b/landing-pages/site/content/en/blog/airflow-survey/index.md
@@ -0,0 +1,334 @@
+---
+title: "Airflow Survey 2019"
+linkTitle: "Airflow Survey 2019"
+author: "Tomek Urbaszek"
+twitter: "Nuclearriot"
+github: "nuclearpinguin"
+linkedin: "tomaszurbaszek"
+description: "Receiving and adjusting to our users’ feedback is a must. Let’s see who Airflow users are, how they play with it, and what they miss."
+tags: ["community", "survey", "users"]
+date: "2019-12-11"
+---
+# Apache Airflow Survey 2019
+
+Apache Airflow is [growing faster than ever](https://www.astronomer.io/blog/why-airflow/).
+Thus, receiving and adjusting to our users’ feedback is a must. We created
+[survey](https://forms.gle/XAzR1pQBZiftvPQM7) and we got **308** responses.
+Let’s see who Airflow users are, how they play with it, and what they miss.
+
+# Overview of the user
+
+**What best describes your current occupation?**
+
+|                         |No.|  %   |
+|-------------------------|---|------|
+|Data Engineer            |194|62.99%|
+|Developer                | 34|11.04%|
+|Architect                | 23|7.47% |
+|Data Scientist           | 19|6.17% |
+|Data Analyst             | 13|4.22% |
+|DevOps                   | 13|4.22% |
+|IT Administrator         |  2|0.65% |
+|Machine Learning Engineer|  2|0.65% |
+|Manager                  |  2|0.65% |
+|Operations               |  2|0.65% |
+|Chief Data Officer       |  1|0.32% |
+|Engineering Manager      |  1|0.32% |
+|Intern                   |  1|0.32% |
+|Product owner            |  1|0.32% |
+|Quant                    |  1|0.32% |
+
+
+**In your day to day job, what do you use Airflow for?**
+
+|                                                      |No.|  %   |
+|------------------------------------------------------|---|------|
+|Data processing (ETL)                                 |298|96.75%|
+|Artificial Intelligence and Machine Learning Pipelines| 90|29.22%|
+|Automating DevOps operations                          | 64|20.78%|
+
+According to the survey, most of the Airflow users are the “data” people. Moreover,
+28.57% uses Airflow to both ETL and ML pipelines meaning that those two fields
+are somehow connected. Only five respondents use Airflow for DevOps operations only,
+That means that other 59 people who use Airflow for DevOps stuff use it also for
+ETL / ML  purposes.
+
+**How many active DAGs do you have in your largest Airflow instance?**
+
+|       |No.|  %   |
+|-------|---|------|
+|0-20   |115|37.34%|
+|21-40  | 65|21.10%|
+|41-60  | 44|14.29%|
+|61-100 | 28|9.09% |
+|101-200| 28|9.09% |
+|201-300|  7|2.27% |
+|301-999|  8|2.60% |
+|1000+  | 13|4.22% |
+
+
+The majority of users do not exceed 100 active DAGs per Airflow instance. However,
+as we can see there are users who exceed thousands of DAGs with a maximum number 5000.
+
+**What is the maximum number of tasks that you have used in one DAG?**
+
+|       |No.|  %   |
+|-------|---|------|
+|0-10   | 61|19.81%|
+|11-20  | 60|19.48%|
+|21-30  | 31|10.06%|
+|31-40  | 21|6.82% |
+|41-50  | 26|8.44% |
+|51-100 | 36|11.69%|
+|101-200| 28|9.09% |
+|201-500| 21|6.82% |
+|501+   | 24|11.54%|
+
+
+The given maximum number of tasks in a single DAG was 10 000 (!). The number of tasks
+depends on the purposes of a DAG, so it’s rather hard to say if users have “simple”
+or “complicated” workflows.
+
+**When onboarding new members to Airflow, what is the biggest problem?**
+
+|                                                               |No.|  %   |
+|---------------------------------------------------------------|---|------|
+|No guide on best practises on developing DAGs                  |160|51.95%|
+|Small number of tutorials on different aspects of using Airflow| 57|18.51%|
+|Documentation is not clear enough                              | 42|13.64%|
+|Small number of blogs regarding Airflow                        |  6|1.95% |
+|Other                                                          | 43|13.96%|
+
+This is an important result. Using Airflow is all about writing and scheduling DAGs.
+No guide or any other complete resource on best practices for developing Dags is a big
+problem. Diving deep in the “other” answers, we can find that:
+
+- Airflow’s “magic” (scheduler, executors, schedule times) is hard to understand
+- DAG testing is not easy to do and to explain
+- Airflow UI needs some love.
+
+**How likely are you to recommend Apache Airflow?**
+
+|             |No.|  %   |
+|-------------|---|------|
+|Very Likely  |140|45.45%|
+|Likely       |124|40.26%|
+|Neutral      | 33|10.71%|
+|Unlikely     |  8|2.60% |
+|Very unlikely|  3|0.97% |
+
+This means that more than 85% of people who use Airflow like it. It seems Airflow does
+its job nicely. However, we have to remember that this survey is likely biased - it’s
+more likely that you respond to the survey if you like the tool you use. Should we
+focus then on those 11 people who did not like Airflow? It’s a good question.
+
+## Airflow usage
+
+**Which interface(s) of Airflow do you use as part of your current role?**
+
+|                                                     |No.|  %   |
+|-----------------------------------------------------|---|------|
+|Original Airflow Graphical User Interface            |297|96.43%|
+|CLI                                                  |126|40.91%|
+|Original Airflow Graphical User Interface, CLI       |117|37.99%|
+|API                                                  | 60|19.48%|
+|Original Airflow Graphical User Interface, CLI, API  | 32|10.39%|
+|Custom (own created) Airflow Graphical User Interface| 25|8.12% |
+
+It’s visible that usage of CLI goes in pair with using Airflow web UI. Our
+survey included some UX related questions to allow us to understand how users
+use Airflow webserver.
+
+**What do you use the Graphical User Interface for?**
+
+![](plot1.png)
+
+**What do you use CLI for?**
+
+![](plot2.png)
+
+**In Airflow, which UI view(s) are important for you?**
+
+![](plot3.png)
+
+Here we see that the majority uses Web UI mostly for monitoring purposes:
+
+- Monitoring DAGs
+- Accessing logs
+
+An interesting result is that many people seem not to use backfilling as
+there’s no other way than to do it by CLI.
+
+**What executor type do you use?**
+
+|          |No.|  %   |
+|----------|---|------|
+|Celery    |138|44.81%|
+|Local     | 85|27.60%|
+|Kubernetes| 52|16.88%|
+|Sequential| 22|7.14% |
+|Other     | 11|3.57  |
+
+The other option mostly consisted of information that someone uses a few types or is
+migrating from one executor to another. What can be observed is an increase in usage
+of Local and Kubernetes executors when compared to results from an earlier [survey done
+by Ash](https://ash.berlintaylor.com/writings/2019/02/airflow-user-survey-2019/).
+
+**Do you use Kubernetes-based deployments for Airflow?**
+
+|                                                                     |No.|  %   |
+|---------------------------------------------------------------------|---|------|
+|No - we do not plan to use Kubernetes near term                      | 88|28.57%|
+|Yes - setup on our own via Helm Chart or similar                     | 65|21.10%|
+|Not yet - but we use Kubernetes in our organization and we could move| 61|19.81%|
+|Yes - via managed service in the cloud (Composer / Astronomer etc.)  | 45|14.61%|
+|Not yet - but we plan to deploy Kubernetes in our organization soon  | 42|13.64%|
+|Other                                                                |  7|2.27% |
+
+The most interesting thing is that there’s nearly 30% of users who do not use Kubernetes,
+and they are not going to move. This means we should keep other deployment options in
+mind when working on Airflow 2.0. On the other hand, almost 70% of the users already
+use Kubernetes, or it’s a viable option for them.
+
+**Do you combine multiple DAGs?**
+
+|                                 |No.|  %   |
+|---------------------------------|---|------|
+|No, I don't combine multiple DAGs|127|41.23%|
+|Yes, through SubDAG              | 73|23.70%|
+|Yes, by triggering another DAG   | 72|23.38%|
+|Other                            | 36|11.69%|
+
+In the other category, 9 people explicitly mentioned using `ExternalTaskSensor`,
+and I think it could be treated as running subDAGs by triggering other DAGs.
+
+**Do you use Airflow Plugins? If yes, what do you use it for?**
+
+|                                      |No.|  %   |
+|--------------------------------------|---|------|
+|Adding new operators/sensors and hooks|187|60.71%|
+|I don't use Airflow plugins           |109|35.39%|
+|Adding AppBuilder views & menu items  | 31|10.06%|
+|Adding new executor                   | 18|5.84% |
+|Adding OperatorExtraLinks             |  7|2.27% |
+
+The high percentage - 60%  for “Adding new operators/sensors and hooks” is quite a
+surprising result for some of us - especially that you do not actually need to use the
+plugin mechanism to add any of those. Those are standard python objects, and you can
+simply drop your hooks/operators/sensors code to `PYTHONPATH` environment variable and
+they will work. It seems that this may be a result of a lack of best practices guide.
+
+Plugins are more useful for adding views and menu items - yet only 10%.
+OperatorExtraLinks are even more useful (though relatively new) feature, so it’s not
+entirely surprising they are hardly used.
+
+It was also kind of surprising that someone at all uses plugins to use their own
+executors. We considered removing that option recently - but now we have to rethink
+our approach.
+
+**What metrics do you use to monitor Airflow?**
+
+There were a lot of different responses. Some use Prometheus and other services,
+others do not use any monitoring. One of the interesting responses linked to this
+solution for [airflow_operators_metrics](https://github.com/mastak/airflow_operators_metrics).
+
+## External services
+
+**What external services do you use in your Airflow DAGs?**
+
+|                                                 |No.|  %   |
+|-------------------------------------------------|---|------|
+|Amazon Web Services                              |160|51.95%|
+|Internal company systems                         |150|48.7% |
+|Hadoop / Spark / Flink / Other Apache software   |119|38.64%|
+|Google Cloud Platform / Google APIs              |112|36.36%|
+|Microsoft Azure                                  | 28|9.09% |
+|I do not use external services in my Airflow DAGs| 18|5.84% |
+
+
+It’s not surprising that Amazon Web Services is leading the way as they are considered the most mature
+cloud provider. Internal system and other Apache products on the next two positions are
+quite understandable if we take into account that the majority uses Airflow for ETL processes.
+
+**What external services do you use in your Airflow DAGs? (Mixed providers)**
+
+|                                                        |No.|  %   |
+|--------------------------------------------------------|---|------|
+|Google Cloud Platform / Google APIs, Amazon Web Services| 44|14.29%|
+|Amazon Web Services, Microsoft Azure                    |  5|1.62% |
+|Google Cloud Platform / Google APIs, Microsoft Azure    |  4|1.3%  |
+
+This result is not surprising because companies usually prefer to stick with one cloud
+provider.
+
+**How do you integrate with external services?**
+
+|                                           |No.|  %   |
+|-------------------------------------------|---|------|
+|Using Bash / Python operator               |220|71.43%|
+|Using existing, dedicated operators / hooks|217|70.45%|
+|Using own, custom operators / hooks        |216|70.13%|
+
+We had some anecdotal evidence that people use more Python/Bash operators than the
+dedicated ones - but it looks like all ways of using Airflow to connect to external
+services are equally popular.
+
+
+## What can be improved
+
+**In your opinion, what could be improved in Airflow?**
+
+|                                                  |No.|  %   |
+|--------------------------------------------------|---|------|
+|Scheduler performance                             |189|61.36%|
+|Web UI                                            |180|58.44%|
+|Logging, monitoring and alerting                  |145|47.08%|
+|Examples, how-to, onboarding documentation        |143|46.43%|
+|Technical documentation                           |137|44.48%|
+|Reliability                                       |112|36.36%|
+|REST API                                          | 96|31.17%|
+|Authentication and authorization                  | 89|28.9% |
+|External integration e.g. AWS, GCP, Apache product| 49|15.91%|
+|CLI                                               | 41|13.31%|
+|I don’t know                                      |  5|1.62% |
+
+The results are rather quite self-explaining. Improved performance of Airflow, better
+UI, and more telemetry are desirable. But this should go in pair with improved
+documentation and resources about using the Airflow, especially when we
+take into account the problem of onboarding new users.
+
+Another interesting point from that question is that only 16% think that operators
+should be extended and improved. This suggests that we should focus on improving
+Airflow core instead of adding more and more integrations.
+
+**What would be the most interesting feature for you?**
+
+|                                                           |No.|  %   |
+|-----------------------------------------------------------|---|------|
+|Production-ready Airflow docker image                      |175|56.82%|
+|Declarative way of writing DAGs / automated DAGs generation|155|50.32%|
+|Horizontal Autoscaling                                     |122|39.61%|
+|Asynchronous Operators                                     | 97|31.49%|
+|Stateless web server                                       | 81|26.3% |
+|Knative Executor                                           | 48|15.58%|
+|I already have all I need                                  | 13|4.22% |
+
+Production Docker image wins, and it’s not a surprise. We all know that deploying
+Airflow is not a plug and play process, and that’s why the official image is being
+worked on by Jarek Potiuk. An unexpected result is that half of the users would like to
+have a declarative way of creating DAGs. That seems to be something that is “against Airflow”
+as we always emphasize the possibility of writing workflows in pure python. Stories
+about DAG generators are not new and confirm that there’s a need for a way to
+declare DAGs.
+
+## Data
+
+If you think I missed something and you want to look for insights on your own the data is available
+for you here:
+
+ - Original data: https://storage.googleapis.com/airflow-survey/survey.csv
+ - Processed: https://storage.googleapis.com/airflow-survey/airflow_survey_processed.csv
+
+The processed data includes multi-choice options one-hot encoded. If you find any interesting
+insight, please update the article ([make PR](https://github.com/apache/airflow-site/blob/aip-11/CONTRIBUTE.md)
+to Airflow site).
diff --git a/landing-pages/site/content/en/blog/airflow-survey/plot1.png b/landing-pages/site/content/en/blog/airflow-survey/plot1.png
new file mode 100644
index 0000000..57be032
--- /dev/null
+++ b/landing-pages/site/content/en/blog/airflow-survey/plot1.png
Binary files differ
diff --git a/landing-pages/site/content/en/blog/airflow-survey/plot2.png b/landing-pages/site/content/en/blog/airflow-survey/plot2.png
new file mode 100644
index 0000000..f72112b
--- /dev/null
+++ b/landing-pages/site/content/en/blog/airflow-survey/plot2.png
Binary files differ
diff --git a/landing-pages/site/content/en/blog/airflow-survey/plot3.png b/landing-pages/site/content/en/blog/airflow-survey/plot3.png
new file mode 100644
index 0000000..63a20a3
--- /dev/null
+++ b/landing-pages/site/content/en/blog/airflow-survey/plot3.png
Binary files differ
diff --git a/site.sh b/site.sh
index 1df1a55..c717a05 100755
--- a/site.sh
+++ b/site.sh
@@ -32,19 +32,19 @@
 
 These are  ${0} commands used in various situations:
 
-    build-site          Prepare dist directory with landing pages and documentation
-    preview-site        Starts the web server with preview of the website
-    build-landing-pages Builds a landing pages
-    prepare-theme       Prepares and copies files needed for the proper functioning of the sphinx theme.
-    shell               Start shell
-    build-image         Build a Docker image with a environment
-    install-node-deps   Download all the Node dependencies
-    check-site-links    Checks if the links are correct in the website
-    lint-css            Lint CSS files
-    lint-js             Lint Javascript files
-    cleanup             Delete the virtual environment in Docker
-    stop                Stop the environment
-    help                Display usage
+    build-site            Prepare dist directory with landing pages and documentation
+    preview-landing-pages Starts the web server with preview of the website
+    build-landing-pages   Builds a landing pages
+    prepare-theme         Prepares and copies files needed for the proper functioning of the sphinx theme.
+    shell                 Start shell
+    build-image           Build a Docker image with a environment
+    install-node-deps     Download all the Node dependencies
+    check-site-links      Checks if the links are correct in the website
+    lint-css              Lint CSS files
+    lint-js               Lint Javascript files
+    cleanup               Delete the virtual environment in Docker
+    stop                  Stop the environment
+    help                  Display usage
 
 Unrecognized commands are run as programs in the container.
 
@@ -299,7 +299,7 @@
 # Check container commands
 if [[ "${CMD}" == "install-node-deps" ]] ; then
     run_command "/opt/site/landing-pages/" yarn install
-elif [[ "${CMD}" == "preview" ]]; then
+elif [[ "${CMD}" == "preview-landing-pages" ]]; then
     ensure_node_module_exists
     run_command "/opt/site/landing-pages/" npm run index
     prepare_docs_index