| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| <!-- START doctoc generated TOC please keep comment here to allow auto update --> |
| <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --> |
| **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* |
| |
| - [2. Implement standalone python command](#2-implement-standalone-python-command) |
| - [Status](#status) |
| - [Context](#context) |
| - [Decision](#decision) |
| - [Consequences](#consequences) |
| |
| <!-- END doctoc generated TOC please keep comment here to allow auto update --> |
| |
| # 2. Implement standalone python command |
| |
| Date: 2021-11-28 |
| |
| ## Status |
| |
| Accepted |
| |
| ## Context |
| |
| [Breeze](https://github.com/apache/airflow/blob/main/dev/breeze/doc/README.rst) is |
| a command line development environment for Apache Airflow that makes |
| it easy to set up an Airflow development and test environment |
| (< 10 minutes is the goal) and enables contributors to run any subset |
| of the tests that are executed in our CI environment. |
| |
| The environment has proven to be very useful: it has successfully onboarded |
| a number of new contributors, and it makes life easier even for |
| seasoned contributors, as it provides an easy replication of the CI |
| environment as well as an easy-to-set-up test environment |
| that can be used to run: |
| |
| * Unit tests |
| * Integration tests |
| * Kubernetes/Helm tests |
| * System tests |
| |
| It also serves as a base for our CI execution environment. The same scripts and tools are used |
| in our CI (based on GitHub Actions). A lot of common code and functions are |
| shared between the CI and Breeze. All those tools are held in the "ci" package. |
| |
| Unfortunately, Breeze is largely based on Bash code, for which very few people (except maybe the |
| Breeze creator, Jarek Potiuk, the author of this document) have any feeling other than uneasiness, |
| disgust and fear :). Since Airflow is largely based on Python, the common consensus is that |
| Breeze should be rewritten in Python. |
| |
| In November 2021, Outreachy sponsored two internships. The interns, @Bowrna and @edithturn, were assigned to |
| the projects: |
| |
| * Convert Airflow Local Development environment `Breeze` - from Bash-based to Python-based |
| * Rewrite GitHub Action workflows to Python |
| |
| With @potiuk, @eladkal and @xurror as mentors. |
| |
| The long-standing issues for those two projects are listed below (we hope to close them during the |
| three-month internship, December 2021 - March 2022): |
| |
| * https://github.com/apache/airflow/issues/12282 |
| * https://github.com/apache/airflow/issues/13182 |
| |
| There are a number of problems with Bash scripts: |
| |
| * They are difficult to understand, modify and debug, as Bash "magic" is somewhat arcane |
| * It is difficult to implement complex logic in them |
| * Navigating common code that is used from the scripts is cumbersome and lacks IDE/tools support |
| * The default Bash on macOS is very old (from the 3.* line) and it will not be updated to a newer version, |
| which impacts cross-platform Breeze applicability |
| * On Windows, Bash only works well in a WSL2 environment, which further undermines the cross-platform |
| abilities of testing and running Airflow |
| |
| In contrast, Python has become much more appealing as a common scripting language that can be |
| cross-platform and ubiquitous, now that Python 2 reached its end of life in January 2020. |
| |
| This is the current state of lines of code in the project (generated by `sloccount`): |
| |
| ``` |
| SLOC Directory SLOC-by-Language (Sorted) |
| |
| 144905 tests python=144761,xml=132,sh=12 |
| 130115 airflow python=127249,javascript=2827,sh=39 |
| 12052 docs javascript=8977,python=2931,sh=144 |
| 9073 scripts sh=7457,python=1616 |
| 6314 chart python=6218,sh=96 |
| 3665 top_dir sh=2896,python=769 |
| 3102 dev python=2938,sh=164 |
| 1723 kubernetes_tests python=1723 |
| 280 docker_tests python=280 |
| 140 metastore_browser python=140 |
| 109 clients sh=109 |
| 28 images sh=28 |
| |
| Totals grouped by language (dominant language first): |
| python: 288625 (92.65%) |
| javascript: 11804 (3.79%) |
| sh: 10945 (3.51%) |
| xml: 132 (0.04%) |
| ``` |
| |
| We now have >10K lines of shell code. We would declare the project a success if the shell code shrinks to fewer |
| than `300` lines or so, constituting less than `0.1%` of the code base. |
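The arithmetic behind that target can be checked with a short sketch. The totals are copied from the `sloccount` output above; the helper name `share_percent` is hypothetical, and the calculation assumes the overall size of the code base stays roughly the same after the rewrite.

```python
# Totals copied from the "Totals grouped by language" part of the sloccount output.
sloc_by_language = {
    "python": 288625,
    "javascript": 11804,
    "sh": 10945,
    "xml": 132,
}


def share_percent(lines: int, total: int) -> float:
    """Return `lines` as a percentage of `total` (hypothetical helper)."""
    return 100.0 * lines / total


total = sum(sloc_by_language.values())

# Current share of shell code in the code base.
current_sh_share = share_percent(sloc_by_language["sh"], total)

# Share that 300 lines of shell would represent.
# Assumption: the total line count stays roughly unchanged after the rewrite.
target_sh_share = share_percent(300, total)

print(f"total lines: {total}")
print(f"current shell share: {current_sh_share:.2f}%")
print(f"target shell share: {target_sh_share:.3f}%")
```

Under that assumption, 300 lines of shell sit just below the `0.1%` threshold.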
| |
| ## Decision |
| |
| The main decision is: |
| |
| **The vast majority of both Breeze and our CI scripts should be Python-based** |
| |
| There are likely a number of scripts that will remain in Bash, but they should contain no sophisticated |
| logic, they should not have common code in the form of libraries, and they should only be used to execute |
| simple tasks inside Docker containers. No Bash should ever be used in the host environment. |
| |
| There are a few properties of Breeze/CI scripts that should be maintained, though: |
| |
| * It should be possible to start Breeze and run any of the CI scripts without having a specially prepared |
| virtualenv. If a virtualenv is needed, it should be prepared and maintained automatically |
| by the script being run. The idea is that a new person starting their adventure with Airflow can simply |
| run a command and get everything done with the fewest possible prerequisites |
| |
| * The prerequisites for Breeze and CI are: |
|   * Python 3.8+ (Python 3.8 end of life is October 2024) |
|   * Docker (23.0+) |
|   * Docker Compose (2.16.0+) |
| * No other tools and CLI commands should be needed |
| * The Python requirements should be automatically installed in a "Breeze" venv when missing and updated |
| automatically when needed. The number of Python dependencies needed to run Breeze and CI scripts |
| should be minimal in order to make the virtualenv installation portable to Linux, macOS and Windows |
| environments. |
| |
| * There are some basic assumptions that result from our common patterns across other components we use: |
|   * we use the `rich` library for colouring terminal output. Using terminal colours wisely |
|     is an essential part of the developer experience. We will have to standardize colour usage in |
|     a follow-up ADR |
|   * we use the `click` library to provide command line parsing and autocompletion (in the future). Click is |
|     a comprehensive library with a clean, decorator-based interface and provides rich customisation options |
|   * we use `pytest` to run automated tests for our code |
| * until we are ready to share it with developers, the new `Breeze` script resides in the `dev/Breeze` folder, |
| without yet linking it from the main directory of Airflow. Later we will link to it from the main directory, |
| likely as a `breeze` script (in some environments where the filesystem is case-insensitive, such as macOS, |
| you cannot put two files differing only by case in the same folder). |
| * There is enough overlap between the CI and Breeze to reuse a lot of commands for building images and |
| other CI actions, so they should be shared between Breeze and CI. Therefore `dev/Breeze` will |
| also become a home for all the CI scripts that will be used in GitHub Actions in CI. |
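The prerequisites listed above could be verified by a small, dependency-free check that runs before anything else. The following is only a minimal sketch under stated assumptions, not the actual Breeze implementation: the function names (`parse_version`, `is_at_least`, `tool_version`) are hypothetical, and it assumes that `docker --version` and `docker compose version` print a dotted version number somewhere in their output.

```python
import shutil
import subprocess
import sys
from re import compile as re_compile
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]

# Minimum versions taken from the prerequisites listed above.
MIN_PYTHON: Version = (3, 8, 0)
MIN_DOCKER: Version = (23, 0, 0)
MIN_DOCKER_COMPOSE: Version = (2, 16, 0)

# Matches "major.minor" with an optional ".patch" component.
_VERSION_RE = re_compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(text: str) -> Optional[Version]:
    """Extract the first dotted version found in `text`, or None (hypothetical helper)."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0 so tuples compare consistently.
    return (int(major), int(minor), int(patch or 0))


def is_at_least(found: Optional[Version], minimum: Version) -> bool:
    """Return True when a version was found and it is not older than `minimum`."""
    return found is not None and found >= minimum


def tool_version(command: List[str]) -> Optional[Version]:
    """Run a version command and parse its output; None if the tool is unavailable."""
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, check=False, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_version(result.stdout)


if __name__ == "__main__":
    checks = {
        "Python": is_at_least(tuple(sys.version_info[:3]), MIN_PYTHON),
        "Docker": is_at_least(tool_version(["docker", "--version"]), MIN_DOCKER),
        "Docker Compose": is_at_least(
            tool_version(["docker", "compose", "version"]), MIN_DOCKER_COMPOSE
        ),
    }
    for name, ok in checks.items():
        print(f"{name}: {'OK' if ok else 'missing or too old'}")
```

The sketch uses only the standard library, so it can run before any virtualenv exists, which is in line with the requirement that no specially prepared environment is needed to start.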
| |
| ## Consequences |
| |
| The consequences of the change should be largely invisible to the current users of Breeze. They should be |
| able to perform the same actions and operations as in the Bash version (with a possible later decision to |
| deprecate or remove some commands). The biggest consequence should be for the whole development |
| community of Airflow: for them, modifying, extending and fixing Breeze and the CI environment should |
| become much more appealing. |
| |
| The old script should remain and be maintained until enough of the most important functionality of the |
| original Breeze script has been rewritten. |