.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
.. raw:: html
<div align="center">
<img src="images/AirflowBreeze_logo.png"
alt="Airflow Breeze - Development and Test Environment for Apache Airflow">
</div>
.. contents:: :local:
Airflow Breeze CI environment
=============================
Airflow Breeze is an easy-to-use development and test environment using
`Docker Compose <https://docs.docker.com/compose/>`_.
The environment is available for local use and is also used in Airflow's CI tests.
We call it *Airflow Breeze* as **It's a Breeze to contribute to Airflow**.
The advantages and disadvantages of using the Breeze environment vs. other ways of testing Airflow
are described in `CONTRIBUTING.rst <CONTRIBUTING.rst#integration-test-development-environment>`_.
Prerequisites
=============
Docker Desktop
--------------
- **Version**: Install the latest stable `Docker Desktop <https://docs.docker.com/get-docker/>`_
  and make sure it is in your PATH. ``Breeze`` detects if you are using a version that is too
  old and warns you to upgrade.
- **Permissions**: Configure Docker so that the ``docker`` commands can be run directly, not only via the root user.
  Your user should be in the ``docker`` group.
See `Docker installation guide <https://docs.docker.com/install/>`_ for details.
- **Disk space**: On macOS, increase your available disk space before starting to work with
  the environment. At least 20 GB of free disk space is recommended. You can also get by with
  less space, but make sure to clean up the Docker disk space periodically.
See also `Docker for Mac - Space <https://docs.docker.com/docker-for-mac/space>`_ for details
on increasing disk space available for Docker on Mac.
- **Docker problems**: Sometimes it is not obvious that disk space is the issue when you run into
  a problem with Docker. If you see weird behaviour, try the ``breeze cleanup`` command.
  Also see the `pruning <https://docs.docker.com/config/pruning/>`_ instructions from Docker.
Here is an example configuration with more than 200GB disk space for Docker:
.. raw:: html
<div align="center">
<img src="images/disk_space_osx.png" width="640"
alt="Disk space MacOS">
</div>
- **Docker is not running**: you may get this error even though Docker Desktop is running. This is an issue
  specific to Docker Desktop 4.13.0 (released in late October 2022). Please upgrade Docker
  Desktop to 4.13.1 or later to resolve the issue. For technical details, see also
  `docker/for-mac#6529 <https://github.com/docker/for-mac/issues/6529>`_.
**Docker errors that may occur while running breeze**

- If the ``docker`` command does not work for your user (for example when running breeze from a Python
  virtual environment):

  **Solution**

  1. Create the docker group if it does not exist:

     ``sudo groupadd docker``

  2. Add your user to the docker group:

     ``sudo usermod -aG docker $USER``

  3. Log in to the new docker group:

     ``newgrp docker``

  4. Check that docker can be run without root:

     ``docker run hello-world``
|
|
Note: If you use Colima, please follow the instructions in the `Contributors Quick Start Guide <https://github.com/apache/airflow/blob/main/CONTRIBUTORS_QUICK_START.rst>`__.
Docker Compose
--------------
- **Version**: Install the latest stable `Docker Compose <https://docs.docker.com/compose/install/>`_
  and add it to the PATH. ``Breeze`` detects if you are using a version that is too old and warns you to upgrade.
- **Permissions**: Configure permissions so that the ``docker-compose`` command can be run by your user.
Docker in WSL 2
---------------
- **WSL 2 installation** :
  Install WSL 2 and a Linux distro (e.g. Ubuntu); see the
  `WSL 2 Installation Guide <https://docs.microsoft.com/en-us/windows/wsl/install-win10>`_ for details.
- **Docker Desktop installation** :
Install Docker Desktop for Windows. For Windows Home follow the
`Docker Windows Home Installation Guide <https://docs.docker.com/docker-for-windows/install-windows-home>`_.
For Windows Pro, Enterprise, or Education follow the
`Docker Windows Installation Guide <https://docs.docker.com/docker-for-windows/install/>`_.
- **Docker setting** :
WSL integration needs to be enabled
.. raw:: html
<div align="center">
<img src="images/docker_wsl_integration.png" width="640"
alt="Airflow Breeze - Docker WSL2 integration">
</div>
- **WSL 2 Filesystem Performance** :
  Accessing the host Windows filesystem incurs a performance penalty;
  it is therefore recommended to do development on the Linux filesystem.
  E.g., run ``cd ~``, create a development folder in your Linux distro home,
  and clone the Airflow repo there.
- **WSL 2 Docker mount errors**:
  Another reason to use the Linux filesystem is that sometimes - depending on the length of
  your path - you might get strange errors when you try to start ``Breeze``, such as
  ``caused: mount through procfd: not a directory: unknown:``. Therefore, checking out
  Airflow on a Windows-mounted filesystem is strongly discouraged.
- **WSL 2 Docker volume remount errors**:
If you're experiencing errors such as ``ERROR: for docker-compose_airflow_run
Cannot create container for service airflow: not a directory`` when starting Breeze
after the first time or an error like ``docker: Error response from daemon: not a directory.
See 'docker run --help'.`` when running the pre-commit tests, you may need to consider
`installing Docker directly in WSL 2 <https://dev.to/bowmanjd/install-docker-on-windows-wsl-without-docker-desktop-34m9>`_
instead of using Docker Desktop for Windows.
- **WSL 2 Memory Usage** :
WSL 2 can consume a lot of memory under the process name "Vmmem". To reclaim the memory after
development you can:
* On the Linux distro clear cached memory: ``sudo sysctl -w vm.drop_caches=3``
* If no longer using Docker you can quit Docker Desktop
  (right-click the system tray icon and select "Quit Docker Desktop")
* If no longer using WSL you can shut it down on the Windows Host
with the following command: ``wsl --shutdown``
- **Developing in WSL 2**:
  You can use all the standard Linux command line utilities to develop on WSL 2.
  Furthermore, VS Code supports developing on Windows while remotely executing in WSL.
  If VS Code is installed on the Windows host system, then in the WSL Linux distro
  you can run ``code .`` in the root directory of your Airflow repo to launch VS Code.
The pipx tool
--------------
We use the ``pipx`` tool to install and manage Breeze. The ``pipx`` tool is created by the creators
of ``pip`` from the `Python Packaging Authority <https://www.pypa.io/en/latest/>`_.

Install pipx:

.. code-block:: bash

    pip install --user pipx

Breeze is not globally accessible until your PATH is updated. Add ``<USER FOLDER>\.local\bin`` to your
PATH environment variable. This can be done automatically by the following command (follow the instructions printed):
.. code-block:: bash
pipx ensurepath
On macOS:
.. code-block:: bash
python -m pipx ensurepath
Resources required
==================
Memory
------
Minimum 4GB RAM for Docker Engine is required to run the full Breeze environment.
On macOS, 2GB of RAM are available for your Docker containers by default, but more memory is recommended
(4GB should be comfortable). For details see
`Docker for Mac - Advanced tab <https://docs.docker.com/v17.12/docker-for-mac/#advanced-tab>`_.
On Windows, expect the WSL 2 Linux distro and Docker containers to use 7-8 GB of RAM.
Disk
----
Minimum 40GB free disk space is required for your Docker containers.
On macOS, this might deteriorate over time, so you might need to increase the available space or run ``breeze cleanup``
periodically. For details see
`Docker for Mac - Advanced tab <https://docs.docker.com/v17.12/docker-for-mac/#advanced-tab>`_.
On WSL2 you might want to increase your Virtual Hard Disk by following:
`Expanding the size of your WSL 2 Virtual Hard Disk <https://docs.microsoft.com/en-us/windows/wsl/compare-versions#expanding-the-size-of-your-wsl-2-virtual-hard-disk>`_
There is a command ``breeze ci resource-check`` that you can run to check available resources. See below
for details.
Cleaning the environment
------------------------
You may need to clean up your Docker environment occasionally. The images are quite big
(1.5GB for both images needed for static code analysis and CI tests) and, if you often rebuild/update
them, you may end up with some unused image data.
To clean up the Docker environment:
1. Stop Breeze with ``breeze down`` (if Breeze is already running).
2. Run the ``breeze cleanup`` command.
3. Run ``docker images --all`` and ``docker ps --all`` to verify that your Docker is clean.
Both commands should return an empty list of images and containers respectively.
If you run into disk space errors, consider pruning your Docker images with the ``docker system prune --all``
command. You may need to restart the Docker Engine before running this command.
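For example, a typical cleanup session might look like this (a sketch based on the steps above):

.. code-block:: bash

    breeze down                  # stop the Breeze environment if it is running
    breeze cleanup               # remove unused Breeze images, networks and cache
    docker system prune --all    # prune any remaining unused Docker data
    docker images --all          # should list no leftover images
    docker ps --all              # should list no leftover containers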
In case of disk space errors on macOS, increase the disk space available for Docker. See
`Prerequisites <#prerequisites>`_ for details.
Installation
============
Set your working directory to the root of this cloned repository.

Run this command to install Breeze (make sure to use the ``-e`` flag):
.. code-block:: bash
pipx install -e ./dev/breeze
Once this is complete, you should have the ``breeze`` binary on your PATH, runnable via the ``breeze``
command.

These are all the available Breeze commands; details about them are described below:
.. image:: ./images/breeze/output-commands.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output-commands.svg
:width: 100%
:alt: Breeze commands
Breeze installed this way is linked to your checked-out sources of Airflow, so Breeze will
automatically use the latest version of the sources from ``./dev/breeze``. Sometimes, when dependencies are
updated, ``breeze`` commands will offer you to run a self-upgrade.

You can run such a self-upgrade at any time:
.. code-block:: bash
breeze setup self-upgrade
If you have several checked-out Airflow sources, Breeze will warn you if you are using it from a different
source tree and will offer to re-install from those sources - to make sure that you are using the right
version.

You can skip Breeze's upgrade check by setting the ``SKIP_BREEZE_UPGRADE_CHECK`` variable to a non-empty value.
By default, Breeze works on the version of the Airflow sources that you run it in. If you are outside of the
Airflow sources and you installed Breeze from a directory, Breeze will run on the Airflow sources from
where it was installed.

You can run the ``breeze setup version`` command to see where Breeze was installed from and what sources
Breeze currently works on.
Running Breeze for the first time
---------------------------------
The first time you run Breeze, it pulls and builds a local version of the Docker images.
It pulls the latest Airflow CI images from the
`GitHub Container Registry <https://github.com/orgs/apache/packages?repo_name=airflow>`_
and uses them to build your local Docker images. Note that the first run (per Python version) might take up to 10
minutes on a fast connection to start. Subsequent runs should be much faster.
Once you enter the environment, you are dropped into the bash shell of the Airflow container and you can
run tests immediately.

To use the full potential of breeze you should set up autocomplete. The ``breeze`` command comes
with a built-in bash/zsh/fish autocomplete setup command. After installing it,
when you start typing a command, you can use <TAB> to show all the available switches and get
auto-completion of typical parameter values.

You can set up the autocomplete option automatically by running:
.. code-block:: bash
breeze setup autocomplete
Automating breeze installation
------------------------------
Breeze on POSIX-compliant systems (Linux, macOS) can be automatically installed by running the
``scripts/tools/setup_breeze`` bash script. This includes checking and installing ``pipx``, setting up
``breeze`` with it, and setting up autocomplete.
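For example, from the root of your Airflow checkout:

.. code-block:: bash

    ./scripts/tools/setup_breeze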
Customizing your environment
----------------------------
When you enter the Breeze environment, an environment file is automatically sourced from
``files/airflow-breeze-config/variables.env``.

You can also add a ``files/airflow-breeze-config/init.sh`` script, and it will always be sourced
when you enter Breeze. For example, you can add ``pip install`` commands if you want to install
custom dependencies - there are no limits on the customizations you can add.
You can override the name of the init script by setting the ``INIT_SCRIPT_FILE`` environment variable before
running the breeze environment.

You can also customize your environment by setting the ``BREEZE_INIT_COMMAND`` environment variable. This variable
will be evaluated when entering the environment.
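For example (the command value here is purely illustrative - use whatever setup your workflow needs):

.. code-block:: bash

    # hypothetical init command, evaluated when entering the environment
    export BREEZE_INIT_COMMAND="echo 'custom breeze init'"
    breeze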
The ``files`` folder from your local sources is automatically mounted to the container under the
``/files`` path, and you can put there any files you want to make available to the Breeze container.

You can also copy any ``.whl`` or ``.sdist`` packages to the ``dist`` folder and, when you pass the
``--use-packages-from-dist`` flag (with ``wheel`` or ``sdist`` as the package format), Breeze will
automatically install the packages found there when you enter Breeze.
You can also add your local tmux configuration in ``files/airflow-breeze-config/.tmux.conf``, and
it will be available in your tmux environment.

There is a symlink between ``files/airflow-breeze-config/.tmux.conf`` and ``~/.tmux.conf`` in the container,
so you can change it in either place and run

.. code-block:: bash

    tmux source ~/.tmux.conf

inside the container to enable the modified tmux configuration.
Regular development tasks
=========================
The regular Breeze development tasks are available as top-level commands. Those tasks are most often
used during development, which is why they are available without any sub-command. More advanced
commands are grouped into sub-commands.
Entering Breeze shell
---------------------
This is the most often used feature of Breeze. It allows you to enter the shell inside the Breeze
development environment (inside the Breeze container).

You can use additional ``breeze`` flags to choose your environment. You can specify the Python
version to use and the backend (the meta-data database). Thanks to that, with Breeze, you can recreate the same
environments as we have in matrix builds in the CI.

For example, you can choose to run Python 3.7 tests with MySQL as the backend and with MySQL version 8
as follows:
.. code-block:: bash
breeze --python 3.7 --backend mysql --mysql-version 8
The choices you make are persisted in the ``./.build/`` cache directory so that the next time you use the
``breeze`` script, it will use the values that were used previously. This way you do not have to specify
them when you run the script. You can delete the ``.build/`` directory to restore the
default settings.

Parameter values that can be stored persistently in the cache are marked with >VALUE<
in the help of the commands.
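For example, the caching behaves roughly like this (a sketch of the behaviour described above):

.. code-block:: bash

    breeze --python 3.7 --backend mysql   # choices are cached in ./.build/
    breeze                                # re-uses Python 3.7 and MySQL from the cache
    rm -rf ./.build/                      # resets the cached defaults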
Building the documentation
--------------------------
To build documentation in Breeze, use the ``build-docs`` command:
.. code-block:: bash
breeze build-docs
Results of the build can be found in the ``docs/_build`` folder.
The documentation build consists of three steps:
* verifying consistency of indexes
* building documentation
* spell checking
You can choose to run only one of the two main stages by providing the ``--spellcheck-only`` or
``--docs-only`` flag:
.. code-block:: bash
breeze build-docs --spellcheck-only
This process can take some time, so you can shorten it by filtering by package with the
``--package-filter <PACKAGE-NAME>`` flag. The package name has to be one of the providers or ``apache-airflow``. For
instance, to use it with the Amazon provider, the command would be:
.. code-block:: bash
breeze build-docs --package-filter apache-airflow-providers-amazon
Often errors during documentation generation come from the docstrings of auto-api generated classes.
During the docs build, auto-api generated files are stored in the ``docs/_api`` folder. This helps you
easily identify where the documentation problems originated.
Those are all available flags of ``build-docs`` command:
.. image:: ./images/breeze/output_build-docs.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_build-docs.svg
:width: 100%
:alt: Breeze build documentation
Running static checks
---------------------
You can run static checks via Breeze. You can also run them via the pre-commit command, but with auto-completion
Breeze makes it easier to run selective static checks. If you press <TAB> after ``static-checks`` and
you have auto-complete set up, you should see an auto-completable list of all available checks.

For example, the following command:
.. code-block:: bash
breeze static-checks -t mypy-core
will run mypy check for currently staged files inside ``airflow/`` excluding providers.
Selecting files to run static checks on
........................................
By default, pre-commit checks run on the changes you have staged locally: all the files you have added
with ``git add``. Changes that are modified but not staged are ignored. If you want the checks to run on
all your modified files, stage them first with the ``git add`` command.
With ``--all-files`` you can run static checks on all files in the repository. This is useful when you
want to be sure they will not fail in CI, or when you have just rebased your changes and want to
re-run the latest pre-commits on them, but it can take a long time (a few minutes) to get the result.
.. code-block:: bash
breeze static-checks -t mypy-core --all-files
The above will run mypy check for all files.
You can limit that by selecting specific files you want to run static checks on. You can do that by
specifying the ``--file`` flag (it can be used multiple times).
.. code-block:: bash
breeze static-checks -t mypy-core --file airflow/utils/code_utils.py --file airflow/utils/timeout.py
The above will run the mypy check for those two files (note: autocomplete should work for the file selection).
However, often you do not remember which files you modified and you want to run checks for files that belong
to specific commits you already have in your branch. You can use ``breeze static-checks`` to run the checks
only on changed files you have already committed to your branch - either for a specific commit, for the last
commit, for all changes in your branch since you branched off from main, or for a specific range
of commits you choose.
.. code-block:: bash
breeze static-checks -t mypy-core --last-commit
The above will run mypy check for all files in the last commit in your branch.
.. code-block:: bash
breeze static-checks -t mypy-core --only-my-changes
The above will run the mypy check for all the files changed in your branch since you branched off from main.
.. code-block:: bash
breeze static-checks -t mypy-core --commit-ref 639483d998ecac64d0fef7c5aa4634414065f690
The above will run mypy check for all files in the 639483d998ecac64d0fef7c5aa4634414065f690 commit.
Any ``commit-ish`` reference from Git will work here (branch, tag, short/long hash etc.)
.. code-block:: bash
breeze static-checks -t identity --verbose --from-ref HEAD^^^^ --to-ref HEAD
The above will run the check for the last 4 commits in your branch. You can use any ``commit-ish`` references
in ``--from-ref`` and ``--to-ref`` flags.
Those are all available flags of ``static-checks`` command:
.. image:: ./images/breeze/output_static-checks.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_static-checks.svg
:width: 100%
:alt: Breeze static checks
.. note::
When you run static checks, some artifacts (e.g. the mypy cache) are stored in a docker-compose volume,
which speeds up static check execution significantly. However, sometimes the cache might
get broken, in which case you should run ``breeze down`` to clean it up.
.. note::
You cannot change the Python version for static checks that are run within Breeze containers.
The ``--python`` flag has no effect on them. They are always run with the lowest supported Python version.
The main reason is to keep consistency in the results of static checks and to make sure that
our code works when running the lowest supported version.
Starting Airflow
----------------
For testing Airflow you often want to start multiple components (in multiple terminals). Breeze has a
built-in ``start-airflow`` command that starts the Breeze container, launches multiple terminals using tmux,
and launches all necessary Airflow components in those terminals.

When you start Airflow from local sources, the www asset compilation is automatically executed first.
.. code-block:: bash
breeze --python 3.7 --backend mysql start-airflow
You can also use it to start any released version of Airflow from ``PyPI`` with the
``--use-airflow-version`` flag.
.. code-block:: bash
breeze start-airflow --python 3.7 --backend mysql --use-airflow-version 2.2.5
Those are all available flags of ``start-airflow`` command:
.. image:: ./images/breeze/output_start-airflow.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_start-airflow.svg
:width: 100%
:alt: Breeze start-airflow
Launching multiple terminals in the same environment
----------------------------------------------------
Often, to run full Airflow in the Breeze environment, you need to launch multiple terminals and
run ``airflow webserver``, ``airflow scheduler``, and ``airflow worker`` in separate terminals.

This can be achieved either via ``tmux`` or via exec-ing into the running container from the host. Tmux
is installed inside the container and you can launch it with the ``tmux`` command. Tmux provides you with the
capability of creating multiple virtual terminals and multiplexing between them. More about ``tmux`` can be
found at the `tmux GitHub wiki page <https://github.com/tmux/tmux/wiki>`_. Tmux has several useful shortcuts
that allow you to split the terminals, open new tabs, etc.; it is pretty useful to learn.
Another way is to exec into the Breeze terminal from the host's terminal. Often you
have multiple terminals on the host (Linux/macOS/WSL2 on Windows), and you can simply use those terminals
to enter the running container. It's as easy as launching ``breeze exec`` while the
Breeze environment is already started. You will be dropped into bash with environment variables read in the same
way as when you enter the environment. You can do it multiple times and open as many terminals as you need.
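For example, with Breeze already running in one terminal, you can open another host terminal and run:

.. code-block:: bash

    breeze exec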
Those are all available flags of ``exec`` command:
.. image:: ./images/breeze/output_exec.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_exec.svg
:width: 100%
:alt: Breeze exec
Compiling www assets
--------------------
The Airflow webserver needs the www assets to be prepared - compiled with node and yarn. The ``compile-www-assets``
command takes care of it. This is needed when you want to run the webserver inside Breeze.
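To compile the assets manually, run:

.. code-block:: bash

    breeze compile-www-assets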
.. image:: ./images/breeze/output_compile-www-assets.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_compile-www-assets.svg
:width: 100%
:alt: Breeze compile-www-assets
Breeze cleanup
--------------
Sometimes you need to clean up your docker environment (and it is recommended you do that regularly). There
are several reasons why you might want to do that.

Breeze uses docker images heavily. Those images are rebuilt periodically and might leave dangling, unused
images in the docker cache, which causes extra disk usage. Also, running various docker compose commands
(for example running tests with ``breeze testing tests``) might create additional docker networks that might
prevent new networks from being created. Those networks are not removed automatically by docker-compose.
Also, Breeze uses its own cache to keep information about all images.
All those unused images, networks and cache can be removed by running the ``breeze cleanup`` command. By default
it will not remove the most recent images that you might need to run breeze commands, but you
can also remove those breeze images to clean up everything by adding the ``--all`` flag (note that you will
need to build the images again from scratch - pulling from the registry might take a while).

Breeze will ask you to confirm each step, unless you specify the ``--answer yes`` flag.
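For example, to remove everything, including the most recent Breeze images, without being prompted:

.. code-block:: bash

    breeze cleanup --all --answer yes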
Those are all available flags of ``cleanup`` command:
.. image:: ./images/breeze/output_cleanup.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_cleanup.svg
:width: 100%
:alt: Breeze cleanup
Running arbitrary commands in container
---------------------------------------
A more sophisticated way of using the Breeze shell is the ``breeze shell`` command - it has more parameters
and you can also use it to execute arbitrary commands inside the container.
.. code-block:: bash
breeze shell "ls -la"
Those are all available flags of ``shell`` command:
.. image:: ./images/breeze/output_shell.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_shell.svg
:width: 100%
:alt: Breeze shell
Running Breeze with Metrics
---------------------------
Running Breeze with a StatsD Metrics Stack
..........................................
You can launch an instance of Breeze pre-configured to emit StatsD metrics using
``breeze start-airflow --integration statsd``. This will launch an Airflow webserver
within the Breeze environment as well as containers running StatsD, Prometheus, and
Grafana. The integration configures the "Targets" in Prometheus, the "Datasources" in
Grafana, and includes a default dashboard in Grafana.
When you run Airflow Breeze with this integration, in addition to the standard ports
(See "Port Forwarding" below), the following are also automatically forwarded:
* 29102 -> forwarded to StatsD Exporter -> breeze-statsd-exporter:9102
* 29090 -> forwarded to Prometheus -> breeze-prometheus:9090
* 23000 -> forwarded to Grafana -> breeze-grafana:3000
You can connect to these ports/databases using:
* StatsD Metrics: http://127.0.0.1:29102/metrics
* Prometheus Targets: http://127.0.0.1:29090/targets
* Grafana Dashboards: http://127.0.0.1:23000/dashboards
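For example, you can quickly check from the host that the endpoints respond:

.. code-block:: bash

    curl http://127.0.0.1:29102/metrics   # StatsD Exporter metrics
    curl http://127.0.0.1:29090/targets   # Prometheus targets page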
Running Breeze with an OpenTelemetry Metrics Stack
..................................................
----
[Work in Progress]
NOTE: This will launch the stack as described below but Airflow integration is
still a Work in Progress. This should be considered experimental and likely to
change by the time Airflow fully supports emitting metrics via OpenTelemetry.
----
You can launch an instance of Breeze pre-configured to emit OpenTelemetry metrics
using ``breeze start-airflow --integration otel``. This will launch Airflow within
the Breeze environment as well as containers running OpenTelemetry-Collector,
Prometheus, and Grafana. The integration handles all configuration of the
"Targets" in Prometheus and the "Datasources" in Grafana, so it is ready to use.
When you run Airflow Breeze with this integration, in addition to the standard ports
(See "Port Forwarding" below), the following are also automatically forwarded:
* 28889 -> forwarded to OpenTelemetry Collector -> breeze-otel-collector:8889
* 29090 -> forwarded to Prometheus -> breeze-prometheus:9090
* 23000 -> forwarded to Grafana -> breeze-grafana:3000
You can connect to these ports using:
* OpenTelemetry Collector: http://127.0.0.1:28889/metrics
* Prometheus Targets: http://127.0.0.1:29090/targets
* Grafana Dashboards: http://127.0.0.1:23000/dashboards
Stopping the environment
------------------------
After starting up, the environment runs in the background and takes quite some memory, which you might
want to free for other things you are running on your host.
You can always stop it via:
.. code-block:: bash
breeze down
Those are all available flags of ``down`` command:
.. image:: ./images/breeze/output_down.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_down.svg
:width: 100%
:alt: Breeze down
Troubleshooting
===============
If you are having problems with the Breeze environment, try the steps below. After each step you
can check whether your problem is fixed.
1. If you are on macOS, check if you have enough disk space for Docker (Breeze will warn you if not).
2. Stop Breeze with ``breeze down``.
3. Delete the ``.build`` directory and run ``breeze ci-image build``.
4. Clean up Docker images via ``breeze cleanup`` command.
5. Restart your Docker Engine and try again.
6. Restart your machine and try again.
7. Re-install Docker Desktop and try again.
In case the problems are not solved, you can set the VERBOSE_COMMANDS variable to "true":
.. code-block::
export VERBOSE_COMMANDS="true"
Then run the failed command, copy-and-paste the output from your terminal to the
`Airflow Slack <https://s.apache.org/airflow-slack>`_ #airflow-breeze channel and
describe your problem.
.. warning::
Some operating systems (Fedora, ArchLinux, RHEL, Rocky) have recently introduced Kernel changes that result in
Airflow in Breeze consuming 100% memory when run inside the community Docker implementation maintained
by the OS teams.
This is an issue with backwards-incompatible containerd configuration that some of Airflow's dependencies
have problems with and is tracked in a few issues:

* `Moby issue <https://github.com/moby/moby/issues/43361>`_
* `Containerd issue <https://github.com/containerd/containerd/pull/7566>`_

There is no solution yet from the containerd team, but it seems that installing
`Docker Desktop on Linux <https://docs.docker.com/desktop/install/linux-install/>`_ solves the problem, as
stated in `this comment <https://github.com/moby/moby/issues/43361#issuecomment-1227617516>`_, and allows
running Breeze with no problems.
ETIMEOUT Error
--------------
When running ``breeze start-airflow``, the following output might be observed:
.. code-block:: bash
Skip fixing ownership of generated files as Host OS is darwin
Waiting for asset compilation to complete in the background.
Still waiting .....
Still waiting .....
Still waiting .....
Still waiting .....
Still waiting .....
Still waiting .....
The asset compilation is taking too long.
If it does not complete soon, you might want to stop it and remove file lock:
* press Ctrl-C
* run 'rm /opt/airflow/.build/www/.asset_compile.lock'
Still waiting .....
Still waiting .....
Still waiting .....
Still waiting .....
Still waiting .....
Still waiting .....
Still waiting .....
The asset compilation failed. Exiting.
[INFO] Locking pre-commit directory
Error 1 returned
This error is actually caused by the following error during asset compilation, which resulted in
ETIMEDOUT when the ``npm`` command was trying to install required packages:
.. code-block:: bash
npm ERR! code ETIMEDOUT
npm ERR! syscall connect
npm ERR! errno ETIMEDOUT
npm ERR! network request to https://registry.npmjs.org/yarn failed, reason: connect ETIMEDOUT 2606:4700::6810:1723:443
npm ERR! network This is a problem related to network connectivity.
npm ERR! network In most cases you are behind a proxy or have bad network settings.
npm ERR! network
npm ERR! network If you are behind a proxy, please make sure that the
npm ERR! network 'proxy' config is set properly. See: 'npm help config'
In this situation, notice that the address ``2606:4700::6810:1723:443`` is in IPv6 format, which was the
reason why the connection did not go through the router: the router did not support IPv6 addresses in its DNS lookup.
In this case, disabling IPv6 on the host machine and using IPv4 instead resolved the issue.
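As an illustrative sketch (assuming a Linux host - consult your OS documentation before changing
network settings), IPv6 can be disabled temporarily via ``sysctl``:

.. code-block:: bash

    # disable IPv6 until the next reboot (illustrative - verify the exact settings for your distro)
    sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
    sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1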
A similar issue can happen if you are behind an HTTP/HTTPS proxy and your access to the required websites is
blocked by it, or if your proxy is not set up properly.
Advanced commands
=================
Airflow Breeze serves as a "swiss-army-knife" of Airflow testing. Under the
hood it uses other scripts that you can also run manually if you have problems with running the Breeze
environment. Breeze allows performing the following tasks:
Running tests
-------------
You can run tests with ``breeze``. There are various test types, and breeze allows you to run the different test
types easily. You can run unit tests in different ways: either interactively with the default
``shell`` command or via the ``testing`` commands. The latter allows you to run various kinds of tests easily.
Here is the detailed set of options for the ``breeze testing`` command.
.. image:: ./images/breeze/output_testing.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_testing.svg
:width: 100%
:alt: Breeze testing
Iterate on tests interactively via ``shell`` command
....................................................
You can simply enter the ``breeze`` container and run the ``pytest`` command there. You can enter the
container via the plain ``breeze`` command or the ``breeze shell`` command (the latter has more options,
useful when you run integration or system tests). This is the best way if you want to interactively
run selected tests and iterate on them. Once you enter the ``breeze`` environment it is ready
out-of-the-box to run your tests by running the right ``pytest`` command (autocomplete should help
you with completing the test name if you start typing ``pytest tests<TAB>``).

Here are a few examples:
Running single test:
.. code-block:: bash
pytest tests/core/test_core.py::TestCore::test_check_operators
To run the whole test class:
.. code-block:: bash
pytest tests/core/test_core.py::TestCore
You can re-run the tests interactively, add extra parameters to pytest, and modify the files before
re-running the test to iterate over the tests. You can also add more flags when starting the
``breeze shell`` command when you run integration tests or system tests. Read more details about it
in `TESTING.rst <TESTING.rst>`_, where all our test types are explained along with more information
on how to run them.

This applies to all kinds of tests - all our tests can be run using pytest.
Running unit tests
..................
Another option is to run tests via the built-in ``breeze testing tests`` command.
The interactive ``pytest`` command allows you to run tests individually, by class, or in any other way
pytest supports, and to run them interactively, but the ``breeze testing tests`` command allows you to
run the tests in the same test "types" that are used to run the tests in CI: for example Core, Always,
API, Providers. This is how our CI runs them - each group in parallel to the other groups - and you can
replicate this behaviour.
Another interesting use of the ``breeze testing tests`` command is that you can easily specify a subset of the
tests for providers.

For example, this will only run provider tests for the airbyte and http providers:
.. code-block:: bash
breeze testing tests --test-type "Providers[airbyte,http]"
You can also exclude tests for some providers from being run when the whole "Providers" test type is run.
For example, this will run tests for all providers except the amazon and google provider tests:
.. code-block:: bash
breeze testing tests --test-type "Providers[-amazon,google]"
You can also run tests in parallel with the ``--run-in-parallel`` flag - by default it will run all test types
in parallel, but you can specify the test types that you want to run with a space-separated list
passed to the ``--parallel-test-types`` flag.

For example, this will run API and WWW tests in parallel:
.. code-block:: bash
breeze testing tests --parallel-test-types "API WWW" --run-in-parallel
There are a few special types of tests that you can run:

* ``All`` - all tests are run in a single pytest run.
* ``PlainAsserts`` - some of our tests fail when the ``--assert=rewrite`` feature of pytest is used (the
  feature that provides better output of ``assert`` statements). This special test type runs those
  selected tests with the ``--assert=plain`` flag.
* ``Postgres`` - runs all tests that require a Postgres database
* ``MySQL`` - runs all tests that require a MySQL database
* ``Quarantine`` - runs all tests that are in quarantine (marked with the ``@pytest.mark.quarantined``
  decorator)
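For example, to run only the quarantined tests:

.. code-block:: bash

    breeze testing tests --test-type Quarantine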
Here is the detailed set of options for the ``breeze testing tests`` command.
.. image:: ./images/breeze/output_testing_tests.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_testing_tests.svg
:width: 100%
:alt: Breeze testing tests
Running integration tests
.........................
You can also run integration tests via the built-in ``breeze testing integration-tests`` command. Some of our
tests require additional integrations to be started in docker-compose. The integration tests command will
start the expected integration and run the tests that need it.

For example, this will only run kerberos tests:
.. code-block:: bash
breeze testing integration-tests --integration kerberos
Here is the detailed set of options for the ``breeze testing integration-tests`` command.
.. image:: ./images/breeze/output_testing_integration-tests.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_testing_integration-tests.svg
:width: 100%
:alt: Breeze testing integration-tests
Running Helm tests
..................
You can use Breeze to run all Helm tests. Those tests are run inside the breeze image, as all the
necessary tools are installed there.
.. image:: ./images/breeze/output_testing_helm-tests.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_testing_helm-tests.svg
:width: 100%
:alt: Breeze testing helm-tests
You can also iterate over those tests with pytest commands, similarly to regular unit tests.
The helm tests can be found in the ``tests/chart`` folder in the main repo.
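For example, to iterate on the Helm tests inside the Breeze container (a sketch based on the
``tests/chart`` location mentioned above):

.. code-block:: bash

    breeze shell
    # ...then, inside the container:
    pytest tests/chart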
Running docker-compose tests
............................
You can use Breeze to run all docker-compose tests. Those tests are run using the production image,
and they test it with the quick-start docker compose we have.
.. image:: ./images/breeze/output_testing_docker-compose-tests.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_testing_docker-compose-tests.svg
:width: 100%
:alt: Breeze testing docker-compose-tests
You can also iterate over those tests with the pytest command, but - unlike regular unit tests and
Helm tests - they need to be run in a local virtual environment. They also require the
``DOCKER_IMAGE`` environment variable to be set, pointing to the image to test, if you do not run them
through the ``breeze testing docker-compose-tests`` command.

The docker-compose tests are in the ``docker-tests/`` folder in the main repo.
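A sketch of running them directly from a local virtualenv (the image name below is hypothetical - point
``DOCKER_IMAGE`` at the image you actually want to test):

.. code-block:: bash

    export DOCKER_IMAGE=my-airflow-prod-image:latest   # hypothetical image name
    pytest docker-tests/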
Running Kubernetes tests
------------------------
Breeze helps with running Kubernetes tests in the same environment/way as CI tests are run.
Breeze helps to set up a KinD cluster for testing, sets up a virtualenv, and automatically downloads
the right tools to run the tests.
You can:

* Set up the environment for k8s tests with ``breeze k8s setup-env``
* Build airflow k8s images with ``breeze k8s build-k8s-image``
* Manage the KinD Kubernetes cluster, upload images, and deploy Airflow to the KinD cluster via the
  ``breeze k8s create-cluster``, ``breeze k8s configure-cluster``, ``breeze k8s deploy-airflow``, ``breeze k8s status``,
  ``breeze k8s upload-k8s-image``, and ``breeze k8s delete-cluster`` commands
* Run Kubernetes tests specified with the ``breeze k8s tests`` command
* Run a complete test run with ``breeze k8s run-complete-tests`` - performing the full cycle of creating a
  cluster, uploading the image, deploying airflow, running tests, and deleting the cluster
* Enter the interactive kubernetes test environment with the ``breeze k8s shell`` and ``breeze k8s k9s`` commands
* Run multi-cluster operations via the ``breeze k8s list-all-clusters`` and
  ``breeze k8s delete-all-clusters`` commands, and dump logs from all clusters
  via the ``breeze k8s dump-logs`` command
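A typical end-to-end session chains the commands listed above (``breeze k8s run-complete-tests`` performs
roughly this cycle for you in a single command):

.. code-block:: bash

    breeze k8s setup-env           # prepare the virtualenv and tools (kind, kubectl, helm)
    breeze k8s create-cluster      # create the KinD cluster
    breeze k8s build-k8s-image     # build the airflow k8s image
    breeze k8s upload-k8s-image    # upload the image to the KinD cluster
    breeze k8s configure-cluster   # create the namespace and test resources
    breeze k8s deploy-airflow      # deploy airflow using the Helm chart
    breeze k8s tests               # run the k8s tests
    breeze k8s delete-cluster      # tear the cluster down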
This is described in detail in `Testing Kubernetes <TESTING.rst#running-tests-with-kubernetes>`_.
You can read more about KinD, which we use, in `its documentation <https://kind.sigs.k8s.io/>`_.
Here is the detailed set of options for the ``breeze k8s`` command.
.. image:: ./images/breeze/output_k8s.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s.svg
:width: 100%
:alt: Breeze k8s
Setting up K8S environment
..........................
The Kubernetes environment can be set up with the ``breeze k8s setup-env`` command.
It will create an appropriate virtualenv to run tests and download the right set of tools to run
the tests: ``kind``, ``kubectl`` and ``helm`` in the right versions. You can re-run the command
when you want to make sure the expected versions of the tools are installed properly in the
virtualenv. The virtualenv is available in the ``.build/.k8s-env/bin`` subdirectory of your Airflow
installation.
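If you want to use that virtualenv directly (assuming a standard virtualenv layout), you can activate it
from the root of your Airflow checkout:

.. code-block:: bash

    source .build/.k8s-env/bin/activate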
.. image:: ./images/breeze/output_k8s_setup-env.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_setup-env.svg
:width: 100%
:alt: Breeze k8s setup-env
Creating K8S cluster
....................
You can create a Kubernetes cluster (a separate cluster for each Python/Kubernetes version combination) via the
``breeze k8s create-cluster`` command. With the ``--force`` flag, the cluster will be
deleted first if it already exists. You can also use the ``--run-in-parallel`` flag to create multiple
clusters in parallel - this is what happens in our CI.
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_create-cluster.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_create-cluster.svg
:width: 100%
:alt: Breeze k8s create-cluster
Deleting K8S cluster
....................
You can delete the current Kubernetes cluster via the ``breeze k8s delete-cluster`` command. You can also add
the ``--run-in-parallel`` flag to delete all clusters.
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_delete-cluster.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_delete-cluster.svg
:width: 100%
:alt: Breeze k8s delete-cluster
Building Airflow K8s images
...........................
Before deploying the Airflow Helm Chart, you need to make sure the appropriate Airflow image is built (it has
embedded test dags and pod templates, and the webserver is configured to refresh immediately). This can
be done via the ``breeze k8s build-k8s-image`` command. It can also be done in parallel for all images via the
``--run-in-parallel`` flag.
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_build-k8s-image.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_build-k8s-image.svg
:width: 100%
:alt: Breeze k8s build-k8s-image
Uploading Airflow K8s images
............................
The Airflow K8s images need to be uploaded to the KinD cluster. This can be done via the
``breeze k8s upload-k8s-image`` command. It can also be done in parallel for all images via the
``--run-in-parallel`` flag.
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_upload-k8s-image.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_upload-k8s-image.svg
:width: 100%
:alt: Breeze k8s upload-k8s-image
Configuring K8S cluster
.......................
In order to deploy Airflow, the cluster needs to be configured: the Airflow namespace needs to be created
and test resources deployed. By passing ``--run-in-parallel``, the configuration can be run
for all clusters in parallel.
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_configure-cluster.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_configure-cluster.svg
:width: 100%
:alt: Breeze k8s configure-cluster
Deploying Airflow to the Cluster
................................
Airflow can be deployed to the cluster with ``breeze k8s deploy-airflow``. This step will automatically
(unless disabled by switches) rebuild the image to be deployed. It also uses the latest version
of the Airflow Helm Chart to deploy it. You can also choose to upgrade an existing airflow deployment
and pass extra arguments to the ``helm install`` or ``helm upgrade`` commands that are used to
deploy airflow. By passing ``--run-in-parallel``, the deployment can be run
for all clusters in parallel.
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_deploy-airflow.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_deploy-airflow.svg
:width: 100%
:alt: Breeze k8s deploy-airflow
Checking status of the K8S cluster
..................................
You can check the status of the current Kubernetes cluster and of the Airflow deployed to it
via the ``breeze k8s status`` command. The status can also be checked for all clusters created so far by passing
the ``--all`` flag.
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_status.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_status.svg
:width: 100%
:alt: Breeze k8s status
Running k8s tests
.................
You can run the ``breeze k8s tests`` command to run ``pytest`` tests against your cluster. Those tests are placed
in ``kubernetes_tests/`` and you can either specify the tests to run as parameters of the tests command or
leave them empty to run all tests. By passing ``--run-in-parallel``, the tests can be run
for all clusters in parallel.

Run all tests:

.. code-block:: bash

    breeze k8s tests

Run selected tests:

.. code-block:: bash

    breeze k8s tests test_kubernetes_executor.py
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_tests.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_tests.svg
:width: 100%
:alt: Breeze k8s tests
You can also specify any pytest flags as extra parameters - they will be passed to the
``pytest`` command directly. In case the pytest parameters are the same as the parameters of the ``tests``
command, you can pass them after ``--``. For example, this is how you can see all the available
parameters:
.. code-block:: bash

    breeze k8s tests -- --help
The options that do not overlap with the ``tests`` command options can be passed directly and mixed
with the specification of the tests you want to run. For example, the command below will only run
``test_kubernetes_executor.py`` and will suppress capturing output from pytest so that you can see the
output during test execution:

.. code-block:: bash

    breeze k8s tests -- test_kubernetes_executor.py -s
Running k8s complete tests
..........................
You can run the ``breeze k8s run-complete-tests`` command to combine all the previous steps in one command. That
command will create the cluster, deploy airflow, run the tests, and finally delete the cluster. It is used in CI
to run the whole chain in parallel.

Run all tests:

.. code-block:: bash

    breeze k8s run-complete-tests

Run selected tests:

.. code-block:: bash

    breeze k8s run-complete-tests test_kubernetes_executor.py
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_run-complete-tests.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_run-complete-tests.svg
:width: 100%
:alt: Breeze k8s run-complete-tests
You can also specify any pytest flags as extra parameters - they will be passed to the
``pytest`` command directly. In case the pytest parameters are the same as the parameters of the
command, you can pass them after ``--``. For example, this is how you can see all the available parameters:

.. code-block:: bash

    breeze k8s run-complete-tests -- --help
The options that do not overlap with the ``run-complete-tests`` command options can be passed directly and mixed
with the specification of the tests you want to run. For example, the command below will only run
``test_kubernetes_executor.py`` and will suppress capturing output from pytest so that you can see the
output during test execution:

.. code-block:: bash

    breeze k8s run-complete-tests -- test_kubernetes_executor.py -s
Entering k8s shell
..................
You can have multiple clusters created - with different versions of Kubernetes and Python - at the same time.
Breeze enables you to interact with the chosen cluster by entering a dedicated shell session that has the
cluster pre-configured. This is done via the ``breeze k8s shell`` command.

Once you are in the shell, the prompt will indicate which cluster you are interacting with as well
as the executor you use, similar to:

.. code-block:: bash

    (kind-airflow-python-3.9-v1.24.0:KubernetesExecutor)>
The shell automatically activates the virtual environment that has all the appropriate dependencies
installed, and you can interactively run all the k8s tests with the pytest command (of course, the cluster needs
to be created and airflow deployed to it before running the tests):

.. code-block:: bash

    (kind-airflow-python-3.9-v1.24.0:KubernetesExecutor)> pytest test_kubernetes_executor.py
    ================================================= test session starts =================================================
    platform linux -- Python 3.10.6, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jarek/code/airflow/.build/.k8s-env/bin/python
    cachedir: .pytest_cache
    rootdir: /home/jarek/code/airflow, configfile: pytest.ini
    plugins: anyio-3.6.1
    collected 2 items

    test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag PASSED                             [ 50%]
    test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag_with_scheduler_failure PASSED      [100%]
    ================================================== warnings summary ===================================================
    .build/.k8s-env/lib/python3.10/site-packages/_pytest/config/__init__.py:1233
      /home/jarek/code/airflow/.build/.k8s-env/lib/python3.10/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: asyncio_mode
        self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")
    -- Docs: https://docs.pytest.org/en/stable/warnings.html
    ============================================ 2 passed, 1 warning in 38.62s ============================================
    (kind-airflow-python-3.9-v1.24.0:KubernetesExecutor)>
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_shell.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_shell.svg
:width: 100%
:alt: Breeze k8s shell
You can also specify any shell flags and commands as extra parameters - they will be passed to the
shell command directly. In case the shell parameters are the same as the parameters of the command, you
can pass them after ``--``. For example, this is how you can see all the available parameters of the shell:

.. code-block:: bash

    breeze k8s shell -- --help
Running k9s tool
................
``k9s`` is a fantastic tool that allows you to interact with a running k8s cluster. Since we can have
multiple clusters, ``breeze k8s k9s`` allows you to start k9s without setting it up or
downloading it - it uses the k9s docker image to run it and connect it to the right cluster.
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_k9s.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_k9s.svg
:width: 100%
:alt: Breeze k8s k9s
You can also specify any ``k9s`` flags and commands as extra parameters - they will be passed to the
``k9s`` command directly. In case the ``k9s`` parameters are the same as the parameters of the command, you
can pass them after ``--``. For example, this is how you can see all the available parameters of ``k9s``:

.. code-block:: bash

    breeze k8s k9s -- --help
Dumping logs from all k8s clusters
..................................
KinD allows you to export logs from the running cluster so that you can troubleshoot your deployment.
This can be done with the ``breeze k8s logs`` command. Logs can also be dumped for all clusters created
so far by passing the ``--all`` flag.
All parameters of the command are here:
.. image:: ./images/breeze/output_k8s_logs.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_k8s_logs.svg
:width: 100%
:alt: Breeze k8s logs
CI Image tasks
--------------
The image building is usually run automatically when needed,
but sometimes Breeze users might want to manually build, pull or verify the CI images.
.. image:: ./images/breeze/output_ci-image.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_ci-image.svg
:width: 100%
:alt: Breeze ci-image
For all development tasks, unit tests, integration tests, and static code checks, we use the
**CI image** maintained in GitHub Container Registry.
The CI image is built automatically as needed; however, it can be rebuilt manually with the
``ci-image build`` command.

Building the image the first time pulls a pre-built version of images from the Docker Hub, which may take some
time. But for subsequent source code changes, no wait time is expected.
However, changes to sensitive files like ``setup.py`` or ``Dockerfile.ci`` will trigger a rebuild
that may take more time, though it is highly optimized to only rebuild what is needed.
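For example, to trigger a manual rebuild:

.. code-block:: bash

    breeze ci-image build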
Breeze has a built-in mechanism to check whether your local image has diverged too much from the
latest image built on CI. This might happen, for example, when the latest patches have been released as new
Python images or when significant changes were made in the Dockerfile. In such cases, Breeze will
download the latest images before rebuilding, because this is usually faster than rebuilding the image.
Building CI image
.................
Those are all available flags of ``ci-image build`` command:
.. image:: ./images/breeze/output_ci-image_build.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_ci-image_build.svg
:width: 100%
:alt: Breeze ci-image build
Pulling CI image
................
You can also pull the CI images locally in parallel with optional verification.

Those are all available flags of the ``ci-image pull`` command:
.. image:: ./images/breeze/output_ci-image_pull.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_ci-image_pull.svg
:width: 100%
:alt: Breeze ci-image pull
Verifying CI image
..................
Finally, you can verify the CI image by running tests - either with the pulled/built images or
with an arbitrary image.

Those are all available flags of the ``ci-image verify`` command:
.. image:: ./images/breeze/output_ci-image_verify.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_ci-image_verify.svg
:width: 100%
:alt: Breeze ci-image verify
PROD Image tasks
----------------
Users can also build production images when they are developing them. However, when you want to
use the PROD image, the regular docker build commands are recommended. See
`building the image <https://airflow.apache.org/docs/docker-stack/build.html>`_.
.. image:: ./images/breeze/output_prod-image.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_prod-image.svg
:width: 100%
:alt: Breeze prod-image
The **Production image** is also maintained in the GitHub Container Registry for caching,
and released versions are manually pushed to ``apache/airflow``. This Docker image (built using the official
Dockerfile) contains a size-optimised Airflow installation with selected extras and dependencies.

However, in many cases you want to build your own custom version of the image - with added apt dependencies,
python dependencies, or additional Airflow extras. Breeze's ``prod-image build`` command helps to build your own,
customized variant of the image that contains everything you need.
You can build the production image manually by using the ``prod-image build`` command.
Note that the images can also be built using the ``docker build`` command by passing the appropriate
build-args as described in `IMAGES.rst <IMAGES.rst>`_, but Breeze provides several flags that
make it easier. You can see all the flags by running ``breeze prod-image build --help``,
but here typical examples are presented:
.. code-block:: bash
breeze prod-image build --additional-extras "jira"
This installs the additional ``jira`` extra while installing airflow in the image.
.. code-block:: bash
breeze prod-image build --additional-python-deps "torchio==0.17.10"
This installs an additional PyPI dependency - ``torchio`` in the specified version.
.. code-block:: bash
breeze prod-image build --additional-dev-apt-deps "libasound2-dev" \
--additional-runtime-apt-deps "libasound2"
This installs additional apt dependencies - ``libasound2-dev`` in the build image and ``libasound2`` in the
final image. These are development dependencies that might be needed to build and use Python packages added
via the ``--additional-python-deps`` flag. The ``dev`` dependencies are not installed in the final
production image; they are only installed in the build "segment" of the production image that is used
as an intermediate step to build the final image. Usually, names of the ``dev`` dependencies end with the ``-dev``
suffix, and they need to be paired with the corresponding runtime dependency added for the runtime image
(without ``-dev``).
.. code-block:: bash
    breeze prod-image build --python 3.7 --additional-dev-apt-deps "libasound2-dev" \
--additional-runtime-apt-deps "libasound2"
The same as above, but using Python 3.7.
Building PROD image
...................
These are all the available flags of the ``prod-image build`` command:
.. image:: ./images/breeze/output_prod-image_build.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_prod-image_build.svg
:width: 100%
:alt: Breeze prod-image build
Pulling PROD image
..................
You can also pull PROD images in parallel, with optional verification.
These are all the available flags of the ``prod-image pull`` command:
.. image:: ./images/breeze/output_prod-image_pull.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_prod-image_pull.svg
:width: 100%
:alt: Breeze prod-image pull
Verifying PROD image
....................
Finally, you can verify the PROD image by running tests - either with the pulled/built images or
with an arbitrary image.
These are all the available flags of the ``prod-image verify`` command:
.. image:: ./images/breeze/output_prod-image_verify.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_prod-image_verify.svg
:width: 100%
:alt: Breeze prod-image verify
Breeze setup
------------
Breeze has tools that you can use to configure defaults and Breeze behaviours, and to perform some maintenance
operations that might be necessary when you add new commands to Breeze. It also allows you to configure your
host operating system for Breeze autocompletion.
These are all the available flags of the ``setup`` command:
.. image:: ./images/breeze/output_setup.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_setup.svg
:width: 100%
:alt: Breeze setup
Breeze configuration
....................
You can configure and inspect the settings of Breeze via this command: the Python version and the backend used,
as well as the backend versions.
Another part of the configuration is enabling/disabling the cheatsheet and ASCII art. Both can
be disabled - they are "nice looking", and the cheatsheet
contains useful information for first-time users, but eventually you might want to disable both if you
find them repetitive and annoying.
The config command also lets you set colour-blind-friendly communication for Breeze messages. By default, we
communicate information/errors/warnings/successes to users via colour-coded messages, but you can switch
this off by passing ``--no-colour`` to ``config``, in which case the messages printed by Breeze
will use different schemes (italic/bold/underline) to indicate different kinds of messages
rather than colours.
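For example, assuming the flag works as described above, you can switch off colours like this:

.. code-block:: bash

    breeze setup config --no-colour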
These are all the available flags of the ``setup config`` command:
.. image:: ./images/breeze/output_setup_config.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_setup_config.svg
:width: 100%
:alt: Breeze setup config
Setting up autocompletion
.........................
You get auto-completion working when you re-enter the shell after running ``breeze setup autocomplete``
(follow the instructions printed). The command will warn you and will not reinstall autocompletion
if you have already installed it, but you can also force reinstalling it via:
.. code-block:: bash
breeze setup autocomplete --force
These are all the available flags of the ``setup autocomplete`` command:
.. image:: ./images/breeze/output_setup_autocomplete.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_setup_autocomplete.svg
:width: 100%
:alt: Breeze setup autocomplete
Breeze version
..............
You can display the Breeze version, and with the ``--verbose`` flag it provides more information: where
Breeze is installed from and details about the setup hashes.
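For example, assuming the command lives under the ``setup`` group (as the flags below suggest):

.. code-block:: bash

    breeze setup version --verbose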
These are all the available flags of the ``version`` command:
.. image:: ./images/breeze/output_setup_version.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_setup_version.svg
:width: 100%
:alt: Breeze version
Breeze self-upgrade
...................
You can self-upgrade Breeze automatically. These are all the available flags of the ``self-upgrade`` command:
.. image:: ./images/breeze/output_setup_self-upgrade.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_setup_self-upgrade.svg
:width: 100%
:alt: Breeze setup self-upgrade
Regenerating images for documentation
.....................................
This documentation contains exported images with the "help" output of the commands and their parameters. You can
regenerate the images that need to be regenerated because their commands changed (usually after
the Breeze code has been changed) via the ``regenerate-command-images`` command. Usually this is done
automatically via pre-commit, but sometimes (for example when the ``rich`` or ``rich-click`` library changes)
you need to regenerate the images yourself.
You can add the ``--force`` flag (or set the ``FORCE="true"`` environment variable) to regenerate all images (not
only those that need regeneration). You can also run the command with the ``--check-only`` flag to simply
check whether there are any images that need regeneration.
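For example, using the flag described above, you can check whether any images need regeneration without actually regenerating them:

.. code-block:: bash

    breeze setup regenerate-command-images --check-only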
.. image:: ./images/breeze/output_setup_regenerate-command-images.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_setup_regenerate-command-images.svg
:width: 100%
:alt: Breeze setup regenerate-command-images
Breeze check-all-params-in-groups
.................................
When you add a Breeze command or modify a parameter, you are also supposed to make sure that the "rich groups"
for the command are present and that all parameters are assigned to the right group, so they can be
nicely presented in the ``--help`` output. You can check that via the ``check-all-params-in-groups`` command.
.. image:: ./images/breeze/output_setup_check-all-params-in-groups.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_setup_check-all-params-in-groups.svg
:width: 100%
:alt: Breeze setup check-all-params-in-group
CI tasks
--------
Breeze has a number of commands that are mostly used in the CI environment to perform cleanup.
.. image:: ./images/breeze/output_ci.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_ci.svg
:width: 100%
:alt: Breeze ci commands
Running resource check
......................
Breeze requires certain resources to be available - disk, memory, CPU. When you enter Breeze's shell,
the resources are checked, and information on whether there are enough resources is displayed. However, you can
manually run a resource check at any time with the ``breeze ci resource-check`` command.
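For example:

.. code-block:: bash

    breeze ci resource-check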
These are all the available flags of the ``resource-check`` command:
.. image:: ./images/breeze/output_ci_resource-check.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_ci_resource-check.svg
:width: 100%
:alt: Breeze ci resource-check
Freeing the space
.................
When our CI runs a job, it needs all the memory and disk space it can get. We have a Breeze command that frees
up the memory and disk space used. You can also use it to clear space locally, but it performs a few operations
that might be a bit invasive - such as removing the swap file and completely pruning the Docker disk space used.
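For example (remember that running it locally is rather invasive):

.. code-block:: bash

    breeze ci free-space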
These are all the available flags of the ``free-space`` command:
.. image:: ./images/breeze/output_ci_free-space.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_ci_free-space.svg
:width: 100%
:alt: Breeze ci free-space
Fixing File/Directory Ownership
...............................
On Linux, there is a problem with propagating ownership of created files (a known Docker problem). The
files and directories created in the container are not owned by the host user (but by the root user in our
case). This may prevent you from switching branches, for example, if files owned by the root user are
created within your sources. In case you are on a Linux host and have some files in your sources created
by the root user, you can fix the ownership of those files by running:

.. code-block:: bash
breeze ci fix-ownership
These are all the available flags of the ``fix-ownership`` command:
.. image:: ./images/breeze/output_ci_fix-ownership.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_ci_fix-ownership.svg
:width: 100%
:alt: Breeze ci fix-ownership
Selective check
...............
When our CI runs a job, it needs to decide which tests to run, whether to build images, and to what extent the tests
should be run on multiple combinations of Python, Kubernetes and backend versions, in order to optimize the time
needed to run the CI builds. You can also use the tool to check which tests will be run when you provide
a specific commit that Breeze should run the tests on.
The ``selective-check`` command will produce the set of ``name=value`` pairs of outputs derived
from the context of the commit/PR to be merged, via the stderr output.
More details about the algorithm used to pick the right tests and the available outputs can be
found in `Selective Checks <dev/breeze/SELECTIVE_CHECKS.md>`_.
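A minimal invocation, deriving the outputs from the current commit context, could look like this (see the flags below for providing a specific commit):

.. code-block:: bash

    breeze ci selective-check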
These are all the available flags of the ``selective-check`` command:
.. image:: ./images/breeze/output_ci_selective-check.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_ci_selective-check.svg
:width: 100%
:alt: Breeze ci selective-check
Getting workflow information
............................
When our CI runs a job, it might be within one of several workflows. Information about those workflows
is stored in ``GITHUB_CONTEXT``. Rather than using jq/bash commands, we retrieve the necessary information
(like PR labels, event_type, where the job runs, job description) and convert it into GA outputs.
These are all the available flags of the ``get-workflow-info`` command:
.. image:: ./images/breeze/output_ci_get-workflow-info.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_ci_get-workflow-info.svg
:width: 100%
:alt: Breeze ci get-workflow-info
Release management tasks
------------------------
Maintainers can also use Breeze for other purposes (commands that regular contributors likely
do not need or do not have access to run). Those are usually connected with releasing Airflow:
.. image:: ./images/breeze/output_release-management.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_release-management.svg
:width: 100%
:alt: Breeze release management
Breeze can be used to prepare Airflow packages - both the main ``apache-airflow`` package and
provider packages.
Preparing provider documentation
................................
You can read more about testing provider packages in
`TESTING.rst <TESTING.rst#running-tests-with-provider-packages>`_.
There are several commands that you can run in Breeze to manage and build packages:
* preparing Provider documentation files
* preparing Airflow packages
* preparing Provider packages
Preparing provider documentation files is part of the release procedure performed by the release managers,
and it is described in detail in `dev <dev/README_RELEASE_PROVIDER_PACKAGES.md>`_.
The example below performs documentation preparation for provider packages.
.. code-block:: bash
breeze release-management prepare-provider-documentation
By default, the documentation preparation runs package verification to check if all packages are
importable, but you can add ``--skip-package-verification`` to skip it.
.. code-block:: bash
breeze release-management prepare-provider-documentation --skip-package-verification
You can also add ``--answer yes`` to perform a non-interactive build.
.. image:: ./images/breeze/output_release-management_prepare-provider-documentation.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_release-management_prepare-provider-documentation.svg
:width: 100%
:alt: Breeze prepare-provider-documentation
Preparing provider packages
...........................
You can use Breeze to prepare provider packages.
The packages are prepared in the ``dist`` folder. Note that this command cleans up the ``dist`` folder
before running, so you should run it before generating the Airflow package below, as that package would
otherwise be removed.
The example below builds provider packages.
.. code-block:: bash
breeze release-management prepare-provider-packages
If you run this command without specifying packages, you will prepare all packages; you can, however, specify
the providers that you would like to build. By default, ``both`` types of packages are prepared
(``wheel`` and ``sdist``), but you can change this by providing the optional ``--package-format`` flag.
.. code-block:: bash
breeze release-management prepare-provider-packages google amazon
You can see all the available providers by running this command:
.. code-block:: bash
breeze release-management prepare-provider-packages --help
.. image:: ./images/breeze/output_release-management_prepare-provider-packages.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_release-management_prepare-provider-packages.svg
:width: 100%
:alt: Breeze prepare-provider-packages
Verifying provider packages
...........................
Breeze can also be used to verify whether provider classes are importable and whether they follow the
right naming conventions. This happens automatically on CI, but you can also run it manually if you
have just prepared provider packages and they are present in the ``dist`` folder.
.. code-block:: bash
breeze release-management verify-provider-packages
You can also run the verification with an earlier Airflow version to check for compatibility.
.. code-block:: bash
breeze release-management verify-provider-packages --use-airflow-version 2.4.0
All the command parameters are here:
.. image:: ./images/breeze/output_release-management_verify-provider-packages.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_release-management_verify-provider-packages.svg
:width: 100%
:alt: Breeze verify-provider-packages
Installing provider packages
............................
In some cases, we just want to see whether the generated provider packages can be installed with Airflow,
without verifying them. This happens automatically on CI for sdist packages, but you can also run it manually
if you have just prepared provider packages and they are present in the ``dist`` folder.
.. code-block:: bash
breeze release-management install-provider-packages
You can also run the installation with an earlier Airflow version to check for compatibility.
.. code-block:: bash
breeze release-management install-provider-packages --use-airflow-version 2.4.0
All the command parameters are here:
.. image:: ./images/breeze/output_release-management_install-provider-packages.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_release-management_install-provider-packages.svg
:width: 100%
:alt: Breeze install-provider-packages
Generating Provider Issue
.........................
You can use Breeze to generate a provider issue when you release new providers.
.. image:: ./images/breeze/output_release-management_generate-issue-content-providers.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_release-management_generate-issue-content-providers.svg
:width: 100%
:alt: Breeze generate-issue-content-providers
Preparing airflow packages
..........................
You can prepare Airflow packages using Breeze:
.. code-block:: bash
breeze release-management prepare-airflow-package
This prepares the Airflow packages in the ``dist`` folder.
You can specify the optional ``--package-format`` flag to build selected formats of the Airflow packages;
the default is to build ``both`` types of packages: ``sdist`` and ``wheel``.
.. code-block:: bash
breeze release-management prepare-airflow-package --package-format=wheel
.. image:: ./images/breeze/output_release-management_prepare-airflow-package.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_release-management_prepare-airflow-package.svg
:width: 100%
:alt: Breeze release-management prepare-airflow-package
Generating constraints
......................
Whenever ``setup.py`` gets modified, the CI main job will re-generate constraint files. Those constraint
files are stored in separate orphan branches: ``constraints-main``, ``constraints-2-0``.
Those constraint files are described in detail in the
`<CONTRIBUTING.rst#pinned-constraint-files>`_ contributing documentation.
You can use the ``breeze release-management generate-constraints`` command to manually generate constraints
for all or a selected Python version, and a single constraint mode, like this:
.. warning::
In order to generate constraints, you need to build all images with ``--upgrade-to-newer-dependencies``
flag - for all python versions.
.. code-block:: bash
breeze release-management generate-constraints --airflow-constraints-mode constraints
Constraints are generated separately for each Python version, and there are separate constraints modes:

* ``constraints`` - those are constraints generated by matching the current Airflow version from sources
  with providers installed from PyPI. Those are the constraints used by users who want to
  install Airflow with pip.

* ``constraints-source-providers`` - those are constraints generated by using providers installed from
  current sources. While adding new providers, their dependencies might change, so this set of constraints
  is the current set for Airflow and providers from the current main sources.
  It is used by the CI system to keep a "stable" set of constraints.

* ``constraints-no-providers`` - those are constraints generated from Apache Airflow only, without any
  providers. If you want to manage Airflow separately and then add providers individually, you can
  use those.
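For example, using the mode names above, you could generate only the no-providers constraints like this:

.. code-block:: bash

    breeze release-management generate-constraints --airflow-constraints-mode constraints-no-providers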
These are all the available flags of the ``generate-constraints`` command:
.. image:: ./images/breeze/output_release-management_generate-constraints.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_release-management_generate-constraints.svg
:width: 100%
:alt: Breeze generate-constraints
In case someone modifies ``setup.py``, the scheduled CI tests automatically upgrade and
push changes to the constraint files; however, you can also perform a test run of this locally using
the procedure described in `Refreshing CI Cache <dev/REFRESHING_CI_CACHE.md#manually-generating-constraint-files>`_,
which utilises multiple processors on your local machine to generate the constraints faster.
This bumps the constraint files to the latest versions and stores the hash of ``setup.py``. The generated
constraint and ``setup.py`` hash files are stored in the ``files`` folder, and while the constraints are being
generated, a diff of changes vs. the previous constraint files is printed.
Releasing Production images
...........................
The **Production image** can be released by release managers who have permissions to push the image. This
happens only when there is a release candidate or a final version of Airflow released.
You release "regular" and "slim" images as separate steps.
Releasing "regular" images:
.. code-block:: bash
breeze release-management release-prod-images --airflow-version 2.4.0
Or "slim" images:
.. code-block:: bash
breeze release-management release-prod-images --airflow-version 2.4.0 --slim-images
By default, when you are releasing the "final" image, the image is also tagged with the "latest" tags, but this
step can be skipped if you pass the ``--skip-latest`` flag.
These are all of the available flags for the ``release-prod-images`` command:
.. image:: ./images/breeze/output_release-management_release-prod-images.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_release-management_release-prod-images.svg
:width: 100%
:alt: Breeze release management release prod images
Details of Breeze usage
=======================
Database volumes in Breeze
--------------------------
Breeze keeps data for all its integrations in named Docker volumes. Each backend and integration
keeps data in its own volume. Those volumes are persisted until you run the ``breeze down`` command.
You can also preserve the volumes by adding the ``--preserve-volumes`` flag when you run that command.
Then, the next time you start Breeze, it will have the data pre-populated.
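For example, to stop the environment while keeping the data for your next session:

.. code-block:: bash

    breeze down --preserve-volumes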
These are all the available flags of the ``down`` command:
.. image:: ./images/breeze/output-down.svg
:target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output-down.svg
:width: 100%
:alt: Breeze down
Additional tools
----------------
To keep the Docker image small, not all tools are pre-installed in it. However, we have made sure that there
is an easy process to install additional tools.
Additional tools are installed in ``/files/bin``. This path is added to ``$PATH``, so your shell will
automatically autocomplete files that are in that directory. You can also keep the binaries for your own
tools in this directory if you need to.
**Installation scripts**
For development convenience, we have also provided installation scripts for commonly used tools. They are
installed to ``/files/opt/``, so they are preserved after restarting the Breeze environment. Each script
is also available in ``$PATH``, so just type ``install_<TAB>`` to get a list of tools.
Currently available scripts:
* ``install_aws.sh`` - installs `the AWS CLI <https://aws.amazon.com/cli/>`__, including the ``aws`` command
* ``install_az.sh`` - installs `the Azure CLI <https://github.com/Azure/azure-cli>`__, including the ``az`` command
* ``install_gcloud.sh`` - installs `the Google Cloud SDK <https://cloud.google.com/sdk>`__ including
``gcloud``, ``gsutil``.
* ``install_imgcat.sh`` - installs `imgcat - Inline Images Protocol <https://iterm2.com/documentation-images.html>`__
for iTerm2 (Mac OS only)
* ``install_java.sh`` - installs `the OpenJDK 8u41 <https://openjdk.java.net/>`__
* ``install_kubectl.sh`` - installs `the Kubernetes command-line tool, kubectl <https://kubernetes.io/docs/reference/kubectl/kubectl/>`__
* ``install_snowsql.sh`` - installs `SnowSQL <https://docs.snowflake.com/en/user-guide/snowsql.html>`__
* ``install_terraform.sh`` - installs `Terraform <https://www.terraform.io/docs/index.html>`__
Launching Breeze integrations
-----------------------------
When Breeze starts, it can start additional integrations. These are additional Docker containers
started by the same docker-compose command, and they are required by some of the tests
as described in `<TESTING.rst#airflow-integration-tests>`_.
By default, Breeze starts only the airflow container, without any integration enabled. If you selected
the ``postgres`` or ``mysql`` backend, the container for the selected backend is also started (but only the
one that is selected). You can start additional integrations by passing the ``--integration`` flag
with the appropriate integration name when starting Breeze. You can specify several ``--integration`` flags
to start more than one integration at a time.
Finally, you can specify ``--integration all-testable`` to start all testable integrations and
``--integration all`` to enable all integrations.
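For example, to start Breeze with all testable integrations enabled:

.. code-block:: bash

    breeze --integration all-testable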
Once an integration is started, it will continue to run until the environment is stopped with
the ``breeze down`` command.
Note that running integrations uses significant resources - CPU and memory.
Using local virtualenv environment in Your Host IDE
---------------------------------------------------
You can set up your host IDE (for example, IntelliJ's PyCharm/Idea) to work with Breeze
and benefit from all the features provided by your IDE, such as local and remote debugging,
language auto-completion, documentation support, etc.
To use your host IDE with Breeze:
1. Create a local virtual environment:
You can use any of the following wrappers to create and manage your virtual environments:
`pyenv <https://github.com/pyenv/pyenv>`_, `pyenv-virtualenv <https://github.com/pyenv/pyenv-virtualenv>`_,
or `virtualenvwrapper <https://virtualenvwrapper.readthedocs.io/en/latest/>`_.
2. Use the right command to activate the virtualenv (``workon`` if you use virtualenvwrapper, or
   ``pyenv activate`` if you use pyenv).
3. Initialize the created local virtualenv:
.. code-block:: bash
./scripts/tools/initialize_virtualenv.py
.. warning::
Make sure that you use the right Python version in this command - matching the Python version you have
in your local virtualenv. If you don't, you will get strange conflicts.
4. Select the virtualenv you created as the project's default virtualenv in your IDE.
Note that you can also use the local virtualenv for Airflow development without Breeze.
This is a lightweight solution that has its own limitations.
More details on using the local virtualenv are available in `LOCAL_VIRTUALENV.rst <LOCAL_VIRTUALENV.rst>`_.
Internal details of Breeze
==========================
Airflow directory structure inside container
--------------------------------------------
When you are in the CI container, the following directories are used:
.. code-block:: text
/opt/airflow - Contains sources of Airflow mounted from the host (AIRFLOW_SOURCES).
/root/airflow - Contains all the "dynamic" Airflow files (AIRFLOW_HOME), such as:
airflow.db - sqlite database in case sqlite is used;
logs - logs from Airflow executions;
unittest.cfg - unit test configuration generated when entering the environment;
webserver_config.py - webserver configuration generated when running Airflow in the container.
/files - files mounted from "files" folder in your sources. You can edit them in the host as well
dags - this is the folder where Airflow DAGs are read from
airflow-breeze-config - this is where you can keep your own customization configuration of breeze
Note that when running in your local environment, the ``/root/airflow/logs`` folder is actually mounted
from your ``logs`` directory in the Airflow sources, so all logs created in the container are automatically
visible in the host as well. Every time you enter the container, the ``logs`` directory is
cleaned so that logs do not accumulate.
When you are in the production container, the following directories are used:
.. code-block:: text
/opt/airflow - Contains sources of Airflow mounted from the host (AIRFLOW_SOURCES).
/root/airflow - Contains all the "dynamic" Airflow files (AIRFLOW_HOME), such as:
airflow.db - sqlite database in case sqlite is used;
logs - logs from Airflow executions;
unittest.cfg - unit test configuration generated when entering the environment;
webserver_config.py - webserver configuration generated when running Airflow in the container.
/files - files mounted from "files" folder in your sources. You can edit them in the host as well
dags - this is the folder where Airflow DAGs are read from
Note that when running in your local environment, the ``/root/airflow/logs`` folder is actually mounted
from your ``logs`` directory in the Airflow sources, so all logs created in the container are automatically
visible in the host as well. Every time you enter the container, the ``logs`` directory is
cleaned so that logs do not accumulate.
Setting default answers for user interaction
--------------------------------------------
Sometimes during the build, you are asked whether to perform an action, skip it, or quit. This happens
when rebuilding or removing an image and in a few other cases - for actions that take a lot of time
or could be potentially destructive. You can force the answer to these questions by providing the
``--answer`` flag in the commands that support it.
For automation scripts, you can export the ``ANSWER`` variable (and set it to
``y``, ``n``, ``q``, ``yes``, ``no``, ``quit`` - in all case combinations).
.. code-block:: bash
export ANSWER="yes"
Mounting Local Sources to Breeze
--------------------------------
Important sources of Airflow are mounted inside the ``airflow`` container that you enter.
This means that you can continue editing your changes on the host in your favourite IDE and have them
visible in the container immediately and ready to test without rebuilding images. You can disable mounting
by specifying the ``--skip-mounting-local-sources`` flag when running Breeze. In this case, you will have
the sources embedded in the container, and changes to those sources will not be persistent.
After you run Breeze for the first time, you will have an empty ``files`` directory in your source code,
which is mapped to ``/files`` in your Docker container. You can put any files you need to configure
and run Docker there. They will not be removed between Docker runs.
By default, the ``/files/dags`` folder is mounted from your local ``<AIRFLOW_SOURCES>/files/dags``, and this is
the directory used by the Airflow scheduler and webserver to scan for DAGs. You can use it to test your DAGs
from local sources in Airflow if you wish to add local DAGs that can be run by Breeze.
The ``/files/airflow-breeze-config`` folder contains configuration files that might be used to
customize your Breeze instance. Those files are kept across checking out code from different
branches and stopping/starting Breeze, so you can keep your configuration there and use it continuously
while you switch between different source code versions.
Port Forwarding
---------------
When you run Airflow Breeze, the following ports are automatically forwarded:
* 12322 -> forwarded to Airflow ssh server -> airflow:22
* 28080 -> forwarded to Airflow webserver -> airflow:8080
* 25555 -> forwarded to Flower dashboard -> airflow:5555
* 25433 -> forwarded to Postgres database -> postgres:5432
* 23306 -> forwarded to MySQL database -> mysql:3306
* 21433 -> forwarded to MSSQL database -> mssql:1433
* 26379 -> forwarded to Redis broker -> redis:6379
You can connect to these ports/databases using:
* ssh connection for remote debugging: ``ssh -p 12322 airflow@127.0.0.1``, password: ``airflow``
* Webserver: http://127.0.0.1:28080
* Flower: http://127.0.0.1:25555
* Postgres: jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow
* Mysql: jdbc:mysql://127.0.0.1:23306/airflow?user=root
* MSSQL: jdbc:sqlserver://127.0.0.1:21433;databaseName=airflow;user=sa;password=Airflow123
* Redis: redis://127.0.0.1:26379/0
If you do not use the ``start-airflow`` command, you can start the webserver manually with
the ``airflow webserver`` command. You can use ``tmux`` to multiplex terminals.
You may need to create a user prior to running the webserver in order to log in.
This can be done with the following command:
.. code-block:: bash
airflow users create --role Admin --username admin --password admin --email admin@example.com --firstname foo --lastname bar
For databases, you need to run ``airflow db reset`` at least once (or run some tests) after you start
Airflow Breeze to get the database/tables created. You can connect to the databases with your IDE or any
other database client:
.. raw:: html
<div align="center">
<img src="images/database_view.png" width="640"
alt="Airflow Breeze - Database view">
</div>
You can change the host port numbers used by setting the appropriate environment variables:
* ``SSH_PORT``
* ``WEBSERVER_HOST_PORT``
* ``POSTGRES_HOST_PORT``
* ``MYSQL_HOST_PORT``
* ``MSSQL_HOST_PORT``
* ``FLOWER_HOST_PORT``
* ``REDIS_HOST_PORT``
If you set these variables, the new ports will be in effect the next time you enter the environment.
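For example, to expose the webserver on host port 28081 the next time you enter the environment:

.. code-block:: bash

    export WEBSERVER_HOST_PORT=28081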
Managing Dependencies
---------------------
If you need to change apt dependencies, edit the ``Dockerfile.ci``; add Python packages in ``setup.py``
for Airflow and in ``provider.yaml`` for provider packages. If you add any "node" dependencies in
``airflow/www``, you need to compile them on the host with the ``breeze compile-www-assets`` command.
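For example:

.. code-block:: bash

    breeze compile-www-assets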
Adding Dependencies Permanently
...............................
You can add dependencies to ``Dockerfile.ci`` or ``setup.py``.
After you exit the container and re-run ``breeze``, Breeze detects changes in dependencies,
asks you to confirm rebuilding the image, and proceeds with rebuilding if you confirm (or skips it
if you do not confirm). After rebuilding is done, Breeze drops you into the shell. You may also use the
``ci-image build`` command to only rebuild the CI image without going into the shell.
Incremental apt Dependencies in the Dockerfile.ci during development
....................................................................
During development, changing ``apt-get`` dependencies close to the top of ``Dockerfile.ci``
invalidates the cache for most of the image, and it takes a long time for Breeze to rebuild it.
It is therefore a recommended practice to add new dependencies initially closer to the end
of ``Dockerfile.ci``; this way, dependencies are added incrementally.
Before merging, these dependencies should be moved to the appropriate ``apt-get install`` command,
which is already in ``Dockerfile.ci``.
Recording command output
========================
Breeze uses the built-in capability of ``rich`` to record and print the command help as an ``svg`` file.
It is enabled by setting ``RECORD_BREEZE_OUTPUT_FILE`` to the name of the file where the output should be
recorded. By default, the screenshots are recorded with the default character width and the "Breeze screenshot"
title, but you can override them with the ``RECORD_BREEZE_WIDTH`` and ``RECORD_BREEZE_TITLE`` variables
respectively.
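For example, assuming the variables are picked up as described above (the output file name here is just an illustration):

.. code-block:: bash

    RECORD_BREEZE_OUTPUT_FILE="breeze-help.svg" RECORD_BREEZE_WIDTH=120 \
        RECORD_BREEZE_TITLE="Breeze help" breeze --help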
Uninstalling Breeze
===================
Breeze was installed with ``pipx``. With ``pipx list``, you can list the installed packages.
Once you have the name of the ``breeze`` package, you can proceed to uninstall it.
.. code-block:: bash
pipx list
Uninstalling will also remove ``breeze`` from the folder ``${HOME}/.local/bin/``:
.. code-block:: bash
pipx uninstall apache-airflow-breeze