
Automated cache refreshing in CI

Our CI system is built to maintain itself. Regular scheduled builds and merges to the main branch have a separate maintenance step that refreshes the cache used to speed up our builds and to speed up rebuilding of Breeze images for development. This all happens automatically, usually:

  • The latest constraints are pushed to the appropriate branch after all tests succeed in a main merge or in a scheduled build

  • The images in the ghcr.io registry are refreshed after every successful merge to main or scheduled build, and after the constraints are pushed. This means that the latest image cache also uses the latest tested constraints

Sometimes, however, when we have a prolonged period of fighting flakiness in GitHub Actions runners or in our tests, the refresh might not be triggered because tests do not succeed for some time. In this case a manual refresh might be needed.

Manually generating constraint files

breeze ci-image build --run-in-parallel --upgrade-to-newer-dependencies --answer yes
breeze release-management generate-constraints --airflow-constraints-mode constraints --run-in-parallel --answer yes
breeze release-management generate-constraints --airflow-constraints-mode constraints-source-providers --run-in-parallel --answer yes
breeze release-management generate-constraints --airflow-constraints-mode constraints-no-providers --run-in-parallel --answer yes

AIRFLOW_SOURCES=$(pwd)

The constraints are generated in files/constraints-PYTHON_VERSION/constraints-*.txt files. You need to check out the right ‘constraints-’ branch in a separate repository; then you can copy, commit, and push the generated files:

cd <AIRFLOW_WITH_CONSTRAINTS-MAIN_DIRECTORY>
git pull
cp ${AIRFLOW_SOURCES}/files/constraints-*/constraints*.txt .
git diff
git add .
git commit -m "Your commit message here" --no-verify
git push
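
A quick sanity check before the copy step above is to confirm that the generation commands actually produced files. The following is a minimal sketch, assuming the files/constraints-PYTHON_VERSION/ layout described earlier; the helper name count_constraint_files is hypothetical and is not part of Breeze:

```shell
# Hypothetical helper: count generated constraint files under a sources dir.
# Assumes the layout <sources>/files/constraints-<PYTHON_VERSION>/constraints*.txt.
count_constraint_files() {
    count=0
    for f in "$1"/files/constraints-*/constraints*.txt; do
        # If nothing matches, the glob stays literal and this test fails.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Falls back to the current directory when AIRFLOW_SOURCES is not set.
count_constraint_files "${AIRFLOW_SOURCES:-$(pwd)}"
```

A result of 0 suggests that the generate-constraints commands did not complete, in which case there is nothing to copy yet.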

Manually refreshing the images

Note that to refresh the images you not only need the buildx command installed for Docker, but you should also make sure that you have a buildkit builder configured and set. Since we also build multi-platform images (for both AMD and ARM), you need to have support for QEMU emulation or hardware ARM/AMD builders configured.

Setting up cache refreshing with emulation

According to the official installation instructions, this can be achieved via:

docker run --privileged --rm tonistiigi/binfmt --install all

More information can be found here

However, emulation is very slow: more than 10x slower than hardware-backed builds.

Setting up cache refreshing with hardware ARM/AMD support

If you plan to build a number of images, a better solution is probably to set up a hardware remote builder for your ARM or AMD builds (depending on which platform you build images on, the “other” platform should be remote).

This can be achieved by setting up a builder as described in this guideline and adding it to the docker buildx airflow_cache builder.

This can usually be done with these two commands:

docker buildx create --name airflow_cache   # your local builder
docker buildx create --name airflow_cache --append HOST:PORT  # your remote builder

One way to obtain HOST:PORT is to log in to the remote machine via SSH and forward a port to the Docker engine running on the remote machine.
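
The SSH forwarding could look roughly like the sketch below. It only prints the ssh command so that it can be reviewed before being run in a separate terminal. The function name tunnel_command, the user, and the host are hypothetical placeholders, and the sketch assumes that the remote Docker engine listens on the default /var/run/docker.sock socket and that your OpenSSH version supports forwarding to Unix sockets:

```shell
# Print (not run) an ssh command that forwards a local TCP port to the
# Docker socket on a remote machine. All arguments are placeholders.
tunnel_command() {
    remote_user="$1"
    remote_host="$2"
    local_port="$3"
    # -N: forward only, run no remote command.
    # -L port:socket: forward the local port to the remote Unix socket
    # (assumes the remote Docker engine listens on /var/run/docker.sock).
    echo "ssh -N -L ${local_port}:/var/run/docker.sock ${remote_user}@${remote_host}"
}

# Hypothetical user and host; port 2375 is only an example.
tunnel_command builder remote-builder.example.com 2375
```

With such a tunnel open, the HOST:PORT passed to --append would be the local end of the tunnel, for example tcp://127.0.0.1:2375.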

When everything is fine, you should see both the local and the remote builder configured and reporting status:

docker buildx ls

  airflow_cache          docker-container
       airflow_cache0    unix:///var/run/docker.sock
       airflow_cache1    tcp://127.0.0.1:2375
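
If you want to script this check, one possible sketch is to count the nodes listed under the airflow_cache builder. It assumes that node names follow the airflow_cache0, airflow_cache1 pattern from the sample listing above (the exact `docker buildx ls` layout may differ between buildx versions); the helper name count_builder_nodes is hypothetical:

```shell
# Hypothetical helper: read `docker buildx ls` output on stdin and count
# lines naming a node of the airflow_cache builder (airflow_cache0, ...).
count_builder_nodes() {
    # grep -c exits non-zero when the count is 0, so the status is ignored.
    grep -c 'airflow_cache[0-9]' || true
}

# Demonstration on the sample listing; with a real setup you would run:
#   docker buildx ls | count_builder_nodes
count_builder_nodes <<'EOF'
  airflow_cache          docker-container
       airflow_cache0    unix:///var/run/docker.sock
       airflow_cache1    tcp://127.0.0.1:2375
EOF
```

A count of 2 corresponds to one local and one remote node, as in the listing above.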

How to refresh the image

The images can be rebuilt and refreshed after the constraints are pushed. Refreshing the images for all Python versions is as simple as running the refresh_images.sh script, which rebuilds all the images sequentially. Building several images in parallel on one machine usually does not speed up the build significantly, which is why the images are built sequentially.

./dev/refresh_images.sh