GHA workflows are defined under .github/workflows/
.
We have a weekly job defined in docker_image.yml
to build docker images for CI verification. The docker files and images are listed below:
file | images | comments |
---|---|---|
dev/docker/Dockerfile.centos7-static-build | apache/gluten:vcpkg-centos-7 | centos 7, static link, jdk8 |
dev/docker/Dockerfile.centos8-static-build | apache/gluten:vcpkg-centos-8 | centos 8, static link, jdk8 |
dev/docker/Dockerfile.centos8-dynamic-build | apache/gluten:centos-8-jdk8 | centos 8, dynamic link, jdk8 |
dev/docker/Dockerfile.centos8-dynamic-build | apache/gluten:centos-8-jdk11 | centos 8, dynamic link, jdk11 |
dev/docker/Dockerfile.centos8-dynamic-build | apache/gluten:centos-8-jdk17 | centos 8, dynamic link, jdk17 |
dev/docker/cudf/Dockerfile | apache/gluten:centos-9-jdk8-cudf | centos 9, dynamic link, jdk8 |
Docker images can be found from https://hub.docker.com/r/apache/gluten/tags
Gluten main branch is pulled down during static build in docker. And vcpkg will cache binary data of all dependencies defined under dev/vcpkg. These binary data is cached into /var/cache/vcpkg
and CI job can re-use them in new build. By setting VCPKG_BINARY_SOURCES=clear
in env., reusing vcpkg cache can be disabled.
Arrow libs are pre-installed in docker, assuming they are not actively changed, then not necessarily to be re-built every time.
The dependency libraries are pre installed in to /root/.m2 by mvn dependency:go-offline
Spark is set to 3.5 by default.
Since the docker image is rebuilt weekly, the ccache mostly are outdated. So the cache is removed from the image.
Two GitHub secrets DOCKERHUB_USER
& DOCKERHUB_TOKEN
can be used to push docker image to docker hub: https://hub.docker.com/r/apache/gluten/tags. Note GitHub secrets are not retrievable in PR from forked repo.