
# Bundle Validation for Hudi

This directory contains scripts for running bundle validation in GitHub Actions (the `validate-bundles` job specified in `.github/workflows/bot.yml`) and the build profiles for the Docker images used.

## Docker Image for Bundle Validation

The base image for bundle validation is pre-built and uploaded to Docker Hub: https://hub.docker.com/r/apachehudi/hudi-ci-bundle-validation-base.
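
To use the pre-built image locally, pull it by tag; the tag shown here matches the build example later in this document:

```shell
# Pull the pre-built base image from Docker Hub by tag
docker pull apachehudi/hudi-ci-bundle-validation-base:flink1136hive313spark313
```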

The Dockerfile for the image is under `base/`. To build the image with an updated Dockerfile, you may use the script in that folder. Here are the docker commands to build the image, specifying the component versions as build arguments:

```shell
docker build \
 --build-arg HIVE_VERSION=3.1.3 \
 --build-arg FLINK_VERSION=1.13.6 \
 --build-arg SPARK_VERSION=3.1.3 \
 --build-arg SPARK_HADOOP_VERSION=2.7 \
 -t hudi-ci-bundle-validation-base:flink1136hive313spark313 .
docker image tag hudi-ci-bundle-validation-base:flink1136hive313spark313 apachehudi/hudi-ci-bundle-validation-base:flink1136hive313spark313
```
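
After the build completes, you can sanity-check the image locally before tagging and pushing. The commands below are a quick check; the availability of `bash` in the image is an assumption:

```shell
# Confirm the image and tag exist locally
docker images hudi-ci-bundle-validation-base
# Open a shell in the image to inspect installed components (assumes bash is available)
docker run --rm -it hudi-ci-bundle-validation-base:flink1136hive313spark313 bash
```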

To upload the image with the tag:

```shell
docker push apachehudi/hudi-ci-bundle-validation-base:flink1136hive313spark313
```
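
Pushing to the `apachehudi` organization requires being logged in to Docker Hub with an account that has push permission:

```shell
# Authenticate with Docker Hub before pushing (prompts for credentials)
docker login
```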

Note that for each library like Hive and Spark, the download and extraction happen under one `RUN` instruction, so that only one layer is generated and the size of the image stays small. However, this causes repeated downloads when rebuilding the image. If you need faster iteration for local builds, you may use the Dockerfile under `base-dev/`, which uses the `ADD` instruction for downloads and thus caches the downloaded archives across builds. This increases the size of the generated image compared to `base/`, so that image should be used for development only and not be pushed to the remote repository.
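
To make the trade-off concrete, here is a minimal sketch of the two patterns; the URL, paths, and versions are illustrative, not the actual Dockerfile contents:

```dockerfile
# base/ pattern: download, extract, and clean up in a single RUN,
# so the archive never persists in a layer (smaller image, no download caching)
RUN wget https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz \
    && tar -xzf spark-3.1.3-bin-hadoop2.7.tgz -C /opt \
    && rm spark-3.1.3-bin-hadoop2.7.tgz

# base-dev/ pattern: ADD puts the download in its own cached layer,
# so rebuilds can skip the download, but the archive stays in the image (larger image)
ADD https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz /tmp/
RUN tar -xzf /tmp/spark-3.1.3-bin-hadoop2.7.tgz -C /opt
```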