commit | 8a277f6b65bb5ef5ff0d6475ee48eff8c1c72e37 | |
---|---|---|
author | Sagar Sumit <sagarsumit09@gmail.com> | Tue Aug 09 20:02:45 2022 +0530 |
committer | Sagar Sumit <sagarsumit09@gmail.com> | Wed Aug 10 18:03:46 2022 +0530 |
tree | 23a32f03304544aa0ee86a6ad5c8b6be677a6904 | |
parent | ae71461057e8434d78866513f48bbfec2fee8e69 | |
Bumping release candidate number 2
Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (cloud stores, HDFS, or any Hadoop FileSystem-compatible storage).
Hudi supports three types of queries: snapshot queries, incremental queries, and read optimized queries.
Learn more about Hudi at https://hudi.apache.org
Prerequisites for building Apache Hudi:
    # Checkout code and build
    git clone https://github.com/apache/hudi.git && cd hudi
    mvn clean package -DskipTests

    # Start command
    spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
      --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
      --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
To build for integration tests that include hudi-integ-test-bundle, use -Dintegration-tests.
To build the Javadoc for all Java and Scala classes:
    # Javadoc generated under target/site/apidocs
    mvn clean javadoc:aggregate -Pjavadocs
The default Spark version supported is 2.4.4. Refer to the table below for building with different Spark and Scala versions.
Maven build options | Expected Spark bundle jar name | Notes |
---|---|---|
(empty) | hudi-spark-bundle_2.11 (legacy bundle name) | For Spark 2.4.4 and Scala 2.11 (default options) |
-Dspark2.4 | hudi-spark2.4-bundle_2.11 | For Spark 2.4.4 and Scala 2.11 (same as default) |
-Dspark2.4 -Dscala-2.12 | hudi-spark2.4-bundle_2.12 | For Spark 2.4.4 and Scala 2.12 |
-Dspark3.1 -Dscala-2.12 | hudi-spark3.1-bundle_2.12 | For Spark 3.1.x and Scala 2.12 |
-Dspark3.2 -Dscala-2.12 | hudi-spark3.2-bundle_2.12 | For Spark 3.2.x and Scala 2.12 |
-Dspark3 | hudi-spark3-bundle_2.12 (legacy bundle name) | For Spark 3.2.x and Scala 2.12 |
-Dscala-2.12 | hudi-spark-bundle_2.12 (legacy bundle name) | For Spark 2.4.4 and Scala 2.12 |
For example,
    # Build against Spark 3.2.x
    mvn clean package -DskipTests -Dspark3.2 -Dscala-2.12

    # Build against Spark 3.1.x
    mvn clean package -DskipTests -Dspark3.1 -Dscala-2.12

    # Build against Spark 2.4.4 and Scala 2.12
    mvn clean package -DskipTests -Dspark2.4 -Dscala-2.12
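The mapping between build options and bundle names in the Spark table can be captured in a small shell helper. The sketch below is illustrative only: `hudi_spark_build` is a hypothetical function, not part of the Hudi repository, and it covers only the non-legacy `-Dspark<version>` rows of the table. It prints the Maven command and the expected bundle name instead of running the build.

```shell
# Hypothetical helper, not part of the Hudi repository.
# Usage: hudi_spark_build <spark: 2.4|3.1|3.2> [scala: 2.11|2.12]
# Prints the Maven command on line 1 and the expected bundle name on line 2.
hudi_spark_build() {
  local spark="$1"
  local scala="${2:-2.11}"
  local opts

  # Accept only the Spark/Scala combinations listed in the table.
  case "$spark:$scala" in
    2.4:2.11)
      opts="-Dspark2.4"
      ;;
    2.4:2.12|3.1:2.12|3.2:2.12)
      opts="-Dspark${spark} -Dscala-2.12"
      ;;
    *)
      echo "combination not listed in the table: Spark $spark, Scala $scala" >&2
      return 1
      ;;
  esac

  echo "mvn clean package -DskipTests $opts"
  echo "hudi-spark${spark}-bundle_${scala}"
}
```

For instance, `hudi_spark_build 3.2 2.12` prints `mvn clean package -DskipTests -Dspark3.2 -Dscala-2.12` followed by `hudi-spark3.2-bundle_2.12`. Rejecting unlisted combinations keeps the helper from suggesting a build that the table does not document.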
Starting from version 0.11, Hudi no longer requires spark-avro to be specified using --packages.
The default Flink version supported is 1.14. Refer to the table below for building with different Flink and Scala versions.
Maven build options | Expected Flink bundle jar name | Notes |
---|---|---|
(empty) | hudi-flink1.14-bundle_2.11 | For Flink 1.14 and Scala 2.11 (default options) |
-Dflink1.14 | hudi-flink1.14-bundle_2.11 | For Flink 1.14 and Scala 2.11 (same as default) |
-Dflink1.14 -Dscala-2.12 | hudi-flink1.14-bundle_2.12 | For Flink 1.14 and Scala 2.12 |
-Dflink1.13 | hudi-flink1.13-bundle_2.11 | For Flink 1.13 and Scala 2.11 |
-Dflink1.13 -Dscala-2.12 | hudi-flink1.13-bundle_2.12 | For Flink 1.13 and Scala 2.12 |
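By analogy with the Spark examples, the Flink table can be expanded into one build command per row. The sketch below assumes the Flink options combine with `mvn clean package -DskipTests` in the same way as the Spark options; `hudi_flink_builds` is a hypothetical name, and the function only prints the commands rather than running them.

```shell
# Hypothetical helper, not part of the Hudi repository.
# Prints "<expected bundle name>: <Maven command>" for each
# Flink/Scala combination listed in the table.
hudi_flink_builds() {
  local flink scala opts
  for flink in 1.13 1.14; do
    for scala in 2.11 2.12; do
      opts="-Dflink${flink}"
      # Scala 2.11 is the default; only 2.12 needs an extra option.
      if [ "$scala" = "2.12" ]; then
        opts="$opts -Dscala-2.12"
      fi
      echo "hudi-flink${flink}-bundle_${scala}: mvn clean package -DskipTests ${opts}"
    done
  done
}
```

Calling `hudi_flink_builds` lists four lines, one per explicit row of the table; the `(empty)` row is equivalent to `-Dflink1.14`.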
Unit tests can be run with the Maven profile unit-tests.
mvn -Punit-tests test
Functional tests, which are tagged with @Tag("functional"), can be run with the Maven profile functional-tests.
mvn -Pfunctional-tests test
To run tests with Spark event logging enabled, define the Spark event log directory. This allows visualizing the test DAG and stages in the Spark History Server UI.
mvn -Punit-tests test -DSPARK_EVLOG_DIR=/path/for/spark/event/log
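To view the resulting event logs, the Spark History Server has to read from the same directory. The sketch below shows one way to wire that up. It assumes a local Spark installation with `SPARK_HOME` set, and the function names `history_opts` and `run_tests_and_history_server` are hypothetical; `spark.history.fs.logDirectory`, `SPARK_HISTORY_OPTS`, and `sbin/start-history-server.sh` are standard Spark settings and scripts.

```shell
# Placeholder path from the command above; replace with a real, writable directory.
EVLOG_DIR="${EVLOG_DIR:-/path/for/spark/event/log}"

# Build the JVM option that points the History Server at a local event log directory.
history_opts() {
  printf '%s%s\n' '-Dspark.history.fs.logDirectory=file://' "$1"
}

# Run the unit tests with event logging, then start the History Server.
# Assumption: SPARK_HOME points at a local Spark installation.
run_tests_and_history_server() {
  mkdir -p "$EVLOG_DIR"
  mvn -Punit-tests test -DSPARK_EVLOG_DIR="$EVLOG_DIR"
  SPARK_HISTORY_OPTS="$(history_opts "$EVLOG_DIR")" \
    "$SPARK_HOME/sbin/start-history-server.sh"
}
```

Nothing runs until `run_tests_and_history_server` is called. Once the server is up, its UI listens on port 18080 unless configured otherwise.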
Please visit https://hudi.apache.org/docs/quick-start-guide.html to quickly explore Hudi's capabilities using spark-shell.