| tag | b9bb3c4db02203a021f3db65cdbace9ea92aa2d4 |
|---|---|
| tagger | Raymond Xu <xu.shiyan.raymond@gmail.com>, Wed Apr 06 17:19:13 2022 +0800 |
| object | 209d54164842621d4107c0bc992dba78d58e223e |
| message | 0.11.0 |

| commit | 209d54164842621d4107c0bc992dba78d58e223e |
|---|---|
| author | Raymond Xu <xu.shiyan.raymond@gmail.com>, Wed Apr 06 15:24:49 2022 +0800 |
| committer | Raymond Xu <xu.shiyan.raymond@gmail.com>, Wed Apr 06 15:24:49 2022 +0800 |
| tree | 2454d285fa2a3367e6b21f47bbc83bad0493a413 |
| parent | 8baeb816d553f202c7bd4b6564d6955be46d74b5 |
| message | Create release branch for version 0.11.0. |
Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (cloud stores, HDFS, or any Hadoop FileSystem compatible storage).
Hudi supports three types of queries:

* **Snapshot Queries** - query the latest snapshot of the table, with data merged in near real-time.
* **Incremental Queries** - query only the records changed since a given commit, as a change stream.
* **Read Optimized Queries** - query the latest compacted columnar data (for Merge-on-Read tables), trading data freshness for query performance.
Learn more about Hudi at https://hudi.apache.org
Prerequisites for building Apache Hudi:

* Unix-like system (like Linux or Mac OS X)
* Java 8
* Git
* Maven
```shell
# Checkout code and build
git clone https://github.com/apache/hudi.git && cd hudi
mvn clean package -DskipTests

# Start command
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
  --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```
To build the Javadoc for all Java and Scala classes:
```shell
# Javadoc generated under target/site/apidocs
mvn clean javadoc:aggregate -Pjavadocs
```
The default Spark version supported is 2.4.4. To build against a different Spark version, or with Scala 2.12, use the corresponding Maven profile:
| Label | Artifact Name for Spark Bundle | Maven Profile Option | Notes |
|---|---|---|---|
| Spark 2.4, Scala 2.11 | hudi-spark2.4-bundle_2.11 | `-Pspark2.4` | For Spark 2.4.4; same as the default |
| Spark 2.4, Scala 2.12 | hudi-spark2.4-bundle_2.12 | `-Pspark2.4,scala-2.12` | For Spark 2.4.4 (same as the default) with Scala 2.12 |
| Spark 3.1, Scala 2.12 | hudi-spark3.1-bundle_2.12 | `-Pspark3.1` | For Spark 3.1.x |
| Spark 3.2, Scala 2.12 | hudi-spark3.2-bundle_2.12 | `-Pspark3.2` | For Spark 3.2.x |
| Spark 3, Scala 2.12 | hudi-spark3-bundle_2.12 | `-Pspark3` | Same as Spark 3.2, Scala 2.12 |
| Spark, Scala 2.11 | hudi-spark-bundle_2.11 | Default | The default profile, supporting Spark 2.4.4 |
| Spark, Scala 2.12 | hudi-spark-bundle_2.12 | `-Pscala-2.12` | The default profile (for Spark 2.4.4) with Scala 2.12 |
For example,
```shell
# Build against Spark 3.2.x (the default build shipped with the public Spark 3 bundle)
mvn clean package -DskipTests -Pspark3.2

# Build against Spark 3.1.x
mvn clean package -DskipTests -Pspark3.1

# Build against Spark 2.4.4 and Scala 2.12
mvn clean package -DskipTests -Pspark2.4,scala-2.12
```
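For scripting builds, the table above can be condensed into a small helper. This is only an illustrative sketch — the `hudi_profile` function is hypothetical and not part of the repository — mapping a target Spark line to the Maven profile flag listed in the table:

```shell
# Hypothetical helper (not in Hudi): echo the Maven profile flag
# for a target Spark line, per the table above.
# Note: it does not cover the scala-2.12 variants of the 2.4 profiles.
hudi_profile() {
  case "$1" in
    2.4)   echo "-Pspark2.4" ;;
    3.1)   echo "-Pspark3.1" ;;
    3|3.2) echo "-Pspark3.2" ;;
    *)     echo "" ;;   # default profile: Spark 2.4.4, Scala 2.11
  esac
}

hudi_profile 3.1   # prints -Pspark3.1
```

It could then be used as, for example, `mvn clean package -DskipTests $(hudi_profile 3.2)`.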
Starting from version 0.11, Hudi no longer requires `spark-avro` to be specified using `--packages`.
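As a sketch of what that means in practice — the jar path and version here are assumptions based on a local `-Pspark3.2` build, and a Spark 3.2 installation on the `PATH` — the shell can be started against the bundle jar alone, with no `--packages org.apache.spark:spark-avro...` flag:

```shell
# Assumed: Hudi was built with -Pspark3.2 and a Spark 3.2 spark-shell is on the PATH.
# Adjust the glob to the actual bundle version produced by your build.
spark-shell \
  --jars `ls packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-*.jar` \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```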
Unit tests can be run with the Maven profile `unit-tests`:

```shell
mvn -Punit-tests test
```
Functional tests, which are tagged with `@Tag("functional")`, can be run with the Maven profile `functional-tests`:

```shell
mvn -Pfunctional-tests test
```
To run tests with Spark event logging enabled, define the Spark event log directory. This allows visualizing the test DAG and stages using the Spark History Server UI.

```shell
mvn -Punit-tests test -DSPARK_EVLOG_DIR=/path/for/spark/event/log
```
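To browse those event logs afterwards, one option — assuming a local Spark distribution at `$SPARK_HOME` — is to point the Spark History Server at the same directory:

```shell
# Assumed setup: SPARK_HOME points at a local Spark distribution.
# spark.history.fs.logDirectory must match the -DSPARK_EVLOG_DIR path used above.
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:///path/for/spark/event/log"
$SPARK_HOME/sbin/start-history-server.sh
```

By default the History Server UI is then served at http://localhost:18080.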
Please visit https://hudi.apache.org/docs/quick-start-guide.html to quickly explore Hudi's capabilities using `spark-shell`.