tag | e36518cc31994b1cfc9f97f44285ee2059e6362a | |
---|---|---|
tagger | Udit Mehrotra <udit.mehrotra90@gmail.com> | Fri Aug 20 14:24:10 2021 -0700 |
object | c9c9efa3ab8947c69e86e6f068199a6d75a7fbdc |
0.9.0
commit | c9c9efa3ab8947c69e86e6f068199a6d75a7fbdc | |
---|---|---|
author | Udit Mehrotra <udit.mehrotra90@gmail.com> | Fri Aug 20 13:45:17 2021 -0700 |
committer | Udit Mehrotra <udit.mehrotra90@gmail.com> | Fri Aug 20 13:45:17 2021 -0700 |
tree | 9602ef0de88d6f791a24c29e0f095fa5e0e3731d | |
parent | 2e69c23a29de350dcc9b92a3b1cb12c133eca036 |
Bumping release candidate number 2
Apache Hudi (pronounced "hoodie") stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (cloud stores, HDFS, or any Hadoop FileSystem-compatible storage).
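Hudi supports three types of queries:
- Snapshot Query: provides snapshot queries on real-time data, using a combination of columnar and row-based storage (e.g. Parquet + Avro).
- Incremental Query: provides a change stream with records inserted or updated after a point in time.
- Read Optimized Query: provides snapshot query performance via purely columnar storage (e.g. Parquet).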
Learn more about Hudi at https://hudi.apache.org
Prerequisites for building Apache Hudi:
- Unix-like system (like Linux, Mac OS X)
- Java 8
- Git
- Maven
# Checkout code and build
git clone https://github.com/apache/hudi.git && cd hudi
mvn clean package -DskipTests

# Start command
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
  --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
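The `ls` inside backticks in the start command hands whatever the glob matches straight to `--jars`, so a missing build or several matching jars can produce a confusing spark-shell failure. A minimal sketch of a stricter lookup follows. `find_hudi_bundle` is a hypothetical helper, not part of Hudi, and it assumes the bundle keeps the `packaging/hudi-spark-bundle/target/hudi-spark-bundle_<scala>-*.*.*-SNAPSHOT.jar` naming used in the start command.

```shell
# Hypothetical helper (not part of Hudi): print the path of exactly one
# Hudi Spark bundle jar, or fail with a clear message.
# Usage: find_hudi_bundle [hudi-repo-root] [scala-binary-version]
find_hudi_bundle() {
  local root="${1:-.}"
  local scala="${2:-2.11}"
  local dir="$root/packaging/hudi-spark-bundle/target"
  local count=0
  local match=""
  local f
  for f in "$dir"/hudi-spark-bundle_"$scala"-*.*.*-SNAPSHOT.jar; do
    # An unmatched glob is left as the literal pattern, so skip non-files.
    if [ -f "$f" ]; then
      count=$((count + 1))
      match="$f"
    fi
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one bundle jar in $dir, found $count" >&2
    return 1
  fi
  printf '%s\n' "$match"
}
```

For example, pass `--jars "$(find_hudi_bundle .)"` to spark-shell in place of the backtick `ls`. For a Scala 2.12 build, `find_hudi_bundle . 2.12` would apply, assuming that build names the jar with a `_2.12` suffix.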
To build the Javadoc for all Java and Scala classes:
# Javadoc generated under target/site/apidocs
mvn clean javadoc:aggregate -Pjavadocs
The default Scala version supported is 2.11. To build for Scala 2.12, use the scala-2.12 profile:
mvn clean package -DskipTests -Dscala-2.12
The default Spark version supported is 2.4.4. To build for Spark 3.0.0, use the spark3 profile:
mvn clean package -DskipTests -Dspark3
By default, the Hudi bundle jar includes the spark-avro module. To build without spark-avro, use the spark-shade-unbundle-avro profile and supply spark-avro at runtime instead, as the --packages option in the start command does:
# Checkout code and build
git clone https://github.com/apache/hudi.git && cd hudi
mvn clean package -DskipTests -Pspark-shade-unbundle-avro

# Start command
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
  --packages org.apache.spark:spark-avro_2.11:2.4.4 \
  --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
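When spark-avro is unbundled, the `--packages` coordinate has to agree with the Spark distribution in use: the artifact suffix is the Scala binary version and the version is the Spark version, as in `spark-avro_2.11:2.4.4` for the Spark 2.4.4 / Scala 2.11 default. The sketch below builds that coordinate so both values are stated explicitly; `spark_avro_package` is a hypothetical helper, not part of Hudi or Spark.

```shell
# Hypothetical helper (not part of Hudi or Spark): print the spark-avro
# Maven coordinate for a Scala binary version and a Spark version.
# Usage: spark_avro_package <scala-binary-version> <spark-version>
spark_avro_package() {
  local scala_binary="$1"
  local spark_version="$2"
  if [ -z "$scala_binary" ] || [ -z "$spark_version" ]; then
    echo "usage: spark_avro_package <scala-binary-version> <spark-version>" >&2
    return 1
  fi
  printf 'org.apache.spark:spark-avro_%s:%s\n' "$scala_binary" "$spark_version"
}
```

For a Spark 3.0.0 setup, which is built against Scala 2.12, this would be `--packages "$(spark_avro_package 2.12 3.0.0)"`.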
Unit tests can be run with the Maven profile unit-tests:
mvn -Punit-tests test
Functional tests, which are tagged with @Tag("functional"), can be run with the Maven profile functional-tests:
mvn -Pfunctional-tests test
To run tests with Spark event logging enabled, define the Spark event log directory. This lets you visualize the test DAGs and stages in the Spark History Server UI.
mvn -Punit-tests test -DSPARK_EVLOG_DIR=/path/for/spark/event/log
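Spark generally refuses to start when its event log directory does not exist, and Maven Surefire's default working directory is each module's own base directory, so a relative path may not land where you expect. The sketch below, using hypothetical helper names that are not part of Hudi, creates the directory, passes its absolute path to the test command, and points a Spark History Server at the same location through the standard `spark.history.fs.logDirectory` setting. It assumes `SPARK_HOME` points at a Spark distribution.

```shell
# Hypothetical helpers; the names are illustrative and not part of Hudi.

# Create the event log directory if needed and print its absolute path.
prepare_evlog_dir() {
  local dir="$1"
  if [ -z "$dir" ]; then
    echo "usage: prepare_evlog_dir <dir>" >&2
    return 1
  fi
  # CDPATH is cleared so that cd never echoes the directory it changed to.
  mkdir -p -- "$dir" && (CDPATH= cd -- "$dir" && pwd)
}

# Run the unit tests with Spark event logs written under <dir>.
run_unit_tests_with_evlog() {
  local evlog
  evlog=$(prepare_evlog_dir "$1") || return 1
  mvn -Punit-tests test -DSPARK_EVLOG_DIR="$evlog"
}

# Start a Spark History Server that reads event logs from <dir>.
# Its web UI listens on port 18080 unless configured otherwise.
start_history_server() {
  local evlog
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME must point at a Spark distribution" >&2
    return 1
  fi
  evlog=$(prepare_evlog_dir "$1") || return 1
  SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=$evlog" \
    "$SPARK_HOME/sbin/start-history-server.sh"
}
```

For example, `run_unit_tests_with_evlog /tmp/hudi-spark-events` followed by `start_history_server /tmp/hudi-spark-events`, then browse to http://localhost:18080.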
Please visit https://hudi.apache.org/docs/quick-start-guide.html to quickly explore Hudi's capabilities using spark-shell.