commit 8919be6a5d8038db7265bfd7459d72fbd545f133
author:    Balaji Varadarajan <varadarb@uber.com>  Sun Jun 28 02:04:50 2020 -0700
committer: GitHub <noreply@github.com>             Sun Jun 28 02:04:50 2020 -0700
tree:      b3931b411a52859c0c0e3f94b9dfaf305de864fa
parent:    31247e9b3417871d90454c1c62f8d3bdcbd7ffa9

[HUDI-855] Run Cleaner async with writing (#1577)
- Cleaner can now run concurrently with write operations
- Configs to turn on/off

Co-authored-by: Vinoth Chandar <vinoth@apache.org>
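The "configs to turn on/off" mentioned in the commit message would look roughly like the fragment below. The key names are assumptions based on Hudi's write-config surface around this release; verify them against the configuration reference for your Hudi version.

```properties
# Assumed config keys (verify against your Hudi version's config reference):
# keep automatic cleaning enabled, and run it asynchronously with the writer
hoodie.clean.automatic=true
hoodie.clean.async=true
```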
Apache Hudi (pronounced "Hoodie") stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (cloud stores, HDFS, or any Hadoop FileSystem compatible storage).
Hudi supports three types of queries:
- **Snapshot Queries** - Provides snapshot queries on real-time data, using a combination of columnar and row-based storage.
- **Incremental Queries** - Provides a change stream of records inserted or updated after a given point in time.
- **Read Optimized Queries** - Provides excellent snapshot query performance via purely columnar storage (e.g. Parquet).
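The three query types above map to a read option on the Spark DataSource API. A minimal spark-shell sketch, assuming the `hoodie.datasource.query.type` option key and a placeholder table path:

```scala
// Hedged sketch: reading one Hudi table with each query type.
// The table path is a placeholder; option keys are assumed from the
// Hudi DataSource read options of this era.
val basePath = "/tmp/hudi_table" // placeholder path

// Snapshot query: latest view of the table
val snapshotDF = spark.read.format("hudi")
  .option("hoodie.datasource.query.type", "snapshot")
  .load(basePath)

// Incremental query: only records changed after the given instant time
val incrementalDF = spark.read.format("hudi")
  .option("hoodie.datasource.query.type", "incremental")
  .option("hoodie.datasource.read.begin.instanttime", "20200101000000")
  .load(basePath)

// Read optimized query: columnar base files only
val readOptimizedDF = spark.read.format("hudi")
  .option("hoodie.datasource.query.type", "read_optimized")
  .load(basePath)
```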
Learn more about Hudi at https://hudi.apache.org
Prerequisites for building Apache Hudi:
```
# Checkout code and build
git clone https://github.com/apache/hudi.git && cd hudi
mvn clean package -DskipTests -DskipITs

# Start command
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
  --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```
To build the Javadoc for all Java and Scala classes:
```
# Javadoc generated under target/site/apidocs
mvn clean javadoc:aggregate -Pjavadocs
```
The default Scala version supported is 2.11. To build for Scala 2.12, use the `scala-2.12` profile:

```
mvn clean package -DskipTests -DskipITs -Dscala-2.12
```
The default hudi-spark-bundle jar bundles the spark-avro module. To build without the spark-avro module, use the `spark-shade-unbundle-avro` profile and pass spark-avro as a package to spark-shell instead:

```
# Checkout code and build
git clone https://github.com/apache/hudi.git && cd hudi
mvn clean package -DskipTests -DskipITs -Pspark-shade-unbundle-avro

# Start command
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
  --packages org.apache.spark:spark-avro_2.11:2.4.4 \
  --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```
Please visit https://hudi.apache.org/docs/quick-start-guide.html to quickly explore Hudi's capabilities using spark-shell.
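Once the shell is up, the quick-start flow can be sketched roughly as below. This is a hedged sketch, not the guide itself: the table name, base path, and sample-data generator usage are placeholders/assumptions modeled on the quick-start guide, and option keys should be checked against your Hudi version.

```scala
// Assumed helpers from the quick-start guide; verify against your version.
import org.apache.hudi.QuickstartUtils._
import org.apache.spark.sql.SaveMode
import scala.collection.JavaConverters._

val tableName = "hudi_trips_cow"            // placeholder table name
val basePath  = "file:///tmp/hudi_trips_cow" // placeholder base path
val dataGen   = new DataGenerator

// Generate and write some sample records
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts.asScala, 2))
df.write.format("hudi")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.datasource.write.recordkey.field", "uuid")
  .option("hoodie.datasource.write.partitionpath.field", "partitionpath")
  .option("hoodie.table.name", tableName)
  .mode(SaveMode.Overwrite)
  .save(basePath)

// Read the table back as a snapshot query
val tripsDF = spark.read.format("hudi").load(basePath + "/*/*/*/*")
tripsDF.select("uuid", "partitionpath").show()
```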