
Relationship between the modules

This repo contains the code that integrates Hudi with Spark. It is split into the following modules:

  hudi-spark
  hudi-spark3.3.x
  hudi-spark3.4.x
  hudi-spark3.5.x
  hudi-spark3-common
  hudi-spark-common

  • hudi-spark contains the code shared by the supported Spark 3 versions.
  • hudi-spark3.3.x contains the code that is specific to Spark 3.3.x.
  • hudi-spark3.4.x contains the code that is specific to Spark 3.4.x.
  • hudi-spark3.5.x contains the code that is specific to Spark 3.5.x.
  • hudi-spark3-common contains code reused across Spark 3.x versions.
  • hudi-spark-common also contains code reused across Spark 3.x versions.
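To make the layering concrete, here is a minimal sketch that encodes the module descriptions above as a lookup from a Spark version to the modules whose code applies to it. The function name modules_for_spark is hypothetical and is not a Hudi API; the real module selection happens in Hudi's build and runtime code, which this README does not describe.

```python
from typing import List

# Hypothetical helper: encodes the module list above, not a Hudi API.
# Version-specific modules, keyed by the Spark 3 minor version.
_VERSION_SPECIFIC = {3: "hudi-spark3.3.x", 4: "hudi-spark3.4.x", 5: "hudi-spark3.5.x"}

# Modules whose code is shared by all supported Spark 3 versions.
_SHARED = ["hudi-spark", "hudi-spark-common", "hudi-spark3-common"]


def modules_for_spark(version: str) -> List[str]:
    """Return the shared modules plus the version-specific one for `version`."""
    parts = version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"unrecognised Spark version: {version!r}")
    major, minor = int(parts[0]), int(parts[1])
    if major != 3 or minor not in _VERSION_SPECIFIC:
        # Only the Spark lines listed in this README are covered.
        raise ValueError(f"no module listed for Spark {major}.{minor}")
    return _SHARED + [_VERSION_SPECIFIC[minor]]
```

For example, modules_for_spark("3.4.1") yields the three shared modules followed by hudi-spark3.4.x, while an unlisted line such as 2.4.8 raises ValueError.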

Time travel

  • HoodieSpark3_2ExtendedSqlAstBuilder is forked from Spark 3.2's org.apache.spark.sql.catalyst.parser.AstBuilder (comments mark the forked code) and adds a withTimeTravel method.
  • SqlBase.g4 is forked from Spark 3.2's SQL parser grammar (comments mark the forked code) and adds the Spark SQL syntax TIMESTAMP AS OF and VERSION AS OF.
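The grammar extension lets a query pin a table to a point in its history inline, for example SELECT * FROM hudi_tbl TIMESTAMP AS OF '2022-03-07 09:16:28.100'. The toy sketch below only illustrates what that clause carries (table, kind of time travel, literal value). It is a regex stand-in, not the ANTLR grammar in SqlBase.g4 and not Hudi's withTimeTravel; the names TimeTravel, parse_time_travel and hudi_tbl are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TimeTravel(NamedTuple):
    table: str  # table identifier, e.g. "db.t"
    kind: str   # "TIMESTAMP" or "VERSION"
    value: str  # literal as written, with surrounding quotes stripped


# Toy pattern: FROM <table> (TIMESTAMP | VERSION) AS OF <'string' | integer>
_PATTERN = re.compile(
    r"\bFROM\s+([\w.]+)\s+(TIMESTAMP|VERSION)\s+AS\s+OF\s+('([^']*)'|\d+)",
    re.IGNORECASE,
)


def parse_time_travel(sql: str) -> Optional[TimeTravel]:
    """Extract the time travel clause from `sql`, or None if there is none."""
    m = _PATTERN.search(sql)
    if m is None:
        return None
    # Group 4 holds the unquoted string literal; otherwise use the integer.
    value = m.group(4) if m.group(4) is not None else m.group(3)
    return TimeTravel(m.group(1), m.group(2).upper(), value)
```

A real SQL parser also handles expressions, aliases and nested queries, which is why the project forks Spark's full grammar rather than pattern-matching query text.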

Spark versions that support the time travel syntax:

  Spark version   Time travel supported
  2.4.x           No
  3.0.x           No
  3.1.2           No
  3.2.0           Yes
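As a sketch only, the support table can be read as a simple version gate. The names supports_time_travel_sql and MIN_TIME_TRAVEL_VERSION are hypothetical, and treating every release from 3.2 onward as supported is an assumption drawn from the 3.2.0 row and the SPARK-37219 note, not from a Hudi API.

```python
from typing import Tuple

# First Spark line marked "Yes" in the support table (assumption: later
# lines keep supporting the syntax).
MIN_TIME_TRAVEL_VERSION = (3, 2)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as "3.1.2" or "2.4.x"."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def supports_time_travel_sql(version: str) -> bool:
    """True if `version` is at or above the first supported Spark line."""
    return parse_major_minor(version) >= MIN_TIME_TRAVEL_VERSION
```

Comparing (major, minor) tuples avoids the string-ordering pitfall where "3.10" would sort before "3.2".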

To improve:

Spark 3.3 supports the time travel syntax natively (see SPARK-37219). Once Spark 3.3 is released, the following files will be removed:

  • In hudi-spark3.3.x: HoodieSpark3_3ExtendedSqlAstBuilder.scala, HoodieSpark3_3ExtendedSqlParser.scala, TimeTravelRelation.scala, SqlBase.g4 and HoodieSqlBase.g4 (tracking Jira: HUDI-4468).

Other improvements in progress:

  • Port the classes borrowed from Spark 3.3 (HUDI-4467).