Multi FS Support

 - Reviving PR 191, to create the FileSystem off the actual path (see the first sketch after this list)
 - Streamline all filesystem access through HoodieTableMetaClient
 - Serialize the Hadoop Conf from the Spark context and pass it to executor code as well (see the serialization sketch below)
 - Pick up env vars prefixed with HOODIE_ENV_ into the Configuration object (see the env-var sketch below)
 - Clean up usage of FSUtils.getFS, piggybacking on HoodieTableMetaClient.getFS
 - Add s3a to the supported schemes & support escaping "." in env var names
 - Tests use HoodieTestUtils.getDefaultHadoopConf
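The core of the PR 191 revival is deriving the FileSystem from each path's own scheme rather than from the cluster-wide fs.defaultFS, so a single job can span HDFS, S3, and the local filesystem. A minimal sketch of that idea; the class and method names here are illustrative, not necessarily the exact FSUtils.getFS contract:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathAwareFs {

  // Resolve the FileSystem from the path's own scheme (hdfs://, s3a://, file://, ...)
  // instead of fs.defaultFS, so one job can touch multiple filesystems.
  public static FileSystem getFs(String pathStr, Configuration conf) {
    try {
      return new Path(pathStr).getFileSystem(conf);
    } catch (IOException e) {
      throw new RuntimeException("Could not resolve FileSystem for " + pathStr, e);
    }
  }
}
```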
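Hadoop's Configuration is not java.io.Serializable, so passing it from the Spark driver into executor closures requires a wrapper. A sketch of one such wrapper, built on Configuration's Writable methods (the class name is hypothetical; the wrapper actually used in this change may differ):

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.hadoop.conf.Configuration;

// Serializable holder so the driver can ship the Hadoop Configuration
// inside Spark closures to executor code. (Name is illustrative.)
public class SerializableConfiguration implements Serializable {

  private transient Configuration conf;

  public SerializableConfiguration(Configuration conf) {
    this.conf = conf;
  }

  public Configuration get() {
    return conf;
  }

  // Delegate to Configuration's Writable serialization.
  private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();
    conf.write(out);
  }

  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    conf = new Configuration(false); // skip loading defaults; readFields restores the state
    conf.readFields(in);
  }
}
```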
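For the env-var pickup, a sketch of how HOODIE_ENV_-prefixed variables could be folded into the Configuration. The _DOT_ escape token shown here is an assumption about how "." (which is illegal in env var names) gets escaped:

```java
import java.util.Map;

import org.apache.hadoop.conf.Configuration;

public class EnvConfExample {

  private static final String HOODIE_ENV_PROPS_PREFIX = "HOODIE_ENV_";

  // Copy HOODIE_ENV_-prefixed environment variables into the Hadoop Configuration,
  // translating the assumed "_DOT_" escape token back into ".".
  public static Configuration prepareHadoopConf(Configuration conf) {
    for (Map.Entry<String, String> entry : System.getenv().entrySet()) {
      if (entry.getKey().startsWith(HOODIE_ENV_PROPS_PREFIX)) {
        String key = entry.getKey()
            .replace(HOODIE_ENV_PROPS_PREFIX, "")
            .replace("_DOT_", ".");
        conf.set(key, entry.getValue());
      }
    }
    return conf;
  }
}
```

Under that assumption, exporting HOODIE_ENV_fs_DOT_s3a_DOT_impl=org.apache.hadoop.fs.s3a.S3AFileSystem would surface as fs.s3a.impl in the Configuration, which is one way the added s3a scheme support could be wired up.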
README.md

Hudi

Hudi (pronounced Hoodie) stands for Hadoop Upserts anD Incrementals. Hudi manages storage of large analytical datasets on HDFS and serves them out via two types of tables:

  • Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
  • Near-Real-time Table (WIP) - Provides queries on real-time data, using a combination of columnar & row-based storage (e.g. Parquet + Avro)

For more, head over here.