Adding hoodie-spark to support Spark Datasource for Hoodie

 - Write paths for COW/MOR work fully (see the write sketch after this list)
 - Read with RO view works on both storages* (see the read sketch after this list)
 - Incremental view supported on COW
 - Refactored methods out of HoodieReadClient, so it now contains only key-based access
 - HoodieDataSourceHelpers class can now be used to construct inputs to the datasource
 - Tests in hoodie-client using new helpers and mechanisms
 - Basic tests around save modes & inserts/upserts (more to follow)
 - Bumped Scala to 2.11, since 2.10 is deprecated & causes complaints from scalatest
 - Updated documentation to describe usage
 - New sample app written using the DataSource API
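
To make the new surface concrete, here is a minimal write sketch against the datasource. It is a sketch, not the shipped sample app: it assumes the source is registered as com.uber.hoodie, and the option keys and field names (_row_key, partition, timestamp) are illustrative — the authoritative constants live in the hoodie-spark module.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("hoodie-write-sketch").getOrCreate()

    // Hypothetical input: any DataFrame carrying a record key, a partition
    // path field, and a field used to pick the latest among duplicate keys.
    val inputDF = spark.read.json("/tmp/trips-input")

    inputDF.write
      .format("com.uber.hoodie")
      .option("hoodie.datasource.write.recordkey.field", "_row_key")
      .option("hoodie.datasource.write.partitionpath.field", "partition")
      .option("hoodie.datasource.write.precombine.field", "timestamp")
      .option("hoodie.table.name", "hoodie_trips")
      // SaveMode.Append upserts into an existing dataset; SaveMode.Overwrite
      // re-creates it; SaveMode.ErrorIfExists fails if data is already present.
      .mode(SaveMode.Append)
      .save("/tmp/hoodie/trips")

Repeated Append writes carrying the same keys exercise the upsert path, which is what the save-mode tests above cover.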
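And a matching read sketch covering the RO view plus the incremental view on COW, driven by HoodieDataSourceHelpers. Again hedged: the view-type/begin-instant option keys, the "000" starting commit, and the exact helper signatures shown here should be double-checked against the hoodie-spark sources.

    import com.uber.hoodie.HoodieDataSourceHelpers
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("hoodie-read-sketch").getOrCreate()
    val basePath = "/tmp/hoodie/trips"
    val fs = FileSystem.get(new Path(basePath).toUri,
      spark.sparkContext.hadoopConfiguration)

    // RO view: a plain columnar scan of the latest committed data,
    // globbed down to the partition depth.
    val roDF = spark.read
      .format("com.uber.hoodie")
      .load(basePath + "/*/*")

    // Incremental view (COW only for now): pull records written after the
    // last commit this consumer saw. "000" is a hypothetical starting point.
    val lastSeen = "000"
    if (HoodieDataSourceHelpers.hasNewCommits(fs, basePath, lastSeen)) {
      val incDF = spark.read
        .format("com.uber.hoodie")
        .option("hoodie.datasource.view.type", "incremental")
        .option("hoodie.datasource.read.begin.instanttime", lastSeen)
        .load(basePath)
      incDF.show()
    }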
45 files changed
tree: 7cb72248968ce86dbb7057d906668fe93014ed6c
  deploy/
  docs/
  hoodie-cli/
  hoodie-client/
  hoodie-common/
  hoodie-hadoop-mr/
  hoodie-hive/
  hoodie-spark/
  hoodie-utilities/
  .gitignore
  .travis.yml
  _config.yml
  CHANGELOG.md
  LICENSE.txt
  pom.xml
  README.md
README.md

Hoodie

Hoodie manages storage of large analytical datasets on HDFS and serves them out via two types of tables:

  • Read Optimized Table - Provides excellent query performance via purely columnar storage (e.g. Parquet)
  • Near-Real time Table (WIP) - Provides queries on real-time data, using a combination of columnar & row-based storage (e.g. Parquet + Avro)

For more, head over here