DRILL-4982: Separate Hive reader classes for different data formats to improve performance.

1. Separating the Hive reader classes by data format lets each class apply optimizations specific to its format. This separation avoids degrading scan performance.

2. Do not apply the skip header/footer mechanism to most Hive formats. This mechanism introduces extra checks on each incoming record.
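
The two points above can be illustrated with a minimal sketch. It is not the code from this change: every class and method name below (`SketchReader`, `SketchTextReader`, `SketchPlainReader`, `SketchReaders.forFormat`) is hypothetical, and the choice of "text" as the one format that still honors header/footer skipping is an assumption made for illustration. The idea is that the skip logic lives only in the reader that needs it, so the other readers scan without any per-record skip check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: one reader implementation per data format.
interface SketchReader {
    List<String> read(List<String> records);
}

// Assumed format with header/footer lines: skipping is handled only here,
// once per scan, by computing the range of records to keep.
final class SketchTextReader implements SketchReader {
    private final int headerLines;
    private final int footerLines;

    SketchTextReader(int headerLines, int footerLines) {
        this.headerLines = headerLines;
        this.footerLines = footerLines;
    }

    @Override
    public List<String> read(List<String> records) {
        int from = Math.min(headerLines, records.size());
        int to = Math.max(from, records.size() - footerLines);
        return new ArrayList<>(records.subList(from, to));
    }
}

// Reader for formats without header/footer lines: no skip checks at all.
final class SketchPlainReader implements SketchReader {
    @Override
    public List<String> read(List<String> records) {
        return new ArrayList<>(records);
    }
}

// Hypothetical factory: the format is inspected once, not once per record.
final class SketchReaders {
    static SketchReader forFormat(String format, int headerLines, int footerLines) {
        if ("text".equals(format)) {
            return new SketchTextReader(headerLines, footerLines);
        }
        return new SketchPlainReader();
    }
}
```

With this layout the format decision is made once when the reader is created, which is the kind of per-record overhead the commit message describes removing.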

close apache/drill#638
6 files changed
tree: 5caa40bd6e07158377f7ec9fc4f96a8c47ef40e7
  1. common/
  2. contrib/
  3. distribution/
  4. exec/
  5. logical/
  6. protocol/
  7. sample-data/
  8. src/
  9. tools/
  10. .gitignore
  11. .travis.yml
  12. header
  13. INSTALL.md
  14. KEYS
  15. LICENSE
  16. NOTICE
  17. pom.xml
  18. README.md
README.md

Apache Drill

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.

Quickstart

Please read INSTALL.md for setting up and running Apache Drill.

More Information

Please see the Apache Drill Website or the Apache Drill Documentation for more information including:

  • Remote Execution Installation Instructions
  • Information about how to submit logical and distributed physical plans
  • More example queries and sample data
  • Ways to get involved or discuss Drill

Join the community!

Apache Drill is an Apache Software Foundation project and welcomes all types of contributions. Please say hello on the Apache Drill mailing list or join our Google Hangouts. More information can be found at the Apache Drill website.