File Format Benchmarks

These benchmarks compare the following big data file formats:

  • Avro
  • JSON
  • ORC
  • Parquet

To build this library:

% mvn clean package

To fetch the source data:

% ./fetch-data.sh

To generate the derived data:

% java -jar target/orc-benchmarks-*-uber.jar generate data

To run a scan of all of the data:

% java -jar target/orc-benchmarks-*-uber.jar scan data

To run the full read benchmark:

% java -jar target/orc-benchmarks-*-uber.jar read-all data

To run the column projection benchmark:

% java -jar target/orc-benchmarks-*-uber.jar read-some data
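The commands above can be chained in a small wrapper. The following is only a sketch, not part of the project: the function names `find_uber_jar` and `run_benchmarks` are hypothetical, and it assumes you are in the benchmark directory, that `mvn clean package` and `./fetch-data.sh` have already been run, and that the build left exactly one uber jar in `target/`. Only the sub-commands listed in this README are used.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the benchmarks).

# find_uber_jar [DIR]: print the single jar matching DIR/orc-benchmarks-*-uber.jar.
# Fails if the glob matches no file or more than one file.
find_uber_jar() {
  dir=${1:-target}
  # Expand the glob into the function's positional parameters.
  set -- "$dir"/orc-benchmarks-*-uber.jar
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one uber jar in $dir" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# run_benchmarks [DATA_DIR]: run the README's sub-commands in order,
# stopping at the first one that fails.
run_benchmarks() {
  data=${1:-data}
  jar=$(find_uber_jar target) || return 1
  for cmd in generate scan read-all read-some; do
    echo "== $cmd ==" >&2
    java -jar "$jar" "$cmd" "$data" || return 1
  done
}
```

After sourcing the file, `run_benchmarks data` would run the generate, scan, full read, and column projection steps in sequence. Resolving the jar once up front avoids passing an unexpanded or ambiguous glob to `java -jar` when the build output is missing or when several versions are present.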