Running Parquet Benchmarks

The Parquet benchmarks in this module are run using the OpenJDK Java Microbenchmark Harness (JMH).

First, build the parquet-benchmarks module. This creates an uber-jar containing the Parquet classes, all of their dependencies, and a main class that launches the JMH tool:

mvn --projects parquet-benchmarks -amd -DskipTests -Denforcer.skip=true clean package

JMH doesn't have the notion of “benchmark suites”, but there are certain benchmarks that make sense to group together or to run in isolation during development. The ./parquet-benchmarks/run.sh script can be used to launch all or some benchmarks:

# Show more information about the run script and the available arguments.
./parquet-benchmarks/run.sh

# Show more information on the available JMH options.
./parquet-benchmarks/run.sh all -help

# Run every benchmark once (~20 minutes).
./parquet-benchmarks/run.sh all -wi 0 -i 1 -f 1

# A more rigorous run of all benchmarks, saving a report for comparison.
./parquet-benchmarks/run.sh all -wi 5 -i 5 -f 3 -rff /tmp/benchmark1.json

# Run a benchmark "suite" built into the script, with JMH defaults (about 30 minutes).
./parquet-benchmarks/run.sh checksum

# Run one specific benchmark using a regex.
./parquet-benchmarks/run.sh all org.apache.parquet.benchmarks.NestedNullWritingBenchmarks

# Manually clean up any state left behind from a previous run.
./parquet-benchmarks/run.sh clean
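
The JMH options used in the examples above are -wi (warmup iterations), -i (measurement iterations), -f (forks) and -rff (result file). As a rough sketch of how these fit together, the hypothetical helper below (jmh_args is not part of run.sh; the name and argument order are assumptions made for illustration) assembles the option string from plain numbers and only prints the resulting commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of run.sh: builds a JMH option string.
#   $1 = warmup iterations (-wi)
#   $2 = measurement iterations (-i)
#   $3 = forks (-f)
#   $4 = optional report file (-rff)
jmh_args() {
  warmups=$1
  iterations=$2
  forks=$3
  report=${4:-}
  args="-wi $warmups -i $iterations -f $forks"
  # Only request a report file when one was given.
  if [ -n "$report" ]; then
    args="$args -rff $report"
  fi
  printf '%s\n' "$args"
}

# Print (do not execute) the commands for a quick run and a more rigorous run.
echo "./parquet-benchmarks/run.sh all $(jmh_args 0 1 1)"
echo "./parquet-benchmarks/run.sh all $(jmh_args 5 5 3 /tmp/benchmark1.json)"
```

The two printed lines match the quick and rigorous invocations listed above, so the helper can be used to vary one setting at a time, for example the number of forks, while keeping the others fixed.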