Fury Java Benchmark

Fury Java Benchmark contains benchmarks for:

  • Fury
  • JDK
  • Hessian
  • Kryo
  • Fst
  • Protostuff
  • Jsonb

Part of the benchmark data, as well as the Kryo benchmark suite, is based on the Kryo benchmark.

How to run

# Install fury
cd ../java && mvn install -DskipTests && cd -

# build benchmark jar
mvn package

# run benchmark
nohup java -jar target/benchmarks.jar -f 5 -wi 3 -i 5 -t 1 -w 3s -r 5s -rf csv >bench.log 2>&1 &
java -jar target/benchmarks.jar "io.*\.deserialize$" -f 1 -wi 1 -i 3 -t 1 -w 2s -r 2s -rf csv

Maven run

cd .. && mvn -T10 install -DskipTests -Dcheckstyle.skip -Dlicense.skip -Dmaven.javadoc.skip
mvn exec:java -Dexec.args="-f 3 -wi 5 -i 15 -t 1 -w 2s -r 2s -rf csv"

See org.openjdk.jmh.runner.options.CommandLineOptions for more information about JMH options:

-f Number of times to fork each benchmark (0 disables forking).
-wi Number of warmup iterations to do.
-i Number of measurement iterations to do.
-t Number of worker threads to run with.
-w Time to spend at each warmup iteration.
-r Time to spend at each measurement iteration.
-rf Result format type.
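
When several runs with different settings are scripted, it can help to assemble these options in one place. The following is a minimal Python sketch, not part of the Fury repository: the helper name jmh_command and its parameter names are hypothetical, and the defaults simply mirror the first command under "How to run".

```python
def jmh_command(pattern=None, forks=5, warmup_iters=3, measure_iters=5,
                threads=1, warmup_time="3s", measure_time="5s",
                result_format="csv", jar="target/benchmarks.jar"):
    """Build the argument list for a JMH run (hypothetical helper)."""
    cmd = ["java", "-jar", jar]
    if pattern:
        # Optional regex selecting which benchmarks to run.
        cmd.append(pattern)
    cmd += [
        "-f", str(forks),           # forks per benchmark
        "-wi", str(warmup_iters),   # warmup iterations
        "-i", str(measure_iters),   # measurement iterations
        "-t", str(threads),         # worker threads
        "-w", warmup_time,          # time per warmup iteration
        "-r", measure_time,         # time per measurement iteration
        "-rf", result_format,       # result format type
    ]
    return cmd
```

The returned list can be printed with " ".join(...) for inspection or passed to subprocess.run to start the benchmark.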

Save the benchmark data to a directory of your choice, then run tool.py to plot graphs.

Plotting

Fury uses pandas to process the JMH data and matplotlib for plotting.

pip install pandas matplotlib
python analyze.py
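
To get a feel for the data before plotting, the CSV written by -rf csv can be read directly. The sketch below is an illustration only and does not reproduce analyze.py. It uses the standard csv module instead of pandas so that it runs without extra dependencies, and it assumes the usual JMH CSV header with the columns "Benchmark", "Score", and "Unit". The function names load_scores and short_name are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_scores(csv_text):
    """Map each benchmark name to a list of (score, unit) pairs."""
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumes scores use "." as the decimal separator.
        scores[row["Benchmark"]].append((float(row["Score"]), row["Unit"]))
    return dict(scores)


def short_name(benchmark):
    """Keep only 'Class.method' from a fully qualified benchmark name."""
    return ".".join(benchmark.split(".")[-2:])
```

For a results file, pass its contents as text, for example load_scores(open(path).read()), where path points at the CSV produced by the run.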

Debug

Use async-profiler to generate a flame graph.

export pic=s1.html
nohup java -jar target/benchmarks.jar 'io.*Fury.*deserialize*' -f 1 -wi 1 -i 1 -t 1 -w 1s -r 35s -rf csv &
profiler.sh  -d 30 -f $pic `jps | grep ForkedMain | awk '{print $1}'`
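
The jps | grep ForkedMain | awk '{print $1}' pipeline above returns every matching pid, so it can be worth checking that exactly one forked JVM is running before attaching the profiler. A minimal Python sketch of the same filtering follows; the function name forked_main_pids is hypothetical, and it assumes jps prints one "<pid> <main class>" pair per line.

```python
def forked_main_pids(jps_output):
    """Return the pids of JMH forked JVMs found in the text printed by jps."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        # Keep lines whose main-class column mentions ForkedMain.
        if len(parts) >= 2 and "ForkedMain" in parts[1]:
            pids.append(int(parts[0]))
    return pids
```

The input can be obtained with subprocess.run(["jps"], capture_output=True, text=True).stdout.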

JIT optimization

  1. Use -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining to inspect JIT compilation: java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -jar target/benchmarks.jar 'io.*Fury.*UserTypeBenchmark.serialize' -f 0 -wi 1 -i 1 -t 1 -w 1s -r 35s -rf csv > compile.log
  2. Determine what the flags are set to on the current platform:
     • Run java ${other_options} -XX:+PrintFlagsFinal -version. Include all other options on the command line, because some options affect others, particularly GC-related flags.
     • Run jcmd $pid VM.flags -all.
  3. The -XX:FreqInlineSize= flag specifies the maximum bytecode size, in bytes, of a frequently executed (hot) method that can be inlined. The default value depends on the platform; for 64-bit Linux, it is 325.
  4. A "hot method too big" message means that a hot method exceeds this limit and needs to be optimized. See https://wiki.openjdk.java.net/display/HotSpot/Server+Compiler+Inlining+Messages and https://techblug.wordpress.com/2013/08/19/java-jit-compiler-inlining/
  5. Escape analysis.
  6. -server -XX:+TieredCompilation: in Java 8, when the server compiler is enabled, tiered compilation is also enabled by default. 64-bit Java 8 uses the server compiler by default (use java -version to check).
  7. Check the code cache: grep -nr 'CodeCache' compile.log. On a 64-bit server JVM with tiered compilation, the default code cache size for Java 8 is 240 MB. (Code cache exhaustion does not happen in these benchmarks.)
  8. Deoptimization: look for "made not entrant" and "made zombie". With tiered compilation, code is recompiled at a new level, and the old code is made not entrant and then zombie.
  9. size > DesiredMethodLimit: the inlining done so far has already inlined more than DesiredMethodLimit bytecodes, so inlining stops.
  10. To view the generated assembly code, see https://jcdav.is/2015/08/30/reading-assembly-from-hotspot/. hsdis-amd64.dylib: https://github.com/importsource/jvm-tuts/blob/master/hsdis-amd64.dylib
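
Because compile.log can grow large, a quick tally of the messages discussed in this list helps decide where to look first. The following Python sketch is a hypothetical helper, not a Fury tool. It does a plain substring match per line and makes no assumption about the exact layout of the HotSpot output.

```python
from collections import Counter

# Messages mentioned in the list above; extend as needed.
MESSAGES = (
    "hot method too big",
    "size > DesiredMethodLimit",
    "made not entrant",
    "made zombie",
    "CodeCache",
)


def count_messages(lines, messages=MESSAGES):
    """Count how many lines contain each message (substring match)."""
    counts = Counter()
    for line in lines:
        for message in messages:
            if message in line:
                counts[message] += 1
    return counts
```

For example, with open("compile.log") as f: print(count_messages(f).most_common()) lists the most frequent messages first.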