| # Apache Fory™ Java Benchmark |
| |
| Apache Fory™ Java Benchmark contains benchmarks for: |
| |
| - Apache Fory™ |
| - JDK |
| - Hessian |
| - Kryo |
| - Fst |
| - Protostuff |
| - Jsonb |
| - Protobuf |
| - Flatbuffers |
| - Msgpack |
| |
| > Part of the benchmark data and the Kryo benchmark suite are based on the [Kryo benchmark](https://github.com/EsotericSoftware/kryo/tree/master/benchmarks). |
| > The official msgpack project provides the [msgpack-jackson](https://github.com/msgpack/msgpack-java/tree/main/msgpack-jackson) library, but its performance is relatively poor. The benchmark therefore uses [manually written code](https://github.com/apache/fory/tree/main/benchmarks/java_benchmark/src/main/java/org/apache/fory/benchmark/util/MsgpackUtil.java), which was first generated with qwen3 (an LLM) and then modified by hand. |
| |
| ## How to run |
| |
| This benchmark uses [jmh](https://github.com/openjdk/jmh) as the benchmark tool. [jmh](https://github.com/openjdk/jmh) is |
| licensed under GPL v2 with the Classpath Exception, so it can't be included in an Apache source/binary release except |
| as an optional feature. Apache Fory™ therefore makes it an optional dependency, and you must enable the `jmh` profile to activate it. |
| |
| ```bash |
| # Install fory |
| cd ../../java && mvn install -DskipTests -Dcheckstyle.skip -Dlicense.skip -Dmaven.javadoc.skip && cd - |
| |
| # Build the benchmark jar |
| # Use `-Pjmh` to download the jmh dependencies; they are marked as optional |
| # because jmh is licensed under GPL v2, which does not comply with the ASF license policy. |
| mvn package -Pjmh |
| # run benchmark |
| nohup java -jar target/benchmarks.jar -f 5 -wi 3 -i 5 -t 1 -w 3s -r 5s -rf csv >bench.log 2>&1 & |
| java -jar target/benchmarks.jar "org.apache.fory.*UserTypeSerializeSuite.*" -f 1 -wi 1 -i 1 -t 1 -w 2s -r 2s -rf csv -p objectType=MEDIA_CONTENT -p bufferType=array -p references=false |
| ``` |
| |
| Generate Protobuf/Flatbuffers code manually: |
| |
| ```bash |
| flatc -o src/main/java -j src/main/java/org/apache/fory/integration_tests/state/bench.fbs |
| protoc -I=src/main/java/org/apache/fory/integration_tests/state --java_out=src/main/java/ bench.proto |
| ``` |
| |
| The Protobuf code can also be generated automatically by the Maven plugin. The generated Flatbuffers code is short, so the generated files are checked into the repo directly. |
| |
| ## Maven run |
| |
| ```bash |
| cd .. && mvn -T10 install -DskipTests -Dcheckstyle.skip -Dlicense.skip -Dmaven.javadoc.skip |
| mvn exec:java -Dexec.args="-f 3 -wi 5 -i 15 -t 1 -w 2s -r 2s -rf csv" |
| ``` |
| |
| See `org.openjdk.jmh.runner.options.CommandLineOptions` for more information about jmh options: |
| |
| ``` |
| -f Number of forks (how many times to fork a single benchmark; 0 disables forking). |
| -wi Number of warmup iterations to do. |
| -i Number of measurement iterations to do. |
| -t Number of worker threads to run with. |
| -w Time to spend at each warmup iteration. |
| -r Time to spend at each measurement iteration. |
| -rf Result format type. |
| ``` |
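As a rough guide to how these options combine, the sketch below estimates the minimum wall-clock time of one benchmark method for a single parameter combination. The helper name `jmh_min_seconds` is hypothetical and not part of jmh. The estimate ignores JVM startup and other per-fork overhead, so real runs take longer.

```python
def jmh_min_seconds(forks, warmup_iters, warmup_secs, measure_iters, measure_secs):
    """Lower bound on run time (seconds) for one benchmark method and one
    parameter combination: each fork runs all warmup and measurement iterations."""
    per_fork = warmup_iters * warmup_secs + measure_iters * measure_secs
    return forks * per_fork


if __name__ == "__main__":
    # -f 5 -wi 3 -i 5 -w 3s -r 5s (the nohup command in "How to run")
    print(jmh_min_seconds(5, 3, 3, 5, 5))   # 170
    # -f 3 -wi 5 -i 15 -w 2s -r 2s (the "Maven run" command)
    print(jmh_min_seconds(3, 5, 2, 15, 2))  # 120
```

Multiply the result by the number of benchmark methods and parameter combinations selected to get a rough total for a full run.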
| |
| Save the benchmark data to a directory of your choice, then run the plotting script (see Plotting below) to plot graphs. |
| |
| ## Plotting |
| |
| Apache Fory™ uses pandas to process the jmh data, and uses matplotlib for plotting. |
| |
| ```bash |
| pip install pandas matplotlib |
| python analyze.py |
| ``` |
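To inspect a result file before plotting, a minimal standard-library sketch is shown below. It is not the project's `analyze.py`. It assumes the column names jmh writes with `-rf csv` (`Benchmark`, `Score`, `Unit`, and one `Param: <name>` column per `-p` parameter). The function name `load_scores` and the sample rows are illustrative only, not real measurements.

```python
import csv
import io
from collections import defaultdict


def load_scores(csv_file):
    """Group jmh CSV rows by benchmark name.

    Returns {benchmark: [(params, score, unit), ...]}, where params maps each
    "Param: <name>" column to its value for that row.
    """
    results = defaultdict(list)
    for row in csv.DictReader(csv_file):
        params = {
            key[len("Param: "):]: value
            for key, value in row.items()
            if key.startswith("Param: ")
        }
        results[row["Benchmark"]].append((params, float(row["Score"]), row["Unit"]))
    return dict(results)


# Illustrative sample in the assumed jmh CSV layout (made-up numbers).
SAMPLE = '''"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit","Param: objectType"
"org.example.DemoSuite.serialize","thrpt",1,5,1000.5,10.0,"ops/s","SAMPLE"
"org.example.DemoSuite.serialize","thrpt",1,5,2000.0,20.0,"ops/s","MEDIA_CONTENT"
'''

if __name__ == "__main__":
    for name, rows in load_scores(io.StringIO(SAMPLE)).items():
        for params, score, unit in rows:
            print(name, params, score, unit)
```

To read a real result file, pass an open file instead of the sample, for example `load_scores(open("jmh-result.csv", newline=""))`, where the file name is whatever your run produced.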
| |
| ## Debug |
| |
| Use `async-profiler` to generate a flame graph. |
| |
| ```bash |
| export pic=s1.html |
| nohup java -jar target/benchmarks.jar 'org.apache.fory.*Fory.*deserialize*' -f 1 -wi 1 -i 1 -t 1 -w 1s -r 35s -rf csv & |
| profiler.sh -d 30 -f $pic `jps | grep ForkedMain | awk '{print $1}'` |
| ``` |
| |
| ## JIT optimization |
| |
| 1. Use `-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to inspect JIT: |
| `java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -jar target/benchmarks.jar org.apache.fory.*Fory.* UserTypeBenchmark.serialize -f 0 -wi 1 -i 1 -t 1 -w 1s -r 35s -rf csv > compile.log` |
| 2. Determine what the flags are set to on the current platform: |
| - Use `java ${other_options} -XX:+PrintFlagsFinal -version`. Include all other options on the command line, because some options affect others, particularly GC-related flags. |
| - Use `jcmd $pid VM.flags -all` to inspect the flags of a running JVM. |
| - The `-XX:FreqInlineSize=` flag specifies the maximum bytecode size (in bytes) of a hot method that can be inlined. The default value depends on the platform; for 64-bit Linux, it's 325. |
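| 3. Methods reported as `hot method too big` in the inlining log are hot but exceed `FreqInlineSize`, so they are not inlined and need to be optimized. |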
| 4. See: |
| - https://wiki.openjdk.java.net/display/HotSpot/Server+Compiler+Inlining+Messages |
| - https://techblug.wordpress.com/2013/08/19/java-jit-compiler-inlining/ |
| 5. escape analysis |
| 6. `-server -XX:+TieredCompilation`: in Java 8, tiered compilation is enabled by default when the server compiler |
| is enabled. 64-bit Java 8 uses the server compiler by default (run `java -version` to check). |
| 7. Check the code cache: `grep -nr 'CodeCache' compile.log`. For a 64-bit server JVM with tiered compilation, the default |
| code cache size in Java 8 is 240 MB. (Code cache exhaustion does not happen in these benchmarks.) |
| 8. Deoptimization: `made not entrant` and `made zombie`. With tiered compilation, the code is recompiled at a new level, and |
| the old code is made not entrant and then zombie. |
| 9. `size > DesiredMethodLimit`: the inlining done so far has already inlined more than `DesiredMethodLimit` bytecodes, so |
| inlining is stopped. |
| 10. See https://jcdav.is/2015/08/30/reading-assembly-from-hotspot/ for how to view the generated assembly code. |
| A copy of `hsdis-amd64.dylib` is at https://github.com/importsource/jvm-tuts/blob/master/hsdis-amd64.dylib |
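The `compile.log` from step 1 can be large, so a small script helps find the methods behind step 3. The sketch below counts how often each method is rejected with `hot method too big`. It assumes `-XX:+PrintInlining` lines of the shape `@ <bci>   <class>::<method> (<n> bytes)   <message>`; the exact layout can differ between JDK versions, so adjust the regular expression if nothing matches. The function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "@ <bci>   <class>::<method> (<n> bytes)   <message>"
INLINE_LINE = re.compile(r"@\s+\d+\s+(\S+::\S+)\s+\((\d+) bytes\)\s+(.*)")


def count_rejected(lines, reason="hot method too big"):
    """Count inlining rejections per (method, bytecode size) for one reason."""
    counts = Counter()
    for line in lines:
        match = INLINE_LINE.search(line)
        if match and reason in match.group(3):
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Made-up sample lines in the assumed format, not real benchmark output.
SAMPLE_LOG = """\
    @ 12   org.example.Writer::writeObject (412 bytes)   hot method too big
    @ 3   java.lang.String::length (6 bytes)   inline (hot)
      @ 40   org.example.Writer::writeObject (412 bytes)   hot method too big
"""

if __name__ == "__main__":
    for (method, size), n in count_rejected(SAMPLE_LOG.splitlines()).most_common():
        print(f"{n:4d}  {method} ({size} bytes)")
```

For a real log, pass an open file instead of the sample, for example `count_rejected(open("compile.log", errors="replace"))`. Passing another `reason`, such as `size > DesiredMethodLimit`, covers step 9 in the same way.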