# Apache Fory™ Java Benchmark
Apache Fory Java Benchmark contains benchmarks for:
- Apache Fory
- JDK
- Hessian
- Kryo
- Fst
- Protostuff
- Jsonb
- Protobuf
- Flatbuffers
- Msgpack
> Part of the benchmark data is based on the [Kryo benchmark](https://github.com/EsotericSoftware/kryo/tree/master/benchmarks).
> The Kryo benchmark suite is based on the [Kryo benchmark](https://github.com/EsotericSoftware/kryo/tree/master/benchmarks).
> The official msgpack project provides the [msgpack-jackson](https://github.com/msgpack/msgpack-java/tree/main/msgpack-jackson) library, but its performance is relatively poor. So a basic version of the [manually written code](https://github.com/apache/fory/tree/main/benchmarks/java_benchmarkk/src/main/java/org/apache/fory/benchmark/util/MsgpackUtil.java) was generated using qwen3 (an LLM) and then modified by hand.
## How to run
This benchmark uses [jmh](https://github.com/openjdk/jmh) as the benchmark tool. [jmh](https://github.com/openjdk/jmh) is
licensed under GPL V2 with the Classpath Exception, so it can't be included in an Apache source/binary release except
as an optional feature. Apache Fory therefore makes it an optional dependency, and you must enable the `jmh` profile to activate it.
```bash
# Install fory
cd ../../java && mvn install -DskipTests -Dcheckstyle.skip -Dlicense.skip -Dmaven.javadoc.skip && cd -
# Build the benchmark jar
# Use `-Pjmh` to download the jmh dependencies; they are marked as optional
# since jmh is licensed under GPL V2 and does not comply with the license policy of the ASF.
mvn package -Pjmh
# run benchmark
nohup java -jar target/benchmarks.jar -f 5 -wi 3 -i 5 -t 1 -w 3s -r 5s -rf csv >bench.log 2>&1 &
java -jar target/benchmarks.jar "org.apache.fory.*UserTypeSerializeSuite.*" -f 1 -wi 1 -i 1 -t 1 -w 2s -r 2s -rf csv -p objectType=MEDIA_CONTENT -p bufferType=array -p references=false
```
Generate Protobuf/Flatbuffers code manually:
```bash
flatc -o src/main/java -j src/main/java/org/apache/fory/integration_tests/state/bench.fbs
protoc -I=src/main/java/org/apache/fory/integration_tests/state --java_out=src/main/java/ bench.proto
```
Protobuf code can be generated automatically by the Maven plugin. The generated Flatbuffers code is short, so the generated files are checked into the repo directly.
## Maven run
```bash
cd .. && mvn -T10 install -DskipTests -Dcheckstyle.skip -Dlicense.skip -Dmaven.javadoc.skip
mvn exec:java -Dexec.args="-f 3 -wi 5 -i 15 -t 1 -w 2s -r 2s -rf csv"
```
See `org.openjdk.jmh.runner.options.CommandLineOptions` for more information about jmh options:
```
-f How many times to fork a single benchmark (0 disables forking).
-wi Number of warmup iterations to do.
-i Number of measurement iterations to do.
-t Number of worker threads to run with.
-w Time to spend at each warmup iteration.
-r Time to spend at each measurement iteration.
-rf Result format type
```
Save the benchmark data to a specified directory, then run `tool.py` to plot graphs.
## Plotting
Apache Fory uses pandas to process the jmh data, and uses matplotlib for plotting.
```bash
pip install pandas matplotlib
python analyze.py
```
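Before plotting, it can be useful to check what a result file contains. The sketch below uses only the Python standard library and is independent of the project's plotting script. It assumes the usual JMH CSV layout, with `Benchmark`, `Score` and `Unit` columns and one `Param: <name>` column per benchmark parameter; the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

PARAM_PREFIX = "Param: "  # assumed prefix of JMH parameter columns


def load_scores(csv_text):
    """Group JMH CSV rows by benchmark name.

    Returns {benchmark: [(params, score, unit), ...]}.
    """
    results = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        params = {
            key[len(PARAM_PREFIX):]: value
            for key, value in row.items()
            if key.startswith(PARAM_PREFIX)
        }
        results[row["Benchmark"]].append((params, float(row["Score"]), row["Unit"]))
    return dict(results)


def rank(results):
    """Sort benchmarks by their best score, highest first.

    Assumes throughput mode, where a higher score is better.
    """
    best = [(name, max(score for _, score, _ in runs)) for name, runs in results.items()]
    return sorted(best, key=lambda item: item[1], reverse=True)
```

To use it, read the CSV produced by `-rf csv` into a string and pass it to `load_scores`. For latency-style modes, where lower is better, the ordering in `rank` would need to be reversed.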
## Debug
Use `async-profiler` to generate a flame graph.
```bash
export pic=s1.html
nohup java -jar target/benchmarks.jar 'org.apache.fory.*Fory.*deserialize*' -f 1 -wi 1 -i 1 -t 1 -w 1s -r 35s -rf csv &
profiler.sh -d 30 -f $pic `jps | grep ForkedMain | awk '{print $1}'`
```
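The `jps | grep ForkedMain | awk` pipeline above yields nothing if the forked JVM has not started yet, and several PIDs if more than one fork is running. The following sketch makes that lookup explicit so a script can check the result before attaching the profiler. The function names are hypothetical; the `profiler.sh` arguments are the ones used above.

```python
import subprocess


def forked_main_pids(jps_output):
    """Extract PIDs of JMH forked JVMs from `jps` output.

    Each `jps` line is expected to look like `<pid> <main class>`.
    """
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "ForkedMain":
            pids.append(int(parts[0]))
    return pids


def profiler_command(pid, duration=30, out="s1.html"):
    """Build the async-profiler invocation used in this README."""
    return ["profiler.sh", "-d", str(duration), "-f", out, str(pid)]


def profile_first_fork():
    """Run `jps`, then attach the profiler to the first forked JVM found."""
    output = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    pids = forked_main_pids(output)
    if not pids:
        raise RuntimeError("no ForkedMain process found; is the benchmark running?")
    subprocess.run(profiler_command(pids[0]), check=True)
```

Calling `profile_first_fork()` requires `jps` and `profiler.sh` on the `PATH`, as the shell version does.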
## JIT optimization
1. Use `-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to inspect JIT:
`java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -jar target/benchmarks.jar org.apache.fory.*Fory.* UserTypeBenchmark.serialize -f 0 -wi 1 -i 1 -t 1 -w 1s -r 35s -rf csv > compile.log`
2. Determine what the flags are set to on the current platform:
- Use `java ${other_options} -XX:+PrintFlagsFinal -version`. Include all other options on the command line, because some options affect others, particularly when setting GC-related flags.
- Use `jcmd $pid VM.flags -all` to list the flags of a running JVM.
- The `-XX:FreqInlineSize=` flag specifies the maximum number of bytecode instructions to inline for a method. The default value depends on the platform; for 64-bit Linux, it's 325.
3. Methods reported as `hot method too big` need to be optimized: they are frequently executed but exceed `FreqInlineSize`, so they are not inlined.
4. See:
- https://wiki.openjdk.java.net/display/HotSpot/Server+Compiler+Inlining+Messages
- https://techblug.wordpress.com/2013/08/19/java-jit-compiler-inlining/
5. escape analysis
6. `-server -XX:+TieredCompilation`: In Java 8, when the server compiler is enabled, tiered compilation is also enabled by default. 64-bit Java 8 uses the server compiler by default (use `java -version` to check).
7. Check the CodeCache: `grep -nr 'CodeCache' compile.log`. On a 64-bit server JVM with tiered compilation, the default code cache size for Java 8 is 240 MB. (Code cache problems did not occur in these benchmarks.)
8. Deoptimization: `made not entrant` and `made zombie`. With tiered compilation, the code is recompiled at a new level, and the old code is made not entrant and then zombie.
9. `size > DesiredMethodLimit`: the inlining done so far has inlined more than `DesiredMethodLimit` bytecodes, so inlining stops.
10. See https://jcdav.is/2015/08/30/reading-assembly-from-hotspot/ to view the assembly code. The `hsdis-amd64.dylib` disassembler plugin is available at https://github.com/importsource/jvm-tuts/blob/master/hsdis-amd64.dylib
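The messages discussed in the list above can be tallied from `compile.log` rather than searched for one by one. The sketch below counts the messages named in this section and lists the methods rejected as `hot method too big`. It assumes `PrintInlining` lines of the form `@ <bci> <method> (<n> bytes) <message>`; the exact layout varies between JDK versions, so treat the regular expression as an assumption to adjust. The function name is hypothetical.

```python
import os
import re
from collections import Counter

# Messages named in this section; extend as needed.
MESSAGES = (
    "hot method too big",
    "size > DesiredMethodLimit",
    "made not entrant",
    "made zombie",
    "CodeCache",
)

# Assumed PrintInlining layout: "@ <bci> <method> (<n> bytes) hot method too big"
HOT_TOO_BIG = re.compile(r"@\s+\d+\s+(\S+)\s+\((\d+) bytes\)\s+hot method too big")


def summarize(lines):
    """Return (message counts, {method: bytecode size}) for a JIT log."""
    counts = Counter()
    too_big = {}
    for line in lines:
        for message in MESSAGES:
            if message in line:
                counts[message] += 1
        match = HOT_TOO_BIG.search(line)
        if match:
            too_big[match.group(1)] = int(match.group(2))
    return counts, too_big


if __name__ == "__main__":
    if os.path.exists("compile.log"):
        with open("compile.log", errors="replace") as log:
            counts, too_big = summarize(log)
        for message in MESSAGES:
            print(f"{counts[message]:8d}  {message}")
        # Largest rejected methods first: these are the ones to split up.
        for method, size in sorted(too_big.items(), key=lambda kv: -kv[1]):
            print(f"{size:8d} bytes  {method}")
```

Sorting the rejected methods by size gives a starting list of candidates to split into smaller methods that fit under `FreqInlineSize`.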