Spark JVM Profiler Plugin

Build

To build the profiler module, run:

  ./build/mvn clean package -DskipTests -Pjvm-profiler

Executor Code Profiling

The Spark profiler module enables code profiling of executors in cluster mode, based on async-profiler, a low-overhead sampling profiler. This lets a Spark application running on a cluster capture CPU and memory profiles that can later be analyzed for performance issues. The profiler writes Java Flight Recorder (JFR) files for each executor; these can be read by many tools, including Java Mission Control and IntelliJ.

The profiler writes the JFR files to the executor's working directory on the executor's local file system. These files can grow large, so make sure the executor machines have adequate storage. The profiler can also be configured (via spark.executor.profiling.dfsDir) to copy the JFR files to an HDFS-compatible location, such as HDFS or S3, before the executor shuts down.
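Because the JFR files land on the executor's local disk, a quick free-space check on an executor host can catch storage problems before a long profiling run. The shell sketch below is not part of Spark: it assumes you can run a script on the executor host (for example from a container entrypoint), and MIN_FREE_MB is a hypothetical threshold that you would size for your own workload.

```shell
#!/usr/bin/env bash
# Sketch only: free-space check for the directory the profiler writes to.
# MIN_FREE_MB is a hypothetical threshold, not a Spark setting.

# Prints the free space, in MiB, of the file system that holds directory $1.
free_mb() {
  # -P: POSIX output, one line per file system; -k: sizes in 1024-byte blocks.
  # Column 4 of the data row is the available space.
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Returns 0 if directory $1 has at least $2 MiB free, 1 otherwise
# (including when df fails and nothing is printed).
has_free_mb() {
  local avail
  avail="$(free_mb "$1")"
  [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

DIR="${1:-.}"                        # defaults to the current working directory
MIN_FREE_MB="${MIN_FREE_MB:-10240}"  # hypothetical default: 10 GiB
if has_free_mb "$DIR" "$MIN_FREE_MB"; then
  echo "OK: $(free_mb "$DIR") MiB free in $DIR"
else
  echo "WARNING: less than $MIN_FREE_MB MiB free in $DIR" >&2
fi
```

Run it from the directory the executor uses as its working directory, or pass that directory as the first argument.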

Code profiling is currently supported only on:

Linux (x64)
Linux (arm64)
Linux (musl, x64)
macOS
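Before enabling the plugin on a new kind of host, it can help to check that host against the list above. The sketch below is a hypothetical helper, not part of Spark: it maps `uname -s` and `uname -m` output onto the listed platforms, and its musl detection (searching the output of `ldd --version`) is a heuristic assumption.

```shell
#!/usr/bin/env bash
# Sketch only: check a host against the supported-platform list above.
# The libc detection is a heuristic, not something Spark itself does.

# Prints "musl" if ldd identifies itself as musl, otherwise "glibc".
detect_libc() {
  if ldd --version 2>&1 | grep -qi musl; then
    echo musl
  else
    echo glibc
  fi
}

# Usage: profiler_platform_supported OS ARCH [LIBC]
# OS and ARCH use uname -s / uname -m spellings; LIBC defaults to glibc.
# Returns 0 for a listed platform, 1 otherwise.
profiler_platform_supported() {
  local os="$1" arch="$2" libc="${3:-glibc}"
  case "$os" in
    Darwin)
      return 0 ;;                               # macOS (no architecture listed)
    Linux)
      case "$arch/$libc" in
        x86_64/glibc) return 0 ;;               # Linux (x64)
        x86_64/musl)  return 0 ;;               # Linux (musl, x64)
        aarch64/glibc|arm64/glibc) return 0 ;;  # Linux (arm64); musl on arm64 is not listed
        *) return 1 ;;
      esac ;;
    *)
      return 1 ;;
  esac
}

os="$(uname -s)"
arch="$(uname -m)"
libc="glibc"
if [ "$os" = "Linux" ]; then
  libc="$(detect_libc)"
fi
if profiler_platform_supported "$os" "$arch" "$libc"; then
  echo "supported: $os $arch $libc"
else
  echo "not listed as supported: $os $arch $libc"
fi
```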

To get the most detailed profiling information, set the following JVM options for the executor:

  spark.executor.extraJavaOptions=-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer

For more information on async-profiler, see the Async Profiler Manual.

To enable code profiling, first enable the code profiling plugin:

  spark.plugins=org.apache.spark.executor.profiler.ExecutorProfilerPlugin

Then turn on profiling itself in the configuration:

  spark.executor.profiling.enabled=true
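The plugin, profiling, and JVM settings from this section can also live in spark-defaults.conf instead of being passed on every spark-submit invocation. The sketch below appends them to a file; CONF_FILE is a hypothetical variable, and by default the script writes to a temporary file so that nothing under $SPARK_HOME/conf is touched until you point it there.

```shell
#!/usr/bin/env bash
# Sketch: append the profiler settings as spark-defaults.conf entries.
# CONF_FILE is a hypothetical variable; it defaults to a temporary file so
# nothing under $SPARK_HOME/conf is modified unless you point it there.

# Appends one "key value" pair per line (whitespace-separated) to file $1.
write_profiler_defaults() {
  cat >> "$1" <<'EOF'
spark.plugins                     org.apache.spark.executor.profiler.ExecutorProfilerPlugin
spark.executor.profiling.enabled  true
spark.executor.extraJavaOptions   -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer
EOF
}

CONF_FILE="${CONF_FILE:-$(mktemp)}"
write_profiler_defaults "$CONF_FILE"
echo "Wrote profiler settings to $CONF_FILE"
```

spark.plugins takes a comma-separated list of plugin classes, so if that key (or spark.executor.extraJavaOptions) is already set in the file, merge the values into the existing line rather than appending a second one, which would replace the earlier value.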

Code profiling configuration

Kubernetes

On Kubernetes, Spark may delete the executor pods while the profiler files are still being saved. To prevent this, set:

  spark.kubernetes.executor.deleteOnTermination=false
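With deleteOnTermination=false, terminated executor pods stay in the cluster until someone removes them. The sketch below is a hypothetical cleanup helper: it assumes kubectl is configured for the cluster and relies on the spark-app-selector and spark-role labels that Spark on Kubernetes puts on executor pods. APP_ID is the Spark application ID; run the helper only after the profiler output has been copied to its destination.

```shell
#!/usr/bin/env bash
# Sketch: delete the executor pods that deleteOnTermination=false left behind.
# Assumes kubectl access and Spark's spark-app-selector / spark-role pod labels.

# Builds the label selector matching the executors of one application.
executor_selector() {
  echo "spark-app-selector=$1,spark-role=executor"
}

# Usage: cleanup_executor_pods APP_ID [NAMESPACE]
# Set DRY_RUN=1 to print the kubectl command instead of running it.
cleanup_executor_pods() {
  local app_id="$1" namespace="${2:-default}"
  local cmd=(kubectl delete pods -n "$namespace" -l "$(executor_selector "$app_id")")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

if [ $# -ge 1 ]; then
  cleanup_executor_pods "$@"
else
  echo "usage: $0 APP_ID [NAMESPACE]" >&2
fi
```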

Example

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  -c spark.executor.extraJavaOptions="-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer" \
  -c spark.plugins=org.apache.spark.executor.profiler.ExecutorProfilerPlugin \
  -c spark.executor.profiling.enabled=true \
  -c spark.executor.profiling.dfsDir=s3a://my-bucket/spark/profiles/  \
  -c spark.executor.profiling.options=event=wall,interval=10ms,alloc=2m,lock=10ms,chunktime=300s \
  -c spark.executor.profiling.fraction=0.10  \
  -c spark.kubernetes.executor.deleteOnTermination=false \
  <application-jar> \
  [application-arguments]
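When the same profiler settings are reused across many submissions, keeping them in a Bash array preserves the quoting of values that contain spaces, such as spark.executor.extraJavaOptions. build_profiler_args and the PROFILE_* variables below are hypothetical names; their defaults mirror the example above.

```shell
#!/usr/bin/env bash
# Sketch: collect the profiler-related spark-submit flags in one array.
# PROFILE_DIR, PROFILE_OPTS and PROFILE_FRACTION are hypothetical variables;
# the defaults copy the values from the example above.

build_profiler_args() {
  local dir="${PROFILE_DIR:-s3a://my-bucket/spark/profiles/}"
  local opts="${PROFILE_OPTS:-event=wall,interval=10ms,alloc=2m,lock=10ms,chunktime=300s}"
  local fraction="${PROFILE_FRACTION:-0.10}"
  # Each element reaches spark-submit as a separate word, so the
  # space-separated JVM options stay together in a single -c value.
  PROFILER_ARGS=(
    -c "spark.executor.extraJavaOptions=-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+PreserveFramePointer"
    -c "spark.plugins=org.apache.spark.executor.profiler.ExecutorProfilerPlugin"
    -c "spark.executor.profiling.enabled=true"
    -c "spark.executor.profiling.dfsDir=${dir}"
    -c "spark.executor.profiling.options=${opts}"
    -c "spark.executor.profiling.fraction=${fraction}"
    -c "spark.kubernetes.executor.deleteOnTermination=false"
  )
}

build_profiler_args
# Print the resulting command, shell-quoted, instead of submitting anything.
printf '%q ' ./bin/spark-submit "${PROFILER_ARGS[@]}"
echo
```

To submit for real, expand the array next to your own arguments, for example `./bin/spark-submit --class <main-class> --master <master-url> "${PROFILER_ARGS[@]}" <application-jar>`.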