docs/developers/ProfileMemoryOfGlutenWithVelox.md - gluten - Git at Google

 ---
 layout: page
 title: Profile memory consumption of Gluten
 nav_order: 8
 has_children: true
 parent: /developer-overview/
 ---
 Gluten offloads most of Spark SQL execution to native engine. We can use [gperftools](https://github.com/gperftools/gperftools) or [jemalloc](https://github.com/jemalloc/jemalloc)
 to analyze the offheap memory and cpu profile.

 # Profile with gperftools

 `gperftools` is a collection of a high-performance multi-threaded
 malloc() implementation, plus some pretty nifty performance analysis
 tools, see more: https://github.com/gperftools/gperftools/wiki.

 ## Build and install gperftools

 Download `gperftools` from https://github.com/gperftools/gperftools/releases, build and install.

 ```bash
 wget https://github.com/gperftools/gperftools/releases/download/gperftools<version>/gperftools-<version>.tar.gz
 tar xzvf gperftools-<version>.tar.gz
 cd gperftools-<version>
 ./configure
 make && make install
 ```

 Then we can find the tcmalloc libraries in `$GPERFTOOLS_HOME/.lib`.

 ## Run Gluten with gperftools

 Configure `--files` or `spark.files` for Spark.

 ```
 --files /path/to/gperftools/libtcmalloc_and_profiler.so
 or
 spark.files /path/to/gperftools/libtcmalloc_and_profiler.so
 ```

 Use `LD_PRELOAD` to preload tcmalloc library, and enable heap profile with `HEAPPROFILE` or cpu profile with `CPUPROFILE`.

 Example of enabling heap profile in spark executor:

 ```
 spark.executorEnv.LD_PRELOAD  ./libtcmalloc_and_profiler.so

 # Specifies dump profile path. ${CONTAINER_ID} is only used to distinguish the result files when running on yarn.
 spark.executorEnv.HEAPPROFILE /tmp/gluten_heap_perf_${CONTAINER_ID}
 ```

 Finally, profiling files prefixed with `/tmp/gluten_heap_perf_${CONTAINER_ID}` will be generated for each spark executor.

 ## Analyze profiling output

 Prepare the required native libraries. Assume static build is used for Gluten, there is no other shared dependency libs.

 ```bash
 jar xf gluten-velox-bundle-spark3.5_2.12-centos_7_x86_64-1.2.0.jar relative/path/to/libvelox.so ralative/path/to/libgluten.so
 mv libvelox.so libgluten.so /path/to/gluten_lib_prefix
 ```

 Generate a GIF of the analysis result:

 ```bash
 # `/usr/bin/java` indicates the program used by running spark executor
 pprof --show_bytes --gif --lib_prefix=/path/to/gluten_lib_prefix /usr/bin/java /path/to/gluten_heap_perf_XXX > result.gif
 ```

 Result like:

 <img src="../image/velox_profile_memory_gif.gif" width="200" />

 Or display analysis result in TEXT:

 ```bash
 pprof --text --lib_prefix=/path/to/gluten_lib_prefix /usr/bin/java /path/to/gluten_heap_perf_XXX
 ```

 Result like:

 <img src="../image/velox_profile_memory_text.png" width="400" />

 **\*\*** Get more help from https://github.com/gperftools/gperftools/wiki#documentation.

 # Profile with jemalloc

 `jemalloc` is a general purpose malloc(3) implementation that emphasizes fragmentation
 avoidance and scalable concurrency support. We can also use it to analyze Gluten performance.
 Getting Started with `jemalloc`: https://github.com/jemalloc/jemalloc/wiki/Getting-Started.

 ## Build and install jemalloc

 Download `jemalloc` from https://github.com/jemalloc/jemalloc/releases, build and install.

 ```
 cd /path/to/jemalloc
 ./autogen.sh --enable-prof
 make && make install
 ```
 Then we can find the jemalloc library in `$JEMALLOC_HOME/.lib`.

 ## Run Gluten with jemalloc

 Configure `--files` or `spark.files` for Spark.

 ```
 --files /path/to/jemalloc/libjemalloc.so
 or
 spark.files /path/to/jemalloc/libjemalloc.so
 ```

 Example of enabling heap profile in spark executor:

 ```
 spark.executorEnv.LD_PRELOAD  ./libjemalloc.so
 spark.executorEnv.MALLOC_CONF prof:true,lg_prof_interval:30,prof_prefix:/tmp/gluten_heap_perf
 ```

 Finally, profiling files prefixed with `/tmp/gluten_heap_perf.${PID}` will be generated for each spark executor.

 ## Memory dump on spark executor exit

 Sometimes, when native memory is not managed by gluten or there are some memory leaks that will cause spark executor to be killed due to memory limit,
 we only need to trigger a memory dump on executor exit.

 If we want to enable this feature we need to follow steps:

 1. Build gluten with `--enable_jemalloc_stats=ON` to enabled jemalloc stats.
 2. Enabled memory dump on exit, add spark executor environments to load jemalloc lib and make memory profiling active.
     ```
    spark.gluten.monitor.memoryDumpOnExit=true
    spark.executorEnv.LD_PRELOAD=/path/to/libjemalloc.so
    spark.executorEnv.MALLOC_CONF=prof:true,prof_prefix:/tmp/gluten_heap_perf
    ```

 ## Analyze profiling output

 Prepare the required native libraries. Assume static build is used for Gluten, so there is no other shared dependency libs.

 ```bash
 jar xf gluten-velox-bundle-spark3.5_2.12-centos_7_x86_64-1.2.0.jar relative/path/to/libvelox.so relative/path/to/libgluten.so
 mv libvelox.so libgluten.so /path/to/gluten_lib_prefix
 ```

 Generate a GIF of the analysis result:

 ```bash
 # `/usr/bin/java` indicates the program used by running spark executor
 jeprof --show_bytes --gif --lib_prefix=/path/to/gluten_lib_prefix /usr/bin/java /path/to/gluten_heap_perf_XXX > result.gif
 ```

 Or display analysis result in TEXT:

 ```bash
 jeprof --text --lib_prefix=/path/to/gluten_lib_prefix /usr/bin/java /path/to/gluten_heap_perf_XXX
 ```

 **\*\*** Get more help from https://jemalloc.net/jemalloc.3.html.
	---
	layout: page
	title: Profile memory consumption of Gluten
	nav_order: 8
	has_children: true
	parent: /developer-overview/
	---
	Gluten offloads most of Spark SQL execution to native engine. We can use [gperftools](https://github.com/gperftools/gperftools) or [jemalloc](https://github.com/jemalloc/jemalloc)
	to analyze the offheap memory and cpu profile.

	# Profile with gperftools

	`gperftools` is a collection of a high-performance multi-threaded
	malloc() implementation, plus some pretty nifty performance analysis
	tools, see more: https://github.com/gperftools/gperftools/wiki.

	## Build and install gperftools

	Download `gperftools` from https://github.com/gperftools/gperftools/releases, build and install.

	```bash
	wget https://github.com/gperftools/gperftools/releases/download/gperftools<version>/gperftools-<version>.tar.gz
	tar xzvf gperftools-<version>.tar.gz
	cd gperftools-<version>
	./configure
	make && make install
	```

	Then we can find the tcmalloc libraries in `$GPERFTOOLS_HOME/.lib`.

	## Run Gluten with gperftools

	Configure `--files` or `spark.files` for Spark.

	```
	--files /path/to/gperftools/libtcmalloc_and_profiler.so
	or
	spark.files /path/to/gperftools/libtcmalloc_and_profiler.so
	```

	Use `LD_PRELOAD` to preload tcmalloc library, and enable heap profile with `HEAPPROFILE` or cpu profile with `CPUPROFILE`.

	Example of enabling heap profile in spark executor:

	```
	spark.executorEnv.LD_PRELOAD ./libtcmalloc_and_profiler.so

	# Specifies dump profile path. ${CONTAINER_ID} is only used to distinguish the result files when running on yarn.
	spark.executorEnv.HEAPPROFILE /tmp/gluten_heap_perf_${CONTAINER_ID}
	```

	Finally, profiling files prefixed with `/tmp/gluten_heap_perf_${CONTAINER_ID}` will be generated for each spark executor.

	## Analyze profiling output

	Prepare the required native libraries. Assume static build is used for Gluten, there is no other shared dependency libs.

	```bash
	jar xf gluten-velox-bundle-spark3.5_2.12-centos_7_x86_64-1.2.0.jar relative/path/to/libvelox.so ralative/path/to/libgluten.so
	mv libvelox.so libgluten.so /path/to/gluten_lib_prefix
	```

	Generate a GIF of the analysis result:

	```bash
	# `/usr/bin/java` indicates the program used by running spark executor
	pprof --show_bytes --gif --lib_prefix=/path/to/gluten_lib_prefix /usr/bin/java /path/to/gluten_heap_perf_XXX > result.gif
	```

	Result like:

	<img src="../image/velox_profile_memory_gif.gif" width="200" />

	Or display analysis result in TEXT:

	```bash
	pprof --text --lib_prefix=/path/to/gluten_lib_prefix /usr/bin/java /path/to/gluten_heap_perf_XXX
	```

	Result like:

	<img src="../image/velox_profile_memory_text.png" width="400" />

	*\\*** Get more help from https://github.com/gperftools/gperftools/wiki#documentation.

	# Profile with jemalloc

	`jemalloc` is a general purpose malloc(3) implementation that emphasizes fragmentation
	avoidance and scalable concurrency support. We can also use it to analyze Gluten performance.
	Getting Started with `jemalloc`: https://github.com/jemalloc/jemalloc/wiki/Getting-Started.

	## Build and install jemalloc

	Download `jemalloc` from https://github.com/jemalloc/jemalloc/releases, build and install.

	```
	cd /path/to/jemalloc
	./autogen.sh --enable-prof
	make && make install
	```
	Then we can find the jemalloc library in `$JEMALLOC_HOME/.lib`.

	## Run Gluten with jemalloc

	Configure `--files` or `spark.files` for Spark.

	```
	--files /path/to/jemalloc/libjemalloc.so
	or
	spark.files /path/to/jemalloc/libjemalloc.so
	```

	Example of enabling heap profile in spark executor:

	```
	spark.executorEnv.LD_PRELOAD ./libjemalloc.so
	spark.executorEnv.MALLOC_CONF prof:true,lg_prof_interval:30,prof_prefix:/tmp/gluten_heap_perf
	```

	Finally, profiling files prefixed with `/tmp/gluten_heap_perf.${PID}` will be generated for each spark executor.

	## Memory dump on spark executor exit

	Sometimes, when native memory is not managed by gluten or there are some memory leaks that will cause spark executor to be killed due to memory limit,
	we only need to trigger a memory dump on executor exit.

	If we want to enable this feature we need to follow steps:

	1. Build gluten with `--enable_jemalloc_stats=ON` to enabled jemalloc stats.
	2. Enabled memory dump on exit, add spark executor environments to load jemalloc lib and make memory profiling active.
	```
	spark.gluten.monitor.memoryDumpOnExit=true
	spark.executorEnv.LD_PRELOAD=/path/to/libjemalloc.so
	spark.executorEnv.MALLOC_CONF=prof:true,prof_prefix:/tmp/gluten_heap_perf
	```

	## Analyze profiling output

	Prepare the required native libraries. Assume static build is used for Gluten, so there is no other shared dependency libs.

	```bash
	jar xf gluten-velox-bundle-spark3.5_2.12-centos_7_x86_64-1.2.0.jar relative/path/to/libvelox.so relative/path/to/libgluten.so
	mv libvelox.so libgluten.so /path/to/gluten_lib_prefix
	```

	Generate a GIF of the analysis result:

	```bash
	# `/usr/bin/java` indicates the program used by running spark executor
	jeprof --show_bytes --gif --lib_prefix=/path/to/gluten_lib_prefix /usr/bin/java /path/to/gluten_heap_perf_XXX > result.gif
	```

	Or display analysis result in TEXT:

	```bash
	jeprof --text --lib_prefix=/path/to/gluten_lib_prefix /usr/bin/java /path/to/gluten_heap_perf_XXX
	```

	*\\*** Get more help from https://jemalloc.net/jemalloc.3.html.