---
layout: page
title: Debug for CH Backend with gperftools Tool
nav_order: 11
has_children: true
parent: /developer-overview/
---
gperftools helps track down memory and CPU issues in the CH backend. This document describes how to run Gluten with it.
## Install gperftools
Install gperftools as described at https://github.com/gperftools/gperftools.
The installation provides both the profiling libraries and the command-line tools.
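
As a rough sketch, a source build following the usual `./configure` and `make` flow could look like the function below. The default version (2.10, chosen only because it matches the library path used later in this document), the release tarball URL, and the build directory are assumptions; check the gperftools releases page and adjust them for your environment.

```shell
# Sketch only: the version, download URL, and build directory are assumptions.
build_gperftools() (
  set -e
  version="${1:-2.10}"
  work_dir="${2:-$HOME/gperftools-build}"
  url="https://github.com/gperftools/gperftools/releases/download/gperftools-${version}/gperftools-${version}.tar.gz"

  mkdir -p "$work_dir"
  cd "$work_dir"
  wget "$url"
  tar -xzf "gperftools-${version}.tar.gz"
  cd "gperftools-${version}"
  ./configure
  make -j"$(nproc)"
  # The combined allocator and profiler library is expected under .libs/
  ls -l .libs/libtcmalloc_and_profiler.so
)
```

Calling `build_gperftools` with no arguments would build the assumed default version under `$HOME/gperftools-build`. The body runs in a subshell, so the caller's working directory is left unchanged.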
## Compile libch.so
Disable jemalloc by setting `-DENABLE_JEMALLOC=OFF` in cpp-ch/CMakeLists.txt, then recompile libch.so.
## Run Gluten with gperftools
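For Spark on YARN, modify the submit script to run Gluten with gperftools.
Add the following to the submit script. The `export` lines go before the `spark-submit` command, and the `--files` and `--conf` lines are options passed to it: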
```
export tcmalloc_path=/data2/zzb/gperftools-2.10/.libs/libtcmalloc_and_profiler.so # the path to the tcmalloc library
export LD_PRELOAD=$tcmalloc_path:libch.so # load the libraries in the driver; ld.so splits LD_PRELOAD entries on colons or spaces
--files $tcmalloc_path # upload the library to the cluster
--conf spark.executorEnv.LD_PRELOAD=./libtcmalloc_and_profiler.so:libch.so # load the libraries in the executor
--conf spark.executorEnv.HEAPPROFILE=/tmp/gluten_heap_perf # set the heap profile path prefix; use CPUPROFILE instead for CPU profiling
```
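
To show how these fragments relate, the sketch below derives the executor-side values from the library path and prints the resulting `spark-submit` options. The `profile_kind` switch and the other variable names are hypothetical helpers introduced for illustration, not part of Gluten or Spark. The preload entries are joined with a colon, which is one of the separators `ld.so` accepts.

```shell
# Path to the preload library (taken from the example above).
tcmalloc_path=/data2/zzb/gperftools-2.10/.libs/libtcmalloc_and_profiler.so
# Hypothetical switch: "heap" for heap profiling, "cpu" for CPU profiling.
profile_kind=heap
# Prefix for the profile files written on each executor node.
profile_prefix=/tmp/gluten_heap_perf

# --files places the library in the executor's working directory,
# so the executor refers to it by its base name.
executor_preload="./$(basename "$tcmalloc_path"):libch.so"

if [ "$profile_kind" = "cpu" ]; then
  profile_env=CPUPROFILE
else
  profile_env=HEAPPROFILE
fi

# Print the options to append to the spark-submit command line.
printf '%s\n' \
  "--files $tcmalloc_path" \
  "--conf spark.executorEnv.LD_PRELOAD=$executor_preload" \
  "--conf spark.executorEnv.$profile_env=$profile_prefix"
```

Setting `profile_kind=cpu` switches the last option to `spark.executorEnv.CPUPROFILE`.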
For a Thrift server on a local machine, preload the dynamic libraries in the driver with `export LD_PRELOAD="$tcmalloc_path libch.so"`. Note the space separator and the quotes around the value.
## Analyze the result
The profiles are written to the path set in the previous step, for example `/tmp/gluten_heap_perf`. The following gperftools documentation pages describe how to analyze them:
https://gperftools.github.io/gperftools/heapprofile.html
https://gperftools.github.io/gperftools/cpuprofile.html
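
As a minimal sketch of that analysis, the snippet below picks the most recent heap dump and prints the top allocation sites with `pprof --text`. It assumes the profiled process is the executor JVM (so the program argument is the `java` binary), that `pprof` from gperftools is on the `PATH` (some distributions install it as `google-pprof`), and that the snippet runs on the node where the executor wrote its dumps.

```shell
# Prefix that was passed through HEAPPROFILE in the submit script.
profile_prefix=/tmp/gluten_heap_perf
# Assumption: the profiled process is the executor JVM.
program="${JAVA_HOME:-/usr}/bin/java"

# gperftools names heap dumps <prefix>.NNNN.heap; take the newest one.
latest_dump=$(ls -t "$profile_prefix".*.heap 2>/dev/null | head -n 1)

if [ -n "$latest_dump" ]; then
  # Show the top allocation sites as plain text.
  pprof --text "$program" "$latest_dump" | head -n 30
else
  echo "no heap dumps found matching $profile_prefix.*.heap"
fi
```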