During Doris usage and development, debugging is often necessary. This document introduces commonly used debugging tools and methods.
Note: The BE binary file name doris_be mentioned in this document was palo_be in earlier versions.
FE is a Java process. Below are some commonly used Java debugging commands.
jmap -histo:live pid > 1.jmp
This command lists the memory usage of live objects sorted by size (replace pid with the FE process ID).
num #instances #bytes class name ---------------------------------------------- 1: 33528 10822024 [B 2: 80106 8662200 [C 3: 143 4688112 [Ljava.util.concurrent.ForkJoinTask; 4: 80563 1933512 java.lang.String 5: 15295 1714968 java.lang.Class 6: 45546 1457472 java.util.concurrent.ConcurrentHashMap$Node 7: 15483 1057416 [Ljava.lang.Object;
This method allows you to view the total memory occupied by live objects (at the end of the file) and analyze which objects consume more memory.
Note: This method triggers a FullGC due to the :live parameter.
jstat -gcutil pid 1000 1000
This command checks JVM memory usage in each region every second (replace pid with the FE process ID).
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794 2.043 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794 2.043 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794 2.043 0.00 0.00 22.92 3.03 95.74 92.77 68 1.249 5 0.794 2.043 0.00 0.00 22.92 3.03 95.74 92.77 68 1.249 5 0.794 2.043
Focus on the Old generation (O) percentage (3.03% in the example). High usage may lead to OOM or FullGC.
jstack -l pid > 1.js
This command prints the current FE thread stack (replace pid with the FE process ID).
The -l parameter also detects deadlocks. This method can be used to view FE thread execution status, detect deadlocks, and locate blocking positions.
Memory debugging focuses on two aspects:
The following tools can be used for tracking and analysis.
Doris Debug Tools provides packaged CPU flame graph and memory analysis tools that can be downloaded and used directly.
Note: Unofficial tool, for development and debugging purposes only.
Note: Doris 1.2.2 and later versions use Jemalloc as the default memory allocator.
For Heap Profiling principles, refer to Heap Profiling Principle Explanation. Note that Heap Profile records virtual memory.
Jemalloc supports both real-time and periodic Heap Dump methods, then uses the jeprof tool to parse the generated Heap Profile.
In be.conf, change prof:false to prof:true and prof_active:false to prof_active:true in JEMALLOC_CONF, then restart Doris BE. Use the Jemalloc Heap Dump HTTP interface to generate Heap Profile files on the BE machine.
Version Notes:
- Doris 2.1.8, 3.0.4 and later:
profis alreadytrueby default inJEMALLOC_CONF, no modification needed.- Before Doris 2.1.8 and 3.0.4:
JEMALLOC_CONFdoesn't haveprof_activeoption, just changeprof:falsetoprof:true.
curl http://be_host:be_webport/jeheap/dump
Configuration:
jeprofile_dir in be.conf, defaults to ${DORIS_HOME}/log.lg_prof_sample in JEMALLOC_CONF (default 19, i.e., 2^19 B = 512KB). Decreasing lg_prof_sample increases sampling frequency for more accurate profiles but higher overhead.Performance tip: Keep prof:false during performance testing to avoid Heap Dump overhead.
Change prof:false to prof:true in JEMALLOC_CONF in be.conf. Heap Profile files default to ${DORIS_HOME}/log with prefix specified by JEMALLOC_PROF_PRFIX (default jemalloc_heap_profile_).
Note: Before Doris 2.1.6,
JEMALLOC_PROF_PRFIXwas empty and needs to be set.
Dump triggers:
Dump after cumulative memory allocation
Change lg_prof_interval to 34 in JEMALLOC_CONF to dump after cumulative 16GB allocation (2^34 B = 16GB).
Note: Before Doris 2.1.6,
lg_prof_intervaldefaulted to32.
Dump on memory peak
Change prof_gdump to true in JEMALLOC_CONF and restart BE.
Dump on exit and detect leaks
Change prof_leak and prof_final to true in JEMALLOC_CONF and restart BE.
Dump cumulative (growth) instead of real-time values
Change prof_accum to true in JEMALLOC_CONF and restart BE. Use jeprof --alloc_space to display cumulative heap dump.
jeprofUse be/bin/jeprof to parse dumped Heap Profiles. Parsing may take minutes for large memory processes.
If jeprof binary is missing from be/bin, upload jeprof from doris/tools directory.
Notes:
- Requires addr2line version 2.35.2+, see QA-1 below.
- Execute Heap Dump and
jeprofparsing on the same machine running Doris BE, see QA-2 below.
1. Analyze single Heap Profile
jeprof --dot ${DORIS_HOME}/lib/doris_be ${DORIS_HOME}/log/profile_file
Paste terminal output to online dot visualization to generate memory allocation diagram.
To generate PDF directly (requires dependencies):
yum install ghostscript graphviz jeprof --pdf ${DORIS_HOME}/lib/doris_be ${DORIS_HOME}/log/profile_file > result.pdf
2. Analyze diff between two Heap Profiles
jeprof --dot ${DORIS_HOME}/lib/doris_be --base=${DORIS_HOME}/log/profile_file ${DORIS_HOME}/log/profile_file2
Compare heap files from different times to analyze diff by using earlier file as baseline.
QA-1: Errors after running jeprof: addr2line: Dwarf Error: found dwarf version xxx, this reader only handles version xxx
GCC 11+ defaults to DWARF-v5, requiring Binutils 2.35.2+. Doris Ldb_toolchain uses GCC 11.
Solution: Upgrade addr2line to 2.35.2.
# Download addr2line source wget https://ftp.gnu.org/gnu/binutils/binutils-2.35.tar.bz2 # Install dependencies if needed yum install make gcc gcc-c++ binutils # Compile & install addr2line tar -xvf binutils-2.35.tar.bz2 cd binutils-2.35 ./configure --prefix=/usr/local make make install # Verify addr2line -h # Replace addr2line chmod +x addr2line mv /usr/bin/addr2line /usr/bin/addr2line.bak mv /bin/addr2line /bin/addr2line.bak cp addr2line /bin/addr2line cp addr2line /usr/bin/addr2line hash -r
Note: Don't use addr2line 2.3.9, which may be incompatible and cause memory growth.
QA-2: Errors after running jeprof: addr2line: DWARF error: invalid or unhandled FORM value: 0x25, parsed heap stacks show memory addresses instead of function names
Usually occurs when Heap Dump and jeprof parsing are on different servers, causing symbol table resolution failure.
Solution:
jeprof parsing on the same machine running Doris BE.be/bin/doris_be binary and Heap Profile to a machine with matching Linux kernel version and run jeprof.QA-3: If heap stacks still show memory addresses instead of function names after parsing on the BE machine
Use this script for manual parsing. Modify these variables:
heap: Heap Profile filename.bin: be/bin/doris_be binary filename.llvm_symbolizer: Path to llvm symbolizer, preferably the version used to compile the binary.#!/bin/bash ## @brief ## @author zhoufei ## @email gavineaglechou@gmail.com ## @date 2024-02-24-Sat # 1. jeprof --dot ${bin} ${heap} > heap.dot to generate calling profile # 2. find base addr and symbol # 3. get addr to symble table with llvm-symbolizer # 4. replace the addr with symbol # heap file name heap=jeheap_dump.1708694081.3443.945778264.heap # binary name bin=doris_be_aws.3.0.5 # path to llvm symbolizer llvm_symbolizer=$HOME/opt/ldb-toolchain-16/bin/llvm-symbolizer # output file name out=out.dot vaddr_baddr_symbol=vaddr_baddr_symbol.txt program_name=doris_be jeprof --dot ${bin} ${heap} > ${out} baseaddr=$(grep ${program_name} ${heap} | head -n 1 | awk -F'-' '{print $1}') echo "$baseaddr: ${baseaddr}" function find_symbol() { local addr="$1" "${llvm_symbolizer}" --inlining --obj=${bin} ${addr} | head -n 1 | awk -F'(' '{print $1}' } if [ -f ${vaddr_baddr_symbol} ]; then cat ${vaddr_baddr_symbol} | while read vaddr baddr; do symbol=$(find_symbol ${baddr}) echo "${vaddr} ${baddr} ${symbol}" sed -ri.orig "s/${vaddr}/${symbol}/g" ${out} done else # recalculate the addr and grep -oP '0x(\d|[a-f])+' ${out} | xargs -I {} python -c "print('{}', '0x{:x}'.format({} - 0x${baseaddr}))" \ | while read vaddr baddr; do symbol=$(find_symbol ${baddr}) echo "${vaddr} ${baddr} ${symbol}" sed -ri.orig "s/${vaddr}/${symbol}/g" ${out} done | tee ${vaddr_baddr_symbol} fi # vim: et tw=80 ts=2 sw=2 cc=80:
QA-4: If none of the above methods work
be/bin/doris_be on the BE machine to compile, run, and parse on the same machine.USE_JEMALLOC=OFF ./build.sh --be, then use TCMalloc Heap Profile as described below.Note: Doris 1.2.1 and earlier use TCMalloc. Doris 1.2.2+ default to Jemalloc. To switch back to TCMalloc, compile with
USE_JEMALLOC=OFF sh build.sh --be.
When using TCMalloc, large memory allocations print stacks to be.out:
tcmalloc: large alloc 1396277248 bytes == 0x3f3488000 @ 0x2af6f63 0x2c4095b 0x134d278 0x134bdcb 0x133d105 0x133d1d0 0x19930ed
This indicates Doris BE attempted to allocate 1396277248 bytes at this stack. Use addr2line to convert to readable information:
addr2line -e lib/doris_be 0x2af6f63 0x2c4095b 0x134d278 0x134bdcb 0x133d105 0x133d1d0 0x19930ed
Output example:
/home/ssd0/zc/palo/doris/core/thirdparty/src/gperftools-gperftools-2.7/src/tcmalloc.cc:1335 /home/ssd0/zc/palo/doris/core/thirdparty/src/gperftools-gperftools-2.7/src/tcmalloc.cc:1357 /home/disk0/baidu-doris/baidu/bdg/doris-baidu/core/be/src/exec/hash_table.cpp:267 /home/disk0/baidu-doris/baidu/bdg/doris-baidu/core/be/src/exec/hash_table.hpp:86 /home/disk0/baidu-doris/baidu/bdg/doris-baidu/core/be/src/exec/hash_join_node.cpp:239 /home/disk0/baidu-doris/baidu/bdg/doris-baidu/core/be/src/exec/hash_join_node.cpp:213 thread.cpp:?
Sometimes memory issues come from accumulating small allocations, not visible in logs. Use TCMalloc's HEAP PROFILE feature. Set HEAPPROFILE environment variable before starting Doris BE:
export TCMALLOC_SAMPLE_PARAMETER=64000 HEAP_PROFILE_ALLOCATION_INTERVAL=-1 HEAP_PROFILE_INUSE_INTERVAL=-1 HEAP_PROFILE_TIME_INTERVAL=5 HEAPPROFILE=/tmp/doris_be.hprof ./bin/start_be.sh --daemon
Note: HEAPPROFILE requires absolute path, and directory must exist.
When HEAPPROFILE dump conditions are met, memory usage writes to specified file. Use pprof tool to analyze output.
pprof --text lib/doris_be /tmp/doris_be.hprof.0012.heap | head -30
Output example:
Using local file lib/doris_be. Using local file /tmp/doris_be.hprof.0012.heap. Total: 668.6 MB 610.6 91.3% 91.3% 610.6 91.3% doris::SystemAllocator::allocate_via_malloc (inline) 18.1 2.7% 94.0% 18.1 2.7% _objalloc_alloc 5.6 0.8% 94.9% 63.4 9.5% doris::RowBatch::RowBatch 5.1 0.8% 95.6% 7.1 1.1% butil::ResourcePool::add_block (inline) 3.7 0.5% 96.2% 3.7 0.5% butil::iobuf::create_block (inline) 3.4 0.5% 96.7% 3.4 0.5% butil::FlatMap::init 3.2 0.5% 97.2% 5.2 0.8% butil::ObjectPool::add_block (inline) 2.6 0.4% 97.6% 2.6 0.4% __gnu_cxx::new_allocator::allocate (inline) 2.0 0.3% 97.9% 2.0 0.3% butil::ObjectPool::add_block_group (inline) 2.0 0.3% 98.2% 2.0 0.3% butil::ResourcePool::add_block_group (inline) 1.7 0.3% 98.4% 1.7 0.3% doris::SegmentReader::_load_index
Column meanings:
Generate call relationship graph in SVG format:
pprof --svg lib/doris_be /tmp/doris_be.hprof.0012.heap > heap.svg
Performance tip: This option affects performance. Use cautiously on production instances.
HEAP PROFILE has limitations: 1. Requires BE restart; 2. Continuous enabling impacts performance.
Doris BE supports dynamic heap profiling. Doris supports GPerftools remote server debugging. Use pprof to dynamically profile remote running Doris BE. Example for viewing memory usage increment:
pprof --text --seconds=60 http://be_host:be_webport/pprof/heap
Output example:
Total: 1296.4 MB 484.9 37.4% 37.4% 484.9 37.4% doris::StorageByteBuffer::create 272.2 21.0% 58.4% 273.3 21.1% doris::RowBlock::init 157.5 12.1% 70.5% 157.5 12.1% doris::RowBatch::RowBatch 90.7 7.0% 77.5% 90.7 7.0% doris::SystemAllocator::allocate_via_malloc 66.6 5.1% 82.7% 66.6 5.1% doris::IntegerColumnReader::init 47.9 3.7% 86.4% 47.9 3.7% __gnu_cxx::new_allocator::allocate 20.8 1.6% 88.0% 35.4 2.7% doris::SegmentReader::_load_index 12.7 1.0% 89.0% 12.7 1.0% doris::DecimalColumnReader::init 12.7 1.0% 89.9% 12.7 1.0% doris::LargeIntColumnReader::init 12.7 1.0% 90.9% 12.7 1.0% doris::StringColumnDirectReader::init 12.3 0.9% 91.9% 12.3 0.9% std::__cxx11::basic_string::_M_mutate 10.4 0.8% 92.7% 10.4 0.8% doris::VectorizedRowBatch::VectorizedRowBatch 10.0 0.8% 93.4% 10.0 0.8% doris::PlainTextLineReader::PlainTextLineReader
Output and viewing method match HEAP PROFILE. This command only enables statistics during execution, causing less performance impact than HEAP PROFILE.
LSAN is an address checking tool integrated in GCC. Enable during compilation to activate this feature. When determinable memory leaks occur, leak stacks are printed. Doris BE has integrated this tool. Compile with:
BUILD_TYPE=LSAN ./build.sh
When memory leaks are detected, corresponding information outputs to be.out. For demonstration, we intentionally inject a memory leak in the StorageEngine open function:
char* leak_buf = new char[1024]; strcpy(leak_buf, "hello world"); LOG(INFO) << leak_buf;
Then be.out shows:
================================================================= ==24732==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1024 byte(s) in 1 object(s) allocated from: #0 0xd10586 in operator new (unsigned long) ../../../../gcc-7.3.0/libsanitizer/lsan/lsan_interceptors.cc:164 #1 0xe333a2 in doris::StorageEngine::open(doris::EngineOptions const&, doris::StorageEngine**) /home/ssd0/zc/palo/doris/core/be/src/olap/storage_engine.cpp:104 #2 0xd3cc96 in main /home/ssd0/zc/palo/doris/core/be/src/service/doris_main.cpp:159 #3 0x7f573b5eebd4 in __libc_start_main (/opt/compiler/gcc-4.8.2/lib64/libc.so.6+0x21bd4) SUMMARY: LeakSanitizer: 1024 byte(s) leaked in 1 allocation(s).
Output shows 1024 bytes leaked with memory allocation stack trace.
Performance tip: This option affects performance. Use cautiously on production instances.
Note: Enabling LSAN automatically disables TCMalloc.
Besides improper memory usage and leaks, illegal address access errors can occur. Use ASAN to find root causes. Like LSAN, ASAN is integrated in GCC. Compile Doris with:
BUILD_TYPE=ASAN ./build.sh
When abnormal access is detected, the binary exits immediately and outputs illegal access stack to be.out. ASAN output analysis uses the same method as LSAN. For demonstration, inject an address access error in the StorageEngine open function:
char* invalid_buf = new char[1024]; for (int i = 0; i < 1025; ++i) { invalid_buf[i] = i; } LOG(INFO) << invalid_buf;
Then be.out shows:
================================================================= ==23284==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61900008bf80 at pc 0x00000129f56a bp 0x7fff546eed90 sp 0x7fff546eed88 WRITE of size 1 at 0x61900008bf80 thread T0 #0 0x129f569 in doris::StorageEngine::open(doris::EngineOptions const&, doris::StorageEngine**) /home/ssd0/zc/palo/doris/core/be/src/olap/storage_engine.cpp:106 #1 0xe2c1e3 in main /home/ssd0/zc/palo/doris/core/be/src/service/doris_main.cpp:159 #2 0x7fa5580fbbd4 in __libc_start_main (/opt/compiler/gcc-4.8.2/lib64/libc.so.6+0x21bd4) #3 0xd30794 (/home/ssd0/zc/palo/doris/core/output3/be/lib/doris_be+0xd30794) 0x61900008bf80 is located 0 bytes to the right of 1024-byte region [0x61900008bb80,0x61900008bf80) allocated by thread T0 here: #0 0xdeb040 in operator new[](unsigned long) ../../../../gcc-7.3.0/libsanitizer/asan/asan_new_delete.cc:82 #1 0x129f50d in doris::StorageEngine::open(doris::EngineOptions const&, doris::StorageEngine**) /home/ssd0/zc/palo/doris/core/be/src/olap/storage_engine.cpp:104 #2 0xe2c1e3 in main /home/ssd0/zc/palo/doris/core/be/src/service/doris_main.cpp:159 #3 0x7fa5580fbbd4 in __libc_start_main (/opt/compiler/gcc-4.8.2/lib64/libc.so.6+0x21bd4) SUMMARY: AddressSanitizer: heap-buffer-overflow /home/ssd0/zc/palo/doris/core/be/src/olap/storage_engine.cpp:106 in doris::StorageEngine::open(doris::EngineOptions const&, doris::StorageEngine**)
This shows an attempted one-byte write to illegal address 0x61900008bf80, and the allocation stack for region [0x61900008bb80,0x61900008bf80).
Performance tip: This option affects performance. Use cautiously on production instances.
Note: Enabling ASAN automatically disables TCMalloc.
If be.out stack output lacks function symbols, manual processing is needed. Use the asan_symbolize script to parse ASAN output:
cat be.out | python asan_symbolize.py | c++filt
This command produces readable stack information.
When system CPU Idle is low, CPU is the main bottleneck. Analyze current CPU usage. For Doris BE, there are serveral methods to analyze CPU bottlenecks.
Doris Debug Tools provides packaged CPU flame graph and memory analysis tools that can be downloaded and used directly.
Note: Unofficial tool, for development and debugging purposes only.
pprof from gperftools converts gperftools output to readable formats like PDF, SVG, Text.
Since Doris has integrated and is compatible with GPerf REST interface, use pprof tool to analyze remote Doris BE:
pprof --svg --seconds=60 http://be_host:be_webport/pprof/profile > be.svg
This command generates a BE CPU consumption graph.
This is a very general CPU analysis method. Unlike pprof, this method requires login to the physical machine. But compared to pprof's timed sampling, perf can collect stack information through different events.
Tool introduction:
Usage:
perf record -g -p be_pid -- sleep 60
This command profiles BE CPU usage for 60 seconds and generates perf.data file. Analyze perf.data with perf command:
perf report
Analysis example:
Or visualize with FlameGraph:
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > be.svg
This also generates a CPU consumption graph.