<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="breakpad" rev="2.6.0 IMPALA-2686">
<title>Breakpad Minidumps for Impala (<keyword keyref="impala26"/> or higher only)</title>
<titlealts audience="PDF"><navtitle>Breakpad Minidumps</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Troubleshooting"/>
<data name="Category" value="Support"/>
<data name="Category" value="Administrators"/>
</metadata>
</prolog>
<conbody>
<p rev="2.6.0 IMPALA-2686">
The <xref href="https://chromium.googlesource.com/breakpad/breakpad/" scope="external" format="html">breakpad</xref>
project is an open-source framework for crash reporting.
In <keyword keyref="impala26_full"/> and higher, Impala can use <codeph>breakpad</codeph> to record stack information and
register values when any of the Impala-related daemons crashes due to an error such as <codeph>SIGSEGV</codeph>
or an unhandled exception.
The dump files are much smaller than traditional core dump files. The dump mechanism itself uses very little
memory, which improves reliability if the crash occurs while the system is low on memory.
</p>
<note type="important">
Because of the internal mechanisms involving Impala memory allocation and Linux
signalling for out-of-memory (OOM) errors, if an Impala-related daemon experiences a
crash due to an OOM condition, it does <i>not</i> generate a minidump for that error.
</note>
<p outputclass="toc inpage" audience="PDF"/>
</conbody>
<concept id="breakpad_minidump_enable">
<title>Enabling or Disabling Minidump Generation</title>
<conbody>
<p>
By default, a minidump file is generated when an Impala-related daemon
crashes.
</p>
<p>
To turn off generation of the minidump files, use one of the following
options:
<ul>
<li>
Set the <codeph>--enable_minidumps</codeph> configuration setting
to <codeph>false</codeph>. Restart the corresponding services or
daemons.
</li>
<li>
Set the <codeph>--minidump_path</codeph> configuration setting to
an empty string. Restart the corresponding services or daemons.
</li>
</ul>
</p>
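<p>
As a minimal sketch, either of the following startup flags turns off minidump
generation; only one of them is needed. The flag names come from the list above.
How the flags reach each daemon (for example, through a flag file) depends on
your deployment and is an assumption here.
</p>
<codeblock><![CDATA[
# Option 1: turn off minidump generation explicitly.
--enable_minidumps=false

# Option 2: set the minidump location to an empty string.
--minidump_path=
]]>
</codeblock>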
<p rev="IMPALA-3677">
In <keyword keyref="impala27_full"/> and higher,
you can send a <codeph>SIGUSR1</codeph> signal to any Impala-related daemon to write a
Breakpad minidump. This lets you produce a minidump for advanced troubleshooting
without triggering a crash.
</p>
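<p>
The following sketch shows one way to send the signal from a shell on the host.
It assumes that the <cmdname>pgrep</cmdname> utility is installed, that the
process name is exactly <codeph>impalad</codeph>, and that you run it as a user
allowed to signal the daemon. Substitute <codeph>catalogd</codeph> or
<codeph>statestored</codeph> for the other daemons.
</p>
<codeblock><![CDATA[
# Send SIGUSR1 to each running impalad process on this host.
# Assumes pgrep is installed and the process name is exactly "impalad".
for pid in $(pgrep -x impalad); do
  kill -USR1 "$pid"
done
]]>
</codeblock>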
</conbody>
</concept>
<concept id="breakpad_minidump_location" rev="IMPALA-3581">
<title>Specifying the Location for Minidump Files</title>
<conbody>
<p>
By default, all minidump files are written to the following location
on the host where a crash occurs:
<!-- Location stated in IMPALA-3581; overridden by different location from IMPALA-2686?
<filepath><varname>log_directory</varname>/minidumps/<varname>daemon_name</varname></filepath> -->
<ul>
<li>
<p>
Clusters not managed by cluster management software:
<filepath><varname>impala_log_dir</varname>/<varname>daemon_name</varname>/minidumps/<varname>daemon_name</varname></filepath>
</p>
</li>
</ul>
The minidump files for <cmdname>impalad</cmdname>, <cmdname>catalogd</cmdname>,
and <cmdname>statestored</cmdname> are each written to a separate directory.
</p>
<p>
To specify a different location, set the
<!-- Again, IMPALA-3581 says one thing and IMPALA-2686 says another.
log_dir vs. minidump_path -->
<uicontrol>minidump_path</uicontrol>
configuration setting of one or more Impala-related daemons, and restart the corresponding services or daemons.
</p>
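<p>
For illustration, the setting might look like the following startup flag. The
directory <filepath>/data/impala-minidumps</filepath> is a hypothetical example,
not a recommended location; pick a directory that the daemon can write to.
</p>
<codeblock><![CDATA[
# Hypothetical absolute path for minidump files.
--minidump_path=/data/impala-minidumps
]]>
</codeblock>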
<p>
If you specify a relative path for this setting, the value is interpreted relative to
the default <uicontrol>minidump_path</uicontrol> directory.
</p>
</conbody>
</concept>
<concept id="breakpad_minidump_number">
<title>Controlling the Number of Minidump Files</title>
<conbody>
<p>
As with any files used for logging or troubleshooting, consider limiting the number of
minidump files, or removing unneeded ones, depending on the amount of free storage
space on the hosts in the cluster.
</p>
<p>
Because the minidump files are only used for problem resolution, you can remove any such files that
are not needed to debug current issues.
</p>
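<p>
One possible way to clear out old files by hand is sketched below. It assumes
that the minidump files live under <filepath>/var/log/impala-minidumps</filepath>
and end in <codeph>.dmp</codeph>, as in the demonstration in this topic. The
30-day threshold is an arbitrary example. Adjust both to match your hosts.
</p>
<codeblock><![CDATA[
# List minidump files older than 30 days (the path and age are assumptions).
find /var/log/impala-minidumps -type f -name '*.dmp' -mtime +30 -print

# After reviewing the list, delete the same set of files.
find /var/log/impala-minidumps -type f -name '*.dmp' -mtime +30 -delete
]]>
</codeblock>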
<p>
To control how many minidump files Impala keeps at any one time,
set the <uicontrol>max_minidumps</uicontrol> configuration setting for
one or more Impala-related daemons, and restart the corresponding services or daemons.
The default for this setting is 9. A zero or negative value is interpreted as
<q>unlimited</q>.
</p>
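<p>
As a sketch, the following startup flag raises the limit above the default.
The value 20 is an arbitrary example chosen for illustration.
</p>
<codeblock><![CDATA[
# Keep up to 20 minidump files instead of the default 9.
--max_minidumps=20
]]>
</codeblock>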
</conbody>
</concept>
<concept id="breakpad_minidump_logging">
<title>Detecting Crash Events</title>
<conbody>
<p>
The Impala log files record each crash event that generates a minidump file.
Because each restart begins a new log file, the <q>crashed</q> message
is always at or near the bottom of the log file. A later message might follow it
if core dumps are also enabled.
</p>
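<p>
A quick way to look for such events is to search the log files for the minidump
message. This sketch assumes that the logs are under
<filepath>/var/log/impalad</filepath> and that the message begins with
<codeph>Wrote minidump to</codeph>, as in the demonstration in this topic. Your
log directory might differ.
</p>
<codeblock><![CDATA[
# Print each INFO log line that reports a minidump, prefixed with the log file name.
grep -H "Wrote minidump to" /var/log/impalad/*.INFO.*
]]>
</codeblock>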
</conbody>
</concept>
<concept id="breakpad_demo">
<title>Demonstration of Breakpad Feature</title>
<conbody>
<p>
The following example uses the command <cmdname>kill -11</cmdname> to
simulate a <codeph>SIGSEGV</codeph> crash for an <cmdname>impalad</cmdname>
process on a single DataNode, then examines the relevant log files and minidump file.
</p>
<p>
First, as root on a worker node, send a <codeph>SIGSEGV</codeph> signal to the
<cmdname>impalad</cmdname> process. The original process ID was 23114.
</p>
<codeblock><![CDATA[
# ps ax | grep impalad
23114 ? Sl 0:18 /opt/local/parcels/<parcel_version>/lib/impala/sbin/impalad --flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
31259 pts/0 S+ 0:00 grep impalad
#
# kill -11 23114
#
# ps ax | grep impalad
31374 ? Rl 0:04 /opt/local/parcels/<parcel_version>/lib/impala/sbin/impalad --flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
31475 pts/0 S+ 0:00 grep impalad
]]>
</codeblock>
<p>
Next, locate the log directory underneath <filepath>/var/log</filepath>.
There are <codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph>
log files for the 23114 process ID. The minidump message is written to the
<codeph>.INFO</codeph> file and the <codeph>.ERROR</codeph> file, but not the
<codeph>.WARNING</codeph> file. In this case, a large core file was also produced.
</p>
<codeblock><![CDATA[
# cd /var/log/impalad
# ls -la | grep 23114
-rw------- 1 impala impala 3539079168 Jun 23 15:20 core.23114
-rw-r--r-- 1 impala impala 99057 Jun 23 15:20 hs_err_pid23114.log
-rw-r--r-- 1 impala impala 351 Jun 23 15:20 impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
-rw-r--r-- 1 impala impala 29101 Jun 23 15:20 impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
-rw-r--r-- 1 impala impala 228 Jun 23 14:03 impalad.worker_node_123.impala.log.WARNING.20160623-140343.23114
]]>
</codeblock>
<p>
The <codeph>.INFO</codeph> log includes the location of the minidump file, followed by
a report of a core dump. With the breakpad minidump feature enabled, you might choose to
disable core dumps or keep fewer of them.
</p>
<codeblock><![CDATA[
# cat impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
...
Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00000030c0e0b68a, pid=23114, tid=139869541455968
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libpthread.so.0+0xb68a] pthread_cond_wait+0xca
#
# Core dump written. Default location: /var/log/impalad/core or core.23114
#
# An error report file with more information is saved as:
# /var/log/impalad/hs_err_pid23114.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
...
# cat impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
Log file created at: 2016/06/23 14:03:43
Running on machine:.worker_node_123
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0623 14:03:43.911002 23114 logging.cc:118] stderr will be logged to this file.
Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
]]>
</codeblock>
<p>
The resulting minidump file is much smaller than the corresponding core file,
making it much easier to supply diagnostic information to <keyword keyref="support_org"/>.
</p>
<codeblock><![CDATA[
# pwd
/var/log/impalad
# cd ../impala-minidumps/impalad
# ls
0980da2d-a905-01e1-25ff883a-04ee027a.dmp
# du -kh *
2.4M 0980da2d-a905-01e1-25ff883a-04ee027a.dmp
]]>
</codeblock>
</conbody>
</concept>
</concept>