blob: ec87e4a0d64efc6d0b3ed1ef2632c22cd64ccdba [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="logging">
<title>Using Impala Logging</title>
<titlealts audience="PDF">
<navtitle>Logging</navtitle>
</titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Logs"/>
<data name="Category" value="Troubleshooting"/>
<data name="Category" value="Administrators"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p>
The Impala logs record information about:
</p>
<ul>
<li>
Any errors Impala encountered. If Impala experienced a serious error during startup, you
must diagnose and troubleshoot that problem before you can do anything further with
Impala.
</li>
<li>
How Impala is configured.
</li>
<li>
Jobs Impala has completed.
</li>
</ul>
<note>
<p>
Formerly, the logs contained the query profile for each query, showing low-level details
of how the work is distributed among nodes and how intermediate and final results are
transmitted across the network. To save space, those query profiles are now stored in
zlib-compressed files in <filepath>/var/log/impala/profiles</filepath>. You can access
them through the Impala web user interface. For example, at
<codeph>http://<varname>impalad-node-hostname</varname>:25000/queries</codeph>, each
query is followed by a <codeph>Profile</codeph> link leading to a page showing extensive
analytical data for the query execution.
</p>
<p rev="1.1.1">
The auditing feature introduced in Impala 1.1.1 produces a separate set of audit log
files when enabled. See <xref href="impala_auditing.xml#auditing"/> for details.
</p>
<p rev="2.9.0 IMPALA-4431">
In <keyword keyref="impala29_full"/> and higher, you can control how many audit event
log files are kept on each host through the
<codeph>&#8209;&#8209;max_audit_event_log_files</codeph> startup option for the
<cmdname>impalad</cmdname> daemon, similar to the
<codeph>&#8209;&#8209;max_log_files</codeph> option for regular log files.
</p>
<p rev="2.2.0">
The lineage feature introduced in Impala 2.2.0 produces a separate lineage log file when
enabled. See <xref href="impala_lineage.xml#lineage"/> for details.
</p>
</note>
<p outputclass="toc inpage"/>
</conbody>
<concept id="logs_details">
<title>Locations and Names of Impala Log Files</title>
<conbody>
<ul>
<li>
By default, the log files are under the directory
<filepath>/var/log/impala</filepath>. To change log file locations, modify the
defaults file described in <xref href="impala_processes.xml#processes"/>.
</li>
<li>
The significant files for the <codeph>impalad</codeph> process are
<filepath>impalad.INFO</filepath>, <filepath>impalad.WARNING</filepath>, and
<filepath>impalad.ERROR</filepath>. You might also see a file
<filepath>impalad.FATAL</filepath>, although this is only present in rare conditions.
</li>
<li>
The significant files for the <codeph>statestored</codeph> process are
<filepath>statestored.INFO</filepath>, <filepath>statestored.WARNING</filepath>, and
<filepath>statestored.ERROR</filepath>. You might also see a file
<filepath>statestored.FATAL</filepath>, although this is only present in rare
conditions.
</li>
<li rev="1.2">
The significant files for the <codeph>catalogd</codeph> process are
<filepath>catalogd.INFO</filepath>, <filepath>catalogd.WARNING</filepath>, and
<filepath>catalogd.ERROR</filepath>. You might also see a file
<filepath>catalogd.FATAL</filepath>, although this is only present in rare conditions.
</li>
<li>
Examine the <codeph>.INFO</codeph> files to see configuration settings for the
processes.
</li>
<li>
Examine the <codeph>.WARNING</codeph> files to see all kinds of problem information,
including such things as suboptimal settings and also serious runtime errors.
</li>
<li>
Examine the <codeph>.ERROR</codeph> and/or <codeph>.FATAL</codeph> files to see only
the most serious errors, if the processes crash, or queries fail to complete. These
messages are also in the <codeph>.WARNING</codeph> file.
</li>
<li>
A new set of log files is produced each time the associated daemon is restarted. These
log files have long names including a timestamp. The <codeph>.INFO</codeph>,
<codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph> files are physically
represented as symbolic links to the latest applicable log files.
</li>
</ul>
<p>
Impala stores information using the <codeph>glog_v</codeph> logging system. You will see
some messages referring to C++ file names. Logging is affected by:
</p>
<ul>
<li>
The <codeph>GLOG_v</codeph> environment variable specifies which types of messages are
logged. See <xref href="#log_levels"/> for details.
</li>
<li>
The <codeph>&#8209;&#8209;logbuflevel</codeph> startup flag for the
<cmdname>impalad</cmdname> daemon specifies how often the log information is written
to disk. The default is 0, meaning that the log is immediately flushed to disk when
Impala outputs an important messages such as a warning or an error, but less important
messages such as informational ones are buffered in memory rather than being flushed
to disk immediately.
</li>
</ul>
</conbody>
</concept>
<concept id="logs_rotate">
<title>Rotating Impala Logs</title>
<prolog>
<metadata>
<data name="Category" value="Disk Storage"/>
</metadata>
</prolog>
<conbody>
<p>
Impala periodically switches the physical files representing the current log files,
after which it is safe to remove the old files if they are no longer needed.
</p>
<p>
Impala can automatically remove older unneeded log files, a feature known as <term>log
rotation</term>.
</p>
<p>
In Impala 2.2 and higher, the <codeph>&#8209;&#8209;max_log_files</codeph> configuration
option specifies how many log files to keep at each severity level
(<codeph>INFO</codeph>, <codeph>WARNING</codeph>, <codeph>ERROR</codeph>, and
<codeph>FATAL</codeph>). You can specify an appropriate setting for each Impala-related
daemon (<cmdname>impalad</cmdname>, <cmdname>statestored</cmdname>, and
<cmdname>catalogd</cmdname>).
<ul>
<li>
A value of 0 preserves all log files, in which case you would set up set up manual
log rotation using your Linux tool or technique of choice.
</li>
<li>
A value of 1 preserves only the very latest log file.
</li>
<li>
The default value is 10.
</li>
</ul>
</p>
<p>
Impala checks to see if any old logs need to be removed based on the interval specified
in the <codeph>&#8209;&#8209;logbufsecs</codeph> setting, every 5 seconds by default.
</p>
<p>
For some log levels, Impala logs are first temporarily buffered in memory and only
written to disk periodically. The <codeph>&#8209;&#8209;logbufsecs</codeph> setting
controls the maximum time that log messages are buffered for. For example, with the
default value of 5 seconds, there may be up to a 5 second delay before a logged message
shows up in the log file.
</p>
<p>
It is not recommended that you set <codeph>&#8209;&#8209;logbufsecs</codeph> to 0 as the
setting makes the Impala daemon to spin in the thread that tries to delete old log
files.
</p>
</conbody>
</concept>
<concept id="logs_debug">
<title>Reviewing Impala Logs</title>
<conbody>
<p> By default, the Impala log is stored at
<codeph>/var/log/impalad/</codeph>. The most comprehensive log,
showing informational, warning, and error messages, is in the file name
<filepath>impalad.INFO</filepath>. View log file contents by using the
web interface or by examining the contents of the log file. (When you
examine the logs through the file system, you can troubleshoot problems
by reading the <filepath>impalad.WARNING</filepath> and/or
<filepath>impalad.ERROR</filepath> files, which contain the subsets of
messages indicating potential problems.) </p>
<note>
<p>
The web interface limits the amount of logging information displayed. To view every
log entry, access the log files directly through the file system.
</p>
</note>
<p>
You can view the contents of the <codeph>impalad.INFO</codeph> log file in the file
system. With the default configuration settings, the start of the log file appears as
follows:
</p>
<codeblock>[user@example impalad]$ pwd
/var/log/impalad
[user@example impalad]$ more impalad.INFO
Log file created at: 2013/01/07 08:42:12
Running on machine: impala.example.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff)
Built on Fri, 21 Dec 2012 12:55:19 PST
I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com
I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver):
--dump_ir=false
--module_output=
--be_port=22000
--classpath=
--hostname=impala.example.com</codeblock>
<note>
The preceding example shows only a small part of the log file. Impala log files are
often several megabytes in size.
</note>
</conbody>
</concept>
<concept id="log_format">
<title>Understanding Impala Log Contents</title>
<conbody>
<p>
The logs store information about Impala startup options. This information appears once
for each time Impala is started and may include:
</p>
<ul>
<li>
Machine name.
</li>
<li>
Impala version number.
</li>
<li>
Flags used to start Impala.
</li>
<li>
CPU information.
</li>
<li>
The number of available disks.
</li>
</ul>
</conbody>
</concept>
<concept id="log_levels">
<title>Setting Logging Levels</title>
<conbody>
<p>
Impala uses the GLOG system, which supports three logging levels. You can adjust logging
levels by exporting variable settings. To change logging settings manually, use a
command similar to the following on each node before starting <codeph>impalad</codeph>:
</p>
<codeblock>export GLOG_v=1</codeblock>
<note>
For performance reasons, do not enable the most verbose logging level of 3 unless there
is no other alternative for troubleshooting.
</note>
<p>
For more information on how to configure GLOG, including how to set variable logging
levels for different system components, see <xref keyref="glog.html">documentation for
the glog project on github</xref>.
</p>
<section id="loglevels_details">
<title>Understanding What is Logged at Different Logging Levels</title>
<p>
As logging levels increase, the categories of information logged are cumulative. For
example, GLOG_v=2 records everything GLOG_v=1 records, as well as additional
information.
</p>
<p>
Increasing logging levels imposes performance overhead and increases log size. Where
practical, use GLOG_v=1 for most cases: this level has minimal performance impact but
still captures useful troubleshooting information.
</p>
<p>
Additional information logged at each level is as follows:
</p>
<ul>
<li>
GLOG_v=1 - The default level. Logs information about each connection and query that
is initiated to an <codeph>impalad</codeph> instance, including runtime profiles.
</li>
<li>
GLOG_v=2 - Everything from the previous level plus information for each RPC
initiated. This level also records query execution progress information, including
details on each file that is read.
</li>
<li>
GLOG_v=3 - Everything from the previous level plus logging of every row that is
read. This level is only applicable for the most serious troubleshooting and tuning
scenarios, because it can produce exceptionally large and detailed log files,
potentially leading to its own set of performance and capacity problems.
</li>
</ul>
</section>
</conbody>
</concept>
<concept id="redaction" rev="2.2.0">
<title>Redacting Sensitive Information from Impala Log Files</title>
<prolog>
<metadata>
<data name="Category" value="Redaction"/>
</metadata>
</prolog>
<conbody>
<p>
<term>Log redaction</term> is a security feature that prevents sensitive
information from being displayed in locations used by administrators for
monitoring and troubleshooting, such as log files and the Impala debug
web user interface. You configure regular expressions that match
sensitive types of information processed by your system, such as credit
card numbers or tax IDs, and literals matching these patterns are
obfuscated wherever they would normally be recorded in log files or
displayed in administration or debugging user interfaces. </p>
<p>
In a security context, the log redaction feature is complementary to the Sentry
authorization framework. Sentry prevents unauthorized users from being able to directly
access table data. Redaction prevents administrators or support personnel from seeing
the smaller amounts of sensitive or personally identifying information (PII) that might
appear in queries issued by those authorized users.
</p>
<p>
See <xref keyref="sg_redaction"/> for details about how to enable this feature and set
up the regular expressions to detect and redact sensitive information within SQL
statement text.
</p>
</conbody>
</concept>
</concept>