docs/topics/impala_logging.xml - impala - Git at Google

 <?xml version="1.0" encoding="UTF-8"?>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
 <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
 <concept id="logging">

   <title>Using Impala Logging</title>
   <titlealts audience="PDF"><navtitle>Logging</navtitle></titlealts>
   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
       <data name="Category" value="Logs"/>
       <data name="Category" value="Troubleshooting"/>
       <data name="Category" value="Administrators"/>
       <data name="Category" value="Developers"/>
       <data name="Category" value="Data Analysts"/>
     </metadata>
   </prolog>

   <conbody>

     <p>
       The Impala logs record information about:
     </p>

     <ul>
       <li>
         Any errors Impala encountered. If Impala experienced a serious error during startup, you must diagnose and
         troubleshoot that problem before you can do anything further with Impala.
       </li>

       <li>
         How Impala is configured.
       </li>

       <li>
         Jobs Impala has completed.
       </li>
     </ul>

     <note>
       <p>
         Formerly, the logs contained the query profile for each query, showing low-level details of how the work is
         distributed among nodes and how intermediate and final results are transmitted across the network. To save
         space, those query profiles are now stored in zlib-compressed files in
         <filepath>/var/log/impala/profiles</filepath>. You can access them through the Impala web user interface.
         For example, at <codeph>http://<varname>impalad-node-hostname</varname>:25000/queries</codeph>, each query
         is followed by a <codeph>Profile</codeph> link leading to a page showing extensive analytical data for the
         query execution.
       </p>

       <p rev="1.1.1">
         The auditing feature introduced in Impala 1.1.1 produces a separate set of audit log files when
         enabled. See <xref href="impala_auditing.xml#auditing"/> for details.
       </p>

       <p rev="2.9.0 IMPALA-4431">
         In <keyword keyref="impala29_full"/> and higher, you can control how many
         audit event log files are kept on each host through the
         <codeph>--max_audit_event_log_files</codeph> startup option for the
         <cmdname>impalad</cmdname> daemon, similar to the <codeph>--max_log_files</codeph>
         option for regular log files.
       </p>

       <p rev="2.2.0">
         The lineage feature introduced in Impala 2.2.0 produces a separate lineage log file when
         enabled. See <xref href="impala_lineage.xml#lineage"/> for details.
       </p>
     </note>

     <p outputclass="toc inpage"/>

   </conbody>

   <concept id="logs_details">

     <title>Locations and Names of Impala Log Files</title>

     <conbody>

       <ul>
         <li>
           By default, the log files are under the directory <filepath>/var/log/impala</filepath>.
           To change log file locations, modify the defaults file described in
           <xref href="impala_processes.xml#processes"/>.
         </li>

         <li>
           The significant files for the <codeph>impalad</codeph> process are <filepath>impalad.INFO</filepath>,
           <filepath>impalad.WARNING</filepath>, and <filepath>impalad.ERROR</filepath>. You might also see a file
           <filepath>impalad.FATAL</filepath>, although this is only present in rare conditions.
         </li>

         <li>
           The significant files for the <codeph>statestored</codeph> process are
           <filepath>statestored.INFO</filepath>, <filepath>statestored.WARNING</filepath>, and
           <filepath>statestored.ERROR</filepath>. You might also see a file <filepath>statestored.FATAL</filepath>,
           although this is only present in rare conditions.
         </li>

         <li rev="1.2">
           The significant files for the <codeph>catalogd</codeph> process are <filepath>catalogd.INFO</filepath>,
           <filepath>catalogd.WARNING</filepath>, and <filepath>catalogd.ERROR</filepath>. You might also see a file
           <filepath>catalogd.FATAL</filepath>, although this is only present in rare conditions.
         </li>

         <li>
           Examine the <codeph>.INFO</codeph> files to see configuration settings for the processes.
         </li>

         <li>
           Examine the <codeph>.WARNING</codeph> files to see all kinds of problem information, including such
           things as suboptimal settings and also serious runtime errors.
         </li>

         <li>
           Examine the <codeph>.ERROR</codeph> and/or <codeph>.FATAL</codeph> files to see only the most serious
           errors, if the processes crash, or queries fail to complete. These messages are also in the
           <codeph>.WARNING</codeph> file.
         </li>

         <li>
           A new set of log files is produced each time the associated daemon is restarted. These log files have
           long names including a timestamp. The <codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and
           <codeph>.ERROR</codeph> files are physically represented as symbolic links to the latest applicable log
           files.
         </li>

         <li>
           The init script for the <codeph>impala-server</codeph> service also produces a consolidated log file
           <codeph>/var/logs/impalad/impala-server.log</codeph>, with all the same information as the
           corresponding<codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph> files.
         </li>

         <li>
           The init script for the <codeph>impala-state-store</codeph> service also produces a consolidated log file
           <codeph>/var/logs/impalad/impala-state-store.log</codeph>, with all the same information as the
           corresponding<codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph> files.
         </li>
       </ul>

       <p>
         Impala stores information using the <codeph>glog_v</codeph> logging system. You will see some messages
         referring to C++ file names. Logging is affected by:
       </p>

       <ul>
         <li>
           The <codeph>GLOG_v</codeph> environment variable specifies which types of messages are logged. See
           <xref href="#log_levels"/> for details.
         </li>

         <li>
           The <codeph>--logbuflevel</codeph> startup flag for the <cmdname>impalad</cmdname> daemon specifies how
           often the log information is written to disk. The default is 0, meaning that the log is immediately
           flushed to disk when Impala outputs an important messages such as a warning or an error, but less
           important messages such as informational ones are buffered in memory rather than being flushed to disk
           immediately.
         </li>
       </ul>

     </conbody>

   </concept>

   <concept id="logs_managing">

     <title>Managing Impala Logs</title>
   <prolog>
     <metadata>
       <data name="Category" value="Administrators"/>
     </metadata>
   </prolog>

     <conbody>

       <p>
         Review Impala log files on each host, when you have traced an issue back to a specific system.
       </p>

     </conbody>

   </concept>

   <concept id="logs_rotate">

     <title>Rotating Impala Logs</title>
   <prolog>
     <metadata>
       <data name="Category" value="Disk Storage"/>
     </metadata>
   </prolog>

     <conbody>

       <p>
         Impala periodically switches the physical files representing the current log files, after which it is safe
         to remove the old files if they are no longer needed.
       </p>

       <p>
         Impala can automatically remove older unneeded log files, a feature known as <term>log rotation</term>.
 <!-- Another instance of the text also used in impala_new_features.xml
              and impala_fixed_issues.xml. (Just took out the word "new"
              and added the reference to the starting release.)
              At this point, a conref is definitely in the cards. -->
       </p>

       <p>
         In Impala 2.2 and higher, the <codeph>--max_log_files</codeph> configuration option specifies how many log
         files to keep at each severity level. You can specify an appropriate setting for each Impala-related daemon
         (<cmdname>impalad</cmdname>, <cmdname>statestored</cmdname>, and <cmdname>catalogd</cmdname>). The default
         value is 10, meaning that Impala preserves the latest 10 log files for each severity level
         (<codeph>INFO</codeph>, <codeph>WARNING</codeph>, <codeph>ERROR</codeph>, and <codeph>FATAL</codeph>).
         Impala checks to see if any old logs need to be removed based on the interval specified in the
         <codeph>logbufsecs</codeph> setting, every 5 seconds by default.
       </p>

 <!-- This extra detail only appears here. Consider if it's worth including it
            in the conref so people don't need to follow a link just for a couple of
            minor factoids. -->

       <p>
         A value of 0 preserves all log files, in which case you would set up set up manual log rotation using your
         Linux tool or technique of choice. A value of 1 preserves only the very latest log file.
       </p>

     </conbody>

   </concept>

   <concept id="logs_debug">

     <title>Reviewing Impala Logs</title>

     <conbody>

       <p>
         By default, the Impala log is stored at <codeph>/var/logs/impalad/</codeph>. The most comprehensive log,
         showing informational, warning, and error messages, is in the file name <filepath>impalad.INFO</filepath>.
         View log file contents by using the web interface or by examining the contents of the log file. (When you
         examine the logs through the file system, you can troubleshoot problems by reading the
         <filepath>impalad.WARNING</filepath> and/or <filepath>impalad.ERROR</filepath> files, which contain the
         subsets of messages indicating potential problems.)
       </p>

       <p>
         On a machine named <codeph>impala.example.com</codeph> with default settings, you could view the Impala
         logs on that machine by using a browser to access <codeph>http://impala.example.com:25000/logs</codeph>.
       </p>

       <note>
         <p>
           The web interface limits the amount of logging information displayed. To view every log entry, access the
           log files directly through the file system.
         </p>
       </note>

       <p>
         You can view the contents of the <codeph>impalad.INFO</codeph> log file in the file system. With the
         default configuration settings, the start of the log file appears as follows:
       </p>

 <codeblock>[user@example impalad]$ pwd
 /var/log/impalad
 [user@example impalad]$ more impalad.INFO
 Log file created at: 2013/01/07 08:42:12
 Running on machine: impala.example.com
 Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
 I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff)
 Built on Fri, 21 Dec 2012 12:55:19 PST
 I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com
 I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver):
 --dump_ir=false
 --module_output=
 --be_port=22000
 --classpath=
 --hostname=impala.example.com</codeblock>

       <note>
         The preceding example shows only a small part of the log file. Impala log files are often several megabytes
         in size.
       </note>

     </conbody>

   </concept>

   <concept id="log_format">

     <title>Understanding Impala Log Contents</title>

     <conbody>

       <p>
         The logs store information about Impala startup options. This information appears once for each time Impala
         is started and may include:
       </p>

       <ul>
         <li>
           Machine name.
         </li>

         <li>
           Impala version number.
         </li>

         <li>
           Flags used to start Impala.
         </li>

         <li>
           CPU information.
         </li>

         <li>
           The number of available disks.
         </li>
       </ul>

       <p>
         There is information about each job Impala has run. Because each Impala job creates an additional set of
         data about queries, the amount of job specific data may be very large. Logs may contained detailed
         information on jobs. These detailed log entries may include:
       </p>

       <ul>
         <li>
           The composition of the query.
         </li>

         <li>
           The degree of data locality.
         </li>

         <li>
           Statistics on data throughput and response times.
         </li>
       </ul>

     </conbody>

   </concept>

   <concept id="log_levels">

     <title>Setting Logging Levels</title>

     <conbody>

       <p>
         Impala uses the GLOG system, which supports three logging levels. You can adjust logging levels
         by exporting variable settings. To change logging settings manually, use a command
         similar to the following on each node before starting <codeph>impalad</codeph>:
       </p>

 <codeblock>export GLOG_v=1</codeblock>

       <note>
         For performance reasons, do not enable the most verbose logging level of 3 unless there is
         no other alternative for troubleshooting.
       </note>

       <p>
         For more information on how to configure GLOG, including how to set variable logging levels for different
         system components, see
         <xref keyref="glog.html">documentation for the glog project on github</xref>.
       </p>

       <section id="loglevels_details">

         <title>Understanding What is Logged at Different Logging Levels</title>

         <p>
           As logging levels increase, the categories of information logged are cumulative. For example, GLOG_v=2
           records everything GLOG_v=1 records, as well as additional information.
         </p>

         <p>
           Increasing logging levels imposes performance overhead and increases log size. Where practical, use
           GLOG_v=1 for most cases: this level has minimal performance impact but still captures useful
           troubleshooting information.
         </p>

         <p>
           Additional information logged at each level is as follows:
         </p>

         <ul>
           <li>
             GLOG_v=1 - The default level. Logs information about each connection and query that is initiated to an
             <codeph>impalad</codeph> instance, including runtime profiles.
           </li>

           <li>
             GLOG_v=2 - Everything from the previous level plus information for each RPC initiated. This level also
             records query execution progress information, including details on each file that is read.
           </li>

           <li>
             GLOG_v=3 - Everything from the previous level plus logging of every row that is read. This level is
             only applicable for the most serious troubleshooting and tuning scenarios, because it can produce
             exceptionally large and detailed log files, potentially leading to its own set of performance and
             capacity problems.
           </li>
         </ul>

       </section>

     </conbody>

   </concept>

   <concept id="redaction" rev="2.2.0">

     <title>Redacting Sensitive Information from Impala Log Files</title>
     <prolog>
       <metadata>
         <data name="Category" value="Redaction"/>
       </metadata>
     </prolog>

     <conbody>

       <p>
         <indexterm audience="hidden">redaction</indexterm>
         <term>Log redaction</term> is a security feature that prevents sensitive information from being displayed in
         locations used by administrators for monitoring and troubleshooting, such as log files and the Impala debug web
         user interface. You configure regular expressions that match sensitive types of information processed by your
         system, such as credit card numbers or tax IDs, and literals matching these patterns are obfuscated wherever
         they would normally be recorded in log files or displayed in administration or debugging user interfaces.
       </p>

       <p>
         In a security context, the log redaction feature is complementary to the Sentry authorization framework.
         Sentry prevents unauthorized users from being able to directly access table data. Redaction prevents
         administrators or support personnel from seeing the smaller amounts of sensitive or personally identifying
         information (PII) that might appear in queries issued by those authorized users.
       </p>

       <p>
         See <xref keyref="sg_redaction"/> for details about how to enable this feature and set
         up the regular expressions to detect and redact sensitive information within SQL statement text.
       </p>

     </conbody>

   </concept>

 </concept>
	<?xml version="1.0" encoding="UTF-8"?>
	<!--
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->
	<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
	<concept id="logging">

	<title>Using Impala Logging</title>
	<titlealts audience="PDF"><navtitle>Logging</navtitle></titlealts>
	<prolog>
	<metadata>
	<data name="Category" value="Impala"/>
	<data name="Category" value="Logs"/>
	<data name="Category" value="Troubleshooting"/>
	<data name="Category" value="Administrators"/>
	<data name="Category" value="Developers"/>
	<data name="Category" value="Data Analysts"/>
	</metadata>
	</prolog>

	<conbody>

	<p>
	The Impala logs record information about:
	</p>

	<ul>
	<li>
	Any errors Impala encountered. If Impala experienced a serious error during startup, you must diagnose and
	troubleshoot that problem before you can do anything further with Impala.
	</li>

	<li>
	How Impala is configured.
	</li>

	<li>
	Jobs Impala has completed.
	</li>
	</ul>

	<note>
	<p>
	Formerly, the logs contained the query profile for each query, showing low-level details of how the work is
	distributed among nodes and how intermediate and final results are transmitted across the network. To save
	space, those query profiles are now stored in zlib-compressed files in
	<filepath>/var/log/impala/profiles</filepath>. You can access them through the Impala web user interface.
	For example, at <codeph>http://<varname>impalad-node-hostname</varname>:25000/queries</codeph>, each query
	is followed by a <codeph>Profile</codeph> link leading to a page showing extensive analytical data for the
	query execution.
	</p>

	<p rev="1.1.1">
	The auditing feature introduced in Impala 1.1.1 produces a separate set of audit log files when
	enabled. See <xref href="impala_auditing.xml#auditing"/> for details.
	</p>

	<p rev="2.9.0 IMPALA-4431">
	In <keyword keyref="impala29_full"/> and higher, you can control how many
	audit event log files are kept on each host through the
	<codeph>--max_audit_event_log_files</codeph> startup option for the
	<cmdname>impalad</cmdname> daemon, similar to the <codeph>--max_log_files</codeph>
	option for regular log files.
	</p>

	<p rev="2.2.0">
	The lineage feature introduced in Impala 2.2.0 produces a separate lineage log file when
	enabled. See <xref href="impala_lineage.xml#lineage"/> for details.
	</p>
	</note>

	<p outputclass="toc inpage"/>

	</conbody>

	<concept id="logs_details">

	<title>Locations and Names of Impala Log Files</title>

	<conbody>

	<ul>
	<li>
	By default, the log files are under the directory <filepath>/var/log/impala</filepath>.
	To change log file locations, modify the defaults file described in
	<xref href="impala_processes.xml#processes"/>.
	</li>

	<li>
	The significant files for the <codeph>impalad</codeph> process are <filepath>impalad.INFO</filepath>,
	<filepath>impalad.WARNING</filepath>, and <filepath>impalad.ERROR</filepath>. You might also see a file
	<filepath>impalad.FATAL</filepath>, although this is only present in rare conditions.
	</li>

	<li>
	The significant files for the <codeph>statestored</codeph> process are
	<filepath>statestored.INFO</filepath>, <filepath>statestored.WARNING</filepath>, and
	<filepath>statestored.ERROR</filepath>. You might also see a file <filepath>statestored.FATAL</filepath>,
	although this is only present in rare conditions.
	</li>

	<li rev="1.2">
	The significant files for the <codeph>catalogd</codeph> process are <filepath>catalogd.INFO</filepath>,
	<filepath>catalogd.WARNING</filepath>, and <filepath>catalogd.ERROR</filepath>. You might also see a file
	<filepath>catalogd.FATAL</filepath>, although this is only present in rare conditions.
	</li>

	<li>
	Examine the <codeph>.INFO</codeph> files to see configuration settings for the processes.
	</li>

	<li>
	Examine the <codeph>.WARNING</codeph> files to see all kinds of problem information, including such
	things as suboptimal settings and also serious runtime errors.
	</li>

	<li>
	Examine the <codeph>.ERROR</codeph> and/or <codeph>.FATAL</codeph> files to see only the most serious
	errors, if the processes crash, or queries fail to complete. These messages are also in the
	<codeph>.WARNING</codeph> file.
	</li>

	<li>
	A new set of log files is produced each time the associated daemon is restarted. These log files have
	long names including a timestamp. The <codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and
	<codeph>.ERROR</codeph> files are physically represented as symbolic links to the latest applicable log
	files.
	</li>

	<li>
	The init script for the <codeph>impala-server</codeph> service also produces a consolidated log file
	<codeph>/var/logs/impalad/impala-server.log</codeph>, with all the same information as the
	corresponding<codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph> files.
	</li>

	<li>
	The init script for the <codeph>impala-state-store</codeph> service also produces a consolidated log file
	<codeph>/var/logs/impalad/impala-state-store.log</codeph>, with all the same information as the
	corresponding<codeph>.INFO</codeph>, <codeph>.WARNING</codeph>, and <codeph>.ERROR</codeph> files.
	</li>
	</ul>

	<p>
	Impala stores information using the <codeph>glog_v</codeph> logging system. You will see some messages
	referring to C++ file names. Logging is affected by:
	</p>

	<ul>
	<li>
	The <codeph>GLOG_v</codeph> environment variable specifies which types of messages are logged. See
	<xref href="#log_levels"/> for details.
	</li>

	<li>
	The <codeph>--logbuflevel</codeph> startup flag for the <cmdname>impalad</cmdname> daemon specifies how
	often the log information is written to disk. The default is 0, meaning that the log is immediately
	flushed to disk when Impala outputs an important messages such as a warning or an error, but less
	important messages such as informational ones are buffered in memory rather than being flushed to disk
	immediately.
	</li>
	</ul>

	</conbody>

	</concept>

	<concept id="logs_managing">

	<title>Managing Impala Logs</title>
	<prolog>
	<metadata>
	<data name="Category" value="Administrators"/>
	</metadata>
	</prolog>

	<conbody>

	<p>
	Review Impala log files on each host, when you have traced an issue back to a specific system.
	</p>

	</conbody>

	</concept>

	<concept id="logs_rotate">

	<title>Rotating Impala Logs</title>
	<prolog>
	<metadata>
	<data name="Category" value="Disk Storage"/>
	</metadata>
	</prolog>

	<conbody>

	<p>
	Impala periodically switches the physical files representing the current log files, after which it is safe
	to remove the old files if they are no longer needed.
	</p>

	<p>
	Impala can automatically remove older unneeded log files, a feature known as <term>log rotation</term>.
	<!-- Another instance of the text also used in impala_new_features.xml
	and impala_fixed_issues.xml. (Just took out the word "new"
	and added the reference to the starting release.)
	At this point, a conref is definitely in the cards. -->
	</p>

	<p>
	In Impala 2.2 and higher, the <codeph>--max_log_files</codeph> configuration option specifies how many log
	files to keep at each severity level. You can specify an appropriate setting for each Impala-related daemon
	(<cmdname>impalad</cmdname>, <cmdname>statestored</cmdname>, and <cmdname>catalogd</cmdname>). The default
	value is 10, meaning that Impala preserves the latest 10 log files for each severity level
	(<codeph>INFO</codeph>, <codeph>WARNING</codeph>, <codeph>ERROR</codeph>, and <codeph>FATAL</codeph>).
	Impala checks to see if any old logs need to be removed based on the interval specified in the
	<codeph>logbufsecs</codeph> setting, every 5 seconds by default.
	</p>

	<!-- This extra detail only appears here. Consider if it's worth including it
	in the conref so people don't need to follow a link just for a couple of
	minor factoids. -->

	<p>
	A value of 0 preserves all log files, in which case you would set up set up manual log rotation using your
	Linux tool or technique of choice. A value of 1 preserves only the very latest log file.
	</p>

	</conbody>

	</concept>

	<concept id="logs_debug">

	<title>Reviewing Impala Logs</title>

	<conbody>

	<p>
	By default, the Impala log is stored at <codeph>/var/logs/impalad/</codeph>. The most comprehensive log,
	showing informational, warning, and error messages, is in the file name <filepath>impalad.INFO</filepath>.
	View log file contents by using the web interface or by examining the contents of the log file. (When you
	examine the logs through the file system, you can troubleshoot problems by reading the
	<filepath>impalad.WARNING</filepath> and/or <filepath>impalad.ERROR</filepath> files, which contain the
	subsets of messages indicating potential problems.)
	</p>

	<p>
	On a machine named <codeph>impala.example.com</codeph> with default settings, you could view the Impala
	logs on that machine by using a browser to access <codeph>http://impala.example.com:25000/logs</codeph>.
	</p>

	<note>
	<p>
	The web interface limits the amount of logging information displayed. To view every log entry, access the
	log files directly through the file system.
	</p>
	</note>

	<p>
	You can view the contents of the <codeph>impalad.INFO</codeph> log file in the file system. With the
	default configuration settings, the start of the log file appears as follows:
	</p>

	<codeblock>[user@example impalad]$ pwd
	/var/log/impalad
	[user@example impalad]$ more impalad.INFO
	Log file created at: 2013/01/07 08:42:12
	Running on machine: impala.example.com
	Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
	I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff)
	Built on Fri, 21 Dec 2012 12:55:19 PST
	I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com
	I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver):
	--dump_ir=false
	--module_output=
	--be_port=22000
	--classpath=
	--hostname=impala.example.com</codeblock>

	<note>
	The preceding example shows only a small part of the log file. Impala log files are often several megabytes
	in size.
	</note>

	</conbody>

	</concept>

	<concept id="log_format">

	<title>Understanding Impala Log Contents</title>

	<conbody>

	<p>
	The logs store information about Impala startup options. This information appears once for each time Impala
	is started and may include:
	</p>

	<ul>
	<li>
	Machine name.
	</li>

	<li>
	Impala version number.
	</li>

	<li>
	Flags used to start Impala.
	</li>

	<li>
	CPU information.
	</li>

	<li>
	The number of available disks.
	</li>
	</ul>

	<p>
	There is information about each job Impala has run. Because each Impala job creates an additional set of
	data about queries, the amount of job specific data may be very large. Logs may contained detailed
	information on jobs. These detailed log entries may include:
	</p>

	<ul>
	<li>
	The composition of the query.
	</li>

	<li>
	The degree of data locality.
	</li>

	<li>
	Statistics on data throughput and response times.
	</li>
	</ul>

	</conbody>

	</concept>

	<concept id="log_levels">

	<title>Setting Logging Levels</title>

	<conbody>

	<p>
	Impala uses the GLOG system, which supports three logging levels. You can adjust logging levels
	by exporting variable settings. To change logging settings manually, use a command
	similar to the following on each node before starting <codeph>impalad</codeph>:
	</p>

	<codeblock>export GLOG_v=1</codeblock>

	<note>
	For performance reasons, do not enable the most verbose logging level of 3 unless there is
	no other alternative for troubleshooting.
	</note>

	<p>
	For more information on how to configure GLOG, including how to set variable logging levels for different
	system components, see
	<xref keyref="glog.html">documentation for the glog project on github</xref>.
	</p>

	<section id="loglevels_details">

	<title>Understanding What is Logged at Different Logging Levels</title>

	<p>
	As logging levels increase, the categories of information logged are cumulative. For example, GLOG_v=2
	records everything GLOG_v=1 records, as well as additional information.
	</p>

	<p>
	Increasing logging levels imposes performance overhead and increases log size. Where practical, use
	GLOG_v=1 for most cases: this level has minimal performance impact but still captures useful
	troubleshooting information.
	</p>

	<p>
	Additional information logged at each level is as follows:
	</p>

	<ul>
	<li>
	GLOG_v=1 - The default level. Logs information about each connection and query that is initiated to an
	<codeph>impalad</codeph> instance, including runtime profiles.
	</li>

	<li>
	GLOG_v=2 - Everything from the previous level plus information for each RPC initiated. This level also
	records query execution progress information, including details on each file that is read.
	</li>

	<li>
	GLOG_v=3 - Everything from the previous level plus logging of every row that is read. This level is
	only applicable for the most serious troubleshooting and tuning scenarios, because it can produce
	exceptionally large and detailed log files, potentially leading to its own set of performance and
	capacity problems.
	</li>
	</ul>

	</section>

	</conbody>

	</concept>

	<concept id="redaction" rev="2.2.0">

	<title>Redacting Sensitive Information from Impala Log Files</title>
	<prolog>
	<metadata>
	<data name="Category" value="Redaction"/>
	</metadata>
	</prolog>

	<conbody>

	<p>
	<indexterm audience="hidden">redaction</indexterm>
	<term>Log redaction</term> is a security feature that prevents sensitive information from being displayed in
	locations used by administrators for monitoring and troubleshooting, such as log files and the Impala debug web
	user interface. You configure regular expressions that match sensitive types of information processed by your
	system, such as credit card numbers or tax IDs, and literals matching these patterns are obfuscated wherever
	they would normally be recorded in log files or displayed in administration or debugging user interfaces.
	</p>

	<p>
	In a security context, the log redaction feature is complementary to the Sentry authorization framework.
	Sentry prevents unauthorized users from being able to directly access table data. Redaction prevents
	administrators or support personnel from seeing the smaller amounts of sensitive or personally identifying
	information (PII) that might appear in queries issued by those authorized users.
	</p>

	<p>
	See <xref keyref="sg_redaction"/> for details about how to enable this feature and set
	up the regular expressions to detect and redact sensitive information within SQL statement text.
	</p>

	</conbody>

	</concept>

	</concept>