
 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

 <meta name="copyright" content="(C) Copyright 2023" />
 <meta name="DC.rights.owner" content="(C) Copyright 2023" />
 <meta name="DC.Type" content="concept" />
 <meta name="DC.Title" content="EXPLAIN_LEVEL Query Option" />
 <meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html" />
 <meta name="DC.Relation" scheme="URI" content="../topics/impala_max_num_runtime_filters.html" />
 <meta name="prodname" content="Impala" />
 <meta name="version" content="Impala 3.4.x" />
 <meta name="DC.Format" content="XHTML" />
 <meta name="DC.Identifier" content="explain_level" />
 <link rel="stylesheet" type="text/css" href="../commonltr.css" />
 <title>EXPLAIN_LEVEL Query Option</title>
 </head>
 <body id="explain_level">


   <h1 class="title topictitle1" id="ariaid-title1">EXPLAIN_LEVEL Query Option</h1>


   <div class="body conbody">

     <p class="p"> Controls the amount of detail provided in the output of the
         <code class="ph codeph">EXPLAIN</code> statement. The basic output can help you
       identify high-level performance issues such as scanning a higher volume of
       data or more partitions than you expect. The higher levels of detail show
       how intermediate results flow between nodes and how different SQL
       operations such as <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP BY</code>,
       joins, and <code class="ph codeph">WHERE</code> clauses are implemented within a
       distributed query. </p>


     <p class="p">
       <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code> or <code class="ph codeph">INT</code>
     </p>


     <p class="p">
       <strong class="ph b">Default:</strong> <code class="ph codeph">1</code>
     </p>


     <p class="p">
       <strong class="ph b">Arguments:</strong>
     </p>


     <p class="p">
       The allowed range of numeric values for this option is 0 to 3:
     </p>


     <ul class="ul">
       <li class="li">
         <code class="ph codeph">0</code> or <code class="ph codeph">MINIMAL</code>: A bare-bones list, one line per operation. Primarily useful
         for checking the join order in very long queries where the regular <code class="ph codeph">EXPLAIN</code> output is too
         long to read easily.
       </li>


       <li class="li">
         <code class="ph codeph">1</code> or <code class="ph codeph">STANDARD</code>: The default level of detail, showing the logical way that
         work is split up for the distributed query.
       </li>


       <li class="li">
         <code class="ph codeph">2</code> or <code class="ph codeph">EXTENDED</code>: Includes additional
         detail about how the query planner uses statistics in its
         decision-making process, so that you can see how a query could be tuned by
         gathering statistics, using query hints, adding or removing predicates,
         and so on. In <span class="keyword">Impala 3.2</span> and higher, the output
         also includes the analyzed query with the cast information in the output
         header, and the implicit cast information in the Predicate section.</li>


       <li class="li">
         <code class="ph codeph">3</code> or <code class="ph codeph">VERBOSE</code>: The maximum level of detail, showing how work is split up
         within each node into <span class="q">"query fragments"</span> that are connected in a pipeline. This extra detail is
         primarily useful for low-level performance testing and tuning within Impala itself, rather than for
         rewriting the SQL code at the user level.
       </li>

     </ul>
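
     <p class="p">
       As a brief sketch of the list above, the numeric value and the mnemonic for a level are meant to be
       interchangeable in a <code class="ph codeph">SET</code> statement. The statements below assume an
       <span class="keyword cmdname">impala-shell</span> session and the table <code class="ph codeph">t1</code> from the
       examples on this page; the output is omitted because it varies by release.
     </p>

 <pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=2;
 [localhost:21000] &gt; explain select count(*) from t1;
 [localhost:21000] &gt; set explain_level=EXTENDED;
 [localhost:21000] &gt; explain select count(*) from t1;
 </code></pre>

     <p class="p">
       Both <code class="ph codeph">EXPLAIN</code> statements should produce the same level of detail.
     </p>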


     <div class="note note"><span class="notetitle">Note:</span>
       Prior to Impala 1.3, the allowed argument range for <code class="ph codeph">EXPLAIN_LEVEL</code> was 0 to 1: level 0 had
       the mnemonic <code class="ph codeph">NORMAL</code>, and level 1 was <code class="ph codeph">VERBOSE</code>. In Impala 1.3 and higher,
       <code class="ph codeph">NORMAL</code> is not a valid mnemonic value, and <code class="ph codeph">VERBOSE</code> still applies to the
       highest level of detail but now corresponds to level 3. You might need to adjust the values if you have any
       older <code class="ph codeph">impala-shell</code> script files that set the <code class="ph codeph">EXPLAIN_LEVEL</code> query option.
     </div>
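
     <p class="p">
       One way to update such a script, shown here as a sketch rather than a required approach, is to request the
       level by mnemonic instead of by number. An old script that used <code class="ph codeph">set explain_level=1</code>
       to get the most detailed plan would get only the standard plan in Impala 1.3 and higher; the mnemonic
       keeps the original intent. The query is illustrative and assumes the table <code class="ph codeph">t1</code> from
       the examples on this page.
     </p>

 <pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=VERBOSE;
 [localhost:21000] &gt; explain select * from t1;
 </code></pre>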


     <p class="p">
       The extended information from level 2 or 3 is especially useful during performance tuning, when
       you need to confirm whether the work for the query is distributed the way you expect, particularly for the
       most resource-intensive operations such as join queries against large tables, queries against tables with
       large numbers of partitions, and insert operations for Parquet tables. The extended information also helps you
       check estimated resource usage when you use the admission control or resource management features explained
       in <a class="xref" href="impala_resource_management.html#resource_management">Resource Management</a>. See
       <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for the syntax of the <code class="ph codeph">EXPLAIN</code> statement, and
       <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details about how to use the extended information.
     </p>


     <p class="p">
         <strong class="ph b">Usage notes:</strong>
       </p>


     <p class="p">
       Read the <code class="ph codeph">EXPLAIN</code> output from bottom to top. The lowest lines represent the
       initial work of the query (scanning data files), the lines in the middle represent calculations done on each
       node and how intermediate results are transmitted from one node to another, and the topmost lines represent
       the final results being sent back to the coordinator node.
     </p>


     <p class="p">
       The numbers in the left column are generated internally during the initial planning phase and do not
       represent the actual order of operations, so it is not significant if they appear out of order in the
       <code class="ph codeph">EXPLAIN</code> output.
     </p>


     <p class="p">
       At all <code class="ph codeph">EXPLAIN</code> levels, the plan contains a warning if any tables in the query are missing
       statistics. Use the <code class="ph codeph">COMPUTE STATS</code> statement to gather statistics for each table and suppress
       this warning. See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details about how the statistics help
       query performance.
     </p>


     <p class="p">
       The <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span> always starts with an explain plan
       showing full detail, the same as with <code class="ph codeph">EXPLAIN_LEVEL=3</code>. <span class="ph">After the explain
       plan comes the executive summary, the same output as produced by the <code class="ph codeph">SUMMARY</code> command in
       <span class="keyword cmdname">impala-shell</span>.</span>
     </p>
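
     <p class="p">
       A minimal sketch of that workflow in <span class="keyword cmdname">impala-shell</span>, assuming the table
       <code class="ph codeph">t1</code> from the examples on this page. Run a query first, then issue
       <code class="ph codeph">PROFILE</code> or <code class="ph codeph">SUMMARY</code> to report on it. The output
       is omitted because it depends on the cluster and the release.
     </p>

 <pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) from t1;
 [localhost:21000] &gt; profile;
 [localhost:21000] &gt; summary;
 </code></pre>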


     <p class="p">
         <strong class="ph b">Examples:</strong>
       </p>


     <p class="p">
       These examples use a trivial, empty table to illustrate how the essential aspects of query planning are shown
       in <code class="ph codeph">EXPLAIN</code> output:
     </p>


 <pre class="pre codeblock"><code>[localhost:21000] &gt; create table t1 (x int, s string);
 [localhost:21000] &gt; set explain_level=1;
 [localhost:21000] &gt; explain select count(*) from t1;
 +------------------------------------------------------------------------+
 | Explain String                                                         |
 +------------------------------------------------------------------------+
 | Estimated Per-Host Requirements: Memory=10.00MB VCores=1               |
 | WARNING: The following tables are missing relevant table and/or column |
 |   statistics.                                                          |
 | explain_plan.t1                                                        |
 |                                                                        |
 | 03:AGGREGATE [MERGE FINALIZE]                                          |
 | |  output: sum(count(*))                                               |
 | |                                                                      |
 | 02:EXCHANGE [PARTITION=UNPARTITIONED]                                  |
 | |                                                                      |
 | 01:AGGREGATE                                                           |
 | |  output: count(*)                                                    |
 | |                                                                      |
 | 00:SCAN HDFS [explain_plan.t1]                                         |
 |    partitions=1/1 size=0B                                              |
 +------------------------------------------------------------------------+
 [localhost:21000] &gt; explain select * from t1;
 +------------------------------------------------------------------------+
 | Explain String                                                         |
 +------------------------------------------------------------------------+
 | Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
 | WARNING: The following tables are missing relevant table and/or column |
 |   statistics.                                                          |
 | explain_plan.t1                                                        |
 |                                                                        |
 | 01:EXCHANGE [PARTITION=UNPARTITIONED]                                  |
 | |                                                                      |
 | 00:SCAN HDFS [explain_plan.t1]                                         |
 |    partitions=1/1 size=0B                                              |
 +------------------------------------------------------------------------+
 [localhost:21000] &gt; set explain_level=2;
 [localhost:21000] &gt; explain select * from t1;
 +------------------------------------------------------------------------+
 | Explain String                                                         |
 +------------------------------------------------------------------------+
 | Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
 | WARNING: The following tables are missing relevant table and/or column |
 |   statistics.                                                          |
 | explain_plan.t1                                                        |
 |                                                                        |
 | 01:EXCHANGE [PARTITION=UNPARTITIONED]                                  |
 | |  hosts=0 per-host-mem=unavailable                                    |
 | |  tuple-ids=0 row-size=19B cardinality=unavailable                    |
 | |                                                                      |
 | 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM]                       |
 |    partitions=1/1 size=0B                                              |
 |    table stats: unavailable                                            |
 |    column stats: unavailable                                           |
 |    hosts=0 per-host-mem=0B                                             |
 |    tuple-ids=0 row-size=19B cardinality=unavailable                    |
 +------------------------------------------------------------------------+
 [localhost:21000] &gt; set explain_level=3;
 [localhost:21000] &gt; explain select * from t1;
 +------------------------------------------------------------------------+
 | Explain String                                                         |
 +------------------------------------------------------------------------+
 | Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
 <strong class="ph b">| WARNING: The following tables are missing relevant table and/or column |</strong>
 <strong class="ph b">|   statistics.                                                          |</strong>
 <strong class="ph b">| explain_plan.t1                                                        |</strong>
 |                                                                        |
 | F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED]                            |
 |   01:EXCHANGE [PARTITION=UNPARTITIONED]                                |
 |      hosts=0 per-host-mem=unavailable                                  |
 |      tuple-ids=0 row-size=19B cardinality=unavailable                  |
 |                                                                        |
 | F00:PLAN FRAGMENT [PARTITION=RANDOM]                                   |
 |   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
 |   00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM]                     |
 |      partitions=1/1 size=0B                                            |
 <strong class="ph b">|      table stats: unavailable                                          |</strong>
 <strong class="ph b">|      column stats: unavailable                                         |</strong>
 |      hosts=0 per-host-mem=0B                                           |
 |      tuple-ids=0 row-size=19B cardinality=unavailable                  |
 +------------------------------------------------------------------------+
 </code></pre>

     <p class="p">
       As the warning message demonstrates, most of the information needed for Impala to do efficient query
       planning, and for you to understand the performance characteristics of the query, requires running the
       <code class="ph codeph">COMPUTE STATS</code> statement for the table:
     </p>


 <pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats t1;
 +-----------------------------------------+
 | summary                                 |
 +-----------------------------------------+
 | Updated 1 partition(s) and 2 column(s). |
 +-----------------------------------------+
 [localhost:21000] &gt; explain select * from t1;
 +------------------------------------------------------------------------+
 | Explain String                                                         |
 +------------------------------------------------------------------------+
 | Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
 |                                                                        |
 | F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED]                            |
 |   01:EXCHANGE [PARTITION=UNPARTITIONED]                                |
 |      hosts=0 per-host-mem=unavailable                                  |
 |      tuple-ids=0 row-size=20B cardinality=0                            |
 |                                                                        |
 | F00:PLAN FRAGMENT [PARTITION=RANDOM]                                   |
 |   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
 |   00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM]                     |
 |      partitions=1/1 size=0B                                            |
 <strong class="ph b">|      table stats: 0 rows total                                         |</strong>
 <strong class="ph b">|      column stats: all                                                 |</strong>
 |      hosts=0 per-host-mem=0B                                           |
 |      tuple-ids=0 row-size=20B cardinality=0                            |
 +------------------------------------------------------------------------+
 </code></pre>

     <p class="p">
       Joins and other complicated, multi-part queries are the ones where you most commonly need to examine the
       <code class="ph codeph">EXPLAIN</code> output and customize the amount of detail in the output. This example shows the
       default <code class="ph codeph">EXPLAIN</code> output for a three-way join query, then the equivalent output with a
       <code class="ph codeph">[SHUFFLE]</code> hint to change the join mechanism between the first two tables from a broadcast
       join to a shuffle join.
     </p>


 <pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=1;
 [localhost:21000] &gt; explain select one.*, two.*, three.* from t1 one, t1 two, t1 three where one.x = two.x and two.x = three.x;
 +---------------------------------------------------------+
 | Explain String                                          |
 +---------------------------------------------------------+
 | Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
 |                                                         |
 | 07:EXCHANGE [PARTITION=UNPARTITIONED]                   |
 | |                                                       |
 <strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST]                    |</strong>
 | |  hash predicates: two.x = three.x                     |
 | |                                                       |
 <strong class="ph b">| |--06:EXCHANGE [BROADCAST]                              |</strong>
 | |  |                                                    |
 | |  02:SCAN HDFS [explain_plan.t1 three]                 |
 | |     partitions=1/1 size=0B                            |
 | |                                                       |
 <strong class="ph b">| 03:HASH JOIN [INNER JOIN, BROADCAST]                    |</strong>
 | |  hash predicates: one.x = two.x                       |
 | |                                                       |
 <strong class="ph b">| |--05:EXCHANGE [BROADCAST]                              |</strong>
 | |  |                                                    |
 | |  01:SCAN HDFS [explain_plan.t1 two]                   |
 | |     partitions=1/1 size=0B                            |
 | |                                                       |
 | 00:SCAN HDFS [explain_plan.t1 one]                      |
 |    partitions=1/1 size=0B                               |
 +---------------------------------------------------------+
 [localhost:21000] &gt; explain select one.*, two.*, three.*
                   &gt; from t1 one join [shuffle] t1 two join t1 three
                   &gt; where one.x = two.x and two.x = three.x;
 +---------------------------------------------------------+
 | Explain String                                          |
 +---------------------------------------------------------+
 | Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
 |                                                         |
 | 08:EXCHANGE [PARTITION=UNPARTITIONED]                   |
 | |                                                       |
 <strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST]                    |</strong>
 | |  hash predicates: two.x = three.x                     |
 | |                                                       |
 <strong class="ph b">| |--07:EXCHANGE [BROADCAST]                              |</strong>
 | |  |                                                    |
 | |  02:SCAN HDFS [explain_plan.t1 three]                 |
 | |     partitions=1/1 size=0B                            |
 | |                                                       |
 <strong class="ph b">| 03:HASH JOIN [INNER JOIN, PARTITIONED]                  |</strong>
 | |  hash predicates: one.x = two.x                       |
 | |                                                       |
 <strong class="ph b">| |--06:EXCHANGE [PARTITION=HASH(two.x)]                  |</strong>
 | |  |                                                    |
 | |  01:SCAN HDFS [explain_plan.t1 two]                   |
 | |     partitions=1/1 size=0B                            |
 | |                                                       |
 <strong class="ph b">| 05:EXCHANGE [PARTITION=HASH(one.x)]                     |</strong>
 | |                                                       |
 | 00:SCAN HDFS [explain_plan.t1 one]                      |
 |    partitions=1/1 size=0B                               |
 +---------------------------------------------------------+
 </code></pre>

     <p class="p">
       For a join involving many different tables, the default <code class="ph codeph">EXPLAIN</code> output might stretch over
       several pages, and the only details you care about might be the join order and the mechanism (broadcast or
       shuffle) for joining each pair of tables. In that case, you might set <code class="ph codeph">EXPLAIN_LEVEL</code> to its
       lowest value of 0, to focus on just the join order and join mechanism for each stage. The following example
       shows how the rows from the first and second joined tables are hashed and divided among the nodes of the
       cluster for further filtering; then the entire contents of the third table are broadcast to all nodes for the
       final stage of join processing.
     </p>


 <pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=0;
 [localhost:21000] &gt; explain select one.*, two.*, three.*
                   &gt; from t1 one join [shuffle] t1 two join t1 three
                   &gt; where one.x = two.x and two.x = three.x;
 +---------------------------------------------------------+
 | Explain String                                          |
 +---------------------------------------------------------+
 | Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
 |                                                         |
 | 08:EXCHANGE [PARTITION=UNPARTITIONED]                   |
 <strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST]                    |</strong>
 <strong class="ph b">| |--07:EXCHANGE [BROADCAST]                              |</strong>
 | |  02:SCAN HDFS [explain_plan.t1 three]                 |
 <strong class="ph b">| 03:HASH JOIN [INNER JOIN, PARTITIONED]                  |</strong>
 <strong class="ph b">| |--06:EXCHANGE [PARTITION=HASH(two.x)]                  |</strong>
 | |  01:SCAN HDFS [explain_plan.t1 two]                   |
 <strong class="ph b">| 05:EXCHANGE [PARTITION=HASH(one.x)]                     |</strong>
 | 00:SCAN HDFS [explain_plan.t1 one]                      |
 +---------------------------------------------------------+
 </code></pre>


   </div>

 <div class="related-links">
 <ul class="ullinks">
 <li class="link ulchildlink"><strong><a href="../topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a></strong><br />
 </li>
 </ul>

 <div class="familylinks">
 <div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div>
 </div>
 </div></body>
 </html>
	<strong class="ph b">\| explain_plan.t1 \|</strong>
	\| \|
	\| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED] \|
	\| 01:EXCHANGE [PARTITION=UNPARTITIONED] \|
	\| hosts=0 per-host-mem=unavailable \|
	\| tuple-ids=0 row-size=19B cardinality=unavailable \|
	\| \|
	\| F00:PLAN FRAGMENT [PARTITION=RANDOM] \|
	\| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] \|
	\| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] \|
	\| partitions=1/1 size=0B \|
	<strong class="ph b">\| table stats: unavailable \|</strong>
	<strong class="ph b">\| column stats: unavailable \|</strong>
	\| hosts=0 per-host-mem=0B \|
	\| tuple-ids=0 row-size=19B cardinality=unavailable \|
	+------------------------------------------------------------------------+
	</code></pre>

	<p class="p">
	As the warning message indicates, most of the information that Impala needs to plan queries efficiently, and
	that you need to understand the performance characteristics of a query, becomes available only after you run the
	<code class="ph codeph">COMPUTE STATS</code> statement for the table:
	</p>


	<pre class="pre codeblock"><code>[localhost:21000] > compute stats t1;
	+-----------------------------------------+
	\| summary \|
	+-----------------------------------------+
	\| Updated 1 partition(s) and 2 column(s). \|
	+-----------------------------------------+
	[localhost:21000] > explain select * from t1;
	+------------------------------------------------------------------------+
	\| Explain String \|
	+------------------------------------------------------------------------+
	\| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 \|
	\| \|
	\| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED] \|
	\| 01:EXCHANGE [PARTITION=UNPARTITIONED] \|
	\| hosts=0 per-host-mem=unavailable \|
	\| tuple-ids=0 row-size=20B cardinality=0 \|
	\| \|
	\| F00:PLAN FRAGMENT [PARTITION=RANDOM] \|
	\| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] \|
	\| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] \|
	\| partitions=1/1 size=0B \|
	<strong class="ph b">\| table stats: 0 rows total \|</strong>
	<strong class="ph b">\| column stats: all \|</strong>
	\| hosts=0 per-host-mem=0B \|
	\| tuple-ids=0 row-size=20B cardinality=0 \|
	+------------------------------------------------------------------------+
	</code></pre>
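	<p class="p">
	The change in the <code class="ph codeph">table stats</code> and <code class="ph codeph">column stats</code> lines can also be checked
	programmatically. The following Python sketch is a minimal illustration, not part of Impala: the function names are
	hypothetical, and the code assumes the line layout shown in the level 2 and level 3 examples above, where each
	<code class="ph codeph">SCAN HDFS</code> node is followed by its statistics lines.
	</p>

```python
import re

# Matches a scan node such as "00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM]"
# and captures the text up to the first comma or closing bracket.
SCAN_RE = re.compile(r"^(\d+):SCAN HDFS \[([^\],]+)")


def scan_stats(plan_text):
    """Map each scanned table to its 'table stats' and 'column stats' values."""
    stats = {}
    current = None
    for line in plan_text.splitlines():
        row = line.strip(" |\t")  # Drop border characters and indentation.
        match = SCAN_RE.match(row)
        if match:
            current = match.group(2).strip()
            stats[current] = {}
            continue
        for key in ("table stats", "column stats"):
            prefix = key + ":"
            if current is not None and row.startswith(prefix):
                stats[current][key] = row[len(prefix):].strip()
    return stats


def scans_without_stats(plan_text):
    """Return the scanned tables that report any statistics as unavailable."""
    found = scan_stats(plan_text)
    return sorted(t for t, s in found.items() if "unavailable" in s.values())


BEFORE = """
| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
|    partitions=1/1 size=0B                        |
|    table stats: unavailable                      |
|    column stats: unavailable                     |
"""

AFTER = """
| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
|    partitions=1/1 size=0B                        |
|    table stats: 0 rows total                     |
|    column stats: all                             |
"""

print(scans_without_stats(BEFORE))
print(scans_without_stats(AFTER))
```

	<p class="p">
	With the sample text, the table is reported before <code class="ph codeph">COMPUTE STATS</code> runs and is no longer reported afterward.
	</p>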

	<p class="p">
	Joins and other complicated, multi-part queries are the ones where you most commonly need to examine the
	<code class="ph codeph">EXPLAIN</code> output and customize the amount of detail in the output. This example shows the
	default <code class="ph codeph">EXPLAIN</code> output for a three-way join query, then the equivalent output with a
	<code class="ph codeph">[SHUFFLE]</code> hint to change the join mechanism between the first two tables from a broadcast
	join to a shuffle join.
	</p>


	<pre class="pre codeblock"><code>[localhost:21000] > set explain_level=1;
	[localhost:21000] > explain select one.*, two.*, three.* from t1 one, t1 two, t1 three where one.x = two.x and two.x = three.x;
	+---------------------------------------------------------+
	\| Explain String \|
	+---------------------------------------------------------+
	\| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 \|
	\| \|
	\| 07:EXCHANGE [PARTITION=UNPARTITIONED] \|
	\| \| \|
	<strong class="ph b">\| 04:HASH JOIN [INNER JOIN, BROADCAST] \|</strong>
	\| \| hash predicates: two.x = three.x \|
	\| \| \|
	<strong class="ph b">\| \|--06:EXCHANGE [BROADCAST] \|</strong>
	\| \| \| \|
	\| \| 02:SCAN HDFS [explain_plan.t1 three] \|
	\| \| partitions=1/1 size=0B \|
	\| \| \|
	<strong class="ph b">\| 03:HASH JOIN [INNER JOIN, BROADCAST] \|</strong>
	\| \| hash predicates: one.x = two.x \|
	\| \| \|
	<strong class="ph b">\| \|--05:EXCHANGE [BROADCAST] \|</strong>
	\| \| \| \|
	\| \| 01:SCAN HDFS [explain_plan.t1 two] \|
	\| \| partitions=1/1 size=0B \|
	\| \| \|
	\| 00:SCAN HDFS [explain_plan.t1 one] \|
	\| partitions=1/1 size=0B \|
	+---------------------------------------------------------+
	[localhost:21000] > explain select one.*, two.*, three.*
	> from t1 one join [shuffle] t1 two join t1 three
	> where one.x = two.x and two.x = three.x;
	+---------------------------------------------------------+
	\| Explain String \|
	+---------------------------------------------------------+
	\| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 \|
	\| \|
	\| 08:EXCHANGE [PARTITION=UNPARTITIONED] \|
	\| \| \|
	<strong class="ph b">\| 04:HASH JOIN [INNER JOIN, BROADCAST] \|</strong>
	\| \| hash predicates: two.x = three.x \|
	\| \| \|
	<strong class="ph b">\| \|--07:EXCHANGE [BROADCAST] \|</strong>
	\| \| \| \|
	\| \| 02:SCAN HDFS [explain_plan.t1 three] \|
	\| \| partitions=1/1 size=0B \|
	\| \| \|
	<strong class="ph b">\| 03:HASH JOIN [INNER JOIN, PARTITIONED] \|</strong>
	\| \| hash predicates: one.x = two.x \|
	\| \| \|
	<strong class="ph b">\| \|--06:EXCHANGE [PARTITION=HASH(two.x)] \|</strong>
	\| \| \| \|
	\| \| 01:SCAN HDFS [explain_plan.t1 two] \|
	\| \| partitions=1/1 size=0B \|
	\| \| \|
	<strong class="ph b">\| 05:EXCHANGE [PARTITION=HASH(one.x)] \|</strong>
	\| \| \|
	\| 00:SCAN HDFS [explain_plan.t1 one] \|
	\| partitions=1/1 size=0B \|
	+---------------------------------------------------------+
	</code></pre>

	<p class="p">
	For a join involving many different tables, the default <code class="ph codeph">EXPLAIN</code> output might stretch over
	several pages, and the only details you care about might be the join order and the mechanism (broadcast or
	shuffle) for joining each pair of tables. In that case, you might set <code class="ph codeph">EXPLAIN_LEVEL</code> to its
	lowest value of 0, to focus on just the join order and join mechanism for each stage. The following example
	shows how the rows from the first and second joined tables are hashed and divided among the nodes of the
	cluster for further filtering; then the entire contents of the third table are broadcast to all nodes for the
	final stage of join processing.
	</p>


	<pre class="pre codeblock"><code>[localhost:21000] > set explain_level=0;
	[localhost:21000] > explain select one.*, two.*, three.*
	> from t1 one join [shuffle] t1 two join t1 three
	> where one.x = two.x and two.x = three.x;
	+---------------------------------------------------------+
	\| Explain String \|
	+---------------------------------------------------------+
	\| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 \|
	\| \|
	\| 08:EXCHANGE [PARTITION=UNPARTITIONED] \|
	<strong class="ph b">\| 04:HASH JOIN [INNER JOIN, BROADCAST] \|</strong>
	<strong class="ph b">\| \|--07:EXCHANGE [BROADCAST] \|</strong>
	\| \| 02:SCAN HDFS [explain_plan.t1 three] \|
	<strong class="ph b">\| 03:HASH JOIN [INNER JOIN, PARTITIONED] \|</strong>
	<strong class="ph b">\| \|--06:EXCHANGE [PARTITION=HASH(two.x)] \|</strong>
	\| \| 01:SCAN HDFS [explain_plan.t1 two] \|
	<strong class="ph b">\| 05:EXCHANGE [PARTITION=HASH(one.x)] \|</strong>
	\| 00:SCAN HDFS [explain_plan.t1 one] \|
	+---------------------------------------------------------+
	</code></pre>
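	<p class="p">
	When the join order and join mechanism are the only details of interest, they can be pulled out of the plan text
	with a short script. The following Python sketch is illustrative only: the function name
	<code class="ph codeph">join_strategies</code> is hypothetical, and the code assumes that each join appears as a
	<code class="ph codeph">HASH JOIN</code> line with the join type and the distribution mode in square brackets, as in the
	output above. In that output, <code class="ph codeph">PARTITIONED</code> marks the shuffle join requested by the hint.
	</p>

```python
import re

# Matches a join node such as "04:HASH JOIN [INNER JOIN, BROADCAST]".
JOIN_RE = re.compile(r"^(\d+):HASH JOIN \[([^\]]*)\]")


def join_strategies(plan_text):
    """Return (node id, join type, distribution mode) in plan order."""
    joins = []
    for line in plan_text.splitlines():
        # Drop borders, indentation, and the "|--" connectors of the plan tree.
        row = line.strip(" |-\t")
        match = JOIN_RE.match(row)
        if match:
            parts = [part.strip() for part in match.group(2).split(",")]
            joins.append((match.group(1), parts[0], parts[-1]))
    return joins


LEVEL_0_PLAN = """
| 08:EXCHANGE [PARTITION=UNPARTITIONED]    |
| 04:HASH JOIN [INNER JOIN, BROADCAST]     |
| |--07:EXCHANGE [BROADCAST]               |
| |  02:SCAN HDFS [explain_plan.t1 three]  |
| 03:HASH JOIN [INNER JOIN, PARTITIONED]   |
| |--06:EXCHANGE [PARTITION=HASH(two.x)]   |
| |  01:SCAN HDFS [explain_plan.t1 two]    |
| 05:EXCHANGE [PARTITION=HASH(one.x)]      |
| 00:SCAN HDFS [explain_plan.t1 one]       |
"""

for node_id, join_type, mode in join_strategies(LEVEL_0_PLAN):
    print(node_id, join_type, mode)
```

	<p class="p">
	For the sample plan, the sketch reports node 04 as a broadcast join and node 03 as a partitioned join, in the same
	top-to-bottom order as the <code class="ph codeph">EXPLAIN</code> output.
	</p>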



	</div>

	<div class="related-links">
	<ul class="ullinks">
	<li class="link ulchildlink"><strong><a href="../topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a></strong><br />
	</li>
	</ul>

	<div class="familylinks">
	<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div>
	</div>
	</div></body>
	</html>