| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept rev="1.2" id="explain_level"> |
| |
| <title>EXPLAIN_LEVEL Query Option</title> |
| <titlealts audience="PDF"><navtitle>EXPLAIN_LEVEL</navtitle></titlealts> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Impala Query Options"/> |
| <data name="Category" value="Troubleshooting"/> |
| <data name="Category" value="Querying"/> |
| <data name="Category" value="Performance"/> |
| <data name="Category" value="Reports"/> |
| <data name="Category" value="Administrators"/> |
| <data name="Category" value="Developers"/> |
| <data name="Category" value="Data Analysts"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> Controls the amount of detail provided in the output of the |
| <codeph>EXPLAIN</codeph> statement. The basic output can help you |
| identify high-level performance issues such as scanning a higher volume of |
| data or more partitions than you expect. The higher levels of detail show |
| how intermediate results flow between nodes and how different SQL |
| operations such as <codeph>ORDER BY</codeph>, <codeph>GROUP BY</codeph>, |
| joins, and <codeph>WHERE</codeph> clauses are implemented within a |
| distributed query. </p> |
| |
| <p> |
| <b>Type:</b> <codeph>STRING</codeph> or <codeph>INT</codeph> |
| </p> |
| |
| <p> |
| <b>Default:</b> <codeph>1</codeph> |
| </p> |
| |
| <p> |
| <b>Arguments:</b> |
| </p> |
| |
| <p> |
| The allowed range of numeric values for this option is 0 to 3: |
| </p> |
| |
| <ul> |
| <li> |
| <codeph>0</codeph> or <codeph>MINIMAL</codeph>: A barebones list, one line per operation. Primarily useful |
| for checking the join order in very long queries where the regular <codeph>EXPLAIN</codeph> output is too |
| long to read easily. |
| </li> |
| |
| <li> |
| <codeph>1</codeph> or <codeph>STANDARD</codeph>: The default level of detail, showing the logical way that |
| work is split up for the distributed query. |
| </li> |
| |
| <li> |
| <codeph>2</codeph> or <codeph>EXTENDED</codeph>: Includes additional |
| detail about how the query planner uses statistics in its |
| decision-making process, to understand how a query could be tuned by |
| gathering statistics, using query hints, adding or removing predicates, |
| and so on. In <keyword keyref="impala32_full"/> and higher, the output |
| also includes the analyzed query with the cast information in the output |
| header, and the implicit cast info in the Predicate section.</li> |
| |
| <li> |
| <codeph>3</codeph> or <codeph>VERBOSE</codeph>: The maximum level of detail, showing how work is split up |
| within each node into <q>query fragments</q> that are connected in a pipeline. This extra detail is |
| primarily useful for low-level performance testing and tuning within Impala itself, rather than for |
| rewriting the SQL code at the user level. |
| </li> |
| </ul> |
| |
| <note> |
| Prior to Impala 1.3, the allowed argument range for <codeph>EXPLAIN_LEVEL</codeph> was 0 to 1: level 0 had |
| the mnemonic <codeph>NORMAL</codeph>, and level 1 was <codeph>VERBOSE</codeph>. In Impala 1.3 and higher, |
| <codeph>NORMAL</codeph> is not a valid mnemonic value, and <codeph>VERBOSE</codeph> still applies to the |
| highest level of detail but now corresponds to level 3. You might need to adjust the values if you have any |
| older <codeph>impala-shell</codeph> script files that set the <codeph>EXPLAIN_LEVEL</codeph> query option. |
| </note> |
| |
| <p> |
| Changing the value of this option controls the amount of detail in the output of the <codeph>EXPLAIN</codeph> |
| statement. The extended information from level 2 or 3 is especially useful during performance tuning, when |
| you need to confirm whether the work for the query is distributed the way you expect, particularly for the |
| most resource-intensive operations such as join queries against large tables, queries against tables with |
| large numbers of partitions, and insert operations for Parquet tables. The extended information also helps to |
| check estimated resource usage when you use the admission control or resource management features explained |
| in <xref href="impala_resource_management.xml#resource_management"/>. See |
| <xref href="impala_explain.xml#explain"/> for the syntax of the <codeph>EXPLAIN</codeph> statement, and |
| <xref href="impala_explain_plan.xml#perf_explain"/> for details about how to use the extended information. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> |
| |
| <p> |
| As always, read the <codeph>EXPLAIN</codeph> output from bottom to top. The lowest lines represent the |
| initial work of the query (scanning data files), the lines in the middle represent calculations done on each |
| node and how intermediate results are transmitted from one node to another, and the topmost lines represent |
| the final results being sent back to the coordinator node. |
| </p> |
| |
| <p> |
| The numbers in the left column are generated internally during the initial planning phase and do not |
| represent the actual order of operations, so it is not significant if they appear out of order in the |
| <codeph>EXPLAIN</codeph> output. |
| </p> |
| |
| <p> |
| At all <codeph>EXPLAIN</codeph> levels, the plan contains a warning if any tables in the query are missing |
| statistics. Use the <codeph>COMPUTE STATS</codeph> statement to gather statistics for each table and suppress |
| this warning. See <xref href="impala_perf_stats.xml#perf_stats"/> for details about how the statistics help |
| query performance. |
| </p> |
| |
| <p> |
| The <codeph>PROFILE</codeph> command in <cmdname>impala-shell</cmdname> always starts with an explain plan |
| showing full detail, the same as with <codeph>EXPLAIN_LEVEL=3</codeph>. <ph rev="1.4.0">After the explain |
| plan comes the executive summary, the same output as produced by the <codeph>SUMMARY</codeph> command in |
| <cmdname>impala-shell</cmdname>.</ph> |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/example_blurb"/> |
| |
| <p> |
| These examples use a trivial, empty table to illustrate how the essential aspects of query planning are shown |
| in <codeph>EXPLAIN</codeph> output: |
| </p> |
| |
| <codeblock>[localhost:21000] > create table t1 (x int, s string); |
| [localhost:21000] > set explain_level=1; |
| [localhost:21000] > explain select count(*) from t1; |
| +------------------------------------------------------------------------+ |
| | Explain String | |
| +------------------------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=10.00MB VCores=1 | |
| | WARNING: The following tables are missing relevant table and/or column | |
| | statistics. | |
| | explain_plan.t1 | |
| | | |
| | 03:AGGREGATE [MERGE FINALIZE] | |
| | | output: sum(count(*)) | |
| | | | |
| | 02:EXCHANGE [PARTITION=UNPARTITIONED] | |
| | | | |
| | 01:AGGREGATE | |
| | | output: count(*) | |
| | | | |
| | 00:SCAN HDFS [explain_plan.t1] | |
| | partitions=1/1 size=0B | |
| +------------------------------------------------------------------------+ |
| [localhost:21000] > explain select * from t1; |
| +------------------------------------------------------------------------+ |
| | Explain String | |
| +------------------------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 | |
| | WARNING: The following tables are missing relevant table and/or column | |
| | statistics. | |
| | explain_plan.t1 | |
| | | |
| | 01:EXCHANGE [PARTITION=UNPARTITIONED] | |
| | | | |
| | 00:SCAN HDFS [explain_plan.t1] | |
| | partitions=1/1 size=0B | |
| +------------------------------------------------------------------------+ |
| [localhost:21000] > set explain_level=2; |
| [localhost:21000] > explain select * from t1; |
| +------------------------------------------------------------------------+ |
| | Explain String | |
| +------------------------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 | |
| | WARNING: The following tables are missing relevant table and/or column | |
| | statistics. | |
| | explain_plan.t1 | |
| | | |
| | 01:EXCHANGE [PARTITION=UNPARTITIONED] | |
| | | hosts=0 per-host-mem=unavailable | |
| | | tuple-ids=0 row-size=19B cardinality=unavailable | |
| | | | |
| | 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] | |
| | partitions=1/1 size=0B | |
| | table stats: unavailable | |
| | column stats: unavailable | |
| | hosts=0 per-host-mem=0B | |
| | tuple-ids=0 row-size=19B cardinality=unavailable | |
| +------------------------------------------------------------------------+ |
| [localhost:21000] > set explain_level=3; |
| [localhost:21000] > explain select * from t1; |
| +------------------------------------------------------------------------+ |
| | Explain String | |
| +------------------------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 | |
| <b>| WARNING: The following tables are missing relevant table and/or column |</b> |
| <b>| statistics. |</b> |
| <b>| explain_plan.t1 |</b> |
| | | |
| | F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED] | |
| | 01:EXCHANGE [PARTITION=UNPARTITIONED] | |
| | hosts=0 per-host-mem=unavailable | |
| | tuple-ids=0 row-size=19B cardinality=unavailable | |
| | | |
| | F00:PLAN FRAGMENT [PARTITION=RANDOM] | |
| | DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] | |
| | 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] | |
| | partitions=1/1 size=0B | |
| <b>| table stats: unavailable |</b> |
| <b>| column stats: unavailable |</b> |
| | hosts=0 per-host-mem=0B | |
| | tuple-ids=0 row-size=19B cardinality=unavailable | |
| +------------------------------------------------------------------------+ |
| </codeblock> |
| |
| <p> |
| As the warning message demonstrates, most of the information needed for Impala to do efficient query |
| planning, and for you to understand the performance characteristics of the query, requires running the |
| <codeph>COMPUTE STATS</codeph> statement for the table: |
| </p> |
| |
| <codeblock>[localhost:21000] > compute stats t1; |
| +-----------------------------------------+ |
| | summary | |
| +-----------------------------------------+ |
| | Updated 1 partition(s) and 2 column(s). | |
| +-----------------------------------------+ |
| [localhost:21000] > explain select * from t1; |
| +------------------------------------------------------------------------+ |
| | Explain String | |
| +------------------------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 | |
| | | |
| | F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED] | |
| | 01:EXCHANGE [PARTITION=UNPARTITIONED] | |
| | hosts=0 per-host-mem=unavailable | |
| | tuple-ids=0 row-size=20B cardinality=0 | |
| | | |
| | F00:PLAN FRAGMENT [PARTITION=RANDOM] | |
| | DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] | |
| | 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] | |
| | partitions=1/1 size=0B | |
| <b>| table stats: 0 rows total |</b> |
| <b>| column stats: all |</b> |
| | hosts=0 per-host-mem=0B | |
| | tuple-ids=0 row-size=20B cardinality=0 | |
| +------------------------------------------------------------------------+ |
| </codeblock> |
| |
| <p> |
| Joins and other complicated, multi-part queries are the ones where you most commonly need to examine the |
| <codeph>EXPLAIN</codeph> output and customize the amount of detail in the output. This example shows the |
| default <codeph>EXPLAIN</codeph> output for a three-way join query, then the equivalent output with a |
| <codeph>[SHUFFLE]</codeph> hint to change the join mechanism between the first two tables from a broadcast |
| join to a shuffle join. |
| </p> |
| |
| <codeblock>[localhost:21000] > set explain_level=1; |
| [localhost:21000] > explain select one.*, two.*, three.* from t1 one, t1 two, t1 three where one.x = two.x and two.x = three.x; |
| +---------------------------------------------------------+ |
| | Explain String | |
| +---------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=4.00GB VCores=3 | |
| | | |
| | 07:EXCHANGE [PARTITION=UNPARTITIONED] | |
| | | | |
| <b>| 04:HASH JOIN [INNER JOIN, BROADCAST] |</b> |
| | | hash predicates: two.x = three.x | |
| | | | |
| <b>| |--06:EXCHANGE [BROADCAST] |</b> |
| | | | | |
| | | 02:SCAN HDFS [explain_plan.t1 three] | |
| | | partitions=1/1 size=0B | |
| | | | |
| <b>| 03:HASH JOIN [INNER JOIN, BROADCAST] |</b> |
| | | hash predicates: one.x = two.x | |
| | | | |
| <b>| |--05:EXCHANGE [BROADCAST] |</b> |
| | | | | |
| | | 01:SCAN HDFS [explain_plan.t1 two] | |
| | | partitions=1/1 size=0B | |
| | | | |
| | 00:SCAN HDFS [explain_plan.t1 one] | |
| | partitions=1/1 size=0B | |
| +---------------------------------------------------------+ |
| [localhost:21000] > explain select one.*, two.*, three.* |
| > from t1 one join [shuffle] t1 two join t1 three |
| > where one.x = two.x and two.x = three.x; |
| +---------------------------------------------------------+ |
| | Explain String | |
| +---------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=4.00GB VCores=3 | |
| | | |
| | 08:EXCHANGE [PARTITION=UNPARTITIONED] | |
| | | | |
| <b>| 04:HASH JOIN [INNER JOIN, BROADCAST] |</b> |
| | | hash predicates: two.x = three.x | |
| | | | |
| <b>| |--07:EXCHANGE [BROADCAST] |</b> |
| | | | | |
| | | 02:SCAN HDFS [explain_plan.t1 three] | |
| | | partitions=1/1 size=0B | |
| | | | |
| <b>| 03:HASH JOIN [INNER JOIN, PARTITIONED] |</b> |
| | | hash predicates: one.x = two.x | |
| | | | |
| <b>| |--06:EXCHANGE [PARTITION=HASH(two.x)] |</b> |
| | | | | |
| | | 01:SCAN HDFS [explain_plan.t1 two] | |
| | | partitions=1/1 size=0B | |
| | | | |
| <b>| 05:EXCHANGE [PARTITION=HASH(one.x)] |</b> |
| | | | |
| | 00:SCAN HDFS [explain_plan.t1 one] | |
| | partitions=1/1 size=0B | |
| +---------------------------------------------------------+ |
| </codeblock> |
| |
| <p> |
| For a join involving many different tables, the default <codeph>EXPLAIN</codeph> output might stretch over |
| several pages, and the only details you care about might be the join order and the mechanism (broadcast or |
| shuffle) for joining each pair of tables. In that case, you might set <codeph>EXPLAIN_LEVEL</codeph> to its |
| lowest value of 0, to focus on just the join order and join mechanism for each stage. The following example |
| shows how the rows from the first and second joined tables are hashed and divided among the nodes of the |
| cluster for further filtering; then the entire contents of the third table are broadcast to all nodes for the |
| final stage of join processing. |
| </p> |
| |
| <codeblock>[localhost:21000] > set explain_level=0; |
| [localhost:21000] > explain select one.*, two.*, three.* |
| > from t1 one join [shuffle] t1 two join t1 three |
| > where one.x = two.x and two.x = three.x; |
| +---------------------------------------------------------+ |
| | Explain String | |
| +---------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=4.00GB VCores=3 | |
| | | |
| | 08:EXCHANGE [PARTITION=UNPARTITIONED] | |
| <b>| 04:HASH JOIN [INNER JOIN, BROADCAST] |</b> |
| <b>| |--07:EXCHANGE [BROADCAST] |</b> |
| | | 02:SCAN HDFS [explain_plan.t1 three] | |
| <b>| 03:HASH JOIN [INNER JOIN, PARTITIONED] |</b> |
| <b>| |--06:EXCHANGE [PARTITION=HASH(two.x)] |</b> |
| | | 01:SCAN HDFS [explain_plan.t1 two] | |
| <b>| 05:EXCHANGE [PARTITION=HASH(one.x)] |</b> |
| | 00:SCAN HDFS [explain_plan.t1 one] | |
| +---------------------------------------------------------+ |
| </codeblock> |
| |
| <!-- Consider adding a related info section to collect the xrefs earlier on this page. --> |
| |
| </conbody> |
| </concept> |