| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="explain"> |
| |
| <title>EXPLAIN Statement</title> |
| <titlealts audience="PDF"><navtitle>EXPLAIN</navtitle></titlealts> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="SQL"/> |
| <data name="Category" value="Querying"/> |
| <data name="Category" value="Reports"/> |
| <data name="Category" value="Planning"/> |
| <data name="Category" value="Performance"/> |
| <data name="Category" value="Troubleshooting"/> |
| <data name="Category" value="Administrators"/> |
| <data name="Category" value="Developers"/> |
| <data name="Category" value="Data Analysts"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| <indexterm audience="hidden">EXPLAIN statement</indexterm> |
| Returns the execution plan for a statement, showing the low-level mechanisms that Impala will use to read the |
| data, divide the work among nodes in the cluster, and transmit intermediate and final results across the |
| network. Put <codeph>EXPLAIN</codeph> in front of a complete <codeph>SELECT</codeph>, <codeph>INSERT</codeph>, |
| or <codeph>CREATE TABLE AS SELECT</codeph> statement. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/syntax_blurb"/> |
| |
| <codeblock>EXPLAIN { <varname>select_query</varname> | <varname>ctas_stmt</varname> | <varname>insert_stmt</varname> } |
| </codeblock> |
| |
| <p> |
| The <varname>select_query</varname> is a <codeph>SELECT</codeph> statement, optionally prefixed by a |
| <codeph>WITH</codeph> clause. See <xref href="impala_select.xml#select"/> for details. |
| </p> |
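| |
| <p> |
| The following statements sketch the <varname>select_query</varname> form. They assume a hypothetical |
| table <codeph>t1</codeph> with an <codeph>INT</codeph> column <codeph>x</codeph>; substitute your own |
| names. A <codeph>WITH</codeph> clause belongs to the query being explained, so it comes after the |
| <codeph>EXPLAIN</codeph> keyword. |
| </p> |
| |
| <codeblock>-- Plain SELECT query (t1 and x are hypothetical names). |
| EXPLAIN SELECT x FROM t1 WHERE x > 10; |
| |
| -- SELECT query prefixed by a WITH clause. |
| EXPLAIN WITH big_x AS (SELECT x FROM t1 WHERE x > 10) |
| SELECT COUNT(*) FROM big_x; |
| </codeblock> |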
| |
| <p> |
| The <varname>insert_stmt</varname> is an <codeph>INSERT</codeph> statement that inserts into or overwrites an |
| existing table. It can use either the <codeph>INSERT ... SELECT</codeph> or <codeph>INSERT ... |
| VALUES</codeph> syntax. See <xref href="impala_insert.xml#insert"/> for details. |
| </p> |
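| |
| <p> |
| A sketch of the <varname>insert_stmt</varname> forms, assuming hypothetical tables <codeph>t1</codeph> |
| and <codeph>t2</codeph> that each have a single <codeph>INT</codeph> column. <codeph>EXPLAIN</codeph> |
| only displays the plan; it does not write any rows to <codeph>t2</codeph>. |
| </p> |
| |
| <codeblock>-- INSERT ... SELECT appends the query results to t2 (hypothetical tables). |
| EXPLAIN INSERT INTO t2 SELECT x FROM t1; |
| |
| -- INSERT OVERWRITE replaces the existing contents of t2. |
| EXPLAIN INSERT OVERWRITE t2 SELECT x FROM t1 WHERE x > 10; |
| |
| -- INSERT ... VALUES. |
| EXPLAIN INSERT INTO t2 VALUES (1), (2), (3); |
| </codeblock> |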
| |
| <p> |
| The <varname>ctas_stmt</varname> is a <codeph>CREATE TABLE</codeph> statement using the <codeph>AS |
| SELECT</codeph> clause, typically abbreviated as a <q>CTAS</q> operation. See |
| <xref href="impala_create_table.xml#create_table"/> for details. |
| </p> |
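| |
| <p> |
| A sketch of the <varname>ctas_stmt</varname> form, assuming the hypothetical source table |
| <codeph>t1</codeph> and the hypothetical new table names <codeph>t3</codeph> and <codeph>t3_parquet</codeph>. |
| The second statement adds a file format clause before <codeph>AS SELECT</codeph>. |
| </p> |
| |
| <codeblock>-- Plan for a basic CTAS operation (all names are hypothetical). |
| EXPLAIN CREATE TABLE t3 AS SELECT x FROM t1 WHERE x > 10; |
| |
| -- CTAS with an explicit file format for the new table. |
| EXPLAIN CREATE TABLE t3_parquet STORED AS PARQUET AS SELECT x FROM t1; |
| </codeblock> |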
| |
| <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> |
| |
| <p> |
| You can interpret the output to judge whether the query is performing efficiently, and adjust the query, |
| the schema, or both if not. For example, you might change the tests in the <codeph>WHERE</codeph> clause, add |
| hints to make join operations more efficient, introduce subqueries, change the order of tables in a join, add |
| or change partitioning for a table, collect table and column statistics, or take other performance |
| tuning steps. |
| </p> |
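| |
| <p> |
| As a sketch of this tuning loop, the following statements explain the same join with and without a |
| join hint, so that you can compare the join and exchange nodes in the two plans. The tables |
| <codeph>big_table</codeph> and <codeph>small_table</codeph> and their <codeph>id</codeph> columns are |
| hypothetical. |
| </p> |
| |
| <codeblock>-- Baseline plan (big_table and small_table are hypothetical). |
| EXPLAIN SELECT COUNT(*) FROM big_table b JOIN small_table s ON b.id = s.id; |
| |
| -- Same query with a SHUFFLE join hint placed after the JOIN keyword. |
| EXPLAIN SELECT COUNT(*) FROM big_table b JOIN /* +SHUFFLE */ small_table s ON b.id = s.id; |
| </codeblock> |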
| |
| <p> |
| The <codeph>EXPLAIN</codeph> output reminds you if table or column statistics are missing from any table |
| involved in the query. These statistics are important for optimizing queries involving large tables or |
| multi-table joins. See <xref href="impala_compute_stats.xml#compute_stats"/> for how to gather statistics, |
| and <xref href="impala_perf_stats.xml#perf_stats"/> for how to use this information for query tuning. |
| </p> |
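| |
| <p> |
| A minimal sketch of checking for statistics and gathering them before re-running <codeph>EXPLAIN</codeph>, |
| assuming a hypothetical table <codeph>t1</codeph>: |
| </p> |
| |
| <codeblock>-- Check whether table and column statistics are present (t1 is hypothetical). |
| SHOW TABLE STATS t1; |
| SHOW COLUMN STATS t1; |
| |
| -- Gather both kinds of statistics, then look at the plan again. |
| COMPUTE STATS t1; |
| EXPLAIN SELECT x FROM t1; |
| </codeblock> |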
| |
| <p conref="../shared/impala_common.xml#common/explain_interpret"/> |
| |
| <p> |
| If you come from a traditional database background and are not familiar with data warehousing, keep in mind |
| that Impala is optimized for full table scans across very large tables. The structure and distribution of |
| this data is typically not suitable for the kind of indexing and single-row lookups that are common in OLTP |
| environments. A query that scans an entire large table is common and does not necessarily indicate an |
| inefficient query. Of course, if you can reduce the volume of scanned data by orders of magnitude, for |
| example with a query that reads only certain partitions of a partitioned table, then you might be |
| able to make the query finish in seconds rather than minutes. |
| </p> |
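| |
| <p> |
| To check whether a query reads only some partitions, compare the <codeph>partitions=</codeph> figure |
| in the <codeph>SCAN HDFS</codeph> node of each plan. The table <codeph>sales</codeph>, partitioned by a |
| <codeph>year</codeph> column, is hypothetical. |
| </p> |
| |
| <codeblock>-- Without a filter on the partition key, every partition is scanned. |
| EXPLAIN SELECT COUNT(*) FROM sales; |
| |
| -- A filter on the partition key lets Impala skip the other partitions. |
| EXPLAIN SELECT COUNT(*) FROM sales WHERE year = 2016; |
| </codeblock> |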
| |
| <p> |
| For more information and examples to help you interpret <codeph>EXPLAIN</codeph> output, see |
| <xref href="impala_explain_plan.xml#perf_explain"/>. |
| </p> |
| |
| <p rev="1.2"> |
| <b>Extended EXPLAIN output:</b> |
| </p> |
| |
| <p rev="1.2"> |
| For performance tuning of complex queries and for capacity planning (such as using the admission control and |
| resource management features), you can enable more detailed and informative output for the |
| <codeph>EXPLAIN</codeph> statement. In the <cmdname>impala-shell</cmdname> interpreter, issue the command |
| <codeph>SET EXPLAIN_LEVEL=<varname>level</varname></codeph>, where <varname>level</varname> is an integer |
| from 0 to 3 or the corresponding mnemonic value: <codeph>minimal</codeph> (0), <codeph>standard</codeph> (1), |
| <codeph>extended</codeph> (2), or <codeph>verbose</codeph> (3). |
| </p> |
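| |
| <p> |
| The following <cmdname>impala-shell</cmdname> commands sketch switching between levels. The integer |
| and mnemonic forms are interchangeable; <codeph>t1</codeph> is a hypothetical table. |
| </p> |
| |
| <codeblock>-- Mnemonic form: 0 = minimal, 1 = standard, 2 = extended, 3 = verbose. |
| SET EXPLAIN_LEVEL=extended; |
| EXPLAIN SELECT x FROM t1; |
| |
| -- Integer form; equivalent to SET EXPLAIN_LEVEL=verbose. |
| SET EXPLAIN_LEVEL=3; |
| EXPLAIN SELECT x FROM t1; |
| |
| -- Go back to the default level. |
| SET EXPLAIN_LEVEL=standard; |
| </codeblock> |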
| |
| <p rev="1.2"> |
| When extended <codeph>EXPLAIN</codeph> output is enabled, <codeph>EXPLAIN</codeph> statements print |
| information about estimated memory requirements, minimum number of virtual cores, and so on. |
| <!-- |
| that you can use to fine-tune the resource management options explained in <xref href="impala_resource_management.xml#rm_options"/>. |
| (The estimated memory requirements are intentionally on the high side, to allow a margin for error, |
| to avoid cancelling a query unnecessarily if you set the <codeph>MEM_LIMIT</codeph> option to the estimated memory figure.) |
| --> |
| </p> |
| |
| <p> |
| See <xref href="impala_explain_level.xml#explain_level"/> for details and examples. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/example_blurb"/> |
| |
| <p> |
| This example shows how the standard <codeph>EXPLAIN</codeph> output moves from the lowest (physical) level to |
| the higher (logical) levels, so read the plan from the bottom up. The query begins by scanning a certain |
| amount of data; each node performs an aggregation operation (evaluating <codeph>COUNT(*)</codeph>) on the |
| subset of data that is local to that node; the intermediate results are transmitted to the coordinator node |
| through the <codeph>EXCHANGE</codeph> node; lastly, the intermediate results are summed to produce the final |
| result. |
| </p> |
| |
| <codeblock id="explain_plan_simple">[impalad-host:21000] > explain select count(*) from customer_address; |
| +----------------------------------------------------------+ |
| | Explain String | |
| +----------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=42.00MB VCores=1 | |
| | | |
| | 03:AGGREGATE [MERGE FINALIZE] | |
| | | output: sum(count(*)) | |
| | | | |
| | 02:EXCHANGE [PARTITION=UNPARTITIONED] | |
| | | | |
| | 01:AGGREGATE | |
| | | output: count(*) | |
| | | | |
| | 00:SCAN HDFS [default.customer_address] | |
| | partitions=1/1 size=5.25MB | |
| +----------------------------------------------------------+ |
| </codeblock> |
| |
| <p> |
| These examples show how the extended <codeph>EXPLAIN</codeph> output becomes more accurate and informative as |
| statistics are gathered by the <codeph>COMPUTE STATS</codeph> statement. Initially, much of the information |
| about data size and distribution is marked <q>unavailable</q>. Impala can determine the raw data size, but |
| not the number of rows or number of distinct values for each column without additional analysis. The |
| <codeph>COMPUTE STATS</codeph> statement performs this analysis, so a subsequent <codeph>EXPLAIN</codeph> |
| statement has additional information to use in deciding how to optimize the distributed query. |
| </p> |
| |
| <!-- To do: |
| Re-run these examples with more substantial tables populated with data. |
| --> |
| |
| <codeblock rev="1.2">[localhost:21000] > set explain_level=extended; |
| EXPLAIN_LEVEL set to extended |
| [localhost:21000] > explain select x from t1; |
| +----------------------------------------------------------+ |
| | Explain String | |
| +----------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=32.00MB VCores=1 | |
| | | |
| | 01:EXCHANGE [PARTITION=UNPARTITIONED] | |
| | | hosts=1 per-host-mem=unavailable | |
| <b>| | tuple-ids=0 row-size=4B cardinality=unavailable |</b> |
| | | | |
| | 00:SCAN HDFS [default.t1, PARTITION=RANDOM] | |
| | partitions=1/1 size=36B | |
| <b>| table stats: unavailable |</b> |
| <b>| column stats: unavailable |</b> |
| | hosts=1 per-host-mem=32.00MB | |
| <b>| tuple-ids=0 row-size=4B cardinality=unavailable |</b> |
| +----------------------------------------------------------+ |
| </codeblock> |
| |
| <codeblock rev="1.2">[localhost:21000] > compute stats t1; |
| +-----------------------------------------+ |
| | summary | |
| +-----------------------------------------+ |
| | Updated 1 partition(s) and 1 column(s). | |
| +-----------------------------------------+ |
| [localhost:21000] > explain select x from t1; |
| +----------------------------------------------------------+ |
| | Explain String | |
| +----------------------------------------------------------+ |
| | Estimated Per-Host Requirements: Memory=64.00MB VCores=1 | |
| | | |
| | 01:EXCHANGE [PARTITION=UNPARTITIONED] | |
| | | hosts=1 per-host-mem=unavailable | |
| | | tuple-ids=0 row-size=4B cardinality=0 | |
| | | | |
| | 00:SCAN HDFS [default.t1, PARTITION=RANDOM] | |
| | partitions=1/1 size=36B | |
| <b>| table stats: 0 rows total |</b> |
| <b>| column stats: all |</b> |
| | hosts=1 per-host-mem=64.00MB | |
| <b>| tuple-ids=0 row-size=4B cardinality=0 |</b> |
| +----------------------------------------------------------+ |
| </codeblock> |
| |
| <p conref="../shared/impala_common.xml#common/security_blurb"/> |
| <p conref="../shared/impala_common.xml#common/redaction_yes"/> |
| |
| <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/> |
| |
| <p conref="../shared/impala_common.xml#common/permissions_blurb"/> |
| <p rev=""> |
| <!-- Doublecheck these details. Does EXPLAIN really need any permissions? --> |
| The user ID that the <cmdname>impalad</cmdname> daemon runs under, |
| typically the <codeph>impala</codeph> user, must have read |
| and execute permissions for all applicable directories in all source tables |
| for the query that is being explained. |
| (A <codeph>SELECT</codeph> operation could read files from multiple HDFS directories |
| if the source table is partitioned.) |
| </p> |
| |
| <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/> |
| <p> |
| The <codeph>EXPLAIN</codeph> statement displays the same kind of plan |
| information for queries against Kudu tables as for queries |
| against HDFS-based tables. |
| </p> |
| |
| <p> |
| To see which predicates Impala can <q>push down</q> to Kudu for |
| efficient evaluation, without transmitting unnecessary rows back |
| to Impala, look for the <codeph>kudu predicates</codeph> item in |
| the scan phase of the query. The label <codeph>kudu predicates</codeph> |
| indicates a condition that can be evaluated efficiently on the Kudu |
| side. The label <codeph>predicates</codeph> in a <codeph>SCAN KUDU</codeph> |
| node indicates a condition that is evaluated by Impala. |
| For example, in a table with primary key column <codeph>X</codeph> |
| and non-primary key column <codeph>Y</codeph>, you can see that |
| some conditions in the <codeph>WHERE</codeph> clause are evaluated |
| immediately by Kudu and others are evaluated later by Impala: |
| </p> |
| |
| <codeblock rev="2.9.0 IMPALA-4859"> |
| EXPLAIN SELECT x, y FROM kudu_table WHERE |
| x = 1 AND y NOT IN (2,3) AND z = 1 |
| AND a IS NOT NULL AND b > 0 AND length(s) > 5; |
| +---------------- |
| | Explain String |
| +---------------- |
| ... |
| | 00:SCAN KUDU [kudu_table] |
| | predicates: y NOT IN (2, 3), length(s) > 5 |
| | kudu predicates: a IS NOT NULL, b > 0, x = 1, z = 1 |
| </codeblock> |
| |
| <p rev="2.9.0 IMPALA-4859"> |
| Only the following kinds of predicates can be pushed to Kudu: binary predicates, |
| <codeph>IS NULL</codeph> and <codeph>IS NOT NULL</codeph> (in <keyword keyref="impala29"/> and higher), |
| and <codeph>IN</codeph> predicates whose literal values exactly match the column types in the Kudu table, |
| so that no casting is required. |
| </p> |
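| |
| <p> |
| As a sketch of these rules, assume a hypothetical <codeph>kudu_table</codeph> with an <codeph>INT</codeph> |
| primary key column <codeph>x</codeph> and a nullable <codeph>STRING</codeph> column <codeph>s</codeph>. |
| The first three statements use predicates of the kinds listed above, so you would expect to see them under |
| <codeph>kudu predicates</codeph>. The last one applies a function to the column, so its condition is |
| expected under <codeph>predicates</codeph>, as in the earlier example. |
| </p> |
| |
| <codeblock>-- Binary predicate. |
| EXPLAIN SELECT x FROM kudu_table WHERE x = 1; |
| |
| -- IS NULL test (Impala 2.9 and higher). |
| EXPLAIN SELECT x FROM kudu_table WHERE s IS NULL; |
| |
| -- IN list whose STRING literals match the STRING column type exactly. |
| EXPLAIN SELECT x FROM kudu_table WHERE s IN ('a', 'b'); |
| |
| -- Function call on the column: evaluated by Impala, not pushed to Kudu. |
| EXPLAIN SELECT x FROM kudu_table WHERE length(s) > 5; |
| </codeblock> |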
| |
| <p conref="../shared/impala_common.xml#common/related_info"/> |
| <p> |
| <xref href="impala_select.xml#select"/>, |
| <xref href="impala_insert.xml#insert"/>, |
| <xref href="impala_create_table.xml#create_table"/>, |
| <xref href="impala_explain_plan.xml#explain_plan"/> |
| </p> |
| |
| </conbody> |
| </concept> |