docs/_docs/sql-reference/explain-statement.adoc - ignite-3 - Git at Google

 // Licensed to the Apache Software Foundation (ASF) under one or more
 // contributor license agreements.  See the NOTICE file distributed with
 // this work for additional information regarding copyright ownership.
 // The ASF licenses this file to You under the Apache License, Version 2.0
 // (the "License"); you may not use this file except in compliance with
 // the License.  You may obtain a copy of the License at
 //
 // http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing, software
 // distributed under the License is distributed on an "AS IS" BASIS,
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
 = EXPLAIN Command

 The `EXPLAIN` command is used to display the execution plan of an SQL query, showing how the query will be processed by the sql engine.
 It provides insights into the relational operators used, their configuration, and the estimated number of rows processed at each step.
 This information is essential for diagnosing performance bottlenecks and understanding query optimization decisions.

 == Syntax

 [.diagram-container]
 Diagram(
     Terminal('EXPLAIN'),
     Optional(
         Sequence(
             Choice(
                 0,
                 Terminal('PLAN'),
                 Terminal('MAPPING')
             ),
             Terminal('FOR')
         )
     ),
     NonTerminal('query_or_dml')
 )

 If neither `PLAN` nor `MAPPING` is specified, then `PLAN` is implicit.

 Parameters:

 - `PLAN` - explains query in terms of relational operators tree.
 This representation is suitable for investigation of performance issues related to the optimizer.

 - `MAPPING` - explains query in terms of mapping of query fragment to a particular node of the cluster.
 This representation is suitable for investigation of performance issues related to the data colocation.

 Examples:

 [source,sql]
 ----
 EXPLAIN SELECT * FROM lineitem;
 EXPLAIN PLAN FOR SELECT * FROM lineitem;
 EXPLAIN MAPPING FOR SELECT * FROM lineitem;
 ----

 == Understanding The Output

 Each query plan is represented as a tree-like structure composed of link:sql-reference/explain-operators-list[**relational operators**].

 A node in the plan includes:

 - A **name**, indicating the relational operator (e.g., `TableScan`, `IndexScan`, `Sort`, `Join` types)
 - A set of **attributes**, relevant to that specific operator

 [source,text]
 ----
 OperatorName
     attribute1: value1
     attribute2: value2
 ----

 Examples:

 [source,text]
 ----
 TableScan                        // Full table access
     table: PUBLIC.EMP
     fieldNames: [NAME, SALARY]
     est: (rows=1)

 IndexScan                        // Index-based access
     table: PUBLIC.EMP
     index: EMP_NAME_DESC_IDX
     type: SORTED
     fields: [NAME]
     collation: [NAME DESC]
     est: (rows=1)

 Sort                             // Sort rows
     collation: [NAME DESC NULLS LAST]
     est: (rows=1)
 ----

 === Operator Naming

 The operator name reflects the specific algorithm or strategy used.
 For example:

 - `TableScan` – Full scan of a base table.
 - `IndexScan` – Access via index, possibly sorted.
 - `Sort` – Explicit sorting step.
 - `HashJoin`, `MergeJoin`, `NestedLoopJoin` – Types of join algorithms.
 - `Limit`, `Project`, `Exchange` – Execution-related transformations and controls.

 === Hierarchical Plan Structure

 The plan is structured as a **tree**, where:

 - **Leaf nodes** represent data sources (e.g., `TableScan`)
 - **Internal nodes** represent data transformations (e.g., `Join`, `Sort`)
 - **The root node** (topmost) is the final operator that produces the result

 == Examples

 === Example: Complex Join

 [source,sql]
 ----
 EXPLAIN PLAN FOR
  SELECT
       U.UserName, P.ProductName, R.ReviewText, R.Rating
    FROM Users U, Reviews R, Products P
   WHERE U.UserID = R.UserID
     AND R.ProductID = P.ProductID
     AND P.ProductName = 'Product_' || ?::varchar
 ----

 The resulting output is:

 [example]
 ----
 Project
     fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
     projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
     est: (rows=16650)
   HashJoin
       predicate: =(USERID$0, USERID)
       fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME]
       type: inner
       est: (rows=16650)
     HashJoin
         predicate: =(PRODUCTID, PRODUCTID$0)
         fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME]
         type: inner
         est: (rows=16650)
       Exchange
           distribution: single
           est: (rows=50000)
         TableScan
             table: PUBLIC.REVIEWS
             fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
             est: (rows=50000)
       Exchange
           distribution: single
           est: (rows=1665)
         TableScan
             table: PUBLIC.PRODUCTS
             predicate: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
             fieldNames: [PRODUCTID, PRODUCTNAME]
             est: (rows=1665)
     Exchange
         distribution: single
         est: (rows=10000)
       TableScan
           table: PUBLIC.USERS
           fieldNames: [USERID, USERNAME]
           est: (rows=10000)
 ----

 This execution plan represents a query that joins three tables: `USERS`, `REVIEWS`, and `PRODUCTS`, and selects four fields after filtering by product name.

 * **Project** (root node):
 Outputs the final selected fields — `USERNAME`, `PRODUCTNAME`, `REVIEWTEXT`, and `RATING`.

 * **HashJoins** (two levels):
 Perform the inner joins.
 ** The first (bottom-most) joins `REVIEWS` with `PRODUCTS` on `PRODUCTID`.
 ** The second joins the result with `USERS` on `USERID`.

 * **TableScans**:
 Each table is scanned:
 ** `REVIEWS` is fully scanned.
 ** `PRODUCTS` is scanned with a filter on `PRODUCTNAME`.
 ** `USERS` is fully scanned.

 * **Exchange** nodes:
 Indicate data redistribution between operators.

 Each node includes:

 - `fieldNames`: Output columns at that stage.
 - `predicate`: Join or filter condition.
 - `est`: Estimated number of rows at that point in the plan.

 === Example: Query Mapping

 A result of EXPLAIN MAPPING command includes additional metadata providing insight at how the query is mapped on cluster topology.
 So, for the command like below:

 [source,sql]
 ----
 EXPLAIN MAPPING FOR
  SELECT
       U.UserName, P.ProductName, R.ReviewText, R.Rating
    FROM Users U, Reviews R, Products P
   WHERE U.UserID = R.UserID
     AND R.ProductID = P.ProductID
     AND P.ProductName = 'Product_' || ?::varchar
 ----

 The resulting output is:

 [example]
 ----
 Fragment#0 root
   distribution: single
   executionNodes: [node_1]
   tree:
     Project
         fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
         projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
         est: (rows=1)
       HashJoin
           predicate: =(USERID$0, USERID)
           fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME]
           type: inner
           est: (rows=1)
         HashJoin
             predicate: =(PRODUCTID, PRODUCTID$0)
             fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME]
             type: inner
             est: (rows=1)
           Receiver
               fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
               sourceFragmentId: 1
               est: (rows=1)
           Receiver
               fieldNames: [PRODUCTID, PRODUCTNAME]
               sourceFragmentId: 2
               est: (rows=1)
         Receiver
             fieldNames: [USERID, USERNAME]
             sourceFragmentId: 3
             est: (rows=1)

 Fragment#1
   distribution: random
   executionNodes: [node_1, node_2, node_3]
   partitions: [REVIEWS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
   tree:
     Sender
         distribution: single
         targetFragmentId: 0
         est: (rows=50000)
       TableScan
           table: PUBLIC.REVIEWS
           fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
           est: (rows=50000)

 Fragment#2
   distribution: table PUBLIC.PRODUCTS in zone "Default"
   executionNodes: [node_1, node_2, node_3]
   partitions: [PRODUCTS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
   tree:
     Sender
         distribution: single
         targetFragmentId: 0
         est: (rows=1665)
       TableScan
           table: PUBLIC.PRODUCTS
           predicate: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
           fieldNames: [PRODUCTID, PRODUCTNAME]
           est: (rows=1665)

 Fragment#3
   distribution: table PUBLIC.USERS in zone "Default"
   executionNodes: [node_1, node_2, node_3]
   partitions: [USERS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
   tree:
     Sender
         distribution: single
         targetFragmentId: 0
         est: (rows=10000)
       TableScan
           table: PUBLIC.USERS
           fieldNames: [USERID, USERNAME]
           est: (rows=10000)
 ----

 where:

 - **Fragment#0** means fragment with id=0
 - A **root** marks a fragment which is considered as root fragment, i.e. a fragment which represents user's cursor
 - A **distribution** attribute provides an insight into which mapping strategy was applied to this particular fragment
 - A **executionNodes** attribute provides a list of nodes this fragment will be executed on
 - A **partitions** attribute provides an insight into which partitions of which tables will be read from which nodes
 - A **tree** attribute specifies which part of the relational tree corresponds to this fragment

 The output above shows how the query is broken into multiple execution fragments and distributed across the cluster. It gives insight into both the logical execution plan and how it maps to the physical topology.

 The query starts execution in *Fragment#0*, which serves as the root of the plan — this is where the final result is produced. It runs on a single node (`node_1`) and contains the main logic of the query, including the projection and two nested hash joins. Instead of scanning tables directly, it receives data from other fragments through `Receiver` operators. These incoming streams correspond to the `REVIEWS`, `PRODUCTS`, and `USERS` tables.

 The actual table scans happen in *Fragments 1 through 3*, each responsible for one of the involved tables. These fragments operate in parallel across the cluster. Each performs a scan on its respective table and then sends the results back to Fragment#0.

 - *Fragment#1* handles the `REVIEWS` table. It runs on all nodes and uses a random distribution strategy. Data is partitioned across nodes, and after scanning the table, results are sent upstream.
 - *Fragment#2* is in charge of the `PRODUCTS` table. It also spans all nodes but follows a zone-based distribution linked to the table's partitioning. There's a filter applied to `PRODUCTNAME`, which limits the amount of data sent to the root.
 - *Fragment#3* covers the `USERS` table. Like the others, it’s distributed and reads from table partitions spread across the cluster.

 Each fragment includes metadata such as the nodes it's executed on, how data is partitioned, and how results are sent between fragments. This layout provides a clear view of not only how the query is logically processed, but also how the workload is split and coordinated in a distributed environment.
	// Licensed to the Apache Software Foundation (ASF) under one or more
	// contributor license agreements. See the NOTICE file distributed with
	// this work for additional information regarding copyright ownership.
	// The ASF licenses this file to You under the Apache License, Version 2.0
	// (the "License"); you may not use this file except in compliance with
	// the License. You may obtain a copy of the License at
	//
	// http://www.apache.org/licenses/LICENSE-2.0
	//
	// Unless required by applicable law or agreed to in writing, software
	// distributed under the License is distributed on an "AS IS" BASIS,
	// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	// See the License for the specific language governing permissions and
	// limitations under the License.
	= EXPLAIN Command

	The `EXPLAIN` command is used to display the execution plan of an SQL query, showing how the query will be processed by the sql engine.
	It provides insights into the relational operators used, their configuration, and the estimated number of rows processed at each step.
	This information is essential for diagnosing performance bottlenecks and understanding query optimization decisions.

	== Syntax

	[.diagram-container]
	Diagram(
	Terminal('EXPLAIN'),
	Optional(
	Sequence(
	Choice(
	0,
	Terminal('PLAN'),
	Terminal('MAPPING')
	),
	Terminal('FOR')
	)
	),
	NonTerminal('query_or_dml')
	)

	If neither `PLAN` nor `MAPPING` is specified, then `PLAN` is implicit.

	Parameters:

	- `PLAN` - explains query in terms of relational operators tree.
	This representation is suitable for investigation of performance issues related to the optimizer.

	- `MAPPING` - explains query in terms of mapping of query fragment to a particular node of the cluster.
	This representation is suitable for investigation of performance issues related to the data colocation.

	Examples:

	[source,sql]
	----
	EXPLAIN SELECT * FROM lineitem;
	EXPLAIN PLAN FOR SELECT * FROM lineitem;
	EXPLAIN MAPPING FOR SELECT * FROM lineitem;
	----

	== Understanding The Output

	Each query plan is represented as a tree-like structure composed of link:sql-reference/explain-operators-list[relational operators].

	A node in the plan includes:

	- A name, indicating the relational operator (e.g., `TableScan`, `IndexScan`, `Sort`, `Join` types)
	- A set of attributes, relevant to that specific operator

	[source,text]
	----
	OperatorName
	attribute1: value1
	attribute2: value2
	----

	Examples:

	[source,text]
	----
	TableScan // Full table access
	table: PUBLIC.EMP
	fieldNames: [NAME, SALARY]
	est: (rows=1)

	IndexScan // Index-based access
	table: PUBLIC.EMP
	index: EMP_NAME_DESC_IDX
	type: SORTED
	fields: [NAME]
	collation: [NAME DESC]
	est: (rows=1)

	Sort // Sort rows
	collation: [NAME DESC NULLS LAST]
	est: (rows=1)
	----

	=== Operator Naming

	The operator name reflects the specific algorithm or strategy used.
	For example:

	- `TableScan` – Full scan of a base table.
	- `IndexScan` – Access via index, possibly sorted.
	- `Sort` – Explicit sorting step.
	- `HashJoin`, `MergeJoin`, `NestedLoopJoin` – Types of join algorithms.
	- `Limit`, `Project`, `Exchange` – Execution-related transformations and controls.

	=== Hierarchical Plan Structure

	The plan is structured as a tree, where:

	- Leaf nodes represent data sources (e.g., `TableScan`)
	- Internal nodes represent data transformations (e.g., `Join`, `Sort`)
	- The root node (topmost) is the final operator that produces the result

	== Examples

	=== Example: Complex Join

	[source,sql]
	----
	EXPLAIN PLAN FOR
	SELECT
	U.UserName, P.ProductName, R.ReviewText, R.Rating
	FROM Users U, Reviews R, Products P
	WHERE U.UserID = R.UserID
	AND R.ProductID = P.ProductID
	AND P.ProductName = 'Product_' \|\| ?::varchar
	----

	The resulting output is:

	[example]
	----
	Project
	fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
	projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
	est: (rows=16650)
	HashJoin
	predicate: =(USERID$0, USERID)
	fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME]
	type: inner
	est: (rows=16650)
	HashJoin
	predicate: =(PRODUCTID, PRODUCTID$0)
	fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME]
	type: inner
	est: (rows=16650)
	Exchange
	distribution: single
	est: (rows=50000)
	TableScan
	table: PUBLIC.REVIEWS
	fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
	est: (rows=50000)
	Exchange
	distribution: single
	est: (rows=1665)
	TableScan
	table: PUBLIC.PRODUCTS
	predicate: =(PRODUCTNAME, \|\|(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
	fieldNames: [PRODUCTID, PRODUCTNAME]
	est: (rows=1665)
	Exchange
	distribution: single
	est: (rows=10000)
	TableScan
	table: PUBLIC.USERS
	fieldNames: [USERID, USERNAME]
	est: (rows=10000)
	----

	This execution plan represents a query that joins three tables: `USERS`, `REVIEWS`, and `PRODUCTS`, and selects four fields after filtering by product name.

	* Project (root node):
	Outputs the final selected fields — `USERNAME`, `PRODUCTNAME`, `REVIEWTEXT`, and `RATING`.

	* HashJoins (two levels):
	Perform the inner joins.
	** The first (bottom-most) joins `REVIEWS` with `PRODUCTS` on `PRODUCTID`.
	** The second joins the result with `USERS` on `USERID`.

	* TableScans:
	Each table is scanned:
	** `REVIEWS` is fully scanned.
	** `PRODUCTS` is scanned with a filter on `PRODUCTNAME`.
	** `USERS` is fully scanned.

	* Exchange nodes:
	Indicate data redistribution between operators.

	Each node includes:

	- `fieldNames`: Output columns at that stage.
	- `predicate`: Join or filter condition.
	- `est`: Estimated number of rows at that point in the plan.

	=== Example: Query Mapping

	A result of EXPLAIN MAPPING command includes additional metadata providing insight at how the query is mapped on cluster topology.
	So, for the command like below:

	[source,sql]
	----
	EXPLAIN MAPPING FOR
	SELECT
	U.UserName, P.ProductName, R.ReviewText, R.Rating
	FROM Users U, Reviews R, Products P
	WHERE U.UserID = R.UserID
	AND R.ProductID = P.ProductID
	AND P.ProductName = 'Product_' \|\| ?::varchar
	----

	The resulting output is:

	[example]
	----
	Fragment#0 root
	distribution: single
	executionNodes: [node_1]
	tree:
	Project
	fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
	projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
	est: (rows=1)
	HashJoin
	predicate: =(USERID$0, USERID)
	fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME]
	type: inner
	est: (rows=1)
	HashJoin
	predicate: =(PRODUCTID, PRODUCTID$0)
	fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME]
	type: inner
	est: (rows=1)
	Receiver
	fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
	sourceFragmentId: 1
	est: (rows=1)
	Receiver
	fieldNames: [PRODUCTID, PRODUCTNAME]
	sourceFragmentId: 2
	est: (rows=1)
	Receiver
	fieldNames: [USERID, USERNAME]
	sourceFragmentId: 3
	est: (rows=1)

	Fragment#1
	distribution: random
	executionNodes: [node_1, node_2, node_3]
	partitions: [REVIEWS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
	tree:
	Sender
	distribution: single
	targetFragmentId: 0
	est: (rows=50000)
	TableScan
	table: PUBLIC.REVIEWS
	fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
	est: (rows=50000)

	Fragment#2
	distribution: table PUBLIC.PRODUCTS in zone "Default"
	executionNodes: [node_1, node_2, node_3]
	partitions: [PRODUCTS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
	tree:
	Sender
	distribution: single
	targetFragmentId: 0
	est: (rows=1665)
	TableScan
	table: PUBLIC.PRODUCTS
	predicate: =(PRODUCTNAME, \|\|(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
	fieldNames: [PRODUCTID, PRODUCTNAME]
	est: (rows=1665)

	Fragment#3
	distribution: table PUBLIC.USERS in zone "Default"
	executionNodes: [node_1, node_2, node_3]
	partitions: [USERS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
	tree:
	Sender
	distribution: single
	targetFragmentId: 0
	est: (rows=10000)
	TableScan
	table: PUBLIC.USERS
	fieldNames: [USERID, USERNAME]
	est: (rows=10000)
	----

	where:

	- Fragment#0 means fragment with id=0
	- A root marks a fragment which is considered as root fragment, i.e. a fragment which represents user's cursor
	- A distribution attribute provides an insight into which mapping strategy was applied to this particular fragment
	- A executionNodes attribute provides a list of nodes this fragment will be executed on
	- A partitions attribute provides an insight into which partitions of which tables will be read from which nodes
	- A tree attribute specifies which part of the relational tree corresponds to this fragment

	The output above shows how the query is broken into multiple execution fragments and distributed across the cluster. It gives insight into both the logical execution plan and how it maps to the physical topology.

	The query starts execution in Fragment#0, which serves as the root of the plan — this is where the final result is produced. It runs on a single node (`node_1`) and contains the main logic of the query, including the projection and two nested hash joins. Instead of scanning tables directly, it receives data from other fragments through `Receiver` operators. These incoming streams correspond to the `REVIEWS`, `PRODUCTS`, and `USERS` tables.

	The actual table scans happen in Fragments 1 through 3, each responsible for one of the involved tables. These fragments operate in parallel across the cluster. Each performs a scan on its respective table and then sends the results back to Fragment#0.

	- Fragment#1 handles the `REVIEWS` table. It runs on all nodes and uses a random distribution strategy. Data is partitioned across nodes, and after scanning the table, results are sent upstream.
	- Fragment#2 is in charge of the `PRODUCTS` table. It also spans all nodes but follows a zone-based distribution linked to the table's partitioning. There's a filter applied to `PRODUCTNAME`, which limits the amount of data sent to the root.
	- Fragment#3 covers the `USERS` table. Like the others, it’s distributed and reads from table partitions spread across the cluster.

	Each fragment includes metadata such as the nodes it's executed on, how data is partitioned, and how results are sent between fragments. This layout provides a clear view of not only how the query is logically processed, but also how the workload is split and coordinated in a distributed environment.