{/* Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. */}
The EXPLAIN command is used to display the execution plan of an SQL query, showing how the query will be processed by the SQL engine. It provides insights into the relational operators used, their configuration, and the estimated number of rows processed at each step. This information is essential for diagnosing performance bottlenecks and understanding query optimization decisions.
{/* Railroad diagram omitted - see 3.1.0 docs */}
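The railroad diagram is omitted here; inferring from the examples in this section, the general form of the statement is (a sketch, not the authoritative grammar):

```sql
EXPLAIN [ { PLAN | MAPPING } FOR ] <query>
```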
If neither PLAN nor MAPPING is specified, then PLAN is implicit.
Parameters:
- `PLAN` - explains the query as a tree of relational operators. This representation is suitable for investigating performance issues related to the optimizer.
- `MAPPING` - explains the query in terms of the mapping of query fragments to particular nodes of the cluster. This representation is suitable for investigating performance issues related to data colocation.
Examples:
```sql
EXPLAIN SELECT * FROM lineitem;
EXPLAIN PLAN FOR SELECT * FROM lineitem;
EXPLAIN MAPPING FOR SELECT * FROM lineitem;
```
Each query plan is represented as a tree-like structure composed of relational operators.
A node in the plan includes the operator name (for example, `TableScan`, `IndexScan`, `Sort`, or one of the join types) and a list of operator-specific attributes, rendered as:

```text
OperatorName
    attribute1: value1
    attribute2: value2
```
Examples:
```text
TableScan    // Full table access
    table: PUBLIC.EMP
    fieldNames: [NAME, SALARY]
    est: (rows=1)

IndexScan    // Index-based access
    table: PUBLIC.EMP
    index: EMP_NAME_DESC_IDX
    type: SORTED
    fields: [NAME]
    collation: [NAME DESC]
    est: (rows=1)

Sort         // Sort rows
    collation: [NAME DESC NULLS LAST]
    est: (rows=1)
```
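For context, a schema along the following lines could produce the operators shown above. The table and index names come from the snippets, but the column set is an assumption, and whether the optimizer actually chooses `IndexScan` over `TableScan` plus `Sort` depends on statistics and cost estimates, so treat this as an illustration rather than a guaranteed plan:

```sql
-- Hypothetical table and index matching PUBLIC.EMP and EMP_NAME_DESC_IDX above
CREATE TABLE emp (id INT PRIMARY KEY, name VARCHAR, salary INT);
CREATE INDEX emp_name_desc_idx ON emp (name DESC);

-- Typically a full scan (TableScan):
EXPLAIN PLAN FOR SELECT name, salary FROM emp;

-- A candidate for index-based access (IndexScan) or an explicit Sort:
EXPLAIN PLAN FOR SELECT name, salary FROM emp ORDER BY name DESC;
```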
The operator name reflects the specific algorithm or strategy used. For example:

- `TableScan` – full scan of a base table.
- `IndexScan` – access via an index, possibly sorted.
- `Sort` – an explicit sorting step.
- `HashJoin`, `MergeJoin`, `NestedLoopJoin` – join algorithms.
- `Limit`, `Project`, `Exchange` – execution-related transformations and controls (see the example after this list).
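As a quick illustration of the latter group, a statement like the following, run against the hypothetical `emp` table from the earlier sketch, would typically surface `Sort`, `Limit`, and `Project` operators in its plan (the exact shape depends on available indexes and optimizer decisions):

```sql
EXPLAIN PLAN FOR SELECT name FROM emp ORDER BY salary DESC LIMIT 10;
```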
The plan is structured as a tree, where:

- leaf nodes access the data (for example, `TableScan`);
- inner nodes combine or transform the output of their children (for example, `Join`, `Sort`).

Consider the following query:

```sql
EXPLAIN PLAN FOR
SELECT U.UserName, P.ProductName, R.ReviewText, R.Rating
FROM Users U, Reviews R, Products P
WHERE U.UserID = R.UserID
  AND R.ProductID = P.ProductID
  AND P.ProductName = 'Product_' || ?::varchar
```
The resulting output is:
```text
Project
    fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
    projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
    est: (rows=16650)
  HashJoin
      predicate: =(USERID$0, USERID)
      fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME]
      type: inner
      est: (rows=16650)
    HashJoin
        predicate: =(PRODUCTID, PRODUCTID$0)
        fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME]
        type: inner
        est: (rows=16650)
      Exchange
          distribution: single
          est: (rows=50000)
        TableScan
            table: PUBLIC.REVIEWS
            fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
            est: (rows=50000)
      Exchange
          distribution: single
          est: (rows=1665)
        TableScan
            table: PUBLIC.PRODUCTS
            predicate: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
            fieldNames: [PRODUCTID, PRODUCTNAME]
            est: (rows=1665)
    Exchange
        distribution: single
        est: (rows=10000)
      TableScan
          table: PUBLIC.USERS
          fieldNames: [USERID, USERNAME]
          est: (rows=10000)
```
This execution plan represents a query that joins three tables: USERS, REVIEWS, and PRODUCTS, and selects four fields after filtering by product name.
- Project (root node): outputs the final selected fields: USERNAME, PRODUCTNAME, REVIEWTEXT, and RATING.
- HashJoin (two levels): performs the inner joins: first REVIEWS with PRODUCTS on PRODUCTID, then the result with USERS on USERID.
- TableScan: each table is scanned: REVIEWS is fully scanned, PRODUCTS is scanned with a filter on PRODUCTNAME, and USERS is fully scanned.
- Exchange nodes: indicate data redistribution between operators.
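For reference, a schema consistent with the plan above might look like the sketch below. The column types are assumptions inferred from the field names in the plan, and the ReviewID primary key is added only because a table needs one; neither is taken from the original example:

```sql
CREATE TABLE Users    (UserID INT PRIMARY KEY, UserName VARCHAR);
CREATE TABLE Products (ProductID INT PRIMARY KEY, ProductName VARCHAR);
CREATE TABLE Reviews  (ReviewID INT PRIMARY KEY, UserID INT, ProductID INT,
                       ReviewText VARCHAR, Rating INT);
```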
Each node includes:

- fieldNames: output columns at that stage.
- predicate: join or filter condition.
- est: estimated number of rows at that point in the plan.

The result of the EXPLAIN MAPPING command includes additional metadata that shows how the query is mapped onto the cluster topology. For a command like the one below:
```sql
EXPLAIN MAPPING FOR
SELECT U.UserName, P.ProductName, R.ReviewText, R.Rating
FROM Users U, Reviews R, Products P
WHERE U.UserID = R.UserID
  AND R.ProductID = P.ProductID
  AND P.ProductName = 'Product_' || ?::varchar
```
The resulting output is:
```text
Fragment#0 root
  distribution: single
  executionNodes: [node_1]
  tree:
    Project
        fieldNames: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
        projection: [USERNAME, PRODUCTNAME, REVIEWTEXT, RATING]
        est: (rows=1)
      HashJoin
          predicate: =(USERID$0, USERID)
          fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME, USERID$0, USERNAME]
          type: inner
          est: (rows=1)
        HashJoin
            predicate: =(PRODUCTID, PRODUCTID$0)
            fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING, PRODUCTID$0, PRODUCTNAME]
            type: inner
            est: (rows=1)
          Receiver
              fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
              sourceFragmentId: 1
              est: (rows=1)
          Receiver
              fieldNames: [PRODUCTID, PRODUCTNAME]
              sourceFragmentId: 2
              est: (rows=1)
        Receiver
            fieldNames: [USERID, USERNAME]
            sourceFragmentId: 3
            est: (rows=1)

Fragment#1
  distribution: random
  executionNodes: [node_1, node_2, node_3]
  partitions: [REVIEWS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
  tree:
    Sender
        distribution: single
        targetFragmentId: 0
        est: (rows=50000)
      TableScan
          table: PUBLIC.REVIEWS
          fieldNames: [PRODUCTID, USERID, REVIEWTEXT, RATING]
          est: (rows=50000)

Fragment#2
  distribution: table PUBLIC.PRODUCTS in zone "Default"
  executionNodes: [node_1, node_2, node_3]
  partitions: [PRODUCTS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
  tree:
    Sender
        distribution: single
        targetFragmentId: 0
        est: (rows=1665)
      TableScan
          table: PUBLIC.PRODUCTS
          predicate: =(PRODUCTNAME, ||(_UTF-8'Product_', CAST(?0):VARCHAR CHARACTER SET "UTF-8"))
          fieldNames: [PRODUCTID, PRODUCTNAME]
          est: (rows=1665)

Fragment#3
  distribution: table PUBLIC.USERS in zone "Default"
  executionNodes: [node_1, node_2, node_3]
  partitions: [USERS=[node_1={0, 2, 5, 6, 7, 8, 9, 10, 12, 13, 20}, node_2={1, 3, 11, 19, 21, 22, 23, 24}, node_3={4, 14, 15, 16, 17, 18}]]
  tree:
    Sender
        distribution: single
        targetFragmentId: 0
        est: (rows=10000)
      TableScan
          table: PUBLIC.USERS
          fieldNames: [USERID, USERNAME]
          est: (rows=10000)
```
The output above shows how the query is broken into multiple execution fragments and distributed across the cluster. It gives insight into both the logical execution plan and how it maps to the physical topology.
The query starts execution in Fragment#0, which serves as the root of the plan — this is where the final result is produced. It runs on a single node (node_1) and contains the main logic of the query, including the projection and two nested hash joins. Instead of scanning tables directly, it receives data from other fragments through Receiver operators. These incoming streams correspond to the REVIEWS, PRODUCTS, and USERS tables.
The actual table scans happen in Fragments 1 through 3, each responsible for one of the involved tables. These fragments operate in parallel across the cluster. Each performs a scan on its respective table and then sends the results back to Fragment#0.
- Fragment#1 scans the REVIEWS table. It runs on all nodes and uses a random distribution strategy. Data is partitioned across nodes, and after scanning the table, results are sent upstream.
- Fragment#2 scans the PRODUCTS table. It also spans all nodes but follows a zone-based distribution linked to the table's partitioning. A filter on PRODUCTNAME limits the amount of data sent to the root.
- Fragment#3 scans the USERS table. Like the others, it is distributed and reads from table partitions spread across the cluster.

Each fragment includes metadata such as the nodes it executes on, how data is partitioned, and how results are sent between fragments. This layout provides a clear view of not only how the query is logically processed, but also how the workload is split and coordinated in a distributed environment.