docs/_docs/sql-tuning/using-explain.adoc - ignite-3 - Git at Google

 // Licensed to the Apache Software Foundation (ASF) under one or more
 // contributor license agreements.  See the NOTICE file distributed with
 // this work for additional information regarding copyright ownership.
 // The ASF licenses this file to You under the Apache License, Version 2.0
 // (the "License"); you may not use this file except in compliance with
 // the License.  You may obtain a copy of the License at
 //
 // http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing, software
 // distributed under the License is distributed on an "AS IS" BASIS,
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.
 = How to Improve Queries With EXPLAIN Command

 The SQL  command is a powerful tool used to analyze and understand the execution plan of a query without actually executing it.

 When you use the EXPLAIN command, it returns the query execution plan, which includes details such as:

 - The order in which tables are accessed.
 - The type of join operations used (for example, nested loops, hash joins, or merge joins).
 - Any indexes that are used to speed up data retrieval.
 - Estimated costs and row counts for different parts of the query.

 This information is crucial for optimizing query performance, identifying bottlenecks, and making informed decisions about database schema design and indexing strategies.

 SQL EXPLAIN output analysis can help you optimize slow query execution. You can avoid common bottlenecks in SQL execution by following these guidelines:

 - Avoid scanning an entire table.
 - Avoid scanning non-optimal indexes.
 - Avoid suboptimal join ordering or join algorithm.
 - Ensure optimal data colocation for your queries.

 In the following sections, we will see some common issues with queries and ways to identify and fix them.

 == Full Scan Instead of Index Scan

 Suppose related sql execution flow looks like:

 [source,sql]
 ----
 CREATE TABLE t (id INT PRIMARY KEY, col1 VARCHAR);
 CREATE INDEX t_col1_idx ON t(col1);

 SELECT id FROM t WHERE col1 = '1';
 ----

 And possible EXPLAIN output:

 [source,sql]
 ----
    TableScan
        table: PUBLIC.T
        predicate: =(COL1, _UTF-8'1')
        fieldNames: [ID]
        est: (rows=1)
 ----

 NOTE: For simplicity, here and below, information that is not related to the example is omitted from the EXPLAIN output.

 We can see a full scan (*TableScan* operator) with predicate.
 The execution planner chooses which scan implementation (**TableScan** or **IndexScan**) to use.
 If you expect that index scan is preferable, you can use the `FORCE_INDEX` hint to manually force `IndexScan` approach:

 [source,sql]
 ----
 SELECT /*+ FORCE_INDEX(t_col1_idx) */ id FROM t WHERE col1 = '1';
 ----

 Will show a different plan, like:

 [source,sql]
 ----
    IndexScan
        table: PUBLIC.T
        index: T_COL1_IDX
        type: SORTED
        predicate: =(COL1, _UTF-8'1')
        searchBounds: [ExactBounds [bound=_UTF-8'1']]
        ieldNames: [ID]
        collation: []
        est: (rows=1)
 ----

 == Suboptimal Indexes

 Indexes with less prediction can be chosen, for example schema and query may look as follows:

 [source,sql]
 ----
 CREATE TABLE t (id INT PRIMARY KEY, col1 VARCHAR, col2 VARCHAR);
 CREATE INDEX t_col1_col2_idx ON T(col1, col2);
 CREATE INDEX t_col1_idx ON t(col1);

 SELECT id FROM t WHERE col1 = '1' AND col2 = '2';
 ----

 and a possible plan would be:

 [source,sql]
 ----
    IndexScan
        table: PUBLIC.T
        index: T_COL1_IDX
        ...
 ----

 We can see that the execution uses the **T_COL1_IDX** index, through both predicates **COL1 = '1' AND COL2 = '2'** are involved and the **T_COL1_COL2_IDX** is preferable. In this case, the optimal plan would be:

 [source,sql]
 ----
    IndexScan
        table: PUBLIC.T
        index: T_COL1_COL2_IDX
        ...
 ----

 You can also use the `FORCE_INDEX` hint to achieve this:

 [source,sql]
 ----
 SELECT /*+ FORCE_INDEX(t_col1_col2_idx) */ id FROM t WHERE col1 = '1' AND col2 = '2';
 ----

 == Unexpected Sort Operation

 By default, sorted indexes store their entries in ascending order.
 You can adjust the ordering of a sorted index by including the options ASC or DESC.

 Let's suppose the schema and related query look like this:

 [source,sql]
 ----
 CREATE TABLE t1 (id INT PRIMARY KEY, col1 VARCHAR);
 CREATE TABLE t2 (id INT PRIMARY KEY, col1 VARCHAR);
 CREATE INDEX t1_col1_desc_idx ON t1(col1 DESC);
 CREATE INDEX t2_col1_idx ON t2(col1);

 SELECT t1.id as t1id, t2.id as t2id FROM t1 JOIN t2 USING (col1);
 ----

 And the possible execution plan looks like this:

 [source,sql]
 ----
    MergeJoin
          ...
        Sort
            collation: [COL1 ASC]
            ...
        IndexScan
            index: T2_COL1_IDX
            ...
 ----

 In the example above, the planner adds the **Sort** operation before performing the **IndexScan**,
 as the index used is sorted in descending order, while ascending order is required.

 Extra **Sort** operations adds performance costs, and we can avoid it by creating an index with the appropriate sort ordering:

 [source,sql]
 ----
 CREATE INDEX t1_col1_idx ON t1(col1);
 ----

 And plan will no longer display the **Sort** operation, improving query execution speed:

 [source,sql]
 ----
    MergeJoin
          ...
        IndexScan
            index: T1_COL1_IDX
            ...
        IndexScan
            index: T2_COL1_IDX
            ...
 ----

 == Performance Impact of Correlated Subqueries

 The SQL-99 standard allows for nested subqueries at nearly all places within a query, so Ignite 3 supports nested subqueries, both correlated and not.
 Performance of certain complex correlated subqueries may be insufficient. Let`s consider a correlated query:

 [source,sql]
 ----
 CREATE TABLE emp(dept_id INTEGER PRIMARY KEY, name VARCHAR, salary INTEGER);
 CREATE TABLE dept(id INTEGER PRIMARY KEY, name VARCHAR);

 SELECT emp.name, (SELECT dept.name FROM dept WHERE emp.dept_id=dept.id)
 FROM emp
 WHERE emp.salary > 1000;
 ----

 We can see nested correlated subquery here, lets check the plan:

 [source,sql]
 ----
    CorrelatedNestedLoopJoin
          ...
        TableScan
            table: PUBLIC.EMP
            predicate: >(SALARY, 1000)
            ...
      ColocatedHashAggregate
          ...
          TableScan
              table: PUBLIC.DEPT
              predicate: =($cor1.DEPT_ID, ID)
              ...
 ----

 The example above shows the slow **CorrelatedNestedLoopJoin** operation. Queries with this operation may cause a number of issues:

 - Such subqueries may become bottlenecks.
 - Queries can cause high CPU load.
 - Certain queries may perform slower than expected.

 If performance issues are found in similar queries, it would be more
 efficient to rewrite the query without nested subqueries, for example:

 [source,sql]
 ----
 SELECT emp.name, dept.name
 FROM emp, dept
 WHERE emp.salary > 1000 AND emp.dept_id=dept.id;
 ----
 And new plan becomes:
 [source,sql]
 ----
      HashJoin
          predicate: =(DEPT_ID, ID)
          ...
        TableScan
            table: PUBLIC.EMP
            predicate: >(SALARY, 1000)
            ...
          TableScan
              table: PUBLIC.DEPT
              ...
 ----

 Without the **CorrelatedNestedLoopJoin** operation, the query should perform much better than the previous one.

 == Excessive Sorting

 Lets explain we have an index involved two columns one of them is participate in predicate and other in ordering, or in sql terms:

 [source,sql]
 ----
 CREATE TABLE emp(dept_id INTEGER PRIMARY KEY, name VARCHAR, salary INTEGER);
 CREATE INDEX emp_salary_name_idx ON emp(salary, name);

 SELECT dept_id FROM emp WHERE salary = 1 ORDER BY name;
 ----

 Expectations:

 - Index need to be used here.
 - No additional sort is needed because index is ordered by **name** column is satisfies initial query ordering.

 But the real execution plan shows a different result:

 [source,sql]
 ----
      Sort
          collation: [NAME ASC]
          ...
        TableScan
            table: PUBLIC.EMP
            predicate: =(SALARY, 1)
            ...
 ----

 We can see a redundant **Sort** operator. A bit query refactoring can help to avoid excessive sorting:

 [source,sql]
 ----
 SELECT dept_id FROM emp WHERE salary = 1 ORDER BY salary, name;
 ----

 And the plan becomes as follows:

 [source,sql]
 ----
      IndexScan
          table: PUBLIC.EMP
          index: EMP_SALARY_NAME_IDX
          predicate: =(SALARY, 1)
          ...
 ----

 == Select Count Optimization

 Some queries can be optimized to use more optimal plans which brings performance speed up. For example, plan for:

 [source,sql]
 ----
 SELECT COUNT(*) FROM emp;
 ----

 Can look like this:

 [source,sql]
 ----
  SelectCount
      table: PUBLIC.EMP
      est: (rows=43)
      ...
 ----

 But there are numerous cases where such optimization is not applicable. In such a cases, a plan can be different and the execution may require more time.

 The same query as above, but with explicit transaction may produce a different plan, for example:

 [source,sql]
 ----
    ReduceSortAggregate
        ...
        MapSortAggregate
            ...
          TableScan
              table: PUBLIC.EMP
              est: (rows=43)
              ...
 ----

 == Index Scan Without Exact Search Bounds

 Table scans are available in two implementations: direct table scan and scan through index.
 Index scans contain predicate and search bounds.
 Predicate provides final rows comparison. If search bounds are absentm the query degenerates into table scan through index scan (requiring an additional store look up), with further predicate comparison, that incurs additional performance overhead costs.

 Let`s suppose we have schema and query like this:

 [source,sql]
 ----
 CREATE TABLE t (id INTEGER PRIMARY KEY, col1 DECIMAL(5, 3));
 CREATE INDEX t_col1_idx ON t(col1);

 SELECT id FROM t WHERE col1 = 43;
 ----

 And possible plan would look like this:

 [source,sql]
 ----
    IndexScan
        table: PUBLIC.T
        index: T_COL1_IDX
        predicate: =(CAST(COL1):DECIMAL(13, 3), 43.000)
        ...
 ----

 We can see here - only **predicate** (and no **searchBounds**) which means that **all** rows from index will go through predicate and bring additional performance penalty.

 Two type of solutions are possible here:

 - You can prohibit suboptimal index usage;
 - You can explicitly help the planner with type derivation.

 === Prohibit Index Usage

 For the first approach, use the **NO_INDEX** hint to prohibit index usage:

 [source,sql]
 ----
 SELECT /*+ NO_INDEX */ id FROM t WHERE col1 = 43;

 -- or with direct index mention:

 SELECT /*+ NO_INDEX(t_col1_idx) */ id FROM t WHERE col1 = 43;
 ----

 As a result, you will have a plan similar to this:

 [source,sql]
 ----
    TableScan
        table: PUBLIC.T
        predicate: =(CAST(COL1):DECIMAL(13, 3), 43.000)
        ...
 ----

 === Manual Type Casting

 You can append additional cast to the same query to explicitly cast data as a specific type:

 [source,sql]
 ----
 SELECT id FROM t WHERE col1 = 43::DECIMAL(5, 3);
 ----

 [source,sql]
 ----
    IndexScan
        table: PUBLIC.T
        index: T_COL1_IDX
        predicate: =(COL1, 43.000)
        searchBounds: [ExactBounds [bound=43.000:DECIMAL(5, 3)]]
        ...
 ----

 We can see here both **searchBounds** and **predicate** which means that only exact lookup through index will be involved.


 The same case as above but for a bit complicated query:

 [source,sql]
 ----
 CREATE TABLE t (id INT PRIMARY KEY, col1 INT);
 CREATE INDEX t_col1_asc_idx ON t (col1);

 SELECT * FROM t WHERE col1::varchar = SUBSTR(CURRENT_DATE::varchar, 4);
 ----

 Possible plan:
 [source,sql]
 ----
    IndexScan
        table: PUBLIC.T
        index: T_COL1_IDX
        predicate: =(CAST(COL1):VARCHAR CHARACTER SET "UTF-8", SUBSTR(CAST(CURRENT_DATE):VARCHAR CHARACTER SET "UTF-8" NOT NULL, 4))
        ...
 ----
 And we also can see that no **search bounds** are involved here.

 Try to change it like:

 [source,sql]
 ----
 SELECT * FROM t WHERE col1 = SUBSTR(CURRENT_DATE::varchar, 4)::int;
 ----

 And the possible plan will become:
 [source,sql]
 ----
    IndexScan
        table: PUBLIC.T
        index: T_COL1_ASC_IDX
        predicate: =(COL1, CAST(SUBSTR(CAST(CURRENT_DATE):VARCHAR CHARACTER SET "UTF-8" NOT NULL, 4)):INTEGER NOT NULL)
        searchBounds: [ExactBounds [bound=CAST(SUBSTR(CAST(CURRENT_DATE):VARCHAR CHARACTER SET "UTF-8" NOT NULL, 4)):INTEGER]]
        ...
 ----
 We can see that **searchBounds** are present - thus more productive execution flow is expected here.


 == Colocation Usage

 As mentioned above, right colocated columns choice plays a significant role in query execution performance.
 For example, if initially tables are created without any thoughts about further usage columns colocation you can have the following scenario:

 [source,sql]
 ----
 -- by default, the table is implicitly colocated by PRIMARY KEY
 CREATE TABLE emp(dept_id INTEGER, name VARCHAR, salary INTEGER, PRIMARY KEY(dept_id, name));

 -- implicitly colocated by PRIMARY KEY
 CREATE TABLE dept(id INTEGER, name VARCHAR, PRIMARY KEY(name, id));
 ----
 And query as follows:
 [source,sql]
 ----
 SELECT emp.name, dept.name FROM emp JOIN dept ON emp.dept_id = dept.id AND emp.salary > 1000;
 ----
 Bring plan like:
 [source,sql]
 ----
    HashJoin
        predicate: =(DEPT_ID, ID)
        ...
      Exchange
          ...
        TableScan
            table: PUBLIC.EMP
            ...
      Exchange
          ...
        TableScan
            table: PUBLIC.DEPT
            ...
 ----

 We can see two **Exchange** operators, which means that all rows are transferred into a single node and then are joined.
 This execution flow brings a performance cost and slows down query execution.

 Let`s try to improve it by adding explicit colocation for the **dept** table by the **ID** column:

 [source,sql]
 ----
 -- implicitly colocated by PRIMARY KEY
 CREATE TABLE emp(dept_id INTEGER, name VARCHAR, salary INTEGER, PRIMARY KEY(dept_id, name));
 -- explicitly colocated by ID
 CREATE TABLE dept(id INTEGER, name VARCHAR, PRIMARY KEY(name, id)) COLOCATE BY (id);
 ----

 Now the dependent rows from **emp** table are transferred into the appropriate
 node where **dept** holds the rows according to **DEPT.ID** distribution:

 [source,sql]
 ----
      HashJoin
          predicate: =(DEPT_ID, ID)
          ...
        Exchange
            distribution: table PUBLIC.DEPT in zone "Default" by [DEPT_ID]
            ...
          TableScan
              table: PUBLIC.EMP
              ...
        TableScan
            table: PUBLIC.DEPT
            ...
 ----

 Only one **Exchange** operator for now, which, once again, mean only rows transferring
 from **emp** table to appropriate **dept** one.

 And finally, both join predicate related columns are colocated:

 NOTE: The following colocation example will only work if the **emp** and **dept** tables belong to the same distribution zone.

 [source,sql]
 ----
 -- explicitly colocated by DEPT_ID
 CREATE TABLE emp(dept_id INTEGER, name VARCHAR, salary INTEGER, PRIMARY KEY(dept_id, name)) COLOCATE BY(dept_id);
 -- explicitly colocated by ID
 CREATE TABLE dept(id INTEGER, name VARCHAR, PRIMARY KEY(id, name)) COLOCATE BY(id);
 ----

 Now, the **emp** and **dept** tables are both colocated.

 And the final plan will look like this:

 [source,sql]
 ----
      HashJoin
         predicate: =(DEPT_ID, ID)
          ...
        TableScan
            table: PUBLIC.EMP
            ...
        TableScan
            table: PUBLIC.DEPT
            ...
 ----

 No **Exchange** operators are involved in the explanation, which means that no excessive rows transfer has occurred.
	// Licensed to the Apache Software Foundation (ASF) under one or more
	// contributor license agreements. See the NOTICE file distributed with
	// this work for additional information regarding copyright ownership.
	// The ASF licenses this file to You under the Apache License, Version 2.0
	// (the "License"); you may not use this file except in compliance with
	// the License. You may obtain a copy of the License at
	//
	// http://www.apache.org/licenses/LICENSE-2.0
	//
	// Unless required by applicable law or agreed to in writing, software
	// distributed under the License is distributed on an "AS IS" BASIS,
	// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	// See the License for the specific language governing permissions and
	// limitations under the License.
	= How to Improve Queries With EXPLAIN Command

	The SQL command is a powerful tool used to analyze and understand the execution plan of a query without actually executing it.

	When you use the EXPLAIN command, it returns the query execution plan, which includes details such as:

	- The order in which tables are accessed.
	- The type of join operations used (for example, nested loops, hash joins, or merge joins).
	- Any indexes that are used to speed up data retrieval.
	- Estimated costs and row counts for different parts of the query.

	This information is crucial for optimizing query performance, identifying bottlenecks, and making informed decisions about database schema design and indexing strategies.

	SQL EXPLAIN output analysis can help you optimize slow query execution. You can avoid common bottlenecks in SQL execution by following these guidelines:

	- Avoid scanning an entire table.
	- Avoid scanning non-optimal indexes.
	- Avoid suboptimal join ordering or join algorithm.
	- Ensure optimal data colocation for your queries.

	In the following sections, we will see some common issues with queries and ways to identify and fix them.

	== Full Scan Instead of Index Scan

	Suppose related sql execution flow looks like:

	[source,sql]
	----
	CREATE TABLE t (id INT PRIMARY KEY, col1 VARCHAR);
	CREATE INDEX t_col1_idx ON t(col1);

	SELECT id FROM t WHERE col1 = '1';
	----

	And possible EXPLAIN output:

	[source,sql]
	----
	TableScan
	table: PUBLIC.T
	predicate: =(COL1, _UTF-8'1')
	fieldNames: [ID]
	est: (rows=1)
	----

	NOTE: For simplicity, here and below, information that is not related to the example is omitted from the EXPLAIN output.

	We can see a full scan (TableScan operator) with predicate.
	The execution planner chooses which scan implementation (TableScan or IndexScan) to use.
	If you expect that index scan is preferable, you can use the `FORCE_INDEX` hint to manually force `IndexScan` approach:

	[source,sql]
	----
	SELECT /+ FORCE_INDEX(t_col1_idx) / id FROM t WHERE col1 = '1';
	----

	Will show a different plan, like:

	[source,sql]
	----
	IndexScan
	table: PUBLIC.T
	index: T_COL1_IDX
	type: SORTED
	predicate: =(COL1, _UTF-8'1')
	searchBounds: [ExactBounds [bound=_UTF-8'1']]
	ieldNames: [ID]
	collation: []
	est: (rows=1)
	----

	== Suboptimal Indexes

	Indexes with less prediction can be chosen, for example schema and query may look as follows:

	[source,sql]
	----
	CREATE TABLE t (id INT PRIMARY KEY, col1 VARCHAR, col2 VARCHAR);
	CREATE INDEX t_col1_col2_idx ON T(col1, col2);
	CREATE INDEX t_col1_idx ON t(col1);

	SELECT id FROM t WHERE col1 = '1' AND col2 = '2';
	----

	and a possible plan would be:

	[source,sql]
	----
	IndexScan
	table: PUBLIC.T
	index: T_COL1_IDX
	...
	----

	We can see that the execution uses the T_COL1_IDX index, through both predicates COL1 = '1' AND COL2 = '2' are involved and the T_COL1_COL2_IDX is preferable. In this case, the optimal plan would be:

	[source,sql]
	----
	IndexScan
	table: PUBLIC.T
	index: T_COL1_COL2_IDX
	...
	----

	You can also use the `FORCE_INDEX` hint to achieve this:

	[source,sql]
	----
	SELECT /+ FORCE_INDEX(t_col1_col2_idx) / id FROM t WHERE col1 = '1' AND col2 = '2';
	----

	== Unexpected Sort Operation

	By default, sorted indexes store their entries in ascending order.
	You can adjust the ordering of a sorted index by including the options ASC or DESC.

	Let's suppose the schema and related query look like this:

	[source,sql]
	----
	CREATE TABLE t1 (id INT PRIMARY KEY, col1 VARCHAR);
	CREATE TABLE t2 (id INT PRIMARY KEY, col1 VARCHAR);
	CREATE INDEX t1_col1_desc_idx ON t1(col1 DESC);
	CREATE INDEX t2_col1_idx ON t2(col1);

	SELECT t1.id as t1id, t2.id as t2id FROM t1 JOIN t2 USING (col1);
	----

	And the possible execution plan looks like this:

	[source,sql]
	----
	MergeJoin
	...
	Sort
	collation: [COL1 ASC]
	...
	IndexScan
	index: T2_COL1_IDX
	...
	----

	In the example above, the planner adds the Sort operation before performing the IndexScan,
	as the index used is sorted in descending order, while ascending order is required.

	Extra Sort operations adds performance costs, and we can avoid it by creating an index with the appropriate sort ordering:

	[source,sql]
	----
	CREATE INDEX t1_col1_idx ON t1(col1);
	----

	And plan will no longer display the Sort operation, improving query execution speed:

	[source,sql]
	----
	MergeJoin
	...
	IndexScan
	index: T1_COL1_IDX
	...
	IndexScan
	index: T2_COL1_IDX
	...
	----

	== Performance Impact of Correlated Subqueries

	The SQL-99 standard allows for nested subqueries at nearly all places within a query, so Ignite 3 supports nested subqueries, both correlated and not.
	Performance of certain complex correlated subqueries may be insufficient. Let`s consider a correlated query:

	[source,sql]
	----
	CREATE TABLE emp(dept_id INTEGER PRIMARY KEY, name VARCHAR, salary INTEGER);
	CREATE TABLE dept(id INTEGER PRIMARY KEY, name VARCHAR);

	SELECT emp.name, (SELECT dept.name FROM dept WHERE emp.dept_id=dept.id)
	FROM emp
	WHERE emp.salary > 1000;
	----

	We can see nested correlated subquery here, lets check the plan:

	[source,sql]
	----
	CorrelatedNestedLoopJoin
	...
	TableScan
	table: PUBLIC.EMP
	predicate: >(SALARY, 1000)
	...
	ColocatedHashAggregate
	...
	TableScan
	table: PUBLIC.DEPT
	predicate: =($cor1.DEPT_ID, ID)
	...
	----

	The example above shows the slow CorrelatedNestedLoopJoin operation. Queries with this operation may cause a number of issues:

	- Such subqueries may become bottlenecks.
	- Queries can cause high CPU load.
	- Certain queries may perform slower than expected.

	If performance issues are found in similar queries, it would be more
	efficient to rewrite the query without nested subqueries, for example:

	[source,sql]
	----
	SELECT emp.name, dept.name
	FROM emp, dept
	WHERE emp.salary > 1000 AND emp.dept_id=dept.id;
	----
	And new plan becomes:
	[source,sql]
	----
	HashJoin
	predicate: =(DEPT_ID, ID)
	...
	TableScan
	table: PUBLIC.EMP
	predicate: >(SALARY, 1000)
	...
	TableScan
	table: PUBLIC.DEPT
	...
	----

	Without the CorrelatedNestedLoopJoin operation, the query should perform much better than the previous one.

	== Excessive Sorting

	Lets explain we have an index involved two columns one of them is participate in predicate and other in ordering, or in sql terms:

	[source,sql]
	----
	CREATE TABLE emp(dept_id INTEGER PRIMARY KEY, name VARCHAR, salary INTEGER);
	CREATE INDEX emp_salary_name_idx ON emp(salary, name);

	SELECT dept_id FROM emp WHERE salary = 1 ORDER BY name;
	----

	Expectations:

	- Index need to be used here.
	- No additional sort is needed because index is ordered by name column is satisfies initial query ordering.

	But the real execution plan shows a different result:

	[source,sql]
	----
	Sort
	collation: [NAME ASC]
	...
	TableScan
	table: PUBLIC.EMP
	predicate: =(SALARY, 1)
	...
	----

	We can see a redundant Sort operator. A bit query refactoring can help to avoid excessive sorting:

	[source,sql]
	----
	SELECT dept_id FROM emp WHERE salary = 1 ORDER BY salary, name;
	----

	And the plan becomes as follows:

	[source,sql]
	----
	IndexScan
	table: PUBLIC.EMP
	index: EMP_SALARY_NAME_IDX
	predicate: =(SALARY, 1)
	...
	----

	== Select Count Optimization

	Some queries can be optimized to use more optimal plans which brings performance speed up. For example, plan for:

	[source,sql]
	----
	SELECT COUNT(*) FROM emp;
	----

	Can look like this:

	[source,sql]
	----
	SelectCount
	table: PUBLIC.EMP
	est: (rows=43)
	...
	----

	But there are numerous cases where such optimization is not applicable. In such a cases, a plan can be different and the execution may require more time.

	The same query as above, but with explicit transaction may produce a different plan, for example:

	[source,sql]
	----
	ReduceSortAggregate
	...
	MapSortAggregate
	...
	TableScan
	table: PUBLIC.EMP
	est: (rows=43)
	...
	----

	== Index Scan Without Exact Search Bounds

	Table scans are available in two implementations: direct table scan and scan through index.
	Index scans contain predicate and search bounds.
	Predicate provides final rows comparison. If search bounds are absentm the query degenerates into table scan through index scan (requiring an additional store look up), with further predicate comparison, that incurs additional performance overhead costs.

	Let`s suppose we have schema and query like this:

	[source,sql]
	----
	CREATE TABLE t (id INTEGER PRIMARY KEY, col1 DECIMAL(5, 3));
	CREATE INDEX t_col1_idx ON t(col1);

	SELECT id FROM t WHERE col1 = 43;
	----

	And possible plan would look like this:

	[source,sql]
	----
	IndexScan
	table: PUBLIC.T
	index: T_COL1_IDX
	predicate: =(CAST(COL1):DECIMAL(13, 3), 43.000)
	...
	----

	We can see here - only predicate (and no searchBounds) which means that all rows from index will go through predicate and bring additional performance penalty.

	Two type of solutions are possible here:

	- You can prohibit suboptimal index usage;
	- You can explicitly help the planner with type derivation.

	=== Prohibit Index Usage

	For the first approach, use the NO_INDEX hint to prohibit index usage:

	[source,sql]
	----
	SELECT /+ NO_INDEX / id FROM t WHERE col1 = 43;

	-- or with direct index mention:

	SELECT /+ NO_INDEX(t_col1_idx) / id FROM t WHERE col1 = 43;
	----

	As a result, you will have a plan similar to this:

	[source,sql]
	----
	TableScan
	table: PUBLIC.T
	predicate: =(CAST(COL1):DECIMAL(13, 3), 43.000)
	...
	----

	=== Manual Type Casting

	You can append additional cast to the same query to explicitly cast data as a specific type:

	[source,sql]
	----
	SELECT id FROM t WHERE col1 = 43::DECIMAL(5, 3);
	----

	[source,sql]
	----
	IndexScan
	table: PUBLIC.T
	index: T_COL1_IDX
	predicate: =(COL1, 43.000)
	searchBounds: [ExactBounds [bound=43.000:DECIMAL(5, 3)]]
	...
	----

	We can see here both searchBounds and predicate which means that only exact lookup through index will be involved.


	The same case as above but for a bit complicated query:

	[source,sql]
	----
	CREATE TABLE t (id INT PRIMARY KEY, col1 INT);
	CREATE INDEX t_col1_asc_idx ON t (col1);

	SELECT * FROM t WHERE col1::varchar = SUBSTR(CURRENT_DATE::varchar, 4);
	----

	Possible plan:
	[source,sql]
	----
	IndexScan
	table: PUBLIC.T
	index: T_COL1_IDX
	predicate: =(CAST(COL1):VARCHAR CHARACTER SET "UTF-8", SUBSTR(CAST(CURRENT_DATE):VARCHAR CHARACTER SET "UTF-8" NOT NULL, 4))
	...
	----
	And we also can see that no search bounds are involved here.

	Try to change it like:

	[source,sql]
	----
	SELECT * FROM t WHERE col1 = SUBSTR(CURRENT_DATE::varchar, 4)::int;
	----

	And the possible plan will become:
	[source,sql]
	----
	IndexScan
	table: PUBLIC.T
	index: T_COL1_ASC_IDX
	predicate: =(COL1, CAST(SUBSTR(CAST(CURRENT_DATE):VARCHAR CHARACTER SET "UTF-8" NOT NULL, 4)):INTEGER NOT NULL)
	searchBounds: [ExactBounds [bound=CAST(SUBSTR(CAST(CURRENT_DATE):VARCHAR CHARACTER SET "UTF-8" NOT NULL, 4)):INTEGER]]
	...
	----
	We can see that searchBounds are present - thus more productive execution flow is expected here.


	== Colocation Usage

	As mentioned above, right colocated columns choice plays a significant role in query execution performance.
	For example, if initially tables are created without any thoughts about further usage columns colocation you can have the following scenario:

	[source,sql]
	----
	-- by default, the table is implicitly colocated by PRIMARY KEY
	CREATE TABLE emp(dept_id INTEGER, name VARCHAR, salary INTEGER, PRIMARY KEY(dept_id, name));

	-- implicitly colocated by PRIMARY KEY
	CREATE TABLE dept(id INTEGER, name VARCHAR, PRIMARY KEY(name, id));
	----
	And query as follows:
	[source,sql]
	----
	SELECT emp.name, dept.name FROM emp JOIN dept ON emp.dept_id = dept.id AND emp.salary > 1000;
	----
	Bring plan like:
	[source,sql]
	----
	HashJoin
	predicate: =(DEPT_ID, ID)
	...
	Exchange
	...
	TableScan
	table: PUBLIC.EMP
	...
	Exchange
	...
	TableScan
	table: PUBLIC.DEPT
	...
	----

	We can see two Exchange operators, which means that all rows are transferred into a single node and then are joined.
	This execution flow brings a performance cost and slows down query execution.

	Let`s try to improve it by adding explicit colocation for the dept table by the ID column:

	[source,sql]
	----
	-- implicitly colocated by PRIMARY KEY
	CREATE TABLE emp(dept_id INTEGER, name VARCHAR, salary INTEGER, PRIMARY KEY(dept_id, name));
	-- explicitly colocated by ID
	CREATE TABLE dept(id INTEGER, name VARCHAR, PRIMARY KEY(name, id)) COLOCATE BY (id);
	----

	Now the dependent rows from emp table are transferred into the appropriate
	node where dept holds the rows according to DEPT.ID distribution:

	[source,sql]
	----
	HashJoin
	predicate: =(DEPT_ID, ID)
	...
	Exchange
	distribution: table PUBLIC.DEPT in zone "Default" by [DEPT_ID]
	...
	TableScan
	table: PUBLIC.EMP
	...
	TableScan
	table: PUBLIC.DEPT
	...
	----

	Only one Exchange operator for now, which, once again, mean only rows transferring
	from emp table to appropriate dept one.

	And finally, both join predicate related columns are colocated:

	NOTE: The following colocation example will only work if the emp and dept tables belong to the same distribution zone.

	[source,sql]
	----
	-- explicitly colocated by DEPT_ID
	CREATE TABLE emp(dept_id INTEGER, name VARCHAR, salary INTEGER, PRIMARY KEY(dept_id, name)) COLOCATE BY(dept_id);
	-- explicitly colocated by ID
	CREATE TABLE dept(id INTEGER, name VARCHAR, PRIMARY KEY(id, name)) COLOCATE BY(id);
	----

	Now, the emp and dept tables are both colocated.

	And the final plan will look like this:

	[source,sql]
	----
	HashJoin
	predicate: =(DEPT_ID, ID)
	...
	TableScan
	table: PUBLIC.EMP
	...
	TableScan
	table: PUBLIC.DEPT
	...
	----

	No Exchange operators are involved in the explanation, which means that no excessive rows transfer has occurred.