blob: 9a7a58b127adfcd29c2771858ee46f31fab4b6cc [file] [log] [blame]
---
title: GPORCA Features and Enhancements
---
GPORCA includes enhancements for specific types of queries and operations:
- [Queries Against Partitioned Tables](#topic_dwy_zml_gr)
- [Queries that Contain Subqueries](#topic_vph_wml_gr)
- [Queries that Contain Common Table Expressions](#topic_c3v_rml_gr)
- [DML Operation Enhancements with GPORCA](#topic_plx_mml_gr)
- [Queries with Distinct Qualified Aggregates (DQA)](#topic_anl_t3t_pv)
GPORCA also includes these optimization enhancements:
- Improved join ordering
- Join-Aggregate reordering
- Sort order optimization
- Data skew estimates included in query optimization
## Queries Against Partitioned Tables<a id="topic_dwy_zml_gr"></a>
GPORCA includes these enhancements for queries against partitioned tables:
- Partition elimination is improved.
- Query plan can contain the `Partition selector` operator.
- Partitions are not enumerated in `EXPLAIN` plans.
For queries that involve static partition selection where the partitioning key is compared to a constant, GPORCA lists the number of partitions to be scanned in the `EXPLAIN` output under the Partition Selector operator. This example Partition Selector operator shows the filter and number of partitions selected:
``` pre
Partition Selector for Part_Table (dynamic scan id: 1)
Filter: a > 10
Partitions selected: 1 (out of 3)
```
For queries that involve dynamic partition selection where the partitioning key is compared to a variable, the number of partitions that are scanned will be known only during query execution. The partitions selected are not shown in the `EXPLAIN` output.
- Plan size is independent of number of partitions.
- Out of memory errors caused by number of partitions are reduced.
This example `CREATE TABLE` command creates a range partitioned table.
``` pre
CREATE TABLE sales(order_id int, item_id int, amount numeric(15,2),
date date, yr_qtr int)
range partitioned by yr_qtr;
```
GPORCA improves on these types of queries against partitioned tables:
- Full table scan. Partitions are not enumerated in plans.
``` pre
SELECT * FROM sales;
```
- Query with a constant filter predicate. Partition elimination is performed.
``` pre
SELECT * FROM sales WHERE yr_qtr = 201201;
```
- Range selection. Partition elimination is performed.
``` pre
SELECT * FROM sales WHERE yr_qtr BETWEEN 201301 AND 201404 ;
```
- Joins involving partitioned tables. In this example, the partitioned dimension table *date\_dim* is joined with fact table *catalog\_sales*:
``` pre
SELECT * FROM catalog_sales
WHERE date_id IN (SELECT id FROM date_dim WHERE month=12);
```
## Queries that Contain Subqueries<a id="topic_vph_wml_gr"></a>
GPORCA handles subqueries more efficiently. A subquery is query that is nested inside an outer query block. In the following query, the `SELECT` in the `WHERE` clause is a subquery.
``` pre
SELECT * FROM part
WHERE price > (SELECT avg(price) FROM part);
```
GPORCA also handles queries that contain a correlated subquery (CSQ) more efficiently. A correlated subquery is a subquery that uses values from the outer query. In the following query, the `price` column is used in both the outer query and the subquery.
``` pre
SELECT * FROM part p1
WHERE price > (SELECT avg(price) FROM part p2
WHERE p2.brand = p1.brand);
```
GPORCA generates more efficient plans for the following types of subqueries:
- CSQ in the `SELECT` list.
``` pre
SELECT *,
(SELECT min(price) FROM part p2 WHERE p1.brand = p2.brand)
AS foo
FROM part p1;
```
- CSQ in disjunctive (`OR`) filters.
``` pre
SELECT FROM part p1 WHERE p_size > 40 OR
p_retailprice >
(SELECT avg(p_retailprice)
FROM part p2
WHERE p2.p_brand = p1.p_brand)
```
- Nested CSQ with skip level correlations
``` pre
SELECT * FROM part p1 WHERE p1.p_partkey
IN (SELECT p_partkey FROM part p2 WHERE p2.p_retailprice =
(SELECT min(p_retailprice)
FROM part p3
WHERE p3.p_brand = p1.p_brand)
);
```
**Note:** Nested CSQ with skip level correlations are not supported by the legacy query optimizer.
- CSQ with aggregate and inequality. This example contains a CSQ with an inequality.
``` pre
SELECT * FROM part p1 WHERE p1.p_retailprice =
(SELECT min(p_retailprice) FROM part p2 WHERE p2.p_brand <> p1.p_brand);
```
<!-- -->
- CSQ that must return one row.
``` pre
SELECT p_partkey,
(SELECT p_retailprice FROM part p2 WHERE p2.p_brand = p1.p_brand )
FROM part p1;
```
## Queries that Contain Common Table Expressions<a id="topic_c3v_rml_gr"></a>
GPORCA handles queries that contain the `WITH` clause. The `WITH` clause, also known as a common table expression (CTE), generates temporary tables that exist only for the query. This example query contains a CTE.
``` pre
WITH v AS (SELECT a, sum(b) as s FROM T where c < 10 GROUP BY a)
SELECT *FROM v AS v1 , v AS v2
WHERE v1.a <> v2.a AND v1.s < v2.s;
```
As part of query optimization, GPORCA can push down predicates into a CTE. For example query, GPORCA pushes the equality predicates to the CTE.
``` pre
WITH v AS (SELECT a, sum(b) as s FROM T GROUP BY a)
SELECT *
FROM v as v1, v as v2, v as v3
WHERE v1.a < v2.a
AND v1.s < v3.s
AND v1.a = 10
AND v2.a = 20
AND v3.a = 30;
```
GPORCA can handle these types of CTEs:
- CTE that defines one or multiple tables. In this query, the CTE defines two tables.
``` pre
WITH cte1 AS (SELECT a, sum(b) as s FROM T
where c < 10 GROUP BY a),
cte2 AS (SELECT a, s FROM cte1 where s > 1000)
SELECT *
FROM cte1 as v1, cte2 as v2, cte2 as v3
WHERE v1.a < v2.a AND v1.s < v3.s;
```
- Nested CTEs.
``` pre
WITH v AS (WITH w AS (SELECT a, b FROM foo
WHERE b < 5)
SELECT w1.a, w2.b
FROM w AS w1, w AS w2
WHERE w1.a = w2.a AND w1.a > 2)
SELECT v1.a, v2.a, v2.b
FROM v as v1, v as v2
WHERE v1.a < v2.a;
```
## DML Operation Enhancements with GPORCA<a id="topic_plx_mml_gr"></a>
GPORCA contains enhancements for DML operations such as `INSERT`.
- A DML node in a query plan is a query plan operator.
- Can appear anywhere in the plan, as a regular node (top slice only for now)
- Can have consumers
- New query plan operator `Assert` is used for constraints checking.
This example plan shows the `Assert` operator.
``` pre
QUERY PLAN
------------------------------------------------------------
Insert (cost=0.00..4.61 rows=3 width=8)
-> Assert (cost=0.00..3.37 rows=3 width=24)
Assert Cond: (dmlsource.a > 2) IS DISTINCT FROM
false
-> Assert (cost=0.00..2.25 rows=3 width=24)
Assert Cond: NOT dmlsource.b IS NULL
-> Result (cost=0.00..1.14 rows=3 width=24)
-> Table Scan on dmlsource
```
## Queries with Distinct Qualified Aggregates (DQA)<a id="topic_anl_t3t_pv"></a>
GPORCA improves performance for queries that contain distinct qualified aggregates (DQA) without a grouping column and when the table is not distributed on the columns used by the DQA. When encountering these types of queries, GPORCA uses an alternative plan that evaluates the aggregate functions in three stages (local, intermediate, and global aggregations).
See [optimizer\_prefer\_scalar\_dqa\_multistage\_agg](../../reference/guc/parameter_definitions.html#optimizer_prefer_scalar_dqa_multistage_agg) for information on the configuration parameter that controls this behavior.