src/backend/gpopt/translate/README - cloudberry - Git at Google

 Query To DXL Translator
 =======================

 The Query To DXL Translator, as known as the algebrizer, takes as
 input a parsed query object and returns its DXL representation, which
 is an algebrized tree comprised of logical and scalar operations. The
 serialized DXL is then shipped to GPORCA for optimization.


 Nomenclature
 ------------

 Logical Operator:
 Set based operator that maps one or more sets of tuples to a set of
 tuples.

 Physical Operator:
 Implementation of the logical operator, e.g. a logical join operation
 can be implemented as a nested-loop, hash- or merge-join.

 Scalar Operator:
 Maps a single or set of tuples to a single value.

 Algebrized Representation (Tree):
 The DXL representation of the GPDB query object


 Algebrizer Components
 ---------------------

 CQueryMutator: normalize the incoming GPDB Query object.

 CTranslatorQueryToDXL: translate the normalized Query into a DXL tree
 structure. This DXL structure will then be serialized into DXL
 document and be shipped to the optimizer.

 CTranslatorDXLToQuery: translates the DXL back into a GPDB Query, this
 is only used to verify the correctness of the translation from Query
 to DXL.


 Normalized Query Form
 ---------------------

 1) GROUP BY

 We assert that the group by operator has aggregate functions, grouping
 functions and grouping columns and nothing else. Expression on
 aggregate operators or grouping functions are translated as a project
 on top of the group by operation. To facilitate this we mutate the
 original query to a form that conforms to this assumption. To
 illustrate consider the following queries:

 Example 1:
     INPUT:
         SELECT a, count(*) + 1, grouping(a) + 1 FROM foo GROUP BY a
     OUTPUT:
         SELECT q_new.a, q_new.ct + 1, q_new.grouping + 1 FROM (SELECT
         a, count(*) ct, grouping(a) grouping FROM foo GROUP BY a)
         q_new

 Example 2:
 	INPUT:
         SELECT * FROM bar where bar.c = (SELECT count(*) + sum(a) FROM
         foo GROUP BY a)
 	OUTPUT:
         SELECT * FROM bar where bar.c = (SELECT q_new.ct + q_new.sm
         FROM (SELECT a, count(*) ct, sum(b) sm FROM foo GROUP BY a)
         q_new)


 2) HAVING

 The having clause is translated into a select on top the
 original query. To accomplish this, we traverse the having clause to
 collect all aggregates pertaining to the current grouping operation
 and replacing it with the corresponding variable. To illustrate
 consider the following queries:

 Example:
     INPUT:
         SELECT a FROM foo GROUP BY a HAVING (count(*) > (SELECT
         sum(foo.b) FROM bar))
     OUTPUT:
         SELECT q_new.a FROM (SELECT a, count(*) ct, sum(foo.b) sm FROM
         foo GROUP BY a) q_new) WHERE q_new.ct > (SELECT q_new.sm FROM
         bar))


 3) WINDOW

 Similar to GROUP BY, a window operator only has simple window
 functions rather than expression on window functions. The expression
 involving window functions are then translated into a project on top
 of the window operator. The mutation methodology is identical to the
 process described in GROUP BY.


 4) DISTINCT

 We transform the distinct clause into a GROUP BY on top the original
 query.

 Example:
     INPUT:
         SELECT distinct a, count(*) FROM foo GROUP BY a, b
     OUTPUT:
         SELECT q_new.a, q_new.ct FROM (SELECT a, count(*) ct, FROM foo
         GROUP BY a) q_new) GROUP BY q_new.a, q_new.ct


 Query To DXL Translation
 ------------------------

 Input: Query Object

 CTranslatorQueryToDXL assumes that the query has already been
 normalized. Therefore the following members in the query objects are
 NULL:
 1. Distinct Columns
 2. Having Clause


 Output: DXL

 DXL containing:
 1. List of output columns along with their alias name
 2. List of CTE Producers where each CTE producer has:
         a. An unique identifier
         b. List of output columns
 3. The DXL representation of the query


 Translation Algorithm

 1. If the query has set operations then
     a. Process the inputs to the set op.
     b. Check if the inputs need to casted
         - If so the optimizer will generate new output column for the
           casting functions
     c. Create a logical set operator that contains:
         - List of input columns from each input
         - List of output columns of the set op
 2. If not, process the from clause (FROM EXPR)
     a. Translate each range table entry used in the from clause based
        on the type of the range table entry
         - Table Entries: translate into logical get or logical
           external get
         - TVF: translate into logical tvf
         - Value Scan: translate into a constant table get
         - CTE: represented as a logical CTE consumer with it CTE
           Producer ID and list of output columns.
         - Translate a derived table to a logical tree
 3. Check if the query has window operation
     a. If the window specification is a computed column then create a
        project child
     b. Add window function in the target list to the project list of
        the window operator
 4. Check if the query has GROUP BY (including grouping sets/rollup)
     a. Expand rollup/grouping sets into a union all over group bys
     b. Add the aggregates in the target list to the project list of
        the group by operator
 5. Check if the query has ordering clause
     a. Create a logical limit operator with
         - Order Spec (sorting columns and sorting functions used)
         - limit count, and limit offset
 6. Generate the list of output columns and the alias name (for display)
	Query To DXL Translator
	=======================

	The Query To DXL Translator, as known as the algebrizer, takes as
	input a parsed query object and returns its DXL representation, which
	is an algebrized tree comprised of logical and scalar operations. The
	serialized DXL is then shipped to GPORCA for optimization.


	Nomenclature
	------------

	Logical Operator:
	Set based operator that maps one or more sets of tuples to a set of
	tuples.

	Physical Operator:
	Implementation of the logical operator, e.g. a logical join operation
	can be implemented as a nested-loop, hash- or merge-join.

	Scalar Operator:
	Maps a single or set of tuples to a single value.

	Algebrized Representation (Tree):
	The DXL representation of the GPDB query object


	Algebrizer Components
	---------------------

	CQueryMutator: normalize the incoming GPDB Query object.

	CTranslatorQueryToDXL: translate the normalized Query into a DXL tree
	structure. This DXL structure will then be serialized into DXL
	document and be shipped to the optimizer.

	CTranslatorDXLToQuery: translates the DXL back into a GPDB Query, this
	is only used to verify the correctness of the translation from Query
	to DXL.


	Normalized Query Form
	---------------------

	1) GROUP BY

	We assert that the group by operator has aggregate functions, grouping
	functions and grouping columns and nothing else. Expression on
	aggregate operators or grouping functions are translated as a project
	on top of the group by operation. To facilitate this we mutate the
	original query to a form that conforms to this assumption. To
	illustrate consider the following queries:

	Example 1:
	INPUT:
	SELECT a, count(*) + 1, grouping(a) + 1 FROM foo GROUP BY a
	OUTPUT:
	SELECT q_new.a, q_new.ct + 1, q_new.grouping + 1 FROM (SELECT
	a, count(*) ct, grouping(a) grouping FROM foo GROUP BY a)
	q_new

	Example 2:
	INPUT:
	SELECT * FROM bar where bar.c = (SELECT count(*) + sum(a) FROM
	foo GROUP BY a)
	OUTPUT:
	SELECT * FROM bar where bar.c = (SELECT q_new.ct + q_new.sm
	FROM (SELECT a, count(*) ct, sum(b) sm FROM foo GROUP BY a)
	q_new)


	2) HAVING

	The having clause is translated into a select on top the
	original query. To accomplish this, we traverse the having clause to
	collect all aggregates pertaining to the current grouping operation
	and replacing it with the corresponding variable. To illustrate
	consider the following queries:

	Example:
	INPUT:
	SELECT a FROM foo GROUP BY a HAVING (count(*) > (SELECT
	sum(foo.b) FROM bar))
	OUTPUT:
	SELECT q_new.a FROM (SELECT a, count(*) ct, sum(foo.b) sm FROM
	foo GROUP BY a) q_new) WHERE q_new.ct > (SELECT q_new.sm FROM
	bar))


	3) WINDOW

	Similar to GROUP BY, a window operator only has simple window
	functions rather than expression on window functions. The expression
	involving window functions are then translated into a project on top
	of the window operator. The mutation methodology is identical to the
	process described in GROUP BY.


	4) DISTINCT

	We transform the distinct clause into a GROUP BY on top the original
	query.

	Example:
	INPUT:
	SELECT distinct a, count(*) FROM foo GROUP BY a, b
	OUTPUT:
	SELECT q_new.a, q_new.ct FROM (SELECT a, count(*) ct, FROM foo
	GROUP BY a) q_new) GROUP BY q_new.a, q_new.ct


	Query To DXL Translation
	------------------------

	Input: Query Object

	CTranslatorQueryToDXL assumes that the query has already been
	normalized. Therefore the following members in the query objects are
	NULL:
	1. Distinct Columns
	2. Having Clause


	Output: DXL

	DXL containing:
	1. List of output columns along with their alias name
	2. List of CTE Producers where each CTE producer has:
	a. An unique identifier
	b. List of output columns
	3. The DXL representation of the query


	Translation Algorithm

	1. If the query has set operations then
	a. Process the inputs to the set op.
	b. Check if the inputs need to casted
	- If so the optimizer will generate new output column for the
	casting functions
	c. Create a logical set operator that contains:
	- List of input columns from each input
	- List of output columns of the set op
	2. If not, process the from clause (FROM EXPR)
	a. Translate each range table entry used in the from clause based
	on the type of the range table entry
	- Table Entries: translate into logical get or logical
	external get
	- TVF: translate into logical tvf
	- Value Scan: translate into a constant table get
	- CTE: represented as a logical CTE consumer with it CTE
	Producer ID and list of output columns.
	- Translate a derived table to a logical tree
	3. Check if the query has window operation
	a. If the window specification is a computed column then create a
	project child
	b. Add window function in the target list to the project list of
	the window operator
	4. Check if the query has GROUP BY (including grouping sets/rollup)
	a. Expand rollup/grouping sets into a union all over group bys
	b. Add the aggregates in the target list to the project list of
	the group by operator
	5. Check if the query has ordering clause
	a. Create a logical limit operator with
	- Order Spec (sorting columns and sorting functions used)
	- limit count, and limit offset
	6. Generate the list of output columns and the alias name (for display)