| Query To DXL Translator |
| ======================= |
| |
| The Query To DXL Translator, as known as the algebrizer, takes as |
| input a parsed query object and returns its DXL representation, which |
| is an algebrized tree comprised of logical and scalar operations. The |
| serialized DXL is then shipped to GPORCA for optimization. |
| |
| |
| Nomenclature |
| ------------ |
| |
| Logical Operator: |
| Set based operator that maps one or more sets of tuples to a set of |
| tuples. |
| |
| Physical Operator: |
| Implementation of the logical operator, e.g. a logical join operation |
| can be implemented as a nested-loop, hash- or merge-join. |
| |
| Scalar Operator: |
| Maps a single or set of tuples to a single value. |
| |
| Algebrized Representation (Tree): |
| The DXL representation of the GPDB query object |
| |
| |
| Algebrizer Components |
| --------------------- |
| |
| CQueryMutator: normalize the incoming GPDB Query object. |
| |
| CTranslatorQueryToDXL: translate the normalized Query into a DXL tree |
| structure. This DXL structure will then be serialized into DXL |
| document and be shipped to the optimizer. |
| |
| CTranslatorDXLToQuery: translates the DXL back into a GPDB Query, this |
| is only used to verify the correctness of the translation from Query |
| to DXL. |
| |
| |
| Normalized Query Form |
| --------------------- |
| |
| 1) GROUP BY |
| |
| We assert that the group by operator has aggregate functions, grouping |
| functions and grouping columns and nothing else. Expression on |
| aggregate operators or grouping functions are translated as a project |
| on top of the group by operation. To facilitate this we mutate the |
| original query to a form that conforms to this assumption. To |
| illustrate consider the following queries: |
| |
| Example 1: |
| INPUT: |
| SELECT a, count(*) + 1, grouping(a) + 1 FROM foo GROUP BY a |
| OUTPUT: |
| SELECT q_new.a, q_new.ct + 1, q_new.grouping + 1 FROM (SELECT |
| a, count(*) ct, grouping(a) grouping FROM foo GROUP BY a) |
| q_new |
| |
| Example 2: |
| INPUT: |
| SELECT * FROM bar where bar.c = (SELECT count(*) + sum(a) FROM |
| foo GROUP BY a) |
| OUTPUT: |
| SELECT * FROM bar where bar.c = (SELECT q_new.ct + q_new.sm |
| FROM (SELECT a, count(*) ct, sum(b) sm FROM foo GROUP BY a) |
| q_new) |
| |
| |
| 2) HAVING |
| |
| The having clause is translated into a select on top the |
| original query. To accomplish this, we traverse the having clause to |
| collect all aggregates pertaining to the current grouping operation |
| and replacing it with the corresponding variable. To illustrate |
| consider the following queries: |
| |
| Example: |
| INPUT: |
| SELECT a FROM foo GROUP BY a HAVING (count(*) > (SELECT |
| sum(foo.b) FROM bar)) |
| OUTPUT: |
| SELECT q_new.a FROM (SELECT a, count(*) ct, sum(foo.b) sm FROM |
| foo GROUP BY a) q_new) WHERE q_new.ct > (SELECT q_new.sm FROM |
| bar)) |
| |
| |
| 3) WINDOW |
| |
| Similar to GROUP BY, a window operator only has simple window |
| functions rather than expression on window functions. The expression |
| involving window functions are then translated into a project on top |
| of the window operator. The mutation methodology is identical to the |
| process described in GROUP BY. |
| |
| |
| 4) DISTINCT |
| |
| We transform the distinct clause into a GROUP BY on top the original |
| query. |
| |
| Example: |
| INPUT: |
| SELECT distinct a, count(*) FROM foo GROUP BY a, b |
| OUTPUT: |
| SELECT q_new.a, q_new.ct FROM (SELECT a, count(*) ct, FROM foo |
| GROUP BY a) q_new) GROUP BY q_new.a, q_new.ct |
| |
| |
| Query To DXL Translation |
| ------------------------ |
| |
| Input: Query Object |
| |
| CTranslatorQueryToDXL assumes that the query has already been |
| normalized. Therefore the following members in the query objects are |
| NULL: |
| 1. Distinct Columns |
| 2. Having Clause |
| |
| |
| Output: DXL |
| |
| DXL containing: |
| 1. List of output columns along with their alias name |
| 2. List of CTE Producers where each CTE producer has: |
| a. An unique identifier |
| b. List of output columns |
| 3. The DXL representation of the query |
| |
| |
| Translation Algorithm |
| |
| 1. If the query has set operations then |
| a. Process the inputs to the set op. |
| b. Check if the inputs need to casted |
| - If so the optimizer will generate new output column for the |
| casting functions |
| c. Create a logical set operator that contains: |
| - List of input columns from each input |
| - List of output columns of the set op |
| 2. If not, process the from clause (FROM EXPR) |
| a. Translate each range table entry used in the from clause based |
| on the type of the range table entry |
| - Table Entries: translate into logical get or logical |
| external get |
| - TVF: translate into logical tvf |
| - Value Scan: translate into a constant table get |
| - CTE: represented as a logical CTE consumer with it CTE |
| Producer ID and list of output columns. |
| - Translate a derived table to a logical tree |
| 3. Check if the query has window operation |
| a. If the window specification is a computed column then create a |
| project child |
| b. Add window function in the target list to the project list of |
| the window operator |
| 4. Check if the query has GROUP BY (including grouping sets/rollup) |
| a. Expand rollup/grouping sets into a union all over group bys |
| b. Add the aggregates in the target list to the project list of |
| the group by operator |
| 5. Check if the query has ordering clause |
| a. Create a logical limit operator with |
| - Order Spec (sorting columns and sorting functions used) |
| - limit count, and limit offset |
| 6. Generate the list of output columns and the alias name (for display) |