| $PostgreSQL: pgsql/src/backend/optimizer/README,v 1.49.2.2 2009/09/29 01:20:53 tgl Exp $ |
| |
| Optimizer |
| ========= |
| |
| These directories take the Query structure returned by the parser, and |
| generate a plan used by the executor. The /plan directory generates the |
| actual output plan, the /path code generates all possible ways to join the |
| tables, and /prep handles various preprocessing steps for special cases. |
| /util is utility stuff. /geqo is the separate "genetic optimization" planner |
| --- it does a semi-random search through the join tree space, rather than |
| exhaustively considering all possible join trees. (But each join considered |
| by /geqo is given to /path to create paths for, so we consider all possible |
| implementation paths for each specific join pair even in GEQO mode.) |
| |
| |
| Paths and Join Pairs |
| -------------------- |
| |
| During the planning/optimizing process, we build "Path" trees representing |
| the different ways of doing a query. We select the cheapest Path that |
| generates the desired relation and turn it into a Plan to pass to the |
| executor. (There is pretty much a one-to-one correspondence between the |
| Path and Plan trees, but Path nodes omit info that won't be needed during |
| planning, and include info needed for planning that won't be needed by the |
| executor.) |
| |
| The optimizer builds a RelOptInfo structure for each base relation used in |
| the query. Base rels are either primitive tables, or subquery subselects |
| that are planned via a separate recursive invocation of the planner. A |
| RelOptInfo is also built for each join relation that is considered during |
| planning. A join rel is simply a combination of base rels. There is only |
| one join RelOptInfo for any given set of baserels --- for example, the join |
| {A B C} is represented by the same RelOptInfo no matter whether we build it |
| by joining A and B first and then adding C, or joining B and C first and |
| then adding A, etc. These different means of building the joinrel are |
| represented as Paths. For each RelOptInfo we build a list of Paths that |
| represent plausible ways to implement the scan or join of that relation. |
| Once we've considered all the plausible Paths for a rel, we select the one |
| that is cheapest according to the planner's cost estimates. The final plan |
| is derived from the cheapest Path for the RelOptInfo that includes all the |
| base rels of the query. |
| |
| Possible Paths for a primitive table relation include plain old sequential |
| scan, plus index scans for any indexes that exist on the table, plus bitmap |
| index scans using one or more indexes. A subquery base relation just has |
| one Path, a "SubqueryScan" path (which links to the subplan that was built |
| by a recursive invocation of the planner). Likewise a function-RTE base |
| relation has only one possible Path. |
| |
| Joins always occur using two RelOptInfos. One is outer, the other inner. |
| Outers drive lookups of values in the inner. In a nested loop, lookups of |
| values in the inner occur by scanning the inner path once per outer tuple |
| to find each matching inner row. In a mergejoin, inner and outer rows are |
| ordered, and are accessed in order, so only one scan is required to perform |
| the entire join: both inner and outer paths are scanned in-sync. (There's |
| not a lot of difference between inner and outer in a mergejoin...) In a |
| hashjoin, the inner is scanned first and all its rows are entered in a |
| hashtable, then the outer is scanned and for each row we lookup the join |
| key in the hashtable. |
| |
| A Path for a join relation is actually a tree structure, with the top |
| Path node representing the join method. It has left and right subpaths |
| that represent the scan or join methods used for the two input relations. |
| |
| |
| Join Tree Construction |
| ---------------------- |
| |
| The optimizer generates optimal query plans by doing a more-or-less |
| exhaustive search through the ways of executing the query. The best Path |
| tree is found by a recursive process: |
| |
| 1) Take each base relation in the query, and make a RelOptInfo structure |
| for it. Find each potentially useful way of accessing the relation, |
| including sequential and index scans, and make Paths representing those |
| ways. All the Paths made for a given relation are placed in its |
| RelOptInfo.pathlist. (Actually, we discard Paths that are obviously |
| inferior alternatives before they ever get into the pathlist --- what |
| ends up in the pathlist is the cheapest way of generating each potentially |
| useful sort ordering of the relation.) Also create a RelOptInfo.joininfo |
| list including all the join clauses that involve this relation. For |
| example, the WHERE clause "tab1.col1 = tab2.col1" generates entries in |
| both tab1 and tab2's joininfo lists. |
| |
| If we have only a single base relation in the query, we are done. |
| Otherwise we have to figure out how to join the base relations into a |
| single join relation. |
| |
| 2) Normally, any explicit JOIN clauses are "flattened" so that we just |
| have a list of relations to join. However, FULL OUTER JOIN clauses are |
| never flattened, and other kinds of JOIN might not be either, if the |
| flattening process is stopped by join_collapse_limit or from_collapse_limit |
| restrictions. Therefore, we end up with a planning problem that contains |
| lists of relations to be joined in any order, where any individual item |
| might be a sub-list that has to be joined together before we can consider |
| joining it to its siblings. We process these sub-problems recursively, |
| bottom up. Note that the join list structure constrains the possible join |
| orders, but it doesn't constrain the join implementation method at each |
| join (nestloop, merge, hash), nor does it say which rel is considered outer |
| or inner at each join. We consider all these possibilities in building |
| Paths. We generate a Path for each feasible join method, and select the |
| cheapest Path. |
| |
| For each planning problem, therefore, we will have a list of relations |
| that are either base rels or joinrels constructed per sub-join-lists. |
| We can join these rels together in any order the planner sees fit. |
| The standard (non-GEQO) planner does this as follows: |
| |
| Consider joining each RelOptInfo to each other RelOptInfo for which there |
| is a usable joinclause, and generate a Path for each possible join method |
| for each such pair. (If we have a RelOptInfo with no join clauses, we have |
| no choice but to generate a clauseless Cartesian-product join; so we |
| consider joining that rel to each other available rel. But in the presence |
| of join clauses we will only consider joins that use available join |
| clauses. Note that join-order restrictions induced by outer joins and |
| IN/EXISTS clauses are also checked, to ensure that we find a workable join |
| order in cases where those restrictions force a clauseless join to be done.) |
| |
| If we only had two relations in the list, we are done: we just pick |
| the cheapest path for the join RelOptInfo. If we had more than two, we now |
| need to consider ways of joining join RelOptInfos to each other to make |
| join RelOptInfos that represent more than two list items. |
| |
| The join tree is constructed using a "dynamic programming" algorithm: |
| in the first pass (already described) we consider ways to create join rels |
| representing exactly two list items. The second pass considers ways |
| to make join rels that represent exactly three list items; the next pass, |
| four items, etc. The last pass considers how to make the final join |
| relation that includes all list items --- obviously there can be only one |
| join rel at this top level, whereas there can be more than one join rel |
| at lower levels. At each level we use joins that follow available join |
| clauses, if possible, just as described for the first level. |
| |
| For example: |
| |
| SELECT * |
| FROM tab1, tab2, tab3, tab4 |
| WHERE tab1.col = tab2.col AND |
| tab2.col = tab3.col AND |
| tab3.col = tab4.col |
| |
| Tables 1, 2, 3, and 4 are joined as: |
| {1 2},{2 3},{3 4} |
| {1 2 3},{2 3 4} |
| {1 2 3 4} |
| (other possibilities will be excluded for lack of join clauses) |
| |
| SELECT * |
| FROM tab1, tab2, tab3, tab4 |
| WHERE tab1.col = tab2.col AND |
| tab1.col = tab3.col AND |
| tab1.col = tab4.col |
| |
| Tables 1, 2, 3, and 4 are joined as: |
| {1 2},{1 3},{1 4} |
| {1 2 3},{1 3 4},{1 2 4} |
| {1 2 3 4} |
| |
| We consider left-handed plans (the outer rel of an upper join is a joinrel, |
| but the inner is always a single list item); right-handed plans (outer rel |
| is always a single item); and bushy plans (both inner and outer can be |
| joins themselves). For example, when building {1 2 3 4} we consider |
| joining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and |
| {1 2} to {3 4} (bushy), among other choices. Although the jointree |
| scanning code produces these potential join combinations one at a time, |
| all the ways to produce the same set of joined base rels will share the |
| same RelOptInfo, so the paths produced from different join combinations |
| that produce equivalent joinrels will compete in add_path(). |
| |
| Once we have built the final join rel, we use either the cheapest path |
| for it or the cheapest path with the desired ordering (if that's cheaper |
| than applying a sort to the cheapest other path). |
| |
| If the query contains one-sided outer joins (LEFT or RIGHT joins), or |
| IN or EXISTS WHERE clauses that were converted to joins, then some of |
| the possible join orders may be illegal. These are excluded by having |
| join_is_legal consult a side list of such "special" joins to see |
| whether a proposed join is illegal. (The same consultation allows it |
| to see which join style should be applied for a valid join, ie, |
| JOIN_INNER, JOIN_LEFT, etc.) |
| |
| |
| Valid OUTER JOIN Optimizations |
| ------------------------------ |
| |
| The planner's treatment of outer join reordering is based on the following |
| identities: |
| |
| 1. (A leftjoin B on (Pab)) innerjoin C on (Pac) |
| = (A innerjoin C on (Pac)) leftjoin B on (Pab) |
| |
| where Pac is a predicate referencing A and C, etc (in this case, clearly |
| Pac cannot reference B, or the transformation is nonsensical). |
| |
| 2. (A leftjoin B on (Pab)) leftjoin C on (Pac) |
| = (A leftjoin C on (Pac)) leftjoin B on (Pab) |
| |
| 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc) |
| = A leftjoin (B leftjoin C on (Pbc)) on (Pab) |
| |
| Identity 3 only holds if predicate Pbc must fail for all-null B rows |
| (that is, Pbc is strict for at least one column of B). If Pbc is not |
| strict, the first form might produce some rows with nonnull C columns |
| where the second form would make those entries null. |
| |
| RIGHT JOIN is equivalent to LEFT JOIN after switching the two input |
| tables, so the same identities work for right joins. |
| |
| An example of a case that does *not* work is moving an innerjoin into or |
| out of the nullable side of an outer join: |
| |
| A leftjoin (B join C on (Pbc)) on (Pab) |
| != (A leftjoin B on (Pab)) join C on (Pbc) |
| |
| SEMI joins work a little bit differently. A semijoin can be reassociated |
| into or out of the lefthand side of another semijoin, left join, or |
| antijoin, but not into or out of the righthand side. Likewise, an inner |
| join, left join, or antijoin can be reassociated into or out of the |
| lefthand side of a semijoin, but not into or out of the righthand side. |
| |
| ANTI joins work approximately like LEFT joins, except that identity 3 |
| fails if the join to C is an antijoin (even if Pbc is strict, and in |
| both the cases where the other join is a leftjoin and where it is an |
| antijoin). So we can't reorder antijoins into or out of the RHS of a |
| leftjoin or antijoin, even if the relevant clause is strict. |
| |
| The current code does not attempt to re-order FULL JOINs at all. |
| FULL JOIN ordering is enforced by not collapsing FULL JOIN nodes when |
| translating the jointree to "joinlist" representation. Other types of |
| JOIN nodes are normally collapsed so that they participate fully in the |
| join order search. To avoid generating illegal join orders, the planner |
| creates a SpecialJoinInfo node for each non-inner join, and join_is_legal |
| checks this list to decide if a proposed join is legal. |
| |
| What we store in SpecialJoinInfo nodes are the minimum sets of Relids |
| required on each side of the join to form the outer join. Note that |
| these are minimums; there's no explicit maximum, since joining other |
| rels to the OJ's syntactic rels may be legal. Per identities 1 and 2, |
| non-FULL joins can be freely associated into the lefthand side of an |
| OJ, but in some cases they can't be associated into the righthand side. |
| So the restriction enforced by join_is_legal is that a proposed join |
| can't join a rel within or partly within an RHS boundary to one outside |
| the boundary, unless the join validly implements some outer join. |
| (To support use of identity 3, we have to allow cases where an apparent |
| violation of a lower OJ's RHS is committed while forming an upper OJ. |
| If this wouldn't in fact be legal, the upper OJ's minimum LHS or RHS |
| set must be expanded to include the whole of the lower OJ, thereby |
| preventing it from being formed before the lower OJ is.) |
| |
| |
| Pulling Up Subqueries |
| --------------------- |
| |
| As we described above, a subquery appearing in the range table is planned |
| independently and treated as a "black box" during planning of the outer |
| query. This is necessary when the subquery uses features such as |
| aggregates, GROUP, or DISTINCT. But if the subquery is just a simple |
| scan or join, treating the subquery as a black box may produce a poor plan |
| compared to considering it as part of the entire plan search space. |
| Therefore, at the start of the planning process the planner looks for |
| simple subqueries and pulls them up into the main query's jointree. |
| |
| Pulling up a subquery may result in FROM-list joins appearing below the top |
| of the join tree. Each FROM-list is planned using the dynamic-programming |
| search method described above. |
| |
| If pulling up a subquery produces a FROM-list as a direct child of another |
| FROM-list, then we can merge the two FROM-lists together. Once that's |
| done, the subquery is an absolutely integral part of the outer query and |
| will not constrain the join tree search space at all. However, that could |
| result in unpleasant growth of planning time, since the dynamic-programming |
| search has runtime exponential in the number of FROM-items considered. |
| Therefore, we don't merge FROM-lists if the result would have too many |
| FROM-items in one list. |
| |
| |
| Optimizer Functions |
| ------------------- |
| |
| The primary entry point is planner(). |
| |
| planner() |
| set up for recursive handling of subqueries |
| do final cleanup after planning |
| -subquery_planner() |
| pull up sublinks and subqueries from rangetable, if possible |
| canonicalize qual |
| Attempt to simplify WHERE clause to the most useful form; this includes |
| flattening nested AND/ORs and detecting clauses that are duplicated in |
| different branches of an OR. |
| simplify constant expressions |
| process sublinks |
| convert Vars of outer query levels into Params |
| --grouping_planner() |
| preprocess target list for non-SELECT queries |
| handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates, |
| ORDER BY, DISTINCT, LIMIT |
| --query_planner() |
| pull out constant quals, which can be used to gate execution of the |
| whole plan (if any are found, we make a top-level Result node |
| to do the gating) |
| make list of base relations used in query |
| split up the qual into restrictions (a=1) and joins (b=c) |
| find qual clauses that enable merge and hash joins |
| ----make_one_rel() |
| set_base_rel_pathlist() |
| find seqscan and all index paths for each base relation |
| find selectivity of columns used in joins |
| make_rel_from_joinlist() |
| hand off join subproblems to a plugin, GEQO, or standard_join_search() |
| -----standard_join_search() |
| call join_search_one_level() for each level of join tree needed |
| join_search_one_level(): |
| For each joinrel of the prior level, do make_rels_by_clause_joins() |
| if it has join clauses, or make_rels_by_clauseless_joins() if not. |
| Also generate "bushy plan" joins between joinrels of lower levels. |
| Back at standard_join_search(), apply set_cheapest() to extract the |
| cheapest path for each newly constructed joinrel. |
| Loop back if this wasn't the top join level. |
| Back at query_planner: |
| put back any constant quals by adding a Result node |
| Back at grouping_planner: |
| do grouping(GROUP) |
| do aggregates |
| make unique(DISTINCT) |
| make sort(ORDER BY) |
| make limit(LIMIT/OFFSET) |
| |
| |
| Optimizer Data Structures |
| ------------------------- |
| |
| PlannerGlobal - global information for a single planner invocation |
| |
| PlannerInfo - information for planning a particular Query (we make |
| a separate PlannerInfo node for each sub-Query) |
| |
| RelOptInfo - a relation or joined relations |
| |
| RestrictInfo - WHERE clauses, like "x = 3" or "y = z" |
| (note the same structure is used for restriction and |
| join clauses) |
| |
| Path - every way to generate a RelOptInfo(sequential,index,joins) |
| SeqScan - a plain Path node with pathtype = T_SeqScan |
| IndexPath - index scans |
| BitmapHeapPath - top of a bitmapped index scan |
| TidPath - scan by CTID |
| AppendPath - append multiple subpaths together |
| ResultPath - a Result plan node (used for FROM-less SELECT) |
| MaterialPath - a Material plan node |
| UniquePath - remove duplicate rows |
| NestPath - nested-loop joins |
| MergePath - merge joins |
| HashPath - hash joins |
| |
| EquivalenceClass - a data structure representing a set of values known equal |
| |
| PathKey - a data structure representing the sort ordering of a path |
| |
| The optimizer spends a good deal of its time worrying about the ordering |
| of the tuples returned by a path. The reason this is useful is that by |
| knowing the sort ordering of a path, we may be able to use that path as |
| the left or right input of a mergejoin and avoid an explicit sort step. |
| Nestloops and hash joins don't really care what the order of their inputs |
| is, but mergejoin needs suitably ordered inputs. Therefore, all paths |
| generated during the optimization process are marked with their sort order |
| (to the extent that it is known) for possible use by a higher-level merge. |
| |
| It is also possible to avoid an explicit sort step to implement a user's |
| ORDER BY clause if the final path has the right ordering already, so the |
| sort ordering is of interest even at the top level. query_planner() will |
| look for the cheapest path with a sort order matching the desired order, |
| and grouping_planner() will compare its cost to the cost of using the |
| cheapest-overall path and doing an explicit sort. |
| |
| When we are generating paths for a particular RelOptInfo, we discard a path |
| if it is more expensive than another known path that has the same or better |
| sort order. We will never discard a path that is the only known way to |
| achieve a given sort order (without an explicit sort, that is). In this |
| way, the next level up will have the maximum freedom to build mergejoins |
| without sorting, since it can pick from any of the paths retained for its |
| inputs. |
| |
| |
| EquivalenceClasses |
| ------------------ |
| |
| During the deconstruct_jointree() scan of the query's qual clauses, we look |
| for mergejoinable equality clauses A = B whose applicability is not delayed |
| by an outer join; these are called "equivalence clauses". When we find |
| one, we create an EquivalenceClass containing the expressions A and B to |
| record this knowledge. If we later find another equivalence clause B = C, |
| we add C to the existing EquivalenceClass for {A B}; this may require |
| merging two existing EquivalenceClasses. At the end of the scan, we have |
| sets of values that are known all transitively equal to each other. We can |
| therefore use a comparison of any pair of the values as a restriction or |
| join clause (when these values are available at the scan or join, of |
| course); furthermore, we need test only one such comparison, not all of |
| them. Therefore, equivalence clauses are removed from the standard qual |
| distribution process. Instead, when preparing a restriction or join clause |
| list, we examine each EquivalenceClass to see if it can contribute a |
| clause, and if so we select an appropriate pair of values to compare. For |
| example, if we are trying to join A's relation to C's, we can generate the |
| clause A = C, even though this appeared nowhere explicitly in the original |
| query. This may allow us to explore join paths that otherwise would have |
| been rejected as requiring Cartesian-product joins. |
| |
| Sometimes an EquivalenceClass may contain a pseudo-constant expression |
| (i.e., one not containing Vars or Aggs of the current query level, nor |
| volatile functions). In this case we do not follow the policy of |
| dynamically generating join clauses: instead, we dynamically generate |
| restriction clauses "var = const" wherever one of the variable members of |
| the class can first be computed. For example, if we have A = B and B = 42, |
| we effectively generate the restriction clauses A = 42 and B = 42, and then |
| we need not bother with explicitly testing the join clause A = B when the |
| relations are joined. In effect, all the class members can be tested at |
| relation-scan level and there's never a need for join tests. |
| |
| The precise technical interpretation of an EquivalenceClass is that it |
| asserts that at any plan node where more than one of its member values |
| can be computed, output rows in which the values are not all equal may |
| be discarded without affecting the query result. (We require all levels |
| of the plan to enforce EquivalenceClasses, hence a join need not recheck |
| equality of values that were computable by one of its children.) For an |
| ordinary EquivalenceClass that is "valid everywhere", we can further infer |
| that the values are all non-null, because all mergejoinable operators are |
| strict. However, we also allow equivalence clauses that appear below the |
| nullable side of an outer join to form EquivalenceClasses; for these |
| classes, the interpretation is that either all the values are equal, or |
| all (except pseudo-constants) have gone to null. (This requires a |
| limitation that non-constant members be strict, else they might not go |
| to null when the other members do.) Consider for example |
| |
| SELECT * |
| FROM a LEFT JOIN |
| (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss |
| ON a.x = ss.y |
| WHERE a.x = 42; |
| |
| We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby |
| apply c.z = 10 while scanning c. (The reason we disallow outerjoin-delayed |
| clauses from forming EquivalenceClasses is exactly that we want to be able |
| to push any derived clauses as far down as possible.) But once above the |
| outer join it's no longer necessarily the case that b.y = 10, and thus we |
| cannot use such EquivalenceClasses to conclude that sorting is unnecessary |
| (see discussion of PathKeys below). |
| |
| In this example, notice also that a.x = ss.y (really a.x = b.y) is not an |
| equivalence clause because its applicability to b is delayed by the outer |
| join; thus we do not try to insert b.y into the equivalence class {a.x 42}. |
| But since we see that a.x has been equated to 42 above the outer join, we |
| are able to form a below-outer-join class {b.y 42}; this restriction can be |
| added because no b/c row not having b.y = 42 can contribute to the result |
| of the outer join, and so we need not compute such rows. Now this class |
| will get merged with {b.y c.z 10}, leading to the contradiction 10 = 42, |
| which lets the planner deduce that the b/c join need not be computed at all |
| because none of its rows can contribute to the outer join. (This gets |
| implemented as a gating Result filter, since more usually the potential |
| contradiction involves Param values rather than just Consts, and thus has |
| to be checked at runtime.) |
| |
| To aid in determining the sort ordering(s) that can work with a mergejoin, |
| we mark each mergejoinable clause with the EquivalenceClasses of its left |
| and right inputs. For an equivalence clause, these are of course the same |
| EquivalenceClass. For a non-equivalence mergejoinable clause (such as an |
| outer-join qualification), we generate two separate EquivalenceClasses for |
| the left and right inputs. This may result in creating single-item |
| equivalence "classes", though of course these are still subject to merging |
| if other equivalence clauses are later found to bear on the same |
| expressions. |
| |
| Another way that we may form a single-item EquivalenceClass is in creation |
| of a PathKey to represent a desired sort order (see below). This is a bit |
| different from the above cases because such an EquivalenceClass might |
| contain an aggregate function or volatile expression. (A clause containing |
| a volatile function will never be considered mergejoinable, even if its top |
| operator is mergejoinable, so there is no way for a volatile expression to |
| get into EquivalenceClasses otherwise. Aggregates are disallowed in WHERE |
| altogether, so will never be found in a mergejoinable clause.) This is just |
| a convenience to maintain a uniform PathKey representation: such an |
| EquivalenceClass will never be merged with any other. Note in particular |
| that a single-item EquivalenceClass {a.x} is *not* meant to imply an |
| assertion that a.x = a.x; the practical effect of this is that a.x could |
| be NULL. |
| |
| An EquivalenceClass also contains a list of btree opfamily OIDs, which |
| determines what the equalities it represents actually "mean". All the |
| equivalence clauses that contribute to an EquivalenceClass must have |
| equality operators that belong to the same set of opfamilies. (Note: most |
| of the time, a particular equality operator belongs to only one family, but |
| it's possible that it belongs to more than one. We keep track of all the |
| families to ensure that we can make use of an index belonging to any one of |
| the families for mergejoin purposes.) |
| |
| |
| PathKeys |
| -------- |
| |
| The PathKeys data structure represents what is known about the sort order |
| of the tuples generated by a particular Path. A path's pathkeys field is a |
| list of PathKey nodes, where the n'th item represents the n'th sort key of |
| the result. Each PathKey contains these fields: |
| |
| * a reference to an EquivalenceClass |
| * a btree opfamily OID (must match one of those in the EC) |
| * a sort direction (ascending or descending) |
| * a nulls-first-or-last flag |
| |
| The EquivalenceClass represents the value being sorted on. Since the |
| various members of an EquivalenceClass are known equal according to the |
| opfamily, we can consider a path sorted by any one of them to be sorted by |
| any other too; this is what justifies referencing the whole |
| EquivalenceClass rather than just one member of it. |
| |
| In single/base relation RelOptInfo's, the Paths represent various ways |
| of scanning the relation and the resulting ordering of the tuples. |
| Sequential scan Paths have NIL pathkeys, indicating no known ordering. |
| Index scans have Path.pathkeys that represent the chosen index's ordering, |
| if any. A single-key index would create a single-PathKey list, while a |
| multi-column index generates a list with one element per index column. |
| (Actually, since an index can be scanned either forward or backward, there |
| are two possible sort orders and two possible PathKey lists it can |
| generate.) |
| |
| Note that a bitmap scan or multi-pass indexscan (OR clause scan) has NIL |
| pathkeys since we can say nothing about the overall order of its result. |
| Also, an indexscan on an unordered type of index generates NIL pathkeys. |
| However, we can always create a pathkey by doing an explicit sort. The |
| pathkeys for a Sort plan's output just represent the sort key fields and |
| the ordering operators used. |
| |
| Things get more interesting when we consider joins. Suppose we do a |
| mergejoin between A and B using the mergeclause A.X = B.Y. The output |
| of the mergejoin is sorted by X --- but it is also sorted by Y. Again, |
| this can be represented by a PathKey referencing an EquivalenceClass |
| containing both X and Y. |
| |
| With a little further thought, it becomes apparent that nestloop joins |
| can also produce sorted output. For example, if we do a nestloop join |
| between outer relation A and inner relation B, then any pathkeys relevant |
| to A are still valid for the join result: we have not altered the order of |
| the tuples from A. Even more interesting, if there was an equivalence clause |
| A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert |
| that B.Y is a pathkey for the join result; X was ordered before and still |
| is, and the joined values of Y are equal to the joined values of X, so Y |
| must now be ordered too. This is true even though we used neither an |
| explicit sort nor a mergejoin on Y. (Note: hash joins cannot be counted |
| on to preserve the order of their outer relation, because the executor |
| might decide to "batch" the join, so we always set pathkeys to NIL for |
| a hashjoin path.) Exception: a RIGHT or FULL join doesn't preserve the |
| ordering of its outer relation, because it might insert nulls at random |
| points in the ordering. |
| |
| In general, we can justify using EquivalenceClasses as the basis for |
| pathkeys because, whenever we scan a relation containing multiple |
| EquivalenceClass members or join two relations each containing |
| EquivalenceClass members, we apply restriction or join clauses derived from |
| the EquivalenceClass. This guarantees that any two values listed in the |
| EquivalenceClass are in fact equal in all tuples emitted by the scan or |
| join, and therefore that if the tuples are sorted by one of the values, |
| they can be considered sorted by any other as well. It does not matter |
| whether the test clause is used as a mergeclause, or merely enforced |
| after-the-fact as a qpqual filter. |
| |
| Note that there is no particular difficulty in labeling a path's sort |
| order with a PathKey referencing an EquivalenceClass that contains |
| variables not yet joined into the path's output. We can simply ignore |
| such entries as not being relevant (yet). This makes it possible to |
| use the same EquivalenceClasses throughout the join planning process. |
| In fact, by being careful not to generate multiple identical PathKey |
| objects, we can reduce comparison of EquivalenceClasses and PathKeys |
| to simple pointer comparison, which is a huge savings because add_path |
| has to make a large number of PathKey comparisons in deciding whether |
| competing Paths are equivalently sorted. |
| |
| Pathkeys are also useful to represent an ordering that we wish to achieve, |
| since they are easily compared to the pathkeys of a potential candidate |
| path. So, SortGroupClause lists are turned into pathkeys lists for use |
| inside the optimizer. |
| |
| Because we have to generate pathkeys lists from the sort clauses before |
| we've finished EquivalenceClass merging, we cannot use the pointer-equality |
| method of comparing PathKeys in the earliest stages of the planning |
| process. Instead, we generate "non canonical" PathKeys that reference |
| single-element EquivalenceClasses that might get merged later. After we |
| complete EquivalenceClass merging, we replace these with "canonical" |
| PathKeys that reference only fully-merged classes, and after that we make |
| sure we don't generate more than one copy of each "canonical" PathKey. |
| Then it is safe to use pointer comparison on canonical PathKeys. |
| |
| An additional refinement we can make is to insist that canonical pathkey |
| lists (sort orderings) do not mention the same EquivalenceClass more than |
| once. For example, in all these cases the second sort column is redundant, |
| because it cannot distinguish values that are the same according to the |
| first sort column: |
| SELECT ... ORDER BY x, x |
| SELECT ... ORDER BY x, x DESC |
| SELECT ... WHERE x = y ORDER BY x, y |
| Although a user probably wouldn't write "ORDER BY x,x" directly, such |
| redundancies are more probable once equivalence classes have been |
| considered. Also, the system may generate redundant pathkey lists when |
| computing the sort ordering needed for a mergejoin. By eliminating the |
| redundancy, we save time and improve planning, since the planner will more |
| easily recognize equivalent orderings as being equivalent. |
| |
| Another interesting property is that if the underlying EquivalenceClass |
| contains a constant and is not below an outer join, then the pathkey is |
| completely redundant and need not be sorted by at all! Every row must |
| contain the same constant value, so there's no need to sort. (If the EC is |
| below an outer join, we still have to sort, since some of the rows might |
| have gone to null and others not. In this case we must be careful to pick |
| a non-const member to sort by. The assumption that all the non-const |
| members go to null at the same plan level is critical here, else they might |
| not produce the same sort order.) This might seem pointless because users |
| are unlikely to write "... WHERE x = 42 ORDER BY x", but it allows us to |
| recognize when particular index columns are irrelevant to the sort order: |
| if we have "... WHERE x = 42 ORDER BY y", scanning an index on (x,y) |
| produces correctly ordered data without a sort step. We used to have very |
| ugly ad-hoc code to recognize that in limited contexts, but discarding |
| constant ECs from pathkeys makes it happen cleanly and automatically. |
| |
| You might object that a below-outer-join EquivalenceClass doesn't always |
| represent the same values at every level of the join tree, and so using |
| it to uniquely identify a sort order is dubious. This is true, but we |
| can avoid dealing with the fact explicitly because we always consider that |
| an outer join destroys any ordering of its nullable inputs. Thus, even |
| if a path was sorted by {a.x} below an outer join, we'll re-sort if that |
| sort ordering was important; and so using the same PathKey for both sort |
| orderings doesn't create any real problem. |
| |
| |
| |
| Though Bob Devine <bob.devine@worldnet.att.net> was not involved in the |
| coding of our optimizer, he is available to field questions about |
| optimizer topics. |
| |
| -- bjm & tgl |
| |