| <!-- doc/src/sgml/xoper.sgml --> |
| |
| <sect1 id="xoper"> |
| <title>User-Defined Operators</title> |
| |
| <indexterm zone="xoper"> |
| <primary>operator</primary> |
| <secondary>user-defined</secondary> |
| </indexterm> |
| |
| <para> |
| Every operator is <quote>syntactic sugar</quote> for a call to an |
| underlying function that does the real work; so you must |
| first create the underlying function before you can create |
| the operator. However, an operator is <emphasis>not merely</emphasis> |
| syntactic sugar, because it carries additional information |
| that helps the query planner optimize queries that use the |
| operator. The next section will be devoted to explaining |
| that additional information. |
| </para> |
| |
| <para> |
| <productname>PostgreSQL</productname> supports prefix |
| and infix operators. Operators can be |
| overloaded;<indexterm><primary>overloading</primary><secondary>operators</secondary></indexterm> |
| that is, the same operator name can be used for different operators |
| that have different numbers and types of operands. When a query is |
| executed, the system determines the operator to call from the |
| number and types of the provided operands. |
| </para> |
| |
| <para> |
| Here is an example of creating an operator for adding two complex |
| numbers. We assume we've already created the definition of type |
| <type>complex</type> (see <xref linkend="xtypes"/>). First we need a |
| function that does the work, then we can define the operator: |
| |
| <programlisting> |
| CREATE FUNCTION complex_add(complex, complex) |
| RETURNS complex |
| AS '<replaceable>filename</replaceable>', 'complex_add' |
| LANGUAGE C IMMUTABLE STRICT; |
| |
| CREATE OPERATOR + ( |
| leftarg = complex, |
| rightarg = complex, |
| function = complex_add, |
| commutator = + |
| ); |
| </programlisting> |
| </para> |
| |
| <para> |
| Now we could execute a query like this: |
| |
| <screen> |
| SELECT (a + b) AS c FROM test_complex; |
| |
| c |
| ----------------- |
| (5.2,6.05) |
| (133.42,144.95) |
| </screen> |
| </para> |
| |
| <para> |
| We've shown how to create a binary operator here. To create a prefix |
| operator, just omit the <literal>leftarg</literal>. |
| The <literal>function</literal> |
| clause and the argument clauses are the only required items in |
| <command>CREATE OPERATOR</command>. The <literal>commutator</literal> |
| clause shown in the example is an optional hint to the query |
| optimizer. Further details about <literal>commutator</literal> and other |
| optimizer hints appear in the next section. |
| </para> |
| </sect1> |
| |
| <sect1 id="xoper-optimization"> |
| <title>Operator Optimization Information</title> |
| |
| <indexterm zone="xoper-optimization"> |
| <primary>optimization information</primary> |
| <secondary>for operators</secondary> |
| </indexterm> |
| |
| <para> |
| A <productname>PostgreSQL</productname> operator definition can include |
| several optional clauses that tell the system useful things about how |
| the operator behaves. These clauses should be provided whenever |
| appropriate, because they can make for considerable speedups in execution |
| of queries that use the operator. But if you provide them, you must be |
| sure that they are right! Incorrect use of an optimization clause can |
| result in slow queries, subtly wrong output, or other Bad Things. |
| You can always leave out an optimization clause if you are not sure |
| about it; the only consequence is that queries might run slower than |
| they need to. |
| </para> |
| |
| <para> |
| Additional optimization clauses might be added in future versions of |
| <productname>PostgreSQL</productname>. The ones described here are all |
| the ones that release &version; understands. |
| </para> |
| |
| <para> |
| It is also possible to attach a planner support function to the function |
| that underlies an operator, providing another way of telling the system |
| about the behavior of the operator. |
| See <xref linkend="xfunc-optimization"/> for more information. |
| </para> |
| |
| <sect2> |
| <title><literal>COMMUTATOR</literal></title> |
| |
| <para> |
| The <literal>COMMUTATOR</literal> clause, if provided, names an operator that is the |
| commutator of the operator being defined. We say that operator A is the |
| commutator of operator B if (x A y) equals (y B x) for all possible input |
| values x, y. Notice that B is also the commutator of A. For example, |
| operators <literal><</literal> and <literal>></literal> for a particular data type are usually each others' |
| commutators, and operator <literal>+</literal> is usually commutative with itself. |
| But operator <literal>-</literal> is usually not commutative with anything. |
| </para> |
| |
| <para> |
| The left operand type of a commutable operator is the same as the |
| right operand type of its commutator, and vice versa. So the name of |
| the commutator operator is all that <productname>PostgreSQL</productname> |
| needs to be given to look up the commutator, and that's all that needs to |
| be provided in the <literal>COMMUTATOR</literal> clause. |
| </para> |
| |
| <para> |
| It's critical to provide commutator information for operators that |
| will be used in indexes and join clauses, because this allows the |
| query optimizer to <quote>flip around</quote> such a clause to the forms |
| needed for different plan types. For example, consider a query with |
| a WHERE clause like <literal>tab1.x = tab2.y</literal>, where <literal>tab1.x</literal> |
| and <literal>tab2.y</literal> are of a user-defined type, and suppose that |
| <literal>tab2.y</literal> is indexed. The optimizer cannot generate an |
| index scan unless it can determine how to flip the clause around to |
| <literal>tab2.y = tab1.x</literal>, because the index-scan machinery expects |
| to see the indexed column on the left of the operator it is given. |
| <productname>PostgreSQL</productname> will <emphasis>not</emphasis> simply |
| assume that this is a valid transformation — the creator of the |
| <literal>=</literal> operator must specify that it is valid, by marking the |
| operator with commutator information. |
| </para> |
| |
| <para> |
| When you are defining a self-commutative operator, you just do it. |
| When you are defining a pair of commutative operators, things are |
| a little trickier: how can the first one to be defined refer to the |
| other one, which you haven't defined yet? There are two solutions |
| to this problem: |
| |
| <itemizedlist> |
| <listitem> |
| <para> |
| One way is to omit the <literal>COMMUTATOR</literal> clause in the first operator that |
| you define, and then provide one in the second operator's definition. |
| Since <productname>PostgreSQL</productname> knows that commutative |
| operators come in pairs, when it sees the second definition it will |
| automatically go back and fill in the missing <literal>COMMUTATOR</literal> clause in |
| the first definition. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| The other, more straightforward way is just to include <literal>COMMUTATOR</literal> clauses |
| in both definitions. When <productname>PostgreSQL</productname> processes |
| the first definition and realizes that <literal>COMMUTATOR</literal> refers to a nonexistent |
| operator, the system will make a dummy entry for that operator in the |
| system catalog. This dummy entry will have valid data only |
| for the operator name, left and right operand types, and result type, |
| since that's all that <productname>PostgreSQL</productname> can deduce |
| at this point. The first operator's catalog entry will link to this |
| dummy entry. Later, when you define the second operator, the system |
| updates the dummy entry with the additional information from the second |
| definition. If you try to use the dummy operator before it's been filled |
| in, you'll just get an error message. |
| </para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title><literal>NEGATOR</literal></title> |
| |
| <para> |
| The <literal>NEGATOR</literal> clause, if provided, names an operator that is the |
| negator of the operator being defined. We say that operator A |
| is the negator of operator B if both return Boolean results and |
| (x A y) equals NOT (x B y) for all possible inputs x, y. |
| Notice that B is also the negator of A. |
| For example, <literal><</literal> and <literal>>=</literal> are a negator pair for most data types. |
| An operator can never validly be its own negator. |
| </para> |
| |
| <para> |
| Unlike commutators, a pair of unary operators could validly be marked |
| as each other's negators; that would mean (A x) equals NOT (B x) |
| for all x. |
| </para> |
| |
| <para> |
| An operator's negator must have the same left and/or right operand types |
| as the operator to be defined, so just as with <literal>COMMUTATOR</literal>, only the operator |
| name need be given in the <literal>NEGATOR</literal> clause. |
| </para> |
| |
| <para> |
| Providing a negator is very helpful to the query optimizer since |
| it allows expressions like <literal>NOT (x = y)</literal> to be simplified into |
| <literal>x <> y</literal>. This comes up more often than you might think, because |
| <literal>NOT</literal> operations can be inserted as a consequence of other rearrangements. |
| </para> |
| |
| <para> |
| Pairs of negator operators can be defined using the same methods |
| explained above for commutator pairs. |
| </para> |
| |
| </sect2> |
| |
| <sect2> |
| <title><literal>RESTRICT</literal></title> |
| |
| <para> |
| The <literal>RESTRICT</literal> clause, if provided, names a restriction selectivity |
| estimation function for the operator. (Note that this is a function |
| name, not an operator name.) <literal>RESTRICT</literal> clauses only make sense for |
| binary operators that return <type>boolean</type>. The idea behind a restriction |
| selectivity estimator is to guess what fraction of the rows in a |
| table will satisfy a <literal>WHERE</literal>-clause condition of the form: |
| <programlisting> |
| column OP constant |
| </programlisting> |
| for the current operator and a particular constant value. |
| This assists the optimizer by |
| giving it some idea of how many rows will be eliminated by <literal>WHERE</literal> |
| clauses that have this form. (What happens if the constant is on |
| the left, you might be wondering? Well, that's one of the things that |
| <literal>COMMUTATOR</literal> is for...) |
| </para> |
| |
| <para> |
| Writing new restriction selectivity estimation functions is far beyond |
| the scope of this chapter, but fortunately you can usually just use |
| one of the system's standard estimators for many of your own operators. |
| These are the standard restriction estimators: |
| <simplelist> |
| <member><function>eqsel</function> for <literal>=</literal></member> |
| <member><function>neqsel</function> for <literal><></literal></member> |
| <member><function>scalarltsel</function> for <literal><</literal></member> |
| <member><function>scalarlesel</function> for <literal><=</literal></member> |
| <member><function>scalargtsel</function> for <literal>></literal></member> |
| <member><function>scalargesel</function> for <literal>>=</literal></member> |
| </simplelist> |
| </para> |
| |
| <para> |
| You can frequently get away with using either <function>eqsel</function> or <function>neqsel</function> for |
| operators that have very high or very low selectivity, even if they |
| aren't really equality or inequality. For example, the |
| approximate-equality geometric operators use <function>eqsel</function> on the assumption that |
| they'll usually only match a small fraction of the entries in a table. |
| </para> |
| |
| <para> |
| You can use <function>scalarltsel</function>, <function>scalarlesel</function>, |
| <function>scalargtsel</function> and <function>scalargesel</function> for comparisons on |
| data types that have some sensible means of being converted into numeric |
| scalars for range comparisons. If possible, add the data type to those |
| understood by the function <function>convert_to_scalar()</function> in |
| <filename>src/backend/utils/adt/selfuncs.c</filename>. |
| (Eventually, this function should be replaced by per-data-type functions |
| identified through a column of the <classname>pg_type</classname> system catalog; but that hasn't happened |
| yet.) If you do not do this, things will still work, but the optimizer's |
| estimates won't be as good as they could be. |
| </para> |
| |
| <para> |
| Another useful built-in selectivity estimation function |
| is <function>matchingsel</function>, which will work for almost any |
| binary operator, if standard MCV and/or histogram statistics are |
| collected for the input data type(s). Its default estimate is set to |
| twice the default estimate used in <function>eqsel</function>, making |
| it most suitable for comparison operators that are somewhat less |
| strict than equality. (Or you could call the |
| underlying <function>generic_restriction_selectivity</function> |
| function, providing a different default estimate.) |
| </para> |
| |
| <para> |
| There are additional selectivity estimation functions designed for geometric |
| operators in <filename>src/backend/utils/adt/geo_selfuncs.c</filename>: <function>areasel</function>, <function>positionsel</function>, |
| and <function>contsel</function>. At this writing these are just stubs, but you might want |
| to use them (or even better, improve them) anyway. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title><literal>JOIN</literal></title> |
| |
| <para> |
| The <literal>JOIN</literal> clause, if provided, names a join selectivity |
| estimation function for the operator. (Note that this is a function |
| name, not an operator name.) <literal>JOIN</literal> clauses only make sense for |
| binary operators that return <type>boolean</type>. The idea behind a join |
| selectivity estimator is to guess what fraction of the rows in a |
| pair of tables will satisfy a <literal>WHERE</literal>-clause condition of the form: |
| <programlisting> |
| table1.column1 OP table2.column2 |
| </programlisting> |
| for the current operator. As with the <literal>RESTRICT</literal> clause, this helps |
| the optimizer very substantially by letting it figure out which |
| of several possible join sequences is likely to take the least work. |
| </para> |
| |
| <para> |
| As before, this chapter will make no attempt to explain how to write |
| a join selectivity estimator function, but will just suggest that |
| you use one of the standard estimators if one is applicable: |
| <simplelist> |
| <member><function>eqjoinsel</function> for <literal>=</literal></member> |
| <member><function>neqjoinsel</function> for <literal><></literal></member> |
| <member><function>scalarltjoinsel</function> for <literal><</literal></member> |
| <member><function>scalarlejoinsel</function> for <literal><=</literal></member> |
| <member><function>scalargtjoinsel</function> for <literal>></literal></member> |
| <member><function>scalargejoinsel</function> for <literal>>=</literal></member> |
| <member><function>matchingjoinsel</function> for generic matching operators</member> |
| <member><function>areajoinsel</function> for 2D area-based comparisons</member> |
| <member><function>positionjoinsel</function> for 2D position-based comparisons</member> |
| <member><function>contjoinsel</function> for 2D containment-based comparisons</member> |
| </simplelist> |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title><literal>HASHES</literal></title> |
| |
| <para> |
| The <literal>HASHES</literal> clause, if present, tells the system that |
| it is permissible to use the hash join method for a join based on this |
| operator. <literal>HASHES</literal> only makes sense for a binary operator that |
| returns <literal>boolean</literal>, and in practice the operator must represent |
| equality for some data type or pair of data types. |
| </para> |
| |
| <para> |
| The assumption underlying hash join is that the join operator can |
| only return true for pairs of left and right values that hash to the |
| same hash code. If two values get put in different hash buckets, the |
| join will never compare them at all, implicitly assuming that the |
| result of the join operator must be false. So it never makes sense |
| to specify <literal>HASHES</literal> for operators that do not represent |
| some form of equality. In most cases it is only practical to support |
| hashing for operators that take the same data type on both sides. |
| However, sometimes it is possible to design compatible hash functions |
| for two or more data types; that is, functions that will generate the |
| same hash codes for <quote>equal</quote> values, even though the values |
| have different representations. For example, it's fairly simple |
| to arrange this property when hashing integers of different widths. |
| </para> |
| |
| <para> |
| To be marked <literal>HASHES</literal>, the join operator must appear |
| in a hash index operator family. This is not enforced when you create |
| the operator, since of course the referencing operator family couldn't |
| exist yet. But attempts to use the operator in hash joins will fail |
| at run time if no such operator family exists. The system needs the |
| operator family to find the data-type-specific hash function(s) for the |
| operator's input data type(s). Of course, you must also create suitable |
| hash functions before you can create the operator family. |
| </para> |
| |
| <para> |
| Care should be exercised when preparing a hash function, because there |
| are machine-dependent ways in which it might fail to do the right thing. |
| For example, if your data type is a structure in which there might be |
| uninteresting pad bits, you cannot simply pass the whole structure to |
| <function>hash_any</function>. (Unless you write your other operators and |
| functions to ensure that the unused bits are always zero, which is the |
| recommended strategy.) |
| Another example is that on machines that meet the <acronym>IEEE</acronym> |
| floating-point standard, negative zero and positive zero are different |
| values (different bit patterns) but they are defined to compare equal. |
| If a float value might contain negative zero then extra steps are needed |
| to ensure it generates the same hash value as positive zero. |
| </para> |
| |
| <para> |
| A hash-joinable operator must have a commutator (itself if the two |
| operand data types are the same, or a related equality operator |
| if they are different) that appears in the same operator family. |
| If this is not the case, planner errors might occur when the operator |
| is used. Also, it is a good idea (but not strictly required) for |
| a hash operator family that supports multiple data types to provide |
| equality operators for every combination of the data types; this |
| allows better optimization. |
| </para> |
| |
| <note> |
| <para> |
| The function underlying a hash-joinable operator must be marked |
| immutable or stable. If it is volatile, the system will never |
| attempt to use the operator for a hash join. |
| </para> |
| </note> |
| |
| <note> |
| <para> |
| If a hash-joinable operator has an underlying function that is marked |
| strict, the |
| function must also be complete: that is, it should return true or |
| false, never null, for any two nonnull inputs. If this rule is |
| not followed, hash-optimization of <literal>IN</literal> operations might |
| generate wrong results. (Specifically, <literal>IN</literal> might return |
| false where the correct answer according to the standard would be null; |
| or it might yield an error complaining that it wasn't prepared for a |
| null result.) |
| </para> |
| </note> |
| |
| </sect2> |
| |
| <sect2> |
| <title><literal>MERGES</literal></title> |
| |
| <para> |
| The <literal>MERGES</literal> clause, if present, tells the system that |
| it is permissible to use the merge-join method for a join based on this |
| operator. <literal>MERGES</literal> only makes sense for a binary operator that |
| returns <literal>boolean</literal>, and in practice the operator must represent |
| equality for some data type or pair of data types. |
| </para> |
| |
| <para> |
| Merge join is based on the idea of sorting the left- and right-hand tables |
| into order and then scanning them in parallel. So, both data types must |
| be capable of being fully ordered, and the join operator must be one |
| that can only succeed for pairs of values that fall at the |
| <quote>same place</quote> |
| in the sort order. In practice this means that the join operator must |
| behave like equality. But it is possible to merge-join two |
| distinct data types so long as they are logically compatible. For |
| example, the <type>smallint</type>-versus-<type>integer</type> |
| equality operator is merge-joinable. |
| We only need sorting operators that will bring both data types into a |
| logically compatible sequence. |
| </para> |
| |
| <para> |
| To be marked <literal>MERGES</literal>, the join operator must appear |
| as an equality member of a <literal>btree</literal> index operator family. |
| This is not enforced when you create |
| the operator, since of course the referencing operator family couldn't |
| exist yet. But the operator will not actually be used for merge joins |
| unless a matching operator family can be found. The |
| <literal>MERGES</literal> flag thus acts as a hint to the planner that |
| it's worth looking for a matching operator family. |
| </para> |
| |
| <para> |
| A merge-joinable operator must have a commutator (itself if the two |
| operand data types are the same, or a related equality operator |
| if they are different) that appears in the same operator family. |
| If this is not the case, planner errors might occur when the operator |
| is used. Also, it is a good idea (but not strictly required) for |
| a <literal>btree</literal> operator family that supports multiple data types to provide |
| equality operators for every combination of the data types; this |
| allows better optimization. |
| </para> |
| |
| <note> |
| <para> |
| The function underlying a merge-joinable operator must be marked |
| immutable or stable. If it is volatile, the system will never |
| attempt to use the operator for a merge join. |
| </para> |
| </note> |
| </sect2> |
| </sect1> |