| <!-- doc/src/sgml/typeconv.sgml --> |
| |
| <chapter id="typeconv"> |
| <title>Type Conversion</title> |
| |
| <indexterm zone="typeconv"> |
| <primary>data type</primary> |
| <secondary>conversion</secondary> |
| </indexterm> |
| |
| <para> |
| <acronym>SQL</acronym> statements can, intentionally or not, require |
| the mixing of different data types in the same expression. |
| <productname>PostgreSQL</productname> has extensive facilities for |
| evaluating mixed-type expressions. |
| </para> |
| |
| <para> |
| In many cases a user does not need |
| to understand the details of the type conversion mechanism. |
| However, implicit conversions done by <productname>PostgreSQL</productname> |
| can affect the results of a query. When necessary, these results |
| can be tailored by using <emphasis>explicit</emphasis> type conversion. |
| </para> |
| |
| <para> |
| This chapter introduces the <productname>PostgreSQL</productname> |
| type conversion mechanisms and conventions. |
| Refer to the relevant sections in <xref linkend="datatype"/> and <xref linkend="functions"/> |
| for more information on specific data types and allowed functions and |
| operators. |
| </para> |
| |
| <sect1 id="typeconv-overview"> |
| <title>Overview</title> |
| |
| <para> |
| <acronym>SQL</acronym> is a strongly typed language. That is, every data item |
| has an associated data type which determines its behavior and allowed usage. |
| <productname>PostgreSQL</productname> has an extensible type system that is |
| more general and flexible than other <acronym>SQL</acronym> implementations. |
| Hence, most type conversion behavior in <productname>PostgreSQL</productname> |
| is governed by general rules rather than by ad hoc |
| heuristics. This allows the use of mixed-type expressions even with |
| user-defined types. |
| </para> |
| |
| <para> |
| The <productname>PostgreSQL</productname> scanner/parser divides lexical |
| elements into five fundamental categories: integers, non-integer numbers, |
| strings, identifiers, and key words. Constants of most non-numeric types are |
| first classified as strings. The <acronym>SQL</acronym> language definition |
| allows specifying type names with strings, and this mechanism can be used in |
| <productname>PostgreSQL</productname> to start the parser down the correct |
| path. For example, the query: |
| |
| <screen> |
| SELECT text 'Origin' AS "label", point '(0,0)' AS "value"; |
| |
| label | value |
| --------+------- |
| Origin | (0,0) |
| (1 row) |
| </screen> |
| |
| has two literal constants, of type <type>text</type> and <type>point</type>. |
| If a type is not specified for a string literal, then the placeholder type |
| <type>unknown</type> is assigned initially, to be resolved in later |
| stages as described below. |
| </para> |
| |
| <para> |
| There are four fundamental <acronym>SQL</acronym> constructs requiring |
| distinct type conversion rules in the <productname>PostgreSQL</productname> |
| parser: |
| |
| <variablelist> |
| <varlistentry> |
| <term> |
| Function calls |
| </term> |
| <listitem> |
| <para> |
| Much of the <productname>PostgreSQL</productname> type system is built around a |
| rich set of functions. Functions can have one or more arguments. |
| Since <productname>PostgreSQL</productname> permits function |
| overloading, the function name alone does not uniquely identify the function |
| to be called; the parser must select the right function based on the data |
| types of the supplied arguments. |
| </para> |
| </listitem> |
| </varlistentry> |
| <varlistentry> |
| <term> |
| Operators |
| </term> |
| <listitem> |
| <para> |
| <productname>PostgreSQL</productname> allows expressions with |
| prefix (one-argument) operators, |
| as well as infix (two-argument) operators. Like functions, operators can |
| be overloaded, so the same problem of selecting the right operator |
| exists. |
| </para> |
| </listitem> |
| </varlistentry> |
| <varlistentry> |
| <term> |
| Value Storage |
| </term> |
| <listitem> |
| <para> |
| <acronym>SQL</acronym> <command>INSERT</command> and <command>UPDATE</command> statements place the results of |
| expressions into a table. The expressions in the statement must be matched up |
| with, and perhaps converted to, the types of the target columns. |
| </para> |
| </listitem> |
| </varlistentry> |
| <varlistentry> |
| <term> |
| <literal>UNION</literal>, <literal>CASE</literal>, and related constructs |
| </term> |
| <listitem> |
| <para> |
| Since all query results from a unionized <command>SELECT</command> statement |
| must appear in a single set of columns, the types of the results of each |
| <command>SELECT</command> clause must be matched up and converted to a uniform set. |
| Similarly, the result expressions of a <literal>CASE</literal> construct must be |
| converted to a common type so that the <literal>CASE</literal> expression as a whole |
| has a known output type. Some other constructs, such |
| as <literal>ARRAY[]</literal> and the <function>GREATEST</function> |
| and <function>LEAST</function> functions, likewise require determination of a |
| common type for several subexpressions. |
| </para> |
| </listitem> |
| </varlistentry> |
| </variablelist> |
| </para> |
| |
| <para> |
| The system catalogs store information about which conversions, or |
| <firstterm>casts</firstterm>, exist between which data types, and how to |
| perform those conversions. Additional casts can be added by the user |
| with the <xref linkend="sql-createcast"/> |
| command. (This is usually |
| done in conjunction with defining new data types. The set of casts |
| between built-in types has been carefully crafted and is best not |
| altered.) |
| </para> |
| |
| <indexterm> |
| <primary>data type</primary> |
| <secondary>category</secondary> |
| </indexterm> |
| |
| <para> |
| An additional heuristic provided by the parser allows improved determination |
| of the proper casting behavior among groups of types that have implicit casts. |
| Data types are divided into several basic <firstterm>type |
| categories</firstterm>, including <type>boolean</type>, <type>numeric</type>, |
| <type>string</type>, <type>bitstring</type>, <type>datetime</type>, |
| <type>timespan</type>, <type>geometric</type>, <type>network</type>, and |
| user-defined. (For a list see <xref linkend="catalog-typcategory-table"/>; |
| but note it is also possible to create custom type categories.) Within each |
| category there can be one or more <firstterm>preferred types</firstterm>, which |
| are preferred when there is a choice of possible types. With careful selection |
| of preferred types and available implicit casts, it is possible to ensure that |
| ambiguous expressions (those with multiple candidate parsing solutions) can be |
| resolved in a useful way. |
| </para> |
| |
| <para> |
| All type conversion rules are designed with several principles in mind: |
| |
| <itemizedlist> |
| <listitem> |
| <para> |
| Implicit conversions should never have surprising or unpredictable outcomes. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| There should be no extra overhead in the parser or executor |
| if a query does not need implicit type conversion. |
| That is, if a query is well-formed and the types already match, then the query should execute |
| without spending extra time in the parser and without introducing unnecessary implicit conversion |
| calls in the query. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Additionally, if a query usually requires an implicit conversion for a function, and |
| if then the user defines a new function with the correct argument types, the parser |
| should use this new function and no longer do implicit conversion to use the old function. |
| </para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| |
| </sect1> |
| |
| <sect1 id="typeconv-oper"> |
| <title>Operators</title> |
| |
| <indexterm zone="typeconv-oper"> |
| <primary>operator</primary> |
| <secondary>type resolution in an invocation</secondary> |
| </indexterm> |
| |
| <para> |
| The specific operator that is referenced by an operator expression |
| is determined using the following procedure. |
| Note that this procedure is indirectly affected |
| by the precedence of the operators involved, since that will determine |
| which sub-expressions are taken to be the inputs of which operators. |
| See <xref linkend="sql-precedence"/> for more information. |
| </para> |
| |
| <procedure> |
| <title>Operator Type Resolution</title> |
| |
| <step id="op-resol-select" performance="required"> |
| <para> |
| Select the operators to be considered from the |
| <classname>pg_operator</classname> system catalog. If a non-schema-qualified |
| operator name was used (the usual case), the operators |
| considered are those with the matching name and argument count that are |
| visible in the current search path (see <xref linkend="ddl-schemas-path"/>). |
| If a qualified operator name was given, only operators in the specified |
| schema are considered. |
| </para> |
| |
| <substeps> |
| <step performance="optional"> |
| <para> |
| If the search path finds multiple operators with identical argument types, |
| only the one appearing earliest in the path is considered. Operators with |
| different argument types are considered on an equal footing regardless of |
| search path position. |
| </para> |
| </step> |
| </substeps> |
| </step> |
| |
| <step id="op-resol-exact-match" performance="required"> |
| <para> |
| Check for an operator accepting exactly the input argument types. |
| If one exists (there can be only one exact match in the set of |
| operators considered), use it. Lack of an exact match creates a security |
| hazard when calling, via qualified name |
| <footnote id="op-qualified-security"> |
| <!-- If you edit this, consider editing func-qualified-security. --> |
| <para> |
| The hazard does not arise with a non-schema-qualified name, because a |
| search path containing schemas that permit untrusted users to create |
| objects is not a <link linkend="ddl-schemas-patterns">secure schema usage |
| pattern</link>. |
| </para> |
| </footnote> |
| (not typical), any operator found in a schema that permits untrusted users to |
| create objects. In such situations, cast arguments to force an exact match. |
| </para> |
| |
| <substeps> |
| <step id="op-resol-exact-unknown" performance="optional"> |
| <para> |
| If one argument of a binary operator invocation is of the <type>unknown</type> type, |
| then assume it is the same type as the other argument for this check. |
| Invocations involving two <type>unknown</type> inputs, or a prefix operator |
| with an <type>unknown</type> input, will never find a match at this step. |
| </para> |
| </step> |
| <step id="op-resol-exact-domain" performance="optional"> |
| <para> |
| If one argument of a binary operator invocation is of the <type>unknown</type> |
| type and the other is of a domain type, next check to see if there is an |
| operator accepting exactly the domain's base type on both sides; if so, use it. |
| </para> |
| </step> |
| </substeps> |
| </step> |
| |
| <step id="op-resol-best-match" performance="required"> |
| <para> |
| Look for the best match. |
| </para> |
| <substeps> |
| <step performance="required"> |
| <para> |
| Discard candidate operators for which the input types do not match |
| and cannot be converted (using an implicit conversion) to match. |
| <type>unknown</type> literals are |
| assumed to be convertible to anything for this purpose. If only one |
| candidate remains, use it; else continue to the next step. |
| </para> |
| </step> |
| <step performance="required"> |
| <para> |
| If any input argument is of a domain type, treat it as being of the |
| domain's base type for all subsequent steps. This ensures that domains |
| act like their base types for purposes of ambiguous-operator resolution. |
| </para> |
| </step> |
| <step performance="required"> |
| <para> |
| Run through all candidates and keep those with the most exact matches |
| on input types. Keep all candidates if none have exact matches. |
| If only one candidate remains, use it; else continue to the next step. |
| </para> |
| </step> |
| <step performance="required"> |
| <para> |
| Run through all candidates and keep those that accept preferred types (of the |
| input data type's type category) at the most positions where type conversion |
| will be required. |
| Keep all candidates if none accept preferred types. |
| If only one candidate remains, use it; else continue to the next step. |
| </para> |
| </step> |
| <step performance="required"> |
| <para> |
| If any input arguments are <type>unknown</type>, check the type |
| categories accepted at those argument positions by the remaining |
| candidates. At each position, select the <type>string</type> category |
| if any |
| candidate accepts that category. (This bias towards string is appropriate |
| since an unknown-type literal looks like a string.) Otherwise, if |
| all the remaining candidates accept the same type category, select that |
| category; otherwise fail because the correct choice cannot be deduced |
| without more clues. Now discard |
| candidates that do not accept the selected type category. Furthermore, |
| if any candidate accepts a preferred type in that category, |
| discard candidates that accept non-preferred types for that argument. |
| Keep all candidates if none survive these tests. |
| If only one candidate remains, use it; else continue to the next step. |
| </para> |
| </step> |
| <step id="op-resol-last-unknown" performance="required"> |
| <para> |
| If there are both <type>unknown</type> and known-type arguments, and all |
| the known-type arguments have the same type, assume that the |
| <type>unknown</type> arguments are also of that type, and check which |
| candidates can accept that type at the <type>unknown</type>-argument |
| positions. If exactly one candidate passes this test, use it. |
| Otherwise, fail. |
| </para> |
| </step> |
| </substeps> |
| </step> |
| </procedure> |
| |
| <para> |
| Some examples follow. |
| </para> |
| |
| <example> |
| <title>Square Root Operator Type Resolution</title> |
| |
| <para> |
| There is only one square root operator (prefix <literal>|/</literal>) |
| defined in the standard catalog, and it takes an argument of type |
| <type>double precision</type>. |
| The scanner assigns an initial type of <type>integer</type> to the argument |
| in this query expression: |
| <screen> |
| SELECT |/ 40 AS "square root of 40"; |
| square root of 40 |
| ------------------- |
| 6.324555320336759 |
| (1 row) |
| </screen> |
| |
| So the parser does a type conversion on the operand and the query |
| is equivalent to: |
| |
| <screen> |
| SELECT |/ CAST(40 AS double precision) AS "square root of 40"; |
| </screen> |
| </para> |
| </example> |
| |
| <example> |
| <title>String Concatenation Operator Type Resolution</title> |
| |
| <para> |
| A string-like syntax is used for working with string types and for |
| working with complex extension types. |
| Strings with unspecified type are matched with likely operator candidates. |
| </para> |
| |
| <para> |
| An example with one unspecified argument: |
| <screen> |
| SELECT text 'abc' || 'def' AS "text and unknown"; |
| |
| text and unknown |
| ------------------ |
| abcdef |
| (1 row) |
| </screen> |
| </para> |
| |
| <para> |
| In this case the parser looks to see if there is an operator taking <type>text</type> |
| for both arguments. Since there is, it assumes that the second argument should |
| be interpreted as type <type>text</type>. |
| </para> |
| |
| <para> |
| Here is a concatenation of two values of unspecified types: |
| <screen> |
| SELECT 'abc' || 'def' AS "unspecified"; |
| |
| unspecified |
| ------------- |
| abcdef |
| (1 row) |
| </screen> |
| </para> |
| |
| <para> |
| In this case there is no initial hint for which type to use, since no types |
| are specified in the query. So, the parser looks for all candidate operators |
| and finds that there are candidates accepting both string-category and |
| bit-string-category inputs. Since string category is preferred when available, |
| that category is selected, and then the |
| preferred type for strings, <type>text</type>, is used as the specific |
| type to resolve the unknown-type literals as. |
| </para> |
| </example> |
| |
| <example> |
| <title>Absolute-Value and Negation Operator Type Resolution</title> |
| |
| <para> |
| The <productname>PostgreSQL</productname> operator catalog has several |
| entries for the prefix operator <literal>@</literal>, all of which implement |
| absolute-value operations for various numeric data types. One of these |
| entries is for type <type>float8</type>, which is the preferred type in |
| the numeric category. Therefore, <productname>PostgreSQL</productname> |
| will use that entry when faced with an <type>unknown</type> input: |
| <screen> |
| SELECT @ '-4.5' AS "abs"; |
| abs |
| ----- |
| 4.5 |
| (1 row) |
| </screen> |
| Here the system has implicitly resolved the unknown-type literal as type |
| <type>float8</type> before applying the chosen operator. We can verify that |
| <type>float8</type> and not some other type was used: |
| <screen> |
| SELECT @ '-4.5e500' AS "abs"; |
| |
| ERROR: "-4.5e500" is out of range for type double precision |
| </screen> |
| </para> |
| |
| <para> |
| On the other hand, the prefix operator <literal>~</literal> (bitwise negation) |
| is defined only for integer data types, not for <type>float8</type>. So, if we |
| try a similar case with <literal>~</literal>, we get: |
| <screen> |
| SELECT ~ '20' AS "negation"; |
| |
| ERROR: operator is not unique: ~ "unknown" |
| HINT: Could not choose a best candidate operator. You might need to add |
| explicit type casts. |
| </screen> |
| This happens because the system cannot decide which of the several |
| possible <literal>~</literal> operators should be preferred. We can help |
| it out with an explicit cast: |
| <screen> |
| SELECT ~ CAST('20' AS int8) AS "negation"; |
| |
| negation |
| ---------- |
| -21 |
| (1 row) |
| </screen> |
| </para> |
| </example> |
| |
| <example> |
| <title>Array Inclusion Operator Type Resolution</title> |
| |
| <para> |
| Here is another example of resolving an operator with one known and one |
| unknown input: |
| <screen> |
| SELECT array[1,2] <@ '{1,2,3}' as "is subset"; |
| |
| is subset |
| ----------- |
| t |
| (1 row) |
| </screen> |
| The <productname>PostgreSQL</productname> operator catalog has several |
| entries for the infix operator <literal><@</literal>, but the only two that |
| could possibly accept an integer array on the left-hand side are |
| array inclusion (<type>anyarray</type> <literal><@</literal> <type>anyarray</type>) |
| and range inclusion (<type>anyelement</type> <literal><@</literal> <type>anyrange</type>). |
| Since none of these polymorphic pseudo-types (see <xref |
| linkend="datatype-pseudo"/>) are considered preferred, the parser cannot |
| resolve the ambiguity on that basis. |
| However, <xref linkend="op-resol-last-unknown"/> tells |
| it to assume that the unknown-type literal is of the same type as the other |
| input, that is, integer array. Now only one of the two operators can match, |
| so array inclusion is selected. (Had range inclusion been selected, we would |
| have gotten an error, because the string does not have the right format to be |
| a range literal.) |
| </para> |
| </example> |
| |
| <example> |
| <title>Custom Operator on a Domain Type</title> |
| |
| <para> |
| Users sometimes try to declare operators applying just to a domain type. |
| This is possible but is not nearly as useful as it might seem, because the |
| operator resolution rules are designed to select operators applying to the |
| domain's base type. As an example consider |
| <screen> |
| CREATE DOMAIN mytext AS text CHECK(...); |
| CREATE FUNCTION mytext_eq_text (mytext, text) RETURNS boolean AS ...; |
| CREATE OPERATOR = (procedure=mytext_eq_text, leftarg=mytext, rightarg=text); |
| CREATE TABLE mytable (val mytext); |
| |
| SELECT * FROM mytable WHERE val = 'foo'; |
| </screen> |
| This query will not use the custom operator. The parser will first see if |
| there is a <type>mytext</type> <literal>=</literal> <type>mytext</type> operator |
| (<xref linkend="op-resol-exact-unknown"/>), which there is not; |
| then it will consider the domain's base type <type>text</type>, and see if |
| there is a <type>text</type> <literal>=</literal> <type>text</type> operator |
| (<xref linkend="op-resol-exact-domain"/>), which there is; |
| so it resolves the <type>unknown</type>-type literal as <type>text</type> and |
| uses the <type>text</type> <literal>=</literal> <type>text</type> operator. |
| The only way to get the custom operator to be used is to explicitly cast |
| the literal: |
| <screen> |
| SELECT * FROM mytable WHERE val = text 'foo'; |
| </screen> |
| so that the <type>mytext</type> <literal>=</literal> <type>text</type> operator is found |
| immediately according to the exact-match rule. If the best-match rules |
| are reached, they actively discriminate against operators on domain types. |
| If they did not, such an operator would create too many ambiguous-operator |
| failures, because the casting rules always consider a domain as castable |
| to or from its base type, and so the domain operator would be considered |
| usable in all the same cases as a similarly-named operator on the base type. |
| </para> |
| </example> |
| |
| </sect1> |
| |
| <sect1 id="typeconv-func"> |
| <title>Functions</title> |
| |
| <indexterm zone="typeconv-func"> |
| <primary>function</primary> |
| <secondary>type resolution in an invocation</secondary> |
| </indexterm> |
| |
| <para> |
| The specific function that is referenced by a function call |
| is determined using the following procedure. |
| </para> |
| |
| <procedure> |
| <title>Function Type Resolution</title> |
| |
| <step performance="required"> |
| <para> |
| Select the functions to be considered from the |
| <classname>pg_proc</classname> system catalog. If a non-schema-qualified |
| function name was used, the functions |
| considered are those with the matching name and argument count that are |
| visible in the current search path (see <xref linkend="ddl-schemas-path"/>). |
| If a qualified function name was given, only functions in the specified |
| schema are considered. |
| </para> |
| |
| <substeps> |
| <step performance="optional"> |
| <para> |
| If the search path finds multiple functions of identical argument types, |
| only the one appearing earliest in the path is considered. Functions of |
| different argument types are considered on an equal footing regardless of |
| search path position. |
| </para> |
| </step> |
| <step performance="optional"> |
| <para> |
| If a function is declared with a <literal>VARIADIC</literal> array parameter, and |
| the call does not use the <literal>VARIADIC</literal> keyword, then the function |
| is treated as if the array parameter were replaced by one or more occurrences |
| of its element type, as needed to match the call. After such expansion the |
| function might have effective argument types identical to some non-variadic |
| function. In that case the function appearing earlier in the search path is |
| used, or if the two functions are in the same schema, the non-variadic one is |
| preferred. |
| </para> |
| <para> |
| This creates a security hazard when calling, via qualified name |
| <footnote id="func-qualified-security"> |
| <!-- If you edit this, consider editing op-qualified-security. --> |
| <para> |
| The hazard does not arise with a non-schema-qualified name, because a |
| search path containing schemas that permit untrusted users to create |
| objects is not a <link linkend="ddl-schemas-patterns">secure schema usage |
| pattern</link>. |
| </para> |
| </footnote>, |
| a variadic function found in a schema that permits untrusted users to create |
| objects. A malicious user can take control and execute arbitrary SQL |
| functions as though you executed them. Substitute a call bearing |
| the <literal>VARIADIC</literal> keyword, which bypasses this hazard. Calls |
| populating <literal>VARIADIC "any"</literal> parameters often have no |
| equivalent formulation containing the <literal>VARIADIC</literal> keyword. To |
| issue those calls safely, the function's schema must permit only trusted users |
| to create objects. |
| </para> |
| </step> |
| <step performance="optional"> |
| <para> |
| Functions that have default values for parameters are considered to match any |
| call that omits zero or more of the defaultable parameter positions. If more |
| than one such function matches a call, the one appearing earliest in the |
| search path is used. If there are two or more such functions in the same |
| schema with identical parameter types in the non-defaulted positions (which is |
| possible if they have different sets of defaultable parameters), the system |
| will not be able to determine which to prefer, and so an <quote>ambiguous |
| function call</quote> error will result if no better match to the call can be |
| found. |
| </para> |
| <para> |
| This creates an availability hazard when calling, via qualified |
| name<footnoteref linkend="func-qualified-security"/>, any function found in a |
| schema that permits untrusted users to create objects. A malicious user can |
| create a function with the name of an existing function, replicating that |
| function's parameters and appending novel parameters having default values. |
| This precludes new calls to the original function. To forestall this hazard, |
| place functions in schemas that permit only trusted users to create objects. |
| </para> |
| </step> |
| </substeps> |
| </step> |
| |
| <step performance="required"> |
| <para> |
| Check for a function accepting exactly the input argument types. |
| If one exists (there can be only one exact match in the set of |
| functions considered), use it. Lack of an exact match creates a security |
| hazard when calling, via qualified |
| name<footnoteref linkend="func-qualified-security"/>, a function found in a |
| schema that permits untrusted users to create objects. In such situations, |
| cast arguments to force an exact match. (Cases involving <type>unknown</type> |
| will never find a match at this step.) |
| </para> |
| </step> |
| |
| <step performance="required"> |
| <para> |
| If no exact match is found, see if the function call appears |
| to be a special type conversion request. This happens if the function call |
| has just one argument and the function name is the same as the (internal) |
| name of some data type. Furthermore, the function argument must be either |
| an unknown-type literal, or a type that is binary-coercible to the named |
| data type, or a type that could be converted to the named data type by |
| applying that type's I/O functions (that is, the conversion is either to or |
| from one of the standard string types). When these conditions are met, |
| the function call is treated as a form of <literal>CAST</literal> specification. |
| <footnote> |
| <para> |
| The reason for this step is to support function-style cast specifications |
| in cases where there is not an actual cast function. If there is a cast |
| function, it is conventionally named after its output type, and so there |
| is no need to have a special case. See |
| <xref linkend="sql-createcast"/> |
| for additional commentary. |
| </para> |
| </footnote> |
| </para> |
| </step> |
| <step performance="required"> |
| <para> |
| Look for the best match. |
| </para> |
| <substeps> |
| <step performance="required"> |
| <para> |
| Discard candidate functions for which the input types do not match |
| and cannot be converted (using an implicit conversion) to match. |
| <type>unknown</type> literals are |
| assumed to be convertible to anything for this purpose. If only one |
| candidate remains, use it; else continue to the next step. |
| </para> |
| </step> |
| <step performance="required"> |
| <para> |
| If any input argument is of a domain type, treat it as being of the |
| domain's base type for all subsequent steps. This ensures that domains |
| act like their base types for purposes of ambiguous-function resolution. |
| </para> |
| </step> |
| <step performance="required"> |
| <para> |
| Run through all candidates and keep those with the most exact matches |
| on input types. Keep all candidates if none have exact matches. |
| If only one candidate remains, use it; else continue to the next step. |
| </para> |
| </step> |
| <step performance="required"> |
| <para> |
| Run through all candidates and keep those that accept preferred types (of the |
| input data type's type category) at the most positions where type conversion |
| will be required. |
| Keep all candidates if none accept preferred types. |
| If only one candidate remains, use it; else continue to the next step. |
| </para> |
| </step> |
| <step performance="required"> |
| <para> |
| If any input arguments are <type>unknown</type>, check the type categories |
| accepted |
| at those argument positions by the remaining candidates. At each position, |
| select the <type>string</type> category if any candidate accepts that category. |
| (This bias towards string |
| is appropriate since an unknown-type literal looks like a string.) |
| Otherwise, if all the remaining candidates accept the same type category, |
| select that category; otherwise fail because |
| the correct choice cannot be deduced without more clues. |
| Now discard candidates that do not accept the selected type category. |
| Furthermore, if any candidate accepts a preferred type in that category, |
| discard candidates that accept non-preferred types for that argument. |
| Keep all candidates if none survive these tests. |
| If only one candidate remains, use it; else continue to the next step. |
| </para> |
| </step> |
| <step performance="required"> |
| <para> |
| If there are both <type>unknown</type> and known-type arguments, and all |
| the known-type arguments have the same type, assume that the |
| <type>unknown</type> arguments are also of that type, and check which |
| candidates can accept that type at the <type>unknown</type>-argument |
| positions. If exactly one candidate passes this test, use it. |
| Otherwise, fail. |
| </para> |
| </step> |
| </substeps> |
| </step> |
| </procedure> |
| |
| <para> |
| Note that the <quote>best match</quote> rules are identical for operator and |
| function type resolution. |
| Some examples follow. |
| </para> |
| |
| <example> |
| <title>Rounding Function Argument Type Resolution</title> |
| |
| <para> |
| There is only one <function>round</function> function that takes two |
| arguments; it takes a first argument of type <type>numeric</type> and |
| a second argument of type <type>integer</type>. |
| So the following query automatically converts |
| the first argument of type <type>integer</type> to |
| <type>numeric</type>: |
| |
| <screen> |
| SELECT round(4, 4); |
| |
| round |
| -------- |
| 4.0000 |
| (1 row) |
| </screen> |
| |
| That query is actually transformed by the parser to: |
| <screen> |
| SELECT round(CAST (4 AS numeric), 4); |
| </screen> |
| </para> |
| |
| <para> |
| Since numeric constants with decimal points are initially assigned the |
| type <type>numeric</type>, the following query will require no type |
| conversion and therefore might be slightly more efficient: |
| <screen> |
| SELECT round(4.0, 4); |
| </screen> |
| </para> |
| </example> |
| |
| <example> |
| <title>Variadic Function Resolution</title> |
| |
| <para> |
| <screen> |
| CREATE FUNCTION public.variadic_example(VARIADIC numeric[]) RETURNS int |
| LANGUAGE sql AS 'SELECT 1'; |
| CREATE FUNCTION |
| </screen> |
| |
| This function accepts, but does not require, the VARIADIC keyword. It |
| tolerates both integer and numeric arguments: |
| |
| <screen> |
| SELECT public.variadic_example(0), |
| public.variadic_example(0.0), |
| public.variadic_example(VARIADIC array[0.0]); |
| variadic_example | variadic_example | variadic_example |
| ------------------+------------------+------------------ |
| 1 | 1 | 1 |
| (1 row) |
| </screen> |
| |
| However, the first and second calls will prefer more-specific functions, if |
| available: |
| |
| <screen> |
| CREATE FUNCTION public.variadic_example(numeric) RETURNS int |
| LANGUAGE sql AS 'SELECT 2'; |
| CREATE FUNCTION |
| |
| CREATE FUNCTION public.variadic_example(int) RETURNS int |
| LANGUAGE sql AS 'SELECT 3'; |
| CREATE FUNCTION |
| |
| SELECT public.variadic_example(0), |
| public.variadic_example(0.0), |
| public.variadic_example(VARIADIC array[0.0]); |
| variadic_example | variadic_example | variadic_example |
| ------------------+------------------+------------------ |
| 3 | 2 | 1 |
| (1 row) |
| </screen> |
| |
| Given the default configuration and only the first function existing, the |
| first and second calls are insecure. Any user could intercept them by |
| creating the second or third function. By matching the argument type exactly |
| and using the <literal>VARIADIC</literal> keyword, the third call is secure. |
| </para> |
| </example> |
| |
| <example> |
| <title>Substring Function Type Resolution</title> |
| |
| <para> |
| There are several <function>substr</function> functions, one of which |
| takes types <type>text</type> and <type>integer</type>. If called |
| with a string constant of unspecified type, the system chooses the |
| candidate function that accepts an argument of the preferred category |
| <literal>string</literal> (namely of type <type>text</type>). |
| |
| <screen> |
| SELECT substr('1234', 3); |
| |
| substr |
| -------- |
| 34 |
| (1 row) |
| </screen> |
| </para> |
| |
| <para> |
| If the string is declared to be of type <type>varchar</type>, as might be the case |
| if it comes from a table, then the parser will try to convert it to become <type>text</type>: |
| <screen> |
| SELECT substr(varchar '1234', 3); |
| |
| substr |
| -------- |
| 34 |
| (1 row) |
| </screen> |
| |
| This is transformed by the parser to effectively become: |
| <screen> |
| SELECT substr(CAST (varchar '1234' AS text), 3); |
| </screen> |
| </para> |
| <para> |
| <note> |
| <para> |
| The parser learns from the <structname>pg_cast</structname> catalog that |
| <type>text</type> and <type>varchar</type> |
| are binary-compatible, meaning that one can be passed to a function that |
| accepts the other without doing any physical conversion. Therefore, no |
| type conversion call is really inserted in this case. |
| </para> |
| </note> |
| </para> |
| |
| <para> |
| And, if the function is called with an argument of type <type>integer</type>, |
| the parser will try to convert that to <type>text</type>: |
| <screen> |
| SELECT substr(1234, 3); |
| ERROR: function substr(integer, integer) does not exist |
| HINT: No function matches the given name and argument types. You might need |
| to add explicit type casts. |
| </screen> |
| |
| This does not work because <type>integer</type> does not have an implicit cast |
| to <type>text</type>. An explicit cast will work, however: |
| <screen> |
| SELECT substr(CAST (1234 AS text), 3); |
| |
| substr |
| -------- |
| 34 |
| (1 row) |
| </screen> |
| </para> |
| </example> |
| |
| </sect1> |
| |
| <sect1 id="typeconv-query"> |
| <title>Value Storage</title> |
| |
| <para> |
| Values to be inserted into a table are converted to the destination |
| column's data type according to the |
| following steps. |
| </para> |
| |
| <procedure> |
| <title>Value Storage Type Conversion</title> |
| |
| <step performance="required"> |
| <para> |
| Check for an exact match with the target. |
| </para> |
| </step> |
| |
| <step performance="required"> |
| <para> |
| Otherwise, try to convert the expression to the target type. This is possible |
| if an <firstterm>assignment cast</firstterm> between the two types is registered in the |
| <structname>pg_cast</structname> catalog (see <xref linkend="sql-createcast"/>). |
| Alternatively, if the expression is an unknown-type literal, the contents of |
| the literal string will be fed to the input conversion routine for the target |
| type. |
| </para> |
| </step> |
| |
| <step performance="required"> |
| <para> |
| Check to see if there is a sizing cast for the target type. A sizing |
| cast is a cast from that type to itself. If one is found in the |
| <structname>pg_cast</structname> catalog, apply it to the expression before storing |
| into the destination column. The implementation function for such a cast |
| always takes an extra parameter of type <type>integer</type>, which receives |
| the destination column's <structfield>atttypmod</structfield> value (typically its |
| declared length, although the interpretation of <structfield>atttypmod</structfield> |
| varies for different data types), and it may take a third <type>boolean</type> |
| parameter that says whether the cast is explicit or implicit. The cast |
| function |
| is responsible for applying any length-dependent semantics such as size |
| checking or truncation. |
| </para> |
| </step> |
| |
| </procedure> |
| |
| <example> |
| <title><type>character</type> Storage Type Conversion</title> |
| |
| <para> |
| For a target column declared as <type>character(20)</type> the following |
| statement shows that the stored value is sized correctly: |
| |
| <screen> |
| CREATE TABLE vv (v character(20)); |
| INSERT INTO vv SELECT 'abc' || 'def'; |
| SELECT v, octet_length(v) FROM vv; |
| |
| v | octet_length |
| ----------------------+-------------- |
| abcdef | 20 |
| (1 row) |
| </screen> |
| </para> |
| |
| <para> |
| What has really happened here is that the two unknown literals are resolved |
| to <type>text</type> by default, allowing the <literal>||</literal> operator |
| to be resolved as <type>text</type> concatenation. Then the <type>text</type> |
| result of the operator is converted to <type>bpchar</type> (<quote>blank-padded |
| char</quote>, the internal name of the <type>character</type> data type) to match the target |
| column type. (Since the conversion from <type>text</type> to |
| <type>bpchar</type> is binary-coercible, this conversion does |
| not insert any real function call.) Finally, the sizing function |
| <literal>bpchar(bpchar, integer, boolean)</literal> is found in the system catalog |
| and applied to the operator's result and the stored column length. This |
| type-specific function performs the required length check and addition of |
| padding spaces. |
| </para> |
| </example> |
| </sect1> |
| |
| <sect1 id="typeconv-union-case"> |
| <title><literal>UNION</literal>, <literal>CASE</literal>, and Related Constructs</title> |
| |
| <indexterm zone="typeconv-union-case"> |
| <primary>UNION</primary> |
| <secondary>determination of result type</secondary> |
| </indexterm> |
| |
| <indexterm zone="typeconv-union-case"> |
| <primary>CASE</primary> |
| <secondary>determination of result type</secondary> |
| </indexterm> |
| |
| <indexterm zone="typeconv-union-case"> |
| <primary>ARRAY</primary> |
| <secondary>determination of result type</secondary> |
| </indexterm> |
| |
| <indexterm zone="typeconv-union-case"> |
| <primary>VALUES</primary> |
| <secondary>determination of result type</secondary> |
| </indexterm> |
| |
| <indexterm zone="typeconv-union-case"> |
| <primary>GREATEST</primary> |
| <secondary>determination of result type</secondary> |
| </indexterm> |
| |
| <indexterm zone="typeconv-union-case"> |
| <primary>LEAST</primary> |
| <secondary>determination of result type</secondary> |
| </indexterm> |
| |
| <para> |
| SQL <literal>UNION</literal> constructs must match up possibly dissimilar |
| types to become a single result set. The resolution algorithm is |
| applied separately to each output column of a union query. The |
| <literal>INTERSECT</literal> and <literal>EXCEPT</literal> constructs resolve |
| dissimilar types in the same way as <literal>UNION</literal>. |
| Some other constructs, including |
| <literal>CASE</literal>, <literal>ARRAY</literal>, <literal>VALUES</literal>, |
| and the <function>GREATEST</function> and <function>LEAST</function> |
| functions, use the identical |
| algorithm to match up their component expressions and select a result |
| data type. |
| </para> |
| |
| <procedure> |
| <title>Type Resolution for <literal>UNION</literal>, <literal>CASE</literal>, |
| and Related Constructs</title> |
| |
| <step performance="required"> |
| <para> |
| If all inputs are of the same type, and it is not <type>unknown</type>, |
| resolve as that type. |
| </para> |
| </step> |
| |
| <step performance="required"> |
| <para> |
| If any input is of a domain type, treat it as being of the |
| domain's base type for all subsequent steps. |
| <footnote> |
| <para> |
| Somewhat like the treatment of domain inputs for operators and |
| functions, this behavior allows a domain type to be preserved through |
| a <literal>UNION</literal> or similar construct, so long as the user is |
| careful to ensure that all inputs are implicitly or explicitly of that |
| exact type. Otherwise the domain's base type will be used. |
| </para> |
| </footnote> |
| </para> |
| </step> |
| |
| <step performance="required"> |
| <para> |
| If all inputs are of type <type>unknown</type>, resolve as type |
| <type>text</type> (the preferred type of the string category). |
| Otherwise, <type>unknown</type> inputs are ignored for the purposes |
| of the remaining rules. |
| </para> |
| </step> |
| |
| <step performance="required"> |
| <para> |
| If the non-unknown inputs are not all of the same type category, fail. |
| </para> |
| </step> |
| |
| <step performance="required"> |
| <para> |
| Select the first non-unknown input type as the candidate type, |
| then consider each other non-unknown input type, left to right. |
| <footnote> |
| <para> |
| For historical reasons, <literal>CASE</literal> treats |
| its <literal>ELSE</literal> clause (if any) as the <quote>first</quote> |
| input, with the <literal>THEN</literal> clauses(s) considered after |
| that. In all other cases, <quote>left to right</quote> means the order |
| in which the expressions appear in the query text. |
| </para> |
| </footnote> |
| If the candidate type can be implicitly converted to the other type, |
| but not vice-versa, select the other type as the new candidate type. |
| Then continue considering the remaining inputs. If, at any stage of this |
| process, a preferred type is selected, stop considering additional |
| inputs. |
| </para> |
| </step> |
| |
| <step performance="required"> |
| <para> |
| Convert all inputs to the final candidate type. Fail if there is not an |
| implicit conversion from a given input type to the candidate type. |
| </para> |
| </step> |
| </procedure> |
| |
| <para> |
| Some examples follow. |
| </para> |
| |
| <example> |
| <title>Type Resolution with Underspecified Types in a Union</title> |
| |
| <para> |
| <screen> |
| SELECT text 'a' AS "text" UNION SELECT 'b'; |
| |
| text |
| ------ |
| a |
| b |
| (2 rows) |
| </screen> |
| Here, the unknown-type literal <literal>'b'</literal> will be resolved to type <type>text</type>. |
| </para> |
| </example> |
| |
| <example> |
| <title>Type Resolution in a Simple Union</title> |
| |
| <para> |
| <screen> |
| SELECT 1.2 AS "numeric" UNION SELECT 1; |
| |
| numeric |
| --------- |
| 1 |
| 1.2 |
| (2 rows) |
| </screen> |
| The literal <literal>1.2</literal> is of type <type>numeric</type>, |
| and the <type>integer</type> value <literal>1</literal> can be cast implicitly to |
| <type>numeric</type>, so that type is used. |
| </para> |
| </example> |
| |
| <example> |
| <title>Type Resolution in a Transposed Union</title> |
| |
| <para> |
| <screen> |
| SELECT 1 AS "real" UNION SELECT CAST('2.2' AS REAL); |
| |
| real |
| ------ |
| 1 |
| 2.2 |
| (2 rows) |
| </screen> |
| Here, since type <type>real</type> cannot be implicitly cast to <type>integer</type>, |
| but <type>integer</type> can be implicitly cast to <type>real</type>, the union |
| result type is resolved as <type>real</type>. |
| </para> |
| </example> |
| |
| <example> |
| <title>Type Resolution in a Nested Union</title> |
| |
| <para> |
| <screen> |
| SELECT NULL UNION SELECT NULL UNION SELECT 1; |
| |
| ERROR: UNION types text and integer cannot be matched |
| </screen> |
| This failure occurs because <productname>PostgreSQL</productname> treats |
| multiple <literal>UNION</literal>s as a nest of pairwise operations; |
| that is, this input is the same as |
| <screen> |
| (SELECT NULL UNION SELECT NULL) UNION SELECT 1; |
| </screen> |
| The inner <literal>UNION</literal> is resolved as emitting |
| type <type>text</type>, according to the rules given above. Then the |
| outer <literal>UNION</literal> has inputs of types <type>text</type> |
| and <type>integer</type>, leading to the observed error. The problem |
| can be fixed by ensuring that the leftmost <literal>UNION</literal> |
| has at least one input of the desired result type. |
| </para> |
| |
| <para> |
| <literal>INTERSECT</literal> and <literal>EXCEPT</literal> operations are |
| likewise resolved pairwise. However, the other constructs described in this |
| section consider all of their inputs in one resolution step. |
| </para> |
| </example> |
| </sect1> |
| |
| <sect1 id="typeconv-select"> |
| <title><literal>SELECT</literal> Output Columns</title> |
| |
| <indexterm zone="typeconv-select"> |
| <primary>SELECT</primary> |
| <secondary>determination of result type</secondary> |
| </indexterm> |
| |
| <para> |
| The rules given in the preceding sections will result in assignment |
| of non-<type>unknown</type> data types to all expressions in an SQL query, |
| except for unspecified-type literals that appear as simple output |
| columns of a <command>SELECT</command> command. For example, in |
| |
| <screen> |
| SELECT 'Hello World'; |
| </screen> |
| |
| there is nothing to identify what type the string literal should be |
| taken as. In this situation <productname>PostgreSQL</productname> will fall back |
| to resolving the literal's type as <type>text</type>. |
| </para> |
| |
| <para> |
| When the <command>SELECT</command> is one arm of a <literal>UNION</literal> |
| (or <literal>INTERSECT</literal> or <literal>EXCEPT</literal>) construct, or when it |
| appears within <command>INSERT ... SELECT</command>, this rule is not applied |
| since rules given in preceding sections take precedence. The type of an |
| unspecified-type literal can be taken from the other <literal>UNION</literal> arm |
| in the first case, or from the destination column in the second case. |
| </para> |
| |
| <para> |
| <literal>RETURNING</literal> lists are treated the same as <command>SELECT</command> |
| output lists for this purpose. |
| </para> |
| |
| <note> |
| <para> |
| Prior to <productname>PostgreSQL</productname> 10, this rule did not exist, and |
| unspecified-type literals in a <command>SELECT</command> output list were |
| left as type <type>unknown</type>. That had assorted bad consequences, |
| so it's been changed. |
| </para> |
| </note> |
| |
| </sect1> |
| </chapter> |