| <!-- doc/src/sgml/spgist.sgml --> |
| |
| <chapter id="spgist"> |
| <title>SP-GiST Indexes</title> |
| |
| <indexterm> |
| <primary>index</primary> |
| <secondary>SP-GiST</secondary> |
| </indexterm> |
| |
| <sect1 id="spgist-intro"> |
| <title>Introduction</title> |
| |
| <para> |
| <acronym>SP-GiST</acronym> is an abbreviation for space-partitioned |
| <acronym>GiST</acronym>. <acronym>SP-GiST</acronym> supports partitioned |
| search trees, which facilitate development of a wide range of different |
| non-balanced data structures, such as quad-trees, k-d trees, and radix |
| trees (tries). The common feature of these structures is that they |
| repeatedly divide the search space into partitions that need not be |
| of equal size. Searches that are well matched to the partitioning rule |
| can be very fast. |
| </para> |
| |
| <para> |
| These popular data structures were originally developed for in-memory |
| usage. In main memory, they are usually designed as a set of dynamically |
| allocated nodes linked by pointers. This is not suitable for direct |
| storing on disk, since these chains of pointers can be rather long which |
| would require too many disk accesses. In contrast, disk-based data |
| structures should have a high fanout to minimize I/O. The challenge |
| addressed by <acronym>SP-GiST</acronym> is to map search tree nodes to |
| disk pages in such a way that a search need access only a few disk pages, |
| even if it traverses many nodes. |
| </para> |
| |
| <para> |
| Like <acronym>GiST</acronym>, <acronym>SP-GiST</acronym> is meant to allow |
| the development of custom data types with the appropriate access methods, |
| by an expert in the domain of the data type, rather than a database expert. |
| </para> |
| |
| <para> |
| Some of the information here is derived from Purdue University's |
| SP-GiST Indexing Project |
| <ulink url="https://www.cs.purdue.edu/spgist/">web site</ulink>. |
| The <acronym>SP-GiST</acronym> implementation in |
| <productname>PostgreSQL</productname> is primarily maintained by Teodor |
| Sigaev and Oleg Bartunov, and there is more information on their |
| <!-- URL will be changed --> |
| <ulink url="http://www.sai.msu.su/~megera/wiki/spgist_dev">web site</ulink>. |
| </para> |
| |
| </sect1> |
| |
| <sect1 id="spgist-builtin-opclasses"> |
| <title>Built-in Operator Classes</title> |
| |
| <para> |
| The core <productname>PostgreSQL</productname> distribution |
| includes the <acronym>SP-GiST</acronym> operator classes shown in |
| <xref linkend="spgist-builtin-opclasses-table"/>. |
| </para> |
| |
| <table id="spgist-builtin-opclasses-table"> |
| <title>Built-in <acronym>SP-GiST</acronym> Operator Classes</title> |
| <tgroup cols="3"> |
| <thead> |
| <row> |
| <entry>Name</entry> |
| <entry>Indexable Operators</entry> |
| <entry>Ordering Operators</entry> |
| </row> |
| </thead> |
| <tbody> |
| <row> |
| <entry valign="middle" morerows="11"><literal>box_ops</literal></entry> |
| <entry><literal><< (box,box)</literal></entry> |
| <entry valign="middle" morerows="11"><literal><-> (box,point)</literal></entry> |
| </row> |
| <row><entry><literal>&< (box,box)</literal></entry></row> |
| <row><entry><literal>&> (box,box)</literal></entry></row> |
| <row><entry><literal>>> (box,box)</literal></entry></row> |
| <row><entry><literal><@ (box,box)</literal></entry></row> |
| <row><entry><literal>@> (box,box)</literal></entry></row> |
| <row><entry><literal>~= (box,box)</literal></entry></row> |
| <row><entry><literal>&& (box,box)</literal></entry></row> |
| <row><entry><literal><<| (box,box)</literal></entry></row> |
| <row><entry><literal>&<| (box,box)</literal></entry></row> |
| <row><entry><literal>|&> (box,box)</literal></entry></row> |
| <row><entry><literal>|>> (box,box)</literal></entry></row> |
| |
| <row> |
| <entry valign="middle" morerows="5"><literal>kd_point_ops</literal></entry> |
| <entry><literal>|>> (point,point)</literal></entry> |
| <entry valign="middle" morerows="5"><literal><-> (point,point)</literal></entry> |
| </row> |
| <row><entry><literal><< (point,point)</literal></entry></row> |
| <row><entry><literal>>> (point,point)</literal></entry></row> |
| <row><entry><literal><<| (point,point)</literal></entry></row> |
| <row><entry><literal>~= (point,point)</literal></entry></row> |
| <row><entry><literal><@ (point,box)</literal></entry></row> |
| |
| <row> |
| <entry valign="middle" morerows="10"><literal>network_ops</literal></entry> |
| <entry><literal><< (inet,inet)</literal></entry> |
| <entry valign="middle" morerows="10"></entry> |
| </row> |
| <row><entry><literal><<= (inet,inet)</literal></entry></row> |
| <row><entry><literal>>> (inet,inet)</literal></entry></row> |
| <row><entry><literal>>>= (inet,inet)</literal></entry></row> |
| <row><entry><literal>= (inet,inet)</literal></entry></row> |
| <row><entry><literal><> (inet,inet)</literal></entry></row> |
| <row><entry><literal>< (inet,inet)</literal></entry></row> |
| <row><entry><literal><= (inet,inet)</literal></entry></row> |
| <row><entry><literal>> (inet,inet)</literal></entry></row> |
| <row><entry><literal>>= (inet,inet)</literal></entry></row> |
| <row><entry><literal>&& (inet,inet)</literal></entry></row> |
| |
| <row> |
| <entry valign="middle" morerows="11"><literal>poly_ops</literal></entry> |
| <entry><literal><< (polygon,polygon)</literal></entry> |
| <entry valign="middle" morerows="11"><literal><-> (polygon,point)</literal></entry> |
| </row> |
| <row><entry><literal>&< (polygon,polygon)</literal></entry></row> |
| <row><entry><literal>&> (polygon,polygon)</literal></entry></row> |
| <row><entry><literal>>> (polygon,polygon)</literal></entry></row> |
| <row><entry><literal><@ (polygon,polygon)</literal></entry></row> |
| <row><entry><literal>@> (polygon,polygon)</literal></entry></row> |
| <row><entry><literal>~= (polygon,polygon)</literal></entry></row> |
| <row><entry><literal>&& (polygon,polygon)</literal></entry></row> |
| <row><entry><literal><<| (polygon,polygon)</literal></entry></row> |
| <row><entry><literal>&<| (polygon,polygon)</literal></entry></row> |
| <row><entry><literal>|>> (polygon,polygon)</literal></entry></row> |
| <row><entry><literal>|&> (polygon,polygon)</literal></entry></row> |
| |
| <row> |
| <entry valign="middle" morerows="5"><literal>quad_point_ops</literal></entry> |
| <entry><literal>|>> (point,point)</literal></entry> |
| <entry valign="middle" morerows="5"><literal><-> (point,point)</literal></entry> |
| </row> |
| <row><entry><literal><< (point,point)</literal></entry></row> |
| <row><entry><literal>>> (point,point)</literal></entry></row> |
| <row><entry><literal><<| (point,point)</literal></entry></row> |
| <row><entry><literal>~= (point,point)</literal></entry></row> |
| <row><entry><literal><@ (point,box)</literal></entry></row> |
| |
| <row> |
| <entry valign="middle" morerows="9"><literal>range_ops</literal></entry> |
| <entry><literal>= (anyrange,anyrange)</literal></entry> |
| <entry valign="middle" morerows="9"></entry> |
| </row> |
| <row><entry><literal>&& (anyrange,anyrange)</literal></entry></row> |
| <row><entry><literal>@> (anyrange,anyelement)</literal></entry></row> |
| <row><entry><literal>@> (anyrange,anyrange)</literal></entry></row> |
| <row><entry><literal><@ (anyrange,anyrange)</literal></entry></row> |
| <row><entry><literal><< (anyrange,anyrange)</literal></entry></row> |
| <row><entry><literal>>> (anyrange,anyrange)</literal></entry></row> |
| <row><entry><literal>&< (anyrange,anyrange)</literal></entry></row> |
| <row><entry><literal>&> (anyrange,anyrange)</literal></entry></row> |
| <row><entry><literal>-|- (anyrange,anyrange)</literal></entry></row> |
| |
| <row> |
| <entry valign="middle" morerows="9"><literal>text_ops</literal></entry> |
| <entry><literal>= (text,text)</literal></entry> |
| <entry valign="middle" morerows="9"></entry> |
| </row> |
| <row><entry><literal>< (text,text)</literal></entry></row> |
| <row><entry><literal><= (text,text)</literal></entry></row> |
| <row><entry><literal>> (text,text)</literal></entry></row> |
| <row><entry><literal>>= (text,text)</literal></entry></row> |
| <row><entry><literal>~<~ (text,text)</literal></entry></row> |
| <row><entry><literal>~<=~ (text,text)</literal></entry></row> |
| <row><entry><literal>~>=~ (text,text)</literal></entry></row> |
| <row><entry><literal>~>~ (text,text)</literal></entry></row> |
| <row><entry><literal>^@ (text,text)</literal></entry></row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <para> |
| Of the two operator classes for type <type>point</type>, |
| <literal>quad_point_ops</literal> is the default. <literal>kd_point_ops</literal> |
| supports the same operators but uses a different index data structure that |
| may offer better performance in some applications. |
| </para> |
| <para> |
| The <literal>quad_point_ops</literal>, <literal>kd_point_ops</literal> and |
| <literal>poly_ops</literal> operator classes support the <literal><-></literal> |
| ordering operator, which enables the k-nearest neighbor (<literal>k-NN</literal>) |
| search over indexed point or polygon data sets. |
| </para> |
| |
| </sect1> |
| |
| <sect1 id="spgist-extensibility"> |
| <title>Extensibility</title> |
| |
| <para> |
| <acronym>SP-GiST</acronym> offers an interface with a high level of |
| abstraction, requiring the access method developer to implement only |
| methods specific to a given data type. The <acronym>SP-GiST</acronym> core |
| is responsible for efficient disk mapping and searching the tree structure. |
| It also takes care of concurrency and logging considerations. |
| </para> |
| |
| <para> |
| Leaf tuples of an <acronym>SP-GiST</acronym> tree usually contain values |
| of the same data type as the indexed column, although it is also possible |
| for them to contain lossy representations of the indexed column. |
| Leaf tuples stored at the root level will directly represent |
| the original indexed data value, but leaf tuples at lower |
| levels might contain only a partial value, such as a suffix. |
| In that case the operator class support functions must be able to |
| reconstruct the original value using information accumulated from the |
| inner tuples that are passed through to reach the leaf level. |
| </para> |
| |
| <para> |
| When an <acronym>SP-GiST</acronym> index is created with |
| <literal>INCLUDE</literal> columns, the values of those columns are also |
| stored in leaf tuples. The <literal>INCLUDE</literal> columns are of no |
| concern to the <acronym>SP-GiST</acronym> operator class, so they are |
| not discussed further here. |
| </para> |
| |
| <para> |
| Inner tuples are more complex, since they are branching points in the |
| search tree. Each inner tuple contains a set of one or more |
| <firstterm>nodes</firstterm>, which represent groups of similar leaf values. |
| A node contains a downlink that leads either to another, lower-level inner |
| tuple, or to a short list of leaf tuples that all lie on the same index page. |
| Each node normally has a <firstterm>label</firstterm> that describes it; for example, |
| in a radix tree the node label could be the next character of the string |
| value. (Alternatively, an operator class can omit the node labels, if it |
| works with a fixed set of nodes for all inner tuples; |
| see <xref linkend="spgist-null-labels"/>.) |
| Optionally, an inner tuple can have a <firstterm>prefix</firstterm> value |
| that describes all its members. In a radix tree this could be the common |
| prefix of the represented strings. The prefix value is not necessarily |
| really a prefix, but can be any data needed by the operator class; |
| for example, in a quad-tree it can store the central point that the four |
| quadrants are measured with respect to. A quad-tree inner tuple would |
| then also contain four nodes corresponding to the quadrants around this |
| central point. |
| </para> |
| |
| <para> |
| Some tree algorithms require knowledge of level (or depth) of the current |
| tuple, so the <acronym>SP-GiST</acronym> core provides the possibility for |
| operator classes to manage level counting while descending the tree. |
| There is also support for incrementally reconstructing the represented |
| value when that is needed, and for passing down additional data (called |
| <firstterm>traverse values</firstterm>) during a tree descent. |
| </para> |
| |
| <note> |
| <para> |
| The <acronym>SP-GiST</acronym> core code takes care of null entries. |
| Although <acronym>SP-GiST</acronym> indexes do store entries for nulls |
| in indexed columns, this is hidden from the index operator class code: |
| no null index entries or search conditions will ever be passed to the |
| operator class methods. (It is assumed that <acronym>SP-GiST</acronym> |
| operators are strict and so cannot succeed for null values.) Null values |
| are therefore not discussed further here. |
| </para> |
| </note> |
| |
| <para> |
| There are five user-defined methods that an index operator class for |
| <acronym>SP-GiST</acronym> must provide, and two are optional. All five |
| mandatory methods follow the convention of accepting two <type>internal</type> |
| arguments, the first of which is a pointer to a C struct containing input |
| values for the support method, while the second argument is a pointer to a |
| C struct where output values must be placed. Four of the mandatory methods just |
| return <type>void</type>, since all their results appear in the output struct; but |
| <function>leaf_consistent</function> returns a <type>boolean</type> result. |
| The methods must not modify any fields of their input structs. In all |
| cases, the output struct is initialized to zeroes before calling the |
| user-defined method. The optional sixth method <function>compress</function> |
| accepts a <type>datum</type> to be indexed as the only argument and returns a value suitable |
| for physical storage in a leaf tuple. The optional seventh method |
| <function>options</function> accepts an <type>internal</type> pointer to a C struct, where |
| opclass-specific parameters should be placed, and returns <type>void</type>. |
| </para> |
| |
| <para> |
| The five mandatory user-defined methods are: |
| </para> |
| |
| <variablelist> |
| <varlistentry> |
| <term><function>config</function></term> |
| <listitem> |
| <para> |
| Returns static information about the index implementation, including |
| the data type OIDs of the prefix and node label data types. |
| </para> |
| <para> |
| The <acronym>SQL</acronym> declaration of the function must look like this: |
| <programlisting> |
| CREATE FUNCTION my_config(internal, internal) RETURNS void ... |
| </programlisting> |
| The first argument is a pointer to a <structname>spgConfigIn</structname> |
| C struct, containing input data for the function. |
| The second argument is a pointer to a <structname>spgConfigOut</structname> |
| C struct, which the function must fill with result data. |
| <programlisting> |
| typedef struct spgConfigIn |
| { |
| Oid attType; /* Data type to be indexed */ |
| } spgConfigIn; |
| |
| typedef struct spgConfigOut |
| { |
| Oid prefixType; /* Data type of inner-tuple prefixes */ |
| Oid labelType; /* Data type of inner-tuple node labels */ |
| Oid leafType; /* Data type of leaf-tuple values */ |
| bool canReturnData; /* Opclass can reconstruct original data */ |
| bool longValuesOK; /* Opclass can cope with values > 1 page */ |
| } spgConfigOut; |
| </programlisting> |
| |
| <structfield>attType</structfield> is passed in order to support polymorphic |
| index operator classes; for ordinary fixed-data-type operator classes, it |
| will always have the same value and so can be ignored. |
| </para> |
| |
| <para> |
| For operator classes that do not use prefixes, |
| <structfield>prefixType</structfield> can be set to <literal>VOIDOID</literal>. |
| Likewise, for operator classes that do not use node labels, |
| <structfield>labelType</structfield> can be set to <literal>VOIDOID</literal>. |
| <structfield>canReturnData</structfield> should be set true if the operator class |
| is capable of reconstructing the originally-supplied index value. |
| <structfield>longValuesOK</structfield> should be set true only when the |
| <structfield>attType</structfield> is of variable length and the operator |
| class is capable of segmenting long values by repeated suffixing |
| (see <xref linkend="spgist-limits"/>). |
| </para> |
| |
| <para> |
| <structfield>leafType</structfield> should match the index storage type |
| defined by the operator class's <structfield>opckeytype</structfield> |
| catalog entry. |
| (Note that <structfield>opckeytype</structfield> can be zero, |
| implying the storage type is the same as the operator class's input |
| type, which is the most common situation.) |
| For reasons of backward compatibility, the <function>config</function> |
| method can set <structfield>leafType</structfield> to some other value, |
| and that value will be used; but this is deprecated since the index |
| contents are then incorrectly identified in the catalogs. |
| Also, it's permissible to |
| leave <structfield>leafType</structfield> uninitialized (zero); |
| that is interpreted as meaning the index storage type derived from |
| <structfield>opckeytype</structfield>. |
| </para> |
| |
| <para> |
| When <structfield>attType</structfield> |
| and <structfield>leafType</structfield> are different, the optional |
| method <function>compress</function> must be provided. |
| Method <function>compress</function> is responsible |
| for transformation of datums to be indexed from <structfield>attType</structfield> |
| to <structfield>leafType</structfield>. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><function>choose</function></term> |
| <listitem> |
| <para> |
| Chooses a method for inserting a new value into an inner tuple. |
| </para> |
| |
| <para> |
| The <acronym>SQL</acronym> declaration of the function must look like this: |
| <programlisting> |
| CREATE FUNCTION my_choose(internal, internal) RETURNS void ... |
| </programlisting> |
| The first argument is a pointer to a <structname>spgChooseIn</structname> |
| C struct, containing input data for the function. |
| The second argument is a pointer to a <structname>spgChooseOut</structname> |
| C struct, which the function must fill with result data. |
| <programlisting> |
| typedef struct spgChooseIn |
| { |
| Datum datum; /* original datum to be indexed */ |
| Datum leafDatum; /* current datum to be stored at leaf */ |
| int level; /* current level (counting from zero) */ |
| |
| /* Data from current inner tuple */ |
| bool allTheSame; /* tuple is marked all-the-same? */ |
| bool hasPrefix; /* tuple has a prefix? */ |
| Datum prefixDatum; /* if so, the prefix value */ |
| int nNodes; /* number of nodes in the inner tuple */ |
| Datum *nodeLabels; /* node label values (NULL if none) */ |
| } spgChooseIn; |
| |
| typedef enum spgChooseResultType |
| { |
| spgMatchNode = 1, /* descend into existing node */ |
| spgAddNode, /* add a node to the inner tuple */ |
| spgSplitTuple /* split inner tuple (change its prefix) */ |
| } spgChooseResultType; |
| |
| typedef struct spgChooseOut |
| { |
| spgChooseResultType resultType; /* action code, see above */ |
| union |
| { |
| struct /* results for spgMatchNode */ |
| { |
| int nodeN; /* descend to this node (index from 0) */ |
| int levelAdd; /* increment level by this much */ |
| Datum restDatum; /* new leaf datum */ |
| } matchNode; |
| struct /* results for spgAddNode */ |
| { |
| Datum nodeLabel; /* new node's label */ |
| int nodeN; /* where to insert it (index from 0) */ |
| } addNode; |
| struct /* results for spgSplitTuple */ |
| { |
| /* Info to form new upper-level inner tuple with one child tuple */ |
| bool prefixHasPrefix; /* tuple should have a prefix? */ |
| Datum prefixPrefixDatum; /* if so, its value */ |
| int prefixNNodes; /* number of nodes */ |
| Datum *prefixNodeLabels; /* their labels (or NULL for |
| * no labels) */ |
| int childNodeN; /* which node gets child tuple */ |
| |
| /* Info to form new lower-level inner tuple with all old nodes */ |
| bool postfixHasPrefix; /* tuple should have a prefix? */ |
| Datum postfixPrefixDatum; /* if so, its value */ |
| } splitTuple; |
| } result; |
| } spgChooseOut; |
| </programlisting> |
| |
| <structfield>datum</structfield> is the original datum of |
| <structname>spgConfigIn</structname>.<structfield>attType</structfield> |
| type that was to be inserted into the index. |
| <structfield>leafDatum</structfield> is a value of |
| <structname>spgConfigOut</structname>.<structfield>leafType</structfield> |
| type, which is initially a result of method |
| <function>compress</function> applied to <structfield>datum</structfield> |
| when method <function>compress</function> is provided, or the same value as |
| <structfield>datum</structfield> otherwise. |
| <structfield>leafDatum</structfield> can change at lower levels of the tree |
| if the <function>choose</function> or <function>picksplit</function> |
| methods change it. When the insertion search reaches a leaf page, |
| the current value of <structfield>leafDatum</structfield> is what will be stored |
| in the newly created leaf tuple. |
| <structfield>level</structfield> is the current inner tuple's level, starting at |
| zero for the root level. |
| <structfield>allTheSame</structfield> is true if the current inner tuple is |
| marked as containing multiple equivalent nodes |
| (see <xref linkend="spgist-all-the-same"/>). |
| <structfield>hasPrefix</structfield> is true if the current inner tuple contains |
| a prefix; if so, |
| <structfield>prefixDatum</structfield> is its value. |
| <structfield>nNodes</structfield> is the number of child nodes contained in the |
| inner tuple, and |
| <structfield>nodeLabels</structfield> is an array of their label values, or |
| NULL if there are no labels. |
| </para> |
| |
| <para> |
| The <function>choose</function> function can determine either that |
| the new value matches one of the existing child nodes, or that a new |
| child node must be added, or that the new value is inconsistent with |
| the tuple prefix and so the inner tuple must be split to create a |
| less restrictive prefix. |
| </para> |
| |
| <para> |
| If the new value matches one of the existing child nodes, |
| set <structfield>resultType</structfield> to <literal>spgMatchNode</literal>. |
| Set <structfield>nodeN</structfield> to the index (from zero) of that node in |
| the node array. |
| Set <structfield>levelAdd</structfield> to the increment in |
| <structfield>level</structfield> caused by descending through that node, |
| or leave it as zero if the operator class does not use levels. |
| Set <structfield>restDatum</structfield> to equal <structfield>leafDatum</structfield> |
| if the operator class does not modify datums from one level to the |
| next, or otherwise set it to the modified value to be used as |
| <structfield>leafDatum</structfield> at the next level. |
| </para> |
| |
| <para> |
| If a new child node must be added, |
| set <structfield>resultType</structfield> to <literal>spgAddNode</literal>. |
| Set <structfield>nodeLabel</structfield> to the label to be used for the new |
| node, and set <structfield>nodeN</structfield> to the index (from zero) at which |
| to insert the node in the node array. |
| After the node has been added, the <function>choose</function> |
| function will be called again with the modified inner tuple; |
| that call should result in an <literal>spgMatchNode</literal> result. |
| </para> |
| |
| <para> |
| If the new value is inconsistent with the tuple prefix, |
| set <structfield>resultType</structfield> to <literal>spgSplitTuple</literal>. |
| This action moves all the existing nodes into a new lower-level |
| inner tuple, and replaces the existing inner tuple with a tuple |
| having a single downlink pointing to the new lower-level inner tuple. |
| Set <structfield>prefixHasPrefix</structfield> to indicate whether the new |
| upper tuple should have a prefix, and if so set |
| <structfield>prefixPrefixDatum</structfield> to the prefix value. This new |
| prefix value must be sufficiently less restrictive than the original |
| to accept the new value to be indexed. |
| Set <structfield>prefixNNodes</structfield> to the number of nodes needed in the |
| new tuple, and set <structfield>prefixNodeLabels</structfield> to a palloc'd array |
| holding their labels, or to NULL if node labels are not required. |
| Note that the total size of the new upper tuple must be no more |
| than the total size of the tuple it is replacing; this constrains |
| the lengths of the new prefix and new labels. |
| Set <structfield>childNodeN</structfield> to the index (from zero) of the node |
| that will downlink to the new lower-level inner tuple. |
| Set <structfield>postfixHasPrefix</structfield> to indicate whether the new |
| lower-level inner tuple should have a prefix, and if so set |
| <structfield>postfixPrefixDatum</structfield> to the prefix value. The |
| combination of these two prefixes and the downlink node's label |
| (if any) must have the same meaning as the original prefix, because |
| there is no opportunity to alter the node labels that are moved to |
| the new lower-level tuple, nor to change any child index entries. |
| After the node has been split, the <function>choose</function> |
| function will be called again with the replacement inner tuple. |
| That call may return an <literal>spgAddNode</literal> result, if no suitable |
| node was created by the <literal>spgSplitTuple</literal> action. Eventually |
| <function>choose</function> must return <literal>spgMatchNode</literal> to |
| allow the insertion to descend to the next level. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><function>picksplit</function></term> |
| <listitem> |
| <para> |
| Decides how to create a new inner tuple over a set of leaf tuples. |
| </para> |
| |
| <para> |
| The <acronym>SQL</acronym> declaration of the function must look like this: |
| <programlisting> |
| CREATE FUNCTION my_picksplit(internal, internal) RETURNS void ... |
| </programlisting> |
| The first argument is a pointer to a <structname>spgPickSplitIn</structname> |
| C struct, containing input data for the function. |
| The second argument is a pointer to a <structname>spgPickSplitOut</structname> |
| C struct, which the function must fill with result data. |
| <programlisting> |
| typedef struct spgPickSplitIn |
| { |
| int nTuples; /* number of leaf tuples */ |
| Datum *datums; /* their datums (array of length nTuples) */ |
| int level; /* current level (counting from zero) */ |
| } spgPickSplitIn; |
| |
| typedef struct spgPickSplitOut |
| { |
| bool hasPrefix; /* new inner tuple should have a prefix? */ |
| Datum prefixDatum; /* if so, its value */ |
| |
| int nNodes; /* number of nodes for new inner tuple */ |
| Datum *nodeLabels; /* their labels (or NULL for no labels) */ |
| |
| int *mapTuplesToNodes; /* node index for each leaf tuple */ |
| Datum *leafTupleDatums; /* datum to store in each new leaf tuple */ |
| } spgPickSplitOut; |
| </programlisting> |
| |
| <structfield>nTuples</structfield> is the number of leaf tuples provided. |
| <structfield>datums</structfield> is an array of their datum values of |
| <structname>spgConfigOut</structname>.<structfield>leafType</structfield> |
| type. |
| <structfield>level</structfield> is the current level that all the leaf tuples |
| share, which will become the level of the new inner tuple. |
| </para> |
| |
| <para> |
| Set <structfield>hasPrefix</structfield> to indicate whether the new inner |
| tuple should have a prefix, and if so set |
| <structfield>prefixDatum</structfield> to the prefix value. |
| Set <structfield>nNodes</structfield> to indicate the number of nodes that |
| the new inner tuple will contain, and |
| set <structfield>nodeLabels</structfield> to an array of their label values, |
| or to NULL if node labels are not required. |
| Set <structfield>mapTuplesToNodes</structfield> to an array that gives the index |
| (from zero) of the node that each leaf tuple should be assigned to. |
| Set <structfield>leafTupleDatums</structfield> to an array of the values to |
| be stored in the new leaf tuples (these will be the same as the |
| input <structfield>datums</structfield> if the operator class does not modify |
| datums from one level to the next). |
| Note that the <function>picksplit</function> function is |
| responsible for palloc'ing the |
| <structfield>nodeLabels</structfield>, <structfield>mapTuplesToNodes</structfield> and |
| <structfield>leafTupleDatums</structfield> arrays. |
| </para> |
| |
| <para> |
| If more than one leaf tuple is supplied, it is expected that the |
| <function>picksplit</function> function will classify them into more than |
| one node; otherwise it is not possible to split the leaf tuples |
| across multiple pages, which is the ultimate purpose of this |
| operation. Therefore, if the <function>picksplit</function> function |
| ends up placing all the leaf tuples in the same node, the core |
| SP-GiST code will override that decision and generate an inner |
| tuple in which the leaf tuples are assigned at random to several |
| identically-labeled nodes. Such a tuple is marked |
| <literal>allTheSame</literal> to signify that this has happened. The |
| <function>choose</function> and <function>inner_consistent</function> functions |
| must take suitable care with such inner tuples. |
| See <xref linkend="spgist-all-the-same"/> for more information. |
| </para> |
| |
| <para> |
| <function>picksplit</function> can be applied to a single leaf tuple only |
| in the case that the <function>config</function> function set |
| <structfield>longValuesOK</structfield> to true and a larger-than-a-page input |
| value has been supplied. In this case the point of the operation is |
| to strip off a prefix and produce a new, shorter leaf datum value. |
| The call will be repeated until a leaf datum short enough to fit on |
| a page has been produced. See <xref linkend="spgist-limits"/> for |
| more information. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><function>inner_consistent</function></term> |
| <listitem> |
| <para> |
| Returns set of nodes (branches) to follow during tree search. |
| </para> |
| |
| <para> |
| The <acronym>SQL</acronym> declaration of the function must look like this: |
| <programlisting> |
| CREATE FUNCTION my_inner_consistent(internal, internal) RETURNS void ... |
| </programlisting> |
| The first argument is a pointer to a <structname>spgInnerConsistentIn</structname> |
| C struct, containing input data for the function. |
| The second argument is a pointer to a <structname>spgInnerConsistentOut</structname> |
| C struct, which the function must fill with result data. |
| |
| <programlisting> |
| typedef struct spgInnerConsistentIn |
| { |
| ScanKey scankeys; /* array of operators and comparison values */ |
| ScanKey orderbys; /* array of ordering operators and comparison |
| * values */ |
| int nkeys; /* length of scankeys array */ |
| int norderbys; /* length of orderbys array */ |
| |
| Datum reconstructedValue; /* value reconstructed at parent */ |
| void *traversalValue; /* opclass-specific traverse value */ |
| MemoryContext traversalMemoryContext; /* put new traverse values here */ |
| int level; /* current level (counting from zero) */ |
| bool returnData; /* original data must be returned? */ |
| |
| /* Data from current inner tuple */ |
| bool allTheSame; /* tuple is marked all-the-same? */ |
| bool hasPrefix; /* tuple has a prefix? */ |
| Datum prefixDatum; /* if so, the prefix value */ |
| int nNodes; /* number of nodes in the inner tuple */ |
| Datum *nodeLabels; /* node label values (NULL if none) */ |
| } spgInnerConsistentIn; |
| |
| typedef struct spgInnerConsistentOut |
| { |
| int nNodes; /* number of child nodes to be visited */ |
| int *nodeNumbers; /* their indexes in the node array */ |
| int *levelAdds; /* increment level by this much for each */ |
| Datum *reconstructedValues; /* associated reconstructed values */ |
| void **traversalValues; /* opclass-specific traverse values */ |
| double **distances; /* associated distances */ |
| } spgInnerConsistentOut; |
| </programlisting> |
| |
| The array <structfield>scankeys</structfield>, of length <structfield>nkeys</structfield>, |
| describes the index search condition(s). These conditions are |
| combined with AND — only index entries that satisfy all of |
| them are interesting. (Note that <structfield>nkeys</structfield> = 0 implies |
| that all index entries satisfy the query.) Usually the consistent |
| function only cares about the <structfield>sk_strategy</structfield> and |
| <structfield>sk_argument</structfield> fields of each array entry, which |
| respectively give the indexable operator and comparison value. |
| In particular it is not necessary to check <structfield>sk_flags</structfield> to |
| see if the comparison value is NULL, because the SP-GiST core code |
| will filter out such conditions. |
| The array <structfield>orderbys</structfield>, of length <structfield>norderbys</structfield>, |
| describes ordering operators (if any) in the same manner. |
| <structfield>reconstructedValue</structfield> is the value reconstructed for the |
| parent tuple; it is <literal>(Datum) 0</literal> at the root level or if the |
| <function>inner_consistent</function> function did not provide a value at the |
| parent level. |
| <structfield>traversalValue</structfield> is a pointer to any traverse data |
| passed down from the previous call of <function>inner_consistent</function> |
| on the parent index tuple, or NULL at the root level. |
| <structfield>traversalMemoryContext</structfield> is the memory context in which |
| to store output traverse values (see below). |
| <structfield>level</structfield> is the current inner tuple's level, starting at |
| zero for the root level. |
| <structfield>returnData</structfield> is <literal>true</literal> if reconstructed data is |
| required for this query; this will only be so if the |
| <function>config</function> function asserted <structfield>canReturnData</structfield>. |
| <structfield>allTheSame</structfield> is true if the current inner tuple is |
| marked <quote>all-the-same</quote>; in this case all the nodes have the |
| same label (if any) and so either all or none of them match the query |
| (see <xref linkend="spgist-all-the-same"/>). |
| <structfield>hasPrefix</structfield> is true if the current inner tuple contains |
| a prefix; if so, |
| <structfield>prefixDatum</structfield> is its value. |
| <structfield>nNodes</structfield> is the number of child nodes contained in the |
| inner tuple, and |
| <structfield>nodeLabels</structfield> is an array of their label values, or |
| NULL if the nodes do not have labels. |
| </para> |
| |
| <para> |
| <structfield>nNodes</structfield> must be set to the number of child nodes that |
| need to be visited by the search, and |
| <structfield>nodeNumbers</structfield> must be set to an array of their indexes. |
| If the operator class keeps track of levels, set |
| <structfield>levelAdds</structfield> to an array of the level increments |
| required when descending to each node to be visited. (Often these |
| increments will be the same for all the nodes, but that's not |
| necessarily so, so an array is used.) |
| If value reconstruction is needed, set |
| <structfield>reconstructedValues</structfield> to an array of the values |
| reconstructed for each child node to be visited; otherwise, leave |
| <structfield>reconstructedValues</structfield> as NULL. |
| The reconstructed values are assumed to be of type |
| <structname>spgConfigOut</structname>.<structfield>leafType</structfield>. |
| (However, since the core system will do nothing with them except |
| possibly copy them, it is sufficient for them to have the |
| same <literal>typlen</literal> and <literal>typbyval</literal> |
| properties as <structfield>leafType</structfield>.) |
| If ordered search is performed, set <structfield>distances</structfield> |
| to an array of distance values according to <structfield>orderbys</structfield> |
| array (nodes with lowest distances will be processed first). Leave it |
| NULL otherwise. |
| If it is desired to pass down additional out-of-band information |
| (<quote>traverse values</quote>) to lower levels of the tree search, |
| set <structfield>traversalValues</structfield> to an array of the appropriate |
| traverse values, one for each child node to be visited; otherwise, |
| leave <structfield>traversalValues</structfield> as NULL. |
| Note that the <function>inner_consistent</function> function is |
| responsible for palloc'ing the |
| <structfield>nodeNumbers</structfield>, <structfield>levelAdds</structfield>, |
| <structfield>distances</structfield>, |
| <structfield>reconstructedValues</structfield>, and |
| <structfield>traversalValues</structfield> arrays in the current memory context. |
| However, any output traverse values pointed to by |
| the <structfield>traversalValues</structfield> array should be allocated |
| in <structfield>traversalMemoryContext</structfield>. |
| Each traverse value must be a single palloc'd chunk. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><function>leaf_consistent</function></term> |
| <listitem> |
| <para> |
| Returns true if a leaf tuple satisfies a query. |
| </para> |
| |
| <para> |
| The <acronym>SQL</acronym> declaration of the function must look like this: |
| <programlisting> |
| CREATE FUNCTION my_leaf_consistent(internal, internal) RETURNS bool ... |
| </programlisting> |
| The first argument is a pointer to a <structname>spgLeafConsistentIn</structname> |
| C struct, containing input data for the function. |
| The second argument is a pointer to a <structname>spgLeafConsistentOut</structname> |
| C struct, which the function must fill with result data. |
| <programlisting> |
| typedef struct spgLeafConsistentIn |
| { |
| ScanKey scankeys; /* array of operators and comparison values */ |
| ScanKey orderbys; /* array of ordering operators and comparison |
| * values */ |
| int nkeys; /* length of scankeys array */ |
| int norderbys; /* length of orderbys array */ |
| |
| Datum reconstructedValue; /* value reconstructed at parent */ |
| void *traversalValue; /* opclass-specific traverse value */ |
| int level; /* current level (counting from zero) */ |
| bool returnData; /* original data must be returned? */ |
| |
| Datum leafDatum; /* datum in leaf tuple */ |
| } spgLeafConsistentIn; |
| |
| typedef struct spgLeafConsistentOut |
| { |
| Datum leafValue; /* reconstructed original data, if any */ |
| bool recheck; /* set true if operator must be rechecked */ |
| bool recheckDistances; /* set true if distances must be rechecked */ |
| double *distances; /* associated distances */ |
| } spgLeafConsistentOut; |
| </programlisting> |
| |
| The array <structfield>scankeys</structfield>, of length <structfield>nkeys</structfield>, |
| describes the index search condition(s). These conditions are |
| combined with AND — only index entries that satisfy all of |
| them satisfy the query. (Note that <structfield>nkeys</structfield> = 0 implies |
| that all index entries satisfy the query.) Usually the consistent |
| function only cares about the <structfield>sk_strategy</structfield> and |
| <structfield>sk_argument</structfield> fields of each array entry, which |
| respectively give the indexable operator and comparison value. |
| In particular it is not necessary to check <structfield>sk_flags</structfield> to |
| see if the comparison value is NULL, because the SP-GiST core code |
| will filter out such conditions. |
| The array <structfield>orderbys</structfield>, of length <structfield>norderbys</structfield>, |
| describes the ordering operators in the same manner. |
| <structfield>reconstructedValue</structfield> is the value reconstructed for the |
| parent tuple; it is <literal>(Datum) 0</literal> at the root level or if the |
| <function>inner_consistent</function> function did not provide a value at the |
| parent level. |
| <structfield>traversalValue</structfield> is a pointer to any traverse data |
| passed down from the previous call of <function>inner_consistent</function> |
| on the parent index tuple, or NULL at the root level. |
| <structfield>level</structfield> is the current leaf tuple's level, starting at |
| zero for the root level. |
| <structfield>returnData</structfield> is <literal>true</literal> if reconstructed data is |
| required for this query; this will only be so if the |
| <function>config</function> function asserted <structfield>canReturnData</structfield>. |
| <structfield>leafDatum</structfield> is the key value of |
| <structname>spgConfigOut</structname>.<structfield>leafType</structfield> |
| stored in the current leaf tuple. |
| </para> |
| |
| <para> |
| The function must return <literal>true</literal> if the leaf tuple matches the |
| query, or <literal>false</literal> if not. In the <literal>true</literal> case, |
| if <structfield>returnData</structfield> is <literal>true</literal> then |
| <structfield>leafValue</structfield> must be set to the value (of type |
| <structname>spgConfigIn</structname>.<structfield>attType</structfield>) |
| originally supplied to be indexed for this leaf tuple. Also, |
| <structfield>recheck</structfield> may be set to <literal>true</literal> if the match |
| is uncertain and so the operator(s) must be re-applied to the actual |
| heap tuple to verify the match. |
| If ordered search is performed, set <structfield>distances</structfield> |
| to an array of distance values according to <structfield>orderbys</structfield> |
| array. Leave it NULL otherwise. If at least one of returned distances |
| is not exact, set <structfield>recheckDistances</structfield> to true. |
| In this case, the executor will calculate the exact distances after |
| fetching the tuple from the heap, and will reorder the tuples if needed. |
| </para> |
| </listitem> |
| </varlistentry> |
| </variablelist> |
| |
| <para> |
| The optional user-defined methods are: |
| </para> |
| |
| <variablelist> |
| <varlistentry> |
| <term><function>Datum compress(Datum in)</function></term> |
| <listitem> |
| <para> |
| Converts a data item into a format suitable for physical storage in |
| a leaf tuple of the index. It accepts a value of type |
| <structname>spgConfigIn</structname>.<structfield>attType</structfield> |
| and returns a value of type |
| <structname>spgConfigOut</structname>.<structfield>leafType</structfield>. |
| The output value must not contain an out-of-line TOAST pointer. |
| </para> |
| |
| <para> |
| Note: the <function>compress</function> method is only applied to |
| values to be stored. The consistent methods receive query |
| <structfield>scankeys</structfield> unchanged, without transformation |
| using <function>compress</function>. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><function>options</function></term> |
| <listitem> |
| <para> |
| Defines a set of user-visible parameters that control operator class |
| behavior. |
| </para> |
| |
| <para> |
| The <acronym>SQL</acronym> declaration of the function must look like this: |
| |
| <programlisting> |
| CREATE OR REPLACE FUNCTION my_options(internal) |
| RETURNS void |
| AS 'MODULE_PATHNAME' |
| LANGUAGE C STRICT; |
| </programlisting> |
| </para> |
| |
| <para> |
| The function is passed a pointer to a <structname>local_relopts</structname> |
| struct, which needs to be filled with a set of operator class |
| specific options. The options can be accessed from other support |
| functions using the <literal>PG_HAS_OPCLASS_OPTIONS()</literal> and |
| <literal>PG_GET_OPCLASS_OPTIONS()</literal> macros. |
| </para> |
| |
| <para> |
| Since the representation of the key in <acronym>SP-GiST</acronym> is |
| flexible, it may depend on user-specified parameters. |
| </para> |
| </listitem> |
| </varlistentry> |
| </variablelist> |
| |
| <para> |
| All the SP-GiST support methods are normally called in a short-lived |
| memory context; that is, <varname>CurrentMemoryContext</varname> will be reset |
| after processing of each tuple. It is therefore not very important to |
| worry about pfree'ing everything you palloc. (The <function>config</function> |
| method is an exception: it should try to avoid leaking memory. But |
| usually the <function>config</function> method need do nothing but assign |
| constants into the passed parameter struct.) |
| </para> |
| |
| <para> |
| If the indexed column is of a collatable data type, the index collation |
| will be passed to all the support methods, using the standard |
| <function>PG_GET_COLLATION()</function> mechanism. |
| </para> |
| |
| </sect1> |
| |
| <sect1 id="spgist-implementation"> |
| <title>Implementation</title> |
| |
| <para> |
| This section covers implementation details and other tricks that are |
| useful for implementers of <acronym>SP-GiST</acronym> operator classes to |
| know. |
| </para> |
| |
| <sect2 id="spgist-limits"> |
| <title>SP-GiST Limits</title> |
| |
| <para> |
| Individual leaf tuples and inner tuples must fit on a single index page |
| (8kB by default). Therefore, when indexing values of variable-length |
| data types, long values can only be supported by methods such as radix |
| trees, in which each level of the tree includes a prefix that is short |
| enough to fit on a page, and the final leaf level includes a suffix also |
| short enough to fit on a page. The operator class should set |
| <structfield>longValuesOK</structfield> to true only if it is prepared to arrange for |
| this to happen. Otherwise, the <acronym>SP-GiST</acronym> core will |
| reject any request to index a value that is too large to fit |
| on an index page. |
| </para> |
| |
| <para> |
| Likewise, it is the operator class's responsibility that inner tuples |
| do not grow too large to fit on an index page; this limits the number |
| of child nodes that can be used in one inner tuple, as well as the |
| maximum size of a prefix value. |
| </para> |
| |
| <para> |
| Another limitation is that when an inner tuple's node points to a set |
| of leaf tuples, those tuples must all be in the same index page. |
| (This is a design decision to reduce seeking and save space in the |
| links that chain such tuples together.) If the set of leaf tuples |
| grows too large for a page, a split is performed and an intermediate |
| inner tuple is inserted. For this to fix the problem, the new inner |
| tuple <emphasis>must</emphasis> divide the set of leaf values into more than one |
| node group. If the operator class's <function>picksplit</function> function |
| fails to do that, the <acronym>SP-GiST</acronym> core resorts to |
| extraordinary measures described in <xref linkend="spgist-all-the-same"/>. |
| </para> |
| |
| <para> |
| When <structfield>longValuesOK</structfield> is true, it is expected |
| that successive levels of the <acronym>SP-GiST</acronym> tree will |
| absorb more and more information into the prefixes and node labels of |
| the inner tuples, making the required leaf datum smaller and smaller, |
| so that eventually it will fit on a page. |
| To prevent bugs in operator classes from causing infinite insertion |
| loops, the <acronym>SP-GiST</acronym> core will raise an error if the |
| leaf datum does not become any smaller within ten cycles |
| of <function>choose</function> method calls. |
| </para> |
| </sect2> |
| |
| <sect2 id="spgist-null-labels"> |
| <title>SP-GiST Without Node Labels</title> |
| |
| <para> |
| Some tree algorithms use a fixed set of nodes for each inner tuple; |
| for example, in a quad-tree there are always exactly four nodes |
| corresponding to the four quadrants around the inner tuple's centroid |
| point. In such a case the code typically works with the nodes by |
| number, and there is no need for explicit node labels. To suppress |
| node labels (and thereby save some space), the <function>picksplit</function> |
| function can return NULL for the <structfield>nodeLabels</structfield> array, |
| and likewise the <function>choose</function> function can return NULL for |
| the <structfield>prefixNodeLabels</structfield> array during |
| a <literal>spgSplitTuple</literal> action. |
| This will in turn result in <structfield>nodeLabels</structfield> being NULL during |
| subsequent calls to <function>choose</function> and <function>inner_consistent</function>. |
| In principle, node labels could be used for some inner tuples and omitted |
| for others in the same index. |
| </para> |
| |
| <para> |
| When working with an inner tuple having unlabeled nodes, it is an error |
| for <function>choose</function> to return <literal>spgAddNode</literal>, since the set |
| of nodes is supposed to be fixed in such cases. |
| </para> |
| </sect2> |
| |
| <sect2 id="spgist-all-the-same"> |
| <title><quote>All-the-Same</quote> Inner Tuples</title> |
| |
| <para> |
| The <acronym>SP-GiST</acronym> core can override the results of the |
| operator class's <function>picksplit</function> function when |
| <function>picksplit</function> fails to divide the supplied leaf values into |
| at least two node categories. When this happens, the new inner tuple |
| is created with multiple nodes that each have the same label (if any) |
| that <function>picksplit</function> gave to the one node it did use, and the |
| leaf values are divided at random among these equivalent nodes. |
| The <literal>allTheSame</literal> flag is set on the inner tuple to warn the |
| <function>choose</function> and <function>inner_consistent</function> functions that the |
| tuple does not have the node set that they might otherwise expect. |
| </para> |
| |
| <para> |
| When dealing with an <literal>allTheSame</literal> tuple, a <function>choose</function> |
| result of <literal>spgMatchNode</literal> is interpreted to mean that the new |
| value can be assigned to any of the equivalent nodes; the core code will |
| ignore the supplied <structfield>nodeN</structfield> value and descend into one |
| of the nodes at random (so as to keep the tree balanced). It is an |
| error for <function>choose</function> to return <literal>spgAddNode</literal>, since |
| that would make the nodes not all equivalent; the |
| <literal>spgSplitTuple</literal> action must be used if the value to be inserted |
| doesn't match the existing nodes. |
| </para> |
| |
| <para> |
| When dealing with an <literal>allTheSame</literal> tuple, the |
| <function>inner_consistent</function> function should return either all or none |
| of the nodes as targets for continuing the index search, since they are |
| all equivalent. This may or may not require any special-case code, |
| depending on how much the <function>inner_consistent</function> function normally |
| assumes about the meaning of the nodes. |
| </para> |
| </sect2> |
| |
| </sect1> |
| |
| <sect1 id="spgist-examples"> |
| <title>Examples</title> |
| |
| <para> |
| The <productname>PostgreSQL</productname> source distribution includes |
| several examples of index operator classes for <acronym>SP-GiST</acronym>, |
| as described in <xref linkend="spgist-builtin-opclasses-table"/>. Look |
| into <filename>src/backend/access/spgist/</filename> |
| and <filename>src/backend/utils/adt/</filename> to see the code. |
| </para> |
| |
| </sect1> |
| |
| </chapter> |