| <!-- doc/src/sgml/tablesample-method.sgml --> |
| |
| <chapter id="tablesample-method"> |
| <title>Writing a Table Sampling Method</title> |
| |
| <indexterm zone="tablesample-method"> |
| <primary>table sampling method</primary> |
| </indexterm> |
| |
| <indexterm zone="tablesample-method"> |
| <primary><literal>TABLESAMPLE</literal> method</primary> |
| </indexterm> |
| |
| <para> |
| <productname>PostgreSQL</productname>'s implementation of the <literal>TABLESAMPLE</literal> |
| clause supports custom table sampling methods, in addition to |
| the <literal>BERNOULLI</literal> and <literal>SYSTEM</literal> methods that are required |
| by the SQL standard. The sampling method determines which rows of the |
| table will be selected when the <literal>TABLESAMPLE</literal> clause is used. |
| </para> |
| |
| <para> |
| At the SQL level, a table sampling method is represented by a single SQL |
| function, typically implemented in C, having the signature |
| <programlisting> |
| method_name(internal) RETURNS tsm_handler |
| </programlisting> |
| The name of the function is the same method name appearing in the |
| <literal>TABLESAMPLE</literal> clause. The <type>internal</type> argument is a dummy |
| (always having value zero) that simply serves to prevent this function from |
| being called directly from an SQL command. |
| The result of the function must be a palloc'd struct of |
| type <type>TsmRoutine</type>, which contains pointers to support functions for |
| the sampling method. These support functions are plain C functions and |
| are not visible or callable at the SQL level. The support functions are |
| described in <xref linkend="tablesample-support-functions"/>. |
| </para> |
| |
| <para> |
| In addition to function pointers, the <type>TsmRoutine</type> struct must |
| provide these additional fields: |
| </para> |
| |
| <variablelist> |
| <varlistentry> |
| <term><literal>List *parameterTypes</literal></term> |
| <listitem> |
| <para> |
| This is an OID list containing the data type OIDs of the parameter(s) |
| that will be accepted by the <literal>TABLESAMPLE</literal> clause when this |
| sampling method is used. For example, for the built-in methods, this |
| list contains a single item with value <literal>FLOAT4OID</literal>, which |
| represents the sampling percentage. Custom sampling methods can have |
| more or different parameters. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><literal>bool repeatable_across_queries</literal></term> |
| <listitem> |
| <para> |
| If <literal>true</literal>, the sampling method can deliver identical samples |
| across successive queries, if the same parameters |
| and <literal>REPEATABLE</literal> seed value are supplied each time and the |
| table contents have not changed. When this is <literal>false</literal>, |
| the <literal>REPEATABLE</literal> clause is not accepted for use with the |
| sampling method. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><literal>bool repeatable_across_scans</literal></term> |
| <listitem> |
| <para> |
| If <literal>true</literal>, the sampling method can deliver identical samples |
| across successive scans in the same query (assuming unchanging |
| parameters, seed value, and snapshot). |
| When this is <literal>false</literal>, the planner will not select plans that |
| would require scanning the sampled table more than once, since that |
| might result in inconsistent query output. |
| </para> |
| </listitem> |
| </varlistentry> |
| </variablelist> |
| |
| <para> |
| The <type>TsmRoutine</type> struct type is declared |
| in <filename>src/include/access/tsmapi.h</filename>, which see for additional |
| details. |
| </para> |
| |
| <para> |
| The table sampling methods included in the standard distribution are good |
| references when trying to write your own. Look into |
| the <filename>src/backend/access/tablesample</filename> subdirectory of the source |
| tree for the built-in sampling methods, and into the <filename>contrib</filename> |
| subdirectory for add-on methods. |
| </para> |
| |
| <sect1 id="tablesample-support-functions"> |
| <title>Sampling Method Support Functions</title> |
| |
| <para> |
| The TSM handler function returns a palloc'd <type>TsmRoutine</type> struct |
| containing pointers to the support functions described below. Most of |
| the functions are required, but some are optional, and those pointers can |
| be NULL. |
| </para> |
| |
| <para> |
| <programlisting> |
| void |
| SampleScanGetSampleSize (PlannerInfo *root, |
| RelOptInfo *baserel, |
| List *paramexprs, |
| BlockNumber *pages, |
| double *tuples); |
| </programlisting> |
| |
| This function is called during planning. It must estimate the number of |
| relation pages that will be read during a sample scan, and the number of |
| tuples that will be selected by the scan. (For example, these might be |
| determined by estimating the sampling fraction, and then multiplying |
| the <literal>baserel->pages</literal> and <literal>baserel->tuples</literal> |
| numbers by that, being sure to round the results to integral values.) |
| The <literal>paramexprs</literal> list holds the expression(s) that are |
| parameters to the <literal>TABLESAMPLE</literal> clause. It is recommended to |
| use <function>estimate_expression_value()</function> to try to reduce these |
| expressions to constants, if their values are needed for estimation |
| purposes; but the function must provide size estimates even if they cannot |
| be reduced, and it should not fail even if the values appear invalid |
| (remember that they're only estimates of what the run-time values will be). |
| The <literal>pages</literal> and <literal>tuples</literal> parameters are outputs. |
| </para> |
| |
| <para> |
| <programlisting> |
| void |
| InitSampleScan (SampleScanState *node, |
| int eflags); |
| </programlisting> |
| |
| Initialize for execution of a SampleScan plan node. |
| This is called during executor startup. |
| It should perform any initialization needed before processing can start. |
| The <structname>SampleScanState</structname> node has already been created, but |
| its <structfield>tsm_state</structfield> field is NULL. |
| The <function>InitSampleScan</function> function can palloc whatever internal |
| state data is needed by the sampling method, and store a pointer to |
| it in <literal>node->tsm_state</literal>. |
| Information about the table to scan is accessible through other fields |
| of the <structname>SampleScanState</structname> node (but note that the |
| <literal>node->ss.ss_currentScanDesc</literal> scan descriptor is not set |
| up yet). |
| <literal>eflags</literal> contains flag bits describing the executor's |
| operating mode for this plan node. |
| </para> |
| |
| <para> |
| When <literal>(eflags & EXEC_FLAG_EXPLAIN_ONLY)</literal> is true, |
| the scan will not actually be performed, so this function should only do |
| the minimum required to make the node state valid for <command>EXPLAIN</command> |
| and <function>EndSampleScan</function>. |
| </para> |
| |
| <para> |
| This function can be omitted (set the pointer to NULL), in which case |
| <function>BeginSampleScan</function> must perform all initialization needed |
| by the sampling method. |
| </para> |
| |
| <para> |
| <programlisting> |
| void |
| BeginSampleScan (SampleScanState *node, |
| Datum *params, |
| int nparams, |
| uint32 seed); |
| </programlisting> |
| |
| Begin execution of a sampling scan. |
| This is called just before the first attempt to fetch a tuple, and |
| may be called again if the scan needs to be restarted. |
| Information about the table to scan is accessible through fields |
| of the <structname>SampleScanState</structname> node (but note that the |
| <literal>node->ss.ss_currentScanDesc</literal> scan descriptor is not set |
| up yet). |
| The <literal>params</literal> array, of length <literal>nparams</literal>, contains the |
| values of the parameters supplied in the <literal>TABLESAMPLE</literal> clause. |
| These will have the number and types specified in the sampling |
| method's <literal>parameterTypes</literal> list, and have been checked |
| to not be null. |
| <literal>seed</literal> contains a seed to use for any random numbers generated |
| within the sampling method; it is either a hash derived from the |
| <literal>REPEATABLE</literal> value if one was given, or the result |
| of <literal>random()</literal> if not. |
| </para> |
| |
| <para> |
| This function may adjust the fields <literal>node->use_bulkread</literal> |
| and <literal>node->use_pagemode</literal>. |
| If <literal>node->use_bulkread</literal> is <literal>true</literal>, which it is by |
| default, the scan will use a buffer access strategy that encourages |
| recycling buffers after use. It might be reasonable to set this |
| to <literal>false</literal> if the scan will visit only a small fraction of the |
| table's pages. |
| If <literal>node->use_pagemode</literal> is <literal>true</literal>, which it is by |
| default, the scan will perform visibility checking in a single pass for |
| all tuples on each visited page. It might be reasonable to set this |
| to <literal>false</literal> if the scan will select only a small fraction of the |
| tuples on each visited page. That will result in fewer tuple visibility |
| checks being performed, though each one will be more expensive because it |
| will require more locking. |
| </para> |
| |
| <para> |
| If the sampling method is |
| marked <literal>repeatable_across_scans</literal>, it must be able to |
| select the same set of tuples during a rescan as it did originally, that is |
| a fresh call of <function>BeginSampleScan</function> must lead to selecting the |
| same tuples as before (if the <literal>TABLESAMPLE</literal> parameters |
| and seed don't change). |
| </para> |
| |
| <para> |
| <programlisting> |
| BlockNumber |
| NextSampleBlock (SampleScanState *node, BlockNumber nblocks); |
| </programlisting> |
| |
| Returns the block number of the next page to be scanned, or |
| <literal>InvalidBlockNumber</literal> if no pages remain to be scanned. |
| </para> |
| |
| <para> |
| This function can be omitted (set the pointer to NULL), in which case |
| the core code will perform a sequential scan of the entire relation. |
| Such a scan can use synchronized scanning, so that the sampling method |
| cannot assume that the relation pages are visited in the same order on |
| each scan. |
| </para> |
| |
| <para> |
| <programlisting> |
| OffsetNumber |
| NextSampleTuple (SampleScanState *node, |
| BlockNumber blockno, |
| OffsetNumber maxoffset); |
| </programlisting> |
| |
| Returns the offset number of the next tuple to be sampled on the |
| specified page, or <literal>InvalidOffsetNumber</literal> if no tuples remain to |
| be sampled. <literal>maxoffset</literal> is the largest offset number in use |
| on the page. |
| </para> |
| |
| <note> |
| <para> |
| <function>NextSampleTuple</function> is not explicitly told which of the offset |
| numbers in the range <literal>1 .. maxoffset</literal> actually contain valid |
| tuples. This is not normally a problem since the core code ignores |
| requests to sample missing or invisible tuples; that should not result in |
| any bias in the sample. However, if necessary, the function can use |
| <literal>node->donetuples</literal> to examine how many of the tuples |
| it returned were valid and visible. |
| </para> |
| </note> |
| |
| <note> |
| <para> |
| <function>NextSampleTuple</function> must <emphasis>not</emphasis> assume |
| that <literal>blockno</literal> is the same page number returned by the most |
| recent <function>NextSampleBlock</function> call. It was returned by some |
| previous <function>NextSampleBlock</function> call, but the core code is allowed |
| to call <function>NextSampleBlock</function> in advance of actually scanning |
| pages, so as to support prefetching. It is OK to assume that once |
| sampling of a given page begins, successive <function>NextSampleTuple</function> |
| calls all refer to the same page until <literal>InvalidOffsetNumber</literal> is |
| returned. |
| </para> |
| </note> |
| |
| <para> |
| <programlisting> |
| void |
| EndSampleScan (SampleScanState *node); |
| </programlisting> |
| |
| End the scan and release resources. It is normally not important |
| to release palloc'd memory, but any externally-visible resources |
| should be cleaned up. |
| This function can be omitted (set the pointer to NULL) in the common |
| case where no such resources exist. |
| </para> |
| |
| </sect1> |
| |
| </chapter> |