| <!-- doc/src/sgml/seg.sgml --> |
| |
| <sect1 id="seg" xreflabel="seg"> |
| <title>seg</title> |
| |
| <indexterm zone="seg"> |
| <primary>seg</primary> |
| </indexterm> |
| |
| <para> |
| This module implements a data type <type>seg</type> for |
| representing line segments, or floating point intervals. |
| <type>seg</type> can represent uncertainty in the interval endpoints, |
| making it especially useful for representing laboratory measurements. |
| </para> |
| |
| <para> |
| This module is considered <quote>trusted</quote>, that is, it can be |
| installed by non-superusers who have <literal>CREATE</literal> privilege |
| on the current database. |
| </para> |
| |
| <sect2> |
| <title>Rationale</title> |
| |
| <para> |
| The geometry of measurements is usually more complex than that of a |
| point in a numeric continuum. A measurement is usually a segment of |
| that continuum with somewhat fuzzy limits. The measurements come out |
| as intervals because of uncertainty and randomness, as well as because |
| the value being measured may naturally be an interval indicating some |
| condition, such as the temperature range of stability of a protein. |
| </para> |
| |
| <para> |
| Using just common sense, it appears more convenient to store such data |
| as intervals, rather than pairs of numbers. In practice, it even turns |
| out more efficient in most applications. |
| </para> |
| |
| <para> |
| Further along the line of common sense, the fuzziness of the limits |
| suggests that the use of traditional numeric data types leads to a |
| certain loss of information. Consider this: your instrument reads |
| 6.50, and you input this reading into the database. What do you get |
| when you fetch it? Watch: |
| |
| <screen> |
| test=> select 6.50 :: float8 as "pH"; |
| pH |
| --- |
| 6.5 |
| (1 row) |
| </screen> |
| |
| In the world of measurements, 6.50 is not the same as 6.5. It may |
| sometimes be critically different. The experimenters usually write |
| down (and publish) the digits they trust. 6.50 is actually a fuzzy |
| interval contained within a bigger and even fuzzier interval, 6.5, |
| with their center points being (probably) the only common feature they |
| share. We definitely do not want such different data items to appear the |
| same. |
| </para> |
| |
| <para> |
| Conclusion? It is nice to have a special data type that can record the |
| limits of an interval with arbitrarily variable precision. Variable in |
| the sense that each data element records its own precision. |
| </para> |
| |
| <para> |
| Check this out: |
| |
| <screen> |
| test=> select '6.25 .. 6.50'::seg as "pH"; |
| pH |
| ------------ |
| 6.25 .. 6.50 |
| (1 row) |
| </screen> |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>Syntax</title> |
| |
| <para> |
| The external representation of an interval is formed using one or two |
| floating-point numbers joined by the range operator (<literal>..</literal> |
| or <literal>...</literal>). Alternatively, it can be specified as a |
| center point plus or minus a deviation. |
| Optional certainty indicators (<literal><</literal>, |
| <literal>></literal> or <literal>~</literal>) can be stored as well. |
| (Certainty indicators are ignored by all the built-in operators, however.) |
| <xref linkend="seg-repr-table"/> gives an overview of allowed |
| representations; <xref linkend="seg-input-examples"/> shows some |
| examples. |
| </para> |
| |
| <para> |
| In <xref linkend="seg-repr-table"/>, <replaceable>x</replaceable>, <replaceable>y</replaceable>, and |
| <replaceable>delta</replaceable> denote |
| floating-point numbers. <replaceable>x</replaceable> and <replaceable>y</replaceable>, but |
| not <replaceable>delta</replaceable>, can be preceded by a certainty indicator. |
| </para> |
| |
| <table id="seg-repr-table"> |
| <title><type>seg</type> External Representations</title> |
| <tgroup cols="2"> |
| <tbody> |
| <row> |
| <entry><literal><replaceable>x</replaceable></literal></entry> |
| <entry>Single value (zero-length interval) |
| </entry> |
| </row> |
| <row> |
| <entry><literal><replaceable>x</replaceable> .. <replaceable>y</replaceable></literal></entry> |
| <entry>Interval from <replaceable>x</replaceable> to <replaceable>y</replaceable> |
| </entry> |
| </row> |
| <row> |
| <entry><literal><replaceable>x</replaceable> (+-) <replaceable>delta</replaceable></literal></entry> |
| <entry>Interval from <replaceable>x</replaceable> - <replaceable>delta</replaceable> to |
| <replaceable>x</replaceable> + <replaceable>delta</replaceable> |
| </entry> |
| </row> |
| <row> |
| <entry><literal><replaceable>x</replaceable> ..</literal></entry> |
| <entry>Open interval with lower bound <replaceable>x</replaceable> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>.. <replaceable>x</replaceable></literal></entry> |
| <entry>Open interval with upper bound <replaceable>x</replaceable> |
| </entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <table id="seg-input-examples"> |
| <title>Examples of Valid <type>seg</type> Input</title> |
| <tgroup cols="2"> |
| <colspec colname="col1" colwidth="1*"/> |
| <colspec colname="col2" colwidth="2*"/> |
| <tbody> |
| <row> |
| <entry><literal>5.0</literal></entry> |
| <entry> |
| Creates a zero-length segment (a point, if you will) |
| </entry> |
| </row> |
| <row> |
| <entry><literal>~5.0</literal></entry> |
| <entry> |
| Creates a zero-length segment and records |
| <literal>~</literal> in the data. <literal>~</literal> is ignored |
| by <type>seg</type> operations, but |
| is preserved as a comment. |
| </entry> |
| </row> |
| <row> |
| <entry><literal><5.0</literal></entry> |
| <entry> |
| Creates a point at 5.0. <literal><</literal> is ignored but |
| is preserved as a comment. |
| </entry> |
| </row> |
| <row> |
| <entry><literal>>5.0</literal></entry> |
| <entry> |
| Creates a point at 5.0. <literal>></literal> is ignored but |
| is preserved as a comment. |
| </entry> |
| </row> |
| <row> |
| <entry><literal>5(+-)0.3</literal></entry> |
| <entry> |
| Creates an interval <literal>4.7 .. 5.3</literal>. |
| Note that the <literal>(+-)</literal> notation isn't preserved. |
| </entry> |
| </row> |
| <row> |
| <entry><literal>50 .. </literal></entry> |
| <entry>Everything that is greater than or equal to 50</entry> |
| </row> |
| <row> |
| <entry><literal>.. 0</literal></entry> |
| <entry>Everything that is less than or equal to 0</entry> |
| </row> |
| <row> |
| <entry><literal>1.5e-2 .. 2E-2 </literal></entry> |
| <entry>Creates an interval <literal>0.015 .. 0.02</literal></entry> |
| </row> |
| <row> |
| <entry><literal>1 ... 2</literal></entry> |
| <entry> |
| The same as <literal>1...2</literal>, or <literal>1 .. 2</literal>, |
| or <literal>1..2</literal> |
| (spaces around the range operator are ignored) |
| </entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <para> |
| Because the <literal>...</literal> operator is widely used in data sources, it is allowed |
| as an alternative spelling of the <literal>..</literal> operator. Unfortunately, this |
| creates a parsing ambiguity: it is not clear whether the upper bound |
| in <literal>0...23</literal> is meant to be <literal>23</literal> or <literal>0.23</literal>. |
| This is resolved by requiring at least one digit before the decimal |
| point in all numbers in <type>seg</type> input. |
| </para> |
| |
| <para> |
| As a sanity check, <type>seg</type> rejects intervals with the lower bound |
| greater than the upper, for example <literal>5 .. 2</literal>. |
| </para> |
| |
| </sect2> |
| |
| <sect2> |
| <title>Precision</title> |
| |
| <para> |
| <type>seg</type> values are stored internally as pairs of 32-bit floating point |
| numbers. This means that numbers with more than 7 significant digits |
| will be truncated. |
| </para> |
| |
| <para> |
| Numbers with 7 or fewer significant digits retain their |
| original precision. That is, if your query returns 0.00, you will be |
| sure that the trailing zeroes are not the artifacts of formatting: they |
| reflect the precision of the original data. The number of leading |
| zeroes does not affect precision: the value 0.0067 is considered to |
| have just 2 significant digits. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>Usage</title> |
| |
| <para> |
| The <filename>seg</filename> module includes a GiST index operator class for |
| <type>seg</type> values. |
| The operators supported by the GiST operator class are shown in <xref linkend="seg-gist-operators"/>. |
| </para> |
| |
| <table id="seg-gist-operators"> |
| <title>Seg GiST Operators</title> |
| <tgroup cols="1"> |
| <thead> |
| <row> |
| <entry role="func_table_entry"><para role="func_signature"> |
| Operator |
| </para> |
| <para> |
| Description |
| </para></entry> |
| </row> |
| </thead> |
| |
| <tbody> |
| <row> |
| <entry role="func_table_entry"><para role="func_signature"> |
| <type>seg</type> <literal><<</literal> <type>seg</type> |
| <returnvalue>boolean</returnvalue> |
| </para> |
| <para> |
| Is the first <type>seg</type> entirely to the left of the second? |
| [a, b] << [c, d] is true if b < c. |
| </para></entry> |
| </row> |
| |
| <row> |
| <entry role="func_table_entry"><para role="func_signature"> |
| <type>seg</type> <literal>>></literal> <type>seg</type> |
| <returnvalue>boolean</returnvalue> |
| </para> |
| <para> |
| Is the first <type>seg</type> entirely to the right of the second? |
| [a, b] >> [c, d] is true if a > d. |
| </para></entry> |
| </row> |
| |
| <row> |
| <entry role="func_table_entry"><para role="func_signature"> |
| <type>seg</type> <literal>&<</literal> <type>seg</type> |
| <returnvalue>boolean</returnvalue> |
| </para> |
| <para> |
| Does the first <type>seg</type> not extend to the right of the |
| second? |
| [a, b] &< [c, d] is true if b <= d. |
| </para></entry> |
| </row> |
| |
| <row> |
| <entry role="func_table_entry"><para role="func_signature"> |
| <type>seg</type> <literal>&></literal> <type>seg</type> |
| <returnvalue>boolean</returnvalue> |
| </para> |
| <para> |
| Does the first <type>seg</type> not extend to the left of the |
| second? |
| [a, b] &> [c, d] is true if a >= c. |
| </para></entry> |
| </row> |
| |
| <row> |
| <entry role="func_table_entry"><para role="func_signature"> |
| <type>seg</type> <literal>=</literal> <type>seg</type> |
| <returnvalue>boolean</returnvalue> |
| </para> |
| <para> |
| Are the two <type>seg</type>s equal? |
| </para></entry> |
| </row> |
| |
| <row> |
| <entry role="func_table_entry"><para role="func_signature"> |
| <type>seg</type> <literal>&&</literal> <type>seg</type> |
| <returnvalue>boolean</returnvalue> |
| </para> |
| <para> |
| Do the two <type>seg</type>s overlap? |
| </para></entry> |
| </row> |
| |
| <row> |
| <entry role="func_table_entry"><para role="func_signature"> |
| <type>seg</type> <literal>@></literal> <type>seg</type> |
| <returnvalue>boolean</returnvalue> |
| </para> |
| <para> |
| Does the first <type>seg</type> contain the second? |
| </para></entry> |
| </row> |
| |
| <row> |
| <entry role="func_table_entry"><para role="func_signature"> |
| <type>seg</type> <literal><@</literal> <type>seg</type> |
| <returnvalue>boolean</returnvalue> |
| </para> |
| <para> |
| Is the first <type>seg</type> contained in the second? |
| </para></entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <para> |
| In addition to the above operators, the usual comparison |
| operators shown in <xref linkend="functions-comparison-op-table"/> are |
| available for type <type>seg</type>. These operators |
| first compare (a) to (c), |
| and if these are equal, compare (b) to (d). That results in |
| reasonably good sorting in most cases, which is useful if |
| you want to use ORDER BY with this type. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>Notes</title> |
| |
| <para> |
| For examples of usage, see the regression test <filename>sql/seg.sql</filename>. |
| </para> |
| |
| <para> |
| The mechanism that converts <literal>(+-)</literal> to regular ranges |
| isn't completely accurate in determining the number of significant digits |
| for the boundaries. For example, it adds an extra digit to the lower |
| boundary if the resulting interval includes a power of ten: |
| |
| <screen> |
| postgres=> select '10(+-)1'::seg as seg; |
| seg |
| --------- |
| 9.0 .. 11 -- should be: 9 .. 11 |
| </screen> |
| </para> |
| |
| <para> |
| The performance of an R-tree index can largely depend on the initial |
| order of input values. It may be very helpful to sort the input table |
| on the <type>seg</type> column; see the script <filename>sort-segments.pl</filename> |
| for an example. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>Credits</title> |
| |
| <para> |
| Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>, |
| Mathematics and Computer Science Division, Argonne National Laboratory. |
| </para> |
| |
| <para> |
| My thanks are primarily to Prof. Joe Hellerstein |
| (<ulink url="https://dsf.berkeley.edu/jmh/"></ulink>) for elucidating the |
| gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am |
| also grateful to all Postgres developers, present and past, for enabling |
| myself to create my own world and live undisturbed in it. And I would like |
| to acknowledge my gratitude to Argonne Lab and to the U.S. Department of |
| Energy for the years of faithful support of my database research. |
| </para> |
| |
| </sect2> |
| |
| </sect1> |