| <!-- |
| doc/src/sgml/ref/cluster.sgml |
| PostgreSQL documentation |
| --> |
| |
| <refentry id="sql-cluster"> |
| <indexterm zone="sql-cluster"> |
| <primary>CLUSTER</primary> |
| </indexterm> |
| |
| <refmeta> |
| <refentrytitle>CLUSTER</refentrytitle> |
| <manvolnum>7</manvolnum> |
| <refmiscinfo>SQL - Language Statements</refmiscinfo> |
| </refmeta> |
| |
| <refnamediv> |
| <refname>CLUSTER</refname> |
| <refpurpose>cluster a heap storage table according to an index</refpurpose> |
| </refnamediv> |
| |
| <refsynopsisdiv> |
| <synopsis> |
| CLUSTER [VERBOSE] <replaceable class="parameter">table_name</replaceable> [ USING <replaceable class="parameter">index_name</replaceable> ] |
| CLUSTER ( <replaceable class="parameter">option</replaceable> [, ...] ) <replaceable class="parameter">table_name</replaceable> [ USING <replaceable class="parameter">index_name</replaceable> ] |
| CLUSTER [VERBOSE] |
| |
| <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase> |
| |
| VERBOSE [ <replaceable class="parameter">boolean</replaceable> ] |
| </synopsis> |
| </refsynopsisdiv> |
| |
| <refsect1> |
| <title>Description</title> |
| |
| <para> |
| <command>CLUSTER</command> instructs <productname>PostgreSQL</productname> |
| to cluster the table specified |
| by <replaceable class="parameter">table_name</replaceable> |
| based on the index specified by |
| <replaceable class="parameter">index_name</replaceable>. The index must |
| already have been defined on |
| <replaceable class="parameter">table_name</replaceable>. |
| </para> |
| |
| <para> |
| When a table is clustered, it is physically reordered |
| based on the index information. Clustering is a one-time operation: |
| when the table is subsequently updated, the changes are |
| not clustered. That is, no attempt is made to store new or |
| updated rows according to their index order. (If one wishes, one can |
| periodically recluster by issuing the command again. Also, setting |
| the table's <literal>fillfactor</literal> storage parameter to less than |
| 100% can aid in preserving cluster ordering during updates, since updated |
| rows are kept on the same page if enough space is available there.) |
| </para> |
| |
| <para> |
| When a table is clustered, <productname>PostgreSQL</productname> |
| remembers which index it was clustered by. The form |
| <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command> |
| reclusters the table using the same index as before. You can also |
| use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal> |
| forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for |
| future cluster operations, or to clear any previous setting. |
| </para> |
| |
| <para> |
| <command>CLUSTER</command> without any parameter reclusters all the |
| previously-clustered tables in the current database that the calling user |
| owns, or all such tables if called by a superuser. This |
| form of <command>CLUSTER</command> cannot be executed inside a transaction |
| block. |
| </para> |
| |
| <para> |
| When a table is being clustered, an <literal>ACCESS |
| EXCLUSIVE</literal> lock is acquired on it. This prevents any other |
| database operations (both reads and writes) from operating on the |
| table until the <command>CLUSTER</command> is finished. |
| </para> |
| </refsect1> |
| |
| <refsect1> |
| <title>Parameters</title> |
| |
| <variablelist> |
| <varlistentry> |
| <term><replaceable class="parameter">table_name</replaceable></term> |
| <listitem> |
| <para> |
| The name (possibly schema-qualified) of a table. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><replaceable class="parameter">index_name</replaceable></term> |
| <listitem> |
| <para> |
| The name of an index. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><literal>VERBOSE</literal></term> |
| <listitem> |
| <para> |
| Prints a progress report as each table is clustered. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><replaceable class="parameter">boolean</replaceable></term> |
| <listitem> |
| <para> |
| Specifies whether the selected option should be turned on or off. |
| You can write <literal>TRUE</literal>, <literal>ON</literal>, or |
| <literal>1</literal> to enable the option, and <literal>FALSE</literal>, |
| <literal>OFF</literal>, or <literal>0</literal> to disable it. The |
| <replaceable class="parameter">boolean</replaceable> value can also |
| be omitted, in which case <literal>TRUE</literal> is assumed. |
| </para> |
| </listitem> |
| </varlistentry> |
| </variablelist> |
| </refsect1> |
| |
| <refsect1> |
| <title>Notes</title> |
| |
| <para> |
| <command>CLUSTER</command> loses all visibility information of tuples, |
| which makes the table look empty to any snapshot that was taken |
| before the <command>CLUSTER</command> command finished. That makes |
| <command>CLUSTER</command> unsuitable for applications where |
| transactions that access the table being clustered are run concurrently |
| with <command>CLUSTER</command>. This is most visible with serializable |
| transactions, because they take only one snapshot at the beginning of the |
| transaction, but read-committed transactions are also affected. |
| </para> |
| |
| <para> |
| In cases where you are accessing single rows randomly |
| within a table, the actual order of the data in the |
| table is unimportant. However, if you tend to access some |
| data more than others, and there is an index that groups |
| them together, you will benefit from using <command>CLUSTER</command>. |
| If you are requesting a range of indexed values from a table, or a |
| single indexed value that has multiple rows that match, |
| <command>CLUSTER</command> will help because once the index identifies the |
| table page for the first row that matches, all other rows |
| that match are probably already on the same table page, |
| and so you save disk accesses and speed up the query. |
| </para> |
| |
| <para> |
| <command>CLUSTER</command> can re-sort the table using either an index scan |
| on the specified index, or (if the index is a b-tree) a sequential |
| scan followed by sorting. It will attempt to choose the method that |
| will be faster, based on planner cost parameters and available statistical |
| information. |
| </para> |
| |
| <para> |
| When an index scan is used, a temporary copy of the table is created that |
| contains the table data in the index order. Temporary copies of each |
| index on the table are created as well. Therefore, you need free space on |
| disk at least equal to the sum of the table size and the index sizes. |
| </para> |
| |
| <para> |
| When a sequential scan and sort is used, a temporary sort file is |
| also created, so that the peak temporary space requirement is as much |
| as double the table size, plus the index sizes. This method is often |
| faster than the index scan method, but if the disk space requirement is |
| intolerable, you can disable this choice by temporarily setting <xref |
| linkend="guc-enable-sort"/> to <literal>off</literal>. |
| </para> |
| |
| <para> |
| It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to |
| a reasonably large value (but not more than the amount of RAM you can |
| dedicate to the <command>CLUSTER</command> operation) before clustering. |
| </para> |
| |
| <para> |
| Because the planner records statistics about the ordering of |
| tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link> |
| on the newly clustered table. |
| Otherwise, the planner might make poor choices of query plans. |
| </para> |
| |
| <para> |
| Because <command>CLUSTER</command> remembers which indexes are clustered, |
| one can cluster the tables one wants clustered manually the first time, |
| then set up a periodic maintenance script that executes |
| <command>CLUSTER</command> without any parameters, so that the desired tables |
| are periodically reclustered. |
| </para> |
| |
| <para> |
| Each backend running <command>CLUSTER</command> will report its progress |
| in the <structname>pg_stat_progress_cluster</structname> view. See |
| <xref linkend="cluster-progress-reporting"/> for details. |
| </para> |
| </refsect1> |
| |
| <refsect1> |
| <title>Examples</title> |
| |
| <para> |
| Cluster the table <literal>employees</literal> on the basis of |
| its index <literal>employees_ind</literal>: |
| <programlisting> |
| CLUSTER employees USING employees_ind; |
| </programlisting> |
| </para> |
| |
| <para> |
| Cluster the <literal>employees</literal> table using the same |
| index that was used before: |
| <programlisting> |
| CLUSTER employees; |
| </programlisting> |
| </para> |
| |
| <para> |
| Cluster all tables in the database that have previously been clustered: |
| <programlisting> |
| CLUSTER; |
| </programlisting></para> |
| </refsect1> |
| |
| <refsect1> |
| <title>Compatibility</title> |
| |
| <para> |
| There is no <command>CLUSTER</command> statement in the SQL standard. |
| </para> |
| |
| <para> |
| The syntax |
| <synopsis> |
| CLUSTER <replaceable class="parameter">index_name</replaceable> ON <replaceable class="parameter">table_name</replaceable> |
| </synopsis> |
| is also supported for compatibility with pre-8.3 <productname>PostgreSQL</productname> |
| versions. |
| </para> |
| </refsect1> |
| |
| <refsect1> |
| <title>See Also</title> |
| |
| <simplelist type="inline"> |
| <member><xref linkend="app-clusterdb"/></member> |
| <member><xref linkend="cluster-progress-reporting"/></member> |
| </simplelist> |
| </refsect1> |
| </refentry> |