doc/src/sgml/ref/analyze.sgml - cloudberry - Git at Google

 <!--
 doc/src/sgml/ref/analyze.sgml
 PostgreSQL documentation
 -->

 <refentry id="sql-analyze">
  <indexterm zone="sql-analyze">
   <primary>ANALYZE</primary>
  </indexterm>

  <refmeta>
   <refentrytitle>ANALYZE</refentrytitle>
   <manvolnum>7</manvolnum>
   <refmiscinfo>SQL - Language Statements</refmiscinfo>
  </refmeta>

  <refnamediv>
   <refname>ANALYZE</refname>
   <refpurpose>collect statistics about a database</refpurpose>
  </refnamediv>

  <refsynopsisdiv>
 <synopsis>
 ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]
 ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]

 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>

     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
     SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]

 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>

     <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
 </synopsis>
  </refsynopsisdiv>

  <refsect1>
   <title>Description</title>

   <para>
    <command>ANALYZE</command> collects statistics about the contents
    of tables in the database, and stores the results in the <link
    linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
    system catalog.  Subsequently, the query planner uses these
    statistics to help determine the most efficient execution plans for
    queries.
   </para>

   <para>
    Without a <replaceable class="parameter">table_and_columns</replaceable>
    list, <command>ANALYZE</command> processes every table and materialized view
    in the current database that the current user has permission to analyze.
    With a list, <command>ANALYZE</command> processes only those table(s).
    It is further possible to give a list of column names for a table,
    in which case only the statistics for those columns are collected.
   </para>

   <para>
    When the option list is surrounded by parentheses, the options can be
    written in any order.  The parenthesized syntax was added in
    <productname>PostgreSQL</productname> 11;  the unparenthesized syntax
    is deprecated.
   </para>
  </refsect1>

  <refsect1>
   <title>Parameters</title>

   <variablelist>
    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
      <para>
       Enables display of progress messages.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><literal>SKIP_LOCKED</literal></term>
     <listitem>
      <para>
       Specifies that <command>ANALYZE</command> should not wait for any
       conflicting locks to be released when beginning work on a relation:
       if a relation cannot be locked immediately without waiting, the relation
       is skipped.  Note that even with this option, <command>ANALYZE</command>
       may still block when opening the relation's indexes or when acquiring
       sample rows from partitions, table inheritance children, and some
       types of foreign tables.  Also, while <command>ANALYZE</command>
       ordinarily processes all partitions of specified partitioned tables,
       this option will cause <command>ANALYZE</command> to skip all
       partitions if there is a conflicting lock on the partitioned table.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><replaceable class="parameter">boolean</replaceable></term>
     <listitem>
      <para>
       Specifies whether the selected option should be turned on or off.
       You can write <literal>TRUE</literal>, <literal>ON</literal>, or
       <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
       <literal>OFF</literal>, or <literal>0</literal> to disable it.  The
       <replaceable class="parameter">boolean</replaceable> value can also
       be omitted, in which case <literal>TRUE</literal> is assumed.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><replaceable class="parameter">table_name</replaceable></term>
     <listitem>
      <para>
       The name (possibly schema-qualified) of a specific table to
       analyze.  If omitted, all regular tables, partitioned tables, and
       materialized views in the current database are analyzed (but not
       foreign tables).  If the specified table is a partitioned table, both the
       inheritance statistics of the partitioned table as a whole and
       statistics of the individual partitions are updated.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><replaceable class="parameter">column_name</replaceable></term>
     <listitem>
      <para>
       The name of a specific column to analyze. Defaults to all columns.
      </para>
     </listitem>
    </varlistentry>
   </variablelist>
  </refsect1>

  <refsect1>
   <title>Outputs</title>

    <para>
     When <literal>VERBOSE</literal> is specified, <command>ANALYZE</command> emits
     progress messages to indicate which table is currently being
     processed.  Various statistics about the tables are printed as well.
    </para>
  </refsect1>

  <refsect1>
   <title>Notes</title>

   <para>
    To analyze a table, one must ordinarily be the table's owner or a
    superuser.  However, database owners are allowed to
    analyze all tables in their databases, except shared catalogs.
    (The restriction for shared catalogs means that a true database-wide
    <command>ANALYZE</command> can only be performed by a superuser.)
    <command>ANALYZE</command> will skip over any tables that the calling user
    does not have permission to analyze.
   </para>

   <para>
    Foreign tables are analyzed only when explicitly selected.  Not all
    foreign data wrappers support <command>ANALYZE</command>.  If the table's
    wrapper does not support <command>ANALYZE</command>, the command prints a
    warning and does nothing.
   </para>

   <para>
    In the default <productname>PostgreSQL</productname> configuration,
    the autovacuum daemon (see <xref linkend="autovacuum"/>)
    takes care of automatic analyzing of tables when they are first loaded
    with data, and as they change throughout regular operation.
    When autovacuum is disabled,
    it is a good idea to run <command>ANALYZE</command> periodically, or
    just after making major changes in the contents of a table.  Accurate
    statistics will help the planner to choose the most appropriate query
    plan, and thereby improve the speed of query processing.  A common
    strategy for read-mostly databases is to run <link linkend="sql-vacuum"><command>VACUUM</command></link>
    and <command>ANALYZE</command> once a day during a low-usage time of day.
    (This will not be sufficient if there is heavy update activity.)
   </para>

   <para>
    <command>ANALYZE</command>
    requires only a read lock on the target table, so it can run in
    parallel with other activity on the table.
   </para>

   <para>
    The statistics collected by <command>ANALYZE</command> usually
    include a list of some of the most common values in each column and
    a histogram showing the approximate data distribution in each
    column.  One or both of these can be omitted if
    <command>ANALYZE</command> deems them uninteresting (for example,
    in a unique-key column, there are no common values) or if the
    column data type does not support the appropriate operators.  There
    is more information about the statistics in <xref
    linkend="maintenance"/>.
   </para>

   <para>
    For large tables, <command>ANALYZE</command> takes a random sample
    of the table contents, rather than examining every row.  This
    allows even very large tables to be analyzed in a small amount of
    time.  Note, however, that the statistics are only approximate, and
    will change slightly each time <command>ANALYZE</command> is run,
    even if the actual table contents did not change.  This might result
    in small changes in the planner's estimated costs shown by
    <link linkend="sql-explain"><command>EXPLAIN</command></link>.
    In rare situations, this non-determinism will cause the planner's
    choices of query plans to change after <command>ANALYZE</command> is run.
    To avoid this, raise the amount of statistics collected by
    <command>ANALYZE</command>, as described below.
   </para>

   <para>
    The extent of analysis can be controlled by adjusting the
    <xref linkend="guc-default-statistics-target"/> configuration variable, or
    on a column-by-column basis by setting the per-column statistics
    target with <link linkend="sql-altertable"><command>ALTER TABLE ... ALTER COLUMN ... SET
    STATISTICS</command></link>.
    The target value sets the
    maximum number of entries in the most-common-value list and the
    maximum number of bins in the histogram.  The default target value
    is 100, but this can be adjusted up or down to trade off accuracy of
    planner estimates against the time taken for
    <command>ANALYZE</command> and the amount of space occupied in
    <literal>pg_statistic</literal>.  In particular, setting the
    statistics target to zero disables collection of statistics for
    that column.  It might be useful to do that for columns that are
    never used as part of the <literal>WHERE</literal>, <literal>GROUP BY</literal>,
    or <literal>ORDER BY</literal> clauses of queries, since the planner will
    have no use for statistics on such columns.
   </para>

   <para>
    The largest statistics target among the columns being analyzed determines
    the number of table rows sampled to prepare the statistics.  Increasing
    the target causes a proportional increase in the time and space needed
    to do <command>ANALYZE</command>.
   </para>

   <para>
    One of the values estimated by <command>ANALYZE</command> is the number of
    distinct values that appear in each column.  Because only a subset of the
    rows are examined, this estimate can sometimes be quite inaccurate, even
    with the largest possible statistics target.  If this inaccuracy leads to
    bad query plans, a more accurate value can be determined manually and then
    installed with
    <link linkend="sql-altertable"><command>ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...)</command></link>.
   </para>

   <para>
     If the table being analyzed has one or more children,
     <command>ANALYZE</command> will gather statistics twice: once on the
     rows of the parent table only, and a second time on the rows of the
     parent table with all of its children.  This second set of statistics
     is needed when planning queries that traverse the entire inheritance
     tree.  The autovacuum daemon, however, will only consider inserts or
     updates on the parent table itself when deciding whether to trigger an
     automatic analyze for that table.  If that table is rarely inserted into
     or updated, the inheritance statistics will not be up to date unless you
     run <command>ANALYZE</command> manually.
   </para>

   <para>
     For partitioned tables, <command>ANALYZE</command> gathers statistics by
     sampling rows from all partitions; in addition, it will recurse into each
     partition and update its statistics.  Each leaf partition is analyzed only
     once, even with multi-level partitioning.  No statistics are collected for
     only the parent table (without data from its partitions), because with
     partitioning it's guaranteed to be empty.
   </para>

   <para>
     By contrast, if the table being analyzed has inheritance children,
     <command>ANALYZE</command> gathers two sets of statistics: one on the rows
     of the parent table only, and a second including rows of both the parent
     table and all of its children.  This second set of statistics is needed when
     planning queries that process the inheritance tree as a whole.  The child
     tables themselves are not individually analyzed in this case.
   </para>

   <para>
     The autovacuum daemon does not process partitioned tables, nor does it
     process inheritance parents if only the children are ever modified.
     It is usually necessary to periodically run a manual
     <command>ANALYZE</command> to keep the statistics of the table hierarchy
     up to date.
   </para>

   <para>
     If any child tables or partitions are foreign tables whose foreign
     data wrappers do not support <command>ANALYZE</command>, those tables are
     ignored while gathering inheritance statistics.
   </para>

   <para>
     If the table being analyzed is completely empty, <command>ANALYZE</command>
     will not record new statistics for that table.  Any existing statistics
     will be retained.
   </para>

   <para>
     Each backend running <command>ANALYZE</command> will report its progress
     in the <structname>pg_stat_progress_analyze</structname> view. See
     <xref linkend="analyze-progress-reporting"/> for details.
   </para>
  </refsect1>

  <refsect1>
   <title>Compatibility</title>

   <para>
    There is no <command>ANALYZE</command> statement in the SQL standard.
   </para>
  </refsect1>

  <refsect1>
   <title>See Also</title>

   <simplelist type="inline">
    <member><xref linkend="sql-vacuum"/></member>
    <member><xref linkend="app-vacuumdb"/></member>
    <member><xref linkend="runtime-config-resource-vacuum-cost"/></member>
    <member><xref linkend="autovacuum"/></member>
    <member><xref linkend="analyze-progress-reporting"/></member>
   </simplelist>
  </refsect1>
 </refentry>
	<!--
	doc/src/sgml/ref/analyze.sgml
	PostgreSQL documentation
	-->

	<refentry id="sql-analyze">
	<indexterm zone="sql-analyze">
	<primary>ANALYZE</primary>
	</indexterm>

	<refmeta>
	<refentrytitle>ANALYZE</refentrytitle>
	<manvolnum>7</manvolnum>
	<refmiscinfo>SQL - Language Statements</refmiscinfo>
	</refmeta>

	<refnamediv>
	<refname>ANALYZE</refname>
	<refpurpose>collect statistics about a database</refpurpose>
	</refnamediv>

	<refsynopsisdiv>
	<synopsis>
	ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]
	ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]

	<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>

	VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
	SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]

	<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>

	<replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
	</synopsis>
	</refsynopsisdiv>

	<refsect1>
	<title>Description</title>

	<para>
	<command>ANALYZE</command> collects statistics about the contents
	of tables in the database, and stores the results in the <link
	linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
	system catalog. Subsequently, the query planner uses these
	statistics to help determine the most efficient execution plans for
	queries.
	</para>

	<para>
	Without a <replaceable class="parameter">table_and_columns</replaceable>
	list, <command>ANALYZE</command> processes every table and materialized view
	in the current database that the current user has permission to analyze.
	With a list, <command>ANALYZE</command> processes only those table(s).
	It is further possible to give a list of column names for a table,
	in which case only the statistics for those columns are collected.
	</para>

	<para>
	When the option list is surrounded by parentheses, the options can be
	written in any order. The parenthesized syntax was added in
	<productname>PostgreSQL</productname> 11; the unparenthesized syntax
	is deprecated.
	</para>
	</refsect1>

	<refsect1>
	<title>Parameters</title>

	<variablelist>
	<varlistentry>
	<term><literal>VERBOSE</literal></term>
	<listitem>
	<para>
	Enables display of progress messages.
	</para>
	</listitem>
	</varlistentry>

	<varlistentry>
	<term><literal>SKIP_LOCKED</literal></term>
	<listitem>
	<para>
	Specifies that <command>ANALYZE</command> should not wait for any
	conflicting locks to be released when beginning work on a relation:
	if a relation cannot be locked immediately without waiting, the relation
	is skipped. Note that even with this option, <command>ANALYZE</command>
	may still block when opening the relation's indexes or when acquiring
	sample rows from partitions, table inheritance children, and some
	types of foreign tables. Also, while <command>ANALYZE</command>
	ordinarily processes all partitions of specified partitioned tables,
	this option will cause <command>ANALYZE</command> to skip all
	partitions if there is a conflicting lock on the partitioned table.
	</para>
	</listitem>
	</varlistentry>

	<varlistentry>
	<term><replaceable class="parameter">boolean</replaceable></term>
	<listitem>
	<para>
	Specifies whether the selected option should be turned on or off.
	You can write <literal>TRUE</literal>, <literal>ON</literal>, or
	<literal>1</literal> to enable the option, and <literal>FALSE</literal>,
	<literal>OFF</literal>, or <literal>0</literal> to disable it. The
	<replaceable class="parameter">boolean</replaceable> value can also
	be omitted, in which case <literal>TRUE</literal> is assumed.
	</para>
	</listitem>
	</varlistentry>

	<varlistentry>
	<term><replaceable class="parameter">table_name</replaceable></term>
	<listitem>
	<para>
	The name (possibly schema-qualified) of a specific table to
	analyze. If omitted, all regular tables, partitioned tables, and
	materialized views in the current database are analyzed (but not
	foreign tables). If the specified table is a partitioned table, both the
	inheritance statistics of the partitioned table as a whole and
	statistics of the individual partitions are updated.
	</para>
	</listitem>
	</varlistentry>

	<varlistentry>
	<term><replaceable class="parameter">column_name</replaceable></term>
	<listitem>
	<para>
	The name of a specific column to analyze. Defaults to all columns.
	</para>
	</listitem>
	</varlistentry>
	</variablelist>
	</refsect1>

	<refsect1>
	<title>Outputs</title>

	<para>
	When <literal>VERBOSE</literal> is specified, <command>ANALYZE</command> emits
	progress messages to indicate which table is currently being
	processed. Various statistics about the tables are printed as well.
	</para>
	</refsect1>

	<refsect1>
	<title>Notes</title>

	<para>
	To analyze a table, one must ordinarily be the table's owner or a
	superuser. However, database owners are allowed to
	analyze all tables in their databases, except shared catalogs.
	(The restriction for shared catalogs means that a true database-wide
	<command>ANALYZE</command> can only be performed by a superuser.)
	<command>ANALYZE</command> will skip over any tables that the calling user
	does not have permission to analyze.
	</para>

	<para>
	Foreign tables are analyzed only when explicitly selected. Not all
	foreign data wrappers support <command>ANALYZE</command>. If the table's
	wrapper does not support <command>ANALYZE</command>, the command prints a
	warning and does nothing.
	</para>

	<para>
	In the default <productname>PostgreSQL</productname> configuration,
	the autovacuum daemon (see <xref linkend="autovacuum"/>)
	takes care of automatic analyzing of tables when they are first loaded
	with data, and as they change throughout regular operation.
	When autovacuum is disabled,
	it is a good idea to run <command>ANALYZE</command> periodically, or
	just after making major changes in the contents of a table. Accurate
	statistics will help the planner to choose the most appropriate query
	plan, and thereby improve the speed of query processing. A common
	strategy for read-mostly databases is to run <link linkend="sql-vacuum"><command>VACUUM</command></link>
	and <command>ANALYZE</command> once a day during a low-usage time of day.
	(This will not be sufficient if there is heavy update activity.)
	</para>

	<para>
	<command>ANALYZE</command>
	requires only a read lock on the target table, so it can run in
	parallel with other activity on the table.
	</para>

	<para>
	The statistics collected by <command>ANALYZE</command> usually
	include a list of some of the most common values in each column and
	a histogram showing the approximate data distribution in each
	column. One or both of these can be omitted if
	<command>ANALYZE</command> deems them uninteresting (for example,
	in a unique-key column, there are no common values) or if the
	column data type does not support the appropriate operators. There
	is more information about the statistics in <xref
	linkend="maintenance"/>.
	</para>

	<para>
	For large tables, <command>ANALYZE</command> takes a random sample
	of the table contents, rather than examining every row. This
	allows even very large tables to be analyzed in a small amount of
	time. Note, however, that the statistics are only approximate, and
	will change slightly each time <command>ANALYZE</command> is run,
	even if the actual table contents did not change. This might result
	in small changes in the planner's estimated costs shown by
	<link linkend="sql-explain"><command>EXPLAIN</command></link>.
	In rare situations, this non-determinism will cause the planner's
	choices of query plans to change after <command>ANALYZE</command> is run.
	To avoid this, raise the amount of statistics collected by
	<command>ANALYZE</command>, as described below.
	</para>

	<para>
	The extent of analysis can be controlled by adjusting the
	<xref linkend="guc-default-statistics-target"/> configuration variable, or
	on a column-by-column basis by setting the per-column statistics
	target with <link linkend="sql-altertable"><command>ALTER TABLE ... ALTER COLUMN ... SET
	STATISTICS</command></link>.
	The target value sets the
	maximum number of entries in the most-common-value list and the
	maximum number of bins in the histogram. The default target value
	is 100, but this can be adjusted up or down to trade off accuracy of
	planner estimates against the time taken for
	<command>ANALYZE</command> and the amount of space occupied in
	<literal>pg_statistic</literal>. In particular, setting the
	statistics target to zero disables collection of statistics for
	that column. It might be useful to do that for columns that are
	never used as part of the <literal>WHERE</literal>, <literal>GROUP BY</literal>,
	or <literal>ORDER BY</literal> clauses of queries, since the planner will
	have no use for statistics on such columns.
	</para>

	<para>
	The largest statistics target among the columns being analyzed determines
	the number of table rows sampled to prepare the statistics. Increasing
	the target causes a proportional increase in the time and space needed
	to do <command>ANALYZE</command>.
	</para>

	<para>
	One of the values estimated by <command>ANALYZE</command> is the number of
	distinct values that appear in each column. Because only a subset of the
	rows are examined, this estimate can sometimes be quite inaccurate, even
	with the largest possible statistics target. If this inaccuracy leads to
	bad query plans, a more accurate value can be determined manually and then
	installed with
	<link linkend="sql-altertable"><command>ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...)</command></link>.
	</para>

	<para>
	If the table being analyzed has one or more children,
	<command>ANALYZE</command> will gather statistics twice: once on the
	rows of the parent table only, and a second time on the rows of the
	parent table with all of its children. This second set of statistics
	is needed when planning queries that traverse the entire inheritance
	tree. The autovacuum daemon, however, will only consider inserts or
	updates on the parent table itself when deciding whether to trigger an
	automatic analyze for that table. If that table is rarely inserted into
	or updated, the inheritance statistics will not be up to date unless you
	run <command>ANALYZE</command> manually.
	</para>

	<para>
	For partitioned tables, <command>ANALYZE</command> gathers statistics by
	sampling rows from all partitions; in addition, it will recurse into each
	partition and update its statistics. Each leaf partition is analyzed only
	once, even with multi-level partitioning. No statistics are collected for
	only the parent table (without data from its partitions), because with
	partitioning it's guaranteed to be empty.
	</para>

	<para>
	By contrast, if the table being analyzed has inheritance children,
	<command>ANALYZE</command> gathers two sets of statistics: one on the rows
	of the parent table only, and a second including rows of both the parent
	table and all of its children. This second set of statistics is needed when
	planning queries that process the inheritance tree as a whole. The child
	tables themselves are not individually analyzed in this case.
	</para>

	<para>
	The autovacuum daemon does not process partitioned tables, nor does it
	process inheritance parents if only the children are ever modified.
	It is usually necessary to periodically run a manual
	<command>ANALYZE</command> to keep the statistics of the table hierarchy
	up to date.
	</para>

	<para>
	If any child tables or partitions are foreign tables whose foreign
	data wrappers do not support <command>ANALYZE</command>, those tables are
	ignored while gathering inheritance statistics.
	</para>

	<para>
	If the table being analyzed is completely empty, <command>ANALYZE</command>
	will not record new statistics for that table. Any existing statistics
	will be retained.
	</para>

	<para>
	Each backend running <command>ANALYZE</command> will report its progress
	in the <structname>pg_stat_progress_analyze</structname> view. See
	<xref linkend="analyze-progress-reporting"/> for details.
	</para>
	</refsect1>

	<refsect1>
	<title>Compatibility</title>

	<para>
	There is no <command>ANALYZE</command> statement in the SQL standard.
	</para>
	</refsect1>

	<refsect1>
	<title>See Also</title>

	<simplelist type="inline">
	<member><xref linkend="sql-vacuum"/></member>
	<member><xref linkend="app-vacuumdb"/></member>
	<member><xref linkend="runtime-config-resource-vacuum-cost"/></member>
	<member><xref linkend="autovacuum"/></member>
	<member><xref linkend="analyze-progress-reporting"/></member>
	</simplelist>
	</refsect1>
	</refentry>