doc/src/sgml/ref/explain.sgml - hawq - Git at Google

 <!--
 $PostgreSQL: pgsql/doc/src/sgml/ref/explain.sgml,v 1.38 2006/09/18 19:54:01 tgl Exp $
 PostgreSQL documentation
 -->

 <refentry id="SQL-EXPLAIN">
  <refmeta>
   <refentrytitle id="SQL-EXPLAIN-TITLE">EXPLAIN</refentrytitle>
   <refmiscinfo>SQL - Language Statements</refmiscinfo>
  </refmeta>

  <refnamediv>
   <refname>EXPLAIN</refname>
   <refpurpose>show the execution plan of a statement</refpurpose>
  </refnamediv>

  <indexterm zone="sql-explain">
   <primary>EXPLAIN</primary>
  </indexterm>

  <indexterm zone="sql-explain">
   <primary>prepared statements</primary>
   <secondary>showing the query plan</secondary>
  </indexterm>

  <indexterm zone="sql-explain">
   <primary>cursor</primary>
   <secondary>showing the query plan</secondary>
  </indexterm>

  <refsynopsisdiv>
 <synopsis>
 EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replaceable>
 </synopsis>
  </refsynopsisdiv>

  <refsect1>
   <title>Description</title>

   <para>
    This command displays the execution plan that the
    <productname>PostgreSQL</productname> planner generates for the
    supplied statement.  The execution plan shows how the table(s)
    referenced by the statement will be scanned &mdash; by plain sequential scan,
    index scan, etc. &mdash; and if multiple tables are referenced, what join
    algorithms will be used to bring together the required rows from
    each input table.
   </para>

   <para>
    The most critical part of the display is the estimated statement execution
    cost, which is the planner's guess at how long it will take to run the
    statement (measured in units of disk page fetches).  Actually two numbers
    are shown: the start-up time before the first row can be returned, and
    the total time to return all the rows.  For most queries the total time
    is what matters, but in contexts such as a subquery in <literal>EXISTS</literal>, the planner
    will choose the smallest start-up time instead of the smallest total time
    (since the executor will stop after getting one row, anyway).
    Also, if you limit the number of rows to return with a <literal>LIMIT</literal> clause,
    the planner makes an appropriate interpolation between the endpoint
    costs to estimate which plan is really the cheapest.
   </para>

   <para>
    The <literal>ANALYZE</literal> option causes the statement to be actually executed, not only
    planned.  The total elapsed time expended within each plan node (in
    milliseconds) and total number of rows it actually returned are added to
    the display.  This is useful for seeing whether the planner's estimates
    are close to reality.
   </para>

   <important>
    <para>
     Keep in mind that the statement is actually executed when
     <literal>ANALYZE</literal> is used.  Although
     <command>EXPLAIN</command> will discard any output that a
     <command>SELECT</command> would return, other side effects of the
     statement will happen as usual.  If you wish to use
     <command>EXPLAIN ANALYZE</command> on an
     <command>INSERT</command>, <command>UPDATE</command>,
     <command>DELETE</command>, or <command>EXECUTE</command> statement
     without letting the command affect your data, use this approach:
 <programlisting>
 BEGIN;
 EXPLAIN ANALYZE ...;
 ROLLBACK;
 </programlisting>
    </para>
   </important>
  </refsect1>

  <refsect1>
   <title>Parameters</title>

   <variablelist>
    <varlistentry>
     <term><literal>ANALYZE</literal></term>
     <listitem>
      <para>
       Carry out the command and show the actual run times.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><literal>VERBOSE</literal></term>
     <listitem>
      <para>
       Show the full internal representation of the plan tree, rather
       than just a summary.  Usually this option is only useful for
       specialized debugging purposes.  The
       <literal>VERBOSE</literal> output is either pretty-printed or
       not, depending on the setting of the <xref
       linkend="guc-explain-pretty-print"> configuration parameter.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term><replaceable class="parameter">statement</replaceable></term>
     <listitem>
      <para>
       Any <command>SELECT</>, <command>INSERT</>, <command>UPDATE</>,
       <command>DELETE</>, <command>VALUES</>, <command>EXECUTE</>, or
       <command>DECLARE</> statement, whose execution plan you wish to see.
      </para>
     </listitem>
    </varlistentry>
   </variablelist>
  </refsect1>

  <refsect1>
   <title>Notes</title>

   <para>
    There is only sparse documentation on the optimizer's use of cost
    information in <productname>PostgreSQL</productname>.  Refer to
    <xref linkend="using-explain"> for more information.
   </para>

   <para>
    In order to allow the <productname>PostgreSQL</productname> query
    planner to make reasonably informed decisions when optimizing
    queries, the <command>ANALYZE</command> statement should be run to
    record statistics about the distribution of data within the
    table. If you have not done this (or if the statistical
    distribution of the data in the table has changed significantly
    since the last time <command>ANALYZE</command> was run), the
    estimated costs are unlikely to conform to the real properties of
    the query, and consequently an inferior query plan may be chosen.
   </para>

   <para>
    Genetic query optimization (<acronym>GEQO</acronym>) randomly
    tests execution plans.  Therefore, when the number of tables exceeds
    <varname>geqo_threshold</> causing genetic query optimization to be
    used, the execution plan is likely to change each time the statement
    is executed.
   </para>

  </refsect1>

  <refsect1>
   <title>Examples</title>

   <para>
    To show the plan for a simple query on a table with a single
    <type>integer</type> column and 10000 rows:

 <programlisting>
 EXPLAIN SELECT * FROM foo;

                        QUERY PLAN
 ---------------------------------------------------------
  Seq Scan on foo  (cost=0.00..155.00 rows=10000 width=4)
 (1 row)
 </programlisting>
   </para>

   <para>
    If there is an index and we use a query with an indexable
    <literal>WHERE</literal> condition, <command>EXPLAIN</command>
    might show a different plan:

 <programlisting>
 EXPLAIN SELECT * FROM foo WHERE i = 4;

                          QUERY PLAN
 --------------------------------------------------------------
  Index Scan using fi on foo  (cost=0.00..5.98 rows=1 width=4)
    Index Cond: (i = 4)
 (2 rows)
 </programlisting>
   </para>

   <para>
    And here is an example of a query plan for a query
    using an aggregate function:

 <programlisting>
 EXPLAIN SELECT sum(i) FROM foo WHERE i &lt; 10;

                              QUERY PLAN
 ---------------------------------------------------------------------
  Aggregate  (cost=23.93..23.93 rows=1 width=4)
    -&gt;  Index Scan using fi on foo  (cost=0.00..23.92 rows=6 width=4)
          Index Cond: (i &lt; 10)
 (3 rows)
 </programlisting>
   </para>

   <para>
    Here is an example of using <command>EXPLAIN EXECUTE</command> to
    display the execution plan for a prepared query:

 <programlisting>
 PREPARE query(int, int) AS SELECT sum(bar) FROM test
     WHERE id &gt; $1 AND id &lt; $2
     GROUP BY foo;

 EXPLAIN ANALYZE EXECUTE query(100, 200);

                                                        QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=39.53..39.53 rows=1 width=8) (actual time=0.661..0.672 rows=7 loops=1)
    -&gt;  Index Scan using test_pkey on test  (cost=0.00..32.97 rows=1311 width=8) (actual time=0.050..0.395 rows=99 loops=1)
          Index Cond: ((id &gt; $1) AND (id &lt; $2))
  Total runtime: 0.851 ms
 (4 rows)
 </programlisting>
   </para>

   <para>
    Of course, the specific numbers shown here depend on the actual
    contents of the tables involved.  Also note that the numbers, and
    even the selected query strategy, may vary between
    <productname>PostgreSQL</productname> releases due to planner
    improvements. In addition, the <command>ANALYZE</command> command
    uses random sampling to estimate data statistics; therefore, it is
    possible for cost estimates to change after a fresh run of
    <command>ANALYZE</command>, even if the actual distribution of data
    in the table has not changed.
   </para>
  </refsect1>

  <refsect1>
   <title>Compatibility</title>

   <para>
    There is no <command>EXPLAIN</command> statement defined in the SQL standard.
   </para>
  </refsect1>

  <refsect1>
   <title>See Also</title>

   <simplelist type="inline">
    <member><xref linkend="sql-analyze" endterm="sql-analyze-title"></member>
   </simplelist>
  </refsect1>
 </refentry>
	<!--
	$PostgreSQL: pgsql/doc/src/sgml/ref/explain.sgml,v 1.38 2006/09/18 19:54:01 tgl Exp $
	PostgreSQL documentation
	-->

	<refentry id="SQL-EXPLAIN">
	<refmeta>
	<refentrytitle id="SQL-EXPLAIN-TITLE">EXPLAIN</refentrytitle>
	<refmiscinfo>SQL - Language Statements</refmiscinfo>
	</refmeta>

	<refnamediv>
	<refname>EXPLAIN</refname>
	<refpurpose>show the execution plan of a statement</refpurpose>
	</refnamediv>

	<indexterm zone="sql-explain">
	<primary>EXPLAIN</primary>
	</indexterm>

	<indexterm zone="sql-explain">
	<primary>prepared statements</primary>
	<secondary>showing the query plan</secondary>
	</indexterm>

	<indexterm zone="sql-explain">
	<primary>cursor</primary>
	<secondary>showing the query plan</secondary>
	</indexterm>

	<refsynopsisdiv>
	<synopsis>
	EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replaceable>
	</synopsis>
	</refsynopsisdiv>

	<refsect1>
	<title>Description</title>

	<para>
	This command displays the execution plan that the
	<productname>PostgreSQL</productname> planner generates for the
	supplied statement. The execution plan shows how the table(s)
	referenced by the statement will be scanned — by plain sequential scan,
	index scan, etc. — and if multiple tables are referenced, what join
	algorithms will be used to bring together the required rows from
	each input table.
	</para>

	<para>
	The most critical part of the display is the estimated statement execution
	cost, which is the planner's guess at how long it will take to run the
	statement (measured in units of disk page fetches). Actually two numbers
	are shown: the start-up time before the first row can be returned, and
	the total time to return all the rows. For most queries the total time
	is what matters, but in contexts such as a subquery in <literal>EXISTS</literal>, the planner
	will choose the smallest start-up time instead of the smallest total time
	(since the executor will stop after getting one row, anyway).
	Also, if you limit the number of rows to return with a <literal>LIMIT</literal> clause,
	the planner makes an appropriate interpolation between the endpoint
	costs to estimate which plan is really the cheapest.
	</para>

	<para>
	The <literal>ANALYZE</literal> option causes the statement to be actually executed, not only
	planned. The total elapsed time expended within each plan node (in
	milliseconds) and total number of rows it actually returned are added to
	the display. This is useful for seeing whether the planner's estimates
	are close to reality.
	</para>

	<important>
	<para>
	Keep in mind that the statement is actually executed when
	<literal>ANALYZE</literal> is used. Although
	<command>EXPLAIN</command> will discard any output that a
	<command>SELECT</command> would return, other side effects of the
	statement will happen as usual. If you wish to use
	<command>EXPLAIN ANALYZE</command> on an
	<command>INSERT</command>, <command>UPDATE</command>,
	<command>DELETE</command>, or <command>EXECUTE</command> statement
	without letting the command affect your data, use this approach:
	<programlisting>
	BEGIN;
	EXPLAIN ANALYZE ...;
	ROLLBACK;
	</programlisting>
	</para>
	</important>
	</refsect1>

	<refsect1>
	<title>Parameters</title>

	<variablelist>
	<varlistentry>
	<term><literal>ANALYZE</literal></term>
	<listitem>
	<para>
	Carry out the command and show the actual run times.
	</para>
	</listitem>
	</varlistentry>

	<varlistentry>
	<term><literal>VERBOSE</literal></term>
	<listitem>
	<para>
	Show the full internal representation of the plan tree, rather
	than just a summary. Usually this option is only useful for
	specialized debugging purposes. The
	<literal>VERBOSE</literal> output is either pretty-printed or
	not, depending on the setting of the <xref
	linkend="guc-explain-pretty-print"> configuration parameter.
	</para>
	</listitem>
	</varlistentry>

	<varlistentry>
	<term><replaceable class="parameter">statement</replaceable></term>
	<listitem>
	<para>
	Any <command>SELECT</>, <command>INSERT</>, <command>UPDATE</>,
	<command>DELETE</>, <command>VALUES</>, <command>EXECUTE</>, or
	<command>DECLARE</> statement, whose execution plan you wish to see.
	</para>
	</listitem>
	</varlistentry>
	</variablelist>
	</refsect1>

	<refsect1>
	<title>Notes</title>

	<para>
	There is only sparse documentation on the optimizer's use of cost
	information in <productname>PostgreSQL</productname>. Refer to
	<xref linkend="using-explain"> for more information.
	</para>

	<para>
	In order to allow the <productname>PostgreSQL</productname> query
	planner to make reasonably informed decisions when optimizing
	queries, the <command>ANALYZE</command> statement should be run to
	record statistics about the distribution of data within the
	table. If you have not done this (or if the statistical
	distribution of the data in the table has changed significantly
	since the last time <command>ANALYZE</command> was run), the
	estimated costs are unlikely to conform to the real properties of
	the query, and consequently an inferior query plan may be chosen.
	</para>

	<para>
	Genetic query optimization (<acronym>GEQO</acronym>) randomly
	tests execution plans. Therefore, when the number of tables exceeds
	<varname>geqo_threshold</> causing genetic query optimization to be
	used, the execution plan is likely to change each time the statement
	is executed.
	</para>

	</refsect1>

	<refsect1>
	<title>Examples</title>

	<para>
	To show the plan for a simple query on a table with a single
	<type>integer</type> column and 10000 rows:

	<programlisting>
	EXPLAIN SELECT * FROM foo;

	QUERY PLAN
	---------------------------------------------------------
	Seq Scan on foo (cost=0.00..155.00 rows=10000 width=4)
	(1 row)
	</programlisting>
	</para>

	<para>
	If there is an index and we use a query with an indexable
	<literal>WHERE</literal> condition, <command>EXPLAIN</command>
	might show a different plan:

	<programlisting>
	EXPLAIN SELECT * FROM foo WHERE i = 4;

	QUERY PLAN
	--------------------------------------------------------------
	Index Scan using fi on foo (cost=0.00..5.98 rows=1 width=4)
	Index Cond: (i = 4)
	(2 rows)
	</programlisting>
	</para>

	<para>
	And here is an example of a query plan for a query
	using an aggregate function:

	<programlisting>
	EXPLAIN SELECT sum(i) FROM foo WHERE i < 10;

	QUERY PLAN
	---------------------------------------------------------------------
	Aggregate (cost=23.93..23.93 rows=1 width=4)
	-> Index Scan using fi on foo (cost=0.00..23.92 rows=6 width=4)
	Index Cond: (i < 10)
	(3 rows)
	</programlisting>
	</para>

	<para>
	Here is an example of using <command>EXPLAIN EXECUTE</command> to
	display the execution plan for a prepared query:

	<programlisting>
	PREPARE query(int, int) AS SELECT sum(bar) FROM test
	WHERE id > $1 AND id < $2
	GROUP BY foo;

	EXPLAIN ANALYZE EXECUTE query(100, 200);

	QUERY PLAN
	-------------------------------------------------------------------------------------------------------------------------
	HashAggregate (cost=39.53..39.53 rows=1 width=8) (actual time=0.661..0.672 rows=7 loops=1)
	-> Index Scan using test_pkey on test (cost=0.00..32.97 rows=1311 width=8) (actual time=0.050..0.395 rows=99 loops=1)
	Index Cond: ((id > $1) AND (id < $2))
	Total runtime: 0.851 ms
	(4 rows)
	</programlisting>
	</para>

	<para>
	Of course, the specific numbers shown here depend on the actual
	contents of the tables involved. Also note that the numbers, and
	even the selected query strategy, may vary between
	<productname>PostgreSQL</productname> releases due to planner
	improvements. In addition, the <command>ANALYZE</command> command
	uses random sampling to estimate data statistics; therefore, it is
	possible for cost estimates to change after a fresh run of
	<command>ANALYZE</command>, even if the actual distribution of data
	in the table has not changed.
	</para>
	</refsect1>

	<refsect1>
	<title>Compatibility</title>

	<para>
	There is no <command>EXPLAIN</command> statement defined in the SQL standard.
	</para>
	</refsect1>

	<refsect1>
	<title>See Also</title>

	<simplelist type="inline">
	<member><xref linkend="sql-analyze" endterm="sql-analyze-title"></member>
	</simplelist>
	</refsect1>
	</refentry>