reference/cli/admin_utilities/analyzedb.html.md.erb - hawq-docs - Git at Google

 ---
 title: analyzedb
 ---

 A utility that performs `ANALYZE` operations on tables incrementally and concurrently.

 ## <a id="topic1__section2"></a>Synopsis

 ``` pre
 analyzedb -d <dbname> -s <schema>
    [ --full ]
    [ -l | --list ]
    [ -p <parallel-level> ]
    [ -v | --verbose ]
    [ -a ]

 analyzedb -d <dbname> -t <schema>.<table>
    [ -i col1[, col2, ...] | -x col1[, col2, ...] ]
    [ --full ]
    [ -l | --list ]
    [ -p <parallel-level> ]
    [ -v | --verbose ]
    [ -a ]

 analyzedb -d <dbname> -f <config-file> | --file <config-file>
    [ --full ]
    [ -l | --list ]
    [ -p <parallel-level> ]
    [ -v | --verbose ]
    [ -a ]

 analyzedb -d <dbname> --clean_last | --clean_all

 analyzedb --version

 analyzedb  -? | -h | --help
 ```

 ## <a id="topic1__section3"></a>Description

 The `analyzedb` utility updates statistics on table data for the specified tables in a HAWQ database incrementally and concurrently.

 While performing `ANALYZE` operations, `analyzedb` creates a snapshot of the table metadata and stores it on disk on the master host. An `ANALYZE` operation is performed only if the table has been modified. If a table or partition has not been modified since the last time it was analyzed, `analyzedb` automatically skips the table or partition because it already contains up-to-date statistics.

 For a partitioned table `analyzedb` analyzes only those partitions that have no statistics, or that have stale statistics. `analyzedb` also refreshes the statistics on the root partition.

 By default, `analyzedb` creates a maximum of 5 concurrent sessions to analyze tables in parallel. For each session, `analyzedb` issues an `ANALYZE` command to the database and specifies different table names. The `-p` option controls the maximum number of concurrent sessions.

 ## <a id="topic1__section4"></a>Notes

 The utility determines if a table has been modified by comparing catalog metadata of tables with the snapshot of metadata taken during a previous `analyzedb` operation. The snapshots of table metadata are stored as state files in the directory `db_analyze` in the HAWQ master data directory. You can specify the `--clean_last` or `--clean_all` option to remove state files generated by `analyzedb`.

 If you do not specify a table, set of tables, or schema, the `analyzedb` utility collects the statistics as needed on all system catalog tables and user-defined tables in the database.

 External tables are not affected by `analyzedb`.

 Table names that contain spaces are not supported.


 ## <a id="topic1__section5"></a>Arguments

 <dt>-d \<dbname\>  </dt>
 <dd>Specifies the name of the database that contains the tables to be analyzed. If this option is not specified, the database name is read from the environment variable `PGDATABASE`. If `PGDATABASE` is not set, the user name specified for the connection is used.</dd>

 <dt>-s \<schema\> </dt>
 <dd>Specify a schema to analyze. All tables in the schema will be analyzed. Only a single schema name can be specified on the command line.

 Only one of the options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.</dd>

 <dt>-t \<schema\>.\<table\>  </dt>
 <dd>Collect statistics only on \<schema\>.\<table\>. The table name must be qualified with a schema name. Only a single table name can be specified on the command line. You can specify the `-f` option to specify multiple tables in a file or the `-s` option to specify all the tables in a schema.

 Only one of these options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.</dd>

 <dt>-f, -\\\-file \<config-file\>  </dt>
 <dd>Text file that contains a list of tables to be analyzed. A relative file path from current directory can be specified.

 The file lists one table per line. Table names must be qualified with a schema name. Optionally, a list of columns can be specified using the `-i` or `-x`. No other options are allowed in the file. Other options such as `--full` must be specified on the command line.

 Only one of the options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.

 When performing `ANALYZE` operations on multiple tables, `analyzedb` creates concurrent sessions to analyze tables in parallel. The `-p` option controls the maximum number of concurrent sessions.

 In the following example, the first line performs an `ANALYZE` operation on the table `public.nation`, the second line performs an `ANALYZE` operation only on the columns `l_shipdate` and `l_receiptdate` in the table `public.lineitem`.

 ``` pre
 public.nation
 public.lineitem -i l_shipdate, l_receiptdate
 ```
 </dd>


 ## <a id="topic1__section5"></a>Options


 <dt>-x \<col1\>, \<col2\>, ...  </dt>
 <dd>Optional. Must be specified with the `-t` option. For the table specified with the `-t` option, exclude statistics collection for the specified columns. Statistics are collected only on the columns that are not listed.

 Only `-i`, or `-x` can be specified. Both options cannot be specified.</dd>

 <dt>-i \<col1\>, \<col2\>, ...  </dt>
 <dd>Optional. Must be specified with the `-t` option. For the table specified with the `-t` option, collect statistics only for the specified columns.

 Only `-i`, or `-x` can be specified. Both options cannot be specified.</dd>

 <dt>-\\\-full  </dt>
 <dd>Perform an `ANALYZE` operation on all the specified tables. The operation is performed even if the statistics are up to date.</dd>

 <dt>-l, -\\\-list  </dt>
 <dd>Lists the tables that would have been analyzed with the specified options. The `ANALYZE` operations are not performed.</dd>

 <dt>-p \<parallel-level\>  </dt>
 <dd>The number of tables that are analyzed in parallel. The value for <parallel-level> can be an integer between 1 and 10, inclusive. Default value is 5.</dd>

 <dt>-a </dt>
 <dd>Quiet mode. Do not prompt for user confirmation.</dd>

 <dt> -v, -\\\-verbose  </dt>
 <dd>If specified, sets the logging level to verbose. Additional log information is written to the log file and the command line during command execution.</dd>

 <dt>-\\\-clean\_last  </dt>
 <dd>Remove the state files generated by last `analyzedb` operation. All other options except `-d` are ignored.</dd>

 <dt>-\\\-clean\_all  </dt>
 <dd>Remove all the state files generated by `analyzedb`. All other options except` -d` are ignored.</dd>

 <dt>-h, -?, -\\\-help   </dt>
 <dd>Displays the online help.</dd>

 <dt>-\\\-version  </dt>
 <dd>Displays the version of this utility.</dd>


 ## <a id="topic1__section6"></a>Examples

 An example that collects statistics only on a set of table columns. In the database `mytest`, collect statistics on the columns `shipdate` and `receiptdate` in the table `public.orders`:

 ``` shell
 $ analyzedb -d mytest -t public.orders -i shipdate, receiptdate
 ```

 An example that collects statistics on a table and exclude a set of columns. In the database `mytest`, collect statistics on the table `public.foo`, and do not collect statistics on the columns `bar` and `test2`.

 ``` shell
 $ analyzedb -d mytest -t public.foo -x bar, test2
 ```

 An example that specifies a file that contains a list of tables. This command collect statistics on the tables listed in the file `analyze-tables` in the database named `mytest`.

 ``` shell
 $ analyzedb -d mytest -f analyze-tables
 ```

 If you do not specify a table, set of tables, or schema, the `analyzedb` utility collects the statistics as needed on all catalog tables and user-defined tables in the specified database. This command refreshes table statistics on the system catalog tables and user-defined tables in the database `mytest`.

 ``` shell
 $ analyzedb -d mytest
 ```
	---
	title: analyzedb
	---

	A utility that performs `ANALYZE` operations on tables incrementally and concurrently.

	## <a id="topic1__section2"></a>Synopsis

	``` pre
	analyzedb -d <dbname> -s <schema>
	[ --full ]
	[ -l \| --list ]
	[ -p <parallel-level> ]
	[ -v \| --verbose ]
	[ -a ]

	analyzedb -d <dbname> -t <schema>.<table>
	[ -i col1[, col2, ...] \| -x col1[, col2, ...] ]
	[ --full ]
	[ -l \| --list ]
	[ -p <parallel-level> ]
	[ -v \| --verbose ]
	[ -a ]

	analyzedb -d <dbname> -f <config-file> \| --file <config-file>
	[ --full ]
	[ -l \| --list ]
	[ -p <parallel-level> ]
	[ -v \| --verbose ]
	[ -a ]

	analyzedb -d <dbname> --clean_last \| --clean_all

	analyzedb --version

	analyzedb -? \| -h \| --help
	```

	## <a id="topic1__section3"></a>Description

	The `analyzedb` utility updates statistics on table data for the specified tables in a HAWQ database incrementally and concurrently.

	While performing `ANALYZE` operations, `analyzedb` creates a snapshot of the table metadata and stores it on disk on the master host. An `ANALYZE` operation is performed only if the table has been modified. If a table or partition has not been modified since the last time it was analyzed, `analyzedb` automatically skips the table or partition because it already contains up-to-date statistics.

	For a partitioned table `analyzedb` analyzes only those partitions that have no statistics, or that have stale statistics. `analyzedb` also refreshes the statistics on the root partition.

	By default, `analyzedb` creates a maximum of 5 concurrent sessions to analyze tables in parallel. For each session, `analyzedb` issues an `ANALYZE` command to the database and specifies different table names. The `-p` option controls the maximum number of concurrent sessions.

	## <a id="topic1__section4"></a>Notes

	The utility determines if a table has been modified by comparing catalog metadata of tables with the snapshot of metadata taken during a previous `analyzedb` operation. The snapshots of table metadata are stored as state files in the directory `db_analyze` in the HAWQ master data directory. You can specify the `--clean_last` or `--clean_all` option to remove state files generated by `analyzedb`.

	If you do not specify a table, set of tables, or schema, the `analyzedb` utility collects the statistics as needed on all system catalog tables and user-defined tables in the database.

	External tables are not affected by `analyzedb`.

	Table names that contain spaces are not supported.


	## <a id="topic1__section5"></a>Arguments

	<dt>-d \<dbname\> </dt>
	<dd>Specifies the name of the database that contains the tables to be analyzed. If this option is not specified, the database name is read from the environment variable `PGDATABASE`. If `PGDATABASE` is not set, the user name specified for the connection is used.</dd>

	<dt>-s \<schema\> </dt>
	<dd>Specify a schema to analyze. All tables in the schema will be analyzed. Only a single schema name can be specified on the command line.

	Only one of the options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.</dd>

	<dt>-t \<schema\>.\<table\> </dt>
	<dd>Collect statistics only on \<schema\>.\<table\>. The table name must be qualified with a schema name. Only a single table name can be specified on the command line. You can specify the `-f` option to specify multiple tables in a file or the `-s` option to specify all the tables in a schema.

	Only one of these options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.</dd>

	<dt>-f, -\\\-file \<config-file\> </dt>
	<dd>Text file that contains a list of tables to be analyzed. A relative file path from current directory can be specified.

	The file lists one table per line. Table names must be qualified with a schema name. Optionally, a list of columns can be specified using the `-i` or `-x`. No other options are allowed in the file. Other options such as `--full` must be specified on the command line.

	Only one of the options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.

	When performing `ANALYZE` operations on multiple tables, `analyzedb` creates concurrent sessions to analyze tables in parallel. The `-p` option controls the maximum number of concurrent sessions.

	In the following example, the first line performs an `ANALYZE` operation on the table `public.nation`, the second line performs an `ANALYZE` operation only on the columns `l_shipdate` and `l_receiptdate` in the table `public.lineitem`.

	``` pre
	public.nation
	public.lineitem -i l_shipdate, l_receiptdate
	```
	</dd>


	## <a id="topic1__section5"></a>Options


	<dt>-x \<col1\>, \<col2\>, ... </dt>
	<dd>Optional. Must be specified with the `-t` option. For the table specified with the `-t` option, exclude statistics collection for the specified columns. Statistics are collected only on the columns that are not listed.

	Only `-i`, or `-x` can be specified. Both options cannot be specified.</dd>

	<dt>-i \<col1\>, \<col2\>, ... </dt>
	<dd>Optional. Must be specified with the `-t` option. For the table specified with the `-t` option, collect statistics only for the specified columns.

	Only `-i`, or `-x` can be specified. Both options cannot be specified.</dd>

	<dt>-\\\-full </dt>
	<dd>Perform an `ANALYZE` operation on all the specified tables. The operation is performed even if the statistics are up to date.</dd>

	<dt>-l, -\\\-list </dt>
	<dd>Lists the tables that would have been analyzed with the specified options. The `ANALYZE` operations are not performed.</dd>

	<dt>-p \<parallel-level\> </dt>
	<dd>The number of tables that are analyzed in parallel. The value for <parallel-level> can be an integer between 1 and 10, inclusive. Default value is 5.</dd>

	<dt>-a </dt>
	<dd>Quiet mode. Do not prompt for user confirmation.</dd>

	<dt> -v, -\\\-verbose </dt>
	<dd>If specified, sets the logging level to verbose. Additional log information is written to the log file and the command line during command execution.</dd>

	<dt>-\\\-clean\_last </dt>
	<dd>Remove the state files generated by last `analyzedb` operation. All other options except `-d` are ignored.</dd>

	<dt>-\\\-clean\_all </dt>
	<dd>Remove all the state files generated by `analyzedb`. All other options except` -d` are ignored.</dd>

	<dt>-h, -?, -\\\-help </dt>
	<dd>Displays the online help.</dd>

	<dt>-\\\-version </dt>
	<dd>Displays the version of this utility.</dd>


	## <a id="topic1__section6"></a>Examples

	An example that collects statistics only on a set of table columns. In the database `mytest`, collect statistics on the columns `shipdate` and `receiptdate` in the table `public.orders`:

	``` shell
	$ analyzedb -d mytest -t public.orders -i shipdate, receiptdate
	```

	An example that collects statistics on a table and exclude a set of columns. In the database `mytest`, collect statistics on the table `public.foo`, and do not collect statistics on the columns `bar` and `test2`.

	``` shell
	$ analyzedb -d mytest -t public.foo -x bar, test2
	```

	An example that specifies a file that contains a list of tables. This command collect statistics on the tables listed in the file `analyze-tables` in the database named `mytest`.

	``` shell
	$ analyzedb -d mytest -f analyze-tables
	```

	If you do not specify a table, set of tables, or schema, the `analyzedb` utility collects the statistics as needed on all catalog tables and user-defined tables in the specified database. This command refreshes table statistics on the system catalog tables and user-defined tables in the database `mytest`.

	``` shell
	$ analyzedb -d mytest
	```