blob: 0384c34032312c24fc7bfd01b83862bc87430790 [file] [log] [blame]
---
title: analyzedb
---
A utility that performs `ANALYZE` operations on tables incrementally and concurrently.
## <a id="topic1__section2"></a>Synopsis
``` pre
analyzedb -d <dbname> -s <schema>
[ --full ]
[ -l | --list ]
[ -p <parallel-level> ]
[ -v | --verbose ]
[ -a ]
analyzedb -d <dbname> -t <schema>.<table>
[ -i col1[, col2, ...] | -x col1[, col2, ...] ]
[ --full ]
[ -l | --list ]
[ -p <parallel-level> ]
[ -v | --verbose ]
[ -a ]
analyzedb -d <dbname> -f <config-file> | --file <config-file>
[ --full ]
[ -l | --list ]
[ -p <parallel-level> ]
[ -v | --verbose ]
[ -a ]
analyzedb -d <dbname> --clean_last | --clean_all
analyzedb --version
analyzedb -? | -h | --help
```
## <a id="topic1__section3"></a>Description
The `analyzedb` utility updates statistics on table data for the specified tables in a HAWQ database incrementally and concurrently.
While performing `ANALYZE` operations, `analyzedb` creates a snapshot of the table metadata and stores it on disk on the master host. An `ANALYZE` operation is performed only if the table has been modified. If a table or partition has not been modified since the last time it was analyzed, `analyzedb` automatically skips the table or partition because it already contains up-to-date statistics.
For a partitioned table `analyzedb` analyzes only those partitions that have no statistics, or that have stale statistics. `analyzedb` also refreshes the statistics on the root partition.
By default, `analyzedb` creates a maximum of 5 concurrent sessions to analyze tables in parallel. For each session, `analyzedb` issues an `ANALYZE` command to the database and specifies different table names. The `-p` option controls the maximum number of concurrent sessions.
## <a id="topic1__section4"></a>Notes
The utility determines if a table has been modified by comparing catalog metadata of tables with the snapshot of metadata taken during a previous `analyzedb` operation. The snapshots of table metadata are stored as state files in the directory `db_analyze` in the HAWQ master data directory. You can specify the `--clean_last` or `--clean_all` option to remove state files generated by `analyzedb`.
If you do not specify a table, set of tables, or schema, the `analyzedb` utility collects the statistics as needed on all system catalog tables and user-defined tables in the database.
External tables are not affected by `analyzedb`.
Table names that contain spaces are not supported.
## <a id="topic1__section5"></a>Arguments
<dt>-d \<dbname\> </dt>
<dd>Specifies the name of the database that contains the tables to be analyzed. If this option is not specified, the database name is read from the environment variable `PGDATABASE`. If `PGDATABASE` is not set, the user name specified for the connection is used.</dd>
<dt>-s \<schema\> </dt>
<dd>Specify a schema to analyze. All tables in the schema will be analyzed. Only a single schema name can be specified on the command line.
Only one of the options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.</dd>
<dt>-t \<schema\>.\<table\> </dt>
<dd>Collect statistics only on \<schema\>.\<table\>. The table name must be qualified with a schema name. Only a single table name can be specified on the command line. You can specify the `-f` option to specify multiple tables in a file or the `-s` option to specify all the tables in a schema.
Only one of these options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.</dd>
<dt>-f, -\\\-file \<config-file\> </dt>
<dd>Text file that contains a list of tables to be analyzed. A relative file path from current directory can be specified.
The file lists one table per line. Table names must be qualified with a schema name. Optionally, a list of columns can be specified using the `-i` or `-x`. No other options are allowed in the file. Other options such as `--full` must be specified on the command line.
Only one of the options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.
When performing `ANALYZE` operations on multiple tables, `analyzedb` creates concurrent sessions to analyze tables in parallel. The `-p` option controls the maximum number of concurrent sessions.
In the following example, the first line performs an `ANALYZE` operation on the table `public.nation`, the second line performs an `ANALYZE` operation only on the columns `l_shipdate` and `l_receiptdate` in the table `public.lineitem`.
``` pre
public.nation
public.lineitem -i l_shipdate, l_receiptdate
```
</dd>
## <a id="topic1__section5"></a>Options
<dt>-x \<col1\>, \<col2\>, ... </dt>
<dd>Optional. Must be specified with the `-t` option. For the table specified with the `-t` option, exclude statistics collection for the specified columns. Statistics are collected only on the columns that are not listed.
Only `-i`, or `-x` can be specified. Both options cannot be specified.</dd>
<dt>-i \<col1\>, \<col2\>, ... </dt>
<dd>Optional. Must be specified with the `-t` option. For the table specified with the `-t` option, collect statistics only for the specified columns.
Only `-i`, or `-x` can be specified. Both options cannot be specified.</dd>
<dt>-\\\-full </dt>
<dd>Perform an `ANALYZE` operation on all the specified tables. The operation is performed even if the statistics are up to date.</dd>
<dt>-l, -\\\-list </dt>
<dd>Lists the tables that would have been analyzed with the specified options. The `ANALYZE` operations are not performed.</dd>
<dt>-p \<parallel-level\> </dt>
<dd>The number of tables that are analyzed in parallel. The value for <parallel-level> can be an integer between 1 and 10, inclusive. Default value is 5.</dd>
<dt>-a </dt>
<dd>Quiet mode. Do not prompt for user confirmation.</dd>
<dt> -v, -\\\-verbose </dt>
<dd>If specified, sets the logging level to verbose. Additional log information is written to the log file and the command line during command execution.</dd>
<dt>-\\\-clean\_last </dt>
<dd>Remove the state files generated by last `analyzedb` operation. All other options except `-d` are ignored.</dd>
<dt>-\\\-clean\_all </dt>
<dd>Remove all the state files generated by `analyzedb`. All other options except` -d` are ignored.</dd>
<dt>-h, -?, -\\\-help </dt>
<dd>Displays the online help.</dd>
<dt>-\\\-version </dt>
<dd>Displays the version of this utility.</dd>
## <a id="topic1__section6"></a>Examples
An example that collects statistics only on a set of table columns. In the database `mytest`, collect statistics on the columns `shipdate` and `receiptdate` in the table `public.orders`:
``` shell
$ analyzedb -d mytest -t public.orders -i shipdate, receiptdate
```
An example that collects statistics on a table and exclude a set of columns. In the database `mytest`, collect statistics on the table `public.foo`, and do not collect statistics on the columns `bar` and `test2`.
``` shell
$ analyzedb -d mytest -t public.foo -x bar, test2
```
An example that specifies a file that contains a list of tables. This command collect statistics on the tables listed in the file `analyze-tables` in the database named `mytest`.
``` shell
$ analyzedb -d mytest -f analyze-tables
```
If you do not specify a table, set of tables, or schema, the `analyzedb` utility collects the statistics as needed on all catalog tables and user-defined tables in the specified database. This command refreshes table statistics on the system catalog tables and user-defined tables in the database `mytest`.
``` shell
$ analyzedb -d mytest
```