| --- |
| title: analyzedb |
| --- |
| |
| A utility that performs `ANALYZE` operations on tables incrementally and concurrently. |
| |
| ## <a id="topic1__section2"></a>Synopsis |
| |
| ``` pre |
| analyzedb -d <dbname> -s <schema> |
| [ --full ] |
| [ -l | --list ] |
| [ -p <parallel-level> ] |
| [ -v | --verbose ] |
| [ -a ] |
| |
| analyzedb -d <dbname> -t <schema>.<table> |
| [ -i col1[, col2, ...] | -x col1[, col2, ...] ] |
| [ --full ] |
| [ -l | --list ] |
| [ -p <parallel-level> ] |
| [ -v | --verbose ] |
| [ -a ] |
| |
| analyzedb -d <dbname> -f <config-file> | --file <config-file> |
| [ --full ] |
| [ -l | --list ] |
| [ -p <parallel-level> ] |
| [ -v | --verbose ] |
| [ -a ] |
| |
| analyzedb -d <dbname> --clean_last | --clean_all |
| |
| analyzedb --version |
| |
| analyzedb -? | -h | --help |
| ``` |
| |
| ## <a id="topic1__section3"></a>Description |
| |
| The `analyzedb` utility updates statistics on table data for the specified tables in a HAWQ database incrementally and concurrently. |
| |
| While performing `ANALYZE` operations, `analyzedb` creates a snapshot of the table metadata and stores it on disk on the master host. An `ANALYZE` operation is performed only if the table has been modified. If a table or partition has not been modified since the last time it was analyzed, `analyzedb` automatically skips the table or partition because it already contains up-to-date statistics. |
| |
| For a partitioned table `analyzedb` analyzes only those partitions that have no statistics, or that have stale statistics. `analyzedb` also refreshes the statistics on the root partition. |
| |
| By default, `analyzedb` creates a maximum of 5 concurrent sessions to analyze tables in parallel. For each session, `analyzedb` issues an `ANALYZE` command to the database and specifies different table names. The `-p` option controls the maximum number of concurrent sessions. |
| |
| ## <a id="topic1__section4"></a>Notes |
| |
| The utility determines if a table has been modified by comparing catalog metadata of tables with the snapshot of metadata taken during a previous `analyzedb` operation. The snapshots of table metadata are stored as state files in the directory `db_analyze` in the HAWQ master data directory. You can specify the `--clean_last` or `--clean_all` option to remove state files generated by `analyzedb`. |
| |
| If you do not specify a table, set of tables, or schema, the `analyzedb` utility collects the statistics as needed on all system catalog tables and user-defined tables in the database. |
| |
| External tables are not affected by `analyzedb`. |
| |
| Table names that contain spaces are not supported. |
| |
| |
| ## <a id="topic1__section5"></a>Arguments |
| |
| <dt>-d \<dbname\> </dt> |
| <dd>Specifies the name of the database that contains the tables to be analyzed. If this option is not specified, the database name is read from the environment variable `PGDATABASE`. If `PGDATABASE` is not set, the user name specified for the connection is used.</dd> |
| |
| <dt>-s \<schema\> </dt> |
| <dd>Specify a schema to analyze. All tables in the schema will be analyzed. Only a single schema name can be specified on the command line. |
| |
| Only one of the options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.</dd> |
| |
| <dt>-t \<schema\>.\<table\> </dt> |
| <dd>Collect statistics only on \<schema\>.\<table\>. The table name must be qualified with a schema name. Only a single table name can be specified on the command line. You can specify the `-f` option to specify multiple tables in a file or the `-s` option to specify all the tables in a schema. |
| |
| Only one of these options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.</dd> |
| |
| <dt>-f, -\\\-file \<config-file\> </dt> |
| <dd>Text file that contains a list of tables to be analyzed. A relative file path from current directory can be specified. |
| |
| The file lists one table per line. Table names must be qualified with a schema name. Optionally, a list of columns can be specified using the `-i` or `-x`. No other options are allowed in the file. Other options such as `--full` must be specified on the command line. |
| |
| Only one of the options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`. |
| |
| When performing `ANALYZE` operations on multiple tables, `analyzedb` creates concurrent sessions to analyze tables in parallel. The `-p` option controls the maximum number of concurrent sessions. |
| |
| In the following example, the first line performs an `ANALYZE` operation on the table `public.nation`, the second line performs an `ANALYZE` operation only on the columns `l_shipdate` and `l_receiptdate` in the table `public.lineitem`. |
| |
| ``` pre |
| public.nation |
| public.lineitem -i l_shipdate, l_receiptdate |
| ``` |
| </dd> |
| |
| |
| ## <a id="topic1__section5"></a>Options |
| |
| |
| <dt>-x \<col1\>, \<col2\>, ... </dt> |
| <dd>Optional. Must be specified with the `-t` option. For the table specified with the `-t` option, exclude statistics collection for the specified columns. Statistics are collected only on the columns that are not listed. |
| |
| Only `-i`, or `-x` can be specified. Both options cannot be specified.</dd> |
| |
| <dt>-i \<col1\>, \<col2\>, ... </dt> |
| <dd>Optional. Must be specified with the `-t` option. For the table specified with the `-t` option, collect statistics only for the specified columns. |
| |
| Only `-i`, or `-x` can be specified. Both options cannot be specified.</dd> |
| |
| <dt>-\\\-full </dt> |
| <dd>Perform an `ANALYZE` operation on all the specified tables. The operation is performed even if the statistics are up to date.</dd> |
| |
| <dt>-l, -\\\-list </dt> |
| <dd>Lists the tables that would have been analyzed with the specified options. The `ANALYZE` operations are not performed.</dd> |
| |
| <dt>-p \<parallel-level\> </dt> |
| <dd>The number of tables that are analyzed in parallel. The value for <parallel-level> can be an integer between 1 and 10, inclusive. Default value is 5.</dd> |
| |
| <dt>-a </dt> |
| <dd>Quiet mode. Do not prompt for user confirmation.</dd> |
| |
| <dt> -v, -\\\-verbose </dt> |
| <dd>If specified, sets the logging level to verbose. Additional log information is written to the log file and the command line during command execution.</dd> |
| |
| <dt>-\\\-clean\_last </dt> |
| <dd>Remove the state files generated by last `analyzedb` operation. All other options except `-d` are ignored.</dd> |
| |
| <dt>-\\\-clean\_all </dt> |
| <dd>Remove all the state files generated by `analyzedb`. All other options except` -d` are ignored.</dd> |
| |
| <dt>-h, -?, -\\\-help </dt> |
| <dd>Displays the online help.</dd> |
| |
| <dt>-\\\-version </dt> |
| <dd>Displays the version of this utility.</dd> |
| |
| |
| ## <a id="topic1__section6"></a>Examples |
| |
| An example that collects statistics only on a set of table columns. In the database `mytest`, collect statistics on the columns `shipdate` and `receiptdate` in the table `public.orders`: |
| |
| ``` shell |
| $ analyzedb -d mytest -t public.orders -i shipdate, receiptdate |
| ``` |
| |
| An example that collects statistics on a table and exclude a set of columns. In the database `mytest`, collect statistics on the table `public.foo`, and do not collect statistics on the columns `bar` and `test2`. |
| |
| ``` shell |
| $ analyzedb -d mytest -t public.foo -x bar, test2 |
| ``` |
| |
| An example that specifies a file that contains a list of tables. This command collect statistics on the tables listed in the file `analyze-tables` in the database named `mytest`. |
| |
| ``` shell |
| $ analyzedb -d mytest -f analyze-tables |
| ``` |
| |
| If you do not specify a table, set of tables, or schema, the `analyzedb` utility collects the statistics as needed on all catalog tables and user-defined tables in the specified database. This command refreshes table statistics on the system catalog tables and user-defined tables in the database `mytest`. |
| |
| ``` shell |
| $ analyzedb -d mytest |
| ``` |
| |
| |