| COMMAND NAME: gpexpand |
| |
| Expands an existing Apache Cloudberry system across new hosts in the array. |
| |
| ****************************************************** |
| SYNOPSIS |
| ****************************************************** |
| |
| gpexpand |
| [-f <hosts_file>] |
| | -i <input_file> [-B <batch_size>] [-V] [-t <segment_tar_dir>] [-S] |
| | {-d <hh:mm:ss> | -e '<YYYY-MM-DD hh:mm:ss>'} |
| [-a | --analyze] [-n <parallel_processes>] |
| | --hba-hostnames |
| | --rollback |
| | --clean |
| [--verbose] [--silent] |
| |
| gpexpand -? | -h | --help |
| |
| gpexpand --version |
| |
| |
| ****************************************************** |
| PREREQUISITES |
| ****************************************************** |
| |
| * You are logged in as the Apache Cloudberry superuser (gpadmin). |
| |
| * The new segment hosts have been installed and configured as per |
| the existing segment hosts. This involves: |
| |
| * Configuring the hardware and OS |
| * Installing the Cloudberry software |
| * Creating the gpadmin user account |
| * Exchanging SSH keys. |
| |
| * The segment hosts have enough disk space to temporarily hold a |
| copy of your largest table. |
| |
| * When redistributing data, Apache Cloudberry must be running in |
| production mode. Apache Cloudberry cannot be in restricted mode or in |
| coordinator mode. The gpstart options -R or -m cannot be specified to start |
| Apache Cloudberry. |
| |
| |
| ****************************************************** |
| DESCRIPTION |
| ****************************************************** |
| |
| The gpexpand utility performs system expansion in two phases: segment |
| initialization and then table redistribution. |
| |
| In the initialization phase, gpexpand runs with an input file that |
| specifies data directories, dbid values, and other characteristics |
| of the new segments. You can create the input file manually, or by |
| following the prompts in an interactive interview. |
| |
| If you choose to create the input file using the interactive interview, |
| you can optionally specify a file containing a list of expansion hosts. |
| If your platform or command shell limits the length of the list of hostnames |
| that you can type when prompted in the interview, specifying the hosts |
| with -f may be mandatory. |
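| |
| As an illustration, a hosts file for -f can be as simple as one host name |
| per line. The sketch below assumes two hypothetical new hosts, sdw5 and |
| sdw6, and a hypothetical file name. It only writes the file and prints the |
| gpexpand command you would then run. |
| |

```shell
# Hypothetical file location and host names, used for illustration only.
hosts_file="${TMPDIR:-/tmp}/new_hosts_file"

# One host name per line, as the -f option expects.
printf '%s\n' sdw5 sdw6 > "$hosts_file"

# Print the command rather than running it, so the sketch is safe to try.
echo "gpexpand -f $hosts_file"
```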
| |
| In addition to initializing the segments, the initialization phase |
| performs these actions: |
| * Creates an expansion schema to store the status of the expansion |
| operation, including detailed status for tables. |
| * Changes the distribution policy for all tables to DISTRIBUTED RANDOMLY. |
| The original distribution policies are later restored in the |
| redistribution phase. |
| |
| To begin the redistribution phase, you must run gpexpand with either |
| the -d (duration) or the -e (end time) option. Until the specified end |
| time or duration is reached, the utility will redistribute tables in |
| the expansion schema. Each table is reorganized using ALTER TABLE |
| commands to rebalance the tables across new segments, and to set |
| tables to their original distribution policy. If gpexpand completes |
| the reorganization of all tables before the specified duration, |
| it displays a success message and ends. |
| |
| NOTE: Data redistribution should be performed during low-use hours. |
| Redistribution can be divided into batches over an extended period. |
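| |
| For example, a redistribution window given in hours can be turned into the |
| hh:mm:ss value that -d expects. The helper name duration_arg below is |
| hypothetical and not part of gpexpand. The sketch prints the resulting |
| command instead of running it. |
| |

```shell
# duration_arg HOURS [MINUTES]: format a window as hh:mm:ss for gpexpand -d.
# Hypothetical helper, not part of gpexpand.
duration_arg() {
    printf '%02d:%02d:00\n' "$1" "${2:-0}"
}

# An eight-hour overnight window.
echo "gpexpand -d $(duration_arg 8)"
```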
| |
| ****************************************************** |
| OPTIONS |
| ****************************************************** |
| |
| -a | --analyze |
| Run ANALYZE to update the table statistics after expansion. |
| The default is to not run ANALYZE. |
| |
| |
| -B <batch_size> |
| Batch size of remote commands to send to a given host before |
| making a one-second pause. Default is 16. Valid values are 1-128. |
| The gpexpand utility issues a number of setup commands that may exceed |
| the host's maximum threshold for authenticated connections as defined |
| by MaxStartups in the SSH daemon configuration. The one-second pause |
| allows authentications to be completed before gpexpand issues any |
| more commands. The default value does not normally need to be changed. |
| However, it may be necessary to reduce the maximum number of commands |
| if gpexpand fails with connection errors such as |
| 'ssh_exchange_identification: Connection closed by remote host.' |
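| |
| Before lowering -B, it can help to see which MaxStartups value a segment |
| host's SSH daemon is configured with. The sketch below assumes the setting |
| lives in a plain sshd_config-style file. The function name max_startups |
| and the sample file are hypothetical. |
| |

```shell
# max_startups FILE: print the first active MaxStartups value in an
# sshd_config-style file. Hypothetical helper; commented-out lines are
# skipped because their first field starts with '#'.
max_startups() {
    awk 'tolower($1) == "maxstartups" { print $2; exit }' "$1"
}

# Sample configuration written to a temporary file for illustration.
sample_conf="$(mktemp)"
printf '%s\n' '#MaxStartups 10:30:100' 'MaxStartups 20:30:100' > "$sample_conf"

max_startups "$sample_conf"
```

| On a real host the file is typically /etc/ssh/sshd_config. An empty result |
| means no explicit value is set and the SSH daemon's default applies. |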
| |
| |
| -c | --clean |
| Remove the expansion schema. |
| |
| |
| -d | --duration <hh:mm:ss> |
| Duration of the expansion session from beginning to end. |
| |
| |
| -e | --end '<YYYY-MM-DD hh:mm:ss>' |
| Ending date and time for the expansion session. |
| |
| |
| -f | --hosts-file <filename> |
| Specifies the name of a file that contains a list of new hosts for |
| system expansion. Each line of the file must contain a single |
| host name. This file can contain hostnames with or without network |
| interfaces specified. The gpexpand utility handles either case, |
| adding interface numbers to the end of the hostname if the original nodes |
| are configured with multiple network interfaces. |
| |
| |
| --hba-hostnames |
| Optional. Use hostnames instead of CIDR addresses in pg_hba.conf. |
| |
| |
| -i | --input <input_file> |
| Specifies the name of the expansion configuration file, which contains |
| one line for each segment to be added in the format of: |
| |
| <hostname>|<address>|<port>|<datadir>|<dbid>|<content>|<preferred_role> |
| |
| ... |
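| |
| As a sketch, the following writes an input file describing one new primary |
| segment and its mirror, then checks that every line has the seven fields |
| listed above. All host names, addresses, ports, directories, dbid and |
| content values are hypothetical, and the p and m role codes are an |
| assumption here. |
| |

```shell
# Hypothetical values throughout; adjust to your own system.
input_file="${TMPDIR:-/tmp}/gpexpand_inputfile"

cat > "$input_file" <<'EOF'
sdw5|sdw5-1|50011|/gpdata/primary/gp9|11|9|p
sdw6|sdw6-1|50012|/gpdata/mirror/gp9|12|9|m
EOF

# Count lines that do not have exactly seven '|'-separated fields.
bad_lines="$(awk -F'|' 'NF != 7 { n++ } END { print n + 0 }' "$input_file")"
echo "malformed lines: $bad_lines"
```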
| |
| |
| -n <parallel_processes> |
| The number of tables to redistribute simultaneously. Valid values |
| are 1 - 96. Each table redistribution process requires two database |
| connections: one to alter the table, and another to update the table's |
| status in the expansion schema. Before increasing -n, check the current |
| value of the server configuration parameter max_connections and make |
| sure the maximum connection limit is not exceeded. |
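| |
| Because each table being redistributed uses two connections, -n N needs |
| 2 x N connections on top of any other sessions. The sketch below works |
| through that arithmetic. The function name and the sample values are |
| hypothetical. |
| |

```shell
# connections_needed N: database connections used by gpexpand -n N,
# at two connections per table being redistributed.
connections_needed() {
    echo $(( $1 * 2 ))
}

# Hypothetical values. On a running system the real limit can be read with,
# for example: psql -At -c 'SHOW max_connections'
max_connections=250
parallel=16

needed="$(connections_needed "$parallel")"
if [ "$needed" -lt "$max_connections" ]; then
    echo "-n $parallel needs $needed connections; limit is $max_connections"
else
    echo "-n $parallel would exceed max_connections ($max_connections)"
fi
```

| Other sessions count against the same limit, so leave headroom beyond the |
| figure this prints. |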
| |
| |
| -r | --rollback |
| Roll back a failed expansion setup operation. |
| |
| |
| -s | --silent |
| Runs in silent mode. Does not prompt for confirmation to proceed |
| on warnings. |
| |
| |
| -S | --simple_progress |
| Show simple progress view. |
| |
| |
| -t | --tardir <directory> |
| Specifies the temporary directory on the segment hosts in which to place |
| the tar file. |
| |
| |
| -v | --verbose |
| Verbose debugging output. With this option, the utility will output |
| all DDL and DML used to expand the database. |
| |
| |
| --version |
| Display the utility's version number and exit. |
| |
| |
| -V | --novacuum |
| Do not vacuum catalog tables before creating schema copy. |
| |
| |
| -? | -h | --help |
| Displays the online help. |
| |
| |
| ****************************************************** |
| EXAMPLES |
| ****************************************************** |
| |
| Run gpexpand with an input file to initialize new segments and |
| create the expansion schema in the default database: |
| |
| $ gpexpand -i input_file |
| |
| |
| Run gpexpand for a maximum duration of 60 hours to redistribute |
| tables to new segments: |
| |
| $ gpexpand -d 60:00:00 |
| |
| ****************************************************** |
| SEE ALSO |
| ****************************************************** |
| |
| gpssh-exkeys |
| |