Ambari provides an easy interface for performing some of the most common HAWQ and PXF administration tasks.
HAWQ supports integration with YARN for global resource management. In a YARN-managed environment, HAWQ can request resources (containers) dynamically from YARN, and return them when HAWQ’s workload is light.
See also Integrating YARN with HAWQ for command-line instructions and additional details about using HAWQ with YARN.
Follow this procedure if you have already installed YARN and HAWQ, but you are currently using the HAWQ Standalone mode (not YARN) for resource management. This procedure helps you configure YARN and HAWQ so that HAWQ uses YARN for resource management. This procedure assumes that you will use the default YARN queue for managing HAWQ.
hawq_rm_yarn_address, hawq_rm_yarn_app_name, and hawq_rm_yarn_scheduler_address in the hawq-site.xml file.
yarn.resourcemanager.ha and yarn.resourcemanager.scheduler.ha properties in yarn-site.xml.
yarn.resourcemanager.system-metrics-publisher.enabled property and change its value to false.
hawq_rm_min_resource_perseg so HAWQ receives at least some number of YARN containers per segment regardless of the size of the initial query. The default value is 2, which means HAWQ’s resource manager acquires at least 2 YARN containers for each segment even if the first query’s resource request is small.
hawq_rm_min_resource_perseg in HAWQ cannot be set to more than 8 since HAWQ’s resource manager acquires YARN containers by vcore. In the case above, the HAWQ resource manager acquires a YARN container quota of 4GB memory and 1 vcore.
hawq_rm_min_resource_perseg as the key and enter the desired Value. Click Add to add the property definition.
hawq_rm_resource_idle_timeout to let the HAWQ resource manager return idle resources more quickly or more slowly.
hawq_rm_resource_idle_timeout. The default value of hawq_rm_resource_idle_timeout is 300 seconds.
hawq_rm_resource_idle_timeout as the key and enter the desired Value. Click Add to add the property definition.
A HAWQ Service check uses the hawq state command to display the configuration and status of segment hosts in a HAWQ cluster. It also performs tests to ensure that HAWQ can write to and read from tables, and to ensure that HAWQ can write to and read from HDFS external tables using PXF.
Access the Ambari web console at http://ambari.server.hostname:8080, and log in as the “admin” user. (The default password is also “admin”.)
Click HAWQ in the list of installed services.
Select Service Actions > Run Service Check, then click OK to perform the service check.
Ambari displays the HAWQ Service Check task in the list of background operations. If any test fails, then Ambari displays a red error icon next to the task.
Click the HAWQ Service Check task to view the actual log messages that are generated while performing the task. The log messages display the basic configuration and status of HAWQ segments, as well as the results of the HAWQ and PXF tests (if PXF is installed).
Click OK to dismiss the log messages or list of background tasks.
A configuration check determines whether operating system parameters on the HAWQ host machines match their recommended settings. You can also perform this check from the command line using the hawq check command, which runs against all HAWQ hosts.
Access the Ambari web console at http://ambari.server.hostname:8080, and log in as the “admin” user. (The default password is also “admin”.)
Click HAWQ in the list of installed services.
(Optional) Perform this step if you want to view or modify the host configuration parameters that are evaluated during the HAWQ config check:
Select the Configs tab, then select the Advanced tab in the settings.
Expand Advanced Hawq Check to view or change the list of parameters that are checked with a hawq check command or with the Ambari HAWQ Config check.
Note: All parameter entries are stored in the /usr/local/hawq/etc/hawq_check.cnf file. Click the Set Recommended button if you want to restore the file to its original contents.
Select Service Actions > Run HAWQ Config Check, then click OK to perform the configuration check.
Ambari displays the Run HAWQ Config Check task in the list of background operations. If any parameter does not meet the specification defined in /usr/local/hawq/etc/hawq_check.cnf, then Ambari displays a red error icon next to the task.
Click the Run HAWQ Config Check task to view the actual log messages that are generated while performing the task. Address any configuration errors on the indicated host machines.
Click OK to dismiss the log messages or list of background tasks.
Ambari provides the ability to restart a HAWQ cluster by restarting one or more segments at a time until all segments (or all segments with stale configurations) restart. You can specify a delay between restarting segments, and Ambari can stop the process if a specified number of segments fail to restart. Performing a rolling restart in this manner can help ensure that some HAWQ segments are available to service client requests.
Note: If you do not need to preserve client connections, you can instead perform a full restart of the entire HAWQ cluster using Service Actions > Restart All.
Access the Ambari web console at http://ambari.server.hostname:8080, and log in as the “admin” user. (The default password is also “admin”.)
Click HAWQ in the list of installed services.
Select Service Actions > Restart HAWQ Segments.
In the Restart HAWQ Segments page:
Click Trigger Rolling Restart to begin the restart process.
Ambari displays the Rolling Restart of HAWQ segments task in the list of background operations, and indicates the current batch of segments that it is restarting. Click the name of the task to view the log messages generated during the restart. If any segment fails to restart, Ambari displays a red warning icon next to the task.
Ambari host-level actions enable you to perform actions on one or more hosts in the cluster at once. With HAWQ clusters, you can apply the Start, Stop, or Restart actions to one or more HAWQ segment hosts or PXF hosts. Using the host-level actions saves you the trouble of accessing individual hosts in Ambari and applying service actions one-by-one.
Apache HAWQ supports dynamic node expansion. You can add segment nodes while HAWQ is running without having to suspend or terminate cluster operations.
This topic provides some guidelines for expanding your HAWQ cluster.
There are several recommendations to keep in mind when modifying the size of your running HAWQ cluster:
default_hash_table_bucket_number server configuration parameter to a larger value after expanding the cluster but before redistributing the hash tables.
If you have any user-defined function (UDF) libraries installed in your existing HAWQ cluster, install them on the new node(s) that you want to add to the HAWQ cluster.
Access the Ambari web console at http://ambari.server.hostname:8080, and log in as the “admin” user. (The default password is also “admin”.)
Click HAWQ in the list of installed services.
Select the Configs tab, then select the Advanced tab in the settings.
Expand the General section, and ensure that the Exchange SSH Keys property (hawq_ssh_keys) is set to true. Change this property to true if needed, and click Save to continue. Ambari must be able to exchange SSH keys with any hosts that you add to the cluster in the following steps.
Select the Hosts tab at the top of the screen to display the Hosts summary.
If the host(s) that you want to add are not currently listed in the Hosts summary page, follow these steps:
Select Actions > Add New Hosts to start the Add Host Wizard.
Follow the initial steps of the Add Host Wizard to identify the new host, specify SSH keys or manually register the host, and confirm the new host(s) to add.
See Set Up Password-less SSH in the HDP documentation if you need more information about performing these tasks.
When you reach the Assign Slaves and Clients page, ensure that the DataNode, HAWQ Segment, and PXF (if the PXF service is installed) components are selected. Select additional components as necessary for your cluster.
Complete the wizard to add the new host and install the selected components.
If the host(s) that you want to add already appear in the Hosts summary, follow these steps:
(Optional) If you are using hash tables, adjust the Default buckets for Hash Distributed tables setting (default_hash_table_bucket_number) on the HAWQ service’s Configs > Settings tab. Set this property to the value indicated below for the new number of nodes in the cluster.
Number of Nodes After Expansion | Suggested default_hash_table_bucket_number value |
---|---|
<= 85 | 6 * #nodes |
> 85 and <= 102 | 5 * #nodes |
> 102 and <= 128 | 4 * #nodes |
> 128 and <= 170 | 3 * #nodes |
> 170 and <= 256 | 2 * #nodes |
> 256 and <= 512 | 1 * #nodes |
> 512 | 512 |
Note: Ambari requires the HAWQ service to be restarted in order to apply the configuration changes. If you need to apply the configuration without restarting HAWQ (for dynamic cluster expansion), then you can use the HAWQ CLI commands described in Manually Updating the HAWQ Configuration instead of following this step.
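The bucket-count table above can be expressed as a small shell function, which may help when calculating the value to enter in Ambari or to pass to hawq config. This is a minimal sketch: suggest_bucket_number is a hypothetical helper name, not part of the HAWQ or Ambari tooling, and it only encodes the thresholds from the table.

```shell
#!/bin/sh
# Hypothetical helper (not a HAWQ utility): print the suggested
# default_hash_table_bucket_number for a node count, using the
# thresholds from the expansion table.
suggest_bucket_number() {
  nodes=$1
  # Accept only a positive decimal integer without leading zeros.
  case $nodes in
    ''|*[!0-9]*|0*)
      echo "usage: suggest_bucket_number <positive_integer>" >&2
      return 1 ;;
  esac
  if   [ "$nodes" -le 85 ];  then factor=6
  elif [ "$nodes" -le 102 ]; then factor=5
  elif [ "$nodes" -le 128 ]; then factor=4
  elif [ "$nodes" -le 170 ]; then factor=3
  elif [ "$nodes" -le 256 ]; then factor=2
  elif [ "$nodes" -le 512 ]; then factor=1
  else
    # More than 512 nodes: the table suggests a fixed value of 512.
    echo 512
    return 0
  fi
  echo $((factor * nodes))
}
```

For example, a cluster that grows to 100 nodes falls in the "> 85 and <= 102" row, so suggest_bucket_number 100 prints 500.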
Stop and then start the HAWQ service to apply your configuration changes via Ambari. Select Service Actions > Stop, followed by Service Actions > Start to ensure that the HAWQ Master starts before the newly-added segment. During the HAWQ startup, Ambari exchanges ssh keys for the gpadmin user, and applies the new configuration.
Note: Do not use the Restart All service action to complete this step.
Note: Consider the impact of rebalancing HDFS to other components, such as HBase, before you complete this step.
Rebalance your HDFS data by selecting the HDFS service and then choosing Service Actions > Rebalance HDFS. Follow the Ambari instructions to complete the rebalance action.
Speed up the clearing of the metadata cache by first selecting the HAWQ service and then selecting Service Actions > Clear HAWQ's HDFS Metadata Cache.
If you are using hash-distributed tables and want to take advantage of the performance benefits of a larger cluster, redistribute the data in all hash-distributed tables by using either the ALTER TABLE or CREATE TABLE AS command. You should redistribute the table data if you modified the default_hash_table_bucket_number configuration parameter.
Note: The redistribution of table data can take a significant amount of time.
false after Ambari exchanges keys with the new hosts. This prevents Ambari from exchanging keys with all hosts every time the HAWQ master is started or restarted.
If you need to expand your HAWQ cluster without restarting the HAWQ service, follow these steps to manually apply the new HAWQ configuration (use these steps instead of Step 7 in the procedure above):
Update your configuration to use the new default_hash_table_bucket_number value that you calculated:
Log in to the HAWQ master host as the gpadmin user:
$ ssh gpadmin@<HAWQ_MASTER_HOST>
Source the greenplum_path.sh file to update the shell environment:
$ source /usr/local/hawq/greenplum_path.sh
Check the current value of default_hash_table_bucket_number:
$ hawq config -s default_hash_table_bucket_number
Set default_hash_table_bucket_number to the new value that you calculated:
$ hawq config -c default_hash_table_bucket_number -v <new_value>
Reload the configuration without restarting the cluster:
$ hawq stop cluster -u
Verify that the default_hash_table_bucket_number value was updated:
$ hawq config -s default_hash_table_bucket_number
Edit the /usr/local/hawq/etc/slaves file and add the new HAWQ hostname(s) to the end of the file. Separate multiple hosts with new lines. For example, after adding host4 and host5 to a cluster that already contains host1 through host3, the updated file contents would be:
host1
host2
host3
host4
host5
Continue with Step 8 in the previous procedure, Expanding the HAWQ Cluster. When the HAWQ service is ready to be restarted via Ambari, Ambari will refresh the new configurations.
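The slaves-file edit in the steps above can also be scripted. This is a minimal sketch, assuming you run it as gpadmin on the HAWQ master; add_slaves is a hypothetical helper name, and it appends only hostnames that are not already listed, so rerunning it does not create duplicate entries.

```shell
#!/bin/sh
# Hypothetical helper: append new HAWQ segment hostnames to a slaves file,
# one per line, skipping any hostname that is already listed.
# Usage: add_slaves <slaves_file> <host>...
add_slaves() {
  slaves_file=$1
  shift
  # If the last line lacks a trailing newline, add one so the first new
  # hostname starts on its own line.
  if [ -s "$slaves_file" ] && [ -n "$(tail -c 1 "$slaves_file")" ]; then
    printf '\n' >> "$slaves_file"
  fi
  for host in "$@"; do
    # -x: whole-line match, -F: treat the hostname as a fixed string.
    if ! grep -qxF -- "$host" "$slaves_file" 2>/dev/null; then
      printf '%s\n' "$host" >> "$slaves_file"
    fi
  done
}

# Example, run as gpadmin on the HAWQ master:
# add_slaves /usr/local/hawq/etc/slaves host4 host5
```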
Activating the HAWQ Standby Master promotes the standby host as the new HAWQ Master host. The previous HAWQ Master configuration is automatically removed from the cluster.
Access the Ambari web console at http://ambari.server.hostname:8080, and log in as the “admin” user. (The default password is also “admin”.)
Click HAWQ in the list of installed services.
Select Service Actions > Activate HAWQ Standby Master to start the Activate HAWQ Standby Master Wizard.
Read the description of the Wizard and click Next to review the tasks that will be performed.
Ambari displays the host name of the current HAWQ Master that will be removed from the cluster, as well as the HAWQ Standby Master host that will be activated. The information is provided only for review and cannot be edited on this page. Click Next to confirm the operation.
Click OK to confirm that you want to perform the procedure, as it is not possible to roll back the operation using Ambari.
Ambari displays a list of tasks that are performed to activate the standby server and remove the previous HAWQ Master host. Click on any of the tasks to view progress or to view the actual log messages that are generated while performing the task.
Click Complete after the Wizard finishes all tasks.
Important: After the Wizard completes, your HAWQ cluster no longer includes a HAWQ Standby Master host. As a best practice, follow the instructions in Adding a HAWQ Standby Master to configure a new one.
The HAWQ Standby Master serves as a backup of the HAWQ Master host, and is an important part of providing high availability for the HAWQ cluster. When your cluster uses a standby master, you can activate the standby if the active HAWQ Master host fails or becomes unreachable.
Select an existing host in the cluster to run the HAWQ standby master. You cannot run the standby master on the same host that runs the HAWQ master. Also, do not run a standby master on the node where you deployed the Ambari server; if the Ambari postgres instance is running on the same port as the HAWQ master postgres instance, initialization fails and leaves the cluster in an inconsistent state.
Log in to the HAWQ host that you chose to run the standby master and determine whether there is an existing HAWQ master directory (for example, /data/hawq/master) on the machine. If the directory exists, rename it. For example:
$ mv /data/hawq/master /data/hawq/master-old
Note: If a HAWQ master directory exists on the host when you configure the HAWQ standby master, then the standby master may be initialized with stale data. Rename any existing master directory before you proceed.
Access the Ambari web console at http://ambari.server.hostname:8080, and log in as the “admin” user. (The default password is also “admin”.)
Click HAWQ in the list of installed services.
Select Service Actions > Add HAWQ Standby Master to start the Add HAWQ Standby Master Wizard.
Read the Get Started page for information about the HAWQ standby master and to acknowledge that the procedure requires a service restart. Click Next to display the Select Host page.
Use the dropdown menu to select a host to use for the HAWQ Standby Master. Click Next to display the Review page.
Review the information to verify the host on which the HAWQ Standby Master will be installed. Click Back to change your selection or Next to continue.
Confirm that you have renamed any existing HAWQ master data directory on the selected host machine, as described earlier in this procedure. If a master data directory already exists, the new HAWQ Standby Master may be initialized with stale data and can place the cluster in an inconsistent state. Click Confirm to continue.
Ambari displays a list of tasks that are performed to install the standby master server and reconfigure the cluster. Click on any of the tasks to view progress or to view the actual log messages that are generated while performing the task.
Click Complete after the Wizard finishes all tasks.
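The check for an existing master directory on the standby host, described at the start of the procedure above, can also be scripted. This is a minimal sketch, assuming a POSIX shell on the chosen standby host; retire_master_dir is a hypothetical helper, and the timestamped suffix is a choice made here (the procedure itself simply uses master-old) so that repeated runs never collide with an earlier backup.

```shell
#!/bin/sh
# Hypothetical helper: rename an existing HAWQ master directory on the
# chosen standby host so the standby master is not initialized with stale data.
# Usage: retire_master_dir /data/hawq/master
retire_master_dir() {
  master_dir=${1%/}   # drop one trailing slash, if any
  if [ ! -e "$master_dir" ]; then
    echo "No existing master directory at $master_dir; nothing to rename."
    return 0
  fi
  # Timestamped name of the form <master_dir>-old-YYYYmmddHHMMSS.
  backup="$master_dir-old-$(date +%Y%m%d%H%M%S)"
  # Refuse to continue if the target exists: mv would move the directory
  # inside it instead of renaming it.
  if [ -e "$backup" ]; then
    echo "Target $backup already exists; not renaming." >&2
    return 1
  fi
  mv -- "$master_dir" "$backup" && echo "Renamed $master_dir to $backup"
}
```

Running the helper a second time is harmless: once the directory has been renamed, it reports that there is nothing to rename.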
This service action enables you to remove the HAWQ Standby Master component in situations where you may need to reinstall the component.
Access the Ambari web console at http://ambari.server.hostname:8080, and log in as the “admin” user. (The default password is also “admin”.)
Click HAWQ in the list of installed services.
Select Service Actions > Remove HAWQ Standby Master to start the Remove HAWQ Standby Master Wizard.
Read the Get Started page for information about the procedure and to acknowledge that the procedure requires a service restart. Click Next to display the Review page.
Ambari displays the HAWQ Standby Master host that will be removed from the cluster configuration. Click Next to continue, then click OK to confirm.
Ambari displays a list of tasks that are performed to remove the standby master from the cluster. Click on any of the tasks to view progress or to view the actual log messages that are generated while performing the task.
Click Complete after the Wizard finishes all tasks.
Important: After the Wizard completes, your HAWQ cluster no longer includes a HAWQ Standby Master host. As a best practice, follow the instructions in Adding a HAWQ Standby Master to configure a new one.
If you install HAWQ using Ambari 2.2.2 with the HDP 2.3 stack, you must use Ambari to change the dfs.allow.truncate property to false before you attempt to upgrade to HDP 2.4. Ambari displays a configuration warning with this setting, but the setting is required to complete the upgrade; choose Proceed Anyway when Ambari warns you about the configured value of dfs.allow.truncate.
After you complete the upgrade to HDP 2.4, change the value of dfs.allow.truncate back to true to ensure that HAWQ can operate as intended.
The gpadmin password entered in the Ambari web console is used by the hawq ssh-exkeys utility, which runs during the start phase of the HAWQ Master. Ambari stores and uses its own copy of the gpadmin password, independently of the host system. Passwords on the master and slave nodes are not automatically updated and synchronized with Ambari. If you change the gpadmin password on the hosts but do not update the HAWQ system user password in Ambari, Ambari behaves as if the gpadmin password was never changed and keeps using the old password.
If passwordless ssh has not been set up, hawq ssh-exkeys attempts to exchange the key by using the password provided by the Ambari web console. If the password on the host machine differs from the HAWQ System User password recognized on Ambari, exchanging the key with the HAWQ Master fails. Components without passwordless ssh might not be registered with the HAWQ cluster.
The HAWQ service must already be installed on the cluster and managed through Ambari.
You should change the gpadmin password when:
Procedure
All of the listed steps are mandatory to ensure that the HAWQ service remains fully functional.
Use a script to manually change the password for the gpadmin user on all HAWQ hosts (all Master and Slave component hosts). To manually update the password, you must have ssh access to all host machines as the gpadmin user. Generate a hosts file to use with the hawq ssh command to reset the password on all hosts. Use a text editor to create a file that lists the hostname of the master node, the standby master node, and each segment node used in the cluster. Specify one hostname per line, for example:
mdw
smdw
sdw1
sdw2
sdw3
You can then use a command similar to the following to change the password on all hosts that are listed in the file:
hawq ssh -f hawq_hosts 'echo "gpadmin:newpassword" | /usr/sbin/chpasswd'
Access the Ambari web console at http://ambari.server.hostname:8080, and log in as the “admin” user. (The default password is also “admin”.) Then perform the following steps:
This will synchronize the password on the host machines with the password that you specified in Ambari.
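The hosts file described in the first step of the procedure above can also be generated from the shell instead of a text editor. This is a minimal sketch, assuming a POSIX shell with awk; write_hosts_file is a hypothetical helper, and mdw, smdw, and sdw1 to sdw3 are the example hostnames from the procedure. It writes one hostname per line and drops blank and duplicate entries, so each host appears only once in the file passed to hawq ssh -f.

```shell
#!/bin/sh
# Hypothetical helper: write a hosts file for `hawq ssh -f`, one hostname
# per line, skipping blank entries and duplicates while keeping the order.
# Usage: write_hosts_file <file> <host>...
write_hosts_file() {
  hosts_file=$1
  shift
  # NF skips empty lines; seen[] drops repeated hostnames.
  printf '%s\n' "$@" | awk 'NF && !seen[$1]++ { print $1 }' > "$hosts_file"
}

# Example, using the hostnames from the procedure above:
# write_hosts_file hawq_hosts mdw smdw sdw1 sdw2 sdw3
```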