| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="impala_ha_statestore"> |
| <title>Configuring StateStore for High Availability</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Administrators"/> |
| <data name="Category" value="High Availability"/> |
| <data name="Category" value="Configuring"/> |
| <data name="Category" value="Data Analysts"/> |
| <data name="Category" value="StateStore"/> |
| </metadata> |
| </prolog> |
| <conbody> |
| <p>With a pair of StateStore instances in primary/standby mode, the primary StateStore instance |
| sends cluster state updates and propagates metadata updates. It periodically sends heartbeats |
| to the standby StateStore instance, CatalogD, coordinators, and executors. The standby |
| StateStore instance also sends heartbeats to CatalogD, coordinators, and executors. RPC |
| connections between daemons and StateStore instances are kept alive so that broken connections |
| usually won't result in false failure reports between nodes. The standby StateStore instance |
| takes over the primary role when the primary instance goes down, so that the StateStore |
| service continues to operate.</p> |
| </conbody> |
| <concept id="enabling_statestore_ha"> |
| <title>Enabling StateStore High Availability</title> |
| <conbody> |
| <p>To enable StateStore High Availability (HA) in an Impala cluster, follow these steps:<ol |
| id="ol_k2p_zxm_1cc"> |
| <li>Restart two StateStore instances with the following additional |
| flags:<codeblock id="codeblock_tmp_bym_1cc">enable_statestored_ha: true |
| state_store_ha_port: RPC port for StateStore HA service (default: 24020) |
| state_store_peer_host: Hostname of the peer StateStore instance |
| state_store_peer_ha_port: RPC port of high availability service on the peer StateStore instance (default: 24020) |
| </codeblock></li> |
| <li>Restart all subscribers (including CatalogD, coordinators, and executors) with the |
| following additional |
| flags:<codeblock id="codeblock_zyl_lmh_fcc">state_store_host: Hostname of the first StateStore instance |
| state_store_port: RPC port for StateStore registration on the first StateStore instance (default: 24000) |
| enable_statestored_ha: true |
| state_store_2_host: Hostname of the second StateStore instance |
| state_store_2_port: RPC port for StateStore registration on the second StateStore instance (default: 24000)</codeblock></li> |
| </ol></p> |
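| <p>As an illustration only, the flags above could be passed on the daemon command lines as in |
| the following sketch. The hostnames <codeph>statestore1.example.com</codeph> and |
| <codeph>statestore2.example.com</codeph> are hypothetical placeholders, the ports are the |
| defaults listed above, and any other startup flags that your daemons already use are omitted. |
| Adapt the sketch to however your deployment starts its daemons.</p> |
| <codeblock># On the first StateStore host (hypothetical: statestore1.example.com) |
| statestored --enable_statestored_ha=true --state_store_ha_port=24020 --state_store_peer_host=statestore2.example.com --state_store_peer_ha_port=24020 |
| # On the second StateStore host (hypothetical: statestore2.example.com) |
| statestored --enable_statestored_ha=true --state_store_ha_port=24020 --state_store_peer_host=statestore1.example.com --state_store_peer_ha_port=24020 |
| # On every subscriber (CatalogD shown; coordinators and executors take the same five flags) |
| catalogd --state_store_host=statestore1.example.com --state_store_port=24000 --enable_statestored_ha=true --state_store_2_host=statestore2.example.com --state_store_2_port=24000 |
| </codeblock> |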
| <p>With these flags set, the Impala cluster uses two StateStore instances in primary/standby |
| mode, so the StateStore service remains available if one instance fails.</p> |
| </conbody> |
| </concept> |
| <concept id="disabling_statestore_ha"> |
| <title>Disabling StateStore High Availability</title> |
| <conbody> |
| <p>To disable StateStore high availability in an Impala cluster, follow these steps:<ol |
| id="ol_udj_b1n_1cc"> |
| <li>Remove one StateStore instance from the Impala cluster.</li> |
| <li>Restart the remaining StateStore instance along with the coordinator, executor, and |
| CatalogD nodes, ensuring they are restarted without the |
| <codeph>enable_statestored_ha</codeph> flag.</li> |
| </ol></p> |
| </conbody> |
| </concept> |
| <concept id="statestore_failure_detection"> |
| <title>StateStore Failure Detection</title> |
| <conbody> |
| <p>The primary StateStore instance continuously sends heartbeats to its registered clients and |
| to the standby StateStore instance. Each StateStore client registers with both the primary and |
| standby StateStore instances and maintains the following information about each StateStore |
| server: the server IP and port, the service role (primary or standby), the time the last |
| heartbeat request was received, and the number of missed heartbeats. A missing heartbeat |
| response from a StateStore client indicates an unhealthy daemon. A flag defines |
| <codeph>MAX_MISSED_HEARTBEAT_REQUEST_NUM</codeph>, the number of consecutive missed |
| heartbeat requests after which a client considers communication with a StateStore server |
| lost and marks that server as down. The standby |
| StateStore instance collects the connection states between the clients (CatalogD, |
| coordinators, and executors) and the primary StateStore instance through its heartbeat |
| messages to the clients. If the standby StateStore instance misses |
| <codeph>MAX_MISSED_HEARTBEAT_REQUEST_NUM</codeph> consecutive heartbeat requests from the |
| primary StateStore instance and the majority of clients have lost their connections to the |
| primary StateStore instance, it takes over the primary role.</p> |
| </conbody> |
| </concept> |
| </concept> |