<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_ha_statestore">
<title>Configuring StateStore for High Availability</title>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Administrators"/>
<data name="Category" value="High Availability"/>
<data name="Category" value="Configuring"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="StateStore"/>
</metadata>
</prolog>
<conbody>
<p>With a pair of StateStore instances in primary/standby mode, the primary StateStore instance
sends the cluster's state updates and propagates metadata updates. It periodically sends
heartbeats to the standby StateStore instance, CatalogD, coordinators, and executors. The standby
StateStore instance also sends heartbeats to CatalogD, coordinators, and executors. RPC
connections between daemons and StateStore instances are kept alive so that broken connections
usually do not result in false failure reports between nodes. When the primary instance goes
down, the standby StateStore instance takes over the primary role so that the service
continues to operate.</p>
</conbody>
<concept id="enabling_statestore_ha">
<title>Enabling StateStore High Availability</title>
<conbody>
<p>To enable StateStore High Availability (HA) in an Impala cluster, follow these steps:<ol
id="ol_k2p_zxm_1cc">
<li>Restart two StateStore instances with the following additional
flags:<codeblock id="codeblock_tmp_bym_1cc">enable_statestored_ha: true
state_store_ha_port: RPC port for StateStore HA service (default: 24020)
state_store_peer_host: Hostname of the peer StateStore instance
state_store_peer_ha_port: RPC port of high availability service on the peer StateStore instance (default: 24020)
</codeblock></li>
<li>Restart all subscribers (including CatalogD, coordinators, and executors) with the
following additional
flags:<codeblock id="codeblock_zyl_lmh_fcc">state_store_host: Hostname of the first StateStore instance
state_store_port: RPC port for StateStore registration on the first StateStore instance (default: 24000)
enable_statestored_ha: true
state_store_2_host: Hostname of the second StateStore instance
state_store_2_port: RPC port for StateStore registration on the second StateStore instance (default: 24000)</codeblock></li>
</ol></p>
<p>With these flags set, the Impala cluster uses two StateStore instances, so the StateStore
service remains available if one of the instances fails.</p>
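<p>As an illustration only, the flags from the steps above might look as follows when written in
<codeph>--name=value</codeph> form. The hostnames <codeph>statestore-1.example.com</codeph> and
<codeph>statestore-2.example.com</codeph> are hypothetical placeholders, the ports are the
defaults listed above, and the <codeph>#</codeph> lines are annotations rather than flags. Adapt
the values to your deployment.</p>
<codeblock># StateStore instance on statestore-1.example.com (hypothetical hostname)
--enable_statestored_ha=true
--state_store_ha_port=24020
--state_store_peer_host=statestore-2.example.com
--state_store_peer_ha_port=24020

# StateStore instance on statestore-2.example.com (hypothetical hostname)
--enable_statestored_ha=true
--state_store_ha_port=24020
--state_store_peer_host=statestore-1.example.com
--state_store_peer_ha_port=24020

# Every subscriber: CatalogD, coordinators, and executors
--state_store_host=statestore-1.example.com
--state_store_port=24000
--enable_statestored_ha=true
--state_store_2_host=statestore-2.example.com
--state_store_2_port=24000
</codeblock>
<p>Each StateStore instance names the other as its peer, while every subscriber lists both
instances.</p>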
</conbody>
</concept>
<concept id="disabling_statestore_ha">
<title>Disabling StateStore High Availability</title>
<conbody>
<p>To disable StateStore high availability in an Impala cluster, follow these steps:<ol
id="ol_udj_b1n_1cc">
<li>Remove one StateStore instance from the Impala cluster.</li>
<li>Restart the remaining StateStore instance along with the coordinator, executor, and
CatalogD nodes, ensuring they are restarted without the
<codeph>enable_statestored_ha</codeph> flag.</li>
</ol></p>
</conbody>
</concept>
<concept id="statestore_failure_detection">
<title>StateStore Failure Detection</title>
<conbody>
<p>The primary StateStore instance continuously sends heartbeats to its registered clients and
to the standby StateStore instance. Each StateStore client registers with both the primary and
the standby StateStore instances and maintains the following information about each StateStore
server: the server IP address and port, the service role (primary or standby), the time the
last heartbeat request was received, and the number of missed heartbeats. A missing heartbeat
response from a StateStore client indicates an unhealthy daemon. A flag defines
<codeph>MAX_MISSED_HEARTBEAT_REQUEST_NUM</codeph>, the number of consecutive missed
heartbeat requests after which a client considers communication with a StateStore server
lost and marks that server as down. The standby StateStore instance collects the connection
states between the clients (CatalogD, coordinators, and executors) and the primary StateStore
instance through its heartbeat messages to the clients. If the standby StateStore instance
misses <codeph>MAX_MISSED_HEARTBEAT_REQUEST_NUM</codeph> heartbeat requests from the primary
StateStore instance and the majority of clients lose their connections to the primary
StateStore instance, it takes over the primary role.</p>
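<p>The takeover rule described above can be sketched in a few lines of Python. This is a
simplified model, not Impala's implementation: the function names are hypothetical, the
threshold value of 10 is a placeholder because this topic does not state the real default, and
"majority" is assumed to mean strictly more than half of the clients.</p>
<codeblock># Placeholder threshold; this topic does not give the actual default value.
MAX_MISSED_HEARTBEAT_REQUEST_NUM = 10


def exceeds_missed_limit(consecutive_missed):
    """True once the consecutive missed heartbeat requests reach the threshold."""
    return consecutive_missed >= MAX_MISSED_HEARTBEAT_REQUEST_NUM


def standby_should_take_over(standby_missed, client_missed_counts):
    """Decide whether the standby StateStore should assume the primary role.

    standby_missed: consecutive heartbeat requests from the primary that the
        standby itself has missed.
    client_missed_counts: for each client, the consecutive heartbeat requests
        from the primary that the client reports as missed.
    """
    if not client_missed_counts:
        # Assumption: with no client reports there is no majority, so stay standby.
        return False
    # Clients that have marked the primary as down.
    lost = sum(1 for missed in client_missed_counts if exceeds_missed_limit(missed))
    # Assumption: a majority means strictly more than half of the clients.
    majority_lost = lost * 2 > len(client_missed_counts)
    # Both conditions from the text must hold.
    return exceeds_missed_limit(standby_missed) and majority_lost
</codeblock>
<p>In this sketch, <codeph>standby_should_take_over(10, [10, 12, 0])</codeph> returns
<codeph>True</codeph> because the standby has reached the threshold and two of three clients
have marked the primary as down, whereas <codeph>standby_should_take_over(3, [10, 12, 11])</codeph>
returns <codeph>False</codeph> because the standby itself still receives heartbeats from the
primary.</p>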
</conbody>
</concept>
</concept>