| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="intro_components"> |
| |
| <title>Components of the Impala Server</title> |
| <titlealts audience="PDF"><navtitle>Components</navtitle></titlealts> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Concepts"/> |
| <data name="Category" value="Administrators"/> |
| <data name="Category" value="Developers"/> |
| <data name="Category" value="Data Analysts"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> The Impala server is a distributed, massively parallel processing (MPP) |
| database engine. It consists of different daemon processes that run on |
| specific hosts within your cluster. </p> |
| |
| <p outputclass="toc inpage"/> |
| </conbody> |
| |
| <concept id="intro_impalad"> |
| |
| <title>The Impala Daemon</title> |
| |
| <conbody> |
| |
| <p> The core Impala component is the Impala daemon, physically represented by the |
| <codeph>impalad</codeph> process. A few of the key functions that an Impala daemon |
| performs are:<ul> |
| <li>Reads and writes to data files.</li> |
| <li>Accepts queries transmitted from the <codeph>impala-shell</codeph> command, Hue, JDBC, |
| or ODBC.</li> |
| <li>Parallelizes the queries and distributes work across the cluster.</li> |
| <li>Transmits intermediate query results back to the central coordinator. </li> |
| </ul></p> |
| <p>Impala daemons can be deployed in one of the following ways:<ul> |
| <li>HDFS and Impala are co-located, and each Impala daemon runs on the same host as a |
| DataNode.</li> |
| <li>Impala is deployed separately in a compute cluster and reads remotely from HDFS, S3, |
| ADLS, etc.</li> |
| </ul></p> |
| |
| <p> The Impala daemons are in constant communication with StateStore, to confirm which daemons |
| are healthy and can accept new work. </p> |
| |
| <p rev="1.2"> They also receive broadcast messages from the <cmdname>catalogd</cmdname> daemon |
| (introduced in Impala 1.2) whenever any Impala daemon in the cluster creates, alters, or |
| drops any type of object, or when an <codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph> |
| statement is processed through Impala. This background communication minimizes the need for |
| <codeph>REFRESH</codeph> or <codeph>INVALIDATE METADATA</codeph> statements that were |
| needed to coordinate metadata across Impala daemons prior to Impala 1.2. </p> |
| |
| <p rev="2.9.0 IMPALA-3807 IMPALA-5147 IMPALA-5503"> In <keyword keyref="impala29_full"/> and |
| higher, you can control which hosts act as query coordinators and which act as query |
| executors, to improve scalability for highly concurrent workloads on large clusters. See |
| <xref keyref="scalability_coordinator"/> for details. </p> |
| |
| <note>Impala daemons should be deployed on nodes using the same Glibc version since different |
| Glibc version supports different Unicode standard version and also ensure that the |
| en_US.UTF-8 locale is installed in the nodes. Not using the same Glibc version might result |
| in inconsistent UTF-8 behavior when UTF8_MODE is set to true.</note> |
| |
| <p> |
| <b>Related information:</b> |
| <xref href="impala_config_options.xml#config_options"/>, <xref |
| href="impala_processes.xml#processes"/>, <xref href="impala_timeouts.xml#impalad_timeout" |
| />, <xref href="impala_ports.xml#ports"/>, <xref href="impala_proxy.xml#proxy"/> |
| </p> |
| </conbody> |
| </concept> |
| |
| <concept id="intro_statestore"> |
| |
| <title>The Impala Statestore</title> |
| |
| <conbody> |
| |
| <p> The Impala component known as the StateStore checks on the health of |
| all Impala daemons in a cluster, and continuously relays its findings to |
| each of those daemons. It is physically represented by a daemon process |
| named <codeph>statestored</codeph>. You only need such a process on one |
| host in a cluster. If an Impala daemon goes offline due to hardware |
| failure, network error, software issue, or other reason, the StateStore |
| informs all the other Impala daemons so that future queries can avoid |
| making requests to the unreachable Impala daemon. </p> |
| |
| <p> Because the StateStore's purpose is to help when things go wrong and |
| to broadcast metadata to coordinators, it is not always critical to the |
| normal operation of an Impala cluster. If the StateStore is not running |
| or becomes unreachable, the Impala daemons continue running and |
| distributing work among themselves as usual when working with the data |
| known to Impala. The cluster just becomes less robust if other Impala |
| daemons fail, and metadata becomes less consistent as it changes while |
| the StateStore is offline. When the StateStore comes back online, it |
| re-establishes communication with the Impala daemons and resumes its |
| monitoring and broadcasting functions. </p> |
| |
| <p> If you issue a DDL statement while the StateStore is down, the queries |
| that access the new object the DDL created will fail. </p> |
| |
| <p conref="../shared/impala_common.xml#common/statestored_catalogd_ha_blurb"/> |
| |
| <p> |
| <b>Related information:</b> |
| </p> |
| |
| <p> |
| <xref href="impala_scalability.xml#statestore_scalability"/>, |
| <xref href="impala_config_options.xml#config_options"/>, <xref href="impala_processes.xml#processes"/>, |
| <xref href="impala_timeouts.xml#statestore_timeout"/>, <xref href="impala_ports.xml#ports"/> |
| </p> |
| </conbody> |
| </concept> |
| |
| <concept rev="1.2" id="intro_catalogd"> |
| |
| <title>The Impala Catalog Service</title> |
| |
| <conbody> |
| |
| <p> The Impala component known as the Catalog Service relays the metadata changes from Impala |
| SQL statements to all the Impala coordinators in a cluster. It is physically represented by |
| a daemon process named <codeph>catalogd</codeph>. You only need such a process on one host |
| in a cluster. Because the requests are passed through the StateStore daemon, it makes sense |
| to run the <cmdname>statestored</cmdname> and <cmdname>catalogd</cmdname> services on the |
| same host. </p> |
| |
| <p> The catalog service avoids the need to issue <codeph>REFRESH</codeph> and |
| <codeph>INVALIDATE METADATA</codeph> statements when the metadata changes are performed by |
| statements issued through Impala. |
| </p> |
| <p> When you create a table, load data, and so on through Hive, you do need to issue |
| <codeph>REFRESH</codeph> or <codeph>INVALIDATE METADATA</codeph> on an Impala daemon |
| before executing a query. Performing <codeph>REFRESH</codeph> or <codeph>INVALIDATE |
| METADATA</codeph> is not required when <cite>Automatic Invalidation/Refresh of |
| Metadata</cite> is enabled. See <xref href="impala_metadata.xml#impala_metadata">Automatic |
| Invalidation/Refresh of Metadata</xref> also known as the Hive Metastore (HMS) event |
| processor.<note id="note_eyx_qcp_fcc" type="note">From Impala 4.1, Automatic |
| Invalidation/Refresh of Metadata is enabled by default.</note></p> |
| |
| <p> |
| This feature touches a number of aspects of Impala: |
| </p> |
| |
| <ul id="catalogd_xrefs"> |
| <li> |
| <p> |
| See <xref href="impala_install.xml#install"/>, <xref href="impala_upgrading.xml#upgrading"/> and |
| <xref href="impala_processes.xml#processes"/>, for usage information for the |
| <cmdname>catalogd</cmdname> daemon. |
| </p> |
| </li> |
| |
| <li> |
| <p> The <codeph>REFRESH</codeph> and <codeph>INVALIDATE |
| METADATA</codeph> statements are not needed when the |
| <codeph>CREATE TABLE</codeph>, <codeph>INSERT</codeph>, or other |
| table-changing or data-changing operation is performed through |
| Impala. These statements are still needed if such operations are |
| done through Hive or by manipulating data files directly in HDFS, |
| but in those cases the statements only need to be issued on one |
| Impala daemon rather than on all daemons. See <xref |
| href="impala_refresh.xml#refresh"/> and <xref |
| href="impala_invalidate_metadata.xml#invalidate_metadata"/> for |
| the latest usage information for those statements. </p> |
| </li> |
| </ul> |
| |
| <p conref="../shared/impala_common.xml#common/load_catalog_in_background"/> |
| |
| <p conref="../shared/impala_common.xml#common/statestored_catalogd_ha_blurb"/> |
| |
| <note> |
| <p conref="../shared/impala_common.xml#common/catalog_server_124"/> |
| </note> |
| |
| <p> |
| <b>Related information:</b> <xref href="impala_config_options.xml#config_options"/>, |
| <xref href="impala_processes.xml#processes"/>, <xref href="impala_ports.xml#ports"/> |
| </p> |
| </conbody> |
| </concept> |
| </concept> |