<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="intro_components">
<title>Components of the Impala Server</title>
<titlealts audience="PDF"><navtitle>Components</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Concepts"/>
<data name="Category" value="Administrators"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p> The Impala server is a distributed, massively parallel processing (MPP)
database engine. It consists of different daemon processes that run on
specific hosts within your cluster. </p>
<p outputclass="toc inpage"/>
</conbody>
<concept id="intro_impalad">
<title>The Impala Daemon</title>
<conbody>
<p> The core Impala component is the Impala daemon, physically represented by the
<codeph>impalad</codeph> process. A few of the key functions that an Impala daemon
performs are:<ul>
<li>Reads from and writes to data files.</li>
<li>Accepts queries transmitted from the <codeph>impala-shell</codeph> command, Hue, JDBC,
or ODBC.</li>
<li>Parallelizes the queries and distributes work across the cluster.</li>
<li>Transmits intermediate query results back to the central coordinator. </li>
</ul></p>
<p>Impala daemons can be deployed in one of the following ways:<ul>
<li>HDFS and Impala are co-located, and each Impala daemon runs on the same host as a
DataNode.</li>
<li>Impala is deployed separately in a compute cluster and reads remotely from storage such
as HDFS, S3, or ADLS.</li>
</ul></p>
<p> The Impala daemons are in constant communication with the StateStore, to confirm which
daemons are healthy and can accept new work. </p>
<p rev="1.2"> They also receive broadcast messages from the <cmdname>catalogd</cmdname> daemon
(introduced in Impala 1.2) whenever any Impala daemon in the cluster creates, alters, or
drops any type of object, or when an <codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph>
statement is processed through Impala. This background communication minimizes the need for
<codeph>REFRESH</codeph> or <codeph>INVALIDATE METADATA</codeph> statements that were
needed to coordinate metadata across Impala daemons prior to Impala 1.2. </p>
<p rev="2.9.0 IMPALA-3807 IMPALA-5147 IMPALA-5503"> In <keyword keyref="impala29_full"/> and
higher, you can control which hosts act as query coordinators and which act as query
executors, to improve scalability for highly concurrent workloads on large clusters. See
<xref keyref="scalability_coordinator"/> for details. </p>
<note>Deploy Impala daemons on nodes that use the same Glibc version, because different Glibc
versions support different versions of the Unicode standard. Also ensure that the
<codeph>en_US.UTF-8</codeph> locale is installed on every node. Mismatched Glibc versions
might result in inconsistent UTF-8 behavior when <codeph>UTF8_MODE</codeph> is set to
<codeph>true</codeph>.</note>
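<p> For context, the following is a minimal sketch of a session that turns on the mode mentioned
in the note. The table and column names are hypothetical, and treating
<codeph>upper()</codeph> as one of the affected functions is an assumption made for
illustration. </p>
<codeblock>-- Enable UTF-8 aware string handling for this session.
SET UTF8_MODE=true;

-- With the mode on, results for non-ASCII input can depend on the
-- Unicode support of the Glibc on the host that evaluates the expression.
SELECT name, upper(name) FROM customers_demo;</codeblock>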
<p>
<b>Related information:</b>
<xref href="impala_config_options.xml#config_options"/>, <xref
href="impala_processes.xml#processes"/>, <xref href="impala_timeouts.xml#impalad_timeout"
/>, <xref href="impala_ports.xml#ports"/>, <xref href="impala_proxy.xml#proxy"/>
</p>
</conbody>
</concept>
<concept id="intro_statestore">
<title>The Impala Statestore</title>
<conbody>
<p> The Impala component known as the StateStore checks on the health of
all Impala daemons in a cluster, and continuously relays its findings to
each of those daemons. It is physically represented by a daemon process
named <codeph>statestored</codeph>. You only need such a process on one
host in a cluster. If an Impala daemon goes offline due to hardware
failure, network error, software issue, or other reason, the StateStore
informs all the other Impala daemons so that future queries can avoid
making requests to the unreachable Impala daemon. </p>
<p> Because the StateStore's purpose is to help when things go wrong and
to broadcast metadata to coordinators, it is not always critical to the
normal operation of an Impala cluster. If the StateStore is not running
or becomes unreachable, the Impala daemons continue running and
distributing work among themselves as usual when working with the data
known to Impala. The cluster just becomes less robust if other Impala
daemons fail, and metadata becomes less consistent as it changes while
the StateStore is offline. When the StateStore comes back online, it
re-establishes communication with the Impala daemons and resumes its
monitoring and broadcasting functions. </p>
<p> If you issue a DDL statement while the StateStore is down, queries that access
the object created by that statement fail. </p>
<p conref="../shared/impala_common.xml#common/statestored_catalogd_ha_blurb"/>
<p>
<b>Related information:</b>
<xref href="impala_scalability.xml#statestore_scalability"/>,
<xref href="impala_config_options.xml#config_options"/>, <xref href="impala_processes.xml#processes"/>,
<xref href="impala_timeouts.xml#statestore_timeout"/>, <xref href="impala_ports.xml#ports"/>
</p>
</conbody>
</concept>
<concept rev="1.2" id="intro_catalogd">
<title>The Impala Catalog Service</title>
<conbody>
<p> The Impala component known as the Catalog Service relays the metadata changes from Impala
SQL statements to all the Impala coordinators in a cluster. It is physically represented by
a daemon process named <codeph>catalogd</codeph>. You only need such a process on one host
in a cluster. Because the requests are passed through the StateStore daemon, it makes sense
to run the <cmdname>statestored</cmdname> and <cmdname>catalogd</cmdname> services on the
same host. </p>
<p> The catalog service avoids the need to issue <codeph>REFRESH</codeph> and
<codeph>INVALIDATE METADATA</codeph> statements when the metadata changes are performed by
statements issued through Impala.
</p>
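<p> As a minimal sketch of this behavior, the statements below create and populate a table
through Impala and then query it from a different coordinator with no intervening
<codeph>REFRESH</codeph>. The table name <codeph>sales_demo</codeph> and its columns are
hypothetical, and the sketch assumes that the catalog update has already propagated to the
second coordinator. </p>
<codeblock>-- Session 1: connected to one Impala coordinator.
CREATE TABLE sales_demo (id INT, amount DOUBLE);
INSERT INTO sales_demo VALUES (1, 9.99), (2, 19.99);

-- Session 2: connected to a different coordinator.
-- No REFRESH or INVALIDATE METADATA is issued, because the change
-- was made through Impala and relayed by the catalog service.
SELECT COUNT(*) FROM sales_demo;</codeblock>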
<p> When you create a table, load data, and so on through Hive, you need to issue
<codeph>REFRESH</codeph> or <codeph>INVALIDATE METADATA</codeph> on an Impala daemon
before querying the affected table, unless <cite>Automatic Invalidation/Refresh of
Metadata</cite>, also known as the Hive Metastore (HMS) event processor, is enabled. See
<xref href="impala_metadata.xml#impala_metadata">Automatic Invalidation/Refresh of
Metadata</xref> for details.<note id="note_eyx_qcp_fcc" type="note">Starting in Impala 4.1,
Automatic Invalidation/Refresh of Metadata is enabled by default.</note></p>
<p>
This feature touches a number of aspects of Impala:
</p>
<ul id="catalogd_xrefs">
<li>
<p>
See <xref href="impala_install.xml#install"/>, <xref href="impala_upgrading.xml#upgrading"/>, and
<xref href="impala_processes.xml#processes"/> for usage information about the
<cmdname>catalogd</cmdname> daemon.
</p>
</li>
<li>
<p> The <codeph>REFRESH</codeph> and <codeph>INVALIDATE
METADATA</codeph> statements are not needed when the
<codeph>CREATE TABLE</codeph>, <codeph>INSERT</codeph>, or other
table-changing or data-changing operation is performed through
Impala. These statements are still needed if such operations are
done through Hive or by manipulating data files directly in HDFS,
but in those cases the statements only need to be issued on one
Impala daemon rather than on all daemons. See <xref
href="impala_refresh.xml#refresh"/> and <xref
href="impala_invalidate_metadata.xml#invalidate_metadata"/> for
the latest usage information for those statements. </p>
</li>
</ul>
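<p> The Hive case described above can be sketched as follows. The table names are hypothetical,
and the sketch assumes that Automatic Invalidation/Refresh of Metadata is not already
handling the change. </p>
<codeblock>-- A table that Impala already knows about received new data files
-- through Hive or directly in HDFS: reload its file metadata.
REFRESH sales_demo;

-- A table was created through Hive and Impala has not seen it yet:
-- discard any cached metadata so it is reloaded on next access.
INVALIDATE METADATA new_hive_table;

-- Both statements are issued on one Impala daemon only.
-- Afterward, query the table as usual.
SELECT COUNT(*) FROM new_hive_table;</codeblock>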
<p conref="../shared/impala_common.xml#common/load_catalog_in_background"/>
<p conref="../shared/impala_common.xml#common/statestored_catalogd_ha_blurb"/>
<note>
<p conref="../shared/impala_common.xml#common/catalog_server_124"/>
</note>
<p>
<b>Related information:</b> <xref href="impala_config_options.xml#config_options"/>,
<xref href="impala_processes.xml#processes"/>, <xref href="impala_ports.xml#ports"/>
</p>
</conbody>
</concept>
</concept>