blob: 73eaf1ebb5867695161445f98b478c13d69faec5 [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.2" id="resource_management">
<title>Resource Management for Impala</title>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="YARN"/>
<data name="Category" value="Resource Management"/>
<data name="Category" value="Administrators"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<note conref="../shared/impala_common.xml#common/impala_llama_obsolete"/>
<p>
You can limit the CPU and memory resources used by Impala, to manage and prioritize workloads on clusters
that run jobs from many Hadoop components.
</p>
<p outputclass="toc inpage"/>
</conbody>
<concept id="rm_enforcement">
<title>How Resource Limits Are Enforced</title>
<prolog>
<metadata>
<data name="Category" value="Concepts"/>
</metadata>
</prolog>
<conbody>
<p>
Limits on memory usage are enforced by Impala's process memory limit (the <codeph>MEM_LIMIT</codeph>
query option setting). The admission control feature checks this setting to decide how many queries
can be safely run at the same time. Then the Impala daemon enforces the limit by activating the
spill-to-disk mechanism when necessary, or cancelling a query altogether if the limit is exceeded at runtime.
</p>
</conbody>
</concept>
<!--
<concept id="rm_enable">
<title>Enabling Resource Management for Impala</title>
<prolog>
<metadata>
<data name="Category" value="Configuring"/>
<data name="Category" value="Starting and Stopping"/>
</metadata>
</prolog>
<conbody>
<p>
To enable resource management for Impala, first you <xref href="#rm_prereqs">set up the YARN
service for your cluster</xref>. Then you <xref href="#rm_options">add startup options and customize
resource management settings</xref> for the Impala services.
</p>
</conbody>
<concept id="rm_prereqs">
<title>Required Setup for Resource Management with Impala</title>
<conbody>
<p>
YARN is the general-purpose service that manages resources for many Hadoop components within a
<keyword keyref="distro"/> cluster.
</p>
</conbody>
</concept>
<concept id="rm_options">
<title>impalad Startup Options for Resource Management</title>
<conbody>
<p id="resource_management_impalad_options">
The following startup options for <cmdname>impalad</cmdname> enable resource management and customize its
parameters for your cluster configuration:
<ul>
<li>
<codeph>-enable_rm</codeph>: Whether to enable resource management or not, either
<codeph>true</codeph> or <codeph>false</codeph>. The default is <codeph>false</codeph>. None of the
other resource management options have any effect unless <codeph>-enable_rm</codeph> is turned on.
</li>
<li>
<codeph>-cgroup_hierarchy_path</codeph>: Path where YARN will create cgroups for granted
resources. Impala assumes that the cgroup for an allocated container is created in the path
'<varname>cgroup_hierarchy_path</varname> + <varname>container_id</varname>'.
</li>
<li rev="1.4.0">
<codeph>-rm_always_use_defaults</codeph>: If this Boolean option is enabled, Impala ignores computed
estimates and always obtains the default memory and CPU allocation settings at the start of the
query. These default estimates are approximately 2 CPUs and 4 GB of memory, possibly varying slightly
depending on cluster size, workload, and so on. Where practical, enable
<codeph>-rm_always_use_defaults</codeph> whenever resource management is used, and relying on these
default values (that is, leaving out the two following options).
</li>
<li rev="1.4.0">
<codeph>-rm_default_memory=<varname>size</varname></codeph>: Optionally sets the default estimate for
memory usage for each query. You can use suffixes such as M and G for megabytes and gigabytes, the
same as with the <xref href="impala_mem_limit.xml#mem_limit">MEM_LIMIT</xref> query option. Only has
an effect when <codeph>-rm_always_use_defaults</codeph> is also enabled.
</li>
<li rev="1.4.0">
<codeph>-rm_default_cpu_cores</codeph>: Optionally sets the default estimate for number of virtual
CPU cores for each query. Only has an effect when <codeph>-rm_always_use_defaults</codeph> is also
enabled.
</li>
</ul>
</p>
</conbody>
</concept>
-->
<concept id="rm_query_options">
<title>impala-shell Query Options for Resource Management</title>
<prolog>
<metadata>
<data name="Category" value="Impala Query Options"/>
</metadata>
</prolog>
<conbody>
<p>
Before issuing SQL statements through the <cmdname>impala-shell</cmdname> interpreter, you can use the
<codeph>SET</codeph> command to configure the following parameters related to resource management:
</p>
<ul id="ul_nzt_twf_jp">
<li>
<xref href="impala_explain_level.xml#explain_level"/>
</li>
<li>
<xref href="impala_mem_limit.xml#mem_limit"/>
</li>
</ul>
</conbody>
</concept>
<!-- Parent topic is going away, so former subtopic is hoisted up a level.
</concept>
-->
<concept id="rm_limitations">
<title>Limitations of Resource Management for Impala</title>
<conbody>
<!-- Conditionalizing some content here with audience="hidden" because there are already some XML comments
inside the list, so not practical to enclose the whole thing in XML comments. -->
<p audience="hidden">
Currently, Impala has the following limitations for resource management of Impala queries:
</p>
<ul audience="hidden">
<li>
Table statistics are required, and column statistics are highly valuable, for Impala to produce accurate
estimates of how much memory to request from YARN. See
<xref href="impala_perf_stats.xml#perf_table_stats"/> and
<xref href="impala_perf_stats.xml#perf_column_stats"/> for instructions on gathering both kinds of
statistics, and <xref href="impala_explain.xml#explain"/> for the extended <codeph>EXPLAIN</codeph>
output where you can check that statistics are available for a specific table and set of columns.
</li>
<li>
If the Impala estimate of required memory is lower than is actually required for a query, Impala
dynamically expands the amount of requested memory.
<!-- Impala will cancel the query when it exceeds the requested memory size. -->
Queries might still be cancelled if the reservation expansion fails, for example if there are
insufficient remaining resources for that pool, or the expansion request takes long enough that it
exceeds the query timeout interval, or because of YARN preemption.
<!-- This could happen in some cases with complex queries, even when table and column statistics are available. -->
You can see the actual memory usage after a failed query by issuing a <codeph>PROFILE</codeph> command in
<cmdname>impala-shell</cmdname>. Specify a larger memory figure with the <codeph>MEM_LIMIT</codeph>
query option and re-try the query.
</li>
</ul>
<p rev="2.0.0">
The <codeph>MEM_LIMIT</codeph> query option, and the other resource-related query options, are settable
through the ODBC or JDBC interfaces in Impala 2.0 and higher. This is a former limitation that is now
lifted.
</p>
</conbody>
</concept>
</concept>