| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept rev="1.2" id="resource_management"> |
| |
| <title>Resource Management for Impala</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="YARN"/> |
| <data name="Category" value="Resource Management"/> |
| <data name="Category" value="Administrators"/> |
| <data name="Category" value="Developers"/> |
| <data name="Category" value="Data Analysts"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <note conref="../shared/impala_common.xml#common/impala_llama_obsolete"/> |
| |
| <p> |
| You can limit the CPU and memory resources used by Impala, to manage and prioritize workloads on clusters |
| that run jobs from many Hadoop components. |
| </p> |
| |
| <p outputclass="toc inpage"/> |
| </conbody> |
| |
| <concept id="rm_enforcement"> |
| |
| <title>How Resource Limits Are Enforced</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Concepts"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| Limits on memory usage are enforced by Impala's process memory limit (the <codeph>MEM_LIMIT</codeph> |
| query option setting). The admission control feature checks this setting to decide how many queries |
| can be safely run at the same time. Then the Impala daemon enforces the limit by activating the |
| spill-to-disk mechanism when necessary, or cancelling a query altogether if the limit is exceeded at runtime. |
| </p> |
| |
| </conbody> |
| </concept> |
| |
| <!-- |
| <concept id="rm_enable"> |
| |
| <title>Enabling Resource Management for Impala</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Configuring"/> |
| <data name="Category" value="Starting and Stopping"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| To enable resource management for Impala, first you <xref href="#rm_prereqs">set up the YARN |
| service for your cluster</xref>. Then you <xref href="#rm_options">add startup options and customize |
| resource management settings</xref> for the Impala services. |
| </p> |
| </conbody> |
| |
| <concept id="rm_prereqs"> |
| |
| <title>Required Setup for Resource Management with Impala</title> |
| |
| <conbody> |
| |
| <p> |
| YARN is the general-purpose service that manages resources for many Hadoop components within a |
| <keyword keyref="distro"/> cluster. |
| </p> |
| |
| </conbody> |
| </concept> |
| |
| <concept id="rm_options"> |
| |
| <title>impalad Startup Options for Resource Management</title> |
| |
| <conbody> |
| |
| <p id="resource_management_impalad_options"> |
| The following startup options for <cmdname>impalad</cmdname> enable resource management and customize its |
| parameters for your cluster configuration: |
| <ul> |
| <li> |
| <codeph>-enable_rm</codeph>: Whether to enable resource management or not, either |
| <codeph>true</codeph> or <codeph>false</codeph>. The default is <codeph>false</codeph>. None of the |
| other resource management options have any effect unless <codeph>-enable_rm</codeph> is turned on. |
| </li> |
| |
| <li> |
| <codeph>-cgroup_hierarchy_path</codeph>: Path where YARN will create cgroups for granted |
| resources. Impala assumes that the cgroup for an allocated container is created in the path |
| '<varname>cgroup_hierarchy_path</varname> + <varname>container_id</varname>'. |
| </li> |
| |
| <li rev="1.4.0"> |
| <codeph>-rm_always_use_defaults</codeph>: If this Boolean option is enabled, Impala ignores computed |
| estimates and always obtains the default memory and CPU allocation settings at the start of the |
| query. These default estimates are approximately 2 CPUs and 4 GB of memory, possibly varying slightly |
| depending on cluster size, workload, and so on. Where practical, enable |
| <codeph>-rm_always_use_defaults</codeph> whenever resource management is used, and relying on these |
| default values (that is, leaving out the two following options). |
| </li> |
| |
| <li rev="1.4.0"> |
| <codeph>-rm_default_memory=<varname>size</varname></codeph>: Optionally sets the default estimate for |
| memory usage for each query. You can use suffixes such as M and G for megabytes and gigabytes, the |
| same as with the <xref href="impala_mem_limit.xml#mem_limit">MEM_LIMIT</xref> query option. Only has |
| an effect when <codeph>-rm_always_use_defaults</codeph> is also enabled. |
| </li> |
| |
| <li rev="1.4.0"> |
| <codeph>-rm_default_cpu_cores</codeph>: Optionally sets the default estimate for number of virtual |
| CPU cores for each query. Only has an effect when <codeph>-rm_always_use_defaults</codeph> is also |
| enabled. |
| </li> |
| </ul> |
| </p> |
| |
| </conbody> |
| </concept> |
| --> |
| |
| <concept id="rm_query_options"> |
| |
| <title>impala-shell Query Options for Resource Management</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala Query Options"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| Before issuing SQL statements through the <cmdname>impala-shell</cmdname> interpreter, you can use the |
| <codeph>SET</codeph> command to configure the following parameters related to resource management: |
| </p> |
| |
| <ul id="ul_nzt_twf_jp"> |
| <li> |
| <xref href="impala_explain_level.xml#explain_level"/> |
| </li> |
| |
| <li> |
| <xref href="impala_mem_limit.xml#mem_limit"/> |
| </li> |
| |
| </ul> |
| </conbody> |
| </concept> |
| |
| <!-- Parent topic is going away, so former subtopic is hoisted up a level. |
| </concept> |
| --> |
| |
| <concept id="rm_limitations"> |
| |
| <title>Limitations of Resource Management for Impala</title> |
| |
| <conbody> |
| |
| <!-- Conditionalizing some content here with audience="hidden" because there are already some XML comments |
| inside the list, so not practical to enclose the whole thing in XML comments. --> |
| |
| <p audience="hidden"> |
| Currently, Impala has the following limitations for resource management of Impala queries: |
| </p> |
| |
| <ul audience="hidden"> |
| <li> |
| Table statistics are required, and column statistics are highly valuable, for Impala to produce accurate |
| estimates of how much memory to request from YARN. See |
| <xref href="impala_perf_stats.xml#perf_table_stats"/> and |
| <xref href="impala_perf_stats.xml#perf_column_stats"/> for instructions on gathering both kinds of |
| statistics, and <xref href="impala_explain.xml#explain"/> for the extended <codeph>EXPLAIN</codeph> |
| output where you can check that statistics are available for a specific table and set of columns. |
| </li> |
| |
| <li> |
| If the Impala estimate of required memory is lower than is actually required for a query, Impala |
| dynamically expands the amount of requested memory. |
| <!-- Impala will cancel the query when it exceeds the requested memory size. --> |
| Queries might still be cancelled if the reservation expansion fails, for example if there are |
| insufficient remaining resources for that pool, or the expansion request takes long enough that it |
| exceeds the query timeout interval, or because of YARN preemption. |
| <!-- This could happen in some cases with complex queries, even when table and column statistics are available. --> |
| You can see the actual memory usage after a failed query by issuing a <codeph>PROFILE</codeph> command in |
| <cmdname>impala-shell</cmdname>. Specify a larger memory figure with the <codeph>MEM_LIMIT</codeph> |
| query option and re-try the query. |
| </li> |
| </ul> |
| |
| <p rev="2.0.0"> |
| The <codeph>MEM_LIMIT</codeph> query option, and the other resource-related query options, are settable |
| through the ODBC or JDBC interfaces in Impala 2.0 and higher. This is a former limitation that is now |
| lifted. |
| </p> |
| </conbody> |
| </concept> |
| </concept> |