<?xml version="1.0" encoding="UTF-8"?>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
 <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
 <concept rev="1.2" id="resource_management">

   <title>Resource Management for Impala</title>
   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
       <data name="Category" value="YARN"/>
       <data name="Category" value="Resource Management"/>
       <data name="Category" value="Administrators"/>
       <data name="Category" value="Developers"/>
       <data name="Category" value="Data Analysts"/>
     </metadata>
   </prolog>

   <conbody>

     <note conref="../shared/impala_common.xml#common/impala_llama_obsolete"/>

     <p>
       You can limit the CPU and memory resources used by Impala, to manage and prioritize workloads on clusters
       that run jobs from many Hadoop components.
     </p>

     <p outputclass="toc inpage"/>
   </conbody>

   <concept id="rm_enforcement">

     <title>How Resource Limits Are Enforced</title>
   <prolog>
     <metadata>
       <data name="Category" value="Concepts"/>
     </metadata>
   </prolog>

     <conbody>

       <p>
         Limits on memory usage are enforced by Impala's process memory limit (the <codeph>MEM_LIMIT</codeph>
        query option setting). The admission control feature checks this setting to decide how many queries
        can safely run at the same time. The Impala daemon then enforces the limit by activating the
        spill-to-disk mechanism when necessary, or by cancelling a query altogether if the limit is exceeded at runtime.
       </p>
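
      <p>
        As a minimal sketch of how the setting is applied, the following <cmdname>impala-shell</cmdname>
        statements set a limit and then run a memory-intensive query under it. The table and column names
        (<codeph>sales</codeph>, <codeph>customer_id</codeph>) and the 2 GB figure are illustrative
        assumptions, not recommendations.
      </p>

<codeblock>SET MEM_LIMIT=2g;
-- Queries issued later in this session run under the limit set above.
SELECT customer_id, COUNT(*) FROM sales GROUP BY customer_id;
</codeblock>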

     </conbody>
   </concept>

 <!--
   <concept id="rm_enable">

     <title>Enabling Resource Management for Impala</title>
   <prolog>
     <metadata>
       <data name="Category" value="Configuring"/>
       <data name="Category" value="Starting and Stopping"/>
     </metadata>
   </prolog>

     <conbody>

       <p>
         To enable resource management for Impala, first you <xref href="#rm_prereqs">set up the YARN
         service for your cluster</xref>. Then you <xref href="#rm_options">add startup options and customize
         resource management settings</xref> for the Impala services.
       </p>
     </conbody>

     <concept id="rm_prereqs">

       <title>Required Setup for Resource Management with Impala</title>

       <conbody>

         <p>
           YARN is the general-purpose service that manages resources for many Hadoop components within a
           <keyword keyref="distro"/> cluster.
         </p>

       </conbody>
     </concept>

     <concept id="rm_options">

       <title>impalad Startup Options for Resource Management</title>

       <conbody>

         <p id="resource_management_impalad_options">
           The following startup options for <cmdname>impalad</cmdname> enable resource management and customize its
           parameters for your cluster configuration:
           <ul>
             <li>
               <codeph>-enable_rm</codeph>: Whether to enable resource management or not, either
               <codeph>true</codeph> or <codeph>false</codeph>. The default is <codeph>false</codeph>. None of the
               other resource management options have any effect unless <codeph>-enable_rm</codeph> is turned on.
             </li>

             <li>
               <codeph>-cgroup_hierarchy_path</codeph>: Path where YARN will create cgroups for granted
               resources. Impala assumes that the cgroup for an allocated container is created in the path
               '<varname>cgroup_hierarchy_path</varname> + <varname>container_id</varname>'.
             </li>

             <li rev="1.4.0">
               <codeph>-rm_always_use_defaults</codeph>: If this Boolean option is enabled, Impala ignores computed
               estimates and always obtains the default memory and CPU allocation settings at the start of the
               query. These default estimates are approximately 2 CPUs and 4 GB of memory, possibly varying slightly
               depending on cluster size, workload, and so on. Where practical, enable
               <codeph>-rm_always_use_defaults</codeph> whenever resource management is used, and rely on these
               default values (that is, leave out the two following options).
             </li>

             <li rev="1.4.0">
               <codeph>-rm_default_memory=<varname>size</varname></codeph>: Optionally sets the default estimate for
               memory usage for each query. You can use suffixes such as M and G for megabytes and gigabytes, the
               same as with the <xref href="impala_mem_limit.xml#mem_limit">MEM_LIMIT</xref> query option. Only has
               an effect when <codeph>-rm_always_use_defaults</codeph> is also enabled.
             </li>

             <li rev="1.4.0">
               <codeph>-rm_default_cpu_cores</codeph>: Optionally sets the default estimate for number of virtual
               CPU cores for each query. Only has an effect when <codeph>-rm_always_use_defaults</codeph> is also
               enabled.
             </li>
           </ul>
         </p>

       </conbody>
     </concept>
 -->

     <concept id="rm_query_options">

       <title>impala-shell Query Options for Resource Management</title>
   <prolog>
     <metadata>
       <data name="Category" value="Impala Query Options"/>
     </metadata>
   </prolog>

       <conbody>

         <p>
           Before issuing SQL statements through the <cmdname>impala-shell</cmdname> interpreter, you can use the
           <codeph>SET</codeph> command to configure the following parameters related to resource management:
         </p>

         <ul id="ul_nzt_twf_jp">
           <li>
             <xref href="impala_explain_level.xml#explain_level"/>
           </li>

           <li>
             <xref href="impala_mem_limit.xml#mem_limit"/>
           </li>

         </ul>
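
        <p>
          For example, a session might adjust both options and then inspect the plan for a query before
          running it. This is only a sketch: the table name <codeph>t1</codeph> is hypothetical, and the
          option values are chosen for illustration.
        </p>

<codeblock>SET EXPLAIN_LEVEL=verbose;
SET MEM_LIMIT=2g;
-- Show the plan, including resource estimates, without running the query.
EXPLAIN SELECT COUNT(*) FROM t1;
-- SET with no arguments lists the current query option values.
SET;
</codeblock>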
       </conbody>
     </concept>

 <!-- Parent topic is going away, so former subtopic is hoisted up a level.
   </concept>
 -->

   <concept id="rm_limitations">

     <title>Limitations of Resource Management for Impala</title>

     <conbody>

 <!-- Conditionalizing some content here with audience="hidden" because there are already some XML comments
      inside the list, so not practical to enclose the whole thing in XML comments. -->

       <p audience="hidden">
         Currently, Impala has the following limitations for resource management of Impala queries:
       </p>

       <ul audience="hidden">
         <li>
           Table statistics are required, and column statistics are highly valuable, for Impala to produce accurate
           estimates of how much memory to request from YARN. See
           <xref href="impala_perf_stats.xml#perf_table_stats"/> and
           <xref href="impala_perf_stats.xml#perf_column_stats"/> for instructions on gathering both kinds of
           statistics, and <xref href="impala_explain.xml#explain"/> for the extended <codeph>EXPLAIN</codeph>
           output where you can check that statistics are available for a specific table and set of columns.
         </li>

         <li>
           If the Impala estimate of required memory is lower than is actually required for a query, Impala
           dynamically expands the amount of requested memory.
 <!--          Impala will cancel the query when it exceeds the requested memory size. -->
           Queries might still be cancelled if the reservation expansion fails, for example if there are
           insufficient remaining resources for that pool, or the expansion request takes long enough that it
           exceeds the query timeout interval, or because of YARN preemption.
 <!--          This could happen in some cases with complex queries, even when table and column statistics are available. -->
          You can see the actual memory usage after a failed query by issuing a <codeph>PROFILE</codeph> command in
          <cmdname>impala-shell</cmdname>. Specify a larger memory figure with the <codeph>MEM_LIMIT</codeph>
          query option and retry the query.
         </li>
       </ul>

       <p rev="2.0.0">
        The <codeph>MEM_LIMIT</codeph> query option, and the other resource-related query options, can be set
        through the ODBC or JDBC interfaces in Impala 2.0 and higher. Earlier releases did not allow this; that
        limitation has been lifted.
       </p>
     </conbody>
   </concept>
 </concept>