blob: ba28791d7c1ac8bf7fd2ccdfd3e4099790eccf09 [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.3.0" id="admission_control">
<title>Configuring Admission Control</title>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Querying"/>
<data name="Category" value="Admission Control"/>
<data name="Category" value="Resource Management"/>
</metadata>
</prolog>
<conbody>
<p>
Impala includes features that balance and maximize resources in your
<keyword keyref="hadoop_distro"/> cluster. This topic describes how you can improve
efficiency of your a <keyword keyref="hadoop_distro"/> cluster using those features.
</p>
<p>
The configuration options for admission control range from the simple (a single resource
pool with a single set of options) to the complex (multiple resource pools with different
options, each pool handling queries for a different set of users and groups).
</p>
</conbody>
<concept id="concept_bz4_vxz_jgb">
<title>Configuring Admission Control in Command Line Interface</title>
<conbody>
<p>
To configure admission control, use a combination of startup options for the Impala
daemon and edit or create the configuration files
<filepath>fair-scheduler.xml</filepath> and <filepath>llama-site.xml</filepath>.
</p>
<p>
For a straightforward configuration using a single resource pool named
<codeph>default</codeph>, you can specify configuration options on the command line and
skip the <filepath>fair-scheduler.xml</filepath> and <filepath>llama-site.xml</filepath>
configuration files.
</p>
<p>
For an advanced configuration with multiple resource pools using different settings:
<ol>
<li>
Set up the <filepath>fair-scheduler.xml</filepath> and
<filepath>llama-site.xml</filepath> configuration files manually.
</li>
<li>
Provide the paths to each one using the <cmdname>impalad</cmdname> command-line
options, <codeph>&#8209;&#8209;fair_scheduler_allocation_path</codeph> and
<codeph>&#8209;&#8209;llama_site_path</codeph> respectively.
</li>
</ol>
</p>
<p>
The Impala admission control feature uses the Fair Scheduler configuration settings to
determine how to map users and groups to different resource pools. For example, you
might set up different resource pools with separate memory limits, and maximum number of
concurrent and queued queries, for different categories of users within your
organization. For details about all the Fair Scheduler configuration settings, see the
<xref
href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration"
scope="external" format="html">Apache
wiki</xref>.
</p>
<p>
The Impala admission control feature uses a small subset of possible settings from the
<filepath>llama-site.xml</filepath> configuration file:
</p>
<codeblock>llama.am.throttling.maximum.placed.reservations.<varname>queue_name</varname>
llama.am.throttling.maximum.queued.reservations.<varname>queue_name</varname>
<ph rev="2.5.0 IMPALA-2538">impala.admission-control.pool-default-query-options.<varname>queue_name</varname>
impala.admission-control.pool-queue-timeout-ms.<varname>queue_name</varname></ph>
</codeblock>
<p rev="2.5.0 IMPALA-2538">
The <codeph>impala.admission-control.pool-queue-timeout-ms</codeph> setting specifies
the timeout value for this pool in milliseconds.
</p>
<p rev="2.5.0 IMPALA-2538"
>
The<codeph>impala.admission-control.pool-default-query-options</codeph> settings
designates the default query options for all queries that run in this pool. Its argument
value is a comma-delimited string of 'key=value' pairs, <codeph>'key1=val1,key2=val2,
...'</codeph>. For example, this is where you might set a default memory limit for all
queries in the pool, using an argument such as <codeph>MEM_LIMIT=5G</codeph>.
</p>
<p rev="2.5.0 IMPALA-2538">
The <codeph>impala.admission-control.*</codeph> configuration settings are available in
<keyword keyref="impala25_full"/> and higher.
</p>
</conbody>
<concept id="concept_cz4_vxz_jgb">
<title>Example of Admission Control Configuration</title>
<conbody>
<p>
Here are sample <filepath>fair-scheduler.xml</filepath> and
<filepath>llama-site.xml</filepath> files that define resource pools
<codeph>root.default</codeph>, <codeph>root.development</codeph>, and
<codeph>root.production</codeph>. These files define resource pools for Impala
admission control and are separate from the similar
<codeph>fair-scheduler.xml</codeph>that defines resource pools for YARN.
</p>
<p>
<b>fair-scheduler.xml:</b>
</p>
<p>
Although Impala does not use the <codeph>vcores</codeph> value, you must still specify
it to satisfy YARN requirements for the file contents.
</p>
<p>
Each <codeph>&lt;aclSubmitApps&gt;</codeph> tag (other than the one for
<codeph>root</codeph>) contains a comma-separated list of users, then a space, then a
comma-separated list of groups; these are the users and groups allowed to submit
Impala statements to the corresponding resource pool.
</p>
<p>
If you leave the <codeph>&lt;aclSubmitApps&gt;</codeph> element empty for a pool,
nobody can submit directly to that pool; child pools can specify their own
<codeph>&lt;aclSubmitApps&gt;</codeph> values to authorize users and groups to submit
to those pools.
</p>
<codeblock>&lt;allocations>
&lt;queue name="root">
&lt;aclSubmitApps> &lt;/aclSubmitApps>
&lt;queue name="default">
&lt;maxResources>50000 mb, 0 vcores&lt;/maxResources>
&lt;aclSubmitApps>*&lt;/aclSubmitApps>
&lt;/queue>
&lt;queue name="development">
&lt;maxResources>200000 mb, 0 vcores&lt;/maxResources>
&lt;aclSubmitApps>user1,user2 dev,ops,admin&lt;/aclSubmitApps>
&lt;/queue>
&lt;queue name="production">
&lt;maxResources>1000000 mb, 0 vcores&lt;/maxResources>
&lt;aclSubmitApps> ops,admin&lt;/aclSubmitApps>
&lt;/queue>
&lt;/queue>
&lt;queuePlacementPolicy>
&lt;rule name="specified" create="false"/>
&lt;rule name="default" />
&lt;/queuePlacementPolicy>
&lt;/allocations>
</codeblock>
<p>
<b>llama-site.xml:</b>
</p>
<codeblock rev="2.5.0 IMPALA-2538">
&lt;?xml version="1.0" encoding="UTF-8"?>
&lt;configuration>
&lt;property>
&lt;name>llama.am.throttling.maximum.placed.reservations.root.default&lt;/name>
&lt;value>10&lt;/value>
&lt;/property>
&lt;property>
&lt;name>llama.am.throttling.maximum.queued.reservations.root.default&lt;/name>
&lt;value>50&lt;/value>
&lt;/property>
&lt;property>
&lt;name>impala.admission-control.pool-default-query-options.root.default&lt;/name>
&lt;value>mem_limit=128m,query_timeout_s=20,max_io_buffers=10&lt;/value>
&lt;/property>
&lt;property>
&lt;name>impala.admission-control.<b>pool-queue-timeout-ms</b>.root.default&lt;/name>
&lt;value>30000&lt;/value>
&lt;/property>
&lt;property>
&lt;name>impala.admission-control.<b>max-query-mem-limit</b>.root.default.regularPool&lt;/name>
&lt;value>1610612736&lt;/value>&lt;!--1.5GB-->
&lt;/property>
&lt;property>
&lt;name>impala.admission-control.<b>min-query-mem-limit</b>.root.default.regularPool&lt;/name>
&lt;value>52428800&lt;/value>&lt;!--50MB-->
&lt;/property>
&lt;property>
&lt;name>impala.admission-control.<b>clamp-mem-limit-query-option</b>.root.default.regularPool&lt;/name>
&lt;value>true&lt;/value>
&lt;/property>
</codeblock>
</conbody>
</concept>
</concept>
<concept id="concept_zy4_vxz_jgb">
<title>Configuring Cluster-wide Admission Control</title>
<prolog>
<metadata>
<data name="Category" value="Configuring"/>
</metadata>
</prolog>
<conbody>
<note type="important">
These settings only apply if you enable admission control but leave dynamic resource
pools disabled. In <keyword
keyref="impala25_full"/> and higher, we recommend
that you set up dynamic resource pools and customize the settings for each pool as
described in <xref href="#concept_bz4_vxz_jgb" format="dita"/>.
</note>
<p>
The following Impala configuration options let you adjust the settings of the admission
control feature. When supplying the options on the <cmdname>impalad</cmdname> command
line, prepend the option name with <codeph>--</codeph>.
</p>
<dl>
<dlentry>
<dt>
<codeph>queue_wait_timeout_ms</codeph>
</dt>
<dd>
<b>Purpose:</b> Maximum amount of time (in milliseconds) that a request waits to be
admitted before timing out.
<p>
<b>Type:</b> <codeph>int64</codeph>
</p>
<p>
<b>Default:</b> <codeph>60000</codeph>
</p>
</dd>
</dlentry>
<dlentry>
<dt>
<codeph>default_pool_max_requests</codeph>
</dt>
<dd>
<b>Purpose:</b> Maximum number of concurrent outstanding requests allowed to run
before incoming requests are queued. Because this limit applies cluster-wide, but
each Impala node makes independent decisions to run queries immediately or queue
them, it is a soft limit; the overall number of concurrent queries might be slightly
higher during times of heavy load. A negative value indicates no limit. Ignored if
<codeph>fair_scheduler_config_path</codeph> and <codeph>llama_site_path</codeph> are
set.
<p>
<b>Type:</b> <codeph>int64</codeph>
</p>
<p>
<b>Default:</b> <ph rev="2.5.0">-1, meaning unlimited (prior to
<keyword
keyref="impala25_full"/> the default was 200)</ph>
</p>
</dd>
</dlentry>
<dlentry>
<dt>
<codeph>default_pool_max_queued</codeph>
</dt>
<dd>
<b>Purpose:</b> Maximum number of requests allowed to be queued before rejecting
requests. Because this limit applies cluster-wide, but each Impala node makes
independent decisions to run queries immediately or queue them, it is a soft limit;
the overall number of queued queries might be slightly higher during times of heavy
load. A negative value or 0 indicates requests are always rejected once the maximum
concurrent requests are executing. Ignored if
<codeph>fair_scheduler_config_path</codeph> and <codeph>llama_site_path</codeph> are
set.
<p>
<b>Type:</b> <codeph>int64</codeph>
</p>
<p>
<b>Default:</b> <ph rev="2.5.0">unlimited</ph>
</p>
</dd>
</dlentry>
<dlentry>
<dt>
<codeph>default_pool_mem_limit</codeph>
</dt>
<dd>
<b>Purpose:</b> Maximum amount of memory (across the entire cluster) that all
outstanding requests in this pool can use before new requests to this pool are
queued. Specified in bytes, megabytes, or gigabytes by a number followed by the
suffix <codeph>b</codeph> (optional), <codeph>m</codeph>, or <codeph>g</codeph>,
either uppercase or lowercase. You can specify floating-point values for megabytes
and gigabytes, to represent fractional numbers such as <codeph>1.5</codeph>. You can
also specify it as a percentage of the physical memory by specifying the suffix
<codeph>%</codeph>. 0 or no setting indicates no limit. Defaults to bytes if no unit
is given. Because this limit applies cluster-wide, but each Impala node makes
independent decisions to run queries immediately or queue them, it is a soft limit;
the overall memory used by concurrent queries might be slightly higher during times
of heavy load. Ignored if <codeph>fair_scheduler_config_path</codeph> and
<codeph>llama_site_path</codeph> are set.
<note
conref="../shared/impala_common.xml#common/admission_compute_stats"/>
<p conref="../shared/impala_common.xml#common/type_string"/>
<p>
<b>Default:</b> <codeph>""</codeph> (empty string, meaning unlimited)
</p>
</dd>
</dlentry>
<dlentry>
<dt>
<codeph>disable_pool_max_requests</codeph>
</dt>
<dd>
<b>Purpose:</b> Disables all per-pool limits on the maximum number of running
requests.
<p>
<b>Type:</b> Boolean
</p>
<p>
<b>Default:</b> <codeph>false</codeph>
</p>
</dd>
</dlentry>
<dlentry>
<dt>
<codeph>disable_pool_mem_limits</codeph>
</dt>
<dd>
<b>Purpose:</b> Disables all per-pool mem limits.
<p>
<b>Type:</b> Boolean
</p>
<p>
<b>Default:</b> <codeph>false</codeph>
</p>
</dd>
</dlentry>
<dlentry>
<dt>
<codeph>fair_scheduler_allocation_path</codeph>
</dt>
<dd>
<b>Purpose:</b> Path to the fair scheduler allocation file
(<codeph>fair-scheduler.xml</codeph>).
<p
conref="../shared/impala_common.xml#common/type_string"/>
<p>
<b>Default:</b> <codeph>""</codeph> (empty string)
</p>
<p>
<b>Usage notes:</b> Admission control only uses a small subset of the settings
that can go in this file, as described below. For details about all the Fair
Scheduler configuration settings, see the
<xref
href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration"
scope="external" format="html">Apache
wiki</xref>.
</p>
</dd>
</dlentry>
<dlentry>
<dt>
<codeph>llama_site_path</codeph>
</dt>
<dd>
<b>Purpose:</b> Path to the configuration file used by admission control
(<codeph>llama-site.xml</codeph>). If set,
<codeph>fair_scheduler_allocation_path</codeph> must also be set.
<p conref="../shared/impala_common.xml#common/type_string"/>
<p>
<b>Default:</b> <codeph>""</codeph> (empty string)
</p>
<p>
<b>Usage notes:</b> Admission control only uses a few of the settings that can go
in this file, as described below.
</p>
</dd>
</dlentry>
</dl>
</conbody>
</concept>
</concept>