<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="mem_limit">
<title>MEM_LIMIT Query Option</title>
<titlealts audience="PDF"><navtitle>MEM_LIMIT</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Impala Query Options"/>
<data name="Category" value="Scalability"/>
<data name="Category" value="Memory"/>
<data name="Category" value="Troubleshooting"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p>
<indexterm audience="hidden">MEM_LIMIT query option</indexterm>
When resource management is not enabled, this option defines the maximum amount of memory a query can allocate on each node.
The total memory that a query can use across the cluster is therefore at most the <codeph>MEM_LIMIT</codeph> value times the number of nodes.
</p>
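<p>
As a rough illustration of that multiplication, the following Python sketch computes the cluster-wide upper bound
for one query. It is not part of Impala: the function name, the 8-node cluster, and the 10 GB per-node limit are
hypothetical, and treating a gigabyte as 1024^3 bytes is an assumption.
</p>
<codeblock>
```python
GIB = 1024 ** 3  # assumption: one gigabyte counted as 1024^3 bytes


def max_cluster_memory(mem_limit_bytes: int, num_nodes: int) -> int:
    """Upper bound on the memory one query can use across all nodes."""
    return mem_limit_bytes * num_nodes


# Hypothetical cluster: 8 nodes, MEM_LIMIT of 10 GB on each node.
total = max_cluster_memory(10 * GIB, 8)
print(total // GIB, "GB")  # prints: 80 GB
```
</codeblock>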
<p rev="">
There are two levels of memory limit for Impala.
The <codeph>-mem_limit</codeph> startup option sets an overall limit for the <cmdname>impalad</cmdname> process
(which handles multiple queries concurrently).
That limit is typically expressed in terms of a percentage of the RAM available on the host, such as <codeph>-mem_limit=70%</codeph>.
The <codeph>MEM_LIMIT</codeph> query option, which you set through <cmdname>impala-shell</cmdname>
or the <codeph>SET</codeph> statement in a JDBC or ODBC application, applies to each individual query.
The <codeph>MEM_LIMIT</codeph> query option is usually expressed as a fixed size such as <codeph>10gb</codeph>,
and must always be less than the <cmdname>impalad</cmdname> memory limit.
</p>
<p rev="">
If query processing exceeds the specified memory limit on any node, either the per-query limit or the
<cmdname>impalad</cmdname> limit, Impala cancels the query automatically.
Memory limits are checked periodically during query processing, so the actual memory in use
might briefly exceed the limit without the query being cancelled.
</p>
<p>
When resource management is enabled, the mechanism for this option changes. If set, it overrides the
automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the
query does not proceed until that much memory is available. The actual memory used by the query could be
lower, since some queries use much less memory than others. With resource management, the
<codeph>MEM_LIMIT</codeph> setting acts both as a hard limit on the amount of memory a query can use on any
node (enforced by YARN) and a guarantee that that much memory will be available on each node while the query
is being executed. When resource management is enabled but no <codeph>MEM_LIMIT</codeph> setting is
specified, Impala estimates the amount of memory needed on each node for each query, requests that much
memory from YARN before starting the query, and then internally sets the <codeph>MEM_LIMIT</codeph> on each
node to the requested amount of memory during the query. Thus, if the query takes more memory than was
originally estimated, Impala detects that the <codeph>MEM_LIMIT</codeph> is exceeded and cancels the query
itself.
</p>
<p>
<b>Type:</b> numeric
</p>
<p rev="">
<b>Units:</b> A numeric argument represents memory size in bytes; you can also use a suffix of <codeph>m</codeph> or <codeph>mb</codeph>
for megabytes, or more commonly <codeph>g</codeph> or <codeph>gb</codeph> for gigabytes. If you specify a value in an unrecognized
format, subsequent queries fail with an error.
</p>
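<p>
The unit rules above can be sketched as a small Python function. This is an illustration only, not Impala's
actual parser: the function name is hypothetical, and the 1024-based multipliers and case-insensitive matching
are assumptions. It accepts only the forms named above (plain bytes, <codeph>m</codeph>, <codeph>mb</codeph>,
<codeph>g</codeph>, <codeph>gb</codeph>) and rejects everything else.
</p>
<codeblock>
```python
import re

# Assumption: binary multipliers. The text above names only the suffixes.
_MULTIPLIERS = {
    "": 1,              # no suffix: plain bytes
    "m": 1024 ** 2,
    "mb": 1024 ** 2,
    "g": 1024 ** 3,
    "gb": 1024 ** 3,
}


def parse_mem_limit(value: str) -> int:
    """Return the limit in bytes, or raise ValueError for an unrecognized format."""
    match = re.fullmatch(r"(\d+)([a-z]*)", value.strip().lower())
    if match is None or match.group(2) not in _MULTIPLIERS:
        raise ValueError(f"Failed to parse query memory limit from '{value}'.")
    return int(match.group(1)) * _MULTIPLIERS[match.group(2)]


print(parse_mem_limit("3000000000"))  # bytes, returned unchanged
print(parse_mem_limit("3gb"))         # 3 * 1024^3 under the stated assumption
```
</codeblock>
<p>
Under these assumptions, values such as <codeph>3tb</codeph> or <codeph>xyz</codeph> raise an error instead of
being silently accepted.
</p>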
<p rev="">
<b>Default:</b> 0 (unlimited)
</p>
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
<p rev="">
The <codeph>MEM_LIMIT</codeph> setting is primarily useful in a high-concurrency setting,
or on a cluster with a workload shared between Impala and other data processing components.
You can prevent any query from accidentally using much more memory than expected,
which could negatively impact other Impala queries.
</p>
<p rev="">
Use the output of the <codeph>SUMMARY</codeph> command in <cmdname>impala-shell</cmdname>
to get a report of memory used for each phase of your most heavyweight queries on each node,
and then set a <codeph>MEM_LIMIT</codeph> somewhat higher than that.
See <xref href="impala_explain_plan.xml#perf_summary"/> for usage information about
the <codeph>SUMMARY</codeph> command.
</p>
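<p>
One possible way to turn <codeph>SUMMARY</codeph> figures into a setting is sketched below in Python. The helper
name, the 25% headroom factor, and the choice of the largest per-operator peak as the starting point are all
assumptions made for illustration; the sample values are hypothetical peak figures in megabytes. Several operators
can hold memory at the same time, so treat the result as a first guess to refine by testing.
</p>
<codeblock>
```python
import math

MIB = 1024 ** 2  # assumption: one megabyte counted as 1024^2 bytes


def suggest_mem_limit(peak_mem_bytes, headroom=1.25):
    """Scale the largest per-operator peak by a headroom factor and
    round up to whole megabytes, formatted as a MEM_LIMIT value."""
    largest = max(peak_mem_bytes)
    return f"{math.ceil(largest * headroom / MIB)}mb"


# Hypothetical "Peak Mem" figures for three operators, converted to bytes.
peaks = [int(24.19 * MIB), int(23.04 * MIB), int(15.17 * MIB)]
print(suggest_mem_limit(peaks))  # prints: 31mb
```
</codeblock>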
<p conref="../shared/impala_common.xml#common/example_blurb" rev=""/>
<p rev="">
The following examples show how to set the <codeph>MEM_LIMIT</codeph> query option
using a fixed number of bytes, or suffixes representing gigabytes or megabytes.
</p>
<codeblock rev="">
[localhost:21000] > set mem_limit=3000000000;
MEM_LIMIT set to 3000000000
[localhost:21000] > select 5;
Query: select 5
+---+
| 5 |
+---+
| 5 |
+---+
[localhost:21000] > set mem_limit=3g;
MEM_LIMIT set to 3g
[localhost:21000] > select 5;
Query: select 5
+---+
| 5 |
+---+
| 5 |
+---+
[localhost:21000] > set mem_limit=3gb;
MEM_LIMIT set to 3gb
[localhost:21000] > select 5;
Query: select 5
+---+
| 5 |
+---+
| 5 |
+---+
[localhost:21000] > set mem_limit=3m;
MEM_LIMIT set to 3m
[localhost:21000] > select 5;
Query: select 5
+---+
| 5 |
+---+
| 5 |
+---+
[localhost:21000] > set mem_limit=3mb;
MEM_LIMIT set to 3mb
[localhost:21000] > select 5;
Query: select 5
+---+
| 5 |
+---+
| 5 |
+---+
</codeblock>
<p rev="">
The following examples show how unrecognized <codeph>MEM_LIMIT</codeph>
values lead to errors for subsequent queries.
</p>
<codeblock rev="">
[localhost:21000] > set mem_limit=3tb;
MEM_LIMIT set to 3tb
[localhost:21000] > select 5;
Query: select 5
ERROR: Failed to parse query memory limit from '3tb'.
[localhost:21000] > set mem_limit=xyz;
MEM_LIMIT set to xyz
[localhost:21000] > select 5;
Query: select 5
ERROR: Failed to parse query memory limit from 'xyz'.
</codeblock>
<p rev="">
The following example shows the automatic query cancellation
when the <codeph>MEM_LIMIT</codeph> value is exceeded
on any host involved in the Impala query. First it runs a
successful query and checks the largest amount of memory
used on any node for any stage of the query.
Then it sets an artificially low <codeph>MEM_LIMIT</codeph>
setting so that the same query cannot run.
</p>
<codeblock rev="">
[localhost:21000] > select count(*) from customer;
Query: select count(*) from customer
+----------+
| count(*) |
+----------+
| 150000 |
+----------+
[localhost:21000] > select count(distinct c_name) from customer;
Query: select count(distinct c_name) from customer
+------------------------+
| count(distinct c_name) |
+------------------------+
| 150000 |
+------------------------+
[localhost:21000] > summary;
+--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
| 06:AGGREGATE | 1 | 230.00ms | 230.00ms | 1 | 1 | 16.00 KB | -1 B | FINALIZE |
| 05:EXCHANGE | 1 | 43.44us | 43.44us | 1 | 1 | 0 B | -1 B | UNPARTITIONED |
| 02:AGGREGATE | 1 | 227.14ms | 227.14ms | 1 | 1 | 12.00 KB | 10.00 MB | |
| 04:AGGREGATE | 1 | 126.27ms | 126.27ms | 150.00K | 150.00K | 15.17 MB | 10.00 MB | |
| 03:EXCHANGE | 1 | 44.07ms | 44.07ms | 150.00K | 150.00K | 0 B | 0 B | HASH(c_name) |
<b>| 01:AGGREGATE | 1 | 361.94ms | 361.94ms | 150.00K | 150.00K | 23.04 MB | 10.00 MB | |</b>
| 00:SCAN HDFS | 1 | 43.64ms | 43.64ms | 150.00K | 150.00K | 24.19 MB | 64.00 MB | tpch.customer |
+--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
[localhost:21000] > set mem_limit=15mb;
MEM_LIMIT set to 15mb
[localhost:21000] > select count(distinct c_name) from customer;
Query: select count(distinct c_name) from customer
ERROR:
Memory limit exceeded
Query did not have enough memory to get the minimum required buffers in the block manager.
</codeblock>
</conbody>
</concept>