docs/topics/impala_mem_limit.xml - impala - Git at Google

 <?xml version="1.0" encoding="UTF-8"?>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
 <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
 <concept id="mem_limit">

   <title>MEM_LIMIT Query Option</title>

   <titlealts audience="PDF">

     <navtitle>MEM_LIMIT</navtitle>

   </titlealts>

   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
       <data name="Category" value="Impala Query Options"/>
       <data name="Category" value="Scalability"/>
       <data name="Category" value="Memory"/>
       <data name="Category" value="Troubleshooting"/>
       <data name="Category" value="Developers"/>
       <data name="Category" value="Data Analysts"/>
     </metadata>
   </prolog>

   <conbody>

     <p>
       The MEM_LIMIT query option defines the maximum amount of memory a query can allocate on
       each node. The total memory that can be used by a query is the <codeph>MEM_LIMIT</codeph>
       times the number of nodes.
     </p>

     <p rev="">
       There are two levels of memory limit for Impala. The
       <codeph>&#8209;&#8209;mem_limit</codeph> startup option sets an overall limit for the
       <cmdname>impalad</cmdname> process (which handles multiple queries concurrently). That
       process memory limit can be expressed either as a percentage of RAM available to the
       process such as <codeph>&#8209;&#8209;mem_limit=70%</codeph> or as a fixed amount of
       memory, such as <codeph>100gb</codeph>. The memory available to the process is based on
       the host's physical memory and, since Impala 3.2, memory limits from Linux Control Groups.
       E.g. if an <cmdname>impalad</cmdname> process is running in a Docker container on a host
       with 100GB of memory, the memory available is 100GB or the Docker container's memory
       limit, whichever is less.
     </p>

     <p rev="">
       The <codeph>MEM_LIMIT</codeph> query option, which you set through
       <cmdname>impala-shell</cmdname> or the <codeph>SET</codeph> statement in a JDBC or ODBC
       application, applies to each individual query. The <codeph>MEM_LIMIT</codeph> query option
       is usually expressed as a fixed size such as <codeph>10gb</codeph>, and must always be
       less than the <cmdname>impalad</cmdname> memory limit.
     </p>

     <p rev="">
       If query processing exceeds the specified memory limit on any node, either the per-query
       limit or the <cmdname>impalad</cmdname> limit, Impala cancels the query automatically.
       Memory limits are checked periodically during query processing, so the actual memory in
       use might briefly exceed the limit without the query being cancelled.
     </p>

     <p>
       <b>Type:</b> numeric
     </p>

     <p rev="">
       <b>Units:</b> A numeric argument represents memory size in bytes; you can also use a
       suffix of <codeph>m</codeph> or <codeph>mb</codeph> for megabytes, or more commonly
       <codeph>g</codeph> or <codeph>gb</codeph> for gigabytes. If you specify a value with
       unrecognized formats, subsequent queries fail with an error.
     </p>

     <p rev="">
       <b>Default:</b> 0 (unlimited)
     </p>

     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>

     <p rev="">
       The <codeph>MEM_LIMIT</codeph> setting is primarily useful in a high-concurrency setting,
       or on a cluster with a workload shared between Impala and other data processing
       components. You can prevent any query from accidentally using much more memory than
       expected, which could negatively impact other Impala queries.
     </p>

     <p rev="">
       Use the output of the <codeph>SUMMARY</codeph> command in <cmdname>impala-shell</cmdname>
       to get a report of memory used for each phase of your most heavyweight queries on each
       node, and then set a <codeph>MEM_LIMIT</codeph> somewhat higher than that. See
       <xref href="impala_explain_plan.xml#perf_summary"/> for usage information about the
       <codeph>SUMMARY</codeph> command.
     </p>

     <p conref="../shared/impala_common.xml#common/example_blurb" rev=""/>

     <p rev="">
       The following examples show how to set the <codeph>MEM_LIMIT</codeph> query option using a
       fixed number of bytes, or suffixes representing gigabytes or megabytes.
     </p>

 <codeblock rev="">
 [localhost:21000] > set mem_limit=3000000000;
 MEM_LIMIT set to 3000000000
 [localhost:21000] > select 5;
 Query: select 5
 +---+
 | 5 |
 +---+
 | 5 |
 +---+

 [localhost:21000] > set mem_limit=3g;
 MEM_LIMIT set to 3g
 [localhost:21000] > select 5;
 Query: select 5
 +---+
 | 5 |
 +---+
 | 5 |
 +---+

 [localhost:21000] > set mem_limit=3gb;
 MEM_LIMIT set to 3gb
 [localhost:21000] > select 5;
 +---+
 | 5 |
 +---+
 | 5 |
 +---+

 [localhost:21000] > set mem_limit=3m;
 MEM_LIMIT set to 3m
 [localhost:21000] > select 5;
 +---+
 | 5 |
 +---+
 | 5 |
 +---+
 [localhost:21000] > set mem_limit=3mb;
 MEM_LIMIT set to 3mb
 [localhost:21000] > select 5;
 +---+
 | 5 |
 +---+
 </codeblock>

     <p rev="">
       The following examples show how unrecognized <codeph>MEM_LIMIT</codeph> values lead to
       errors for subsequent queries.
     </p>

 <codeblock rev="">
 [localhost:21000] > set mem_limit=3pb;
 MEM_LIMIT set to 3pb
 [localhost:21000] > select 5;
 ERROR: Failed to parse query memory limit from '3pb'.

 [localhost:21000] > set mem_limit=xyz;
 MEM_LIMIT set to xyz
 [localhost:21000] > select 5;
 Query: select 5
 ERROR: Failed to parse query memory limit from 'xyz'.
 </codeblock>

     <p rev="">
       The following examples shows the automatic query cancellation when the
       <codeph>MEM_LIMIT</codeph> value is exceeded on any host involved in the Impala query.
       First it runs a successful query and checks the largest amount of memory used on any node
       for any stage of the query. Then it sets an artificially low <codeph>MEM_LIMIT</codeph>
       setting so that the same query cannot run.
     </p>

 <codeblock rev="">
 [localhost:21000] > select count(*) from customer;
 Query: select count(*) from customer
 +----------+
 | count(*) |
 +----------+
 | 150000   |
 +----------+

 [localhost:21000] > select count(distinct c_name) from customer;
 Query: select count(distinct c_name) from customer
 +------------------------+
 | count(distinct c_name) |
 +------------------------+
 | 150000                 |
 +------------------------+

 [localhost:21000] > summary;
 +--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
 | Operator     | #Hosts | #Inst  | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail        |
 +--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
 | 06:AGGREGATE | 1      | 1      | 230.00ms | 230.00ms | 1       | 1          | 16.00 KB | -1 B          | FINALIZE      |
 | 05:EXCHANGE  | 1      | 1      | 43.44us  | 43.44us  | 1       | 1          | 0 B      | -1 B          | UNPARTITIONED |
 | 02:AGGREGATE | 1      | 1      | 227.14ms | 227.14ms | 1       | 1          | 12.00 KB | 10.00 MB      |               |
 | 04:AGGREGATE | 1      | 1      | 126.27ms | 126.27ms | 150.00K | 150.00K    | 15.17 MB | 10.00 MB      |               |
 | 03:EXCHANGE  | 1      | 1      | 44.07ms  | 44.07ms  | 150.00K | 150.00K    | 0 B      | 0 B           | HASH(c_name)  |
 <b>| 01:AGGREGATE | 1      | 1      | 361.94ms | 361.94ms | 150.00K | 150.00K    | 23.04 MB | 10.00 MB      |               |</b>
 | 00:SCAN HDFS | 1      | 1      | 43.64ms  | 43.64ms  | 150.00K | 150.00K    | 24.19 MB | 64.00 MB      | tpch.customer |
 +--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+

 [localhost:21000] > set mem_limit=15mb;
 MEM_LIMIT set to 15mb
 [localhost:21000] > select count(distinct c_name) from customer;
 Query: select count(distinct c_name) from customer
 ERROR:
 Memory limit exceeded
 Query did not have enough memory to get the minimum required buffers in the block manager.
 </codeblock>

   </conbody>

 </concept>
	<?xml version="1.0" encoding="UTF-8"?>
	<!--
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->
	<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
	<concept id="mem_limit">

	<title>MEM_LIMIT Query Option</title>

	<titlealts audience="PDF">

	<navtitle>MEM_LIMIT</navtitle>

	</titlealts>

	<prolog>
	<metadata>
	<data name="Category" value="Impala"/>
	<data name="Category" value="Impala Query Options"/>
	<data name="Category" value="Scalability"/>
	<data name="Category" value="Memory"/>
	<data name="Category" value="Troubleshooting"/>
	<data name="Category" value="Developers"/>
	<data name="Category" value="Data Analysts"/>
	</metadata>
	</prolog>

	<conbody>

	<p>
	The MEM_LIMIT query option defines the maximum amount of memory a query can allocate on
	each node. The total memory that can be used by a query is the <codeph>MEM_LIMIT</codeph>
	times the number of nodes.
	</p>

	<p rev="">
	There are two levels of memory limit for Impala. The
	<codeph>‑‑mem_limit</codeph> startup option sets an overall limit for the
	<cmdname>impalad</cmdname> process (which handles multiple queries concurrently). That
	process memory limit can be expressed either as a percentage of RAM available to the
	process such as <codeph>‑‑mem_limit=70%</codeph> or as a fixed amount of
	memory, such as <codeph>100gb</codeph>. The memory available to the process is based on
	the host's physical memory and, since Impala 3.2, memory limits from Linux Control Groups.
	E.g. if an <cmdname>impalad</cmdname> process is running in a Docker container on a host
	with 100GB of memory, the memory available is 100GB or the Docker container's memory
	limit, whichever is less.
	</p>

	<p rev="">
	The <codeph>MEM_LIMIT</codeph> query option, which you set through
	<cmdname>impala-shell</cmdname> or the <codeph>SET</codeph> statement in a JDBC or ODBC
	application, applies to each individual query. The <codeph>MEM_LIMIT</codeph> query option
	is usually expressed as a fixed size such as <codeph>10gb</codeph>, and must always be
	less than the <cmdname>impalad</cmdname> memory limit.
	</p>

	<p rev="">
	If query processing exceeds the specified memory limit on any node, either the per-query
	limit or the <cmdname>impalad</cmdname> limit, Impala cancels the query automatically.
	Memory limits are checked periodically during query processing, so the actual memory in
	use might briefly exceed the limit without the query being cancelled.
	</p>

	<p>
	<b>Type:</b> numeric
	</p>

	<p rev="">
	<b>Units:</b> A numeric argument represents memory size in bytes; you can also use a
	suffix of <codeph>m</codeph> or <codeph>mb</codeph> for megabytes, or more commonly
	<codeph>g</codeph> or <codeph>gb</codeph> for gigabytes. If you specify a value with
	unrecognized formats, subsequent queries fail with an error.
	</p>

	<p rev="">
	<b>Default:</b> 0 (unlimited)
	</p>

	<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>

	<p rev="">
	The <codeph>MEM_LIMIT</codeph> setting is primarily useful in a high-concurrency setting,
	or on a cluster with a workload shared between Impala and other data processing
	components. You can prevent any query from accidentally using much more memory than
	expected, which could negatively impact other Impala queries.
	</p>

	<p rev="">
	Use the output of the <codeph>SUMMARY</codeph> command in <cmdname>impala-shell</cmdname>
	to get a report of memory used for each phase of your most heavyweight queries on each
	node, and then set a <codeph>MEM_LIMIT</codeph> somewhat higher than that. See
	<xref href="impala_explain_plan.xml#perf_summary"/> for usage information about the
	<codeph>SUMMARY</codeph> command.
	</p>

	<p conref="../shared/impala_common.xml#common/example_blurb" rev=""/>

	<p rev="">
	The following examples show how to set the <codeph>MEM_LIMIT</codeph> query option using a
	fixed number of bytes, or suffixes representing gigabytes or megabytes.
	</p>

	<codeblock rev="">
	[localhost:21000] > set mem_limit=3000000000;
	MEM_LIMIT set to 3000000000
	[localhost:21000] > select 5;
	Query: select 5
	+---+
	\| 5 \|
	+---+
	\| 5 \|
	+---+

	[localhost:21000] > set mem_limit=3g;
	MEM_LIMIT set to 3g
	[localhost:21000] > select 5;
	Query: select 5
	+---+
	\| 5 \|
	+---+
	\| 5 \|
	+---+

	[localhost:21000] > set mem_limit=3gb;
	MEM_LIMIT set to 3gb
	[localhost:21000] > select 5;
	+---+
	\| 5 \|
	+---+
	\| 5 \|
	+---+

	[localhost:21000] > set mem_limit=3m;
	MEM_LIMIT set to 3m
	[localhost:21000] > select 5;
	+---+
	\| 5 \|
	+---+
	\| 5 \|
	+---+
	[localhost:21000] > set mem_limit=3mb;
	MEM_LIMIT set to 3mb
	[localhost:21000] > select 5;
	+---+
	\| 5 \|
	+---+
	</codeblock>

	<p rev="">
	The following examples show how unrecognized <codeph>MEM_LIMIT</codeph> values lead to
	errors for subsequent queries.
	</p>

	<codeblock rev="">
	[localhost:21000] > set mem_limit=3pb;
	MEM_LIMIT set to 3pb
	[localhost:21000] > select 5;
	ERROR: Failed to parse query memory limit from '3pb'.

	[localhost:21000] > set mem_limit=xyz;
	MEM_LIMIT set to xyz
	[localhost:21000] > select 5;
	Query: select 5
	ERROR: Failed to parse query memory limit from 'xyz'.
	</codeblock>

	<p rev="">
	The following examples shows the automatic query cancellation when the
	<codeph>MEM_LIMIT</codeph> value is exceeded on any host involved in the Impala query.
	First it runs a successful query and checks the largest amount of memory used on any node
	for any stage of the query. Then it sets an artificially low <codeph>MEM_LIMIT</codeph>
	setting so that the same query cannot run.
	</p>

	<codeblock rev="">
	[localhost:21000] > select count(*) from customer;
	Query: select count(*) from customer
	+----------+
	\| count(*) \|
	+----------+
	\| 150000 \|
	+----------+

	[localhost:21000] > select count(distinct c_name) from customer;
	Query: select count(distinct c_name) from customer
	+------------------------+
	\| count(distinct c_name) \|
	+------------------------+
	\| 150000 \|
	+------------------------+

	[localhost:21000] > summary;
	+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
	\| Operator \| #Hosts \| #Inst \| Avg Time \| Max Time \| #Rows \| Est. #Rows \| Peak Mem \| Est. Peak Mem \| Detail \|
	+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+
	\| 06:AGGREGATE \| 1 \| 1 \| 230.00ms \| 230.00ms \| 1 \| 1 \| 16.00 KB \| -1 B \| FINALIZE \|
	\| 05:EXCHANGE \| 1 \| 1 \| 43.44us \| 43.44us \| 1 \| 1 \| 0 B \| -1 B \| UNPARTITIONED \|
	\| 02:AGGREGATE \| 1 \| 1 \| 227.14ms \| 227.14ms \| 1 \| 1 \| 12.00 KB \| 10.00 MB \| \|
	\| 04:AGGREGATE \| 1 \| 1 \| 126.27ms \| 126.27ms \| 150.00K \| 150.00K \| 15.17 MB \| 10.00 MB \| \|
	\| 03:EXCHANGE \| 1 \| 1 \| 44.07ms \| 44.07ms \| 150.00K \| 150.00K \| 0 B \| 0 B \| HASH(c_name) \|
	<b>\| 01:AGGREGATE \| 1 \| 1 \| 361.94ms \| 361.94ms \| 150.00K \| 150.00K \| 23.04 MB \| 10.00 MB \| \|</b>
	\| 00:SCAN HDFS \| 1 \| 1 \| 43.64ms \| 43.64ms \| 150.00K \| 150.00K \| 24.19 MB \| 64.00 MB \| tpch.customer \|
	+--------------+--------+--------+----------+----------+---------+------------+----------+---------------+---------------+

	[localhost:21000] > set mem_limit=15mb;
	MEM_LIMIT set to 15mb
	[localhost:21000] > select count(distinct c_name) from customer;
	Query: select count(distinct c_name) from customer
	ERROR:
	Memory limit exceeded
	Query did not have enough memory to get the minimum required buffers in the block manager.
	</codeblock>

	</conbody>

	</concept>