<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="copyright" content="(C) Copyright 2023" />
<meta name="DC.rights.owner" content="(C) Copyright 2023" />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)" />
<meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html" />
<meta name="prodname" content="Impala" />
<meta name="version" content="Impala 3.4.x" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="exec_single_node_rows_threshold" />
<link rel="stylesheet" type="text/css" href="../commonltr.css" />
<title>EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)</title>
</head>
<body id="exec_single_node_rows_threshold">
<h1 class="title topictitle1" id="ariaid-title1">EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (<span class="keyword">Impala 2.1</span> or higher only)</h1>
<div class="body conbody">
<p class="p">
This setting controls the cutoff point (in terms of number of rows scanned) below which Impala treats a query
as a <span class="q">"small"</span> query, turning off optimizations such as parallel execution and native code generation. These
optimizations are worth their overhead for queries that process substantial amounts of data, but it
makes sense to skip them for queries that process tiny amounts of data. Reducing the overhead for small queries
allows Impala to complete them more quickly, keeping admission control slots, CPU, memory, and so on
available for resource-intensive queries.
</p>
<p class="p">
<strong class="ph b">Syntax:</strong>
</p>
<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=<var class="keyword varname">number_of_rows</var></code></pre>
<p class="p">
<strong class="ph b">Type:</strong> numeric
</p>
<p class="p">
<strong class="ph b">Default:</strong> 100
</p>
<p class="p">
<strong class="ph b">Usage notes:</strong> Typically, you increase the default value to make this optimization apply to more queries.
If incorrect or corrupted table and column statistics cause Impala to apply this optimization
incorrectly to queries that actually involve substantial work, those queries might run more slowly as a
result of remote reads. In that case, recompute statistics with the <code class="ph codeph">COMPUTE STATS</code>
or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement. If you cannot collect accurate
statistics, you can turn this optimization off by setting the value to -1.
</p>
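<p class="p">
The following statements sketch the adjustments described in the usage notes, run within a single session.
The table name <code class="ph codeph">sales_data</code> and the threshold value 1000 are hypothetical;
substitute values that suit your own workload.
</p>
<pre class="pre codeblock"><code>-- Hypothetical value: treat queries that process up to 1000 rows as small queries.
SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=1000;

-- If statistics for the hypothetical table sales_data are missing or stale,
-- recompute them so that row-count estimates are accurate.
COMPUTE STATS sales_data;
-- Check the #Rows column; a value of -1 means the row count is still unknown.
SHOW TABLE STATS sales_data;

-- If accurate statistics cannot be collected, turn the optimization off.
SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=-1;
</code></pre>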
<p class="p">
<strong class="ph b">Internal details:</strong>
</p>
<p class="p">
This setting applies to queries where the number of rows processed can be accurately
determined, either through table and column statistics, or by the presence of a
<code class="ph codeph">LIMIT</code> clause. If Impala cannot accurately estimate the number of rows,
then this setting does not apply.
</p>
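<p class="p">
The sketch below contrasts queries where Impala can and cannot determine the number of rows processed.
The table <code class="ph codeph">web_logs</code> and its <code class="ph codeph">status</code> column are hypothetical,
and the table is assumed to have no table or column statistics.
</p>
<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=100;

-- Can qualify: with no WHERE clause, the LIMIT clause caps the number of
-- rows scanned at 10, even though web_logs has no statistics.
SELECT * FROM web_logs LIMIT 10;

-- Does not qualify: without statistics, Impala cannot estimate how many
-- rows the scan reads, so this setting does not apply.
SELECT COUNT(*) FROM web_logs;

-- Not guaranteed to qualify: a WHERE clause might force Impala to read
-- many more than 10 rows to find matches, so LIMIT alone does not bound the work.
SELECT * FROM web_logs WHERE status = 404 LIMIT 10;
</code></pre>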
<p class="p">
In <span class="keyword">Impala 2.3</span> and higher, where Impala supports the complex data types <code class="ph codeph">STRUCT</code>,
<code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code>, if a query refers to any column of those types,
the small-query optimization is turned off for that query regardless of the
<code class="ph codeph">EXEC_SINGLE_NODE_ROWS_THRESHOLD</code> setting.
</p>
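<p class="p">
As an illustration, assume a hypothetical Parquet table <code class="ph codeph">customers</code> with an
<code class="ph codeph">ARRAY</code> column of <code class="ph codeph">STRING</code> values named
<code class="ph codeph">phone_numbers</code>. Even though the following query has a small
<code class="ph codeph">LIMIT</code>, it refers to a complex-type column, so Impala does not treat it as a small query.
</p>
<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=500;
-- The threshold does not matter here: the query refers to the ARRAY column
-- phone_numbers, so the small-query optimization is turned off.
SELECT c.id, p.item FROM customers c, c.phone_numbers p LIMIT 10;
</code></pre>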
<p class="p">
For a query that is determined to be <span class="q">"small"</span>, all work is performed on the coordinator node. This might
result in some I/O being performed by remote reads. The savings from not distributing the query work and not
generating native code are expected to outweigh any overhead from the remote reads.
</p>
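<p class="p">
One way to check whether a particular query is treated as small is to compare its
<code class="ph codeph">EXPLAIN</code> output with the optimization on and off. The table
<code class="ph codeph">t1</code> is hypothetical. Because a small query runs entirely on the coordinator node,
its plan is expected to contain no <code class="ph codeph">EXCHANGE</code> nodes, but treat this as a rough check:
the exact plan text depends on the <code class="ph codeph">EXPLAIN_LEVEL</code> setting and the Impala version.
</p>
<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=100;
-- With the optimization on, this plan is expected to have no EXCHANGE
-- nodes, because all work stays on the coordinator node.
EXPLAIN SELECT * FROM t1 LIMIT 10;

-- Turn the optimization off and compare: the plan may now include an
-- EXCHANGE node that gathers rows from the hosts that perform the scan.
SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=-1;
EXPLAIN SELECT * FROM t1 LIMIT 10;
</code></pre>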
<p class="p">
<strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.1</span>
</p>
<p class="p">
<strong class="ph b">Examples:</strong>
</p>
<p class="p">
A common use case is to query just a few rows from a table to inspect typical data values. In this example,
Impala does not parallelize the query or perform native code generation: because the query has a
<code class="ph codeph">LIMIT</code> clause and no <code class="ph codeph">WHERE</code> clause, it processes at most 300 rows,
which is below the threshold of 500 set by this query option:
</p>
<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=500;
SELECT * FROM enormous_table LIMIT 300;
</code></pre>
</div>
<div class="related-links">
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div>
</div>
</div></body>
</html>