docs/build/plain-html/topics/impala_mt_dop.html - impala - Git at Google

 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

 <meta name="copyright" content="(C) Copyright 2023" />
 <meta name="DC.rights.owner" content="(C) Copyright 2023" />
 <meta name="DC.Type" content="concept" />
 <meta name="DC.Title" content="MT_DOP Query Option" />
 <meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html" />
 <meta name="prodname" content="Impala" />
 <meta name="prodname" content="Impala" />
 <meta name="version" content="Impala 3.4.x" />
 <meta name="version" content="Impala 3.4.x" />
 <meta name="DC.Format" content="XHTML" />
 <meta name="DC.Identifier" content="mt_dop" />
 <link rel="stylesheet" type="text/css" href="../commonltr.css" />
 <title>MT_DOP Query Option</title>
 </head>
 <body id="mt_dop">


   <h1 class="title topictitle1" id="ariaid-title1">MT_DOP Query Option</h1>


   <div class="body conbody">

     <p class="p">

       Sets the degree of intra-node parallelism used for certain operations that
       can benefit from multithreaded execution. You can specify values
       higher than zero to find the ideal balance of response time,
       memory usage, and CPU usage during statement processing.
     </p>


     <div class="note note"><span class="notetitle">Note:</span>
       <p class="p">
         The Impala execution engine is being revamped incrementally to add
         additional parallelism within a single host for certain statements and
         kinds of operations. The setting <code class="ph codeph">MT_DOP=0</code> uses the
         <span class="q">"old"</span> code path with limited intra-node parallelism.
       </p>


       <p class="p">
         Currently, <code class="ph codeph">MT_DOP</code> support varies by statement type:
       </p>

       <ul class="ul">
         <li class="li">
           <p class="p">
             <code class="ph codeph">COMPUTE [INCREMENTAL] STATS</code>. Impala automatically sets
             <code class="ph codeph">MT_DOP=4</code> for <code class="ph codeph">COMPUTE STATS</code> and
             <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements on Parquet tables.
           </p>

         </li>

         <li class="li">
           <p class="p">
             <code class="ph codeph">SELECT</code> statements. <code class="ph codeph">MT_DOP</code> is 0 by default
             for <code class="ph codeph">SELECT</code> statements but can be set to a value greater
             than 0 to control intra-node parallelism. This may be useful to tune
             query performance and in particular to reduce execution time of
             long-running, CPU-intensive queries.
           </p>

         </li>

         <li class="li">
           <p class="p">
             <code class="ph codeph">DML</code> statements. <code class="ph codeph">MT_DOP</code> values greater
             than zero are not currently supported for DML statements. DML statements
             will produce an error if <code class="ph codeph">MT_DOP</code> is set to a non-zero value.
           </p>

         </li>

         <li class="li">
           <p class="p">
             In <span class="keyword">Impala 3.4</span> and earlier, not all <code class="ph codeph">SELECT</code>
             statements support setting <code class="ph codeph">MT_DOP</code>. Specifically, only
             scan and aggregation operators, and
             local joins that do not need data exchanges (such as for nested types) are
             supported. Other <code class="ph codeph">SELECT</code> statements produce an error if
             <code class="ph codeph">MT_DOP</code> is set to a non-zero value.
           </p>

         </li>

       </ul>


     </div>


     <p class="p">
         <strong class="ph b">Type:</strong> integer
       </p>

     <p class="p">
         <strong class="ph b">Default:</strong> <code class="ph codeph">0</code>
       </p>

     <p class="p">
       Because <code class="ph codeph">COMPUTE STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
       statements for Parquet tables benefit substantially from extra intra-node
       parallelism, Impala automatically sets <code class="ph codeph">MT_DOP=4</code> when computing stats
       for Parquet tables.
     </p>

     <p class="p">
       <strong class="ph b">Range:</strong> 0 to 64
     </p>


     <p class="p">
         <strong class="ph b">Examples:</strong>
       </p>


     <div class="note note"><span class="notetitle">Note:</span>
       <p class="p">
         Any timing figures in the following examples are on a small, lightly loaded development cluster.
         Your mileage may vary. Speedups depend on many factors, including the number of rows, columns, and
         partitions within each table.
       </p>

     </div>


     <p class="p">
       The following example shows how to run a <code class="ph codeph">COMPUTE STATS</code>
       statement against a Parquet table with or without an explicit <code class="ph codeph">MT_DOP</code>
       setting:
     </p>


 <pre class="pre codeblock"><code>
 -- Explicitly setting MT_DOP to 0 selects the old code path.
 set mt_dop = 0;
 MT_DOP set to 0

 -- The analysis for the billion rows is distributed among hosts,
 -- but uses only a single core on each host.
 compute stats billion_rows_parquet;
 +-----------------------------------------+
 | summary                                 |
 +-----------------------------------------+
 | Updated 1 partition(s) and 2 column(s). |
 +-----------------------------------------+

 drop stats billion_rows_parquet;

 -- Using 4 logical processors per host is faster.
 set mt_dop = 4;
 MT_DOP set to 4

 compute stats billion_rows_parquet;
 +-----------------------------------------+
 | summary                                 |
 +-----------------------------------------+
 | Updated 1 partition(s) and 2 column(s). |
 +-----------------------------------------+

 drop stats billion_rows_parquet;

 -- Unsetting the option reverts back to its default.
 -- Which for COMPUTE STATS and a Parquet table is 4,
 -- so again it uses the fast path.
 unset MT_DOP;
 Unsetting option MT_DOP

 compute stats billion_rows_parquet;
 +-----------------------------------------+
 | summary                                 |
 +-----------------------------------------+
 | Updated 1 partition(s) and 2 column(s). |
 +-----------------------------------------+

 </code></pre>

     <p class="p">
       The following example shows the effects of setting <code class="ph codeph">MT_DOP</code>
       for a query on a Parquet table:
     </p>


 <pre class="pre codeblock"><code>
 set mt_dop = 0;
 MT_DOP set to 0

 -- COUNT(DISTINCT) for a unique column is CPU-intensive.
 select count(distinct id) from billion_rows_parquet;
 +--------------------+
 | count(distinct id) |
 +--------------------+
 | 1000000000         |
 +--------------------+
 Fetched 1 row(s) in 67.20s

 set mt_dop = 16;
 MT_DOP set to 16

 -- Introducing more intra-node parallelism for the aggregation
 -- speeds things up, and potentially reduces memory overhead by
 -- reducing the number of scanner threads.
 select count(distinct id) from billion_rows_parquet;
 +--------------------+
 | count(distinct id) |
 +--------------------+
 | 1000000000         |
 +--------------------+
 Fetched 1 row(s) in 17.19s

 </code></pre>

     <p class="p">
       The following example shows how queries that are not compatible with non-zero
       <code class="ph codeph">MT_DOP</code> settings produce an error when <code class="ph codeph">MT_DOP</code>
       is set:
     </p>


 <pre class="pre codeblock"><code>
 set mt_dop=1;
 MT_DOP set to 1

 insert into a1
 select * from a2;
 ERROR: NotImplementedException: MT_DOP not supported for DML statements.

 </code></pre>

     <p class="p">
         <strong class="ph b">Related information:</strong>
       </p>

     <p class="p">
       <a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>,
       <a class="xref" href="impala_aggregate_functions.html">Impala Aggregate Functions</a>
     </p>


   </div>

 <div class="related-links">
 <div class="familylinks">
 <div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div>
 </div>
 </div></body>
 </html>
	<?xml version="1.0" encoding="UTF-8"?>
	<!DOCTYPE html
	PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
	<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
	<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

	<meta name="copyright" content="(C) Copyright 2023" />
	<meta name="DC.rights.owner" content="(C) Copyright 2023" />
	<meta name="DC.Type" content="concept" />
	<meta name="DC.Title" content="MT_DOP Query Option" />
	<meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html" />
	<meta name="prodname" content="Impala" />
	<meta name="prodname" content="Impala" />
	<meta name="version" content="Impala 3.4.x" />
	<meta name="version" content="Impala 3.4.x" />
	<meta name="DC.Format" content="XHTML" />
	<meta name="DC.Identifier" content="mt_dop" />
	<link rel="stylesheet" type="text/css" href="../commonltr.css" />
	<title>MT_DOP Query Option</title>
	</head>
	<body id="mt_dop">


	<h1 class="title topictitle1" id="ariaid-title1">MT_DOP Query Option</h1>




	<div class="body conbody">

	<p class="p">

	Sets the degree of intra-node parallelism used for certain operations that
	can benefit from multithreaded execution. You can specify values
	higher than zero to find the ideal balance of response time,
	memory usage, and CPU usage during statement processing.
	</p>


	<div class="note note"><span class="notetitle">Note:</span>
	<p class="p">
	The Impala execution engine is being revamped incrementally to add
	additional parallelism within a single host for certain statements and
	kinds of operations. The setting <code class="ph codeph">MT_DOP=0</code> uses the
	<span class="q">"old"</span> code path with limited intra-node parallelism.
	</p>


	<p class="p">
	Currently, <code class="ph codeph">MT_DOP</code> support varies by statement type:
	</p>

	<ul class="ul">
	<li class="li">
	<p class="p">
	<code class="ph codeph">COMPUTE [INCREMENTAL] STATS</code>. Impala automatically sets
	<code class="ph codeph">MT_DOP=4</code> for <code class="ph codeph">COMPUTE STATS</code> and
	<code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements on Parquet tables.
	</p>

	</li>

	<li class="li">
	<p class="p">
	<code class="ph codeph">SELECT</code> statements. <code class="ph codeph">MT_DOP</code> is 0 by default
	for <code class="ph codeph">SELECT</code> statements but can be set to a value greater
	than 0 to control intra-node parallelism. This may be useful to tune
	query performance and in particular to reduce execution time of
	long-running, CPU-intensive queries.
	</p>

	</li>

	<li class="li">
	<p class="p">
	<code class="ph codeph">DML</code> statements. <code class="ph codeph">MT_DOP</code> values greater
	than zero are not currently supported for DML statements. DML statements
	will produce an error if <code class="ph codeph">MT_DOP</code> is set to a non-zero value.
	</p>

	</li>

	<li class="li">
	<p class="p">
	In <span class="keyword">Impala 3.4</span> and earlier, not all <code class="ph codeph">SELECT</code>
	statements support setting <code class="ph codeph">MT_DOP</code>. Specifically, only
	scan and aggregation operators, and
	local joins that do not need data exchanges (such as for nested types) are
	supported. Other <code class="ph codeph">SELECT</code> statements produce an error if
	<code class="ph codeph">MT_DOP</code> is set to a non-zero value.
	</p>

	</li>

	</ul>


	</div>


	<p class="p">
	<strong class="ph b">Type:</strong> integer
	</p>

	<p class="p">
	<strong class="ph b">Default:</strong> <code class="ph codeph">0</code>
	</p>

	<p class="p">
	Because <code class="ph codeph">COMPUTE STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
	statements for Parquet tables benefit substantially from extra intra-node
	parallelism, Impala automatically sets <code class="ph codeph">MT_DOP=4</code> when computing stats
	for Parquet tables.
	</p>

	<p class="p">
	<strong class="ph b">Range:</strong> 0 to 64
	</p>


	<p class="p">
	<strong class="ph b">Examples:</strong>
	</p>


	<div class="note note"><span class="notetitle">Note:</span>
	<p class="p">
	Any timing figures in the following examples are on a small, lightly loaded development cluster.
	Your mileage may vary. Speedups depend on many factors, including the number of rows, columns, and
	partitions within each table.
	</p>

	</div>


	<p class="p">
	The following example shows how to run a <code class="ph codeph">COMPUTE STATS</code>
	statement against a Parquet table with or without an explicit <code class="ph codeph">MT_DOP</code>
	setting:
	</p>


	<pre class="pre codeblock"><code>
	-- Explicitly setting MT_DOP to 0 selects the old code path.
	set mt_dop = 0;
	MT_DOP set to 0

	-- The analysis for the billion rows is distributed among hosts,
	-- but uses only a single core on each host.
	compute stats billion_rows_parquet;
	+-----------------------------------------+
	\| summary \|
	+-----------------------------------------+
	\| Updated 1 partition(s) and 2 column(s). \|
	+-----------------------------------------+

	drop stats billion_rows_parquet;

	-- Using 4 logical processors per host is faster.
	set mt_dop = 4;
	MT_DOP set to 4

	compute stats billion_rows_parquet;
	+-----------------------------------------+
	\| summary \|
	+-----------------------------------------+
	\| Updated 1 partition(s) and 2 column(s). \|
	+-----------------------------------------+

	drop stats billion_rows_parquet;

	-- Unsetting the option reverts back to its default.
	-- Which for COMPUTE STATS and a Parquet table is 4,
	-- so again it uses the fast path.
	unset MT_DOP;
	Unsetting option MT_DOP

	compute stats billion_rows_parquet;
	+-----------------------------------------+
	\| summary \|
	+-----------------------------------------+
	\| Updated 1 partition(s) and 2 column(s). \|
	+-----------------------------------------+

	</code></pre>

	<p class="p">
	The following example shows the effects of setting <code class="ph codeph">MT_DOP</code>
	for a query on a Parquet table:
	</p>


	<pre class="pre codeblock"><code>
	set mt_dop = 0;
	MT_DOP set to 0

	-- COUNT(DISTINCT) for a unique column is CPU-intensive.
	select count(distinct id) from billion_rows_parquet;
	+--------------------+
	\| count(distinct id) \|
	+--------------------+
	\| 1000000000 \|
	+--------------------+
	Fetched 1 row(s) in 67.20s

	set mt_dop = 16;
	MT_DOP set to 16

	-- Introducing more intra-node parallelism for the aggregation
	-- speeds things up, and potentially reduces memory overhead by
	-- reducing the number of scanner threads.
	select count(distinct id) from billion_rows_parquet;
	+--------------------+
	\| count(distinct id) \|
	+--------------------+
	\| 1000000000 \|
	+--------------------+
	Fetched 1 row(s) in 17.19s

	</code></pre>

	<p class="p">
	The following example shows how queries that are not compatible with non-zero
	<code class="ph codeph">MT_DOP</code> settings produce an error when <code class="ph codeph">MT_DOP</code>
	is set:
	</p>


	<pre class="pre codeblock"><code>
	set mt_dop=1;
	MT_DOP set to 1

	insert into a1
	select * from a2;
	ERROR: NotImplementedException: MT_DOP not supported for DML statements.

	</code></pre>

	<p class="p">
	<strong class="ph b">Related information:</strong>
	</p>

	<p class="p">
	<a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>,
	<a class="xref" href="impala_aggregate_functions.html">Impala Aggregate Functions</a>
	</p>


	</div>

	<div class="related-links">
	<div class="familylinks">
	<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div>
	</div>
	</div></body>
	</html>