docs/topics/impala_disable_codegen_rows_threshold.xml - impala - Git at Google

 <?xml version="1.0" encoding="UTF-8"?>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
 <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
 <concept rev="2.10.0" id="disable_codegen_rows_threshold">

   <title>DISABLE_CODEGEN_ROWS_THRESHOLD Query Option (<keyword keyref="impala210_full"/> or higher only)</title>
   <titlealts audience="PDF"><navtitle>DISABLE_CODEGEN_ROWS_THRESHOLD</navtitle></titlealts>
   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
       <data name="Category" value="Impala Query Options"/>
       <data name="Category" value="Scalability"/>
       <data name="Category" value="Performance"/>
       <data name="Category" value="Developers"/>
       <data name="Category" value="Data Analysts"/>
     </metadata>
   </prolog>

   <conbody>

     <p rev="2.0.0"> This setting controls the cutoff point (in terms of number
       of rows processed per Impala daemon) below which Impala disables native
       code generation for the whole query. Native code generation is very
       beneficial for queries that process many rows because it reduces the time
       taken to process of each row. However, generating the native code adds
       latency to query startup. Therefore, automatically disabling codegen for
       queries that process relatively small amounts of data can improve query
       response time. </p>

     <p conref="../shared/impala_common.xml#common/syntax_blurb"/>

 <codeblock>SET DISABLE_CODEGEN_ROWS_THRESHOLD=<varname>number_of_rows</varname></codeblock>

     <p>
       <b>Type:</b> numeric
     </p>

     <p>
       <b>Default:</b> 50000
     </p>

     <p>
       <b>Usage notes:</b> Typically, you increase the default value to make this optimization apply to more queries.
       If incorrect or corrupted table and column statistics cause Impala to apply this optimization incorrectly to
       queries that actually involve substantial work, you might see the queries being slower as a result of codegen
       being disabled. In that case, recompute statistics with the <codeph>COMPUTE STATS</codeph> or
       <codeph>COMPUTE INCREMENTAL STATS</codeph> statement. If there is a problem collecting accurate statistics,
       you can turn this feature off by setting the value to 0.
     </p>

     <p conref="../shared/impala_common.xml#common/internals_blurb"/>

     <p>
       This setting applies to queries where the number of rows processed can be accurately
       determined, either through table and column statistics, or by the presence of a
       <codeph>LIMIT</codeph> clause. If Impala cannot accurately estimate the number of rows,
       then this setting does not apply.
     </p>

     <p rev="2.3.0">
       If a query uses the complex data types <codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>,
       or <codeph>MAP</codeph>, then codegen is never automatically disabled regardless of the
       <codeph>DISABLE_CODEGEN_ROWS_THRESHOLD</codeph> setting.
     </p>

     <p conref="../shared/impala_common.xml#common/added_in_2100"/>

 <!-- Don't have any other places that tie into this particular optimization technique yet.
 Potentially: conceptual topics about code generation, distributed queries

 <p conref="../shared/impala_common.xml#common/related_info"/>
 <p>
 </p>
 -->

   </conbody>

 </concept>
	<?xml version="1.0" encoding="UTF-8"?>
	<!--
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->
	<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
	<concept rev="2.10.0" id="disable_codegen_rows_threshold">

	<title>DISABLE_CODEGEN_ROWS_THRESHOLD Query Option (<keyword keyref="impala210_full"/> or higher only)</title>
	<titlealts audience="PDF"><navtitle>DISABLE_CODEGEN_ROWS_THRESHOLD</navtitle></titlealts>
	<prolog>
	<metadata>
	<data name="Category" value="Impala"/>
	<data name="Category" value="Impala Query Options"/>
	<data name="Category" value="Scalability"/>
	<data name="Category" value="Performance"/>
	<data name="Category" value="Developers"/>
	<data name="Category" value="Data Analysts"/>
	</metadata>
	</prolog>

	<conbody>

	<p rev="2.0.0"> This setting controls the cutoff point (in terms of number
	of rows processed per Impala daemon) below which Impala disables native
	code generation for the whole query. Native code generation is very
	beneficial for queries that process many rows because it reduces the time
	taken to process of each row. However, generating the native code adds
	latency to query startup. Therefore, automatically disabling codegen for
	queries that process relatively small amounts of data can improve query
	response time. </p>

	<p conref="../shared/impala_common.xml#common/syntax_blurb"/>

	<codeblock>SET DISABLE_CODEGEN_ROWS_THRESHOLD=<varname>number_of_rows</varname></codeblock>

	<p>
	<b>Type:</b> numeric
	</p>

	<p>
	<b>Default:</b> 50000
	</p>

	<p>
	<b>Usage notes:</b> Typically, you increase the default value to make this optimization apply to more queries.
	If incorrect or corrupted table and column statistics cause Impala to apply this optimization incorrectly to
	queries that actually involve substantial work, you might see the queries being slower as a result of codegen
	being disabled. In that case, recompute statistics with the <codeph>COMPUTE STATS</codeph> or
	<codeph>COMPUTE INCREMENTAL STATS</codeph> statement. If there is a problem collecting accurate statistics,
	you can turn this feature off by setting the value to 0.
	</p>

	<p conref="../shared/impala_common.xml#common/internals_blurb"/>

	<p>
	This setting applies to queries where the number of rows processed can be accurately
	determined, either through table and column statistics, or by the presence of a
	<codeph>LIMIT</codeph> clause. If Impala cannot accurately estimate the number of rows,
	then this setting does not apply.
	</p>

	<p rev="2.3.0">
	If a query uses the complex data types <codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>,
	or <codeph>MAP</codeph>, then codegen is never automatically disabled regardless of the
	<codeph>DISABLE_CODEGEN_ROWS_THRESHOLD</codeph> setting.
	</p>

	<p conref="../shared/impala_common.xml#common/added_in_2100"/>

	<!-- Don't have any other places that tie into this particular optimization technique yet.
	Potentially: conceptual topics about code generation, distributed queries

	<p conref="../shared/impala_common.xml#common/related_info"/>
	<p>
	</p>
	-->

	</conbody>

	</concept>