 <?xml version="1.0" encoding="UTF-8"?>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
 <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
 <concept id="hints">

   <title>Query Hints in Impala SELECT Statements</title>
   <titlealts audience="PDF"><navtitle>Hints</navtitle></titlealts>
   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
       <data name="Category" value="SQL"/>
       <data name="Category" value="Querying"/>
       <data name="Category" value="Performance"/>
       <data name="Category" value="Troubleshooting"/>
       <data name="Category" value="Developers"/>
       <data name="Category" value="Data Analysts"/>
     </metadata>
   </prolog>

   <conbody>

     <p>
       <indexterm audience="hidden">hints</indexterm>
       The Impala SQL dialect supports query hints for fine-tuning how queries are executed. Use hints as
       a temporary workaround for expensive queries where missing statistics or other factors lead to an
       inefficient query plan.
     </p>

     <p>
       Hints are most often used for the most resource-intensive kinds of Impala queries:
     </p>

     <ul>
       <li>
         Join queries involving large tables, where intermediate result sets are transmitted across the network to
         evaluate the join conditions.
       </li>

       <li>
         Inserting into partitioned Parquet tables, where many memory buffers could be allocated on each host to
         hold intermediate results for each partition.
       </li>
     </ul>

     <p conref="../shared/impala_common.xml#common/syntax_blurb"/>

     <p>
       You can also represent the hints as keywords surrounded by <codeph>[]</codeph>
       square brackets; include the brackets in the text of the SQL statement.
       <note conref="../shared/impala_common.xml#common/square_bracket_hint_caveat"/>
     </p>

 <codeblock>SELECT STRAIGHT_JOIN <varname>select_list</varname> FROM
 <varname>join_left_hand_table</varname>
   JOIN [{ /* +BROADCAST */ | /* +SHUFFLE */ }]
 <varname>join_right_hand_table</varname>
 <varname>remainder_of_query</varname>;

 INSERT <varname>insert_clauses</varname>
   [{ /* +SHUFFLE */ | /* +NOSHUFFLE */ }]
   [<ph rev="IMPALA-2522 2.8.0">/* +CLUSTERED */</ph>]
   SELECT <varname>remainder_of_query</varname>;
 </codeblock>

     <p rev="2.0.0">
       In <keyword keyref="impala20_full"/> and higher, you can also specify the hints inside comments that use
       either the <codeph>/* */</codeph> or <codeph>--</codeph> notation. Specify a <codeph>+</codeph> symbol
       immediately before the first hint name. Recently added hints are available only with the
       <codeph>/* */</codeph> and <codeph>--</codeph> notation.
       For clarity, the <codeph>/* */</codeph> and <codeph>--</codeph> styles
       are used in the syntax and examples throughout this section.
       To specify multiple hints, separate them with commas, for example
       <codeph>/* +clustered,shuffle */</codeph>.
     </p>

 <codeblock rev="2.0.0">SELECT STRAIGHT_JOIN <varname>select_list</varname> FROM
 <varname>join_left_hand_table</varname>
   JOIN /* +BROADCAST|SHUFFLE */
 <varname>join_right_hand_table</varname>
 <varname>remainder_of_query</varname>;

 SELECT <varname>select_list</varname> FROM
 <varname>join_left_hand_table</varname>
   JOIN -- +BROADCAST|SHUFFLE
 <varname>join_right_hand_table</varname>
 <varname>remainder_of_query</varname>;

 INSERT <varname>insert_clauses</varname>
   /* +SHUFFLE|NOSHUFFLE */
   SELECT <varname>remainder_of_query</varname>;

 INSERT <varname>insert_clauses</varname>
   -- +SHUFFLE|NOSHUFFLE
   SELECT <varname>remainder_of_query</varname>;

 <ph rev="IMPALA-2924">SELECT <varname>select_list</varname> FROM
 <varname>table_ref</varname>
   /* +{SCHEDULE_CACHE_LOCAL | SCHEDULE_DISK_LOCAL | SCHEDULE_REMOTE}
     [,RANDOM_REPLICA] */
 <varname>remainder_of_query</varname>;</ph>

 <ph rev="IMPALA-2522 2.8.0">INSERT <varname>insert_clauses</varname>
   -- +CLUSTERED
   SELECT <varname>remainder_of_query</varname>;

 INSERT <varname>insert_clauses</varname>
   /* +CLUSTERED */
   SELECT <varname>remainder_of_query</varname>;</ph>
 </codeblock>

     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>

     <p>
       With both forms of hint syntax, include the <codeph>STRAIGHT_JOIN</codeph>
       keyword immediately after the <codeph>SELECT</codeph> keyword to prevent Impala from
       reordering the tables in a way that makes the join-related hints ineffective.
     </p>

     <p>
       To reduce the need to use hints, run the <codeph>COMPUTE STATS</codeph> statement against all tables involved
       in joins, or used as the source tables for <codeph>INSERT ... SELECT</codeph> operations where the
       destination is a partitioned Parquet table. Do this operation after loading data or making substantial
       changes to the data within each table. Having up-to-date statistics helps Impala choose more efficient query
       plans without the need for hinting. See <xref href="impala_perf_stats.xml#perf_stats"/> for details and
       examples.
     </p>
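
     <p>
       As a minimal sketch of that workflow, the following statements gather statistics and then check that
       they are present. The table names <codeph>customer</codeph> and <codeph>state_lookup</codeph> are
       borrowed from the examples later in this topic and stand in for your own tables:
     </p>

 <codeblock>-- Gather table and column statistics after loading or substantially changing data.
 compute stats customer;
 compute stats state_lookup;

 -- Inspect the results; a #Rows value of -1 indicates that statistics are missing.
 show table stats customer;
 show column stats customer;</codeblock>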

     <p>
       To see which join strategy is used for a particular query, examine the <codeph>EXPLAIN</codeph> output for
       that query. See <xref href="impala_explain_plan.xml#perf_explain"/> for details and examples.
     </p>
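
     <p>
       For instance, you might prefix a hinted query with <codeph>EXPLAIN</codeph> and look at the join node
       in the plan. This sketch assumes the same illustrative <codeph>customer</codeph> and
       <codeph>state_lookup</codeph> tables used in the examples later in this topic. The exact plan text varies
       by release and by the <codeph>EXPLAIN_LEVEL</codeph> setting, but the join node is typically labeled
       as either <codeph>BROADCAST</codeph> or <codeph>PARTITIONED</codeph>:
     </p>

 <codeblock>-- Show the plan without running the query, then check the join node for the strategy.
 explain select straight_join customer.address, state_lookup.state_name
   from customer join /* +broadcast */ state_lookup
   on customer.state_id = state_lookup.state_id;</codeblock>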

     <p>
       <b>Hints for join queries:</b>
     </p>

     <p>
       The <codeph>/* +BROADCAST */</codeph> and <codeph>/* +SHUFFLE */</codeph> hints control the execution strategy for join
       queries. Specify one of the following constructs immediately after the <codeph>JOIN</codeph> keyword in a
       query:
     </p>

     <ul>
       <li>
         <codeph>/* +SHUFFLE */</codeph> - Makes that join operation use the <q>partitioned</q> technique, which divides
         up corresponding rows from both tables using a hashing algorithm, sending subsets of the rows to other
         nodes for processing. (The keyword <codeph>SHUFFLE</codeph> is used to indicate a <q>partitioned join</q>
         because this type of join is not related to <q>partitioned tables</q>.) Because the alternative
         <q>broadcast</q> join mechanism is the default when table and column statistics are unavailable, you might
         use this hint for queries where broadcast joins are unsuitable. Typically, partitioned joins are more
         efficient for joins between large tables of similar size.
       </li>

       <li>
         <codeph>/* +BROADCAST */</codeph> - Makes that join operation use the <q>broadcast</q> technique that sends the
         entire contents of the right-hand table to all nodes involved in processing the join. This is the default
         mode of operation when table and column statistics are unavailable, so you would typically only need it if
         stale metadata caused Impala to mistakenly choose a partitioned join operation. Typically, broadcast joins
         are more efficient in cases where one table is much smaller than the other. (Put the smaller table on the
         right side of the <codeph>JOIN</codeph> operator.)
       </li>
     </ul>

     <p>
       <b>Hints for INSERT ... SELECT queries:</b>
     </p>

     <p conref="../shared/impala_common.xml#common/insert_hints"/>
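
     <p>
       As an illustration of where the hint goes, the following sketch inserts into a partitioned Parquet
       table. The tables <codeph>sales_parquet</codeph> (assumed to be partitioned by <codeph>year</codeph>)
       and <codeph>sales_staging</codeph>, along with their columns, are hypothetical. The hint appears
       immediately before the <codeph>SELECT</codeph> keyword:
     </p>

 <codeblock>-- Hypothetical tables: sales_parquet is partitioned by year; sales_staging holds the source rows.
 insert into sales_parquet partition (year) /* +shuffle */
   select id, amount, year from sales_staging;

 -- The same statement with two hints combined, separated by a comma.
 insert into sales_parquet partition (year) /* +clustered,shuffle */
   select id, amount, year from sales_staging;</codeblock>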

     <p rev="IMPALA-2924">
       <b>Hints for scheduling of HDFS blocks:</b>
     </p>

     <p rev="IMPALA-2924">
       The hints <codeph>/* +SCHEDULE_CACHE_LOCAL */</codeph>,
       <codeph>/* +SCHEDULE_DISK_LOCAL */</codeph>, and
       <codeph>/* +SCHEDULE_REMOTE */</codeph> have the same effect
       as specifying the <codeph>REPLICA_PREFERENCE</codeph> query
       option with the respective option settings of <codeph>CACHE_LOCAL</codeph>,
       <codeph>DISK_LOCAL</codeph>, or <codeph>REMOTE</codeph>.
       The hint <codeph>/* +RANDOM_REPLICA */</codeph> is the same as
       enabling the <codeph>SCHEDULE_RANDOM_REPLICA</codeph> query option.
     </p>

     <p rev="IMPALA-2924">
       You can use these hints in combination by separating them with commas,
       for example, <codeph>/* +SCHEDULE_CACHE_LOCAL,RANDOM_REPLICA */</codeph>.
       See <xref keyref="replica_preference"/> and
       <xref keyref="schedule_random_replica"/> for information about how
       these settings influence the way Impala processes HDFS data blocks.
     </p>

     <p rev="IMPALA-2924">
       Specifying the replica preference as a query hint always overrides the
       query option setting. Specifying either the <codeph>SCHEDULE_RANDOM_REPLICA</codeph>
       query option or the corresponding <codeph>RANDOM_REPLICA</codeph> query hint
       enables the random tie-breaking behavior when processing data blocks
       during the query.
     </p>
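
     <p rev="IMPALA-2924">
       The following sketch contrasts the two ways of requesting the same scheduling behavior. The table
       <codeph>sales</codeph> and its <codeph>amount</codeph> column are hypothetical:
     </p>

 <codeblock rev="IMPALA-2924">-- Hint form: the hint follows the table reference in the FROM clause.
 select count(*) from sales /* +schedule_cache_local,random_replica */
   where amount > 0;

 -- Query option form: the settings remain in effect for later queries in the session.
 set replica_preference=cache_local;
 set schedule_random_replica=true;
 select count(*) from sales where amount > 0;</codeblock>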

     <p>
       <b>Suggestions versus directives:</b>
     </p>

     <p>
       In early Impala releases, hints were always obeyed and so acted more like directives. Once Impala gained join
       order optimizations, sometimes join queries were automatically reordered in a way that made a hint
       irrelevant. Therefore, the hints act more like suggestions in Impala 1.2.2 and higher.
     </p>

     <p>
       To force Impala to follow the hinted execution mechanism for a join query, include the
       <codeph>STRAIGHT_JOIN</codeph> keyword in the <codeph>SELECT</codeph> statement. See
       <xref href="impala_perf_joins.xml#straight_join"/> for details. When you use this technique, Impala does not
       reorder the joined tables at all, so you must be careful to arrange the join order to put the largest table
       (or subquery result set) first, then the smallest, second smallest, third smallest, and so on. This ordering lets Impala do the
       most I/O-intensive parts of the query using local reads on the DataNodes, and then reduce the size of the
       intermediate result set as much as possible as each subsequent table or subquery result set is joined.
     </p>
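
     <p>
       A minimal sketch of that ordering follows. The table names and their relative sizes are assumptions
       made for illustration: <codeph>orders_fact</codeph> is taken to be the largest table,
       <codeph>region_lookup</codeph> the smallest, and <codeph>customers</codeph> the next smallest:
     </p>

 <codeblock>-- With STRAIGHT_JOIN, tables are joined in exactly the order written.
 select straight_join f.order_id, c.name, r.region_name
   from orders_fact f                                    -- largest table first
   join region_lookup r on f.region_id = r.region_id     -- then the smallest
   join customers c on f.customer_id = c.customer_id;    -- then the second smallest</codeblock>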

     <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>

     <p>
       Queries that include subqueries in the <codeph>WHERE</codeph> clause can be rewritten internally as join
       queries. Currently, you cannot apply hints to the joins produced by these types of queries.
     </p>
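
     <p>
       As an illustration, the first query below uses an <codeph>IN</codeph> subquery, so there is no
       <codeph>JOIN</codeph> keyword to attach a hint to. One possible workaround, not described elsewhere in
       this topic and shown here only as a sketch, is to write the equivalent join explicitly so that a hint
       can follow the <codeph>JOIN</codeph> keyword. The tables are the illustrative <codeph>customer</codeph>
       and <codeph>state_lookup</codeph> tables:
     </p>

 <codeblock>-- Subquery form: the join is produced internally, so it cannot be hinted.
 select customer.address from customer
   where customer.state_id in (select state_id from state_lookup);

 -- Explicit semi-join form of the same query, where a hint can be specified.
 select straight_join customer.address
   from customer left semi join /* +broadcast */ state_lookup
   on customer.state_id = state_lookup.state_id;</codeblock>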

     <p>
       Because hints can prevent queries from taking advantage of new metadata or improvements in query planning,
       use them only when required to work around performance issues, and be prepared to remove them when they are
       no longer required, such as after a new Impala release or bug fix.
     </p>

     <p>
       In particular, the <codeph>/* +BROADCAST */</codeph> and <codeph>/* +SHUFFLE */</codeph> hints are expected to be
       needed much less frequently in Impala 1.2.2 and higher, because the join order optimization feature, in
       combination with the <codeph>COMPUTE STATS</codeph> statement, now automatically chooses the join order and
       join mechanism without the need to rewrite the query and add hints. See
       <xref href="impala_perf_joins.xml#perf_joins"/> for details.
     </p>

     <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>

     <p rev="2.0.0">
       The hints embedded within <codeph>--</codeph> comments are compatible with Hive queries. The hints embedded
       within <codeph>/* */</codeph> comments or <codeph>[ ]</codeph> square brackets are not
       compatible with Hive. For example, Hive raises an error for Impala hints within <codeph>/* */</codeph>
       comments because it does not recognize the Impala hint names.
     </p>
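
     <p rev="2.0.0">
       For example, a query intended to be shared with Hive might use the <codeph>--</codeph> notation, as in
       this sketch based on the illustrative <codeph>customer</codeph> and <codeph>state_lookup</codeph> tables.
       Because a <codeph>--</codeph> comment extends to the end of the line, the right-hand table name goes on
       the following line:
     </p>

 <codeblock rev="2.0.0">select straight_join customer.address, state_lookup.state_name
   from customer join -- +broadcast
   state_lookup
   on customer.state_id = state_lookup.state_id;</codeblock>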

     <p conref="../shared/impala_common.xml#common/view_blurb"/>

     <p rev="2.0.0">
       If you use a hint in the query that defines a view, the hint is preserved when you query the view. Impala
       internally rewrites all hints in views to use the <codeph>--</codeph> comment notation, so that Hive can
       query such views without errors due to unrecognized hint names.
     </p>
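
     <p rev="2.0.0">
       A minimal sketch of a view that carries a hint follows. The view name
       <codeph>customer_states</codeph> is hypothetical, and the underlying tables are the illustrative
       <codeph>customer</codeph> and <codeph>state_lookup</codeph> tables:
     </p>

 <codeblock rev="2.0.0">-- The hint is part of the view definition.
 create view customer_states as
   select straight_join customer.address, state_lookup.state_name
   from customer join /* +broadcast */ state_lookup
   on customer.state_id = state_lookup.state_id;

 -- Queries against the view use the hinted join without repeating the hint.
 select address, state_name from customer_states limit 10;</codeblock>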

     <p conref="../shared/impala_common.xml#common/example_blurb"/>

     <p>
       For example, this query joins a large customer table with a small lookup table of fewer than 100 rows. The
       right-hand table can be broadcast efficiently to all nodes involved in the join. Thus, you would use the
       <codeph>/* +broadcast */</codeph> hint to force a broadcast join strategy:
     </p>

 <codeblock>select straight_join customer.address, state_lookup.state_name
   from customer join <b>/* +broadcast */</b> state_lookup
   on customer.state_id = state_lookup.state_id;</codeblock>

     <p>
       This query joins two large tables of unpredictable size. You might benchmark the query with both kinds of
       hints and find that it is more efficient to transmit portions of each table to other nodes for processing.
       Thus, you would use the <codeph>/* +shuffle */</codeph> hint to force a partitioned join strategy:
     </p>

 <codeblock>select straight_join weather.wind_velocity, geospatial.altitude
   from weather join <b>/* +shuffle */</b> geospatial
   on weather.lat = geospatial.lat and weather.long = geospatial.long;</codeblock>

     <p>
       For joins involving three or more tables, the hint applies to the tables on either side of that specific
       <codeph>JOIN</codeph> keyword. The <codeph>STRAIGHT_JOIN</codeph> keyword ensures that joins are processed
       in a predictable order from left to right. For example, this query joins
       <codeph>t1</codeph> and <codeph>t2</codeph> using a partitioned join, then joins that result set to
       <codeph>t3</codeph> using a broadcast join:
     </p>

 <codeblock>select straight_join t1.name, t2.id, t3.price
   from t1 join <b>/* +shuffle */</b> t2 on t1.id = t2.id
   join <b>/* +broadcast */</b> t3 on t2.id = t3.id;</codeblock>

     <!-- To do: This is a good place to add more sample output showing before and after EXPLAIN plans. -->

     <p conref="../shared/impala_common.xml#common/related_info"/>

     <p>
       For more background information about join queries, see <xref href="impala_joins.xml#joins"/>. For
       performance considerations, see <xref href="impala_perf_joins.xml#perf_joins"/>.
     </p>
   </conbody>
 </concept>