| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="joins"> |
| |
| <title>Joins in Impala SELECT Statements</title> |
| <titlealts audience="PDF"><navtitle>Joins</navtitle></titlealts> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Data Analysts"/> |
| <data name="Category" value="Developers"/> |
| <data name="Category" value="SQL"/> |
| <data name="Category" value="Querying"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> A join query is a <codeph>SELECT</codeph> statement that combines data |
| from two or more tables, and returns a result set containing items from |
| some or all of those tables. It is a way to cross-reference and correlate |
| related data that is organized into multiple tables, typically using |
| identifiers that are repeated in each of the joined tables. </p> |
| |
| <p conref="../shared/impala_common.xml#common/syntax_blurb"/> |
| |
| <p conref="../shared/impala_common.xml#common/join_types"/> |
| |
| <codeblock>SELECT <varname>select_list</varname> FROM |
| <varname>table_or_subquery1</varname> [INNER] JOIN <varname>table_or_subquery2</varname> | |
| <varname>table_or_subquery1</varname> {LEFT [OUTER] | RIGHT [OUTER] | FULL [OUTER]} JOIN <varname>table_or_subquery2</varname> | |
| <varname>table_or_subquery1</varname> {LEFT | RIGHT} SEMI JOIN <varname>table_or_subquery2</varname> | |
| <ph rev="2.0.0"><varname>table_or_subquery1</varname> {LEFT | RIGHT} ANTI JOIN <varname>table_or_subquery2</varname> |</ph> |
| [ ON <varname>col1</varname> = <varname>col2</varname> [AND <varname>col3</varname> = <varname>col4</varname> ...] | |
| USING (<varname>col1</varname> [, <varname>col2</varname> ...]) ] |
| [<varname>other_join_clause</varname> ...] |
| [ WHERE <varname>where_clauses</varname> ] |
| |
| SELECT <varname>select_list</varname> FROM |
| <varname>table_or_subquery1</varname>, <varname>table_or_subquery2</varname> [, <varname>table_or_subquery3</varname> ...] |
| [<varname>other_join_clause</varname> ...] |
| WHERE |
| <varname>col1</varname> = <varname>col2</varname> [AND <varname>col3</varname> = <varname>col4</varname> ...] |
| |
| SELECT <varname>select_list</varname> FROM |
| <varname>table_or_subquery1</varname> CROSS JOIN <varname>table_or_subquery2</varname> |
| [<varname>other_join_clause</varname> ...] |
| [ WHERE <varname>where_clauses</varname> ]</codeblock> |
| |
| <p> |
| <b>SQL-92 and SQL-89 Joins:</b> |
| </p> |
| |
| <p> |
| Queries with the explicit <codeph>JOIN</codeph> keywords are known as SQL-92 style joins, referring to the |
| level of the SQL standard where they were introduced. The corresponding <codeph>ON</codeph> or |
| <codeph>USING</codeph> clauses clearly show which columns are used as the join keys in each case: |
| </p> |
| |
| <codeblock>SELECT t1.c1, t2.c2 FROM <b>t1 JOIN t2</b> |
| <b>ON t1.id = t2.id and t1.type_flag = t2.type_flag</b> |
| WHERE t1.c1 > 100; |
| |
| SELECT t1.c1, t2.c2 FROM <b>t1 JOIN t2</b> |
| <b>USING (id, type_flag)</b> |
| WHERE t1.c1 > 100;</codeblock> |
| |
| <p> |
| The <codeph>ON</codeph> clause is a general way to compare columns across the two tables, even if the column |
| names are different. The <codeph>USING</codeph> clause is a shorthand notation for specifying the join |
| columns, when the column names are the same in both tables. You can code equivalent <codeph>WHERE</codeph> |
| clauses that compare the columns, instead of <codeph>ON</codeph> or <codeph>USING</codeph> clauses, but that |
| practice is not recommended because mixing the join comparisons with other filtering clauses is typically |
| less readable and harder to maintain. |
| </p> |
| |
| <p> |
| Queries with a comma-separated list of tables and subqueries are known as SQL-89 style joins. In these |
| queries, the equality comparisons between columns of the joined tables go in the <codeph>WHERE</codeph> |
| clause alongside other kinds of comparisons. This syntax is easy to learn, but it is also easy to |
| accidentally remove a <codeph>WHERE</codeph> clause needed for the join to work correctly. |
| </p> |
| |
| <codeblock>SELECT t1.c1, t2.c2 FROM <b>t1, t2</b> |
| WHERE |
| <b>t1.id = t2.id AND t1.type_flag = t2.type_flag</b> |
| AND t1.c1 > 100;</codeblock> |
| |
| <p> |
| <b>Self-joins:</b> |
| </p> |
| |
| <p> |
| Impala can do self-joins, for example to join on two different columns in the same table to represent |
| parent-child relationships or other tree-structured data. There is no explicit syntax for this; just use the |
| same table name for both the left-hand and right-hand table, and assign different table aliases to use when |
| referring to the fully qualified column names: |
| </p> |
| |
| <codeblock>-- Combine fields from both parent and child rows. |
| SELECT lhs.id, rhs.parent, lhs.c1, rhs.c2 FROM tree_data lhs, tree_data rhs WHERE lhs.id = rhs.parent;</codeblock> |
| |
| <p> |
| <b>Inner and outer joins:</b> |
| </p> |
| |
| <p> |
| An inner join is the most common and familiar type: rows in the result set contain the requested columns from |
| the appropriate tables, for all combinations of rows where the join columns of the tables have identical |
| values. If a column with the same name occurs in both tables, use a fully qualified name or a column alias to |
| refer to the column in the select list or other clauses. Impala performs inner joins by default for both |
| SQL-89 and SQL-92 join syntax: |
| </p> |
| |
| <codeblock>-- The following 3 forms are all equivalent. |
| SELECT t1.id, c1, c2 FROM t1, t2 WHERE t1.id = t2.id; |
| SELECT t1.id, c1, c2 FROM t1 JOIN t2 ON t1.id = t2.id; |
| SELECT t1.id, c1, c2 FROM t1 INNER JOIN t2 ON t1.id = t2.id;</codeblock> |
| |
| <p> |
| An outer join retrieves all rows from the left-hand table, or the right-hand table, or both; wherever there |
| is no matching data in the table on the other side of the join, the corresponding columns in the result set |
| are set to <codeph>NULL</codeph>. To perform an outer join, include the <codeph>OUTER</codeph> keyword in the |
| join operator, along with either <codeph>LEFT</codeph>, <codeph>RIGHT</codeph>, or <codeph>FULL</codeph>: |
| </p> |
| |
| <codeblock>SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id; |
| SELECT * FROM t1 RIGHT OUTER JOIN t2 ON t1.id = t2.id; |
| SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;</codeblock> |
| |
| <p> |
| For outer joins, Impala requires SQL-92 syntax; that is, the <codeph>JOIN</codeph> keyword instead of |
| comma-separated table names. Impala does not support vendor extensions such as <codeph>(+)</codeph> or |
| <codeph>*=</codeph> notation for doing outer joins with SQL-89 query syntax. |
| </p> |
| |
| <p> |
| <b>Equijoins and Non-Equijoins:</b> |
| </p> |
| |
| <p> |
| By default, Impala requires an equality comparison between the left-hand and right-hand tables, either |
| through <codeph>ON</codeph>, <codeph>USING</codeph>, or <codeph>WHERE</codeph> clauses. These types of |
| queries are classified broadly as equijoins. Inner, outer, full, and semi joins can all be equijoins based on |
| the presence of equality tests between columns in the left-hand and right-hand tables. |
| </p> |
| |
| <p> |
| In Impala 1.2.2 and higher, non-equijoin queries are also possible, with comparisons such as |
| <codeph>!=</codeph> or <codeph><</codeph> between the join columns. These kinds of queries require care to |
| avoid producing huge result sets that could exceed resource limits. Once you have planned a non-equijoin |
| query that produces a result set of acceptable size, you can code the query using the <codeph>CROSS |
| JOIN</codeph> operator, and add the extra comparisons in the <codeph>WHERE</codeph> clause: |
| </p> |
| |
| <codeblock>SELECT * FROM t1 CROSS JOIN t2 WHERE t1.total > t2.maximum_price;</codeblock> |
| |
| <p rev="2.3.0"> |
| In <keyword keyref="impala23_full"/> and higher, additional non-equijoin queries are possible due to the addition |
| of nested loop joins. These queries typically involve <codeph>SEMI JOIN</codeph>, |
| <codeph>ANTI JOIN</codeph>, or <codeph>FULL OUTER JOIN</codeph> clauses. |
| Impala sometimes also uses nested loop joins internally when evaluating <codeph>OUTER JOIN</codeph> |
| queries involving complex type columns. |
| Query phases involving nested loop joins do not use the spill-to-disk mechanism if they |
| exceed the memory limit. Impala decides internally when to use each join mechanism; you cannot |
| specify any query hint to choose between the nested loop join or the original hash join algorithm. |
| </p> |
| |
| <codeblock rev="2.3.0">SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.int_col < t2.int_col;</codeblock> |
| |
| <p> |
| <b>Semi-joins:</b> |
| </p> |
| |
| <p> |
| Semi-joins are a relatively rarely used variation. With the left semi-join, only data from the left-hand |
| table is returned, for rows where there is matching data in the right-hand table, based on comparisons |
| between join columns in <codeph>ON</codeph> or <codeph>WHERE</codeph> clauses. Only one instance of each row |
| from the left-hand table is returned, regardless of how many matching rows exist in the right-hand table. |
| <ph rev="2.0.0">A right semi-join (available in Impala 2.0 and higher) reverses the comparison and returns |
| data from the right-hand table.</ph> |
| </p> |
| |
| <codeblock>SELECT t1.c1, t1.c2, t1.c2 FROM t1 LEFT SEMI JOIN t2 ON t1.id = t2.id;</codeblock> |
| |
| <p> |
| <b>Natural joins (not supported):</b> |
| </p> |
| |
| <p> |
| Impala does not support the <codeph>NATURAL JOIN</codeph> operator, again to avoid inconsistent or huge |
| result sets. Natural joins do away with the <codeph>ON</codeph> and <codeph>USING</codeph> clauses, and |
| instead automatically join on all columns with the same names in the left-hand and right-hand tables. This |
| kind of query is not recommended for rapidly evolving data structures such as are typically used in Hadoop. |
| Thus, Impala does not support the <codeph>NATURAL JOIN</codeph> syntax, which can produce different query |
| results as columns are added to or removed from tables. |
| </p> |
| |
| <p> |
| If you do have any queries that use <codeph>NATURAL JOIN</codeph>, make sure to rewrite them with explicit |
| <codeph>USING</codeph> clauses, because Impala could interpret the <codeph>NATURAL</codeph> keyword as a |
| table alias: |
| </p> |
| |
| <codeblock>-- 'NATURAL' is interpreted as an alias for 't1' and Impala attempts an inner join, |
| -- resulting in an error because inner joins require explicit comparisons between columns. |
| SELECT t1.c1, t2.c2 FROM t1 NATURAL JOIN t2; |
| ERROR: NotImplementedException: Join with 't2' requires at least one conjunctive equality predicate. |
| To perform a Cartesian product between two tables, use a CROSS JOIN. |
| |
| -- If you expect the tables to have identically named columns with matching values, |
| -- list the corresponding column names in a USING clause. |
| SELECT t1.c1, t2.c2 FROM t1 JOIN t2 USING (id, type_flag, name, address);</codeblock> |
| |
| <p rev="2.0.0"> |
| <b>Anti-joins (<keyword keyref="impala20_full"/> and higher only):</b> |
| </p> |
| |
| <p rev="2.0.0"> |
| Impala supports the <codeph>LEFT ANTI JOIN</codeph> and <codeph>RIGHT ANTI JOIN</codeph> clauses in |
| <keyword keyref="impala20"/> and higher. The <codeph>LEFT</codeph> or <codeph>RIGHT</codeph> |
| keyword is required for this kind of join. For <codeph>LEFT ANTI JOIN</codeph>, this clause returns those |
| values from the left-hand table that have no matching value in the right-hand table. <codeph>RIGHT ANTI |
| JOIN</codeph> reverses the comparison and returns values from the right-hand table. You can express this |
| negative relationship either through the <codeph>ANTI JOIN</codeph> clause or through a <codeph>NOT |
| EXISTS</codeph> operator with a subquery. |
| </p> |
| |
| <!-- Restriction lifted in Impala 2.0. |
| <p> |
| Impala does not support <codeph>WHERE</codeph> clauses |
| such as <codeph>IN (<varname>subquery</varname>)</codeph>, |
| <codeph>NOT IN (<varname>subquery</varname>)</codeph>, |
| <codeph>EXISTS (<varname>subquery</varname>)</codeph>, |
| and <codeph>NOT EXISTS (<varname>subquery</varname>)</codeph>. |
| Therefore from a practical standpoint, you cannot |
| express an anti-join condition, where values from one table |
| are returned only if no matching values are present in another table. |
| </p> |
| --> |
| |
| <p conref="../shared/impala_common.xml#common/complex_types_blurb"/> |
| |
| <!-- To do: reuse some complex types examples with joins here or under Examples farther down. --> |
| |
| <p rev="2.3.0"> |
| When referring to a column with a complex type (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>) |
| in a query, you use join notation to <q>unpack</q> the scalar fields of the struct, the elements of the array, or |
| the key-value pairs of the map. (The join notation is not required for aggregation operations, such as |
| <codeph>COUNT()</codeph> or <codeph>SUM()</codeph> for array elements.) Because Impala recognizes which complex type elements are associated with which row |
| of the result set, you use the same syntax as for a cross or cartesian join, without an explicit join condition. |
| See <xref href="impala_complex_types.xml#complex_types"/> for details about Impala support for complex types. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> |
| |
| <p> |
| You typically use join queries in situations like these: |
| </p> |
| |
| <ul> |
| <li> |
| When related data arrives from different sources, with each data set physically residing in a separate |
| table. For example, you might have address data from business records that you cross-check against phone |
| listings or census data. |
| <note> |
| Impala can join tables of different file formats, including Impala-managed tables and HBase tables. For |
| example, you might keep small dimension tables in HBase, for convenience of single-row lookups and |
| updates, and for the larger fact tables use Parquet or other binary file format optimized for scan |
| operations. Then, you can issue a join query to cross-reference the fact tables with the dimension |
| tables. |
| </note> |
| </li> |
| |
| <li> |
| When data is normalized, a technique for reducing data duplication by dividing it across multiple tables. |
| This kind of organization is often found in data that comes from traditional relational database systems. |
| For example, instead of repeating some long string such as a customer name in multiple tables, each table |
| might contain a numeric customer ID. Queries that need to display the customer name could <q>join</q> the |
| table that specifies which customer ID corresponds to which name. |
| </li> |
| |
| <li> |
| When certain columns are rarely needed for queries, so they are moved into separate tables to reduce |
| overhead for common queries. For example, a <codeph>biography</codeph> field might be rarely needed in |
| queries on employee data. Putting that field in a separate table reduces the amount of I/O for common |
| queries on employee addresses or phone numbers. Queries that do need the <codeph>biography</codeph> column |
| can retrieve it by performing a join with that separate table. |
| </li> |
| |
| <li> |
| In <keyword keyref="impala23_full"/> or higher, when referring to complex type columns in queries. |
| See <xref href="impala_complex_types.xml#complex_types"/> for details. |
| </li> |
| </ul> |
| |
| <p> |
| When comparing columns with the same names in <codeph>ON</codeph> or <codeph>WHERE</codeph> clauses, use the |
| fully qualified names such as <codeph><varname>db_name</varname>.<varname>table_name</varname></codeph>, or |
| assign table aliases, column aliases, or both to make the code more compact and understandable: |
| </p> |
| |
| <codeblock>select t1.c1 as first_id, t2.c2 as second_id from |
| t1 join t2 on first_id = second_id; |
| |
| select fact.custno, dimension.custno from |
| customer_data as fact join customer_address as dimension |
| using (custno)</codeblock> |
| |
| <note> |
| <p> |
| Performance for join queries is a crucial aspect for Impala, because complex join queries are |
| resource-intensive operations. An efficient join query produces much less network traffic and CPU overhead |
| than an inefficient one. For best results: |
| </p> |
| <ul> |
| <li rev="1.2"> |
| Make sure that both <xref href="impala_perf_stats.xml#perf_stats">table and column statistics</xref> are |
| available for all the tables involved in a join query, and especially for the columns referenced in any |
| join conditions. Impala uses the statistics to automatically deduce an efficient join order. |
| Use <xref href="impala_show.xml#show"><codeph>SHOW TABLE STATS <varname>table_name</varname></codeph> and |
| <codeph>SHOW COLUMN STATS <varname>table_name</varname></codeph></xref> to check if statistics are |
| already present. Issue the <codeph>COMPUTE STATS <varname>table_name</varname></codeph> for a nonpartitioned table, |
| or (in Impala 2.1.0 and higher) <codeph>COMPUTE INCREMENTAL STATS <varname>table_name</varname></codeph> |
| for a partitioned table, to collect the initial statistics at both the table and column levels, and to keep the |
| statistics up to date after any substantial <codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph> operations. |
| </li> |
| |
| <li rev="1.2"> |
| If table or column statistics are not available, join the largest table first. You can check the |
| existence of statistics with the <codeph>SHOW TABLE STATS <varname>table_name</varname></codeph> and |
| <codeph>SHOW COLUMN STATS <varname>table_name</varname></codeph> statements. |
| </li> |
| |
| <li rev="1.2.2"> |
| If table or column statistics are not available, join subsequent tables according to which table has the |
| most selective filter, based on overall size and <codeph>WHERE</codeph> clauses. Joining the table with |
| the most selective filter results in the fewest number of rows being returned. |
| </li> |
| </ul> |
| <p> |
| For more information and examples of performance for join queries, see |
| <xref href="impala_perf_joins.xml#perf_joins"/>. |
| </p> |
| </note> |
| |
| <p> |
| To control the result set from a join query, include the names of corresponding column names in both tables |
| in an <codeph>ON</codeph> or <codeph>USING</codeph> clause, or by coding equality comparisons for those |
| columns in the <codeph>WHERE</codeph> clause. |
| </p> |
| |
| <codeblock>[localhost:21000] > select c_last_name, ca_city from customer join customer_address where c_customer_sk = ca_address_sk; |
| +-------------+-----------------+ |
| | c_last_name | ca_city | |
| +-------------+-----------------+ |
| | Lewis | Fairfield | |
| | Moses | Fairview | |
| | Hamilton | Pleasant Valley | |
| | White | Oak Ridge | |
| | Moran | Glendale | |
| ... |
| | Richards | Lakewood | |
| | Day | Lebanon | |
| | Painter | Oak Hill | |
| | Bentley | Greenfield | |
| | Jones | Stringtown | |
| +-------------+------------------+ |
| Returned 50000 row(s) in 9.82s</codeblock> |
| |
| <p> |
| One potential downside of joins is the possibility of excess resource usage in poorly constructed queries. |
| Impala imposes restrictions on join queries to guard against such issues. To minimize the chance of runaway |
| queries on large data sets, Impala requires every join query to contain at least one equality predicate |
| between the columns of the various tables. For example, if <codeph>T1</codeph> contains 1000 rows and |
| <codeph>T2</codeph> contains 1,000,000 rows, a query <codeph>SELECT <varname>columns</varname> FROM t1 JOIN |
| t2</codeph> could return up to 1 billion rows (1000 * 1,000,000); Impala requires that the query include a |
| clause such as <codeph>ON t1.c1 = t2.c2</codeph> or <codeph>WHERE t1.c1 = t2.c2</codeph>. |
| </p> |
| |
| <p> |
| Because even with equality clauses, the result set can still be large, as we saw in the previous example, you |
| might use a <codeph>LIMIT</codeph> clause to return a subset of the results: |
| </p> |
| |
| <codeblock>[localhost:21000] > select c_last_name, ca_city from customer, customer_address where c_customer_sk = ca_address_sk limit 10; |
| +-------------+-----------------+ |
| | c_last_name | ca_city | |
| +-------------+-----------------+ |
| | Lewis | Fairfield | |
| | Moses | Fairview | |
| | Hamilton | Pleasant Valley | |
| | White | Oak Ridge | |
| | Moran | Glendale | |
| | Sharp | Lakeview | |
| | Wiles | Farmington | |
| | Shipman | Union | |
| | Gilbert | New Hope | |
| | Brunson | Martinsville | |
| +-------------+-----------------+ |
| Returned 10 row(s) in 0.63s</codeblock> |
| |
| <p> |
| Or you might use additional comparison operators or aggregation functions to condense a large result set into |
| a smaller set of values: |
| </p> |
| |
| <codeblock>[localhost:21000] > -- Find the names of customers who live in one particular town. |
| [localhost:21000] > select distinct c_last_name from customer, customer_address where |
| c_customer_sk = ca_address_sk |
| and ca_city = "Green Acres"; |
| +---------------+ |
| | c_last_name | |
| +---------------+ |
| | Hensley | |
| | Pearson | |
| | Mayer | |
| | Montgomery | |
| | Ricks | |
| ... |
| | Barrett | |
| | Price | |
| | Hill | |
| | Hansen | |
| | Meeks | |
| +---------------+ |
| Returned 332 row(s) in 0.97s |
| |
| [localhost:21000] > -- See how many different customers in this town have names starting with "A". |
| [localhost:21000] > select count(distinct c_last_name) from customer, customer_address where |
| c_customer_sk = ca_address_sk |
| and ca_city = "Green Acres" |
| and substr(c_last_name,1,1) = "A"; |
| +-----------------------------+ |
| | count(distinct c_last_name) | |
| +-----------------------------+ |
| | 12 | |
| +-----------------------------+ |
| Returned 1 row(s) in 1.00s</codeblock> |
| |
| <p> |
| Because a join query can involve reading large amounts of data from disk, sending large amounts of data |
| across the network, and loading large amounts of data into memory to do the comparisons and filtering, you |
| might do benchmarking, performance analysis, and query tuning to find the most efficient join queries for |
| your data set, hardware capacity, network configuration, and cluster workload. |
| </p> |
| |
| <p> |
| The two categories of joins in Impala are known as <b>partitioned joins</b> and <b>broadcast joins</b>. If |
| inaccurate table or column statistics, or some quirk of the data distribution, causes Impala to choose the |
| wrong mechanism for a particular join, consider using query hints as a temporary workaround. For details, see |
| <xref href="impala_hints.xml#hints"/>. |
| </p> |
| |
| <p rev="2.5.0"> |
| <b>Handling NULLs in Join Columns:</b> |
| </p> |
| |
| <p rev="2.5.0"> |
| By default, join key columns do not match if either one contains a <codeph>NULL</codeph> value. |
| To treat such columns as equal if both contain <codeph>NULL</codeph>, you can use an expression |
| such as <codeph>A = B OR (A IS NULL AND B IS NULL)</codeph>. |
| In <keyword keyref="impala25_full"/> and higher, the <codeph><=></codeph> operator (shorthand for |
| <codeph>IS NOT DISTINCT FROM</codeph>) performs the same comparison in a concise and efficient form. |
| The <codeph><=></codeph> operator is more efficient in for comparing join keys in a <codeph>NULL</codeph>-safe |
| manner, because the operator can use a hash join while the <codeph>OR</codeph> expression cannot. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/example_blurb"/> |
| |
| <p> |
| The following examples refer to these simple tables containing small sets of integers: |
| <codeblock>[localhost:21000] > create table t1 (x int); |
| [localhost:21000] > insert into t1 values (1), (2), (3), (4), (5), (6); |
| |
| [localhost:21000] > create table t2 (y int); |
| [localhost:21000] > insert into t2 values (2), (4), (6); |
| |
| [localhost:21000] > create table t3 (z int); |
| [localhost:21000] > insert into t3 values (1), (3), (5); |
| </codeblock> |
| </p> |
| |
| <!-- To do: fill in examples for other join types. --> |
| |
| <p> |
| The following example demonstrates an anti-join, returning the values from <codeph>T1</codeph> that do not |
| exist in <codeph>T2</codeph> (in this case, the odd numbers 1, 3, and 5): |
| </p> |
| |
| <codeblock>[localhost:21000] > select x from t1 left anti join t2 on (t1.x = t2.y); |
| +---+ |
| | x | |
| +---+ |
| | 1 | |
| | 3 | |
| | 5 | |
| +---+ |
| </codeblock> |
| |
| <p conref="../shared/impala_common.xml#common/related_info"/> |
| |
| <p> |
| See these tutorials for examples of different kinds of joins: |
| </p> |
| |
| <ul> |
| <li> |
| <xref href="impala_tutorial.xml#tut_cross_join"/> |
| </li> |
| </ul> |
| </conbody> |
| </concept> |