| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="langref_hiveql_delta"> |
| |
| <title>SQL Differences Between Impala and Hive</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="SQL"/> |
| <data name="Category" value="Hive"/> |
| <data name="Category" value="Porting"/> |
| <data name="Category" value="Data Analysts"/> |
| <data name="Category" value="Developers"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| <indexterm audience="hidden">Hive</indexterm> |
| <indexterm audience="hidden">HiveQL</indexterm> |
| Impala's SQL syntax follows the SQL-92 standard, and includes many industry extensions in areas such as |
| built-in functions. See <xref href="impala_porting.xml#porting"/> for a general discussion of adapting SQL |
| code from a variety of database systems to Impala. |
| </p> |
| |
| <p> |
| Because Impala and Hive share the same metastore database and their tables are often used interchangeably, |
| the following sections cover the differences between Impala and Hive in detail. |
| </p> |
| |
| <p outputclass="toc inpage"/> |
| </conbody> |
| |
| <concept id="langref_hiveql_unsupported"> |
| |
| <title>HiveQL Features not Available in Impala</title> |
| |
| <conbody> |
| |
| <p> |
| The current release of Impala does not support the following SQL features that you might be familiar with |
| from HiveQL: |
| </p> |
| |
| <!-- To do: |
| Yeesh, too many separate lists of unsupported Hive syntax. |
| Here, the FAQ, and in some of the intro topics. |
| Some discussion in IMP-1061 about how best to reorg. |
| Lots of opportunities for conrefs. |
| --> |
| |
| <ul> |
| <!-- Now supported in <keyword keyref="impala23_full"/> and higher. Find places on this page (like already done under lateral views) to note the new data type support. |
| <li> |
| Non-scalar data types such as maps, arrays, structs. |
| </li> |
| --> |
| |
| <li rev="1.2"> |
| Extensibility mechanisms such as <codeph>TRANSFORM</codeph>, custom file formats, or custom SerDes. |
| </li> |
| |
| <li rev=""> |
| The <codeph>DATE</codeph> data type. |
| </li> |
| |
| <li> |
| XML and JSON functions. |
| </li> |
| |
| <li> |
| Certain aggregate functions from HiveQL: <codeph>covar_pop</codeph>, <codeph>covar_samp</codeph>, |
| <codeph>corr</codeph>, <codeph>percentile</codeph>, <codeph>percentile_approx</codeph>, |
| <codeph>histogram_numeric</codeph>, <codeph>collect_set</codeph>; Impala supports the set of aggregate |
| functions listed in <xref href="impala_aggregate_functions.xml#aggregate_functions"/> and analytic |
| functions listed in <xref href="impala_analytic_functions.xml#analytic_functions"/>. |
| </li> |
| |
| <li> |
| Sampling. |
| </li> |
| |
| <li> |
| Lateral views. In <keyword keyref="impala23_full"/> and higher, Impala supports queries on complex types |
| (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>), using join notation |
| rather than the <codeph>EXPLODE()</codeph> function. |
| See <xref href="impala_complex_types.xml#complex_types"/> for details about Impala support for complex types. |
| </li> |
| |
| <li> |
| Multiple <codeph>DISTINCT</codeph> clauses per query, although Impala includes some workarounds for this |
| limitation. |
| <note conref="../shared/impala_common.xml#common/multiple_count_distinct"/> |
| </li> |
| </ul> |
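| |
| <p> |
| As a minimal sketch of the join notation for complex types mentioned in the list above, assume a |
| hypothetical table <codeph>customers</codeph> with an <codeph>ARRAY</codeph> column named |
| <codeph>orders</codeph> whose elements are <codeph>STRUCT</codeph> values containing an |
| <codeph>item</codeph> field. These names are illustrative only and do not come from any shipped schema. |
| Where HiveQL would typically use <codeph>LATERAL VIEW</codeph> with <codeph>EXPLODE()</codeph>, an Impala |
| query in <keyword keyref="impala23_full"/> and higher might look like this: |
| </p> |
| |
| <codeblock>-- Hypothetical table and column names, for illustration only. |
| -- The ARRAY column is referenced in the FROM clause as if it were a joined table. |
| SELECT c.name, o.item |
| FROM customers c, c.orders o; |
| </codeblock> |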
| |
| <p> |
| User-defined functions (UDFs) are supported starting in Impala 1.2. See <xref href="impala_udf.xml#udfs"/> |
| for full details on Impala UDFs. |
| <ul> |
| <li> |
| <p> |
| Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs. |
| </p> |
| </li> |
| |
| <li> |
| <p> |
| Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently |
| support user-defined table generating functions (UDTFs). |
| </p> |
| </li> |
| |
| <li> |
| <p> |
| Only Impala-supported column types are supported in Java-based UDFs. |
| </p> |
| </li> |
| |
| <li> |
| <p conref="../shared/impala_common.xml#common/current_user_caveat"/> |
| </li> |
| </ul> |
| </p> |
| |
| <p> |
| Impala does not currently support these HiveQL statements: |
| </p> |
| |
| <ul> |
| <li> |
| <codeph>ANALYZE TABLE</codeph> (the Impala equivalent is <codeph>COMPUTE STATS</codeph>) |
| </li> |
| |
| <li> |
| <codeph>DESCRIBE COLUMN</codeph> |
| </li> |
| |
| <li> |
| <codeph>DESCRIBE DATABASE</codeph> |
| </li> |
| |
| <li> |
| <codeph>EXPORT TABLE</codeph> |
| </li> |
| |
| <li> |
| <codeph>IMPORT TABLE</codeph> |
| </li> |
| |
| <li> |
| <codeph>SHOW TABLE EXTENDED</codeph> |
| </li> |
| |
| <li> |
| <codeph>SHOW INDEXES</codeph> |
| </li> |
| |
| <li> |
| <codeph>SHOW COLUMNS</codeph> |
| </li> |
| |
| <li rev="DOCS-656"> |
| <codeph>INSERT OVERWRITE DIRECTORY</codeph>; use <codeph>INSERT OVERWRITE <varname>table_name</varname></codeph> |
| or <codeph>CREATE TABLE AS SELECT</codeph> to materialize query results into the HDFS directory associated |
| with an Impala table. |
| </li> |
| </ul> |
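| |
| <p> |
| The substitutions named in the list above can be sketched as follows. The table and column names |
| (<codeph>sales</codeph>, <codeph>sales_summary</codeph>, <codeph>sales_by_region</codeph>, |
| <codeph>region</codeph>, <codeph>amount</codeph>) are hypothetical, and the example assumes that |
| <codeph>sales_summary</codeph> already exists with two matching columns. |
| </p> |
| |
| <codeblock>-- Hypothetical table and column names, for illustration only. |
| -- Gather table and column statistics (in place of the HiveQL ANALYZE TABLE statement). |
| COMPUTE STATS sales; |
| |
| -- Materialize query results in the HDFS directory of an existing table |
| -- (in place of INSERT OVERWRITE DIRECTORY). |
| INSERT OVERWRITE sales_summary |
| SELECT region, SUM(amount) FROM sales GROUP BY region; |
| |
| -- Alternatively, create a new table to hold the query results. |
| CREATE TABLE sales_by_region AS |
| SELECT region, SUM(amount) AS total FROM sales GROUP BY region; |
| </codeblock> |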
| </conbody> |
| </concept> |
| |
| <concept id="langref_hiveql_semantics"> |
| |
| <title>Semantic Differences Between Impala and HiveQL Features</title> |
| |
| <conbody> |
| |
| <p> |
| This section covers instances where Impala and Hive have similar functionality, sometimes including the |
| same syntax, but there are differences in the runtime semantics of those features. |
| </p> |
| |
| <p> |
| <b>Security:</b> |
| </p> |
| |
| <p> |
| Impala uses the <xref href="http://sentry.incubator.apache.org/" scope="external" format="html">Apache |
| Sentry</xref> authorization framework, which provides fine-grained role-based access control |
| to protect data against unauthorized access or tampering. |
| </p> |
| |
| <p> |
| The Hive component now includes Sentry-enabled <codeph>GRANT</codeph>, |
| <codeph>REVOKE</codeph>, and <codeph>CREATE/DROP ROLE</codeph> statements. Earlier Hive releases had a |
| privilege system with <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements that were primarily |
| intended to prevent accidental deletion of data, rather than to serve as a security mechanism against |
| malicious users. |
| </p> |
| |
| <p> |
| Impala can make use of privileges set up through Hive <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements. |
| Impala has its own <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Impala 2.0 and higher. |
| See <xref href="impala_authorization.xml#authorization"/> for the details of authorization in Impala, including |
| how to switch from the original policy file-based privilege model to the Sentry service using privileges |
| stored in the metastore database. |
| </p> |
| |
| <p> |
| <b>SQL statements and clauses:</b> |
| </p> |
| |
| <p> |
| The semantics of Impala SQL statements differ from HiveQL in some cases, even where the two use similar |
| statement and clause names: |
| </p> |
| |
| <ul> |
| <li> |
| Impala uses different syntax and names for query hints: <codeph>[SHUFFLE]</codeph> and |
| <codeph>[NOSHUFFLE]</codeph> rather than <codeph>MapJoin</codeph> or <codeph>StreamJoin</codeph>. See |
| <xref href="impala_joins.xml#joins"/> for the Impala details. |
| </li> |
| |
| <li> |
| Impala does not expose MapReduce-specific features of <codeph>SORT BY</codeph>, <codeph>DISTRIBUTE |
| BY</codeph>, or <codeph>CLUSTER BY</codeph>. |
| </li> |
| |
| <li> |
| Impala does not require queries to include a <codeph>FROM</codeph> clause. |
| </li> |
| </ul> |
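| |
| <p> |
| Two of the points above can be sketched briefly. The tables <codeph>t1</codeph> and <codeph>t2</codeph> |
| and their columns are hypothetical. The hint placement shown, in square brackets immediately after the |
| <codeph>JOIN</codeph> keyword, is one common form; see <xref href="impala_joins.xml#joins"/> for the |
| authoritative syntax. |
| </p> |
| |
| <codeblock>-- A query with no FROM clause. |
| SELECT 1 + 1; |
| |
| -- Hypothetical tables t1 and t2, for illustration only. |
| -- The join hint appears in square brackets immediately after the JOIN keyword. |
| SELECT t1.id, t2.val |
| FROM t1 JOIN [SHUFFLE] t2 ON t1.id = t2.id; |
| </codeblock> |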
| |
| <p> |
| <b>Data types:</b> |
| </p> |
| |
| <ul> |
| <li> |
| Impala supports a limited set of implicit casts. This can help avoid undesired results from unexpected |
| casting behavior. |
| <ul> |
| <li> |
| Impala does not implicitly cast between string and numeric or Boolean types. Always use |
| <codeph>CAST()</codeph> for these conversions. |
| </li> |
| |
| <li> |
| Impala does perform implicit casts among the numeric types, when going from a smaller or less precise |
| type to a larger or more precise one. For example, Impala will implicitly convert a |
| <codeph>SMALLINT</codeph> to a <codeph>BIGINT</codeph> or <codeph>FLOAT</codeph>, but to convert from |
| <codeph>DOUBLE</codeph> to <codeph>FLOAT</codeph> or <codeph>INT</codeph> to <codeph>TINYINT</codeph> |
| requires a call to <codeph>CAST()</codeph> in the query. |
| </li> |
| |
| <li> |
| Impala does perform implicit casts from string to timestamp. Impala has a restricted set of literal |
| formats for the <codeph>TIMESTAMP</codeph> data type and the <codeph>from_unixtime()</codeph> format |
| string; see <xref href="impala_timestamp.xml#timestamp"/> for details. |
| </li> |
| </ul> |
| <p> |
| See <xref href="impala_datatypes.xml#datatypes"/> for full details on implicit and explicit casting for |
| all types, and <xref href="impala_conversion_functions.xml#conversion_functions"/> for details about |
| the <codeph>CAST()</codeph> function. |
| </p> |
| </li> |
| |
| <li> |
| Impala does not store or interpret timestamps using the local time zone, to avoid undesired results from |
| unexpected time zone issues. Timestamps are stored and interpreted relative to UTC. This difference can |
| produce different results for some calls to similarly named date/time functions between Impala and Hive. |
| See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details about the Impala |
| functions. See <xref href="impala_timestamp.xml#timestamp"/> for a discussion of how Impala handles |
| time zones, and configuration options you can use to make Impala match the Hive behavior more closely |
| when dealing with Parquet-encoded <codeph>TIMESTAMP</codeph> data or when converting between |
| the local time zone and UTC. |
| </li> |
| |
| <li> |
| The Impala <codeph>TIMESTAMP</codeph> type can represent dates ranging from 1400-01-01 to 9999-12-31. |
| This is different from the Hive date range, which is 0000-01-01 to 9999-12-31. |
| </li> |
| |
| <li> |
| <p conref="../shared/impala_common.xml#common/int_overflow_behavior"/> |
| </li> |
| |
| </ul> |
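| |
| <p> |
| The casting rules above can be illustrated with a short sketch. It assumes a hypothetical table |
| <codeph>t1</codeph> with a <codeph>SMALLINT</codeph> column <codeph>s</codeph>, a <codeph>BIGINT</codeph> |
| column <codeph>b</codeph>, a <codeph>DOUBLE</codeph> column <codeph>d</codeph>, and an |
| <codeph>INT</codeph> column <codeph>i</codeph>. None of these names come from a real schema. |
| </p> |
| |
| <codeblock>-- Strings are not implicitly converted to numbers, so the conversion is explicit. |
| SELECT CAST('123' AS INT) + 1; |
| |
| -- Hypothetical table t1, for illustration only. |
| -- Widening conversion: the SMALLINT column is implicitly promoted to match the BIGINT column. |
| SELECT s + b FROM t1; |
| |
| -- Narrowing conversions require an explicit CAST(). |
| SELECT CAST(d AS FLOAT), CAST(i AS TINYINT) FROM t1; |
| </codeblock> |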
| |
| <p> |
| <b>Miscellaneous features:</b> |
| </p> |
| |
| <ul> |
| <li> |
| Impala does not provide virtual columns. |
| </li> |
| |
| <li> |
| Impala does not expose locking. |
| </li> |
| |
| <li> |
| Impala does not expose some configuration properties. |
| </li> |
| </ul> |
| </conbody> |
| </concept> |
| </concept> |