<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="ver" id="new_features">
<title><ph audience="standalone">New Features in Apache Impala</ph><ph audience="integrated">What's New in Apache Impala</ph></title>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Release Notes"/>
<data name="Category" value="New Features"/>
<data name="Category" value="What's New"/>
<data name="Category" value="Getting Started"/>
<data name="Category" value="Upgrading"/>
<data name="Category" value="Administrators"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p>
This release of Impala contains the following changes and enhancements from previous releases.
</p>
<p outputclass="toc inpage"/>
</conbody>
<concept rev="3.2.0" id="new_features_33">
<title>New Features in <keyword keyref="impala33"/></title>
<conbody>
<p> The following sections describe the noteworthy improvements made in
<keyword keyref="impala33"/>. </p>
<p> For the full list of issues closed in this release, see the <xref
keyref="changelog_33">changelog for <keyword keyref="impala33"
/></xref>. </p>
<section id="section_ezf_tnq_s3b">
<title>Increased Compatibility with Apache Projects</title>
<p>Impala is integrated with the following components:<ul>
<li dir="ltr">
<p dir="ltr">Apache Ranger: Use Apache Ranger to manage
authorization in Impala. See <xref
href="https://impala.apache.org/docs/build/html/topics/impala_authorization.html"
format="html" scope="external"><u>Impala
Authorization</u></xref> for details.</p>
</li>
<li dir="ltr">
<p dir="ltr">Apache Atlas: Use Apache Atlas to manage data
governance in Impala.</p>
</li>
<li dir="ltr">
<p dir="ltr">Hive 3</p>
</li>
</ul></p>
</section>
<section id="section_ys5_k4n_t3b">
<title>Parquet Page Index </title>
<p>To improve performance when using Parquet files, Impala can now write
        page indexes in Parquet files and use those indexes to skip pages for
        faster scans.</p>
<p>See <xref href="impala_parquet.xml#parquet_performance"/> for
details.</p>
</section>
<section id="section_zs5_k4n_t3b">
<title>The Remote File Handle Cache Supports S3</title>
<p>Impala can now cache remote file handles for tables that
        store their data in Amazon S3 cloud storage.</p>
      <p>See <xref href="impala_scalability.xml#scalability_file_handle_cache"
        /> for information on the remote file handle cache.</p>
</section>
<section id="section_jls_hxj_s3b">
<title>Support for Kudu Integrated with Hive Metastore</title>
<p>In Impala 3.3 and Kudu 1.10, Kudu is integrated with the Hive Metastore
        (HMS), and from Impala, you can create, update, delete, and query
        tables in Kudu services that are integrated with HMS.</p>
<p>See <xref
href="https://impala.apache.org/docs/build/html/topics/impala_kudu.html"
format="html" scope="external">Using Kudu with Impala</xref> for
information on using Kudu tables in Impala.</p>
</section>
<section id="section_dp4_mxj_s3b">
<title>Zstd Compression for Parquet files</title>
<p>Zstandard (Zstd) is a real-time compression algorithm offering a
        tradeoff between compression speed and compression ratio. Compression
        levels from 1 through 22 are supported. The lower the level, the
        faster the compression at the cost of compression ratio.</p>
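      <p>As a brief sketch (the table names are illustrative only, and the
        <codeph>ZSTD:level</codeph> form of the query option value is assumed
        from the level range described above), a compression level can be
        selected through the <codeph>COMPRESSION_CODEC</codeph> query option
        when writing Parquet files:</p>
      <codeblock>-- Write Parquet data files compressed with Zstd at level 12.
set COMPRESSION_CODEC=ZSTD:12;
create table sales_zstd stored as parquet as select * from sales;</codeblock>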
</section>
<section id="section_parquet_lz4_notes">
<title>Lz4 Compression for Parquet files</title>
<p>Lz4 is a lossless compression algorithm providing extremely fast
and scalable compression and decompression.</p>
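      <p>A minimal sketch of selecting Lz4 for Parquet writes through the same
        query option (the table names are illustrative only):</p>
      <codeblock>set COMPRESSION_CODEC=LZ4;
insert into sales_lz4 select * from sales;</codeblock>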
</section>
<section id="section_drv_nxj_s3b">
<title>Data Cache for Remote Reads</title>
<p>To improve performance on multi-cluster HDFS environments as well as
on object store environments, Impala now caches data for non-local
reads (e.g. S3, ABFS, ADLS) on local storage.</p>
<p>The data cache is enabled with the <codeph>--data_cache</codeph>
        startup flag.</p>
<p>See <xref
href="https://impala.apache.org/docs/build/html/topics/impala_data_cache.html"
format="html" scope="external">Impala Remote Data Cache</xref> for
the information and steps to enable remote data cache.</p>
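      <p>As a sketch, assuming a local cache directory of
        <codeph>/data/impala/datacache</codeph> and a 500 GB capacity quota
        (both values are illustrative only), each <codeph>impalad</codeph>
        could be started with a flag of the form:</p>
      <codeblock>--data_cache=/data/impala/datacache:500GB</codeblock>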
</section>
<section id="section_xp4_b1f_t3b">
<title>Metadata Performance Improvements </title>
<p>The following features to improve metadata performance are enabled by
default in this release:</p>
<ul>
<li>
<p>Incremental stats are now compressed in memory in
              <codeph>catalogd</codeph>, reducing the memory footprint in
              <codeph>catalogd</codeph>.</p>
</li>
<li>
<p><codeph>impalad</codeph> coordinators fetch incremental stats from
<codeph>catalogd</codeph> on-demand, reducing the memory
footprint and the network requirements for broadcasting
metadata.</p>
</li>
<li>
<p>Time-based and memory-based automatic invalidation of metadata to
keep the size of metadata bounded and to reduce the chances of
<codeph>catalogd</codeph> cache running out of memory.</p>
</li>
<li>
<p>Automatic invalidation of metadata</p>
<p>With automatic metadata management enabled, you no longer have to
issue <codeph>INVALIDATE</codeph> / <codeph>REFRESH</codeph> in a
number of conditions.</p>
<p>In Impala 3.3, the following additional event in Hive Metastore
            can trigger automatic INVALIDATE / REFRESH of metadata:</p>
<ul>
<li>
<p>INSERT into tables and partitions from Impala or from Spark,
                in the same cluster or in a multi-cluster configuration</p>
</li>
</ul>
</li>
</ul>
<p>See <xref href="impala_metadata.xml#impala_metadata"/> for the
information on the above features.</p>
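      <p>As a sketch, the event-based and time-based invalidation behaviors are
        controlled by startup flags on <codeph>catalogd</codeph> and the
        coordinator <codeph>impalad</codeph> daemons; the specific values below
        are illustrative only:</p>
      <codeblock>--hms_event_polling_interval_s=2      # Poll HMS notification events every 2 seconds.
--invalidate_tables_timeout_s=3600    # Invalidate tables not accessed for an hour.</codeblock>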
</section>
<section id="section_ztf_c4q_s3b">
<title>Scalable Pool Configuration in Admission Controller</title>
<p>To offer more dynamic and flexible resource management, Impala
        supports new configuration parameters that scale with the number
        of hosts in the resource pool. You can use these parameters to control
        the number of running queries, the number of queued queries, and the
        maximum amount of memory allocated for Impala resource pools. See <xref
          href="impala_admission.xml#admission_control"/> for information
        about the new parameters and how to use them for admission control.</p>
</section>
<section id="section_b55_gxj_s3b">
<title>Query Profile</title>
<p>The following information was added to the Query Profile output for
better monitoring and troubleshooting of query performance.</p>
<ul>
<li>
<p>Network I/O throughput</p>
</li>
<li>
<p>System disk I/O throughput</p>
</li>
</ul>
<p>See <xref
          href="https://impala.apache.org/docs/build/html/topics/impala_explain_plan.html"
          format="html" scope="external">Impala Query Profile</xref> for
        information on generating and reading query profiles.</p>
</section>
<section id="section_lbh_kzj_s3b">
<title>DATE Data Type and Functions</title>
<p>You can use the new DATE type to describe a particular
        year/month/day, in the form YYYY-MM-DD.</p>
      <p>This initial DATE type support covers the TEXT and Parquet file
        formats as well as HBase tables.</p>
<p>The support of DATE data type includes the following features:</p>
<ul>
<li><codeph>DATE</codeph> type column as a partitioning key
column</li>
<li><codeph>DATE</codeph> literal</li>
<li>Implicit casting between <codeph>DATE</codeph> and other types:
<codeph>STRING</codeph> and <codeph>TIMESTAMP</codeph></li>
<li>Most of the built-in functions for <codeph>TIMESTAMP</codeph> now
          accept <codeph>DATE</codeph> type arguments as well.</li>
</ul>
<p>See <xref href="impala_date.xml#date"/> and <xref
href="impala_datetime_functions.xml#datetime_functions"/> for using
the DATE type.</p>
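      <p>A minimal sketch of the type in DDL, DML, and queries; the table and
        column names are illustrative only:</p>
      <codeblock>create table events (id bigint, name string)
  partitioned by (event_date date) stored as textfile;

-- DATE literal, written into a DATE partition key column.
insert into events partition (event_date)
  values (1, 'signup', date '2019-08-30');

select count(*) from events
  where event_date between date '2019-01-01' and date '2019-12-31';</codeblock>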
</section>
<section id="section_wpm_zzj_s3b">
<title>Support Hive Insert-Only Transactional Tables</title>
<p>Impala added support for creating, dropping, querying, and inserting
        into insert-only transactional tables.</p>
<p>See <xref
href="https://impala.apache.org/docs/build/html/topics/impala_transactions.html"
format="html" scope="external">Impala Transactions</xref> for
details.</p>
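      <p>A minimal sketch of creating and using such a table; the table name is
        illustrative only, and the table properties shown follow the Hive
        conventions for insert-only transactional tables:</p>
      <codeblock>create table ledger (id bigint, amount decimal(10,2))
  stored as parquet
  tblproperties ('transactional'='true',
                 'transactional_properties'='insert_only');

insert into ledger values (1, 99.95);
select sum(amount) from ledger;</codeblock>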
</section>
<section id="section_ab2_41k_s3b">
<title>HiveServer2 HTTP Connection for Clients</title>
<p>Client applications can now connect to Impala over HTTP via
        HiveServer2, with the option to use Kerberos SPNEGO or LDAP for
        authentication. See <xref
href="https://impala.apache.org/docs/build/html/topics/impala_client.html"
format="html" scope="external">Impala Clients</xref> for
details.</p>
</section>
<section id="section_xxt_44q_s3b">
<title>Default File Format Changed to Parquet</title>
<p>When you create a table, the default format for that table data is
now Parquet.</p>
<p>For backward compatibility, you can use the
        <codeph>DEFAULT_FILE_FORMAT</codeph> query option to set the default
        file format back to the previous default, text, or to other formats.</p>
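      <p>For example, to keep creating text tables without spelling out
        <codeph>STORED AS TEXTFILE</codeph> in each statement (the table name
        is illustrative only):</p>
      <codeblock>set DEFAULT_FILE_FORMAT=TEXT;
create table t1 (x int);   -- Created as a text table rather than Parquet.</codeblock>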
</section>
<section id="section_m1h_mnf_t3b">
<title>Built-in Function to Process JSON Objects</title>
<p>The <codeph>GET_JSON_OBJECT()</codeph> function extracts a JSON object
        from a string based on the specified path and returns the extracted
        JSON object.</p>
<p>See <xref href="impala_misc_functions.xml#misc_functions">Impala
Miscellaneous Functions</xref>. for details.</p>
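      <p>A brief sketch of the function; the JSON string and path are
        illustrative only:</p>
      <codeblock>select get_json_object('{"name":"Impala","version":{"major":3,"minor":3}}',
                       '$.version.major');
-- Returns: 3</codeblock>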
</section>
<section id="section_acs_wck_s3b">
<title>Ubuntu 18.04</title>
<p>This version of Impala is certified to run on Ubuntu 18.04.</p>
</section>
</conbody>
</concept>
<concept rev="3.2.0" id="new_features_32">
<title>New Features in <keyword keyref="impala32"/></title>
<conbody>
<p> The following sections describe the noteworthy improvements made in
<keyword keyref="impala32"/>. </p>
<p> For the full list of issues closed in this release, see the <xref
keyref="changelog_32">changelog for <keyword keyref="impala32"
/></xref>. </p>
</conbody>
<concept id="rn_32_multi_cluster">
<title>Multi-cluster Support</title>
<conbody>
<ul>
<li dir="ltr">Remote File Handle Cache<p>Impala can now cache remote
HDFS file handles when the
<codeph>cache_remote_file_handles</codeph> impalad flag is set
to <codeph>true</codeph>. This feature does not apply to non-HDFS
            tables, such as Kudu or HBase tables, and does not apply to
            tables that store their data on cloud services, such as S3 or
            ADLS. See <xref
              href="https://impala.apache.org/docs/build/html/topics/impala_scalability.html"
              format="html" scope="external">Scalability Considerations</xref>
            for information on file handle caching in Impala.</p></li>
</ul>
</conbody>
</concept>
<concept id="rn_32_ac">
<title>Enhancements in Resource Management and Admission Control</title>
<conbody>
<ul>
<li>Admission Debug page is available in <xref
href="https://impala.apache.org/docs/build/html/topics/impala_webui.html"
format="html" scope="external">Impala Daemon (impalad) web
            UI</xref> at <codeph>/admission</codeph> and provides the
following information about Impala resource pools:<ul>
<li>Pool configuration</li>
<li>Relevant pool stats</li>
<li>Queued queries in order of being queued (local to the
coordinator)</li>
<li>Running queries (local to this coordinator)</li>
<li>Histogram of the distribution of peak memory usage by admitted
queries</li>
</ul></li>
</ul>
<ul>
<li>A new query option, <xref
href="https://impala.apache.org/docs/build/html/topics/impala_num_rows_produced_limit.html"
format="html" scope="external">NUM_ROWS_PRODUCED_LIMIT</xref>, was
added to limit the number of rows returned from queries.<p>Impala
will cancel a query if the query produces more rows than the limit
specified by this query option. The limit applies only when the
results are returned to a client, e.g. for a
<codeph>SELECT</codeph> query, but not an
<codeph>INSERT</codeph> query. This query option is a guardrail
against users accidentally submitting queries that return a large
number of rows.</p></li>
</ul>
</conbody>
</concept>
<concept id="rn_32_metadata">
<title>Metadata Performance Improvements</title>
<conbody>
<ul>
<li><xref
href="https://impala.apache.org/docs/build/html/topics/impala_metadata.html"
format="html" scope="external">Automatic Metadata Sync using Hive
Metastore Notification Events</xref><p>When enabled, the
<codeph>catalogd</codeph> polls Hive Metastore (HMS)
            notification events at a configurable interval and syncs with
HMS. You can use the new web UI pages of the
<codeph>catalogd</codeph> to check the state of the automatic
invalidate event processor. </p><p><b>Note</b>: This is a preview
feature in <keyword keyref="impala32">Impala
3.2</keyword>.</p></li>
</ul>
</conbody>
</concept>
<concept id="rn_32_usability">
<title>Compatibility and Usability Enhancements</title>
<conbody>
<ul>
<li>Impala can now read the <codeph>TIMESTAMP_MILLIS</codeph> and
<codeph>TIMESTAMP_MICROS</codeph> Parquet types. See <xref
href="https://impala.apache.org/docs/build/html/topics/impala_parquet.html"
format="html" scope="external">Using Parquet File Format for
Impala Tables</xref> for the Parquet support in Impala.</li>
<li>Impala can now read the complex types in ORC such as ARRAY,
STRUCT, and MAP. See <xref
href="https://impala.apache.org/docs/build/html/topics/impala_orc.html"
format="html" scope="external">Using ORC File Format for Impala
Tables</xref> for the ORC support in Impala.</li>
<li>The <xref
href="https://impala.apache.org/docs/build/html/topics/impala_string_functions.html"
format="html" scope="external">LEVENSHTEIN</xref> string function
          is supported.<p>The function returns the Levenshtein distance
            between two input strings, that is, the minimum number of
            single-character edits required to transform one string into the
            other. A brief usage sketch appears after this list.</p></li>
<li>The <codeph>IF NOT EXISTS</codeph> clause is supported in the
<xref
href="https://impala.apache.org/docs/build/html/topics/impala_alter_table.html"
format="html" scope="external"><codeph>ALTER TABLE</codeph></xref>
statement.</li>
<li>The new <xref
href="https://impala.apache.org/docs/build/html/topics/impala_default_file_format.html"
format="html" scope="external"
><codeph>DEFAULT_FILE_FORMAT</codeph></xref> query option allows
you to set the default table file format. This removes the need for
the <codeph>STORED AS &lt;format></codeph> clause. Set this option
if you prefer a value that is not <codeph>TEXT</codeph>. The
supported formats are: <ul>
<li><codeph>TEXT</codeph></li>
<li><codeph>RC_FILE</codeph></li>
<li><codeph>SEQUENCE_FILE</codeph></li>
<li><codeph>AVRO</codeph></li>
<li><codeph>PARQUET</codeph></li>
<li><codeph>KUDU</codeph></li>
<li><codeph>ORC</codeph></li>
</ul></li>
<li>The extended or verbose <xref
href="https://impala.apache.org/docs/build/html/topics/impala_explain.html"
format="html" scope="external"><codeph>EXPLAIN</codeph></xref>
output includes the following new information for queries:<ul>
<li>The text of the analyzed query that may have been rewritten to
include various optimizations and implicit casts. </li>
<li>The implicit casts and literals shown with the actual
types.</li>
</ul></li>
<li>CPU resource utilization (user, system, iowait) metrics were added
to the <xref
href="https://impala.apache.org/docs/build/html/topics/impala_explain_plan.html"
format="html" scope="external">Impala profile</xref> output.</li>
</ul>
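      <p>A brief sketch of the <codeph>LEVENSHTEIN</codeph> function mentioned
        in the list above; the argument values are illustrative only:</p>
      <codeblock>select levenshtein('kitten', 'sitting');
-- Returns: 3 (substitute k->s, substitute e->i, append g)</codeblock>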
</conbody>
</concept>
<concept id="rn_32_security">
<title><b id="docs-internal-guid-e1c558d3-7fff-4d4e-0ec1-e40f60c9b64a"
><b>Security Enhancement</b></b></title>
<conbody>
<ul>
<li>The <xref
href="https://impala.apache.org/docs/build/html/topics/impala_refresh_authorization.html"
format="html" scope="external">REFRESH AUTHORIZATION</xref>
statement was implemented for refreshing authorization data.</li>
</ul>
</conbody>
</concept>
</concept>
<!-- All 3.1.x new features go under here -->
<concept rev="3.1.0" id="new_features_31">
<title>New Features in <keyword keyref="impala31"/></title>
<conbody>
<p> For the full list of issues closed in this release, including the
issues marked as <q>new features</q> or <q>improvements</q>, see the
<xref keyref="changelog_31">changelog for <keyword keyref="impala31"
/></xref>. </p>
</conbody>
</concept>
<!-- All 3.0.x new features go under here -->
<concept rev="3.0.0" id="new_features_300">
<title>New Features in <keyword keyref="impala30"/></title>
<conbody>
<p>
For the full list of issues closed in this release, including the
issues marked as <q>new features</q> or <q>improvements</q>, see the
<xref keyref="changelog_300">changelog for <keyword keyref="impala30"
/></xref>.
</p>
</conbody>
</concept>
<!-- All 2.12.x new features go under here -->
<concept rev="2.12.0" id="new_features_2120">
<title>New Features in <keyword keyref="impala212_full"/></title>
<conbody>
<p>
For the full list of issues closed in this release, including the issues
marked as <q>new features</q> or <q>improvements</q>, see the
<xref keyref="changelog_212">changelog for <keyword keyref="impala212"/></xref>.
</p>
</conbody>
</concept>
<!-- All 2.11.x new features go under here -->
<concept rev="2.11.0" id="new_features_2110">
<title>New Features in <keyword keyref="impala211_full"/></title>
<conbody>
<p>
For the full list of issues closed in this release, including the issues
marked as <q>new features</q> or <q>improvements</q>, see the
<xref keyref="changelog_211">changelog for <keyword keyref="impala211"/></xref>.
</p>
</conbody>
</concept>
<!-- All 2.10.x new features go under here -->
<concept rev="2.10.0" id="new_features_2100">
<title>New Features in <keyword keyref="impala210_full"/></title>
<conbody>
<p>
For the full list of issues closed in this release, including the issues
marked as <q>new features</q> or <q>improvements</q>, see the
<xref keyref="changelog_210">changelog for <keyword keyref="impala210"/></xref>.
</p>
</conbody>
</concept>
<!-- All 2.9.x new features go under here -->
<concept rev="2.9.0" id="new_features_290">
<title>New Features in <keyword keyref="impala29_full"/></title>
<conbody>
<p>
For the full list of issues closed in this release, including the issues
marked as <q>new features</q> or <q>improvements</q>, see the
<xref keyref="changelog_29">changelog for <keyword keyref="impala29"/></xref>.
</p>
<p>
The following are some of the most significant new features in this release:
</p>
<ul id="feature_list">
<li>
<p rev="IMPALA-4729">
A new function, <codeph>replace()</codeph>, which is faster than
<codeph>regexp_replace()</codeph> for simple string substitutions.
See <xref keyref="string_functions"/> for details.
</p>
</li>
<li>
<p rev="2.9.0 IMPALA-3807 IMPALA-5147 IMPALA-5503">
Startup flags for the <cmdname>impalad</cmdname> daemon, <codeph>is_executor</codeph>
and <codeph>is_coordinator</codeph>, let you divide the work on a large, busy cluster
between a small number of hosts acting as query coordinators, and a larger number of
hosts acting as query executors. By default, each host can act in both roles,
potentially introducing bottlenecks during heavily concurrent workloads.
See <xref keyref="scalability_coordinator"/> for details.
</p>
</li>
</ul>
</conbody>
</concept>
<!-- All 2.8.x new features go under here -->
<concept rev="2.8.0" id="new_features_280">
<title>New Features in <keyword keyref="impala28_full"/></title>
<conbody>
<ul id="feature_list">
<li>
<p>
Performance and scalability improvements:
</p>
<ul>
<li>
<p rev="IMPALA-4572">
The <codeph>COMPUTE STATS</codeph> statement can
take advantage of multithreading.
</p>
</li>
<li>
<p rev="IMPALA-4135">
Improved scalability for highly concurrent loads by reducing the possibility of TCP/IP timeouts.
A configuration setting, <codeph>accepted_cnxn_queue_depth</codeph>, can be adjusted upwards to
avoid this type of timeout on large clusters.
</p>
</li>
<li>
<p>
Several performance improvements were made to the mechanism for generating native code:
</p>
<ul>
<li>
<p rev="IMPALA-3638">
Some queries involving analytic functions can take better advantage of native code generation.
</p>
</li>
<li>
<p rev="IMPALA-4008">
Modules produced during intermediate code generation are organized
to be easier to cache and reuse during the lifetime of a long-running or complicated query.
</p>
</li>
<li>
<p rev="IMPALA-4397 IMPALA-1430">
The <codeph>COMPUTE STATS</codeph> statement is more efficient
(less time for the codegen phase) for tables with a large number
of columns, especially for tables containing <codeph>TIMESTAMP</codeph>
columns.
</p>
</li>
<li>
<p rev="IMPALA-3838 IMPALA-4495">
The logic for determining whether or not to use a runtime filter is more reliable, and the
evaluation process itself is faster because of native code generation.
</p>
</li>
</ul>
</li>
<li>
<p rev="IMPALA-3902">
The <codeph>MT_DOP</codeph> query option enables
multithreading for a number of Impala operations.
<codeph>COMPUTE STATS</codeph> statements for Parquet tables
use a default of <codeph>MT_DOP=4</codeph> to improve the
intra-node parallelism and CPU efficiency of this data-intensive
operation.
</p>
</li>
<li>
<p rev="IMPALA-4397">
The <codeph>COMPUTE STATS</codeph> statement is more efficient
(less time for the codegen phase) for tables with a large number
of columns.
</p>
</li>
<li>
<p rev="IMPALA-2521">
A new hint, <codeph>CLUSTERED</codeph>,
allows Impala <codeph>INSERT</codeph> operations on a Parquet table
that use dynamic partitioning to process a high number of
partitions in a single statement. The data is ordered based on the
partition key columns, and each partition is only written
by a single host, reducing the amount of memory needed to buffer
Parquet data while the data blocks are being constructed.
</p>
</li>
<li>
<p rev="IMPALA-3552">
The new configuration setting <codeph>inc_stats_size_limit_bytes</codeph>
lets you reduce the load on the catalog server when running the
<codeph>COMPUTE INCREMENTAL STATS</codeph> statement for very large tables.
</p>
</li>
<li>
<p rev="IMPALA-1788">
Impala folds many constant expressions within query statements,
rather than evaluating them for each row. This optimization
is especially useful when using functions to manipulate and
format <codeph>TIMESTAMP</codeph> values, such as the result
of an expression such as <codeph>to_date(now() - interval 1 day)</codeph>.
</p>
</li>
<li>
<p rev="IMPALA-4529">
Parsing of complicated expressions is faster. This speedup is
especially useful for queries containing large <codeph>CASE</codeph>
expressions.
</p>
</li>
<li>
<p rev="IMPALA-4302">
Evaluation is faster for <codeph>IN</codeph> operators with many constant
arguments. The same performance improvement applies to other functions
with many constant arguments.
</p>
</li>
<li>
<p rev="IMPALA-1286">
Impala optimizes identical comparison operators within multiple <codeph>OR</codeph>
blocks.
</p>
</li>
<li>
<p rev="IMPALA-4193 IMPALA-3342">
The reporting for wall-clock times and total CPU time in profile output is more accurate.
</p>
</li>
<li>
<p rev="IMPALA-3671">
A new query option, <codeph>SCRATCH_LIMIT</codeph>, lets you restrict the amount of
space used when a query exceeds the memory limit and activates the <q>spill to disk</q> mechanism.
This option helps to avoid runaway queries or make queries <q>fail fast</q> if they require more
memory than anticipated. You can prevent runaway queries from using excessive amounts of spill space,
without restarting the cluster to turn the spilling feature off entirely.
See <xref href="impala_scratch_limit.xml#scratch_limit"/> for details.
</p>
</li>
</ul>
</li>
<li>
<p>
Integration with Apache Kudu:
</p>
<ul>
<li>
<p rev="">
The experimental Impala support for the Kudu storage layer has been folded
into the main Impala development branch. Impala can now directly access Kudu tables,
opening up new capabilities such as enhanced DML operations and continuous ingestion.
</p>
</li>
<li>
<p rev="">
The <codeph>DELETE</codeph> statement is a flexible way to remove data from a Kudu table. Previously,
removing data from an Impala table involved removing or rewriting the underlying data files, dropping entire partitions,
or rewriting the entire table. This Impala statement only works for Kudu tables.
</p>
</li>
<li>
<p rev="">
The <codeph>UPDATE</codeph> statement is a flexible way to modify data within a Kudu table. Previously,
updating data in an Impala table involved replacing the underlying data files, dropping entire partitions,
or rewriting the entire table. This Impala statement only works for Kudu tables.
</p>
</li>
<li>
<p rev="IMPALA-3725">
            The <codeph>UPSERT</codeph> statement is a flexible way to ingest new data, modify existing data, or both, within a Kudu table. Previously,
ingesting data that might contain duplicates involved an inefficient multi-stage operation, and there was no
built-in protection against duplicate data. The <codeph>UPSERT</codeph> statement, in combination with
the primary key designation for Kudu tables, lets you add or replace rows in a single operation, and
automatically avoids creating any duplicate data.
</p>
</li>
<li>
<p rev="IMPALA-3719 IMPALA-3726">
The <codeph>CREATE TABLE</codeph> statement gains some new clauses that are specific to Kudu tables:
<codeph>PARTITION BY</codeph>, <codeph>PARTITIONS</codeph>, <codeph>STORED AS KUDU</codeph>, and column
attributes <codeph>PRIMARY KEY</codeph>, <codeph>NULL</codeph> and <codeph>NOT NULL</codeph>,
<codeph>ENCODING</codeph>, <codeph>COMPRESSION</codeph>, <codeph>DEFAULT</codeph>, and <codeph>BLOCK_SIZE</codeph>.
These clauses replace the explicit <codeph>TBLPROPERTIES</codeph> settings that were required in the
early experimental phases of integration between Impala and Kudu.
</p>
</li>
<li>
<p rev="IMPALA-2890">
The <codeph>ALTER TABLE</codeph> statement can change certain attributes of Kudu tables.
You can add, drop, or rename columns.
You can add or drop range partitions.
You can change the <codeph>TBLPROPERTIES</codeph> value to rename or point to a different underlying Kudu table,
independently from the Impala table name in the metastore database.
You cannot change the data type of an existing column in a Kudu table.
</p>
</li>
<li>
<p rev="IMPALA-4403">
The <codeph>SHOW PARTITIONS</codeph> statement displays information about the distribution of data
between partitions in Kudu tables. A new variation, <codeph>SHOW RANGE PARTITIONS</codeph>,
displays information about the Kudu-specific partitions that apply across ranges of key values.
</p>
</li>
<li>
<p rev="IMPALA-4379">
Not all Impala data types are supported in Kudu tables. In particular, currently the Impala
<codeph>TIMESTAMP</codeph> type is not allowed in a Kudu table. Impala does not recognize the
<codeph>UNIXTIME_MICROS</codeph> Kudu type when it is present in a Kudu table. (These two
representations of date/time data use different units and are not directly compatible.)
You cannot create columns of type <codeph>TIMESTAMP</codeph>, <codeph>DECIMAL</codeph>,
<codeph>VARCHAR</codeph>, or <codeph>CHAR</codeph> within a Kudu table. Within a query, you can
cast values in a result set to these types. Certain types, such as <codeph>BOOLEAN</codeph>,
cannot be used as primary key columns.
</p>
</li>
<li>
<p rev="">
Currently, Kudu tables are not interchangeable between Impala and Hive the way other kinds of Impala tables are.
Although the metadata for Kudu tables is stored in the metastore database, currently Hive cannot access Kudu tables.
</p>
</li>
<li>
<p rev="">
            The <codeph>INSERT</codeph> statement works for Kudu tables. The organization
            of Kudu data makes inserting data in small batches, such as with the
            <codeph>INSERT ... VALUES</codeph> syntax, more efficient than it is for HDFS-backed tables.
</p>
</li>
<li>
<p rev="IMPALA-4283">
Some audit data is recorded for data governance purposes.
All <codeph>UPDATE</codeph>, <codeph>DELETE</codeph>, and <codeph>UPSERT</codeph> statements are characterized
as <codeph>INSERT</codeph> operations in the audit log. Currently, lineage metadata is not generated for
<codeph>UPDATE</codeph> and <codeph>DELETE</codeph> operations on Kudu tables.
</p>
</li>
<li>
<p rev="IMPALA-4000">
Currently, Kudu tables have limited support for Sentry:
<ul>
<li>
<p>
Access to Kudu tables must be granted to roles as usual.
</p>
</li>
<li>
<p>
Currently, access to a Kudu table through Sentry is <q>all or nothing</q>.
You cannot enforce finer-grained permissions such as at the column level,
or permissions on certain operations such as <codeph>INSERT</codeph>.
</p>
</li>
<li>
<p>
Only users with <codeph>ALL</codeph> privileges on <codeph>SERVER</codeph> can create external Kudu tables.
</p>
</li>
</ul>
Because non-SQL APIs can access Kudu data without going through Sentry
authorization, currently the Sentry support is considered preliminary.
</p>
</li>
<li>
<p rev="IMPALA-4571">
Equality and <codeph>IN</codeph> predicates in Impala queries are pushed to
Kudu and evaluated efficiently by the Kudu storage layer.
</p>
</li>
</ul>
</li>
<li>
<p rev="">
<b>Security:</b>
</p>
<ul>
<li>
<p>
Impala can take advantage of the S3 encrypted credential
store, to avoid exposing the secret key when accessing
data stored on S3.
</p>
</li>
</ul>
</li>
<li>
<p rev="IMPALA-1654">
[<xref keyref="IMPALA-1654">IMPALA-1654</xref>]
Several kinds of DDL operations
can now work on a range of partitions. The partitions can be specified
using operators such as <codeph>&lt;</codeph>, <codeph>&gt;=</codeph>, and
<codeph>!=</codeph> rather than just an equality predicate applying to a single
partition.
This new feature extends the syntax of several clauses
of the <codeph>ALTER TABLE</codeph> statement
(<codeph>DROP PARTITION</codeph>, <codeph>SET [UN]CACHED</codeph>,
<codeph>SET FILEFORMAT | SERDEPROPERTIES | TBLPROPERTIES</codeph>),
the <codeph>SHOW FILES</codeph> statement, and the
<codeph>COMPUTE INCREMENTAL STATS</codeph> statement.
It does not apply to statements that are defined to only apply to a single
partition, such as <codeph>LOAD DATA</codeph>, <codeph>ALTER TABLE ... ADD PARTITION</codeph>,
<codeph>SET LOCATION</codeph>, and <codeph>INSERT</codeph> with a static
partitioning clause.
</p>
</li>
<li>
<p rev="IMPALA-3973">
The <codeph>instr()</codeph> function has optional second and third arguments, representing
        the character position at which to begin searching for the substring, and the Nth occurrence
        of the substring to find.
</p>
</li>
<li>
<p rev="IMPALA-3441 IMPALA-4387">
Improved error handling for malformed Avro data. In particular, incorrect
precision or scale for <codeph>DECIMAL</codeph> types is now handled.
</p>
</li>
<li>
<p>
Impala debug web UI:
</p>
<ul>
<li>
<p rev="IMPALA-1169">
In addition to <q>inflight</q> and <q>finished</q> queries, the web UI
now also includes a section for <q>queued</q> queries.
</p>
</li>
<li>
<p rev="IMPALA-4048">
The <uicontrol>/sessions</uicontrol> tab now clarifies how many of the displayed
            sessions are active, and lets you sort by <uicontrol>Expired</uicontrol> status
to distinguish active sessions from expired ones.
</p>
</li>
</ul>
</li>
<li>
<p rev="IMPALA-4020">
Improved stability when DDL operations such as <codeph>CREATE DATABASE</codeph>
or <codeph>DROP DATABASE</codeph> are run in Hive at the same time as an Impala
<codeph>INVALIDATE METADATA</codeph> statement.
</p>
</li>
<li>
<p rev="IMPALA-1616">
The <q>out of memory</q> error report was made more user-friendly, with additional
diagnostic information to help identify the spot where the memory limit was exceeded.
</p>
</li>
<li>
<p rev="IMPALA-3983 IMPALA-3974">
Improved disk space usage for Java-based UDFs. Temporary copies of the associated JAR
files are removed when no longer needed, so that they do not accumulate across restarts
of the <cmdname>catalogd</cmdname> daemon and potentially cause an out-of-space condition.
These temporary files are also created in the directory specified by the <codeph>local_library_dir</codeph>
configuration setting, so that the storage for these temporary files can be independent
from any capacity limits on the <filepath>/tmp</filepath> filesystem.
</p>
</li>
</ul>
</conbody>
</concept>
<!-- All 2.7.x new features go under here -->
<concept rev="2.7.0" id="new_features_270">
<title>New Features in <keyword keyref="impala27_full"/></title>
<conbody>
<ul id="feature_list">
<li>
<p>
Performance improvements:
</p>
<ul>
<li>
<p rev="IMPALA-3206">
[<xref keyref="IMPALA-3206">IMPALA-3206</xref>]
Speedup for queries against <codeph>DECIMAL</codeph> columns in Avro tables.
The code that parses <codeph>DECIMAL</codeph> values from Avro now uses
native code generation.
</p>
</li>
<li>
<p rev="IMPALA-3674">
[<xref keyref="IMPALA-3674">IMPALA-3674</xref>]
Improved efficiency in LLVM code generation can reduce codegen time, especially
for short queries.
</p>
</li>
<!-- Not actually a new feature, it's more a tip about when to expect remote reads and how to minimize them. To go somewhere in the performance / best practices / Parquet info.
<li>
<p rev="IMPALA-3885">
[<xref keyref="IMPALA-3885">IMPALA-3885</xref>]
Parquet files with multiple blocks can now be processed
without remote reads.
</p>
</li>
-->
<li>
<p rev="IMPALA-2979">
[<xref keyref="IMPALA-2979">IMPALA-2979</xref>]
Improvements to scheduling on worker nodes,
enabled by the <codeph>REPLICA_PREFERENCE</codeph> query option.
See <xref
href="impala_replica_preference.xml#replica_preference"/> for details.
</p>
</li>
</ul>
</li>
<li audience="hidden">
<p rev="IMPALA-3210"><!-- Patch didn't make it into in <keyword keyref="impala27_full"/> -->
[<xref keyref="IMPALA-3210">IMPALA-3210</xref>]
The analytic functions <codeph>FIRST_VALUE()</codeph> and <codeph>LAST_VALUE()</codeph>
accept a new clause, <codeph>IGNORE NULLS</codeph>.
See <xref href="impala_analytic_functions.xml#first_value"/>
and <xref href="impala_analytic_functions.xml#last_value"/>
for details.
</p>
</li>
<li>
<p rev="IMPALA-1683">
[<xref keyref="IMPALA-1683">IMPALA-1683</xref>]
The <codeph>REFRESH</codeph> statement can be applied to a single partition,
rather than the entire table. See <xref href="impala_refresh.xml#refresh"/>
and <xref href="impala_partitioning.xml#partition_refresh"/> for details.
</p>
</li>
<li>
<p>
Improvements to the Impala web user interface:
</p>
<ul>
<li>
<p rev="IMPALA-2767">
[<xref keyref="IMPALA-2767">IMPALA-2767</xref>]
You can now force a session to expire by clicking a link in the web UI,
on the <uicontrol>/sessions</uicontrol> tab.
</p>
</li>
<li>
<p rev="IMPALA-3715">
[<xref keyref="IMPALA-3715">IMPALA-3715</xref>]
The <uicontrol>/memz</uicontrol> tab includes more information about
Impala memory usage.
</p>
</li>
<li>
<p rev="IMPALA-3716">
[<xref keyref="IMPALA-3716">IMPALA-3716</xref>]
The <uicontrol>Details</uicontrol> page for a query now includes
a <uicontrol>Memory</uicontrol> tab.
</p>
</li>
</ul>
</li>
<li>
<p rev="IMPALA-3499">
[<xref keyref="IMPALA-3499">IMPALA-3499</xref>]
Scalability improvements to the catalog server. Impala handles internal communication
more efficiently for tables with large numbers of columns and partitions, where the
size of the metadata exceeds 2 GiB.
</p>
</li>
<li>
<p rev="IMPALA-3677">
[<xref keyref="IMPALA-3677">IMPALA-3677</xref>]
You can send a <codeph>SIGUSR1</codeph> signal to any Impala-related daemon to write a
Breakpad minidump. For advanced troubleshooting, you can now produce a minidump
without triggering a crash. See <xref href="impala_breakpad.xml#breakpad"/> for
details about the Breakpad minidump feature.
</p>
</li>
<li>
<p rev="IMPALA-3687">
[<xref keyref="IMPALA-3687">IMPALA-3687</xref>]
The schema reconciliation rules for Avro tables have changed slightly
for <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> columns. Now, if
the definition of such a column is changed in the Avro schema file,
the column retains its <codeph>CHAR</codeph> or <codeph>VARCHAR</codeph>
type as specified in the SQL definition, but the column name and comment
from the Avro schema file take precedence.
See <xref href="impala_avro.xml#avro_create_table"/> for details about
column definitions in Avro tables.
</p>
</li>
<li>
<p rev="IMPALA-3575">
[<xref keyref="IMPALA-3575">IMPALA-3575</xref>]
Some network
        operations now have additional timeout and retry settings. The extra
        configuration helps avoid failed queries due to transient network
        problems, avoids hangs when a sender or receiver fails in the
        middle of a network transmission, and makes cancellation requests
        more reliable despite network issues.</p>
</li>
</ul>
</conbody>
</concept>
<!-- All 2.6.x new features go under here -->
<concept rev="2.6.0" id="new_features_260">
<title>New Features in <keyword keyref="impala26_full"/></title>
<conbody>
<ul>
<li>
<p>
Improvements to Impala support for the Amazon S3 filesystem:
</p>
<ul>
<li>
<p rev="IMPALA-1878">
Impala can now write to S3 tables through the <codeph>INSERT</codeph>
or <codeph>LOAD DATA</codeph> statements.
See <xref href="impala_s3.xml#s3"/> for general information about
using Impala with S3.
</p>
</li>
<li>
<p rev="IMPALA-3452">
A new query option, <codeph>S3_SKIP_INSERT_STAGING</codeph>, lets you
trade off between fast <codeph>INSERT</codeph> performance and
slower <codeph>INSERT</codeph>s that are more consistent if a
problem occurs during the statement. The new behavior is enabled by default.
See <xref href="impala_s3_skip_insert_staging.xml#s3_skip_insert_staging"/> for details
about this option.
</p>
</li>
</ul>
</li>
<li>
<p rev="">
Performance improvements for the runtime filtering feature:
</p>
<ul>
<li>
<p rev="IMPALA-3333">
The default for the <codeph>RUNTIME_FILTER_MODE</codeph>
query option is changed to <codeph>GLOBAL</codeph> (the highest setting).
See <xref href="impala_runtime_filter_mode.xml#runtime_filter_mode"/> for
details about this option.
</p>
</li>
<li rev="IMPALA-3007">
<p>
The <codeph>RUNTIME_BLOOM_FILTER_SIZE</codeph> setting is now only used
as a fallback if statistics are not available; otherwise, Impala
uses the statistics to estimate the appropriate size to use for each filter.
See <xref href="impala_runtime_bloom_filter_size.xml#runtime_bloom_filter_size"/> for
details about this option.
</p>
</li>
<li rev="IMPALA-3480">
<p>
New query options <codeph>RUNTIME_FILTER_MIN_SIZE</codeph> and
<codeph>RUNTIME_FILTER_MAX_SIZE</codeph> let you fine-tune
the sizes of the Bloom filter structures used for runtime filtering.
If the filter size derived from Impala internal estimates or from
          the <codeph>RUNTIME_BLOOM_FILTER_SIZE</codeph> setting falls outside the size
range specified by these options, any too-small filter size is adjusted
to the minimum, and any too-large filter size is adjusted to the maximum.
See <xref href="impala_runtime_filter_min_size.xml#runtime_filter_min_size"/>
and <xref href="impala_runtime_filter_max_size.xml#runtime_filter_max_size"/>
for details about these options.
</p>
</li>
<li rev="IMPALA-2956">
<p>
Runtime filter propagation now applies to all the
operands of <codeph>UNION</codeph> and <codeph>UNION ALL</codeph>
operators.
</p>
</li>
<li rev="IMPALA-3077">
<p>
Runtime filters can now be produced during join queries even
when the join processing activates the spill-to-disk mechanism.
</p>
</li>
</ul>
See <xref href="impala_runtime_filtering.xml#runtime_filtering"/> for
general information about the runtime filtering feature.
</li>
<!-- Have to look closer at resource management / admission control to see if
there are any ripple effects from this default change. -->
<li>
<p rev="IMPALA-3199">
Admission control and dynamic resource pools are enabled by default.
See <xref href="impala_admission.xml#admission_control"/> for details
about admission control.
</p>
</li>
<!-- Below here are features that are pretty well taken care of already;
some of them didn't need much if any doc in the first place. -->
<li>
<p rev="IMPALA-3369">
Impala can now manually set column statistics,
using the <codeph>ALTER TABLE</codeph> statement with a
<codeph>SET COLUMN STATS</codeph> clause.
See <xref href="impala_perf_stats.xml#perf_column_stats_manual"/> for details.
</p>
</li>
<li>
<p rev="IMPALA-3490 IMPALA-3581 IMPALA-2686">
Impala can now write lightweight <q>minidump</q> files, rather
than large core files, to save diagnostic information when
any of the Impala-related daemons crash. This feature uses the
open source <codeph>breakpad</codeph> framework.
See <xref href="impala_breakpad.xml#breakpad"/> for details.
</p>
</li>
<li>
<p>
New query options improve interoperability with Parquet files:
<ul>
<li>
<p rev="IMPALA-2835">
The <codeph>PARQUET_FALLBACK_SCHEMA_RESOLUTION</codeph> query option
lets Impala locate columns within Parquet files based on
column name rather than ordinal position.
This enhancement improves interoperability with applications
that write Parquet files with a different order or subset of
columns than are used in the Impala table.
See <xref href="impala_parquet_fallback_schema_resolution.xml#parquet_fallback_schema_resolution"/>
for details.
</p>
</li>
<li>
<p rev="IMPALA-2069">
The <codeph>PARQUET_ANNOTATE_STRINGS_UTF8</codeph> query option
makes Impala include the <codeph>UTF-8</codeph> annotation
metadata for <codeph>STRING</codeph>, <codeph>CHAR</codeph>,
and <codeph>VARCHAR</codeph> columns in Parquet files created
by <codeph>INSERT</codeph> or <codeph>CREATE TABLE AS SELECT</codeph>
statements.
See <xref href="impala_parquet_annotate_strings_utf8.xml#parquet_annotate_strings_utf8"/>
for details.
</p>
</li>
</ul>
See <xref href="impala_parquet.xml#parquet"/> for general information about working
with Parquet files.
</p>
</li>
<li>
<p>
Improvements to security and reduction in overhead for secure clusters:
</p>
<ul>
<li>
<p rev="IMPALA-1928">
Overall performance improvements for secure clusters.
            (TPC-H queries on a secure cluster were benchmarked
            at roughly 3 times the speed of the previous release.)
</p>
</li>
<li>
<p rev="IMPALA-2660">
Impala now recognizes the <codeph>auth_to_local</codeph> setting,
specified through the HDFS configuration setting
<codeph>hadoop.security.auth_to_local</codeph>.
This feature is disabled by default; to enable it,
specify <codeph>--load_auth_to_local_rules=true</codeph>
in the <cmdname>impalad</cmdname> configuration settings.
See <xref href="impala_kerberos.xml#auth_to_local"/> for details.
</p>
</li>
<li>
<p rev="IMPALA-2599">
Timing improvements in the mechanism for the <cmdname>impalad</cmdname>
daemon to acquire Kerberos tickets. This feature spreads out the overhead
on the KDC during Impala startup, especially for large clusters.
</p>
</li>
<li>
<p rev="IMPALA-3554">
For Kerberized clusters, the Catalog service now uses
            the Kerberos principal instead of the operating system user that runs
the <cmdname>catalogd</cmdname> daemon.
This eliminates the requirement to configure a <codeph>hadoop.user.group.static.mapping.overrides</codeph>
setting to put the OS user into the Sentry administrative group, on clusters where the principal
and the OS user name for this user are different.
</p>
</li>
</ul>
</li>
<li>
<p rev="IMPALA-3286">
Overall performance improvements for join queries, by using a prefetching mechanism
while building the in-memory hash table to evaluate join predicates.
See <xref href="impala_prefetch_mode.xml#prefetch_mode"/> for the query option
to control this optimization.
</p>
</li>
<li>
<p rev="IMPALA-3397">
The <cmdname>impala-shell</cmdname> interpreter has a new command,
<codeph>SOURCE</codeph>, that lets you run a set of SQL statements
or other <cmdname>impala-shell</cmdname> commands stored in a file.
You can run additional <codeph>SOURCE</codeph> commands from inside
a file, to set up flexible sequences of statements for use cases
such as schema setup, ETL, or reporting.
See <xref href="impala_shell_commands.xml#shell_commands"/> for details
and <xref href="impala_shell_running_commands.xml#shell_running_commands"/>
for examples.
</p>
</li>
<li>
<p rev="IMPALA-1772">
The <codeph>millisecond()</codeph> built-in function lets you extract
the fractional seconds part of a <codeph>TIMESTAMP</codeph> value.
See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details.
</p>
</li>
<li>
<p rev="IMPALA-3092">
If an Avro table is created without column definitions in the
<codeph>CREATE TABLE</codeph> statement, and columns are later
added through <codeph>ALTER TABLE</codeph>, the resulting
table is now queryable. Missing values from the newly added
columns now default to <codeph>NULL</codeph>.
See <xref href="impala_avro.xml#avro"/> for general details about
working with Avro files.
</p>
</li>
<li>
<p>
The mechanism for interpreting <codeph>DECIMAL</codeph> literals is
improved, no longer going through an intermediate conversion step
to <codeph>DOUBLE</codeph>:
<ul>
<li>
<p rev="IMPALA-3163">
              Casting a <codeph>DECIMAL</codeph> value to <codeph>TIMESTAMP</codeph>
              produces a more precise value for the <codeph>TIMESTAMP</codeph>
              than formerly.
</p>
</li>
<li>
<p rev="IMPALA-3439">
Certain function calls involving <codeph>DECIMAL</codeph> literals
now succeed, when formerly they failed due to lack of a function
signature with a <codeph>DOUBLE</codeph> argument.
</p>
</li>
<li>
<p rev="">
Faster runtime performance for <codeph>DECIMAL</codeph> constant
values, through improved native code generation for all combinations
of precision and scale.
</p>
</li>
</ul>
See <xref href="impala_decimal.xml#decimal"/> for details about the <codeph>DECIMAL</codeph> type.
</p>
</li>
<li>
<p rev="IMPALA-3155">
Improved type accuracy for <codeph>CASE</codeph> return values.
If all <codeph>WHEN</codeph> clauses of the <codeph>CASE</codeph>
expression are of <codeph>CHAR</codeph> type, the final result
is also <codeph>CHAR</codeph> instead of being converted to
<codeph>STRING</codeph>.
See <xref href="impala_conditional_functions.xml#conditional_functions"/>
for details about the <codeph>CASE</codeph> function.
</p>
</li>
<li>
<p rev="IMPALA-3232">
Uncorrelated queries using the <codeph>NOT EXISTS</codeph> operator
are now supported. Formerly, the <codeph>NOT EXISTS</codeph>
operator was only available for correlated subqueries.
</p>
</li>
<li>
<p rev="IMPALA-2736">
Improved performance for reading Parquet files.
</p>
</li>
<li>
<p rev="IMPALA-3375">
Improved performance for <term>top-N</term> queries, that is,
those including both <codeph>ORDER BY</codeph> and
<codeph>LIMIT</codeph> clauses.
</p>
</li>
<!-- JIRA still in open state as of 5.8 / 2.6, commenting out.
<li>
<p rev="IMPALA-3471">
A top-N query can now also activate the spill-to-disk mechanism if
a host runs low on memory while evaluating it. For example, using
large <codeph>LIMIT</codeph> and/or <codeph>OFFSET</codeph> clauses
adds some memory overhead that could cause spilling.
</p>
</li>
-->
<li>
<p rev="IMPALA-1740">
Impala optionally skips an arbitrary number of header lines from text input
files on HDFS based on the <codeph>skip.header.line.count</codeph> value
in the <codeph>TBLPROPERTIES</codeph> field of the table metadata.
See <xref href="impala_txtfile.xml#text_data_files"/> for details.
</p>
</li>
<li>
<p rev="IMPALA-2336">
Trailing comments are now allowed in queries processed by
the <cmdname>impala-shell</cmdname> options <codeph>-q</codeph>
and <codeph>-f</codeph>.
</p>
</li>
<li>
<p rev="IMPALA-2844">
Impala can run <codeph>COUNT</codeph> queries for RCFile tables
that include complex type columns.
See <xref href="impala_complex_types.xml#complex_types"/> for
general information about working with complex types,
and <xref href="impala_array.xml#array"/>,
<xref href="impala_map.xml#map"/>, and <xref href="impala_struct.xml#struct"/>
for syntax details of each type.
</p>
</li>
</ul>
</conbody>
</concept>
<!-- All 2.5.x new features go under here -->
<concept rev="2.5.0" id="new_features_250">
<title>New Features in <keyword keyref="impala25_full"/></title>
<conbody>
<ul>
<li><!-- Spec: https://docs.google.com/document/d/1ambtYJ1t05iITCVIrN6N1A-e7PZBSetBPgjy8SLzJrA/edit#heading=h.vcftzwlpn845 -->
<p rev="IMPALA-2552 IMPALA-3054">
Dynamic partition pruning. When a query refers to a partition key column in a <codeph>WHERE</codeph>
        clause, and the exact set of column values is not known until the query is executed,
Impala evaluates the predicate and skips the I/O for entire partitions that are not needed.
For example, if a table was partitioned by year, Impala would apply this technique to a query
such as <codeph>SELECT c1 FROM partitioned_table WHERE year = (SELECT MAX(year) FROM other_table)</codeph>.
<ph audience="standalone">See <xref href="impala_partitioning.xml#dynamic_partition_pruning"/> for details.</ph>
</p>
<p>
The dynamic partition pruning optimization technique lets Impala avoid reading
data files from partitions that are not part of the result set, even when
that determination cannot be made in advance. This technique is especially valuable
when performing join queries involving partitioned tables. For example, if a join
query includes an <codeph>ON</codeph> clause and a <codeph>WHERE</codeph> clause
that refer to the same columns, the query can find the set of column values that
match the <codeph>WHERE</codeph> clause, and only scan the associated partitions
when evaluating the <codeph>ON</codeph> clause.
</p>
<p>
Dynamic partition pruning is controlled by the same settings as the runtime filtering feature.
By default, this feature is enabled at a medium level, because the maximum setting can use
slightly more memory for queries than in previous releases.
To fully enable this feature, set the query option <codeph>RUNTIME_FILTER_MODE=GLOBAL</codeph>.
</p>
</li>
<li><!-- Spec: https://docs.google.com/document/d/1ambtYJ1t05iITCVIrN6N1A-e7PZBSetBPgjy8SLzJrA/edit#heading=h.vcftzwlpn845 -->
<p rev="IMPALA-2419 IMPALA-3001 IMPALA-3008 IMPALA-3039 IMPALA-3046 IMPALA-3054">
Runtime filtering. This is a wide-ranging set of optimizations that are especially valuable for join queries.
Using the same technique as with dynamic partition pruning,
Impala uses the predicates from <codeph>WHERE</codeph> and <codeph>ON</codeph> clauses
        to determine the subset of column values from one of the joined tables that could possibly be part of the
result set. Impala sends a compact representation of the filter condition to the hosts in the cluster,
instead of the full set of values or the entire table.
<ph audience="PDF">See <xref href="impala_runtime_filtering.xml#runtime_filtering"/> for details.</ph>
</p>
<p>
By default, this feature is enabled at a medium level, because the maximum setting can use
slightly more memory for queries than in previous releases.
To fully enable this feature, set the query option <codeph>RUNTIME_FILTER_MODE=GLOBAL</codeph>.
<ph audience="PDF">See <xref href="impala_runtime_filter_mode.xml#runtime_filter_mode"/> for details.</ph>
</p>
<p>
This feature involves some new query options:
<xref audience="standalone" href="impala_runtime_filter_mode.xml">RUNTIME_FILTER_MODE</xref><codeph audience="integrated">RUNTIME_FILTER_MODE</codeph>,
<xref audience="standalone" href="impala_max_num_runtime_filters.xml">MAX_NUM_RUNTIME_FILTERS</xref><codeph audience="integrated">MAX_NUM_RUNTIME_FILTERS</codeph>,
<xref audience="standalone" href="impala_runtime_bloom_filter_size.xml">RUNTIME_BLOOM_FILTER_SIZE</xref><codeph audience="integrated">RUNTIME_BLOOM_FILTER_SIZE</codeph>,
<xref audience="standalone" href="impala_runtime_filter_wait_time_ms.xml">RUNTIME_FILTER_WAIT_TIME_MS</xref><codeph audience="integrated">RUNTIME_FILTER_WAIT_TIME_MS</codeph>,
and <xref audience="standalone" href="impala_disable_row_runtime_filtering.xml">DISABLE_ROW_RUNTIME_FILTERING</xref><codeph audience="integrated">DISABLE_ROW_RUNTIME_FILTERING</codeph>.
<ph audience="PDF">See
<xref href="impala_runtime_filter_mode.xml#runtime_filter_mode">RUNTIME_FILTER_MODE</xref>,
<xref href="impala_max_num_runtime_filters.xml#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS</xref>,
<xref href="impala_runtime_bloom_filter_size.xml#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE</xref>,
<xref href="impala_runtime_filter_wait_time_ms.xml#runtime_filter_wait_time_ms">RUNTIME_FILTER_WAIT_TIME_MS</xref>, and
<xref href="impala_disable_row_runtime_filtering.xml#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING</xref>
for details.
</ph>
</p>
</li>
<li>
<p rev="IMPALA-2696">
More efficient use of the HDFS caching feature, to avoid
hotspots and bottlenecks that could occur if heavily used
cached data blocks were always processed by the same host.
By default, Impala now randomizes which host processes each cached
HDFS data block, when cached replicas are available on multiple hosts.
(Remember to use the <codeph>WITH REPLICATION</codeph> clause with the
<codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statement
when enabling HDFS caching for a table or partition, to cache the same
data blocks across multiple hosts.)
The new query option <codeph>SCHEDULE_RANDOM_REPLICA</codeph>
<!-- and <codeph>REPLICA_PREFERENCE</codeph> -->
lets you fine-tune the interaction with HDFS caching even more.
<ph audience="PDF">See <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/> for details.</ph>
</p>
</li>
<li>
<p rev="IMPALA-2641">
The <codeph>TRUNCATE TABLE</codeph> statement now accepts an <codeph>IF EXISTS</codeph>
clause, making <codeph>TRUNCATE TABLE</codeph> easier to use in setup or ETL scripts where the table might or
might not exist.
<ph audience="PDF">See <xref href="impala_truncate_table.xml#truncate_table"/> for details.</ph>
</p>
</li>
<li>
<p rev="IMPALA-2681 IMPALA-2688 IMPALA-2749">
Improved performance and reliability for the <codeph>DECIMAL</codeph> data type:
<ul>
<li>
<p rev="IMPALA-2681">
Using <codeph>DECIMAL</codeph> values in a <codeph>GROUP BY</codeph> clause now
triggers the native code generation optimization, speeding up queries that
group by values such as prices.
</p>
</li>
<li>
<p rev="IMPALA-2688">
Checking for overflow in <codeph>DECIMAL</codeph>
multiplication is now substantially faster, making <codeph>DECIMAL</codeph>
a more practical data type in some use cases where formerly <codeph>DECIMAL</codeph>
was much slower than <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph>.
</p>
</li>
<li>
<p rev="IMPALA-2749">
Multiplying a mixture of <codeph>DECIMAL</codeph>
and <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph> values now returns the
<codeph>DOUBLE</codeph> rather than <codeph>DECIMAL</codeph>. This change avoids
some cases where an intermediate value would underflow or overflow and become
<codeph>NULL</codeph> unexpectedly.
</p>
</li>
</ul>
<ph audience="PDF">See <xref href="impala_decimal.xml"/> for details.</ph>
</p>
</li>
<li>
<p rev="IMPALA-2382">
For UDFs written in Java, or Hive UDFs reused for Impala,
Impala now allows parameters and return values to be primitive types.
        Formerly, parameters and return values were required to be one of the <q>Writable</q>
        object types.
<ph audience="PDF">See <xref href="impala_udf.xml#udfs_hive"/> for details.</ph>
</p>
</li>
<li>
<p rev="IMPALA-1588"><!-- This is from 2015, so perhaps it's really in an earlier release. -->
Performance improvements for HDFS I/O. Impala now caches HDFS file handles to avoid the
overhead of repeatedly opening the same file.
</p>
</li>
<!-- Kudu didn't make it into 2.5 / 5.7 release, so no DELETE or UPDATE statement. -->
<li>
<p><!-- Is there a JIRA for that one? Alex? -->
Performance improvements for queries involving nested complex types.
Certain basic query types, such as counting the elements of a complex column,
now use an optimized code path.
</p>
</li>
<li>
<p rev="IMPALA-3044 IMPALA-2538 IMPALA-1168">
Improvements to the memory reservation mechanism for the Impala
admission control feature. You can specify more settings, such
as the timeout period and maximum aggregate memory used, for each
resource pool instead of globally for the Impala instance. The
default limit for concurrent queries (the <uicontrol>max requests</uicontrol>
setting) is now unlimited instead of 200.
</p>
</li>
<li>
<p rev="IMPALA-1755">
Performance improvements related to code generation.
Even in queries where code generation is not performed
for some phases of execution (such as reading data from
Parquet tables), Impala can still use code generation in
other parts of the query, such as evaluating
functions in the <codeph>WHERE</codeph> clause.
</p>
</li>
<li>
<p rev="IMPALA-1305">
Performance improvements for queries using aggregation functions
on high-cardinality columns.
            Formerly, Impala could do unnecessary extra work to produce intermediate
            results for operations such as <codeph>DISTINCT</codeph> or <codeph>GROUP BY</codeph>
            on columns that were unique or had few duplicate values.
            Now, Impala decides at run time whether it is more efficient to
            do an initial aggregation phase and pass along a smaller set of intermediate data,
            or to pass raw intermediate data back to the next phase of query processing to be aggregated there.
            This feature is known as <term>streaming pre-aggregation</term>.
            If a performance regression occurs, this feature can be turned off
            using the <codeph>DISABLE_STREAMING_PREAGGREGATIONS</codeph> query option.
<ph audience="PDF">See <xref href="impala_disable_streaming_preaggregations.xml#disable_streaming_preaggregations"/> for details.</ph>
</p>
</li>
<li>
<p>
            The spill-to-disk feature is now always enabled. In earlier releases, the spill-to-disk feature
            could be turned off using a pair of configuration settings,
            <codeph>enable_partitioned_aggregation=false</codeph> and
            <codeph>enable_partitioned_hash_join=false</codeph>.
            The latest improvements in the spill-to-disk mechanism, and related features that
            interact with it, make this feature robust enough that disabling it is
            no longer needed or supported. In particular, some new features in <keyword keyref="impala25_full"/>
            and higher do not work when the spill-to-disk feature is disabled.
</p>
</li>
<li>
<p rev="IMPALA-1067">
Improvements to scripting capability for the <cmdname>impala-shell</cmdname> command,
through user-specified substitution variables that can appear in statements processed
by <cmdname>impala-shell</cmdname>:
</p>
<ul>
<li rev="IMPALA-2179">
<p>
The <codeph>--var</codeph> command-line option lets you pass key-value pairs to
<cmdname>impala-shell</cmdname>. The shell can substitute the values
into queries before executing them, where the query text contains the notation
<codeph>${var:<varname>varname</varname>}</codeph>. For example, you might prepare a SQL file
containing a set of DDL statements and queries containing variables for
database and table names, and then pass the applicable names as part of the
<codeph>impala-shell -f <varname>filename</varname></codeph> command.
<ph audience="PDF">See <xref href="impala_shell_running_commands.xml#shell_running_commands"/> for details.</ph>
</p>
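          <p>
            As a sketch of how this might look (the file, database, and table names are hypothetical),
            a script run with <codeph>impala-shell --var=db_name=sales_db -f setup.sql</codeph> could
            contain statements such as:
          </p>
          <codeblock>USE ${var:db_name};
CREATE TABLE IF NOT EXISTS daily_totals (day STRING, total BIGINT);</codeblock>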
</li>
<li rev="IMPALA-2180">
<p>
The <codeph>SET</codeph> and <codeph>UNSET</codeph> commands within the
<cmdname>impala-shell</cmdname> interpreter now work with user-specified
substitution variables, as well as the built-in query options.
            The two kinds of variables are listed separately in the <codeph>SET</codeph> output.
As with variables defined by the <codeph>--var</codeph> command-line option,
you refer to the user-specified substitution variables in queries by using
the notation <codeph>${var:<varname>varname</varname>}</codeph>
in the query text. Because the substitution variables are processed by
<cmdname>impala-shell</cmdname> instead of the <cmdname>impalad</cmdname>
backend, you cannot define your own substitution variables through the
<codeph>SET</codeph> statement in a JDBC or ODBC application.
<ph audience="PDF">See <xref href="impala_set.xml#set"/> for details.</ph>
</p>
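          <p>
            A brief interactive sketch, assuming the <codeph>VAR:</codeph> prefix form of
            <codeph>SET</codeph> for defining substitution variables (the database and table names
            are illustrative):
          </p>
          <codeblock>SET VAR:db_name=sales_db;
SELECT COUNT(*) FROM ${var:db_name}.orders;</codeblock>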
</li>
</ul>
</li>
<li>
<p rev="IMPALA-1599">
Performance improvements for query startup. Impala better parallelizes certain work
when coordinating plan distribution between <cmdname>impalad</cmdname> instances, which improves
startup time for queries involving tables with many partitions on large clusters,
or complicated queries with many plan fragments.
</p>
</li>
<li>
<p rev="IMPALA-2560">
Performance and scalability improvements for tables with many partitions.
The memory requirements on the coordinator node are reduced, making it substantially
faster and less resource-intensive
to do joins involving several tables with thousands of partitions each.
</p>
</li>
<li>
<p rev="IMPALA-3095">
Whitelisting for access to internal APIs. For applications that need direct access
to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can
specify a list of Kerberos users who are allowed to call those APIs. By default, the
<codeph>impala</codeph> and <codeph>hdfs</codeph> users are the only ones authorized
for this kind of access.
Any users not explicitly authorized through the <codeph>internal_principals_whitelist</codeph>
configuration setting are blocked from accessing the APIs. This setting applies to all the
Impala-related daemons, although currently it is primarily used for HDFS to control the
behavior of the catalog server.
</p>
</li>
<li>
<p rev="">
Improvements to Impala integration and usability for Hue. (The code changes
are actually on the Hue side.)
</p>
<ul>
<li>
<p rev="">
The list of tables now refreshes dynamically.
</p>
</li>
</ul>
</li>
<li>
<p rev="IMPALA-1787">
Usability improvements for case-insensitive queries.
You can now use the operators <codeph>ILIKE</codeph> and <codeph>IREGEXP</codeph>
to perform case-insensitive wildcard matches or regular expression matches,
rather than explicitly converting column values with <codeph>UPPER</codeph>
or <codeph>LOWER</codeph>.
<ph audience="PDF">See <xref href="impala_operators.xml#ilike"/> and <xref href="impala_operators.xml#iregexp"/> for details.</ph>
</p>
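          <p>
            A minimal sketch (table and column names are illustrative):
          </p>
          <codeblock>-- Case-insensitive wildcard and regular expression matches,
-- without wrapping the column in UPPER() or LOWER().
SELECT name FROM customers WHERE name ILIKE 'smith%';
SELECT name FROM customers WHERE name IREGEXP '^smi';</codeblock>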
</li>
<li>
<p rev="IMPALA-1480">
Performance and reliability improvements for DDL and insert operations on partitioned tables with a large
number of partitions. Impala only re-evaluates metadata for partitions that are affected by
a DDL operation, not all partitions in the table. While a DDL or insert statement is in progress,
other Impala statements that attempt to modify metadata for the same table wait until the first one
finishes.
</p>
</li>
<li>
<p rev="IMPALA-2867">
Reliability improvements for the <codeph>LOAD DATA</codeph> statement.
Previously, this statement would fail if the source HDFS directory
contained any subdirectories at all. Now, the statement ignores
any hidden subdirectories, for example <filepath>_impala_insert_staging</filepath>.
</p>
</li>
<li>
<p rev="IMPALA-2147">
A new operator, <codeph>IS [NOT] DISTINCT FROM</codeph>, lets you compare values
and always get a <codeph>true</codeph> or <codeph>false</codeph> result,
even if one or both of the values are <codeph>NULL</codeph>.
The <codeph>IS NOT DISTINCT FROM</codeph> operator, or its equivalent
<codeph>&lt;=&gt;</codeph> notation, improves the efficiency of join queries that
treat key values that are <codeph>NULL</codeph> in both tables as equal.
<ph audience="PDF">See <xref href="impala_operators.xml#is_distinct_from"/> for details.</ph>
</p>
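          <p>
            A minimal sketch of a NULL-safe join condition (table and column names are illustrative):
          </p>
          <codeblock>-- Rows where cust_id is NULL on both sides are treated as matching.
SELECT t1.id, t2.id
FROM t1 JOIN t2
  ON t1.cust_id IS NOT DISTINCT FROM t2.cust_id;</codeblock>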
</li>
<li>
<p rev="IMPALA-1934">
Security enhancements for the <cmdname>impala-shell</cmdname> command.
A new option, <codeph>--ldap_password_cmd</codeph>, lets you specify
a command to retrieve the LDAP password. The resulting password is
then used to authenticate the <cmdname>impala-shell</cmdname> command
with the LDAP server.
<ph audience="PDF">See <xref href="impala_shell_options.xml"/> for details.</ph>
</p>
</li>
<li>
<p>
The <codeph>CREATE TABLE AS SELECT</codeph> statement now accepts a
<codeph>PARTITIONED BY</codeph> clause, which lets you create a
partitioned table and insert data into it with a single statement.
<ph audience="PDF">See <xref href="impala_create_table.xml#create_table"/> for details.</ph>
</p>
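          <p>
            For example (names are illustrative; the partition key column goes last in the select list):
          </p>
          <codeblock>CREATE TABLE sales_by_year PARTITIONED BY (year)
  AS SELECT id, amount, year FROM raw_sales;</codeblock>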
</li>
<li>
<p rev="IMPALA-1748">
User-defined functions (UDFs and UDAFs) written in C++ now persist automatically
when the <cmdname>catalogd</cmdname> daemon is restarted. You no longer
have to run the <codeph>CREATE FUNCTION</codeph> statements again after a restart.
</p>
</li>
<li>
<p rev="IMPALA-2843">
User-defined functions (UDFs) written in Java can now persist
when the <cmdname>catalogd</cmdname> daemon is restarted, and can be shared
            transparently between Impala and Hive. You must do a one-time operation to recreate these
            UDFs using the new <codeph>CREATE FUNCTION</codeph> syntax, without a signature for arguments
            or the return value. Afterwards, you no longer have to run the <codeph>CREATE FUNCTION</codeph>
statements again after a restart.
Although Impala does not have visibility into the UDFs that implement the
Hive built-in functions, user-created Hive UDFs are now automatically available
for calling through Impala.
<ph audience="PDF">See <xref href="impala_create_function.xml#create_function"/> for details.</ph>
</p>
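          <p>
            A sketch of the signature-less syntax (the JAR path and class name are hypothetical):
          </p>
          <codeblock>CREATE FUNCTION my_lower
  LOCATION '/user/impala/udfs/my-hive-udfs.jar'
  SYMBOL='com.example.hive.udf.MyLower';</codeblock>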
</li>
<li>
<!-- Listed as fixed in 2.6.0. Is this item inappropriate or did it actually come from a different JIRA? -->
<p rev="IMPALA-2728">
Reliability enhancements for memory management. Some aggregation and join queries
that formerly might have failed with an out-of-memory error due to memory contention,
now can succeed using the spill-to-disk mechanism.
</p>
</li>
<li>
<!-- Same blurb is under Incompatible Changes. Turn into a conref. -->
<p rev="IMPALA-2070">
The <codeph>SHOW DATABASES</codeph> statement now returns two columns rather than one.
The second column includes the associated comment string, if any, for each database.
Adjust any application code that examines the list of databases and assumes the
result set contains only a single column.
<ph audience="PDF">See <xref href="impala_show.xml#show_databases"/> for details.</ph>
</p>
</li>
<li>
<p rev="IMPALA-2499">
A new optimization speeds up aggregation operations that involve only the partition key
columns of partitioned tables. For example, a query such as <codeph>SELECT COUNT(DISTINCT k), MIN(k), MAX(k) FROM t1</codeph>
can avoid reading any data files if <codeph>T1</codeph> is a partitioned table and <codeph>K</codeph>
is one of the partition key columns. Because this technique can produce different results in cases
where HDFS files in a partition are manually deleted or are empty, you must enable the optimization
by setting the query option <codeph>OPTIMIZE_PARTITION_KEY_SCANS</codeph>.
<ph audience="PDF">See <xref href="impala_optimize_partition_key_scans.xml"/> for details.</ph>
</p>
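          <p>
            For example (table and column names are illustrative):
          </p>
          <codeblock>SET OPTIMIZE_PARTITION_KEY_SCANS=1;
SELECT COUNT(DISTINCT year), MIN(year), MAX(year) FROM sales_data;</codeblock>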
</li>
<li audience="hidden"><!-- All the other undocumented query options are not really new features for this release, so hiding this whole bullet. -->
<p>
Other new query options:
</p>
<ul>
<li audience="hidden"><!-- Actually from a long way back, just never documented. Not sure if appropriate to keep internal-only or expose. -->
<codeph>DISABLE_OUTERMOST_TOPN</codeph>
</li>
<li audience="hidden"><!-- Actually from a long way back, just never documented. Not sure if appropriate to keep internal-only or expose. -->
<codeph>RM_INITIAL_MEM</codeph>
</li>
<li audience="hidden"><!-- Seems to be related to writing sequence files, a capability not externalized at this time. -->
<codeph>SEQ_COMPRESSION_MODE</codeph>
</li>
<li audience="hidden"><!-- Actually, was only used for working around one JIRA. Being deprecated now in Impala 2.3 via IMPALA-2963. -->
<codeph>DISABLE_CACHED_READS</codeph>
</li>
</ul>
</li>
<li>
<p rev="IMPALA-2196">
The <codeph>DESCRIBE</codeph> statement can now display metadata about a database, using the
syntax <codeph>DESCRIBE DATABASE <varname>db_name</varname></codeph>.
<ph audience="PDF">See <xref href="impala_describe.xml#describe"/> for details.</ph>
</p>
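          <p>
            For example (the database name is illustrative):
          </p>
          <codeblock>DESCRIBE DATABASE sales_db;</codeblock>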
</li>
<li>
<p rev="IMPALA-1477">
The <codeph>uuid()</codeph> built-in function generates an
alphanumeric value that you can use as a guaranteed unique identifier.
The uniqueness applies even across tables, for cases where an ascending
numeric sequence is not suitable.
<ph audience="PDF">See <xref href="impala_misc_functions.xml#misc_functions"/> for details.</ph>
</p>
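          <p>
            For example (the column alias is arbitrary):
          </p>
          <codeblock>SELECT uuid() AS session_id;</codeblock>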
</li>
</ul>
</conbody>
</concept>
<!-- All 2.4.x new features go under here -->
<concept rev="2.4.0" id="new_features_240">
<title>New Features in <keyword keyref="impala24_full"/></title>
<conbody>
<ul>
<li>
<p>
Impala can be used on the DSSD D5 Storage Appliance.
From a user perspective, the Impala features are the same as in <keyword keyref="impala23_full"/>.
</p>
</li>
</ul>
</conbody>
</concept>
<!-- All 2.3.x subsections go under here -->
<!-- Actually for 2.3 / 5.5, let's get away from doing a separate subhead for each maintenance release,
because in the normal course of events there will be nothing to add here until 5.6. If something new
needs to get noted, just add a new bullet with wording to indicate which 5.5.x release it applies to. -->
<concept rev="2.3.0" id="new_features_230">
<title>New Features in <keyword keyref="impala23_full"/></title>
<conbody>
<p>
The following are the major new features in Impala 2.3.x. This major release
contains improvements to SQL syntax (particularly new support for complex types), performance,
        manageability, and security.
</p>
<ul>
<li>
<p>
Complex data types: <codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, and <codeph>MAP</codeph>. These
types can encode multiple named fields, positional items, or key-value pairs within a single column.
You can combine these types to produce nested types with arbitrarily deep nesting,
such as an <codeph>ARRAY</codeph> of <codeph>STRUCT</codeph> values,
a <codeph>MAP</codeph> where each key-value pair is an <codeph>ARRAY</codeph> of other <codeph>MAP</codeph> values,
and so on. Currently, complex data types are only supported for the Parquet file format.
<ph audience="PDF">See <xref href="impala_complex_types.xml#complex_types"/> for usage details and <xref href="impala_array.xml#array"/>, <xref href="impala_struct.xml#struct"/>, and <xref href="impala_map.xml#map"/> for syntax.</ph>
</p>
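          <p>
            A minimal sketch of a Parquet table that combines the three types, and a query that
            expands an <codeph>ARRAY</codeph> column (names are illustrative):
          </p>
          <codeblock>CREATE TABLE contacts (
  id BIGINT,
  phones ARRAY&lt;STRING&gt;,
  address STRUCT&lt;street: STRING, city: STRING&gt;,
  properties MAP&lt;STRING, STRING&gt;
) STORED AS PARQUET;

SELECT c.id, ph.item FROM contacts c, c.phones ph;</codeblock>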
</li>
<li rev="collevelauth">
<p>
Column-level authorization lets you define access to particular columns within a table,
rather than the entire table. This feature lets you reduce the reliance on creating views to
set up authorization schemes for subsets of information.
See <xref keyref="sg_hive_sql"/> for background details, and
<xref href="impala_grant.xml#grant"/> and <xref href="impala_revoke.xml#revoke"/> for Impala-specific syntax.
</p>
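          <p>
            As a sketch of the column-level syntax (the role, table, and column names are hypothetical):
          </p>
          <codeblock>GRANT SELECT(customer_name, city) ON TABLE sales.customers TO ROLE support_role;</codeblock>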
</li>
<li rev="IMPALA-1139">
<p>
The <codeph>TRUNCATE TABLE</codeph> statement removes all the data from a table without removing the table itself.
<ph audience="PDF">See <xref href="impala_truncate_table.xml#truncate_table"/> for details.</ph>
</p>
</li>
<li id="IMPALA-2015">
<p>
Nested loop join queries. Some join queries that formerly required equality comparisons can now use
operators such as <codeph>&lt;</codeph> or <codeph>&gt;=</codeph>. This same join mechanism is used
internally to optimize queries that retrieve values from complex type columns.
<ph audience="PDF">See <xref href="impala_joins.xml#joins"/> for details about Impala join queries.</ph>
</p>
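          <p>
            A minimal sketch of a join with non-equality conditions (names are illustrative):
          </p>
          <codeblock>SELECT e.id, w.window_id
FROM events e JOIN time_windows w
  ON e.event_ts &gt;= w.start_ts AND e.event_ts &lt; w.end_ts;</codeblock>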
</li>
<li>
<p>
Reduced memory usage and improved performance and robustness for spill-to-disk feature.
<ph audience="PDF">See <xref href="impala_scalability.xml#spill_to_disk"/> for details about this feature.</ph>
</p>
</li>
<li rev="IMPALA-1881">
<p>
Performance improvements for querying Parquet data files containing multiple row groups
and multiple data blocks:
</p>
<ul>
<li>
<p> For files written by Hive, SparkSQL, and other Parquet MR writers
and spanning multiple HDFS blocks, Impala now scans the extra
data blocks locally when possible, rather than using remote
reads. </p>
</li>
<li>
<p>
Impala queries benefit from the improved alignment of row groups with HDFS blocks for Parquet
files written by Hive, MapReduce, and other components. (Impala itself never writes
multiblock Parquet files, so the alignment change does not apply to Parquet files produced by Impala.)
These Parquet writers now add padding to Parquet files that they write to align row groups with HDFS blocks.
The <codeph>parquet.writer.max-padding</codeph> setting specifies the maximum number of bytes, by default
8 megabytes, that can be added to the file between row groups to fill the gap at the end of one block
so that the next row group starts at the beginning of the next block.
If the gap is larger than this size, the writer attempts to fit another entire row group in the remaining space.
Include this setting in the <filepath>hive-site</filepath> configuration file to influence Parquet files written by Hive,
or the <filepath>hdfs-site</filepath> configuration file to influence Parquet files written by all non-Impala components.
</p>
</li>
</ul>
<p audience="PDF">
See <xref href="impala_parquet.xml#parquet"/> for instructions about using Parquet data files
with Impala.
</p>
</li>
<li id="IMPALA-1660">
<p>
Many new built-in scalar functions, for convenience and enhanced portability of SQL that uses common industry extensions.
</p>
<p rev="IMPALA-1771">
Math functions<ph audience="PDF"> (see <xref href="impala_math_functions.xml#math_functions"/> for details)</ph>:
</p>
<ul>
<li>
<codeph>ATAN2</codeph>
</li>
<li>
<codeph>COSH</codeph>
</li>
<li>
<codeph>COT</codeph>
</li>
<li>
<codeph>DCEIL</codeph>
</li>
<li>
<codeph>DEXP</codeph>
</li>
<li>
<codeph>DFLOOR</codeph>
</li>
<li>
<codeph>DLOG10</codeph>
</li>
<li>
<codeph>DPOW</codeph>
</li>
<li>
<codeph>DROUND</codeph>
</li>
<li>
<codeph>DSQRT</codeph>
</li>
<li>
<codeph>DTRUNC</codeph>
</li>
<li>
<codeph>FACTORIAL</codeph>, and corresponding <codeph>!</codeph> operator
</li>
<li>
<codeph>FPOW</codeph>
</li>
<li>
<codeph>RADIANS</codeph>
</li>
<li>
<codeph>RANDOM</codeph>
</li>
<li>
<codeph>SINH</codeph>
</li>
<li>
<codeph>TANH</codeph>
</li>
</ul>
<p>
String functions<ph audience="PDF"> (see <xref href="impala_string_functions.xml#string_functions"/> for details)</ph>:
</p>
<ul>
<li>
<codeph>BTRIM</codeph>
</li>
<li>
<codeph>CHR</codeph>
</li>
<li>
<codeph>REGEXP_LIKE</codeph>
</li>
<li>
<codeph>SPLIT_PART</codeph>
</li>
</ul>
<p>
Date and time functions<ph audience="PDF"> (see <xref href="impala_datetime_functions.xml#datetime_functions"/> for details)</ph>:
</p>
<ul>
<li>
<codeph>INT_MONTHS_BETWEEN</codeph>
</li>
<li>
<codeph>MONTHS_BETWEEN</codeph>
</li>
<li>
<codeph>TIMEOFDAY</codeph>
</li>
<li>
<codeph>TIMESTAMP_CMP</codeph>
</li>
</ul>
<p>
Bit manipulation functions<ph audience="PDF"> (see <xref href="impala_bit_functions.xml#bit_functions"/> for details)</ph>:
</p>
<ul>
<li>
<codeph>BITAND</codeph>
</li>
<li>
<codeph>BITNOT</codeph>
</li>
<li>
<codeph>BITOR</codeph>
</li>
<li>
<codeph>BITXOR</codeph>
</li>
<li>
<codeph>COUNTSET</codeph>
</li>
<li>
<codeph>GETBIT</codeph>
</li>
<li>
<codeph>ROTATELEFT</codeph>
</li>
<li>
<codeph>ROTATERIGHT</codeph>
</li>
<li>
<codeph>SETBIT</codeph>
</li>
<li>
<codeph>SHIFTLEFT</codeph>
</li>
<li>
<codeph>SHIFTRIGHT</codeph>
</li>
</ul>
<p>
Type conversion functions<ph audience="PDF"> (see <xref href="impala_conversion_functions.xml#conversion_functions"/> for details)</ph>:
</p>
<ul>
<li>
<codeph>TYPEOF</codeph>
</li>
</ul>
<p>
The <codeph>effective_user()</codeph> function<ph audience="PDF"> (see <xref href="impala_misc_functions.xml#misc_functions"/> for details)</ph>.
</p>
</li>
<li id="IMPALA-2081">
<p>
New built-in analytic functions: <codeph>PERCENT_RANK</codeph>, <codeph>NTILE</codeph>,
<codeph>CUME_DIST</codeph>.
<ph audience="PDF">See <xref href="impala_analytic_functions.xml#analytic_functions"/> for details.</ph>
</p>
</li>
<li id="IMPALA-595">
<p>
The <codeph>DROP DATABASE</codeph> statement now works for a non-empty database.
When you specify the optional <codeph>CASCADE</codeph> clause, any tables in the
database are dropped before the database itself is removed.
<ph audience="PDF">See <xref href="impala_drop_database.xml#drop_database"/> for details.</ph>
</p>
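          <p>
            For example (the database name is illustrative):
          </p>
          <codeblock>DROP DATABASE scratch_db CASCADE;</codeblock>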
</li>
<li>
<p>
The <codeph>DROP TABLE</codeph> and <codeph>ALTER TABLE DROP PARTITION</codeph> statements have a new optional keyword, <codeph>PURGE</codeph>.
This keyword causes Impala to immediately remove the relevant HDFS data files rather than sending them to the HDFS trashcan.
This feature can help to avoid out-of-space errors on storage devices, and to avoid files being left behind in case of
a problem with the HDFS trashcan, such as the trashcan not being configured or being in a different HDFS encryption zone
than the data files.
<ph audience="PDF">See <xref href="impala_drop_table.xml#drop_table"/> and <xref href="impala_alter_table.xml#alter_table"/> for syntax.</ph>
</p>
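          <p>
            A minimal sketch (table and partition names are illustrative):
          </p>
          <codeblock>DROP TABLE old_logs PURGE;
ALTER TABLE web_logs DROP PARTITION (year=2013) PURGE;</codeblock>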
</li>
<li id="IMPALA-80">
<p>
The <cmdname>impala-shell</cmdname> command has a new feature for live progress reporting. This feature
is enabled through the <codeph>--live_progress</codeph> and <codeph>--live_summary</codeph>
command-line options, or during a session through the <codeph>LIVE_SUMMARY</codeph> and
<codeph>LIVE_PROGRESS</codeph> query options.
<ph audience="PDF">See <xref href="impala_live_progress.xml#live_progress"/> and <xref href="impala_live_summary.xml#live_summary"/> for details.</ph>
</p>
</li>
<li>
<p>
The <cmdname>impala-shell</cmdname> command also now displays a random <q>tip of the day</q> when it starts.
</p>
</li>
<li id="IMPALA-1413">
<p>
The <cmdname>impala-shell</cmdname> option <codeph>-f</codeph> now recognizes a special filename
<codeph>-</codeph> to accept input from stdin.
<ph audience="PDF">See <xref href="impala_shell_options.xml#shell_options"/> for details about the options for running <cmdname>impala-shell</cmdname> in non-interactive mode.</ph>
</p>
</li>
<li id="IMPALA-1963">
<p>
Format strings for the <codeph>unix_timestamp()</codeph> function can now include numeric timezone offsets.
<ph audience="PDF">See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details.</ph>
</p>
</li>
<li>
<p>
Impala can now run a specified command to obtain the password to decrypt a private-key PEM file,
rather than having the private-key file be unencrypted on disk.
<ph audience="PDF">See <xref href="impala_ssl.xml#ssl"/> for details.</ph>
</p>
</li>
<li id="IMPALA-859">
<p>
Impala components now can use SSL for more of their internal communication. SSL is used for
communication between all three Impala-related daemons when the configuration option
<codeph>ssl_server_certificate</codeph> is enabled. SSL is used for communication with client
applications when the configuration option <codeph>ssl_client_ca_certificate</codeph> is enabled.
<ph audience="PDF">See <xref href="impala_ssl.xml#ssl"/> for details.</ph>
</p>
<p>
Currently, you can only use one of server-to-server TLS/SSL encryption or Kerberos authentication.
This limitation is tracked by the issue
<xref keyref="IMPALA-2598">IMPALA-2598</xref>.
</p>
</li>
<li id="IMPALA-1829">
<p>
Improved flexibility for intermediate data types in user-defined aggregate functions (UDAFs).
<ph audience="PDF">See <xref href="impala_udf.xml#udafs"/> for details.</ph>
</p>
</li>
</ul>
<p>
In <keyword keyref="impala232"/>, the bug fix for <xref keyref="IMPALA-2598">IMPALA-2598</xref>
removes the restriction on using both Kerberos and SSL for internal communication between Impala components.
</p>
<!-- End of new feature list for 2.3 / 5.5. -->
</conbody>
</concept>
<!-- All 2.2.x subsections go under here -->
<concept rev="2.2.0" id="new_features_220">
<title>New Features in <keyword keyref="impala28_full"/></title>
<conbody>
<p>
The following are the major new features in <keyword keyref="impala22_full"/>. This release
contains improvements to performance, manageability, security, and SQL syntax.
</p>
<ul>
        <li>
          <p>
The <codeph>WITH REPLICATION</codeph> clause for the <codeph>CREATE TABLE</codeph> and
<codeph>ALTER TABLE</codeph> statements lets you control the replication factor for
HDFS caching for a specific table or partition. By default, each cached block is
only present on a single host, which can lead to CPU contention if the same host
processes each cached block. Increasing the replication factor lets Impala choose
different hosts to process different cached blocks, to better distribute the CPU load.
</p>
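          <p>
            A sketch of the clause (the cache pool and table names are hypothetical):
          </p>
          <codeblock>ALTER TABLE lookup_codes SET CACHED IN 'hot_data_pool' WITH REPLICATION = 3;</codeblock>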
        </li>
        <li>
          <p>
            Several improvements to date and time features enable higher interoperability with Hive and other
            database systems, provide more flexibility for handling time zones, and future-proof the handling of
            <codeph>TIMESTAMP</codeph> values:
          </p>
          <ul>
            <li>
              <p>
Startup flags for the <cmdname>impalad</cmdname> daemon enable a higher level of compatibility with
<codeph>TIMESTAMP</codeph> values written by Hive, and more flexibility for working with date and
time data using the local time zone instead of UTC. To enable these features, set the
<cmdname>impalad</cmdname> startup flags
<codeph>-use_local_tz_for_unix_timestamp_conversions=true</codeph> and
<codeph>-convert_legacy_hive_parquet_utc_timestamps=true</codeph>.
</p>
<p>
The <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting controls how the
<codeph>unix_timestamp()</codeph>, <codeph>from_unixtime()</codeph>, and <codeph>now()</codeph>
functions handle time zones. By default (when this setting is turned off), Impala considers all
<codeph>TIMESTAMP</codeph> values to be in the UTC time zone when converting to or from Unix time
values. When this setting is enabled, Impala treats <codeph>TIMESTAMP</codeph> values passed to or
returned from these functions to be in the local time zone. When this setting is enabled, take
particular care that all hosts in the cluster have the same timezone settings, to avoid
inconsistent results depending on which host reads or writes <codeph>TIMESTAMP</codeph> data.
</p>
<p>
The <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting causes Impala to convert
<codeph>TIMESTAMP</codeph> values to the local time zone when it reads them from Parquet files
written by Hive. This setting only applies to data using the Parquet file format, where Impala can
use metadata in the files to reliably determine that the files were written by Hive. If in the
future Hive changes the way it writes <codeph>TIMESTAMP</codeph> data in Parquet, Impala will
automatically handle that new <codeph>TIMESTAMP</codeph> encoding.
</p>
<p>
See <xref href="impala_timestamp.xml#timestamp"/> for details about time zone handling and the
configuration options for Impala / Hive compatibility with Parquet format.
</p>
</li>
<li>
<p conref="../shared/impala_common.xml#common/y2k38" />
<p>
See <xref href="impala_datetime_functions.xml#datetime_functions"/> for the current function
signatures.
</p>
</li>
</ul>
</li>
<li>
<p>
The <codeph>SHOW FILES</codeph> statement lets you view the names and sizes of the files that make up
an entire table or a specific partition. See <xref href="impala_show.xml#show_files"/> for details.
</p>
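          <p>
            For example (the table and partition names are illustrative):
          </p>
          <codeblock>SHOW FILES IN sales_data;
SHOW FILES IN sales_data PARTITION (year=2014);</codeblock>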
</li>
<li>
<p>
Impala can now run queries against Parquet data containing columns with complex or nested types, as
long as the query only refers to columns with scalar types.
</p>
</li>
<li>
<p>
Performance improvements for queries that include <codeph>IN()</codeph> operators and involve
partitioned tables.
</p>
</li>
<li>
<!-- Same text for this item in impala_fixed_issues.xml. Could turn into a conref. -->
<p>
The new <codeph>-max_log_files</codeph> configuration option specifies how many log files to keep at
each severity level. The default value is 10, meaning that Impala preserves the latest 10 log files for
each severity level (<codeph>INFO</codeph>, <codeph>WARNING</codeph>, and <codeph>ERROR</codeph>) for
each Impala-related daemon (<cmdname>impalad</cmdname>, <cmdname>statestored</cmdname>, and
<cmdname>catalogd</cmdname>). Impala checks to see if any old logs need to be removed based on the
interval specified in the <codeph>logbufsecs</codeph> setting, every 5 seconds by default. See
<xref href="impala_logging.xml#logs_rotate"/> for details.
</p>
</li>
<li>
<p>
Redaction of sensitive data from Impala log files. This feature protects details such as credit card
numbers or tax IDs from administrators who see the text of SQL statements in the course of monitoring
and troubleshooting a Hadoop cluster. See <xref href="impala_logging.xml#redaction"/> for background
information for Impala users, and <xref keyref="sg_redaction"/> for usage details.
</p>
</li>
<li>
<p>
Lineage information is available for data created or queried by Impala. This feature lets you track who
has accessed data through Impala SQL statements, down to the level of specific columns, and how data
            has been propagated between tables. See <xref href="impala_lineage.xml#lineage"/> for background
            information for Impala users, and <xref keyref="datamgmt_impala_lineage_log"/> for usage details and
            how to interpret the lineage information.
</p>
</li>
<li>
<p>
Impala tables and partitions can now be located on the Amazon Simple Storage Service (S3) filesystem,
for convenience in cases where data is already located in S3 and you prefer to query it in-place.
            Queries might have lower performance than when the data files reside on HDFS, because Impala cannot
            use certain HDFS-specific optimizations with S3 data. Impala can query data in S3, but cannot write to S3. Therefore, statements
such as <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph> are not available when the destination
table or partition is in S3. See <xref href="impala_s3.xml#s3"/> for details.
</p>
<note conref="../shared/impala_common.xml#common/s3_caveat" />
</li>
<li>
<!-- Only want the link out of the release notes to appear for HTML
(N.B. audience="PDF" means hide from PDF), and only in the HTML for the
integrated build where the topic is available for link resolution. -->
<p>
Improved support for HDFS encryption. The <codeph>LOAD DATA</codeph> statement now works when the
source directory and destination table are in different encryption zones. See
<xref keyref="cdh_sg_component_kms"/> for details about using HDFS encryption with
Impala.
</p>
</li>
<li>
<p>
Additional arithmetic function <codeph>mod()</codeph>. See
<xref href="impala_math_functions.xml#math_functions"/> for details.
</p>
</li>
<li>
<p>
Flexibility to interpret <codeph>TIMESTAMP</codeph> values using the UTC time zone (the traditional
Impala behavior) or using the local time zone (for compatibility with <codeph>TIMESTAMP</codeph> values
produced by Hive).
</p>
</li>
<li>
<p>
Enhanced support for ETL using tools such as Flume. Impala ignores temporary files typically produced
by these tools (filenames with suffixes <codeph>.copying</codeph> and <codeph>.tmp</codeph>).
</p>
</li>
<li>
<p>
The CPU requirement for Impala, which had become more restrictive in Impala 2.0.x and 2.1.x, has now
been relaxed.
</p>
<p conref="../shared/impala_common.xml#common/cpu_prereq" />
</li>
<li>
<p>
Enhanced support for <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> types in the <codeph>COMPUTE
STATS</codeph> statement.
</p>
</li>
<li rev="">
<p>
The amount of memory required during setup for <q>spill to disk</q> operations is greatly reduced. This
enhancement reduces the chance of a memory-intensive join or aggregation query failing with an
out-of-memory error.
</p>
</li>
<li>
<p>
Several new conditional functions provide enhanced compatibility when porting code that uses industry
extensions. The new functions are: <codeph>isfalse()</codeph>, <codeph>isnotfalse()</codeph>,
<codeph>isnottrue()</codeph>, <codeph>istrue()</codeph>, <codeph>nonnullvalue()</codeph>, and
<codeph>nullvalue()</codeph>. See <xref href="impala_conditional_functions.xml#conditional_functions"/>
for details.
</p>
</li>
<li>
<p>
The Impala debug web UI now can display a visual representation of the query plan. On the
<uicontrol>/queries</uicontrol> tab, select <uicontrol>Details</uicontrol> for a particular query. The
<uicontrol>Details</uicontrol> page includes a <uicontrol>Plan</uicontrol> tab with a plan diagram that
            you can zoom in and out on, using mouse wheel or trackpad scroll gestures.
</p>
</li>
</ul>
<!-- End of new feature list for 5.4. -->
</conbody>
</concept>
<!-- All 2.1.x subsections go under here -->
<concept rev="2.1.0" id="new_features_210">
<title>New Features in <keyword keyref="impala21_full"/></title>
<conbody>
<p>
This release contains the following enhancements to query performance and system scalability:
</p>
<ul>
<li>
<p>
Impala can now collect statistics for individual partitions in a partitioned table, rather than
processing the entire table for each <codeph>COMPUTE STATS</codeph> statement. This feature is known as
incremental statistics, and is controlled by the <codeph>COMPUTE INCREMENTAL STATS</codeph> syntax.
(You can still use the original <codeph>COMPUTE STATS</codeph> statement for nonpartitioned tables or
partitioned tables that are unchanging or whose contents are entirely replaced all at once.) See
<xref href="impala_compute_stats.xml#compute_stats"/> and
<xref href="impala_perf_stats.xml#perf_stats"/> for details.
</p>
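          <p>
            For example (table and partition names are illustrative):
          </p>
          <codeblock>COMPUTE INCREMENTAL STATS sales_data;
-- Limit the work to a newly added partition:
COMPUTE INCREMENTAL STATS sales_data PARTITION (year=2014, month=12);</codeblock>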
</li>
<li>
<p>
Optimization for small queries lets Impala process queries that process very few rows without the
unnecessary overhead of parallelizing and generating native code. Reducing this overhead lets Impala
clear small queries quickly, keeping YARN resources and admission control slots available for
data-intensive queries. The number of rows considered to be a <q>small</q> query is controlled by the
<codeph>EXEC_SINGLE_NODE_ROWS_THRESHOLD</codeph> query option. See
<xref href="impala_exec_single_node_rows_threshold.xml#exec_single_node_rows_threshold"/> for details.
</p>
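          <p>
            For example, raising the threshold so that slightly larger queries also run on a single node
            (the value is illustrative):
          </p>
          <codeblock>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=500;</codeblock>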
</li>
<li>
<p>
An enhancement to the statestore component lets it transmit heartbeat information independently of
broadcasting metadata updates. This optimization improves reliability of health checking on large
clusters with many tables and partitions.
</p>
</li>
<li>
<p>
The memory requirement for querying gzip-compressed text is reduced. Now Impala decompresses the data
as it is read, rather than reading the entire gzipped file and decompressing it in memory.
</p>
</li>
</ul>
</conbody>
</concept>
<!-- All 2.0.x subsections go under here -->
<concept rev="2.0.0" id="new_features_200">
<title>New Features in <keyword keyref="impala20_full"/></title>
<conbody>
<p>
The following are the major new features in <keyword keyref="impala20_full"/>. This major release
contains improvements to performance, scalability, security, and SQL syntax.
</p>
<ul>
<li>
<p>
Queries with joins or aggregation functions involving high volumes of data can now use temporary work
areas on disk, reducing the chance of failure due to out-of-memory errors. When the required memory for
the intermediate result set exceeds the amount available on a particular node, the query automatically
uses a temporary work area on disk. This <q>spill to disk</q> mechanism is similar to the <codeph>ORDER
BY</codeph> improvement from Impala 1.4. For details, see
<xref href="impala_scalability.xml#spill_to_disk"/>.
</p>
</li>
<li>
<p>
Subquery enhancements:
<ul>
<li>
Subqueries are now allowed in the <codeph>WHERE</codeph> clause, for example with the
<codeph>IN</codeph> operator.
</li>
<li>
The <codeph>EXISTS</codeph> and <codeph>NOT EXISTS</codeph> operators are available. They are
always used in conjunction with subqueries.
</li>
<li>
              The <codeph>IN</codeph> and <codeph>NOT IN</codeph> operators can now operate on the result set from
              a subquery, not just a hardcoded list of values.
</li>
<li>
Uncorrelated subqueries let you compare against one or more values for equality,
<codeph>IN</codeph>, and <codeph>EXISTS</codeph> comparisons. For example, you might use
              <codeph>WHERE</codeph> clauses such as <codeph>WHERE <varname>column</varname> = (SELECT
              MAX(<varname>some_other_column</varname>) FROM <varname>table</varname>)</codeph> or <codeph>WHERE
              <varname>column</varname> IN (SELECT <varname>some_other_column</varname> FROM
              <varname>table</varname> WHERE <varname>conditions</varname>)</codeph>.
</li>
<li>
Correlated subqueries let you cross-reference values from the outer query block and the subquery.
</li>
<li>
Scalar subqueries let you substitute the result of single-value aggregate functions such as
<codeph>MAX()</codeph>, <codeph>MIN()</codeph>, <codeph>COUNT()</codeph>, or
<codeph>AVG()</codeph>, where you would normally use a numeric value in a <codeph>WHERE</codeph>
clause.
</li>
</ul>
</p>
<p>
          For details about subqueries, see <xref href="impala_subqueries.xml#subqueries"/>. For information about
new and improved operators, see <xref href="impala_operators.xml#exists"/> and
<xref href="impala_operators.xml#in"/>.
</p>
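        <p>
          A minimal sketch of a correlated <codeph>EXISTS</codeph> subquery (names are illustrative):
        </p>
        <codeblock>SELECT c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);</codeblock>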
</li>
<li>
<p>
Analytic functions such as <codeph>RANK()</codeph>, <codeph>LAG()</codeph>, <codeph>LEAD()</codeph>,
and <codeph>FIRST_VALUE()</codeph> let you analyze sequences of rows with flexible ordering and
grouping. Existing aggregate functions such as <codeph>MAX()</codeph>, <codeph>SUM()</codeph>, and
<codeph>COUNT()</codeph> can also be used in an analytic context. See
<xref href="impala_analytic_functions.xml#analytic_functions"/> for details. See
<xref href="impala_aggregate_functions.xml#aggregate_functions"/> for enhancements to existing
aggregate functions.
</p>
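        <p>
          A minimal sketch (table and column names are illustrative):
        </p>
        <codeblock>SELECT dept, name, salary,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
FROM employees;</codeblock>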
</li>
<li>
<p>
New data types provide greater compatibility with source code from traditional database systems:
</p>
<ul>
<li>
<codeph>VARCHAR</codeph> is like the <codeph>STRING</codeph> data type, but with a maximum length.
See <xref href="impala_varchar.xml#varchar"/> for details.
</li>
<li>
<codeph>CHAR</codeph> is like the <codeph>STRING</codeph> data type, but with a precise length. Short
values are padded with spaces on the right. See <xref href="impala_char.xml#char"/> for details.
</li>
<li audience="hidden">
<!-- This feature will be undocumented in Impala 2.0, probably ready for prime time in 2.1. -->
<codeph>DATE</codeph>. See <xref href="impala_date.xml#date"/> for details.
</li>
</ul>
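        <p>
          A minimal sketch combining the two types (names and lengths are illustrative):
        </p>
        <codeblock>CREATE TABLE country_codes (code CHAR(2), country_name VARCHAR(100));</codeblock>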
</li>
<li>
<p>
Security enhancements:
<ul>
<li>
Formerly, Impala was restricted to using either Kerberos or LDAP / Active Directory authentication
within a cluster. Now, Impala can freely accept either kind of authentication request, allowing you
to set up some hosts with Kerberos authentication and others with LDAP or Active Directory. See
<xref href="impala_mixed_security.xml#mixed_security"/> for details.
</li>
<li>
<codeph>GRANT</codeph> statement. See <xref href="impala_grant.xml#grant"/> for details.
</li>
<li>
<codeph>REVOKE</codeph> statement. See <xref href="impala_revoke.xml#revoke"/> for details.
</li>
<li>
<codeph>CREATE ROLE</codeph> statement. See <xref href="impala_create_role.xml#create_role"/> for
details.
</li>
<li>
<codeph>DROP ROLE</codeph> statement. See <xref href="impala_drop_role.xml#drop_role"/> for
details.
</li>
<li>
<codeph>SHOW ROLES</codeph> and <codeph>SHOW ROLE GRANT</codeph> statements. See
<xref href="impala_show.xml#show"/> for details.
</li>
<li>
<p>
To complement the HDFS encryption feature, a new Impala configuration option,
                <codeph>--disk_spill_encryption</codeph>, secures sensitive data from being observed or tampered
with when temporarily stored on disk.
</p>
</li>
</ul>
</p>
<p>
The new security-related SQL statements work along with the Sentry authorization framework. See
<xref keyref="authorization"/> for details.
</p>
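        <p>
          A brief sketch of the role workflow (the role, group, and database names are hypothetical):
        </p>
        <codeblock>CREATE ROLE analyst_role;
GRANT ROLE analyst_role TO GROUP analysts;
GRANT SELECT ON DATABASE sales TO ROLE analyst_role;
SHOW ROLE GRANT GROUP analysts;</codeblock>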
</li>
<li>
<p>
Impala can now read compressed text files compressed by gzip, bzip, or Snappy. These files do not
require any special table settings to work in an Impala text table. Impala recognizes the compression
type automatically based on file extensions of <codeph>.gz</codeph>, <codeph>.bz2</codeph>, and
<codeph>.snappy</codeph> respectively. These types of compressed text files are intended for
convenience with existing ETL pipelines. Their non-splittable nature means they are not optimal for
high-performance parallel queries. See <xref href="impala_txtfile.xml#gzip"/> for details.
</p>
</li>
<li>
<p>
Query hints can now use comment notation, <codeph>/* +<varname>hint_name</varname> */</codeph> or
<codeph>-- +<varname>hint_name</varname></codeph>, at the same places in the query where the hints
enclosed by <codeph>[ ]</codeph> are recognized. This enhancement makes it easier to reuse Impala
queries on other database systems. See <xref href="impala_hints.xml#hints"/> for details.
</p>
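        <p>
          For example, with the hint placed right after the <codeph>JOIN</codeph> keyword
          (names are illustrative):
        </p>
        <codeblock>SELECT c.name, o.total
FROM customers c JOIN /* +SHUFFLE */ orders o ON c.id = o.customer_id;</codeblock>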
</li>
<li>
<p>
A new query option, <codeph>QUERY_TIMEOUT_S</codeph>, lets you specify a timeout period in seconds for
individual queries.
</p>
<p>
          The behavior of the <codeph>--idle_query_timeout</codeph> configuration option is extended. If no
          <codeph>QUERY_TIMEOUT_S</codeph> query option is in effect, <codeph>--idle_query_timeout</codeph> works
          the same as before, setting the timeout interval. When the <codeph>QUERY_TIMEOUT_S</codeph> query option
          is specified, its maximum value is capped by the value of the <codeph>--idle_query_timeout</codeph>
          option.
</p>
<p>
That is, the system administrator sets the default and maximum timeout through the
<codeph>--idle_query_timeout</codeph> startup option, and then individual users or applications can set
a lower timeout value if desired through the <codeph>QUERY_TIMEOUT_S</codeph> query option. See
<xref href="impala_timeouts.xml#timeouts"/> and
<xref href="impala_query_timeout_s.xml#query_timeout_s"/> for details.
</p>
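        <p>
          For example, giving an individual session a 10-minute query timeout (the value is illustrative):
        </p>
        <codeblock>SET QUERY_TIMEOUT_S=600;</codeblock>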
</li>
<li>
<p>
New functions <codeph>VAR_SAMP()</codeph> and <codeph>VAR_POP()</codeph> are aliases for the existing
<codeph>VARIANCE_SAMP()</codeph> and <codeph>VARIANCE_POP()</codeph> functions.
</p>
</li>
<li>
<p>
A new date and time function, <codeph>DATE_PART()</codeph>, provides similar functionality to
<codeph>EXTRACT()</codeph>. You can also call the <codeph>EXTRACT()</codeph> function using the SQL-99
syntax, <codeph>EXTRACT(<varname>unit</varname> FROM <varname>timestamp</varname>)</codeph>. These
enhancements simplify the porting process for date-related code from other systems. See
<xref href="impala_datetime_functions.xml#datetime_functions"/> for details.
</p>
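        <p>
          For example, these two expressions return the same value:
        </p>
        <codeblock>SELECT date_part('year', now()) AS y1,
       EXTRACT(YEAR FROM now()) AS y2;</codeblock>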
</li>
<li>
<p>
New approximation features provide a fast way to get results when absolute precision is not required:
</p>
<ul>
<li>
The <codeph>APPX_COUNT_DISTINCT</codeph> query option lets Impala rewrite
<codeph>COUNT(DISTINCT)</codeph> calls to use <codeph>NDV()</codeph> instead, which speeds up the
operation and allows multiple <codeph>COUNT(DISTINCT)</codeph> operations in a single query. See
<xref href="impala_appx_count_distinct.xml#appx_count_distinct"/> for details.
</li>
</ul>
The <codeph>APPX_MEDIAN()</codeph> aggregate function produces an estimate for the median value of a
column by using sampling. See <xref href="impala_appx_median.xml#appx_median"/> for details.
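        <p>
          A minimal sketch combining both approximation features (table and column names are illustrative):
        </p>
        <codeblock>SET APPX_COUNT_DISTINCT=TRUE;
SELECT COUNT(DISTINCT visitor_id) AS approx_visitors,
       APPX_MEDIAN(session_seconds) AS median_session
FROM web_sessions;</codeblock>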
</li>
<li>
<p>
Impala now supports a <codeph>DECODE()</codeph> function. This function works as a shorthand for a
<codeph>CASE()</codeph> expression, and improves compatibility with SQL code containing vendor
extensions. See <xref href="impala_conditional_functions.xml#conditional_functions"/> for details.
</p>
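        <p>
          For example (the column names and codes are illustrative):
        </p>
        <codeblock>SELECT id,
       DECODE(status_code, 1, 'active', 2, 'suspended', 'unknown') AS status
FROM accounts;</codeblock>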
</li>
<li>
<p>
The <codeph>STDDEV()</codeph>, <codeph>STDDEV_POP()</codeph>, <codeph>STDDEV_SAMP()</codeph>,
<codeph>VARIANCE()</codeph>, <codeph>VARIANCE_POP()</codeph>, <codeph>VARIANCE_SAMP()</codeph>, and
<codeph>NDV()</codeph> aggregate functions now all return <codeph>DOUBLE</codeph> results rather than
<codeph>STRING</codeph>. Formerly, you were required to <codeph>CAST()</codeph> the result to a numeric
type before using it in arithmetic operations.
</p>
</li>
<li id="parquet_block_size">
<p>
The default settings for Parquet block size, and the associated <codeph>PARQUET_FILE_SIZE</codeph>
query option, are changed. Now, Impala writes Parquet files with a size of 256 MB and an HDFS block
size of 256 MB. Previously, Impala attempted to write Parquet files with a size of 1 GB and an HDFS
block size of 1 GB. In practice, Impala used a conservative estimate of the disk space needed for each
Parquet block, leading to files that were typically 512 MB anyway. Thus, this change will make the file
size more accurate if you specify a value for the <codeph>PARQUET_FILE_SIZE</codeph> query option. It
also reduces the amount of memory reserved during <codeph>INSERT</codeph> into Parquet tables,
potentially avoiding out-of-memory errors and improving scalability when inserting data into Parquet
tables.
</p>
</li>
<li>
<p>
Anti-joins are now supported, expressed using the <codeph>LEFT ANTI JOIN</codeph> and <codeph>RIGHT
ANTI JOIN</codeph> clauses.
<!-- Maybe RIGHT SEMI JOIN is new too? -->
<!-- Make following statement true in the context of RIGHT ANTI JOIN. -->
            These clauses return results from one table that have no match in the other table. You might use this
type of join in the same sorts of use cases as the <codeph>NOT EXISTS</codeph> and <codeph>NOT
IN</codeph> operators. See <xref href="impala_joins.xml#joins"/> for details.
</p>
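        <p>
          A minimal sketch that finds customers with no orders (names are illustrative):
        </p>
        <codeblock>SELECT c.id, c.name
FROM customers c LEFT ANTI JOIN orders o ON c.id = o.customer_id;</codeblock>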
</li>
<li audience="hidden">
<!-- This feature will be undocumented in Impala 2.0, probably ready for prime time in 2.1. -->
<p>
Improved file format support. Impala can now write to Avro, compressed text, SequenceFile, and RCFile
tables using the <codeph>INSERT</codeph> or <codeph>CREATE TABLE AS SELECT</codeph> statements. See
<xref href="impala_file_formats.xml#file_formats"/> for details.
</p>
</li>
<li>
<p>
The <codeph>SET</codeph> command in <cmdname>impala-shell</cmdname> has been promoted to a real SQL
statement. You can now set query options such as <codeph>PARQUET_FILE_SIZE</codeph>,
<codeph>MEM_LIMIT</codeph>, and <codeph>SYNC_DDL</codeph> within JDBC, ODBC, or any other kind of
application that submits SQL without going through the <cmdname>impala-shell</cmdname> interpreter. See
<xref href="impala_set.xml#set"/> for details.
</p>
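        <p>
          For example, statements such as the following can now be submitted through JDBC or ODBC as
          ordinary SQL (the values are illustrative):
        </p>
        <codeblock>SET SYNC_DDL=1;
SET MEM_LIMIT=2000000000;</codeblock>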
</li>
<li>
<p>
The <cmdname>impala-shell</cmdname> interpreter now reads settings from an optional configuration file,
named <filepath>$HOME/.impalarc</filepath> by default. See
<xref href="impala_shell_options.xml#shell_config_file"/> for details.
</p>
</li>
<li audience="hidden">
<!-- This feature will be undocumented in Impala 2.0, probably ready for prime time in 2.1. -->
<p>
The <codeph>COMPUTE STATS</codeph> statement can now gather statistics for newly added partitions
rather than the entire table. This feature is known as <term>incremental statistics</term>. See
<xref href="impala_compute_stats.xml#compute_stats"/> for details.
</p>
</li>
<li>
<p>
The library used for regular expression parsing has changed from Boost to Google RE2. This
implementation change adds support for non-greedy matches using the <codeph>.*?</codeph> notation. This
          and other changes in the way regular expressions are interpreted mean you might need to re-test
queries that use functions such as <codeph>regexp_extract()</codeph> or
<codeph>regexp_replace()</codeph>, or operators such as <codeph>REGEXP</codeph> or
<codeph>RLIKE</codeph>. See <xref href="impala_incompatible_changes.xml#incompatible_changes"/> for
those details.
</p>
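        <p>
          A small sketch of a non-greedy match (the strings are illustrative; with RE2 this should
          return <codeph>alice</codeph> rather than the longer greedy match):
        </p>
        <codeblock>SELECT regexp_extract('name=alice;name=bob;', 'name=(.*?);', 1);</codeblock>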
</li>
</ul>
</conbody>
</concept>
<concept rev="1.4.0" id="new_features_140">
<title>New Features in <keyword keyref="impala14_full"/></title>
<conbody>
<p>
The following are the major new features in <keyword keyref="impala14_full"/>:
</p>
<ul>
<li>
<p>
The <codeph>DECIMAL</codeph> data type lets you store fixed-precision values, for working with currency
or other fractional values where it is important to represent values exactly and avoid rounding errors.
This feature includes enhancements to built-in functions, numeric literals, and arithmetic expressions.
<ph audience="PDF">See <xref href="impala_decimal.xml#decimal"/> for details.</ph>
</p>
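          <p>
            A minimal sketch (the names and precision are illustrative):
          </p>
          <codeblock>CREATE TABLE prices (item_id BIGINT, unit_price DECIMAL(9,2));
SELECT CAST(19.99 AS DECIMAL(9,2)) * 3;</codeblock>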
</li>
<li>
<p>
Where the underlying HDFS support exists, Impala can take advantage of the HDFS caching feature to <q>pin</q> entire tables or
individual partitions in memory, to speed up queries on frequently accessed data and reduce the CPU
overhead of memory-to-memory copying. When HDFS files are cached in memory, Impala can read the cached
data without any disk reads, and without making an additional copy of the data in memory. Other Hadoop
components that read the same data files also experience a performance benefit.
</p>
<p audience="PDF">
For background information about HDFS caching, see
<xref keyref="setup_hdfs_caching"/>. For performance information about using this feature with Impala, see
<xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>. For the <codeph>SET CACHED</codeph> and
<codeph>SET UNCACHED</codeph> clauses that let you control cached table data through DDL statements,
see <xref href="impala_create_table.xml#create_table"/> and
<xref href="impala_alter_table.xml#alter_table"/>.
</p>
</li>
<li>
<p>
Impala can now use Sentry-based authorization based either on the original policy file, or on rules
defined by <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements issued through Hive.
See <xref keyref="authorization"/> for details.
</p>
</li>
<li>
<p>
For interoperability with Parquet files created through other Hadoop components, such as Pig or
MapReduce jobs, you can create an Impala table that automatically sets up the column definitions based
on the layout of an existing Parquet data file. <ph audience="PDF">See
<xref href="impala_create_table.xml#create_table"/> for the syntax, and
<xref href="impala_parquet.xml#parquet_ddl"/> for usage information.</ph>
</p>
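          <p>
            A sketch of the syntax (the HDFS path is hypothetical):
          </p>
          <codeblock>CREATE TABLE events
  LIKE PARQUET '/user/etl/samples/events.parq'
  STORED AS PARQUET;</codeblock>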
</li>
<li>
<p>
<codeph>ORDER BY</codeph> queries no longer require a <codeph>LIMIT</codeph> clause. If the size of the
result set to be sorted exceeds the memory available to Impala, Impala uses a temporary work space on
disk to perform the sort operation. <ph audience="PDF">See <xref href="impala_order_by.xml#order_by"/>
for details.</ph>
</p>
</li>