<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="tables">
<title>Overview of Impala Tables</title>
<titlealts audience="PDF"><navtitle>Tables</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Databases"/>
<data name="Category" value="SQL"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Querying"/>
<data name="Category" value="Tables"/>
<data name="Category" value="Schemas"/>
</metadata>
</prolog>
<conbody>
<p/>
<p>
Tables are the primary containers for data in Impala. They use the familiar row and column layout found in
other database systems, plus features such as partitioning that are often associated with higher-end data
warehouse systems.
</p>
<p>
Logically, each table has a structure based on the definition of its columns, partitions, and other
properties.
</p>
<p>
Physically, each table that uses HDFS storage is associated with a directory in HDFS. The table data consists of all the data files
underneath that directory:
</p>
<ul>
<li>
<xref href="impala_tables.xml#internal_tables">Internal tables</xref> are managed by Impala, and use directories
inside the designated Impala work area.
</li>
<li>
<xref href="impala_tables.xml#external_tables">External tables</xref> use arbitrary HDFS directories, where
the data files are typically shared between different Hadoop components.
</li>
<li>
Large-scale data is usually handled by partitioned tables, where the data files are divided among different
HDFS subdirectories.
</li>
</ul>
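<p>
For example, the following statements sketch each of these kinds of tables. The table
and directory names are illustrative only:
</p>
<codeblock>-- Internal table: Impala manages a directory inside its designated work area.
create table census (name string, anniversary timestamp);

-- External table: the data files stay in an existing HDFS directory
-- (the path shown here is hypothetical).
create external table census_external (name string, anniversary timestamp)
  location '/user/shared/census_data';

-- Partitioned table: data files are divided among HDFS subdirectories,
-- one per value of the partition key column.
create table census_by_year (name string, anniversary timestamp)
  partitioned by (year int);
</codeblock>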
<p rev="2.2.0">
Impala tables can also represent data that is stored in HBase, or in the Amazon S3 filesystem (<keyword keyref="impala22_full"/> or higher),
or on Isilon storage devices (<keyword keyref="impala223_full"/> or higher). See <xref href="impala_hbase.xml#impala_hbase"/>,
<xref href="impala_s3.xml#s3"/>, and <xref href="impala_isilon.xml#impala_isilon"/>
for details about those special kinds of tables.
</p>
<p conref="../shared/impala_common.xml#common/ignore_file_extensions"/>
<p outputclass="toc inpage"/>
<p>
<b>Related statements:</b> <xref href="impala_create_table.xml#create_table"/>,
<xref href="impala_drop_table.xml#drop_table"/>, <xref href="impala_alter_table.xml#alter_table"/>,
<xref href="impala_insert.xml#insert"/>, <xref href="impala_load_data.xml#load_data"/>,
<xref href="impala_select.xml#select"/>
</p>
</conbody>
<concept id="internal_tables">
<title>Internal Tables</title>
<conbody>
<p>
<indexterm audience="hidden">internal tables</indexterm>
The default kind of table produced by the <codeph>CREATE TABLE</codeph> statement is known as an internal
table. (Its counterpart is the external table, produced by the <codeph>CREATE EXTERNAL TABLE</codeph>
syntax.)
</p>
<ul>
<li>
<p>
Impala creates a directory in HDFS to hold the data files.
</p>
</li>
<li>
<p>
You can create data in internal tables by issuing <codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph>
statements.
</p>
</li>
<li>
<p>
If you add or replace data using HDFS operations, issue the <codeph>REFRESH</codeph> command in
<cmdname>impala-shell</cmdname> so that Impala recognizes the changes in data files, block locations,
and so on.
</p>
</li>
<li>
<p>
When you issue a <codeph>DROP TABLE</codeph> statement, Impala physically removes all the data files
from the directory.
</p>
</li>
<li>
<p conref="../shared/impala_common.xml#common/check_internal_external_table"/>
</li>
<li>
<p>
When you issue an <codeph>ALTER TABLE</codeph> statement to rename an internal table, all data files
are moved into the new HDFS directory for the table. The files are moved even if they were formerly in
a directory outside the Impala data directory, for example in an internal table with a
<codeph>LOCATION</codeph> attribute pointing to an outside HDFS directory.
</p>
</li>
</ul>
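<p>
The following sequence sketches the internal table lifecycle described above. The table
name is hypothetical; the final <codeph>DROP TABLE</codeph> removes the data files themselves:
</p>
<codeblock>create table t1 (x int, s string);  -- Impala creates an HDFS directory for T1.
insert into t1 values (1, 'one');   -- Data files are written under that directory.
refresh t1;                         -- Needed only after adding files through direct HDFS operations.
drop table t1;                      -- The directory and all its data files are removed.
</codeblock>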
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<p conref="../shared/impala_common.xml#common/switch_internal_external_table"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_tables.xml#external_tables"/>, <xref href="impala_create_table.xml#create_table"/>,
<xref href="impala_drop_table.xml#drop_table"/>, <xref href="impala_alter_table.xml#alter_table"/>,
<xref href="impala_describe.xml#describe"/>
</p>
</conbody>
</concept>
<concept id="external_tables">
<title>External Tables</title>
<conbody>
<p>
<indexterm audience="hidden">external tables</indexterm>
The syntax <codeph>CREATE EXTERNAL TABLE</codeph> sets up an Impala table that points at existing data
files, potentially in HDFS locations outside the normal Impala data directories. This operation saves the
expense of importing the data into a new table when you already have the data files in a known location in
HDFS, in the desired file format.
</p>
<ul>
<li>
<p>
You can use Impala to query the data in this table.
</p>
</li>
<li>
<p>
You can create data in external tables by issuing <codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph>
statements.
</p>
</li>
<li>
<p>
If you add or replace data using HDFS operations, issue the <codeph>REFRESH</codeph> command in
<cmdname>impala-shell</cmdname> so that Impala recognizes the changes in data files, block locations,
and so on.
</p>
</li>
<li>
<p>
When you issue a <codeph>DROP TABLE</codeph> statement in Impala, the statement removes the connection
between Impala and the associated data files, but does not physically remove the underlying data. You can
continue to use the data files with other Hadoop components and HDFS operations.
</p>
</li>
<li>
<p conref="../shared/impala_common.xml#common/check_internal_external_table"/>
</li>
<li>
<p>
When you issue an <codeph>ALTER TABLE</codeph> statement to rename an external table, all data files
are left in their original locations.
</p>
</li>
<li>
<p>
You can point multiple external tables at the same HDFS directory by using the same
<codeph>LOCATION</codeph> attribute for each one. The tables could have different column definitions,
as long as the number and types of columns are compatible with the schema evolution considerations for
the underlying file type. For example, for text data files, one table might define a certain column as
a <codeph>STRING</codeph> while another defines the same column as a <codeph>BIGINT</codeph>.
</p>
</li>
</ul>
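<p>
The following sketch shows two external tables sharing one HDFS directory, with the same
column interpreted as a different type by each table. The paths and table names are hypothetical:
</p>
<codeblock>-- Both tables read the same text data files; dropping either table
-- leaves the underlying files in place.
create external table logs_as_string (event_time string, user_id string)
  row format delimited fields terminated by ','
  location '/user/shared/logs';

create external table logs_as_bigint (event_time string, user_id bigint)
  row format delimited fields terminated by ','
  location '/user/shared/logs';
</codeblock>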
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<p conref="../shared/impala_common.xml#common/switch_internal_external_table"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_tables.xml#internal_tables"/>, <xref href="impala_create_table.xml#create_table"/>,
<xref href="impala_drop_table.xml#drop_table"/>, <xref href="impala_alter_table.xml#alter_table"/>,
<xref href="impala_describe.xml#describe"/>
</p>
</conbody>
</concept>
<concept id="table_file_formats">
<title>File Formats</title>
<conbody>
<p>
Each table has an associated file format, which determines how Impala interprets the
associated data files. See <xref href="impala_file_formats.xml#file_formats"/> for details.
</p>
<p>
You set the file format during the <codeph>CREATE TABLE</codeph> statement,
or change it later using the <codeph>ALTER TABLE</codeph> statement.
Partitioned tables can have a different file format for individual partitions,
allowing you to change the file format used in your ETL process for new data
without going back and reconverting all the existing data in the same table.
</p>
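<p>
For example, a partitioned table can keep older partitions in text format while newly
added partitions use Parquet. The table and partition value here are hypothetical:
</p>
<codeblock>-- Only the named partition changes format; other partitions keep their existing format.
alter table census_by_year partition (year = 2017) set fileformat parquet;
</codeblock>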
<p>
Any <codeph>INSERT</codeph> statements produce new data files with the current file format of the table.
For existing data files, changing the file format of the table does not automatically do any data conversion.
You must use <codeph>TRUNCATE TABLE</codeph> or <codeph>INSERT OVERWRITE</codeph> to remove any previous data
files that use the old file format.
Then you use the <codeph>LOAD DATA</codeph> statement, <codeph>INSERT ... SELECT</codeph>, or other mechanism
to put data files of the correct format into the table.
</p>
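<p>
The statements below sketch the conversion workflow described above, switching a
hypothetical table from its old format to Parquet:
</p>
<codeblock>-- New INSERT statements now produce Parquet data files.
alter table t1 set fileformat parquet;

-- Existing files are not converted automatically, so rewrite the data in the
-- new format. T1_STAGING is a hypothetical table holding the original data.
insert overwrite table t1 select * from t1_staging;
</codeblock>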
<p>
The default file format, text, is the most flexible and easy to produce when you are just getting started with
Impala. The Parquet file format offers the highest query performance and uses compression to reduce storage
requirements; therefore, where practical, use Parquet for Impala tables with substantial amounts of data.
<ph rev="2.3.0">Also, the complex types (<codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>)
available in <keyword keyref="impala23_full"/> and higher are currently only supported with the Parquet file type.</ph>
Based on your existing ETL workflow, you might use other file formats such as Avro, possibly doing a final
conversion step to Parquet to take advantage of its performance for analytic queries.
</p>
</conbody>
</concept>
<concept rev="kudu" id="kudu_tables">
<title>Kudu Tables</title>
<prolog>
<metadata>
<data name="Category" value="Kudu"/>
</metadata>
</prolog>
<conbody>
<p>
Tables stored in Apache Kudu are treated specially, because Kudu manages its data independently of HDFS files.
Some information about the table is stored in the metastore database for use by Impala. Other table metadata is
managed internally by Kudu.
</p>
<p>
When you create a Kudu table through Impala, it is assigned an internal Kudu table name of the form
<codeph>impala::<varname>db_name</varname>.<varname>table_name</varname></codeph>. You can see the Kudu-assigned name
in the output of <codeph>DESCRIBE FORMATTED</codeph>, in the <codeph>kudu.table_name</codeph> field of the table properties.
The Kudu-assigned name remains the same even if you use <codeph>ALTER TABLE</codeph> to rename the Impala table
or move it to a different Impala database. If you issue the statement
<codeph>ALTER TABLE <varname>impala_name</varname> SET TBLPROPERTIES('kudu.table_name' = '<varname>different_kudu_table_name</varname>')</codeph>,
the effect is different depending on whether the Impala table was created with a regular <codeph>CREATE TABLE</codeph>
statement (that is, if it is an internal or managed table), or if it was created with a
<codeph>CREATE EXTERNAL TABLE</codeph> statement (and therefore is an external table). Changing the <codeph>kudu.table_name</codeph>
property of an internal table physically renames the underlying Kudu table to match the new name.
Changing the <codeph>kudu.table_name</codeph> property of an external table switches which underlying Kudu table
the Impala table refers to; the underlying Kudu table must already exist.
</p>
<p>
The following example shows what happens with both internal and external Kudu tables as the <codeph>kudu.table_name</codeph>
property is changed. In practice, external tables are typically used to access underlying Kudu tables that were created
outside of Impala, that is, through the Kudu API.
</p>
<codeblock>
-- This is an internal table that we will create and then rename.
create table old_name (id bigint primary key, s string)
partition by hash(id) partitions 2 stored as kudu;
-- Initially, the name OLD_NAME is the same on the Impala and Kudu sides.
describe formatted old_name;
...
| Location: | hdfs://host.example.com:8020/path/user.db/old_name
| Table Type: | MANAGED_TABLE | NULL
| Table Parameters: | NULL | NULL
| | DO_NOT_UPDATE_STATS | true
| | kudu.master_addresses | vd0342.example.com
| | kudu.table_name | impala::user.old_name
-- ALTER TABLE RENAME TO changes the Impala name but not the underlying Kudu name.
alter table old_name rename to new_name;
describe formatted new_name;
| Location: | hdfs://host.example.com:8020/path/user.db/new_name
| Table Type: | MANAGED_TABLE | NULL
| Table Parameters: | NULL | NULL
| | DO_NOT_UPDATE_STATS | true
| | kudu.master_addresses | vd0342.example.com
| | kudu.table_name | impala::user.old_name
-- Setting TBLPROPERTIES changes the underlying Kudu name.
alter table new_name
set tblproperties('kudu.table_name' = 'impala::user.new_name');
describe formatted new_name;
| Location: | hdfs://host.example.com:8020/path/user.db/new_name
| Table Type: | MANAGED_TABLE | NULL
| Table Parameters: | NULL | NULL
| | DO_NOT_UPDATE_STATS | true
| | kudu.master_addresses | vd0342.example.com
| | kudu.table_name | impala::user.new_name
-- Put some data in the table to demonstrate how external tables can map to
-- different underlying Kudu tables.
insert into new_name values (0, 'zero'), (1, 'one'), (2, 'two');
-- This external table points to the same underlying Kudu table, NEW_NAME,
-- as we created above. No need to declare columns or other table aspects.
create external table kudu_table_alias stored as kudu
tblproperties('kudu.table_name' = 'impala::user.new_name');
-- The external table can fetch data from the NEW_NAME table that already
-- existed and already had data.
select * from kudu_table_alias limit 100;
+----+------+
| id | s |
+----+------+
| 1 | one |
| 0 | zero |
| 2 | two |
+----+------+
-- We cannot re-point the external table at a different underlying Kudu table
-- unless that other underlying Kudu table already exists.
alter table kudu_table_alias
set tblproperties('kudu.table_name' = 'impala::user.yet_another_name');
ERROR:
TableLoadingException: Error opening Kudu table 'impala::user.yet_another_name',
Kudu error: The table does not exist: table_name: "impala::user.yet_another_name"
-- Once the underlying Kudu table exists, we can re-point the external table to it.
create table yet_another_name (id bigint primary key, x int, y int, s string)
partition by hash(id) partitions 2 stored as kudu;
alter table kudu_table_alias
set tblproperties('kudu.table_name' = 'impala::user.yet_another_name');
-- Now no data is returned because this other table is empty.
select * from kudu_table_alias limit 100;
-- The Impala table automatically recognizes the table schema of the new table,
-- for example the extra X and Y columns not present in the original table.
describe kudu_table_alias;
+------+--------+---------+-------------+----------+...
| name | type | comment | primary_key | nullable |...
+------+--------+---------+-------------+----------+...
| id | bigint | | true | false |...
| x | int | | false | true |...
| y | int | | false | true |...
| s | string | | false | true |...
+------+--------+---------+-------------+----------+...
</codeblock>
<p>
The <codeph>SHOW TABLE STATS</codeph> output for a Kudu table shows Kudu-specific details about the layout of the table.
Instead of information about the number and sizes of files, the information is divided by the Kudu tablets.
For each tablet, the output includes the fields
<codeph># Rows</codeph> (although this number is not currently computed), <codeph>Start Key</codeph>, <codeph>Stop Key</codeph>, <codeph>Leader Replica</codeph>, and <codeph># Replicas</codeph>.
The output of <codeph>SHOW COLUMN STATS</codeph>, illustrating the distribution of values within each column, is the same for Kudu tables
as for HDFS-backed tables.
</p>
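<p>
For example, the following statements display the Kudu-specific layout information and the
column statistics for the <codeph>KUDU_TABLE_ALIAS</codeph> table from the earlier example:
</p>
<codeblock>show table stats kudu_table_alias;   -- One row per Kudu tablet, with Start Key,
                                     -- Stop Key, Leader Replica, and # Replicas.
show column stats kudu_table_alias;  -- Same output format as for HDFS-backed tables.
</codeblock>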
<p conref="../shared/impala_common.xml#common/kudu_internal_external_tables"/>
</conbody>
</concept>
</concept>