docs/topics/impala_refresh.xml - impala - Git at Google

 <?xml version="1.0" encoding="UTF-8"?>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
 <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
 <concept id="refresh">

   <title>REFRESH Statement</title>
   <titlealts audience="PDF"><navtitle>REFRESH</navtitle></titlealts>
   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
       <data name="Category" value="SQL"/>
       <data name="Category" value="DDL"/>
       <data name="Category" value="Tables"/>
       <data name="Category" value="Hive"/>
       <data name="Category" value="Metastore"/>
       <data name="Category" value="ETL"/>
       <data name="Category" value="Ingest"/>
       <data name="Category" value="Developers"/>
       <data name="Category" value="Data Analysts"/>
     </metadata>
   </prolog>

   <conbody>

     <p>
       <indexterm audience="hidden">REFRESH statement</indexterm>
       To accurately respond to queries, the Impala node that acts as the coordinator (the node to which you are
       connected through <cmdname>impala-shell</cmdname>, JDBC, or ODBC) must have current metadata about those
       databases and tables that are referenced in Impala queries. If you are not familiar with the way Impala uses
       metadata and how it shares the same metastore database as Hive, see
       <xref href="impala_hadoop.xml#intro_metastore"/> for background information.
     </p>

     <p conref="../shared/impala_common.xml#common/syntax_blurb"/>

 <codeblock rev="IMPALA-1683">REFRESH [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>key_col1</varname>=<varname>val1</varname> [, <varname>key_col2</varname>=<varname>val2</varname>...])]
 <ph rev="2.9.0 IMPALA-5259">REFRESH FUNCTIONS <varname>db_name</varname></ph>
 </codeblock>

     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>

     <p>
       Use the <codeph>REFRESH</codeph> statement to load the latest metastore metadata and block location data for
       a particular table in these scenarios:
     </p>

     <ul>
       <li>
         After loading new data files into the HDFS data directory for the table. (Once you have set up an ETL
         pipeline to bring data into Impala on a regular basis, this is typically the most frequent reason why
         metadata needs to be refreshed.)
       </li>

       <li>
         After issuing <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>, or other
         table-modifying SQL statement in Hive.
       </li>
     </ul>

     <note rev="2.3.0">
       <p rev="2.3.0">
         In <keyword keyref="impala23_full"/> and higher, the syntax <codeph>ALTER TABLE <varname>table_name</varname> RECOVER PARTITIONS</codeph>
         is a faster alternative to <codeph>REFRESH</codeph> when the only change to the table data is the addition of
         new partition directories through Hive or manual HDFS operations.
         See <xref href="impala_alter_table.xml#alter_table"/> for details.
       </p>
     </note>

     <p>
       You only need to issue the <codeph>REFRESH</codeph> statement on the node to which you connect to issue
       queries. The coordinator node divides the work among all the Impala nodes in a cluster, and sends read
       requests for the correct HDFS blocks without relying on the metadata on the other nodes.
     </p>

     <p>
       <codeph>REFRESH</codeph> reloads the metadata for the table from the metastore database, and does an
       incremental reload of the low-level block location data to account for any new data files added to the HDFS
       data directory for the table. It is a low-overhead, single-table operation, specifically tuned for the common
       scenario where new data files are added to HDFS.
     </p>

     <p>
       Only the metadata for the specified table is flushed. The table must already exist and be known to Impala,
       either because the <codeph>CREATE TABLE</codeph> statement was run in Impala rather than Hive, or because a
       previous <codeph>INVALIDATE METADATA</codeph> statement caused Impala to reload its entire metadata catalog.
     </p>

     <note>
       <p rev="1.2">
         The catalog service broadcasts any changed metadata as a result of Impala
         <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph> statements to all
         Impala nodes. Thus, the <codeph>REFRESH</codeph> statement is only required if you load data through Hive
         or by manipulating data files in HDFS directly. See <xref href="impala_components.xml#intro_catalogd"/> for
         more information on the catalog service.
       </p>
       <p rev="1.2.1">
         Another way to avoid inconsistency across nodes is to enable the
         <codeph>SYNC_DDL</codeph> query option before performing a DDL statement or an <codeph>INSERT</codeph> or
         <codeph>LOAD DATA</codeph>.
       </p>
       <p rev="1.1">
         The table name is a required parameter. To flush the metadata for all tables, use the
         <codeph><xref href="impala_invalidate_metadata.xml#invalidate_metadata">INVALIDATE METADATA</xref></codeph>
         command.
       </p>
       <p conref="../shared/impala_common.xml#common/invalidate_then_refresh"/>
     </note>

     <p conref="../shared/impala_common.xml#common/refresh_vs_invalidate"/>

     <p>
       A metadata update for an <codeph>impalad</codeph> instance <b>is</b> required if:
     </p>

     <ul>
       <li>
         A metadata change occurs.
       </li>

       <li>
         <b>and</b> the change is made through Hive.
       </li>

       <li>
         <b>and</b> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly
         connect.
       </li>
     </ul>

     <p rev="1.2">
       A metadata update for an Impala node is <b>not</b> required after you run <codeph>ALTER TABLE</codeph>,
       <codeph>INSERT</codeph>, or other table-modifying statement in Impala rather than Hive. Impala handles the
       metadata synchronization automatically through the catalog service.
     </p>

     <p>
       Database and table metadata is typically modified by:
     </p>

     <ul>
       <li>
         Hive - through <codeph>ALTER</codeph>, <codeph>CREATE</codeph>, <codeph>DROP</codeph> or
         <codeph>INSERT</codeph> operations.
       </li>

       <li>
         Impalad - through <codeph>CREATE TABLE</codeph>, <codeph>ALTER TABLE</codeph>, and <codeph>INSERT</codeph>
         operations. <ph rev="1.2">Such changes are propagated to all Impala nodes by the
         Impala catalog service.</ph>
       </li>
     </ul>

     <p>
       <codeph>REFRESH</codeph> causes the metadata for that table to be immediately reloaded. For a huge table,
       that process could take a noticeable amount of time; but doing the refresh up front avoids an unpredictable
       delay later, for example if the next reference to the table is during a benchmark test.
     </p>

     <p rev="IMPALA-1683">
       <b>Refreshing a single partition:</b>
     </p>

     <p rev="IMPALA-1683">
       In <keyword keyref="impala27_full"/> and higher, the <codeph>REFRESH</codeph> statement can apply to a single partition at a time,
       rather than the whole table. Include the optional <codeph>PARTITION (<varname>partition_spec</varname>)</codeph>
       clause and specify values for each of the partition key columns.
     </p>

     <p rev="IMPALA-1683">
       The following examples show how to make Impala aware of data added to a single partition, after data is loaded into
       a partition's data directory using some mechanism outside Impala, such as Hive or Spark. The partition can be one that
       Impala created and is already aware of, or a new partition created through Hive.
     </p>

 <codeblock rev="IMPALA-1683"><![CDATA[
 impala> create table p (x int) partitioned by (y int);
 impala> insert into p (x,y) values (1,2), (2,2), (2,1);
 impala> show partitions p;
 +-------+-------+--------+------+...
 | y     | #Rows | #Files | Size |...
 +-------+-------+--------+------+...
 | 1     | -1    | 1      | 2B   |...
 | 2     | -1    | 1      | 4B   |...
 | Total | -1    | 2      | 6B   |...
 +-------+-------+--------+------+...

 -- ... Data is inserted into one of the partitions by some external mechanism ...
 beeline> insert into p partition (y = 1) values(1000);

 impala> refresh p partition (y=1);
 impala> select x from p where y=1;
 +------+
 | x    |
 +------+
 | 2    | <- Original data created by Impala
 | 1000 | <- Additional data inserted through Beeline
 +------+
 ]]>
 </codeblock>

     <p rev="IMPALA-1683">
       The same applies for tables with more than one partition key column.
       The <codeph>PARTITION</codeph> clause of the <codeph>REFRESH</codeph>
       statement must include all the partition key columns.
     </p>

 <codeblock rev="IMPALA-1683"><![CDATA[
 impala> create table p2 (x int) partitioned by (y int, z int);
 impala> insert into p2 (x,y,z) values (0,0,0), (1,2,3), (2,2,3);
 impala> show partitions p2;
 +-------+---+-------+--------+------+...
 | y     | z | #Rows | #Files | Size |...
 +-------+---+-------+--------+------+...
 | 0     | 0 | -1    | 1      | 2B   |...
 | 2     | 3 | -1    | 1      | 4B   |...
 | Total |   | -1    | 2      | 6B   |...
 +-------+---+-------+--------+------+...

 -- ... Data is inserted into one of the partitions by some external mechanism ...
 beeline> insert into p2 partition (y = 2, z = 3) values(1000);

 impala> refresh p2 partition (y=2, z=3);
 impala> select x from p where y=2 and z = 3;
 +------+
 | x    |
 +------+
 | 1    | <- Original data created by Impala
 | 2    | <- Original data created by Impala
 | 1000 | <- Additional data inserted through Beeline
 +------+
 ]]>
 </codeblock>

     <p rev="IMPALA-1683">
       The following examples show how specifying a nonexistent partition does not cause any error,
       and the order of the partition key columns does not have to match the column order in the table.
       The partition spec must include all the partition key columns; specifying an incomplete set of
       columns does cause an error.
     </p>

 <codeblock rev="IMPALA-1683"><![CDATA[
 -- Partition doesn't exist.
 refresh p2 partition (y=0, z=3);
 refresh p2 partition (y=0, z=-1)
 -- Key columns specified in a different order than the table definition.
 refresh p2 partition (z=1, y=0)
 -- Incomplete partition spec causes an error.
 refresh p2 partition (y=0)
 ERROR: AnalysisException: Items in partition spec must exactly match the partition columns in the table definition: default.p2 (1 vs 2)
 ]]>
 </codeblock>

     <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>

     <p conref="../shared/impala_common.xml#common/example_blurb"/>

     <p>
       The following example shows how you might use the <codeph>REFRESH</codeph> statement after manually adding
       new HDFS data files to the Impala data directory for a table:
     </p>

 <codeblock>[impalad-host:21000] &gt; refresh t1;
 [impalad-host:21000] &gt; refresh t2;
 [impalad-host:21000] &gt; select * from t1;
 ...
 [impalad-host:21000] &gt; select * from t2;
 ... </codeblock>

     <p>
       For more examples of using <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> with a
       combination of Impala and Hive operations, see <xref href="impala_tutorial.xml#tutorial_impala_hive"/>.
     </p>

     <p>
       <b>Related impala-shell options:</b>
     </p>

     <p rev="1.1">
       The <cmdname>impala-shell</cmdname> option <codeph>-r</codeph> issues an <codeph>INVALIDATE METADATA</codeph> statement
       when starting up the shell, effectively performing a <codeph>REFRESH</codeph> of all tables.
       Due to the expense of reloading the metadata for all tables, the <cmdname>impala-shell</cmdname> <codeph>-r</codeph>
       option is not recommended for day-to-day use in a production environment. (This option was mainly intended as a workaround
       for synchronization issues in very old Impala versions.)
     </p>

     <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
     <p rev="">
       The user ID that the <cmdname>impalad</cmdname> daemon runs under,
       typically the <codeph>impala</codeph> user, must have execute
       permissions for all the relevant directories holding table data.
       (A table could have data spread across multiple directories,
       or in unexpected paths, if it uses partitioning or
       specifies a <codeph>LOCATION</codeph> attribute for
       individual partitions or the entire table.)
       Issues with permissions might not cause an immediate error for this statement,
       but subsequent statements such as <codeph>SELECT</codeph>
       or <codeph>SHOW TABLE STATS</codeph> could fail.
     </p>
     <p rev="IMPALA-1683">
       All HDFS and Sentry permissions and privileges are the same whether you refresh the entire table
       or a single partition.
     </p>

     <p conref="../shared/impala_common.xml#common/hdfs_blurb"/>

     <p>
       The <codeph>REFRESH</codeph> command checks HDFS permissions of the underlying data files and directories,
       caching this information so that a statement can be cancelled immediately if for example the
       <codeph>impala</codeph> user does not have permission to write to the data directory for the table. Impala
       reports any lack of write permissions as an <codeph>INFO</codeph> message in the log file, in case that
       represents an oversight. If you change HDFS permissions to make data readable or writeable by the Impala
       user, issue another <codeph>REFRESH</codeph> to make Impala aware of the change.
     </p>

     <note conref="../shared/impala_common.xml#common/compute_stats_next"/>

     <p conref="../shared/impala_common.xml#common/s3_blurb"/>
     <p conref="../shared/impala_common.xml#common/s3_metadata"/>

     <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>

     <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
     <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
     <p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>

     <p conref="../shared/impala_common.xml#common/udf_blurb"/>
     <p conref="../shared/impala_common.xml#common/refresh_functions_tip"/>

     <p conref="../shared/impala_common.xml#common/related_info"/>
     <p>
       <xref href="impala_hadoop.xml#intro_metastore"/>,
       <xref href="impala_invalidate_metadata.xml#invalidate_metadata"/>
     </p>
   </conbody>
 </concept>
	<?xml version="1.0" encoding="UTF-8"?>
	<!--
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->
	<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
	<concept id="refresh">

	<title>REFRESH Statement</title>
	<titlealts audience="PDF"><navtitle>REFRESH</navtitle></titlealts>
	<prolog>
	<metadata>
	<data name="Category" value="Impala"/>
	<data name="Category" value="SQL"/>
	<data name="Category" value="DDL"/>
	<data name="Category" value="Tables"/>
	<data name="Category" value="Hive"/>
	<data name="Category" value="Metastore"/>
	<data name="Category" value="ETL"/>
	<data name="Category" value="Ingest"/>
	<data name="Category" value="Developers"/>
	<data name="Category" value="Data Analysts"/>
	</metadata>
	</prolog>

	<conbody>

	<p>
	<indexterm audience="hidden">REFRESH statement</indexterm>
	To accurately respond to queries, the Impala node that acts as the coordinator (the node to which you are
	connected through <cmdname>impala-shell</cmdname>, JDBC, or ODBC) must have current metadata about those
	databases and tables that are referenced in Impala queries. If you are not familiar with the way Impala uses
	metadata and how it shares the same metastore database as Hive, see
	<xref href="impala_hadoop.xml#intro_metastore"/> for background information.
	</p>

	<p conref="../shared/impala_common.xml#common/syntax_blurb"/>

	<codeblock rev="IMPALA-1683">REFRESH [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>key_col1</varname>=<varname>val1</varname> [, <varname>key_col2</varname>=<varname>val2</varname>...])]
	<ph rev="2.9.0 IMPALA-5259">REFRESH FUNCTIONS <varname>db_name</varname></ph>
	</codeblock>

	<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>

	<p>
	Use the <codeph>REFRESH</codeph> statement to load the latest metastore metadata and block location data for
	a particular table in these scenarios:
	</p>

	<ul>
	<li>
	After loading new data files into the HDFS data directory for the table. (Once you have set up an ETL
	pipeline to bring data into Impala on a regular basis, this is typically the most frequent reason why
	metadata needs to be refreshed.)
	</li>

	<li>
	After issuing <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>, or other
	table-modifying SQL statement in Hive.
	</li>
	</ul>

	<note rev="2.3.0">
	<p rev="2.3.0">
	In <keyword keyref="impala23_full"/> and higher, the syntax <codeph>ALTER TABLE <varname>table_name</varname> RECOVER PARTITIONS</codeph>
	is a faster alternative to <codeph>REFRESH</codeph> when the only change to the table data is the addition of
	new partition directories through Hive or manual HDFS operations.
	See <xref href="impala_alter_table.xml#alter_table"/> for details.
	</p>
	</note>

	<p>
	You only need to issue the <codeph>REFRESH</codeph> statement on the node to which you connect to issue
	queries. The coordinator node divides the work among all the Impala nodes in a cluster, and sends read
	requests for the correct HDFS blocks without relying on the metadata on the other nodes.
	</p>

	<p>
	<codeph>REFRESH</codeph> reloads the metadata for the table from the metastore database, and does an
	incremental reload of the low-level block location data to account for any new data files added to the HDFS
	data directory for the table. It is a low-overhead, single-table operation, specifically tuned for the common
	scenario where new data files are added to HDFS.
	</p>

	<p>
	Only the metadata for the specified table is flushed. The table must already exist and be known to Impala,
	either because the <codeph>CREATE TABLE</codeph> statement was run in Impala rather than Hive, or because a
	previous <codeph>INVALIDATE METADATA</codeph> statement caused Impala to reload its entire metadata catalog.
	</p>

	<note>
	<p rev="1.2">
	The catalog service broadcasts any changed metadata as a result of Impala
	<codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph> statements to all
	Impala nodes. Thus, the <codeph>REFRESH</codeph> statement is only required if you load data through Hive
	or by manipulating data files in HDFS directly. See <xref href="impala_components.xml#intro_catalogd"/> for
	more information on the catalog service.
	</p>
	<p rev="1.2.1">
	Another way to avoid inconsistency across nodes is to enable the
	<codeph>SYNC_DDL</codeph> query option before performing a DDL statement or an <codeph>INSERT</codeph> or
	<codeph>LOAD DATA</codeph>.
	</p>
	<p rev="1.1">
	The table name is a required parameter. To flush the metadata for all tables, use the
	<codeph><xref href="impala_invalidate_metadata.xml#invalidate_metadata">INVALIDATE METADATA</xref></codeph>
	command.
	</p>
	<p conref="../shared/impala_common.xml#common/invalidate_then_refresh"/>
	</note>

	<p conref="../shared/impala_common.xml#common/refresh_vs_invalidate"/>

	<p>
	A metadata update for an <codeph>impalad</codeph> instance <b>is</b> required if:
	</p>

	<ul>
	<li>
	A metadata change occurs.
	</li>

	<li>
	<b>and</b> the change is made through Hive.
	</li>

	<li>
	<b>and</b> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly
	connect.
	</li>
	</ul>

	<p rev="1.2">
	A metadata update for an Impala node is <b>not</b> required after you run <codeph>ALTER TABLE</codeph>,
	<codeph>INSERT</codeph>, or other table-modifying statement in Impala rather than Hive. Impala handles the
	metadata synchronization automatically through the catalog service.
	</p>

	<p>
	Database and table metadata is typically modified by:
	</p>

	<ul>
	<li>
	Hive - through <codeph>ALTER</codeph>, <codeph>CREATE</codeph>, <codeph>DROP</codeph> or
	<codeph>INSERT</codeph> operations.
	</li>

	<li>
	Impalad - through <codeph>CREATE TABLE</codeph>, <codeph>ALTER TABLE</codeph>, and <codeph>INSERT</codeph>
	operations. <ph rev="1.2">Such changes are propagated to all Impala nodes by the
	Impala catalog service.</ph>
	</li>
	</ul>

	<p>
	<codeph>REFRESH</codeph> causes the metadata for that table to be immediately reloaded. For a huge table,
	that process could take a noticeable amount of time; but doing the refresh up front avoids an unpredictable
	delay later, for example if the next reference to the table is during a benchmark test.
	</p>

	<p rev="IMPALA-1683">
	<b>Refreshing a single partition:</b>
	</p>

	<p rev="IMPALA-1683">
	In <keyword keyref="impala27_full"/> and higher, the <codeph>REFRESH</codeph> statement can apply to a single partition at a time,
	rather than the whole table. Include the optional <codeph>PARTITION (<varname>partition_spec</varname>)</codeph>
	clause and specify values for each of the partition key columns.
	</p>

	<p rev="IMPALA-1683">
	The following examples show how to make Impala aware of data added to a single partition, after data is loaded into
	a partition's data directory using some mechanism outside Impala, such as Hive or Spark. The partition can be one that
	Impala created and is already aware of, or a new partition created through Hive.
	</p>

	<codeblock rev="IMPALA-1683"><![CDATA[
	impala> create table p (x int) partitioned by (y int);
	impala> insert into p (x,y) values (1,2), (2,2), (2,1);
	impala> show partitions p;
	+-------+-------+--------+------+...
	\| y \| #Rows \| #Files \| Size \|...
	+-------+-------+--------+------+...
	\| 1 \| -1 \| 1 \| 2B \|...
	\| 2 \| -1 \| 1 \| 4B \|...
	\| Total \| -1 \| 2 \| 6B \|...
	+-------+-------+--------+------+...

	-- ... Data is inserted into one of the partitions by some external mechanism ...
	beeline> insert into p partition (y = 1) values(1000);

	impala> refresh p partition (y=1);
	impala> select x from p where y=1;
	+------+
	\| x \|
	+------+
	\| 2 \| <- Original data created by Impala
	\| 1000 \| <- Additional data inserted through Beeline
	+------+
	]]>
	</codeblock>

	<p rev="IMPALA-1683">
	The same applies for tables with more than one partition key column.
	The <codeph>PARTITION</codeph> clause of the <codeph>REFRESH</codeph>
	statement must include all the partition key columns.
	</p>

	<codeblock rev="IMPALA-1683"><![CDATA[
	impala> create table p2 (x int) partitioned by (y int, z int);
	impala> insert into p2 (x,y,z) values (0,0,0), (1,2,3), (2,2,3);
	impala> show partitions p2;
	+-------+---+-------+--------+------+...
	\| y \| z \| #Rows \| #Files \| Size \|...
	+-------+---+-------+--------+------+...
	\| 0 \| 0 \| -1 \| 1 \| 2B \|...
	\| 2 \| 3 \| -1 \| 1 \| 4B \|...
	\| Total \| \| -1 \| 2 \| 6B \|...
	+-------+---+-------+--------+------+...

	-- ... Data is inserted into one of the partitions by some external mechanism ...
	beeline> insert into p2 partition (y = 2, z = 3) values(1000);

	impala> refresh p2 partition (y=2, z=3);
	impala> select x from p where y=2 and z = 3;
	+------+
	\| x \|
	+------+
	\| 1 \| <- Original data created by Impala
	\| 2 \| <- Original data created by Impala
	\| 1000 \| <- Additional data inserted through Beeline
	+------+
	]]>
	</codeblock>

	<p rev="IMPALA-1683">
	The following examples show how specifying a nonexistent partition does not cause any error,
	and the order of the partition key columns does not have to match the column order in the table.
	The partition spec must include all the partition key columns; specifying an incomplete set of
	columns does cause an error.
	</p>

	<codeblock rev="IMPALA-1683"><![CDATA[
	-- Partition doesn't exist.
	refresh p2 partition (y=0, z=3);
	refresh p2 partition (y=0, z=-1)
	-- Key columns specified in a different order than the table definition.
	refresh p2 partition (z=1, y=0)
	-- Incomplete partition spec causes an error.
	refresh p2 partition (y=0)
	ERROR: AnalysisException: Items in partition spec must exactly match the partition columns in the table definition: default.p2 (1 vs 2)
	]]>
	</codeblock>

	<p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>

	<p conref="../shared/impala_common.xml#common/example_blurb"/>

	<p>
	The following example shows how you might use the <codeph>REFRESH</codeph> statement after manually adding
	new HDFS data files to the Impala data directory for a table:
	</p>

	<codeblock>[impalad-host:21000] > refresh t1;
	[impalad-host:21000] > refresh t2;
	[impalad-host:21000] > select * from t1;
	...
	[impalad-host:21000] > select * from t2;
	... </codeblock>

	<p>
	For more examples of using <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> with a
	combination of Impala and Hive operations, see <xref href="impala_tutorial.xml#tutorial_impala_hive"/>.
	</p>

	<p>
	<b>Related impala-shell options:</b>
	</p>

	<p rev="1.1">
	The <cmdname>impala-shell</cmdname> option <codeph>-r</codeph> issues an <codeph>INVALIDATE METADATA</codeph> statement
	when starting up the shell, effectively performing a <codeph>REFRESH</codeph> of all tables.
	Due to the expense of reloading the metadata for all tables, the <cmdname>impala-shell</cmdname> <codeph>-r</codeph>
	option is not recommended for day-to-day use in a production environment. (This option was mainly intended as a workaround
	for synchronization issues in very old Impala versions.)
	</p>

	<p conref="../shared/impala_common.xml#common/permissions_blurb"/>
	<p rev="">
	The user ID that the <cmdname>impalad</cmdname> daemon runs under,
	typically the <codeph>impala</codeph> user, must have execute
	permissions for all the relevant directories holding table data.
	(A table could have data spread across multiple directories,
	or in unexpected paths, if it uses partitioning or
	specifies a <codeph>LOCATION</codeph> attribute for
	individual partitions or the entire table.)
	Issues with permissions might not cause an immediate error for this statement,
	but subsequent statements such as <codeph>SELECT</codeph>
	or <codeph>SHOW TABLE STATS</codeph> could fail.
	</p>
	<p rev="IMPALA-1683">
	All HDFS and Sentry permissions and privileges are the same whether you refresh the entire table
	or a single partition.
	</p>

	<p conref="../shared/impala_common.xml#common/hdfs_blurb"/>

	<p>
	The <codeph>REFRESH</codeph> command checks HDFS permissions of the underlying data files and directories,
	caching this information so that a statement can be cancelled immediately if for example the
	<codeph>impala</codeph> user does not have permission to write to the data directory for the table. Impala
	reports any lack of write permissions as an <codeph>INFO</codeph> message in the log file, in case that
	represents an oversight. If you change HDFS permissions to make data readable or writeable by the Impala
	user, issue another <codeph>REFRESH</codeph> to make Impala aware of the change.
	</p>

	<note conref="../shared/impala_common.xml#common/compute_stats_next"/>

	<p conref="../shared/impala_common.xml#common/s3_blurb"/>
	<p conref="../shared/impala_common.xml#common/s3_metadata"/>

	<p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>

	<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
	<p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
	<p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>

	<p conref="../shared/impala_common.xml#common/udf_blurb"/>
	<p conref="../shared/impala_common.xml#common/refresh_functions_tip"/>

	<p conref="../shared/impala_common.xml#common/related_info"/>
	<p>
	<xref href="impala_hadoop.xml#intro_metastore"/>,
	<xref href="impala_invalidate_metadata.xml#invalidate_metadata"/>
	</p>
	</conbody>
	</concept>