<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="refresh">
<title>REFRESH Statement</title>
<titlealts audience="PDF">
<navtitle>REFRESH</navtitle>
</titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="SQL"/>
<data name="Category" value="DDL"/>
<data name="Category" value="Tables"/>
<data name="Category" value="Hive"/>
<data name="Category" value="Metastore"/>
<data name="Category" value="ETL"/>
<data name="Category" value="Ingest"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p>
The <codeph>REFRESH</codeph> statement reloads the metadata for the table from the
metastore database and does an incremental reload of the file and block metadata from the
HDFS NameNode. <codeph>REFRESH</codeph> is used to avoid inconsistencies between Impala
and external metadata sources, namely Hive Metastore (HMS) and NameNodes.
</p>
<p> The <codeph>REFRESH</codeph> statement is only required if you load data
from outside of Impala. Updated metadata, as a result of running
<codeph>REFRESH</codeph>, is broadcast to all Impala coordinators. </p>
<p>
See <xref href="impala_hadoop.xml#intro_metastore"/> for information about how Impala
uses metadata and how it shares the same metastore database as Hive.
</p>
<p>
Once issued, the <codeph>REFRESH</codeph> statement cannot be cancelled.
</p>
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
<codeblock rev="IMPALA-1683">REFRESH [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>key_col1</varname>=<varname>val1</varname> [, <varname>key_col2</varname>=<varname>val2</varname>...])]</codeblock>
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
<p>
The table name is a required parameter, and the table must already exist and be known to
Impala.
</p>
<p>
Only the metadata for the specified table is reloaded.
</p>
<p>
Use the <codeph>REFRESH</codeph> statement to load the latest metastore metadata for a
particular table after one of the following scenarios happens outside of Impala:
</p>
<ul>
<li> Deleting, adding, or modifying files. <p> For example, after loading
new data files into the HDFS data directory for the table, appending
to an existing HDFS file, or inserting data from Hive via
<codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph>. </p>
</li>
<li>
Deleting, adding, or modifying partitions.
<p>
For example, after issuing <codeph>ALTER TABLE</codeph> or another table-modifying SQL
statement in Hive.
</p>
</li>
</ul>
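<p>
As an illustrative sketch of the first scenario, assume a hypothetical table named
<codeph>sales_db.web_logs</codeph> whose data files were added by a Hive job or copied
directly into HDFS. The table name is an assumption used only for this example.
</p>
<codeblock>-- Hypothetical table name, used only for illustration.
-- Run this after data files were added or changed outside of Impala.
REFRESH sales_db.web_logs;
-- Queries issued after the refresh should see the newly added data.
SELECT COUNT(*) FROM sales_db.web_logs;
</codeblock>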
<note rev="2.3.0">
<p rev="2.3.0">
In <keyword keyref="impala23_full"/> and higher, the <codeph>ALTER TABLE
<varname>table_name</varname> RECOVER PARTITIONS</codeph> statement is a faster
alternative to <codeph>REFRESH</codeph> when you are only adding new partition
directories through Hive or manual HDFS operations. See
<xref
href="impala_alter_table.xml#alter_table"/> for details.
</p>
</note>
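<p>
For instance, assuming a hypothetical partitioned table named <codeph>web_logs</codeph>
whose new partition directories were created directly in HDFS, the alternative described
in the note might be used as follows. The table name is an assumption for illustration.
</p>
<codeblock>-- Hypothetical table; new partition directories were created outside of Impala.
ALTER TABLE web_logs RECOVER PARTITIONS;
-- Check which partitions Impala now recognizes.
SHOW PARTITIONS web_logs;
</codeblock>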
<p conref="../shared/impala_common.xml#common/refresh_vs_invalidate"/>
<p rev="IMPALA-1683">
<b>Refreshing a single partition:</b>
</p>
<p rev="IMPALA-1683">
In <keyword keyref="impala27_full"/> and higher, the <codeph>REFRESH</codeph> statement
can apply to a single partition at a time, rather than the whole table. Include the
optional <codeph>PARTITION (<varname>partition_spec</varname>)</codeph> clause and specify
values for each of the partition key columns.
</p>
<p>
The following rules apply:
<ul>
<li>
The <codeph>PARTITION</codeph> clause of the <codeph>REFRESH</codeph> statement must
include all the partition key columns.
</li>
<li>
The order of the partition key columns does not have to match the column order in the
table.
</li>
<li>
Specifying a nonexistent partition does not cause an error.
</li>
<li>
The partition can be one that Impala created and is already aware of, or a new
partition created through Hive.
</li>
</ul>
</p>
<p rev="IMPALA-1683">
The following examples demonstrate the above rules.
</p>
<codeblock rev="IMPALA-1683"><![CDATA[
-- Partition doesn't exist; no error is returned.
refresh p2 partition (y=0, z=3);
refresh p2 partition (y=0, z=-1);
-- Key columns specified in a different order than the table definition.
refresh p2 partition (z=1, y=0);
-- Incomplete partition spec causes an error.
refresh p2 partition (y=0);
ERROR: AnalysisException: Items in partition spec must exactly match the partition columns in the table definition: default.p2 (1 vs 2)
]]>
</codeblock>
<p>
For examples of using <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph>
with a combination of Impala and Hive operations, see
<xref href="impala_tutorial.xml#tutorial_impala_hive"/>.
</p>
<p>
<b>Related impala-shell options:</b>
</p>
<p rev="1.1">
Due to the expense of reloading the metadata for all tables, the
<cmdname>impala-shell</cmdname> <codeph>-r</codeph> option is not recommended.
</p>
<p conref="../shared/impala_common.xml#common/permissions_blurb"/>
<p rev="IMPALA-1683">
All HDFS and Sentry permissions and privilege requirements are the same whether you
refresh the entire table or a single partition.
</p>
<p conref="../shared/impala_common.xml#common/hdfs_blurb"/>
<p>
The <codeph>REFRESH</codeph> statement checks HDFS permissions of the underlying data
files and directories and caches this information, so that a statement can be cancelled
immediately if, for example, the <codeph>impala</codeph> user does not have permission to
write to the data directory for the table. Impala reports any lack of write permissions as
an <codeph>INFO</codeph> message in the log file.
</p>
<p>
If you change HDFS permissions to make data readable or writeable by the Impala user,
issue another <codeph>REFRESH</codeph> to make Impala aware of the change.
</p>
<p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
<p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_hadoop.xml#intro_metastore"/>,
<xref href="impala_invalidate_metadata.xml#invalidate_metadata"/>
</p>
</conbody>
</concept>