blob: 81924c7c9de99dcf735895d2cefd7c1a5c5ce26a [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="copyright" content="(C) Copyright 2025" />
<meta name="DC.rights.owner" content="(C) Copyright 2025" />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="REFRESH Statement" />
<meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html" />
<meta name="prodname" content="Impala" />
<meta name="prodname" content="Impala" />
<meta name="version" content="Impala 3.4.x" />
<meta name="version" content="Impala 3.4.x" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="refresh" />
<link rel="stylesheet" type="text/css" href="../commonltr.css" />
<title>REFRESH Statement</title>
</head>
<body id="refresh">
<h1 class="title topictitle1" id="ariaid-title1">REFRESH Statement</h1>
<div class="body conbody">
<p class="p">
The <code class="ph codeph">REFRESH</code> statement reloads the metadata for the table from the
metastore database and does an incremental reload of the file and block metadata from the
HDFS NameNode. <code class="ph codeph">REFRESH</code> is used to avoid inconsistencies between Impala
and external metadata sources, namely Hive Metastore (HMS) and NameNodes.
</p>
<p class="p"> The <code class="ph codeph">REFRESH</code> statement is only required if you load data
from outside of Impala. Updated metadata, as a result of running
<code class="ph codeph">REFRESH</code>, is broadcast to all Impala coordinators. </p>
<p class="p">
See <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a> for the information about the way
Impala uses metadata and how it shares the same metastore database as Hive.
</p>
<p class="p">
Once issued, the <code class="ph codeph">REFRESH</code> statement cannot be cancelled.
</p>
<p class="p">
<strong class="ph b">Syntax:</strong>
</p>
<pre class="pre codeblock"><code>REFRESH [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">key_col1</var>=<var class="keyword varname">val1</var> [, <var class="keyword varname">key_col2</var>=<var class="keyword varname">val2</var>...])]</code></pre>
<p class="p">
<strong class="ph b">Usage notes:</strong>
</p>
<p class="p">
The table name is a required parameter, and the table must already exist and be known to
Impala.
</p>
<p class="p">
Only the metadata for the specified table is reloaded.
</p>
<p class="p">
Use the <code class="ph codeph">REFRESH</code> statement to load the latest metastore metadata for a
particular table after one of the following scenarios happens outside of Impala:
</p>
<ul class="ul">
<li class="li"> Deleting, adding, or modifying files. <p class="p"> For example, after loading
new data files into the HDFS data directory for the table, appending
to an existing HDFS file, inserting data from Hive via
<code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code>. </p>
</li>
<li class="li">
Deleting, adding, or modifying partitions.
<p class="p">
For example, after issuing <code class="ph codeph">ALTER TABLE</code> or other table-modifying SQL
statement in Hive
</p>
</li>
</ul>
<div class="note note"><span class="notetitle">Note:</span>
<p class="p">
In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">ALTER TABLE
<var class="keyword varname">table_name</var> RECOVER PARTITIONS</code> statement is a faster
alternative to <code class="ph codeph">REFRESH</code> when you are only adding new partition
directories through Hive or manual HDFS operations. See
<a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for details.
</p>
</div>
<div class="p">
<code class="ph codeph">INVALIDATE METADATA</code> and <code class="ph codeph">REFRESH</code> are counterparts:
<ul class="ul">
<li class="li">
<code class="ph codeph">INVALIDATE METADATA</code> is an asynchronous operations that simply
discards the loaded metadata from the catalog and coordinator caches. After that
operation, the catalog and all the Impala coordinators only know about the existence
of databases and tables and nothing more. Metadata loading for tables is triggered
by any subsequent queries.
</li>
<li class="li">
<code class="ph codeph">REFRESH</code> reloads the metadata synchronously.
<code class="ph codeph">REFRESH</code> is more lightweight than doing a full metadata load after a
table has been invalidated. <code class="ph codeph">REFRESH</code> cannot detect changes in block
locations triggered by operations like HDFS balancer, hence causing remote reads
during query execution with negative performance implications.
</li>
</ul>
</div>
<p class="p">
<strong class="ph b">Refreshing a single partition:</strong>
</p>
<p class="p">
In <span class="keyword">Impala 2.7</span> and higher, the <code class="ph codeph">REFRESH</code> statement
can apply to a single partition at a time, rather than the whole table. Include the
optional <code class="ph codeph">PARTITION (<var class="keyword varname">partition_spec</var>)</code> clause and specify
values for each of the partition key columns.
</p>
<div class="p">
The following rules apply:
<ul class="ul">
<li class="li">
The <code class="ph codeph">PARTITION</code> clause of the <code class="ph codeph">REFRESH</code> statement must
include all the partition key columns.
</li>
<li class="li">
The order of the partition key columns does not have to match the column order in the
table.
</li>
<li class="li">
Specifying a nonexistent partition does not cause an error.
</li>
<li class="li">
The partition can be one that Impala created and is already aware of, or a new
partition created through Hive.
</li>
</ul>
</div>
<p class="p">
The following examples demonstrates the above rules.
</p>
<pre class="pre codeblock"><code>
-- Partition doesn't exist.
refresh p2 partition (y=0, z=3);
refresh p2 partition (y=0, z=-1)
-- Key columns specified in a different order than the table definition.
refresh p2 partition (z=1, y=0)
-- Incomplete partition spec causes an error.
refresh p2 partition (y=0)
ERROR: AnalysisException: Items in partition spec must exactly match the partition columns in the table definition: default.p2 (1 vs 2)
</code></pre>
<p class="p">
For examples of using <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
with a combination of Impala and Hive operations, see
<a class="xref" href="impala_tutorial.html#tutorial_impala_hive">Switching Back and Forth Between Impala and Hive</a>.
</p>
<p class="p">
<strong class="ph b">Related impala-shell options:</strong>
</p>
<p class="p">
Due to the expense of reloading the metadata for all tables, the
<span class="keyword cmdname">impala-shell</span> <code class="ph codeph">-r</code> option is not recommended.
</p>
<p class="p">
<strong class="ph b">HDFS permissions:</strong>
</p>
<p class="p">
All HDFS and Ranger permissions and privilege requirements are the same whether you
refresh the entire table or a single partition.
</p>
<p class="p">
<strong class="ph b">HDFS considerations:</strong>
</p>
<p class="p">
The <code class="ph codeph">REFRESH</code> statement checks HDFS permissions of the underlying data
files and directories, caching this information so that a statement can be cancelled
immediately if for example the <code class="ph codeph">impala</code> user does not have permission to
write to the data directory for the table. Impala reports any lack of write permissions as
an <code class="ph codeph">INFO</code> message in the log file.
</p>
<p class="p">
If you change HDFS permissions to make data readable or writeable by the Impala user,
issue another <code class="ph codeph">REFRESH</code> to make Impala aware of the change.
</p>
<p class="p">
<strong class="ph b">Kudu considerations:</strong>
</p>
<p class="p">By default, much of the metadata
for Kudu tables is handled by the underlying storage layer. Kudu tables
have less reliance on the Metastore database, and require less metadata
caching on the Impala side. For example, information about partitions in
Kudu tables is managed by Kudu, and Impala does not cache any block
locality metadata for Kudu tables. If the Kudu service is not integrated
with the Hive Metastore, Impala will manage Kudu table metadata in the
Hive Metastore.</p>
<p class="p">
The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are
needed less frequently for Kudu tables than for HDFS-backed tables. Neither statement is
needed when data is added to, removed, or updated in a Kudu table, even if the changes
are made directly to Kudu through a client program using the Kudu API. Run
<code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or <code class="ph codeph">INVALIDATE METADATA
<var class="keyword varname">table_name</var></code> for a Kudu table only after making a change to
the Kudu table schema, such as adding or dropping a column.
</p>
<p class="p">
<strong class="ph b">Related information:</strong>
</p>
<p class="p">
<a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a>,
<a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a>
</p>
</div>
<div class="related-links">
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div>
</div>
</div></body>
</html>