<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="copyright" content="(C) Copyright 2023" />
<meta name="DC.rights.owner" content="(C) Copyright 2023" />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)" />
<meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html" />
<meta name="prodname" content="Impala" />
<meta name="version" content="Impala 3.4.x" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="s3_skip_insert_staging" />
<link rel="stylesheet" type="text/css" href="../commonltr.css" />
<title>S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</title>
</head>
<body id="s3_skip_insert_staging">
<h1 class="title topictitle1" id="ariaid-title1">S3_SKIP_INSERT_STAGING Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
<div class="body conbody">
<p class="p">
Speeds up <code class="ph codeph">INSERT</code> operations on tables or partitions residing on the
Amazon S3 filesystem. The tradeoff is the possibility of inconsistent data left behind
if an error occurs partway through the operation.
</p>
<p class="p">
When this option is disabled, Impala write operations to S3 tables and partitions involve a two-stage process.
Impala writes intermediate files to S3, then (because S3 does not provide a <span class="q">"rename"</span>
operation) copies those intermediate files to their final location, making the process
more expensive than on a filesystem that supports renaming or moving files.
When this option is enabled, Impala skips the intermediate files and instead writes the
new data directly to the final destination.
</p>
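<p class="p">
As a minimal sketch of a session that relies on this option, the following
<span class="keyword cmdname">impala-shell</span> statements set it explicitly and then append rows to an S3 table.
The table names <code class="ph codeph">s3_sales</code> and <code class="ph codeph">staging_sales</code> are
hypothetical, chosen only for illustration.
</p>
<pre class="pre codeblock"><code>-- Hypothetical tables: s3_sales resides on S3; staging_sales holds the rows to append.
SET S3_SKIP_INSERT_STAGING=true;

-- With the option enabled, the data files are written directly to the
-- final S3 location rather than to intermediate files first.
INSERT INTO s3_sales SELECT * FROM staging_sales;
</code></pre>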
<p class="p">
<strong class="ph b">Usage notes:</strong>
</p>
<div class="note important"><span class="importanttitle">Important:</span>
<p class="p">
If a host that is participating in the <code class="ph codeph">INSERT</code> operation fails partway through
the query, you might be left with a table or partition that contains some but not all of the
expected data files. Therefore, this option is most appropriate for a development or test
environment where you have the ability to reconstruct the table if a problem during
<code class="ph codeph">INSERT</code> leaves the data in an inconsistent state.
</p>
</div>
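<p class="p">
Where a partially written table would be hard to reconstruct, one possible approach is to turn the
option off for the duration of the load, accepting the slower two-stage write. The following is a
sketch only; the table names <code class="ph codeph">s3_sales</code> and
<code class="ph codeph">staging_sales</code> are hypothetical.
</p>
<pre class="pre codeblock"><code>-- Hypothetical tables: s3_sales resides on S3; staging_sales holds the rows to append.
-- Disable the option so that this INSERT goes through intermediate files.
SET S3_SKIP_INSERT_STAGING=false;
INSERT INTO s3_sales SELECT * FROM staging_sales;

-- Re-enable the option afterward for the rest of the session, if desired.
SET S3_SKIP_INSERT_STAGING=true;
</code></pre>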
<p class="p">
The timing of file deletion during an <code class="ph codeph">INSERT OVERWRITE</code> operation
makes it impractical to write new files to S3 and delete the old files in a single operation.
Therefore, this query option only affects regular <code class="ph codeph">INSERT</code> statements that add
to the existing data in a table, not <code class="ph codeph">INSERT OVERWRITE</code> statements.
Use <code class="ph codeph">TRUNCATE TABLE</code> if you need to remove all contents from an S3 table
before performing a fast <code class="ph codeph">INSERT</code> with this option enabled.
</p>
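<p class="p">
The <code class="ph codeph">TRUNCATE TABLE</code> technique described above might look like the following
sketch. The table names <code class="ph codeph">s3_sales</code> and <code class="ph codeph">staging_sales</code>
are hypothetical. Note that, unlike <code class="ph codeph">INSERT OVERWRITE</code>, this is two separate
statements, so the table is empty between them.
</p>
<pre class="pre codeblock"><code>-- Hypothetical tables: s3_sales resides on S3; staging_sales holds the replacement rows.
SET S3_SKIP_INSERT_STAGING=true;

-- Remove the existing contents first, because INSERT OVERWRITE
-- is not affected by this query option.
TRUNCATE TABLE s3_sales;

-- A regular INSERT then writes the new data directly to the final location.
INSERT INTO s3_sales SELECT * FROM staging_sales;
</code></pre>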
<p class="p">
Performance improvements with this option enabled can be substantial. The speed increase
might be more noticeable for non-partitioned tables than for partitioned tables.
</p>
<p class="p">
<strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and
<code class="ph codeph">false</code>; any other value is interpreted as <code class="ph codeph">false</code>
</p>
<p class="p">
<strong class="ph b">Default:</strong> <code class="ph codeph">true</code> (shown as 1 in output of <code class="ph codeph">SET</code>
statement)
</p>
<p class="p">
<strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
</p>
<p class="p">
<strong class="ph b">Related information:</strong>
</p>
<p class="p">
<a class="xref" href="impala_s3.html#s3">Using Impala with Amazon S3 Object Store</a>
</p>
</div>
<div class="related-links">
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div>
</div>
</div></body>
</html>