<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="copyright" content="(C) Copyright 2023" />
<meta name="DC.rights.owner" content="(C) Copyright 2023" />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)" />
<meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html" />
<meta name="prodname" content="Impala" />
<meta name="version" content="Impala 3.4.x" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="parquet_annotate_strings_utf8" />
<link rel="stylesheet" type="text/css" href="../commonltr.css" />
<title>PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</title>
</head>
<body id="parquet_annotate_strings_utf8">
<h1 class="title topictitle1" id="ariaid-title1">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
<div class="body conbody">
<p class="p">
Causes Impala <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements
to write Parquet files that use the UTF-8 annotation for <code class="ph codeph">STRING</code> columns.
</p>
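<p class="p">
The following is a minimal sketch of enabling the option for the current session before a
<code class="ph codeph">CREATE TABLE AS SELECT</code> and an <code class="ph codeph">INSERT</code>. The table names
<code class="ph codeph">text_src</code> and <code class="ph codeph">parquet_utf8</code> and the column
<code class="ph codeph">s</code> are hypothetical placeholders, with <code class="ph codeph">s</code> assumed to be a
<code class="ph codeph">STRING</code> column.
</p>
<pre class="pre codeblock"><code>-- Hypothetical tables: text_src (source, STRING column s) and parquet_utf8 (destination).
SET PARQUET_ANNOTATE_STRINGS_UTF8=true;

-- The Parquet files written for this table annotate the STRING column s as UTF-8.
CREATE TABLE parquet_utf8 STORED AS PARQUET AS SELECT s FROM text_src;

-- Later INSERT statements in the same session also write the annotation.
INSERT INTO parquet_utf8 SELECT s FROM text_src;
</code></pre>
<p class="p">
The option affects only the files written while it is in effect; Parquet files that already exist are not rewritten.
</p>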
<p class="p">
<strong class="ph b">Usage notes:</strong>
</p>
<p class="p">
By default, Impala represents a <code class="ph codeph">STRING</code> column in Parquet as an unannotated binary field.
</p>
<p class="p">
Impala always uses the UTF-8 annotation when writing <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
columns to Parquet files. An alternative to using the query option is to cast <code class="ph codeph">STRING</code>
values to <code class="ph codeph">VARCHAR</code>.
</p>
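<p class="p">
A sketch of the cast-based alternative, using the same hypothetical <code class="ph codeph">text_src</code> table.
Impala requires a maximum length for <code class="ph codeph">VARCHAR</code>; the length 256 is an arbitrary
illustrative choice, and values longer than the declared length are truncated by the cast.
</p>
<pre class="pre codeblock"><code>-- No query option needed: VARCHAR columns are always written with the UTF-8 annotation.
-- VARCHAR(256) is an illustrative length; longer values are truncated.
CREATE TABLE parquet_varchar STORED AS PARQUET
  AS SELECT CAST(s AS VARCHAR(256)) AS s FROM text_src;
</code></pre>
<p class="p">
Unlike the query option, this approach changes the column type of the new table to
<code class="ph codeph">VARCHAR(256)</code> rather than <code class="ph codeph">STRING</code>.
</p>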
<p class="p">
This option helps make Impala-written data more interoperable with other data processing engines, such as
engines that would otherwise read an unannotated binary field as raw bytes rather than as a string.
Impala itself currently does not support all operations on UTF-8 data.
Although data processed by Impala is typically represented in ASCII, it is valid to designate the
data as UTF-8 when storing it on disk, because ASCII is a subset of UTF-8.
</p>
<p class="p">
<strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and
<code class="ph codeph">false</code>; any other value is interpreted as <code class="ph codeph">false</code>
</p>
<p class="p">
<strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code>
statement)
</p>
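<p class="p">
A short sketch of the equivalent spellings described under <strong class="ph b">Type</strong>, issued as session-level
<code class="ph codeph">SET</code> statements. Each pair below has the same effect.
</p>
<pre class="pre codeblock"><code>-- Enable the UTF-8 annotation for STRING columns in this session:
SET PARQUET_ANNOTATE_STRINGS_UTF8=1;
SET PARQUET_ANNOTATE_STRINGS_UTF8=true;

-- Return to the default (unannotated binary for STRING columns):
SET PARQUET_ANNOTATE_STRINGS_UTF8=0;
SET PARQUET_ANNOTATE_STRINGS_UTF8=false;
</code></pre>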
<p class="p">
<strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
</p>
<p class="p">
<strong class="ph b">Related information:</strong>
</p>
<p class="p">
<a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
</p>
</div>
<div class="related-links">
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div>
</div>
</div></body>
</html>