| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE html |
| PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> |
| |
| <meta name="copyright" content="(C) Copyright 2023" /> |
| <meta name="DC.rights.owner" content="(C) Copyright 2023" /> |
| <meta name="DC.Type" content="concept" /> |
| <meta name="DC.Title" content="PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)" /> |
| <meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html" /> |
| <meta name="prodname" content="Impala" /> |
| <meta name="prodname" content="Impala" /> |
| <meta name="version" content="Impala 3.4.x" /> |
| <meta name="version" content="Impala 3.4.x" /> |
| <meta name="DC.Format" content="XHTML" /> |
| <meta name="DC.Identifier" content="parquet_annotate_strings_utf8" /> |
| <link rel="stylesheet" type="text/css" href="../commonltr.css" /> |
| <title>PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</title> |
| </head> |
| <body id="parquet_annotate_strings_utf8"> |
| |
| |
| <h1 class="title topictitle1" id="ariaid-title1">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1> |
| |
| |
| |
| |
| <div class="body conbody"> |
| |
| <p class="p"> |
| |
| Causes Impala <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements |
| to write Parquet files that use the UTF-8 annotation for <code class="ph codeph">STRING</code> columns. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Usage notes:</strong> |
| </p> |
| |
| <p class="p"> |
| By default, Impala represents a <code class="ph codeph">STRING</code> column in Parquet as an unannotated binary field. |
| </p> |
| |
| <p class="p"> |
| Impala always uses the UTF-8 annotation when writing <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> |
| columns to Parquet files. An alternative to using the query option is to cast <code class="ph codeph">STRING</code> |
| values to <code class="ph codeph">VARCHAR</code>. |
| </p> |
| |
| <p class="p"> |
| This option is to help make Impala-written data more interoperable with other data processing engines. |
| Impala itself currently does not support all operations on UTF-8 data. |
| Although data processed by Impala is typically represented in ASCII, it is valid to designate the |
| data as UTF-8 when storing on disk, because ASCII is a subset of UTF-8. |
| </p> |
| |
| <p class="p"> |
| <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and |
| <code class="ph codeph">false</code>; any other value interpreted as <code class="ph codeph">false</code> |
| </p> |
| |
| <p class="p"> |
| <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> |
| statement) |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span> |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Related information:</strong> |
| </p> |
| |
| <p class="p"> |
| <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> |
| </p> |
| |
| |
| </div> |
| |
| <div class="related-links"> |
| <div class="familylinks"> |
| <div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div> |
| </div> |
| </div></body> |
| </html> |