| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE html |
| PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> |
| |
| <meta name="copyright" content="(C) Copyright 2023" /> |
| <meta name="DC.rights.owner" content="(C) Copyright 2023" /> |
| <meta name="DC.Type" content="concept" /> |
| <meta name="DC.Title" content="PARQUET_ARRAY_RESOLUTION Query Option (Impala 2.9 or higher only)" /> |
| <meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html" /> |
| <meta name="prodname" content="Impala" /> |
| <meta name="version" content="Impala 3.4.x" /> |
| <meta name="DC.Format" content="XHTML" /> |
| <meta name="DC.Identifier" content="parquet_array_resolution" /> |
| <link rel="stylesheet" type="text/css" href="../commonltr.css" /> |
| <title>PARQUET_ARRAY_RESOLUTION Query Option (Impala 2.9 or higher only)</title> |
| </head> |
| <body id="parquet_array_resolution"> |
| |
| |
| <h1 class="title topictitle1" id="ariaid-title1"> |
| PARQUET_ARRAY_RESOLUTION Query Option (<span class="keyword">Impala 2.9</span> or higher only) |
| </h1> |
| |
| |
| |
| |
| |
| |
| <div class="body conbody"> |
| |
| <p class="p"> |
| The <code class="ph codeph">PARQUET_ARRAY_RESOLUTION</code> query option controls the |
| behavior of the index-based resolution of nested arrays in Parquet files. |
| </p> |
| |
| |
| <p class="p"> |
| In Parquet, an array can be represented with either a 2-level or a 3-level |
| encoding. The modern, standard representation is 3-level; the legacy |
| 2-level scheme is supported for compatibility with older Parquet files. |
| However, Parquet files contain no reliable metadata indicating which |
| encoding was used, and a file with multiple arrays can even mix both |
| encodings. The <code class="ph codeph">PARQUET_ARRAY_RESOLUTION</code> option |
| controls how arrays are handled during resolution, the process of matching |
| every column or field reference in a query to a column in the Parquet file.</p> |
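| |
| <p class="p"> |
| As an illustration, the following schemas show an array column of 32-bit |
| integers in each encoding, following the list encodings described in the |
| Apache Parquet format specification. The schema and field names are hypothetical. |
| In the 3-level form, a <code class="ph codeph">LIST</code>-annotated group wraps a repeated |
| group, which in turn holds the element; in the 2-level form, the repeated field is |
| the element itself: |
| </p> |
| |
| <pre class="pre codeblock"><code> |
| ThreeLevelExample { |
|   optional group my_list (LIST) { |
|     repeated group list { |
|       optional int32 element; |
|     } |
|   } |
| } |
| |
| TwoLevelExample { |
|   optional group my_list (LIST) { |
|     repeated int32 element; |
|   } |
| } |
| </code></pre> |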
| |
| |
| <p class="p"> |
| The supported values for the query option are: |
| </p> |
| |
| |
| <ul class="ul"> |
| <li class="li"> |
| <code class="ph codeph">THREE_LEVEL</code>: Assumes arrays are encoded with the 3-level |
| representation, and does not attempt the 2-level resolution. |
| </li> |
| |
| |
| <li class="li"> |
| <code class="ph codeph">TWO_LEVEL</code>: Assumes arrays are encoded with the 2-level |
| representation, and does not attempt the 3-level resolution. |
| </li> |
| |
| |
| <li class="li"> |
| <code class="ph codeph">TWO_LEVEL_THEN_THREE_LEVEL</code>: First tries to resolve |
| assuming a 2-level representation, and if unsuccessful, tries a 3-level |
| representation. |
| </li> |
| |
| </ul> |
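| |
| <p class="p"> |
| Like other query options, <code class="ph codeph">PARQUET_ARRAY_RESOLUTION</code> is set |
| with the <code class="ph codeph">SET</code> statement and applies to subsequent queries in the |
| session. A minimal sketch, one statement per supported value: |
| </p> |
| |
| <pre class="pre codeblock"><code> |
| SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL; |
| SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL; |
| SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL_THEN_THREE_LEVEL; |
| </code></pre> |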
| |
| |
| <p class="p"> |
| All of the above options resolve arrays encoded with a single level. |
| </p> |
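| |
| <p class="p"> |
| A single-level array is a repeated field with no enclosing |
| <code class="ph codeph">LIST</code>-annotated group. Under the backward-compatibility rules in |
| the Apache Parquet format specification, such a field is read as a list of required |
| elements of the field's own type. For example, in the following schema (names |
| hypothetical), <code class="ph codeph">numbers</code> would correspond to an |
| <code class="ph codeph">ARRAY&lt;INT&gt;</code> table column: |
| </p> |
| |
| <pre class="pre codeblock"><code> |
| SingleLevelExample { |
|   repeated int32 numbers; |
| } |
| </code></pre> |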
| |
| |
| <p class="p"> |
| A failure to resolve a column or field reference under a given array |
| resolution policy does not necessarily cause the query to return a warning |
| or error. A mismatch might be treated like a missing column (the query |
| returns <code class="ph codeph">NULL</code> values), and it is not possible to reliably |
| distinguish a bad resolution from a legitimately missing column. |
| </p> |
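| |
| <p class="p"> |
| Because a mismatch can be silent, one heuristic (not a guaranteed check) is to |
| query an array of scalars under two policies and compare. In the sketch below, the |
| table <code class="ph codeph">t</code> and its <code class="ph codeph">col1</code> array column are |
| placeholders. <code class="ph codeph">COUNT(ITEM)</code> counts only non-<code class="ph codeph">NULL</code> |
| elements, so a result of 0 under one policy and a positive count under another |
| suggests a resolution mismatch rather than genuinely missing data: |
| </p> |
| |
| <pre class="pre codeblock"><code> |
| -- Count non-NULL elements of the placeholder array column t.col1 under each policy. |
| SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL; |
| SELECT COUNT(ITEM) FROM t.col1; |
| |
| SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL; |
| SELECT COUNT(ITEM) FROM t.col1; |
| </code></pre> |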
| |
| |
| <p class="p"> |
| The name-based policy generally avoids the problem of ambiguous array |
| representations. To use the name-based policy, set the |
| <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code> query option to |
| <code class="ph codeph">NAME</code>. |
| </p> |
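| |
| <p class="p"> |
| For example, the following statement switches the current session to name-based |
| resolution. With this policy, the column and field names in the table definition |
| must match those in the Parquet file for the references to resolve: |
| </p> |
| |
| <pre class="pre codeblock"><code> |
| SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME; |
| </code></pre> |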
| |
| |
| <p class="p"> |
| <strong class="ph b">Type:</strong> Enum of <code class="ph codeph">TWO_LEVEL</code>, |
| <code class="ph codeph">TWO_LEVEL_THEN_THREE_LEVEL</code>, and |
| <code class="ph codeph">THREE_LEVEL</code> |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Default:</strong> <code class="ph codeph">THREE_LEVEL</code> |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span> |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Examples:</strong> |
| </p> |
| |
| |
| <p class="p"> |
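| EXAMPLE A: The following Parquet file schema can be interpreted as either a |
| 2-level or a 3-level array. In the 2-level reading, the repeated group |
| <code class="ph codeph">single_element_group</code> is itself the array element (a struct with one field); |
| in the 3-level reading, it is only the repetition level, and <code class="ph codeph">count</code> is the element: |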
| </p> |
| |
| |
| <pre class="pre codeblock"><code> |
| ParquetSchemaExampleA { |
|   optional group single_element_groups (LIST) { |
|     repeated group single_element_group { |
|       required int64 count; |
|     } |
|   } |
| } |
| </code></pre> |
| |
| <p class="p"> |
| The following table schema corresponds to a 2-level interpretation: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code> |
| CREATE TABLE t (col1 array&lt;struct&lt;f1: bigint&gt;&gt;) STORED AS PARQUET; |
| </code></pre> |
| |
| <p class="p"> |
| Successful query with a 2-level interpretation: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code> |
| SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL; |
| SELECT ITEM.f1 FROM t.col1; |
| </code></pre> |
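| |
| <p class="p"> |
| Because <code class="ph codeph">TWO_LEVEL_THEN_THREE_LEVEL</code> tries the 2-level |
| resolution first, the same query against this table would also be expected to |
| succeed under that policy: |
| </p> |
| |
| <pre class="pre codeblock"><code> |
| SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL_THEN_THREE_LEVEL; |
| SELECT ITEM.f1 FROM t.col1; |
| </code></pre> |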
| |
| <p class="p"> |
| The following table schema corresponds to a 3-level interpretation: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code> |
| CREATE TABLE t (col1 array&lt;bigint&gt;) STORED AS PARQUET; |
| </code></pre> |
| |
| <p class="p"> |
| Successful query with a 3-level interpretation: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code> |
| SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL; |
| SELECT ITEM FROM t.col1; |
| </code></pre> |
| |
| <p class="p"> |
| EXAMPLE B: The following Parquet file schema can only be successfully |
| interpreted as a 2-level array, because its repeated field is a primitive rather than a group: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code> |
| ParquetSchemaExampleB { |
|   required group list_of_ints (LIST) { |
|     repeated int32 list_of_ints_tuple; |
|   } |
| } |
| </code></pre> |
| |
| <p class="p"> |
| The following table schema corresponds to a 2-level interpretation: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code> |
| CREATE TABLE t (col1 array&lt;int&gt;) STORED AS PARQUET; |
| </code></pre> |
| |
| <p class="p"> |
| Successful query with a 2-level interpretation: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code> |
| SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL; |
| SELECT ITEM FROM t.col1; |
| </code></pre> |
| |
| <p class="p"> |
| Unsuccessful query with a 3-level interpretation. The query returns |
| <code class="ph codeph">NULL</code>s as if the column were missing from the file: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code> |
| SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL; |
| SELECT ITEM FROM t.col1; |
| </code></pre> |
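| |
| <p class="p"> |
| Under <code class="ph codeph">TWO_LEVEL_THEN_THREE_LEVEL</code>, the 2-level attempt comes |
| first and matches this file, so the query would be expected to return the array |
| values rather than <code class="ph codeph">NULL</code>s: |
| </p> |
| |
| <pre class="pre codeblock"><code> |
| SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL_THEN_THREE_LEVEL; |
| SELECT ITEM FROM t.col1; |
| </code></pre> |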
| |
| </div> |
| |
| |
| <div class="related-links"> |
| <div class="familylinks"> |
| <div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div> |
| </div> |
| </div></body> |
| </html> |