| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE html |
| PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> |
| |
| <meta name="copyright" content="(C) Copyright 2023" /> |
| <meta name="DC.rights.owner" content="(C) Copyright 2023" /> |
| <meta name="DC.Type" content="concept" /> |
| <meta name="DC.Title" content="PARQUET_READ_STATISTICS Query Option (Impala 2.9 or higher only)" /> |
| <meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html" /> |
| <meta name="prodname" content="Impala" /> |
| <meta name="version" content="Impala 3.4.x" /> |
| <meta name="DC.Format" content="XHTML" /> |
| <meta name="DC.Identifier" content="parquet_read_statistics" /> |
| <link rel="stylesheet" type="text/css" href="../commonltr.css" /> |
| <title>PARQUET_READ_STATISTICS Query Option (Impala 2.9 or higher only)</title> |
| </head> |
| <body id="parquet_read_statistics"> |
| |
| |
| <h1 class="title topictitle1" id="ariaid-title1">PARQUET_READ_STATISTICS Query Option (<span class="keyword">Impala 2.9</span> or higher only)</h1> |
| |
| |
| |
| |
| |
| |
| <div class="body conbody"> |
| |
| <p class="p"> |
| The <code class="ph codeph">PARQUET_READ_STATISTICS</code> query option controls whether to read |
| statistics from Parquet files and use them during query processing. |
| </p> |
| |
| |
| <p class="p"> |
| Parquet files store minimum and maximum values for each column in each row group. When this |
| query option is set to <code class="ph codeph">true</code>, Impala reads these statistics and |
| skips any row group whose min/max range shows that no row in it can satisfy the |
| conditions in the <code class="ph codeph">WHERE</code> clause. |
| </p> |
| |
| |
| <p class="p"> |
| Impala supports filtering based on Parquet statistics: |
| </p> |
| |
| |
| <ul class="ul"> |
| <li class="li"> |
| Of the numerical types for the old version of the statistics: Boolean, Integer, Float |
| </li> |
| |
| |
| <li class="li"> |
| Of the types for the new version of the statistics (starting in Impala 2.8): Boolean, |
| Integer, Float, Decimal, String, Timestamp |
| </li> |
| |
| |
| <li class="li"> |
| For simple predicates of the forms: <code class="ph codeph">&lt;slot&gt; &lt;op&gt; &lt;constant&gt;</code> or |
| <code class="ph codeph">&lt;constant&gt; &lt;op&gt; &lt;slot&gt;</code>, where <code class="ph codeph">&lt;op&gt;</code> is one of LT, |
| LE, GE, GT, or EQ |
| </li> |
| |
| </ul> |
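| |
| <p class="p"> |
| The row-group skipping described above can be illustrated with a minimal Python sketch. It is |
| not Impala's implementation: the <code class="ph codeph">ColumnStats</code> tuple, the functions |
| <code class="ph codeph">may_match</code> and <code class="ph codeph">row_groups_to_read</code>, the |
| <code class="ph codeph">read_statistics</code> flag, and the sample min/max values are all |
| hypothetical and exist only for illustration. The sketch shows how a min/max range can rule out |
| a row group for the five supported operators, in either operand order. |
| </p> |

```python
import operator
from collections import namedtuple

# Hypothetical stand-in for the min/max statistics of one column in one row group.
ColumnStats = namedtuple("ColumnStats", ["min_value", "max_value"])

# "constant op slot" is equivalent to "slot FLIPPED[op] constant".
FLIPPED = {"LT": "GT", "LE": "GE", "GT": "LT", "GE": "LE", "EQ": "EQ"}


def may_match(stats, op, constant):
    """Return True if some value in [min_value, max_value] could satisfy "slot op constant"."""
    if op == "LT":
        return operator.lt(stats.min_value, constant)
    if op == "LE":
        return operator.le(stats.min_value, constant)
    if op == "GT":
        return operator.gt(stats.max_value, constant)
    if op == "GE":
        return operator.ge(stats.max_value, constant)
    if op == "EQ":
        return (operator.le(stats.min_value, constant)
                and operator.le(constant, stats.max_value))
    # Unsupported operator: stay conservative and read the row group.
    return True


def row_groups_to_read(all_stats, op, constant, constant_first=False,
                       read_statistics=True):
    """Return the indexes of the row groups that still have to be read.

    read_statistics is an assumed flag that mimics the query option:
    when it is False, statistics are ignored and nothing is skipped.
    """
    if not read_statistics:
        return list(range(len(all_stats)))
    if constant_first:
        op = FLIPPED[op]
    return [i for i, stats in enumerate(all_stats)
            if may_match(stats, op, constant)]


# Made-up statistics for three row groups of a single integer column.
row_groups = [ColumnStats(1, 10), ColumnStats(11, 20), ColumnStats(21, 30)]

# Predicate "col GT 15": the first row group (max 10) cannot match.
kept = row_groups_to_read(row_groups, "GT", 15)
skipped = len(row_groups) - len(kept)
print(kept, skipped)
```

| <p class="p"> |
| In this sketch, passing <code class="ph codeph">read_statistics=False</code> returns every row |
| group, which mirrors the effect of setting the query option to |
| <code class="ph codeph">false</code>: results stay the same, but no row group is skipped. |
| </p> |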
| |
| |
| <p class="p"> |
| Setting the <code class="ph codeph">PARQUET_READ_STATISTICS</code> option to <code class="ph codeph">false</code> provides a workaround when dealing |
| with files that have corrupt Parquet statistics or that cause unknown errors. |
| </p> |
| |
| |
| <p class="p"> |
| In the query runtime profile output for each <code class="ph codeph">impalad</code> instance, the |
| <code class="ph codeph">NumStatsFilteredRowGroups</code> field in the SCAN node section shows the number |
| of row groups that were skipped based on Parquet statistics. |
| </p> |
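| |
| <p class="p"> |
| As a rough way to check the counter across a whole profile, the following Python sketch sums |
| every <code class="ph codeph">NumStatsFilteredRowGroups</code> value found in saved profile |
| text. The function name <code class="ph codeph">total_stats_filtered_row_groups</code> is |
| hypothetical, the sample excerpt is made up, and the counter line layout |
| (<code class="ph codeph">- NumStatsFilteredRowGroups: 3 (3)</code>) is an assumption about how |
| the profile prints counters; adjust the pattern if your profile output differs. |
| </p> |

```python
import re

# Assumed counter line layout: "- NumStatsFilteredRowGroups: 3 (3)".
# The number in parentheses is taken as the raw, unabbreviated value.
COUNTER = re.compile(r"NumStatsFilteredRowGroups:\s*\S+\s*\((\d+)\)")


def total_stats_filtered_row_groups(profile_text):
    """Sum the counter over every SCAN node section found in the profile text."""
    return sum(int(raw) for raw in COUNTER.findall(profile_text))


# Made-up excerpt with two scan node sections, for illustration only.
sample_profile = """
HDFS_SCAN_NODE (id=0):
   - NumRowGroups: 4 (4)
   - NumStatsFilteredRowGroups: 3 (3)
HDFS_SCAN_NODE (id=0):
   - NumRowGroups: 2 (2)
   - NumStatsFilteredRowGroups: 1 (1)
"""

total = total_stats_filtered_row_groups(sample_profile)
print(total)
```

| <p class="p"> |
| A total of zero while the option is set to <code class="ph codeph">true</code> suggests that |
| statistics did not eliminate any row group for that query. |
| </p> |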
| |
| |
| <div class="p"> |
| The supported values for the query option are: |
| <ul class="ul"> |
| <li class="li"> |
| <code class="ph codeph">true</code> (<code class="ph codeph">1</code>): Read statistics from Parquet files and use |
| them in query processing. |
| </li> |
| |
| |
| <li class="li"> |
| <code class="ph codeph">false</code> (<code class="ph codeph">0</code>): Do not read statistics from Parquet files or use them in query processing. |
| </li> |
| |
| |
| <li class="li"> |
| Any other values are treated as <code class="ph codeph">false</code>. |
| </li> |
| |
| </ul> |
| |
| </div> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Type:</strong> Boolean |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Default:</strong> <code class="ph codeph">true</code> |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span> |
| </p> |
| |
| |
| </div> |
| |
| |
| <div class="related-links"> |
| <div class="familylinks"> |
| <div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div> |
| </div> |
| </div></body> |
| </html> |