<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="copyright" content="(C) Copyright 2023" />
<meta name="DC.rights.owner" content="(C) Copyright 2023" />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="PARQUET_READ_STATISTICS Query Option (Impala 2.9 or higher only)" />
<meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html" />
<meta name="prodname" content="Impala" />
<meta name="version" content="Impala 3.4.x" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="parquet_read_statistics" />
<link rel="stylesheet" type="text/css" href="../commonltr.css" />
<title>PARQUET_READ_STATISTICS Query Option (Impala 2.9 or higher only)</title>
</head>
<body id="parquet_read_statistics">
<h1 class="title topictitle1" id="ariaid-title1">PARQUET_READ_STATISTICS Query Option (<span class="keyword">Impala 2.9</span> or higher only)</h1>
<div class="body conbody">
<p class="p">
The <code class="ph codeph">PARQUET_READ_STATISTICS</code> query option controls whether to read
statistics from Parquet files and use them during query processing.
</p>
<p class="p">
Parquet files store min/max statistics for each column in each row group. Impala can use
these statistics to skip reading a row group when they show that no row in it can satisfy
a predicate. When this query option is set to <code class="ph codeph">true</code>, Impala
reads the Parquet statistics and skips reading row groups that cannot match the conditions
in the <code class="ph codeph">WHERE</code> clause.
</p>
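<p class="p">
The following Python sketch illustrates the idea behind min/max row group skipping. It is a
minimal illustration, not Impala's actual code: the <code class="ph codeph">RowGroupStats</code>
class and the <code class="ph codeph">may_match</code> and <code class="ph codeph">select_row_groups</code>
functions are hypothetical names, and the sketch assumes a single-column predicate of the form
"column op constant". A row group is kept whenever its value range could contain a match, so
skipping never drops qualifying rows.
</p>

```python
import operator
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics for one column in one row group."""
    min_value: object
    max_value: object


def may_match(stats, op, constant):
    """Return False only if the min/max values prove that no row satisfies
    'column op constant'; otherwise the row group must be read."""
    if op == "LT":  # some value is below the constant exactly when the min is
        return operator.lt(stats.min_value, constant)
    if op == "LE":
        return operator.le(stats.min_value, constant)
    if op == "GT":  # some value is above the constant exactly when the max is
        return operator.gt(stats.max_value, constant)
    if op == "GE":
        return operator.ge(stats.max_value, constant)
    if op == "EQ":  # the constant must fall inside [min, max]
        return (operator.le(stats.min_value, constant)
                and operator.le(constant, stats.max_value))
    raise ValueError("unsupported operator: " + op)


def select_row_groups(row_groups, op, constant):
    """Return (indexes of row groups to read, number of row groups skipped)."""
    kept, skipped = [], 0
    for index, stats in enumerate(row_groups):
        if may_match(stats, op, constant):
            kept.append(index)
        else:
            skipped += 1  # analogous to the NumStatsFilteredRowGroups counter
    return kept, skipped


# Three row groups covering the ranges 1-10, 11-20, and 21-30.
groups = [RowGroupStats(1, 10), RowGroupStats(11, 20), RowGroupStats(21, 30)]
print(select_row_groups(groups, "GT", 15))  # prints ([1, 2], 1)
```

<p class="p">
Only the first row group is skipped for "column GT 15", because its maximum value, 10, already
rules out every row in it.
</p>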
<p class="p">
Impala supports filtering based on Parquet statistics in the following cases:
</p>
<ul class="ul">
<li class="li">
With the old version of the statistics: for the Boolean, Integer, and Float types
</li>
<li class="li">
With the new version of the statistics (starting in <span class="keyword">Impala 2.8</span>): for the
Boolean, Integer, Float, Decimal, String, and Timestamp types
</li>
<li class="li">
For simple predicates of the form <code class="ph codeph">&lt;slot&gt; &lt;op&gt; &lt;constant&gt;</code> or
<code class="ph codeph">&lt;constant&gt; &lt;op&gt; &lt;slot&gt;</code>, where <code class="ph codeph">&lt;op&gt;</code> is one of
LT, LE, GE, GT, or EQ (<code class="ph codeph">&lt;</code>, <code class="ph codeph">&lt;=</code>,
<code class="ph codeph">&gt;=</code>, <code class="ph codeph">&gt;</code>, or <code class="ph codeph">=</code>)
</li>
</ul>
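<p class="p">
One way to handle the <code class="ph codeph">&lt;constant&gt; &lt;op&gt; &lt;slot&gt;</code> form is to
rewrite it into the slot-first form by mirroring the operator. The Python sketch below shows that
rewrite together with a check against the type lists above. It is an illustration, not Impala's
planner logic: the <code class="ph codeph">Slot</code> and <code class="ph codeph">Constant</code>
classes, the uppercase type labels, and the <code class="ph codeph">normalize</code> and
<code class="ph codeph">can_use_stats</code> functions are hypothetical, and a "slot" here simply
stands for a column reference.
</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """Hypothetical column reference, for example the column x in 'x GT 5'."""
    name: str


@dataclass(frozen=True)
class Constant:
    """Hypothetical literal value."""
    value: object


SUPPORTED_OPS = {"LT", "LE", "GE", "GT", "EQ"}
# Swapping the operands of a comparison requires mirroring the operator.
MIRRORED = {"LT": "GT", "LE": "GE", "GE": "LE", "GT": "LT", "EQ": "EQ"}

# Type labels taken from the lists above; the label strings are illustrative.
OLD_STATS_TYPES = {"BOOLEAN", "INTEGER", "FLOAT"}
NEW_STATS_TYPES = OLD_STATS_TYPES | {"DECIMAL", "STRING", "TIMESTAMP"}


def normalize(lhs, op, rhs):
    """Rewrite a comparison into (slot, op, constant) form, or return None
    if it is not one of the simple forms that statistics can evaluate."""
    if op not in SUPPORTED_OPS:
        return None
    if isinstance(lhs, Slot) and isinstance(rhs, Constant):
        return lhs, op, rhs
    if isinstance(lhs, Constant) and isinstance(rhs, Slot):
        return rhs, MIRRORED[op], lhs  # e.g. '5 LT x' becomes 'x GT 5'
    return None


def can_use_stats(column_type, new_stats_format, lhs, op, rhs):
    """True if the column type and predicate shape allow statistics filtering."""
    allowed = NEW_STATS_TYPES if new_stats_format else OLD_STATS_TYPES
    return column_type in allowed and normalize(lhs, op, rhs) is not None


x, five = Slot("x"), Constant(5)
print(normalize(five, "LT", x))  # prints (Slot(name='x'), 'GT', Constant(value=5))
```

<p class="p">
After normalization, every supported predicate has the column on the left, so a single
min/max check per operator is enough to decide whether a row group can be skipped.
</p>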
<p class="p">
Setting the <code class="ph codeph">PARQUET_READ_STATISTICS</code> option to <code class="ph codeph">false</code>
provides a workaround when dealing with files that have corrupt Parquet statistics, or when
statistics-based filtering causes unexpected errors.
</p>
<p class="p">
In the query runtime profile output for each Impalad instance, the
<code class="ph codeph">NumStatsFilteredRowGroups</code> field in the SCAN node section shows the number
of row groups that were skipped based on Parquet statistics.
</p>
<div class="p">
The supported values for the query option are:
<ul class="ul">
<li class="li">
<code class="ph codeph">true</code> (<code class="ph codeph">1</code>): Read statistics from Parquet files and use
them in query processing.
</li>
<li class="li">
<code class="ph codeph">false</code> (<code class="ph codeph">0</code>): Do not read or use Parquet statistics.
</li>
<li class="li">
Any other values are treated as <code class="ph codeph">false</code>.
</li>
</ul>
</div>
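<p class="p">
The value rules above can be summarized in a short Python sketch. The function name
<code class="ph codeph">parse_read_statistics</code> is hypothetical, and treating the value
case-insensitively is an assumption that the rules above do not state.
</p>

```python
def parse_read_statistics(value):
    """Interpret a PARQUET_READ_STATISTICS value per the documented rules.

    'true' and '1' enable statistics; any other value disables them.
    Assumption: the comparison is case-insensitive.
    """
    return str(value).lower() in ("true", "1")


for raw in ["true", "1", "false", "0", "yes"]:
    print(raw, parse_read_statistics(raw))
```

<p class="p">
In <code class="ph codeph">impala-shell</code>, the option itself is changed with the SET statement,
for example <code class="ph codeph">SET PARQUET_READ_STATISTICS=false;</code>
</p>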
<p class="p">
<strong class="ph b">Type:</strong> Boolean
</p>
<p class="p">
<strong class="ph b">Default:</strong> <code class="ph codeph">true</code>
</p>
<p class="p">
<strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span>
</p>
</div>
<div class="related-links">
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div>
</div>
</div></body>
</html>