| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE html |
| PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> |
| |
| <meta name="copyright" content="(C) Copyright 2023" /> |
| <meta name="DC.rights.owner" content="(C) Copyright 2023" /> |
| <meta name="DC.Type" content="concept" /> |
| <meta name="DC.Title" content="PARQUET_FILE_SIZE Query Option" /> |
| <meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html" /> |
| <meta name="prodname" content="Impala" /> |
| <meta name="version" content="Impala 3.4.x" /> |
| <meta name="DC.Format" content="XHTML" /> |
| <meta name="DC.Identifier" content="parquet_file_size" /> |
| <link rel="stylesheet" type="text/css" href="../commonltr.css" /> |
| <title>PARQUET_FILE_SIZE Query Option</title> |
| </head> |
| <body id="parquet_file_size"> |
| |
| |
| <h1 class="title topictitle1" id="ariaid-title1">PARQUET_FILE_SIZE Query Option</h1> |
| |
| |
| |
| |
| <div class="body conbody"> |
| |
| <p class="p"> |
| |
| Specifies the maximum size of each Parquet data file produced by Impala <code class="ph codeph">INSERT</code> statements. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Syntax:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| Specify the size in bytes, or append a trailing <code class="ph codeph">m</code> or <code class="ph codeph">g</code> character to indicate |
| megabytes or gigabytes. For example: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>-- 128 megabytes. |
| set PARQUET_FILE_SIZE=134217728; |
| INSERT OVERWRITE parquet_table SELECT * FROM text_table; |
| |
| -- 512 megabytes. |
| set PARQUET_FILE_SIZE=512m; |
| INSERT OVERWRITE parquet_table SELECT * FROM text_table; |
| |
| -- 1 gigabyte. |
| set PARQUET_FILE_SIZE=1g; |
| INSERT OVERWRITE parquet_table SELECT * FROM text_table; |
| </code></pre> |
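| |
| <p class="p"> |
| A value with a unit suffix is converted to bytes, so the same limit can be written either way. |
| This sketch assumes binary multiples (1 MB = 1024 * 1024 bytes), which is what the first example |
| above implies by pairing <code class="ph codeph">134217728</code> with 128 megabytes: |
| </p> |
| |
| <pre class="pre codeblock"><code>-- 128 MB in bytes: 128 * 1024 * 1024. |
| SELECT 128 * 1024 * 1024;  -- returns 134217728 |
| |
| -- Assuming binary multiples, these two settings request the same maximum file size. |
| set PARQUET_FILE_SIZE=134217728; |
| set PARQUET_FILE_SIZE=128m; |
| </code></pre> |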
| |
| <p class="p"> |
| <strong class="ph b">Usage notes:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB |
| in Impala 2.0 and later) could be much larger than needed for each data file. For <code class="ph codeph">INSERT</code> |
| operations into such tables, you can increase parallelism by specifying a smaller |
| <code class="ph codeph">PARQUET_FILE_SIZE</code> value, resulting in more HDFS blocks that can be processed by different |
| nodes. |
| |
| </p> |
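| |
| <p class="p"> |
| The following is a minimal sketch of that approach for a partitioned table. The table names |
| <code class="ph codeph">sales_parquet</code> and <code class="ph codeph">sales_text</code>, their columns, and the |
| 64 MB value are hypothetical; choose a size that suits the amount of data written to each partition: |
| </p> |
| |
| <pre class="pre codeblock"><code>-- Hypothetical tables: sales_parquet is a Parquet table partitioned by (year, month); |
| -- sales_text holds the same columns in text format. |
| set PARQUET_FILE_SIZE=64m; |
| INSERT INTO sales_parquet PARTITION (year, month) |
|   SELECT id, amount, year, month FROM sales_text; |
| |
| -- Return to the default setting (0) for later statements in this session. |
| set PARQUET_FILE_SIZE=0; |
| </code></pre> |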
| |
| |
| <p class="p"> |
| <strong class="ph b">Type:</strong> numeric, with optional unit specifier |
| </p> |
| |
| |
| <div class="note important"><span class="importanttitle">Important:</span> |
| <p class="p"> |
| Currently, the maximum value for this setting is 1 gigabyte (<code class="ph codeph">1g</code>). |
| Setting a value higher than 1 gigabyte could result in errors during |
| an <code class="ph codeph">INSERT</code> operation. |
| </p> |
| |
| </div> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Default:</strong> 0 (produces files with a target size of 256 MB; files might be larger for very wide tables) |
| </p> |
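| |
| <p class="p"> |
| To see the file sizes that an <code class="ph codeph">INSERT</code> actually produced, for example to check |
| whether files for a very wide table exceed the 256 MB target, you can list the data files with |
| <code class="ph codeph">SHOW FILES</code>, which reports the size of each file. This sketch reuses the |
| <code class="ph codeph">parquet_table</code> and <code class="ph codeph">text_table</code> names from the syntax examples: |
| </p> |
| |
| <pre class="pre codeblock"><code>-- Write data with the default setting, then list each data file and its size. |
| set PARQUET_FILE_SIZE=0; |
| INSERT OVERWRITE parquet_table SELECT * FROM text_table; |
| SHOW FILES IN parquet_table; |
| </code></pre> |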
| |
| |
| <p class="p"> |
| <strong class="ph b">ADLS considerations:</strong> |
| </p> |
| |
| <p class="p"> |
| Because ADLS does not expose the block sizes of data files the way HDFS does, Impala |
| <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements use the |
| <code class="ph codeph">PARQUET_FILE_SIZE</code> query option setting to define the size of Parquet |
| data files written to ADLS. (Using a large block size is more important for Parquet tables than for |
| tables that use other file formats.) |
| </p> |
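| |
| <p class="p"> |
| For example, a <code class="ph codeph">CREATE TABLE AS SELECT</code> statement that writes Parquet data to |
| ADLS takes its file size from the current <code class="ph codeph">PARQUET_FILE_SIZE</code> setting. In this |
| sketch, the <code class="ph codeph">adl://</code> URI, store name, and table name <code class="ph codeph">adls_parquet</code> |
| are hypothetical placeholders: |
| </p> |
| |
| <pre class="pre codeblock"><code>-- Hypothetical ADLS store and path; replace with your own location. |
| set PARQUET_FILE_SIZE=256m; |
| CREATE TABLE adls_parquet |
|   STORED AS PARQUET |
|   LOCATION 'adl://examplestore.azuredatalakestore.net/warehouse/adls_parquet' |
|   AS SELECT * FROM text_table; |
| </code></pre> |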
| |
| |
| <p class="p"> |
| <strong class="ph b">Isilon considerations:</strong> |
| </p> |
| |
| <div class="p"> |
| Because the EMC Isilon storage devices use a global value for the block size rather than |
| a configurable value for each file, the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option |
| has no effect when Impala inserts data into a table or partition residing on Isilon |
| storage. Use the <code class="ph codeph">isi</code> command to set the default block size globally on |
| the Isilon device. For example, to set the Isilon default block size to 256 MB, the |
| recommended size for Parquet data files for Impala, issue the following command: |
| <pre class="pre codeblock"><code>isi hdfs settings modify --default-block-size=256MB</code></pre> |
| </div> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Ozone considerations:</strong> |
| </p> |
| |
| <p class="p"> |
| Because Apache Ozone storage buckets use a global value for the block size rather than |
| a configurable value for each file, the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option |
| has no effect when Impala inserts data into a table or partition residing on Ozone |
| storage. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Related information:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| For information about the Parquet file format, and how the number and size of data files affect query |
| performance, see <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>. |
| </p> |
| |
| |
| |
| |
| </div> |
| |
| <div class="related-links"> |
| <div class="familylinks"> |
| <div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div> |
| </div> |
| </div></body> |
| </html> |