| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE html |
| PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> |
| |
| <meta name="copyright" content="(C) Copyright 2023" /> |
| <meta name="DC.rights.owner" content="(C) Copyright 2023" /> |
| <meta name="DC.Type" content="concept" /> |
| <meta name="DC.Title" content="Using the RCFile File Format with Impala Tables" /> |
| <meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html" /> |
| <meta name="prodname" content="Impala" /> |
| <meta name="prodname" content="Impala" /> |
| <meta name="version" content="Impala 3.4.x" /> |
| <meta name="version" content="Impala 3.4.x" /> |
| <meta name="DC.Format" content="XHTML" /> |
| <meta name="DC.Identifier" content="rcfile" /> |
| <link rel="stylesheet" type="text/css" href="../commonltr.css" /> |
| <title>Using the RCFile File Format with Impala Tables</title> |
| </head> |
| <body id="rcfile"> |
| |
| |
| <h1 class="title topictitle1" id="ariaid-title1">Using the RCFile File Format with Impala Tables</h1> |
| |
| |
| |
| |
| <div class="body conbody"> |
| |
| <p class="p"> |
| |
| Impala supports using RCFile data files. |
| </p> |
| |
| |
| |
| <div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" class="table" frame="border" border="1" rules="all"><caption><span class="tablecap"><span class="table--title-label">Table 1. </span>RCFile Format Support in Impala</span></caption><colgroup><col style="width:10%" /><col style="width:10%" /><col style="width:20%" /><col style="width:30%" /><col style="width:30%" /></colgroup><thead class="thead" style="text-align:left;"> |
| <tr class="row"> |
| <th class="entry nocellnorowborder" style="vertical-align:top;" id="d164295e80"> |
| File Type |
| </th> |
| |
| <th class="entry nocellnorowborder" style="vertical-align:top;" id="d164295e83"> |
| Format |
| </th> |
| |
| <th class="entry nocellnorowborder" style="vertical-align:top;" id="d164295e86"> |
| Compression Codecs |
| </th> |
| |
| <th class="entry nocellnorowborder" style="vertical-align:top;" id="d164295e89"> |
| Impala Can CREATE? |
| </th> |
| |
| <th class="entry cell-norowborder" style="vertical-align:top;" id="d164295e92"> |
| Impala Can INSERT? |
| </th> |
| |
| </tr> |
| |
| </thead> |
| <tbody class="tbody"> |
| <tr class="row"> |
| <td class="entry row-nocellborder" style="vertical-align:top;" headers="d164295e80 "> |
| <a class="xref" href="impala_rcfile.html#rcfile">RCFile</a> |
| </td> |
| |
| <td class="entry row-nocellborder" style="vertical-align:top;" headers="d164295e83 "> |
| Structured |
| </td> |
| |
| <td class="entry row-nocellborder" style="vertical-align:top;" headers="d164295e86 "> |
| Snappy, gzip, deflate, bzip2 |
| </td> |
| |
| <td class="entry row-nocellborder" style="vertical-align:top;" headers="d164295e89 "> |
| Yes. |
| </td> |
| |
| <td class="entry cellrowborder" style="vertical-align:top;" headers="d164295e92 "> |
| No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the |
| right format, or use <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH |
| <var class="keyword varname">table_name</var></code> in Impala. |
| </td> |
| |
| |
| </tr> |
| |
| </tbody> |
| </table> |
| </div> |
| |
| |
| <p class="p toc inpage"></p> |
| |
| </div> |
| |
| |
| <div class="related-links"> |
| <div class="familylinks"> |
| <div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div> |
| </div> |
| </div><div class="topic concept nested1" aria-labelledby="ariaid-title2" id="rcfile_create"> |
| |
| <h2 class="title topictitle2" id="ariaid-title2">Creating RCFile Tables and Loading Data</h2> |
| |
| |
| |
| <div class="body conbody"> |
| |
| <p class="p"> |
| If you do not have an existing data file to use, begin by creating one in the appropriate format. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">To create an RCFile table:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| In the <code class="ph codeph">impala-shell</code> interpreter, issue a command similar to: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>create table rcfile_table (<var class="keyword varname">column_specs</var>) stored as rcfile;</code></pre> |
| |
| <p class="p"> |
| Because Impala can query some kinds of tables that it cannot currently write to, after creating tables of |
| certain file formats, you might use the Hive shell to load the data. See |
| <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. After loading data into a table through |
| Hive or other mechanism outside of Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> |
| statement the next time you connect to the Impala node, before querying the table, to make Impala recognize |
| the new data. |
| </p> |
| |
| |
| <div class="note important"><span class="importanttitle">Important:</span> |
| See <a class="xref" href="impala_known_issues.html#known_issues">Known Issues and Workarounds in Impala</a> for potential compatibility issues with |
| RCFile tables created in Hive 0.12, due to a change in the default RCFile SerDe for Hive. |
| </div> |
| |
| |
| <p class="p"> |
| For example, here is how you might create some RCFile tables in Impala (by specifying the columns |
| explicitly, or cloning the structure of another table), load data through Hive, and query them through |
| Impala: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>$ impala-shell -i localhost |
| [localhost:21000] > create table rcfile_table (x int) stored as rcfile; |
| [localhost:21000] > create table rcfile_clone like some_other_table stored as rcfile; |
| [localhost:21000] > quit; |
| |
| $ hive |
| hive> insert into table rcfile_table select x from some_other_table; |
| 3 Rows loaded to rcfile_table |
| Time taken: 19.015 seconds |
| hive> quit; |
| |
| $ impala-shell -i localhost |
| [localhost:21000] > select * from rcfile_table; |
| Returned 0 row(s) in 0.23s |
| [localhost:21000] > -- Make Impala recognize the data loaded through Hive; |
| [localhost:21000] > refresh rcfile_table; |
| [localhost:21000] > select * from rcfile_table; |
| +---+ |
| | x | |
| +---+ |
| | 1 | |
| | 2 | |
| | 3 | |
| +---+ |
| Returned 3 row(s) in 0.23s</code></pre> |
| |
| <p class="p"> |
| <strong class="ph b">Complex type considerations:</strong> Although you can create tables in this file format |
| using the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and |
| <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher, |
| currently, Impala can query these types only in Parquet tables. <span class="ph"> |
| The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile |
| tables that include complex types. Such queries are allowed in |
| <span class="keyword">Impala 2.6</span> and higher. </span> |
| </p> |
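
<p class="p">
As an illustration only, a table definition and the kind of <code class="ph codeph">COUNT(*)</code> query
that the preceding exception allows might look like the following sketch (the table and column names are
hypothetical, not from any sample schema):
</p>

<pre class="pre codeblock"><code>[localhost:21000] > -- Hypothetical RCFile table that includes a complex MAP column;
[localhost:21000] > create table rcfile_complex (id int, attrs map&lt;string,string>) stored as rcfile;
[localhost:21000] > -- In Impala 2.6 and higher, COUNT(*) works even though the MAP column itself cannot be queried;
[localhost:21000] > select count(*) from rcfile_complex;</code></pre>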
| |
| |
| </div> |
| |
| </div> |
| |
| |
| <div class="topic concept nested1" aria-labelledby="ariaid-title3" id="rcfile_compression"> |
| |
| <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for RCFile Tables</h2> |
| |
| |
| |
| <div class="body conbody"> |
| |
| <p class="p"> |
| |
| You may want to enable compression on existing tables. Enabling compression provides performance gains in |
| most cases and is supported for RCFile tables. For example, to enable Snappy compression, you would specify |
| the following additional settings when loading data through the Hive shell: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>hive> SET hive.exec.compress.output=true; |
| hive> SET mapred.max.split.size=256000000; |
| hive> SET mapred.output.compression.type=BLOCK; |
| hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; |
| hive> INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre> |
| |
| <p class="p"> |
| If you are converting partitioned tables, you must complete additional steps. In such a case, specify |
| additional settings similar to the following: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>hive> CREATE TABLE <var class="keyword varname">new_table</var> (<var class="keyword varname">your_cols</var>) PARTITIONED BY (<var class="keyword varname">partition_cols</var>) STORED AS <var class="keyword varname">new_format</var>; |
| hive> SET hive.exec.dynamic.partition.mode=nonstrict; |
| hive> SET hive.exec.dynamic.partition=true; |
| hive> INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> PARTITION(<var class="keyword varname">comma_separated_partition_cols</var>) SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre> |
| |
| <p class="p"> |
| Remember that Hive does not require that you specify a source format for it. Consider the case of |
| converting a table with two partition columns called <code class="ph codeph">year</code> and <code class="ph codeph">month</code> to a |
| Snappy compressed RCFile. Combining the components outlined previously to complete this table conversion, |
| you would specify settings similar to the following: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>hive> CREATE TABLE tbl_rc (int_col INT, string_col STRING) STORED AS RCFILE; |
| hive> SET hive.exec.compress.output=true; |
| hive> SET mapred.max.split.size=256000000; |
| hive> SET mapred.output.compression.type=BLOCK; |
| hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; |
| hive> SET hive.exec.dynamic.partition.mode=nonstrict; |
| hive> SET hive.exec.dynamic.partition=true; |
| hive> INSERT OVERWRITE TABLE tbl_rc SELECT * FROM tbl;</code></pre> |
| |
| <p class="p"> |
| To complete a similar process for a table that includes partitions, you would specify settings similar to |
| the following: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>hive> CREATE TABLE tbl_rc (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS RCFILE; |
| hive> SET hive.exec.compress.output=true; |
| hive> SET mapred.max.split.size=256000000; |
| hive> SET mapred.output.compression.type=BLOCK; |
| hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; |
| hive> SET hive.exec.dynamic.partition.mode=nonstrict; |
| hive> SET hive.exec.dynamic.partition=true; |
| hive> INSERT OVERWRITE TABLE tbl_rc PARTITION(year) SELECT * FROM tbl;</code></pre> |
| |
| <div class="note note"><span class="notetitle">Note:</span> |
| <p class="p"> |
The compression codec is specified in the following command:
| </p> |
| |
| <pre class="pre codeblock"><code>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;</code></pre> |
| <p class="p"> |
| You could elect to specify alternative codecs such as <code class="ph codeph">GzipCodec</code> here. |
| </p> |
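
<p class="p">
For example, to produce gzip-compressed output instead of Snappy, you would change that setting to:
</p>

<pre class="pre codeblock"><code>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;</code></pre>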
| |
| </div> |
| |
| </div> |
| |
| </div> |
| |
| |
| <div class="topic concept nested1" aria-labelledby="ariaid-title4" id="rcfile_performance"> |
| |
| <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala RCFile Tables</h2> |
| |
| |
| <div class="body conbody"> |
| |
| <p class="p"> |
| In general, expect query performance with RCFile tables to be |
| faster than with tables using text data, but slower than with |
| Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> |
| for information about using the Parquet file format for |
| high-performance analytic queries. |
| </p> |
| |
| |
| <p class="p"> |
| In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files |
| stored in Amazon S3. For Impala tables that use the file formats Parquet, ORC, RCFile, |
| SequenceFile, Avro, and uncompressed text, the setting |
| <code class="ph codeph">fs.s3a.block.size</code> in the <span class="ph filepath">core-site.xml</span> |
| configuration file determines how Impala divides the I/O work of reading the data files. |
| This configuration setting is specified in bytes. By default, this value is 33554432 (32 |
| MB), meaning that Impala parallelizes S3 read operations on the files as if they were |
| made up of 32 MB blocks. For example, if your S3 queries primarily access Parquet files |
| written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code> to 134217728 |
| (128 MB) to match the row group size of those files. If most S3 queries involve Parquet |
| files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code> to 268435456 (256 |
| MB) to match the row group size produced by Impala. |
| </p> |
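
<p class="p">
As a sketch of how this setting might appear, the following <span class="ph filepath">core-site.xml</span>
entry sets the value to 128 MB (the size suggested above when most S3 queries read Parquet files written by
MapReduce or Hive):
</p>

<pre class="pre codeblock"><code>&lt;property>
  &lt;name>fs.s3a.block.size&lt;/name>
  &lt;!-- 134217728 bytes = 128 MB, matching the Parquet row group size of those files -->
  &lt;value>134217728&lt;/value>
&lt;/property></code></pre>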
| |
| |
| </div> |
| |
| </div> |
| |
| |
| |
| </body> |
| </html> |