<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="copyright" content="(C) Copyright 2023" />
<meta name="DC.rights.owner" content="(C) Copyright 2023" />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)" />
<meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html" />
<meta name="prodname" content="Impala" />
<meta name="version" content="Impala 3.4.x" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="compression_codec" />
<link rel="stylesheet" type="text/css" href="../commonltr.css" />
<title>COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</title>
</head>
<body id="compression_codec">
<h1 class="title topictitle1" id="ariaid-title1">COMPRESSION_CODEC Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
<div class="body conbody">
<p class="p">
When Impala writes Parquet data files using the <code class="ph codeph">INSERT</code> statement, the underlying compression
is controlled by the <code class="ph codeph">COMPRESSION_CODEC</code> query option.
</p>
<div class="note note"><span class="notetitle">Note:</span>
Prior to Impala 2.0, this option was named <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code>. In Impala 2.0 and
later, the <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code> name is not recognized. Use the more general name
<code class="ph codeph">COMPRESSION_CODEC</code> for new code.
</div>
<p class="p">
<strong class="ph b">Syntax:</strong>
</p>
<pre class="pre codeblock"><code>SET COMPRESSION_CODEC=<var class="keyword varname">codec_name</var>; -- Supported for all codecs.
SET COMPRESSION_CODEC=<var class="keyword varname">codec_name</var>:<var class="keyword varname">compression_level</var>; -- Only supported for ZSTD.
</code></pre>
<p class="p">
The allowed values for this query option are <code class="ph codeph">SNAPPY</code> (the default), <code class="ph codeph">GZIP</code>,
<code class="ph codeph">ZSTD</code>, <code class="ph codeph">LZ4</code>, and <code class="ph codeph">NONE</code>.
</p>
<p class="p">
<code class="ph codeph">ZSTD</code> also supports setting a compression level, from 1 through 22. Lower levels
compress faster but produce larger files; higher levels produce smaller files at the cost of slower writes.
If no level is specified in the <code class="ph codeph">COMPRESSION_CODEC</code> value, the default level 3
is used.
</p>
<div class="note note"><span class="notetitle">Note:</span>
A Parquet file created with <code class="ph codeph">COMPRESSION_CODEC=NONE</code> is still typically smaller than the
original data, due to encoding schemes such as run-length encoding and dictionary encoding that are applied
separately from compression.
</div>
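<p class="p">
As a rough way to observe this effect, the following sketch writes the same rows twice, once uncompressed and
once with the default codec, and then compares the <code class="ph codeph">Size</code> column reported by
<code class="ph codeph">SHOW TABLE STATS</code>. The table names <code class="ph codeph">parquet_none</code> and
<code class="ph codeph">parquet_snappy</code> are hypothetical, and <code class="ph codeph">t1</code> is assumed to
be an existing non-Parquet table, as in the examples later in this topic.
</p>
<pre class="pre codeblock"><code>-- Hypothetical table names; t1 is assumed to already exist.
create table parquet_none like t1 stored as parquet;
create table parquet_snappy like t1 stored as parquet;

-- Encoding only, no compression.
set compression_codec=none;
insert into parquet_none select * from t1;

-- Encoding plus the default codec.
set compression_codec=snappy;
insert into parquet_snappy select * from t1;

-- Compare the Size column across the three result sets.
show table stats t1;
show table stats parquet_none;
show table stats parquet_snappy;
</code></pre>
<p class="p">
The actual sizes depend on the data. Columns with many repeated values tend to benefit most from the encoding step.
</p>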
<p class="p">
The option value is not case-sensitive.
</p>
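<p class="p">
For illustration, the following statements should all select the same codec under that rule. The particular
spellings are arbitrary.
</p>
<pre class="pre codeblock"><code>set compression_codec=gzip;
SET COMPRESSION_CODEC=GZIP;
set compression_codec=Gzip;
</code></pre>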
<p class="p">
If the option is set to an unrecognized value, all subsequent queries in the session fail because of the
invalid option setting, not just queries involving Parquet tables. (The value <code class="ph codeph">BZIP2</code>
is also recognized, but is not compatible with Parquet tables.)
</p>
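<p class="p">
A minimal sketch of recovering from an invalid setting, assuming an existing table named
<code class="ph codeph">t1</code>: set the option back to a recognized value, such as the documented default
<code class="ph codeph">SNAPPY</code>, before running further queries. The comments describe the behavior stated
above. The exact error text may differ between releases.
</p>
<pre class="pre codeblock"><code>-- Unrecognized codec name.
set compression_codec=foo;
-- Expected to fail, even though t1 need not be a Parquet table.
select count(*) from t1;

-- Restore a recognized value (here, the default).
set compression_codec=snappy;
-- The same query is expected to run normally again.
select count(*) from t1;
</code></pre>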
<p class="p">
<strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
</p>
<p class="p">
<strong class="ph b">Default:</strong> <code class="ph codeph">SNAPPY</code>
</p>
<p class="p">
<strong class="ph b">Examples:</strong>
</p>
<pre class="pre codeblock"><code>
set compression_codec=lz4;
insert into parquet_table_lz4_compressed select * from t1;
set compression_codec=zstd; -- Default compression level 3.
insert into parquet_table_zstd_default_compressed select * from t1;
set compression_codec=zstd:12; -- Compression level 12.
insert into parquet_table_zstd_highly_compressed select * from t1;
set compression_codec=gzip;
insert into parquet_table_highly_compressed select * from t1;
set compression_codec=snappy;
insert into parquet_table_compression_plus_fast_queries select * from t1;
set compression_codec=none;
insert into parquet_table_no_compression select * from t1;
set compression_codec=foo;
select * from t1 limit 5;
ERROR: Invalid compression codec: foo
</code></pre>
<p class="p">
<strong class="ph b">Related information:</strong>
</p>
<p class="p">
For information about how compressing Parquet data files affects query performance, see
<a class="xref" href="impala_parquet.html#parquet_compression">Compressions for Parquet Data Files</a>.
</p>
</div>
<div class="related-links">
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div>
</div>
</div></body>
</html>