<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></meta><title>ConvertAvroToParquet</title><link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css"></link></head><script type="text/javascript">window.onload = function(){if(self==top) { document.getElementById('nameHeader').style.display = "inherit"; } }</script><body><h1 id="nameHeader" style="display: none;">ConvertAvroToParquet</h1><h2>Description: </h2><p>Converts Avro records into the Parquet file format. The incoming FlowFile should be a valid Avro file. If an incoming FlowFile does not contain any records, the output is an empty Parquet file. NOTE: Many Avro datatypes (e.g., collections, primitives, and unions of primitives) can be converted to Parquet, but unions of collections and other complex datatypes may not be convertible.</p><h3>Tags: </h3><p>avro, parquet, convert</p><h3>Properties: </h3><p>In the table below, the names of required properties appear in <strong>bold</strong>. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the <a href="../../../../../html/expression-language-guide.html">NiFi Expression Language</a>.</p><table id="properties"><tr><th>Display Name</th><th>API Name</th><th>Default Value</th><th>Allowable Values</th><th>Description</th></tr><tr><td id="name"><strong>Compression Type</strong></td><td>compression-type</td><td id="default-value">UNCOMPRESSED</td><td id="allowable-values"><ul><li>UNCOMPRESSED</li><li>SNAPPY</li><li>GZIP</li><li>LZO</li><li>BROTLI</li><li>LZ4</li><li>ZSTD</li></ul></td><td id="description">The type of compression for the file being written.</td></tr><tr><td id="name">Row Group Size</td><td>row-group-size</td><td></td><td id="allowable-values"></td><td id="description">The row group size used by the Parquet writer. 
The value is specified in the format of &lt;Data Size&gt; &lt;Data Unit&gt; where Data Unit is one of B, KB, MB, GB, TB.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Page Size</td><td>page-size</td><td></td><td id="allowable-values"></td><td id="description">The page size used by the Parquet writer. The value is specified in the format of &lt;Data Size&gt; &lt;Data Unit&gt; where Data Unit is one of B, KB, MB, GB, TB.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Dictionary Page Size</td><td>dictionary-page-size</td><td></td><td id="allowable-values"></td><td id="description">The dictionary page size used by the Parquet writer. The value is specified in the format of &lt;Data Size&gt; &lt;Data Unit&gt; where Data Unit is one of B, KB, MB, GB, TB.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Max Padding Size</td><td>max-padding-size</td><td></td><td id="allowable-values"></td><td id="description">The maximum amount of padding that will be used to align row groups with blocks in the underlying filesystem. If the underlying filesystem is not a block filesystem like HDFS, this has no effect. 
The value is specified in the format of &lt;Data Size&gt; &lt;Data Unit&gt; where Data Unit is one of B, KB, MB, GB, TB.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Enable Dictionary Encoding</td><td>enable-dictionary-encoding</td><td></td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">Specifies whether dictionary encoding should be enabled for the Parquet writer</td></tr><tr><td id="name">Enable Validation</td><td>enable-validation</td><td></td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">Specifies whether validation should be enabled for the Parquet writer</td></tr><tr><td id="name">Writer Version</td><td>writer-version</td><td></td><td id="allowable-values"><ul><li>PARQUET_1_0</li><li>PARQUET_2_0</li></ul></td><td id="description">Specifies the version used by Parquet writer</td></tr></table><h3>Relationships: </h3><table id="relationships"><tr><th>Name</th><th>Description</th></tr><tr><td>success</td><td>Parquet file that was converted successfully from Avro</td></tr><tr><td>failure</td><td>Avro content that could not be processed</td></tr></table><h3>Reads Attributes: </h3>None specified.<h3>Writes Attributes: </h3><table id="writes-attributes"><tr><th>Name</th><th>Description</th></tr><tr><td>filename</td><td>Sets the filename to the existing filename with the extension replaced by / added to by .parquet</td></tr><tr><td>record.count</td><td>Sets the number of records in the parquet file.</td></tr></table><h3>State management: </h3>This component does not store state.<h3>Restricted: </h3>This component is not restricted.<h3>Input requirement: </h3>This component requires an incoming relationship.<h3>System Resource Considerations:</h3>None specified.</body></html>