<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></meta><title>PutParquet</title><link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css"></link></head><script type="text/javascript">window.onload = function(){if(self==top) { document.getElementById('nameHeader').style.display = "inherit"; } }</script><body><h1 id="nameHeader" style="display: none;">PutParquet</h1><h2>Description: </h2><p>Reads records from an incoming FlowFile using the provided Record Reader, and writes those records to a Parquet file. The schema for the Parquet file must be provided in the processor properties. This processor will first write a temporary dot file and, upon successfully writing every record to the dot file, it will rename the dot file to its final name. If the dot file cannot be renamed, the rename operation will be attempted up to 10 times, and if still not successful, the dot file will be deleted and the flow file will be routed to failure. If any error occurs while reading records from the input, or writing records to the output, the entire dot file will be removed and the flow file will be routed to failure or retry, depending on the error.</p><h3>Tags: </h3><p>put, parquet, hadoop, HDFS, filesystem, record</p><h3>Properties: </h3><p>In the list below, the names of required properties appear in <strong>bold</strong>. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the <a href="../../../../../html/expression-language-guide.html">NiFi Expression Language</a>.</p><table id="properties"><tr><th>Display Name</th><th>API Name</th><th>Default Value</th><th>Allowable Values</th><th>Description</th></tr><tr><td id="name">Hadoop Configuration Resources</td><td>Hadoop Configuration Resources</td><td></td><td id="allowable-values"></td><td id="description">A file or comma-separated list of files that contain the Hadoop file system configuration. 
Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration. To use swebhdfs, see the 'Additional Details' section of PutHDFS's documentation.<br/><br/><strong>This property expects a comma-separated list of file resources.</strong><br/><br/><strong>Supports Expression Language: true (will be evaluated using variable registry only)</strong></td></tr><tr><td id="name">Kerberos Credentials Service</td><td>kerberos-credentials-service</td><td></td><td id="allowable-values"><strong>Controller Service API: </strong><br/>KerberosCredentialsService<br/><strong>Implementation: </strong><a href="../../../nifi-kerberos-credentials-service-nar/1.19.1/org.apache.nifi.kerberos.KeytabCredentialsService/index.html">KeytabCredentialsService</a></td><td id="description">Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos</td></tr><tr><td id="name">Kerberos User Service</td><td>kerberos-user-service</td><td></td><td id="allowable-values"><strong>Controller Service API: </strong><br/>KerberosUserService<br/><strong>Implementations: </strong><a href="../../../nifi-kerberos-user-service-nar/1.19.1/org.apache.nifi.kerberos.KerberosPasswordUserService/index.html">KerberosPasswordUserService</a><br/><a href="../../../nifi-kerberos-user-service-nar/1.19.1/org.apache.nifi.kerberos.KerberosKeytabUserService/index.html">KerberosKeytabUserService</a><br/><a href="../../../nifi-kerberos-user-service-nar/1.19.1/org.apache.nifi.kerberos.KerberosTicketCacheUserService/index.html">KerberosTicketCacheUserService</a></td><td id="description">Specifies the Kerberos User Controller Service that should be used for authenticating with Kerberos</td></tr><tr><td id="name">Kerberos Principal</td><td>Kerberos Principal</td><td></td><td id="allowable-values"></td><td id="description">Kerberos principal to authenticate as. 
Requires nifi.kerberos.krb5.file to be set in your nifi.properties<br/><strong>Supports Expression Language: true (will be evaluated using variable registry only)</strong></td></tr><tr><td id="name">Kerberos Keytab</td><td>Kerberos Keytab</td><td></td><td id="allowable-values"></td><td id="description">Kerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties<br/><br/><strong>This property requires exactly one file to be provided.</strong><br/><br/><strong>Supports Expression Language: true (will be evaluated using variable registry only)</strong></td></tr><tr><td id="name">Kerberos Password</td><td>Kerberos Password</td><td></td><td id="allowable-values"></td><td id="description">Kerberos password associated with the principal.<br/><strong>Sensitive Property: true</strong></td></tr><tr><td id="name">Kerberos Relogin Period</td><td>Kerberos Relogin Period</td><td id="default-value">4 hours</td><td id="allowable-values"></td><td id="description">Period of time which should pass before attempting a Kerberos relogin.
This property has been deprecated, and has no effect on processing. Relogins now occur automatically.<br/><strong>Supports Expression Language: true (will be evaluated using variable registry only)</strong></td></tr><tr><td id="name">Additional Classpath Resources</td><td>Additional Classpath Resources</td><td></td><td id="allowable-values"></td><td id="description">A comma-separated list of paths to files and/or directories that will be added to the classpath and used for loading native libraries. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.<br/><br/><strong>This property expects a comma-separated list of resources. Each of the resources may be of any of the following types: directory, file.</strong><br/></td></tr><tr><td id="name"><strong>Record Reader</strong></td><td>record-reader</td><td></td><td id="allowable-values"><strong>Controller Service API: </strong><br/>RecordReaderFactory<br/><strong>Implementations: </strong><a href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.grok.GrokReader/index.html">GrokReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.xml.XMLReader/index.html">XMLReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.avro.AvroReader/index.html">AvroReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.syslog.Syslog5424Reader/index.html">Syslog5424Reader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.cef.CEFReader/index.html">CEFReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.syslog.SyslogReader/index.html">SyslogReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.json.JsonTreeReader/index.html">JsonTreeReader</a><br/><a 
href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.csv.CSVReader/index.html">CSVReader</a><br/><a href="../../../nifi-scripting-nar/1.19.1/org.apache.nifi.record.script.ScriptedReader/index.html">ScriptedReader</a><br/><a href="../org.apache.nifi.parquet.ParquetReader/index.html">ParquetReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.json.JsonPathReader/index.html">JsonPathReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.lookup.ReaderLookup/index.html">ReaderLookup</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.1/org.apache.nifi.windowsevent.WindowsEventLogReader/index.html">WindowsEventLogReader</a></td><td id="description">The service for reading records from incoming flow files.</td></tr><tr><td id="name"><strong>Directory</strong></td><td>Directory</td><td></td><td id="allowable-values"></td><td id="description">The parent directory to which files should be written. Will be created if it doesn't exist.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name"><strong>Compression Type</strong></td><td>compression-type</td><td id="default-value">UNCOMPRESSED</td><td id="allowable-values"><ul><li>UNCOMPRESSED</li><li>SNAPPY</li><li>GZIP</li><li>LZO</li><li>BROTLI</li><li>LZ4</li><li>ZSTD</li></ul></td><td id="description">The type of compression for the file being written.</td></tr><tr><td id="name"><strong>Overwrite Files</strong></td><td>overwrite</td><td id="default-value">false</td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">Whether or not to overwrite existing files in the same directory with the same name. 
When set to false, flow files will be routed to failure when a file exists in the same directory with the same name.</td></tr><tr><td id="name">Permissions umask</td><td>permissions-umask</td><td></td><td id="allowable-values"></td><td id="description">A umask represented as an octal number which determines the permissions of files written to HDFS. This overrides the Hadoop Configuration dfs.umaskmode</td></tr><tr><td id="name">Remote Group</td><td>remote-group</td><td></td><td id="allowable-values"></td><td id="description">Changes the group of the HDFS file to this value after it is written. This only works if NiFi is running as a user that has HDFS super user privilege to change group</td></tr><tr><td id="name">Remote Owner</td><td>remote-owner</td><td></td><td id="allowable-values"></td><td id="description">Changes the owner of the HDFS file to this value after it is written. This only works if NiFi is running as a user that has HDFS super user privilege to change owner</td></tr><tr><td id="name">Row Group Size</td><td>row-group-size</td><td></td><td id="allowable-values"></td><td id="description">The row group size used by the Parquet writer. The value is specified in the format of &lt;Data Size&gt; &lt;Data Unit&gt; where Data Unit is one of B, KB, MB, GB, TB.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Page Size</td><td>page-size</td><td></td><td id="allowable-values"></td><td id="description">The page size used by the Parquet writer. 
The value is specified in the format of &lt;Data Size&gt; &lt;Data Unit&gt; where Data Unit is one of B, KB, MB, GB, TB.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Dictionary Page Size</td><td>dictionary-page-size</td><td></td><td id="allowable-values"></td><td id="description">The dictionary page size used by the Parquet writer. The value is specified in the format of &lt;Data Size&gt; &lt;Data Unit&gt; where Data Unit is one of B, KB, MB, GB, TB.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Max Padding Size</td><td>max-padding-size</td><td></td><td id="allowable-values"></td><td id="description">The maximum amount of padding that will be used to align row groups with blocks in the underlying filesystem. If the underlying filesystem is not a block filesystem like HDFS, this has no effect. 
The value is specified in the format of &lt;Data Size&gt; &lt;Data Unit&gt; where Data Unit is one of B, KB, MB, GB, TB.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Enable Dictionary Encoding</td><td>enable-dictionary-encoding</td><td></td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">Specifies whether dictionary encoding should be enabled for the Parquet writer</td></tr><tr><td id="name">Enable Validation</td><td>enable-validation</td><td></td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">Specifies whether validation should be enabled for the Parquet writer</td></tr><tr><td id="name">Writer Version</td><td>writer-version</td><td></td><td id="allowable-values"><ul><li>PARQUET_1_0</li><li>PARQUET_2_0</li></ul></td><td id="description">Specifies the version used by Parquet writer</td></tr><tr><td id="name"><strong>Avro Write Old List Structure</strong></td><td>avro-write-old-list-structure</td><td id="default-value">true</td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">Specifies the value for 'parquet.avro.write-old-list-structure' in the underlying Parquet library</td></tr><tr><td id="name"><strong>Avro Add List Element Records</strong></td><td>avro-add-list-element-records</td><td id="default-value">true</td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">Specifies the value for 'parquet.avro.add-list-element-records' in the underlying Parquet library</td></tr><tr><td id="name">Remove CRC Files</td><td>remove-crc-files</td><td id="default-value">false</td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">Specifies whether the corresponding CRC file should be deleted upon successfully writing a Parquet file</td></tr></table><h3>Relationships: </h3><table 
id="relationships"><tr><th>Name</th><th>Description</th></tr><tr><td>retry</td><td>Flow Files that could not be processed due to issues that can be retried are transferred to this relationship</td></tr><tr><td>success</td><td>Flow Files that have been successfully processed are transferred to this relationship</td></tr><tr><td>failure</td><td>Flow Files that could not be processed due to issues that cannot be retried are transferred to this relationship</td></tr></table><h3>Reads Attributes: </h3><table id="reads-attributes"><tr><th>Name</th><th>Description</th></tr><tr><td>filename</td><td>The name of the file to write comes from the value of this attribute.</td></tr></table><h3>Writes Attributes: </h3><table id="writes-attributes"><tr><th>Name</th><th>Description</th></tr><tr><td>filename</td><td>The name of the file is stored in this attribute.</td></tr><tr><td>absolute.hdfs.path</td><td>The absolute path to the file is stored in this attribute.</td></tr><tr><td>record.count</td><td>The number of records written to the Parquet file</td></tr></table><h3>State management: </h3>This component does not store state.<h3>Restricted: </h3><table id="restrictions"><tr><th>Required Permission</th><th>Explanation</th></tr><tr><td>write distributed filesystem</td><td>Provides operator the ability to write any file that NiFi has access to in HDFS or the local filesystem.</td></tr></table><h3>Input requirement: </h3>This component requires an incoming relationship.<h3>System Resource Considerations:</h3>None specified.</body></html>