<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></meta><title>GetHDFSFileInfo</title><link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css"></link></head><script type="text/javascript">window.onload = function(){if(self==top) { document.getElementById('nameHeader').style.display = "inherit"; } }</script><body><h1 id="nameHeader" style="display: none;">GetHDFSFileInfo</h1><h2>Description: </h2><p>Retrieves a listing of files and directories from HDFS. This processor creates one or more FlowFiles that represent the HDFS file/dir with relevant information. The main purpose of this processor is to provide functionality similar to the HDFS Client, e.g. count, du, ls, test, etc. Unlike ListHDFS, this processor is stateless, supports incoming connections, and provides information at the directory level. </p><h3>Tags: </h3><p>hadoop, HCFS, HDFS, get, list, ingest, source, filesystem</p><h3>Properties: </h3><p>In the list below, the names of required properties appear in <strong>bold</strong>. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the <a href="../../../../../html/expression-language-guide.html">NiFi Expression Language</a>.</p><table id="properties"><tr><th>Display Name</th><th>API Name</th><th>Default Value</th><th>Allowable Values</th><th>Description</th></tr><tr><td id="name">Hadoop Configuration Resources</td><td>Hadoop Configuration Resources</td><td></td><td id="allowable-values"></td><td id="description">A file or comma-separated list of files that contain the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration. 
To use swebhdfs, see 'Additional Details' section of PutHDFS's documentation.<br/><br/><strong>This property expects a comma-separated list of file resources.</strong><br/><br/><strong>Supports Expression Language: true (will be evaluated using variable registry only)</strong></td></tr><tr><td id="name">Kerberos Credentials Service</td><td>kerberos-credentials-service</td><td></td><td id="allowable-values"><strong>Controller Service API: </strong><br/>KerberosCredentialsService<br/><strong>Implementation: </strong><a href="../../../nifi-kerberos-credentials-service-nar/1.19.1/org.apache.nifi.kerberos.KeytabCredentialsService/index.html">KeytabCredentialsService</a></td><td id="description">Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos</td></tr><tr><td id="name">Kerberos User Service</td><td>kerberos-user-service</td><td></td><td id="allowable-values"><strong>Controller Service API: </strong><br/>KerberosUserService<br/><strong>Implementations: </strong><a href="../../../nifi-kerberos-user-service-nar/1.19.1/org.apache.nifi.kerberos.KerberosPasswordUserService/index.html">KerberosPasswordUserService</a><br/><a href="../../../nifi-kerberos-user-service-nar/1.19.1/org.apache.nifi.kerberos.KerberosKeytabUserService/index.html">KerberosKeytabUserService</a><br/><a href="../../../nifi-kerberos-user-service-nar/1.19.1/org.apache.nifi.kerberos.KerberosTicketCacheUserService/index.html">KerberosTicketCacheUserService</a></td><td id="description">Specifies the Kerberos User Controller Service that should be used for authenticating with Kerberos</td></tr><tr><td id="name">Kerberos Principal</td><td>Kerberos Principal</td><td></td><td id="allowable-values"></td><td id="description">Kerberos principal to authenticate as. 
Requires nifi.kerberos.krb5.file to be set in your nifi.properties<br/><strong>Supports Expression Language: true (will be evaluated using variable registry only)</strong></td></tr><tr><td id="name">Kerberos Keytab</td><td>Kerberos Keytab</td><td></td><td id="allowable-values"></td><td id="description">Kerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties<br/><br/><strong>This property requires exactly one file to be provided.</strong><br/><br/><strong>Supports Expression Language: true (will be evaluated using variable registry only)</strong></td></tr><tr><td id="name">Kerberos Password</td><td>Kerberos Password</td><td></td><td id="allowable-values"></td><td id="description">Kerberos password associated with the principal.<br/><strong>Sensitive Property: true</strong></td></tr><tr><td id="name">Kerberos Relogin Period</td><td>Kerberos Relogin Period</td><td id="default-value">4 hours</td><td id="allowable-values"></td><td id="description">Period of time which should pass before attempting a Kerberos relogin.
This property has been deprecated, and has no effect on processing. Relogins now occur automatically.<br/><strong>Supports Expression Language: true (will be evaluated using variable registry only)</strong></td></tr><tr><td id="name">Additional Classpath Resources</td><td>Additional Classpath Resources</td><td></td><td id="allowable-values"></td><td id="description">A comma-separated list of paths to files and/or directories that will be added to the classpath and used for loading native libraries. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.<br/><br/><strong>This property expects a comma-separated list of resources. Each of the resources may be of any of the following types: directory, file.</strong><br/></td></tr><tr><td id="name"><strong>Full path</strong></td><td>gethdfsfileinfo-full-path</td><td id="default-value"></td><td id="allowable-values"></td><td id="description">A directory to start listing from, or a file's full path.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name"><strong>Recurse Subdirectories</strong></td><td>gethdfsfileinfo-recurse-subdirs</td><td id="default-value">true</td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">Indicates whether to list files from subdirectories of the HDFS directory</td></tr><tr><td id="name">Directory Filter</td><td>gethdfsfileinfo-dir-filter</td><td></td><td id="allowable-values"></td><td id="description">Regex. Only directories whose names match the given regular expression will be picked up. 
If not provided, no filter is applied (performance considerations).<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">File Filter</td><td>gethdfsfileinfo-file-filter</td><td></td><td id="allowable-values"></td><td id="description">Regex. Only files whose names match the given regular expression will be picked up. If not provided, no filter is applied (performance considerations).<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Exclude Files</td><td>gethdfsfileinfo-file-exclude-filter</td><td></td><td id="allowable-values"></td><td id="description">Regex. Files whose names match the given regular expression will not be picked up. If not provided, no filter is applied (performance considerations).<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name"><strong>Ignore Dotted Directories</strong></td><td>gethdfsfileinfo-ignore-dotted-dirs</td><td id="default-value">true</td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">If true, directories whose names begin with a dot (".") will be ignored</td></tr><tr><td id="name"><strong>Ignore Dotted Files</strong></td><td>gethdfsfileinfo-ignore-dotted-files</td><td id="default-value">true</td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">If true, files whose names begin with a dot (".") will be ignored</td></tr><tr><td id="name"><strong>Group Results</strong></td><td>gethdfsfileinfo-group</td><td id="default-value">All</td><td id="allowable-values"><ul><li>All <img src="../../../../../html/images/iconInfo.png" alt="Group all results into a single flowfile." 
title="Group all results into a single flowfile."></img></li><li>Parent Directory <img src="../../../../../html/images/iconInfo.png" alt="Group HDFS objects by their parent directories only. The processor will generate a flowfile for each directory (if recursive). If the 'Recurse Subdirectories' property is set to 'false', this has the same effect as 'All'" title="Group HDFS objects by their parent directories only. The processor will generate a flowfile for each directory (if recursive). If the 'Recurse Subdirectories' property is set to 'false', this has the same effect as 'All'"></img></li><li>None <img src="../../../../../html/images/iconInfo.png" alt="Don't group results. Generate a flowfile for each HDFS object." title="Don't group results. Generate a flowfile for each HDFS object."></img></li></ul></td><td id="description">Groups HDFS objects</td></tr><tr><td id="name">Batch Size</td><td>gethdfsfileinfo-batch-size</td><td></td><td id="allowable-values"></td><td id="description">Number of records to put into an output flowfile when 'Destination' is set to 'Content' and 'Group Results' is set to 'None'</td></tr><tr><td id="name"><strong>Destination</strong></td><td>gethdfsfileinfo-destination</td><td id="default-value">Content</td><td id="allowable-values"><ul><li>Attributes <img src="../../../../../html/images/iconInfo.png" alt="Details of the given HDFS object will be stored in attributes of the flowfile. WARNING: When a scan finds thousands or millions of objects, huge attribute values could impact the flow file repo and GC/heap usage. Use content destination for such cases." title="Details of the given HDFS object will be stored in attributes of the flowfile. WARNING: When a scan finds thousands or millions of objects, huge attribute values could impact the flow file repo and GC/heap usage. 
Use content destination for such cases."></img></li><li>Content <img src="../../../../../html/images/iconInfo.png" alt="Details of the given HDFS object will be stored in the flowfile content in JSON format" title="Details of the given HDFS object will be stored in the flowfile content in JSON format"></img></li></ul></td><td id="description">Sets the destination for the results. When set to 'Content', attributes of the flowfile won't be used for storing results. </td></tr></table><h3>Relationships: </h3><table id="relationships"><tr><th>Name</th><th>Description</th></tr><tr><td>success</td><td>All successfully generated FlowFiles are transferred to this relationship</td></tr><tr><td>not found</td><td>If no objects are found, the original FlowFile is transferred to this relationship</td></tr><tr><td>failure</td><td>All failed attempts to access HDFS will be routed to this relationship</td></tr><tr><td>original</td><td>Original FlowFiles are transferred to this relationship</td></tr></table><h3>Reads Attributes: </h3>None specified.<h3>Writes Attributes: </h3><table id="writes-attributes"><tr><th>Name</th><th>Description</th></tr><tr><td>hdfs.objectName</td><td>The name of the file/dir found on HDFS.</td></tr><tr><td>hdfs.path</td><td>The path is set to the absolute path of the object's parent directory on HDFS. For example, if an object is a directory 'foo' under directory '/bar', then 'hdfs.objectName' will have value 'foo', and 'hdfs.path' will be '/bar'</td></tr><tr><td>hdfs.type</td><td>The type of an object. Possible values: directory, file, link</td></tr><tr><td>hdfs.owner</td><td>The user that owns the object in HDFS</td></tr><tr><td>hdfs.group</td><td>The group that owns the object in HDFS</td></tr><tr><td>hdfs.lastModified</td><td>The timestamp of when the object in HDFS was last modified, as milliseconds since midnight Jan 1, 1970 UTC</td></tr><tr><td>hdfs.length</td><td>In case of files: The number of bytes in the file in HDFS. In case of dirs: Returns storage space consumed by directory. 
</td></tr><tr><td>hdfs.count.files</td><td>For type='directory', represents the total count of files under this dir. Won't be populated for other types of HDFS objects. </td></tr><tr><td>hdfs.count.dirs</td><td>For type='directory', represents the total count of directories under this dir (including itself). Won't be populated for other types of HDFS objects. </td></tr><tr><td>hdfs.replication</td><td>The number of HDFS replicas for the file</td></tr><tr><td>hdfs.permissions</td><td>The permissions for the object in HDFS. This is formatted as 3 characters for the owner, 3 for the group, and 3 for other users. For example rw-rw-r--</td></tr><tr><td>hdfs.status</td><td>The status contains a comma-separated list of file/dir paths that couldn't be listed/accessed. Status won't be set if no errors occurred.</td></tr><tr><td>hdfs.full.tree</td><td>When 'Destination' is set to 'Attributes', will be populated with the full tree of the HDFS directory in JSON format. WARNING: When a scan finds thousands or millions of objects, huge attribute values could impact the flow file repo and GC/heap usage. Use content destination for such cases</td></tr></table><h3>State management: </h3>This component does not store state.<h3>Restricted: </h3>This component is not restricted.<h3>Input requirement: </h3>This component allows an incoming relationship.<h3>System Resource Considerations:</h3>None specified.<h3>See Also:</h3><p><a href="../org.apache.nifi.processors.hadoop.ListHDFS/index.html">ListHDFS</a>, <a href="../org.apache.nifi.processors.hadoop.GetHDFS/index.html">GetHDFS</a>, <a href="../org.apache.nifi.processors.hadoop.FetchHDFS/index.html">FetchHDFS</a>, <a href="../org.apache.nifi.processors.hadoop.PutHDFS/index.html">PutHDFS</a></p></body></html>