blob: 96090532f92b231bf9630919a772d3c6fc29315e [file] [log] [blame]
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></meta><title>SampleRecord</title><link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css"></link></head><script type="text/javascript">window.onload = function(){if(self==top) { document.getElementById('nameHeader').style.display = "inherit"; } }</script><body><h1 id="nameHeader" style="display: none;">SampleRecord</h1><h2>Description: </h2><p>Samples the records of a FlowFile based on a specified sampling strategy (such as Reservoir Sampling). The resulting FlowFile may be of a fixed number of records (in the case of reservoir-based algorithms) or some subset of the total number of records (in the case of probabilistic sampling), or a deterministic number of records (in the case of interval sampling).</p><p><a href="additionalDetails.html">Additional Details...</a></p><h3>Tags: </h3><p>record, sample, reservoir, range, interval</p><h3>Properties: </h3><p>In the list below, the names of required properties appear in <strong>bold</strong>. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the <a href="../../../../../html/expression-language-guide.html">NiFi Expression Language</a>.</p><table id="properties"><tr><th>Display Name</th><th>API Name</th><th>Default Value</th><th>Allowable Values</th><th>Description</th></tr><tr><td id="name"><strong>Record Reader</strong></td><td>record-reader</td><td></td><td id="allowable-values"><strong>Controller Service API: </strong><br/>RecordReaderFactory<br/><strong>Implementations: </strong><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.cef.CEFReader/index.html">CEFReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.syslog.Syslog5424Reader/index.html">Syslog5424Reader</a><br/><a href="../../../nifi-scripting-nar/1.19.0/org.apache.nifi.record.script.ScriptedReader/index.html">ScriptedReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.syslog.SyslogReader/index.html">SyslogReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.csv.CSVReader/index.html">CSVReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.grok.GrokReader/index.html">GrokReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.json.JsonTreeReader/index.html">JsonTreeReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.xml.XMLReader/index.html">XMLReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.avro.AvroReader/index.html">AvroReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.lookup.ReaderLookup/index.html">ReaderLookup</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.windowsevent.WindowsEventLogReader/index.html">WindowsEventLogReader</a><br/><a href="../../../nifi-parquet-nar/1.19.0/org.apache.nifi.parquet.ParquetReader/index.html">ParquetReader</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.json.JsonPathReader/index.html">JsonPathReader</a></td><td id="description">Specifies the Controller Service to use for parsing incoming data and determining the data's schema</td></tr><tr><td id="name"><strong>Record Writer</strong></td><td>record-writer</td><td></td><td id="allowable-values"><strong>Controller Service API: </strong><br/>RecordSetWriterFactory<br/><strong>Implementations: </strong><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.lookup.RecordSetWriterLookup/index.html">RecordSetWriterLookup</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.xml.XMLRecordSetWriter/index.html">XMLRecordSetWriter</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.text.FreeFormTextRecordSetWriter/index.html">FreeFormTextRecordSetWriter</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.csv.CSVRecordSetWriter/index.html">CSVRecordSetWriter</a><br/><a href="../../../nifi-scripting-nar/1.19.0/org.apache.nifi.record.script.ScriptedRecordSetWriter/index.html">ScriptedRecordSetWriter</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.avro.AvroRecordSetWriter/index.html">AvroRecordSetWriter</a><br/><a href="../../../nifi-record-serialization-services-nar/1.19.0/org.apache.nifi.json.JsonRecordSetWriter/index.html">JsonRecordSetWriter</a><br/><a href="../../../nifi-parquet-nar/1.19.0/org.apache.nifi.parquet.ParquetRecordSetWriter/index.html">ParquetRecordSetWriter</a></td><td id="description">Specifies the Controller Service to use for writing results to a FlowFile</td></tr><tr><td id="name"><strong>Sampling Strategy</strong></td><td>sample-record-sampling-strategy</td><td id="default-value">Reservoir Sampling</td><td id="allowable-values"><ul><li>Interval Sampling <img src="../../../../../html/images/iconInfo.png" alt="Selects every Nth record where N is the value of the 'Interval Value' property" title="Selects every Nth record where N is the value of the 'Interval Value' property"></img></li><li>Range Sampling <img src="../../../../../html/images/iconInfo.png" alt="Creates a sample of records based on the index (i.e. record number) of the records using the specified range. An example is '3,6-8,20-' which includes the third record, the sixth, seventh and eighth record, and all records from the twentieth record on. Commas separate intervals that don't overlap, and an interval can be between two numbers (i.e. 6-8) or up to a given number (i.e. -5), or from a number to the number of the last record (i.e. 20-)." title="Creates a sample of records based on the index (i.e. record number) of the records using the specified range. An example is '3,6-8,20-' which includes the third record, the sixth, seventh and eighth record, and all records from the twentieth record on. Commas separate intervals that don't overlap, and an interval can be between two numbers (i.e. 6-8) or up to a given number (i.e. -5), or from a number to the number of the last record (i.e. 20-)."></img></li><li>Probabilistic Sampling <img src="../../../../../html/images/iconInfo.png" alt="Selects each record with probability P where P is the value of the 'Selection Probability' property" title="Selects each record with probability P where P is the value of the 'Selection Probability' property"></img></li><li>Reservoir Sampling <img src="../../../../../html/images/iconInfo.png" alt="Creates a sample of K records where each record has equal probability of being included, where K is the value of the 'Reservoir Size' property. Note that if the value is very large it may cause memory issues as the reservoir is kept in-memory." title="Creates a sample of K records where each record has equal probability of being included, where K is the value of the 'Reservoir Size' property. Note that if the value is very large it may cause memory issues as the reservoir is kept in-memory."></img></li></ul></td><td id="description">Specifies which method to use for sampling records from the incoming FlowFile</td></tr><tr><td id="name"><strong>Sampling Interval</strong></td><td>sample-record-interval</td><td></td><td id="allowable-values"></td><td id="description">Specifies the number of records to skip before writing a record to the outgoing FlowFile. This property is only used if Sampling Strategy is set to Interval Sampling. A value of zero (0) will cause no records to be included in theoutgoing FlowFile, a value of one (1) will cause all records to be included, and a value of two (2) will cause half the records to be included, and so on.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong><br/><br/><strong>This Property is only considered if </strong><strong>the [Sampling Strategy] Property has a value of "Interval Sampling".</strong></td></tr><tr><td id="name"><strong>Sampling Range</strong></td><td>sample-record-range</td><td></td><td id="allowable-values"></td><td id="description">Specifies the range of records to include in the sample, from 1 to the total number of records. An example is '3,6-8,20-' which includes the third record, the sixth, seventh and eighth records, and all records from the twentieth record on. Commas separate intervals that don't overlap, and an interval can be between two numbers (i.e. 6-8) or up to a given number (i.e. -5), or from a number to the number of the last record (i.e. 20-). If this property is unset, all records will be included.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong><br/><br/><strong>This Property is only considered if </strong><strong>the [Sampling Strategy] Property has a value of "Range Sampling".</strong></td></tr><tr><td id="name"><strong>Sampling Probability</strong></td><td>sample-record-probability</td><td></td><td id="allowable-values"></td><td id="description">Specifies the probability (as a percent from 0-100) of a record being included in the outgoing FlowFile. This property is only used if Sampling Strategy is set to Probabilistic Sampling. A value of zero (0) will cause no records to be included in theoutgoing FlowFile, and a value of 100 will cause all records to be included in the outgoing FlowFile..<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong><br/><br/><strong>This Property is only considered if </strong><strong>the [Sampling Strategy] Property has a value of "Probabilistic Sampling".</strong></td></tr><tr><td id="name"><strong>Reservoir Size</strong></td><td>sample-record-reservoir</td><td></td><td id="allowable-values"></td><td id="description">Specifies the number of records to write to the outgoing FlowFile. This property is only used if Sampling Strategy is set to reservoir-based strategies such as Reservoir Sampling.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong><br/><br/><strong>This Property is only considered if </strong><strong>the [Sampling Strategy] Property has a value of "Reservoir Sampling".</strong></td></tr><tr><td id="name">Random Seed</td><td>sample-record-random-seed</td><td></td><td id="allowable-values"></td><td id="description">Specifies a particular number to use as the seed for the random number generator (used by probabilistic strategies). Setting this property will ensure the same records are selected even when using probabilistic strategies.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong><br/><br/><strong>This Property is only considered if </strong><strong>the [Sampling Strategy] Property is set to one of the following values: [Probabilistic Sampling], [Reservoir Sampling]</strong></td></tr></table><h3>Relationships: </h3><table id="relationships"><tr><th>Name</th><th>Description</th></tr><tr><td>success</td><td>The FlowFile is routed to this relationship if the sampling completed successfully</td></tr><tr><td>failure</td><td>If a FlowFile fails processing for any reason (for example, any record is not valid), the original FlowFile will be routed to this relationship</td></tr><tr><td>original</td><td>The original FlowFile is routed to this relationship if sampling is successful</td></tr></table><h3>Reads Attributes: </h3>None specified.<h3>Writes Attributes: </h3><table id="writes-attributes"><tr><th>Name</th><th>Description</th></tr><tr><td>mime.type</td><td>The MIME type indicated by the record writer</td></tr><tr><td>record.count</td><td>The number of records in the resulting flow file</td></tr></table><h3>State management: </h3>This component does not store state.<h3>Restricted: </h3>This component is not restricted.<h3>Input requirement: </h3>This component requires an incoming relationship.<h3>System Resource Considerations:</h3><table id="system-resource-considerations"><tr><th>Resource</th><th>Description</th></tr><tr><td>MEMORY</td><td>An instance of this component can cause high usage of this system resource. Multiple instances or high concurrency settings may result a degradation of performance.</td></tr></table></body></html>