blob: 27051dd80a0b4cc60e1b3addcec152b88318df7e [file] [log] [blame]
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/html">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<head>
<meta charset="utf-8"/>
<title>ScriptedFilterRecord</title>
<link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css"/>
<style>
h2 {margin-top: 4em}
h3 {margin-top: 3em}
td {text-align: left}
</style>
</head>
<body>
<h1>ScriptedFilterRecord</h1>
<h3>Description</h3>
<p>
The ScriptedFilterRecord Processor provides the ability to use a scripting language, such as Groovy or Jyton in order to remove Records from an incoming FlowFile.
NiFi provides several different Processors that can be used to work with Records in different ways. Each of these processors has its pros and cons.
The ScriptedFilterRecord is intended to work together with these processors and be used as a pre-processing step before processing the FlowFile with more performance consuming Processors, like ScriptedTransformRecord.
</p>
<p>
The Processor expects an user defined script in order to determine which Records should be kept and filtered out.
When creating a script, it is important to note that, unlike ExecuteScript, this Processor does not allow the script itself to expose Properties to be configured or define Relationships.
</p>
<p>
The provided script is evaluated once for each Record that is encountered in the incoming FlowFile. Each time that the script is invoked, it is expected to return a <code>boolean</code> value, which is used as a basis of filtering:
For Records the script returns with a <code>true</code> value, the given Record will be included to the outgoing FlowFile which will be routed to the <code>success</code> Relationship.
For <code>false</code> values the given Record will not be added to the output.
In addition to this the incoming FlowFile will be transferred to the <code>original</code> Relationship without change.
If the script returns an object that is not considered as <code>boolean</code>, the incoming FlowFile will be routed to the <code>failure</code> Relationship instead and no FlowFile will be routed to the <code>success</code> Relationship.
</p>
<p>
This Processor maintains a Counter: "Records Processed" indicating the number of Records that were passed to the script regardless of the result of the filtering.
</p>
<h3>Variable Bindings</h3>
<p>
While the script provided to this Processor does not need to provide boilerplate code or implement any classes/interfaces, it does need some way to access the Records and other information
that it needs in order to perform its task. This is accomplished by using Variable Bindings. Each time that the script is invoked, each of the following variables will be made
available to the script:
</p>
<table>
<tr>
<th>Variable Name</th>
<th>Description</th>
<th>Variable Class</th>
</tr>
<tr>
<td>record</td>
<td>The Record that is to be processed.</td>
<td><a href="https://www.javadoc.io/doc/org.apache.nifi/nifi-record/latest/org/apache/nifi/serialization/record/Record.html">Record</a></td>
</tr>
<tr>
<td>recordIndex</td>
<td>The zero-based index of the Record in the FlowFile.</td>
<td>Long (64-bit signed integer)</td>
</tr>
<tr>
<td>log</td>
<td>The Processor's Logger. Anything that is logged to this logger will be written to the logs as if the Processor itself had logged it. Additionally, a bulletin will be created for any
log message written to this logger (though by default, the Processor will hide any bulletins with a level below WARN).</td>
<td><a href="https://www.javadoc.io/doc/org.apache.nifi/nifi-api/latest/org/apache/nifi/logging/ComponentLog.html">ComponentLog</a></td>
</tr>
<tr>
<td>attributes</td>
<td>Map of key/value pairs that are the Attributes of the FlowFile. Both the keys and the values of this Map are of type String. This Map is immutable.
Any attempt to modify it will result in an UnsupportedOperationException being thrown.</td>
<td>java.util.Map</td>
</tr>
</table>
<h3>Return Value</h3>
<p>
Each time the script is invoked, it is expected to return a <code>boolean</code> value. Return values other than <code>boolean</code>, including <code>null</code> value will be handled as
unexpected script behaviour and handled accordingly: the processing will be interrupted and the incoming FlowFile will be transferred to the <code>failure</code> relationship without further execution.
</p>
<h2>Example Scripts</h2>
<h3>Filtering based on position</h3>
<p>
The following script will keep only the first 2 Records from a FlowFile and filter out all the rest.
</p>
<p>
Example Input (CSV):
</p>
<pre>
<code>
name, allyOf
Decelea, Athens
Corinth, Sparta
Mycenae, Sparta
Potidaea, Athens
</code>
</pre>
<p>
Example Output (CSV):
</p>
<pre>
<code>
name, allyOf
Decelea, Athens
Corinth, Sparta
</code>
</pre>
<p>
Example Script (Groovy):
</p>
<pre>
<code>
return recordIndex < 2 ? true : false
</code>
</pre>
<p>
Example Script (Python):
</p>
<pre>
<code>
_ = True if (recordIndex < 2) else False
</code>
</pre>
<h3>Filtering based on Record contents</h3>
<p>
The following script will filter the Records based on their content. Any Records satisfies the condition will be part of the FlowFile routed to the <code>success</code> Relationship.
</p>
<p>
Example Input (JSON):
</p>
<pre>
<code>
[
{
"city": "Decelea",
"allyOf": "Athens"
}, {
"city": "Corinth",
"allyOf": "Sparta"
}, {
"city": "Mycenae",
"allyOf": "Sparta"
}, {
"city": "Potidaea",
"allyOf": "Athens"
}
]
</code>
</pre>
<p>
Example Output (CSV):
</p>
<pre>
<code>
[
{
"city": "Decelea",
"allyOf": "Athens"
}, {
"city": "Potidaea",
"allyOf": "Athens"
}
]
</code>
</pre>
<p>
Example Script (Groovy):
</p>
<pre>
<code>
if (record.getValue("allyOf") == "Athens") {
return true;
} else {
return false;
}
</code>
</pre>
<p>
Example Script (Python):
</p>
<pre>
<code>
allyOf = record.getValue("allyOf")
_ = True if (allyOf == "Athens") else False
</code>
</pre>
</body>
</html>