blob: e68a786f386c667f32bb4de92455f3a2beb910be [file] [log] [blame]
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/html">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<head>
<meta charset="utf-8"/>
<title>ScriptedPartitionRecord</title>
<link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css"/>
<style>
h2 {margin-top: 4em}
h3 {margin-top: 3em}
td {text-align: left}
</style>
</head>
<body>
<h1>ScriptedPartitionRecord</h1>
<h3>Description</h3>
<p>
The ScriptedPartitionRecord provides the ability to use a scripting language, such as Groovy or Jython, to quickly and easily partition a Record based on its contents.
There are multiple ways to reach the same behaviour such as using PartitionRecord but working with user provided scripts opens a wide range of possibilities on the decision
logic of partitioning the individual records.
</p>
<p>
The provided script is evaluated once for each Record that is encountered in the incoming FlowFile. Each time that the script is invoked, it is expected to return an <code>object</code> or a <code>null</code> value.
The string representation of the return value is used as the record's "partition". The <code>null</code> value is handled separately without conversion into string.
All Records with the same partition then will be batched to one FlowFile and routed to the <code>success</code> Relationship.
</p>
<p>
This Processor maintains a Counter with the name of "Records Processed". This represents the number of processed Records regardless of partitioning.
</p>
<h3>Variable Bindings</h3>
<p>
While the script provided to this Processor does not need to provide boilerplate code or implement any classes/interfaces, it does need some way to access the Records and other information
that it needs in order to perform its task. This is accomplished by using Variable Bindings. Each time that the script is invoked, each of the following variables will be made
available to the script:
</p>
<table>
<tr>
<th>Variable Name</th>
<th>Description</th>
<th>Variable Class</th>
</tr>
<tr>
<td>record</td>
<td>The Record that is to be processed.</td>
<td><a href="https://www.javadoc.io/doc/org.apache.nifi/nifi-record/latest/org/apache/nifi/serialization/record/Record.html">Record</a></td>
</tr>
<tr>
<td>recordIndex</td>
<td>The zero-based index of the Record in the FlowFile.</td>
<td>Long (64-bit signed integer)</td>
</tr>
<tr>
<td>log</td>
<td>The Processor's Logger. Anything that is logged to this logger will be written to the logs as if the Processor itself had logged it. Additionally, a bulletin will be created for any
log message written to this logger (though by default, the Processor will hide any bulletins with a level below WARN).</td>
<td><a href="https://www.javadoc.io/doc/org.apache.nifi/nifi-api/latest/org/apache/nifi/logging/ComponentLog.html">ComponentLog</a></td>
</tr>
<tr>
<td>attributes</td>
<td>Map of key/value pairs that are the Attributes of the FlowFile. Both the keys and the values of this Map are of type String. This Map is immutable.
Any attempt to modify it will result in an UnsupportedOperationException being thrown.</td>
<td>java.util.Map</td>
</tr>
</table>
<h3>Return Value</h3>
<p>
The script is invoked separately for each Record. It is acceptable to return any Object might be represented as string. This string value will be used as the partition of the given
Record. Additionally, the script may return <code>null</code>.
</p>
<h2>Example</h2>
<p>
The following script will partition the input on the value of the "stellarType" field.
</p>
<p>
Example Input (CSV):
</p>
<pre>
<code>
starSystem, stellarType
Wolf 359, M
Epsilon Eridani, K
Tau Ceti, G
Groombridge 1618, K
Gliese 1, M
</code>
</pre>
<p>
Example Output 1 (CSV) - for partition "M":
</p>
<pre>
<code>
starSystem, stellarType
Wolf 359,M
Gliese 1,M
</code>
</pre>
<p>
Example Output 2 (CSV) - for partition "K":
</p>
<pre>
<code>
starSystem, stellarType
Epsilon Eridani,K
Groombridge 1618,K
</code>
</pre>
<p>
Example Output 3 (CSV) - for partition "G":
</p>
<pre>
<code>
starSystem, stellarType
Tau Ceti,G
</code>
</pre>
<p>
Note: the order of the outgoing FlowFiles is not guaranteed.
</p>
<p>
Example Script (Groovy):
</p>
<code>
return record.getValue("stellarType")
</code>
<p>
Example Script (Python):
</p>
<code>
_ = record.getValue("stellarType")
</code>
</body>
</html>