blob: 5cf4c49e3efc403b953d84828d898e9652842048 [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<head>
<meta charset="utf-8" />
<title>ForkRecord</title>
<link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css" />
</head>
<body>
<p>
ForkRecord allows the user to fork a record into multiple records. To do that, the user must specify
one or multiple <a href="../../../../../html/record-path-guide.html">RecordPath</a> (as dynamic
properties of the processor) pointing to a field of type ARRAY containing RECORD elements.
</p>
<p>
The processor accepts two modes:
<ul>
<li>Split mode - in this mode, the generated records will have the same schema as the input. For
every element in the array, one record will be generated and the array will only contain this
element.</li>
<li>Extract mode - in this mode, the generated records will be the elements contained in the array.
Besides, it is also possible to add in each record all the fields of the parent records from the root
level to the record element being forked. However it supposes the fields to add are defined in the
schema of the Record Writer controller service. </li>
</ul>
</p>
<h2>Examples</h2>
<h3>EXTRACT mode</h3>
<p>
To better understand how this Processor works, we will lay out a few examples. For the sake of these examples, let's assume that our input
data is JSON formatted and looks like this:
</p>
<code>
<pre>
[{
"id": 1,
"name": "John Doe",
"address": "123 My Street",
"city": "My City",
"state": "MS",
"zipCode": "11111",
"country": "USA",
"accounts": [{
"id": 42,
"balance": 4750.89
}, {
"id": 43,
"balance": 48212.38
}]
},
{
"id": 2,
"name": "Jane Doe",
"address": "345 My Street",
"city": "Her City",
"state": "NY",
"zipCode": "22222",
"country": "USA",
"accounts": [{
"id": 45,
"balance": 6578.45
}, {
"id": 46,
"balance": 34567.21
}]
}]
</pre>
</code>
<h4>Example 1 - Extracting without parent fields</h4>
<p>
For this case, we want to create one record per <code>account</code> and we don't care about
the other fields. We'll add a dynamic property "path" set to <code>/accounts</code>. The resulting
flow file will contain 4 records and will look like (assuming the Record Writer schema is correctly set):
</p>
<code>
<pre>
[{
"id": 42,
"balance": 4750.89
}, {
"id": 43,
"balance": 48212.38
}, {
"id": 45,
"balance": 6578.45
}, {
"id": 46,
"balance": 34567.21
}]
</pre>
</code>
<h4>Example 2 - Extracting with parent fields</h4>
<p>
Now, if we set the property "Include parent fields" to true, this will recursively include
the parent fields into the output records assuming the Record Writer schema allows it. In
case multiple fields have the same name (like we have in this example for <code>id</code>),
the child field will have the priority over all the parent fields sharing the same name. In
this case, the <code>id</code> of the array <code>accounts</code> will be saved in the
forked records. The resulting flow file will contain 4 records and will look like:
</p>
<code>
<pre>
[{
"name": "John Doe",
"address": "123 My Street",
"city": "My City",
"state": "MS",
"zipCode": "11111",
"country": "USA",
"id": 42,
"balance": 4750.89
}, {
"name": "John Doe",
"address": "123 My Street",
"city": "My City",
"state": "MS",
"zipCode": "11111",
"country": "USA",
"id": 43,
"balance": 48212.38
}, {
"name": "Jane Doe",
"address": "345 My Street",
"city": "Her City",
"state": "NY",
"zipCode": "22222",
"country": "USA",
"id": 45,
"balance": 6578.45
}, {
"name": "Jane Doe",
"address": "345 My Street",
"city": "Her City",
"state": "NY",
"zipCode": "22222",
"country": "USA",
"id": 46,
"balance": 34567.21
}]
</pre>
</code>
<h4>Example 3 - Multi-nested arrays</h4>
<p>
Now let's say that the input record contains multi-nested arrays like the below example:
</p>
<code>
<pre>
[{
"id": 1,
"name": "John Doe",
"address": "123 My Street",
"city": "My City",
"state": "MS",
"zipCode": "11111",
"country": "USA",
"accounts": [{
"id": 42,
"balance": 4750.89,
"transactions": [{
"id": 5,
"amount": 150.31
},
{
"id": 6,
"amount": -15.31
}]
}, {
"id": 43,
"balance": 48212.38,
"transactions": [{
"id": 7,
"amount": 36.78
},
{
"id": 8,
"amount": -21.34
}]
}]
}]
</pre>
</code>
<p>
If we want to have one record per <code>transaction</code> for each <code>account</code>, then
the Record Path should be set to <code>/accounts[*]/transactions</code>. If we have the following
schema for our Record Reader:
</p>
<code>
<pre>
{
"type" : "record",
"name" : "bank",
"fields" : [ {
"name" : "id",
"type" : "int"
}, {
"name" : "name",
"type" : "string"
}, {
"name" : "address",
"type" : "string"
}, {
"name" : "city",
"type" : "string"
}, {
"name" : "state",
"type" : "string"
}, {
"name" : "zipCode",
"type" : "string"
}, {
"name" : "country",
"type" : "string"
}, {
"name" : "accounts",
"type" : {
"type" : "array",
"items" : {
"type" : "record",
"name" : "accounts",
"fields" : [ {
"name" : "id",
"type" : "int"
}, {
"name" : "balance",
"type" : "double"
}, {
"name" : "transactions",
"type" : {
"type" : "array",
"items" : {
"type" : "record",
"name" : "transactions",
"fields" : [ {
"name" : "id",
"type" : "int"
}, {
"name" : "amount",
"type" : "double"
} ]
}
}
} ]
}
}
} ]
}
</pre>
</code>
<p>
And if we have the following schema for our Record Writer:
</p>
<code>
<pre>
{
"type" : "record",
"name" : "bank",
"fields" : [ {
"name" : "id",
"type" : "int"
}, {
"name" : "name",
"type" : "string"
}, {
"name" : "address",
"type" : "string"
}, {
"name" : "city",
"type" : "string"
}, {
"name" : "state",
"type" : "string"
}, {
"name" : "zipCode",
"type" : "string"
}, {
"name" : "country",
"type" : "string"
}, {
"name" : "amount",
"type" : "double"
}, {
"name" : "balance",
"type" : "double"
} ]
}
</pre>
</code>
<p>
Then, if we include the parent fields, we'll have 4 records as below:
</p>
<code>
<pre>
[ {
"id" : 5,
"name" : "John Doe",
"address" : "123 My Street",
"city" : "My City",
"state" : "MS",
"zipCode" : "11111",
"country" : "USA",
"amount" : 150.31,
"balance" : 4750.89
}, {
"id" : 6,
"name" : "John Doe",
"address" : "123 My Street",
"city" : "My City",
"state" : "MS",
"zipCode" : "11111",
"country" : "USA",
"amount" : -15.31,
"balance" : 4750.89
}, {
"id" : 7,
"name" : "John Doe",
"address" : "123 My Street",
"city" : "My City",
"state" : "MS",
"zipCode" : "11111",
"country" : "USA",
"amount" : 36.78,
"balance" : 48212.38
}, {
"id" : 8,
"name" : "John Doe",
"address" : "123 My Street",
"city" : "My City",
"state" : "MS",
"zipCode" : "11111",
"country" : "USA",
"amount" : -21.34,
"balance" : 48212.38
} ]
</pre>
</code>
<h3>SPLIT mode</h3>
<h4>Example</h4>
<p>
Assuming we have the below data and we added a property "path" set to <code>/accounts</code>:
</p>
<code>
<pre>
[{
"id": 1,
"name": "John Doe",
"address": "123 My Street",
"city": "My City",
"state": "MS",
"zipCode": "11111",
"country": "USA",
"accounts": [{
"id": 42,
"balance": 4750.89
}, {
"id": 43,
"balance": 48212.38
}]
},
{
"id": 2,
"name": "Jane Doe",
"address": "345 My Street",
"city": "Her City",
"state": "NY",
"zipCode": "22222",
"country": "USA",
"accounts": [{
"id": 45,
"balance": 6578.45
}, {
"id": 46,
"balance": 34567.21
}]
}]
</pre>
</code>
<p>
Then we'll get 4 records as below:
</p>
<code>
<pre>
[{
"id": 1,
"name": "John Doe",
"address": "123 My Street",
"city": "My City",
"state": "MS",
"zipCode": "11111",
"country": "USA",
"accounts": [{
"id": 42,
"balance": 4750.89
}]
},
{
"id": 1,
"name": "John Doe",
"address": "123 My Street",
"city": "My City",
"state": "MS",
"zipCode": "11111",
"country": "USA",
"accounts": [{
"id": 43,
"balance": 48212.38
}]
},
{
"id": 2,
"name": "Jane Doe",
"address": "345 My Street",
"city": "Her City",
"state": "NY",
"zipCode": "22222",
"country": "USA",
"accounts": [{
"id": 45,
"balance": 6578.45
}]
},
{
"id": 2,
"name": "Jane Doe",
"address": "345 My Street",
"city": "Her City",
"state": "NY",
"zipCode": "22222",
"country": "USA",
"accounts": [{
"id": 46,
"balance": 34567.21
}]
}]
</pre>
</code>
</body>
</html>