| <!DOCTYPE html> |
| <html lang="en"> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| http://www.apache.org/licenses/LICENSE-2.0 |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <head> |
| <meta charset="utf-8" /> |
| <title>ForkRecord</title> |
| |
| <link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css" /> |
| </head> |
| |
| <body> |
| <p> |
| ForkRecord allows the user to fork a record into multiple records. To do that, the user must specify |
| one or multiple <a href="../../../../../html/record-path-guide.html">RecordPath</a> (as dynamic |
| properties of the processor) pointing to a field of type ARRAY containing RECORD elements. |
| </p> |
| <p> |
| The processor accepts two modes: |
| <ul> |
| <li>Split mode - in this mode, the generated records will have the same schema as the input. For |
| every element in the array, one record will be generated and the array will only contain this |
| element.</li> |
| <li>Extract mode - in this mode, the generated records will be the elements contained in the array. |
| Besides, it is also possible to add in each record all the fields of the parent records from the root |
| level to the record element being forked. However it supposes the fields to add are defined in the |
| schema of the Record Writer controller service. </li> |
| </ul> |
| </p> |
| |
| <h2>Examples</h2> |
| |
| <h3>EXTRACT mode</h3> |
| |
| <p> |
| To better understand how this Processor works, we will lay out a few examples. For the sake of these examples, let's assume that our input |
| data is JSON formatted and looks like this: |
| </p> |
| |
| <code> |
| <pre> |
| [{ |
| "id": 1, |
| "name": "John Doe", |
| "address": "123 My Street", |
| "city": "My City", |
| "state": "MS", |
| "zipCode": "11111", |
| "country": "USA", |
| "accounts": [{ |
| "id": 42, |
| "balance": 4750.89 |
| }, { |
| "id": 43, |
| "balance": 48212.38 |
| }] |
| }, |
| { |
| "id": 2, |
| "name": "Jane Doe", |
| "address": "345 My Street", |
| "city": "Her City", |
| "state": "NY", |
| "zipCode": "22222", |
| "country": "USA", |
| "accounts": [{ |
| "id": 45, |
| "balance": 6578.45 |
| }, { |
| "id": 46, |
| "balance": 34567.21 |
| }] |
| }] |
| </pre> |
| </code> |
| |
| |
| <h4>Example 1 - Extracting without parent fields</h4> |
| |
| <p> |
| For this case, we want to create one record per <code>account</code> and we don't care about |
| the other fields. We'll add a dynamic property "path" set to <code>/accounts</code>. The resulting |
| flow file will contain 4 records and will look like (assuming the Record Writer schema is correctly set): |
| </p> |
| |
| <code> |
| <pre> |
| [{ |
| "id": 42, |
| "balance": 4750.89 |
| }, { |
| "id": 43, |
| "balance": 48212.38 |
| }, { |
| "id": 45, |
| "balance": 6578.45 |
| }, { |
| "id": 46, |
| "balance": 34567.21 |
| }] |
| </pre> |
| </code> |
| |
| |
| <h4>Example 2 - Extracting with parent fields</h4> |
| |
| <p> |
| Now, if we set the property "Include parent fields" to true, this will recursively include |
| the parent fields into the output records assuming the Record Writer schema allows it. In |
| case multiple fields have the same name (like we have in this example for <code>id</code>), |
| the child field will have the priority over all the parent fields sharing the same name. In |
| this case, the <code>id</code> of the array <code>accounts</code> will be saved in the |
| forked records. The resulting flow file will contain 4 records and will look like: |
| </p> |
| |
| <code> |
| <pre> |
| [{ |
| "name": "John Doe", |
| "address": "123 My Street", |
| "city": "My City", |
| "state": "MS", |
| "zipCode": "11111", |
| "country": "USA", |
| "id": 42, |
| "balance": 4750.89 |
| }, { |
| "name": "John Doe", |
| "address": "123 My Street", |
| "city": "My City", |
| "state": "MS", |
| "zipCode": "11111", |
| "country": "USA", |
| "id": 43, |
| "balance": 48212.38 |
| }, { |
| "name": "Jane Doe", |
| "address": "345 My Street", |
| "city": "Her City", |
| "state": "NY", |
| "zipCode": "22222", |
| "country": "USA", |
| "id": 45, |
| "balance": 6578.45 |
| }, { |
| "name": "Jane Doe", |
| "address": "345 My Street", |
| "city": "Her City", |
| "state": "NY", |
| "zipCode": "22222", |
| "country": "USA", |
| "id": 46, |
| "balance": 34567.21 |
| }] |
| </pre> |
| </code> |
| |
| |
| <h4>Example 3 - Multi-nested arrays</h4> |
| |
| <p> |
| Now let's say that the input record contains multi-nested arrays like the below example: |
| </p> |
| |
| <code> |
| <pre> |
| [{ |
| "id": 1, |
| "name": "John Doe", |
| "address": "123 My Street", |
| "city": "My City", |
| "state": "MS", |
| "zipCode": "11111", |
| "country": "USA", |
| "accounts": [{ |
| "id": 42, |
| "balance": 4750.89, |
| "transactions": [{ |
| "id": 5, |
| "amount": 150.31 |
| }, |
| { |
| "id": 6, |
| "amount": -15.31 |
| }] |
| }, { |
| "id": 43, |
| "balance": 48212.38, |
| "transactions": [{ |
| "id": 7, |
| "amount": 36.78 |
| }, |
| { |
| "id": 8, |
| "amount": -21.34 |
| }] |
| }] |
| }] |
| </pre> |
| </code> |
| |
| <p> |
| If we want to have one record per <code>transaction</code> for each <code>account</code>, then |
| the Record Path should be set to <code>/accounts[*]/transactions</code>. If we have the following |
| schema for our Record Reader: |
| </p> |
| |
| <code> |
| <pre> |
| { |
| "type" : "record", |
| "name" : "bank", |
| "fields" : [ { |
| "name" : "id", |
| "type" : "int" |
| }, { |
| "name" : "name", |
| "type" : "string" |
| }, { |
| "name" : "address", |
| "type" : "string" |
| }, { |
| "name" : "city", |
| "type" : "string" |
| }, { |
| "name" : "state", |
| "type" : "string" |
| }, { |
| "name" : "zipCode", |
| "type" : "string" |
| }, { |
| "name" : "country", |
| "type" : "string" |
| }, { |
| "name" : "accounts", |
| "type" : { |
| "type" : "array", |
| "items" : { |
| "type" : "record", |
| "name" : "accounts", |
| "fields" : [ { |
| "name" : "id", |
| "type" : "int" |
| }, { |
| "name" : "balance", |
| "type" : "double" |
| }, { |
| "name" : "transactions", |
| "type" : { |
| "type" : "array", |
| "items" : { |
| "type" : "record", |
| "name" : "transactions", |
| "fields" : [ { |
| "name" : "id", |
| "type" : "int" |
| }, { |
| "name" : "amount", |
| "type" : "double" |
| } ] |
| } |
| } |
| } ] |
| } |
| } |
| } ] |
| } |
| </pre> |
| </code> |
| |
| <p> |
| And if we have the following schema for our Record Writer: |
| </p> |
| |
| <code> |
| <pre> |
| { |
| "type" : "record", |
| "name" : "bank", |
| "fields" : [ { |
| "name" : "id", |
| "type" : "int" |
| }, { |
| "name" : "name", |
| "type" : "string" |
| }, { |
| "name" : "address", |
| "type" : "string" |
| }, { |
| "name" : "city", |
| "type" : "string" |
| }, { |
| "name" : "state", |
| "type" : "string" |
| }, { |
| "name" : "zipCode", |
| "type" : "string" |
| }, { |
| "name" : "country", |
| "type" : "string" |
| }, { |
| "name" : "amount", |
| "type" : "double" |
| }, { |
| "name" : "balance", |
| "type" : "double" |
| } ] |
| } |
| </pre> |
| </code> |
| |
| <p> |
| Then, if we include the parent fields, we'll have 4 records as below: |
| </p> |
| |
| <code> |
| <pre> |
| [ { |
| "id" : 5, |
| "name" : "John Doe", |
| "address" : "123 My Street", |
| "city" : "My City", |
| "state" : "MS", |
| "zipCode" : "11111", |
| "country" : "USA", |
| "amount" : 150.31, |
| "balance" : 4750.89 |
| }, { |
| "id" : 6, |
| "name" : "John Doe", |
| "address" : "123 My Street", |
| "city" : "My City", |
| "state" : "MS", |
| "zipCode" : "11111", |
| "country" : "USA", |
| "amount" : -15.31, |
| "balance" : 4750.89 |
| }, { |
| "id" : 7, |
| "name" : "John Doe", |
| "address" : "123 My Street", |
| "city" : "My City", |
| "state" : "MS", |
| "zipCode" : "11111", |
| "country" : "USA", |
| "amount" : 36.78, |
| "balance" : 48212.38 |
| }, { |
| "id" : 8, |
| "name" : "John Doe", |
| "address" : "123 My Street", |
| "city" : "My City", |
| "state" : "MS", |
| "zipCode" : "11111", |
| "country" : "USA", |
| "amount" : -21.34, |
| "balance" : 48212.38 |
| } ] |
| </pre> |
| </code> |
| |
| <h3>SPLIT mode</h3> |
| |
| <h4>Example</h4> |
| |
| <p> |
| Assuming we have the below data and we added a property "path" set to <code>/accounts</code>: |
| </p> |
| |
| <code> |
| <pre> |
| [{ |
| "id": 1, |
| "name": "John Doe", |
| "address": "123 My Street", |
| "city": "My City", |
| "state": "MS", |
| "zipCode": "11111", |
| "country": "USA", |
| "accounts": [{ |
| "id": 42, |
| "balance": 4750.89 |
| }, { |
| "id": 43, |
| "balance": 48212.38 |
| }] |
| }, |
| { |
| "id": 2, |
| "name": "Jane Doe", |
| "address": "345 My Street", |
| "city": "Her City", |
| "state": "NY", |
| "zipCode": "22222", |
| "country": "USA", |
| "accounts": [{ |
| "id": 45, |
| "balance": 6578.45 |
| }, { |
| "id": 46, |
| "balance": 34567.21 |
| }] |
| }] |
| </pre> |
| </code> |
| |
| <p> |
| Then we'll get 4 records as below: |
| </p> |
| |
| <code> |
| <pre> |
| [{ |
| "id": 1, |
| "name": "John Doe", |
| "address": "123 My Street", |
| "city": "My City", |
| "state": "MS", |
| "zipCode": "11111", |
| "country": "USA", |
| "accounts": [{ |
| "id": 42, |
| "balance": 4750.89 |
| }] |
| }, |
| { |
| "id": 1, |
| "name": "John Doe", |
| "address": "123 My Street", |
| "city": "My City", |
| "state": "MS", |
| "zipCode": "11111", |
| "country": "USA", |
| "accounts": [{ |
| "id": 43, |
| "balance": 48212.38 |
| }] |
| }, |
| { |
| "id": 2, |
| "name": "Jane Doe", |
| "address": "345 My Street", |
| "city": "Her City", |
| "state": "NY", |
| "zipCode": "22222", |
| "country": "USA", |
| "accounts": [{ |
| "id": 45, |
| "balance": 6578.45 |
| }] |
| }, |
| { |
| "id": 2, |
| "name": "Jane Doe", |
| "address": "345 My Street", |
| "city": "Her City", |
| "state": "NY", |
| "zipCode": "22222", |
| "country": "USA", |
| "accounts": [{ |
| "id": 46, |
| "balance": 34567.21 |
| }] |
| }] |
| </pre> |
| </code> |
| |
| </body> |
| </html> |