<!DOCTYPE html>
<html lang="en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<head>
<meta charset="utf-8"/>
<title>ForkEnrichment</title>
<link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css"/>
</head>
<body>
<h3>Introduction</h3>
<p>
The ForkEnrichment processor is designed to be used in conjunction with the <a href="../org.apache.nifi.processors.standard.JoinEnrichment/index.html">JoinEnrichment Processor</a>.
Used together, they provide a mechanism for transforming data into a separate request payload, using that payload to gather enrichment data, optionally transforming
the enrichment data, and finally joining the original payload with the enrichment data.
</p>
<h3>Typical Dataflow</h3>
<p>
A typical dataflow for accomplishing this may look like the following:
</p>
<img src="fork-join-enrichment.png" style="height: 50%; width: 50%" />
<p>
Here, we have a ForkEnrichment processor that is responsible for taking in a FlowFile and producing two copies of it: one to the "original" relationship and the other to the "enrichment"
relationship. Each copy will have its own set of attributes added to it.
</p>
<p>
Next, we have the "original" FlowFile being routed to the JoinEnrichment processor, while the "enrichment" FlowFile is routed in a different direction. Each of these FlowFiles will have an
attribute named "enrichment.group.id" with the same value. The JoinEnrichment processor then uses this information to correlate the two FlowFiles. The "enrichment.role" attribute will also
be added to each FlowFile but with a different value. The FlowFile routed to "original" will have an enrichment.role of ORIGINAL while the FlowFile routed to "enrichment" will have an
enrichment.role of ENRICHMENT.
</p>
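<p>
The attribute handling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not NiFi code: the function names
<code>fork_enrichment</code> and <code>pair_by_group</code> are hypothetical, FlowFiles are modeled as plain attribute dictionaries, and the use of a random UUID for the group id is an assumption.
</p>

```python
import uuid


def fork_enrichment(attributes):
    """Return two attribute maps, standing in for the "original" and "enrichment" copies."""
    # Assumption: any value unique to this fork is good enough for the illustration.
    group_id = str(uuid.uuid4())

    original = dict(attributes)
    original["enrichment.group.id"] = group_id
    original["enrichment.role"] = "ORIGINAL"

    enrichment = dict(attributes)
    enrichment["enrichment.group.id"] = group_id
    enrichment["enrichment.role"] = "ENRICHMENT"

    return original, enrichment


def pair_by_group(flowfiles):
    """Group attribute maps by enrichment.group.id, keyed by enrichment.role."""
    groups = {}
    for flowfile in flowfiles:
        group = groups.setdefault(flowfile["enrichment.group.id"], {})
        group[flowfile["enrichment.role"]] = flowfile
    return groups
```

<p>
Because both copies carry the same group id but different roles, the two halves can arrive in any order and still be matched up and told apart.
</p>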
<p>
The Processors that make up the "enrichment" path will vary from use case to use case. In this example, we use the
<a href="../org.apache.nifi.processors.standard.JoltTransformJSON/index.html">JoltTransformJSON</a> processor to transform the original payload into the payload that is
expected by our web service. We then use the <a href="../org.apache.nifi.processors.standard.InvokeHTTP/index.html">InvokeHTTP</a> processor to gather
enrichment data that is relevant to our use case. Other common processors to use in this path include
<a href="../org.apache.nifi.processors.standard.QueryRecord/index.html">QueryRecord</a>, <a href="../org.apache.nifi.processors.standard.UpdateRecord/index.html">UpdateRecord</a>,
<a href="../org.apache.nifi.processors.standard.ReplaceText/index.html">ReplaceText</a>, JoltTransformRecord, and ScriptedTransformRecord.
It is also common to use one or more of these processors to transform the response from the web service that is invoked via InvokeHTTP.
</p>
<p>
After the enrichment data has been gathered, it is of little use unless we can combine it with our original payload.
To achieve this, we use the JoinEnrichment processor. It is responsible for combining records from both the "original" FlowFile and the "enrichment" FlowFile.
</p>
<p>
The JoinEnrichment Processor is configured with a separate RecordReader for the "original" FlowFile and for the "enrichment" FlowFile. This means that the original data and the
enrichment data can have entirely different schemas and can even be in different data formats. For example, our original payload may be CSV data, while our enrichment data is a JSON
payload. Because we make use of RecordReaders, this is entirely okay. The Processor also requires a RecordWriter to use for writing out the enriched payload (i.e., the payload that contains
the join of both the "original" and the "enrichment" data).
</p>
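<p>
To make the idea of separate readers and a single writer concrete, here is a minimal Python sketch. It is not how JoinEnrichment is implemented: the helper names are hypothetical, and
pairing records by position under "original" and "enrichment" keys is an assumption chosen purely for illustration. The sketch reads a CSV original and a JSON enrichment payload and writes the combined records as JSON.
</p>

```python
import csv
import io
import json


def read_csv_records(text):
    """Stand-in for a reader over the "original" data: one dict per CSV row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Stand-in for a reader over the "enrichment" data: a JSON array or a single object."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def join_by_position(original_records, enrichment_records):
    """Pair the i-th original record with the i-th enrichment record (assumed strategy)."""
    return [
        {"original": original, "enrichment": enrichment}
        for original, enrichment in zip(original_records, enrichment_records)
    ]


def write_json(records):
    """Stand-in for the writer that produces the enriched payload."""
    return json.dumps(records, indent=2)
```

<p>
The two inputs never need to share a schema or a format. Each is parsed into records by its own reader, and only the joined records are handed to the writer.
</p>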
<p>
For details on how to join the original payload with the enrichment data, see the Additional Details of the
<a href="../org.apache.nifi.processors.standard.JoinEnrichment/index.html">JoinEnrichment Processor</a> documentation.
</p>
</body>
</html>