<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></meta><title>DetectDuplicate</title><link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css"></link></head><script type="text/javascript">window.onload = function(){if(self==top) { document.getElementById('nameHeader').style.display = "inherit"; } }</script><body><h1 id="nameHeader" style="display: none;">DetectDuplicate</h1><h2>Description: </h2><p>Caches a value, computed from FlowFile attributes, for each incoming FlowFile and determines whether the cached value has already been seen. If so, routes the FlowFile to 'duplicate' with an attribute named 'original.flowfile.description' that specifies the original FlowFile's "description", which is specified in the &lt;FlowFile Description&gt; property. If the FlowFile is not determined to be a duplicate, the Processor routes the FlowFile to 'non-duplicate'.</p><h3>Tags: </h3><p>hash, dupe, duplicate, dedupe</p><h3>Properties: </h3><p>In the list below, the names of required properties appear in <strong>bold</strong>. Any other properties (not in bold) are considered optional. 
The table also indicates any default values, and whether a property supports the <a href="../../../../../html/expression-language-guide.html">NiFi Expression Language</a>.</p><table id="properties"><tr><th>Display Name</th><th>API Name</th><th>Default Value</th><th>Allowable Values</th><th>Description</th></tr><tr><td id="name"><strong>Cache Entry Identifier</strong></td><td>Cache Entry Identifier</td><td id="default-value">${hash.value}</td><td id="allowable-values"></td><td id="description">A FlowFile attribute, or the results of an Attribute Expression Language statement, which will be evaluated against a FlowFile in order to determine the value used to identify duplicates; it is this value that is cached<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name"><strong>FlowFile Description</strong></td><td>FlowFile Description</td><td id="default-value"></td><td id="allowable-values"></td><td id="description">When a FlowFile is added to the cache, this value is stored along with it so that if a duplicate is found, this description of the original FlowFile will be added to the duplicate's "original.flowfile.description" attribute<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Age Off Duration</td><td>Age Off Duration</td><td></td><td id="allowable-values"></td><td id="description">Time interval to age off cached FlowFiles</td></tr><tr><td id="name"><strong>Distributed Cache Service</strong></td><td>Distributed Cache Service</td><td></td><td id="allowable-values"><strong>Controller Service API: </strong><br/>DistributedMapCacheClient<br/><strong>Implementations: </strong><a href="../../../nifi-redis-nar/1.19.0/org.apache.nifi.redis.service.RedisDistributedMapCacheClientService/index.html">RedisDistributedMapCacheClientService</a><br/><a 
href="../../../nifi-hbase_2-client-service-nar/1.19.0/org.apache.nifi.hbase.HBase_2_ClientMapCacheService/index.html">HBase_2_ClientMapCacheService</a><br/><a href="../../../nifi-hbase_1_1_2-client-service-nar/1.19.0/org.apache.nifi.hbase.HBase_1_1_2_ClientMapCacheService/index.html">HBase_1_1_2_ClientMapCacheService</a><br/><a href="../../../nifi-couchbase-nar/1.19.0/org.apache.nifi.couchbase.CouchbaseMapCacheClient/index.html">CouchbaseMapCacheClient</a><br/><a href="../../../nifi-cassandra-services-nar/1.19.0/org.apache.nifi.controller.cassandra.CassandraDistributedMapCache/index.html">CassandraDistributedMapCache</a><br/><a href="../../../nifi-distributed-cache-services-nar/1.19.0/org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService/index.html">DistributedMapCacheClientService</a><br/><a href="../../../nifi-hazelcast-services-nar/1.19.0/org.apache.nifi.hazelcast.services.cacheclient.HazelcastMapCacheClient/index.html">HazelcastMapCacheClient</a></td><td id="description">The Controller Service used to cache the unique identifiers that determine duplicates</td></tr><tr><td id="name">Cache The Entry Identifier</td><td>Cache The Entry Identifier</td><td id="default-value">true</td><td id="allowable-values"><ul><li>true</li><li>false</li></ul></td><td id="description">When true, this causes the processor to check for duplicates and cache the Entry Identifier. 
When false, the processor only checks for duplicates and does not cache the Entry Identifier, so another processor must add identifiers to the distributed cache.</td></tr></table><h3>Relationships: </h3><table id="relationships"><tr><th>Name</th><th>Description</th></tr><tr><td>duplicate</td><td>If a FlowFile has been detected to be a duplicate, it will be routed to this relationship</td></tr><tr><td>non-duplicate</td><td>If a FlowFile's Cache Entry Identifier was not found in the cache, it will be routed to this relationship</td></tr><tr><td>failure</td><td>If the processor is unable to communicate with the cache, the FlowFile will be penalized and routed to this relationship</td></tr></table><h3>Reads Attributes: </h3>None specified.<h3>Writes Attributes: </h3><table id="writes-attributes"><tr><th>Name</th><th>Description</th></tr><tr><td>original.flowfile.description</td><td>All FlowFiles routed to the duplicate relationship will have an attribute added named original.flowfile.description. The value of this attribute is determined by the attributes of the original copy of the data and by the FlowFile Description property.</td></tr></table><h3>State management: </h3>This component does not store state.<h3>Restricted: </h3>This component is not restricted.<h3>Input requirement: </h3>This component requires an incoming relationship.<h3>System Resource Considerations:</h3>None specified.<h3>See Also:</h3><p><a href="../../../nifi-distributed-cache-services-nar/1.19.0/org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService/index.html">DistributedMapCacheClientService</a>, <a href="../../../nifi-distributed-cache-services-nar/1.19.0/org.apache.nifi.distributed.cache.server.map.DistributedMapCacheServer/index.html">DistributedMapCacheServer</a></p></body></html>