This sample application shows how to use the Dedup operator to de-duplicate a stream of incoming data. The operators in the application are as follows:
The following properties are used to configure the application:
dt.application.DedupExample.operator.RandomGenerator.prop.tuplesPerWindow
- A limit on the number of tuples generated by the Random Generator operator.

dt.application.DedupExample.operator.Deduper.prop.keyExpression
- The pseudo-Java expression for deriving the key fields from the incoming POJO.

dt.application.DedupExample.operator.Deduper.prop.timeExpression
- The pseudo-Java expression for deriving the time field from the incoming POJO. If timeExpression is not specified, the system time is used to compute the expiration of tuples.

dt.application.DedupExample.operator.Deduper.prop.expireBefore
- The expiry time for incoming tuples, in seconds. Keys in the system expire after expireBefore seconds.

dt.application.DedupExample.operator.Deduper.prop.bucketSpan
- The span of a single expiry bucket. When an expiry time elapses, the bucket as a whole is discarded from the system. Set it with the largest unit that can be discarded at once in mind. For example, if expireBefore is set to 1 hour and new data arrives every minute, it would make sense to set bucketSpan to 1 minute or 5 minutes.

Example values for these parameters are specified in src/main/resources/META-INF/properties.xml.
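
As a rough sketch of how these properties might be set, the fragment below uses the usual name/value property layout. All values are illustrative assumptions and not necessarily the ones shipped with the example. In particular, the field names id and eventTime in the expressions are hypothetical and must match the POJO actually used, and bucketSpan is assumed here to be given in seconds like expireBefore.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Illustrative value: cap on tuples emitted by the Random Generator -->
  <property>
    <name>dt.application.DedupExample.operator.RandomGenerator.prop.tuplesPerWindow</name>
    <value>100</value>
  </property>
  <!-- Hypothetical key field "id" on the incoming POJO -->
  <property>
    <name>dt.application.DedupExample.operator.Deduper.prop.keyExpression</name>
    <value>id</value>
  </property>
  <!-- Hypothetical time field "eventTime"; omit this property to fall back to system time -->
  <property>
    <name>dt.application.DedupExample.operator.Deduper.prop.timeExpression</name>
    <value>eventTime.getTime()</value>
  </property>
  <!-- 1 hour, expressed in seconds, mirroring the example in the text -->
  <property>
    <name>dt.application.DedupExample.operator.Deduper.prop.expireBefore</name>
    <value>3600</value>
  </property>
  <!-- 1 minute per bucket, assuming the unit is seconds -->
  <property>
    <name>dt.application.DedupExample.operator.Deduper.prop.bucketSpan</name>
    <value>60</value>
  </property>
</configuration>
```

With these assumed values, one hour of keys would be spread over 60 one-minute buckets, so each expiry step discards one minute of data rather than the whole hour at once.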