<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<title>Trident API Overview</title>
<!-- Bootstrap core CSS -->
<link href="/assets/css/bootstrap.min.css" rel="stylesheet">
<!-- Bootstrap theme -->
<link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css">
<link href="/css/style.css" rel="stylesheet">
<link href="/assets/css/owl.theme.css" rel="stylesheet">
<link href="/assets/css/owl.carousel.css" rel="stylesheet">
<script type="text/javascript" src="/assets/js/jquery.min.js"></script>
<script type="text/javascript" src="/assets/js/bootstrap.min.js"></script>
<script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script>
<script type="text/javascript" src="/assets/js/storm.js"></script>
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<header>
<div class="container-fluid">
<div class="row">
<div class="col-md-5">
<a href="/index.html"><img src="/images/logo.png" class="logo" /></a>
</div>
<div class="col-md-5">
<h1>Version: 2.3.0</h1>
</div>
<div class="col-md-2">
<a href="/downloads.html" class="btn-std btn-block btn-download">Download</a>
</div>
</div>
</div>
</header>
<!--Header End-->
<!--Navigation Begin-->
<div class="navbar" role="banner">
<div class="container-fluid">
<div class="navbar-header">
<button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
<ul class="nav navbar-nav">
<li><a href="/index.html" id="home">Home</a></li>
<li><a href="/getting-help.html" id="getting-help">Getting Help</a></li>
<li><a href="/about/integrates.html" id="project-info">Project Information</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="documentation">Documentation <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/releases/2.3.0/index.html">2.3.0</a></li>
<li><a href="/releases/2.2.0/index.html">2.2.0</a></li>
<li><a href="/releases/2.1.0/index.html">2.1.0</a></li>
<li><a href="/releases/2.0.0/index.html">2.0.0</a></li>
<li><a href="/releases/1.2.3/index.html">1.2.3</a></li>
</ul>
</li>
<li><a href="/talksAndVideos.html">Talks and Slideshows</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li>
<li><a href="/contribute/People.html">People</a></li>
<li><a href="/contribute/BYLAWS.html">ByLaws</a></li>
</ul>
</li>
<li><a href="/2021/09/27/storm230-released.html" id="news">News</a></li>
</ul>
</nav>
</div>
</div>
<div class="container-fluid">
<h1 class="page-title">Trident API Overview</h1>
<div class="row">
<div class="col-md-12">
<!-- Documentation -->
<p class="post-meta"></p>
<div class="documentation-content"><p>The core data model in Trident is the &quot;Stream&quot;, processed as a series of batches. A stream is partitioned among the nodes in the cluster, and operations applied to a stream are applied in parallel across each partition.</p>
<p>There are five kinds of operations in Trident:</p>
<ol>
<li>Operations that apply locally to each partition and cause no network transfer</li>
<li>Repartitioning operations that repartition a stream but otherwise don&#39;t change the contents (involves network transfer)</li>
<li>Aggregation operations that do network transfer as part of the operation</li>
<li>Operations on grouped streams</li>
<li>Merges and joins</li>
</ol>
<h2 id="partition-local-operations">Partition-local operations</h2>
<p>Partition-local operations involve no network transfer and are applied to each batch partition independently.</p>
<h3 id="functions">Functions</h3>
<p>A function takes in a set of input fields and emits zero or more tuples as output. The fields of the output tuple are appended to the original input tuple in the stream. If a function emits no tuples, the original input tuple is filtered out. Otherwise, the input tuple is duplicated for each output tuple. Suppose you have this function:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyFunction</span> <span class="kd">extends</span> <span class="n">BaseFunction</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">execute</span><span class="o">(</span><span class="n">TridentTuple</span> <span class="n">tuple</span><span class="o">,</span> <span class="n">TridentCollector</span> <span class="n">collector</span><span class="o">)</span> <span class="o">{</span>
<span class="k">for</span><span class="o">(</span><span class="kt">int</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">tuple</span><span class="o">.</span><span class="na">getInteger</span><span class="o">(</span><span class="mi">0</span><span class="o">);</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
<span class="n">collector</span><span class="o">.</span><span class="na">emit</span><span class="o">(</span><span class="k">new</span> <span class="n">Values</span><span class="o">(</span><span class="n">i</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div>
<p>Now suppose you have a stream in the variable &quot;mystream&quot; with the fields [&quot;a&quot;, &quot;b&quot;, &quot;c&quot;] with the following tuples:</p>
<div class="highlight"><pre><code class="language-" data-lang="">[1, 2, 3]
[4, 1, 6]
[3, 0, 8]
</code></pre></div>
<p>If you run this code:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">each</span><span class="o">(</span><span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"b"</span><span class="o">),</span> <span class="k">new</span> <span class="n">MyFunction</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"d"</span><span class="o">))</span>
</code></pre></div>
<p>The resulting tuples would have fields [&quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;d&quot;] and look like this:</p>
<div class="highlight"><pre><code class="language-" data-lang="">[1, 2, 3, 0]
[1, 2, 3, 1]
[4, 1, 6, 0]
</code></pre></div>
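<p>The append-and-duplicate behaviour described above can be modelled in plain Java without a cluster. The following is a minimal sketch, not Trident code: tuples are represented as <code>List&lt;Integer&gt;</code>, and the <code>EachSemantics</code> class with its <code>each</code> and <code>countUpTo</code> helpers are hypothetical names introduced only for illustration.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class EachSemantics {
    // Plain-Java model of each(): every value emitted for an input tuple is
    // appended to a copy of that tuple; an input with no emissions is dropped.
    static List&lt;List&lt;Integer&gt;&gt; each(List&lt;List&lt;Integer&gt;&gt; tuples, int inputIndex,
                                    Function&lt;Integer, List&lt;Integer&gt;&gt; function) {
        List&lt;List&lt;Integer&gt;&gt; output = new ArrayList&lt;&gt;();
        for (List&lt;Integer&gt; tuple : tuples) {
            for (Integer emitted : function.apply(tuple.get(inputIndex))) {
                List&lt;Integer&gt; extended = new ArrayList&lt;&gt;(tuple);
                extended.add(emitted);
                output.add(extended);
            }
        }
        return output;
    }

    // Mirrors MyFunction: emits 0, 1, ..., n - 1 for an input value n.
    static List&lt;Integer&gt; countUpTo(int n) {
        List&lt;Integer&gt; emitted = new ArrayList&lt;&gt;();
        for (int i = 0; i &lt; n; i++) {
            emitted.add(i);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List&lt;List&lt;Integer&gt;&gt; input = Arrays.asList(
                Arrays.asList(1, 2, 3), Arrays.asList(4, 1, 6), Arrays.asList(3, 0, 8));
        // Field "b" sits at index 1 of each tuple.
        System.out.println(each(input, 1, EachSemantics::countUpTo));
    }
}
</code></pre></div>
<p>In this model the tuple [3, 0, 8] produces no output because the function emits nothing for the value 0, which matches the three result tuples listed above.</p>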
<h3 id="filters">Filters</h3>
<p>Filters take in a tuple as input and decide whether or not to keep that tuple. Suppose you had this filter:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyFilter</span> <span class="kd">extends</span> <span class="n">BaseFilter</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">isKeep</span><span class="o">(</span><span class="n">TridentTuple</span> <span class="n">tuple</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">tuple</span><span class="o">.</span><span class="na">getInteger</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span> <span class="o">==</span> <span class="mi">1</span> <span class="o">&amp;&amp;</span> <span class="n">tuple</span><span class="o">.</span><span class="na">getInteger</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span> <span class="o">==</span> <span class="mi">2</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div>
<p>Now suppose you had these tuples with fields [&quot;a&quot;, &quot;b&quot;, &quot;c&quot;]:</p>
<div class="highlight"><pre><code class="language-" data-lang="">[1, 2, 3]
[2, 1, 1]
[2, 3, 4]
</code></pre></div>
<p>If you ran this code:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="k">new</span> <span class="n">MyFilter</span><span class="o">())</span>
</code></pre></div>
<p>The resulting tuples would be:</p>
<div class="highlight"><pre><code class="language-" data-lang="">[1, 2, 3]
</code></pre></div>
<h3 id="map-and-flatmap">map and flatMap</h3>
<p><code>map</code> returns a stream consisting of the results of applying the given mapping function to the tuples of the stream. This
can be used to apply a one-to-one transformation to the tuples.</p>
<p>For example, if there is a stream of words and you want to convert it to a stream of uppercase words,
you could define a mapping function as follows:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">UpperCase</span> <span class="kd">implements</span> <span class="n">MapFunction</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="n">Values</span> <span class="nf">execute</span><span class="o">(</span><span class="n">TridentTuple</span> <span class="n">input</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">Values</span><span class="o">(</span><span class="n">input</span><span class="o">.</span><span class="na">getString</span><span class="o">(</span><span class="mi">0</span><span class="o">).</span><span class="na">toUpperCase</span><span class="o">());</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div>
<p>The mapping function can then be applied on the stream to produce a stream of uppercase words.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="k">new</span> <span class="n">UpperCase</span><span class="o">())</span>
</code></pre></div>
<p><code>flatMap</code> is similar to <code>map</code> but applies a one-to-many transformation to the values of the stream
and then flattens the resulting elements into a new stream.</p>
<p>For example, if there is a stream of sentences and you want to convert it to a stream of words,
you could define a flatMap function as follows:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">Split</span> <span class="kd">implements</span> <span class="n">FlatMapFunction</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="n">Iterable</span><span class="o">&lt;</span><span class="n">Values</span><span class="o">&gt;</span> <span class="nf">execute</span><span class="o">(</span><span class="n">TridentTuple</span> <span class="n">input</span><span class="o">)</span> <span class="o">{</span>
<span class="n">List</span><span class="o">&lt;</span><span class="n">Values</span><span class="o">&gt;</span> <span class="n">valuesList</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o">&lt;&gt;();</span>
<span class="k">for</span> <span class="o">(</span><span class="n">String</span> <span class="n">word</span> <span class="o">:</span> <span class="n">input</span><span class="o">.</span><span class="na">getString</span><span class="o">(</span><span class="mi">0</span><span class="o">).</span><span class="na">split</span><span class="o">(</span><span class="s">" "</span><span class="o">))</span> <span class="o">{</span>
<span class="n">valuesList</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">Values</span><span class="o">(</span><span class="n">word</span><span class="o">));</span>
<span class="o">}</span>
<span class="k">return</span> <span class="n">valuesList</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div>
<p>The flatMap function can then be applied on the stream of sentences to produce a stream of words:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">flatMap</span><span class="o">(</span><span class="k">new</span> <span class="n">Split</span><span class="o">())</span>
</code></pre></div>
<p>These operations can be chained, so a stream of uppercase words can be obtained from a stream of sentences as follows:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">flatMap</span><span class="o">(</span><span class="k">new</span> <span class="n">Split</span><span class="o">()).</span><span class="na">map</span><span class="o">(</span><span class="k">new</span> <span class="n">UpperCase</span><span class="o">())</span>
</code></pre></div>
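<p>For readers who know <code>java.util.stream</code>, the chained call above behaves much like the standard-library pipeline sketched below. This is an analogy only: it uses the JDK stream classes rather than Trident, runs in a single JVM, and the class name <code>SplitThenUpperCase</code> is made up for illustration.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SplitThenUpperCase {
    // flatMap turns each sentence into several words (one-to-many);
    // map then transforms each word into exactly one result (one-to-one).
    static List&lt;String&gt; splitThenUpperCase(List&lt;String&gt; sentences) {
        return sentences.stream()
                .flatMap(sentence -&gt; Arrays.stream(sentence.split(" ")))
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitThenUpperCase(Arrays.asList("the cow jumped", "over the moon")));
    }
}
</code></pre></div>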
<p>If you don&#39;t pass output fields as a parameter, map and flatMap keep the input field names as the output field names.</p>
<p>If you want a MapFunction or FlatMapFunction to replace the old fields with new output fields,
you can call map / flatMap with an additional Fields parameter as follows:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="k">new</span> <span class="n">UpperCase</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"uppercased"</span><span class="o">))</span>
</code></pre></div>
<p>The output stream will have only one output field, &quot;uppercased&quot;, regardless of which output fields the previous stream had.
The same applies to flatMap, so the following is valid as well:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">flatMap</span><span class="o">(</span><span class="k">new</span> <span class="n">Split</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"word"</span><span class="o">))</span>
</code></pre></div>
<h3 id="peek">peek</h3>
<p><code>peek</code> can be used to perform an additional action on each Trident tuple as it flows through the stream.
This can be useful for debugging, to see the tuples as they flow past a certain point in a pipeline.</p>
<p>For example, the code below prints the result of converting the words to uppercase before they are passed to <code>groupBy</code>:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"> <span class="n">mystream</span><span class="o">.</span><span class="na">flatMap</span><span class="o">(</span><span class="k">new</span> <span class="n">Split</span><span class="o">()).</span><span class="na">map</span><span class="o">(</span><span class="k">new</span> <span class="n">UpperCase</span><span class="o">())</span>
<span class="o">.</span><span class="na">peek</span><span class="o">(</span><span class="k">new</span> <span class="n">Consumer</span><span class="o">()</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">accept</span><span class="o">(</span><span class="n">TridentTuple</span> <span class="n">input</span><span class="o">)</span> <span class="o">{</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">input</span><span class="o">.</span><span class="na">getString</span><span class="o">(</span><span class="mi">0</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">})</span>
<span class="o">.</span><span class="na">groupBy</span><span class="o">(</span><span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"word"</span><span class="o">))</span>
<span class="o">.</span><span class="na">persistentAggregate</span><span class="o">(</span><span class="k">new</span> <span class="n">MemoryMapState</span><span class="o">.</span><span class="na">Factory</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Count</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"count"</span><span class="o">))</span>
</code></pre></div>
<h3 id="min-and-minby">min and minBy</h3>
<p>The <code>min</code> and <code>minBy</code> operations return the minimum value on each partition of a batch of tuples in a Trident stream.</p>
<p>Suppose a Trident stream contains the fields [&quot;device-id&quot;, &quot;count&quot;] and the following partitions of tuples:</p>
<div class="highlight"><pre><code class="language-" data-lang="">Partition 0:
[123, 2]
[113, 54]
[23, 28]
[237, 37]
[12, 23]
[62, 17]
[98, 42]
Partition 1:
[64, 18]
[72, 54]
[2, 28]
[742, 71]
[98, 45]
[62, 12]
[19, 174]
Partition 2:
[27, 94]
[82, 23]
[9, 86]
[53, 71]
[74, 37]
[51, 49]
[37, 98]
</code></pre></div>
<p>The <code>minBy</code> operation can be applied to the above stream as shown below, which emits the tuple with the minimum value of the <code>count</code> field in each partition.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"> <span class="n">mystream</span><span class="o">.</span><span class="na">minBy</span><span class="o">(</span><span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"count"</span><span class="o">))</span>
</code></pre></div>
<p>The result of the above code on the partitions shown is:</p>
<div class="highlight"><pre><code class="language-" data-lang="">Partition 0:
[123, 2]
Partition 1:
[62, 12]
Partition 2:
[82, 23]
</code></pre></div>
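<p>The per-partition result above can be checked with a small plain-Java model. This is a sketch under simplifying assumptions rather than Trident code: each partition is a list of <code>[device-id, count]</code> pairs, and the <code>PartitionMinBy</code> class with its <code>tuple</code> and <code>minBy</code> helpers are hypothetical names used only here.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class PartitionMinBy {
    // Builds a [device-id, count] tuple.
    static List&lt;Integer&gt; tuple(int deviceId, int count) {
        return Arrays.asList(deviceId, count);
    }

    // Returns the tuple with the smallest value at fieldIndex within one partition.
    static List&lt;Integer&gt; minBy(List&lt;List&lt;Integer&gt;&gt; partition, int fieldIndex) {
        return Collections.min(partition,
                Comparator.comparing((List&lt;Integer&gt; tuple) -&gt; tuple.get(fieldIndex)));
    }

    public static void main(String[] args) {
        List&lt;List&lt;List&lt;Integer&gt;&gt;&gt; partitions = Arrays.asList(
                Arrays.asList(tuple(123, 2), tuple(113, 54), tuple(23, 28), tuple(237, 37),
                        tuple(12, 23), tuple(62, 17), tuple(98, 42)),
                Arrays.asList(tuple(64, 18), tuple(72, 54), tuple(2, 28), tuple(742, 71),
                        tuple(98, 45), tuple(62, 12), tuple(19, 174)),
                Arrays.asList(tuple(27, 94), tuple(82, 23), tuple(9, 86), tuple(53, 71),
                        tuple(74, 37), tuple(51, 49), tuple(37, 98)));
        // The "count" field sits at index 1; each partition is handled independently.
        for (int i = 0; i &lt; partitions.size(); i++) {
            System.out.println("Partition " + i + ": " + minBy(partitions.get(i), 1));
        }
    }
}
</code></pre></div>
<p>Each partition is reduced on its own, so the output contains one tuple per partition rather than a single global minimum.</p>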
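<p>You can look at other <code>min</code> and <code>minBy</code> operations on Stream:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">
<span class="kd">public</span> <span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">Stream</span> <span class="nf">minBy</span><span class="o">(</span><span class="n">String</span> <span class="n">inputFieldName</span><span class="o">,</span> <span class="n">Comparator</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">comparator</span><span class="o">)</span>
<span class="kd">public</span> <span class="n">Stream</span> <span class="nf">min</span><span class="o">(</span><span class="n">Comparator</span><span class="o">&lt;</span><span class="n">TridentTuple</span><span class="o">&gt;</span> <span class="n">comparator</span><span class="o">)</span>
</code></pre></div>
<p>The example below shows how these APIs can be used to find the minimum using the respective Comparators on a tuple.</p>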
<div class="highlight"><pre><code class="language-java" data-lang="java">
<span class="n">FixedBatchSpout</span> <span class="n">spout</span> <span class="o">=</span> <span class="k">new</span> <span class="n">FixedBatchSpout</span><span class="o">(</span><span class="n">allFields</span><span class="o">,</span> <span class="mi">10</span><span class="o">,</span> <span class="n">Vehicle</span><span class="o">.</span><span class="na">generateVehicles</span><span class="o">(</span><span class="mi">20</span><span class="o">));</span>
<span class="n">TridentTopology</span> <span class="n">topology</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TridentTopology</span><span class="o">();</span>
<span class="n">Stream</span> <span class="n">vehiclesStream</span> <span class="o">=</span> <span class="n">topology</span><span class="o">.</span><span class="na">newStream</span><span class="o">(</span><span class="s">"spout1"</span><span class="o">,</span> <span class="n">spout</span><span class="o">).</span>
<span class="n">each</span><span class="o">(</span><span class="n">allFields</span><span class="o">,</span> <span class="k">new</span> <span class="n">Debug</span><span class="o">(</span><span class="s">"##### vehicles"</span><span class="o">));</span>
<span class="n">Stream</span> <span class="n">slowVehiclesStream</span> <span class="o">=</span>
<span class="n">vehiclesStream</span>
<span class="o">.</span><span class="na">min</span><span class="o">(</span><span class="k">new</span> <span class="n">SpeedComparator</span><span class="o">())</span> <span class="c1">// Comparator w.r.t speed on received tuple.</span>
<span class="o">.</span><span class="na">each</span><span class="o">(</span><span class="n">vehicleField</span><span class="o">,</span> <span class="k">new</span> <span class="n">Debug</span><span class="o">(</span><span class="s">"#### slowest vehicle"</span><span class="o">));</span>
<span class="n">vehiclesStream</span>
<span class="o">.</span><span class="na">minBy</span><span class="o">(</span><span class="n">Vehicle</span><span class="o">.</span><span class="na">FIELD_NAME</span><span class="o">,</span> <span class="k">new</span> <span class="n">EfficiencyComparator</span><span class="o">())</span> <span class="c1">// Comparator w.r.t efficiency on received tuple.</span>
<span class="o">.</span><span class="na">each</span><span class="o">(</span><span class="n">vehicleField</span><span class="o">,</span> <span class="k">new</span> <span class="n">Debug</span><span class="o">(</span><span class="s">"#### least efficient vehicle"</span><span class="o">));</span>
</code></pre></div>
<p>Example applications of these APIs can be found in <a href="https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfDevicesTopology.java">TridentMinMaxOfDevicesTopology</a> and <a href="https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfVehiclesTopology.java">TridentMinMaxOfVehiclesTopology</a>.</p>
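<p>The comparators used in the example above are not defined on this page. As a rough sketch, a comparator suitable for the <code>minBy(String, Comparator&lt;T&gt;)</code> overload might look like the following. The <code>Vehicle</code> class and its <code>name</code> and <code>efficiency</code> fields are assumptions made for illustration and may differ from the classes in the linked topologies; implementing <code>Serializable</code> is likewise an assumption, based on topology components being shipped to worker processes.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">import java.io.Serializable;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class ComparatorSketch {
    // Hypothetical value class carried in a single tuple field.
    static class Vehicle implements Serializable {
        final String name;
        final double efficiency;

        Vehicle(String name, double efficiency) {
            this.name = name;
            this.efficiency = efficiency;
        }
    }

    // Orders vehicles by ascending efficiency.
    static class EfficiencyComparator implements Comparator&lt;Vehicle&gt;, Serializable {
        @Override
        public int compare(Vehicle first, Vehicle second) {
            return Double.compare(first.efficiency, second.efficiency);
        }
    }

    public static void main(String[] args) {
        List&lt;Vehicle&gt; vehicles = Arrays.asList(
                new Vehicle("truck", 6.5), new Vehicle("hybrid", 21.0), new Vehicle("sedan", 12.3));
        // Collections.min stands in for what minBy does within one partition.
        Vehicle leastEfficient = Collections.min(vehicles, new EfficiencyComparator());
        System.out.println(leastEfficient.name);
    }
}
</code></pre></div>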
<h3 id="max-and-maxby">max and maxBy</h3>
<p>The <code>max</code> and <code>maxBy</code> operations return the maximum value on each partition of a batch of tuples in a Trident stream.</p>
<p>Suppose a Trident stream contains the fields [&quot;device-id&quot;, &quot;count&quot;] with the partitions shown in the previous section.</p>
<p>The <code>maxBy</code> operation can be applied to that stream as shown below, which emits the tuple with the maximum value of the <code>count</code> field in each partition.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"> <span class="n">mystream</span><span class="o">.</span><span class="na">maxBy</span><span class="o">(</span><span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"count"</span><span class="o">))</span>
</code></pre></div>
<p>The result of the above code on the partitions shown is:</p>
<div class="highlight"><pre><code class="language-" data-lang="">Partition 0:
[113, 54]
Partition 1:
[19, 174]
Partition 2:
[37, 98]
</code></pre></div>
<p>You can look at other <code>max</code> and <code>maxBy</code> operations on Stream:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">
<span class="kd">public</span> <span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">Stream</span> <span class="nf">maxBy</span><span class="o">(</span><span class="n">String</span> <span class="n">inputFieldName</span><span class="o">,</span> <span class="n">Comparator</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">comparator</span><span class="o">)</span>
<span class="kd">public</span> <span class="n">Stream</span> <span class="nf">max</span><span class="o">(</span><span class="n">Comparator</span><span class="o">&lt;</span><span class="n">TridentTuple</span><span class="o">&gt;</span> <span class="n">comparator</span><span class="o">)</span>
</code></pre></div>
<p>The example below shows how these APIs can be used to find the maximum using the respective Comparators on a tuple.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">
<span class="n">FixedBatchSpout</span> <span class="n">spout</span> <span class="o">=</span> <span class="k">new</span> <span class="n">FixedBatchSpout</span><span class="o">(</span><span class="n">allFields</span><span class="o">,</span> <span class="mi">10</span><span class="o">,</span> <span class="n">Vehicle</span><span class="o">.</span><span class="na">generateVehicles</span><span class="o">(</span><span class="mi">20</span><span class="o">));</span>
<span class="n">TridentTopology</span> <span class="n">topology</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TridentTopology</span><span class="o">();</span>
<span class="n">Stream</span> <span class="n">vehiclesStream</span> <span class="o">=</span> <span class="n">topology</span><span class="o">.</span><span class="na">newStream</span><span class="o">(</span><span class="s">"spout1"</span><span class="o">,</span> <span class="n">spout</span><span class="o">).</span>
<span class="n">each</span><span class="o">(</span><span class="n">allFields</span><span class="o">,</span> <span class="k">new</span> <span class="n">Debug</span><span class="o">(</span><span class="s">"##### vehicles"</span><span class="o">));</span>
<span class="n">vehiclesStream</span>
<span class="o">.</span><span class="na">max</span><span class="o">(</span><span class="k">new</span> <span class="n">SpeedComparator</span><span class="o">())</span> <span class="c1">// Comparator w.r.t speed on received tuple.</span>
<span class="o">.</span><span class="na">each</span><span class="o">(</span><span class="n">vehicleField</span><span class="o">,</span> <span class="k">new</span> <span class="n">Debug</span><span class="o">(</span><span class="s">"#### fastest vehicle"</span><span class="o">))</span>
<span class="o">.</span><span class="na">project</span><span class="o">(</span><span class="n">driverField</span><span class="o">)</span>
<span class="o">.</span><span class="na">each</span><span class="o">(</span><span class="n">driverField</span><span class="o">,</span> <span class="k">new</span> <span class="n">Debug</span><span class="o">(</span><span class="s">"##### fastest driver"</span><span class="o">));</span>
<span class="n">vehiclesStream</span>
<span class="o">.</span><span class="na">maxBy</span><span class="o">(</span><span class="n">Vehicle</span><span class="o">.</span><span class="na">FIELD_NAME</span><span class="o">,</span> <span class="k">new</span> <span class="n">EfficiencyComparator</span><span class="o">())</span> <span class="c1">// Comparator w.r.t efficiency on received tuple.</span>
<span class="o">.</span><span class="na">each</span><span class="o">(</span><span class="n">vehicleField</span><span class="o">,</span> <span class="k">new</span> <span class="n">Debug</span><span class="o">(</span><span class="s">"#### most efficient vehicle"</span><span class="o">));</span>
</code></pre></div>
<p>Example applications of these APIs can be found in <a href="https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfDevicesTopology.java">TridentMinMaxOfDevicesTopology</a> and <a href="https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfVehiclesTopology.java">TridentMinMaxOfVehiclesTopology</a>.</p>
<h3 id="windowing">Windowing</h3>
<p>Trident streams can process tuples in batches that fall within the same window and emit the aggregated result to the next operation.
Two kinds of windowing are supported, each based on either processing time or tuple count:</p>
<ol>
<li>Tumbling window</li>
<li>Sliding window</li>
</ol>
<h4 id="tumbling-window">Tumbling window</h4>
<p>Tuples are grouped into a single window based on processing time or count. Each tuple belongs to exactly one window.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">
<span class="cm">/**
* Returns a stream of tuples which are aggregated results of a tumbling window with every {@code windowCount} of tuples.
*/</span>
<span class="kd">public</span> <span class="n">Stream</span> <span class="nf">tumblingWindow</span><span class="o">(</span><span class="kt">int</span> <span class="n">windowCount</span><span class="o">,</span> <span class="n">WindowsStoreFactory</span> <span class="n">windowStoreFactory</span><span class="o">,</span>
<span class="n">Fields</span> <span class="n">inputFields</span><span class="o">,</span> <span class="n">Aggregator</span> <span class="n">aggregator</span><span class="o">,</span> <span class="n">Fields</span> <span class="n">functionFields</span><span class="o">);</span>
<span class="cm">/**
* Returns a stream of tuples which are aggregated results of a window that tumbles at duration of {@code windowDuration}
*/</span>
<span class="kd">public</span> <span class="n">Stream</span> <span class="nf">tumblingWindow</span><span class="o">(</span><span class="n">BaseWindowedBolt</span><span class="o">.</span><span class="na">Duration</span> <span class="n">windowDuration</span><span class="o">,</span> <span class="n">WindowsStoreFactory</span> <span class="n">windowStoreFactory</span><span class="o">,</span>
<span class="n">Fields</span> <span class="n">inputFields</span><span class="o">,</span> <span class="n">Aggregator</span> <span class="n">aggregator</span><span class="o">,</span> <span class="n">Fields</span> <span class="n">functionFields</span><span class="o">);</span>
</code></pre></div>
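<p>To make the count-based case concrete, the grouping can be modelled in plain Java. This is a simplified sketch, not the Trident implementation: it ignores batching and the window store, and the <code>TumblingSketch</code> class with its <code>tumbling</code> and <code>sum</code> helpers are hypothetical names, with <code>sum</code> standing in for an aggregator.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TumblingSketch {
    // Splits the input into consecutive, non-overlapping windows of windowCount
    // tuples (windowCount must be positive). Trailing tuples that do not yet
    // fill a window are not emitted.
    static List&lt;List&lt;Integer&gt;&gt; tumbling(List&lt;Integer&gt; tuples, int windowCount) {
        List&lt;List&lt;Integer&gt;&gt; windows = new ArrayList&lt;&gt;();
        for (int start = 0; start + windowCount &lt;= tuples.size(); start += windowCount) {
            windows.add(new ArrayList&lt;&gt;(tuples.subList(start, start + windowCount)));
        }
        return windows;
    }

    // Stands in for the aggregator: sums the values of one window.
    static int sum(List&lt;Integer&gt; window) {
        int total = 0;
        for (int value : window) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List&lt;Integer&gt; tuples = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        for (List&lt;Integer&gt; window : tumbling(tuples, 3)) {
            System.out.println(window + " -&gt; " + sum(window));
        }
    }
}
</code></pre></div>
<p>With a window count of 3, the seven input values form the windows [1, 2, 3] and [4, 5, 6]; the value 7 stays pending until two more tuples arrive.</p>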
<h4 id="sliding-window">Sliding window</h4>
<p>Tuples are grouped in windows, and the window slides at every sliding interval. A tuple can belong to more than one window.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">
<span class="cm">/**
* Returns a stream of tuples which are aggregated results of a sliding window with every {@code windowCount} of tuples
* and slides the window after {@code slideCount}.
*/</span>
<span class="kd">public</span> <span class="n">Stream</span> <span class="nf">slidingWindow</span><span class="o">(</span><span class="kt">int</span> <span class="n">windowCount</span><span class="o">,</span> <span class="kt">int</span> <span class="n">slideCount</span><span class="o">,</span> <span class="n">WindowsStoreFactory</span> <span class="n">windowStoreFactory</span><span class="o">,</span>
<span class="n">Fields</span> <span class="n">inputFields</span><span class="o">,</span> <span class="n">Aggregator</span> <span class="n">aggregator</span><span class="o">,</span> <span class="n">Fields</span> <span class="n">functionFields</span><span class="o">);</span>
<span class="cm">/**
* Returns a stream of tuples which are aggregated results of a window which slides at duration of {@code slidingInterval}
* and completes a window at {@code windowDuration}
*/</span>
<span class="kd">public</span> <span class="n">Stream</span> <span class="nf">slidingWindow</span><span class="o">(</span><span class="n">BaseWindowedBolt</span><span class="o">.</span><span class="na">Duration</span> <span class="n">windowDuration</span><span class="o">,</span> <span class="n">BaseWindowedBolt</span><span class="o">.</span><span class="na">Duration</span> <span class="n">slidingInterval</span><span class="o">,</span>
<span class="n">WindowsStoreFactory</span> <span class="n">windowStoreFactory</span><span class="o">,</span> <span class="n">Fields</span> <span class="n">inputFields</span><span class="o">,</span> <span class="n">Aggregator</span> <span class="n">aggregator</span><span class="o">,</span> <span class="n">Fields</span> <span class="n">functionFields</span><span class="o">);</span>
</code></pre></div>
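<p>To make the overlap concrete, here is a minimal plain-Java sketch of a count-based sliding window. It does not use any Storm classes; the class and method names are hypothetical, and it only models full windows, so the actual trigger and eviction behavior of Trident may differ.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; none of these names come from the Storm API.
class SlidingCountWindowSketch {
    // Returns every full window of windowCount items, advancing the start by slideCount each time.
    static <T> List<List<T>> windows(List<T> tuples, int windowCount, int slideCount) {
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start + windowCount <= tuples.size(); start += slideCount) {
            result.add(new ArrayList<>(tuples.subList(start, start + windowCount)));
        }
        return result;
    }

    public static void main(String[] args) {
        // windowCount = 3, slideCount = 1: tuple 3 ends up in all three windows.
        System.out.println(windows(List.of(1, 2, 3, 4, 5), 3, 1));
        // prints [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
    }
}
```

<p>With <code>slideCount</code> equal to <code>windowCount</code> the windows no longer overlap, which corresponds to the tumbling case.</p>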
<p>Examples of tumbling and sliding windows can be found <a href="Windowing.html">here</a>.</p>
<h4 id="common-windowing-api">Common windowing API</h4>
<p>Below is the common windowing API which takes <code>WindowConfig</code> for any supported windowing configurations. </p>
<div class="highlight"><pre><code class="language-java" data-lang="java">
<span class="kd">public</span> <span class="n">Stream</span> <span class="nf">window</span><span class="o">(</span><span class="n">WindowConfig</span> <span class="n">windowConfig</span><span class="o">,</span> <span class="n">WindowsStoreFactory</span> <span class="n">windowStoreFactory</span><span class="o">,</span> <span class="n">Fields</span> <span class="n">inputFields</span><span class="o">,</span>
<span class="n">Aggregator</span> <span class="n">aggregator</span><span class="o">,</span> <span class="n">Fields</span> <span class="n">functionFields</span><span class="o">);</span>
</code></pre></div>
<p><code>windowConfig</code> can be any of the following:</p>
<ul>
<li><code>SlidingCountWindow.of(int windowCount, int slidingCount)</code></li>
<li><code>SlidingDurationWindow.of(BaseWindowedBolt.Duration windowDuration, BaseWindowedBolt.Duration slidingDuration)</code></li>
<li><code>TumblingCountWindow.of(int windowLength)</code></li>
<li><code>TumblingDurationWindow.of(BaseWindowedBolt.Duration windowLength)</code></li>
</ul>
<p>Trident windowing APIs need a <code>WindowsStoreFactory</code> to store received tuples and aggregated values. Currently, a basic implementation for HBase is provided by <code>HBaseWindowsStoreFactory</code>; it can be extended to address other use cases.
An example of using <code>HBaseWindowsStoreFactory</code> for windowing is shown below.</p>
<div class="highlight"><pre><code class="language-java" data-lang="java">
<span class="c1">// window-state table should already be created with cf:tuples column</span>
<span class="n">HBaseWindowsStoreFactory</span> <span class="n">windowStoreFactory</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HBaseWindowsStoreFactory</span><span class="o">(</span><span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;(),</span> <span class="s">"window-state"</span><span class="o">,</span> <span class="s">"cf"</span><span class="o">.</span><span class="na">getBytes</span><span class="o">(</span><span class="s">"UTF-8"</span><span class="o">),</span> <span class="s">"tuples"</span><span class="o">.</span><span class="na">getBytes</span><span class="o">(</span><span class="s">"UTF-8"</span><span class="o">));</span>
<span class="n">FixedBatchSpout</span> <span class="n">spout</span> <span class="o">=</span> <span class="k">new</span> <span class="n">FixedBatchSpout</span><span class="o">(</span><span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"sentence"</span><span class="o">),</span> <span class="mi">3</span><span class="o">,</span> <span class="k">new</span> <span class="n">Values</span><span class="o">(</span><span class="s">"the cow jumped over the moon"</span><span class="o">),</span>
<span class="k">new</span> <span class="nf">Values</span><span class="o">(</span><span class="s">"the man went to the store and bought some candy"</span><span class="o">),</span> <span class="k">new</span> <span class="n">Values</span><span class="o">(</span><span class="s">"four score and seven years ago"</span><span class="o">),</span>
<span class="k">new</span> <span class="nf">Values</span><span class="o">(</span><span class="s">"how many apples can you eat"</span><span class="o">),</span> <span class="k">new</span> <span class="n">Values</span><span class="o">(</span><span class="s">"to be or not to be the person"</span><span class="o">));</span>
<span class="n">spout</span><span class="o">.</span><span class="na">setCycle</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>
<span class="n">TridentTopology</span> <span class="n">topology</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TridentTopology</span><span class="o">();</span>
<span class="n">Stream</span> <span class="n">stream</span> <span class="o">=</span> <span class="n">topology</span><span class="o">.</span><span class="na">newStream</span><span class="o">(</span><span class="s">"spout1"</span><span class="o">,</span> <span class="n">spout</span><span class="o">).</span><span class="na">parallelismHint</span><span class="o">(</span><span class="mi">16</span><span class="o">).</span><span class="na">each</span><span class="o">(</span><span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"sentence"</span><span class="o">),</span>
<span class="k">new</span> <span class="nf">Split</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"word"</span><span class="o">))</span>
<span class="o">.</span><span class="na">window</span><span class="o">(</span><span class="n">TumblingCountWindow</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="mi">1000</span><span class="o">),</span> <span class="n">windowStoreFactory</span><span class="o">,</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"word"</span><span class="o">),</span> <span class="k">new</span> <span class="n">CountAsAggregator</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"count"</span><span class="o">))</span>
<span class="o">.</span><span class="na">peek</span><span class="o">(</span><span class="k">new</span> <span class="n">Consumer</span><span class="o">()</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">accept</span><span class="o">(</span><span class="n">TridentTuple</span> <span class="n">input</span><span class="o">)</span> <span class="o">{</span>
<span class="n">LOG</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Received tuple: [{}]"</span><span class="o">,</span> <span class="n">input</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">});</span>
<span class="n">StormTopology</span> <span class="n">stormTopology</span> <span class="o">=</span> <span class="n">topology</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
</code></pre></div>
<p>Detailed descriptions of the APIs in this section can be found in the <a href="javadocs/org/apache/storm/trident/Stream.html">Stream javadoc</a>.</p>
<h4 id="example-applications">Example applications</h4>
<p>Example applications of these APIs can be found in <a href="http://github.com/apache/storm/blob/v2.3.0/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentHBaseWindowingStoreTopology.java">TridentHBaseWindowingStoreTopology</a>
and <a href="http://github.com/apache/storm/blob/v2.3.0/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentWindowingInmemoryStoreTopology.java">TridentWindowingInmemoryStoreTopology</a>.</p>
<h3 id="partitionaggregate">partitionAggregate</h3>
<p>partitionAggregate runs a function on each partition of a batch of tuples. Unlike functions, the tuples emitted by partitionAggregate replace the input tuples given to it. Consider this example:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">partitionAggregate</span><span class="o">(</span><span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"b"</span><span class="o">),</span> <span class="k">new</span> <span class="n">Sum</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"sum"</span><span class="o">))</span>
</code></pre></div>
<p>Suppose the input stream contained fields [&quot;a&quot;, &quot;b&quot;] and the following partitions of tuples:</p>
<div class="highlight"><pre><code class="language-" data-lang="">Partition 0:
["a", 1]
["b", 2]
Partition 1:
["a", 3]
["c", 8]
Partition 2:
["e", 1]
["d", 9]
["d", 10]
</code></pre></div>
<p>Then the output stream of that code would contain these tuples with one field called &quot;sum&quot;:</p>
<div class="highlight"><pre><code class="language-" data-lang="">Partition 0:
[3]
Partition 1:
[11]
Partition 2:
[20]
</code></pre></div>
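<p>The per-partition behavior above can be reproduced with a small plain-Java sketch. It is not Trident code: the class and method names are hypothetical, and each partition is represented only by the values of its "b" field.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; none of these names come from the Storm API.
class PartitionSumSketch {
    // Sums the "b" values of each partition independently and returns one result per partition.
    static List<Long> sumPerPartition(List<List<Long>> partitions) {
        List<Long> sums = new ArrayList<>();
        for (List<Long> partition : partitions) {
            long sum = 0L;
            for (long b : partition) {
                sum += b;
            }
            sums.add(sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // The "b" values of partitions 0, 1 and 2 from the example above.
        List<List<Long>> partitions = List.of(List.of(1L, 2L), List.of(3L, 8L), List.of(1L, 9L, 10L));
        System.out.println(sumPerPartition(partitions)); // prints [3, 11, 20]
    }
}
```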
<p>There are three different interfaces for defining aggregators: CombinerAggregator, ReducerAggregator, and Aggregator.</p>
<p>Here&#39;s the interface for CombinerAggregator:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">CombinerAggregator</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="kd">extends</span> <span class="n">Serializable</span> <span class="o">{</span>
<span class="n">T</span> <span class="nf">init</span><span class="o">(</span><span class="n">TridentTuple</span> <span class="n">tuple</span><span class="o">);</span>
<span class="n">T</span> <span class="nf">combine</span><span class="o">(</span><span class="n">T</span> <span class="n">val1</span><span class="o">,</span> <span class="n">T</span> <span class="n">val2</span><span class="o">);</span>
<span class="n">T</span> <span class="nf">zero</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div>
<p>A CombinerAggregator returns a single tuple with a single field as output. CombinerAggregators run the init function on each input tuple and use the combine function to combine values until there&#39;s only one value left. If there are no tuples in the partition, the CombinerAggregator emits the output of the zero function. For example, here&#39;s the implementation of Count:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">Count</span> <span class="kd">implements</span> <span class="n">CombinerAggregator</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="kd">public</span> <span class="n">Long</span> <span class="nf">init</span><span class="o">(</span><span class="n">TridentTuple</span> <span class="n">tuple</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="mi">1L</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">Long</span> <span class="nf">combine</span><span class="o">(</span><span class="n">Long</span> <span class="n">val1</span><span class="o">,</span> <span class="n">Long</span> <span class="n">val2</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">val1</span> <span class="o">+</span> <span class="n">val2</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">Long</span> <span class="nf">zero</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="mi">0L</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div>
<p>CombinerAggregators offer high efficiency when used with the aggregate method instead of partitionAggregate (<a href="#aggregation-operations">see below</a>).</p>
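<p>The way init, combine and zero fit together can be sketched in plain Java. The <code>Combiner</code> interface below is a hypothetical stand-in that mirrors the shape of CombinerAggregator but takes a generic input instead of a TridentTuple; it only illustrates the evaluation order described above, not how Trident schedules the calls.</p>

```java
import java.util.List;

// Plain-Java illustration only; Combiner is a hypothetical stand-in, not the Storm interface.
class CombinerSketch {
    interface Combiner<I, T> {
        T init(I tuple);
        T combine(T val1, T val2);
        T zero();
    }

    // Counting expressed as a combiner: every tuple contributes 1, and values are added together.
    static final Combiner<String, Long> COUNT = new Combiner<String, Long>() {
        public Long init(String tuple) { return 1L; }
        public Long combine(Long val1, Long val2) { return val1 + val2; }
        public Long zero() { return 0L; }
    };

    // Applies init to every tuple and combines the results; an empty partition yields zero().
    static <I, T> T run(Combiner<I, T> agg, List<I> partition) {
        if (partition.isEmpty()) {
            return agg.zero();
        }
        T result = agg.init(partition.get(0));
        for (int i = 1; i < partition.size(); i++) {
            result = agg.combine(result, agg.init(partition.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(COUNT, List.of("a", "b", "c"))); // prints 3
        System.out.println(run(COUNT, List.of()));              // prints 0
    }
}
```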
<p>A ReducerAggregator has the following interface:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">ReducerAggregator</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="kd">extends</span> <span class="n">Serializable</span> <span class="o">{</span>
<span class="n">T</span> <span class="nf">init</span><span class="o">();</span>
<span class="n">T</span> <span class="nf">reduce</span><span class="o">(</span><span class="n">T</span> <span class="n">curr</span><span class="o">,</span> <span class="n">TridentTuple</span> <span class="n">tuple</span><span class="o">);</span>
<span class="o">}</span>
</code></pre></div>
<p>A ReducerAggregator produces an initial value with init, and then it iterates on that value for each input tuple to produce a single tuple with a single value as output. For example, here&#39;s how you would define Count as a ReducerAggregator:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">Count</span> <span class="kd">implements</span> <span class="n">ReducerAggregator</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="kd">public</span> <span class="n">Long</span> <span class="nf">init</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="mi">0L</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">Long</span> <span class="nf">reduce</span><span class="o">(</span><span class="n">Long</span> <span class="n">curr</span><span class="o">,</span> <span class="n">TridentTuple</span> <span class="n">tuple</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">curr</span> <span class="o">+</span> <span class="mi">1</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div>
<p>ReducerAggregator can also be used with persistentAggregate, as you&#39;ll see later.</p>
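<p>As a rough illustration of the fold that a ReducerAggregator describes, here is a plain-Java sketch. The <code>Reducer</code> interface is a hypothetical stand-in with a generic input in place of TridentTuple, and the <code>run</code> helper is an assumption about the evaluation order, not Storm code.</p>

```java
import java.util.List;

// Plain-Java illustration only; Reducer is a hypothetical stand-in, not the Storm interface.
class ReducerSketch {
    interface Reducer<I, T> {
        T init();
        T reduce(T curr, I tuple);
    }

    // Counting expressed as a reducer: start at 0 and add 1 for each tuple.
    static final Reducer<String, Long> COUNT = new Reducer<String, Long>() {
        public Long init() { return 0L; }
        public Long reduce(Long curr, String tuple) { return curr + 1; }
    };

    // Starts from init() and folds every tuple of the partition into the running value.
    static <I, T> T run(Reducer<I, T> agg, List<I> partition) {
        T curr = agg.init();
        for (I tuple : partition) {
            curr = agg.reduce(curr, tuple);
        }
        return curr;
    }

    public static void main(String[] args) {
        System.out.println(run(COUNT, List.of("a", "b", "c"))); // prints 3
    }
}
```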
<p>The most general interface for performing aggregations is Aggregator, which looks like this:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">Aggregator</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="kd">extends</span> <span class="n">Operation</span> <span class="o">{</span>
<span class="n">T</span> <span class="nf">init</span><span class="o">(</span><span class="n">Object</span> <span class="n">batchId</span><span class="o">,</span> <span class="n">TridentCollector</span> <span class="n">collector</span><span class="o">);</span>
<span class="kt">void</span> <span class="nf">aggregate</span><span class="o">(</span><span class="n">T</span> <span class="n">state</span><span class="o">,</span> <span class="n">TridentTuple</span> <span class="n">tuple</span><span class="o">,</span> <span class="n">TridentCollector</span> <span class="n">collector</span><span class="o">);</span>
<span class="kt">void</span> <span class="nf">complete</span><span class="o">(</span><span class="n">T</span> <span class="n">state</span><span class="o">,</span> <span class="n">TridentCollector</span> <span class="n">collector</span><span class="o">);</span>
<span class="o">}</span>
</code></pre></div>
<p>Aggregators can emit any number of tuples with any number of fields. They can emit tuples at any point during execution. Aggregators execute in the following way:</p>
<ol>
<li>The init method is called before processing the batch. The return value of init is an Object that will represent the state of the aggregation and will be passed into the aggregate and complete methods.</li>
<li>The aggregate method is called for each input tuple in the batch partition. This method can update the state and optionally emit tuples.</li>
<li>The complete method is called when all tuples for the batch partition have been processed by aggregate. </li>
</ol>
<p>Here&#39;s how you would implement Count as an Aggregator:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">CountAgg</span> <span class="kd">extends</span> <span class="n">BaseAggregator</span><span class="o">&lt;</span><span class="n">CountAgg</span><span class="o">.</span><span class="na">CountState</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="kd">static</span> <span class="kd">class</span> <span class="nc">CountState</span> <span class="o">{</span>
<span class="kt">long</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">CountState</span> <span class="nf">init</span><span class="o">(</span><span class="n">Object</span> <span class="n">batchId</span><span class="o">,</span> <span class="n">TridentCollector</span> <span class="n">collector</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">CountState</span><span class="o">();</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">aggregate</span><span class="o">(</span><span class="n">CountState</span> <span class="n">state</span><span class="o">,</span> <span class="n">TridentTuple</span> <span class="n">tuple</span><span class="o">,</span> <span class="n">TridentCollector</span> <span class="n">collector</span><span class="o">)</span> <span class="o">{</span>
<span class="n">state</span><span class="o">.</span><span class="na">count</span><span class="o">+=</span><span class="mi">1</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">complete</span><span class="o">(</span><span class="n">CountState</span> <span class="n">state</span><span class="o">,</span> <span class="n">TridentCollector</span> <span class="n">collector</span><span class="o">)</span> <span class="o">{</span>
<span class="n">collector</span><span class="o">.</span><span class="na">emit</span><span class="o">(</span><span class="k">new</span> <span class="n">Values</span><span class="o">(</span><span class="n">state</span><span class="o">.</span><span class="na">count</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div>
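<p>The three-step lifecycle (init, then aggregate per tuple, then complete) can be sketched with a small plain-Java driver. <code>Agg</code> and <code>Collector</code> below are hypothetical stand-ins for Aggregator and TridentCollector, and <code>runOnPartition</code> is an assumption about the call order described above rather than Trident's actual executor.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; Agg and Collector are hypothetical stand-ins for the Storm types.
class AggregatorLifecycleSketch {
    interface Collector {
        void emit(List<Object> values);
    }

    interface Agg<I, S> {
        S init(Object batchId, Collector collector);
        void aggregate(S state, I tuple, Collector collector);
        void complete(S state, Collector collector);
    }

    static final class CountState {
        long count = 0;
    }

    // Count written against the stand-in interface: emits a single tuple from complete().
    static final Agg<String, CountState> COUNT = new Agg<String, CountState>() {
        public CountState init(Object batchId, Collector collector) { return new CountState(); }
        public void aggregate(CountState state, String tuple, Collector collector) { state.count += 1; }
        public void complete(CountState state, Collector collector) { collector.emit(List.<Object>of(state.count)); }
    };

    // Drives one batch partition: init once, aggregate for every tuple, then complete once.
    static <I, S> List<List<Object>> runOnPartition(Agg<I, S> agg, Object batchId, List<I> partition) {
        List<List<Object>> emitted = new ArrayList<>();
        Collector collector = emitted::add;
        S state = agg.init(batchId, collector);
        for (I tuple : partition) {
            agg.aggregate(state, tuple, collector);
        }
        agg.complete(state, collector);
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(runOnPartition(COUNT, 1, List.of("a", "b", "c"))); // prints [[3]]
    }
}
```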
<p>Sometimes you want to execute multiple aggregators at the same time. This is called chaining and can be accomplished like this:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">chainedAgg</span><span class="o">()</span>
<span class="o">.</span><span class="na">partitionAggregate</span><span class="o">(</span><span class="k">new</span> <span class="n">Count</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"count"</span><span class="o">))</span>
<span class="o">.</span><span class="na">partitionAggregate</span><span class="o">(</span><span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"b"</span><span class="o">),</span> <span class="k">new</span> <span class="n">Sum</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"sum"</span><span class="o">))</span>
<span class="o">.</span><span class="na">chainEnd</span><span class="o">()</span>
</code></pre></div>
<p>This code will run the Count and Sum aggregators on each partition. The output will contain a single tuple with the fields [&quot;count&quot;, &quot;sum&quot;].</p>
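<p>As an illustration of the combined output, the following plain-Java sketch computes one [count, sum] tuple for a partition. It is not Trident code; the names are hypothetical, and a partition is represented only by the "b" values of its tuples.</p>

```java
import java.util.List;

// Plain-Java illustration only; none of these names come from the Storm API.
class ChainedAggSketch {
    // Returns a single [count, sum] tuple for one partition, given the "b" values of its tuples.
    static List<Long> countAndSum(List<Long> bValues) {
        long count = 0L;
        long sum = 0L;
        for (long b : bValues) {
            count++;
            sum += b;
        }
        return List.of(count, sum);
    }

    public static void main(String[] args) {
        // The three partitions from the partitionAggregate example earlier in this section.
        System.out.println(countAndSum(List.of(1L, 2L)));      // prints [2, 3]
        System.out.println(countAndSum(List.of(3L, 8L)));      // prints [2, 11]
        System.out.println(countAndSum(List.of(1L, 9L, 10L))); // prints [3, 20]
    }
}
```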
<h3 id="statequery-and-partitionpersist">stateQuery and partitionPersist</h3>
<p>stateQuery and partitionPersist query and update sources of state, respectively. You can read about how to use them in the <a href="Trident-state.html">Trident state doc</a>.</p>
<h3 id="projection">projection</h3>
<p>The project method on Stream keeps only the fields specified in the operation. If you had a Stream with fields [&quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;d&quot;] and you ran this code:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">project</span><span class="o">(</span><span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"b"</span><span class="o">,</span> <span class="s">"d"</span><span class="o">))</span>
</code></pre></div>
<p>The output stream would contain only the fields [&quot;b&quot;, &quot;d&quot;].</p>
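<p>The effect on a single tuple can be sketched in plain Java. This is not Trident code: the helper below is hypothetical and models a tuple as a list of values alongside a list of field names.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; none of these names come from the Storm API.
class ProjectionSketch {
    // Keeps only the values whose field names are listed in keep, in the order given by keep.
    static List<Object> project(List<String> fields, List<Object> tuple, List<String> keep) {
        List<Object> out = new ArrayList<>();
        for (String name : keep) {
            int idx = fields.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            out.add(tuple.get(idx));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("a", "b", "c", "d");
        List<Object> tuple = List.<Object>of(1, 2, 3, 4);
        System.out.println(project(fields, tuple, List.of("b", "d"))); // prints [2, 4]
    }
}
```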
<h2 id="repartitioning-operations">Repartitioning operations</h2>
<p>Repartitioning operations run a function to change how the tuples are partitioned across tasks. The number of partitions can also change as a result of repartitioning (for example, if the parallelism hint is greater after repartitioning). Repartitioning requires network transfer. Here are the repartitioning functions:</p>
<ol>
<li>shuffle: Uses a random round-robin algorithm to evenly redistribute tuples across all target partitions.</li>
<li>broadcast: Every tuple is replicated to all target partitions. This can be useful during DRPC, for example if you need to do a stateQuery on every partition of data.</li>
<li>partitionBy: partitionBy takes in a set of fields and does semantic partitioning based on that set of fields. The fields are hashed and modded by the number of target partitions to select the target partition. partitionBy guarantees that the same set of fields always goes to the same target partition.</li>
<li>global: All tuples are sent to the same partition. The same partition is chosen for all batches in the stream.</li>
<li>batchGlobal: All tuples in the batch are sent to the same partition. Different batches in the stream may go to different partitions. </li>
<li>partition: This method takes in a custom partitioning function that implements org.apache.storm.grouping.CustomStreamGrouping</li>
</ol>
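<p>The hash-and-mod selection described for partitionBy can be sketched as follows. This is a plain-Java illustration under the assumption that the hash of the grouping values is taken as a whole; the exact hash function Trident uses is not specified here, and the names are hypothetical.</p>

```java
import java.util.List;

// Plain-Java illustration only; the hashing scheme is an assumption, not Storm's implementation.
class PartitionBySketch {
    // Maps the values of the partitioning fields to a target partition index in [0, numPartitions).
    static int targetPartition(List<?> groupingValues, int numPartitions) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(groupingValues.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int first = targetPartition(List.of("user-42"), 4);
        int second = targetPartition(List.of("user-42"), 4);
        // Equal field values always select the same partition.
        System.out.println(first == second); // prints true
    }
}
```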
<h2 id="aggregation-operations">Aggregation operations</h2>
<p>Trident has aggregate and persistentAggregate methods for doing aggregations on Streams. aggregate is run on each batch of the stream in isolation, while persistentAggregate will aggregate all tuples across all batches in the stream and store the result in a source of state.</p>
<p>Running aggregate on a Stream does a global aggregation. When you use a ReducerAggregator or an Aggregator, the stream is first repartitioned into a single partition, and then the aggregation function is run on that partition. When you use a CombinerAggregator, on the other hand, Trident first computes partial aggregations of each partition, then repartitions to a single partition, and then finishes the aggregation after the network transfer. CombinerAggregators are far more efficient and should be used when possible.</p>
<p>Here&#39;s an example of using aggregate to get a global count for a batch:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">mystream</span><span class="o">.</span><span class="na">aggregate</span><span class="o">(</span><span class="k">new</span> <span class="n">Count</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"count"</span><span class="o">))</span>
</code></pre></div>
<p>Like partitionAggregate, aggregators for aggregate can be chained. However, if you chain a CombinerAggregator with a non-CombinerAggregator, Trident is unable to do the partial aggregation optimization.</p>
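<p>The partial aggregation optimization can be illustrated with a plain-Java sketch of a global count. It is not Trident code and the names are hypothetical; the point is that only one partial value per partition would need to cross the network instead of every tuple.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; none of these names come from the Storm API.
class PartialAggregationSketch {
    // Step 1: each partition computes its own partial count locally.
    static List<Long> partialCounts(List<List<String>> partitions) {
        List<Long> partials = new ArrayList<>();
        for (List<String> partition : partitions) {
            partials.add((long) partition.size());
        }
        return partials;
    }

    // Step 2: after repartitioning to a single partition, the partial values are combined.
    static long combineAll(List<Long> partials) {
        long total = 0L;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(List.of("a", "b"), List.of("a", "c"), List.of("e", "d", "d"));
        List<Long> partials = partialCounts(partitions);
        System.out.println(partials);             // prints [2, 2, 3]
        System.out.println(combineAll(partials)); // prints 7
    }
}
```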
<p>You can read more about how to use persistentAggregate in the <a href="Trident-state.html">Trident state doc</a>.</p>
<h2 id="operations-on-grouped-streams">Operations on grouped streams</h2>
<p>The groupBy operation repartitions the stream by doing a partitionBy on the specified fields, and then within each partition groups tuples together whose group fields are equal. For example, here&#39;s an illustration of a groupBy operation:</p>
<p><img src="images/grouping.png" alt="Grouping"></p>
<p>If you run aggregators on a grouped stream, the aggregation will be run within each group instead of against the whole batch. persistentAggregate can also be run on a GroupedStream, in which case the results will be stored in a <a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/trident/state/map/MapState.java">MapState</a> with the key being the grouping fields. You can read more about persistentAggregate in the <a href="Trident-state.html">Trident state doc</a>.</p>
<p>Like regular streams, aggregators on grouped streams can be chained.</p>
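<p>A per-group aggregation can be sketched in plain Java as a count keyed by the grouping field. This is not Trident code; the names are hypothetical, and a sorted map is used only to make the printed output deterministic.</p>

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names come from the Storm API.
class GroupedCountSketch {
    // Groups tuples by the value of their grouping field and counts the tuples in each group.
    static Map<String, Long> countByGroup(List<String> groupFieldValues) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : groupFieldValues) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByGroup(List.of("a", "b", "a", "c", "a"))); // prints {a=3, b=1, c=1}
    }
}
```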
<h2 id="merges-and-joins">Merges and joins</h2>
<p>The last part of the API is combining different streams together. The simplest way to combine streams is to merge them into one stream. You can do that with the TridentTopology#merge method, like so:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">topology</span><span class="o">.</span><span class="na">merge</span><span class="o">(</span><span class="n">stream1</span><span class="o">,</span> <span class="n">stream2</span><span class="o">,</span> <span class="n">stream3</span><span class="o">);</span>
</code></pre></div>
<p>Trident will name the output fields of the new, merged stream as the output fields of the first stream.</p>
<p>Another way to combine streams is with a join. A standard join, like the kind from SQL, requires finite input, so it doesn&#39;t make sense with infinite streams. Joins in Trident only apply within each small batch that comes off of the spout.</p>
<p>Here&#39;s an example join between a stream containing fields [&quot;key&quot;, &quot;val1&quot;, &quot;val2&quot;] and another stream containing [&quot;x&quot;, &quot;val1&quot;]:</p>
<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">topology</span><span class="o">.</span><span class="na">join</span><span class="o">(</span><span class="n">stream1</span><span class="o">,</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"key"</span><span class="o">),</span> <span class="n">stream2</span><span class="o">,</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"x"</span><span class="o">),</span> <span class="k">new</span> <span class="n">Fields</span><span class="o">(</span><span class="s">"key"</span><span class="o">,</span> <span class="s">"a"</span><span class="o">,</span> <span class="s">"b"</span><span class="o">,</span> <span class="s">"c"</span><span class="o">));</span>
</code></pre></div>
<p>This joins stream1 and stream2 together using &quot;key&quot; and &quot;x&quot; as the join fields for each respective stream. Then, Trident requires that all the output fields of the new stream be named, since the input streams could have overlapping field names. The tuples emitted from the join will contain:</p>
<ol>
<li>First, the list of join fields. In this case, &quot;key&quot; corresponds to &quot;key&quot; from stream1 and &quot;x&quot; from stream2.</li>
<li>Next, a list of all non-join fields from all streams, in order of how the streams were passed to the join method. In this case, &quot;a&quot; and &quot;b&quot; correspond to &quot;val1&quot; and &quot;val2&quot; from stream1, and &quot;c&quot; corresponds to &quot;val1&quot; from stream2.</li>
</ol>
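<p>The output field layout described above can be sketched with a plain-Java join over one batch. This is not Trident code: the helper is hypothetical, it assumes an inner join, and it assumes the join field is the first value of every tuple.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; assumes an inner join with the join field at index 0 of each tuple.
class BatchJoinSketch {
    // Joins the tuples of one batch: output is the join field, then the non-join fields of the
    // left tuple, then the non-join fields of the right tuple.
    static List<List<Object>> join(List<List<Object>> left, List<List<Object>> right) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> l : left) {
            for (List<Object> r : right) {
                if (l.get(0).equals(r.get(0))) {
                    List<Object> joined = new ArrayList<>();
                    joined.add(l.get(0));                  // "key"
                    joined.addAll(l.subList(1, l.size())); // "a", "b" (val1, val2 of stream1)
                    joined.addAll(r.subList(1, r.size())); // "c" (val1 of stream2)
                    out.add(joined);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> stream1 = List.of(List.<Object>of("k1", 1, 2), List.<Object>of("k2", 3, 4));
        List<List<Object>> stream2 = List.of(List.<Object>of("k1", 10), List.<Object>of("k3", 30));
        System.out.println(join(stream1, stream2)); // prints [[k1, 1, 2, 10]]
    }
}
```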
<p>When a join happens between streams originating from different spouts, those spouts will be synchronized with how they emit batches. That is, a batch of processing will include tuples from each spout.</p>
<p>You might be wondering – how do you do something like a &quot;windowed join&quot;, where tuples from one side of the join are joined against the last hour of tuples from the other side of the join?</p>
<p>To do this, you would make use of partitionPersist and stateQuery. The last hour of tuples from one side of the join would be stored and rotated in a source of state, keyed by the join field. Then the stateQuery would do lookups by the join field to perform the &quot;join&quot;.</p>
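<p>The idea can be sketched in plain Java with an in-memory map standing in for the source of state. This is not Trident code: <code>persist</code> and <code>query</code> are hypothetical methods that only play the roles of partitionPersist and stateQuery, and expiring entries at query time is one possible rotation strategy among several.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; an in-memory map stands in for a Trident state source.
class WindowedJoinSketch {
    static final long WINDOW_MS = 60L * 60L * 1000L; // one hour

    private static final class Entry {
        final long timestampMs;
        final Object value;

        Entry(long timestampMs, Object value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final Map<String, List<Entry>> state = new HashMap<>();

    // Plays the role of partitionPersist: remember a tuple from one side under its join key.
    void persist(String joinKey, Object value, long nowMs) {
        state.computeIfAbsent(joinKey, k -> new ArrayList<>()).add(new Entry(nowMs, value));
    }

    // Plays the role of stateQuery: drop entries older than the window, then return the rest.
    List<Object> query(String joinKey, long nowMs) {
        List<Entry> entries = state.getOrDefault(joinKey, new ArrayList<>());
        entries.removeIf(e -> nowMs - e.timestampMs > WINDOW_MS);
        List<Object> values = new ArrayList<>();
        for (Entry e : entries) {
            values.add(e.value);
        }
        return values;
    }

    public static void main(String[] args) {
        WindowedJoinSketch sketch = new WindowedJoinSketch();
        sketch.persist("k1", "old", 0L);
        sketch.persist("k1", "recent", 3_000_000L);
        // At t = 4,000,000 ms the first entry is older than one hour and has expired.
        System.out.println(sketch.query("k1", 4_000_000L)); // prints [recent]
    }
}
```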
</div>
</div>
</div>
</div>
<footer>
<div class="container-fluid">
<div class="row">
<div class="col-md-3">
<div class="footer-widget">
<h5>Meetups</h5>
<ul class="latest-news">
<li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li>
<li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li>
<li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li>
<li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li>
<li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li>
<li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li>
<!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> -->
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>About Apache Storm</h5>
<p>Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Apache Storm with database systems is easy.</p>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>First Look</h5>
<ul class="footer-list">
<li><a href="/releases/current/Rationale.html">Rationale</a></li>
<li><a href="/releases/current/Tutorial.html">Tutorial</a></li>
<li><a href="/releases/current/Setting-up-development-environment.html">Setting up development environment</a></li>
<li><a href="/releases/current/Creating-a-new-Storm-project.html">Creating a new Apache Storm project</a></li>
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>Documentation</h5>
<ul class="footer-list">
<li><a href="/releases/current/index.html">Index</a></li>
<li><a href="/releases/current/javadocs/index.html">Javadoc</a></li>
<li><a href="/releases/current/FAQ.html">FAQ</a></li>
</ul>
</div>
</div>
</div>
<hr/>
<div class="row">
<div class="col-md-12">
<p align="center">Copyright © 2019 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.
<br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation.
<br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
</div>
</div>
</div>
</footer>
<!--Footer End-->
<!-- Scroll to top -->
<span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span>
</body>
</html>