blob: ee81652b00451d082b4b74ce0d15ac30f51e8197 [file] [log] [blame]
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]--><head>
<meta charset='utf-8'/><meta http-equiv='X-UA-Compatible' content='IE=edge'/><meta name='viewport' content='width=device-width, initial-scale=1'/><meta name='keywords' content='gatherers, jdk22, chop, collate, inject, ginq, streams, jep461, fold, scan'/><meta name='description' content='This post looks at using Gatherers (JEP 461) with Groovy.'/><title>The Apache Groovy programming language - Blogs - Using Gatherers with Groovy</title><link href='../img/favicon.ico' type='image/x-ico' rel='icon'/><link rel='stylesheet' type='text/css' href='../css/bootstrap.css'/><link rel='stylesheet' type='text/css' href='../css/font-awesome.min.css'/><link rel='stylesheet' type='text/css' href='../css/style.css'/><link rel='stylesheet' type='text/css' href='https://cdnjs.cloudflare.com/ajax/libs/prettify/r298/prettify.min.css'/>
</head><body>
<div id='fork-me'>
<a href='https://github.com/apache/groovy'>
<img style='position: fixed; top: 20px; right: -58px; border: 0; z-index: 100; transform: rotate(45deg);' src='/img/horizontal-github-ribbon.png'/>
</a>
</div><div id='st-container' class='st-container st-effect-9'>
<nav class='st-menu st-effect-9' id='menu-12'>
<h2 class='icon icon-lab'>Socialize</h2><ul>
<li>
<a href='https://groovy-lang.org/mailing-lists.html' class='icon'><span class='fa fa-envelope'></span> Discuss on the mailing-list</a>
</li><li>
<a href='https://twitter.com/ApacheGroovy' class='icon'><span class='fa fa-twitter'></span> Groovy on Twitter</a>
</li><li>
<a href='https://groovy-lang.org/events.html' class='icon'><span class='fa fa-calendar'></span> Events and conferences</a>
</li><li>
<a href='https://github.com/apache/groovy' class='icon'><span class='fa fa-github'></span> Source code on GitHub</a>
</li><li>
<a href='https://groovy-lang.org/reporting-issues.html' class='icon'><span class='fa fa-bug'></span> Report issues in Jira</a>
</li><li>
<a href='http://stackoverflow.com/questions/tagged/groovy' class='icon'><span class='fa fa-stack-overflow'></span> Stack Overflow questions</a>
</li><li>
<a href='http://groovycommunity.com/' class='icon'><span class='fa fa-slack'></span> Slack Community</a>
</li>
</ul>
</nav><div class='st-pusher'>
<div class='st-content'>
<div class='st-content-inner'>
<!--[if lt IE 7]>
<p class="browsehappy">You are using an <strong>outdated</strong> browser. Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your experience.</p>
<![endif]--><div><div class='navbar navbar-default navbar-static-top' role='navigation'>
<div class='container'>
<div class='navbar-header'>
<button type='button' class='navbar-toggle' data-toggle='collapse' data-target='.navbar-collapse'>
<span class='sr-only'></span><span class='icon-bar'></span><span class='icon-bar'></span><span class='icon-bar'></span>
</button><a class='navbar-brand' href='../index.html'>
<i class='fa fa-star'></i> Apache Groovy
</a>
</div><div class='navbar-collapse collapse'>
<ul class='nav navbar-nav navbar-right'>
<li class=''><a href='https://groovy-lang.org/learn.html'>Learn</a></li><li class=''><a href='https://groovy-lang.org/documentation.html'>Documentation</a></li><li class=''><a href='/download.html'>Download</a></li><li class=''><a href='https://groovy-lang.org/support.html'>Support</a></li><li class=''><a href='/'>Contribute</a></li><li class=''><a href='https://groovy-lang.org/ecosystem.html'>Ecosystem</a></li><li class=''><a href='/blog'>Blog posts</a></li><li class=''><a href='https://groovy.apache.org/events.html'></a></li><li>
<a data-effect='st-effect-9' class='st-trigger' href='#'>Socialize</a>
</li><li class=''>
<a href='../search.html'>
<i class='fa fa-search'></i>
</a>
</li>
</ul>
</div>
</div>
</div><div id='content' class='page-1'><div class='row'><div class='row-fluid'><div class='col-lg-3'><ul class='nav-sidebar'><li><a href='./'>Blog index</a></li><li class='active'><a href='#doc'>Using Gatherers with Groovy</a></li><li><a href='#_understanding_gatherers' class='anchor-link'>Understanding Gatherers</a></li><li><a href='#_accessing_parts_of_a_collection' class='anchor-link'>Accessing parts of a collection</a></li><li><a href='#_collate' class='anchor-link'>Collate</a></li><li><a href='#_chop' class='anchor-link'>Chop</a></li><li><a href='#_inject_fold_and_scan' class='anchor-link'>Inject, fold and scan</a></li><li><a href='#_testing_for_a_subsequence_fun_with_inits_and_tails' class='anchor-link'>Testing for a subsequence (fun with <code>inits</code> and <code>tails</code>)</a></li><li><a href='#_further_information' class='anchor-link'>Further information</a></li><li><a href='#_conclusion' class='anchor-link'>Conclusion</a></li></ul></div><div class='col-lg-8 col-lg-pull-0'><a name='doc'></a><h1>Using Gatherers with Groovy</h1><p><span>Author: <i>Paul King</i></span><br/><span>Published: 2023-12-09 03:30PM (Last updated: 2024-01-18 10:00PM)</span></p><hr/><div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>An interesting feature being previewed in JDK22 is <em>Gatherers</em>
(<a href="https://openjdk.java.net/jeps/461">JEP 461</a>).
This blog looks at using that feature with Groovy.
The examples in this blog were tested with Groovy 4.0.16 using JDK version 22-ea+27-2262.
As the JDK version we used is still in early access status,
you should read the disclaimers to understand that this JDK feature
is subject to change before final release. If and when the feature becomes
final, it looks like Groovy will automatically support it without needing
any additional tweaks.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_understanding_gatherers">Understanding Gatherers</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Java developers are by now very familiar with streams.
A stream is a potentially unbounded sequence of values supporting lazy computation.
Processing streams is done via a stream pipeline which consists of three parts:
a source of elements, zero or more intermediate operations (like <code>filter</code> and <code>map</code>),
and a terminal operation.</p>
</div>
<div class="paragraph">
<p>This framework is very powerful and efficient and offers some extensibility
via a customisable terminal operation. The available intermediate operations
is fixed in size, and while the built-in ones are very useful,
some complex tasks cannot easily be expressed as stream pipelines.
Enter <em>gatherers</em>. Gatherers provide the ability to customize intermediate operations.</p>
</div>
<div class="paragraph">
<p>With gatherers, the stream API is updated to support a <code>gather</code> intermediate operation
which takes a gatherer and returns a transformed stream.
Let&#8217;s dive into a few more details of gatherers.</p>
</div>
<div class="paragraph">
<p>A gatherer is defined by four pieces of functionality:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The optional <em>initializer</em> is just a <code>Supplier</code> which returns some (initial) state.</p>
</li>
<li>
<p>The <em>integrator</em> is typically the most important part. It satisfies the following interface:</p>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="java">interface Integrator&lt;A, T, R&gt; {
boolean integrate(A state, T element, Downstream&lt;? super R&gt; downstream);
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>where <code>state</code> is some state&#8201;&#8212;&#8201;we&#8217;ll use a list as state in a few of the upcoming
examples, but it could just as easily be an instance of some other class or record, <code>element</code>
is the next element in the current stream to be processed, and <code>downstream</code> is
a hook for creating the elements that will be processed in the next stage of the stream pipeline.</p>
</div>
</li>
<li>
<p>The optional <code>finisher</code> has access to the state and downstream pipeline hook.
It performs any last step actions which might be needed.</p>
</li>
<li>
<p>The optional <em>combiner</em> is used to evaluate the gatherer in parallel when processing an input stream in parallel. The examples we&#8217;ll look at in this blog post are inherently ordered in nature
and thus cannot be parallelized, so we won&#8217;t discuss this aspect further here.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Over and above, the Gatherer API, there are a number of built-in gathers
like <code>windowFixed</code>, <code>windowSliding</code>, and <code>fold</code>, among others.</p>
</div>
<div class="paragraph">
<p>Before getting into functionality where gatherers will become essential,
let&#8217;s start off by looking at accessing collections where functionality
is well provided for in both the collection and stream APIs and related
extension methods.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_accessing_parts_of_a_collection">Accessing parts of a collection</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Groovy provides very flexible indexing variants to
select specific elements from a collection:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..8)[0..2] == [1, 2, 3] // index by closed range
assert (1..8)[3&lt;..&lt;6] == [5, 6] // index by open range
assert (1..8)[0..2,3..4,5] == [1, 2, 3, 4, 5, 6] // index by multiple ranges
assert (1..8)[0..2,3..-1] == 1..8 // ditto
assert (1..8)[0,2,4,6] == [1,3,5,7] // select odd numbers
assert (1..8)[1,3,5,7] == [2,4,6,8] // select even numbers</code></pre>
</div>
</div>
<div class="paragraph">
<p>You can also pick out a window of elements using <code>take</code> and <code>drop</code>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..8).take(3) == [1, 2, 3] // same as [0..2]
assert (1..8).drop(2).take(3) == [3, 4, 5] // same as [2..4]</code></pre>
</div>
</div>
<div class="paragraph">
<p>Stream users might do the same thing using <code>skip</code> and <code>limit</code>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..8).stream().limit(3).toList() == [1, 2, 3]
assert (1..8).stream().skip(2).limit(3).toList() == [3, 4, 5]</code></pre>
</div>
</div>
<div class="paragraph">
<p>We can see here there are stream equivalents for <code>drop</code> and <code>take</code>,
but what about some of Groovy&#8217;s more elaborate mechanisms for manipulating collections?
I&#8217;m glad you asked. Let&#8217;s look at stream equivalents for <code>collate</code> and <code>chop</code>.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_collate">Collate</h2>
<div class="sectionbody">
<div class="paragraph">
<p><span class="image right"><img src="img/collate.png" alt="collate a list - produced by Dall-E 3" width="200"></span>
Groovy&#8217;s <code>collate</code> method splits a collection into fixed size chunks:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..8).collate(3) == [[1, 2, 3], [4, 5, 6], [7, 8]]</code></pre>
</div>
</div>
<div class="paragraph">
<p>The last chunk in this example is smaller than the chunk size.
It contains the remaining elements left over after all full size chunks
have been created. If we don&#8217;t want the leftover chunk,
we can ask for it to be excluded using an optional boolean parameter:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..8).collate(3, false) == [[1, 2, 3], [4, 5, 6]]</code></pre>
</div>
</div>
<div class="paragraph">
<p>Such functionality isn&#8217;t really possible with streams unless you wanted to
process the stream multiple times, or you shoved all the logic in the
collector, but then you&#8217;d be giving up some of the key benefits of streams.
Luckily, with gatherers, we can now obtain this functionality.</p>
</div>
<div class="paragraph">
<p>The first case is so common, there is a built-in gatherer (<code>Gatherers#windowFixed</code>) for it:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..8).stream().gather(windowFixed(3)).toList() ==
[[1, 2, 3], [4, 5, 6], [7, 8]]</code></pre>
</div>
</div>
<div class="paragraph">
<p>There is no exact equivalent to handle the less common case of discarding
the leftover elements, but it&#8217;s easy enough to write our own gatherer:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">&lt;T&gt; Gatherer&lt;T, ?, List&lt;T&gt;&gt; windowFixedTruncating(int windowSize) {
Gatherer.ofSequential(
() -&gt; [], // initializer
Gatherer.Integrator.ofGreedy { window, element, downstream -&gt; // integrator
window &lt;&lt; element
if (window.size() &lt; windowSize) return true
var result = List.copyOf(window)
window.clear()
downstream.push(result)
}
)
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>We have an initializer which just returns an empty list as our initial state.
The integrator keeps adding elements to the state (our list or window). Once the
list is filled to the window size, we&#8217;ll output it to the downstream,
and then clear the list ready for the next window.</p>
</div>
<div class="paragraph">
<p>The code here is essentially a simplified version of <code>windowFixed</code>, we can
just leave out the finisher that <code>windowFixed</code> would require to potentially
output the partially-filled window at the end.</p>
</div>
<div class="paragraph">
<p>A few details:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Our operation is sequential since it is inherently ordered,
hence we used <code>ofSequential</code> to mark it so.</p>
</li>
<li>
<p>We will also always process all
elements, so we create a greedy gatherer using <code>ofGreedy</code>. While not strictly
necessary, this allows for optimisation of the pipeline.</p>
</li>
<li>
<p>We have specifically left out some validation logic out of this example
to focus on the new gatherer functionality. Check out how <code>windowFixed</code>
throws <code>IllegalArgumentException</code> for window sizes less than 1 to see what
should really also be added here if you were using this in production.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>We&#8217;d use <code>windowFixedTruncating</code> like this:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..8).stream().gather(windowFixedTruncating(3)).toList() ==
[[1, 2, 3], [4, 5, 6]]</code></pre>
</div>
</div>
<div class="paragraph">
<p>The default when using <code>collate</code> is to start the next chunk/window
at the element directly after the previous one, but there are overloads
which also take a step size. This is used to calculate the index at which
the second (and subsequent) window(s) will start.
There is an optional <code>keepRemaining</code> boolean
to handle the leftover case as well.
If we want to slide along by 1 and discard leftovers, we&#8217;d use:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..5).collate(3, 1, false) == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]</code></pre>
</div>
</div>
<div class="paragraph">
<p>This aligns with the built-in <code>windowSliding</code> gatherer:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..5).stream().gather(windowSliding(3)).toList() ==
[[1, 2, 3], [2, 3, 4], [3, 4, 5]]</code></pre>
</div>
</div>
<div class="paragraph">
<p>If we want the step size to be other than 1, or we want control over
the leftovers, there is no built-in gatherer option,
but we can again write one ourselves. Let&#8217;s consider some examples.
We&#8217;ll look at a gatherer implementation shortly, but first Groovy&#8217;s
collection variants:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..5).collate(3, 1) == [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5], [5]]
assert (1..8).collate(3, 2) == [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8]]
assert (1..8).collate(3, 2, false) == [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
assert (1..8).collate(3, 4, false) == [[1, 2, 3], [5, 6, 7]]
assert (1..8).collate(3, 3) == [[1, 2, 3], [4, 5, 6], [7, 8]] // same as collate(3)</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now let&#8217;s write our gatherer:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">&lt;T&gt; Gatherer&lt;T, ?, List&lt;T&gt;&gt; windowSlidingByStep(int windowSize, int stepSize, boolean keepRemaining = true) {
int skip = 0
Gatherer.ofSequential(
() -&gt; [], // initializer
Gatherer.Integrator.ofGreedy { window, element, downstream -&gt; // integrator
if (skip) {
skip--
return true
}
window &lt;&lt; element
if (window.size() &lt; windowSize) return true
var result = List.copyOf(window)
skip = stepSize &gt; windowSize ? stepSize - windowSize : 0
[stepSize, windowSize].min().times { window.removeFirst() }
downstream.push(result)
},
(window, downstream) -&gt; { // finisher
if (keepRemaining) {
while(window.size() &gt; stepSize) {
downstream.push(List.copyOf(window))
stepSize.times{ window.removeFirst() }
}
downstream.push(List.copyOf(window))
}
}
)
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Some points:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Our gatherer is still sequential for the same reasons as previously.
We are still processing every element, so we again created a greedy gatherer.</p>
</li>
<li>
<p>We have a little bit of optimization baked into the code. If our step size
is bigger than the window size, we can do no further processing in our gatherer
for the elements in between our windows. We could simplify the code and store those
elements only to throw them away later, but it&#8217;s not too much effort to make
the algorithm as efficient as possible.</p>
</li>
<li>
<p>We also need a finisher here which
handles the leftover chunk(s) when required.</p>
</li>
<li>
<p>As per the previous example, we chose to elide some argument validation logic.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>And we&#8217;d use it like this:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..5).stream().gather(windowSlidingByStep(3, 1)).toList() ==
[[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5], [5]]
assert (1..8).stream().gather(windowSlidingByStep(3, 2)).toList() ==
[[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8]]
assert (1..8).stream().gather(windowSlidingByStep(3, 2, false)).toList() ==
[[1, 2, 3], [3, 4, 5], [5, 6, 7]]
assert (1..8).stream().gather(windowSlidingByStep(3, 4, false)).toList() ==
[[1, 2, 3], [5, 6, 7]]
assert (1..8).stream().gather(windowSlidingByStep(3, 3)).toList() ==
[[1, 2, 3], [4, 5, 6], [7, 8]]</code></pre>
</div>
</div>
<div class="paragraph">
<p>Before leaving this section, let&#8217;s look at a few examples using Groovy&#8217;s
language integrated query capabilities as an alternative way to manipulate
collections.</p>
</div>
<div class="paragraph">
<p>Firstly, the equivalent of what we saw with <code>take</code> / <code>limit</code>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert GQL {
from n in 1..8
limit 3
select n
} == [1, 2, 3]</code></pre>
</div>
</div>
<div class="paragraph">
<p>Then, the equivalent if we added in <code>drop</code> / <code>skip</code>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert GQL {
from n in 1..8
limit 2, 3
select n
} == [3, 4, 5]</code></pre>
</div>
</div>
<div class="paragraph">
<p>Finally, a sliding window equivalent:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert GQL {
from ns in (
from n in 1..8
select n, (lead(n) over(orderby n)), (lead(n, 2) over(orderby n))
)
limit 3
select ns
}*.toList() == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]</code></pre>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_chop">Chop</h2>
<div class="sectionbody">
<div class="paragraph">
<p><span class="image right"><img src="img/chop.png" alt="chop a list - produced by Dall-E 3" width="200"></span>
A related collection extension method in Groovy is <code>chop</code>.
For this method, we also create chunks from the original collection but rather
than specifying a fixed size that applies to all chunks, we specify the size we
want for each chunk. We give a list of sizes, and each size is only used once.
The special size of <code>-1</code> indicates that we want the remainder of the collection as
the last chunk.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..8).chop(3) == [[1, 2, 3]]
assert (1..8).chop(3, 2, 1) == [[1, 2, 3], [4, 5], [6]]
assert (1..8).chop(3, -1) == [[1, 2, 3], [4, 5, 6, 7, 8]]</code></pre>
</div>
</div>
<div class="paragraph">
<p>There is no original stream or pre-built gatherer for this functionality.
We&#8217;ll write our own:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">&lt;T&gt; Gatherer&lt;T, ?, List&lt;T&gt;&gt; windowMultiple(int... steps) {
var remaining = steps.toList()
Gatherer.ofSequential(
() -&gt; [],
Gatherer.Integrator.of { window, element, downstream -&gt;
if (!remaining) {
return false
}
window &lt;&lt; element
if (remaining[0] != -1) remaining[0]--
if (remaining[0]) return true
remaining.removeFirst()
var result = List.copyOf(window)
window.clear()
downstream.push(result)
},
(window, downstream) -&gt; {
if (window) {
var result = List.copyOf(window)
downstream.push(result)
}
}
)
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Some points:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>This is also an ordered algorithm, so we use <code>ofSequential</code> again.</p>
</li>
<li>
<p>This is similar to what we used for collate, but we have a different sized
window for each chunk size as we process the elements.</p>
</li>
<li>
<p>Once we hit the last chunk, we don&#8217;t want to process further
elements unless we see the special -1 marker, so we won&#8217;t create a greedy gatherer.</p>
</li>
<li>
<p>We do need a finisher to potentially output elements that have been stored but not yet
pushed downstream.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>We&#8217;d use <code>windowMultiple</code> like this:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..8).stream().gather(windowMultiple(3)).toList() ==
[[1, 2, 3]]
assert (1..8).stream().gather(windowMultiple(3, 2, 1)).toList() ==
[[1, 2, 3], [4, 5], [6]]
assert (1..8).stream().gather(windowMultiple(3, -1)).toList() ==
[[1, 2, 3], [4, 5, 6, 7, 8]]</code></pre>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_inject_fold_and_scan">Inject, fold and scan</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Groovy&#8217;s <code>inject</code> is a little different to the stream APIs <code>reduce</code> intermediate operator.
The latter expects a binary operator which restricts the types of the elements
being consumed and produced.</p>
</div>
<div class="paragraph">
<p>The <code>inject</code> method can have different types for its arguments as shown here:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..5).inject(''){ string, number -&gt; string + number } == '12345'</code></pre>
</div>
</div>
<div class="paragraph">
<p>The <code>fold</code> built-in gatherer allows us to write the equivalent functionality for stream processing as shown here:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..5).stream()
.gather(fold(() -&gt; '', (string, number) -&gt; string + number))
.findFirst()
.get() == '12345'</code></pre>
</div>
</div>
<div class="paragraph">
<p>Let&#8217;s look at another <code>inject</code> example. This time <em>cumulative sum</em>.
If we have a sequence of numbers, the cumulative sum is another sequence
whose value at any index is determined by accumulating all the
numbers from the original sequence up to and including the index in question, e.g. the cumulative sum of <code>[1, 2, 3, 4]</code> is <code>[1, 3, 6, 10]</code>.</p>
</div>
<div class="paragraph">
<p>This is again a good fit for Groovy&#8217;s <code>inject</code>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..5).inject([]) { acc, next -&gt;
acc + [acc ? acc.last() + next : next]
} == [1, 3, 6, 10, 15]</code></pre>
</div>
</div>
<div class="paragraph">
<p>Groovy has a number of alternatives to achieve this functionality.
Here is one using <code>inits</code>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..5).inits().grep().reverse()*.sum() == [1, 3, 6, 10, 15]</code></pre>
</div>
</div>
<div class="paragraph">
<p><code>inits</code> is a list processing function which we cover in more detail
in the next section.</p>
</div>
<div class="paragraph">
<p>Before examining gatherer equivalents, we should note that this particular operation
is deemed useful enough that Java actually has built-in library function for arrays:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">Integer[] nums = 1..5
Arrays.parallelPrefix(nums, Integer::sum)
assert nums == [1, 3, 6, 10, 15]</code></pre>
</div>
</div>
<div class="paragraph">
<p>Cumulative sum isn&#8217;t well suited to traditional streams,
but now with gatherers, we can use the <code>scan</code> built-in gatherer
to have similar functionality when processing streams:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..5).stream()
.gather(scan(() -&gt; 0, Integer::sum))
.toList() == [1, 3, 6, 10, 15]</code></pre>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_testing_for_a_subsequence_fun_with_inits_and_tails">Testing for a subsequence (fun with <code>inits</code> and <code>tails</code>)</h2>
<div class="sectionbody">
<div class="paragraph">
<p>As a final example, let&#8217;s have a look at how we might test
if one list is a subset of another.</p>
</div>
<div class="paragraph">
<p>We&#8217;ll start with a list of words, and a list containing the ordered search terms:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">var words = 'the quick brown fox jumped over the lazy dog'.split().toList()
var search = 'brown fox'.split().toList()</code></pre>
</div>
</div>
<div class="paragraph">
<p>It turns out that this is solved already in the JDK for collections:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert Collections.indexOfSubList(words, search) != -1</code></pre>
</div>
</div>
<div class="paragraph">
<p>Let&#8217;s have a look at some possible stream implementations.
But first a diversion. For any functional programmers who might have dabbled
with Haskell, you may have seen the book <a href="http://learnyouahaskell.com/">Learn You a Haskell for Great Good!</a>. It sets an interesting exercise for finding a "Needle in the Haystack"
using <code>inits</code> and <code>tails</code>. So what are <code>inits</code> and <code>tails</code>? They are built-in functions
in Haskell and Groovy:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert (1..6).inits() == [[1, 2, 3, 4, 5, 6],
[1, 2, 3, 4, 5],
[1, 2, 3, 4],
[1, 2, 3],
[1, 2],
[1],
[]]
assert (1..6).tails() == [[1, 2, 3, 4, 5, 6],
[2, 3, 4, 5, 6],
[3, 4, 5, 6],
[4, 5, 6],
[5, 6],
[6],
[]]</code></pre>
</div>
</div>
<div class="paragraph">
<p>Once we know about these methods, we can paraphrase the "Needle in the Haystack"
solution for collections in Groovy as follows:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">var found = words.tails().any{ subseq -&gt; subseq.inits().contains(search) }
assert found</code></pre>
</div>
</div>
<div class="paragraph">
<p>It may not be the most efficient implementation of this functionality,
but it has a nice symmetry. Let&#8217;s now explore some stream-based solutions.</p>
</div>
<div class="paragraph">
<p>We can start off with a <code>tails</code> gatherer:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">&lt;T&gt; Gatherer&lt;T, ?, List&lt;T&gt;&gt; tails() {
Gatherer.ofSequential(
() -&gt; [],
Gatherer.Integrator.ofGreedy { state, element, downstream -&gt;
state &lt;&lt; element
return true
},
(state, downstream) -&gt; {
state.tails().each(downstream::push)
}
)
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>In the integrator, we just store away all the elements,
and in the finisher we do all the work. This works but isn&#8217;t really
properly leveraging the stream pipeline nature.</p>
</div>
<div class="paragraph">
<p>We can check it works as follows:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert search.stream().gather(tails()).toList() ==
[['brown', 'fox'], ['fox'], []]</code></pre>
</div>
</div>
<div class="paragraph">
<p>We could continue with this approach to create an <code>initsOfTails</code> gatherer:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">&lt;T&gt; Gatherer&lt;T, ?, List&lt;T&gt;&gt; initsOfTails() {
Gatherer.ofSequential(
() -&gt; [],
Gatherer.Integrator.ofGreedy { state, element, downstream -&gt;
state &lt;&lt; element
return true
},
(state, downstream) -&gt; {
state.tails()*.inits().sum().each(downstream::push)
}
)
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Again, all the work is in the finisher, and we haven&#8217;t really made use
of the power of the stream pipeline.</p>
</div>
<div class="paragraph">
<p>It still works of course:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert words.stream().gather(initsOfTails()).anyMatch { it == search }</code></pre>
</div>
</div>
<div class="paragraph">
<p>But it might have been more efficient to have collected
the stream as a list and used Groovy&#8217;s built-in <code>inits</code> and <code>tails</code> on that.</p>
</div>
<div class="paragraph">
<p>But all is not lost. If we are willing to tweak the algorithm slightly,
we could make better use of the stream pipeline. For example, if we don&#8217;t
mind getting the <code>inits</code> results in the reverse order, we could define the following
gatherer for <code>inits</code>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">&lt;T&gt; Gatherer&lt;T, ?, List&lt;T&gt;&gt; inits() {
Gatherer.ofSequential(
() -&gt; [],
Gatherer.Integrator.ofGreedy { state, element, downstream -&gt;
downstream.push(List.copyOf(state))
state &lt;&lt; element
return true
},
(state, downstream) -&gt; {
downstream.push(state)
}
)
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Which we&#8217;d use like this:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code data-lang="groovy">assert search.stream().gather(inits()).toList() ==
[[], ['brown'], ['brown', 'fox']]</code></pre>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_further_information">Further information</h2>
<div class="sectionbody">
<div class="ulist">
<ul>
<li>
<p><a href="https://openjdk.org/jeps/461">JEP 461: Stream Gatherers (Preview)</a></p>
</li>
<li>
<p><a href="https://nipafx.dev/inside-java-newscast-57/">Better Java Streams with Gatherers - Inside Java Newscast #57</a></p>
</li>
<li>
<p><a href="https://nipafx.dev/implementing-gatherers/">Implementing New Java Stream Operations</a></p>
</li>
<li>
<p><a href="https://github.com/paulk-asert/groovy-gatherers">Source code on GitHub</a></p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_conclusion">Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>It is great that Groovy has a rich set of methods that
work with collections. Some of these methods have stream
equivalents, and now we see that using gatherers with Groovy,
we can emulate more of the methods.
Not all algorithms need or benefit from using streams,
but it&#8217;s great to know that with gatherers, we can
likely pick whichever style makes sense.</p>
</div>
<div class="paragraph">
<p>We are still in the early days of gatherers being available,
so expect much more to emerge as this feature becomes more mainstream.
We look forward to it advancing past preview status.</p>
</div>
<div class="sidebarblock">
<div class="content">
<div class="title">Update history</div>
<div class="paragraph">
<p><strong>18/Jan/2024</strong>: Updated with a scan/cumulative sum example.</p>
</div>
</div>
</div>
</div>
</div></div></div></div></div><footer id='footer'>
<div class='row'>
<div class='colset-3-footer'>
<div class='col-1'>
<h1>Groovy</h1><ul>
<li><a href='https://groovy-lang.org/learn.html'>Learn</a></li><li><a href='https://groovy-lang.org/documentation.html'>Documentation</a></li><li><a href='/download.html'>Download</a></li><li><a href='https://groovy-lang.org/support.html'>Support</a></li><li><a href='/'>Contribute</a></li><li><a href='https://groovy-lang.org/ecosystem.html'>Ecosystem</a></li><li><a href='/blog'>Blog posts</a></li><li><a href='https://groovy.apache.org/events.html'></a></li>
</ul>
</div><div class='col-2'>
<h1>About</h1><ul>
<li><a href='https://github.com/apache/groovy'>Source code</a></li><li><a href='https://groovy-lang.org/security.html'>Security</a></li><li><a href='https://groovy-lang.org/learn.html#books'>Books</a></li><li><a href='https://groovy-lang.org/thanks.html'>Thanks</a></li><li><a href='http://www.apache.org/foundation/sponsorship.html'>Sponsorship</a></li><li><a href='https://groovy-lang.org/faq.html'>FAQ</a></li><li><a href='https://groovy-lang.org/search.html'>Search</a></li>
</ul>
</div><div class='col-3'>
<h1>Socialize</h1><ul>
<li><a href='https://groovy-lang.org/mailing-lists.html'>Discuss on the mailing-list</a></li><li><a href='https://twitter.com/ApacheGroovy'>Groovy on Twitter</a></li><li><a href='https://groovy-lang.org/events.html'>Events and conferences</a></li><li><a href='https://github.com/apache/groovy'>Source code on GitHub</a></li><li><a href='https://groovy-lang.org/reporting-issues.html'>Report issues in Jira</a></li><li><a href='http://stackoverflow.com/questions/tagged/groovy'>Stack Overflow questions</a></li><li><a href='http://groovycommunity.com/'>Slack Community</a></li>
</ul>
</div><div class='col-right'>
<p>
The Groovy programming language is supported by the <a href='http://www.apache.org'>Apache Software Foundation</a> and the Groovy community.
</p><div text-align='right'>
<img src='../img/asf_logo.png' title='The Apache Software Foundation' alt='The Apache Software Foundation' style='width:60%'/>
</div><p>Apache&reg; and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.</p>
</div>
</div><div class='clearfix'>&copy; 2003-2024 the Apache Groovy project &mdash; Groovy is Open Source: <a href='http://www.apache.org/licenses/LICENSE-2.0.html' alt='Apache 2 License'>license</a>, <a href='https://privacy.apache.org/policies/privacy-policy-public.html'>privacy policy</a>.</div>
</div>
</footer></div>
</div>
</div>
</div>
</div><script src='../js/vendor/jquery-1.10.2.min.js' defer></script><script src='../js/vendor/classie.js' defer></script><script src='../js/vendor/bootstrap.js' defer></script><script src='../js/vendor/sidebarEffects.js' defer></script><script src='../js/vendor/modernizr-2.6.2.min.js' defer></script><script src='../js/plugins.js' defer></script><script src='https://cdnjs.cloudflare.com/ajax/libs/prettify/r298/prettify.min.js'></script><script>document.addEventListener('DOMContentLoaded',prettyPrint)</script><script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-257558-10', 'auto');
ga('send', 'pageview');
</script>
</body></html>