src/spec/doc/_design-pattern-monoid.adoc - groovy - Git at Google

 //////////////////////////////////////////

   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

 //////////////////////////////////////////

 = Using Monoids


 https://en.wikipedia.org/wiki/Monoid#Monoids_in_computer_science[Monoids] allow
 the mechanics of an aggregation algorithm to be separated from the algorithm-specific logic associated with that aggregation.
 It is often thought to be a functional design pattern.

 Perhaps, it is easiest seen with an example. Consider the code for integer sum, integer product and string concatenation.
 We might note various similarities:

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_intro,indent=0]
 ----
 <1> Initialize an aggregate counter
 <2> Loop throw elements with for/while/iteration adjusting counter

 We can remove the duplicate aggregation coding and the tease out the important differences for each algorithm.
 We might instead use Groovy's `inject` method. This is a _fold_ operation in functional programming jargon.

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_inject,indent=0]
 ----

 Here the first parameter is the initial value, and the supplied closure contains the algorithm-specific logic.

 Similarly, for Groovy 3+, we can use the JDK stream API and lambda syntax as follows:

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_lambdas,indent=0]
 ----

 == A touch of formalism

 Looking at these examples, we might think all aggregation can be supported this way.
 In fact, we look for certain characteristics to ensure that this aggregation pattern will apply:

 * Closure: performing the aggregation step should produce a result of the same type as the elements being aggregated.
 ****
 Examples: `1L + 3L` produces a `Long`, and `'foo' + 'bar'` produces a `String`. +
 Non-monoid examples: `'foo'.size() + 'bar'.size()` (takes strings, returns an integer),
 the type _odd numbers_ with respect to addition, algorithms which don't handle null arguments if such arguments are possible.
 ****
 [NOTE]
 ====
 When using the term _closure_ here, we simply mean closed under the operation, not the Groovy `Closure` class.
 ====
 * Associativity: the order in which we apply the aggregation step should not matter.
 ****
 Examples: `(1 + 3) + 5` is the same as `1 + (3 + 5)`, and `('a' + 'b') + 'c'` is the same as `'a' + ('b' + 'c')`. +
 Non-monoid example: `(10 - 5) - 3` is not equal to `10 - (5 - 3)` therefore integers are not a monoid with respect to subtraction.
 ****
 * Identity element (sometimes also called a 'zero' element):
 there should be an element which aggregated with any element returns the original element.
 ****
 Examples: `0 + 42 == 42`, `42 + 0 == 42`, `1 * 42 == 42`, and `'' + 'foo' == 'foo'`. +
 Non-monoid example: the type _non-empty strings_ is not a monoid with respect to concatenation.
 ****

 If your algorithm doesn't satisfy all the monoid properties, that doesn't mean aggregation isn't possible.
 It just means that you won't get all the benefits from monoids, which we'll cover shortly, or you might have
 a little more work to do.
 Also, you might be able to convert your data structures slightly to turn your problem into one involving monoids.
 We'll cover that topic a little later in this section.

 == Benefits of monoids

 Consider adding the integers 10 through 16.
 Because the operation of addition for integers is a monoid, we already know that we can save writing code
 and instead use the approach we saw in the earlier `inject` examples.
 There are some other nice properties.

 Because of the _closure_ property,
 if we have a pairwise method like `sum(Integer a, Integer b)`, then for a monoid, we can always
 extend that method to work with a list, e.g. `sum(List<Integer> nums)` or `sum(Integer first, Integer... rest)`.

 Because of _associativity_,
 we can employ some interesting ways to solve the aggregation including:

 * Divide and conquer algorithms which break the problem into smaller pieces
 * Various incremental algorithms for example memoization would allow summing from 1..5
 to potentially start part way through be reusing a cached value of summing 1..4 if that had been calculated earlier
 * Inherent parallelization can make use of multiple cores

 Let's just look at the first of these in more detail. With a multicore
 processor, one core could add `10` plus `11`, another core `12` plus `13`, and so on.
 We'd use the _identity_ element if needed (shown being added to `16` in our example).
 Then the intermediate results could also be added together concurrently and so on until the result was reached.

 [plantuml, MonoidAddition, png]
 ....
 skinparam shadowing false
 skinparam ClassFontSize 18
 skinparam ClassBackgroundColor<<Identity>> Transparent
 skinparam ClassBorderColor<<Identity>> grey
 skinparam ClassStereotypeFontSize<<Identity>> 4
 skinparam ClassStereotypeFontColor<<Identity>> Transparent
 hide circle
 class " 0 " as zero << (I,lightgrey) Identity >>
 show zero circle
 class " 10 " as a1
 class " 11 " as a2
 class " 12 " as a3
 class " 13 " as a4
 class " 14 " as a5
 class " 15 " as a6
 class " 16 " as a7
 class " 21 " as b1
 class " 25 " as b2
 class " 29 " as b3
 class " 16 " as b4
 class " 46 " as c1
 class " 45 " as c2
 class " 91 " as d1
 a1 .r[hidden].> a2
 a2 .r[hidden].> a3
 a3 .r[hidden].> a4
 a4 .r[hidden].> a5
 a5 .r[hidden].> a6
 a6 .r[hidden].> a7
 a7 .r[hidden].> zero
 b1 <.d. a1
 b1 <.d. a2
 b2 <.d. a3
 b2 <.d. a4
 b3 <.d. a5
 b3 <.d. a6
 b4 <.d. a7
 b4 <.d. zero
 c1 <.d. b1
 c1 <.d. b2
 c2 <.d. b3
 c2 <.d. b4
 d1 <.d. c1
 d1 <.d. c2
 hide empty members
 ....

 We have reduced the amount of code we need to write, and we also have potential performance gains.

 Here is how we might code the previous example using the http://gpars.org/[GPars]
 concurrency and parallelism framework (two alternatives shown):

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_gpars,indent=0]
 ----

 == Working with Non-monoids

 Suppose we want to find the average of the numbers 1..10. Groovy has a built-in method for this:

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_average_1to10,indent=0]
 ----

 Now, suppose we want to build our own monoid solution instead of using the built-in version.
 It might seem difficult to find the _identity_ element. After all:

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_average_0to10,indent=0]
 ----

 Similarly, if we are tempted to write the pairwise aggregation closure it might be something like:

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_average_bad,indent=0]
 ----

 What `b` can we use for the _identity_ element here so that our equation returns the original?
 We need to use `a`, but that isn't a fixed value, so there is no _identity_.

 Also, associativity doesn't hold for this initial attempt at defining `avg` as these examples show:

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_average_assoc,indent=0]
 ----

 Also, what about our _closure_ property? Our original numbers were integers, but our average (`5.5`)
 is not. We can solve this by making our average work for any `Number` instances, but it might not always be this easy.

 It might appear that this problem is not amenable to a monoidal solution.
 However, there are numerous ways to bring monoids into the solution.

 We can split it into two parts:

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_average_split,indent=0]
 ----

 The calculation of `sum()` can follow monoid rules and then our last step can calculate the average.
 We can even do a concurrent version with GPars:

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_average_split_gpars,indent=0]
 ----

 Here, we were using the built-in `sum()` method (and `sumParallel()` for the GPars example),
 but if you were doing it by hand, the monoid nature of that part of your calculation would
 make it easier to write your own code for that step.

 Alternatively, we can introduce a helper data structure that reworks the problem to be a monoid.
 Instead of just keeping the total, let's keep a list containing the total and count of numbers.
 The code could look something like this:

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_average_reworked_simple,indent=0]
 ----

 Or, to be a little fancier, we could introduce a class for our data structure and even calculate
 concurrently:

 [source,groovy]
 ----
 include::../test/DesignPatternsTest.groovy[tags=monoids_average_reworked_gpars,indent=0]
 ----
	//////////////////////////////////////////

	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.

	//////////////////////////////////////////

	= Using Monoids


	https://en.wikipedia.org/wiki/Monoid#Monoids_in_computer_science[Monoids] allow
	the mechanics of an aggregation algorithm to be separated from the algorithm-specific logic associated with that aggregation.
	It is often thought to be a functional design pattern.

	Perhaps, it is easiest seen with an example. Consider the code for integer sum, integer product and string concatenation.
	We might note various similarities:

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_intro,indent=0]
	----
	<1> Initialize an aggregate counter
	<2> Loop throw elements with for/while/iteration adjusting counter

	We can remove the duplicate aggregation coding and the tease out the important differences for each algorithm.
	We might instead use Groovy's `inject` method. This is a _fold_ operation in functional programming jargon.

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_inject,indent=0]
	----

	Here the first parameter is the initial value, and the supplied closure contains the algorithm-specific logic.

	Similarly, for Groovy 3+, we can use the JDK stream API and lambda syntax as follows:

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_lambdas,indent=0]
	----

	== A touch of formalism

	Looking at these examples, we might think all aggregation can be supported this way.
	In fact, we look for certain characteristics to ensure that this aggregation pattern will apply:

	* Closure: performing the aggregation step should produce a result of the same type as the elements being aggregated.
	****
	Examples: `1L + 3L` produces a `Long`, and `'foo' + 'bar'` produces a `String`. +
	Non-monoid examples: `'foo'.size() + 'bar'.size()` (takes strings, returns an integer),
	the type _odd numbers_ with respect to addition, algorithms which don't handle null arguments if such arguments are possible.
	****
	[NOTE]
	====
	When using the term _closure_ here, we simply mean closed under the operation, not the Groovy `Closure` class.
	====
	* Associativity: the order in which we apply the aggregation step should not matter.
	****
	Examples: `(1 + 3) + 5` is the same as `1 + (3 + 5)`, and `('a' + 'b') + 'c'` is the same as `'a' + ('b' + 'c')`. +
	Non-monoid example: `(10 - 5) - 3` is not equal to `10 - (5 - 3)` therefore integers are not a monoid with respect to subtraction.
	****
	* Identity element (sometimes also called a 'zero' element):
	there should be an element which aggregated with any element returns the original element.
	****
	Examples: `0 + 42 == 42`, `42 + 0 == 42`, `1 * 42 == 42`, and `'' + 'foo' == 'foo'`. +
	Non-monoid example: the type _non-empty strings_ is not a monoid with respect to concatenation.
	****

	If your algorithm doesn't satisfy all the monoid properties, that doesn't mean aggregation isn't possible.
	It just means that you won't get all the benefits from monoids, which we'll cover shortly, or you might have
	a little more work to do.
	Also, you might be able to convert your data structures slightly to turn your problem into one involving monoids.
	We'll cover that topic a little later in this section.

	== Benefits of monoids

	Consider adding the integers 10 through 16.
	Because the operation of addition for integers is a monoid, we already know that we can save writing code
	and instead use the approach we saw in the earlier `inject` examples.
	There are some other nice properties.

	Because of the _closure_ property,
	if we have a pairwise method like `sum(Integer a, Integer b)`, then for a monoid, we can always
	extend that method to work with a list, e.g. `sum(List<Integer> nums)` or `sum(Integer first, Integer... rest)`.

	Because of _associativity_,
	we can employ some interesting ways to solve the aggregation including:

	* Divide and conquer algorithms which break the problem into smaller pieces
	* Various incremental algorithms for example memoization would allow summing from 1..5
	to potentially start part way through be reusing a cached value of summing 1..4 if that had been calculated earlier
	* Inherent parallelization can make use of multiple cores

	Let's just look at the first of these in more detail. With a multicore
	processor, one core could add `10` plus `11`, another core `12` plus `13`, and so on.
	We'd use the _identity_ element if needed (shown being added to `16` in our example).
	Then the intermediate results could also be added together concurrently and so on until the result was reached.

	[plantuml, MonoidAddition, png]
	....
	skinparam shadowing false
	skinparam ClassFontSize 18
	skinparam ClassBackgroundColor<<Identity>> Transparent
	skinparam ClassBorderColor<<Identity>> grey
	skinparam ClassStereotypeFontSize<<Identity>> 4
	skinparam ClassStereotypeFontColor<<Identity>> Transparent
	hide circle
	class " 0 " as zero << (I,lightgrey) Identity >>
	show zero circle
	class " 10 " as a1
	class " 11 " as a2
	class " 12 " as a3
	class " 13 " as a4
	class " 14 " as a5
	class " 15 " as a6
	class " 16 " as a7
	class " 21 " as b1
	class " 25 " as b2
	class " 29 " as b3
	class " 16 " as b4
	class " 46 " as c1
	class " 45 " as c2
	class " 91 " as d1
	a1 .r[hidden].> a2
	a2 .r[hidden].> a3
	a3 .r[hidden].> a4
	a4 .r[hidden].> a5
	a5 .r[hidden].> a6
	a6 .r[hidden].> a7
	a7 .r[hidden].> zero
	b1 <.d. a1
	b1 <.d. a2
	b2 <.d. a3
	b2 <.d. a4
	b3 <.d. a5
	b3 <.d. a6
	b4 <.d. a7
	b4 <.d. zero
	c1 <.d. b1
	c1 <.d. b2
	c2 <.d. b3
	c2 <.d. b4
	d1 <.d. c1
	d1 <.d. c2
	hide empty members
	....

	We have reduced the amount of code we need to write, and we also have potential performance gains.

	Here is how we might code the previous example using the http://gpars.org/[GPars]
	concurrency and parallelism framework (two alternatives shown):

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_gpars,indent=0]
	----

	== Working with Non-monoids

	Suppose we want to find the average of the numbers 1..10. Groovy has a built-in method for this:

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_average_1to10,indent=0]
	----

	Now, suppose we want to build our own monoid solution instead of using the built-in version.
	It might seem difficult to find the _identity_ element. After all:

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_average_0to10,indent=0]
	----

	Similarly, if we are tempted to write the pairwise aggregation closure it might be something like:

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_average_bad,indent=0]
	----

	What `b` can we use for the _identity_ element here so that our equation returns the original?
	We need to use `a`, but that isn't a fixed value, so there is no _identity_.

	Also, associativity doesn't hold for this initial attempt at defining `avg` as these examples show:

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_average_assoc,indent=0]
	----

	Also, what about our _closure_ property? Our original numbers were integers, but our average (`5.5`)
	is not. We can solve this by making our average work for any `Number` instances, but it might not always be this easy.

	It might appear that this problem is not amenable to a monoidal solution.
	However, there are numerous ways to bring monoids into the solution.

	We can split it into two parts:

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_average_split,indent=0]
	----

	The calculation of `sum()` can follow monoid rules and then our last step can calculate the average.
	We can even do a concurrent version with GPars:

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_average_split_gpars,indent=0]
	----

	Here, we were using the built-in `sum()` method (and `sumParallel()` for the GPars example),
	but if you were doing it by hand, the monoid nature of that part of your calculation would
	make it easier to write your own code for that step.

	Alternatively, we can introduce a helper data structure that reworks the problem to be a monoid.
	Instead of just keeping the total, let's keep a list containing the total and count of numbers.
	The code could look something like this:

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_average_reworked_simple,indent=0]
	----

	Or, to be a little fancier, we could introduce a class for our data structure and even calculate
	concurrently:

	[source,groovy]
	----
	include::../test/DesignPatternsTest.groovy[tags=monoids_average_reworked_gpars,indent=0]
	----