| ////////////////////////////////////////// |
| |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| ////////////////////////////////////////// |
| |
| = Using Monoids |
| |
| |
| https://en.wikipedia.org/wiki/Monoid#Monoids_in_computer_science[Monoids] allow |
| the mechanics of an aggregation algorithm to be separated from the algorithm-specific logic associated with that aggregation. |
| It is often thought to be a functional design pattern. |
| |
| Perhaps, it is easiest seen with an example. Consider the code for integer sum, integer product and string concatenation. |
| We might note various similarities: |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_intro,indent=0] |
| ---- |
| <1> Initialize an aggregate counter |
| <2> Loop throw elements with for/while/iteration adjusting counter |
| |
| We can remove the duplicate aggregation coding and the tease out the important differences for each algorithm. |
| We might instead use Groovy's `inject` method. This is a _fold_ operation in functional programming jargon. |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_inject,indent=0] |
| ---- |
| |
| Here the first parameter is the initial value, and the supplied closure contains the algorithm-specific logic. |
| |
| Similarly, for Groovy 3+, we can use the JDK stream API and lambda syntax as follows: |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_lambdas,indent=0] |
| ---- |
| |
| == A touch of formalism |
| |
| Looking at these examples, we might think all aggregation can be supported this way. |
| In fact, we look for certain characteristics to ensure that this aggregation pattern will apply: |
| |
| * Closure: performing the aggregation step should produce a result of the same type as the elements being aggregated. |
| **** |
| Examples: `1L + 3L` produces a `Long`, and `'foo' + 'bar'` produces a `String`. + |
| Non-monoid examples: `'foo'.size() + 'bar'.size()` (takes strings, returns an integer), |
| the type _odd numbers_ with respect to addition, algorithms which don't handle null arguments if such arguments are possible. |
| **** |
| [NOTE] |
| ==== |
| When using the term _closure_ here, we simply mean closed under the operation, not the Groovy `Closure` class. |
| ==== |
| * Associativity: the order in which we apply the aggregation step should not matter. |
| **** |
| Examples: `(1 + 3) + 5` is the same as `1 + (3 + 5)`, and `('a' + 'b') + 'c'` is the same as `'a' + ('b' + 'c')`. + |
| Non-monoid example: `(10 - 5) - 3` is not equal to `10 - (5 - 3)` therefore integers are not a monoid with respect to subtraction. |
| **** |
| * Identity element (sometimes also called a 'zero' element): |
| there should be an element which aggregated with any element returns the original element. |
| **** |
| Examples: `0 + 42 == 42`, `42 + 0 == 42`, `1 * 42 == 42`, and `'' + 'foo' == 'foo'`. + |
| Non-monoid example: the type _non-empty strings_ is not a monoid with respect to concatenation. |
| **** |
| |
| If your algorithm doesn't satisfy all the monoid properties, that doesn't mean aggregation isn't possible. |
| It just means that you won't get all the benefits from monoids, which we'll cover shortly, or you might have |
| a little more work to do. |
| Also, you might be able to convert your data structures slightly to turn your problem into one involving monoids. |
| We'll cover that topic a little later in this section. |
| |
| == Benefits of monoids |
| |
| Consider adding the integers 10 through 16. |
| Because the operation of addition for integers is a monoid, we already know that we can save writing code |
| and instead use the approach we saw in the earlier `inject` examples. |
| There are some other nice properties. |
| |
| Because of the _closure_ property, |
| if we have a pairwise method like `sum(Integer a, Integer b)`, then for a monoid, we can always |
| extend that method to work with a list, e.g. `sum(List<Integer> nums)` or `sum(Integer first, Integer... rest)`. |
| |
| Because of _associativity_, |
| we can employ some interesting ways to solve the aggregation including: |
| |
| * Divide and conquer algorithms which break the problem into smaller pieces |
| * Various incremental algorithms for example memoization would allow summing from 1..5 |
| to potentially start part way through be reusing a cached value of summing 1..4 if that had been calculated earlier |
| * Inherent parallelization can make use of multiple cores |
| |
| Let's just look at the first of these in more detail. With a multicore |
| processor, one core could add `10` plus `11`, another core `12` plus `13`, and so on. |
| We'd use the _identity_ element if needed (shown being added to `16` in our example). |
| Then the intermediate results could also be added together concurrently and so on until the result was reached. |
| |
| [plantuml, MonoidAddition, png] |
| .... |
| skinparam shadowing false |
| skinparam ClassFontSize 18 |
| skinparam ClassBackgroundColor<<Identity>> Transparent |
| skinparam ClassBorderColor<<Identity>> grey |
| skinparam ClassStereotypeFontSize<<Identity>> 4 |
| skinparam ClassStereotypeFontColor<<Identity>> Transparent |
| hide circle |
| class " 0 " as zero << (I,lightgrey) Identity >> |
| show zero circle |
| class " 10 " as a1 |
| class " 11 " as a2 |
| class " 12 " as a3 |
| class " 13 " as a4 |
| class " 14 " as a5 |
| class " 15 " as a6 |
| class " 16 " as a7 |
| class " 21 " as b1 |
| class " 25 " as b2 |
| class " 29 " as b3 |
| class " 16 " as b4 |
| class " 46 " as c1 |
| class " 45 " as c2 |
| class " 91 " as d1 |
| a1 .r[hidden].> a2 |
| a2 .r[hidden].> a3 |
| a3 .r[hidden].> a4 |
| a4 .r[hidden].> a5 |
| a5 .r[hidden].> a6 |
| a6 .r[hidden].> a7 |
| a7 .r[hidden].> zero |
| b1 <.d. a1 |
| b1 <.d. a2 |
| b2 <.d. a3 |
| b2 <.d. a4 |
| b3 <.d. a5 |
| b3 <.d. a6 |
| b4 <.d. a7 |
| b4 <.d. zero |
| c1 <.d. b1 |
| c1 <.d. b2 |
| c2 <.d. b3 |
| c2 <.d. b4 |
| d1 <.d. c1 |
| d1 <.d. c2 |
| hide empty members |
| .... |
| |
| We have reduced the amount of code we need to write, and we also have potential performance gains. |
| |
| Here is how we might code the previous example using the http://gpars.org/[GPars] |
| concurrency and parallelism framework (two alternatives shown): |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_gpars,indent=0] |
| ---- |
| |
| == Working with Non-monoids |
| |
| Suppose we want to find the average of the numbers 1..10. Groovy has a built-in method for this: |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_average_1to10,indent=0] |
| ---- |
| |
| Now, suppose we want to build our own monoid solution instead of using the built-in version. |
| It might seem difficult to find the _identity_ element. After all: |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_average_0to10,indent=0] |
| ---- |
| |
| Similarly, if we are tempted to write the pairwise aggregation closure it might be something like: |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_average_bad,indent=0] |
| ---- |
| |
| What `b` can we use for the _identity_ element here so that our equation returns the original? |
| We need to use `a`, but that isn't a fixed value, so there is no _identity_. |
| |
| Also, associativity doesn't hold for this initial attempt at defining `avg` as these examples show: |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_average_assoc,indent=0] |
| ---- |
| |
| Also, what about our _closure_ property? Our original numbers were integers, but our average (`5.5`) |
| is not. We can solve this by making our average work for any `Number` instances, but it might not always be this easy. |
| |
| It might appear that this problem is not amenable to a monoidal solution. |
| However, there are numerous ways to bring monoids into the solution. |
| |
| We can split it into two parts: |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_average_split,indent=0] |
| ---- |
| |
| The calculation of `sum()` can follow monoid rules and then our last step can calculate the average. |
| We can even do a concurrent version with GPars: |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_average_split_gpars,indent=0] |
| ---- |
| |
| Here, we were using the built-in `sum()` method (and `sumParallel()` for the GPars example), |
| but if you were doing it by hand, the monoid nature of that part of your calculation would |
| make it easier to write your own code for that step. |
| |
| Alternatively, we can introduce a helper data structure that reworks the problem to be a monoid. |
| Instead of just keeping the total, let's keep a list containing the total and count of numbers. |
| The code could look something like this: |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_average_reworked_simple,indent=0] |
| ---- |
| |
| Or, to be a little fancier, we could introduce a class for our data structure and even calculate |
| concurrently: |
| |
| [source,groovy] |
| ---- |
| include::../test/DesignPatternsTest.groovy[tags=monoids_average_reworked_gpars,indent=0] |
| ---- |