blog/using-groovy-with-apache-wayang.html - groovy-dev-site - Git at Google

 <!DOCTYPE html>
 <!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
 <!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
 <!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
 <!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]--><head>
     <meta charset='utf-8'/><meta http-equiv='X-UA-Compatible' content='IE=edge'/><meta name='viewport' content='width=device-width, initial-scale=1'/><meta name='keywords' content='centroids, data science, groovy, kmeans, records, apache spark, apache wayang'/><meta name='description' content='This post looks at using Apache Wayang and Apache Spark with Apache Groovy to cluster various Whiskies.'/><title>The Apache Groovy programming language - Blogs - Using Groovy with Apache Wayang and Apache Spark</title><link href='../img/favicon.ico' type='image/x-ico' rel='icon'/><link rel='stylesheet' type='text/css' href='../css/bootstrap.css'/><link rel='stylesheet' type='text/css' href='../css/font-awesome.min.css'/><link rel='stylesheet' type='text/css' href='../css/style.css'/><link rel='stylesheet' type='text/css' href='https://cdnjs.cloudflare.com/ajax/libs/prettify/r298/prettify.min.css'/>
 </head><body>
     <div id='fork-me'>
         <a href='https://github.com/apache/groovy'>
             <img style='position: fixed; top: 20px; right: -58px; border: 0; z-index: 100; transform: rotate(45deg);' src='/img/horizontal-github-ribbon.png'/>
         </a>
     </div><div id='st-container' class='st-container st-effect-9'>
         <nav class='st-menu st-effect-9' id='menu-12'>
             <h2 class='icon icon-lab'>Socialize</h2><ul>
                 <li>
                     <a href='https://groovy-lang.org/mailing-lists.html' class='icon'><span class='fa fa-envelope'></span> Discuss on the mailing-list</a>
                 </li><li>
                     <a href='https://twitter.com/ApacheGroovy' class='icon'><span class='fa fa-twitter'></span> Groovy on Twitter</a>
                 </li><li>
                     <a href='https://groovy-lang.org/events.html' class='icon'><span class='fa fa-calendar'></span> Events and conferences</a>
                 </li><li>
                     <a href='https://github.com/apache/groovy' class='icon'><span class='fa fa-github'></span> Source code on GitHub</a>
                 </li><li>
                     <a href='https://groovy-lang.org/reporting-issues.html' class='icon'><span class='fa fa-bug'></span> Report issues in Jira</a>
                 </li><li>
                     <a href='http://stackoverflow.com/questions/tagged/groovy' class='icon'><span class='fa fa-stack-overflow'></span> Stack Overflow questions</a>
                 </li><li>
                     <a href='http://groovycommunity.com/' class='icon'><span class='fa fa-slack'></span> Slack Community</a>
                 </li>
             </ul>
         </nav><div class='st-pusher'>
             <div class='st-content'>
                 <div class='st-content-inner'>
                     <!--[if lt IE 7]>
                     <p class="browsehappy">You are using an <strong>outdated</strong> browser. Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your experience.</p>
                 <![endif]--><div><div class='navbar navbar-default navbar-static-top' role='navigation'>
                             <div class='container'>
                                 <div class='navbar-header'>
                                     <button type='button' class='navbar-toggle' data-toggle='collapse' data-target='.navbar-collapse'>
                                         <span class='sr-only'></span><span class='icon-bar'></span><span class='icon-bar'></span><span class='icon-bar'></span>
                                     </button><a class='navbar-brand' href='../index.html'>
                                         <i class='fa fa-star'></i> Apache Groovy
                                     </a>
                                 </div><div class='navbar-collapse collapse'>
                                     <ul class='nav navbar-nav navbar-right'>
                                         <li class=''><a href='https://groovy-lang.org/learn.html'>Learn</a></li><li class=''><a href='https://groovy-lang.org/documentation.html'>Documentation</a></li><li class=''><a href='/download.html'>Download</a></li><li class=''><a href='https://groovy-lang.org/support.html'>Support</a></li><li class=''><a href='/'>Contribute</a></li><li class=''><a href='https://groovy-lang.org/ecosystem.html'>Ecosystem</a></li><li class=''><a href='/blog'>Blog posts</a></li><li class=''><a href='https://groovy.apache.org/events.html'></a></li><li>
                                             <a data-effect='st-effect-9' class='st-trigger' href='#'>Socialize</a>
                                         </li><li class=''>
                                             <a href='../search.html'>
                                                 <i class='fa fa-search'></i>
                                             </a>
                                         </li>
                                     </ul>
                                 </div>
                             </div>
                         </div><div id='content' class='page-1'><div class='row'><div class='row-fluid'><div class='col-lg-3'><ul class='nav-sidebar'><li><a href='./'>Blog index</a></li><li class='active'><a href='#doc'>Using Groovy with Apache Wayang and Apache Spark</a></li><li><a href='#_whiskey_clustering' class='anchor-link'>Whiskey Clustering</a></li><li><a href='#_implementation_details' class='anchor-link'>Implementation Details</a></li><li><a href='#_running_with_the_java_streams_backed_platform' class='anchor-link'>Running with the Java streams-backed platform</a></li><li><a href='#_running_with_apache_spark' class='anchor-link'>Running with Apache Spark</a></li><li><a href='#_discussion' class='anchor-link'>Discussion</a></li><li><a href='#_conclusion' class='anchor-link'>Conclusion</a></li><li><a href='#_more_information' class='anchor-link'>More Information</a></li></ul><br/><ul class='nav-sidebar'><li style='padding: 0.35em 0.625em; background-color: #eee'><span>Related posts</span></li><li><a href='./whiskey-clustering-with-groovy-and'>Whiskey Clustering with Groovy and Apache Ignite</a></li><li><a href='./reading-and-writing-csv-files'>Reading and Writing CSV files with Groovy</a></li><li><a href='./fruity-eclipse-collections'>Fruity Eclipse Collections</a></li><li><a href='./groovy-records'>Groovy Records</a></li><li><a href='./deep-learning-and-eclipse-collections'>Deep Learning and Eclipse Collections</a></li><li><a href='./matrix-calculations-with-groovy-apache'>Matrix calculations with Groovy, Apache Commons Math, ojAlgo, Nd4j and EJML</a></li><li><a href='./groovy-record-performance'>Groovy Record Performance</a></li><li><a href='./comparators-and-sorting-in-groovy'>Comparators and Sorting in Groovy</a></li><li><a href='./deck-of-cards-with-groovy'>Deck of cards with Groovy, JDK collections and Eclipse Collections</a></li><li><a href='./detecting-objects-with-groovy-the'>Detecting objects with Groovy, the Deep Java Library (DJL), and Apache MXNet</a></li><li><a href='./classifying-iris-flowers-with-deep'>Classifying Iris Flowers with Deep Learning, Groovy and GraalVM</a></li></ul></div><div class='col-lg-8 col-lg-pull-0'><a name='doc'></a><h1>Using Groovy with Apache Wayang and Apache Spark</h1><p><span>Author: <i>Paul King</i></span><br/><span>Published: 2022-06-19 01:01PM</span></p><hr/><div id="preamble">
 <div class="sectionbody">
 <div class="paragraph">
 <p><span class="image right"><img src="https://www.apache.org/logos/res/wayang/default.png" alt="wayang logo" width="100"></span>
 <a href="https://wayang.apache.org/">Apache Wayang</a> (incubating) is an API
 for big data cross-platform processing. It provides an abstraction
 over other platforms like <a href="https://spark.apache.org/">Apache Spark</a>
 and <a href="https://flink.apache.org/">Apache Flink</a> as well as a default
 built-in stream-based "platform". The goal is to provide a
 consistent developer experience when writing code regardless of
 whether a light-weight or highly-scalable platform may eventually
 be required. Execution of the application is specified in a logical
 plan which is again platform agnostic. Wayang will transform the
 logical plan into a set of physical operators to be executed by
 specific underlying processing platforms.</p>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_whiskey_clustering">Whiskey Clustering</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p><span class="image right"><img src="img/groovy_logo.png" alt="groovy logo" width="140"></span>
 We&#8217;ll take a look at using Apache Wayang with Groovy to help us in
 the quest to find the perfect single-malt Scotch whiskey.
 The whiskies produced from
 <a href="https://www.niss.org/sites/default/files/ScotchWhisky01.txt">86 distilleries</a>
 have been ranked by expert tasters according to 12 criteria
 (Body, Sweetness, Malty, Smoky, Fruity, etc.).
 We&#8217;ll use a KMeans algorithm to calculate the centroids.
 This is similar to the
 <a href="https://github.com/apache/incubator-wayang/blob/main/README.md#k-means">KMeans example in the Wayang documentation</a>
 but instead of 2 dimensions (x and y coordinates), we have 12
 dimensions corresponding to our criteria. The main point is that
 it is illustrative of typical data science and machine learning
 algorithms involving iteration (the typical map, filter, reduce
 style of processing).</p>
 </div>
 <div class="paragraph">
 <p><span class="image"><img src="img/whiskey_bottles.jpg" alt="whiskey_bottles"></span></p>
 </div>
 <div class="paragraph">
 <p>KMeans is a standard data-science clustering technique. In our
 case, it groups whiskies with similar characteristics (according
 to the 12 criteria) into clusters. If we have a favourite whiskey,
 chances are we can find something similar by looking at other
 instances in the same cluster. If we are feeling like a change,
 we can look for a whiskey in some other cluster. The centroid
 is the notional "point" in the middle of the cluster. For us,
 it reflects the typical measure of each criteria for a whiskey
 in that cluster.</p>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_implementation_details">Implementation Details</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>We&#8217;ll start with defining a Point record:</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre class="prettyprint highlight"><code data-lang="groovy">record Point(double[] pts) implements Serializable {
     static Point fromLine(String line) { new Point(line.split(',')[2..-1]*.toDouble() as double[]) }
 }</code></pre>
 </div>
 </div>
 <div class="paragraph">
 <p>We&#8217;ve made it <code>Serializable</code> (more on that later) and included
 a <code>fromLine</code> factory method to help us make points from a CSV
 file. We&#8217;ll do that ourselves rather than rely on other libraries
 which could assist. It&#8217;s not a 2D or 3D point for us but 12D
 corresponding to the 12 criteria. We just use a <code>double</code> array,
 so any dimension would be supported but the 12 comes from the
 number of columns in our data file.</p>
 </div>
 <div class="paragraph">
 <p>We&#8217;ll define a related <code>TaggedPointCounter</code> record. It&#8217;s like
 <code>Point</code> but tracks an <code>int</code> cluster id and <code>long</code> count used
 when clustering the points:</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre class="prettyprint highlight"><code data-lang="groovy">record TaggedPointCounter(double[] pts, int cluster, long count) implements Serializable {
     TaggedPointCounter plus(TaggedPointCounter that) {
         new TaggedPointCounter((0..&lt;pts.size()).collect{ pts[it] + that.pts[it] } as double[], cluster, count + that.count)
     }

     TaggedPointCounter average() {
         new TaggedPointCounter(pts.collect{ double d -&gt; d/count } as double[], cluster, 0)
     }
 }</code></pre>
 </div>
 </div>
 <div class="paragraph">
 <p>We have <code>plus</code> and <code>average</code> methods which will be helpful
 later in the map/reduce parts of the algorithm.</p>
 </div>
 <div class="paragraph">
 <p>Another aspect of the KMeans algorithm is assigning points to the
 cluster associated with their nearest centroid. For 2 dimensions,
 recalling pythagoras' theorem, this would be the square root of x
 squared plus y squared, where x and y are the distance of a point
 from the centroid in the x and y dimensions respectively. We&#8217;ll do
 the same across all dimensions and define the following helper
 class to capture this part of the algorithm:</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre class="prettyprint highlight"><code data-lang="groovy">class SelectNearestCentroid implements ExtendedSerializableFunction&lt;Point, TaggedPointCounter&gt; {
     Iterable&lt;TaggedPointCounter&gt; centroids

     void open(ExecutionContext context) {
         centroids = context.getBroadcast("centroids")
     }

     TaggedPointCounter apply(Point p) {
         def minDistance = Double.POSITIVE_INFINITY
         def nearestCentroidId = -1
         for (c in centroids) {
             def distance = sqrt((0..&lt;p.pts.size()).collect{ p.pts[it] - c.pts[it] }.sum{ it ** 2 } as double)
             if (distance &lt; minDistance) {
                 minDistance = distance
                 nearestCentroidId = c.cluster
             }
         }
         new TaggedPointCounter(p.pts, nearestCentroidId, 1)
     }
 }</code></pre>
 </div>
 </div>
 <div class="paragraph">
 <p>In Wayang parlance, the <code>SelectNearestCentroid</code> class is a
 <em>UDF</em>, a User-Defined Function. It represents some chunk of
 functionality where an optimization decision can be made about
 where to run the operation.</p>
 </div>
 <div class="paragraph">
 <p>Once we get to using Spark, the classes in the map/reduce part
 of our algorithm will need to be serializable. Method closures
 in dynamic Groovy aren&#8217;t serializable. We have a few options to
 avoid using them. I&#8217;ll show one approach here which is to use
 some helper classes in places where we might typically use method
 references. Here are the helper classes:</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre class="prettyprint highlight"><code data-lang="groovy">class Cluster implements SerializableFunction&lt;TaggedPointCounter, Integer&gt; {
     Integer apply(TaggedPointCounter tpc) { tpc.cluster() }
 }

 class Average implements SerializableFunction&lt;TaggedPointCounter, TaggedPointCounter&gt; {
     TaggedPointCounter apply(TaggedPointCounter tpc) { tpc.average() }
 }

 class Plus implements SerializableBinaryOperator&lt;TaggedPointCounter&gt; {
     TaggedPointCounter apply(TaggedPointCounter tpc1, TaggedPointCounter tpc2) { tpc1.plus(tpc2) }
 }</code></pre>
 </div>
 </div>
 <div class="paragraph">
 <p>Now we are ready for our KMeans script:</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre class="prettyprint highlight"><code data-lang="groovy">int k = 5
 int iterations = 20

 // read in data from our file
 def url = WhiskeyWayang.classLoader.getResource('whiskey.csv').file
 def pointsData = new File(url).readLines()[1..-1].collect{ Point.fromLine(it) }
 def dims = pointsData[0].pts().size()

 // create some random points as initial centroids
 def r = new Random()
 def initPts = (1..k).collect { (0..&lt;dims).collect { r.nextGaussian() + 2 } as double[] }

 // create planbuilder with Java and Spark enabled
 def configuration = new Configuration()
 def context = new WayangContext(configuration)
     .withPlugin(Java.basicPlugin())
     .withPlugin(Spark.basicPlugin())
 def planBuilder = new JavaPlanBuilder(context, "KMeans ($url, k=$k, iterations=$iterations)")

 def points = planBuilder
     .loadCollection(pointsData).withName('Load points')

 def initialCentroids = planBuilder
     .loadCollection((0..&lt;k).collect{ idx -&gt; new TaggedPointCounter(initPts[idx], idx, 0) })
     .withName("Load random centroids")

 def finalCentroids = initialCentroids
     .repeat(iterations, currentCentroids -&gt;
         points.map(new SelectNearestCentroid())
             .withBroadcast(currentCentroids, "centroids").withName("Find nearest centroid")
             .reduceByKey(new Cluster(), new Plus()).withName("Add up points")
             .map(new Average()).withName("Average points")
             .withOutputClass(TaggedPointCounter)).withName("Loop").collect()

 println 'Centroids:'
 finalCentroids.each { c -&gt;
     println "Cluster$c.cluster: ${c.pts.collect{ sprintf('%.3f', it) }.join(', ')}"
 }</code></pre>
 </div>
 </div>
 <div class="paragraph">
 <p>Here, <code>k</code> is the desired number of clusters, and <code>iterations</code>
 is the number of times to iterate through the KMeans loop.
 The <code>pointsData</code> variable is a list of <code>Point</code> instances loaded
 from our data file. We&#8217;d use the <code>readTextFile</code> method instead
 of <code>loadCollection</code> if our data set was large.
 The <code>initPts</code> variable is some random starting positions for our
 initial centroids. Being random, and given the way the KMeans
 algorithm works, it is possible that some of our clusters may
 have no points assigned.Our algorithm works by assigning,
 at each iteration, all the points to their closest current
 centroid and then calculating the new centroids given those
 assignments. Finally, we output the results.</p>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_running_with_the_java_streams_backed_platform">Running with the Java streams-backed platform</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>As we mentioned earlier, Wayang selects which platform(s) will
 run our application. It has numerous capabilities whereby cost
 functions and load estimators can be used to influence and
 optimize how the application is run. For our simple example,
 it is enough to know that even though we specified Java or
 Spark as options, Wayang knows that for our small data set,
 the Java streams option is the way to go.</p>
 </div>
 <div class="paragraph">
 <p>Since we prime the algorithm with random data, we
 expect the results to be slightly different each time
 the script is run, but here is one output:</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre class="prettyprint highlight"><code data-lang="shell">&gt; Task :WhiskeyWayang:run
 Centroids:
 Cluster0: 2.548, 2.419, 1.613, 0.194, 0.097, 1.871, 1.742, 1.774, 1.677, 1.935, 1.806, 1.613
 Cluster2: 1.464, 2.679, 1.179, 0.321, 0.071, 0.786, 1.429, 0.429, 0.964, 1.643, 1.929, 2.179
 Cluster3: 3.250, 1.500, 3.250, 3.000, 0.500, 0.250, 1.625, 0.375, 1.375, 1.375, 1.250, 0.250
 Cluster4: 1.684, 1.842, 1.211, 0.421, 0.053, 1.316, 0.632, 0.737, 1.895, 2.000, 1.842, 1.737
 ...</code></pre>
 </div>
 </div>
 <div class="paragraph">
 <p>Which, if plotted looks like this:</p>
 </div>
 <div class="paragraph">
 <p><span class="image"><img src="img/whiskey_wayang_kmeans_spiderplot.png" alt="WhiskeyWayang Centroid Spider Plot"></span></p>
 </div>
 <div class="paragraph">
 <p>If you are interested, check out the examples in the repo links
 at the end of this article to see the code for producing this
 centroid spider plot or the Jupyter/BeakerX notebook in this
 project&#8217;s GitHub repo.</p>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_running_with_apache_spark">Running with Apache Spark</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p><span class="image right"><img src="https://www.apache.org/logos/res/spark/default.png" alt="spark logo" width="100"></span>
 Given our small dataset size and no other customization, Wayang
 will choose the Java streams based solution. We could use Wayang
 optimization features to influence which processing platform it
 chooses, but to keep things simple, we&#8217;ll just disable the Java
 streams platform in our configuration by making the following
 change in our code:</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre class="prettyprint highlight"><code data-lang="groovy">...
 def configuration = new Configuration()
 def context = new WayangContext(configuration)
 //    .withPlugin(Java.basicPlugin())                <b class="conum">(1)</b>
     .withPlugin(Spark.basicPlugin())
 def planBuilder = new JavaPlanBuilder(context, "KMeans ($url, k=$k, iterations=$iterations)")
 ...</code></pre>
 </div>
 </div>
 <div class="colist arabic">
 <ol>
 <li>
 <p>Disabled</p>
 </li>
 </ol>
 </div>
 <div class="paragraph">
 <p>Now when we run the application, the output will be something like
 this (a solution similar to before but with 1000+ extra lines of
 Spark and Wayang log information - truncated for presentation purposes):</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre>[main] INFO org.apache.spark.SparkContext - Running Spark version 3.3.0
 [main] INFO org.apache.spark.util.Utils - Successfully started service 'sparkDriver' on port 62081.
 ...
 Centroids:
 Cluster4: 1.414, 2.448, 0.966, 0.138, 0.034, 0.862, 1.000, 0.483, 1.345, 1.690, 2.103, 2.138
 Cluster0: 2.773, 2.455, 1.455, 0.000, 0.000, 1.909, 1.682, 1.955, 2.091, 2.045, 2.136, 1.818
 Cluster1: 1.762, 2.286, 1.571, 0.619, 0.143, 1.714, 1.333, 0.905, 1.190, 1.952, 1.095, 1.524
 Cluster2: 3.250, 1.500, 3.250, 3.000, 0.500, 0.250, 1.625, 0.375, 1.375, 1.375, 1.250, 0.250
 Cluster3: 2.167, 2.000, 2.167, 1.000, 0.333, 0.333, 2.000, 0.833, 0.833, 1.500, 2.333, 1.667
 ...
 [shutdown-hook-0] INFO org.apache.spark.SparkContext - Successfully stopped SparkContext
 [shutdown-hook-0] INFO org.apache.spark.util.ShutdownHookManager - Shutdown hook called</pre>
 </div>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_discussion">Discussion</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>A goal of Apache Wayang is to allow developers to write
 platform-agnostic applications. While this is mostly true,
 the abstractions aren&#8217;t perfect. As an example, if I know I
 am only using the streams-backed platform, I don&#8217;t need to worry
 about making any of my classes serializable (which is a Spark
 requirement). In our example, we could have omitted the
 <code>implements Serializable</code> part of the <code>TaggedPointCounter</code> record,
 and we could have used a method reference
 <code>TaggedPointCounter::average</code> instead of our <code>Average</code>
 helper class. This isn&#8217;t meant to be a criticism of Wayang,
 after all if you want to write cross-platform UDFs, you might
 expect to have to follow some rules. Instead, it is meant to
 just indicate that abstractions often have leaks around the edges.
 Sometimes those leaks can be beneficially used, other times they
 are traps waiting for unknowing developers.</p>
 </div>
 <div class="paragraph">
 <p>To summarise, if using the Java streams-backed platform, you can
 run the application on JDK17 (which uses native records) as well
 as JDK11 and JDK8 (where Groovy provides emulated records).
 Also, we could make numerous simplifications if we desired.
 When using the Spark processing platform, the potential
 simplifications aren&#8217;t applicable, and we can run on JDK8 and
 JDK11 (Spark isn&#8217;t yet supported on JDK17).</p>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_conclusion">Conclusion</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>We have looked at using Apache Wayang to implement a KMeans
 algorithm that runs either backed by the JDK streams capabilities
 or by Apache Spark. The Wayang API hid from us some of the
 complexities of writing code that works on a distributed platform
 and some of the intricacies of dealing with the Spark platform.
 The abstractions aren&#8217;t perfect, but they certainly aren&#8217;t hard to
 use and provide extra protection should we wish to move between
 platforms. As an added bonus, they open up numerous optimization
 possibilities.</p>
 </div>
 <div class="paragraph">
 <p>Apache Wayang is an incubating project at Apache and still has
 work to do before it graduates but lots of work has gone on
 previously (it was previously known as Rheem and was started
 in 2015). Platform-agnostic applications is a holy grail that
 has been desired for many years but is hard to achieve.
 It should be exciting to see how far Apache Wayang progresses
 in achieving this goal.</p>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_more_information">More Information</h2>
 <div class="sectionbody">
 <div class="ulist">
 <ul>
 <li>
 <p>Repo containing the source code: <a href="https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/WhiskeyWayang">WhiskeyWayang</a></p>
 </li>
 <li>
 <p>Repo containing similar examples using a variety of libraries including Apache Commons CSV, Weka, Smile, Tribuo and others: <a href="https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/Whiskey">Whiskey</a></p>
 </li>
 <li>
 <p>A similar example using Apache Spark directly but with a built-in parallelized KMeans from the <code>spark-mllib</code> library rather than a hand-crafted algorithm: <a href="https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/WhiskeySpark">WhiskeySpark</a></p>
 </li>
 <li>
 <p>A similar example using Apache Ignite directly but with a built-in clustered KMeans from the <code>ignite-ml</code> library rather than a hand-crafted algorithm: <a href="https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/WhiskeyIgnite">WhiskeyIgnite</a></p>
 </li>
 </ul>
 </div>
 </div>
 </div></div></div></div></div><footer id='footer'>
                             <div class='row'>
                                 <div class='colset-3-footer'>
                                     <div class='col-1'>
                                         <h1>Groovy</h1><ul>
                                             <li><a href='https://groovy-lang.org/learn.html'>Learn</a></li><li><a href='https://groovy-lang.org/documentation.html'>Documentation</a></li><li><a href='/download.html'>Download</a></li><li><a href='https://groovy-lang.org/support.html'>Support</a></li><li><a href='/'>Contribute</a></li><li><a href='https://groovy-lang.org/ecosystem.html'>Ecosystem</a></li><li><a href='/blog'>Blog posts</a></li><li><a href='https://groovy.apache.org/events.html'></a></li>
                                         </ul>
                                     </div><div class='col-2'>
                                         <h1>About</h1><ul>
                                             <li><a href='https://github.com/apache/groovy'>Source code</a></li><li><a href='https://groovy-lang.org/security.html'>Security</a></li><li><a href='https://groovy-lang.org/learn.html#books'>Books</a></li><li><a href='https://groovy-lang.org/thanks.html'>Thanks</a></li><li><a href='http://www.apache.org/foundation/sponsorship.html'>Sponsorship</a></li><li><a href='https://groovy-lang.org/faq.html'>FAQ</a></li><li><a href='https://groovy-lang.org/search.html'>Search</a></li>
                                         </ul>
                                     </div><div class='col-3'>
                                         <h1>Socialize</h1><ul>
                                             <li><a href='https://groovy-lang.org/mailing-lists.html'>Discuss on the mailing-list</a></li><li><a href='https://twitter.com/ApacheGroovy'>Groovy on Twitter</a></li><li><a href='https://groovy-lang.org/events.html'>Events and conferences</a></li><li><a href='https://github.com/apache/groovy'>Source code on GitHub</a></li><li><a href='https://groovy-lang.org/reporting-issues.html'>Report issues in Jira</a></li><li><a href='http://stackoverflow.com/questions/tagged/groovy'>Stack Overflow questions</a></li><li><a href='http://groovycommunity.com/'>Slack Community</a></li>
                                         </ul>
                                     </div><div class='col-right'>
                                         <p>
                                             The Groovy programming language is supported by the <a href='http://www.apache.org'>Apache Software Foundation</a> and the Groovy community.
                                         </p><div text-align='right'>
                                             <img src='../img/asf_logo.png' title='The Apache Software Foundation' alt='The Apache Software Foundation' style='width:60%'/>
                                         </div><p>Apache&reg; and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.</p>
                                     </div>
                                 </div><div class='clearfix'>&copy; 2003-2023 the Apache Groovy project &mdash; Groovy is Open Source: <a href='http://www.apache.org/licenses/LICENSE-2.0.html' alt='Apache 2 License'>license</a>, <a href='https://privacy.apache.org/policies/privacy-policy-public.html'>privacy policy</a>.</div>
                             </div>
                         </footer></div>
                 </div>
             </div>
         </div>
     </div><script src='../js/vendor/jquery-1.10.2.min.js' defer></script><script src='../js/vendor/classie.js' defer></script><script src='../js/vendor/bootstrap.js' defer></script><script src='../js/vendor/sidebarEffects.js' defer></script><script src='../js/vendor/modernizr-2.6.2.min.js' defer></script><script src='../js/plugins.js' defer></script><script src='https://cdnjs.cloudflare.com/ajax/libs/prettify/r298/prettify.min.js'></script><script>document.addEventListener('DOMContentLoaded',prettyPrint)</script><script>
           (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
           (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
           m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
           })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

           ga('create', 'UA-257558-10', 'auto');
           ga('send', 'pageview');
     </script>
 </body></html>