| <!DOCTYPE html> |
| <!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]--> |
| <!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]--> |
| <!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]--> |
| <!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]--><head> |
| <meta charset='utf-8'/><meta http-equiv='X-UA-Compatible' content='IE=edge'/><meta name='viewport' content='width=device-width, initial-scale=1'/><meta name='keywords' content='centroids, data science, groovy, kmeans, records, apache spark, apache wayang'/><meta name='description' content='This post looks at using Apache Wayang and Apache Spark with Apache Groovy to cluster various Whiskies.'/><title>The Apache Groovy programming language - Blogs - Using Groovy with Apache Wayang and Apache Spark</title><link href='../img/favicon.ico' type='image/x-ico' rel='icon'/><link rel='stylesheet' type='text/css' href='../css/bootstrap.css'/><link rel='stylesheet' type='text/css' href='../css/font-awesome.min.css'/><link rel='stylesheet' type='text/css' href='../css/style.css'/><link rel='stylesheet' type='text/css' href='https://cdnjs.cloudflare.com/ajax/libs/prettify/r298/prettify.min.css'/> |
| </head><body> |
| <div id='fork-me'> |
| <a href='https://github.com/apache/groovy'> |
| <img style='position: fixed; top: 20px; right: -58px; border: 0; z-index: 100; transform: rotate(45deg);' src='/img/horizontal-github-ribbon.png'/> |
| </a> |
| </div><div id='st-container' class='st-container st-effect-9'> |
| <nav class='st-menu st-effect-9' id='menu-12'> |
| <h2 class='icon icon-lab'>Socialize</h2><ul> |
| <li> |
| <a href='https://groovy-lang.org/mailing-lists.html' class='icon'><span class='fa fa-envelope'></span> Discuss on the mailing-list</a> |
| </li><li> |
| <a href='https://twitter.com/ApacheGroovy' class='icon'><span class='fa fa-twitter'></span> Groovy on Twitter</a> |
| </li><li> |
| <a href='https://groovy-lang.org/events.html' class='icon'><span class='fa fa-calendar'></span> Events and conferences</a> |
| </li><li> |
| <a href='https://github.com/apache/groovy' class='icon'><span class='fa fa-github'></span> Source code on GitHub</a> |
| </li><li> |
| <a href='https://groovy-lang.org/reporting-issues.html' class='icon'><span class='fa fa-bug'></span> Report issues in Jira</a> |
| </li><li> |
| <a href='http://stackoverflow.com/questions/tagged/groovy' class='icon'><span class='fa fa-stack-overflow'></span> Stack Overflow questions</a> |
| </li><li> |
| <a href='http://groovycommunity.com/' class='icon'><span class='fa fa-slack'></span> Slack Community</a> |
| </li> |
| </ul> |
| </nav><div class='st-pusher'> |
| <div class='st-content'> |
| <div class='st-content-inner'> |
| <!--[if lt IE 7]> |
| <p class="browsehappy">You are using an <strong>outdated</strong> browser. Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your experience.</p> |
| <![endif]--><div><div class='navbar navbar-default navbar-static-top' role='navigation'> |
| <div class='container'> |
| <div class='navbar-header'> |
| <button type='button' class='navbar-toggle' data-toggle='collapse' data-target='.navbar-collapse'> |
| <span class='sr-only'></span><span class='icon-bar'></span><span class='icon-bar'></span><span class='icon-bar'></span> |
| </button><a class='navbar-brand' href='../index.html'> |
| <i class='fa fa-star'></i> Apache Groovy |
| </a> |
| </div><div class='navbar-collapse collapse'> |
| <ul class='nav navbar-nav navbar-right'> |
| <li class=''><a href='https://groovy-lang.org/learn.html'>Learn</a></li><li class=''><a href='https://groovy-lang.org/documentation.html'>Documentation</a></li><li class=''><a href='/download.html'>Download</a></li><li class=''><a href='https://groovy-lang.org/support.html'>Support</a></li><li class=''><a href='/'>Contribute</a></li><li class=''><a href='https://groovy-lang.org/ecosystem.html'>Ecosystem</a></li><li class=''><a href='/blog'>Blog posts</a></li><li class=''><a href='https://groovy.apache.org/events.html'></a></li><li> |
| <a data-effect='st-effect-9' class='st-trigger' href='#'>Socialize</a> |
| </li><li class=''> |
| <a href='../search.html'> |
| <i class='fa fa-search'></i> |
| </a> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div><div id='content' class='page-1'><div class='row'><div class='row-fluid'><div class='col-lg-3'><ul class='nav-sidebar'><li><a href='./'>Blog index</a></li><li class='active'><a href='#doc'>Using Groovy with Apache Wayang and Apache Spark</a></li><li><a href='#_whiskey_clustering' class='anchor-link'>Whiskey Clustering</a></li><li><a href='#_implementation_details' class='anchor-link'>Implementation Details</a></li><li><a href='#_running_with_the_java_streams_backed_platform' class='anchor-link'>Running with the Java streams-backed platform</a></li><li><a href='#_running_with_apache_spark' class='anchor-link'>Running with Apache Spark</a></li><li><a href='#_discussion' class='anchor-link'>Discussion</a></li><li><a href='#_conclusion' class='anchor-link'>Conclusion</a></li><li><a href='#_more_information' class='anchor-link'>More Information</a></li></ul><br/><ul class='nav-sidebar'><li style='padding: 0.35em 0.625em; background-color: #eee'><span>Related posts</span></li><li><a href='./whiskey-clustering-with-groovy-and'>Whiskey Clustering with Groovy and Apache Ignite</a></li><li><a href='./reading-and-writing-csv-files'>Reading and Writing CSV files with Groovy</a></li><li><a href='./fruity-eclipse-collections'>Fruity Eclipse Collections</a></li><li><a href='./groovy-records'>Groovy Records</a></li><li><a href='./deep-learning-and-eclipse-collections'>Deep Learning and Eclipse Collections</a></li><li><a href='./matrix-calculations-with-groovy-apache'>Matrix calculations with Groovy, Apache Commons Math, ojAlgo, Nd4j and EJML</a></li><li><a href='./groovy-record-performance'>Groovy Record Performance</a></li><li><a href='./comparators-and-sorting-in-groovy'>Comparators and Sorting in Groovy</a></li><li><a href='./deck-of-cards-with-groovy'>Deck of cards with Groovy, JDK collections and Eclipse Collections</a></li><li><a href='./detecting-objects-with-groovy-the'>Detecting objects with Groovy, the Deep Java Library (DJL), and Apache MXNet</a></li><li><a 
href='./classifying-iris-flowers-with-deep'>Classifying Iris Flowers with Deep Learning, Groovy and GraalVM</a></li></ul></div><div class='col-lg-8 col-lg-pull-0'><a name='doc'></a><h1>Using Groovy with Apache Wayang and Apache Spark</h1><p><span>Author: <i>Paul King</i></span><br/><span>Published: 2022-06-19 01:01PM</span></p><hr/><div id="preamble"> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p><span class="image right"><img src="https://www.apache.org/logos/res/wayang/default.png" alt="wayang logo" width="100"></span> |
| <a href="https://wayang.apache.org/">Apache Wayang</a> (incubating) is an API |
| for big data cross-platform processing. It provides an abstraction |
| over other platforms like <a href="https://spark.apache.org/">Apache Spark</a> |
| and <a href="https://flink.apache.org/">Apache Flink</a> as well as a default |
| built-in stream-based "platform". The goal is to provide a |
| consistent developer experience when writing code regardless of |
| whether a light-weight or highly-scalable platform may eventually |
| be required. An application is specified as a logical |
| plan, which is itself platform agnostic. Wayang will transform the |
| logical plan into a set of physical operators to be executed by |
| specific underlying processing platforms.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_whiskey_clustering">Whiskey Clustering</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p><span class="image right"><img src="img/groovy_logo.png" alt="groovy logo" width="140"></span> |
| We’ll take a look at using Apache Wayang with Groovy to help us in |
| the quest to find the perfect single-malt Scotch whiskey. |
| The whiskies produced from |
| <a href="https://www.niss.org/sites/default/files/ScotchWhisky01.txt">86 distilleries</a> |
| have been ranked by expert tasters according to 12 criteria |
| (Body, Sweetness, Malty, Smoky, Fruity, etc.). |
| We’ll use a KMeans algorithm to calculate the centroids. |
| This is similar to the |
| <a href="https://github.com/apache/incubator-wayang/blob/main/README.md#k-means">KMeans example in the Wayang documentation</a> |
| but instead of 2 dimensions (x and y coordinates), we have 12 |
| dimensions corresponding to our criteria. The main point is that |
| it is illustrative of typical data science and machine learning |
| algorithms involving iteration (the typical map, filter, reduce |
| style of processing).</p> |
| </div> |
| <div class="paragraph"> |
| <p><span class="image"><img src="img/whiskey_bottles.jpg" alt="whiskey_bottles"></span></p> |
| </div> |
| <div class="paragraph"> |
| <p>KMeans is a standard data-science clustering technique. In our |
| case, it groups whiskies with similar characteristics (according |
| to the 12 criteria) into clusters. If we have a favourite whiskey, |
| chances are we can find something similar by looking at other |
| instances in the same cluster. If we are feeling like a change, |
| we can look for a whiskey in some other cluster. The centroid |
| is the notional "point" in the middle of the cluster. For us, |
| it reflects the typical score for each criterion for a whiskey |
| in that cluster.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_implementation_details">Implementation Details</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>We’ll start with defining a Point record:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="prettyprint highlight"><code data-lang="groovy">record Point(double[] pts) implements Serializable { |
| static Point fromLine(String line) { new Point(line.split(',')[2..-1]*.toDouble() as double[]) } |
| }</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>We’ve made it <code>Serializable</code> (more on that later) and included |
| a <code>fromLine</code> factory method to help us make points from a CSV |
| file. We’ll do that ourselves rather than rely on other libraries |
| which could assist. It’s not a 2D or 3D point for us but 12D |
| corresponding to the 12 criteria. We just use a <code>double</code> array, |
| so any dimension would be supported but the 12 comes from the |
| number of columns in our data file.</p> |
| </div> |
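| <div class="paragraph"> |
| <p>As a rough illustration of what <code>fromLine</code> does, here is the same |
| slicing applied to a made-up row. The row contents and the |
| <code>SampleDistillery</code> name are assumptions for illustration only; the |
| sketch simply assumes the first two columns are identifiers to be skipped:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="prettyprint highlight"><code data-lang="groovy">// hypothetical row: an id, a name, then 12 scores (not taken from the real data file) |
| def line = '1,SampleDistillery,2,2,2,0,0,2,1,2,2,2,2,2' |
| def fields = line.split(',') |
| // [2..-1] keeps everything from the third column to the end |
| def scores = fields[2..-1]*.toDouble() as double[] |
| assert scores.size() == 12 |
| assert scores[0] == 2.0d |
| assert scores[3] == 0.0d</code></pre> |
| </div> |
| </div> |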
| <div class="paragraph"> |
| <p>We’ll define a related <code>TaggedPointCounter</code> record. It’s like |
| <code>Point</code> but tracks an <code>int</code> cluster id and <code>long</code> count used |
| when clustering the points:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="prettyprint highlight"><code data-lang="groovy">record TaggedPointCounter(double[] pts, int cluster, long count) implements Serializable { |
| TaggedPointCounter plus(TaggedPointCounter that) { |
| new TaggedPointCounter((0..<pts.size()).collect{ pts[it] + that.pts[it] } as double[], cluster, count + that.count) |
| } |
| |
| TaggedPointCounter average() { |
| new TaggedPointCounter(pts.collect{ double d -> d/count } as double[], cluster, 0) |
| } |
| }</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>We have <code>plus</code> and <code>average</code> methods which will be helpful |
| later in the map/reduce parts of the algorithm.</p> |
| </div> |
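| <div class="paragraph"> |
| <p>To see why adding and then averaging yields a centroid, here is a |
| minimal sketch that uses plain lists rather than the record. The two 2D |
| points are made-up values; the idea is only that summing the members of |
| a cluster dimension by dimension and dividing by the member count gives |
| the mean point:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="prettyprint highlight"><code data-lang="groovy">// two made-up 2D points assigned to the same cluster |
| def members = [[1.0d, 2.0d], [3.0d, 6.0d]] |
| // "plus": add the points dimension by dimension |
| def sums = members.transpose()*.sum() |
| // "average": divide each sum by the number of points |
| def centroid = sums.collect { it / members.size() } |
| assert sums == [4.0d, 8.0d] |
| assert centroid == [2.0d, 4.0d]</code></pre> |
| </div> |
| </div> |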
| <div class="paragraph"> |
| <p>Another aspect of the KMeans algorithm is assigning points to the |
| cluster associated with their nearest centroid. For 2 dimensions, |
| recalling Pythagoras' theorem, the distance would be the square root of x |
| squared plus y squared, where x and y are the distances of a point |
| from the centroid in the x and y dimensions respectively. We’ll do |
| the same across all dimensions and define the following helper |
| class to capture this part of the algorithm:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
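| <pre class="prettyprint highlight"><code data-lang="groovy">import org.apache.wayang.core.function.ExecutionContext |
| import org.apache.wayang.core.function.FunctionDescriptor.ExtendedSerializableFunction |
| import static java.lang.Math.sqrt |
| |
| class SelectNearestCentroid implements ExtendedSerializableFunction<Point, TaggedPointCounter> { |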
| Iterable<TaggedPointCounter> centroids |
| |
| void open(ExecutionContext context) { |
| centroids = context.getBroadcast("centroids") |
| } |
| |
| TaggedPointCounter apply(Point p) { |
| def minDistance = Double.POSITIVE_INFINITY |
| def nearestCentroidId = -1 |
| for (c in centroids) { |
| def distance = sqrt((0..<p.pts.size()).collect{ p.pts[it] - c.pts[it] }.sum{ it ** 2 } as double) |
| if (distance < minDistance) { |
| minDistance = distance |
| nearestCentroidId = c.cluster |
| } |
| } |
| new TaggedPointCounter(p.pts, nearestCentroidId, 1) |
| } |
| }</code></pre> |
| </div> |
| </div> |
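| <div class="paragraph"> |
| <p>The distance calculation inside <code>apply</code> can be checked in |
| isolation. The following sketch uses plain lists and a made-up 3D point |
| and centroid (a 3-4-5 right triangle), so the values are illustrative |
| assumptions rather than whiskey data:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="prettyprint highlight"><code data-lang="groovy">// made-up 3D point and centroid forming a 3-4-5 right triangle |
| def p = [0.0d, 3.0d, 0.0d] |
| def c = [4.0d, 0.0d, 0.0d] |
| // square the per-dimension differences, add them up, take the square root |
| def squares = [p, c].transpose().collect { pair -> (pair[0] - pair[1]) * (pair[0] - pair[1]) } |
| def distance = Math.sqrt(squares.sum()) |
| assert distance == 5.0d</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Since only the ordering of the distances matters when picking the |
| nearest centroid, comparing the squared distances would select the same |
| centroid.</p> |
| </div> |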
| <div class="paragraph"> |
| <p>In Wayang parlance, the <code>SelectNearestCentroid</code> class is a |
| <em>UDF</em>, a User-Defined Function. It represents some chunk of |
| functionality where an optimization decision can be made about |
| where to run the operation.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Once we get to using Spark, the classes in the map/reduce part |
| of our algorithm will need to be serializable. Method closures |
| in dynamic Groovy aren’t serializable. We have a few options to |
| avoid using them. I’ll show one approach here which is to use |
| some helper classes in places where we might typically use method |
| references. Here are the helper classes:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="prettyprint highlight"><code data-lang="groovy">class Cluster implements SerializableFunction<TaggedPointCounter, Integer> { |
| Integer apply(TaggedPointCounter tpc) { tpc.cluster() } |
| } |
| |
| class Average implements SerializableFunction<TaggedPointCounter, TaggedPointCounter> { |
| TaggedPointCounter apply(TaggedPointCounter tpc) { tpc.average() } |
| } |
| |
| class Plus implements SerializableBinaryOperator<TaggedPointCounter> { |
| TaggedPointCounter apply(TaggedPointCounter tpc1, TaggedPointCounter tpc2) { tpc1.plus(tpc2) } |
| }</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Now we are ready for our KMeans script:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="prettyprint highlight"><code data-lang="groovy">int k = 5 |
| int iterations = 20 |
| |
| // read in data from our file |
| def url = WhiskeyWayang.classLoader.getResource('whiskey.csv').file |
| def pointsData = new File(url).readLines()[1..-1].collect{ Point.fromLine(it) } |
| def dims = pointsData[0].pts().size() |
| |
| // create some random points as initial centroids |
| def r = new Random() |
| def initPts = (1..k).collect { (0..<dims).collect { r.nextGaussian() + 2 } as double[] } |
| |
| // create planbuilder with Java and Spark enabled |
| def configuration = new Configuration() |
| def context = new WayangContext(configuration) |
| .withPlugin(Java.basicPlugin()) |
| .withPlugin(Spark.basicPlugin()) |
| def planBuilder = new JavaPlanBuilder(context, "KMeans ($url, k=$k, iterations=$iterations)") |
| |
| def points = planBuilder |
| .loadCollection(pointsData).withName('Load points') |
| |
| def initialCentroids = planBuilder |
| .loadCollection((0..<k).collect{ idx -> new TaggedPointCounter(initPts[idx], idx, 0) }) |
| .withName("Load random centroids") |
| |
| def finalCentroids = initialCentroids |
| .repeat(iterations, currentCentroids -> |
| points.map(new SelectNearestCentroid()) |
| .withBroadcast(currentCentroids, "centroids").withName("Find nearest centroid") |
| .reduceByKey(new Cluster(), new Plus()).withName("Add up points") |
| .map(new Average()).withName("Average points") |
| .withOutputClass(TaggedPointCounter)).withName("Loop").collect() |
| |
| println 'Centroids:' |
| finalCentroids.each { c -> |
| println "Cluster$c.cluster: ${c.pts.collect{ sprintf('%.3f', it) }.join(', ')}" |
| }</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Here, <code>k</code> is the desired number of clusters, and <code>iterations</code> |
| is the number of times to iterate through the KMeans loop. |
| The <code>pointsData</code> variable is a list of <code>Point</code> instances loaded |
| from our data file. We’d use the <code>readTextFile</code> method instead |
| of <code>loadCollection</code> if our data set were large. |
| The <code>initPts</code> variable holds random starting positions for our |
| initial centroids. Being random, and given the way the KMeans |
| algorithm works, it is possible that some of our clusters may |
| have no points assigned. Our algorithm works by assigning, |
| at each iteration, all the points to their closest current |
| centroid and then calculating the new centroids given those |
| assignments. Finally, we output the results.</p> |
| </div> |
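| <div class="paragraph"> |
| <p>The assign-then-recalculate step can be sketched without Wayang using |
| ordinary Groovy collections. This is only an illustration of a single |
| iteration under simplifying assumptions: the data is one-dimensional, and |
| the points and starting centroids are made-up values:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="prettyprint highlight"><code data-lang="groovy">// made-up 1D points and starting centroids keyed by cluster id |
| def points = [1.0d, 2.0d, 9.0d, 11.0d] |
| def centroids = [0: 0.0d, 1: 10.0d, 2: 100.0d] |
| // assign: group each point under the id of its nearest centroid |
| def assigned = points.groupBy { p -> centroids.min { Math.abs(p - it.value) }.key } |
| // update: each new centroid is the mean of the points assigned to it |
| def updated = assigned.collectEntries { id, members -> [id, members.sum() / members.size()] } |
| assert updated == [0: 1.5d, 1: 10.0d]</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>In this sketch, centroid 2 attracts no points and so drops out of the |
| result, which is the empty-cluster situation described above.</p> |
| </div> |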
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_running_with_the_java_streams_backed_platform">Running with the Java streams-backed platform</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>As we mentioned earlier, Wayang selects which platform(s) will |
| run our application. It has numerous capabilities whereby cost |
| functions and load estimators can be used to influence and |
| optimize how the application is run. For our simple example, |
| it is enough to know that even though we specified Java or |
| Spark as options, Wayang knows that for our small data set, |
| the Java streams option is the way to go.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Since we prime the algorithm with random data, we |
| expect the results to be slightly different each time |
| the script is run, but here is one output:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="prettyprint highlight"><code data-lang="shell">> Task :WhiskeyWayang:run |
| Centroids: |
| Cluster0: 2.548, 2.419, 1.613, 0.194, 0.097, 1.871, 1.742, 1.774, 1.677, 1.935, 1.806, 1.613 |
| Cluster2: 1.464, 2.679, 1.179, 0.321, 0.071, 0.786, 1.429, 0.429, 0.964, 1.643, 1.929, 2.179 |
| Cluster3: 3.250, 1.500, 3.250, 3.000, 0.500, 0.250, 1.625, 0.375, 1.375, 1.375, 1.250, 0.250 |
| Cluster4: 1.684, 1.842, 1.211, 0.421, 0.053, 1.316, 0.632, 0.737, 1.895, 2.000, 1.842, 1.737 |
| ...</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Plotted, these centroids look like this:</p> |
| </div> |
| <div class="paragraph"> |
| <p><span class="image"><img src="img/whiskey_wayang_kmeans_spiderplot.png" alt="WhiskeyWayang Centroid Spider Plot"></span></p> |
| </div> |
| <div class="paragraph"> |
| <p>If you are interested, the code for producing this centroid |
| spider plot is in the examples linked at the end of this article, |
| and there is also a Jupyter/BeakerX notebook in the project’s |
| GitHub repo.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_running_with_apache_spark">Running with Apache Spark</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p><span class="image right"><img src="https://www.apache.org/logos/res/spark/default.png" alt="spark logo" width="100"></span> |
| Given our small dataset size and no other customization, Wayang |
| will choose the Java streams based solution. We could use Wayang |
| optimization features to influence which processing platform it |
| chooses, but to keep things simple, we’ll just disable the Java |
| streams platform in our configuration by making the following |
| change in our code:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="prettyprint highlight"><code data-lang="groovy">... |
| def configuration = new Configuration() |
| def context = new WayangContext(configuration) |
| // .withPlugin(Java.basicPlugin()) <b class="conum">(1)</b> |
| .withPlugin(Spark.basicPlugin()) |
| def planBuilder = new JavaPlanBuilder(context, "KMeans ($url, k=$k, iterations=$iterations)") |
| ...</code></pre> |
| </div> |
| </div> |
| <div class="colist arabic"> |
| <ol> |
| <li> |
| <p>Disabled</p> |
| </li> |
| </ol> |
| </div> |
| <div class="paragraph"> |
| <p>Now when we run the application, the output will be something like |
| this (a solution similar to before but with 1000+ extra lines of |
| Spark and Wayang log information - truncated for presentation purposes):</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre>[main] INFO org.apache.spark.SparkContext - Running Spark version 3.3.0 |
| [main] INFO org.apache.spark.util.Utils - Successfully started service 'sparkDriver' on port 62081. |
| ... |
| Centroids: |
| Cluster4: 1.414, 2.448, 0.966, 0.138, 0.034, 0.862, 1.000, 0.483, 1.345, 1.690, 2.103, 2.138 |
| Cluster0: 2.773, 2.455, 1.455, 0.000, 0.000, 1.909, 1.682, 1.955, 2.091, 2.045, 2.136, 1.818 |
| Cluster1: 1.762, 2.286, 1.571, 0.619, 0.143, 1.714, 1.333, 0.905, 1.190, 1.952, 1.095, 1.524 |
| Cluster2: 3.250, 1.500, 3.250, 3.000, 0.500, 0.250, 1.625, 0.375, 1.375, 1.375, 1.250, 0.250 |
| Cluster3: 2.167, 2.000, 2.167, 1.000, 0.333, 0.333, 2.000, 0.833, 0.833, 1.500, 2.333, 1.667 |
| ... |
| [shutdown-hook-0] INFO org.apache.spark.SparkContext - Successfully stopped SparkContext |
| [shutdown-hook-0] INFO org.apache.spark.util.ShutdownHookManager - Shutdown hook called</pre> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_discussion">Discussion</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>A goal of Apache Wayang is to allow developers to write |
| platform-agnostic applications. While this is mostly true, |
| the abstractions aren’t perfect. As an example, if I know I |
| am only using the streams-backed platform, I don’t need to worry |
| about making any of my classes serializable (which is a Spark |
| requirement). In our example, we could have omitted the |
| <code>implements Serializable</code> part of the <code>TaggedPointCounter</code> record, |
| and we could have used a method reference |
| <code>TaggedPointCounter::average</code> instead of our <code>Average</code> |
| helper class. This isn’t meant to be a criticism of Wayang; |
| after all, if you want to write cross-platform UDFs, you might |
| expect to have to follow some rules. Instead, it is just meant to |
| indicate that abstractions often have leaks around the edges. |
| Sometimes those leaks can be beneficially used, other times they |
| are traps waiting for unknowing developers.</p> |
| </div> |
| <div class="paragraph"> |
| <p>To summarise, if using the Java streams-backed platform, you can |
| run the application on JDK17 (which uses native records) as well |
| as JDK11 and JDK8 (where Groovy provides emulated records). |
| Also, we could make numerous simplifications if we desired. |
| When using the Spark processing platform, the potential |
| simplifications aren’t applicable, and we can run on JDK8 and |
| JDK11 (Spark isn’t yet supported on JDK17).</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_conclusion">Conclusion</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>We have looked at using Apache Wayang to implement a KMeans |
| algorithm that runs either backed by the JDK streams capabilities |
| or by Apache Spark. The Wayang API hid from us some of the |
| complexities of writing code that works on a distributed platform |
| and some of the intricacies of dealing with the Spark platform. |
| The abstractions aren’t perfect, but they certainly aren’t hard to |
| use and provide extra protection should we wish to move between |
| platforms. As an added bonus, they open up numerous optimization |
| possibilities.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Apache Wayang is an incubating project at Apache and still has |
| work to do before it graduates, but a lot of work has already gone |
| into it (it was previously known as Rheem and was started |
| in 2015). Platform-agnostic applications are a holy grail that |
| has been desired for many years but is hard to achieve. |
| It should be exciting to see how far Apache Wayang progresses |
| in achieving this goal.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_more_information">More Information</h2> |
| <div class="sectionbody"> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Repo containing the source code: <a href="https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/WhiskeyWayang">WhiskeyWayang</a></p> |
| </li> |
| <li> |
| <p>Repo containing similar examples using a variety of libraries including Apache Commons CSV, Weka, Smile, Tribuo and others: <a href="https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/Whiskey">Whiskey</a></p> |
| </li> |
| <li> |
| <p>A similar example using Apache Spark directly but with a built-in parallelized KMeans from the <code>spark-mllib</code> library rather than a hand-crafted algorithm: <a href="https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/WhiskeySpark">WhiskeySpark</a></p> |
| </li> |
| <li> |
| <p>A similar example using Apache Ignite directly but with a built-in clustered KMeans from the <code>ignite-ml</code> library rather than a hand-crafted algorithm: <a href="https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/WhiskeyIgnite">WhiskeyIgnite</a></p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div></div></div></div></div><footer id='footer'> |
| <div class='row'> |
| <div class='colset-3-footer'> |
| <div class='col-1'> |
| <h1>Groovy</h1><ul> |
| <li><a href='https://groovy-lang.org/learn.html'>Learn</a></li><li><a href='https://groovy-lang.org/documentation.html'>Documentation</a></li><li><a href='/download.html'>Download</a></li><li><a href='https://groovy-lang.org/support.html'>Support</a></li><li><a href='/'>Contribute</a></li><li><a href='https://groovy-lang.org/ecosystem.html'>Ecosystem</a></li><li><a href='/blog'>Blog posts</a></li><li><a href='https://groovy.apache.org/events.html'></a></li> |
| </ul> |
| </div><div class='col-2'> |
| <h1>About</h1><ul> |
| <li><a href='https://github.com/apache/groovy'>Source code</a></li><li><a href='https://groovy-lang.org/security.html'>Security</a></li><li><a href='https://groovy-lang.org/learn.html#books'>Books</a></li><li><a href='https://groovy-lang.org/thanks.html'>Thanks</a></li><li><a href='http://www.apache.org/foundation/sponsorship.html'>Sponsorship</a></li><li><a href='https://groovy-lang.org/faq.html'>FAQ</a></li><li><a href='https://groovy-lang.org/search.html'>Search</a></li> |
| </ul> |
| </div><div class='col-3'> |
| <h1>Socialize</h1><ul> |
| <li><a href='https://groovy-lang.org/mailing-lists.html'>Discuss on the mailing-list</a></li><li><a href='https://twitter.com/ApacheGroovy'>Groovy on Twitter</a></li><li><a href='https://groovy-lang.org/events.html'>Events and conferences</a></li><li><a href='https://github.com/apache/groovy'>Source code on GitHub</a></li><li><a href='https://groovy-lang.org/reporting-issues.html'>Report issues in Jira</a></li><li><a href='http://stackoverflow.com/questions/tagged/groovy'>Stack Overflow questions</a></li><li><a href='http://groovycommunity.com/'>Slack Community</a></li> |
| </ul> |
| </div><div class='col-right'> |
| <p> |
| The Groovy programming language is supported by the <a href='http://www.apache.org'>Apache Software Foundation</a> and the Groovy community. |
| </p><div text-align='right'> |
| <img src='../img/asf_logo.png' title='The Apache Software Foundation' alt='The Apache Software Foundation' style='width:60%'/> |
| </div><p>Apache® and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.</p> |
| </div> |
| </div><div class='clearfix'>© 2003-2023 the Apache Groovy project — Groovy is Open Source: <a href='http://www.apache.org/licenses/LICENSE-2.0.html' alt='Apache 2 License'>license</a>, <a href='https://privacy.apache.org/policies/privacy-policy-public.html'>privacy policy</a>.</div> |
| </div> |
| </footer></div> |
| </div> |
| </div> |
| </div> |
| </div><script src='../js/vendor/jquery-1.10.2.min.js' defer></script><script src='../js/vendor/classie.js' defer></script><script src='../js/vendor/bootstrap.js' defer></script><script src='../js/vendor/sidebarEffects.js' defer></script><script src='../js/vendor/modernizr-2.6.2.min.js' defer></script><script src='../js/plugins.js' defer></script><script src='https://cdnjs.cloudflare.com/ajax/libs/prettify/r298/prettify.min.js'></script><script>document.addEventListener('DOMContentLoaded',prettyPrint)</script><script> |
| (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ |
| (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), |
| m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) |
| })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); |
| |
| ga('create', 'UA-257558-10', 'auto'); |
| ga('send', 'pageview'); |
| </script> |
| </body></html> |