<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-US"><generator uri="https://jekyllrb.com/" version="3.9.2">Jekyll</generator><link href="http://nemo.apache.org//feed.xml" rel="self" type="application/atom+xml" /><link href="http://nemo.apache.org//" rel="alternate" type="text/html" hreflang="en-US" /><updated>2022-09-10T00:38:05+09:00</updated><id>http://nemo.apache.org//feed.xml</id><title type="html">Nemo</title><subtitle>A Data Processing System for Flexible Employment With Different Deployment Characteristics.
</subtitle><entry><title type="html">Nemo Release 0.4</title><link href="http://nemo.apache.org//blog/2022/09/10/release-note-0.4/" rel="alternate" type="text/html" title="Nemo Release 0.4" /><published>2022-09-10T00:00:00+09:00</published><updated>2022-09-10T00:00:00+09:00</updated><id>http://nemo.apache.org//blog/2022/09/10/release-note-0.4</id><content type="html" xml:base="http://nemo.apache.org//blog/2022/09/10/release-note-0.4/">&lt;p&gt;Release Notes - Apache Nemo - Version 0.4&lt;/p&gt;

&lt;h3 id=&quot;heres-a-brief-summary-about-the-release&quot;&gt;Here’s a brief summary of the release:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;This release fixes a few high-priority bugs in 0.3 and includes a variety
of smaller fixes.&lt;/li&gt;
  &lt;li&gt;The release also includes several major features for
stream processing.&lt;/li&gt;
  &lt;li&gt;An installation script for Nemo prerequisites, making setup easier for new users&lt;/li&gt;
  &lt;li&gt;Cleanup of unused stream metrics&lt;/li&gt;
  &lt;li&gt;Addition of critical stream metrics (e.g., latency, throughput)&lt;/li&gt;
  &lt;li&gt;Addition of example stream applications&lt;/li&gt;
  &lt;li&gt;Scripts for network profiling within the Nemo cluster&lt;/li&gt;
  &lt;li&gt;Library version bumps to address security risks and keep libraries
up to date with new features&lt;/li&gt;
  &lt;li&gt;A fix for the Heisenbug in communication through stream pipe channels&lt;/li&gt;
  &lt;li&gt;CI upgrades to new platforms&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-398&quot;&gt;NEMO-398&lt;/a&gt;] -         ExecutorRepresenter interface and LambdaExecutorRepresenter 
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-299&quot;&gt;NEMO-299&lt;/a&gt;] -         WindowedWordCountITCase Hangs (Heisenbug)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-460&quot;&gt;NEMO-460&lt;/a&gt;] -         Setting coders in CombinePerKey transformation
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-481&quot;&gt;NEMO-481&lt;/a&gt;] -         Add Stream examples
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-394&quot;&gt;NEMO-394&lt;/a&gt;] -         Exchange data via shared memory when two tasks are in the same Executor in streaming
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-429&quot;&gt;NEMO-429&lt;/a&gt;] -         Dec 2nd, 2019 Code Session
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-434&quot;&gt;NEMO-434&lt;/a&gt;] -         Logical DAG modification for Dynamic sampling of task metrics during the execution of a stage
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-436&quot;&gt;NEMO-436&lt;/a&gt;] -         Dynamic re-configuration based on the Sampled Metric Data
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-422&quot;&gt;NEMO-422&lt;/a&gt;] -         SonarCloud issues
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-440&quot;&gt;NEMO-440&lt;/a&gt;] -         Migrate to Java11 and use Java 11 Features
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-482&quot;&gt;NEMO-482&lt;/a&gt;] -         Keep library versions updated
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Umbrella
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-483&quot;&gt;NEMO-483&lt;/a&gt;] -         Record Metrics associated with stream processing
&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Won Wook SONG</name></author><summary type="html">Release Notes - Apache Nemo - Version 0.4</summary></entry><entry><title type="html">Beam Nemo Runner documents updated!</title><link href="http://nemo.apache.org//blog/2022/08/15/beam-runner/" rel="alternate" type="text/html" title="Beam Nemo Runner documents updated!" /><published>2022-08-15T00:00:00+09:00</published><updated>2022-08-15T00:00:00+09:00</updated><id>http://nemo.apache.org//blog/2022/08/15/beam-runner</id><content type="html" xml:base="http://nemo.apache.org//blog/2022/08/15/beam-runner/">&lt;p&gt;Hi!&lt;/p&gt;

&lt;p&gt;We’ve updated our introduction on the &lt;a href=&quot;https://beam.apache.org/documentation/runners/nemo/&quot;&gt;NemoRunner page of the Apache Beam website&lt;/a&gt;. Please check it out!&lt;/p&gt;</content><author><name>Won Wook SONG</name></author><summary type="html">Hi!</summary></entry><entry><title type="html">Nemo Release 0.3</title><link href="http://nemo.apache.org//blog/2022/06/07/release-note-0.3/" rel="alternate" type="text/html" title="Nemo Release 0.3" /><published>2022-06-07T00:00:00+09:00</published><updated>2022-06-07T00:00:00+09:00</updated><id>http://nemo.apache.org//blog/2022/06/07/release-note-0.3</id><content type="html" xml:base="http://nemo.apache.org//blog/2022/06/07/release-note-0.3/">&lt;p&gt;Release Notes - Apache Nemo - Version 0.3&lt;/p&gt;

&lt;h3 id=&quot;heres-a-brief-summary-about-the-release&quot;&gt;Here’s a brief summary of the release:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;This release fixes a few high-priority bugs in 0.2 and includes a variety of smaller fixes.&lt;/li&gt;
  &lt;li&gt;The release also includes several major features for stream processing and dynamic optimization, such as dynamic task sizing, as well as an implementation of the simulation scheduler.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-351&quot;&gt;NEMO-351&lt;/a&gt;] -         Empowering Nemo with fast I/O using Apache Crail
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-377&quot;&gt;NEMO-377&lt;/a&gt;] -         Fix watermark emission when there are no outputs in GBKWindowTransform
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-411&quot;&gt;NEMO-411&lt;/a&gt;] -         Bug in ScheduleGroupPass, OutputTag, DuplicateEdgeGroup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-447&quot;&gt;NEMO-447&lt;/a&gt;] -         Fix beam pom.xml to resolve build failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-457&quot;&gt;NEMO-457&lt;/a&gt;] -         Improve test coverage score on sonar cloud test
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-438&quot;&gt;NEMO-438&lt;/a&gt;] -         Create a Simulator for Simulating an Execution of an Execution Plan
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-324&quot;&gt;NEMO-324&lt;/a&gt;] -         Distinguish Beam&amp;#39;s run and waitUntilFinish methods.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-392&quot;&gt;NEMO-392&lt;/a&gt;] -         Support combine in streaming
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-433&quot;&gt;NEMO-433&lt;/a&gt;] -         Improvement of Task Metrics and Collecting Them For Sampling
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-439&quot;&gt;NEMO-439&lt;/a&gt;] -         Upgrade current working version to 0.3-SNAPSHOT
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-459&quot;&gt;NEMO-459&lt;/a&gt;] -         Enable Automatic Analysis for sonar cloud
&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Won Wook SONG</name></author><summary type="html">Release Notes - Apache Nemo - Version 0.3</summary></entry><entry><title type="html">Nemo Release 0.2</title><link href="http://nemo.apache.org//blog/2020/03/09/release-note-0.2/" rel="alternate" type="text/html" title="Nemo Release 0.2" /><published>2020-03-09T00:00:00+09:00</published><updated>2020-03-09T00:00:00+09:00</updated><id>http://nemo.apache.org//blog/2020/03/09/release-note-0.2</id><content type="html" xml:base="http://nemo.apache.org//blog/2020/03/09/release-note-0.2/">&lt;p&gt;Release Notes - Apache Nemo - Version 0.2&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-350&quot;&gt;NEMO-350&lt;/a&gt;] -         Implement Off-heap SerializedMemoryStore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-385&quot;&gt;NEMO-385&lt;/a&gt;] -         Support Lambda Pass with lambda policy and lambda resource property
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-387&quot;&gt;NEMO-387&lt;/a&gt;] -         Support Lambda scheduler
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-80&quot;&gt;NEMO-80&lt;/a&gt;] -         SLF4J: Failed to load class “org.slf4j.impl.StaticLoggerBinder”
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-304&quot;&gt;NEMO-304&lt;/a&gt;] -         Fail-fast for mis-configuration in user application
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-321&quot;&gt;NEMO-321&lt;/a&gt;] -         Fix the data skew pass metric mismatch
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-327&quot;&gt;NEMO-327&lt;/a&gt;] -         Fix skew handling for multi shuffle edge receiver
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-357&quot;&gt;NEMO-357&lt;/a&gt;] -         Fix broken link on README
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-368&quot;&gt;NEMO-368&lt;/a&gt;] -         NEMO-353 breaks the application from running in YARN environments
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-402&quot;&gt;NEMO-402&lt;/a&gt;] -         Broken guava version conflicts cause ERROR: Trying to remove a RunningJob that is unknown 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-413&quot;&gt;NEMO-413&lt;/a&gt;] -         Fix index checking for byte access of MemoryChunk using UNSAFE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-414&quot;&gt;NEMO-414&lt;/a&gt;] -         Command-line specified runtime data plane configurations not applied
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-416&quot;&gt;NEMO-416&lt;/a&gt;] -         Guava vendor version conflict when deserializing Task object
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-420&quot;&gt;NEMO-420&lt;/a&gt;] -         OffHeapMemory configuration only supports a single type of executor
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-335&quot;&gt;NEMO-335&lt;/a&gt;] -         Using a database for recording metric data
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-336&quot;&gt;NEMO-336&lt;/a&gt;] -         Cost prediction using the metric data
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-359&quot;&gt;NEMO-359&lt;/a&gt;] -         implementation of getEstimatedSizeBytes in SourceVertex
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-360&quot;&gt;NEMO-360&lt;/a&gt;] -         Implementing an &amp;#39;XGBoostPolicy&amp;#39;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-388&quot;&gt;NEMO-388&lt;/a&gt;] -         Off-heap memory management (reuse ByteBuffer)
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-11&quot;&gt;NEMO-11&lt;/a&gt;] -         Generalize Equality of Int Predicates for Loops
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-162&quot;&gt;NEMO-162&lt;/a&gt;] -         Add insertVertex() API in optimization pass
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-182&quot;&gt;NEMO-182&lt;/a&gt;] -         Consider reshaping in run-time optimization
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-253&quot;&gt;NEMO-253&lt;/a&gt;] -         Refactor getInternal(Main/Additional)OutputMap in TaskExecutor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-275&quot;&gt;NEMO-275&lt;/a&gt;] -         Eager Garbage Collection for GroupByKey
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-319&quot;&gt;NEMO-319&lt;/a&gt;] -         Fix path to beam resources in examples in README
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-320&quot;&gt;NEMO-320&lt;/a&gt;] -         Make WebUI scale to big workloads
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-323&quot;&gt;NEMO-323&lt;/a&gt;] -         Upgrade current working version to 0.2-SNAPSHOT
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-328&quot;&gt;NEMO-328&lt;/a&gt;] -         Refactor IRDAG
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-332&quot;&gt;NEMO-332&lt;/a&gt;] -         Refactor RunTimePass
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-337&quot;&gt;NEMO-337&lt;/a&gt;] -         IRDAG Unit Tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-338&quot;&gt;NEMO-338&lt;/a&gt;] -         SkewSamplingPass
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-340&quot;&gt;NEMO-340&lt;/a&gt;] -         SonarCloud for PRs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-353&quot;&gt;NEMO-353&lt;/a&gt;] -         Launch NEXMark applications
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-356&quot;&gt;NEMO-356&lt;/a&gt;] -         Visualize the name of beam transform in DAG
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-358&quot;&gt;NEMO-358&lt;/a&gt;] -         Recycling vertex ids while cloning a vertex
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-369&quot;&gt;NEMO-369&lt;/a&gt;] -         DirectByteArrayOutputStream usage refactoring
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-390&quot;&gt;NEMO-390&lt;/a&gt;] -         Address SonarCloud issues for the IR package
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-391&quot;&gt;NEMO-391&lt;/a&gt;] -         Set GrpcMessageEnvironment as a default implementation 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-395&quot;&gt;NEMO-395&lt;/a&gt;] -         Address SonarCloud issues for the scheduler package
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-404&quot;&gt;NEMO-404&lt;/a&gt;] -         Provide user argument to use lambda executor representer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-412&quot;&gt;NEMO-412&lt;/a&gt;] -         Address Sonar Cloud issue for MemoryChunk
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-418&quot;&gt;NEMO-418&lt;/a&gt;] -         BlockFetchFailureProperty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-424&quot;&gt;NEMO-424&lt;/a&gt;] -         Fix Sonarcloud bugs regarding Optional
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-428&quot;&gt;NEMO-428&lt;/a&gt;] -         Ignore .factorypath for rat check and version control
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-25&quot;&gt;NEMO-25&lt;/a&gt;] -         Improve WebUI to use RESTful APIs by Nemo
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-153&quot;&gt;NEMO-153&lt;/a&gt;] -         IR-based dynamic optimization for WordCount application
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-154&quot;&gt;NEMO-154&lt;/a&gt;] -         Handle skewness information in SchedulingConstraint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-193&quot;&gt;NEMO-193&lt;/a&gt;] -         Revised version of IR-based dynamic optimization
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-322&quot;&gt;NEMO-322&lt;/a&gt;] -         Committer&amp;#39;s guide
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-361&quot;&gt;NEMO-361&lt;/a&gt;] -         Consistency on indentations
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-362&quot;&gt;NEMO-362&lt;/a&gt;] -         Upgrade of checkstyle version
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-364&quot;&gt;NEMO-364&lt;/a&gt;] -         Upgrade Beam
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-375&quot;&gt;NEMO-375&lt;/a&gt;] -         Add option to turn off metric collection to DB
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-379&quot;&gt;NEMO-379&lt;/a&gt;] -         Change javadoc goal to a proper one
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-383&quot;&gt;NEMO-383&lt;/a&gt;] -         Implement DirectByteBufferOutputStream for Off-heap SerializedMemoryStore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-384&quot;&gt;NEMO-384&lt;/a&gt;] -         Implement DirectByteBufferInputStream for Off-heap SerializedMemoryStore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-397&quot;&gt;NEMO-397&lt;/a&gt;] -         Separation of JVM heap region and off-heap memory region
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-399&quot;&gt;NEMO-399&lt;/a&gt;] -         Include the official WordCount example on the Beam website
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-400&quot;&gt;NEMO-400&lt;/a&gt;] -         Javadoc compile error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-421&quot;&gt;NEMO-421&lt;/a&gt;] -         Release v0.2
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-437&quot;&gt;NEMO-437&lt;/a&gt;] -         Support Java version 11
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Umbrella
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-60&quot;&gt;NEMO-60&lt;/a&gt;] -         IR-based dynamic optimization
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-156&quot;&gt;NEMO-156&lt;/a&gt;] -         Support Beam Nemo Runner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-157&quot;&gt;NEMO-157&lt;/a&gt;] -         Support Nemo Streaming
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-158&quot;&gt;NEMO-158&lt;/a&gt;] -         Support Spark SQL Example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-159&quot;&gt;NEMO-159&lt;/a&gt;] -         Nemo Web UI
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h3&gt;

&lt;p&gt;Nemo 0.2 was the work of many contributors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Arun Lakshman R&lt;/li&gt;
  &lt;li&gt;Byung-Gon Chun&lt;/li&gt;
  &lt;li&gt;Davor Bonaci&lt;/li&gt;
  &lt;li&gt;Eunji Jeong&lt;/li&gt;
  &lt;li&gt;Geon Woo Kim&lt;/li&gt;
  &lt;li&gt;Gyewon Lee&lt;/li&gt;
  &lt;li&gt;Haeyoon Cho&lt;/li&gt;
  &lt;li&gt;Jae Hyeon Park&lt;/li&gt;
  &lt;li&gt;JangHo Seo&lt;/li&gt;
  &lt;li&gt;Jean-Baptiste Onofré&lt;/li&gt;
  &lt;li&gt;Jeongyoon Eo&lt;/li&gt;
  &lt;li&gt;John Yang&lt;/li&gt;
  &lt;li&gt;Joo Yeon Kim&lt;/li&gt;
  &lt;li&gt;Kenn Knowles&lt;/li&gt;
  &lt;li&gt;Markus Weimer&lt;/li&gt;
  &lt;li&gt;Minhyeok Kweun&lt;/li&gt;
  &lt;li&gt;Sanha Lee&lt;/li&gt;
  &lt;li&gt;Seonghyun Park&lt;/li&gt;
  &lt;li&gt;Soojeong Kim&lt;/li&gt;
  &lt;li&gt;Taegeon Um&lt;/li&gt;
  &lt;li&gt;Won Wook SONG&lt;/li&gt;
  &lt;li&gt;Wooyeon Lee&lt;/li&gt;
  &lt;li&gt;Yunseong Lee&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Wooyeon Lee</name></author><summary type="html">Release Notes - Apache Nemo - Version 0.2</summary></entry><entry><title type="html">Nemo Release 0.1</title><link href="http://nemo.apache.org//blog/2019/03/02/release-note-0.1/" 
rel="alternate" type="text/html" title="Nemo Release 0.1" /><published>2019-03-02T00:00:00+09:00</published><updated>2019-03-02T00:00:00+09:00</updated><id>http://nemo.apache.org//blog/2019/03/02/release-note-0.1</id><content type="html" xml:base="http://nemo.apache.org//blog/2019/03/02/release-note-0.1/">&lt;p&gt;Nemo 0.1 is the initial release, bringing several features and performance enhancements. The most visible features are frontend support for Beam and Spark; an IR DAG with execution properties, optimization passes, and policies; support for loops; dynamic optimization; multiple-job submission; and various features of the execution runtime.&lt;/p&gt;

&lt;p&gt;You can view the JIRA-generated release note here: &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12321823&amp;amp;version=12344546&quot;&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;new-features--improvements&quot;&gt;New features / Improvements&lt;/h2&gt;

&lt;h3 id=&quot;frontend&quot;&gt;Frontend&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Beam
    &lt;ul&gt;
      &lt;li&gt;Official support for Apache Beam (details available at &lt;a href=&quot;https://beam.apache.org/documentation/runners/nemo/&quot;&gt;this link&lt;/a&gt;)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Spark&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;core&quot;&gt;Core&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Intermediate representation (IR)&lt;/li&gt;
  &lt;li&gt;Execution properties&lt;/li&gt;
  &lt;li&gt;Passes&lt;/li&gt;
  &lt;li&gt;Policies&lt;/li&gt;
  &lt;li&gt;Loop support&lt;/li&gt;
  &lt;li&gt;Dynamic optimization&lt;/li&gt;
  &lt;li&gt;Multi-job submission&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;execution-runtime&quot;&gt;Execution Runtime&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Data stores&lt;/li&gt;
  &lt;li&gt;Metadata management&lt;/li&gt;
  &lt;li&gt;Inter-executor data transfer (memory/disk/GlusterFS for batch, pipe for streaming)&lt;/li&gt;
  &lt;li&gt;Data encoding &amp;amp; decoding&lt;/li&gt;
  &lt;li&gt;Partitioners&lt;/li&gt;
  &lt;li&gt;Block manager&lt;/li&gt;
  &lt;li&gt;Data-location-based scheduling&lt;/li&gt;
  &lt;li&gt;Scheduler (based on different execution properties including locality, site, anti-affinity, …)&lt;/li&gt;
  &lt;li&gt;RDD caching&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;other-components&quot;&gt;Other Components&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Web UI&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;known-issues&quot;&gt;Known Issues&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/NEMO-302?jql=project%20%3D%20NEMO%20AND%20issuetype%20%3D%20Bug%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20key%20DESC&quot;&gt;A number of bugs &amp;amp; unresolved JIRA issues&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;miscellaneous-fixes&quot;&gt;Miscellaneous Fixes&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Website&lt;/li&gt;
  &lt;li&gt;CI integration&lt;/li&gt;
  &lt;li&gt;Documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;maven-artifacts&quot;&gt;Maven Artifacts&lt;/h2&gt;
&lt;p&gt;Nemo is available in Maven Central, making it easier to link into your programs without having to build it as a JAR yourself. Use the following Maven coordinates to add it to a project:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;!-- https://mvnrepository.com/artifact/org.apache.nemo/nemo-project --&amp;gt;
&amp;lt;dependency&amp;gt;
   &amp;lt;groupId&amp;gt;org.apache.nemo&amp;lt;/groupId&amp;gt;
   &amp;lt;artifactId&amp;gt;nemo-project&amp;lt;/artifactId&amp;gt;
   &amp;lt;version&amp;gt;0.1&amp;lt;/version&amp;gt;
   &amp;lt;type&amp;gt;pom&amp;lt;/type&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h3&gt;

&lt;p&gt;Nemo 0.1 was the work of many contributors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Arun Lakshman R&lt;/li&gt;
  &lt;li&gt;Byung-Gon Chun&lt;/li&gt;
  &lt;li&gt;Davor Bonaci&lt;/li&gt;
  &lt;li&gt;Eunji Jeong&lt;/li&gt;
  &lt;li&gt;Geon Woo Kim&lt;/li&gt;
  &lt;li&gt;Gyewon Lee&lt;/li&gt;
  &lt;li&gt;Jae Hyeon Park&lt;/li&gt;
  &lt;li&gt;JangHo Seo&lt;/li&gt;
  &lt;li&gt;Jean-Baptiste Onofré&lt;/li&gt;
  &lt;li&gt;Jeongyoon Eo&lt;/li&gt;
  &lt;li&gt;John Yang&lt;/li&gt;
  &lt;li&gt;Joo Yeon Kim&lt;/li&gt;
  &lt;li&gt;Kenn Knowles&lt;/li&gt;
  &lt;li&gt;Markus Weimer&lt;/li&gt;
  &lt;li&gt;Minhyeok Kweun&lt;/li&gt;
  &lt;li&gt;Sanha Lee&lt;/li&gt;
  &lt;li&gt;Seonghyun Park&lt;/li&gt;
  &lt;li&gt;Soojeong Kim&lt;/li&gt;
  &lt;li&gt;Taegeon Um&lt;/li&gt;
  &lt;li&gt;Won Wook SONG&lt;/li&gt;
  &lt;li&gt;Wooyeon Lee&lt;/li&gt;
  &lt;li&gt;Yunseong Lee&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Won Wook SONG</name></author><summary type="html">Nemo 0.1 is an initial release that brings several features and performance enhancements. The most visible features are the frontend support for Beam and Spark; IR DAG with execution properties, optimization passes and policies; support for loops; dynamic optimization; multiple-job submission; and various features of the execution runtime.</summary></entry><entry><title type="html">Harnessing transient resources using Nemo</title><link href="http://nemo.apache.org//blog/2018/03/23/pado-on-nemo/" rel="alternate" type="text/html" title="Harnessing transient resources using Nemo" /><published>2018-03-23T00:00:00+09:00</published><updated>2018-03-23T00:00:00+09:00</updated><id>http://nemo.apache.org//blog/2018/03/23/pado-on-nemo</id><content type="html" xml:base="http://nemo.apache.org//blog/2018/03/23/pado-on-nemo/">&lt;p&gt;To increase datacenter utilization, data processing jobs are increasingly being deployed on transient resources temporarily borrowed from latency-critical jobs. However, these transient resources must be evicted whenever latency-critical jobs require them again. Resource evictions often lead to cascading recomputations that substantially degrade job performance.&lt;/p&gt;

&lt;p&gt;Pado[1] is an optimization technique that reduces the number of recomputations triggered by eviction of transient resources. Specifically, Pado uses a placement algorithm for selectively retaining intermediate results on reserved resources that are not evicted.&lt;/p&gt;
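
&lt;p&gt;To make the idea of cascading recomputation concrete, here is a hypothetical toy model in Python; it is not Pado’s actual placement algorithm. It assumes a linear chain of stages with illustrative per-stage costs, and assumes that an eviction loses every intermediate result held on transient nodes. Under those assumptions, rebuilding a lost output means re-running stages back to the nearest result retained on a reserved node, which is the cost that retaining results on reserved resources aims to cut.&lt;/p&gt;

```python
# Hypothetical toy model, not Pado's placement algorithm: a linear chain of stages
# where an eviction is assumed to lose every intermediate result on transient nodes.


def recompute_cost(stage_costs, reserved, lost_stage):
    """Return the cost of rebuilding the output of `lost_stage` after an eviction.

    stage_costs: per-stage compute cost (illustrative numbers).
    reserved: indices of stages whose outputs are retained on reserved nodes.
    """
    if lost_stage in reserved:
        return 0  # a result on a reserved node survives the eviction
    start = lost_stage
    # Walk back until the previous stage's output is retained (or we reach the input).
    while start > 0 and (start - 1) not in reserved:
        start -= 1
    # Re-run every stage from `start` up to and including the lost one.
    return sum(stage_costs[start:lost_stage + 1])


if __name__ == "__main__":
    costs = [4, 4, 4, 4, 4]
    print(recompute_cost(costs, set(), 4))  # 20: nothing retained, full cascade
    print(recompute_cost(costs, {2}, 4))    # 8: stage 2 retained, cascade stops there
```

&lt;p&gt;Choosing which stages to retain is the placement decision; this sketch only evaluates a given choice and leaves the selection strategy open.&lt;/p&gt;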

&lt;p&gt;Nemo provides an optimization policy interface that makes it easy for users to employ techniques like Pado to improve application performance. To demonstrate the flexibility of Nemo, we have developed and evaluated PadoPolicy. We summarize preliminary evaluation results as follows.&lt;/p&gt;

&lt;h3 id=&quot;experimentation-setup&quot;&gt;Experimentation setup&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Systems: Spark 2.2.0, Nemo with PadoPolicy&lt;/li&gt;
  &lt;li&gt;Resources:
    &lt;ul&gt;
      &lt;li&gt;m4.2xlarge AWS EC2 instances (8 vCPU, 32GB memory)&lt;/li&gt;
      &lt;li&gt;35 transient nodes and 5 reserved nodes
        &lt;ul&gt;
          &lt;li&gt;Poisson-distributed evictions with a 5-minute mean interval for each transient node&lt;/li&gt;
          &lt;li&gt;An evicted transient node is immediately replaced with a new transient node&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Dataset: 10GB Yahoo! Music ratings dataset [2]&lt;/li&gt;
  &lt;li&gt;Application: Alternating least squares (ALS), a machine learning recommendation algorithm
    &lt;ul&gt;
      &lt;li&gt;Spark runs the Spark MLlib ALS implementation, and Nemo runs our own ALS implementation written in Beam&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;job-completion-time-jct&quot;&gt;Job completion time (JCT)&lt;/h3&gt;

&lt;p&gt;Spark was not able to complete the job even after running for &lt;strong&gt;120 minutes&lt;/strong&gt;, because it was stuck repeatedly recomputing intermediate results that were lost to evictions. In contrast, Nemo completed the job in &lt;strong&gt;18 minutes&lt;/strong&gt; by selectively retaining intermediate results on reserved nodes using the Pado technique. Because Spark had still not finished after 120 minutes, this is a speedup of &lt;strong&gt;at least 6.7X&lt;/strong&gt; (120 / 18 ≈ 6.7), which validates the flexibility of Nemo.&lt;/p&gt;

&lt;p&gt;[1] Youngseok Yang, Geon-Woo Kim, Won Wook Song, Yunseong Lee, Andrew Chung, Zhengping Qian, Brian Cho, and Byung-Gon Chun. 2017. Pado: A Data Processing Engine for Harnessing Transient Resources in Datacenters. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys ’17).&lt;/p&gt;

&lt;p&gt;[2] Yahoo! Music User Ratings of Songs with Artist, Album, and Genre Meta Information, v. 1.0. https://webscope.sandbox.yahoo.com/catalog.php?datatype=r.&lt;/p&gt;</content><author><name>John Yang</name></author><summary type="html">To increase datacenter utilization, data processing jobs are increasingly being deployed on transient resources temporarily borrowed from latency-critical jobs. However, these transient resources must be evicted whenever latency-critical jobs require them again. Resource evictions often lead to cascading recomputations that substantially degrade job performance.</summary></entry><entry><title type="html">Optimizing shuffle performance using Nemo</title><link href="http://nemo.apache.org//blog/2018/03/23/shuffle-on-nemo/" rel="alternate" type="text/html" title="Optimizing shuffle performance using Nemo" /><published>2018-03-23T00:00:00+09:00</published><updated>2018-03-23T00:00:00+09:00</updated><id>http://nemo.apache.org//blog/2018/03/23/shuffle-on-nemo</id><content type="html" xml:base="http://nemo.apache.org//blog/2018/03/23/shuffle-on-nemo/">&lt;p&gt;Data shuffle is a key operation that underlies almost all large-scale data processing jobs. A shuffle operation typically involves writing intermediate data to disk, and reading the data back later when the successive computations are scheduled.&lt;/p&gt;

&lt;p&gt;Sailfish [1] is an optimization technique that reduces the disk overheads associated with a shuffle operation. Specifically, Sailfish minimizes the number of disk seeks involved in reading intermediate data back from disk. Jobs that shuffle large volumes of data can benefit especially from the Sailfish technique.&lt;/p&gt;
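&lt;p&gt;To see why fewer seeks matter, the Python sketch below compares a simplified seek count for a conventional shuffle, where each reducer fetches one small segment from every map output, against a Sailfish-style layout, where map outputs for the same partition are batched together so that each reducer can read its data sequentially. The function &lt;code&gt;shuffle_read_profile&lt;/code&gt; and its one-seek-per-contiguous-read cost model are hypothetical simplifications, not Sailfish’s or Nemo’s actual implementation.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
```python
# Simplified seek model for illustration; an assumption, not Sailfish's actual
# implementation. Each contiguous read from the scratch disk is assumed to cost
# one seek, and intermediate data is assumed to be spread evenly.


def shuffle_read_profile(total_bytes, num_maps, num_reducers, aggregated):
    """Return (seeks, average bytes per seek) for a shuffle's reduce stage.

    aggregated=False: every map task writes its own output file, so each
        reducer seeks once per map output to fetch its partition's segment.
    aggregated=True: map outputs of the same partition are batched into one
        contiguous region, so each reducer needs a single sequential read.
    """
    if aggregated:
        seeks = num_reducers
    else:
        seeks = num_maps * num_reducers
    return seeks, total_bytes / seeks


# Usage with hypothetical job sizes (not the sizes from the experiment below):
# 1 TB of intermediate data, 2000 map tasks, 1000 reduce tasks.
print(shuffle_read_profile(10**12, 2000, 1000, aggregated=False))  # (2000000, 500000.0)
print(shuffle_read_profile(10**12, 2000, 1000, aggregated=True))   # (1000, 1000000000.0)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Under this model, batching turns millions of sub-megabyte reads into one large sequential read per reducer, which is the effect the disk throughput measurements below are meant to capture.&lt;/p&gt;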

&lt;p&gt;Nemo provides an optimization policy interface that makes it easy for users to employ techniques like Sailfish to improve application performance. To demonstrate the flexibility of Nemo, we have developed and evaluated SailfishPolicy. We summarize preliminary evaluation results as follows.&lt;/p&gt;

&lt;h3 id=&quot;experimentation-setup&quot;&gt;Experimentation setup&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Systems: Spark [2] 2.3.0 (a state-of-the-art system) and Nemo with SailfishPolicy&lt;/li&gt;
  &lt;li&gt;Resources: 20 h1.4xlarge AWS EC2 instances (16 vCPU, 64GB memory, 2 HDDs)
    &lt;ul&gt;
      &lt;li&gt;One disk is used by an HDFS cluster, and the other serves as a scratch disk where Nemo and Spark store intermediate data&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Dataset: 2TB Wikipedia pageview statistics[3] stored in the HDFS cluster&lt;/li&gt;
  &lt;li&gt;Application: A MapReduce application that reads input data from HDFS, computes the sum of pageview counts per Wikipedia project, and writes the results to HDFS
    &lt;ul&gt;
      &lt;li&gt;The Spark version of the application is written with the Spark DSL, and the Nemo version is written in Beam&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;job-completion-time-jct&quot;&gt;Job completion time (JCT)&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;https://user-images.githubusercontent.com/6691311/37783061-d7c62970-2e37-11e8-89d5-9ef3da8fd846.png&quot; alt=&quot;Figure 1&quot; /&gt;&lt;/p&gt;
&lt;center&gt;Figure 1&lt;/center&gt;

&lt;p&gt;As shown in Figure 1, Nemo outperforms Spark by 2.26X primarily because Nemo’s reduce stage completes faster than Spark’s.&lt;/p&gt;

&lt;h3 id=&quot;mean-disk-throughput-mbs&quot;&gt;Mean disk throughput (MB/s)&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;https://user-images.githubusercontent.com/6691311/37783098-f17b55d4-2e37-11e8-9cf3-bf082562c1e6.png&quot; alt=&quot;Figure 2&quot; /&gt;&lt;/p&gt;
&lt;center&gt;Figure 2&lt;/center&gt;

&lt;p&gt;To understand the performance difference, we measured the mean throughput of the scratch disks that Nemo and Spark use for handling intermediate data. As depicted in Figure 2, Nemo’s reduce stage achieves much higher disk read throughput while performing fewer disk seeks. This explains why Nemo’s reduce stage completes more quickly, and validates the effectiveness of SailfishPolicy.&lt;/p&gt;

&lt;p&gt;[1] Sriram Rao, Raghu Ramakrishnan, Adam Silberstein, Mike Ovsiannikov, and Damian Reeves. 2012. Sailfish: a framework for large scale data processing. In Proceedings of the Third ACM Symposium on Cloud Computing (SoCC ’12).&lt;/p&gt;

&lt;p&gt;[2] Apache Spark. https://spark.apache.org/.&lt;/p&gt;

&lt;p&gt;[3] Wikipedia pageview statistics. https://dumps.wikimedia.org/other/pagecounts-raw/.&lt;/p&gt;</content><author><name>John Yang</name></author><summary type="html">Data shuffle is a key operation that underlies almost all large-scale data processing jobs. A shuffle operation typically involves writing intermediate data to disk, and reading the data back later when the successive computations are scheduled.</summary></entry><entry><title type="html">Nemo blog published!</title><link href="http://nemo.apache.org//blog/2018/03/20/nemo-blog-published/" rel="alternate" type="text/html" title="Nemo blog published!" /><published>2018-03-20T00:00:00+09:00</published><updated>2018-03-20T00:00:00+09:00</updated><id>http://nemo.apache.org//blog/2018/03/20/nemo-blog-published</id><content type="html" xml:base="http://nemo.apache.org//blog/2018/03/20/nemo-blog-published/">&lt;p&gt;Our blog is published and is online! We’ll be posting exciting news related to our project on our blog.&lt;/p&gt;

&lt;p&gt;Your contribution is welcome!&lt;/p&gt;</content><author><name>Won Wook SONG</name></author><summary type="html">Our blog is published and is online! We’ll be posting exciting news related to our project on our blog.</summary></entry></feed>