<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<title>Locality Awareness In LoadAwareShuffleGrouping</title>
<!-- Bootstrap core CSS -->
<link href="/assets/css/bootstrap.min.css" rel="stylesheet">
<!-- Bootstrap theme -->
<link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css">
<link href="/css/style.css" rel="stylesheet">
<link href="/assets/css/owl.theme.css" rel="stylesheet">
<link href="/assets/css/owl.carousel.css" rel="stylesheet">
<script type="text/javascript" src="/assets/js/jquery.min.js"></script>
<script type="text/javascript" src="/assets/js/bootstrap.min.js"></script>
<script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script>
<script type="text/javascript" src="/assets/js/storm.js"></script>
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<header>
<div class="container-fluid">
<div class="row">
<div class="col-md-5">
<a href="/index.html"><img src="/images/logo.png" class="logo" /></a>
</div>
<div class="col-md-5">
<h1>Version: 2.3.0</h1>
</div>
<div class="col-md-2">
<a href="/downloads.html" class="btn-std btn-block btn-download">Download</a>
</div>
</div>
</div>
</header>
<!--Header End-->
<!--Navigation Begin-->
<div class="navbar" role="banner">
<div class="container-fluid">
<div class="navbar-header">
<button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
<ul class="nav navbar-nav">
<li><a href="/index.html" id="home">Home</a></li>
<li><a href="/getting-help.html" id="getting-help">Getting Help</a></li>
<li><a href="/about/integrates.html" id="project-info">Project Information</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="documentation">Documentation <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/releases/2.3.0/index.html">2.3.0</a></li>
<li><a href="/releases/2.2.0/index.html">2.2.0</a></li>
<li><a href="/releases/2.1.0/index.html">2.1.0</a></li>
<li><a href="/releases/2.0.0/index.html">2.0.0</a></li>
<li><a href="/releases/1.2.3/index.html">1.2.3</a></li>
</ul>
</li>
<li><a href="/talksAndVideos.html">Talks and Slideshows</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li>
<li><a href="/contribute/People.html">People</a></li>
<li><a href="/contribute/BYLAWS.html">ByLaws</a></li>
</ul>
</li>
<li><a href="/2021/09/27/storm230-released.html" id="news">News</a></li>
</ul>
</nav>
</div>
</div>
<div class="container-fluid">
<h1 class="page-title">Locality Awareness In LoadAwareShuffleGrouping</h1>
<div class="row">
<div class="col-md-12">
<!-- Documentation -->
<p class="post-meta"></p>
<div class="documentation-content"><h1 id="locality-awareness-in-loadawareshufflegrouping">Locality Awareness In LoadAwareShuffleGrouping</h1>
<h3 id="motivation">Motivation</h3>
<p>Apache Storm adds locality awareness to LoadAwareShuffleGrouping, based on bang-bang control theory.
It keeps traffic on closer downstream executors, as long as those executors are not under heavy load, to avoid network latency.
When the traffic stays within the same worker, it also avoids serialization/deserialization overhead.</p>
<h3 id="how-it-works">How it works</h3>
<p>An executor (say <code>E</code>) that uses LoadAwareShuffleGrouping to send tuples to downstream executors groups them into four <code>scopes</code>, based on their location relative to <code>E</code> itself.
The four scopes are:</p>
<ul>
<li><code>WORKER_LOCAL</code>: every downstream executor located on the same worker as this executor <code>E</code></li>
<li><code>HOST_LOCAL</code>: every downstream executor located on the same host as this executor <code>E</code></li>
<li><code>RACK_LOCAL</code>: every downstream executor located on the same rack as this executor <code>E</code></li>
<li><code>EVERYTHING</code>: every downstream executor of the executor <code>E</code></li>
</ul>
<p>It starts by sending tuples to the downstream executors in the <code>WORKER_LOCAL</code> scope.
Within a scope, downstream executors are chosen based on their load: executors with lower load are more likely to be chosen.
Once the average load of these <code>WORKER_LOCAL</code> executors reaches <code>topology.localityaware.higher.bound</code>,
it switches to the next higher scope, <code>HOST_LOCAL</code>, and starts sending tuples in that scope.
If the average load is still above the <code>higher bound</code>, it switches to the next higher scope again.</p>
<p>Conversely, it switches back to a lower scope when the average load of that lower scope is less than <code>topology.localityaware.lower.bound</code>.</p>
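<p>The switching rule described above can be sketched as a small two-threshold controller. This is only an illustration under stated assumptions, not Storm's actual implementation: the class <code>LocalityScopeSketch</code>, the method <code>nextScope</code>, and its parameters are hypothetical names, and the caller is assumed to have already computed the average loads.</p>

```java
// Illustrative sketch only; names are hypothetical and this is not Storm's code.
class LocalityScopeSketch {
    // Scopes ordered from closest to farthest, as listed in this document.
    enum Scope { WORKER_LOCAL, HOST_LOCAL, RACK_LOCAL, EVERYTHING }

    /**
     * Decides which scope to send tuples to next.
     *
     * @param current            the scope currently in use
     * @param currentAvgLoad     average load of the executors in the current scope
     * @param lowerScopeAvgLoad  average load of the executors in the next lower scope
     *                           (ignored when current is WORKER_LOCAL)
     * @param lowerBound         value of topology.localityaware.lower.bound
     * @param higherBound        value of topology.localityaware.higher.bound
     */
    static Scope nextScope(Scope current, double currentAvgLoad, double lowerScopeAvgLoad,
                           double lowerBound, double higherBound) {
        Scope[] scopes = Scope.values();
        int i = current.ordinal();
        boolean hasHigher = i < scopes.length - 1;
        boolean hasLower = i > 0;

        // Current scope is overloaded: widen to the next higher scope.
        if (hasHigher && currentAvgLoad >= higherBound) {
            return scopes[i + 1];
        }
        // The closer scope has recovered: narrow back to it.
        if (hasLower && lowerScopeAvgLoad < lowerBound) {
            return scopes[i - 1];
        }
        // Between the two bounds nothing changes, which is what prevents flapping.
        return current;
    }
}
```

<p>Because the two bounds differ, a scope whose load hovers between them is left alone, so the grouping does not oscillate between scopes on every small change in load.</p>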
<h3 id="how-is-load-calculated">How is Load calculated</h3>
<p>The load of a downstream executor is the maximum of the following two values:</p>
<ul>
<li>The population percentage of the receive queue</li>
<li><code>Math.min(pendingMessages, 1024) / 1024</code></li>
</ul>
<p><code>pendingMessages</code>: the upstream executor <code>E</code> sends messages to the downstream executor through Netty, and <code>pendingMessages</code> is the number of messages that have not yet reached the server.</p>
<p>If the downstream executor is located on the same worker as the executor <code>E</code>, no Netty connection is involved and the load of that downstream executor is simply the population percentage of the receive queue.</p>
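<p>As a minimal sketch of the two cases above: the class and method names below are hypothetical, and the queue population is assumed to be passed in as a fraction between 0.0 and 1.0.</p>

```java
// Illustrative sketch only; names are hypothetical and this is not Storm's code.
class LoadSketch {
    // Cap on pending messages used in the formula from this document.
    static final long PENDING_CAP = 1024;

    /** Load of a downstream executor on a different worker (reached through Netty). */
    static double remoteLoad(double queuePopulation, long pendingMessages) {
        // Pending messages are capped at 1024 and scaled into the range [0, 1].
        double connectionLoad = Math.min(pendingMessages, PENDING_CAP) / (double) PENDING_CAP;
        // The load is whichever signal is worse.
        return Math.max(queuePopulation, connectionLoad);
    }

    /** Load of a downstream executor on the same worker: only the receive queue counts. */
    static double localLoad(double queuePopulation) {
        return queuePopulation;
    }
}
```

<p>For example, a receive queue that is 25% full combined with 512 pending messages gives a load of 0.5, because the pending-message term (512 / 1024) dominates.</p>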
<h3 id="relationship-between-load-and-capacity">Relationship between Load and Capacity</h3>
<p>The capacity of a bolt executor on Storm UI is calculated as:</p>
<ul>
<li>(number executed * average execute latency) / measurement time</li>
</ul>
<p>This value indicates how busy the executor is. If it is around 1.0, the corresponding bolt is running as fast as it can. A <code>__capacity</code> metric exists to track this value for each executor.</p>
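<p>The capacity formula can be worked through with a small sketch. The names below are hypothetical, and the latency and measurement time are assumed to be in the same unit (milliseconds here).</p>

```java
// Illustrative sketch only; names are hypothetical and this is not Storm's code.
class CapacitySketch {
    /**
     * Fraction of the measurement window the executor spent executing tuples.
     *
     * @param executed            number of tuples executed in the window
     * @param avgExecuteLatencyMs average execute latency per tuple, in milliseconds
     * @param measurementTimeMs   length of the measurement window, in milliseconds
     */
    static double capacity(long executed, double avgExecuteLatencyMs, double measurementTimeMs) {
        return (executed * avgExecuteLatencyMs) / measurementTimeMs;
    }
}
```

<p>For example, 60,000 tuples executed at an average of 9 ms each over a 600,000 ms window gives 540,000 / 600,000 = 0.9, meaning the executor was busy for about 90% of the window.</p>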
<p>The <code>Capacity</code> is not related to the <code>Load</code>:</p>
<ul>
<li>If the <code>Load</code> of the executor <code>E1</code> is high,
<ul>
<li>the <code>Capacity</code> of <code>E1</code> could be high: the population of the receive queue of <code>E1</code> could be high, which means the executor <code>E1</code> has more work to do.</li>
<li>the <code>Capacity</code> could also be low: <code>pendingMessages</code> could be high because other executors share the Netty connection between the two workers and they are sending too many messages.
But the actual population of the receive queue of <code>E1</code> might be low.</li>
</ul></li>
<li>If the <code>Load</code> is low,
<ul>
<li>the <code>Capacity</code> could be low: lower <code>Load</code> means less work to do. </li>
<li>the <code>Capacity</code> could also be high: the executor could be receiving tuples and executing tuples at a similar average rate.</li>
</ul></li>
<li>If the <code>Capacity</code> is high,
<ul>
<li>the <code>Load</code> could be high: high <code>Capacity</code> means the executor is busy. It could be because it&#39;s receiving too many tuples.</li>
<li>the <code>Load</code> could also be low: the executor could be receiving tuples and executing tuples at a similar average rate.</li>
</ul></li>
<li>If the <code>Capacity</code> is low,
<ul>
<li>the <code>Load</code> could be low: if <code>pendingMessages</code> is low</li>
<li>the <code>Load</code> could also be high: because <code>pendingMessages</code> might be very high.</li>
</ul></li>
</ul>
<h3 id="troubleshooting">Troubleshooting</h3>
<h4 id="i-am-seeing-high-capacity-close-to-1-0-on-some-executors-and-low-capacity-close-to-0-on-other-executors">I am seeing high capacity (close to 1.0) on some executors and low capacity (close to 0) on other executors</h4>
<ol>
<li><p>It could mean that you can reduce parallelism. Your executors are able to keep up and the load never gets very high.</p></li>
<li><p>You can try to adjust <code>topology.localityaware.higher.bound</code> and <code>topology.localityaware.lower.bound</code></p></li>
<li><p>You can try to enable <code>topology.ras.order.executors.by.proximity.needs</code>. With this config, unassigned executors are sorted in topological order,
taking network proximity needs into account, before being scheduled. This is a best-effort attempt to split the topology into slices and to place the executors in each slice as physically close to each other as possible.</p></li>
</ol>
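<p>The settings from options 2 and 3 could be collected as in the sketch below. It uses a plain <code>java.util.HashMap</code> so that it runs without Storm on the classpath; in a real topology the same entries would go into the <code>org.apache.storm.Config</code> used at submission, which is itself a <code>HashMap</code> subclass. The class name and the bound values <code>0.9</code> and <code>0.1</code> are illustrative assumptions, not recommendations.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; the class name and the chosen values are hypothetical.
class LocalityConfigSketch {
    static Map<String, Object> tunedConf() {
        Map<String, Object> conf = new HashMap<>();
        // Tolerate a higher average load before widening to the next scope.
        conf.put("topology.localityaware.higher.bound", 0.9);
        // Require a lower average load before narrowing back to a closer scope.
        conf.put("topology.localityaware.lower.bound", 0.1);
        // Ask the scheduler to order executors by network proximity needs.
        conf.put("topology.ras.order.executors.by.proximity.needs", true);
        return conf;
    }
}
```

<p>Whatever values are chosen, the lower bound should stay below the higher bound so that there is a band of load values in which the scope does not change.</p>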
<h4 id="i-just-want-the-capacity-on-every-downstream-executor-to-be-even">I just want the capacity on every downstream executor to be even</h4>
<p>You can turn off LoadAwareShuffleGrouping by setting <code>topology.disable.loadaware.messaging</code> to <code>true</code>.</p>
</div>
</div>
</div>
</div>
<footer>
<div class="container-fluid">
<div class="row">
<div class="col-md-3">
<div class="footer-widget">
<h5>Meetups</h5>
<ul class="latest-news">
<li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li>
<li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li>
<li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li>
<li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li>
<li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li>
<li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li>
<!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> -->
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>About Apache Storm</h5>
<p>Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Apache Storm with database systems is easy.</p>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>First Look</h5>
<ul class="footer-list">
<li><a href="/releases/current/Rationale.html">Rationale</a></li>
<li><a href="/releases/current/Tutorial.html">Tutorial</a></li>
<li><a href="/releases/current/Setting-up-development-environment.html">Setting up development environment</a></li>
<li><a href="/releases/current/Creating-a-new-Storm-project.html">Creating a new Apache Storm project</a></li>
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>Documentation</h5>
<ul class="footer-list">
<li><a href="/releases/current/index.html">Index</a></li>
<li><a href="/releases/current/javadocs/index.html">Javadoc</a></li>
<li><a href="/releases/current/FAQ.html">FAQ</a></li>
</ul>
</div>
</div>
</div>
<hr/>
<div class="row">
<div class="col-md-12">
<p align="center">Copyright © 2019 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.
<br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation.
<br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
</div>
</div>
</div>
</footer>
<!--Footer End-->
<!-- Scroll to top -->
<span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span>
</body>
</html>