| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| |
| <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"> |
| <link rel="icon" href="/favicon.ico" type="image/x-icon"> |
| |
| <title>Locality Awareness In LoadAwareShuffleGrouping</title> |
| |
| <!-- Bootstrap core CSS --> |
| <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> |
| <!-- Bootstrap theme --> |
| <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> |
| |
| <!-- Custom styles for this template --> |
| <link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css"> |
| <link href="/css/style.css" rel="stylesheet"> |
| <link href="/assets/css/owl.theme.css" rel="stylesheet"> |
| <link href="/assets/css/owl.carousel.css" rel="stylesheet"> |
| <script type="text/javascript" src="/assets/js/jquery.min.js"></script> |
| <script type="text/javascript" src="/assets/js/bootstrap.min.js"></script> |
| <script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script> |
| <script type="text/javascript" src="/assets/js/storm.js"></script> |
| <!-- Just for debugging purposes. Don't actually copy these 2 lines! --> |
| <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> |
| |
| <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> |
| <!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| |
| |
| <body> |
| <header> |
| <div class="container-fluid"> |
| <div class="row"> |
| <div class="col-md-5"> |
| <a href="/index.html"><img src="/images/logo.png" class="logo" /></a> |
| </div> |
| <div class="col-md-5"> |
| |
| <h1>Version: 2.3.0</h1> |
| |
| </div> |
| <div class="col-md-2"> |
| <a href="/downloads.html" class="btn-std btn-block btn-download">Download</a> |
| </div> |
| </div> |
| </div> |
| </header> |
| <!--Header End--> |
| <!--Navigation Begin--> |
| <div class="navbar" role="banner"> |
| <div class="container-fluid"> |
| <div class="navbar-header"> |
| <button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse"> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| </div> |
| <nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation"> |
| <ul class="nav navbar-nav"> |
| <li><a href="/index.html" id="home">Home</a></li> |
| <li><a href="/getting-help.html" id="getting-help">Getting Help</a></li> |
| <li><a href="/about/integrates.html" id="project-info">Project Information</a></li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" id="documentation">Documentation <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| |
| |
| <li><a href="/releases/2.4.0/index.html">2.4.0</a></li> |
| |
| |
| |
| <li><a href="/releases/2.3.0/index.html">2.3.0</a></li> |
| |
| |
| |
| <li><a href="/releases/2.2.1/index.html">2.2.1</a></li> |
| |
| |
| |
| <li><a href="/releases/2.2.0/index.html">2.2.0</a></li> |
| |
| |
| |
| <li><a href="/releases/2.1.1/index.html">2.1.1</a></li> |
| |
| |
| |
| <li><a href="/releases/2.1.0/index.html">2.1.0</a></li> |
| |
| |
| |
| <li><a href="/releases/2.0.0/index.html">2.0.0</a></li> |
| |
| |
| |
| <li><a href="/releases/1.2.4/index.html">1.2.4</a></li> |
| |
| |
| |
| <li><a href="/releases/1.2.3/index.html">1.2.3</a></li> |
| |
| |
| </ul> |
| </li> |
| <li><a href="/talksAndVideos.html">Talks and Slideshows</a></li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li> |
| <li><a href="/contribute/People.html">People</a></li> |
| <li><a href="/contribute/BYLAWS.html">ByLaws</a></li> |
| <li><a href="/Powered-By.html">PoweredBy</a></li> |
| </ul> |
| </li> |
| <li><a href="/2022/03/25/storm240-released.html" id="news">News</a></li> |
| </ul> |
| </nav> |
| </div> |
| </div> |
| |
| |
| |
| <div class="container-fluid"> |
| <h1 class="page-title">Locality Awareness In LoadAwareShuffleGrouping</h1> |
| <div class="row"> |
| <div class="col-md-12"> |
| <!-- Documentation --> |
| |
| <p class="post-meta"></p> |
| |
| <div class="documentation-content"><h1 id="locality-awareness-in-loadawareshufflegrouping">Locality Awareness In LoadAwareShuffleGrouping</h1> |
| |
| <h3 id="motivation">Motivation</h3> |
| |
| <p>Apache Storm has introduced locality awareness to LoadAwareShuffleGrouping based on Bang-Bang control theory. |
| It aims to keep traffic to closer downstream executors to avoid network latency when those executors are not under heavy load. |
| It can also avoid serialization/deserialization overhead if the traffic happens in the same worker. </p> |
| |
| <h3 id="how-it-works">How it works</h3> |
| |
| <p>An executor (say <code>E</code>) that uses LoadAwareShuffleGrouping to send tuples to downstream executors groups them into four <code>scopes</code> based on their location relative to <code>E</code> itself. |
| The four scopes are:</p> |
| |
| <ul> |
| <li><code>WORKER_LOCAL</code>: every downstream executor located on the same worker as this executor <code>E</code></li> |
| <li><code>HOST_LOCAL</code>: every downstream executor located on the same host as this executor <code>E</code></li> |
| <li><code>RACK_LOCAL</code>: every downstream executor located on the same rack as this executor <code>E</code></li> |
| <li><code>EVERYTHING</code>: every downstream executor of the executor <code>E</code></li> |
| </ul> |
| |
| <p>It starts with sending tuples to the downstream executors in the scope of <code>WORKER_LOCAL</code>. |
| The downstream executors in the scope are chosen based on their load. Executors with lower load are more likely to be chosen. |
| Once the average load of these <code>WORKER_LOCAL</code> executors reaches <code>topology.localityaware.higher.bound</code>, |
| it switches to the next higher scope, <code>HOST_LOCAL</code>, and starts sending tuples in that scope. |
| If the average load in that scope is still above the <code>higher bound</code>, it switches to the next higher scope again.</p> |
| |
| <p>On the other hand, it switches to a lower scope if the average load of the lower scope is less than <code>topology.localityaware.lower.bound</code>. </p> |
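<p>The switching rule described above can be sketched as a small state machine. This is only an illustrative model, not Storm's implementation: the names <code>Scope</code>, <code>next_scope</code> and <code>avg_load_by_scope</code> are hypothetical, and the bound values in the usage note are example numbers rather than a statement about Storm's defaults.</p>

```python
from enum import IntEnum


class Scope(IntEnum):
    """The four scopes, ordered from narrowest to widest."""
    WORKER_LOCAL = 0
    HOST_LOCAL = 1
    RACK_LOCAL = 2
    EVERYTHING = 3


def next_scope(current, avg_load_by_scope, lower_bound, higher_bound):
    """Return the scope to use next, moving at most one level per call.

    avg_load_by_scope is a hypothetical mapping from Scope to the average
    load (0.0 to 1.0) of the downstream executors in that scope.
    """
    # The current scope is overloaded: widen to the next higher scope.
    if avg_load_by_scope[current] >= higher_bound and current < Scope.EVERYTHING:
        return Scope(current + 1)
    # The next lower scope has recovered: narrow back down to it.
    if current > Scope.WORKER_LOCAL:
        lower = Scope(current - 1)
        if avg_load_by_scope[lower] < lower_bound:
            return lower
    # Between the two bounds nothing changes, which gives the hysteresis
    # typical of bang-bang control.
    return current
```

<p>For example, with a lower bound of 0.2 and a higher bound of 0.8, an executor in <code>HOST_LOCAL</code> stays there while the <code>WORKER_LOCAL</code> average load sits at 0.5, and only moves back once that load falls below 0.2.</p>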
| |
| <h3 id="how-is-load-calculated">How is Load calculated</h3> |
| |
| <p>The load of a downstream executor is the maximum of the following two values:</p> |
| |
| <ul> |
| <li>The population percentage of the receive queue</li> |
| <li><code>Math.min(pendingMessages, 1024) / 1024</code></li> |
| </ul> |
| |
| <p><code>pendingMessages</code>: The upstream executor <code>E</code> sends messages to the downstream executor through Netty, and <code>pendingMessages</code> is the number of messages that have not yet reached the server.</p> |
| |
| <p>If the downstream executor is located on the same worker as the executor <code>E</code>, the load of that downstream executor is:</p> |
| |
| <ul> |
| <li>The population percentage of the receive queue</li> |
| </ul> |
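<p>The two cases above can be written as one small function. This is a sketch under stated assumptions, not Storm's code: <code>executor_load</code>, <code>queue_fill</code> and <code>MAX_PENDING</code> are hypothetical names, the receive-queue population is assumed to be expressed as a fraction between 0.0 and 1.0, and passing <code>None</code> for <code>pending_messages</code> is used here to mark a downstream executor on the same worker.</p>

```python
MAX_PENDING = 1024  # cap applied to pendingMessages in the formula above


def executor_load(queue_fill, pending_messages=None):
    """Return the load of a downstream executor as a value in [0.0, 1.0].

    queue_fill: receive-queue population as a fraction (assumed 0.0 to 1.0).
    pending_messages: messages not yet delivered over Netty, or None when
    the downstream executor lives on the same worker as the sender.
    """
    if pending_messages is None:
        # Same worker: only the receive queue matters.
        return queue_fill
    pending_load = min(pending_messages, MAX_PENDING) / MAX_PENDING
    # Remote executor: take the worse of the two signals.
    return max(queue_fill, pending_load)
```

<p>With these assumptions, a remote executor whose queue is a quarter full but which has 512 pending messages gets a load of 0.5, because the pending-message term dominates.</p>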
| |
| <h3 id="relationship-between-load-and-capacity">Relationship between Load and Capacity</h3> |
| |
| <p>The capacity of a bolt executor on Storm UI is calculated as:</p> |
| |
| <ul> |
| <li>(number executed * average execute latency) / measurement time</li> |
| </ul> |
| |
| <p>It basically means how busy this executor is. If this is around 1.0, the corresponding Bolt is running as fast as it can. A <code>__capacity</code> metric exists to track this value for each executor.</p> |
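<p>As a worked example of the capacity formula, the following sketch computes the value for one executor. The function name and parameter names are hypothetical, and it assumes the execute latency and the measurement time are given in the same unit (milliseconds here).</p>

```python
def capacity(num_executed, avg_execute_latency_ms, measurement_time_ms):
    """Fraction of the measurement window the executor spent executing tuples."""
    if measurement_time_ms <= 0:
        raise ValueError("measurement time must be positive")
    return (num_executed * avg_execute_latency_ms) / measurement_time_ms


# 48,000 tuples at 10 ms each over a 10-minute (600,000 ms) window.
busy_fraction = capacity(48_000, 10, 600_000)
```

<p>Here <code>busy_fraction</code> is 0.8, meaning the executor spent about 80% of the window executing tuples and still has some headroom before it approaches 1.0.</p>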
| |
| <p>The <code>Capacity</code> is not related to the <code>Load</code>:</p> |
| |
| <ul> |
| <li>If the <code>Load</code> of the executor <code>E1</code> is high, |
| |
| <ul> |
| <li>the <code>Capacity</code> of <code>E1</code> could be high: the population of the receive queue of <code>E1</code> could be high, which means the executor <code>E1</code> has more work to do.</li> |
| <li>the <code>Capacity</code> could also be low: <code>pendingMessages</code> could be high because other executors share the Netty connection between the two workers and they are sending too many messages. |
| But the actual population of the receive queue of <code>E1</code> might be low.</li> |
| </ul></li> |
| <li>If the <code>Load</code> is low, |
| |
| <ul> |
| <li>the <code>Capacity</code> could be low: lower <code>Load</code> means less work to do. </li> |
| <li>the <code>Capacity</code> could also be high: the executor could be receiving and executing tuples at a similar average rate.</li> |
| </ul></li> |
| <li>If the <code>Capacity</code> is high, |
| |
| <ul> |
| <li>the <code>Load</code> could be high: high <code>Capacity</code> means the executor is busy. It could be because it's receiving too many tuples.</li> |
| <li>the <code>Load</code> could also be low: the executor could be receiving and executing tuples at a similar average rate.</li> |
| </ul></li> |
| <li>If the <code>Capacity</code> is low, |
| |
| <ul> |
| <li>the <code>Load</code> could be low: if <code>pendingMessages</code> is low.</li> |
| <li>the <code>Load</code> could also be high: because <code>pendingMessages</code> might be very high.</li> |
| </ul></li> |
| </ul> |
| |
| <h3 id="troubleshooting">Troubleshooting</h3> |
| |
| <h4 id="i-am-seeing-high-capacity-close-to-1-0-on-some-executors-and-low-capacity-close-to-0-on-other-executors">I am seeing high capacity (close to 1.0) on some executors and low capacity (close to 0) on other executors</h4> |
| |
| <ol> |
| <li><p>It could mean that you can reduce parallelism: your executors are able to keep up and the load never gets very high.</p></li> |
| <li><p>You can try to adjust <code>topology.localityaware.higher.bound</code> and <code>topology.localityaware.lower.bound</code></p></li> |
| <li><p>You can try to enable <code>topology.ras.order.executors.by.proximity.needs</code>. With this config, unassigned executors are sorted in topological order |
| with network proximity needs before being scheduled. This is a best-effort attempt to split the topology into slices and place the executors in each slice as physically close to each other as possible.</p></li> |
| </ol> |
| |
| <h4 id="i-just-want-the-capacity-on-every-downstream-executor-to-be-even">I just want the capacity on every downstream executor to be even</h4> |
| |
| <p>You can turn off LoadAwareShuffleGrouping by setting <code>topology.disable.loadaware.messaging</code> to <code>true</code>.</p> |
| </div> |
| |
| |
| </div> |
| </div> |
| </div> |
| <footer> |
| <div class="container-fluid"> |
| <div class="row"> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>Meetups</h5> |
| <ul class="latest-news"> |
| |
| <li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li> |
| |
| <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li> |
| |
| <li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li> |
| |
| <li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li> |
| |
| <li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li> |
| |
| <li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li> |
| |
| <!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> --> |
| </ul> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>About Apache Storm</h5> |
| <p>Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Apache Storm with database systems is easy.</p> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>First Look</h5> |
| <ul class="footer-list"> |
| <li><a href="/releases/current/Rationale.html">Rationale</a></li> |
| <li><a href="/releases/current/Tutorial.html">Tutorial</a></li> |
| <li><a href="/releases/current/Setting-up-development-environment.html">Setting up development environment</a></li> |
| <li><a href="/releases/current/Creating-a-new-Storm-project.html">Creating a new Apache Storm project</a></li> |
| </ul> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>Documentation</h5> |
| <ul class="footer-list"> |
| <li><a href="/releases/current/index.html">Index</a></li> |
| <li><a href="/releases/current/javadocs/index.html">Javadoc</a></li> |
| <li><a href="/releases/current/FAQ.html">FAQ</a></li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| <hr/> |
| <div class="row"> |
| <div class="col-md-12"> |
| <p align="center">Copyright © 2022 <a href="https://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. |
| <br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation. |
| <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> |
| </div> |
| </div> |
| </div> |
| </footer> |
| <!--Footer End--> |
| <!-- Scroll to top --> |
| <span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span> |
| |
| </body> |
| |
| </html> |
| |