| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8"> |
| <title>Apache Mesos - Oversubscription</title> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| |
| <meta property="og:locale" content="en_US"/> |
| <meta property="og:type" content="website"/> |
| <meta property="og:title" content="Apache Mesos"/> |
| <meta property="og:site_name" content="Apache Mesos"/> |
| <meta property="og:url" content="http://mesos.apache.org/"/> |
| <meta property="og:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/> |
| <meta property="og:description" |
| content="Apache Mesos abstracts resources away from machines, |
| enabling fault-tolerant and elastic distributed systems |
| to easily be built and run effectively."/> |
| |
| <meta name="twitter:card" content="summary"/> |
| <meta name="twitter:site" content="@ApacheMesos"/> |
| <meta name="twitter:title" content="Apache Mesos"/> |
| <meta name="twitter:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/> |
| <meta name="twitter:description" |
| content="Apache Mesos abstracts resources away from machines, |
| enabling fault-tolerant and elastic distributed systems |
| to easily be built and run effectively."/> |
| |
| <link href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" rel="stylesheet"> |
| <link rel="alternate" type="application/atom+xml" title="Apache Mesos Blog" href="/blog/feed.xml"> |
| <link href="../../assets/css/main.css" media="screen" rel="stylesheet" type="text/css" /> |
| |
| |
| |
| |
| </head> |
| <body> |
| |
| <!-- navbar excitement --> |
| <div class="navbar navbar-default navbar-static-top" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#mesos-menu" aria-expanded="false"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <a class="navbar-brand" href="/"><img src="/assets/img/mesos_logo.png" alt="Apache Mesos logo"/></a> |
| </div><!-- /.navbar-header --> |
| |
| <div class="navbar-collapse collapse" id="mesos-menu"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li><a href="/gettingstarted/">Getting Started</a></li> |
| <li><a href="/blog/">Blog</a></li> |
| <li><a href="/documentation/latest/">Documentation</a></li> |
| <li><a href="/downloads/">Downloads</a></li> |
| <li><a href="/community/">Community</a></li> |
| </ul> |
| </div><!-- /#mesos-menu --> |
| </div><!-- /.container --> |
| </div><!-- /.navbar --> |
| |
| <div class="content"> |
| <div class="container"> |
| <div class="row-fluid"> |
| <div class="col-md-4"> |
| <h4>If you're new to Mesos</h4> |
| <p>See the <a href="/gettingstarted/">getting started</a> page for more |
| information about downloading, building, and deploying Mesos.</p> |
| |
| <h4>If you'd like to get involved or you're looking for support</h4> |
| <p>See our <a href="/community/">community</a> page for more details.</p> |
| </div> |
| <div class="col-md-8"> |
| <h1>Oversubscription</h1> |
| |
| <p>High-priority user-facing services are typically provisioned on large clusters |
| for peak load and unexpected load spikes. Hence, for most of the time, the |
| provisioned resources remain underutilized. Oversubscription takes advantage of |
| temporarily unused resources to execute best-effort tasks such as background |
| analytics, video/image processing, chip simulations, and other low-priority |
| jobs.</p> |
| |
| <h2>How does it work?</h2> |
| |
| <p>Oversubscription was introduced in Mesos 0.23.0. It adds two new agent |
| components, a Resource Estimator and a Quality of Service (QoS) Controller, |
| and extends the existing resource allocator, resource monitor, and |
| Mesos agent. The new components and their interactions are illustrated below.</p> |
| |
| <p><img src="/assets/img/documentation/oversubscription-overview.jpg" alt="Oversubscription overview" /></p> |
| |
| <h3>Resource estimation</h3> |
| |
| <ul> |
| <li><p>(1) The first step is to identify the amount of oversubscribed resources. |
| The resource estimator taps into the resource monitor and periodically gets |
| usage statistics via <code>ResourceStatistics</code> messages. The resource estimator |
| applies logic based on the collected resource statistics to determine the |
| amount of oversubscribed resources. This can be a series of control algorithms |
| based on measured resource usage slack (allocated but unused resources) and |
| allocation slack.</p></li> |
| <li><p>(2) The agent keeps polling estimates from the resource estimator and tracks |
| the latest estimate.</p></li> |
| <li><p>(3) The agent will send the total amount of oversubscribed resources to the |
| master when the latest estimate is different from the previous estimate.</p></li> |
| </ul> |
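<p>The estimation steps above can be illustrated with a minimal, self-contained
sketch. It is not the Mesos implementation: <code>ScalarResources</code>,
<code>usageSlack</code> and <code>shouldForward</code> are hypothetical names
introduced here, and resources are modelled as plain name/value pairs rather
than the Mesos <code>Resources</code> class.</p>

```cpp
#include <algorithm>
#include <map>
#include <string>

// Hypothetical stand-in for a set of scalar resources keyed by name
// (for example "cpus"). This is NOT the Mesos `Resources` class.
using ScalarResources = std::map<std::string, double>;

// Usage slack: resources that are allocated but currently unused.
// A resource that is used beyond its allocation yields zero slack.
ScalarResources usageSlack(
    const ScalarResources& allocated,
    const ScalarResources& used)
{
  ScalarResources slack;
  for (const auto& entry : allocated) {
    const auto it = used.find(entry.first);
    const double inUse = (it == used.end()) ? 0.0 : it->second;
    slack[entry.first] = std::max(0.0, entry.second - inUse);
  }
  return slack;
}

// Mirrors step (3): an estimate is only worth forwarding when it
// differs from the previous one.
bool shouldForward(
    const ScalarResources& previous,
    const ScalarResources& latest)
{
  return previous != latest;
}
```

<p>A real estimator would likely smooth the measurements over time before
reporting them; this sketch only shows the arithmetic behind usage slack.</p>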
| |
| |
| <h3>Resource tracking & scheduling algorithm</h3> |
| |
| <ul> |
| <li>(4) The allocator keeps track of the oversubscribed resources separately |
| from regular resources and annotates those resources as <code>revocable</code>. It is up |
| to the resource estimator to determine which types of resources can be |
| oversubscribed. It is recommended to oversubscribe only <em>compressible</em> |
| resources such as CPU shares, bandwidth, etc.</li> |
| </ul> |
| |
| |
| <h3>Frameworks</h3> |
| |
| <ul> |
| <li>(5) Frameworks can choose to launch tasks on revocable resources by using |
| the regular <code>launchTasks()</code> API. To safeguard frameworks that are not |
| designed to deal with preemption, only frameworks that register with the |
| <code>REVOCABLE_RESOURCES</code> capability set in their framework info will receive offers |
| with revocable resources. Furthermore, revocable resources cannot be |
| dynamically reserved and persistent volumes should not be created on revocable |
| disk resources.</li> |
| </ul> |
| |
| |
| <h3>Task launch</h3> |
| |
| <ul> |
| <li>The revocable task is launched as usual when the <code>runTask</code> request is received |
| on the agent. The resources will still be marked as revocable, and isolators |
| can take appropriate actions if certain resources need to be set up differently |
| for revocable and regular tasks.</li> |
| </ul> |
| |
| |
| <blockquote><p>NOTE: If any resource used by a task or executor is |
| revocable, the whole container is treated as a revocable container and can |
| therefore be killed or throttled by the QoS Controller.</p></blockquote> |
| |
| <h3>Interference detection</h3> |
| |
| <ul> |
| <li>(6) When the revocable task is running, it is important to constantly |
| monitor the original task running on those resources and guarantee |
| performance based on an SLA. In order to react to detected interference, the |
| QoS controller needs to be able to kill or throttle running revocable tasks.</li> |
| </ul> |
| |
| |
| <h2>Enabling frameworks to use oversubscribed resources</h2> |
| |
| <p>Frameworks planning to use oversubscribed resources need to register with the |
| <code>REVOCABLE_RESOURCES</code> capability set:</p> |
| |
| <pre><code class="{.cpp}">FrameworkInfo framework; |
| framework.set_name("Revocable framework"); |
| |
| framework.add_capabilities()->set_type( |
| FrameworkInfo::Capability::REVOCABLE_RESOURCES); |
| </code></pre> |
| |
| <p>From that point on, the framework will start to receive revocable resources in |
| offers.</p> |
| |
| <blockquote><p>NOTE: There is no guarantee that the Mesos cluster has oversubscription |
| enabled. If not, no revocable resources will be offered. See below for |
| instructions on how to configure Mesos for oversubscription.</p></blockquote> |
| |
| <h3>Launching tasks using revocable resources</h3> |
| |
| <p>Launching tasks using revocable resources is done through the existing |
| <code>launchTasks</code> API. Revocable resources will have the <code>revocable</code> field set. See |
| below for an example offer with regular and revocable resources.</p> |
| |
| <pre><code class="{.json}">{ |
| "id": "20150618-112946-201330860-5050-2210-0000", |
| "framework_id": "20141119-101031-201330860-5050-3757-0000", |
| "agent_id": "20150618-112946-201330860-5050-2210-S1", |
| "hostname": "foobar", |
| "resources": [ |
| { |
| "name": "cpus", |
| "type": "SCALAR", |
| "scalar": { |
| "value": 2.0 |
| }, |
| "role": "*" |
| }, { |
| "name": "mem", |
| "type": "SCALAR", |
| "scalar": { |
| "value": 512.0 |
| }, |
| "role": "*" |
| }, |
| { |
| "name": "cpus", |
| "type": "SCALAR", |
| "scalar": { |
| "value": 0.45 |
| }, |
| "role": "*", |
| "revocable": {} |
| } |
| ] |
| } |
| </code></pre> |
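<p>A framework will usually want to tell the regular and revocable portions of
such an offer apart before deciding what to launch. The sketch below shows one
way to do that on a simplified model of the offer. <code>OfferedResource</code>
and <code>total</code> are hypothetical names used for illustration only; a real
scheduler would inspect the <code>revocable</code> field of the offered resources
instead.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified mirror of one entry in the "resources"
// array of the offer shown above. NOT the Mesos `Resource` protobuf.
struct OfferedResource
{
  std::string name;
  double value;
  bool revocable;  // True when the "revocable" field is set.
};

// Sums the scalar value of all resources with the given name that
// match the requested revocability.
double total(
    const std::vector<OfferedResource>& resources,
    const std::string& name,
    bool revocable)
{
  double sum = 0.0;
  for (const OfferedResource& resource : resources) {
    if (resource.name == name && resource.revocable == revocable) {
      sum += resource.value;
    }
  }
  return sum;
}
```

<p>Applied to the example offer, <code>total(offer, "cpus", false)</code> covers
the 2.0 regular CPUs and <code>total(offer, "cpus", true)</code> covers the 0.45
revocable CPUs, so best-effort work can be sized against the revocable share
only.</p>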
| |
| <h2>Writing a custom resource estimator</h2> |
| |
| <p>The resource estimator estimates and predicts the total resources used on the |
| agent and informs the master about resources that can be oversubscribed. By |
| default, Mesos comes with a <code>noop</code> and a <code>fixed</code> resource estimator. The <code>noop</code> |
| estimator only provides an empty estimate to the agent and stalls, effectively |
| disabling oversubscription. The <code>fixed</code> estimator doesn’t use the actual |
| measured slack, but oversubscribes the node with a fixed amount of resources |
| (defined via a command line flag).</p> |
| |
| <p>The interface is defined below:</p> |
| |
| <pre><code class="{.cpp}">class ResourceEstimator |
| { |
| public: |
| // Initializes this resource estimator. This method needs to be |
| // called before any other member method is called. It registers |
| // a callback in the resource estimator. The callback allows the |
| // resource estimator to fetch the current resource usage for each |
| // executor on agent. |
| virtual Try<Nothing> initialize( |
| const lambda::function<process::Future<ResourceUsage>()>& usage) = 0; |
| |
| // Returns the current estimation about the *maximum* amount of |
| // resources that can be oversubscribed on the agent. A new |
| // estimation will invalidate all the previously returned |
| // estimations. The agent will be calling this method periodically |
| // to forward it to the master. As a result, the estimator should |
| // respond with an estimate every time this method is called. |
| virtual process::Future<Resources> oversubscribable() = 0; |
| }; |
| </code></pre> |
| |
| <h2>Writing a custom QoS controller</h2> |
| |
| <p>The interface for implementing custom QoS Controllers is defined below:</p> |
| |
| <pre><code class="{.cpp}">class QoSController |
| { |
| public: |
| // Initializes this QoS Controller. This method needs to be |
| // called before any other member method is called. It registers |
| // a callback in the QoS Controller. The callback allows the |
| // QoS Controller to fetch the current resource usage for each |
| // executor on agent. |
| virtual Try<Nothing> initialize( |
| const lambda::function<process::Future<ResourceUsage>()>& usage) = 0; |
| |
| // A QoS Controller informs the agent about corrections to carry |
| // out, by returning futures to QoSCorrection objects. For more |
| // information, please refer to mesos.proto. |
| virtual process::Future<std::list<QoSCorrection>> corrections() = 0; |
| }; |
| </code></pre> |
| |
| <blockquote><p>NOTE: The QoS Controller must not block <code>corrections()</code>. Back the QoS |
| Controller with its own libprocess actor instead.</p></blockquote> |
| |
| <p>The QoS Controller informs the agent that particular corrective actions need to |
| be made. Each corrective action contains information about executor or task and |
| the type of action to perform.</p> |
| |
| <p>Mesos comes with a <code>noop</code> and a <code>load</code> QoS controller. The <code>noop</code> controller |
| does not provide any corrections and therefore does not assure any quality of |
| service for regular tasks. The <code>load</code> controller ensures that the total system |
| load does not exceed configurable thresholds, and thereby tries to avoid CPU |
| congestion on the node. If the load is above the thresholds, the controller evicts |
| all the revocable executors. These thresholds are configurable via two module |
| parameters, <code>load_threshold_5min</code> and <code>load_threshold_15min</code>. They represent |
| standard Unix load averages in the system. The 1-minute system load is ignored, |
| since for the oversubscription use case it can be a misleading signal.</p> |
| |
| <pre><code class="{.proto}">message QoSCorrection { |
| enum Type { |
| KILL = 1; // Terminate an executor. |
| } |
| |
| message Kill { |
| optional FrameworkID framework_id = 1; |
| optional ExecutorID executor_id = 2; |
| } |
| |
| required Type type = 1; |
| optional Kill kill = 2; |
| } |
| </code></pre> |
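<p>The eviction rule of the <code>load</code> controller described above can be
sketched as a small pure function. This is an illustration under stated
assumptions, not the controller's actual code: <code>LoadThresholds</code>,
<code>Executor</code> and <code>executorsToKill</code> are hypothetical names,
and the load averages are passed in as arguments instead of being read from the
system.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical thresholds mirroring the two module parameters
// `load_threshold_5min` and `load_threshold_15min`.
struct LoadThresholds
{
  double load5min;
  double load15min;
};

// Hypothetical, simplified view of an executor running on the agent.
struct Executor
{
  std::string id;
  bool revocable;
};

// Returns the IDs of all revocable executors when either load average
// is above its threshold, and an empty list otherwise. The 1 minute
// load average is deliberately not part of the decision.
std::vector<std::string> executorsToKill(
    double load5,
    double load15,
    const LoadThresholds& thresholds,
    const std::vector<Executor>& executors)
{
  std::vector<std::string> kills;

  const bool overloaded =
    load5 > thresholds.load5min || load15 > thresholds.load15min;

  if (!overloaded) {
    return kills;
  }

  for (const Executor& executor : executors) {
    if (executor.revocable) {
      kills.push_back(executor.id);
    }
  }

  return kills;
}
```

<p>In a real controller each returned ID would presumably be turned into a
<code>QoSCorrection</code> of type <code>KILL</code>; regular executors are never
selected.</p>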
| |
| <h2>Configuring oversubscription</h2> |
| |
| <p>Four new flags have been added to the agent:</p> |
| |
| <table class="table table-striped"> |
| <thead> |
| <tr> |
| <th width="30%"> |
| Flag |
| </th> |
| <th> |
| Explanation |
| </th> |
| </tr> |
| </thead> |
| |
| <tr> |
| <td> |
| --oversubscribed_resources_interval=VALUE |
| </td> |
| <td> |
| The agent periodically updates the master with the current estimation |
| about the total amount of oversubscribed resources that are allocated and |
| available. The interval between updates is controlled by this flag. (default: |
| 15secs) |
| </td> |
| </tr> |
| |
| <tr> |
| <td> |
| --qos_controller=VALUE |
| </td> |
| <td> |
| The name of the QoS Controller to use for oversubscription. |
| </td> |
| </tr> |
| |
| <tr> |
| <td> |
| --qos_correction_interval_min=VALUE |
| </td> |
| <td> |
| The agent polls and carries out QoS corrections from the QoS Controller |
| based on its observed performance of running tasks. The smallest interval |
| between these corrections is controlled by this flag. (default: 0ns) |
| </td> |
| </tr> |
| |
| <tr> |
| <td> |
| --resource_estimator=VALUE |
| </td> |
| <td> |
| The name of the resource estimator to use for oversubscription. |
| </td> |
| </tr> |
| |
| </table> |
| |
| |
| <p>The <code>fixed</code> resource estimator is enabled as follows:</p> |
| |
| <pre><code>--resource_estimator="org_apache_mesos_FixedResourceEstimator" |
| |
| --modules='{ |
| "libraries": { |
| "file": "/usr/local/lib64/libfixed_resource_estimator.so", |
| "modules": { |
| "name": "org_apache_mesos_FixedResourceEstimator", |
| "parameters": { |
| "key": "resources", |
| "value": "cpus:14" |
| } |
| } |
| } |
| }' |
| </code></pre> |
| |
| <p>In the example above, a fixed amount of 14 CPUs will be offered as revocable |
| resources.</p> |
| |
| <p>The <code>load</code> QoS controller is enabled as follows:</p> |
| |
| <pre><code>--qos_controller="org_apache_mesos_LoadQoSController" |
| |
| --qos_correction_interval_min="20secs" |
| |
| --modules='{ |
| "libraries": { |
| "file": "/usr/local/lib64/libload_qos_controller.so", |
| "modules": { |
| "name": "org_apache_mesos_LoadQoSController", |
| "parameters": [ |
| { |
| "key": "load_threshold_5min", |
| "value": "6" |
| }, |
| { |
| "key": "load_threshold_15min", |
| "value": "4" |
| } |
| ] |
| } |
| } |
| }' |
| </code></pre> |
| |
| <p>In the example above, when the standard Unix system load average for 5 minutes |
| is above 6, or the average for 15 minutes is above 4, the agent will evict all the |
| <code>revocable</code> executors. <code>LoadQoSController</code> will effectively run every 20 |
| seconds.</p> |
| |
| <p>To install a custom resource estimator and QoS controller, please refer to the |
| <a href="/documentation/latest/./modules/">modules documentation</a>.</p> |
| |
| </div> |
| </div> |
| |
| </div><!-- /.container --> |
| </div><!-- /.content --> |
| |
| <hr> |
| |
| |
| |
| <!-- footer --> |
| <div class="footer"> |
| <div class="container"> |
| |
| <div class="col-md-8 trademark"> |
| <p>© 2012-2017 <a href="http://apache.org">The Apache Software Foundation</a>. |
| Apache Mesos, the Apache feather logo, and the Apache Mesos project logo are trademarks of The Apache Software Foundation. |
| </p> |
| </div> |
| </div><!-- /.container --> |
| </div><!-- /.footer --> |
| |
| <!-- JS --> |
| <script src="//code.jquery.com/jquery-1.11.0.min.js" type="text/javascript"></script> |
| <script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" type="text/javascript"></script> |
| </body> |
| </html> |