<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Apache Mesos - Oversubscription</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta property="og:locale" content="en_US"/>
<meta property="og:type" content="website"/>
<meta property="og:title" content="Apache Mesos"/>
<meta property="og:site_name" content="Apache Mesos"/>
<meta property="og:url" content="http://mesos.apache.org/"/>
<meta property="og:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/>
<meta property="og:description"
content="Apache Mesos abstracts resources away from machines,
enabling fault-tolerant and elastic distributed systems
to easily be built and run effectively."/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:site" content="@ApacheMesos"/>
<meta name="twitter:title" content="Apache Mesos"/>
<meta name="twitter:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/>
<meta name="twitter:description"
content="Apache Mesos abstracts resources away from machines,
enabling fault-tolerant and elastic distributed systems
to easily be built and run effectively."/>
<link href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" rel="stylesheet">
<link rel="alternate" type="application/atom+xml" title="Apache Mesos Blog" href="/blog/feed.xml">
<link href="../../assets/css/main.css" media="screen" rel="stylesheet" type="text/css" />
<!-- Google Analytics Magic -->
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-20226872-1']);
_gaq.push(['_setDomainName', 'apache.org']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body>
<!-- magical breadcrumbs -->
<div class="topnav">
<div class="container">
<ul class="breadcrumb">
<li>
<div class="dropdown">
<a data-toggle="dropdown" href="#">Apache Software Foundation <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu" aria-labelledby="dLabel">
<li><a href="http://www.apache.org">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
<li><a href="http://www.apache.org/security/">Security</a></li>
</ul>
</div>
</li>
<li><a href="http://mesos.apache.org">Apache Mesos</a></li>
<li><a href="/documentation/">Documentation</a></li>
</ul><!-- /.breadcrumb -->
</div><!-- /.container -->
</div><!-- /.topnav -->
<!-- navbar excitement -->
<div class="navbar navbar-default navbar-static-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#mesos-menu" aria-expanded="false">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/"><img src="/assets/img/mesos_logo.png" alt="Apache Mesos logo"/></a>
</div><!-- /.navbar-header -->
<div class="navbar-collapse collapse" id="mesos-menu">
<ul class="nav navbar-nav navbar-right">
<li><a href="/gettingstarted/">Getting Started</a></li>
<li><a href="/blog/">Blog</a></li>
<li><a href="/documentation/latest/">Documentation</a></li>
<li><a href="/downloads/">Downloads</a></li>
<li><a href="/community/">Community</a></li>
</ul>
</div><!-- /#mesos-menu -->
</div><!-- /.container -->
</div><!-- /.navbar -->
<div class="content">
<div class="container">
<div class="row-fluid">
<div class="col-md-4">
<h4>If you're new to Mesos</h4>
<p>See the <a href="/gettingstarted/">getting started</a> page for more
information about downloading, building, and deploying Mesos.</p>
<h4>If you'd like to get involved or you're looking for support</h4>
<p>See our <a href="/community/">community</a> page for more details.</p>
</div>
<div class="col-md-8">
<h1>Oversubscription</h1>
<p>High-priority user-facing services are typically provisioned on large clusters
for peak load and unexpected load spikes. Hence, for most of the time, the
provisioned resources remain underutilized. Oversubscription takes advantage of
temporarily unused resources to execute best-effort tasks such as background
analytics, video/image processing, chip simulations, and other low-priority
jobs.</p>
<h2>How does it work?</h2>
<p>Oversubscription was introduced in Mesos 0.23.0. It adds two new agent
components, a Resource Estimator and a Quality of Service (QoS) Controller,
and extends the existing resource allocator, resource monitor, and
Mesos agent. The new components and their interactions are illustrated below.</p>
<p><img src="/assets/img/documentation/oversubscription-overview.jpg" alt="Oversubscription overview" /></p>
<h3>Resource estimation</h3>
<ul>
<li><p>(1) The first step is to identify the amount of oversubscribed resources.
The resource estimator taps into the resource monitor and periodically gets
usage statistics via <code>ResourceStatistics</code> messages. The resource estimator
applies logic based on the collected resource statistics to determine the
amount of oversubscribed resources. This can be a series of control algorithms
based on measured resource usage slack (allocated but unused resources) and
allocation slack.</p></li>
<li><p>(2) The agent keeps polling estimates from the resource estimator and tracks
the latest estimate.</p></li>
<li><p>(3) The agent will send the total amount of oversubscribed resources to the
master when the latest estimate is different from the previous estimate.</p></li>
</ul>
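<p>Steps (2) and (3) can be illustrated with a minimal sketch. It is not the
agent's actual implementation: <code>Estimate</code> and <code>EstimateTracker</code> are
hypothetical names, and an estimate is simplified to a map from resource name
to a scalar amount. The sketch only shows the idea of remembering the latest
estimate and forwarding it when it differs from the previous one.</p>

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a set of scalar resources, e.g. {"cpus": 2.0}.
// This is an assumption for illustration, not the Mesos `Resources` class.
using Estimate = std::map<std::string, double>;

// Remembers the latest estimate and reports whether it changed.
class EstimateTracker
{
public:
  // Records `estimate`. Returns true if it differs from the previously
  // recorded estimate (or if nothing was recorded yet), which is when
  // the sketch assumes the master would need an update.
  bool update(const Estimate& estimate)
  {
    if (hasEstimate_ && latest_ == estimate) {
      return false;
    }

    latest_ = estimate;
    hasEstimate_ = true;
    return true;
  }

  // Returns the most recently recorded estimate.
  const Estimate& latest() const { return latest_; }

private:
  bool hasEstimate_ = false;
  Estimate latest_;
};
```

<p>Calling <code>update()</code> once per polling interval and forwarding only when it
returns true mirrors the behavior described in step (3).</p>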
<h3>Resource tracking &amp; scheduling algorithm</h3>
<ul>
<li>(4) The allocator keeps track of the oversubscribed resources separately
from regular resources and annotates those resources as <code>revocable</code>. It is up
to the resource estimator to determine which types of resources can be
oversubscribed. It is recommended to oversubscribe only <em>compressible</em>
resources such as CPU shares, bandwidth, etc.</li>
</ul>
<h3>Frameworks</h3>
<ul>
<li>(5) Frameworks can choose to launch tasks on revocable resources by using
the regular <code>launchTasks()</code> API. To safeguard frameworks that are not
designed to deal with preemption, only frameworks registering with the
<code>REVOCABLE_RESOURCES</code> capability set in their framework info will receive offers
with revocable resources. Furthermore, revocable resources cannot be
dynamically reserved and persistent volumes should not be created on revocable
disk resources.</li>
</ul>
<h3>Task launch</h3>
<ul>
<li>The revocable task is launched as usual when the <code>runTask</code> request is received
on the agent. The resources will still be marked as revocable and isolators
can take appropriate actions if certain resources need to be set up differently
for revocable and regular tasks.</li>
</ul>
<blockquote><p>NOTE: If any resource used by a task or executor is
revocable, the whole container is treated as a revocable container and can
therefore be killed or throttled by the QoS Controller.</p></blockquote>
<h3>Interference detection</h3>
<ul>
<li>(6) When the revocable task is running, it is important to constantly
monitor the original task running on those resources and guarantee
performance based on an SLA. In order to react to detected interference, the
QoS controller needs to be able to kill or throttle running revocable tasks.</li>
</ul>
<h2>Enabling frameworks to use oversubscribed resources</h2>
<p>Frameworks planning to use oversubscribed resources need to register with the
<code>REVOCABLE_RESOURCES</code> capability set:</p>
<pre><code class="{.cpp}">FrameworkInfo framework;
framework.set_name("Revocable framework");
framework.add_capabilities()-&gt;set_type(
FrameworkInfo::Capability::REVOCABLE_RESOURCES);
</code></pre>
<p>From that point on, the framework will start to receive revocable resources in
offers.</p>
<blockquote><p>NOTE: There is no guarantee that the Mesos cluster has oversubscription
enabled. If not, no revocable resources will be offered. See below for
instructions on how to configure Mesos for oversubscription.</p></blockquote>
<h3>Launching tasks using revocable resources</h3>
<p>Launching tasks using revocable resources is done through the existing
<code>launchTasks</code> API. Revocable resources will have the <code>revocable</code> field set. See
below for an example offer with regular and revocable resources.</p>
<pre><code class="{.json}">{
"id": "20150618-112946-201330860-5050-2210-0000",
"framework_id": "20141119-101031-201330860-5050-3757-0000",
"agent_id": "20150618-112946-201330860-5050-2210-S1",
"hostname": "foobar",
"resources": [
{
"name": "cpus",
"type": "SCALAR",
"scalar": {
"value": 2.0
},
"role": "*"
}, {
"name": "mem",
"type": "SCALAR",
"scalar": {
"value": 512.0
},
"role": "*"
},
{
"name": "cpus",
"type": "SCALAR",
"scalar": {
"value": 0.45
},
"role": "*",
"revocable": {}
}
]
}
</code></pre>
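<p>A framework usually needs to tell regular and revocable resources in an
offer apart before deciding what to launch. The following is a minimal sketch
of that separation. The <code>Resource</code> struct and the <code>total()</code> helper are
hypothetical simplifications for illustration, not the Mesos protobuf types; a
real scheduler would check whether the <code>revocable</code> field is set on each
resource in the offer.</p>

```cpp
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for a scalar resource in an offer.
struct Resource
{
  std::string name;
  double value;
  bool revocable;  // True when the `revocable` field is set.
};

// Sums the scalar resources called `name`, counting only revocable
// entries when `revocable` is true and only regular entries otherwise.
double total(
    const std::vector<Resource>& resources,
    const std::string& name,
    bool revocable)
{
  double sum = 0.0;
  for (const Resource& resource : resources) {
    if (resource.name == name && resource.revocable == revocable) {
      sum += resource.value;
    }
  }
  return sum;
}
```

<p>For the example offer above, <code>total(offer, "cpus", false)</code> covers the 2.0
regular cpus, while <code>total(offer, "cpus", true)</code> covers the 0.45 revocable
cpus that may be taken away at any time.</p>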
<h2>Writing a custom resource estimator</h2>
<p>The resource estimator estimates and predicts the total resources used on the
agent and informs the master about resources that can be oversubscribed. By
default, Mesos comes with a <code>noop</code> and a <code>fixed</code> resource estimator. The <code>noop</code>
estimator only provides an empty estimate to the agent and stalls, effectively
disabling oversubscription. The <code>fixed</code> estimator doesn&rsquo;t use the actual
measured slack, but oversubscribes the node with a fixed amount of resources
(defined via a command line flag).</p>
<p>The interface is defined below:</p>
<pre><code class="{.cpp}">class ResourceEstimator
{
public:
// Initializes this resource estimator. This method needs to be
// called before any other member method is called. It registers
// a callback in the resource estimator. The callback allows the
// resource estimator to fetch the current resource usage for each
// executor on agent.
virtual Try&lt;Nothing&gt; initialize(
const lambda::function&lt;process::Future&lt;ResourceUsage&gt;()&gt;&amp; usage) = 0;
// Returns the current estimation about the *maximum* amount of
// resources that can be oversubscribed on the agent. A new
// estimation will invalidate all the previously returned
// estimations. The agent will be calling this method periodically
// to forward it to the master. As a result, the estimator should
// respond with an estimate every time this method is called.
virtual process::Future&lt;Resources&gt; oversubscribable() = 0;
};
</code></pre>
<h2>Writing a custom QoS controller</h2>
<p>The interface for implementing custom QoS Controllers is defined below:</p>
<pre><code class="{.cpp}">class QoSController
{
public:
// Initializes this QoS Controller. This method needs to be
// called before any other member method is called. It registers
// a callback in the QoS Controller. The callback allows the
// QoS Controller to fetch the current resource usage for each
// executor on agent.
virtual Try&lt;Nothing&gt; initialize(
const lambda::function&lt;process::Future&lt;ResourceUsage&gt;()&gt;&amp; usage) = 0;
// A QoS Controller informs the agent about corrections to carry
// out, by returning futures to QoSCorrection objects. For more
// information, please refer to mesos.proto.
virtual process::Future&lt;std::list&lt;QoSCorrection&gt;&gt; corrections() = 0;
};
</code></pre>
<blockquote><p>NOTE: The QoS Controller must not block <code>corrections()</code>. Back the QoS
Controller with its own libprocess actor instead.</p></blockquote>
<p>The QoS Controller informs the agent that particular corrective actions need to
be made. Each corrective action contains information about the executor or task
and the type of action to perform.</p>
<p>Mesos comes with a <code>noop</code> and a <code>load</code> QoS controller. The <code>noop</code> controller
does not provide any corrections and thus does not assure any quality of service
for regular tasks. The <code>load</code> controller ensures that the total system load
doesn&rsquo;t exceed configurable thresholds, and thereby tries to avoid CPU
congestion on the node. If the load is above the thresholds, the controller evicts
all the revocable executors. These thresholds are configurable via two module
parameters, <code>load_threshold_5min</code> and <code>load_threshold_15min</code>. They represent
standard Unix load averages in the system. The 1-minute system load is ignored,
since for the oversubscription use case it can be a misleading signal.</p>
<pre><code class="{.proto}">message QoSCorrection {
enum Type {
KILL = 1; // Terminate an executor.
}
message Kill {
optional FrameworkID framework_id = 1;
optional ExecutorID executor_id = 2;
}
required Type type = 1;
optional Kill kill = 2;
}
</code></pre>
<h2>Configuring oversubscription</h2>
<p>Four new flags have been added to the agent:</p>
<table class="table table-striped">
<thead>
<tr>
<th width="30%">
Flag
</th>
<th>
Explanation
</th>
</tr>
</thead>
<tr>
<td>
--oversubscribed_resources_interval=VALUE
</td>
<td>
The agent periodically updates the master with the current estimation
about the total amount of oversubscribed resources that are allocated and
available. The interval between updates is controlled by this flag. (default:
15secs)
</td>
</tr>
<tr>
<td>
--qos_controller=VALUE
</td>
<td>
The name of the QoS Controller to use for oversubscription.
</td>
</tr>
<tr>
<td>
--qos_correction_interval_min=VALUE
</td>
<td>
The agent polls and carries out QoS corrections from the QoS Controller
based on its observed performance of running tasks. The smallest interval
between these corrections is controlled by this flag. (default: 0ns)
</td>
</tr>
<tr>
<td>
--resource_estimator=VALUE
</td>
<td>
The name of the resource estimator to use for oversubscription.
</td>
</tr>
</table>
<p>The <code>fixed</code> resource estimator is enabled as follows:</p>
<pre><code>--resource_estimator="org_apache_mesos_FixedResourceEstimator"
--modules='{
"libraries": {
"file": "/usr/local/lib64/libfixed_resource_estimator.so",
"modules": {
"name": "org_apache_mesos_FixedResourceEstimator",
"parameters": {
"key": "resources",
"value": "cpus:14"
}
}
}
}'
</code></pre>
<p>In the example above, a fixed amount of 14 cpus will be offered as revocable
resources.</p>
<p>The <code>load</code> QoS controller is enabled as follows:</p>
<pre><code>--qos_controller="org_apache_mesos_LoadQoSController"
--qos_correction_interval_min="20secs"
--modules='{
"libraries": {
"file": "/usr/local/lib64/libload_qos_controller.so",
"modules": {
"name": "org_apache_mesos_LoadQoSController",
"parameters": [
{
"key": "load_threshold_5min",
"value": "6"
},
{
"key": "load_threshold_15min",
"value": "4"
}
]
}
}
}'
</code></pre>
<p>In the example above, when the standard Unix 5-minute load average is above 6,
or the 15-minute load average is above 4, the agent will evict all the
<code>revocable</code> executors. <code>LoadQoSController</code> will effectively run every 20
seconds.</p>
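<p>The eviction rule described above can be sketched as a small predicate.
This is an illustration of the rule as stated in this document, not the code of
the <code>load</code> QoS controller. <code>LoadThresholds</code> and <code>shouldEvict()</code> are
hypothetical names, and the sketch assumes both thresholds are always
configured.</p>

```cpp
// Hypothetical thresholds mirroring the `load_threshold_5min` and
// `load_threshold_15min` module parameters. Assumes both are set.
struct LoadThresholds
{
  double load5min;
  double load15min;
};

// Returns true when either load average is strictly above its
// threshold, i.e. when the revocable executors should be evicted.
// The 1-minute load average is deliberately not considered.
bool shouldEvict(
    const LoadThresholds& thresholds,
    double load5min,
    double load15min)
{
  return load5min > thresholds.load5min ||
         load15min > thresholds.load15min;
}
```

<p>With thresholds of 6 and 4 as configured above, a 5-minute load of 6.5
triggers eviction even if the 15-minute load is low, while loads of exactly 6
and 4 do not, because the rule in this sketch is "above" rather than "at or
above".</p>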
<p>To install a custom resource estimator and QoS controller, please refer to the
<a href="/documentation/latest/./modules/">modules documentation</a>.</p>
</div>
</div>
</div><!-- /.container -->
</div><!-- /.content -->
<hr>
<!-- footer -->
<div class="footer">
<div class="container">
<div class="col-md-4 social-blk">
<span class="social">
<a href="https://twitter.com/ApacheMesos"
class="twitter-follow-button"
data-show-count="false" data-size="large">Follow @ApacheMesos</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
<a href="https://twitter.com/intent/tweet?button_hashtag=mesos"
class="twitter-hashtag-button"
data-size="large"
data-related="ApacheMesos">Tweet #mesos</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
</span>
</div>
<div class="col-md-8 trademark">
<p>&copy; 2012-2017 <a href="http://apache.org">The Apache Software Foundation</a>.
Apache Mesos, the Apache feather logo, and the Apache Mesos project logo are trademarks of The Apache Software Foundation.
</p>
</div>
</div><!-- /.container -->
</div><!-- /.footer -->
<!-- JS -->
<script src="//code.jquery.com/jquery-1.11.0.min.js" type="text/javascript"></script>
<script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" type="text/javascript"></script>
</body>
</html>