<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Apache Aurora</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
<link href="/assets/css/main.css" rel="stylesheet">
<!-- Analytics -->
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-45879646-1']);
_gaq.push(['_setDomainName', 'apache.org']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body>
<div class="container-fluid section-header">
<div class="container">
<div class="nav nav-bar">
<a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300" alt="Transparent Apache Aurora logo with dark background"/></a>
<ul class="nav navbar-nav navbar-right">
<li><a href="/documentation/latest/">Documentation</a></li>
<li><a href="/community/">Community</a></li>
<li><a href="/downloads/">Downloads</a></li>
<li><a href="/blog/">Blog</a></li>
</ul>
</div>
</div>
</div>
<div class="container-fluid">
<div class="container content">
<div class="col-md-12 documentation">
<h5 class="page-header text-uppercase">Documentation
<select onChange="window.location.href='/documentation/' + this.value + '/resource-isolation/'"
value="0.7.0-incubating">
<option value="0.22.0"
>
0.22.0
(latest)
</option>
<option value="0.21.0"
>
0.21.0
</option>
<option value="0.20.0"
>
0.20.0
</option>
<option value="0.19.1"
>
0.19.1
</option>
<option value="0.19.0"
>
0.19.0
</option>
<option value="0.18.1"
>
0.18.1
</option>
<option value="0.18.0"
>
0.18.0
</option>
<option value="0.17.0"
>
0.17.0
</option>
<option value="0.16.0"
>
0.16.0
</option>
<option value="0.15.0"
>
0.15.0
</option>
<option value="0.14.0"
>
0.14.0
</option>
<option value="0.13.0"
>
0.13.0
</option>
<option value="0.12.0"
>
0.12.0
</option>
<option value="0.11.0"
>
0.11.0
</option>
<option value="0.10.0"
>
0.10.0
</option>
<option value="0.9.0"
>
0.9.0
</option>
<option value="0.8.0"
>
0.8.0
</option>
<option value="0.7.0-incubating"
selected="selected">
0.7.0-incubating
</option>
<option value="0.6.0-incubating"
>
0.6.0-incubating
</option>
<option value="0.5.0-incubating"
>
0.5.0-incubating
</option>
</select>
</h5>
<h1 id="resource-isolation-and-sizing">Resource Isolation and Sizing</h1>
<p><strong>NOTE</strong>: Resource Isolation and Sizing is very much a work in progress.
Both user-facing aspects and how it works under the hood are subject to
change.</p>
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#cpu-isolation">CPU Isolation</a></li>
<li><a href="#cpu-sizing">CPU Sizing</a></li>
<li><a href="#memory-isolation">Memory Isolation</a></li>
<li><a href="#memory-sizing">Memory Sizing</a></li>
<li><a href="#disk-space">Disk Space</a></li>
<li><a href="#disk-space-sizing">Disk Space Sizing</a></li>
<li><a href="#other-resources">Other Resources</a></li>
</ul>
<h2 id="introduction">Introduction</h2>
<p>Aurora is a multi-tenant system; a single software instance runs on a
server, serving multiple clients/tenants. To share resources among
tenants, it implements isolation of:</p>
<ul>
<li>CPU</li>
<li>memory</li>
<li>disk space</li>
</ul>
<p>CPU is a soft limit, and handled differently from memory and disk space.
Too low a CPU value results in throttling your application and
slowing it down. Memory and disk space are both hard limits; when your
application goes over these values, it&rsquo;s killed.</p>
<p>Let&rsquo;s look at each resource type in more detail:</p>
<h2 id="cpu-isolation">CPU Isolation</h2>
<p>Mesos uses a quota-based CPU scheduler (the <em>Completely Fair Scheduler</em>)
to provide consistent and predictable performance. This is effectively
a guarantee of resources &ndash; you receive at least what you requested, but
also no more than you&rsquo;ve requested.</p>
<p>The scheduler gives applications a CPU quota for every 100 ms interval.
When an application uses its quota for an interval, it is throttled for
the rest of the 100 ms. Usage resets for each interval and unused
quota does not carry over.</p>
<p>For example, an application specifying 4.0 CPU has access to 400 ms of
CPU time every 100 ms. This CPU quota can be used in different ways,
depending on the application and available resources. Consider the
scenarios shown in this diagram.</p>
<p><img alt="CPU Availability" src="../images/CPUavailability.png" /></p>
<ul>
<li><p><em>Scenario A</em>: the application can use up to 4 cores continuously for
every 100 ms interval. It is never throttled and starts processing
new requests immediately.</p></li>
<li><p><em>Scenario B</em>: the application uses up to 8 cores (depending on
availability) but is throttled after 50 ms. The CPU quota resets at the
start of each new 100 ms interval.</p></li>
<li><p><em>Scenario C</em>: like Scenario A, but a garbage collection
event in the second interval consumes all of the CPU quota. The
application is throttled for the remaining 75 ms of that interval and
cannot service requests until the next interval. In this example, the
garbage collection finishes within one interval but, depending on how much
garbage needs collecting, it may take more than one interval and further
delay the servicing of requests.</p></li>
</ul>
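<p>The quota arithmetic behind these scenarios can be sketched in a few lines of Python. This is only an illustrative model, not how Mesos or the kernel implements throttling: the function names are hypothetical, and the 16-core figure for Scenario C is inferred from the 75 ms of throttling described above (400 ms of quota used up in 25 ms).</p>

```python
PERIOD_MS = 100.0  # enforcement interval described in this section


def quota_ms(cpus: float) -> float:
    """CPU time available per interval for a given CPU allocation."""
    return cpus * PERIOD_MS


def throttled_ms(cpus: float, cores_in_use: float) -> float:
    """Time per interval spent throttled if the task keeps
    `cores_in_use` cores busy until its quota runs out."""
    if cores_in_use == 0:
        return 0.0
    # The task runs until the quota is spent or the interval ends.
    run_ms = min(PERIOD_MS, quota_ms(cpus) / cores_in_use)
    return PERIOD_MS - run_ms


print(throttled_ms(4.0, 4))   # Scenario A: 0.0
print(throttled_ms(4.0, 8))   # Scenario B: 50.0
print(throttled_ms(4.0, 16))  # Scenario C, garbage-collection interval: 75.0
```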
<p><em>Technical Note</em>: Mesos considers logical cores, also known as
hyperthreading or SMT cores, as the unit of CPU.</p>
<h2 id="cpu-sizing">CPU Sizing</h2>
<p>To correctly size Aurora-run Mesos tasks, specify a per-shard CPU value
that lets the task run at its desired performance when peak load is
distributed across all shards. Include reserve capacity of at least 50%,
possibly more, depending on how critical your service is (or how
confident you are about your original estimate), ideally by
increasing the number of shards, which also improves resiliency. When running
your application, observe its CPU stats over time. If usage is consistently at or
near your quota during peak load, consider increasing either the
per-shard CPU or the number of shards.</p>
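<p>As a rough sketch of this sizing rule, the following Python computes a shard count from an estimated peak CPU demand plus reserve, and flags usage that stays near the quota. The function names, the sample numbers, and the 90% threshold for "near your quota" are assumptions made for illustration; they are not part of Aurora.</p>

```python
import math


def shards_needed(peak_cpu: float, per_shard_cpu: float, reserve: float = 0.5) -> int:
    """Shard count that covers peak CPU demand plus a reserve fraction."""
    if not per_shard_cpu > 0:
        raise ValueError("per_shard_cpu must be positive")
    required = peak_cpu * (1.0 + reserve)
    return math.ceil(required / per_shard_cpu)


def near_quota(peak_samples, per_shard_cpu: float, threshold: float = 0.9) -> bool:
    """True if every peak-load CPU sample is at or above threshold * quota."""
    samples = list(peak_samples)
    return bool(samples) and all(s >= threshold * per_shard_cpu for s in samples)


# Hypothetical service: 24 cores of total peak demand, 4.0 CPU per shard.
print(shards_needed(24.0, 4.0))             # 9 shards with a 50% reserve
print(near_quota([3.8, 3.9, 4.0], 4.0))     # True: consider adding CPU or shards
print(near_quota([2.0, 3.9], 4.0))          # False
```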
<h2 id="memory-isolation">Memory Isolation</h2>
<p>Mesos uses dedicated memory allocation. Your application always has
access to the amount of memory specified in your configuration. The
application&rsquo;s memory use is defined as the sum of the resident set size
(RSS) of all processes in a shard. Each shard is considered
independently.</p>
<p>For example, if you specify a memory size of 10GB, each shard
receives 10GB of memory. If an individual shard&rsquo;s memory demands
exceed 10GB, that shard is killed, but the other shards continue
working.</p>
<p><em>Technical note</em>: Total memory size is not enforced at allocation time,
so your application can request more than its allocation without getting
an ENOMEM. However, it will be killed shortly after.</p>
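<p>The per-shard accounting described above can be modelled as follows. This is a simplified sketch with made-up process names and RSS values; it only mirrors the rule that a shard&rsquo;s usage is the sum of its processes&rsquo; RSS and that each shard is judged independently.</p>

```python
GB = 1024 ** 3


def shard_memory_bytes(rss_by_process: dict) -> int:
    """Memory use of one shard: the sum of the RSS of all its processes."""
    return sum(rss_by_process.values())


def shards_over_limit(shards: dict, limit_bytes: int) -> list:
    """Names of shards whose summed RSS exceeds the configured limit."""
    return sorted(
        name
        for name, processes in shards.items()
        if shard_memory_bytes(processes) > limit_bytes
    )


# Hypothetical job with a 10GB memory limit and two shards.
shards = {
    "shard-0": {"server": 6 * GB, "sidecar": 3 * GB},   # 9GB in total
    "shard-1": {"server": 8 * GB, "sidecar": 3 * GB},   # 11GB in total
}
print(shards_over_limit(shards, 10 * GB))  # ['shard-1']
```

<p>Only <code>shard-1</code> would be killed in this model; <code>shard-0</code> keeps running.</p>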
<h2 id="memory-sizing">Memory Sizing</h2>
<p>Size for your application&rsquo;s peak requirement. Observe the per-instance
memory statistics over time, as memory requirements can vary over
different periods. Remember that if your application exceeds its memory
value, it will be killed, so you should also add a safety margin of
around 10-20%. If you have the ability to do so, you may also want to
put alerts on the per-instance memory.</p>
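<p>A minimal sketch of this sizing advice, assuming you have per-instance peak memory samples in MB. The function names, the sample values, and the 90% alert threshold are illustrative assumptions; integer arithmetic is used so that the result rounds up exactly.</p>

```python
def memory_request_mb(peak_samples_mb, margin_percent: int = 20) -> int:
    """Observed peak plus a safety margin, rounded up to a whole MB."""
    peak = max(peak_samples_mb)
    return (peak * (100 + margin_percent) + 99) // 100


def should_alert(usage_mb: int, limit_mb: int, alert_percent: int = 90) -> bool:
    """True once usage reaches the given percentage of the memory limit."""
    return usage_mb * 100 >= limit_mb * alert_percent


# Hypothetical peaks observed over different periods, in MB.
request = memory_request_mb([610, 742, 800])
print(request)                      # 960
print(should_alert(900, request))   # True
print(should_alert(800, request))   # False
```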
<h2 id="disk-space">Disk Space</h2>
<p>Disk space used by your application is defined as the sum of the files&rsquo;
disk space in your application&rsquo;s directory, including the <code>stdout</code> and
<code>stderr</code> logged from your application. Each shard is considered
independently. You should use off-node storage for your application&rsquo;s
data whenever possible.</p>
<p>For example, if you specify a disk space size of 100MB, each shard
receives 100MB of disk space. If an individual shard&rsquo;s disk space
demands exceed 100MB, that shard is killed, but the other shards
continue working.</p>
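<p>To estimate how close a shard is to its disk limit, you can sum the file sizes under its directory. The sketch below is an approximation only: it uses apparent file sizes from <code>os.path.getsize</code>, which may differ from how usage is actually measured, and the temporary directory and file names stand in for a real sandbox.</p>

```python
import os
import tempfile


def directory_usage_bytes(root: str) -> int:
    """Sum of the sizes of all regular files under `root`, including stdout/stderr."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # do not follow symlinks out of the directory
                total += os.path.getsize(path)
    return total


# Stand-in sandbox with a log file and a nested data file.
with tempfile.TemporaryDirectory() as sandbox:
    with open(os.path.join(sandbox, "stdout"), "wb") as f:
        f.write(b"x" * 1500)
    os.mkdir(os.path.join(sandbox, "data"))
    with open(os.path.join(sandbox, "data", "cache.bin"), "wb") as f:
        f.write(b"y" * 500)
    used = directory_usage_bytes(sandbox)

print(used)  # 2000
```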
<p>After your application finishes running, its allocated disk space is
reclaimed. Your job&rsquo;s final action should therefore move any disk content
that you want to keep, such as logs, to your home file system or other
less transitory storage. Disk reclamation takes place at an undefined
time after the application finishes; until then, the disk contents
are still available, but you shouldn&rsquo;t count on that.</p>
<p><em>Technical note</em>: Disk space is not enforced at write time, so your
application can write above its quota without getting an ENOSPC, but it
will be killed shortly after. This is subject to change.</p>
<h2 id="disk-space-sizing">Disk Space Sizing</h2>
<p>Size for your application&rsquo;s peak requirement. Rotate and discard log
files as needed to stay within your quota. When running a Java process,
add the maximum size of the Java heap to your disk space requirement, to
account for a heap dump written into the application&rsquo;s sandbox space
when an out-of-memory error occurs.</p>
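<p>The arithmetic can be sketched as below. The breakdown into application data, rotated logs, and Java heap follows the paragraph above, but the function name and all of the figures are hypothetical.</p>

```python
MB = 1024 ** 2


def disk_request_bytes(data_bytes: int,
                       max_log_file_bytes: int,
                       retained_log_files: int,
                       java_max_heap_bytes: int = 0) -> int:
    """Peak disk need: data, rotated logs, and room for one Java heap dump."""
    log_bytes = max_log_file_bytes * retained_log_files
    return data_bytes + log_bytes + java_max_heap_bytes


# Hypothetical Java service: 200MB of data, 5 rotated logs of 20MB each,
# and a maximum heap of 1024MB.
request = disk_request_bytes(200 * MB, 20 * MB, 5, 1024 * MB)
print(request // MB)  # 1324
```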
<h2 id="other-resources">Other Resources</h2>
<p>Other resources, such as network bandwidth, do not have any performance
guarantees. For some resources, such as memory bandwidth, there are no
practical sharing methods so some application combinations collocated on
the same host may cause contention.</p>
</div>
</div>
</div>
<div class="container-fluid section-footer buffer">
<div class="container">
<div class="row">
<div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3>
<ul>
<li><a href="/downloads/">Downloads</a></li>
<li><a href="/community/">Mailing Lists</a></li>
<li><a href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li>
<li><a href="/documentation/latest/contributing/">How To Contribute</a></li>
</ul>
</div>
<div class="col-md-2"><h3>The ASF</h3>
<ul>
<li><a href="http://www.apache.org/licenses/">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
<li><a href="http://www.apache.org/security/">Security</a></li>
</ul>
</div>
<div class="col-md-6">
<p class="disclaimer">&copy; 2014-2017 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX photo</a> displayed on the homepage is available under a <a href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo are trademarks of The Apache Software Foundation.</p>
</div>
</div>
</div>
</body>
</html>