blob: 7b867bf212e86e8dcf9ede60833ba3e3a440fb57 [file] [log] [blame]
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Apache Mesos - Mesos Containerizer</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta property="og:locale" content="en_US"/>
<meta property="og:type" content="website"/>
<meta property="og:title" content="Apache Mesos"/>
<meta property="og:site_name" content="Apache Mesos"/>
<meta property="og:url" content="http://mesos.apache.org/"/>
<meta property="og:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/>
<meta property="og:description"
content="Apache Mesos abstracts resources away from machines,
enabling fault-tolerant and elastic distributed systems
to easily be built and run effectively."/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:site" content="@ApacheMesos"/>
<meta name="twitter:title" content="Apache Mesos"/>
<meta name="twitter:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/>
<meta name="twitter:description"
content="Apache Mesos abstracts resources away from machines,
enabling fault-tolerant and elastic distributed systems
to easily be built and run effectively."/>
<link href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" rel="stylesheet">
<link rel="alternate" type="application/atom+xml" title="Apache Mesos Blog" href="/blog/feed.xml">
<link href="../../assets/css/main.css" media="screen" rel="stylesheet" type="text/css" />
<!-- Google Analytics Magic -->
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-20226872-1']);
_gaq.push(['_setDomainName', 'apache.org']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body>
<!-- magical breadcrumbs -->
<div class="topnav">
<div class="container">
<ul class="breadcrumb">
<li>
<div class="dropdown">
<a data-toggle="dropdown" href="#">Apache Software Foundation <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu" aria-labelledby="dLabel">
<li><a href="http://www.apache.org">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
<li><a href="http://www.apache.org/security/">Security</a></li>
</ul>
</div>
</li>
<li><a href="http://mesos.apache.org">Apache Mesos</a></li>
<li><a href="/documentation
/">Documentation
</a></li>
</ul><!-- /.breadcrumb -->
</div><!-- /.container -->
</div><!-- /.topnav -->
<!-- navbar excitement -->
<div class="navbar navbar-default navbar-static-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#mesos-menu" aria-expanded="false">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/"><img src="/assets/img/mesos_logo.png" alt="Apache Mesos logo"/></a>
</div><!-- /.navbar-header -->
<div class="navbar-collapse collapse" id="mesos-menu">
<ul class="nav navbar-nav navbar-right">
<li><a href="/gettingstarted/">Getting Started</a></li>
<li><a href="/blog/">Blog</a></li>
<li><a href="/documentation/latest/">Documentation</a></li>
<li><a href="/downloads/">Downloads</a></li>
<li><a href="/community/">Community</a></li>
</ul>
</div><!-- /#mesos-menu -->
</div><!-- /.container -->
</div><!-- /.navbar -->
<div class="content">
<div class="container">
<div class="row-fluid">
<div class="col-md-4">
<h4>If you're new to Mesos</h4>
<p>See the <a href="/gettingstarted/">getting started</a> page for more
information about downloading, building, and deploying Mesos.</p>
<h4>If you'd like to get involved or you're looking for support</h4>
<p>See our <a href="/community/">community</a> page for more details.</p>
</div>
<div class="col-md-8">
<h1>Mesos Containerizer</h1>
<p>The MesosContainerizer provides lightweight containerization and
resource isolation of executors using Linux-specific functionality
such as control cgroups and namespaces. It is composable so operators
can selectively enable different isolators.</p>
<p>It also provides basic support for POSIX systems (e.g., OSX) but
without any actual isolation, only resource usage reporting.</p>
<h3>Shared Filesystem</h3>
<p>The SharedFilesystem isolator can optionally be used on Linux hosts to
enable modifications to each container&rsquo;s view of the shared
filesystem.</p>
<p>The modifications are specified in the ContainerInfo included in the
ExecutorInfo, either by a framework or by using the
<code>--default_container_info</code> agent flag.</p>
<p>ContainerInfo specifies Volumes which map parts of the shared
filesystem (host_path) into the container&rsquo;s view of the filesystem
(container_path), as read-write or read-only. The host_path can be
absolute, in which case it will make the filesystem subtree rooted at
host_path also accessible under container_path for each container.
If host_path is relative then it is considered as a directory
relative to the executor&rsquo;s work directory. The directory will be
created and permissions copied from the corresponding directory (which
must exist) in the shared filesystem.</p>
<p>The primary use-case for this isolator is to selectively make parts of
the shared filesystem private to each container. For example, a
private &ldquo;/tmp&rdquo; directory can be achieved with <code>host_path="tmp"</code> and
<code>container_path="/tmp"</code> which will create a directory &ldquo;tmp&rdquo; inside the
executor&rsquo;s work directory (mode 1777) and simultaneously mount it as
/tmp inside the container. This is transparent to processes running
inside the container. Containers will not be able to see the host&rsquo;s
/tmp or any other container&rsquo;s /tmp.</p>
<h3>Pid Namespace</h3>
<p>The Pid Namespace isolator can be used to isolate each container in
a separate pid namespace with two main benefits:</p>
<ol>
<li><p>Visibility: Processes running in the container (executor and
descendants) are unable to see or signal processes outside the
namespace.</p></li>
<li><p>Clean termination: Termination of the leading process in a pid
namespace will result in the kernel terminating all other processes
in the namespace.</p></li>
</ol>
<p>The Launcher will use (2) during destruction of a container in
preference to the freezer cgroup, avoiding known kernel issues related
to freezing cgroups under OOM conditions.</p>
<p>/proc will be mounted for containers so tools such as &lsquo;ps&rsquo; will work
correctly.</p>
<h3>Posix Disk Isolator</h3>
<p>The Posix Disk isolator provides basic disk isolation. It is able to
report the disk usage for each sandbox and optionally enforce the disk
quota. It can be used on both Linux and OS X.</p>
<p>To enable the Posix Disk isolator, append <code>disk/du</code> to the <code>--isolation</code>
flag when starting the agent.</p>
<p>By default, the disk quota enforcement is disabled. To enable it,
specify <code>--enforce_container_disk_quota</code> when starting the agent.</p>
<p>The Posix Disk isolator reports disk usage for each sandbox by
periodically running the <code>du</code> command. The disk usage can be retrieved
from the resource statistics endpoint (<a href="/documentation/latest/./endpoints/slave/monitor/statistics/">/monitor/statistics</a>).</p>
<p>The interval between two <code>du</code>s can be controlled by the agent flag
<code>--container_disk_watch_interval</code>. For example,
<code>--container_disk_watch_interval=1mins</code> sets the interval to be 1
minute. The default interval is 15 seconds.</p>
<h3>XFS Disk Isolator</h3>
<p>The XFS Disk isolator uses XFS project quotas to track the disk
space used by each container sandbox and to enforce the corresponding
disk space allocation. Write operations performed by tasks exceeding
their disk allocation will fail with an <code>EDQUOT</code> error. The task
will not be terminated by the containerizer.</p>
<p>The XFS disk isolator is functionally similar to Posix Disk isolator
but avoids the cost of repeatedly running the <code>du</code>. Though they will
not interfere with each other, it is not recommended to use them together.</p>
<p>To enable the XFS Disk isolator, append <code>disk/xfs</code> to the <code>--isolation</code>
flag when starting the agent.</p>
<p>The XFS Disk isolator requires the sandbox directory to be located
on an XFS filesystem that is mounted with the <code>pquota</code> option. There
is no need to configure
<a href="http://man7.org/linux/man-pages/man5/projects.5.html">projects</a>
or <a href="http://man7.org/linux/man-pages/man5/projid.5.html">projid</a>
files. The range of project IDs given to the <code>--xfs_project_range</code>
must not overlap any project IDs allocated for other uses.</p>
<p>The XFS disk isolator does not natively support an accounting-only mode
like that of the Posix Disk isolator. Quota enforcement can be disabled
by mounting the filesystem with the <code>pqnoenforce</code> mount option.</p>
<p>The <a href="http://man7.org/linux/man-pages/man8/xfs_quota.8.html">xfs_quota</a>
command can be used to show the current allocation of project IDs
and quota. For example:</p>
<pre><code>$ xfs_quota -x -c "report -a -n -L 5000 -U 1000"
</code></pre>
<p>To show which project a file belongs to, use the
<a href="http://man7.org/linux/man-pages/man8/xfs_io.8.html">xfs_io</a> command
to display the <code>fsxattr.projid</code> field. For example:</p>
<pre><code>$ xfs_io -r -c stat /mnt/mesos/
</code></pre>
<p>Note that the Posix Disk isolator flags <code>--enforce_container_disk_quota</code>,
<code>--container_disk_watch_interval</code> and <code>--enforce_container_disk_quota</code> do
not apply to the XFS Disk isolator.</p>
<h3>Docker Runtime Isolator</h3>
<p>The Docker Runtime isolator is used for supporting runtime
configurations from the docker image (e.g., Entrypoint/Cmd, Env,
etc.). This isolator is tied with <code>--image_providers=docker</code>. If
<code>--image_providers</code> contains <code>docker</code>, this isolator must be used.
Otherwise, the agent will refuse to start.</p>
<p>To enable the Docker Runtime isolator, append <code>docker/runtime</code> to the
<code>--isolation</code> flag when starting the agent.</p>
<p>Currently, docker image default <code>Entrypoint</code>, <code>Cmd</code>, <code>Env</code>, and <code>WorkingDir</code> are
supported with docker runtime isolator. Users can specify <code>CommandInfo</code> to
override the default <code>Entrypoint</code> and <code>Cmd</code> in the image (see below for
details). The <code>CommandInfo</code> should be inside of either <code>TaskInfo</code> or
<code>ExecutorInfo</code> (depending on whether the task is a command task or uses a custom
executor, respectively).</p>
<h4>Determine the Launch Command</h4>
<p>If the user specifies a command in <code>CommandInfo</code>, that will override the
default Entrypoint/Cmd in the docker image. Otherwise, we will use the
default Entrypoint/Cmd and append arguments specified in <code>CommandInfo</code>
accordingly. The details are explained in the following table.</p>
<p>Users can specify <code>CommandInfo</code> including <code>shell</code>, <code>value</code> and
<code>arguments</code>, which are represented in the first column of the table
below. <code>0</code> represents <code>not specified</code>, while <code>1</code> represents
<code>specified</code>. The first row is how <code>Entrypoint</code> and <code>Cmd</code> defined in
the docker image. All cells in the table, except the first column and
row, as well as cells labeled as <code>Error</code>, have the first element
(i.e., <code>/Entrypt[0]</code>) as executable, and the rest as appending
arguments.</p>
<table class="table table-striped">
<tr>
<th></th>
<th>Entrypoint=0<br>Cmd=0</th>
<th>Entrypoint=0<br>Cmd=1</th>
<th>Entrypoint=1<br>Cmd=0</th>
<th>Entrypoint=1<br>Cmd=1</th>
</tr>
<tr>
<td>sh=0<br>value=0<br>argv=0</td>
<td>Error</td>
<td>/Cmd[0]<br>Cmd[1]..</td>
<td>/Entrypt[0]<br>Entrypt[1]..</td>
<td>/Entrypt[0]<br>Entrypt[1]..<br>Cmd..</td>
</tr>
<tr>
<td>sh=0<br>value=0<br>argv=1</td>
<td>Error</td>
<td>/Cmd[0]<br>argv</td>
<td>/Entrypt[0]<br>Entrypt[1]..<br>argv</td>
<td>/Entrypt[0]<br>Entrypt[1]..<br>argv</td>
</tr>
<tr>
<td>sh=0<br>value=1<br>argv=0</td>
<td>/value</td>
<td>/value</td>
<td>/value</td>
<td>/value</td>
</tr>
<tr>
<td>sh=0<br>value=1<br>argv=1</td>
<td>/value<br>argv</td>
<td>/value<br>argv</td>
<td>/value<br>argv</td>
<td>/value<br>argv</td>
</tr>
<tr>
<td>sh=1<br>value=0<br>argv=0</td>
<td>Error</td>
<td>Error</td>
<td>Error</td>
<td>Error</td>
</tr>
<tr>
<td>sh=1<br>value=0<br>argv=1</td>
<td>Error</td>
<td>Error</td>
<td>Error</td>
<td>Error</td>
</tr>
<tr>
<td>sh=1<br>value=1<br>argv=0</td>
<td>/bin/sh -c<br>value</td>
<td>/bin/sh -c<br>value</td>
<td>/bin/sh -c<br>value</td>
<td>/bin/sh -c<br>value</td>
</tr>
<tr>
<td>sh=1<br>value=1<br>argv=1</td>
<td>/bin/sh -c<br>value</td>
<td>/bin/sh -c<br>value</td>
<td>/bin/sh -c<br>value</td>
<td>/bin/sh -c<br>value</td>
</tr>
</table>
<h3>The <code>cgroups/net_cls</code> Isolator</h3>
<p>The cgroups/net_cls isolator allows operators to provide network
performance isolation and network segmentation for containers within
a Mesos cluster. To enable the cgroups/net_cls isolator, append
<code>cgroups/net_cls</code> to the <code>--isolation</code> flag when starting the agent.</p>
<p>As the name suggests, the isolator enables the net_cls subsystem for
Linux cgroups and assigns a net_cls cgroup to each container launched
by the <code>MesosContainerizer</code>. The objective of the net_cls subsystem
is to allow the kernel to tag packets originating from a container
with a 32-bit handle. These handles can be used by kernel modules such
as <code>qdisc</code> (for traffic engineering) and <code>net-filter</code> (for
firewall) to enforce network performance and security policies
specified by the operators. The policies, based on the net_cls
handles, can be specified by the operators through user-space tools
such as
<a href="http://tldp.org/HOWTO/Traffic-Control-HOWTO/software.html#s-iproute2-tc">tc</a>
and <a href="http://linux.die.net/man/8/iptables">iptables</a>.</p>
<p>The 32-bit handle associated with a net_cls cgroup can be specified by
writing the handle to the <code>net_cls.classid</code> file, present within the
net_cls cgroup. The 32-bit handle is of the form <code>0xAAAABBBB</code>, and
consists of a 16-bit primary handle 0xAAAA and a 16-bit secondary
handle 0xBBBB. You can read more about the use cases for the primary
and secondary handles in the <a href="https://www.kernel.org/doc/Documentation/cgroup-v1/net_cls.txt">Linux kernel documentation for
net_cls</a>.</p>
<p>By default, the cgroups/net_cls isolator does not manage the net_cls
handles, and assumes the operator is going to manage/assign these
handles. To enable the management of net_cls handles by the
cgroups/net_cls isolator you need to specify a 16-bit primary handle,
of the form 0xAAAA, using the <code>--cgroups_net_cls_primary_handle</code> flag at
agent startup.</p>
<p>Once a primary handle has been specified for an agent, for each
container the cgroups/net_cls isolator allocates a 16-bit secondary
handle. It then assigns the 32-bit combination of the primary and
secondary handle to the net_cls cgroup associated with the container
by writing to <code>net_cls.classid</code>. The cgroups/net_cls isolator exposes
the assigned net_cls handle to operators by exposing the handle as
part of the <code>ContainerStatus</code> &mdash;associated with any task running within
the container&mdash; in the agent&rsquo;s <a href="/documentation/latest/./endpoints/slave/state/">/state</a> endpoint.</p>
<h3>The <code>docker/volume</code> Isolator</h3>
<p>This is described in a <a href="/documentation/latest/./docker-volume/">separate document</a>.</p>
<h3>The <code>namespaces/ipc</code> Isolator</h3>
<p>The IPC Namespace isolator can be used on Linux to place tasks
in a distinct IPC namespace. The benefit of this is that any
<a href="http://man7.org/linux/man-pages/man7/svipc.7.html">IPC objects</a> created
in the container will be automatically removed when the container is
destroyed.</p>
<h3>The <code>network/cni</code> Isolator</h3>
<p>This is described in a <a href="/documentation/latest/./cni/">separate document</a>.</p>
<h3>The <code>linux/capabilities</code> Isolator</h3>
<p>This is described in a <a href="/documentation/latest/./linux_capabilities/">separate document</a>.</p>
<h3>The <code>posix/rlimits</code> Isolator</h3>
<p>This is described in a <a href="/documentation/latest/./posix_rlimits/">separate document</a>.</p>
</div>
</div>
</div><!-- /.container -->
</div><!-- /.content -->
<hr>
<!-- footer -->
<div class="footer">
<div class="container">
<div class="col-md-4 social-blk">
<span class="social">
<a href="https://twitter.com/ApacheMesos"
class="twitter-follow-button"
data-show-count="false" data-size="large">Follow @ApacheMesos</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
<a href="https://twitter.com/intent/tweet?button_hashtag=mesos"
class="twitter-hashtag-button"
data-size="large"
data-related="ApacheMesos">Tweet #mesos</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
</span>
</div>
<div class="col-md-8 trademark">
<p>&copy; 2012-2017 <a href="http://apache.org">The Apache Software Foundation</a>.
Apache Mesos, the Apache feather logo, and the Apache Mesos project logo are trademarks of The Apache Software Foundation.
<p>
</div>
</div><!-- /.container -->
</div><!-- /.footer -->
<!-- JS -->
<script src="//code.jquery.com/jquery-1.11.0.min.js" type="text/javascript"></script>
<script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" type="text/javascript"></script>
</body>
</html>