<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Apache Mesos - Supporting Container Images in Mesos Containerizer</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta property="og:locale" content="en_US"/>
<meta property="og:type" content="website"/>
<meta property="og:title" content="Apache Mesos"/>
<meta property="og:site_name" content="Apache Mesos"/>
<meta property="og:url" content="http://mesos.apache.org/"/>
<meta property="og:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/>
<meta property="og:description"
content="Apache Mesos abstracts resources away from machines,
enabling fault-tolerant and elastic distributed systems
to easily be built and run effectively."/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:site" content="@ApacheMesos"/>
<meta name="twitter:title" content="Apache Mesos"/>
<meta name="twitter:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/>
<meta name="twitter:description"
content="Apache Mesos abstracts resources away from machines,
enabling fault-tolerant and elastic distributed systems
to easily be built and run effectively."/>
<link href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" rel="stylesheet">
<link rel="alternate" type="application/atom+xml" title="Apache Mesos Blog" href="/blog/feed.xml">
<link href="../../assets/css/main.css" media="screen" rel="stylesheet" type="text/css" />
<!-- Google Analytics Magic -->
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-20226872-1']);
_gaq.push(['_setDomainName', 'apache.org']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body>
<!-- magical breadcrumbs -->
<div class="topnav">
<div class="container">
<ul class="breadcrumb">
<li>
<div class="dropdown">
<a data-toggle="dropdown" href="#">Apache Software Foundation <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu" aria-labelledby="dLabel">
<li><a href="http://www.apache.org">Apache Homepage</a></li>
<li><a href="http://www.apache.org/licenses/">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
<li><a href="http://www.apache.org/security/">Security</a></li>
</ul>
</div>
</li>
<li><a href="http://mesos.apache.org">Apache Mesos</a></li>
<li><a href="/documentation/">Documentation</a></li>
</ul><!-- /.breadcrumb -->
</div><!-- /.container -->
</div><!-- /.topnav -->
<!-- navbar excitement -->
<div class="navbar navbar-default navbar-static-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#mesos-menu" aria-expanded="false">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/"><img src="/assets/img/mesos_logo.png" alt="Apache Mesos logo"/></a>
</div><!-- /.navbar-header -->
<div class="navbar-collapse collapse" id="mesos-menu">
<ul class="nav navbar-nav navbar-right">
<li><a href="/gettingstarted/">Getting Started</a></li>
<li><a href="/blog/">Blog</a></li>
<li><a href="/documentation/latest/">Documentation</a></li>
<li><a href="/downloads/">Downloads</a></li>
<li><a href="/community/">Community</a></li>
</ul>
</div><!-- /#mesos-menu -->
</div><!-- /.container -->
</div><!-- /.navbar -->
<div class="content">
<div class="container">
<div class="row-fluid">
<div class="col-md-4">
<h4>If you're new to Mesos</h4>
<p>See the <a href="/gettingstarted/">getting started</a> page for more
information about downloading, building, and deploying Mesos.</p>
<h4>If you'd like to get involved or you're looking for support</h4>
<p>See our <a href="/community/">community</a> page for more details.</p>
</div>
<div class="col-md-8">
<h1>Supporting Container Images in <a href="/documentation/latest/./mesos-containerizer/">Mesos Containerizer</a></h1>
<h2>Motivation</h2>
<p>Mesos currently supports several <a href="/documentation/latest/./containerizer/">containerizers</a>,
notably the Mesos containerizer and the Docker containerizer. Mesos
containerizer uses native OS features directly to provide isolation
between containers, while Docker containerizer delegates container
management to the Docker engine.</p>
<p>Maintaining two containerizers is hard. For instance, when we add new
features to Mesos (e.g., persistent volumes, disk isolation), both
containerizers have to be updated. Even worse, isolating some resources
(e.g., network handles on an agent) sometimes requires coordination
between the two containerizers, which is very hard to implement in
practice. In addition, we found that extending and customizing isolation
for containers launched by the Docker engine is difficult, mainly because
we have no way to inject logic during the life cycle of a container.</p>
<p>Therefore, we made an effort to unify containerizers in Mesos
(<a href="https://issues.apache.org/jira/browse/MESOS-2840">MESOS-2840</a>,
a.k.a. the Universal Containerizer). We improved Mesos containerizer
so that it now supports launching containers that specify container
images (e.g., Docker/Appc images).</p>
<h2>Getting Started</h2>
<p>To support container images, we introduced a new component in Mesos
containerizer, called the image provisioner. The image provisioner is
responsible for pulling, caching and preparing container root
filesystems. It also extracts runtime configurations from container
images, which are then passed to the corresponding isolators for
proper isolation.</p>
<p>There are a few container image specifications, notably
<a href="https://github.com/docker/docker/blob/master/image/spec/v1.md">Docker</a>,
<a href="https://github.com/appc/spec/blob/master/SPEC.md">Appc</a>, and
<a href="https://github.com/opencontainers/specs">OCI</a> (future). Currently, we
support Docker and Appc images. More details about what features are
supported or not can be found in the following sections.</p>
<p><strong>NOTE</strong>: Container images are currently supported only on Linux.</p>
<h3>Configure the agent</h3>
<p>To enable container image support in Mesos containerizer, the operator
needs to specify the <code>--image_providers</code> agent flag, which tells
Mesos containerizer what types of container images are allowed. For
example, setting <code>--image_providers=docker</code> allows containers to use
Docker images. Operators can also specify multiple container image
types. For instance, <code>--image_providers=docker,appc</code> allows both
Docker and Appc container images.</p>
<p>A few isolators need to be turned on in order to provide proper
isolation according to the runtime configurations specified in the
container image. The operator needs to add the following isolators to
the <code>--isolation</code> flag.</p>
<ul>
<li><p><code>filesystem/linux</code>: This is needed because supporting container
images involves changing the filesystem root, and only <code>filesystem/linux</code>
supports that currently. Note that this isolator requires root
permission.</p></li>
<li><p><code>docker/runtime</code>: This is used to provide support for runtime
configurations specified in Docker images (e.g., Entrypoint/Cmd,
environment variables, etc.). See more details about this isolator in
<a href="/documentation/latest/./mesos-containerizer/">Mesos containerizer doc</a>. Note that if this
isolator is not specified and <code>--image_providers</code> contains <code>docker</code>,
the agent will refuse to start.</p></li>
</ul>
<p>In summary, to enable container image support in Mesos containerizer,
please specify the following agent flags:</p>
<pre><code>$ sudo mesos-agent \
--containerizers=mesos \
--image_providers=appc,docker \
--isolation=filesystem/linux,docker/runtime
</code></pre>
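<p>The flag requirements above can be checked before starting an agent. The
following is a minimal sketch, not part of Mesos: the function name
<code>check_agent_flags</code> is hypothetical, and it only encodes the two rules
stated in this section.</p>

```python
def check_agent_flags(image_providers, isolation):
    """Return a list of problems with the given flag values (empty if none).

    Hypothetical helper: it encodes only the two rules described above.
    """
    providers = [p.strip() for p in image_providers.split(",") if p.strip()]
    isolators = [i.strip() for i in isolation.split(",") if i.strip()]
    problems = []
    # Container images require changing the filesystem root.
    if providers and "filesystem/linux" not in isolators:
        problems.append("filesystem/linux is needed to change the filesystem root")
    # The agent refuses to start in this case.
    if "docker" in providers and "docker/runtime" not in isolators:
        problems.append("docker/runtime is required when image_providers contains docker")
    return problems


print(check_agent_flags("appc,docker", "filesystem/linux,docker/runtime"))
print(check_agent_flags("docker", "filesystem/linux"))
```

<p>An empty list means the combination satisfies both rules. The agent itself
remains the authority on which combinations it accepts.</p>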
<h3>Framework API</h3>
<p>We introduced a new protobuf message <code>Image</code>, which allows frameworks to
specify container images for their containers. It has two types right
now: <code>APPC</code> and <code>DOCKER</code>, representing Appc and Docker images
respectively.</p>
<p>For Appc images, the <code>name</code> and <code>labels</code> are as described in the
<a href="https://github.com/appc/spec/blob/master/spec/aci.md#image-manifest-schema">spec</a>.</p>
<p>For Docker images, the <code>name</code> is the Docker image reference in the
following form (the same format expected by <code>docker pull</code>):
<code>[REGISTRY_HOST[:REGISTRY_PORT]/]REPOSITORY[:TAG|@DIGEST]</code></p>
<pre><code>message Image {
enum Type {
APPC = 1;
DOCKER = 2;
}
message Appc {
required string name = 1;
optional Labels labels = 3;
}
message Docker {
required string name = 1;
}
required Type type = 1;
// Only one of the following image messages should be set to match
// the type.
optional Appc appc = 2;
optional Docker docker = 3;
}
</code></pre>
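<p>To illustrate the Docker image reference format given above, here is a rough
sketch that splits a reference into its parts. The function name
<code>parse_docker_reference</code> is hypothetical. The rule used to decide
whether the first path component is a registry (it contains <code>.</code> or
<code>:</code>, or equals <code>localhost</code>) is an assumption based on common
Docker convention, not something this document specifies.</p>

```python
def parse_docker_reference(name):
    """Split a Docker image reference into registry, repository, tag, digest.

    Assumption: the first path component is a registry only if it contains
    '.' or ':' or equals 'localhost'.
    """
    registry = None
    remainder = name
    first, slash, rest = name.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest

    # A digest, if present, follows '@'.
    digest = None
    if "@" in remainder:
        remainder, digest = remainder.split("@", 1)

    # A tag, if present, follows the last ':' of what is left.
    tag = None
    if ":" in remainder:
        remainder, tag = remainder.rsplit(":", 1)

    return {"registry": registry, "repository": remainder,
            "tag": tag, "digest": digest}


print(parse_docker_reference("localhost:80/gilbert/inky:latest"))
print(parse_docker_reference("library/redis"))
```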
<p>The framework needs to specify <code>MesosInfo</code> in <code>ContainerInfo</code> in order
to launch containers with container images. In other words, the
framework needs to set the type to <code>ContainerInfo.MESOS</code>, indicating
that it wants to use the Mesos containerizer. If <code>MesosInfo.image</code> is
not specified, the container will use the host filesystem. If
<code>MesosInfo.image</code> is specified, it will be used as the container
image when launching the container.</p>
<pre><code>message ContainerInfo {
enum Type {
DOCKER = 1;
MESOS = 2;
}
message MesosInfo {
optional Image image = 1;
}
required Type type = 1;
optional MesosInfo mesos = 5;
}
</code></pre>
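<p>As an illustration, a framework that describes its containers in JSON might
build the <code>ContainerInfo</code> as follows. This is only a sketch: it
assumes a JSON rendering in which field names match the protobuf definitions
above and enum values are written as strings, and the helper name
<code>mesos_container_info</code> is hypothetical.</p>

```python
import json


def mesos_container_info(docker_image_name):
    """Build a ContainerInfo-shaped dict that selects the Mesos containerizer."""
    return {
        "type": "MESOS",  # ContainerInfo.Type
        "mesos": {
            # Omitting "image" would make the container use the host filesystem.
            "image": {
                "type": "DOCKER",  # Image.Type
                "docker": {"name": docker_image_name},
            }
        },
    }


print(json.dumps(mesos_container_info("library/redis"), indent=2))
```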
<h3>Test it out!</h3>
<p>First, start the Mesos master:</p>
<pre><code>$ sudo sbin/mesos-master --work_dir=/tmp/mesos/master
</code></pre>
<p>Then, start the Mesos agent:</p>
<pre><code>$ sudo GLOG_v=1 sbin/mesos-agent \
--master=&lt;MASTER_IP&gt;:5050 \
--isolation=docker/runtime,filesystem/linux \
--work_dir=/tmp/mesos/agent \
--image_providers=docker \
--executor_environment_variables="{}"
</code></pre>
<p>Now, use Mesos CLI (i.e., mesos-execute) to launch a Docker container
(e.g., redis). Note that <code>--shell=false</code> tells Mesos to use the
default entrypoint and cmd specified in the Docker image.</p>
<pre><code>$ sudo bin/mesos-execute \
--master=&lt;MASTER_IP&gt;:5050 \
--name=test \
--docker_image=library/redis \
--shell=false
</code></pre>
<p>Verify that your container is running by launching a redis client:</p>
<pre><code>$ sudo docker run -ti --net=host redis redis-cli
127.0.0.1:6379&gt; ping
PONG
127.0.0.1:6379&gt;
</code></pre>
<h2>Docker Support and Current Limitations</h2>
<p>Image provisioner uses <a href="https://docs.docker.com/registry/spec/api/">Docker v2 registry
API</a> to fetch Docker
images/layers. The fetching is based on <code>curl</code>, therefore SSL is
automatically handled. For private registries, the operator needs to
configure <code>curl</code> accordingly so that it knows where to find the
additional certificate files.</p>
<p>Fetching images that require authentication is supported through the
<code>--docker_config</code> agent flag. Starting from 1.0, operators can use
this agent flag to specify a shared docker config file, which is
used for pulling private repositories with authentication. Per-container
credentials are not supported yet (coming soon).</p>
<p>Operators can specify the flag either as an absolute path pointing to
the docker config file (which requires manually configuring
<code>.docker/config.json</code> or <code>.dockercfg</code> on each agent), or as a
JSON-formatted string. See the <a href="/documentation/latest/./configuration/">configuration
documentation</a> for details. For example:</p>
<pre><code>--docker_config=file:///home/vagrant/.docker/config.json
</code></pre>
<p>or as a JSON object,</p>
<pre><code>--docker_config="{ \
\"auths\": { \
\"https://index.docker.io/v1/\": { \
\"auth\": \"xXxXxXxXxXx=\", \
\"email\": \"username@example.com\" \
} \
} \
}"
</code></pre>
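<p>Hand-escaping the JSON string is error-prone, so it may be easier to generate
it. The sketch below assumes the usual Docker config convention that the
<code>auth</code> field is the base64 encoding of <code>username:password</code>.
The helper name <code>docker_config_json</code> and the credentials are
hypothetical, and the <code>email</code> field is left out on the assumption that
it is optional.</p>

```python
import base64
import json


def docker_config_json(registry_url, username, password):
    """Return a JSON string shaped like a docker config file."""
    # Assumption: "auth" is base64("username:password").
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return json.dumps({"auths": {registry_url: {"auth": token}}})


# Placeholder credentials for illustration only.
print(docker_config_json("https://index.docker.io/v1/", "user", "secret"))
```

<p>The resulting string can be passed as the value of <code>--docker_config</code>
with appropriate shell quoting.</p>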
<p>A private registry can be specified either through the <code>--docker_registry</code>
agent flag, or per container by using an image name of the form
<code>&lt;REGISTRY&gt;/&lt;REPOSITORY&gt;</code> (e.g.,
<code>localhost:80/gilbert/inky:latest</code>). If <code>&lt;REGISTRY&gt;</code> is included as
a prefix in the image name, the registry specified through the agent
flag <code>--docker_registry</code> will be ignored.</p>
<p>If the <code>--docker_registry</code> agent flag points to a local directory
(e.g., <code>/tmp/mesos/images/docker</code>), the provisioner will pull Docker
images from the local filesystem, assuming Docker archives (the result of
<code>docker save</code>) are stored there based on the image name and tag. For
example, the operator can put a <code>busybox:latest.tar</code> (the result of
<code>docker save -o busybox:latest.tar busybox</code>) under
<code>/tmp/mesos/images/docker</code> and launch the agent by specifying
<code>--docker_registry=/tmp/mesos/images/docker</code>. Then the framework can
launch a Docker container by specifying <code>busybox:latest</code> as the name
of the Docker image.</p>
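<p>The naming in the example above can be sketched as a small helper. This is an
assumption generalized from the single <code>busybox:latest.tar</code> example.
The function name <code>local_archive_path</code> and the default tag of
<code>latest</code> are hypothetical.</p>

```python
import posixpath


def local_archive_path(registry_dir, image_name, tag="latest"):
    """Return where an archive for image_name:tag would be expected.

    Assumption: archives are named NAME:TAG.tar inside the directory, as in
    the busybox example.
    """
    return posixpath.join(registry_dir, f"{image_name}:{tag}.tar")


print(local_archive_path("/tmp/mesos/images/docker", "busybox"))
```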
<p>If the <code>--switch_user</code> flag is set on the agent and the framework
specifies a user (either <code>CommandInfo.user</code> or <code>FrameworkInfo.user</code>),
we expect that the user exists in the container image and that its uid and gids
match those on the host. User namespaces and capabilities are not
supported yet.</p>
<p>Only host network is supported. We will add bridge network support
soon using CNI support in Mesos
(<a href="https://issues.apache.org/jira/browse/MESOS-4641">MESOS-4641</a>).</p>
<h3>More agent flags</h3>
<p><code>--docker_registry</code>: The default URL for pulling Docker images. It
can be either a Docker registry server URL (e.g.,
<code>https://registry.docker.io</code>) or a local path (e.g.,
<code>/tmp/docker/images</code>) in which Docker image archives (the result of
<code>docker save</code>) are stored. The default value is
<code>https://registry-1.docker.io</code>.</p>
<p><code>--docker_store_dir</code>: Directory the Docker provisioner will store
images in. All the Docker images are cached under this directory. The
default value is <code>/tmp/mesos/store/docker</code>.</p>
<p><code>--docker_config</code>: The default docker config file for the agent. It can
be provided either as an absolute path pointing to the docker config
file on the agent, or as a JSON-formatted string. The format of
the docker config file should be identical to docker&rsquo;s default one
(e.g., either <code>$HOME/.docker/config.json</code> or <code>$HOME/.dockercfg</code>).</p>
<h2>Appc Support and Current Limitations</h2>
<p>Currently, only the root filesystem specified in the Appc image is
supported. Other runtime configurations, such as environment variables,
exec, and working directory, are not supported yet (coming soon).</p>
<p>For image discovery, we currently support a simple discovery mechanism.
We allow operators to specify a URI prefix which will be prepended to
the URI template <code>{name}-{version}-{os}-{arch}.{ext}</code>. For example, if
the URI prefix is <code>file:///tmp/appc/</code> and the Appc image name is
<code>example.com/reduce-worker</code> with <code>version:1.0.0</code>, we will fetch the
image at <code>file:///tmp/appc/example.com/reduce-worker-1.0.0.aci</code>.</p>
<h3>More agent flags</h3>
<p><code>--appc_simple_discovery_uri_prefix</code>: URI prefix to be used for simple
discovery of Appc images, e.g., <code>http://</code>, <code>https://</code>,
<code>hdfs://&lt;hostname&gt;:9000/user/abc/cde</code>. The default value is <code>http://</code>.</p>
<p><code>--appc_store_dir</code>: Directory the Appc provisioner will store images in.
All the Appc images are cached under this directory. The default value
is <code>/tmp/mesos/store/appc</code>.</p>
<h2>Provisioner Backends</h2>
<p>A provisioner backend takes a set of filesystem layers and stacks them
into a root filesystem. Currently, we support the following backends:
<code>copy</code>, <code>bind</code>, <code>overlay</code> and <code>aufs</code>. Mesos will validate whether the
selected backend works with the underlying filesystem (the filesystem
used by the image store <code>--docker_store_dir</code> or <code>--appc_store_dir</code>)
using the following logic table:</p>
<pre><code>+---------+--------------+------------------------------------------+
| Backend | Suggested on | Disabled on                              |
+---------+--------------+------------------------------------------+
| aufs    | ext4 xfs     | btrfs aufs eCryptfs                      |
| overlay | ext4 xfs     | btrfs aufs overlay overlay2 zfs eCryptfs |
| bind    |              | N/A (`--sandbox_directory` must exist)   |
| copy    |              | N/A                                      |
+---------+--------------+------------------------------------------+
</code></pre>
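<p>The table can be read as a simple lookup. The following sketch transcribes
its "Disabled on" column. The names <code>DISABLED_ON</code> and
<code>backend_allowed</code> are hypothetical, and the real validation in Mesos
may consider more than this.</p>

```python
# "Disabled on" column of the table above, keyed by backend.
DISABLED_ON = {
    "aufs": {"btrfs", "aufs", "eCryptfs"},
    "overlay": {"btrfs", "aufs", "overlay", "overlay2", "zfs", "eCryptfs"},
    "bind": set(),  # N/A, but --sandbox_directory must exist in the image
    "copy": set(),  # N/A
}


def backend_allowed(backend, store_filesystem):
    """Return True if the table does not disable backend on store_filesystem."""
    return store_filesystem not in DISABLED_ON[backend]


print(backend_allowed("overlay", "ext4"))
print(backend_allowed("overlay", "zfs"))
```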
<p>The provisioner backend can be specified through the agent flag
<code>--provisioner_image_backend</code>. If it is not set, Mesos will select the best
backend automatically for the users/operators. The selection logic is
as follows:</p>
<pre><code>1. Use `overlay` backend if overlayfs is available.
2. Use `aufs` backend if aufs is available and overlayfs is not supported.
3. Use `copy` backend if none of the above is selected.
</code></pre>
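<p>The three rules above amount to a short fallback chain. Here is a minimal
sketch with a hypothetical function name. How availability is detected is
outside its scope and is passed in as booleans.</p>

```python
def select_backend(overlay_available, aufs_available):
    """Pick a backend following the three rules listed above."""
    if overlay_available:
        return "overlay"
    if aufs_available:
        return "aufs"
    return "copy"


print(select_backend(overlay_available=False, aufs_available=True))
```

<p>Note that the <code>bind</code> backend is never selected automatically under
these rules, so it has to be requested explicitly through
<code>--provisioner_image_backend</code>.</p>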
<h3>Copy</h3>
<p>The Copy backend simply copies all the layers into a target root
directory to create a root filesystem.</p>
<h3>Bind</h3>
<p>This is a specialized backend that may be useful for deployments using
large (multi-GB) single-layer images <em>and</em> where more recent kernel
features such as overlayfs are not available. For small images (tens
to hundreds of MB) the copy backend may be sufficient. The bind backend is
faster than the copy backend as it requires nearly zero IO.</p>
<p>The bind backend currently has these two limitations:</p>
<ol>
<li><p>The bind backend supports only a single layer. Multi-layer images will
fail to provision and the container will fail to launch!</p></li>
<li><p>The filesystem is read-only because all containers using this image
share the source. Selected writable areas can be provided by mounting
read-write volumes at places like <code>/tmp</code>, <code>/var/tmp</code>, <code>/home</code>, etc.
using the <code>ContainerInfo</code>. These can be relative to the executor work
directory. Since the filesystem is read-only, <code>--sandbox_directory</code>
and <code>/tmp</code> must already exist within the filesystem because the
filesystem isolator is unable to create them (e.g., either the image
writer needs to create the mount point in the image, or the operator
needs to set the agent flag <code>--sandbox_directory</code> properly).</p></li>
</ol>
<h3>Overlay</h3>
<p>The overlay backend was introduced because the copy backend
wastes IO and space, while the bind backend can only deal with one
layer. The overlay backend allows the containerizer to use the
filesystem to merge multiple filesystems into one efficiently.</p>
<p>The overlay backend depends on support for multiple lower layers,
which requires Linux kernel version 4.0 or later. For more information
on overlayfs, see the
<a href="https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt">kernel documentation</a>.</p>
<h3>AUFS</h3>
<p>The AUFS backend was introduced because overlayfs support was not
merged into the kernel until 3.18, and Docker&rsquo;s default storage backend on
Ubuntu 14.04 is AUFS.</p>
<p>Like overlayfs, AUFS is a union file system. It is very
stable, has a lot of real-world deployments, and has strong community
support.</p>
<p>Some Linux distributions do not support AUFS. This is usually because
AUFS is not included in the mainline (upstream) Linux kernel.</p>
<p>For more information on AUFS, see the
<a href="http://aufs.sourceforge.net/aufs2/man.html">AUFS manual</a>.</p>
<h2>Executor Dependencies in a Container Image</h2>
<p>Mesos has the concept of executors. All tasks are launched by an
executor. For a general-purpose executor (e.g., thermos) of a
framework (e.g., Aurora), requiring it and all its dependencies to be
present in all possible container images that a user might use is
not trivial.</p>
<p>To solve this issue, we propose a solution in which the
executor runs on the host filesystem (without a container image).
The executor can then specify a <code>volume</code> whose source is an <code>Image</code>. Mesos
containerizer will provision the <code>image</code> specified in the <code>volume</code>,
and mount it under the sandbox directory. The executor can perform
<code>pivot_root</code> or <code>chroot</code> itself to enter the container root
filesystem.</p>
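<p>As a rough illustration of this approach, a <code>ContainerInfo</code> for such
an executor might look like the sketch below. It assumes a JSON rendering of
the protobuf messages. The <code>volumes</code>, <code>container_path</code> and
<code>mode</code> field names, the <code>RW</code> mode and the
<code>rootfs</code> path are not taken from this document and should be checked
against the Mesos protobuf definitions. The helper name is hypothetical.</p>

```python
def executor_container_info(docker_image_name, container_path="rootfs"):
    """Build a ContainerInfo-shaped dict with an image-backed volume.

    No top-level image is set, so the executor itself would run on the host
    filesystem; the image is provisioned as a volume instead.
    """
    return {
        "type": "MESOS",
        "volumes": [
            {
                # Assumed to be relative to the sandbox directory.
                "container_path": container_path,
                "mode": "RW",
                "image": {
                    "type": "DOCKER",
                    "docker": {"name": docker_image_name},
                },
            }
        ],
    }


print(executor_container_info("library/redis"))
```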
<h2>References</h2>
<p>For more information on the Mesos containerizer filesystem, namespace,
and isolator features, visit <a href="/documentation/latest/./mesos-containerizer/">Mesos
Containerizer</a>. For more information on
launching Docker containers through the Docker containerizer, visit
<a href="/documentation/latest/./docker-containerizer/">Docker Containerizer</a>.</p>
</div>
</div>
</div><!-- /.container -->
</div><!-- /.content -->
<hr>
<!-- footer -->
<div class="footer">
<div class="container">
<div class="col-md-4 social-blk">
<span class="social">
<a href="https://twitter.com/ApacheMesos"
class="twitter-follow-button"
data-show-count="false" data-size="large">Follow @ApacheMesos</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
<a href="https://twitter.com/intent/tweet?button_hashtag=mesos"
class="twitter-hashtag-button"
data-size="large"
data-related="ApacheMesos">Tweet #mesos</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
</span>
</div>
<div class="col-md-8 trademark">
<p>&copy; 2012-2017 <a href="http://apache.org">The Apache Software Foundation</a>.
Apache Mesos, the Apache feather logo, and the Apache Mesos project logo are trademarks of The Apache Software Foundation.
</p>
</div>
</div><!-- /.container -->
</div><!-- /.footer -->
<!-- JS -->
<script src="//code.jquery.com/jquery-1.11.0.min.js" type="text/javascript"></script>
<script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" type="text/javascript"></script>
</body>
</html>