| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8"> |
| <title>Apache Mesos - Maintenance Primitives</title> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| |
| <meta property="og:locale" content="en_US"/> |
| <meta property="og:type" content="website"/> |
| <meta property="og:title" content="Apache Mesos"/> |
| <meta property="og:site_name" content="Apache Mesos"/> |
| <meta property="og:url" content="http://mesos.apache.org/"/> |
| <meta property="og:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/> |
| <meta property="og:description" |
| content="Apache Mesos abstracts resources away from machines, |
| enabling fault-tolerant and elastic distributed systems |
| to easily be built and run effectively."/> |
| |
| <meta name="twitter:card" content="summary"/> |
| <meta name="twitter:site" content="@ApacheMesos"/> |
| <meta name="twitter:title" content="Apache Mesos"/> |
| <meta name="twitter:image" content="http://mesos.apache.org/assets/img/mesos_logo_fb_preview.png"/> |
| <meta name="twitter:description" |
| content="Apache Mesos abstracts resources away from machines, |
| enabling fault-tolerant and elastic distributed systems |
| to easily be built and run effectively."/> |
| |
| <link href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css" rel="stylesheet"> |
| <link rel="alternate" type="application/atom+xml" title="Apache Mesos Blog" href="/blog/feed.xml"> |
| <link href="../../assets/css/main.css" media="screen" rel="stylesheet" type="text/css" /> |
| |
| |
| |
| <!-- Google Analytics Magic --> |
| <script type="text/javascript"> |
| var _gaq = _gaq || []; |
| _gaq.push(['_setAccount', 'UA-20226872-1']); |
| _gaq.push(['_setDomainName', 'apache.org']); |
| _gaq.push(['_trackPageview']); |
| |
| (function() { |
| var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; |
| ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; |
| var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); |
| })(); |
| </script> |
| |
| </head> |
| <body> |
| <!-- magical breadcrumbs --> |
| <div class="topnav"> |
| <div class="container"> |
| <ul class="breadcrumb"> |
| <li> |
| <div class="dropdown"> |
| <a data-toggle="dropdown" href="#">Apache Software Foundation <span class="caret"></span></a> |
| <ul class="dropdown-menu" role="menu" aria-labelledby="dLabel"> |
| <li><a href="http://www.apache.org">Apache Homepage</a></li> |
| <li><a href="http://www.apache.org/licenses/">License</a></li> |
| <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> |
| <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> |
| <li><a href="http://www.apache.org/security/">Security</a></li> |
| </ul> |
| </div> |
| </li> |
| |
| <li><a href="http://mesos.apache.org">Apache Mesos</a></li> |
| |
| |
| <li><a href="/documentation |
| /">Documentation |
| </a></li> |
| |
| |
| </ul><!-- /.breadcrumb --> |
| </div><!-- /.container --> |
| </div><!-- /.topnav --> |
| |
| <!-- navbar excitement --> |
| <div class="navbar navbar-default navbar-static-top" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#mesos-menu" aria-expanded="false"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <a class="navbar-brand" href="/"><img src="/assets/img/mesos_logo.png" alt="Apache Mesos logo"/></a> |
| </div><!-- /.navbar-header --> |
| |
| <div class="navbar-collapse collapse" id="mesos-menu"> |
| <ul class="nav navbar-nav navbar-right"> |
| <li><a href="/gettingstarted/">Getting Started</a></li> |
| <li><a href="/blog/">Blog</a></li> |
| <li><a href="/documentation/latest/">Documentation</a></li> |
| <li><a href="/downloads/">Downloads</a></li> |
| <li><a href="/community/">Community</a></li> |
| </ul> |
| </div><!-- /#mesos-menu --> |
| </div><!-- /.container --> |
| </div><!-- /.navbar --> |
| |
| <div class="content"> |
| <div class="container"> |
| <div class="row-fluid"> |
| <div class="col-md-4"> |
| <h4>If you're new to Mesos</h4> |
| <p>See the <a href="/gettingstarted/">getting started</a> page for more |
| information about downloading, building, and deploying Mesos.</p> |
| |
| <h4>If you'd like to get involved or you're looking for support</h4> |
| <p>See our <a href="/community/">community</a> page for more details.</p> |
| </div> |
| <div class="col-md-8"> |
| <h1>Maintenance Primitives</h1> |
| |
| <p>Operators regularly need to perform maintenance tasks on machines that comprise |
| a Mesos cluster. Most Mesos upgrades can be done without affecting running |
| tasks, but there are situations where maintenance may affect running tasks. |
| For example:</p> |
| |
| <ul> |
| <li>Hardware repair</li> |
| <li>Kernel upgrades</li> |
| <li>Agent upgrades (e.g., adjusting agent attributes or resources)</li> |
| </ul> |
| |
| |
| <p>Frameworks require visibility into any actions that disrupt cluster operation |
| in order to meet Service Level Agreements or to ensure uninterrupted services |
| for their end users. Therefore, to reconcile the requirements of frameworks |
| and operators, frameworks must be aware of planned maintenance events and |
| operators must be aware of frameworks' ability to adapt to maintenance. |
| Maintenance primitives add a layer to facilitate communication between the |
| frameworks and operator.</p> |
| |
| <h2>Terminology</h2> |
| |
| <p>For the purpose of this document, an “Operator” is a person, tool, or script |
| that manages a Mesos cluster.</p> |
| |
| <p>Maintenance primitives add several new concepts to Mesos. Those concepts are:</p> |
| |
| <ul> |
| <li><strong>Maintenance</strong>: An operation that makes resources on a machine unavailable, |
| either temporarily or permanently.</li> |
| <li><strong>Maintenance window</strong>: A set of machines and an associated time interval during |
| which some maintenance is planned on those machines.</li> |
| <li><strong>Maintenance schedule</strong>: A list of maintenance windows. |
| A single machine may only appear in a schedule once.</li> |
| <li><strong>Unavailability</strong>: An operator-specified interval, defined by a start time |
| and duration, during which an associated machine may become unavailable. |
| In general, no assumptions should be made about the availability of the |
| machine (or resources) after the unavailability.</li> |
| <li><strong>Drain</strong>: An interval between the scheduling of maintenance and when the |
| machine(s) become unavailable. Offers sent with resources from draining |
| machines will contain unavailability information. Frameworks running on |
| draining machines will receive inverse offers (see next). Frameworks |
| utilizing resources on affected machines are expected either to take |
| preemptive steps to prepare for the unavailability; or to communicate the |
| framework’s inability to conform to the maintenance schedule.</li> |
| <li><strong>Inverse offer</strong>: A communication mechanism for the master to ask for |
| resources back from a framework. This notifies frameworks about any |
| unavailability and gives frameworks a mechanism to respond about their |
| ability to comply. Inverse offers are similar to offers in that they |
| can be accepted, declined, re-offered, and rescinded.</li> |
| </ul> |
| |
| |
| <p><strong>Note</strong>: Unavailability and inverse offers are not specific to maintenance. |
| The same concepts can be used for non-maintenance goals, such as reallocating |
| resources or resource preemption.</p> |
| |
| <h2>How does it work?</h2> |
| |
| <p>Maintenance primitives were introduced in Mesos 0.25.0. Several machine |
| maintenance modes were also introduced. Those modes are illustrated below.</p> |
| |
| <p><img src="/assets/img/documentation/maintenance-primitives-modes.png" alt="Maintenance mode transitions" /></p> |
| |
| <p>All mode transitions must be initiated by the operator. Mesos will not |
| change the mode of any machine, regardless of the estimate provided in |
| the maintenance schedule.</p> |
| |
| <h3>Scheduling maintenance</h3> |
| |
| <p>A machine is transitioned from Up mode to Draining mode as soon as it is |
| scheduled for maintenance. To transition a machine into Draining mode, an |
| operator constructs a maintenance schedule as a JSON document and posts it to |
| the <a href="/documentation/latest/./endpoints/master/maintenance/schedule/">/maintenance/schedule</a> HTTP |
| endpoint on the Mesos master. Each Mesos cluster has a single maintenance |
| schedule; posting a new schedule replaces the previous schedule, if any.</p> |
| |
| <p>See the definition of a <a href="https://github.com/apache/mesos/blob/016b02d7ed5a65bcad9261a133c8237c2df66e6e/include/mesos/maintenance/maintenance.proto#L48-L67">maintenance::Schedule</a> |
| and of <a href="https://github.com/apache/mesos/blob/016b02d7ed5a65bcad9261a133c8237c2df66e6e/include/mesos/v1/mesos.proto#L140-L154">Unavailability</a>.</p> |
| |
| <p>In a production environment, the schedule should be constructed to ensure that |
| enough agents are operational at any given point in time to ensure |
| uninterrupted service by the frameworks.</p> |
| |
| <p>For example, in a cluster of three machines, the operator might schedule two |
| machines for one hour of maintenance, followed by another hour for the last |
| machine. The timestamps for unavailability are expressed in nanoseconds since |
| the Unix epoch (note that making reliable use of maintenance primitives requires |
| that the system clocks of all machines in the cluster are roughly synchronized).</p> |
| |
| <p>The schedule might look like:</p> |
| |
| <pre><code>{ |
| "windows" : [ |
| { |
| "machine_ids" : [ |
| { "hostname" : "machine1", "ip" : "10.0.0.1" }, |
| { "hostname" : "machine2", "ip" : "10.0.0.2" } |
| ], |
| "unavailability" : { |
| "start" : { "nanoseconds" : 1443830400000000000 }, |
| "duration" : { "nanoseconds" : 3600000000000 } |
| } |
| }, { |
| "machine_ids" : [ |
| { "hostname" : "machine3", "ip" : "10.0.0.3" } |
| ], |
| "unavailability" : { |
| "start" : { "nanoseconds" : 1443834000000000000 }, |
| "duration" : { "nanoseconds" : 3600000000000 } |
| } |
| } |
| ] |
| } |
| </code></pre> |
| |
| <p>The operator can then post the schedule to the master’s |
| <a href="/documentation/latest/./endpoints/master/maintenance/schedule/">/maintenance/schedule</a> endpoint:</p> |
| |
| <pre><code>curl http://localhost:5050/maintenance/schedule \ |
| -H "Content-type: application/json" \ |
| -X POST \ |
| -d @schedule.json |
| </code></pre> |
| |
| <p>The machines in a maintenance schedule do not need to be registered with the |
| Mesos master at the time when the schedule is set. The operator may add a |
| machine to the maintenance schedule prior to launching an agent on the machine. |
| For example, this can be useful to prevent a faulty machine from launching an |
| agent on boot.</p> |
| |
| <p><strong>Note</strong>: Each machine in the maintenance schedule should have as |
| complete information as possible. In order for Mesos to recognize an agent |
| as coming from a particular machine, both the <code>hostname</code> and <code>ip</code> fields must |
| match. Any omitted data defaults to the empty string <code>""</code>. If there are |
| multiple hostnames or IPs for a machine, the machine’s fields need to match |
| what the agent announces to the master. If there is any ambiguity in a |
| machine’s configuration, the operator should use the <code>--hostname</code> and <code>--ip</code> |
| options when starting agents.</p> |
| |
| <p>The master checks that a maintenance schedule has the following properties:</p> |
| |
| <ul> |
| <li>Each maintenance window in the schedule must have at least one machine |
| and a specified unavailability interval.</li> |
| <li>Each machine must only appear in the schedule once.</li> |
| <li>Each machine must have at least a hostname or IP included. |
| The hostname is not case-sensitive.</li> |
| <li>All machines that are in Down mode must be present in the schedule. |
| This is required because this endpoint does not handle the transition |
| from Down mode to Up mode.</li> |
| </ul> |
| |
| |
| <p>If any of these properties are not met, the maintenance schedule is rejected |
| with a corresponding error message and the master’s state is not changed.</p> |
| |
| <p>To update the maintenance schedule, the operator should first read the current |
| schedule, make any necessary changes, and then post the modified schedule. The |
| current maintenance schedule can be obtained by sending a GET request to the |
| master’s <code>/maintenance/schedule</code> endpoint.</p> |
| |
| <p>To cancel the maintenance schedule, the operator should post an empty schedule.</p> |
| |
| <h3>Draining mode</h3> |
| |
| <p>As soon as a schedule is posted to the Mesos master, the following things occur:</p> |
| |
| <ul> |
| <li>The schedule is stored in the <a href="/documentation/latest/./replicated-log-internals/">replicated log</a>. |
| This means the schedule is persisted in case of master failover.</li> |
| <li>All machines in the schedule are immediately transitioned into Draining |
| mode. The mode of each machine is also persisted in the replicated log.</li> |
| <li>All frameworks using resources on affected agents are immediately |
| notified. Existing offers from the affected agents are rescinded |
| and re-sent with additional unavailability data. All frameworks using |
| resources from the affected agents are given inverse offers.</li> |
| <li>New offers from the affected agents will also include |
| the additional unavailability data.</li> |
| </ul> |
| |
| |
| <p>Frameworks should use this additional information to schedule tasks in a |
| maintenance-aware fashion. Exactly how to do this depends on the design |
| requirements of each scheduler, but tasks should typically be scheduled in a way |
| that maximizes utilization but that also attempts to vacate machines before that |
| machine’s advertised unavailability period occurs. A scheduler might choose to |
| place long-running tasks on machines with no unavailability, or failing that, on |
| machines whose unavailability is the furthest away.</p> |
| |
| <p>How a framework responds to an inverse offer indicates its ability to conform to |
| the maintenance schedule. Accepting an inverse offer communicates that the |
| framework is okay with the current maintenance schedule, given the current state |
| of the framework’s resources. The master and operator should interpret |
| acceptance as a best-effort promise by the framework to free all the resources |
| contained in the inverse offer before the start of the unavailability |
| interval. Declining an inverse offer is an advisory notice to the operator that |
| the framework is unable or unlikely to meet to the maintenance schedule.</p> |
| |
| <p>For example:</p> |
| |
| <ul> |
| <li>A data store may choose to start a new replica if one of its agents is |
| scheduled for maintenance. The data store should accept an inverse offer if it |
| can reasonably copy the data on the machine to a new host before the |
| unavailability interval described in the inverse offer begins. Otherwise, the |
| data store should decline the offer.</li> |
| <li>A stateful task on an agent with an impending unavailability may be migrated |
| to another available agent. If the framework has sufficient resources to do |
| so, it would accept any inverse offers. Otherwise, it would decline them.</li> |
| </ul> |
| |
| |
| <p>A framework can use a filter to control when it wants to be contacted again |
| with an inverse offer. This is useful since future circumstances may change |
| the viability of the maintenance schedule. The filter for inverse offers is |
| identical to the existing mechanism for re-offering offers to frameworks.</p> |
| |
| <p><strong>Note</strong>: Accepting or declining an inverse offer does not result in |
| immediate changes in the maintenance schedule or in the way Mesos acts. |
| Inverse offers only represent extra information that frameworks may |
| find useful. In the same manner, rejecting or accepting an inverse offer is a |
| hint for an operator. The operator may or may not choose to take that hint |
| into account.</p> |
| |
| <h3>Starting maintenance</h3> |
| |
| <p>The operator starts maintenance by posting a list of machines to the |
| <a href="/documentation/latest/./endpoints/master/machine/down/">/machine/down</a> HTTP endpoint. The list of |
| machines is specified in JSON format; each element of the list is a |
| <a href="https://github.com/apache/mesos/blob/016b02d7ed5a65bcad9261a133c8237c2df66e6e/include/mesos/v1/mesos.proto#L157-L167">MachineID</a>.</p> |
| |
| <p>For example, to start maintenance on two machines:</p> |
| |
| <pre><code>[ |
| { "hostname" : "machine1", "ip" : "10.0.0.1" }, |
| { "hostname" : "machine2", "ip" : "10.0.0.2" } |
| ] |
| </code></pre> |
| |
| <pre><code>curl http://localhost:5050/machine/down \ |
| -H "Content-type: application/json" \ |
| -X POST \ |
| -d @machines.json |
| </code></pre> |
| |
| <p>The master checks that a list of machines has the following properties:</p> |
| |
| <ul> |
| <li>The list of machines must not be empty.</li> |
| <li>Each machine must only appear once.</li> |
| <li>Each machine must have at least a hostname or IP included. |
| The hostname is not case-sensitive.</li> |
| <li>If a machine’s IP is included, it must be correctly formed.</li> |
| <li>All listed machines must be present in the schedule.</li> |
| </ul> |
| |
| |
| <p>If any of these properties are not met, the operation is rejected with a |
| corresponding error message and the master’s state is not changed.</p> |
| |
| <p>The operator can start maintenance on any machine that is scheduled for |
| maintenance. Machines that are not scheduled for maintenance cannot be directly |
| transitioned from Up mode into Down mode. However, the operator may schedule a |
| machine for maintenance with a timestamp equal to the current time or in the |
| past, and then immediately start maintenance on that machine.</p> |
| |
| <p>This endpoint can be used to start maintenance on machines that are not |
| currently registered with the Mesos master. This can be useful if a machine has |
| failed and the operator intends to remove it from the cluster; starting |
| maintenance on the machine prevents the machine from being accidentally rebooted |
| and rejoining the Mesos cluster.</p> |
| |
| <p>The operator must explicitly transition a machine from Draining to Deactived |
| mode. That is, Mesos will keep a machine in Draining mode even if the |
| unavailability window arrives or passes. This means that the operation of the |
| machine is not disrupted in any way and offers (with unavailability information) |
| are still sent for this machine.</p> |
| |
| <p>When maintenance is triggered by the operator, all agents on the machine are |
| told to shutdown. These agents are removed from the master, which means that a |
| <code>TASK_LOST</code> status update will be sent for every task running on each of those |
| agents. The scheduler driver’s <code>slaveLost</code> callback will also be invoked for |
| each of the removed agents. Any agents on machines in maintenance are also |
| prevented from re-registering with the master in the future (until maintenance |
| is completed and the machine is brought back up).</p> |
| |
| <h3>Completing maintenance</h3> |
| |
| <p>When maintenance is complete or if maintenance needs to be cancelled, |
| the operator can stop maintenance. The process is very similar |
| to starting maintenance (same validation criteria as the previous section). |
| The operator posts a list of machines to the master’s <a href="/documentation/latest/./endpoints/master/machine/up/">/machine/up</a> endpoint:</p> |
| |
| <pre><code>[ |
| { "hostname" : "machine1", "ip" : "10.0.0.1" }, |
| { "hostname" : "machine2", "ip" : "10.0.0.2" } |
| ] |
| </code></pre> |
| |
| <pre><code>curl http://localhost:5050/machine/up \ |
| -H "Content-type: application/json" \ |
| -X POST \ |
| -d @machines.json |
| </code></pre> |
| |
| <p><strong>Note</strong>: The duration of the maintenance window, as indicated by the |
| “unavailability” field in the maintenance schedule, is a best-effort guess made |
| by the operator. Stopping maintenance before the end of the unavailability |
| interval is allowed, as is stopping maintenance after the end of the |
| unavailability interval. Machines are never automatically transitioned out of |
| maintenance.</p> |
| |
| <p>Frameworks are informed about the completion or cancellation of maintenance when |
| offers from that machine start being sent. There is no explicit mechanism for |
| notifying frameworks when maintenance has finished. After maintenance has |
| finished, new offers are no longer tagged with unavailability and inverse offers |
| are no longer sent. Also, agents running on the machine will be allowed to |
| register with the Mesos master.</p> |
| |
| <h3>Viewing maintenance status</h3> |
| |
| <p>The current maintenance status (Up, Draining, or Down) of each machine in the |
| cluster can be viewed by accessing the master’s |
| <a href="/documentation/latest/./endpoints/master/maintenance/status/">/maintenance/status</a> HTTP endpoint. For |
| each machine that is Draining, this endpoint also includes the frameworks' responses to |
| inverse offers for resources on that machine. For more information, see the |
| format of the <a href="https://github.com/apache/mesos/blob/fa36917dd142f66924c5f7ed689b87d5ceabbf79/include/mesos/maintenance/maintenance.proto#L73-L84">ClusterStatus message</a>.</p> |
| |
| <blockquote><p>NOTE: The format of the data returned by this endpoint may change in a |
| future release of Mesos.</p></blockquote> |
| |
| </div> |
| </div> |
| |
| </div><!-- /.container --> |
| </div><!-- /.content --> |
| |
| <hr> |
| |
| |
| |
| <!-- footer --> |
| <div class="footer"> |
| <div class="container"> |
| <div class="col-md-4 social-blk"> |
| <span class="social"> |
| <a href="https://twitter.com/ApacheMesos" |
| class="twitter-follow-button" |
| data-show-count="false" data-size="large">Follow @ApacheMesos</a> |
| <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script> |
| <a href="https://twitter.com/intent/tweet?button_hashtag=mesos" |
| class="twitter-hashtag-button" |
| data-size="large" |
| data-related="ApacheMesos">Tweet #mesos</a> |
| <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script> |
| </span> |
| </div> |
| |
| <div class="col-md-8 trademark"> |
| <p>© 2012-2017 <a href="http://apache.org">The Apache Software Foundation</a>. |
| Apache Mesos, the Apache feather logo, and the Apache Mesos project logo are trademarks of The Apache Software Foundation. |
| <p> |
| </div> |
| </div><!-- /.container --> |
| </div><!-- /.footer --> |
| |
| <!-- JS --> |
| <script src="//code.jquery.com/jquery-1.11.0.min.js" type="text/javascript"></script> |
| <script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js" type="text/javascript"></script> |
| </body> |
| </html> |