This document serves as a guide for users who wish to upgrade an existing Mesos cluster. Some versions require particular upgrade techniques when upgrading a running cluster. Some upgrades will have incompatible changes.
This section provides an overview of the changes for each version (in particular when upgrading from the next lower version). For more details please check the respective sections below.
We categorize the changes as follows:
A: New feature/behavior
C: Changed feature/behavior
D: Deprecated feature/behavior
R: Removed feature/behavior
The allocator metric named allocator/event_queue_dispatches is now deprecated. The new name is allocator/mesos/event_queue_dispatches to better support metrics for alternative allocator implementations.
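Monitoring code that reads this metric may need to work against both old and new masters during an upgrade. A minimal sketch, assuming the metrics snapshot has already been parsed into a dictionary (the helper name `event_queue_dispatches` is hypothetical):

```python
# Metric names taken from the text above.
OLD_NAME = "allocator/event_queue_dispatches"
NEW_NAME = "allocator/mesos/event_queue_dispatches"


def event_queue_dispatches(snapshot):
    """Return the metric value, preferring the new name.

    `snapshot` is assumed to be a dict of metric name -> value.
    Returns None if neither name is present.
    """
    if NEW_NAME in snapshot:
        return snapshot[NEW_NAME]
    return snapshot.get(OLD_NAME)
```

Preferring the new name means the fallback can be dropped once every master reports it.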
Mesos 1.0 deprecates the use of plain text credential files in favor of JSON-formatted credential files.
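Existing plain text credential files can be converted with a short script. The sketch below assumes the plain text format holds one whitespace-separated principal and secret per line, and that the JSON format is an object with a `credentials` list of `principal`/`secret` entries; check both assumptions against your own files before relying on it. The function name is hypothetical.

```python
import json


def plain_to_json_credentials(text):
    """Convert 'principal secret' lines into a JSON credentials document."""
    credentials = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only.
        principal, secret = line.split(None, 1)
        credentials.append({"principal": principal, "secret": secret})
    return json.dumps({"credentials": credentials}, indent=2)
```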
Mesos 1.0 deprecates the SET_QUOTA_WITH_ROLE and DESTROY_QUOTA_WITH_PRINCIPAL actions in favor of UPDATE_QUOTA_WITH_ROLE, as well as the SetQuota and RemoveQuota ACLs in favor of the UpdateQuota ACL, which controls which principal(s) are authorized to set, remove and (in future releases) update quota for role(s). A new GET_QUOTA_WITH_ROLE action and get_quotas ACL are introduced to control which principal(s) can query quota status for given role(s). This affects the --acls flag for the local authorizer in the following ways:
update_quotas must not be specified together with set_quotas or remove_quotas. The local authorizer will error out in such a case.
If either set_quotas or remove_quotas was set previously, the operator should upgrade the binary first, after which the deprecated ACLs are still enforced.
The operator should replace the deprecated values for set_quotas and remove_quotas with compatible values for update_quotas.
The operator can use get_quotas after the upgrade to control which principal(s) are allowed to query quota status for given role(s).
When a persistent volume is destroyed, Mesos will now remove any data that was stored on the volume from the filesystem of the appropriate slave. In prior versions of Mesos, destroying a volume would not delete data (this was a known missing feature that has now been implemented).
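The quota ACL rules for the --acls flag described earlier can be checked before an upgrade. This is a minimal sketch that only inspects the top-level keys of the ACLs JSON; the function name is hypothetical and it does not validate the contents of each ACL.

```python
import json

DEPRECATED_QUOTA_ACLS = ("set_quotas", "remove_quotas")


def check_quota_acls(acls_json):
    """Return the deprecated quota ACLs in use.

    Raises ValueError if update_quotas is combined with a deprecated
    ACL, since the text says the local authorizer rejects that mix.
    """
    acls = json.loads(acls_json)
    deprecated = [name for name in DEPRECATED_QUOTA_ACLS if name in acls]
    if "update_quotas" in acls and deprecated:
        raise ValueError(
            "update_quotas cannot be combined with: " + ", ".join(deprecated)
        )
    return deprecated
```

A non-empty return value is a reminder to migrate those entries to update_quotas after the binary upgrade.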
Mesos 1.0 changes the HTTP status code of the following endpoints from 200 OK to 202 Accepted:
/reserve
/unreserve
/create-volumes
/destroy-volumes
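Clients that compare the response status of these endpoints against 200 exactly will start reporting failures after the upgrade. A small sketch of a check that works against both old and new masters during a rolling upgrade (the names are hypothetical):

```python
# 200 is returned by pre-1.0 masters, 202 by 1.0 masters.
SUCCESS_STATUSES = frozenset({200, 202})


def operation_accepted(status_code):
    """Return True if a reserve/unreserve/volume request was accepted."""
    return status_code in SUCCESS_STATUSES
```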
In order to upgrade a running cluster:
The objects of the ReserveResources and CreateVolume ACLs have been changed to roles. In both cases, principals can now be authorized to perform these operations for particular roles. This means that by default, a framework or operator can reserve resources/create volumes for any role. To restrict this behavior, ACLs can be added to the master which authorize principals to reserve resources/create volumes for specified roles only. Previously, frameworks could only reserve resources for their own role; this behavior can be preserved by configuring the ReserveResources ACLs such that the framework's principal is only authorized to reserve for the framework's role.
NOTE: This renders existing ReserveResources and CreateVolume ACL definitions obsolete; if you are authorizing these operations, your ACL definitions should be updated.
In order to upgrade a running cluster:
In previous releases, roles had to be specified as a static whitelist when starting the master (via the --roles flag). In Mesos 0.27, if --roles is omitted, any role name can be used; controlling which principals are allowed to register as which roles should be done using ACLs. The role whitelist functionality is still supported but is deprecated.
The executorLost callback in the Scheduler interface will now be called whenever the agent detects termination of a custom executor. This callback was never called in previous versions, so please make sure all framework schedulers can safely handle it. Note that this callback may not be reliably delivered.
The isolator prepare interface has been changed slightly. Instead of continuing to add parameters to the prepare interface, we decided to use a protobuf (ContainerConfig). Also, we renamed ContainerPrepareInfo to ContainerLaunchInfo to better capture the purpose of this struct. See MESOS-4240 and MESOS-4282 for more information. If you are an isolator module writer, you will have to adjust your isolator module according to the new interface and recompile with 0.27.
ACLs.shutdown_frameworks has been deprecated in favor of the new ACLs.teardown_frameworks. This affects the --acls master flag for the local authorizer.
Reserved resources are now accounted for in the DRF role sorter. Previously unaccounted reservations will influence the weighted DRF sorter. If role weights were explicitly set, they may need to be adjusted in order to account for the reserved resources in the cluster.
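A rough illustration of why explicitly set weights may need adjusting. This is a simplified model, not the allocator's implementation: the function name, the dictionary layout, and the rule of adding unallocated reservations to a role's allocation are all assumptions made for the example.

```python
def weighted_dominant_share(allocated, reserved, total, weight=1.0):
    """Dominant share of a role divided by its weight (simplified model).

    allocated: resource name -> amount allocated to the role
    reserved:  resource name -> amount reserved for the role but not
               yet allocated (assumption: counted like an allocation)
    total:     resource name -> total amount in the cluster
    """
    shares = [
        (allocated.get(name, 0.0) + reserved.get(name, 0.0)) / amount
        for name, amount in total.items()
        if amount > 0
    ]
    return max(shares, default=0.0) / weight


total = {"cpus": 100.0, "mem": 1000.0}
allocated = {"cpus": 10.0, "mem": 50.0}
reserved = {"cpus": 30.0}

# Share when reservations are ignored vs. when they are counted.
share_without = weighted_dominant_share(allocated, {}, total)
share_with = weighted_dominant_share(allocated, reserved, total)
```

In this model the role's share rises once the reservation is counted, so a role with large reservations sorts later unless its weight is raised.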
In order to upgrade a running cluster:
The names of some TaskStatus::Reason enums have been changed, but the tag numbers remain unchanged, so the change is backwards compatible. Frameworks using the new version might need to make some compile-time adjustments:
The Credential protobuf has been changed. The Credential field secret is now a string; it used to be bytes. This affects framework developers, and language bindings should update their generated protobuf code to the new version. This fixes JSON-based credentials file support.
The /state endpoints on master and agent will no longer include data fields as part of the JSON models for ExecutorInfo and TaskInfo out of consideration for memory scalability (see MESOS-3794 and this email thread).
The ExecutorInfo data field was originally found via frameworks[*].executors[*].data.
The TaskInfo data field was originally found via executors[*].tasks[*].data.
The NetworkInfo protobuf has been changed. The fields protocol and ip_address are now deprecated. The new field ip_addresses subsumes the information provided by them.
In order to upgrade a running cluster:
The following endpoints will be deprecated in favor of new endpoints. Both versions will be available in 0.25 but the deprecated endpoints will be removed in a subsequent release.
For master endpoints:
For agent endpoints:
For both master and agent:
In order to upgrade a running cluster:
Support for live upgrading a driver based scheduler to HTTP based (experimental) scheduler has been added.
Master now publishes its information in ZooKeeper in JSON (instead of protobuf). Make sure schedulers are linked against >= 0.23.0 libmesos before upgrading the master.
In order to upgrade a running cluster:
The ‘stats.json’ endpoints for masters and agents have been removed. Please use the ‘metrics/snapshot’ endpoints instead.
The ‘/master/shutdown’ endpoint is deprecated in favor of the new ‘/master/teardown’ endpoint.
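Scripts that call the removed or deprecated endpoints can be migrated with a small path rewrite. A sketch under stated assumptions: the helper name is hypothetical, and it assumes the replacement for any path ending in stats.json is /metrics/snapshot at the root of the same host.

```python
def replacement_endpoint(path):
    """Map a removed or deprecated endpoint path to its replacement."""
    if path == "/master/shutdown":
        return "/master/teardown"
    if path.endswith("/stats.json"):
        # Assumption: metrics are served from /metrics/snapshot.
        return "/metrics/snapshot"
    return path  # not affected by the changes described above
```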
In order to enable decorator modules to remove metadata (environment variables or labels), we changed the meaning of the return value for decorator hooks in Mesos 0.23.0. Please refer to the modules documentation for more details.
Agent ping timeouts are now configurable on the master via --slave_ping_timeout and --max_slave_ping_timeouts. Agents should be upgraded to 0.23.x before changing these flags.
A new scheduler driver API, acceptOffers, has been introduced. This is a more general version of the launchTasks API, which allows the scheduler to accept an offer and specify a list of operations (Offer.Operation) to perform using the resources in the offer. Currently, the supported operations include LAUNCH (launching tasks), RESERVE (making dynamic reservations), UNRESERVE (releasing dynamic reservations), CREATE (creating persistent volumes) and DESTROY (releasing persistent volumes). Similar to the launchTasks API, any unused resources will be considered declined, and the specified filters will be applied to all unused resources.
The Resource protobuf has been extended to include more metadata for supporting persistence (DiskInfo), dynamic reservations (ReservationInfo) and oversubscription (RevocableInfo). You must not combine two Resource objects if they have different metadata.
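The rule about not combining Resource objects with different metadata can be illustrated with a simplified stand-in for a scalar resource. The class and its fields below are hypothetical and only loosely mirror the protobuf; the point is that values are summed only when every piece of metadata matches.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ScalarResource:
    name: str
    value: float
    role: str = "*"
    disk: Optional[str] = None         # stand-in for DiskInfo
    reservation: Optional[str] = None  # stand-in for ReservationInfo
    revocable: bool = False            # stand-in for RevocableInfo

    def metadata(self):
        """Everything except the value."""
        return (self.name, self.role, self.disk, self.reservation, self.revocable)


def combine(a, b):
    """Sum two resources, or return None if their metadata differs."""
    if a.metadata() != b.metadata():
        return None
    return replace(a, value=a.value + b.value)
```

For example, combining 2 cpus with 1 revocable cpu returns None here, since treating them as 3 ordinary cpus would lose the revocable marker.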
In order to upgrade a running cluster:
Agent checkpoint flag has been removed as it will be enabled for all agents. Frameworks must still enable checkpointing during registration to take advantage of checkpointing their tasks.
The stats.json endpoints for masters and agents have been deprecated. Please refer to the metrics/snapshot endpoint.
The C++/Java/Python scheduler bindings have been updated. In particular, the driver can be constructed with an additional argument that specifies whether to use implicit driver acknowledgements. In statusUpdate, the TaskStatus now includes a UUID to make explicit acknowledgements possible.
The Authentication API has changed slightly in this release to support additional authentication mechanisms. The change from ‘string’ to ‘bytes’ for AuthenticationStartMessage.data has no impact on C++ or the over-the-wire representation, so it only impacts pure language bindings for languages like Java and Python that use different types for UTF-8 strings vs. byte arrays.
    message AuthenticationStartMessage {
      required string mechanism = 1;
      optional bytes data = 2;
    }
All Mesos arguments can now be passed using file:// to read them out of a file (either an absolute or relative path). The --credentials, --whitelist, and any flags that expect JSON backed arguments (such as --modules) behave as before, although support for just passing an absolute path for any JSON flags rather than file:// has been deprecated and will produce a warning (and the absolute path behavior will be removed in a future release).
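Tooling that generates or inspects flag values may want to mirror this convention. The sketch below is an illustration only, not how Mesos itself parses flags: the helper name is hypothetical, and it returns file contents unmodified.

```python
FILE_PREFIX = "file://"


def resolve_flag_value(value):
    """Return the flag value, reading it from a file if it uses file://.

    Both absolute (file:///etc/mesos/acls) and relative
    (file://conf/acls) paths are passed straight to open().
    """
    if value.startswith(FILE_PREFIX):
        with open(value[len(FILE_PREFIX):]) as handle:
            return handle.read()
    return value
```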
In order to upgrade a running cluster:
If you construct MesosSchedulerDriverImpl with Credentials, your code must be updated to pass the implicitAcknowledgements argument before Credentials. You may run a 0.21.0 Python scheduler against a 0.22.0 master, and vice versa.
In order to upgrade a running cluster:
The Mesos API has been changed slightly in this release. The CommandInfo has been changed (see below), which makes launching a command more flexible. The ‘value’ field has been changed from required to optional. However, it will not cause any issue during the upgrade (since the existing schedulers always set this field).
    message CommandInfo {
      ...
      // There are two ways to specify the command:
      // 1) If 'shell == true', the command will be launched via shell
      //    (i.e., /bin/sh -c 'value'). The 'value' specified will be
      //    treated as the shell command. The 'arguments' will be ignored.
      // 2) If 'shell == false', the command will be launched by passing
      //    arguments to an executable. The 'value' specified will be
      //    treated as the filename of the executable. The 'arguments'
      //    will be treated as the arguments to the executable. This is
      //    similar to how POSIX exec families launch processes (i.e.,
      //    execlp(value, arguments(0), arguments(1), ...)).
      optional bool shell = 6 [default = true];
      optional string value = 3;
      repeated string arguments = 7;
      ...
    }
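The two launch modes described in the CommandInfo comment can be modelled in a few lines. This is a sketch of the semantics as stated there, not Mesos code; the function name and the choice of "sh" as argv[0] in shell mode are assumptions.

```python
def resolve_command(value, arguments=(), shell=True):
    """Return (executable, argv) for a CommandInfo-like command.

    shell=True:  run value through /bin/sh -c; arguments are ignored.
    shell=False: value is the executable and arguments is the full
                 argv, so arguments[0] is the program name, as with
                 execlp(value, arguments(0), arguments(1), ...).
    """
    if shell:
        return "/bin/sh", ["sh", "-c", value]
    return value, list(arguments)
```

Note that with shell set to false the first entry of arguments is not a parameter to the program but its argv[0], which is easy to get wrong when migrating schedulers.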
The Python bindings are also changing in this release. There are now sub-modules which allow you to use either the interfaces and/or the native driver.
import mesos.native for the native drivers
import mesos.interface for the stub implementations and protobufs
To ensure a smooth upgrade, we recommend upgrading your Python framework and executor first. You will be able to import using either the new configuration or the old. Replace the existing imports with something like the following:
    try:
        from mesos.native import MesosExecutorDriver, MesosSchedulerDriver
        from mesos.interface import Executor, Scheduler
        from mesos.interface import mesos_pb2
    except ImportError:
        from mesos import Executor, MesosExecutorDriver, MesosSchedulerDriver, Scheduler
        import mesos_pb2
If you're using a pure language binding, please ensure that it sends status update acknowledgements through the master before upgrading.
In order to upgrade a running cluster:
There are new required flags on the master (--work_dir and --quorum) to support the Registrar feature, which adds replicated state on the masters.
No required upgrade ordering across components.
In order to upgrade a running cluster:
In order to upgrade a running cluster:
In order to upgrade a running cluster:
In order to upgrade a running cluster:
Schedulers should implement the new reconcileTasks driver method.
Schedulers should call the new MesosSchedulerDriver constructor that takes Credential to authenticate.
In order to upgrade a running cluster:
NOTE: After the restart unauthenticated frameworks will not be allowed to register.
In order to upgrade a running cluster:
In order to upgrade a running cluster:
In order to upgrade a running cluster: