Leader election and detection in Mesos is done via ZooKeeper. See http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection for more information on how ZooKeeper leader election works.
In order to spin up a Mesos cluster using multiple masters for fault-tolerance:
- Start the mesos-master binaries using the --zk flag, e.g. --zk=zk://host1:port1,host2:port2,.../path
- Start the mesos-slave binaries with --master=zk://host1:port1,host2:port2,.../path
The detector is implemented in the src/detector folder. In particular, we watch for several ZooKeeper session events, including connection, reconnection, and session expiration.
We also explicitly time out our sessions when disconnected from ZooKeeper for a set amount of time (see ZOOKEEPER_SESSION_TIMEOUT). This is because the ZooKeeper client libraries only notify of session expiration upon reconnection. These timeouts are of particular interest in the case of network partitions.
When a network partition occurs, if a particular component is disconnected from ZooKeeper, the Master Detector of the partitioned component will induce a timeout event. This causes the component to be notified that there is no leading master.
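The locally induced timeout can be illustrated with a minimal Python sketch. This is not the Mesos detector itself: the class name, method names, and the 10-second value are hypothetical and chosen only for illustration. The point is that the component measures the time since it lost its ZooKeeper connection on its own clock, and reports "no leading master" once the timeout elapses, without waiting for ZooKeeper to deliver an expiration event.

```python
import time

# Hypothetical value for illustration only; the real
# ZOOKEEPER_SESSION_TIMEOUT is defined in the Mesos source and may differ.
SESSION_TIMEOUT_SECS = 10.0


class SessionTimeoutTracker:
    """Tracks ZooKeeper connectivity and induces a local timeout.

    Illustrative sketch only, not the Mesos implementation.
    """

    def __init__(self, timeout_secs=SESSION_TIMEOUT_SECS, clock=time.monotonic):
        self.timeout_secs = timeout_secs
        self.clock = clock            # injectable so the logic can be tested
        self.disconnected_at = None   # None while connected
        self.leader = None            # None means "no leading master"

    def on_leader_detected(self, leader):
        self.leader = leader

    def on_disconnected(self):
        # Remember when the connection was first lost.
        if self.disconnected_at is None:
            self.disconnected_at = self.clock()

    def on_reconnected(self):
        self.disconnected_at = None

    def poll(self):
        """Return the current leader, or None once the timeout has elapsed."""
        if self.disconnected_at is not None:
            if self.clock() - self.disconnected_at >= self.timeout_secs:
                # Induced timeout: forget the leader until it is re-detected.
                self.leader = None
        return self.leader
```

In this sketch the leader stays unset after a reconnection until on_leader_detected is called again, which mirrors the idea that a partitioned component must re-detect the leading master rather than trust stale state.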
When slaves are master-less, they ignore incoming messages from masters to ensure that we don't act on a non-leading master's decision.
When masters enter a leader-less state, they commit suicide.
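How the two kinds of component react to losing the leader can be sketched as follows. This is a simplified Python illustration, not Mesos code: the Agent and Master classes, their method names, and the exit code are assumptions made for the example. It models only the two behaviors described above, namely that a master-less slave drops messages from masters, and that a master exits when it finds itself without a leader.

```python
import sys


class Agent:
    """Hypothetical slave: drops messages from masters while master-less."""

    def __init__(self):
        self.leader = None   # None means no leading master is known
        self.handled = []    # messages that were actually acted on

    def on_leader_changed(self, leader):
        self.leader = leader

    def on_message(self, sender, message):
        # While master-less, ignore messages so that we never act on a
        # non-leading master's decision.
        if self.leader is None:
            return False
        self.handled.append(message)
        return True


class Master:
    """Hypothetical master: terminates when it enters a leader-less state."""

    def __init__(self, exit_fn=sys.exit):
        self.exit_fn = exit_fn  # injectable so the behavior can be tested

    def on_leader_changed(self, leader):
        if leader is None:
            # Exit code 1 is an arbitrary choice for this sketch.
            self.exit_fn(1)
```

A short usage example, with a made-up master address:

```python
agent = Agent()
agent.on_leader_changed(None)
agent.on_message("master@host1:5050", "launch task")  # ignored, returns False
```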
The following semantics are enforced: