The Myriad high availability (HA) feature is designed to keep jobs running, without job failure or downtime, when a failure occurs. It also provides self-recovery, which restores the environment to a highly available state after the failure.
A Myriad HA environment allows the Node Managers to reconnect to the new Resource Manager instance upon failover.
On failover, the following occurs:
Note: All clients that are connected to the Resource Manager continue to work, as long as they connect to it by its FQDN (for example, rm.marathon.mesos).
Step 1: Create a directory for Mesos-DNS. For example, /etc/mesos-dns.
Step 2: Install Mesos-DNS on one node in your cluster.
Step 3: Configure Mesos-DNS by providing the required parameters in the /etc/mesos-dns/config.json file. See the Mesos-DNS configuration documentation for more information. The following example parameters represent a minimum configuration.
{
  "zk": "zk://10.10.100.19:2181/mesos",
  "refreshSeconds": 60,
  "ttl": 60,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["10.10.1.10"],
  "timeout": 5
}
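A malformed config.json file is an easy mistake to make (strict JSON does not allow a comma after the last entry, for example), so it can help to check the file before starting Mesos-DNS. The following is a minimal sketch, not part of Myriad or Mesos-DNS: the function name check_mesos_dns_config is hypothetical, and the list of expected keys is simply taken from the minimum configuration example above.

```python
import json

# Keys taken from the minimum configuration example in this document.
# This is an assumption for illustration, not an official list of required keys.
EXPECTED_KEYS = {"zk", "refreshSeconds", "ttl", "domain", "port", "resolvers", "timeout"}


def check_mesos_dns_config(text):
    """Return a list of problems found in the text of a Mesos-DNS config.json file."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        # Strict JSON parsing rejects trailing commas, unquoted keys, and so on.
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    # Report every expected key that is absent, in a stable order.
    return [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - config.keys())]


if __name__ == "__main__":
    with open("/etc/mesos-dns/config.json") as handle:
        for problem in check_mesos_dns_config(handle.read()):
            print(problem)
```

An empty result means that the file parsed as JSON and contains every key from the example. It does not confirm that the values are correct for your cluster.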
Step 4: If you are on Linux, add the Mesos-DNS name server to the top of the /etc/resolv.conf file on all cluster nodes and clients (for example, clients running the RM UI or the Myriad UI).
nameserver <Mesos-DNS IP address>
Note: The Mesos-DNS entries must come first in the /etc/resolv.conf file. If the entries are not at the top, Mesos-DNS may not work correctly.
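Because the position of the entry matters, a quick check on each node can catch a misplaced line. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the address 10.10.100.19 is only a placeholder for the IP address of your Mesos-DNS node.

```python
def first_nameserver(resolv_conf_text):
    """Return the address on the first nameserver line, or None if there is none."""
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so their first field is never 'nameserver'.
        if len(fields) >= 2 and fields[0] == "nameserver":
            return fields[1]
    return None


def mesos_dns_is_first(resolv_conf_text, mesos_dns_ip):
    """Return True if the first nameserver entry is the Mesos-DNS address."""
    return first_nameserver(resolv_conf_text) == mesos_dns_ip


if __name__ == "__main__":
    # Placeholder address; replace it with the IP address of your Mesos-DNS node.
    mesos_dns_ip = "10.10.100.19"
    with open("/etc/resolv.conf") as handle:
        if mesos_dns_is_first(handle.read(), mesos_dns_ip):
            print("Mesos-DNS is the first name server")
        else:
            print("Mesos-DNS is not the first name server")
```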
Configuring Myriad for HA involves adding HA configuration properties to the $YARN_HOME/etc/hadoop/yarn-site.xml file and the $YARN_HOME/etc/hadoop/myriad-config-default.yml file.
To the $YARN_HOME/etc/hadoop/yarn-site.xml file, add the following properties:
In the $YARN_HOME/etc/hadoop/myriad-config-default.yml file, modify the following values:
frameworkFailoverTimeout: <non-zero value>
haEnabled: true
Note: The Myriad Mesos frameworkFailoverTimeout parameter is specified in milliseconds. This parameter indicates to Mesos that Myriad will fail over within this time interval.
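Because the value is given in milliseconds, it is easy to be off by a factor of 1000 when thinking in seconds or minutes. The following short sketch shows the conversion. The helper name failover_timeout_ms and the five-minute interval are illustrative assumptions, not recommended settings.

```python
def failover_timeout_ms(minutes):
    """Convert a failover interval in minutes to milliseconds."""
    if minutes <= 0:
        # The configuration calls for a non-zero value.
        raise ValueError("the failover timeout must be greater than zero")
    return int(minutes * 60 * 1000)


if __name__ == "__main__":
    # Illustrative interval only; choose a value that suits your cluster.
    print(f"frameworkFailoverTimeout: {failover_timeout_ms(5)}")
    print("haEnabled: true")
```

With a five-minute interval, the first line printed is frameworkFailoverTimeout: 300000.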