We have ported version 0.20.205.0 of Hadoop to run on Mesos. Most of the Mesos port is implemented by a pluggable Hadoop scheduler, which communicates with Mesos to obtain the nodes on which it launches tasks. However, a few small additions to Hadoop's internal APIs are also required.
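Here is a rough illustration of what "pluggable" means. In Hadoop 0.20 the JobTracker loads its task scheduler class from the mapred.jobtracker.taskScheduler property, which is normally set in mapred-site.xml. The sketch below generates such a fragment with Python's standard library. The scheduler class name it uses, org.apache.hadoop.mapred.MesosScheduler, is an assumption that this document does not state. Check it against the sources of your ported Hadoop, along with any Mesos-specific properties the port may need. The helper name scheduler_config is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hadoop 0.20 property naming the JobTracker's pluggable task scheduler.
SCHEDULER_PROPERTY = "mapred.jobtracker.taskScheduler"

# ASSUMPTION: class name of the Mesos scheduler in the ported tree;
# verify it against the sources of your patched Hadoop.
MESOS_SCHEDULER_CLASS = "org.apache.hadoop.mapred.MesosScheduler"


def scheduler_config(scheduler_class: str = MESOS_SCHEDULER_CLASS) -> str:
    """Return a mapred-site.xml body that plugs in the given scheduler class."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = SCHEDULER_PROPERTY
    ET.SubElement(prop, "value").text = scheduler_class
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(scheduler_config())
```

The output is a single <property> element inside <configuration>, which you can merge into an existing mapred-site.xml by hand.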
You can build the ported version of Hadoop using make hadoop. The result is placed in the hadoop/hadoop-0.20.205.0 directory. Alternatively, if you want to patch your own version of Hadoop to add Mesos support, you can use the .patch files located in <Mesos directory>/hadoop. These patches are likely to work on other Hadoop versions derived from 0.20. For example, GitHub user patelh has already created a Mesos-compatible version of CDH3u3, Cloudera's Distribution of Hadoop.
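You can script the patching route. This is a minimal sketch, assuming GNU patch is on the PATH and that paths inside the patch files are relative to the Hadoop source root with one leading component (hence -p1). Neither detail is stated above, so adjust strip if a dry run fails. The sketch lists the available patches and builds a command for the one that matches your Hadoop release. It defaults to a dry run. The helper names list_patches, patch_command and apply_patch are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def list_patches(mesos_dir: str) -> List[Path]:
    """Return the .patch files in <Mesos directory>/hadoop, sorted by name."""
    return sorted((Path(mesos_dir) / "hadoop").glob("*.patch"))


def patch_command(patch_file: Path, hadoop_dir: str, strip: int = 1,
                  dry_run: bool = True) -> List[str]:
    """Build a GNU patch command that applies one patch file to a Hadoop tree.

    ASSUMPTION: paths inside the patch carry one leading component (-p1);
    change `strip` if your patch files differ.
    """
    # Resolve the patch path so it stays valid after patch changes into hadoop_dir (-d).
    cmd = ["patch", f"-p{strip}", "-d", hadoop_dir,
           "-i", str(Path(patch_file).resolve())]
    if dry_run:
        # Report what would change without modifying any files.
        cmd.append("--dry-run")
    return cmd


def apply_patch(patch_file: Path, hadoop_dir: str, dry_run: bool = True) -> None:
    """Run the command; check=True raises CalledProcessError if patch exits non-zero."""
    subprocess.run(patch_command(patch_file, hadoop_dir, dry_run=dry_run), check=True)
```

For example, call list_patches with your Mesos directory and pick the file that corresponds to your Hadoop release. Run apply_patch on it with dry_run=True, then repeat with dry_run=False once the dry run reports no failed hunks.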
To run Hadoop on Mesos, follow these steps:
Note that when you run on a cluster, Hadoop (and Mesos) should be installed at the same path on all nodes.
If you wish to run multiple JobTrackers, the easiest way is to give each one a different port. Use a separate Hadoop conf directory for each JobTracker, and pass the --config flag to bin/hadoop to specify which config directory to use. To create such a directory, copy Hadoop's existing conf directory to a new location and modify the copy.
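You can automate the creation of an extra conf directory. This minimal sketch copies an existing conf directory and sets mapred.job.tracker in the copy's mapred-site.xml. That property is the Hadoop 0.20 setting that holds the JobTracker's host:port. The sketch assumes the source directory already contains a mapred-site.xml. The helper names set_property and make_jobtracker_conf are hypothetical. ElementTree discards XML comments and processing instructions, so the rewritten file loses them.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def set_property(site_xml: Path, name: str, value: str) -> None:
    """Set, or append if absent, a <property> in a Hadoop *-site.xml file.

    Note: ElementTree discards XML comments and processing instructions
    (such as an xml-stylesheet line) when the file is rewritten.
    """
    tree = ET.parse(site_xml)
    root = tree.getroot()
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree.write(site_xml, encoding="utf-8", xml_declaration=True)


def make_jobtracker_conf(src_conf: str, dst_conf: str, host: str, port: int) -> Path:
    """Copy an existing conf directory and point its JobTracker at host:port."""
    dst = Path(dst_conf)
    # copytree raises FileExistsError if dst already exists, so nothing is overwritten.
    shutil.copytree(src_conf, dst)
    set_property(dst / "mapred-site.xml", "mapred.job.tracker", f"{host}:{port}")
    return dst
```

For example, make_jobtracker_conf("conf", "conf-jt2", "localhost", 9002) produces a directory for a second JobTracker, which you then start with bin/hadoop pointed at conf-jt2. If both JobTrackers run on the same host, their web interfaces would also clash on mapred.job.tracker.http.address (port 50030 by default). You can change that setting with set_property in the same way.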