****************** FailMon Quick Start Guide ***********************

This document is a guide to quickly setting up and running FailMon.
For more information and details please see the FailMon User Manual.

***** Building FailMon *****

Normally, FailMon lies under <hadoop-dir>/src/contrib/failmon, where
<hadoop-dir> is the Hadoop project root folder. To compile it, one
can either run ant for the whole Hadoop project, i.e.:

$ cd <hadoop-dir>
$ ant

or run ant only for FailMon:

$ cd <hadoop-dir>/src/contrib/failmon
$ ant

The above will compile FailMon and place all class files under
<hadoop-dir>/build/contrib/failmon/classes.

By invoking:

$ cd <hadoop-dir>/src/contrib/failmon
$ ant tar

FailMon is packaged as a standalone application in
<hadoop-dir>/src/contrib/failmon/failmon.tar.gz.


***** Deploying FailMon *****

There are two ways FailMon can be deployed in a cluster:

a) Within Hadoop, in which case the whole Hadoop package is uploaded
to the cluster nodes, and nothing else needs to be done on individual
nodes.

b) Independently of the Hadoop deployment, i.e., by uploading
failmon.tar.gz to all nodes and uncompressing it. In that case, the
bin/failmon.sh script needs to be edited: the environment variable
HADOOPDIR should point to the root directory of the Hadoop
distribution. Also, the location of the Hadoop configuration files
should be specified in the 'hadoop.conf.path' property in the file
conf/failmon.properties. Note that these configuration files refer
to the HDFS in which we want to store the FailMon data (which can
potentially be different from the one on the cluster we are
monitoring). An example of these edits is given below.

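As an illustration only, the two edits for a standalone deployment
could look as follows. The paths are examples that depend on the
local installation, and the exact form of the assignment in
bin/failmon.sh may differ slightly.

In bin/failmon.sh:

  HADOOPDIR=/usr/local/hadoop

In conf/failmon.properties:

  hadoop.conf.path=/usr/local/hadoop/conf
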
Either way, we assume that FailMon is placed in the same directory
on all nodes, which is typical for most clusters. If this is not
feasible, one should create on every node a symbolic link with the
same name, pointing to that node's FailMon directory, for example as
shown below.

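For instance, if a node keeps FailMon under a different local path,
a link with the agreed-upon name can point to it (both paths below
are illustrative):

$ ln -s /mnt/data/failmon /opt/failmon
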
One should also edit the conf/failmon.properties file on each node
to set site-specific property values. However, the default values
are expected to serve most practical cases. Refer to the FailMon
User Manual for details on the various properties and configuration
parameters.


***** Running FailMon *****

In order to run FailMon using a node to do the ad-hoc scheduling of
monitoring jobs, one needs to edit the hosts.list file to specify the
list of machine hostnames on which FailMon is to be run. Also, in the
file conf/global.config, the username used to connect to the machines
(passwordless SSH is assumed) has to be specified in the property
'ssh.username', and the path to the FailMon folder in the property
'failmon.dir' (it is assumed to be the same on all machines in the
cluster); an example is given after the command below. Then one only
needs to invoke the command:

$ cd <hadoop-dir>
$ bin/scheduler.py

to start the system.

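For illustration, a minimal setup could look as follows. The
hostnames, username, and path are only examples, and the exact
syntax of conf/global.config is described in the FailMon User
Manual.

In hosts.list, one hostname per line:

  node01.example.com
  node02.example.com

In conf/global.config:

  ssh.username=hadoop
  failmon.dir=/opt/failmon

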
***** Merging HDFS files *****

For the purpose of merging the files created on HDFS by FailMon, the
following command can be used:

$ cd <hadoop-dir>
$ bin/failmon.sh --mergeFiles

This will concatenate all files in the HDFS folder (pointed to by
the 'hdfs.upload.dir' property in the conf/failmon.properties file)
into a single file, which will be placed in the same folder. As with
deployment, the location of the Hadoop configuration files should be
specified in the 'hadoop.conf.path' property in the file
conf/failmon.properties. Note that these configuration files refer
to the HDFS in which we have stored the FailMon data (which can
potentially be different from the one on the cluster we are
monitoring). Also, the scheduler.py script can be set up to merge
the HDFS files when their number surpasses a configurable limit (see
the conf/global.config file).

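To inspect the result of a merge, one can list the upload folder
with the standard HDFS shell. The path below assumes that
'hdfs.upload.dir' is set to /failmon; substitute the value
configured in conf/failmon.properties:

$ cd <hadoop-dir>
$ bin/hadoop fs -ls /failmon
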
Please refer to the FailMon User Manual for more details.