<?xml version="1.0" encoding="iso-8859-1"?>
 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->
 <document xmlns="http://maven.apache.org/XDOC/2.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
   <properties></properties>
   <title>Graph Tutorial</title>
   <body>
     <section name="Hama Graph Tutorial"></section>
     <p>This document describes the Graph computing framework and serves as a tutorial.</p>
     <subsection name="Overview"></subsection>
     <p>Hama includes the Graph package for vertex-centric graph computations.
     Hama's Graph package allows you to program Google's Pregel-style applications with a simple programming interface.</p>

     <subsection name="Vertex"></subsection>

     <p>Writing a Hama graph application involves subclassing the predefined Vertex class. Its three type parameters, V, E, and M, define the types associated with vertices, edges, and messages, respectively.</p>
     <pre>
   public abstract class Vertex&lt;V extends Writable, E extends Writable, M extends Writable&gt;
       implements VertexInterface&lt;V, E, M&gt; {

     // called once per superstep for every active vertex
     public abstract void compute(Iterator&lt;M&gt; messages) throws IOException;
     // ...

   }</pre>

    <p>You override the compute() method, which is executed at each active vertex in every superstep. Predefined Vertex methods allow compute() to query information about the current vertex and its edges and to send messages to other vertices. Inside compute(), the value associated with the vertex is available via getValue().</p>
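    <p>To make the superstep model concrete, here is a minimal single-process sketch of the classic Pregel "propagate the maximum value" computation. It is not Hama code; the class MaxValueSketch and its run() method are hypothetical. In this model, messages sent in one superstep are delivered in the next. Every vertex votes to halt at the end of its compute step, and a halted vertex is reactivated only by an incoming message. The computation ends when no messages are in flight.</p>
    <pre><![CDATA[
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Pregel-style supersteps on one machine; it does not
// use or imitate Hama's actual classes.
final class MaxValueSketch {

  /** Propagates the maximum vertex value along directed edges. */
  static Map<String, Integer> run(Map<String, Integer> initialValues,
                                  Map<String, List<String>> outEdges) {
    Map<String, Integer> values = new HashMap<>(initialValues);
    Map<String, List<Integer>> inbox = new HashMap<>();
    int superstep = 0;

    while (true) {
      Map<String, List<Integer>> outbox = new HashMap<>();
      for (Map.Entry<String, Integer> vertex : values.entrySet()) {
        List<Integer> messages = inbox.getOrDefault(vertex.getKey(), List.of());
        // Every vertex runs in superstep 0; afterwards only vertices with mail wake up.
        if (superstep > 0 && messages.isEmpty()) {
          continue;
        }
        // The "compute" step: keep the largest value seen so far.
        int current = vertex.getValue();
        int max = current;
        for (int m : messages) {
          max = Math.max(max, m);
        }
        vertex.setValue(max);
        // Send only when there is news for the neighbours, then (implicitly) vote to halt.
        if (superstep == 0 || max > current) {
          for (String target : outEdges.getOrDefault(vertex.getKey(), List.of())) {
            outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(max);
          }
        }
      }
      // Messages sent in superstep S are delivered in superstep S + 1.
      if (outbox.isEmpty()) {
        return values;
      }
      inbox = outbox;
      superstep++;
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> values = Map.of("A", 3, "B", 6, "C", 2, "D", 1);
    Map<String, List<String>> edges = Map.of(
        "A", List.of("B"), "B", List.of("A", "C"),
        "C", List.of("B", "D"), "D", List.of("C"));
    System.out.println(run(values, edges)); // every vertex ends with 6
  }
}
]]></pre>
    <p>Hama distributes the same pattern across peers and handles message delivery between supersteps for you.</p>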

    <subsection name="Vertex Reader and Writer"></subsection>
    <p>Hama Graph provides flexible input and output options and lets you extract vertices from your data without any pre-processing. To support your own data format, create a VertexReader by extending the org.apache.hama.graph.<b>VertexInputReader</b> class.

    For example, a sequence file whose records map a vertex ID (the key) to the IDs of its adjacent vertices (the value) can be parsed as follows:
    </p>
    <pre>
   public static class PagerankSeqReader
       extends
       VertexInputReader&lt;Text, TextArrayWritable, Text, NullWritable, DoubleWritable&gt; {
     @Override
     public boolean parseVertex(Text key, TextArrayWritable value,
         Vertex&lt;Text, NullWritable, DoubleWritable&gt; vertex) throws Exception {
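        // the record key becomes the vertex ID
        vertex.setVertexID(key);

        // every element of the value array is the ID of an adjacent vertex;
        // the edges carry no value, hence NullWritable and a null edge value
        for (Writable v : value.get()) {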
         vertex.addEdge(new Edge&lt;Text, NullWritable&gt;((Text) v, null));
       }

       return true;
     }
   }
 </pre>

    <p>Similarly, you can create your own writer by implementing org.apache.hama.graph.<b>VertexOutputWriter</b>.
    For example, the write() method of SemiClusterVertexOutputWriter emits each vertex ID together with the semi-clusters that contain it:</p>
    <pre>
   @Override
   public void write(Vertex&lt;V, E, M&gt; vertex,
       BSPPeer&lt;Writable, Writable, KEYOUT, VALUEOUT, GraphJobMessage&gt; peer)
       throws IOException {
      // write the vertex ID with the semi-clusters that contain this vertex
      SemiClusterMessage vertexValue = (SemiClusterMessage) vertex.getValue();
     peer.write((KEYOUT) vertex.getVertexID(), (VALUEOUT) new Text(vertexValue
         .getSemiClusterContainThis().toString()));
   }
   </pre>

    <subsection name="Combiners"></subsection>
    <p>Sending a message to a vertex on a different machine has some overhead. If the algorithm does not need every individual message but only a function of them (for example, their sum), a combiner can merge the messages addressed to the same vertex into a single message and so reduce that overhead.</p>
    <h4>Write your own Combiner</h4>
    <p>To write your own combiner, extend the Combiner class and implement its combine(Iterable&lt;M&gt; messages) method.
    For an example, see the implementation of MinIntCombiner in the org.apache.hama.example.SSSP example.</p>
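    <p>The following sketch illustrates what a combiner saves. It is not Hama's Combiner class; SketchCombiner, MinIntSketchCombiner, and CombinerSketch.combineOutgoing() are hypothetical names. Messages bound for the same target vertex are grouped and collapsed into one message by a min combiner. A shortest-path computation such as SSSP can use this kind of combiner because each vertex only needs the smallest tentative distance it receives.</p>
    <pre><![CDATA[
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface that mirrors the combine(Iterable<M>) idea;
// it is not Hama's Combiner class.
interface SketchCombiner<M> {
  M combine(Iterable<M> messages);
}

// Keeps only the smallest value, e.g. the shortest tentative distance in SSSP.
final class MinIntSketchCombiner implements SketchCombiner<Integer> {
  @Override
  public Integer combine(Iterable<Integer> messages) {
    int min = Integer.MAX_VALUE;
    for (int m : messages) {
      min = Math.min(min, m);
    }
    return min;
  }
}

final class CombinerSketch {
  /** Groups (target, message) pairs by target and combines each group into one message. */
  static <M> Map<String, M> combineOutgoing(List<Map.Entry<String, M>> outgoing,
                                            SketchCombiner<M> combiner) {
    Map<String, List<M>> byTarget = new HashMap<>();
    for (Map.Entry<String, M> message : outgoing) {
      byTarget.computeIfAbsent(message.getKey(), k -> new ArrayList<>()).add(message.getValue());
    }
    Map<String, M> combined = new HashMap<>();
    for (Map.Entry<String, List<M>> group : byTarget.entrySet()) {
      combined.put(group.getKey(), combiner.combine(group.getValue()));
    }
    return combined;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> outgoing = List.of(
        Map.entry("B", 7), Map.entry("B", 4), Map.entry("C", 9), Map.entry("B", 5));
    // Four messages shrink to two: B receives 4 and C receives 9.
    System.out.println(combineOutgoing(outgoing, new MinIntSketchCombiner()));
  }
}
]]></pre>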

    <subsection name="Counters"></subsection>
    <p>Counters are used to measure progress or to count the number of events within a job. To define your own counters, declare an enum type as follows:</p>
 <pre>
   private static enum DYNAMIC_GRAPH_COUNTER {
     ADDED_VERTEX_COUNT,
     REMOVED_VERTEX_COUNT
   }
 </pre>
    <p>Then increment a counter by calling its increment method as follows:</p>
 <pre>
   this.getCounter(DYNAMIC_GRAPH_COUNTER.ADDED_VERTEX_COUNT).increment(1);
 </pre>
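    <p>Conceptually, a set of enum-keyed counters is a map from each enum constant to a running total. The sketch below illustrates that idea with java.util.EnumMap. CounterSketch is a hypothetical class and does not reflect how Hama stores, aggregates, or reports counters.</p>
    <pre><![CDATA[
import java.util.EnumMap;
import java.util.Map;

// Hypothetical illustration of enum-keyed counters; not Hama's counter API.
final class CounterSketch {
  enum DYNAMIC_GRAPH_COUNTER {
    ADDED_VERTEX_COUNT,
    REMOVED_VERTEX_COUNT
  }

  private final Map<DYNAMIC_GRAPH_COUNTER, Long> totals =
      new EnumMap<>(DYNAMIC_GRAPH_COUNTER.class);

  /** Adds amount to the counter, starting from zero the first time it is used. */
  void increment(DYNAMIC_GRAPH_COUNTER counter, long amount) {
    totals.merge(counter, amount, Long::sum);
  }

  /** Returns the current total; counters never incremented read as zero. */
  long get(DYNAMIC_GRAPH_COUNTER counter) {
    return totals.getOrDefault(counter, 0L);
  }

  public static void main(String[] args) {
    CounterSketch counters = new CounterSketch();
    counters.increment(DYNAMIC_GRAPH_COUNTER.ADDED_VERTEX_COUNT, 1);
    counters.increment(DYNAMIC_GRAPH_COUNTER.ADDED_VERTEX_COUNT, 1);
    System.out.println(counters.get(DYNAMIC_GRAPH_COUNTER.ADDED_VERTEX_COUNT));
  }
}
]]></pre>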

    <subsection name="Aggregators"></subsection>
    <p>Aggregators provide a mechanism for global communication, monitoring, and sharing global data. Each vertex can provide a value to an aggregator in superstep S; the system combines those values using a reduction operator, and the resulting value is made available to all vertices in superstep S + 1.
    </p>
    <h4>Registering aggregators</h4>
    <p>To use aggregators, first register them with your GraphJob:</p>
    <pre>
   HamaConfiguration conf = new HamaConfiguration(new Configuration());
   GraphJob graphJob = new GraphJob(conf, MyClass.class);

   // To add an average aggregator
   graphJob.setAggregatorClass(AverageAggregator.class);

   // To add a sum aggregator
   graphJob.setAggregatorClass(SumAggregator.class);</pre>
    <p>Several aggregators are already implemented in the org.apache.hama.graph package, and you can also write your own.</p>
    <h4>Start working with aggregators</h4>
    <p>To contribute a value from a vertex to an aggregator, call:</p>
 <pre>
   this.aggregate(index, value);</pre>

   <p>Call this method from inside a vertex; not every vertex has to call it. The index parameter selects the aggregator by registration order: the first registered aggregator has index 0, the second has index 1, and so on.</p>
   <h4>Get results</h4>
   <p>Inside your vertex, get the result of an aggregator, computed from the values contributed in the previous superstep, with:</p>
   <pre>
   this.getAggregatedValue(index);</pre>

   <h4>Write your own aggregators</h4>
   <p>To write your own aggregator, extend the org.apache.hama.graph.<b>AbstractAggregator</b> class and implement its aggregate(M value) and getValue() methods. For reference, see the default aggregator implementations in the org.apache.hama.graph package.</p>
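   <p>The sketch below models the aggregator life cycle described in this section on a single machine. SketchAggregator, SumSketchAggregator, AverageSketchAggregator, and AggregatorRegistry are hypothetical names, not Hama classes. They mirror three rules from this section:</p>
   <ul>
     <li>Each aggregator exposes an aggregate(value)/getValue() pair.</li>
     <li>Aggregators are looked up by registration index.</li>
     <li>Values contributed in superstep S become visible in superstep S + 1.</li>
   </ul>
   <p>Returning null before the first superstep has finished is an assumption that mirrors the null check in the PageRankVertex example below.</p>
   <pre><![CDATA[
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class mirroring the aggregate(value) / getValue() pair;
// it is not Hama's AbstractAggregator.
abstract class SketchAggregator {
  abstract void aggregate(double value);

  abstract double getValue();

  abstract void reset();
}

final class SumSketchAggregator extends SketchAggregator {
  private double sum;

  @Override void aggregate(double value) { sum += value; }
  @Override double getValue() { return sum; }
  @Override void reset() { sum = 0.0; }
}

final class AverageSketchAggregator extends SketchAggregator {
  private double sum;
  private long count;

  @Override void aggregate(double value) { sum += value; count++; }
  @Override double getValue() { return count == 0 ? 0.0 : sum / count; }
  @Override void reset() { sum = 0.0; count = 0; }
}

final class AggregatorRegistry {
  private final List<SketchAggregator> aggregators = new ArrayList<>();
  private double[] lastResults = new double[0];

  /** The first registered aggregator gets index 0, the second index 1, and so on. */
  int register(SketchAggregator aggregator) {
    aggregators.add(aggregator);
    return aggregators.size() - 1;
  }

  /** Called by vertices during superstep S. */
  void aggregate(int index, double value) {
    aggregators.get(index).aggregate(value);
  }

  /** Result of the previous superstep, or null before any superstep has finished. */
  Double getAggregatedValue(int index) {
    if (index >= lastResults.length) {
      return null;
    }
    return lastResults[index];
  }

  /** Called at the superstep barrier: snapshot every result, then start afresh. */
  void endSuperstep() {
    lastResults = new double[aggregators.size()];
    for (int i = 0; i < aggregators.size(); i++) {
      lastResults[i] = aggregators.get(i).getValue();
      aggregators.get(i).reset();
    }
  }

  public static void main(String[] args) {
    AggregatorRegistry registry = new AggregatorRegistry();
    int average = registry.register(new AverageSketchAggregator()); // index 0
    int sum = registry.register(new SumSketchAggregator());         // index 1
    for (double vertexValue : new double[] {1.0, 2.0, 6.0}) {
      registry.aggregate(average, vertexValue);
      registry.aggregate(sum, vertexValue);
    }
    registry.endSuperstep();
    System.out.println(registry.getAggregatedValue(average) + " " + registry.getAggregatedValue(sum)); // 3.0 9.0
  }
}
]]></pre>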
    <subsection name="Example: PageRankVertex"></subsection>
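    <p>To solve the PageRank problem with Hama Graph, extend the Vertex class to create a PageRankVertex class.
 This example uses the algorithm described in Google's Pregel paper. The value of a vertex represents its tentative PageRank. The graph is initialized with each vertex value equal to 1/numOfVertices. In each superstep, every vertex sends its tentative PageRank, divided by its number of outgoing edges, along each of those edges.
 <br/><br/>
 From superstep 1 on, each vertex sums the values of all incoming messages and sets its tentative PageRank to (1 - 0.85) / numOfVertices + (0.85 * sum), where 0.85 is the damping factor (DAMPING_FACTOR in the code). The Pregel paper runs this update for a fixed 30 supersteps. The code below instead votes to halt once the value of the aggregator at index 0 drops below MAXIMUM_CONVERGENCE_ERROR; this test only applies after superstep 2. Both constants are defined outside this snippet.
    </p>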

     <pre>
   public static class PageRankVertex extends
       Vertex&lt;Text, NullWritable, DoubleWritable&gt; {

     @Override
     public void compute(Iterator&lt;DoubleWritable&gt; messages) throws IOException {
       // initialize this vertex to 1 / count of global vertices in this graph
       if (this.getSuperstepCount() == 0) {
         setValue(new DoubleWritable(1.0 / this.getNumVertices()));
       } else if (this.getSuperstepCount() >= 1) {
         double sum = 0;
          // messages is an Iterator, so it is drained with hasNext()/next()
          while (messages.hasNext()) {
            sum += messages.next().get();
          }
         double alpha = (1.0d - DAMPING_FACTOR) / this.getNumVertices();
         setValue(new DoubleWritable(alpha + (sum * DAMPING_FACTOR)));
         aggregate(0, this.getValue());
       }

       // if we have not reached our global error yet, then proceed.
       DoubleWritable globalError = getAggregatedValue(0);

       if (globalError != null &amp;&amp; this.getSuperstepCount() &gt; 2
           &amp;&amp; MAXIMUM_CONVERGENCE_ERROR > globalError.get()) {
         voteToHalt();
       } else {
         // in each superstep we are going to send a new rank to our neighbours
         sendMessageToNeighbors(new DoubleWritable(this.getValue().get()
             / this.getEdges().size()));
       }
     }
   }</pre>
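   <p>The following single-machine sketch shows the update rule in isolation from Hama. It applies the same formula for a fixed number of supersteps, as in the Pregel paper's 30-superstep version, rather than the aggregator-based stopping test. PageRankSketch and its adjacency-array input are hypothetical. Every vertex is assumed to have at least one outgoing edge; in this simple form, dangling vertices would leak rank. Under that assumption the ranks keep summing to 1.</p>
   <pre><![CDATA[
import java.util.Arrays;

// Hypothetical single-machine sketch of the PageRank superstep formula;
// it is not Hama code. Assumes every vertex has at least one outgoing edge.
final class PageRankSketch {
  static final double DAMPING_FACTOR = 0.85;

  /** outEdges[v] lists the vertices that vertex v links to. */
  static double[] run(int[][] outEdges, int supersteps) {
    int n = outEdges.length;
    double[] rank = new double[n];
    // Superstep 0: every vertex starts at 1 / numOfVertices.
    Arrays.fill(rank, 1.0 / n);
    for (int step = 1; step <= supersteps; step++) {
      // Messages: each vertex sends rank / outDegree along every outgoing edge.
      double[] sum = new double[n];
      for (int v = 0; v < n; v++) {
        for (int target : outEdges[v]) {
          sum[target] += rank[v] / outEdges[v].length;
        }
      }
      // Update: (1 - 0.85) / numOfVertices + 0.85 * sum of incoming messages.
      for (int v = 0; v < n; v++) {
        rank[v] = (1.0 - DAMPING_FACTOR) / n + DAMPING_FACTOR * sum[v];
      }
    }
    return rank;
  }

  public static void main(String[] args) {
    // Edges: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
    int[][] graph = {{1, 2}, {2}, {0}};
    System.out.println(Arrays.toString(run(graph, 30)));
  }
}
]]></pre>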

   </body>
 </document>