blob: e707198172029df6e6cbb73248fee3a2b8c372d3 [file] [log] [blame]
<?xml version="1.0" encoding="iso-8859-1"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<document xmlns="http://maven.apache.org/XDOC/2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
<properties></properties>
<title>Graph Tutorial</title>
<body>
<section name="Hama Graph Tutorial"></section>
<p>This document describes the Graph computing framework and serves as a tutorial.</p>
<subsection name="Overview"></subsection>
<p>Hama includes the Graph package for vertex-centric graph computations.
Hama's Graph package allows you to program Google's Pregel style applications with simple programming interface.</p>
<subsection name="Vertex"></subsection>
<p>Writing a Hama graph application involves subclassing the predefined Vertex class. Its template arguments define three value types, associated with vertices, edges, and messages.</p>
<pre>
public abstract class Vertex&lt;V extends Writable, E extends Writable, M extends Writable&gt;
implements VertexInterface&lt;V, E, M&gt; {
public void compute(Iterator&lt;M&gt; messages) throws IOException;
..
}</pre>
<p>The user overrides the Compute() method, which will be executed at each active vertex in every superstep. Predefined Vertex methods allow Compute() to query information about the current vertex and its edges, and to send messages to other vertices. Compute() can inspect the value associated with its vertex via GetValue().</p>
<subsection name="Vertex Reader and Writer"></subsection>
<p>Hama Graph provides very flexible input and output options, and allows to extract Vertex from your data without any pre-processing. You can create your own VertexReader for your data format by exending org.apache.hama.graph.<b>VertexInputReader</b> class.
For example, an sequence file contains a linked list of Vertex, can be parse as following:
</p>
<pre>
public static class PagerankSeqReader
extends
VertexInputReader&lt;Text, TextArrayWritable, Text, NullWritable, DoubleWritable&gt; {
@Override
public boolean parseVertex(Text key, TextArrayWritable value,
Vertex&lt;Text, NullWritable, DoubleWritable&gt; vertex) throws Exception {
vertex.setVertexID(key);
for (Writable v : value.get()) {
vertex.addEdge(new Edge&lt;Text, NullWritable&gt;((Text) v, null));
}
return true;
}
}
</pre>
And also, you can create your own Writer by implementing org.apache.hama.graph.<b>VertexOutputWriter</b> class.
See the SemiClusterVertexOutputWriter example:
<pre>
@Override
public void write(Vertex&lt;V, E, M&gt; vertex,
BSPPeer&lt;Writable, Writable, KEYOUT, VALUEOUT, GraphJobMessage&gt; peer)
throws IOException {
SemiClusterMessage vertexValue = (SemiClusterMessage) vertex.getValue();
peer.write((KEYOUT) vertex.getVertexID(), (VALUEOUT) new Text(vertexValue
.getSemiClusterContainThis().toString()));
}
</pre>
<subsection name="Combiners"></subsection>
<p>Sending a message to another vertex that exists on a different machine has some overhead. However if the algorithm doesn't require each message explicitly but a function of it (example sum) then combiners can be used.</p>
<h4>Write your own Combiner</h4>
<p>To write your own combiner, you have to extend Combiner class and implement the methods of #combine(Iterable&lt;M&gt; messages).
For more, please see the implementation of MinIntCombiner in org.apache.hama.example.SSSP example.</p>
<subsection name="Counters"></subsection>
<p>Counters are used for measuring the progress or counting the number of events within job. For your own Counter, you need to define the enum type as follow:</p>
<pre>
private static enum DYNAMIC_GRAPH_COUNTER {
ADDED_VERTEX_COUNT,
REMOVED_VERTEX_COUNT
}
</pre>
<p>Then you can increment your own counter by calling increment method as follow:</p>
<pre>
this.getCounter(DYNAMIC_GRAPH_COUNTER.ADDED_VERTEX_COUNT).increment(1);
</pre>
<subsection name="Aggregators"></subsection>
<p>Aggregators are a mechanism for global communication, monitoring, and data. Each vertex can provide a value to an aggregator in superstep S, the system combines those values using a reduction operator, and the resulting value is made available to all vertices in superstep S + 1.
</p>
<h4>Registering aggregators</h4>
<p>To start using aggregators, you must declare them in your GraphJob:</p>
<pre>
HamaConfiguration conf = new HamaConfiguration(new Configuration());
GraphJob graphJob = new GraphJob(conf, MyClass.class);
// To add an average aggregator
graphJob.setAggregatorClass(AverageAggregator.class);
// To add a sum aggregator
graphJob.setAggregatorClass(SumAggregator.class);</pre>
<p>There are multiple different aggregators and you can also make your own. You can look for already implemented aggregators in org.apache.hama.graph package.</p>
<h4>Start working with aggregators</h4>
<p>In order to aggregate values from your vertices, use:</p>
<pre>
this.aggregate(index,value);</pre>
<p>This method is called from inside each vertex. Though it's not mandatory all vertices to make use of this method. The index parameter of this method is a number that is equivalent to the order of the registered aggregator. (The first registered aggregator has index 0, second has index 1 etc.) </p>
<h4>Get results</h4>
<p>Inside your vertex, you can get the results of each aggregator by using the method:</p>
<pre>
this.getAggregatedValue(index);</pre>
<h4>Write your own aggregators</h4>
<p>To write your own aggregator, you have to extend org.apache.hama.graph.<b>AbstractAggregator</b> class and implement the methods of #aggregate(M value) and #getValue(). For more, please see the default implementation of aggregators in org.apache.hama.graph package.</p>
<subsection name="Example: PageRankVertex"></subsection>
<p>To solve the Page Rank problem using Hama Graph, you can extends the Vertex class to create a PageRankVertex class.
In this example, the algorithm described Google's Pregel paper was used. The value of a vertex represents the tentative page rank of the vertex. The graph is intialized with each vertex value equal to 1/numOfVertices. In each of the first 30 supersteps, each vertex sends its tentative page rank along all of its outgoing edges.
<br/><br/>
From Superstep 1 to 30, each vertex sums up the values arriving on all its messages and sets its tentative page rank to (1 - 0.85) / numOfVertices + (0.85 * sum).
</p>
<pre>
public static class PageRankVertex extends
Vertex&lt;Text, NullWritable, DoubleWritable&gt; {
@Override
public void compute(Iterator&lt;DoubleWritable&gt; messages) throws IOException {
// initialize this vertex to 1 / count of global vertices in this graph
if (this.getSuperstepCount() == 0) {
setValue(new DoubleWritable(1.0 / this.getNumVertices()));
} else if (this.getSuperstepCount() >= 1) {
double sum = 0;
for (DoubleWritable msg : messages) {
sum += msg.get();
}
double alpha = (1.0d - DAMPING_FACTOR) / this.getNumVertices();
setValue(new DoubleWritable(alpha + (sum * DAMPING_FACTOR)));
aggregate(0, this.getValue());
}
// if we have not reached our global error yet, then proceed.
DoubleWritable globalError = getAggregatedValue(0);
if (globalError != null &amp;&amp; this.getSuperstepCount() &gt; 2
&amp;&amp; MAXIMUM_CONVERGENCE_ERROR > globalError.get()) {
voteToHalt();
} else {
// in each superstep we are going to send a new rank to our neighbours
sendMessageToNeighbors(new DoubleWritable(this.getValue().get()
/ this.getEdges().size()));
}
}
}</pre>
</body>
</document>