| <?xml version="1.0" encoding="iso-8859-1"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <document xmlns="http://maven.apache.org/XDOC/2.0" |
| xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" |
| xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd"> |
| <properties></properties> |
| <title>Graph Tutorial</title> |
| <body> |
| <section name="Hama Graph Tutorial"></section> |
| <p>This document describes the Graph computing framework and serves as a tutorial.</p> |
| <subsection name="Overview"></subsection> |
| <p>Hama includes the Graph package for vertex-centric graph computations. |
| Hama's Graph package allows you to program Google's Pregel style applications with simple programming interface.</p> |
| |
| <subsection name="Vertex"></subsection> |
| |
| <p>Writing a Hama graph application involves subclassing the predefined Vertex class. Its template arguments define three value types, associated with vertices, edges, and messages.</p> |
| <pre> |
| public abstract class Vertex<V extends Writable, E extends Writable, M extends Writable> |
| implements VertexInterface<V, E, M> { |
| |
| public void compute(Iterator<M> messages) throws IOException; |
| .. |
| |
| }</pre> |
| |
| <p>The user overrides the Compute() method, which will be executed at each active vertex in every superstep. Predefined Vertex methods allow Compute() to query information about the current vertex and its edges, and to send messages to other vertices. Compute() can inspect the value associated with its vertex via GetValue().</p> |
| |
| <subsection name="Vertex Reader and Writer"></subsection> |
| <p>Hama Graph provides very flexible input and output options, and allows to extract Vertex from your data without any pre-processing. You can create your own VertexReader for your data format by exending org.apache.hama.graph.<b>VertexInputReader</b> class. |
| |
| For example, an sequence file contains a linked list of Vertex, can be parse as following: |
| </p> |
| <pre> |
| public static class PagerankSeqReader |
| extends |
| VertexInputReader<Text, TextArrayWritable, Text, NullWritable, DoubleWritable> { |
| @Override |
| public boolean parseVertex(Text key, TextArrayWritable value, |
| Vertex<Text, NullWritable, DoubleWritable> vertex) throws Exception { |
| vertex.setVertexID(key); |
| |
| for (Writable v : value.get()) { |
| vertex.addEdge(new Edge<Text, NullWritable>((Text) v, null)); |
| } |
| |
| return true; |
| } |
| } |
| </pre> |
| |
| And also, you can create your own Writer by implementing org.apache.hama.graph.<b>VertexOutputWriter</b> class. |
| See the SemiClusterVertexOutputWriter example: |
| <pre> |
| @Override |
| public void write(Vertex<V, E, M> vertex, |
| BSPPeer<Writable, Writable, KEYOUT, VALUEOUT, GraphJobMessage> peer) |
| throws IOException { |
| SemiClusterMessage vertexValue = (SemiClusterMessage) vertex.getValue(); |
| peer.write((KEYOUT) vertex.getVertexID(), (VALUEOUT) new Text(vertexValue |
| .getSemiClusterContainThis().toString())); |
| } |
| </pre> |
| |
| <subsection name="Combiners"></subsection> |
| <p>Sending a message to another vertex that exists on a different machine has some overhead. However if the algorithm doesn't require each message explicitly but a function of it (example sum) then combiners can be used.</p> |
| <h4>Write your own Combiner</h4> |
| <p>To write your own combiner, you have to extend Combiner class and implement the methods of #combine(Iterable<M> messages). |
| For more, please see the implementation of MinIntCombiner in org.apache.hama.example.SSSP example.</p> |
| |
| <subsection name="Counters"></subsection> |
| <p>Counters are used for measuring the progress or counting the number of events within job. For your own Counter, you need to define the enum type as follow:</p> |
| <pre> |
| private static enum DYNAMIC_GRAPH_COUNTER { |
| ADDED_VERTEX_COUNT, |
| REMOVED_VERTEX_COUNT |
| } |
| </pre> |
| <p>Then you can increment your own counter by calling increment method as follow:</p> |
| <pre> |
| this.getCounter(DYNAMIC_GRAPH_COUNTER.ADDED_VERTEX_COUNT).increment(1); |
| </pre> |
| |
| <subsection name="Aggregators"></subsection> |
| <p>Aggregators are a mechanism for global communication, monitoring, and data. Each vertex can provide a value to an aggregator in superstep S, the system combines those values using a reduction operator, and the resulting value is made available to all vertices in superstep S + 1. |
| </p> |
| <h4>Registering aggregators</h4> |
| <p>To start using aggregators, you must declare them in your GraphJob:</p> |
| <pre> |
| HamaConfiguration conf = new HamaConfiguration(new Configuration()); |
| GraphJob graphJob = new GraphJob(conf, MyClass.class); |
| |
| // To add an average aggregator |
| graphJob.setAggregatorClass(AverageAggregator.class); |
| |
| // To add a sum aggregator |
| graphJob.setAggregatorClass(SumAggregator.class);</pre> |
| <p>There are multiple different aggregators and you can also make your own. You can look for already implemented aggregators in org.apache.hama.graph package.</p> |
| <h4>Start working with aggregators</h4> |
| <p>In order to aggregate values from your vertices, use:</p> |
| <pre> |
| this.aggregate(index,value);</pre> |
| |
| <p>This method is called from inside each vertex. Though it's not mandatory all vertices to make use of this method. The index parameter of this method is a number that is equivalent to the order of the registered aggregator. (The first registered aggregator has index 0, second has index 1 etc.) </p> |
| <h4>Get results</h4> |
| <p>Inside your vertex, you can get the results of each aggregator by using the method:</p> |
| <pre> |
| this.getAggregatedValue(index);</pre> |
| |
| <h4>Write your own aggregators</h4> |
| <p>To write your own aggregator, you have to extend org.apache.hama.graph.<b>AbstractAggregator</b> class and implement the methods of #aggregate(M value) and #getValue(). For more, please see the default implementation of aggregators in org.apache.hama.graph package.</p> |
| <subsection name="Example: PageRankVertex"></subsection> |
| <p>To solve the Page Rank problem using Hama Graph, you can extends the Vertex class to create a PageRankVertex class. |
| In this example, the algorithm described Google's Pregel paper was used. The value of a vertex represents the tentative page rank of the vertex. The graph is intialized with each vertex value equal to 1/numOfVertices. In each of the first 30 supersteps, each vertex sends its tentative page rank along all of its outgoing edges. |
| <br/><br/> |
| From Superstep 1 to 30, each vertex sums up the values arriving on all its messages and sets its tentative page rank to (1 - 0.85) / numOfVertices + (0.85 * sum). |
| </p> |
| |
| <pre> |
| public static class PageRankVertex extends |
| Vertex<Text, NullWritable, DoubleWritable> { |
| |
| @Override |
| public void compute(Iterator<DoubleWritable> messages) throws IOException { |
| // initialize this vertex to 1 / count of global vertices in this graph |
| if (this.getSuperstepCount() == 0) { |
| setValue(new DoubleWritable(1.0 / this.getNumVertices())); |
| } else if (this.getSuperstepCount() >= 1) { |
| double sum = 0; |
| for (DoubleWritable msg : messages) { |
| sum += msg.get(); |
| } |
| double alpha = (1.0d - DAMPING_FACTOR) / this.getNumVertices(); |
| setValue(new DoubleWritable(alpha + (sum * DAMPING_FACTOR))); |
| aggregate(0, this.getValue()); |
| } |
| |
| // if we have not reached our global error yet, then proceed. |
| DoubleWritable globalError = getAggregatedValue(0); |
| |
| if (globalError != null && this.getSuperstepCount() > 2 |
| && MAXIMUM_CONVERGENCE_ERROR > globalError.get()) { |
| voteToHalt(); |
| } else { |
| // in each superstep we are going to send a new rank to our neighbours |
| sendMessageToNeighbors(new DoubleWritable(this.getValue().get() |
| / this.getEdges().size())); |
| } |
| } |
| }</pre> |
| |
| </body> |
| </document> |