<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<document xmlns="">
<section name="Implementation">
<p>Giraph is an Apache open source project. A Giraph computation runs as a Hadoop job, so any existing Hadoop user can run Giraph on the cluster they already have. Workers use ZooKeeper to elect a master that coordinates the computation. The graph is loaded and partitioned across the workers. The master then tells the workers when to start each successive superstep. Once the computation has halted, the workers save the output. Checkpoints are taken at user-defined intervals and are used to restart the application automatically when any worker fails. Any worker can act as the master, and another one automatically takes over if the current master fails.</p>
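<p>The superstep model described above can be illustrated with a small, single-process simulation. This is a sketch under stated assumptions, not Giraph code: the class <code>MaxValueBsp</code> and its <code>run</code> method are hypothetical names, all vertices live in one array instead of being partitioned across workers, and ZooKeeper, master election, and checkpointing are not modeled. It shows only the core loop: in each superstep every active vertex processes its incoming messages, may message its neighbors, and votes to halt; messages become visible only in the next superstep; and the computation ends once every vertex has halted and no messages are in flight. The example propagates the largest vertex value through each connected component.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Single-process sketch of the superstep loop (hypothetical names, not Giraph's API).
 * Propagates the largest vertex value through each connected component.
 */
final class MaxValueBsp {

    /** neighbors[v] lists the vertices adjacent to v; returns the final vertex values. */
    static long[] run(long[] initialValues, int[][] neighbors) {
        int n = initialValues.length;
        long[] value = Arrays.copyOf(initialValues, n);
        boolean[] halted = new boolean[n]; // every vertex starts active
        Map<Integer, List<Long>> inbox = new HashMap<>();
        for (int superstep = 0; ; superstep++) {
            Map<Integer, List<Long>> outbox = new HashMap<>();
            for (int v = 0; v < n; v++) {
                List<Long> messages = inbox.getOrDefault(v, List.of());
                // A halted vertex is skipped unless an incoming message reactivates it.
                if (halted[v] && messages.isEmpty()) {
                    continue;
                }
                long best = value[v];
                for (long m : messages) {
                    best = Math.max(best, m);
                }
                // Superstep 0 announces every value; later supersteps only announce changes.
                if (superstep == 0 || best > value[v]) {
                    value[v] = best;
                    for (int u : neighbors[v]) {
                        outbox.computeIfAbsent(u, k -> new ArrayList<>()).add(best);
                    }
                }
                halted[v] = true; // vote to halt after every compute step
            }
            // Barrier: messages sent in this superstep are delivered in the next one.
            inbox = outbox;
            boolean allHalted = true;
            for (boolean h : halted) {
                allHalted &= h;
            }
            if (allHalted && inbox.isEmpty()) {
                return value;
            }
        }
    }

    public static void main(String[] args) {
        long[] result = run(new long[] {3, 6, 2, 1}, new int[][] {{1}, {0, 2}, {1, 3}, {2}});
        System.out.println(Arrays.toString(result)); // prints [6, 6, 6, 6]
    }
}
```

<p>Swapping the inbox for the outbox only between supersteps is what gives the sketch its bulk synchronous behavior: a vertex never sees a message sent during the same superstep, which mirrors the role of the master-coordinated superstep boundary described above.</p>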
<p>Giraph offers several mechanisms that help implement graph algorithms at scale. You can load vertices and edges from any input source (see the <a href="io.html">input/output</a> section); we support several Hadoop input formats as well as Hive tables. Aggregators let applications compute a global value from values contributed by each vertex; see the <a href="aggregators.html">aggregators</a> section. By default, vertex values, edge values, and messages are stored in the workers’ memory. However, you can choose to store values and messages on disk instead, for example on a Hadoop cluster with limited memory but ample disk space.</p>
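<p>The aggregator idea can be sketched in the same spirit. In this hypothetical example (<code>AggregatorSketch</code> is an illustrative name, not Giraph's aggregator API), each worker first folds the contributions of the vertices it owns into a partial value, and the master then combines the per-worker partials into one global value. The sketch assumes the combining operation is commutative and associative and has an identity element, so the result does not depend on how vertices are partitioned or in which order the workers report.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongBinaryOperator;

/**
 * Hypothetical sketch of aggregator semantics, not Giraph's API: workers reduce their own
 * vertices' contributions locally, then the master reduces the per-worker partials.
 */
final class AggregatorSketch {
    private final LongBinaryOperator op; // assumed commutative and associative
    private final long identity;         // neutral element of op, e.g. 0 for sum

    AggregatorSketch(LongBinaryOperator op, long identity) {
        this.op = op;
        this.identity = identity;
    }

    /** Worker side: fold the values contributed by the vertices this worker owns. */
    long aggregateOnWorker(long[] vertexContributions) {
        long partial = identity;
        for (long c : vertexContributions) {
            partial = op.applyAsLong(partial, c);
        }
        return partial;
    }

    /** Master side: fold every worker's partial value into a single global value. */
    long aggregateOnMaster(List<Long> workerPartials) {
        long global = identity;
        for (long p : workerPartials) {
            global = op.applyAsLong(global, p);
        }
        return global;
    }

    /** One round of aggregation; each row of partitions holds one worker's contributions. */
    long globalValue(long[][] partitions) {
        List<Long> partials = new ArrayList<>();
        for (long[] partition : partitions) {
            partials.add(aggregateOnWorker(partition));
        }
        return aggregateOnMaster(partials);
    }

    public static void main(String[] args) {
        long[][] partitions = {{1, 2, 3}, {4, 5}, {}};
        AggregatorSketch sum = new AggregatorSketch(Long::sum, 0L);
        AggregatorSketch max = new AggregatorSketch(Math::max, Long.MIN_VALUE);
        System.out.println(sum.globalValue(partitions)); // prints 15
        System.out.println(max.globalValue(partitions)); // prints 5
    }
}
```

<p>Folding locally on each worker before combining on the master means only one partial value per worker has to be sent, rather than one value per vertex; an empty partition simply contributes the identity element.</p>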