 ////
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 ////
 [[giraphgraphcomputer]]
 ==== GiraphGraphComputer

 [source,xml]
 ----
 <dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>giraph-gremlin</artifactId>
    <version>x.y.z</version>
 </dependency>
 ----

 image:giraph-logo.png[width=100,float=left] link:http://giraph.apache.org[Giraph] is an Apache Software Foundation
 project focused on OLAP-based graph processing. Giraph makes use of the distributed graph computing paradigm made
 popular by Google's Pregel. In Giraph, developers write "vertex programs" that get executed at each vertex in
 parallel. These programs communicate with one another in a bulk synchronous parallel (BSP) manner. This model aligns
 with TinkerPop3's `GraphComputer` API. TinkerPop3 provides a Giraph-backed implementation of `GraphComputer`
 called `GiraphGraphComputer`. Moreover, TinkerPop3's <<mapreduce,MapReduce>> framework extends the standard
 Giraph/Pregel model with an arbitrary number of MapReduce phases that aggregate and yield results from the graph.
 The examples below use `GiraphGraphComputer` from the <<gremlin-console,Gremlin Console>>.

 WARNING: Giraph uses a large number of Hadoop counters, and the Hadoop default limit is 120. The limit can be
 raised in `mapred-site.xml` via the `mapreduce.job.counters.max` property; 1000 is a good value. Because this is a
 cluster-wide property, restart the cluster after updating it.
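
 As a minimal sketch, the counter limit described above could be raised with a `mapred-site.xml` entry like the
 following. The property name and the value of 1000 come from the warning above. The enclosing `<configuration>`
 element is the usual Hadoop configuration layout and will typically already exist in the file, in which case only
 the `<property>` element needs to be added.

 [source,xml]
 ----
 <configuration>
   <!-- Raise the per-job counter limit above the Hadoop default of 120 -->
   <property>
     <name>mapreduce.job.counters.max</name>
     <value>1000</value>
   </property>
 </configuration>
 ----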

 WARNING: The maximum number of workers can be no larger than the number of map slots in the Hadoop cluster minus 1.
 For example, if the Hadoop cluster has 4 map slots, then `giraph.maxWorkers` cannot be larger than 3. One map slot
 is reserved for the master compute node, and all other slots can be allocated as workers that execute the
 VertexPrograms on the vertices of the graph.
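
 The slot arithmetic above can be sketched in a shell session. `MAP_SLOTS` and `MAX_WORKERS` are hypothetical
 variable names used only for illustration; neither Giraph nor Hadoop defines them, and the slot count must be
 taken from your own cluster.

 [source,shell]
 ----
 # Hypothetical: number of map slots in the Hadoop cluster (4, as in the example above)
 MAP_SLOTS=4
 # One map slot is reserved for the master compute node
 MAX_WORKERS=$((MAP_SLOTS - 1))
 # Print the largest value that giraph.maxWorkers may take on this cluster
 echo "giraph.maxWorkers=${MAX_WORKERS}"
 ----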

 If `GiraphGraphComputer` will be used as the `GraphComputer` for `HadoopGraph`, then the `lib` directory of the
 `giraph-gremlin` plugin should be specified in `HADOOP_GREMLIN_LIBS`.

 [source,shell]
 export HADOOP_GREMLIN_LIBS=$HADOOP_GREMLIN_LIBS:/usr/local/gremlin-console/ext/giraph-gremlin/lib
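
 If `HADOOP_GREMLIN_LIBS` is not yet set, the export above yields a value with a leading colon. One possible
 variant, a sketch that relies on standard POSIX parameter expansion rather than anything specific to TinkerPop,
 adds the separator only when the variable already has a value. `GIRAPH_LIB` is a hypothetical helper variable.

 [source,shell]
 ----
 # Hypothetical helper variable holding the giraph-gremlin plugin's lib directory
 GIRAPH_LIB=/usr/local/gremlin-console/ext/giraph-gremlin/lib
 # ${VAR:+text} expands to text only when VAR is set and non-empty,
 # so the colon separator appears only when there is an existing value to keep
 export HADOOP_GREMLIN_LIBS="${HADOOP_GREMLIN_LIBS:+${HADOOP_GREMLIN_LIBS}:}${GIRAPH_LIB}"
 ----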

 Alternatively, the directory can be specified from within the Gremlin Console.

 [source,groovy]
 System.setProperty('HADOOP_GREMLIN_LIBS',System.getProperty('HADOOP_GREMLIN_LIBS') + ':' + '/usr/local/gremlin-console/ext/giraph-gremlin/lib')

 [gremlin-groovy]
 ----
 graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
 g = graph.traversal().withComputer(GiraphGraphComputer)
 g.V().count()
 g.V().out().out().values('name')
 ----

 IMPORTANT: The examples above do not use lambdas (i.e. closures in Gremlin-Groovy). This makes the traversals
 serializable and thus able to be distributed to all machines in the Hadoop cluster. If a lambda is required in a
 traversal, then the traversal must be sent as a `String` and compiled locally at each machine in the cluster. The
 following example demonstrates the `:remote` command, which allows Gremlin traversals to be submitted as a `String`.

 [gremlin-groovy]
 ----
 graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
 g = graph.traversal().withComputer(GiraphGraphComputer)
 :remote connect tinkerpop.hadoop graph g
 :> g.V().group().by{it.value('name')[1]}.by('name')
 result
 result.memory.runtime
 ----

 NOTE: If the user explicitly specifies `giraph.maxWorkers` and/or `giraph.numComputeThreads` in the configuration,
 then Giraph uses these values. However, if they are not specified and the user never calls
 `GraphComputer.workers()`, then `GiraphGraphComputer` tries to compute the number of workers/threads to use based
 on the cluster's profile.