////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
[[graphcomputer]]
The GraphComputer
=================
image:graphcomputer-puffers.png[width=350,float=right] TinkerPop3 provides two primary means of interacting with a
graph: link:http://en.wikipedia.org/wiki/Online_transaction_processing[online transaction processing] (OLTP) and
link:http://en.wikipedia.org/wiki/Online_analytical_processing[online analytical processing] (OLAP). OLTP-based
graph systems allow the user to query the graph in real-time. However, typically, real-time performance is only
possible when a local traversal is enacted. A local traversal is one that starts at a particular vertex (or small set
of vertices) and touches a small set of connected vertices (by any arbitrary path of arbitrary length). In short, OLTP
queries interact with a limited set of data and respond on the order of milliseconds or seconds. On the other hand,
with OLAP graph processing, the entire graph is processed and thus, every vertex and edge is analyzed (sometimes
more than once for iterative, recursive algorithms). Due to the amount of data being processed, the results are
typically not returned in real-time and for massive graphs (i.e. graphs represented across a cluster of machines),
results can take on the order of minutes or hours.
* *OLTP*: real-time, limited data accessed, random data access, sequential processing, querying
* *OLAP*: long running, entire data set accessed, sequential data access, parallel processing, batch processing
image::oltp-vs-olap.png[width=600]
The image above demonstrates the difference between Gremlin OLTP and Gremlin OLAP. With Gremlin OLTP, the graph is
walked by moving from vertex-to-vertex via incident edges. With Gremlin OLAP, all vertices are provided a
`VertexProgram`. The programs send messages to one another with the topological structure of the graph acting as the
communication network (though random message passing is possible). In many respects, the messages passed are like
the OLTP traversers moving from vertex-to-vertex. However, all messages are moving independently of one another, in
parallel. Once a vertex program is finished computing, TinkerPop3's OLAP engine supports any number of
link:http://en.wikipedia.org/wiki/MapReduce[`MapReduce`] jobs over the resultant graph.
IMPORTANT: `GraphComputer` was designed from the start to be used within a multi-JVM, distributed environment --
in other words, a multi-machine compute cluster. As such, all the computing objects must be able to be migrated
between JVMs. The pattern promoted is to store state information in a `Configuration` object to later be regenerated
by a loading process. It is important to realize that `VertexProgram`, `MapReduce`, and numerous particular instances
rely heavily on the state of the computing classes (not the structure, but the processes) to be stored in a
`Configuration`.
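As a minimal sketch of that pattern (assuming the `VertexProgram.createVertexProgram(Graph, Configuration)` factory
and Apache Commons `BaseConfiguration`; not a definitive implementation):

[source,java]
----
// store the program's state as key/value pairs in a Configuration
Configuration configuration = new BaseConfiguration();
vertexProgram.storeState(configuration);
// ... the Configuration is migrated to another JVM in the cluster ...
// regenerate the program there via its loadState() method
VertexProgram<?> copy = VertexProgram.createVertexProgram(graph, configuration);
----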
[[vertexprogram]]
VertexProgram
-------------
image:bsp-diagram.png[width=400,float=right] GraphComputer takes a `VertexProgram`. A VertexProgram can be thought of
as a piece of code that is executed at each vertex in a logically parallel manner until some termination condition is
met (e.g. a number of iterations have occurred, no more data is changing in the graph, etc.). A submitted
`VertexProgram` is copied to all the workers in the graph. A worker is not an explicit concept in the API, but is
assumed of all `GraphComputer` implementations. At minimum, each vertex is a worker (though this would be inefficient
due to the fact that each vertex would maintain a VertexProgram). In practice, the workers partition the vertex set
and are responsible for the execution of the VertexProgram over all the vertices within their sphere of influence.
The workers orchestrate the execution of the `VertexProgram.execute()` method on all their vertices in an
link:http://en.wikipedia.org/wiki/Bulk_synchronous_parallel[bulk synchronous parallel] (BSP) fashion. The vertices
are able to communicate with one another via messages. There are two kinds of messages in Gremlin OLAP:
`MessageScope.Local` and `MessageScope.Global`. A local message is a message to an adjacent vertex. A global
message is a message to any arbitrary vertex in the graph. Once the VertexProgram has completed its execution,
any number of `MapReduce` jobs are evaluated. MapReduce jobs are provided by the user via `GraphComputer.mapReduce()`
or by the `VertexProgram` via `VertexProgram.getMapReducers()`.
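The two message scopes can be sketched as follows (a hedged example; `someVertex` is a hypothetical target vertex):

[source,java]
----
// local scope: messages travel along incident edges (here, out-edges)
MessageScope.Local<Double> local = MessageScope.Local.of(__::outE);
// global scope: messages may be sent to arbitrary vertices in the graph
MessageScope.Global global = MessageScope.Global.of(someVertex);
----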
image::graphcomputer.png[width=500]
The example below demonstrates how to submit a VertexProgram to a graph's GraphComputer. `GraphComputer.submit()`
yields a `Future<ComputerResult>`. The `ComputerResult` has the resultant computed graph which can be a full copy
of the original graph (see <<hadoop-gremlin,Hadoop-Gremlin>>) or a view over the original graph (see
<<tinkergraph,TinkerGraph>>). The ComputerResult also provides access to computational side-effects called `Memory`
(which includes, for example, runtime, number of iterations, results of MapReduce jobs, and VertexProgram-specific
memory manipulations).
[gremlin-groovy,modern]
----
result = graph.compute().program(PageRankVertexProgram.build().create()).submit().get()
result.memory().runtime
g = result.graph().traversal(standard())
g.V().valueMap('name',PageRankVertexProgram.PAGE_RANK)
----
NOTE: This model of "vertex-centric graph computing" was made popular by Google's
link:http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html[Pregel] graph engine.
In the open source world, this model is found in OLAP graph computing systems such as link:https://giraph.apache.org/[Giraph],
link:https://hama.apache.org/[Hama], and link:http://faunus.thinkaurelius.com[Faunus]. TinkerPop3 extends the
popularized model with integrated post-processing <<mapreduce,MapReduce>> jobs over the vertex set.
[[mapreduce]]
MapReduce
---------
The BSP model proposed by Pregel stores the results of the computation in a distributed manner as properties on the
elements in the graph. In many situations, it is necessary to aggregate those resultant properties into a single
result set (i.e. a statistic). For instance, assume a VertexProgram that computes a nominal cluster for each vertex
(i.e. link:http://en.wikipedia.org/wiki/Community_structure[a graph clustering algorithm]). At the end of the
computation, each vertex will have a property denoting the cluster it was assigned to. TinkerPop3 provides the
ability to answer global questions about the clusters. For instance, in order to answer the following questions,
`MapReduce` jobs are required:
* How many vertices are in each cluster? (*presented below*)
* How many unique clusters are there? (*presented below*)
* What is the average age of each vertex in each cluster?
* What is the degree distribution of the vertices in each cluster?
A compressed representation of the `MapReduce` API in TinkerPop3 is provided below. The key idea is that the
`map`-stage processes all vertices to emit key/value pairs. Those values are aggregated on their respective key
for the `reduce`-stage to do its processing to ultimately yield more key/value pairs.
[source,java]
----
public interface MapReduce<MK, MV, RK, RV, R> {
    public void map(final Vertex vertex, final MapEmitter<MK, MV> emitter);
    public void reduce(final MK key, final Iterator<MV> values, final ReduceEmitter<RK, RV> emitter);
    // there are more methods
}
----
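To make the two stages concrete, below is a hedged sketch of a cluster-population style map/reduce pair (in the
spirit of the `ClusterPopulationMapReduce` discussed later; the generic types follow the interface above):

[source,java]
----
// map: emit one count per vertex, keyed by its computed cluster assignment
public void map(final Vertex vertex, final MapEmitter<Serializable, Long> emitter) {
    vertex.<Serializable>property(PeerPressureVertexProgram.CLUSTER)
          .ifPresent(cluster -> emitter.emit(cluster, 1L));
}

// reduce: sum the per-vertex counts for each cluster key
public void reduce(final Serializable key, final Iterator<Long> values, final ReduceEmitter<Serializable, Long> emitter) {
    long count = 0L;
    while (values.hasNext()) count = count + values.next();
    emitter.emit(key, count); // cluster -> population
}
----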
IMPORTANT: The vertex that is passed into the `MapReduce.map()` method does not contain edges. The vertex only
contains original and computed vertex properties. This reduces the amount of data required to be loaded and ensures
that MapReduce is used for post-processing computed results. All edge-based computing should be accomplished in the
`VertexProgram`.
image::mapreduce.png[width=650]
The `MapReduce` extension to GraphComputer is made explicit when examining the
<<peerpressurevertexprogram,`PeerPressureVertexProgram`>> and corresponding `ClusterPopulationMapReduce`.
In the code below, the GraphComputer result (`ComputerResult`) provides both the computed `Graph` and the `Memory` of
the computation. The memory maintains the results of any MapReduce jobs. The cluster population
MapReduce result states that there are 5 vertices in cluster 1 and 1 vertex in cluster 6. This can be verified
(in a serial manner) by looking at the `PeerPressureVertexProgram.CLUSTER` property of the resultant graph. Notice
that the property is "hidden" unless it is directly accessed via name.
[gremlin-groovy,modern]
----
graph = TinkerFactory.createModern()
result = graph.compute().program(PeerPressureVertexProgram.build().create()).mapReduce(ClusterPopulationMapReduce.build().create()).submit().get()
result.memory().get('clusterPopulation')
g = result.graph().traversal(standard())
g.V().values(PeerPressureVertexProgram.CLUSTER).groupCount().next()
g.V().valueMap()
----
If there are numerous statistics desired, then it is possible to register as many MapReduce jobs as needed. For
instance, the `ClusterCountMapReduce` determines how many unique clusters were created by the peer pressure algorithm.
Below both `ClusterCountMapReduce` and `ClusterPopulationMapReduce` are computed over the resultant graph.
[gremlin-groovy,modern]
----
result = graph.compute().program(PeerPressureVertexProgram.build().create()).
           mapReduce(ClusterPopulationMapReduce.build().create()).
           mapReduce(ClusterCountMapReduce.build().create()).submit().get()
result.memory().clusterPopulation
result.memory().clusterCount
----
IMPORTANT: The MapReduce model of TinkerPop3 does not support MapReduce chaining. Thus, the order in which the
MapReduce jobs are executed is irrelevant. This is made apparent when realizing that the `map()`-stage takes a
`Vertex` as its input and the `reduce()`-stage yields key/value pairs. Thus, the results of reduce cannot feed back
into map.
A Collection of VertexPrograms
------------------------------
TinkerPop3 provides a collection of VertexPrograms that implement common algorithms. This section discusses the various implementations.
IMPORTANT: The vertex programs presented are those provided as of TinkerPop x.y.z. Over time, with future releases,
more algorithms will be added.
[[pagerankvertexprogram]]
PageRankVertexProgram
~~~~~~~~~~~~~~~~~~~~~
image:gremlin-pagerank.png[width=400,float=right] link:http://en.wikipedia.org/wiki/PageRank[PageRank] is perhaps the
most popular OLAP-oriented graph algorithm. This link:http://en.wikipedia.org/wiki/Centrality[eigenvector centrality]
variant was developed by Brin and Page of Google. PageRank defines a centrality value for all vertices in the graph,
where centrality is defined recursively: a vertex is central if it is connected to central vertices. PageRank is
an iterative algorithm that converges to a link:http://en.wikipedia.org/wiki/Ergodicity[steady state distribution]. If
the pageRank values are normalized to 1.0, then the pageRank value of a vertex is the probability that a random walker
will be seen at that vertex in the graph at any arbitrary moment in time. In order to help developers understand the
methods of a `VertexProgram`, the PageRankVertexProgram code is analyzed below.
[source,java]
----
public class PageRankVertexProgram implements VertexProgram<Double> { <1>

    private MessageScope.Local<Double> incidentMessageScope = MessageScope.Local.of(__::outE); <2>
    private MessageScope.Local<Double> countMessageScope = MessageScope.Local.of(new MessageScope.Local.ReverseTraversalSupplier(this.incidentMessageScope));

    public static final String PAGE_RANK = "gremlin.pageRankVertexProgram.pageRank"; <3>
    public static final String EDGE_COUNT = "gremlin.pageRankVertexProgram.edgeCount";

    private static final String VERTEX_COUNT = "gremlin.pageRankVertexProgram.vertexCount";
    private static final String ALPHA = "gremlin.pageRankVertexProgram.alpha";
    private static final String TOTAL_ITERATIONS = "gremlin.pageRankVertexProgram.totalIterations";
    private static final String INCIDENT_TRAVERSAL_SUPPLIER = "gremlin.pageRankVertexProgram.incidentTraversalSupplier";

    private ConfigurationTraversal<Vertex, Edge> configurationTraversal;
    private double vertexCountAsDouble = 1.0d;
    private double alpha = 0.85d;
    private int totalIterations = 30;

    private static final Set<String> COMPUTE_KEYS = new HashSet<>(Arrays.asList(PAGE_RANK, EDGE_COUNT));

    private PageRankVertexProgram() {}

    @Override
    public void loadState(final Graph graph, final Configuration configuration) { <4>
        if (configuration.containsKey(INCIDENT_TRAVERSAL_SUPPLIER)) {
            this.configurationTraversal = ConfigurationTraversal.loadState(graph, configuration, INCIDENT_TRAVERSAL_SUPPLIER);
            this.incidentMessageScope = MessageScope.Local.of(this.configurationTraversal);
            this.countMessageScope = MessageScope.Local.of(new MessageScope.Local.ReverseTraversalSupplier(this.incidentMessageScope));
        }
        this.vertexCountAsDouble = configuration.getDouble(VERTEX_COUNT, 1.0d);
        this.alpha = configuration.getDouble(ALPHA, 0.85d);
        this.totalIterations = configuration.getInt(TOTAL_ITERATIONS, 30);
    }

    @Override
    public void storeState(final Configuration configuration) {
        configuration.setProperty(VERTEX_PROGRAM, PageRankVertexProgram.class.getName());
        configuration.setProperty(VERTEX_COUNT, this.vertexCountAsDouble);
        configuration.setProperty(ALPHA, this.alpha);
        configuration.setProperty(TOTAL_ITERATIONS, this.totalIterations);
        if (null != this.configurationTraversal) {
            this.configurationTraversal.storeState(configuration);
        }
    }

    @Override
    public Set<String> getElementComputeKeys() { <5>
        return COMPUTE_KEYS;
    }

    @Override
    public Optional<MessageCombiner<Double>> getMessageCombiner() {
        return (Optional) PageRankMessageCombiner.instance();
    }

    @Override
    public Set<MessageScope> getMessageScopes(final int iteration) {
        final Set<MessageScope> set = new HashSet<>();
        set.add(0 == iteration ? this.countMessageScope : this.incidentMessageScope);
        return set;
    }

    @Override
    public void setup(final Memory memory) {
    }

    @Override
    public void execute(final Vertex vertex, Messenger<Double> messenger, final Memory memory) { <6>
        if (memory.isInitialIteration()) { <7>
            messenger.sendMessage(this.countMessageScope, 1.0d);
        } else if (1 == memory.getIteration()) { <8>
            double initialPageRank = 1.0d / this.vertexCountAsDouble;
            double edgeCount = IteratorUtils.reduce(messenger.receiveMessages(), 0.0d, (a, b) -> a + b);
            vertex.property(PAGE_RANK, initialPageRank);
            vertex.property(EDGE_COUNT, edgeCount);
            messenger.sendMessage(this.incidentMessageScope, initialPageRank / edgeCount);
        } else { <9>
            double newPageRank = IteratorUtils.reduce(messenger.receiveMessages(), 0.0d, (a, b) -> a + b);
            newPageRank = (this.alpha * newPageRank) + ((1.0d - this.alpha) / this.vertexCountAsDouble);
            vertex.property(PAGE_RANK, newPageRank);
            messenger.sendMessage(this.incidentMessageScope, newPageRank / vertex.<Double>value(EDGE_COUNT));
        }
    }

    @Override
    public boolean terminate(final Memory memory) { <10>
        return memory.getIteration() >= this.totalIterations;
    }

    @Override
    public String toString() {
        return StringFactory.vertexProgramString(this, "alpha=" + this.alpha + ",iterations=" + this.totalIterations);
    }
}
----
<1> `PageRankVertexProgram` implements `VertexProgram<Double>` because the messages it sends are Java doubles.
<2> The default path of energy propagation is via outgoing edges from the current vertex.
<3> The resulting PageRank values for the vertices are stored as a hidden property.
<4> A vertex program is constructed using an Apache `Configuration` to ensure easy dissemination across a cluster of JVMs.
<5> A vertex program must define the "compute keys" that are the properties being operated on during the computation.
<6> The "while"-loop of the vertex program.
<7> In order to determine how to distribute the energy to neighbors, a "1"-count is used to determine how many incident edges exist for the `MessageScope`.
<8> Initially, each vertex is provided an equal amount of energy represented as a double.
<9> Energy is aggregated, updated according to the PageRank algorithm, and then disseminated according to the defined `MessageScope.Local`.
<10> The computation is terminated after a pre-defined number of iterations.
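For reference, step <9> is the classic PageRank update. In terms of the fields above (a hedged rendering of the
formula the code implements):

[latexmath]
++++
PR(v) \leftarrow \alpha \sum_{u \rightarrow v} \frac{PR(u)}{\mathit{edgeCount}(u)} + \frac{1-\alpha}{\mathit{vertexCount}}
++++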
The above `PageRankVertexProgram` is used as follows.
[gremlin-groovy,modern]
----
result = graph.compute().program(PageRankVertexProgram.build().create()).submit().get()
result.memory().runtime
g = result.graph().traversal(standard())
g.V().valueMap('name',PageRankVertexProgram.PAGE_RANK)
----
[[peerpressurevertexprogram]]
PeerPressureVertexProgram
~~~~~~~~~~~~~~~~~~~~~~~~~
The `PeerPressureVertexProgram` is a clustering algorithm that assigns a nominal value to each vertex in the graph.
The nominal value represents the vertex's cluster. If two vertices have the same nominal value, then they are in the
same cluster. The algorithm proceeds in the following manner.
. Every vertex assigns itself to a unique cluster ID (initially, its vertex ID).
. Every vertex determines its per-neighbor vote strength as 1.0d divided by its incident edge count.
. Every vertex sends its cluster ID and vote strength to its adjacent vertices as a `Pair<Serializable,Double>`.
. Every vertex generates a vote energy distribution of received cluster IDs and changes its current cluster ID to the most frequent cluster ID.
.. If there is a tie, then the cluster with the lowest `toString()` comparison is selected.
. Steps 3 and 4 repeat until either a max number of iterations has occurred or no vertex has adjusted its cluster anymore.
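Steps 4 and 4.1 can be sketched as follows (a hedged sketch, assuming the javatuples `Pair` messages of step 3;
`messenger` and `vertex` are as in `VertexProgram.execute()`):

[source,java]
----
// tally the received votes: cluster ID -> total vote energy
final Map<Serializable, Double> votes = new HashMap<>();
messenger.receiveMessages().forEachRemaining(message ->
        votes.merge(message.getValue0(), message.getValue1(), Double::sum));
// pick the cluster with the most energy, breaking ties by lowest toString()
Serializable bestCluster = null;
double bestEnergy = Double.NEGATIVE_INFINITY;
for (final Map.Entry<Serializable, Double> entry : votes.entrySet()) {
    if (entry.getValue() > bestEnergy ||
            (entry.getValue() == bestEnergy &&
             entry.getKey().toString().compareTo(bestCluster.toString()) < 0)) {
        bestCluster = entry.getKey();
        bestEnergy = entry.getValue();
    }
}
vertex.property(PeerPressureVertexProgram.CLUSTER, bestCluster);
----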
[[bulkloadervertexprogram]]
BulkLoaderVertexProgram
~~~~~~~~~~~~~~~~~~~~~~~
The `BulkLoaderVertexProgram` provides a generalized way for loading graphs of any size (preferably large graphs)
into a persistent `Graph`. The input can be any existing `Graph` database supporting TinkerPop3 or any of the Hadoop
GraphInputFormats (e.g. `GraphSONInputFormat`, `GryoInputFormat` or `ScriptInputFormat`). The following example
demonstrates how to load data from one TinkerGraph to another:
[gremlin-groovy]
----
writeGraphConf = new BaseConfiguration()
writeGraphConf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph")
writeGraphConf.setProperty("gremlin.tinkergraph.graphFormat", "gryo")
writeGraphConf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/tinkergraph.kryo")
modern = TinkerFactory.createModern()
blvp = BulkLoaderVertexProgram.build().
           keepOriginalIds(false).
           writeGraph(writeGraphConf).create(modern)
modern.compute().workers(1).program(blvp).submit().get()
graph = GraphFactory.open(writeGraphConf)
g = graph.traversal()
g.V().valueMap()
graph.close()
----
.Available configuration options
[width="800px",options="header"]
|========================================
|Builder Method |Purpose | Default Value
|`bulkLoader(Class\|String)` | Sets the class of the bulk loader implementation. | `IncrementalBulkLoader`
|`vertexIdProperty(String)` | Sets the name of the property in the target graph that holds the vertex id from the
source graph. | `bulkLoader.vertex.id`
|`keepOriginalIds(boolean)` |Whether to keep the ids from the source graph in the target graph. Keeping them is
recommended if further bulk loads using the same data sources are planned. | `true`
|`userSuppliedIds(boolean)` |Whether to use the ids from the source graph as ids in the target graph. If set to
`true`, `vertexIdProperty` will be ignored. Note that the target graph must support user-supplied identifiers. | `false`
|`intermediateBatchSize(int)` |Sets the batch size for intermediate transactions. This is per thread in a
multi-threaded environment. +0+ means that transactions will only be committed at the end of an iteration cycle.
It's recommended to tune this property for the target graph and not use the default value of +0+. | `0`
|`writeGraph(String)` | Sets the path to a `GraphFactory` compatible configuration file for the target graph. | _none_
|========================================
NOTE: `BulkLoaderVertexProgram` comes with a default `BulkLoader` implementation, namely `IncrementalBulkLoader`. It
will work for most use cases, but it has one limitation: it doesn't support multi-valued properties.
`IncrementalBulkLoader` handles every property as a single-valued property. A custom `BulkLoader` implementation
has to be used if the default behavior is not sufficient.
NOTE: A custom `BulkLoader` implementation for incremental loading should use `GraphTraversal` methods to create/update
elements (e.g. `g.addV()` instead of `graph.addVertex()`). This way the `BulkLoaderVertexProgram` is able to efficiently
track changes in the underlying graph and can apply several optimization techniques.
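A hedged sketch of what such a traversal-based create-or-update could look like (the property key follows the
`vertexIdProperty` default above; the helper method itself is hypothetical):

[source,java]
----
// look the vertex up through the traversal API so the vertex program can track the change
Vertex getOrCreateVertex(final Vertex sourceVertex, final GraphTraversalSource g) {
    return g.V().has("bulkLoader.vertex.id", sourceVertex.id()).tryNext()
            .orElseGet(() -> g.addV(T.label, sourceVertex.label(),
                    "bulkLoader.vertex.id", sourceVertex.id()).next());
}
----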
[[traversalvertexprogram]]
TraversalVertexProgram
~~~~~~~~~~~~~~~~~~~~~~
image:traversal-vertex-program.png[width=250,float=left] The `TraversalVertexProgram` is a "special" VertexProgram in
that it can be executed via a `GraphTraversal` with a `ComputerTraversalEngine`. In Gremlin, it is possible to have
the same traversal executed using either the standard OLTP engine or the `GraphComputer` OLAP engine. The difference
is where the traversal is submitted.
NOTE: This model of graph traversal in a BSP system was first implemented by the
link:http://faunus.thinkaurelius.com[Faunus] graph analytics engine and originally described in
link:http://markorodriguez.com/2011/04/19/local-and-distributed-traversal-engines/[Local and Distributed Traversal Engines].
[gremlin-groovy,modern]
----
g = graph.traversal(standard())
g.V().both().hasLabel('person').values('age').groupCount().next() // OLTP
g = graph.traversal(computer())
g.V().both().hasLabel('person').values('age').groupCount().next() // OLAP
----
image::olap-traversal.png[width=650]
In the OLAP example above, a `TraversalVertexProgram` is (logically) sent to each vertex in the graph. Each instance
evaluation requires (logically) 5 BSP iterations and each iteration is interpreted as such:
. `g.V()`: Put a traverser on each vertex in the graph.
. `both()`: Propagate each traverser to the vertices `both`-adjacent to its current vertex.
. `hasLabel('person')`: If the vertex is not a person, kill the traversers at that vertex.
. `values('age')`: Have all the traversers reference the integer age of their current vertex.
. `groupCount()`: Count how many times a particular age has been seen.
While 5 iterations were presented, in fact, `TraversalVertexProgram` will execute the traversal in only
3 iterations. The reason is that `hasLabel('person').values('age').groupCount()` can all be executed in a
single iteration as any message sent would simply be to the current executing vertex. Thus, a simple optimization
exists in Gremlin OLAP called "reflexive message passing" which simulates non-message-passing BSP iterations within a
single BSP iteration.
When the computation is complete a <<mapreduce,MapReduce>> job executes which aggregates all the `groupCount()`
sideEffect Map (i.e. `HashMap`) objects on each vertex into a single local representation (thus, turning the
distributed Map representation into a local Map representation).
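That reduction amounts to merging many per-vertex maps into one local map; as a hedged sketch (the
`perVertexSideEffects` collection is hypothetical):

[source,java]
----
// merge the distributed per-vertex HashMaps into a single local representation
final Map<Object, Long> globalCounts = new HashMap<>();
for (final Map<Object, Long> vertexLocal : perVertexSideEffects)
    vertexLocal.forEach((k, v) -> globalCounts.merge(k, v, Long::sum));
----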
////
The same OLAP traversal can be executed using the standard `g.compute()` model, though at the expense of verbosity.
`TraversalVertexProgram` provides a fluent `Builder` for constructing a `TraversalVertexProgram`. The specified
`traversal()` can be either a `Supplier<Traversal>` object, a `Supplier<Traversal>` class, or a
link:http://en.wikipedia.org/wiki/Scripting_for_the_Java_Platform[JSR-223] script that will generate (i.e. supply) a
`Traversal`. If `traversal()` is supplied a single string, it is assumed that "gremlin-groovy" is the `ScriptEngine`
to use. If two strings are supplied, then the first string denotes the `ScriptEngine` to evaluate the second string
script with in order to generate (i.e. supply) the `Traversal`.
[gremlin-groovy,modern]
----
//g.engine(computer())
//result = g.compute().program(TraversalVertexProgram.build().traversal(g.V().both().hasLabel('person').values('age').groupCount('a')).create()).submit().get()
//result.memory().a
//result.memory().iteration
//result.memory().runtime
----
////
[[distributed-gremlin-gotchas]]
Distributed Gremlin Gotchas
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Gremlin OLTP is not identical to Gremlin OLAP.
IMPORTANT: There are two primary theoretical differences between Gremlin OLTP and Gremlin OLAP. First, Gremlin OLTP
(via `Traversal`) leverages a link:http://en.wikipedia.org/wiki/Depth-first_search[depth-first] execution engine.
Depth-first execution has a limited memory footprint due to link:http://en.wikipedia.org/wiki/Lazy_evaluation[lazy evaluation].
On the other hand, Gremlin OLAP (via `TraversalVertexProgram`) leverages a
link:http://en.wikipedia.org/wiki/Breadth-first_search[breadth-first] execution engine which maintains a larger memory
footprint, but a better time complexity due to vertex-local traversers being able to be merged. The second difference
is that Gremlin OLTP is executed in a serial fashion, while Gremlin OLAP is executed in a parallel fashion. These two
fundamental differences lead to the behaviors enumerated below.
image::gremlin-without-a-cause.png[width=200,float=right]
. Traversal sideEffects are represented as a distributed data structure across the graph's vertex set. It is not
possible to get a global view of a sideEffect until it is aggregated via a <<mapreduce,MapReduce>> job. In some
situations, the local vertex representation of the sideEffect is sufficient to ensure the intended semantics of the
traversal are respected. However, this is not generally true so be wary of traversals that require global views of a
sideEffect.
. When evaluating traversals that rely on path information (i.e. the history of the traversal), practical
computational limits can easily be reached due to the link:http://en.wikipedia.org/wiki/Combinatorial_explosion[combinatorial explosion]
of data. With path computing enabled, every traverser is unique and thus, must be enumerated as opposed to being
counted/merged. The difference being a collection of paths vs. a single 64-bit long at a single vertex. For more
information on this concept, please see link:http://thinkaurelius.com/2012/11/11/faunus-provides-big-graph-data-analytics/[Faunus Provides Big Graph Data].
. When traversals of the form `x.as('a').y.someSideEffectStep('a').z` are evaluated, the `a` object is stored in the
path information of the traverser and thus, such traversals (may) turn on path calculations when executed on a
`GraphComputer`.
. Steps that are concerned with the global ordering of traversers do not have a meaningful representation in
OLAP. For example, what does <<order-step,`order()`>>-step mean when all traversers are being processed in parallel?
Even if the traversers were aggregated and ordered, then at the next step they would return to being executed in
parallel and thus, in an unpredictable order. When `order()`-like steps are executed at the end of a traversal (i.e.
the final step), the `TraverserMapReduce` job ensures the resultant serial representation is ordered accordingly.
. Steps that are concerned with providing a global aggregate to the next step of computation do not have a correlate
in OLAP. For example, <<fold-step,`fold()`>>-step can only fold up the objects at each executing vertex. Next, even
if a global fold was possible, where would it go? Which vertex would be the host of the data structure? The
`fold()`-step only makes sense as an end-step whereby a MapReduce job can generate the proper global-to-local data
reduction.
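A hedged illustration of the end-step rule, reusing the modern-graph traversals from the earlier examples:

[source,java]
----
// order() as the final step: the TraverserMapReduce job sorts the resultant serial representation
g.V().values("age").order();
// order() mid-traversal: traversers resume parallel execution afterwards,
// so no stable global order survives on a GraphComputer
g.V().order().by("age").out();
----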