////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
[[traversal]]
= The Traversal
image::gremlin-running.png[width=125]
At the most general level there is `Traversal<S,E>` which implements `Iterator<E>`, where the `S` stands for start and
the `E` stands for end. A traversal is composed of four primary components:
. `Step<S,E>`: an individual function applied to `S` to yield `E`. Steps are chained within a traversal.
. `TraversalStrategy`: interceptor methods to alter the execution of the traversal (e.g. query re-writing).
. `TraversalSideEffects`: key/value pairs that can be used to store global information about the traversal.
. `Traverser<T>`: the object propagating through the `Traversal` currently representing an object of type `T`.
The classic notion of a graph traversal is provided by `GraphTraversal<S,E>` which extends `Traversal<S,E>`.
`GraphTraversal` provides an interpretation of the graph data in terms of vertices, edges, etc. and thus, a graph
traversal link:http://en.wikipedia.org/wiki/Domain-specific_language[DSL].
image::step-types.png[width=650]
A `GraphTraversal<S,E>` is spawned from a `GraphTraversalSource`. It can also be spawned anonymously (i.e. empty)
via `__`. A graph traversal is composed of an ordered list of steps. All the steps provided by `GraphTraversal`
inherit from the more general forms diagrammed above. A list of all the steps (and their descriptions) is provided
in the TinkerPop link:https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html[GraphTraversal JavaDoc].
IMPORTANT: The basics for starting a traversal are described in the <<the-graph-process,The Graph Process>> section as
well as in the link:https://tinkerpop.apache.org/docs/current/tutorials/getting-started/[Getting Started] tutorial.
NOTE: To reduce the verbosity of the expression, it is good to
`+import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*+`. This way, instead of doing `+__.inE()+`
for an anonymous traversal, it is possible to simply write `inE()`. Be aware of language-specific reserved keywords
when using anonymous traversals. For example, `in` and `as` are reserved keywords in Groovy, therefore you must use
the verbose syntax `+__.in()+` and `+__.as()+` to avoid collisions.
IMPORTANT: The underlying `Step` implementations provided by TinkerPop should encompass most of the functionality
required by a DSL author. It is important that DSL authors leverage the provided steps as then the common optimization
and decoration strategies can reason on the underlying traversal sequence. If new steps are introduced, then common
traversal strategies may not function properly.
[[transactions]]
== Traversal Transactions
image:gremlin-coins.png[width=100,float=right] A link:http://en.wikipedia.org/wiki/Database_transaction[database transaction]
represents a unit of work to execute against the database. A traversal's unit of work is affected by usage convention
(i.e. the method of <<connecting-gremlin, connecting>>) and the graph provider's transaction model. Without diving
deeply into different conventions and models the most general and recommended approach to working with transactions is
demonstrated as follows:
[source,java]
----
GraphTraversalSource g = traversal().withEmbedded(graph);
// or
GraphTraversalSource g = traversal().withRemote(conn);
Transaction tx = g.tx();
// spawn a GraphTraversalSource from the Transaction. Traversals spawned
// from gtx will essentially be bound to tx
GraphTraversalSource gtx = tx.begin();
try {
    gtx.addV("person").iterate();
    gtx.addV("software").iterate();
tx.commit();
} catch (Exception ex) {
tx.rollback();
}
----
The above example is straightforward and represents a good starting point for discussing the nuances of transactions
in relation to the usage convention and graph provider caveats alluded to earlier.
Focusing on remote contexts first, note that it is still possible to issue traversals from `g`, but those will have a
transaction scope outside of `gtx` and will simply `commit()` on the server if successfully executed or `rollback()`
on the server otherwise (i.e. one traversal is one transaction). Each isolated transaction will require its own
`Transaction` object. Multiple `begin()` calls on the same `Transaction` object will produce `GraphTraversalSource`
instances that are bound to the same transaction, therefore:
[source,java]
----
GraphTraversalSource g = traversal().withRemote(conn);
Transaction tx1 = g.tx();
Transaction tx2 = g.tx();
// both gtx1a and gtx1b will be bound to the same transaction
GraphTraversalSource gtx1a = tx1.begin();
GraphTraversalSource gtx1b = tx1.begin();
// g and gtx2 will not have knowledge of what happens in tx1
GraphTraversalSource gtx2 = tx2.begin();
----
In remote cases, `GraphTraversalSource` instances spawned from `begin()` are safe to use in multiple threads though
on the server side they will be processed serially as they arrive. The default behavior of `close()` on a
`Transaction` for remote cases is to `commit()`, so the following re-write of the earlier example is also valid:
[source,java]
----
// note here that we dispense with creating a Transaction object and
// simply spawn the gtx in a more inline fashion
GraphTraversalSource gtx = g.tx().begin();
try {
    gtx.addV("person").iterate();
    gtx.addV("software").iterate();
    gtx.close();
} catch (Exception ex) {
    // no separate Transaction variable exists here, so reach it through gtx
    gtx.tx().rollback();
}
----
IMPORTANT: Transactions with non-JVM languages are always "remote". For specific transaction syntax in a particular
language, please see the "Transactions" sub-section of your language of interest in the
<<gremlin-drivers-variants,Gremlin Drivers and Variants>> section.
In embedded cases, that initial recommended model for defining transactions holds, but users have more options here
on deeper inspection. For embedded use cases (and perhaps even in configuration of a graph instance in Gremlin Server),
the type of `Transaction` object that is returned from `g.tx()` is an important indicator as to the features of that
graph's transaction model. In most cases, inspection of that object will indicate an instance that derives from the
`AbstractThreadLocalTransaction` class, which means that the transaction is bound to the current thread and therefore
all traversals that execute within that thread are tied to that transaction.
A `ThreadLocal` transaction differs then from the remote case described before because technically any traversal
spawned from `g` or from a `Transaction` will fall under the same transaction scope. As a result, it is wise, when
trying to write context agnostic Gremlin, to follow the more rigid conventions of the initial example.
The sub-sections that follow offer a bit more insight into each of the usage contexts.
[[tx-embedded]]
=== Embedded
When on the JVM using an <<connecting-embedded,embedded graph>>, there is considerable flexibility for working with
transactions. With the Graph API, transactions are controlled by an implementation of the `Transaction` interface and
that object can be obtained from the `Graph` interface using the `tx()` method. It is important to note that the
`Transaction` object does not represent a "transaction" itself. It merely exposes the methods for working with
transactions (e.g. committing, rolling back, etc).
Most `Graph` implementations that `supportsTransactions` will implement an "automatic" `ThreadLocal` transaction,
which means that when a read or write occurs after the `Graph` is instantiated, a transaction is automatically
started within that thread. There is no need to manually call a method to "create" or "start" a transaction. Simply
modify the graph as required and call `graph.tx().commit()` to apply changes or `graph.tx().rollback()` to undo them.
When the next read or write action occurs against the graph, a new transaction will be started within that current
thread of execution.
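
The thread-bound, automatically started transaction described above can be pictured with a small JDK-only model. This is a sketch for illustration, not TinkerPop code: `ThreadBoundTxSketch` and its methods are hypothetical names, and a real graph provider's implementation will differ. It shows only that a write opens a transaction for the calling thread and that other threads do not share that open state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a graph with "automatic" thread-bound transactions.
// None of these names are TinkerPop classes.
final class ThreadBoundTxSketch {
    // Each thread sees its own list of staged (uncommitted) writes.
    private final ThreadLocal<List<String>> staged = ThreadLocal.withInitial(ArrayList::new);
    // Each thread has its own "is a transaction open" flag.
    private final ThreadLocal<Boolean> open = ThreadLocal.withInitial(() -> false);
    // Committed data, shared by all threads.
    private final List<String> committed = new ArrayList<>();

    // A write implicitly opens a transaction on the calling thread.
    void addVertex(String label) {
        open.set(true);
        staged.get().add(label);
    }

    boolean isOpen() {
        return open.get();
    }

    // Publishes the calling thread's staged writes and closes its transaction.
    synchronized void commit() {
        committed.addAll(staged.get());
        staged.get().clear();
        open.set(false);
    }

    // Discards the calling thread's staged writes and closes its transaction.
    void rollback() {
        staged.get().clear();
        open.set(false);
    }

    synchronized List<String> committedView() {
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadBoundTxSketch graph = new ThreadBoundTxSketch();
        graph.addVertex("person"); // opens a transaction on the main thread only

        Thread other = new Thread(() ->
                System.out.println("other thread open? " + graph.isOpen())); // false
        other.start();
        other.join();

        System.out.println("main thread open? " + graph.isOpen()); // true
        graph.commit();
        System.out.println("committed: " + graph.committedView()); // [person]
    }
}
```

In this model the next `addVertex()` call after `commit()` or `rollback()` starts a fresh transaction on that thread, mirroring the behavior described for the next read or write.
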
When using transactions in this fashion, especially in a web application (e.g. HTTP server), it is important to ensure
that transactions do not leak from one request to the next. In other words, unless a client is somehow bound via
session to process every request on the same server thread, every request must be committed or rolled back at the end
of the request. By ensuring that the request encapsulates a transaction, it ensures that a future request processed
on a server thread is starting in a fresh transactional state and will not have access to the remains of one from an
earlier request. A good strategy is to roll back a transaction at the start of a request, so that if it so happens that
a transactional leak does occur between requests somehow, a fresh transaction is assured by the fresh request.
TIP: The `tx()` method is on the `Graph` interface, but it is also available on the `TraversalSource` spawned from a
`Graph`. Calls to `TraversalSource.tx()` are proxied through to the underlying `Graph` as a convenience.
TIP: Some graphs may throw an exception that implements `TemporaryException`. In this case, this marker interface is
designed to inform the client that it may choose to retry the operation at a later time for possible success.
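
A retry loop is one way a client might act on that marker. The sketch below uses only the JDK: `TemporaryFailure` is a local stand-in for the marker interface, and `LockConflictException`, `withRetry` and the backoff values are hypothetical. With a real graph, the work passed in would presumably also need to roll back the failed transaction before the next attempt; that part is not modelled here.

```java
import java.util.function.Supplier;

final class RetrySketch {
    // Local stand-in for a "this failure is temporary" marker interface.
    interface TemporaryFailure { }

    // Hypothetical exception a provider might throw for a transient conflict.
    static final class LockConflictException extends RuntimeException implements TemporaryFailure {
        LockConflictException(String message) {
            super(message);
        }
    }

    // Runs the work and retries only when the failure carries the marker.
    static <T> T withRetry(Supplier<T> work, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.get();
            } catch (RuntimeException ex) {
                boolean retryable = ex instanceof TemporaryFailure;
                if (!retryable || attempt >= maxAttempts) {
                    throw ex; // permanent failure, or out of attempts
                }
                try {
                    Thread.sleep(backoffMillis * attempt); // simple linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new LockConflictException("try again");
            }
            return "committed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // committed after 3 attempts
    }
}
```
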
WARNING: TinkerPop provides for basic transaction control, however, like many aspects of TinkerPop, it is up to the
graph system provider to choose the specific aspects of how their implementation will work and how it fits into the
TinkerPop stack. Be sure to understand the transaction semantics of the specific graph implementation that is being
utilized as it may present differing functionality than described here.
==== Configuring
Determining when a transaction starts is dependent upon the behavior assigned to the `Transaction`. It is up to the
`Graph` implementation to determine the default behavior and unless the implementation doesn't allow it, the behavior
itself can be altered via these `Transaction` methods:
[source,java]
----
public Transaction onReadWrite(Consumer<Transaction> consumer);
public Transaction onClose(Consumer<Transaction> consumer);
----
Providing a `Consumer` function to `onReadWrite` allows definition of how a transaction starts when a read or a write
occurs. `Transaction.READ_WRITE_BEHAVIOR` contains pre-defined `Consumer` functions to supply to the `onReadWrite`
method. It has two options:
* `AUTO` - automatic transactions where the transaction is started implicitly by the read or write operation
* `MANUAL` - manual transactions where it is up to the user to explicitly open a transaction, throwing an exception
if the transaction is not open
Providing a `Consumer` function to `onClose` allows configuration of how a transaction is handled when
`Transaction.close()` is called. `Transaction.CLOSE_BEHAVIOR` has several pre-defined options that can be supplied to
this method:
* `COMMIT` - automatically commit an open transaction
* `ROLLBACK` - automatically rollback an open transaction
* `MANUAL` - throw an exception if a transaction is open, forcing the user to explicitly close the transaction
IMPORTANT: As transactions are `ThreadLocal` in nature, so are the transaction configurations for `onReadWrite` and
`onClose`.
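
One way to picture these behaviors is as plain `Consumer` functions that are invoked before each read or write and on close. The following is a simplified model using only the JDK, not the TinkerPop `Transaction` interface; the class name, the constant names and the defaults chosen are assumptions made for the example.

```java
import java.util.function.Consumer;

// Simplified, hypothetical model of configurable transaction behaviors.
final class TxBehaviorSketch {
    // Read/write behaviors: open implicitly, or insist on an explicit open().
    static final Consumer<TxBehaviorSketch> AUTO = tx -> {
        if (!tx.isOpen()) tx.open();
    };
    static final Consumer<TxBehaviorSketch> MANUAL_READ_WRITE = tx -> {
        if (!tx.isOpen()) throw new IllegalStateException("open a transaction first");
    };
    // Close behaviors: decide what happens to a transaction that is still open.
    static final Consumer<TxBehaviorSketch> COMMIT_ON_CLOSE = tx -> {
        if (tx.isOpen()) tx.commit();
    };
    static final Consumer<TxBehaviorSketch> ROLLBACK_ON_CLOSE = tx -> {
        if (tx.isOpen()) tx.rollback();
    };

    // Defaults in this model are arbitrary.
    private Consumer<TxBehaviorSketch> readWriteBehavior = AUTO;
    private Consumer<TxBehaviorSketch> closeBehavior = ROLLBACK_ON_CLOSE;
    private boolean opened;
    private String lastOutcome = "none";

    TxBehaviorSketch onReadWrite(Consumer<TxBehaviorSketch> behavior) {
        readWriteBehavior = behavior;
        return this;
    }

    TxBehaviorSketch onClose(Consumer<TxBehaviorSketch> behavior) {
        closeBehavior = behavior;
        return this;
    }

    boolean isOpen() { return opened; }
    void open() { opened = true; }
    void commit() { opened = false; lastOutcome = "commit"; }
    void rollback() { opened = false; lastOutcome = "rollback"; }
    String lastOutcome() { return lastOutcome; }

    // Called by the (imaginary) graph before every read or write.
    void readWrite() { readWriteBehavior.accept(this); }
    void close() { closeBehavior.accept(this); }

    public static void main(String[] args) {
        TxBehaviorSketch tx = new TxBehaviorSketch().onClose(COMMIT_ON_CLOSE);
        tx.readWrite();                          // AUTO opens the transaction
        tx.close();                              // COMMIT_ON_CLOSE commits it
        System.out.println(tx.lastOutcome());    // commit

        tx.onReadWrite(MANUAL_READ_WRITE);
        try {
            tx.readWrite();                      // nothing is open, so this is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```
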
Once there is an understanding of how transactions are configured, most of the rest of the `Transaction` interface
is self-explanatory. Note that <<neo4j-gremlin,Neo4j-Gremlin>> is used for the examples to follow as TinkerGraph does
not support transactions.
IMPORTANT: The following example is meant to demonstrate specific use of `ThreadLocal` transactions and is at odds
with the more generalized transaction convention that is recommended for both embedded and remote contexts. Please be
sure to understand the preferred approach described in the <<transactions,Traversal Transactions Section>> before
using this method.
[source,groovy]
----
gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[neo4jgraph[community single [/tmp/neo4j]], standard]
gremlin> graph.features()
==>FEATURES
> GraphFeatures
>-- Transactions: true <1>
>-- Computer: false
>-- Persistence: true
...
gremlin> g.tx().onReadWrite(Transaction.READ_WRITE_BEHAVIOR.AUTO) <2>
==>org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$Neo4jTransaction@1c067c0d
gremlin> g.addV("person").property("name","stephen") <3>
==>v[0]
gremlin> g.tx().commit() <4>
==>null
gremlin> g.tx().onReadWrite(Transaction.READ_WRITE_BEHAVIOR.MANUAL) <5>
==>org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$Neo4jTransaction@1c067c0d
gremlin> g.tx().isOpen()
==>false
gremlin> g.addV("person").property("name","marko") <6>
Open a transaction before attempting to read/write the transaction
gremlin> g.tx().open() <7>
==>null
gremlin> g.addV("person").property("name","marko") <8>
==>v[1]
gremlin> g.tx().commit()
==>null
----
<1> Check `features` to ensure that the graph supports transactions.
<2> By default, `Neo4jGraph` is configured with "automatic" transactions, so it is set here for demonstration purposes only.
<3> When the vertex is added, the transaction is automatically started. From this point, more mutations can be staged
or other read operations executed in the context of that open transaction.
<4> Calling `commit` finalizes the transaction.
<5> Change transaction behavior to require manual control.
<6> Adding a vertex now results in failure because the transaction was not explicitly opened.
<7> Explicitly open a transaction.
<8> Adding a vertex now succeeds as the transaction was manually opened.
NOTE: It may be important to consult the documentation of the `Graph` implementation you are using when it comes to the
specifics of how transactions will behave. TinkerPop allows some latitude in this area and implementations may not have
the exact same behaviors and link:https://en.wikipedia.org/wiki/ACID[ACID] guarantees.
[[tx-gremlin-server]]
=== Gremlin Server
The available capability for transactions with <<gremlin-server,Gremlin Server>> is dependent upon the method of
interaction that is used. The preferred method for <<connecting-gremlin-server,interacting with Gremlin Server>>
is via websockets and bytecode based requests. The start of the <<transactions,Transactions Section>> describes this
approach in detail with examples.
Gremlin Server also has the option to accept Gremlin-based scripts. The scripting approach provides access to the
Graph API and thus also the transactional model described in the <<tx-embedded,embedded>> section. Therefore a single
script can have the ability to execute multiple transactions per request with complete control provided to the
developer to commit or rollback transactions as needed.
There are two methods for sending scripts to Gremlin Server: sessionless and session-based. With sessionless requests
there will always be an attempt to close the transaction at the end of the request with a commit if there are no errors
or a rollback if there is a failure. It is therefore unnecessary to close transactions manually within scripts
themselves. By default, session-based requests do not have this quality. The transaction will be held open on the
server until the user closes it manually. There is an option to have automatic transaction management for sessions.
More information on this topic can be found in the <<considering-transactions,Considering Transactions>> Section and
the <<sessions,Considering Sessions>> Section.
[[tx-rgp]]
=== Remote Gremlin Providers
At this time, transactional patterns for Remote Gremlin Providers are largely in line with Gremlin Server. As most
RGPs do not expose a `Graph` instance, access to lower level transactional functions available to embedded graphs,
even in a sessionless fashion, is not typically permitted. For example, without a `Graph` instance it is not possible
to link:https://tinkerpop.apache.org/docs/current/reference/#tx-embedded[configure] transaction close or read-write
behaviors. The nature of what a "transaction" means will be dependent on the RGP as is the case with any
TinkerPop-enabled graph system, so it is important to consult that system's documentation for more details.
[[configuration-steps]]
== Configuration Steps
Many of the methods on the `GraphTraversalSource` are meant to configure the source for usage. These configurations
affect the manner in which traversals are spawned from it. Configuration methods can be identified by their names,
which make use of "with" as a prefix:
[[configuration-steps-with]]
=== With Configuration
The `with()` configuration adds arbitrary data to a `TraversalSource` which can then be used by graph providers as
configuration options for a traversal execution. This configuration is similar to the <<with-step,with()>>-modulator,
which has similar functionality when applied to an individual step.
[source,groovy]
----
g.with('providerDefinedVariable', 0.33).V()
----
The `0.33` value for the "providerDefinedVariable" will be bound to each traversal spawned that way. Consult the
graph system being used to determine if any such configuration options are available.
[[configuration-steps-withbulk]]
=== WithBulk Configuration
The `withBulk()` configuration allows for control of bulking operations. This value is `true` by default allowing for
normal <<barrier-step,bulking>> operations, but when set to `false`, introduces a subtle change in that behavior as
shown in examples in <<sack-step,sack()-step>>.
[[configuration-steps-withcomputer]]
=== WithComputer Configuration
The `withComputer()` configuration adds a `Computer` that will be used to process the traversal and is necessary for
OLAP based processing and steps that require that processing. See <<sparkgraphcomputer,examples>> related to
`SparkGraphComputer` or see examples in the computer required steps, like <<pagerank-step,pageRank()>> or
<<shortestpath-step,shortestPath()>>.
[[configuration-steps-withsack]]
=== WithSack Configuration
The `withSack()` configuration adds a "sack" that can be accessed by traversals spawned from this source. This
functionality is shown in more detail in the examples for (<<sack-step,sack()>>)-step.
[[configuration-steps-withsideeffect]]
=== WithSideEffect Configuration
The `withSideEffect()` configuration adds an arbitrary `Object` to traversals spawned from this source which can be
accessed as a side-effect given the supplied key.
[gremlin-groovy,modern]
----
g.withSideEffect('x',['dog','cat','fish']).
V().has('person','name','marko').select('x').unfold()
----
More practical examples can be found elsewhere in the documentation. The `math()`-step
<<math-step,example>> and the `where()`-step <<where-step,example>> should both be helpful in examining this
configuration step more closely.
[[configuration-steps-withstrategies]]
=== WithStrategies Configuration
The `withStrategies()` configuration allows inclusion of additional `TraversalStrategy` instances to be applied to
any traversals spawned from the configured source. Please see the <<traversalstrategy,Traversal Strategy Section>>
for more details on how this configuration works.
[[configuration-steps-withoutstrategies]]
=== WithoutStrategies Configuration
The `withoutStrategies()` configuration removes a particular `TraversalStrategy` from those to be applied to traversals
spawned from the configured source. Please see the <<traversalstrategy,Traversal Strategy Section>> for more details
on how this configuration works.
[[start-steps]]
== Start Steps
Not all steps are capable of starting a `GraphTraversal`. Only those steps on the `GraphTraversalSource` can do that.
Many of the methods on `GraphTraversalSource` are actually for its <<configuration-steps,configuration>> and start
steps should not be confused with those.
Spawn steps, which actually yield a traversal, typically match the names of existing steps:
* `addE()` - Adds an `Edge` to start the traversal (<<addedge-step, example>>).
* `addV()` - Adds a `Vertex` to start the traversal (<<addvertex-step, example>>).
* `E()` - Reads edges from the graph to start the traversal (<<graph-step, example>>).
* `inject()` - Inserts arbitrary objects to start the traversal (<<inject-step, example>>).
* `V()` - Reads vertices from the graph to start the traversal (<<graph-step, example>>).
[[graph-traversal-steps]]
== Graph Traversal Steps
Gremlin steps are chained together to produce the actual traversal and are triggered by way of <<start-steps,start steps>>
on the `GraphTraversalSource`.
[[general-steps]]
=== General Steps
There are five general steps, each having a traversal and a lambda representation, which all other specific steps described later extend.
[width="100%",cols="10,12",options="header"]
|=========================================================
| Step| Description
| `map(Traversal<S, E>)` `map(Function<Traverser<S>, E>)` | map the traverser to some object of type `E` for the next step to process.
| `flatMap(Traversal<S, E>)` `flatMap(Function<Traverser<S>, Iterator<E>>)` | map the traverser to an iterator of `E` objects that are streamed to the next step.
| `filter(Traversal<?, ?>)` `filter(Predicate<Traverser<S>>)` | map the traverser to either true or false, where false will not pass the traverser to the next step.
| `sideEffect(Traversal<S, S>)` `sideEffect(Consumer<Traverser<S>>)` | perform some operation on the traverser and pass it to the next step.
| `branch(Traversal<S, M>)` `branch(Function<Traverser<S>,M>)` | split the traverser to all the traversals indexed by the `M` token.
|=========================================================
WARNING: Lambda steps are presented for educational purposes as they represent the foundational constructs of the
Gremlin language. In practice, lambda steps should be avoided in favor of their traversal representation and traversal
verification strategies exist to disallow their use unless explicitly "turned off." For more information on the problems
with lambdas, please read <<a-note-on-lambdas,A Note on Lambdas>>.
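
For the same educational purpose, the lambda shapes in the table can be related to the JDK functional interfaces. The sketch below is an analogy built on `java.util.stream`, not Gremlin: plain strings stand in for traversers, and `GeneralStepsSketch` and its data are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class GeneralStepsSketch {
    // Applies filter, sideEffect, flatMap and a token-based branch to each name.
    static List<String> run(List<String> names, List<String> seen) {
        // branch: a token (here a Boolean) selects which function handles the object
        Function<String, String> shout = s -> s.toUpperCase(Locale.ROOT);
        Function<String, String> keep = s -> s;
        Map<Boolean, Function<String, String>> options = Map.of(true, shout, false, keep);

        return names.stream()
                .filter(n -> !n.isEmpty())                               // filter: a Predicate passes or drops
                .peek(seen::add)                                         // sideEffect: a Consumer acts, object passes on
                .flatMap(n -> Stream.of(n, n + "'s friend"))             // flatMap: one object in, several out
                .map(n -> options.get(n.startsWith("marko")).apply(n))   // map: one object in, one out
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        System.out.println(run(List.of("marko", "", "josh"), seen));
        // [MARKO, MARKO'S FRIEND, josh, josh's friend]
        System.out.println(seen);
        // [marko, josh]
    }
}
```
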
The `Traverser<S>` object provides access to:
. The current traversed `S` object -- `Traverser.get()`.
. The current path traversed by the traverser -- `Traverser.path()`.
.. A helper shorthand to get a particular path-history object -- `Traverser.path(String) == Traverser.path().get(String)`.
. The number of times the traverser has gone through the current loop -- `Traverser.loops()`.
. The number of objects represented by this traverser -- `Traverser.bulk()`.
. The local data structure associated with this traverser -- `Traverser.sack()`.
. The side-effects associated with the traversal -- `Traverser.sideEffects()`.
.. A helper shorthand to get a particular side-effect -- `Traverser.sideEffect(String) == Traverser.sideEffects().get(String)`.
image:map-lambda.png[width=150,float=right]
[gremlin-groovy,modern]
----
g.V(1).out().values('name') <1>
g.V(1).out().map {it.get().value('name')} <2>
g.V(1).out().map(values('name')) <3>
----
<1> An outgoing traversal from vertex 1 to the name values of the adjacent vertices.
<2> The same operation, but using a lambda to access the name property values.
<3> Again the same operation, but using the traversal representation of `map()`.
image:filter-lambda.png[width=160,float=right]
[gremlin-groovy,modern]
----
g.V().filter {it.get().label() == 'person'} <1>
g.V().filter(label().is('person')) <2>
g.V().hasLabel('person') <3>
----
<1> A filter that only allows the vertex to pass if it has the "person" label.
<2> The same operation, but using the traversal representation of `filter()`.
<3> The more specific `has()`-step is implemented as a `filter()` with respective predicate.
image:side-effect-lambda.png[width=175,float=right]
[gremlin-groovy,modern]
----
g.V().hasLabel('person').sideEffect(System.out.&println) <1>
g.V().sideEffect(outE().count().aggregate(local,"o")).
sideEffect(inE().count().aggregate(local,"i")).cap("o","i") <2>
----
<1> Whatever enters `sideEffect()` is passed to the next step, but some intervening process can occur.
<2> Compute the out- and in-degree for each vertex. Both `sideEffect()` are fed with the same vertex.
image:branch-lambda.png[width=180,float=right]
[gremlin-groovy,modern]
----
g.V().branch {it.get().value('name')}.
option('marko', values('age')).
option(none, values('name')) <1>
g.V().branch(values('name')).
option('marko', values('age')).
option(none, values('name')) <2>
g.V().choose(has('name','marko'),
values('age'),
values('name')) <3>
----
<1> If the vertex is "marko", get his age, else get the name of the vertex.
<2> The same operation, but using the traversal representation of `branch()`.
<3> The more specific boolean-based `choose()`-step is implemented as a `branch()`.
[[terminal-steps]]
=== Terminal Steps
Typically, when a step is concatenated to a traversal, a traversal is returned. In this way, a traversal is built up
in a link:https://en.wikipedia.org/wiki/Fluent_interface[fluent], link:https://en.wikipedia.org/wiki/Monoid[monadic] fashion.
However, some steps do not return a traversal, but instead, execute the traversal and return a result. These steps are known
as terminal steps (*terminal*) and they are explained via the examples below.
[gremlin-groovy,modern]
----
g.V().out('created').hasNext() <1>
g.V().out('created').next() <2>
g.V().out('created').next(2) <3>
g.V().out('nothing').tryNext() <4>
g.V().out('created').toList() <5>
g.V().out('created').toSet() <6>
g.V().out('created').toBulkSet() <7>
results = ['blah',3]
g.V().out('created').fill(results) <8>
g.addV('person').iterate() <9>
----
<1> `hasNext()` determines whether there are available results (not supported in `gremlin-javascript`).
<2> `next()` will return the next result.
<3> `next(n)` will return the next `n` results in a list (not supported in `gremlin-javascript` or Gremlin.NET).
<4> `tryNext()` will return an `Optional` and thus, is a composite of `hasNext()`/`next()` (only supported for JVM languages).
<5> `toList()` will return all results in a list.
<6> `toSet()` will return all results in a set and thus duplicates are removed (not supported in `gremlin-javascript`).
<7> `toBulkSet()` will return all results in a weighted set and thus, duplicates preserved via weighting (only supported for JVM languages).
<8> `fill(collection)` will put all results in the provided collection and return the collection when complete (only supported for JVM languages).
<9> `iterate()` does not exactly fit the definition of a terminal step in that it doesn't return a result, but still
returns a traversal - it does however behave as a terminal step in that it iterates the traversal and generates side
effects without returning the actual result.
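
Because a `Traversal` is an `Iterator`, most of these terminal steps can be read as small loops over `hasNext()` and `next()`. The helper methods below are a JDK-only sketch of that idea with hypothetical names; they are not the TinkerPop implementations.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

final class TerminalStepsSketch {
    // Like next(n): take up to n results, fewer if the iterator runs dry.
    static <E> List<E> next(Iterator<E> it, int n) {
        List<E> out = new ArrayList<>();
        while (out.size() < n && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Like tryNext(): hasNext() and next() folded into an Optional.
    static <E> Optional<E> tryNext(Iterator<E> it) {
        return it.hasNext() ? Optional.of(it.next()) : Optional.empty();
    }

    // Like fill(collection): drain the remaining results into an existing collection.
    static <E, C extends Collection<? super E>> C fill(Iterator<E> it, C target) {
        while (it.hasNext()) {
            target.add(it.next());
        }
        return target;
    }

    // Like iterate(): exhaust the iterator and discard the results.
    static <E> void iterate(Iterator<E> it) {
        while (it.hasNext()) {
            it.next();
        }
    }

    public static void main(String[] args) {
        Iterator<String> results = List.of("lop", "lop", "ripple", "lop").iterator();
        System.out.println(next(results, 2));      // [lop, lop]
        System.out.println(tryNext(results));      // Optional[ripple]
        List<Object> target = new ArrayList<>(List.of("blah", 3));
        System.out.println(fill(results, target)); // [blah, 3, lop]
        System.out.println(tryNext(results));      // Optional.empty
    }
}
```

Each helper consumes the iterator, which is why a traversal that has been iterated by one terminal step has nothing left for another.
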
There is also the `promise()` terminator step, which can only be used with remote traversals to
<<connecting-gremlin-server,Gremlin Server>> or <<connecting-rgp,RGPs>>. It starts a promise to execute a function
on the current `Traversal` that will be completed in the future.
Finally, <<explain-step,`explain()`>>-step is also a terminal step and is described in its own section.
[[addedge-step]]
=== AddEdge Step
link:http://en.wikipedia.org/wiki/Automated_reasoning[Reasoning] is the process of making explicit what is implicit
in the data. What is explicit in a graph are the objects of the graph -- i.e. vertices and edges. What is implicit
in the graph is the traversal. In other words, traversals expose meaning where the meaning is determined by the
traversal definition. For example, take the concept of a "co-developer." Two people are co-developers if they have
worked on the same project together. This concept can be represented as a traversal and thus, the concept of
"co-developers" can be derived. Moreover, what was once implicit can be made explicit via the `addE()`-step
(*map*/*sideEffect*).
image::addedge-step.png[width=450]
[gremlin-groovy,modern]
----
g.V(1).as('a').out('created').in('created').where(neq('a')).
addE('co-developer').from('a').property('year',2009) <1>
g.V(3,4,5).aggregate('x').has('name','josh').as('a').
select('x').unfold().hasLabel('software').addE('createdBy').to('a') <2>
g.V().as('a').out('created').addE('createdBy').to('a').property('acl','public') <3>
g.V(1).as('a').out('knows').
addE('livesNear').from('a').property('year',2009).
inV().inE('livesNear').values('year') <4>
g.V().match(
__.as('a').out('knows').as('b'),
__.as('a').out('created').as('c'),
__.as('b').out('created').as('c')).
addE('friendlyCollaborator').from('a').to('b').
property(id,23).property('project',select('c').values('name')) <5>
g.E(23).valueMap()
vMarko = g.V().has('name','marko').next()
vPeter = g.V().has('name','peter').next()
g.V(vMarko).addE('knows').to(vPeter) <6>
g.addE('knows').from(vMarko).to(vPeter) <7>
----
<1> Add a co-developer edge with a year-property between marko and his collaborators.
<2> Add incoming createdBy edges from the josh-vertex to the lop- and ripple-vertices.
<3> Add an inverse createdBy edge for all created edges.
<4> The newly created edge is a traversable object.
<5> Two arbitrary bindings in a traversal can be joined ``from()``->``to()``, where `id` can be provided for graphs that
support user-provided ids.
<6> Add an edge between marko and peter given the directed (detached) vertex references.
<7> Add the same edge, but start the traversal from the `GraphTraversalSource` with `addE()` and supply both vertices via `from()` and `to()`.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#addE-java.lang.String-++[`addE(String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#addE-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`addE(Traversal)`]
[[addvertex-step]]
=== AddVertex Step
The `addV()`-step is used to add vertices to the graph (*map*/*sideEffect*). For every incoming object, a vertex is
created. Moreover, `GraphTraversalSource` maintains an `addV()` method.
[gremlin-groovy,modern]
----
g.addV('person').property('name','stephen')
g.V().values('name')
g.V().outE('knows').addV().property('name','nothing')
g.V().has('name','nothing')
g.V().has('name','nothing').bothE()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#addV--++[`addV()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#addV-java.lang.String-++[`addV(String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#addV-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`addV(Traversal)`]
[[addproperty-step]]
=== AddProperty Step
The `property()`-step is used to add properties to the elements of the graph (*sideEffect*). Unlike `addV()` and
`addE()`, `property()` is a full sideEffect step in that it does not return the property it created, but the element
that streamed into it. Moreover, if `property()` follows an `addV()` or `addE()`, then it is "folded" into the
previous step to enable vertex and edge creation with all its properties in one creation operation.
[gremlin-groovy,modern]
----
g.V(1).property('country','usa')
g.V(1).property('city','santa fe').property('state','new mexico').valueMap()
g.V(1).property(list,'age',35) <1>
g.V(1).valueMap()
g.V(1).property('friendWeight',outE('knows').values('weight').sum(),'acl','private') <2>
g.V(1).properties('friendWeight').valueMap() <3>
----
<1> For vertices, a cardinality can be provided for <<vertex-properties,vertex properties>>.
<2> It is possible to select the property value (as well as key) via a traversal.
<3> For vertices, the `property()`-step can add meta-properties.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#property-java.lang.Object-java.lang.Object-java.lang.Object...-++[`property(Object, Object, Object...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#property-org.apache.tinkerpop.gremlin.structure.VertexProperty.Cardinality-java.lang.Object-java.lang.Object-java.lang.Object...-++[`property(Cardinality, Object, Object, Object...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/structure/VertexProperty.Cardinality.html++[`Cardinality`]
[[aggregate-step]]
=== [[store-step]]Aggregate Step
image::aggregate-step.png[width=800]
The `aggregate()`-step (*sideEffect*) is used to aggregate all the objects at a particular point of traversal into a
`Collection`. The step uses `Scope` to help determine the aggregating behavior. For `global` scope this means that
the step will use link:http://en.wikipedia.org/wiki/Eager_evaluation[eager evaluation] in that no objects continue on
until all previous objects have been fully aggregated. The eager evaluation model is crucial in situations
where everything at a particular point is required for future computation. When `aggregate()` is called without a
`Scope`, the scope defaults to `global`. An example is provided below.
[gremlin-groovy,modern]
----
g.V(1).out('created') <1>
g.V(1).out('created').aggregate('x') <2>
g.V(1).out('created').aggregate(global, 'x') <3>
g.V(1).out('created').aggregate('x').in('created') <4>
g.V(1).out('created').aggregate('x').in('created').out('created') <5>
g.V(1).out('created').aggregate('x').in('created').out('created').
where(without('x')).values('name') <6>
----
<1> What has marko created?
<2> Aggregate all his creations.
<3> Identical to the previous line.
<4> Who are marko's collaborators?
<5> What have marko's collaborators created?
<6> What have marko's collaborators created that he hasn't created?
In link:http://en.wikipedia.org/wiki/Recommender_system[recommendation systems], the above pattern is used:
"What has userA liked? Who else has liked those things? What have they liked that userA hasn't already liked?"
Finally, `aggregate()`-step can be modulated via `by()`-projection.
[gremlin-groovy,modern]
----
g.V().out('knows').aggregate('x').cap('x')
g.V().out('knows').aggregate('x').by('name').cap('x')
----
For `local` scope the aggregation will occur in a link:http://en.wikipedia.org/wiki/Lazy_evaluation[lazy] fashion.
NOTE: Prior to 3.4.3, `local` (i.e. lazy) aggregation was handled by the `store()`-step.
[gremlin-groovy,modern]
----
g.V().aggregate(global, 'x').limit(1).cap('x')
g.V().aggregate(local, 'x').limit(1).cap('x')
g.withoutStrategies(EarlyLimitStrategy).V().aggregate(local,'x').limit(1).cap('x')
----
It is important to note that `EarlyLimitStrategy`, introduced in 3.3.5, alters the behavior of `aggregate(local)`.
Without that strategy (which is installed by default), there are two results in the `aggregate()` side-effect even
though the interval selection is for 1 object. Realize that when the second object is on its way to the `range()`
filter (i.e. `[0..1]`), it passes through `aggregate()` and is thus stored before it is filtered.
[gremlin-groovy,modern]
----
g.E().aggregate(local,'x').by('weight').cap('x')
----
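
The difference between the two scopes can also be modeled outside of Gremlin. The following is a plain-Python sketch and not TinkerPop code: `aggregate_global` and `aggregate_local` are hypothetical names, the vertex list is made up, and `islice` stands in for `limit(1)`.

[source,python]
----
from itertools import islice

def aggregate_global(stream, side_effect):
    """Eager: drain the whole upstream into the side-effect, then emit."""
    items = list(stream)           # barrier: nothing continues until all are seen
    side_effect.extend(items)
    yield from items

def aggregate_local(stream, side_effect):
    """Lazy: store each object as it passes through, then emit it."""
    for item in stream:
        side_effect.append(item)
        yield item

vertices = ['v1', 'v2', 'v3', 'v4', 'v5', 'v6']

x_global = []
first_global = list(islice(aggregate_global(iter(vertices), x_global), 1))

x_local = []
first_local = list(islice(aggregate_local(iter(vertices), x_local), 1))
----

In this model the eager variant has stored all six objects by the time the first one is emitted, while the lazy variant has stored only the objects that were actually pulled. Python's `islice` stops pulling after one object; how many objects a Gremlin `range()` filter pulls before it closes depends on the strategies applied to the traversal.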
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#aggregate-java.lang.String-++[`aggregate(String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#aggregate-org.apache.tinkerpop.gremlin.process.traversal.Scope,java.lang.String-++[`aggregate(Scope,String)`]
[[and-step]]
=== And Step
The `and()`-step ensures that all provided traversals yield a result (*filter*). Please see <<or-step,`or()`>> for or-semantics.
[NOTE, caption=Python]
====
The term `and` is a reserved word in Python, and therefore must be referred to in Gremlin with `and_()`.
====
[gremlin-groovy,modern]
----
g.V().and(
outE('knows'),
values('age').is(lt(30))).
values('name')
----
The `and()`-step can take an arbitrary number of traversals. All traversals must produce at least one output for the
original traverser to pass to the next step.
An link:http://en.wikipedia.org/wiki/Infix_notation[infix notation] can be used as well.
[gremlin-groovy,modern]
----
g.V().where(outE('created').and().outE('knows')).values('name')
----
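
The filtering rule can be illustrated with a small plain-Python model. This is only a sketch of the semantics, not TinkerPop code: `and_step`, `out_e_knows` and `age_lt_30` are hypothetical helpers, and the dictionaries loosely mirror the "modern" toy graph.

[source,python]
----
def and_step(traversers, *traversals):
    """Keep a traverser only if every child traversal yields at least one result."""
    for t in traversers:
        if all(any(True for _ in trav(t)) for trav in traversals):
            yield t

# Out-edges and ages keyed by person name (illustrative data).
out_edges = {
    'marko': [('knows', 'vadas'), ('knows', 'josh'), ('created', 'lop')],
    'vadas': [],
    'josh': [('created', 'ripple'), ('created', 'lop')],
    'peter': [('created', 'lop')],
}
age = {'marko': 29, 'vadas': 27, 'josh': 32, 'peter': 35}

def out_e_knows(name):
    return (e for e in out_edges[name] if e[0] == 'knows')

def age_lt_30(name):
    return (a for a in [age[name]] if a < 30)

result = list(and_step(out_edges, out_e_knows, age_lt_30))
----

Only `marko` satisfies both child traversals in this data: `vadas` is younger than 30 but has no outgoing `knows` edge.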
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#and-org.apache.tinkerpop.gremlin.process.traversal.Traversal...-++[`and(Traversal...)`]
[[as-step]]
=== As Step
The `as()`-step is not a real step, but a "step modulator" similar to <<by-step,`by()`>> and <<option-step,`option()`>>.
With `as()`, it is possible to provide a label to the step that can later be accessed by steps and data structures
that make use of such labels -- e.g., <<select-step,`select()`>>, <<match-step,`match()`>>, and path.
[NOTE, caption=Groovy]
====
The term `as` is a reserved word in Groovy, and therefore, when used as part of an anonymous traversal, must be
referred to in Gremlin with the double underscore `__.as()`.
====
[NOTE, caption=Python]
====
The term `as` is a reserved word in Python, and therefore must be referred to in Gremlin with `as_()`.
====
[gremlin-groovy,modern]
----
g.V().as('a').out('created').as('b').select('a','b') <1>
g.V().as('a').out('created').as('b').select('a','b').by('name') <2>
----
<1> Select the objects labeled "a" and "b" from the path.
<2> Select the objects labeled "a" and "b" from the path and, for each object, project its name value.
A step can have any number of labels associated with it. This is useful for referencing the same step multiple times in a future step.
[gremlin-groovy,modern]
----
g.V().hasLabel('software').as('a','b','c').
select('a','b','c').
by('name').
by('lang').
by(__.in('created').values('name').fold())
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#as-java.lang.String-java.lang.String...-++[`as(String,String...)`]
[[barrier-step]]
=== Barrier Step
The `barrier()`-step (*barrier*) turns the lazy traversal pipeline into a bulk-synchronous pipeline. This step is
useful in the following situations:
* When everything prior to `barrier()` needs to be executed before moving onto the steps after the `barrier()` (i.e. ordering).
* When "stalling" the traversal may lead to a "bulking optimization" in traversals that repeatedly touch many of the same elements (i.e. optimizing).
[gremlin-groovy,modern]
----
g.V().sideEffect{println "first: ${it}"}.sideEffect{println "second: ${it}"}.iterate()
g.V().sideEffect{println "first: ${it}"}.barrier().sideEffect{println "second: ${it}"}.iterate()
----
The theory behind a "bulking optimization" is simple. If there are one million traversers at vertex 1, then there is
no need to calculate one million `both()`-computations. Instead, represent those one million traversers as a single
traverser with a `Traverser.bulk()` equal to one million and execute `both()` once. A bulking optimization example is
made more salient on a larger graph. Therefore, the example below leverages the <<grateful-dead,Grateful Dead graph>>.
[gremlin-groovy]
----
graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
g.io('data/grateful-dead.xml').read().iterate()
g = traversal().withEmbedded(graph).withoutStrategies(LazyBarrierStrategy) <1>
clockWithResult(1){g.V().both().both().both().count().next()} <2>
clockWithResult(1){g.V().repeat(both()).times(3).count().next()} <3>
clockWithResult(1){g.V().both().barrier().both().barrier().both().barrier().count().next()} <4>
----
<1> Explicitly remove `LazyBarrierStrategy` which yields a bulking optimization.
<2> A non-bulking traversal where each traverser is processed.
<3> Each traverser entering `repeat()` has its recursion bulked.
<4> A bulking traversal where implicit traversers are not processed.
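
The bulking argument can be checked with a small model. The sketch below is plain Python rather than TinkerPop code: the adjacency lists describe a six-vertex undirected toy graph, `both_unbulked` and `both_bulked` are hypothetical helpers, and a `Counter` plays the role of `Traverser.bulk()`.

[source,python]
----
from collections import Counter

# Undirected toy graph as adjacency lists (illustrative data).
adjacency = {1: [2, 3, 4], 2: [1], 3: [1, 4, 6], 4: [1, 3, 5], 5: [4], 6: [3]}

def both_unbulked(traversers, calls):
    """Process every traverser individually."""
    out = []
    for v in traversers:
        calls[0] += 1                  # one both() evaluation per traverser
        out.extend(adjacency[v])
    return out

def both_bulked(bulks, calls):
    """Process each distinct vertex once and carry its bulk forward."""
    out = Counter()
    for v, bulk in bulks.items():
        calls[0] += 1                  # one both() evaluation per distinct vertex
        for n in adjacency[v]:
            out[n] += bulk
    return out

plain_calls, bulk_calls = [0], [0]
plain = list(adjacency)
bulked = Counter(adjacency.keys())
for _ in range(3):                     # models both().both().both()
    plain = both_unbulked(plain, plain_calls)
    bulked = both_bulked(bulked, bulk_calls)

plain_count = len(plain)
bulked_count = sum(bulked.values())
----

Both variants arrive at the same count, but the bulked variant evaluates the neighbor lookup once per distinct vertex per step instead of once per traverser.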
If `barrier()` is provided an integer argument, then the barrier will only hold `n`-number of unique traversers in its
barrier before draining the aggregated traversers to the next step. This is useful in the aforementioned bulking
optimization scenario with the added benefit of reducing the risk of an out-of-memory exception.
`LazyBarrierStrategy` inserts `barrier()`-steps into a traversal where appropriate in order to gain the
"bulking optimization."
[gremlin-groovy]
----
graph = TinkerGraph.open()
g = traversal().withEmbedded(graph) <1>
g.io('data/grateful-dead.xml').read().iterate()
clockWithResult(1){g.V().both().both().both().count().next()}
g.V().both().both().both().count().iterate().toString() <2>
----
<1> `LazyBarrierStrategy` is a default strategy and thus, does not need to be explicitly activated.
<2> With `LazyBarrierStrategy` activated, `barrier()`-steps are automatically inserted where appropriate.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#barrier--++[`barrier()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#barrier-java.util.function.Consumer-++[`barrier(Consumer)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#barrier-int-++[`barrier(int)`]
[[by-step]]
=== By Step
The `by()`-step is not an actual step, but instead is a "step-modulator" similar to <<as-step,`as()`>> and
<<option-step,`option()`>>. If a step is able to accept traversals, functions, comparators, etc. then `by()` is the
means by which they are added. The general pattern is `step().by()...by()`. Some steps can only accept one `by()`
while others can take an arbitrary number.
[gremlin-groovy,modern]
----
g.V().group().by(bothE().count()) <1>
g.V().group().by(bothE().count()).by('name') <2>
g.V().group().by(bothE().count()).by(count()) <3>
----
<1> `by(bothE().count())` will group the elements by their edge count (*traversal*).
<2> `by('name')` will process the grouped elements by their name (*element property projection*).
<3> `by(count())` will count the number of elements in each group (*traversal*).
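
As a rough mental model, the first `by()` of `group()` behaves like a key function and the second like a projection or reduction over each group's values. The following plain-Python sketch illustrates that idea only; `group`, `key_by` and `value_by` are hypothetical names, and the tuples stand in for vertices with their incident edge counts.

[source,python]
----
from collections import defaultdict

def group(items, key_by, value_by=lambda item: item):
    """Group items: the first modulator picks the key, the second projects the values."""
    groups = defaultdict(list)
    for item in items:
        groups[key_by(item)].append(value_by(item))
    return dict(groups)

# (name, number of incident edges) pairs, illustrative data.
vertices = [('marko', 3), ('vadas', 1), ('lop', 3),
            ('josh', 3), ('ripple', 1), ('peter', 1)]

by_degree = group(vertices, key_by=lambda v: v[1])
by_degree_names = group(vertices, key_by=lambda v: v[1], value_by=lambda v: v[0])
# A reducing modulator such as count() is modeled here by taking the group size.
by_degree_counts = {k: len(vs) for k, vs in by_degree.items()}
----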
The following steps all support `by()`-modulation. Note that the semantics of such modulation should be understood
on a step-by-step level and thus, as discussed in their respective sections of the documentation.
* <<aggregate-step, `aggregate()`>>: aggregate all objects into a set but only store their `by()`-modulated values.
* <<cyclicpath-step, `cyclicPath()`>>: filter if the traverser's path is cyclic given `by()`-modulation.
* <<dedup-step, `dedup()`>>: dedup on the results of a `by()`-modulation.
* <<group-step, `group()`>>: create group keys and values according to `by()`-modulation.
* <<groupcount-step,`groupCount()`>>: count those groups where the group keys are the result of `by()`-modulation.
* <<math-step, `math()`>>: transform a traverser provided to the step by way of the `by()` modulator before it is processed.
* <<order-step, `order()`>>: order the objects by the results of a `by()`-modulation.
* <<path-step, `path()`>>: get the path of the traverser where each path element is `by()`-modulated.
* <<project-step, `project()`>>: project a map of results given various `by()`-modulations off the current object.
* <<propertymap-step, `propertyMap()`>>: transform the result of the values in the resulting `Map` using the `by()` modulator.
* <<sack-step, `sack()`>>: provides the transformation for a traverser to a value to be stored in the sack.
* <<sample-step, `sample()`>>: sample using the value returned by `by()`-modulation.
* <<select-step, `select()`>>: select path elements and transform them via `by()`-modulation.
* <<simplepath-step, `simplePath()`>>: filter if the traverser's path is simple given `by()`-modulation.
* <<tree-step, `tree()`>>: get a tree of traverser objects where the objects have been `by()`-modulated.
* <<valuemap-step, `valueMap()`>>: transform the result of the values in the resulting `Map` using the `by()` modulator.
* <<where-step, `where()`>>: determine the predicate given the testing of the results of `by()`-modulation.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#by--++[`by()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#by-java.util.Comparator-++[`by(Comparator)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#by-java.util.function.Function-java.util.Comparator-++[`by(Function,Comparator)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#by-java.util.function.Function-++[`by(Function)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#by-org.apache.tinkerpop.gremlin.process.traversal.Order-++[`by(Order)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#by-java.lang.String-++[`by(String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#by-java.lang.String-java.util.Comparator-++[`by(String,Comparator)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#by-org.apache.tinkerpop.gremlin.structure.T-++[`by(T)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#by-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`by(Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#by-org.apache.tinkerpop.gremlin.process.traversal.Traversal-java.util.Comparator-++[`by(Traversal,Comparator)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/structure/T.html++[`T`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Order.html++[`Order`]
[[cap-step]]
=== Cap Step
The `cap()`-step (*barrier*) iterates the traversal up to itself and emits the sideEffect referenced by the provided
key. If multiple keys are provided, then a `Map<String,Object>` of sideEffects is emitted.
[gremlin-groovy,modern]
----
g.V().groupCount('a').by(label).cap('a') <1>
g.V().groupCount('a').by(label).groupCount('b').by(outE().count()).cap('a','b') <2>
----
<1> Group and count vertices by their label. Emit the side effect labeled 'a', which is the group count by label.
<2> Same as statement 1, but also emit the side effect labeled 'b' which groups vertices by the number of out edges.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#cap-java.lang.String-java.lang.String...-++[`cap(String,String...)`]
[[choose-step]]
=== Choose Step
image::choose-step.png[width=700]
The `choose()`-step (*branch*) routes the current traverser to a particular traversal branch option. With `choose()`,
it is possible to implement if/then/else-semantics as well as more complicated selections.
[gremlin-groovy,modern]
----
g.V().hasLabel('person').
choose(values('age').is(lte(30)),
__.in(),
__.out()).values('name') <1>
g.V().hasLabel('person').
choose(values('age')).
option(27, __.in()).
option(32, __.out()).values('name') <2>
----
<1> If the traversal yields an element, then do `in`, else do `out` (i.e. true/false-based option selection).
<2> Use the result of the traversal as a key to the map of traversal options (i.e. value-based option selection).
If the "false"-branch is not provided, then if/then-semantics are implemented.
[gremlin-groovy,modern]
----
g.V().choose(hasLabel('person'), out('created')).values('name') <1>
g.V().choose(hasLabel('person'), out('created'), identity()).values('name') <2>
----
<1> If the vertex is a person, emit the vertices they created, else emit the vertex.
<2> If/then/else with an `identity()` on the false-branch is equivalent to if/then with no false-branch.
Note that `choose()` can have an arbitrary number of options and moreover, can take an anonymous traversal as its choice function.
[gremlin-groovy,modern]
----
g.V().hasLabel('person').
choose(values('name')).
option('marko', values('age')).
option('josh', values('name')).
option('vadas', elementMap()).
option('peter', label())
----
The `choose()`-step can leverage the `Pick.none` option match. For anything that does not match a specified option, the `none`-option is taken.
[gremlin-groovy,modern]
----
g.V().hasLabel('person').
choose(values('name')).
option('marko', values('age')).
option(none, values('name'))
----
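
Value-based option selection with a fallback can be pictured as a dictionary lookup with a default. The sketch below is plain Python, not TinkerPop code: `choose` and the `NONE` sentinel (standing in for `Pick.none`) are hypothetical, and dropping a traverser that matches no option is an assumption of this model rather than a statement about TinkerPop.

[source,python]
----
NONE = object()   # stand-in for Pick.none

def choose(traversers, choice, options):
    """Route each traverser to the option keyed by choice(t); fall back to NONE."""
    for t in traversers:
        key = choice(t)
        branch = options.get(key, options.get(NONE))
        if branch is not None:        # assumption: unmatched traversers are dropped
            yield from branch(t)

people = [
    {'name': 'marko', 'age': 29},
    {'name': 'vadas', 'age': 27},
    {'name': 'josh', 'age': 32},
    {'name': 'peter', 'age': 35},
]

results = list(choose(
    people,
    choice=lambda p: p['name'],
    options={
        'marko': lambda p: [p['age']],
        NONE: lambda p: [p['name']],
    }))
----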
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#choose-java.util.function.Function-++[`choose(Function)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#choose-java.util.function.Predicate-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`choose(Predicate,Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#choose-java.util.function.Predicate-org.apache.tinkerpop.gremlin.process.traversal.Traversal-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`choose(Predicate,Traversal,Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#choose-org.apache.tinkerpop.gremlin.process.traversal.Traversal-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`choose(Traversal,Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#choose-org.apache.tinkerpop.gremlin.process.traversal.Traversal-org.apache.tinkerpop.gremlin.process.traversal.Traversal-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`choose(Traversal,Traversal,Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#choose-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`choose(Traversal)`]
[[coalesce-step]]
=== Coalesce Step
The `coalesce()`-step evaluates the provided traversals in order and returns the results of the first traversal that
emits at least one element.
[gremlin-groovy,modern]
----
g.V(1).coalesce(outE('knows'), outE('created')).inV().path().by('name').by(label)
g.V(1).coalesce(outE('created'), outE('knows')).inV().path().by('name').by(label)
g.V(1).property('nickname', 'okram')
g.V().hasLabel('person').coalesce(values('nickname'), values('name'))
----
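
The "first traversal that emits" rule can be sketched in a few lines of plain Python. This is a model of the semantics only: `coalesce` and `values` below are hypothetical helpers operating on dictionaries, not TinkerPop API.

[source,python]
----
def coalesce(traverser, *traversals):
    """Return the results of the first traversal that yields anything."""
    for trav in traversals:
        results = list(trav(traverser))
        if results:
            return results
    return []

def values(key):
    """Build a 'traversal' that emits the value of key if the element has it."""
    return lambda element: [element[key]] if key in element else []

with_nickname = {'name': 'marko', 'nickname': 'okram'}
without_nickname = {'name': 'vadas'}

a = coalesce(with_nickname, values('nickname'), values('name'))
b = coalesce(without_nickname, values('nickname'), values('name'))
c = coalesce(without_nickname, values('nickname'))
----

Later traversals are never evaluated once an earlier one has produced a result, and a traverser for which no traversal produces a result emits nothing.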
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#coalesce-org.apache.tinkerpop.gremlin.process.traversal.Traversal...-++[`coalesce(Traversal...)`]
[[coin-step]]
=== Coin Step
To randomly filter out a traverser, use the `coin()`-step (*filter*). The provided double argument biases the "coin toss."
[gremlin-groovy,modern]
----
g.V().coin(0.5)
g.V().coin(0.0)
g.V().coin(1.0)
----
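
A biased coin toss per traverser can be modeled as follows. This is a plain-Python sketch under the assumption that a traverser passes when a uniform random draw falls below the given probability; `coin` here is a hypothetical function, not the TinkerPop implementation.

[source,python]
----
import random

def coin(traversers, probability, rng=random):
    """Let each traverser pass with the given probability."""
    return [t for t in traversers if rng.random() < probability]

vertices = list(range(1, 7))

none_pass = coin(vertices, 0.0)                       # nothing passes
all_pass = coin(vertices, 1.0)                        # everything passes
some_pass = coin(vertices, 0.5, random.Random(42))    # seeded for repeatability
----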
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#coin-double-++[`coin(double)`]
[[connectedcomponent-step]]
=== ConnectedComponent Step
The `connectedComponent()` step performs a computation to identify link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[Connected Component]
instances in a graph. When this step completes, the vertices will be labelled with a component identifier to denote
the component to which they are associated.
IMPORTANT: The `connectedComponent()`-step is a `VertexComputing`-step and as such, can only be used against a graph
that supports `GraphComputer` (OLAP).
[gremlin-groovy,modern]
----
g = traversal().withEmbedded(graph).withComputer()
g.V().
connectedComponent().
with(ConnectedComponent.propertyName, 'component').
project('name','component').
by('name').
by('component')
g.V().hasLabel('person').
connectedComponent().
with(ConnectedComponent.propertyName, 'component').
with(ConnectedComponent.edges, outE('knows')).
project('name','component').
by('name').
by('component')
----
Note the use of the `with()` modulating step which provides configuration options to the algorithm. It takes
configuration keys from the `ConnectedComponent` class, which is automatically imported to the Gremlin Console.
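
For intuition about what the step computes, the following plain-Python sketch labels the vertices of a small graph with a component identifier. It is not the `GraphComputer` algorithm: `connected_components` is a hypothetical helper, and using the smallest vertex id as the identifier is an assumption made for illustration.

[source,python]
----
def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component."""
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    component = {}
    for start in sorted(vertices):
        if start in component:
            continue
        stack = [start]
        while stack:                   # depth-first walk of one component
            v = stack.pop()
            if v in component:
                continue
            component[v] = start
            stack.extend(n for n in adjacency[v] if n not in component)
    return component

all_vertices = [1, 2, 3, 4, 5, 6]
all_edges = [(1, 2), (1, 4), (1, 3), (4, 5), (4, 3), (6, 3)]
whole_graph = connected_components(all_vertices, all_edges)

# Restrict to person vertices and 'knows' edges only.
people = [1, 2, 4, 6]
knows = [(1, 2), (1, 4)]
knows_only = connected_components(people, knows)
----

Restricting the edges changes the result: with every edge considered all six vertices fall into one component, while with only the `knows` edges vertex 6 forms a component of its own.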
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#connectedComponent--++[`connectedComponent()`]
[[constant-step]]
=== Constant Step
To specify a constant value for a traverser, use the `constant()`-step (*map*). This is often useful with conditional
steps like <<choose-step,`choose()`-step>> or <<coalesce-step,`coalesce()`-step>>.
[gremlin-groovy,modern]
----
g.V().choose(hasLabel('person'),
values('name'),
constant('inhuman')) <1>
g.V().coalesce(
hasLabel('person').values('name'),
constant('inhuman')) <2>
----
<1> Show the names of people, but show "inhuman" for other vertices.
<2> Same as statement 1 (unless there is a person vertex with no name).
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#constant-E2-++[`constant(Object)`]
[[count-step]]
=== Count Step
image::count-step.png[width=195]
The `count()`-step (*map*) counts the total number of represented traversers in the stream (i.e. the bulk count).
[gremlin-groovy,modern]
----
g.V().count()
g.V().hasLabel('person').count()
g.V().hasLabel('person').outE('created').count().path() <1>
g.V().hasLabel('person').outE('created').count().map {it.get() * 10}.path() <2>
----
<1> `count()`-step is a <<a-note-on-barrier-steps,reducing barrier step>> meaning that all of the previous traversers are folded into a new traverser.
<2> The path of the traverser emanating from `count()` starts at `count()`.
IMPORTANT: `count(local)` counts the current, local object (not the objects in the traversal stream). This works for
`Collection`- and `Map`-type objects. For any other object, a count of 1 is returned.
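
The two scopes can be contrasted with a small plain-Python model. It is a sketch only: `count_global` and `count_local` are hypothetical names, and a `Counter` stands in for traversers with a bulk.

[source,python]
----
from collections import Counter

def count_global(bulks):
    """Global scope: sum the bulk of every traverser in the stream."""
    return sum(bulks.values())

def count_local(obj):
    """Local scope: size of a collection or map, otherwise 1."""
    if isinstance(obj, (list, tuple, set, frozenset, dict)):
        return len(obj)
    return 1

# Two distinct objects in the stream, one of them with a bulk of 3.
stream = Counter({'lop': 3, 'ripple': 1})

total = count_global(stream)
list_size = count_local(['a', 'b', 'c'])
map_size = count_local({'a': 1})
scalar_size = count_local('marko')
----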
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#count--++[`count()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#count-org.apache.tinkerpop.gremlin.process.traversal.Scope-++[`count(Scope)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[cyclicpath-step]]
=== CyclicPath Step
image::cyclicpath-step.png[width=400]
Each traverser maintains its history through the traversal over the graph -- i.e. its <<path-data-structure,path>>.
If it is important that the traverser repeat its course, then the `cyclicPath()`-step should be used (*filter*). The step
analyzes the path of the traverser thus far and if there are no repeats, the traverser is filtered out over the
traversal computation. If non-cyclic behavior is desired, see <<simplepath-step,`simplePath()`>>.
[gremlin-groovy,modern]
----
g.V(1).both().both()
g.V(1).both().both().cyclicPath()
g.V(1).both().both().cyclicPath().path()
g.V(1).as('a').out('created').as('b').
in('created').as('c').
cyclicPath().
path()
g.V(1).as('a').out('created').as('b').
in('created').as('c').
cyclicPath().from('a').to('b').
path()
----
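
The path test itself is simple to state in code. The following plain-Python sketch treats a path as a tuple of vertex ids and keeps the paths that revisit an object; `is_cyclic`, `cyclic_path` and `simple_path` are hypothetical helpers and the adjacency lists are illustrative.

[source,python]
----
def is_cyclic(path):
    """A path is cyclic if at least one object appears in it more than once."""
    return len(set(path)) < len(path)

def cyclic_path(paths):
    return [p for p in paths if is_cyclic(p)]

def simple_path(paths):
    return [p for p in paths if not is_cyclic(p)]

# Undirected toy graph as adjacency lists.
adjacency = {1: [2, 3, 4], 2: [1], 3: [1, 4, 6], 4: [1, 3, 5], 5: [4], 6: [3]}

# All two-hop paths starting at vertex 1, in the spirit of g.V(1).both().both().
paths = [(1, a, b) for a in adjacency[1] for b in adjacency[a]]

cyclic = cyclic_path(paths)
simple = simple_path(paths)
----

Every two-hop path that survives the cyclic filter in this data ends back at vertex 1.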
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#cyclicPath--++[`cyclicPath()`]
[[dedup-step]]
=== Dedup Step
With `dedup()`-step (*filter*), repeatedly seen objects are removed from the traversal stream. Note that if a
traverser's bulk is greater than 1, then it is set to 1 before being emitted.
[gremlin-groovy,modern]
----
g.V().values('lang')
g.V().values('lang').dedup()
g.V(1).repeat(bothE('created').dedup().otherV()).emit().path() <1>
g.V().bothE().properties().dedup() <2>
----
<1> Traverse all `created` edges, but don't touch any edge twice.
<2> Note that `Property` instances will compare on key and value, whereas a `VertexProperty` will also include its
element as it is a first-class citizen.
If a by-step modulation is provided to `dedup()`, then the object is processed accordingly prior to determining if it
has been seen or not.
[gremlin-groovy,modern]
----
g.V().elementMap('name')
g.V().dedup().by(label).values('name')
----
Finally, if `dedup()` is provided an array of strings, then it will ensure that the de-duplication is not with respect
to the current traverser object, but to the path history of the traverser.
[gremlin-groovy,modern]
----
g.V().as('a').out('created').as('b').in('created').as('c').select('a','b','c')
g.V().as('a').out('created').as('b').in('created').as('c').dedup('a','b').select('a','b','c') <1>
----
<1> If the current `a` and `b` combination has been seen previously, then filter the traverser.
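
All three variants share one idea: compute a key for each traverser and let it pass only the first time that key is seen. The plain-Python sketch below models this; `dedup` and its `by` parameter are hypothetical, and the rows stand in for the labeled path history of a traverser.

[source,python]
----
def dedup(items, by=lambda item: item):
    """Emit an item only the first time its key is seen."""
    seen = set()
    for item in items:
        key = by(item)
        if key not in seen:
            seen.add(key)
            yield item

langs = ['java', 'java']
vertices = [('marko', 'person'), ('vadas', 'person'), ('lop', 'software'),
            ('josh', 'person'), ('ripple', 'software'), ('peter', 'person')]
rows = [
    {'a': 'marko', 'b': 'lop', 'c': 'marko'},
    {'a': 'marko', 'b': 'lop', 'c': 'josh'},
    {'a': 'josh', 'b': 'ripple', 'c': 'josh'},
    {'a': 'josh', 'b': 'lop', 'c': 'marko'},
]

unique_langs = list(dedup(langs))                                          # plain dedup
one_per_label = [name for name, _ in dedup(vertices, by=lambda v: v[1])]   # by-modulated
one_per_a_b = list(dedup(rows, by=lambda r: (r['a'], r['b'])))             # path labels
----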
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#dedup-org.apache.tinkerpop.gremlin.process.traversal.Scope-java.lang.String...-++[`dedup(Scope,String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#dedup-java.lang.String...-++[`dedup(String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[drop-step]]
=== Drop Step
The `drop()`-step (*filter*/*sideEffect*) is used to remove elements and properties from the graph (i.e. remove). It
is a filter step because the traversal yields no outgoing objects.
[gremlin-groovy,modern]
----
g.V().outE().drop()
g.E()
g.V().properties('name').drop()
g.V().elementMap()
g.V().drop()
g.V()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#drop--++[`drop()`]
[[elementmap-step]]
=== ElementMap Step
The `elementMap()`-step yields a `Map` representation of the structure of an element.
[gremlin-groovy,modern]
----
g.V().elementMap()
g.V().elementMap('age')
g.V().elementMap('age','blah')
g.E().elementMap()
----
It is important to note that the map of a vertex assumes that cardinality for each key is `single` and if it is `list`
then only the first item encountered will be returned. As `single` is the more common cardinality for properties this
assumption should serve the greatest number of use cases.
[gremlin-groovy,theCrew]
----
g.V().elementMap()
g.V().has('name','marko').properties('location')
g.V().has('name','marko').properties('location').elementMap()
----
IMPORTANT: The `elementMap()`-step does not return the vertex labels for incident vertices when using `GraphComputer`
as the `id` is the only available data to the star graph.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#elementMap-java.lang.String...-++[`elementMap(String...)`]
[[emit-step]]
=== Emit Step
The `emit()`-step is not an actual step, but is instead a step modulator for <<repeat-step,`repeat()`>> (find more
documentation on `emit()` there).
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#emit--++[`emit()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#emit-java.util.function.Predicate-++[`emit(Predicate)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#emit-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`emit(Traversal)`]
[[explain-step]]
=== Explain Step
The `explain()`-step (*terminal*) will return a `TraversalExplanation`. A traversal explanation details how the
traversal (prior to `explain()`) will be compiled given the registered <<traversalstrategy,traversal strategies>>.
A `TraversalExplanation` has a `toString()` representation with three columns. The first column is the
traversal strategy being applied. The second column is the traversal strategy category: [D]ecoration, [O]ptimization,
[P]rovider optimization, [F]inalization, and [V]erification. Finally, the third column is the state of the traversal
post strategy application. The final traversal is the resultant execution plan.
[gremlin-groovy,modern]
----
g.V().hasLabel('person').outE().identity().inV().count().is(gt(5)).explain()
----
For traversal profiling information, please see <<profile-step,`profile()`>>-step.
[[fold-step]]
=== Fold Step
There are situations when the traversal stream needs a "barrier" to aggregate all the objects and emit a computation
that is a function of the aggregate. The `fold()`-step (*map*) is one particular instance of this. Please see
<<unfold-step,`unfold()`>>-step for the inverse functionality.
[gremlin-groovy,modern]
----
g.V(1).out('knows').values('name')
g.V(1).out('knows').values('name').fold() <1>
g.V(1).out('knows').values('name').fold().next().getClass() <2>
g.V(1).out('knows').values('name').fold(0) {a,b -> a + b.length()} <3>
g.V().values('age').fold(0) {a,b -> a + b} <4>
g.V().values('age').fold(0, sum) <5>
g.V().values('age').sum() <6>
g.inject(["a":1],["b":2]).fold([], addAll) <7>
----
<1> A parameterless `fold()` will aggregate all the objects into a list and then emit the list.
<2> A verification of the type of list returned.
<3> `fold()` can be provided two arguments -- a seed value and a reduce bi-function ("vadas" is 5 characters + "josh" with 4 characters).
<4> What is the total age of the people in the graph?
<5> The same as before, but using a built-in bi-function.
<6> The same as before, but using the <<sum-step,`sum()`-step>>.
<7> A mechanism for merging `Map` instances. If a key occurs in more than a single `Map`, the later occurrence will replace the earlier.
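
The seed-plus-bi-function form of `fold()` corresponds to a conventional left reduce. As a point of comparison, here is a plain-Python sketch using `functools.reduce`; the lists of names, ages and maps are illustrative, and this models the semantics rather than showing TinkerPop code.

[source,python]
----
from functools import reduce

names = ['vadas', 'josh']
ages = [29, 27, 32, 35]
maps = [{'a': 1}, {'b': 2}, {'a': 3}]

# Parameterless fold: collect everything into a list.
as_list = list(names)

# Seed 0 and a bi-function that adds the length of each name.
name_length = reduce(lambda a, b: a + len(b), names, 0)

# Seed 0 and a summing bi-function.
total_age = reduce(lambda a, b: a + b, ages, 0)

# Merging maps: a later occurrence of a key replaces the earlier one.
merged = reduce(lambda a, b: {**a, **b}, maps, {})
----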
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#fold--++[`fold()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#fold-E2-java.util.function.BiFunction-++[`fold(Object,BiFunction)`]
[[from-step]]
=== From Step
The `from()`-step is not an actual step, but instead is a "step-modulator" similar to <<as-step,`as()`>> and
<<by-step,`by()`>>. If a step is able to accept traversals or strings then `from()` is the
means by which they are added. The general pattern is `step().from()`. See <<to-step,`to()`>>-step.
The steps that support `from()`-modulation are: <<simplepath-step,`simplePath()`>>, <<cyclicpath-step,`cyclicPath()`>>,
<<path-step,`path()`>>, and <<addedge-step,`addE()`>>.
[NOTE, caption=JavaScript]
====
The term `from` is a reserved word in JavaScript, and therefore must be referred to in Gremlin with `from_()`.
====
[NOTE, caption=Python]
====
The term `from` is a reserved word in Python, and therefore must be referred to in Gremlin with `from_()`.
====
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#from-java.lang.String-++[`from(String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#from-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`from(Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#from-org.apache.tinkerpop.gremlin.structure.Vertex-++[`from(Vertex)`]
[[graph-step]]
=== Graph Step
Graph steps are those that read vertices, `V()`, or edges, `E()`, from the graph. The `V()`-step is usually used to
start a `GraphTraversal`, but can also be used mid-traversal. The `E()`-step on the other hand can only be used as a
start step.
[gremlin-groovy,modern]
----
g.V(1) <1>
g.V().has('name', within('marko', 'vadas', 'josh')).as('person').
V().has('name', within('lop', 'ripple')).addE('uses').from('person') <2>
g.E(11) <3>
g.E().hasLabel('knows').has('weight', gt(0.75))
----
<1> Find the vertex by its unique identifier (i.e. `T.id`) - not all graphs will use a numeric value for their identifier.
<2> An example where `V()` is used both as a start step and in the middle of a traversal.
<3> Find the edge by its unique identifier (i.e. `T.id`) - not all graphs will use a numeric value for their identifier.
NOTE: Whether a mid-traversal `V()` uses an index depends on a) whether a suitable index exists and b) whether the
particular graph system provider implemented this functionality.
[gremlin-groovy,modern]
----
g.V().has('name', within('marko', 'vadas', 'josh')).as('person').
V().has('name', within('lop', 'ripple')).addE('uses').from('person').toString() <1>
g.V().has('name', within('marko', 'vadas', 'josh')).as('person').
V().has('name', within('lop', 'ripple')).addE('uses').from('person').iterate().toString() <2>
----
<1> Normally the `V()`-step will iterate over all vertices. However, graph strategies can fold ``HasContainer``'s into a `GraphStep` to allow index lookups.
<2> Whether the graph system provider supports mid-traversal `V()` index lookups or not can easily be determined by inspecting the `toString()` output of the iterated traversal. If `has` conditions were folded into the `V()`-step, an index - if one exists - will be used.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#V-java.lang.Object...-++[`V(Object...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#E-java.lang.Object...-++[`E(Object...)`]
[[group-step]]
=== Group Step
As traversers propagate across a graph as defined by a traversal, sideEffect computations are sometimes required.
That is, the actual path taken or the current location of a traverser is not the ultimate output of the computation,
but some other representation of the traversal. The `group()`-step (*map*/*sideEffect*) is one such sideEffect that
organizes the objects according to some function of the object. Then, if required, that organization (a list) is
reduced. An example is provided below.
[gremlin-groovy,modern]
----
g.V().group().by(label) <1>
g.V().group().by(label).by('name') <2>
g.V().group().by(label).by(count()) <3>
----
<1> Group the vertices by their label.
<2> For each vertex in the group, get their name.
<3> For each grouping, what is its size?
The two projection parameters available to `group()` via `by()` are:
. Key-projection: What feature of the object to group on (a function that yields the map key)?
. Value-projection: What feature of the group to store in the key-list?
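The two projections can be pictured with ordinary Java maps. The sketch below is an illustration only, not the TinkerPop implementation: the `Vertex` class is a hypothetical stand-in that carries just a label and a name, and the data assumes the vertices of the "modern" toy graph.
[source,java]
----
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupSketch {
    // Hypothetical stand-in for a vertex: just a label and a name.
    public static final class Vertex {
        final String label;
        final String name;

        public Vertex(String label, String name) {
            this.label = label;
            this.name = name;
        }
    }

    // Key-projection = label, value-projection = name (like group().by(label).by('name')).
    public static Map<String, List<String>> namesByLabel(List<Vertex> vertices) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Vertex v : vertices) {
            groups.computeIfAbsent(v.label, k -> new ArrayList<>()).add(v.name);
        }
        return groups;
    }

    // Same key-projection, but each group is reduced to its size (like by(count())).
    public static Map<String, Long> sizeByLabel(List<Vertex> vertices) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (Vertex v : vertices) {
            sizes.merge(v.label, 1L, Long::sum);
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Vertex> vertices = List.of(
                new Vertex("person", "marko"), new Vertex("person", "vadas"),
                new Vertex("software", "lop"), new Vertex("person", "josh"),
                new Vertex("software", "ripple"), new Vertex("person", "peter"));
        System.out.println(namesByLabel(vertices)); // {person=[marko, vadas, josh, peter], software=[lop, ripple]}
        System.out.println(sizeByLabel(vertices));  // {person=4, software=2}
    }
}
----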
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#group--++[`group()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#group-java.lang.String-++[`group(String)`]
[[groupcount-step]]
=== GroupCount Step
When it is important to know how many times a particular object has been at a particular part of a traversal,
`groupCount()`-step (*map*/*sideEffect*) is used.
"What is the distribution of ages in the graph?"
[gremlin-groovy,modern]
----
g.V().hasLabel('person').values('age').groupCount()
g.V().hasLabel('person').groupCount().by('age') <1>
----
<1> You can also supply a pre-group projection, where the provided <<by-step,`by()`>>-modulation determines what to
group the incoming object by.
There is one person that is 32, one person that is 35, one person that is 27, and one person that is 29.
"Iteratively walk the graph and count the number of times you see each vertex label."
image::groupcount-step.png[width=420]
[gremlin-groovy,modern]
----
g.V().repeat(both().groupCount('m').by(label)).times(10).cap('m')
----
The above is interesting in that it demonstrates referencing the internal `Map<Object,Long>` of
`groupCount()` by a string key. Used this way, `groupCount()` acts as a sideEffect-step: it simply passes the object
it received to its output while, internally, incrementing that object's count.
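The pass-through behavior can be sketched in plain Java as follows. This is a simplified model rather than TinkerPop code: the method name is hypothetical, and the `Map` argument plays the role of the side-effect stored under the key `m`.
[source,java]
----
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupCountSketch {
    // Emit every incoming object unchanged while counting it in the side-effect map.
    public static <T> List<T> groupCount(List<T> incoming, Map<Object, Long> sideEffect) {
        List<T> outgoing = new ArrayList<>();
        for (T obj : incoming) {
            sideEffect.merge(obj, 1L, Long::sum); // increment this object's count
            outgoing.add(obj);                    // pass the object through untouched
        }
        return outgoing;
    }

    public static void main(String[] args) {
        Map<Object, Long> m = new HashMap<>();
        List<String> out = groupCount(List.of("person", "software", "person"), m);
        System.out.println(out);               // [person, software, person]
        System.out.println(m.get("person"));   // 2
        System.out.println(m.get("software")); // 1
    }
}
----
Because the stream itself is unchanged, the counts only become the traversal's result once the side-effect is read back, which is what `cap('m')` does in the example above.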
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#groupCount--++[`groupCount()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#groupCount-java.lang.String-++[`groupCount(String)`]
[[has-step]]
=== Has Step
image::has-step.png[width=670]
It is possible to filter vertices, edges, and vertex properties based on their properties using `has()`-step
(*filter*). There are numerous variations on `has()` including:
* `has(key,value)`: Remove the traverser if its element does not have the provided key/value property.
* `has(label, key, value)`: Remove the traverser if its element does not have the specified label and provided key/value property.
* `has(key,predicate)`: Remove the traverser if its element does not have a key value that satisfies the bi-predicate. For more information on predicates, please read <<a-note-on-predicates,A Note on Predicates>>.
* `hasLabel(labels...)`: Remove the traverser if its element does not have any of the labels.
* `hasId(ids...)`: Remove the traverser if its element does not have any of the ids.
* `hasKey(keys...)`: Remove the `Property` traverser if it does not match one of the provided keys.
* `hasValue(values...)`: Remove the `Property` traverser if it does not match one of the provided values.
* `has(key)`: Remove the traverser if its element does not have a value for the key.
* `hasNot(key)`: Remove the traverser if its element has a value for the key.
* `has(key, traversal)`: Remove the traverser if its object does not yield a result through the traversal off the property value.
[gremlin-groovy,modern]
----
g.V().hasLabel('person')
g.V().hasLabel('person','name','marko')
g.V().hasLabel('person').out().has('name',within('vadas','josh'))
g.V().hasLabel('person').out().has('name',within('vadas','josh')).
outE().hasLabel('created')
g.V().has('age',inside(20,30)).values('age') <1>
g.V().has('age',outside(20,30)).values('age') <2>
g.V().has('name',within('josh','marko')).elementMap() <3>
g.V().has('name',without('josh','marko')).elementMap() <4>
g.V().has('name',not(within('josh','marko'))).elementMap() <5>
g.V().properties().hasKey('age').value() <6>
g.V().hasNot('age').values('name') <7>
g.V().has('person','name', startingWith('m')) <8>
g.V().has(null, 'vadas') <9>
g.V().has(label, __.is('person')) <10>
----
<1> Find all vertices whose ages are between 20 (exclusive) and 30 (exclusive). In other words, the age must be greater than 20 and less than 30.
<2> Find all vertices whose ages are not between 20 (inclusive) and 30 (inclusive). In other words, the age must be less than 20 or greater than 30.
<3> Find all vertices whose names are exact matches to any names in the collection `[josh,marko]`, display all
the key,value pairs for those vertices.
<4> Find all vertices whose names are not in the collection `[josh,marko]`, display all the key,value pairs for those vertices.
<5> Same as the prior example save using `not` on `within` to yield `without`.
<6> Find all age-properties and emit their value.
<7> Find all vertices that do not have an age-property and emit their name.
<8> Find all "person" vertices that have a name property that starts with the letter "m".
<9> A property key is always stored as a `String`, so an equality check with `null` will produce no result.
<10> An example of `has()` where the argument is a `Traversal` and does not quite behave the way most expect.
Item 10 in the above set of examples bears some discussion. The behavior is not such that the result of the `Traversal`
is used as the comparing value for `has()`, but the current `Traverser`, which in this case is the vertex `label`, is
given to the `Traversal` to behave as a filter itself. In other words, if the `Traversal` (i.e. `is('person')`) returns
a value then the `has()` is effectively `true`. A common mistake is to try to use `select()` in this context where one
would do `has('name', select('n'))` to try to inject the value of "n" into the step to get `has('name', <value-of-n>)`,
but this would instead simply produce an always `true` filter for `has()`.
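The filtering semantics described above can be modeled in a few lines of plain Java. This is a sketch under stated assumptions, not the TinkerPop implementation: the child traversal is modeled as a hypothetical function from the property value to a stream of results, and `has()` passes when that stream is non-empty.
[source,java]
----
import java.util.function.Function;
import java.util.stream.Stream;

public class HasTraversalSketch {
    // Models has(key, traversal): the property value is fed to the child traversal,
    // and the element passes if that traversal emits at least one result.
    public static <V> boolean has(V propertyValue, Function<V, Stream<?>> traversal) {
        return traversal.apply(propertyValue).findAny().isPresent();
    }

    public static void main(String[] args) {
        // Analogous to has(label, __.is('person')): the child traversal filters the label.
        Function<String, Stream<?>> isPerson = label -> Stream.of(label).filter("person"::equals);
        System.out.println(has("person", isPerson));   // true
        System.out.println(has("software", isPerson)); // false

        // Analogous to has('name', select('n')): the child ignores the incoming value and
        // emits a previously stored value (assumed to exist), so the filter always passes.
        Function<String, Stream<?>> selectN = name -> Stream.of("stored-n");
        System.out.println(has("marko", selectN)); // true
        System.out.println(has("vadas", selectN)); // true
    }
}
----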
TinkerPop does not support a regular expression predicate, although specific graph databases that leverage TinkerPop
may provide a partial match extension.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#has-java.lang.String-++[`has(String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#has-java.lang.String-java.lang.Object-++[`has(String,Object)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#has-java.lang.String-org.apache.tinkerpop.gremlin.process.traversal.P-++[`has(String,P)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#has-java.lang.String-java.lang.String-java.lang.Object-++[`has(String,String,Object)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#has-java.lang.String-java.lang.String-org.apache.tinkerpop.gremlin.process.traversal.P-++[`has(String,String,P)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#has-java.lang.String-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`has(String,Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#has-org.apache.tinkerpop.gremlin.structure.T-java.lang.Object-++[`has(T,Object)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#has-org.apache.tinkerpop.gremlin.structure.T-org.apache.tinkerpop.gremlin.process.traversal.P-++[`has(T,P)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#has-org.apache.tinkerpop.gremlin.structure.T-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`has(T,Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#hasId-java.lang.Object-java.lang.Object...-++[`hasId(Object,Object...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#hasId-org.apache.tinkerpop.gremlin.process.traversal.P-++[`hasId(P)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#hasKey-org.apache.tinkerpop.gremlin.process.traversal.P-++[`hasKey(P)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#hasKey-java.lang.String-java.lang.String...-++[`hasKey(String,String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#hasLabel-org.apache.tinkerpop.gremlin.process.traversal.P-++[`hasLabel(P)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#hasLabel-java.lang.String-java.lang.String...-++[`hasLabel(String,String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#hasNot-java.lang.String-++[`hasNot(String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#hasValue-java.lang.Object-java.lang.Object...-++[`hasValue(Object,Object...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#hasValue-org.apache.tinkerpop.gremlin.process.traversal.P-++[`hasValue(P)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/P.html++[`P`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/TextP.html++[`TextP`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/structure/T.html++[`T`],
link:++https://tinkerpop.apache.org/docs/current/recipes/#has-traversal++[Recipes - Anti-pattern]
[[id-step]]
=== Id Step
The `id()`-step (*map*) takes an `Element` and extracts its identifier from it.
[gremlin-groovy,modern]
----
g.V().id()
g.V(1).out().id().is(2)
g.V(1).outE().id()
g.V(1).properties().id()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#id--++[`id()`]
[[identity-step]]
=== Identity Step
The `identity()`-step (*map*) is an link:https://en.wikipedia.org/wiki/Identity_function[identity function] which maps
the current object to itself.
[gremlin-groovy,modern]
----
g.V().identity()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#identity--++[`identity()`]
[[index-step]]
=== Index Step
The `index()`-step (*map*) indexes each element in the current collection. If the current traverser's value is not a collection, then it's treated as a single-item collection. There are two indexers
available, which can be chosen using the `with()` modulator. The list indexer (default) creates a list for each collection item, with the first item being the original element and the second element
being the index. The map indexer creates a linked hash map in which the index represents the key and the original item is used as the value.
[gremlin-groovy,modern]
----
g.V().hasLabel("software").index() <1>
g.V().hasLabel("software").values("name").fold().
order(Scope.local).
index().
unfold().
order().
by(__.tail(Scope.local, 1)) <2>
g.V().hasLabel("software").values("name").fold().
order(Scope.local).
index().
with(WithOptions.indexer, WithOptions.list).
unfold().
order().
by(__.tail(Scope.local, 1)) <3>
g.V().hasLabel("person").values("name").fold().
order(Scope.local).
index().
with(WithOptions.indexer, WithOptions.map) <4>
----
<1> Indexing non-collection items results in multiple indexed single-item collections.
<2> Index all software names in their alphabetical order.
<3> Same as statement 2, but with an explicitly specified list indexer.
<4> Index all person names in their alphabetical order and store the result in an ordered map.
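The shapes produced by the two indexers can be illustrated in plain Java. The sketch below is an assumption-laden model, not TinkerPop code: the method names are hypothetical and the inputs are assumed to be the alphabetically ordered name lists from the examples above.
[source,java]
----
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexSketch {
    // List indexer: each item becomes a two-element list of [item, index].
    public static <T> List<List<Object>> indexAsList(List<T> items) {
        List<List<Object>> indexed = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            List<Object> pair = new ArrayList<>();
            pair.add(items.get(i));
            pair.add(i);
            indexed.add(pair);
        }
        return indexed;
    }

    // Map indexer: a linked hash map from index to item, preserving insertion order.
    public static <T> Map<Integer, T> indexAsMap(List<T> items) {
        Map<Integer, T> indexed = new LinkedHashMap<>();
        for (int i = 0; i < items.size(); i++) {
            indexed.put(i, items.get(i));
        }
        return indexed;
    }

    public static void main(String[] args) {
        System.out.println(indexAsList(List.of("lop", "ripple")));
        // [[lop, 0], [ripple, 1]]
        System.out.println(indexAsMap(List.of("josh", "marko", "peter", "vadas")));
        // {0=josh, 1=marko, 2=peter, 3=vadas}
    }
}
----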
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#index--++[`index()`]
[[inject-step]]
=== Inject Step
image::inject-step.png[width=800]
The concept of "injectable steps" makes it possible to insert objects arbitrarily into a traversal stream. The
`inject()`-step (*sideEffect*) provides this capability, and a few examples are provided below.
[gremlin-groovy,modern]
----
g.V(4).out().values('name').inject('daniel')
g.V(4).out().values('name').inject('daniel').map {it.get().length()}
g.V(4).out().values('name').inject('daniel').map {it.get().length()}.path()
----
In the last example above, note that the path starting with `daniel` is only of length 2. This is because the
`daniel` string was inserted half-way in the traversal. Finally, a typical use case is provided below -- when the
start of the traversal is not a graph object.
[gremlin-groovy,modern]
----
inject(1,2)
inject(1,2).map {it.get() + 1}
inject(1,2).map {it.get() + 1}.map {g.V(it.get()).next()}.values('name')
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#inject-E...-++[`inject(Object)`]
anchor:_gremlin_i_o[]
[[io-step]]
=== IO Step
image:gremlin-io.png[width=250,float=left] The task of importing and exporting the data of `Graph` instances is the
job of the `io()`-step. By default, TinkerPop supports three formats for importing and exporting graph data:
<<graphml,GraphML>>, <<graphson,GraphSON>>, and <<gryo,Gryo>>.
NOTE: Additional documentation for TinkerPop IO formats can be found in the link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/[IO Reference].
By itself the `io()`-step merely configures the kind of importing and exporting that is going
to occur and it is the follow-on call to the `read()` or `write()` step that determines which of those actions will
execute. Therefore, a typical usage of the `io()`-step would look like this:
[source,java]
----
g.io(someInputFile).read().iterate()
g.io(someOutputFile).write().iterate()
----
IMPORTANT: The commands above are still traversals and therefore require iteration to be executed, hence the use of
`iterate()` as a termination step.
By default, the `io()`-step will try to detect the right file format using the file name extension. To gain greater
control of the format use the `with()` step modulator to provide further information to `io()`. For example:
[source,java]
----
g.io(someInputFile).
with(IO.reader, IO.graphson).
read().iterate()
g.io(someOutputFile).
with(IO.writer,IO.graphml).
write().iterate()
----
The `IO` class is a helper for the `io()`-step that provides expressions that can be used to help configure it
and in this case it allows direct specification of the "reader" or "writer" to use. The "reader" actually refers to
a `GraphReader` implementation and the "writer" refers to a `GraphWriter` implementation. The implementations of
those interfaces provided by default are the standard TinkerPop implementations.
That default is an important point to consider for users. The default TinkerPop implementations are not designed with
massive, complex, parallel bulk loading in mind. They are designed to do single-threaded, OLTP-style loading of data
in the most generic way possible so as to accommodate the greatest number of graph databases out there. As such, from
a reading perspective, they work best for small datasets (or perhaps medium datasets where memory is plentiful and
time is not critical) that are loading to an empty graph - incremental loading is not supported. The story from the
writing perspective is not that different in that there are no parallel operations in play. However, streaming the
output to disk requires only a single pass of the data, without high memory requirements for larger datasets.
In general, TinkerPop recommends that users examine the native bulk import/export tools of the graph implementation
that they choose. Those tools will often outperform the `io()`-step and perhaps be easier to use with a greater
feature set. That said, graph providers do have the option to optimize `io()` to back it with their own
import/export utilities and therefore the default behavior provided by TinkerPop described above might be overridden
by the graph.
An excellent example of this lies in <<hadoop-gremlin,HadoopGraph>> with <<sparkgraphcomputer,SparkGraphComputer>>
which replaces the default single-threaded implementation with a more advanced OLAP style bulk import/export
functionality internally using <<clonevertexprogram,CloneVertexProgram>>. With this model, graphs of arbitrary size
can be imported/exported assuming that there is a Hadoop `InputFormat` or `OutputFormat` to support it.
IMPORTANT: Remote Gremlin Console users or Gremlin Language Variant (GLV) users (e.g. gremlin-python) who utilize
the `io()`-step should recall that their `read()` or `write()` operation will occur on the server and not locally
and therefore the file specified for import/export must be something accessible by the server.
GraphSON and Gryo formats are extensible allowing users and graph providers to extend supported serialization options.
These extensions are exposed through `IoRegistry` implementations. To apply an `IoRegistry` use the `with()` option
and the `IO.registry` key, where the value is either an actual `IoRegistry` instance or the fully qualified class
name of one.
[source,java]
----
g.io(someInputFile).
with(IO.reader, IO.gryo).
with(IO.registry, TinkerIoRegistryV3d0.instance()).
read().iterate()
g.io(someOutputFile).
with(IO.writer,IO.graphson).
with(IO.registry, "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3d0").
write().iterate()
----
GLVs will obviously always be forced to use the latter form as they can't explicitly create an instance of an
`IoRegistry` to pass to the server (nor are `IoRegistry` instances necessarily serializable).
The version of the formats (e.g. GraphSON 2.0 or 3.0) utilized by `io()` is determined entirely by the `IO.reader` and
`IO.writer` configurations or their defaults. The defaults will always be the latest version for the current release
of TinkerPop. It is also possible for graph providers to override these defaults, so consult the documentation of the
underlying graph database in use for any details on that.
For more advanced configuration of `GraphReader` and `GraphWriter` operations (e.g. normalized output for GraphSON,
disabling class registrations for Gryo, etc.), construct the appropriate `GraphReader` or `GraphWriter` using
the `build()` method on its implementation and pass it directly to the `IO.reader` or
`IO.writer` option. Obviously, these are JVM based operations and thus not available to GLVs as portable features.
anchor:_graphml_reader_writer[]
[[graphml]]
==== GraphML
image:gremlin-graphml.png[width=350,float=left] The link:http://graphml.graphdrawing.org/[GraphML] file format is a
common XML-based representation of a graph. It is widely supported by graph-related tools and libraries making it a
solid interchange format for TinkerPop. In other words, if the intent is to work with graph data in conjunction with
applications outside of TinkerPop, GraphML may be the best choice to do that. Common use cases might be:
* Generate a graph using link:https://networkx.github.io/[NetworkX], export it with GraphML and import it to TinkerPop.
* Produce a subgraph and export it to GraphML to be consumed by and visualized in link:https://gephi.org/[Gephi].
* Migrate the data of an entire graph to a different graph database not supported by TinkerPop.
WARNING: GraphML is a "lossy" format in that it only supports primitive values for properties and does not have
support for `Graph` variables. It will use `toString` to serialize property values outside of those primitives.
WARNING: GraphML as a specification allows for `<edge>` and `<node>` elements to appear in any order. Most software
that writes GraphML (including TinkerPop's `GraphMLWriter`) writes `<node>` elements before `<edge>` elements. However, it
is important to note that `GraphMLReader` will read this data in order and order can matter. This is because TinkerPop
does not allow the vertex label to be changed after the vertex has been created. Therefore, if an `<edge>` element
comes before the `<node>`, the label on the vertex will be ignored. It is thus better to order `<node>` elements in the
GraphML to appear before all `<edge>` elements if vertex labels are important to the graph.
[source,java]
----
g.io("graph.xml").read().iterate()
g.io("graph.xml").write().iterate()
----
NOTE: If using GraphML generated from TinkerPop 2.x, read more about its incompatibilities in the
link:https://tinkerpop.apache.org/docs/x.y.z/upgrade/#graphml-format[Upgrade Documentation].
anchor:graphson-reader-writer[]
[[graphson]]
==== GraphSON
image:gremlin-graphson.png[width=350,float=left] GraphSON is a link:http://json.org/[JSON]-based format extended
from earlier versions of TinkerPop. It is important to note that TinkerPop's GraphSON is not backwards compatible
with prior TinkerPop GraphSON versions. GraphSON has some support from graph-related applications outside of TinkerPop,
but it is generally best used in two cases:
* A text format of the graph or its elements is desired (e.g. debugging, usage in source control, etc.)
* The graph or its elements need to be consumed by code that is not JVM-based (e.g. JavaScript, Python, .NET, etc.)
[source,java]
----
g.io("graph.json").read().iterate()
g.io("graph.json").write().iterate()
----
NOTE: Additional documentation for GraphSON can be found in the link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#graphson[IO Reference].
anchor:gryo-reader-writer[]
[[gryo]]
==== Gryo
image:gremlin-kryo.png[width=400,float=left] link:https://github.com/EsotericSoftware/kryo[Kryo] is a popular
serialization package for the JVM. Gremlin-Kryo is a binary `Graph` serialization format for use on the JVM by JVM
languages. It is designed to be space efficient, non-lossy and is promoted as the standard format to use when working
with graph data inside of the TinkerPop stack. A list of common use cases is presented below:
* Migration from one Gremlin Structure implementation to another (e.g. `TinkerGraph` to `Neo4jGraph`)
* Serialization of individual graph elements to be sent over the network to another JVM.
* Backups of in-memory graphs or subgraphs.
WARNING: When migrating between Gremlin Structure implementations, Kryo may not lose data, but it is important to
consider the features of each `Graph` and whether or not the data types supported in one will be supported in the
other. Failure to do so may result in errors.
[source,java]
----
g.io("graph.kryo").read().iterate()
g.io("graph.kryo").write().iterate()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalSource.html#io-java.lang.String-++[`io(String)`]
[[is-step]]
=== Is Step
It is possible to filter scalar values using `is()`-step (*filter*).
[NOTE, caption=Python]
====
The term `is` is a reserved word in Python, and therefore must be referred to in Gremlin with `is_()`.
====
[gremlin-groovy,modern]
----
g.V().values('age').is(32)
g.V().values('age').is(lte(30))
g.V().values('age').is(inside(30, 40))
g.V().where(__.in('created').count().is(1)).values('name') <1>
g.V().where(__.in('created').count().is(gte(2))).values('name') <2>
g.V().where(__.in('created').values('age').
mean().is(inside(30d, 35d))).values('name') <3>
----
<1> Find projects having exactly one contributor.
<2> Find projects having two or more contributors.
<3> Find projects whose contributors' average age is between 30 and 35.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#is-java.lang.Object-++[`is(Object)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#is-org.apache.tinkerpop.gremlin.process.traversal.P-++[`is(P)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/P.html++[`P`]
[[key-step]]
=== Key Step
The `key()`-step (*map*) takes a `Property` and extracts the key from it.
[gremlin-groovy,theCrew]
----
g.V(1).properties().key()
g.V(1).properties().properties().key()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#key--++[`key()`]
[[label-step]]
=== Label Step
The `label()`-step (*map*) takes an `Element` and extracts its label from it.
[gremlin-groovy,modern]
----
g.V().label()
g.V(1).outE().label()
g.V(1).properties().label()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#label--++[`label()`]
[[limit-step]]
=== Limit Step
The `limit()`-step is analogous to <<range-step,`range()`-step>> save that the lower end range is set to 0.
[gremlin-groovy,modern]
----
g.V().limit(2)
g.V().range(0, 2)
----
The `limit()`-step can also be applied with `Scope.local`, in which case it operates on the incoming collection.
The examples below use the <<the-crew-toy-graph,The Crew>> toy data set.
[gremlin-groovy,theCrew]
----
g.V().valueMap().select('location').limit(local,2) <1>
g.V().valueMap().limit(local, 1) <2>
----
<1> `List<String>` for each vertex containing the first two locations.
<2> `Map<String, Object>` for each vertex, but containing only the first property value.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#limit-long-++[`limit(long)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#limit-org.apache.tinkerpop.gremlin.process.traversal.Scope-long-++[`limit(Scope,long)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[local-step]]
=== Local Step
image::local-step.png[width=450]
A `GraphTraversal` operates on a continuous stream of objects. In many situations, it is important to operate on a
single element within that stream. To do such object-local traversal computations, `local()`-step exists (*branch*).
Note that the examples below use the <<the-crew-toy-graph,The Crew>> toy data set.
[gremlin-groovy,theCrew]
----
g.V().as('person').
properties('location').order().by('startTime',asc).limit(2).value().as('location').
select('person','location').by('name').by() <1>
g.V().as('person').
local(properties('location').order().by('startTime',asc).limit(2)).value().as('location').
select('person','location').by('name').by() <2>
----
<1> Get the first two people and their respective location according to the most historic location start time.
<2> For every person, get their two most historic locations.
The two traversals above look nearly identical save the inclusion of `local()` which wraps a section of the traversal
in an object-local traversal. As such, the `order().by()` and the `limit()` refer to a particular object, not to the
stream as a whole.
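The difference between stream-wide and object-local processing can be sketched in plain Python. This is only a model of the behavior described above, not TinkerPop code: the function names and the sample people, locations and start times are hypothetical.

```python
from operator import itemgetter

# Hypothetical sample: each person owns a list of (location, startTime) pairs.
people = {
    "marko": [("san diego", 1997), ("santa cruz", 2001), ("brussels", 2004)],
    "stephen": [("centreville", 1990), ("dulles", 2000), ("purcellville", 2006)],
}

def earliest_global(people, n):
    """Without local(): order and limit act on the whole stream of locations."""
    stream = [(name, loc, start)
              for name, locs in people.items()
              for loc, start in locs]
    ordered = sorted(stream, key=itemgetter(2))
    return [(name, loc) for name, loc, _ in ordered[:n]]

def earliest_local(people, n):
    """With local(): order and limit start afresh for every incoming person."""
    result = []
    for name, locs in people.items():
        for loc, _ in sorted(locs, key=itemgetter(1))[:n]:
            result.append((name, loc))
    return result

print(earliest_global(people, 2))  # two results in total
print(earliest_local(people, 2))   # two results per person
```

The global variant yields two results overall, while the local variant yields up to two results for each person.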
The `local()`-step is quite similar in functionality to <<general-steps,Flat Map Step>>, with which it is often confused.
`local()` propagates the traverser through the internal traversal as is, without splitting or cloning it. Thus, it's
a "global traversal" with local processing. Its use is subtle and primarily finds application in compilation
optimizations (i.e. when writing `TraversalStrategy` implementations). As another example consider:
[gremlin-groovy,modern]
----
g.V().both().barrier().flatMap(groupCount().by("name"))
g.V().both().barrier().local(groupCount().by("name"))
----
WARNING: The anonymous traversal of `local()` processes the current object "locally." In OLAP, where the atomic unit
of computing is the vertex and its local "star graph," it is important that the anonymous traversal does not leave
the confines of the vertex's star graph. In other words, it cannot traverse to an adjacent vertex's properties or edges.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#local-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`local(Traversal)`]
[[loops-step]]
=== Loops Step
The `loops()`-step (*map*) extracts the number of times the `Traverser` has gone through the current loop.
[gremlin-groovy,modern]
----
g.V().emit(__.has("name", "marko").or().loops().is(2)).repeat(__.out()).values("name")
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#loops--++[`loops()`],
link:++https://tinkerpop.apache.org/docs/x.y.z/recipes/#looping++[`Looping Recipes`]
[[match-step]]
=== Match Step
The `match()`-step (*map*) provides a more link:http://en.wikipedia.org/wiki/Declarative_programming[declarative]
form of graph querying based on the notion of link:http://en.wikipedia.org/wiki/Pattern_matching[pattern matching].
With `match()`, the user provides a collection of "traversal fragments," called patterns, that have variables defined
that must hold true throughout the duration of the `match()`. When a traverser is in `match()`, a registered
`MatchAlgorithm` analyzes the current state of the traverser (i.e. its history based on its
<<path-data-structure,path data>>) and the runtime statistics of the traversal patterns, and then returns the traversal
pattern that the traverser should try next. The default `MatchAlgorithm` provided is called `CountMatchAlgorithm` and it
dynamically revises the pattern execution plan by sorting the patterns according to their filtering capabilities
(i.e. largest set reduction patterns execute first). For very large graphs, where the developer is uncertain of the
statistics of the graph (e.g. how many `knows`-edges vs. `worksFor`-edges exist in the graph), it is advantageous to
use `match()`, as an optimal plan will be determined automatically. Furthermore, some queries are much easier to
express via `match()` than with single-path traversals.
"Who created a project named 'lop' that was also created by someone who is 29 years old? Return the two creators."
image::match-step.png[width=500]
[gremlin-groovy,modern]
----
g.V().match(
__.as('a').out('created').as('b'),
__.as('b').has('name', 'lop'),
__.as('b').in('created').as('c'),
__.as('c').has('age', 29)).
select('a','c').by('name')
----
Note that the above can also be more concisely written as below which demonstrates that standard inner-traversals can
be arbitrarily defined.
[gremlin-groovy,modern]
----
g.V().match(
__.as('a').out('created').has('name', 'lop').as('b'),
__.as('b').in('created').has('age', 29).as('c')).
select('a','c').by('name')
----
In order to improve readability, `as()`-steps can be given meaningful labels which better reflect your domain. The
previous query can thus be written in a more expressive way as shown below.
[gremlin-groovy,modern]
----
g.V().match(
__.as('creators').out('created').has('name', 'lop').as('projects'), <1>
__.as('projects').in('created').has('age', 29).as('cocreators')). <2>
select('creators','cocreators').by('name') <3>
----
<1> Find vertices that created something and match them as 'creators', then find out what they created which is
named 'lop' and match these vertices as 'projects'.
<2> Using these 'projects' vertices, find out their creators aged 29 and remember these as 'cocreators'.
<3> Return the name of both 'creators' and 'cocreators'.
[[grateful-dead]]
.Grateful Dead
image::grateful-dead-schema.png[width=475]
`MatchStep` brings functionality similar to link:http://en.wikipedia.org/wiki/SPARQL[SPARQL] to Gremlin. Like SPARQL,
MatchStep conjoins a set of patterns applied to a graph. For example, the following traversal finds exactly those
songs which Jerry Garcia has both sung and written (using the Grateful Dead graph distributed in the `data/` directory):
[gremlin-groovy]
----
g = traversal().withEmbedded(graph)
g.io('data/grateful-dead.xml').read().iterate()
g.V().match(
__.as('a').has('name', 'Garcia'),
__.as('a').in('writtenBy').as('b'),
__.as('a').in('sungBy').as('b')).
select('b').values('name')
----
Among the features which differentiate `match()` from SPARQL are:
[gremlin-groovy,modern]
----
g.V().match(
__.as('a').out('created').has('name','lop').as('b'), <1>
__.as('b').in('created').has('age', 29).as('c'),
__.as('c').repeat(out()).times(2)). <2>
select('c').out('knows').dedup().values('name') <3>
----
<1> *Patterns of arbitrary complexity*: `match()` is not restricted to triple patterns or property paths.
<2> *Recursion support*: `match()` supports the branch-based steps within a pattern, including `repeat()`.
<3> *Imperative/declarative hybrid*: Before and after a `match()`, it is possible to leverage classic Gremlin traversals.
To extend point #3, it is possible to support going from imperative, to declarative, to imperative, ad infinitum.
[gremlin-groovy,modern]
----
g.V().match(
__.as('a').out('knows').as('b'),
__.as('b').out('created').has('name','lop')).
select('b').out('created').
match(
__.as('x').in('created').as('y'),
__.as('y').out('knows').as('z')).
select('z').values('name')
----
IMPORTANT: The `match()`-step is stateless. The variable bindings of the traversal patterns are stored in the path
history of the traverser. As such, the variables used over all `match()`-steps within a traversal are globally unique.
A benefit of this is that subsequent `where()`, `select()`, `match()`, etc. steps can leverage the same variables in
their analysis.
Like all other steps in Gremlin, `match()` is a function and thus, `match()` within `match()` is a natural consequence
of Gremlin's functional foundation (i.e. recursive matching).
[gremlin-groovy,modern]
----
g.V().match(
__.as('a').out('knows').as('b'),
__.as('b').out('created').has('name','lop'),
__.as('b').match(
__.as('b').out('created').as('c'),
__.as('c').has('name','ripple')).
select('c').as('c')).
select('a','c').by('name')
----
If a step-labeled traversal precedes the `match()`-step and the traverser entering the `match()` is destined to bind
to a particular variable, then the previous step should be labeled accordingly.
[gremlin-groovy,modern]
----
g.V().as('a').out('knows').as('b').
match(
__.as('b').out('created').as('c'),
__.not(__.as('c').in('created').as('a'))).
select('a','b','c').by('name')
----
There are three types of `match()` traversal patterns.
. `as('a')...as('b')`: both the start and end of the traversal have a declared variable.
. `as('a')...`: only the start of the traversal has a declared variable.
. `...`: there are no declared variables.
If a variable is at the start of a traversal pattern it *must* exist as a label in the path history of the traverser,
else the traverser cannot go down that path. If a variable is at the end of a traversal pattern then if the variable
exists in the path history of the traverser, the traverser's current location *must* match (i.e. equal) its historic
location at that same label. However, if the variable does not exist in the path history of the traverser, then the
current location is labeled as the variable and thus, becomes a bound variable for subsequent traversal patterns. If a
traversal pattern does not have an end label, then the traverser must simply "survive" the pattern (i.e. not be
filtered) to continue to the next pattern. If a traversal pattern does not have a start label, then the traverser
can go down that path at any point, but will only go down that pattern once as a traversal pattern is executed once
and only once for the history of the traverser. Typically, traversal patterns that do not have a start and end label
are used in conjunction with `and()`, `or()`, and `where()`. Once the traverser has "survived" all the patterns (or at
least one for `or()`), `match()`-step analyzes the traverser's path history and emits a `Map<String,Object>` of the
variable bindings to the next step in the traversal.
[gremlin-groovy,modern]
----
g.V().as('a').out().as('b'). <1>
match( <2>
__.as('a').out().count().as('c'), <3>
__.not(__.as('a').in().as('b')), <4>
or( <5>
__.as('a').out('knows').as('b'),
__.as('b').in().count().as('c').and().as('c').is(gt(2)))). <6>
dedup('a','c'). <7>
select('a','b','c').by('name').by('name').by() <8>
----
<1> A standard, step-labeled traversal can come prior to `match()`.
<2> If the traverser's path prior to entering `match()` has requisite label values, then those historic values are bound.
<3> It is possible to use <<a-note-on-barrier-steps,barrier steps>> though they are computed locally to the pattern (as one would expect).
<4> It is possible to `not()` a pattern.
<5> It is possible to nest `and()`- and `or()`-steps for conjunction matching.
<6> Both infix and prefix conjunction notation is supported.
<7> It is possible to "distinct" the specified label combination.
<8> The bound values are of different types -- vertex ("a"), vertex ("b"), long ("c").
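The end-label rule described in this section (a bound variable must equal the current object, an unbound one becomes bound) can be illustrated with a small plain-Python model. It is a sketch under stated assumptions rather than the `MatchStep` implementation: the helper `bind`, the function `match_lop_cocreators` and the hand-copied subset of the "modern" toy graph are illustrative, and the patterns run in a fixed order instead of being planned by a `MatchAlgorithm`.

```python
# Hand-copied subset of the "modern" toy graph: ages and created-edges.
age = {"marko": 29, "vadas": 27, "josh": 32, "peter": 35}
created = [("marko", "lop"), ("josh", "lop"), ("josh", "ripple"), ("peter", "lop")]

def bind(bindings, variable, value):
    """End-label rule: a bound variable must equal value, an unbound one is bound."""
    if variable in bindings:
        return bindings if bindings[variable] == value else None
    return {**bindings, variable: value}

def match_lop_cocreators():
    """Model of the two patterns:
    as('a').out('created').has('name','lop').as('b')
    as('b').in('created').has('age',29).as('c')
    """
    results = []
    for creator, project in created:          # first pattern, starting at 'a'
        if project != "lop":
            continue
        first = bind(bind({}, "a", creator), "b", project)
        for cocreator, other in created:      # second pattern, starting at bound 'b'
            if other != first["b"] or age.get(cocreator) != 29:
                continue
            second = bind(first, "c", cocreator)
            if second is not None:
                results.append(second)
    return results

for bindings in match_lop_cocreators():
    print(bindings["a"], bindings["c"])
```

Each result is a map of variable bindings, which mirrors the `Map<String,Object>` that `match()` emits to the next step.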
[[using-where-with-match]]
==== Using Where with Match
Match is typically used in conjunction with both `select()` (demonstrated previously) and `where()` (presented here).
A `where()`-step allows the user to further constrain the result set provided by `match()`.
[gremlin-groovy,modern]
----
g.V().match(
__.as('a').out('created').as('b'),
__.as('b').in('created').as('c')).
where('a', neq('c')).
select('a','c').by('name')
----
The `where()`-step can take either a `P`-predicate (example above) or a `Traversal` (example below). Using
`MatchPredicateStrategy`, `where()`-clauses are automatically folded into `match()` and thus, subject to the query
optimizer within `match()`-step.
[gremlin-groovy,modern]
----
traversal = g.V().match(
__.as('a').has(label,'person'), <1>
__.as('a').out('created').as('b'),
__.as('b').in('created').as('c')).
where(__.as('a').out('knows').as('c')). <2>
select('a','c').by('name'); null <3>
traversal.toString() <4>
traversal <5> <6>
traversal.toString() <7>
----
<1> Any `has()`-step traversal patterns that start with the match-key are pulled out of `match()` to enable the graph
system to leverage the filter for index lookups.
<2> A `where()`-step with a traversal containing variable bindings declared in `match()`.
<3> A useful trick to ensure that the traversal is not iterated by Gremlin Console.
<4> The string representation of the traversal prior to its strategies being applied.
<5> The Gremlin Console will automatically iterate anything that is an iterator or is iterable.
<6> Both marko and josh are co-developers and marko knows josh.
<7> The string representation of the traversal after the strategies have been applied (and thus, `where()` is folded into `match()`).
IMPORTANT: A `where()`-step is a filter and thus, variables within a `where()` clause are not globally bound to the
path of the traverser in `match()`. As such, `where()`-steps in `match()` are used for filtering, not binding.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#match-org.apache.tinkerpop.gremlin.process.traversal.Traversal...-++[`match(Traversal...)`]
[[math-step]]
=== Math Step
The `math()`-step (*map*) enables scientific calculator functionality within Gremlin. This step deviates from the common
function composition and nesting formalisms to provide an easy to read string-based math processor. Variables within the
equation map to scopes in Gremlin -- e.g. path labels, side-effects, or incoming map keys. This step supports
`by()`-modulation where the `by()`-modulators are applied in the order in which the variables are first referenced
within the equation. Note that the reserved variable `_` refers to the current numeric traverser object incoming to the
`math()`-step.
[gremlin-groovy,modern]
----
g.V().as('a').out('knows').as('b').math('a + b').by('age')
g.V().as('a').out('created').as('b').
math('b + a').
by(both().count().math('_ + 100')).
by('age')
g.withSideEffect('x',10).V().values('age').math('_ / x')
g.withSack(1).V(1).repeat(sack(sum).by(constant(1))).times(10).emit().sack().math('sin _')
----
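How variables are paired with `by()`-modulators can be modelled in plain Python. The sketch below is an illustration and not the `MathStep` implementation: the function names are hypothetical, the reserved `_` variable is deliberately left out, and the round-robin reuse of modulators is an assumption suggested by the first example above, where a single `by('age')` serves both `a` and `b`.

```python
import re
from itertools import cycle

# Built-in function names documented for math(); they are not variables.
FUNCTIONS = {"abs", "acos", "asin", "atan", "cbrt", "ceil", "cos", "cosh",
             "exp", "floor", "log", "log10", "log2", "sin", "sinh", "sqrt",
             "tan", "tanh", "signum"}

def variables_in_order(equation):
    """Return the distinct variable names in order of first reference."""
    seen = []
    for token in re.findall(r"[A-Za-z][A-Za-z0-9_]*", equation):
        if token not in FUNCTIONS and token not in seen:
            seen.append(token)
    return seen

def assign_modulators(equation, modulators):
    """Pair each variable with a modulator, reusing modulators round-robin."""
    return dict(zip(variables_in_order(equation), cycle(modulators)))

print(assign_modulators("b + a", ["both-count-plus-100", "age"]))
print(assign_modulators("a + b", ["age"]))
```

In `'b + a'` the variable `b` is referenced first, so it receives the first modulator even though `a` was labeled earlier in the traversal.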
The operators supported by the calculator include: `*`, `+`, `-`, `/`, `^`, and `%`. Furthermore, the following built-in
functions are provided:
* `abs`: absolute value
* `acos`: arc cosine
* `asin`: arc sine
* `atan`: arc tangent
* `cbrt`: cubic root
* `ceil`: nearest upper integer
* `cos`: cosine
* `cosh`: hyperbolic cosine
* `exp`: Euler's number raised to the power (`e^x`)
* `floor`: nearest lower integer
* `log`: natural logarithm (base e)
* `log10`: logarithm (base 10)
* `log2`: logarithm (base 2)
* `sin`: sine
* `sinh`: hyperbolic sine
* `sqrt`: square root
* `tan`: tangent
* `tanh`: hyperbolic tangent
* `signum`: signum function
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#math-java.lang.String-++[`math(String)`]
[[max-step]]
=== Max Step
The `max()`-step (*map*) operates on a stream of comparable objects and determines which is the last object according
to its natural order in the stream.
[gremlin-groovy,modern]
----
g.V().values('age').max()
g.V().repeat(both()).times(3).values('age').max()
g.V().values('name').max()
----
When called as `max(local)` it determines the maximum value of the current, local object (not the objects in the
traversal stream). This works for `Collection` and `Comparable`-type objects.
[gremlin-groovy,modern]
----
g.V().values('age').fold().max(local)
----
When there are `null` values being evaluated the `null` objects are ignored, but if all values are recognized as `null`
the return value is `null`.
[gremlin-groovy,modern]
----
g.inject(null,10, 9, null).max()
g.inject([null,null,null]).max(local)
----
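The `null` handling described above can be modelled in a few lines of plain Python, with `None` standing in for `null`. This is a sketch of the stated rule only: the function name is hypothetical and the case of an entirely empty stream is not modelled.

```python
def max_ignoring_null(values):
    """Largest non-None value, or None when every value is None."""
    present = [v for v in values if v is not None]
    return max(present) if present else None

print(max_ignoring_null([None, 10, 9, None]))   # 10
print(max_ignoring_null([None, None, None]))    # None
print(max_ignoring_null(["marko", "vadas"]))    # vadas
```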
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#max--++[`max()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#max-org.apache.tinkerpop.gremlin.process.traversal.Scope-++[`max(Scope)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[mean-step]]
=== Mean Step
The `mean()`-step (*map*) operates on a stream of numbers and determines the average of those numbers.
[gremlin-groovy,modern]
----
g.V().values('age').mean()
g.V().repeat(both()).times(3).values('age').mean() <1>
g.V().repeat(both()).times(3).values('age').dedup().mean()
----
<1> Realize that traversers are being bulked by `repeat()`. There may be more of a particular number than another,
thus altering the average.
When called as `mean(local)` it determines the mean of the current, local object (not the objects in the traversal
stream). This works for `Collection` and `Number`-type objects.
[gremlin-groovy,modern]
----
g.V().values('age').fold().mean(local)
----
If `mean()` encounters `null` values, they will be ignored (i.e. their traversers are not counted toward the
divisor). If all traversers are `null` then the stream will return `null`.
[gremlin-groovy,modern]
----
g.inject(null,10, 9, null).mean()
g.inject([null,null,null]).mean(local)
----
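The effect of `null` values and of repeated values on the average can be modelled in plain Python, with `None` standing in for `null`. This is a sketch of the rules stated in this section, not TinkerPop code, and the function name is hypothetical.

```python
def mean_ignoring_null(values):
    """Average of the non-None values, or None when every value is None."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

print(mean_ignoring_null([None, 10, 9, None]))  # 9.5, the divisor is 2 and not 4
print(mean_ignoring_null([None, None, None]))   # None
print(mean_ignoring_null([29, 29, 32]))         # 30.0, the repeated 29 weighs twice
print(mean_ignoring_null([29, 32]))             # 30.5 once duplicates are removed
```

The last two calls show why `dedup()` changes the result: a value that occurs several times contributes several times to both the sum and the divisor.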
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#mean--++[`mean()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#mean-org.apache.tinkerpop.gremlin.process.traversal.Scope-++[`mean(Scope)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[min-step]]
=== Min Step
The `min()`-step (*map*) operates on a stream of comparable objects and determines which is the first object according
to its natural order in the stream.
[gremlin-groovy,modern]
----
g.V().values('age').min()
g.V().repeat(both()).times(3).values('age').min()
g.V().values('name').min()
----
When called as `min(local)` it determines the minimum value of the current, local object (not the objects in the
traversal stream). This works for `Collection` and `Comparable`-type objects.
[gremlin-groovy,modern]
----
g.V().values('age').fold().min(local)
----
When there are `null` values being evaluated the `null` objects are ignored, but if all values are recognized as `null`
the return value is `null`.
[gremlin-groovy,modern]
----
g.inject(null,10, 9, null).min()
g.inject([null,null,null]).min(local)
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#min--++[`min()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#min-org.apache.tinkerpop.gremlin.process.traversal.Scope-++[`min(Scope)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[none-step]]
=== None Step
The `none()`-step (*filter*) filters all objects from a traversal stream. It is especially useful for traversals
that are executed remotely where returning results is not useful and the traversal is only meant to generate
side-effects. Choosing not to return results saves in serialization and network costs as the objects are filtered on
the remote end and not returned to the client side. Typically, this step does not need to be used directly and is
quietly used by the `iterate()` terminal step which appends `none()` to the traversal before actually cycling through
results.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/process/traversal/Traversal.html#none--++[`none()`],
link:++https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/process/traversal/Traversal.html#iterate--++[`iterate()`]
[[not-step]]
=== Not Step
The `not()`-step (*filter*) removes objects from the traversal stream when the traversal provided as an argument
returns an object.
[NOTE, caption=Groovy]
====
The term `not` is a reserved word in Groovy and therefore, when used as part of an anonymous traversal, must be referred
to in Gremlin with the double underscore `__.not()`.
====
[NOTE, caption=Python]
====
The term `not` is a reserved word in Python, and therefore must be referred to in Gremlin with `not_()`.
====
[gremlin-groovy,modern]
----
g.V().not(hasLabel('person')).elementMap()
g.V().hasLabel('person').
not(out('created').count().is(gt(1))).values('name') <1>
----
<1> josh created two projects and vadas none
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#not-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`not(Traversal)`]
[[option-step]]
=== Option Step
An option to a <<general-steps,`branch()`>> or <<choose-step,`choose()`>>.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#option-M-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`option(Object,Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#option-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`option(Traversal)`]
[[optional-step]]
=== Optional Step
The `optional()`-step (*branch/flatMap*) returns the result of the specified traversal if it yields a result else it returns the calling
element, i.e. the `identity()`.
[gremlin-groovy,modern]
----
g.V(2).optional(out('knows')) <1>
g.V(2).optional(__.in('knows')) <2>
----
<1> vadas does not have an outgoing knows-edge so vadas is returned.
<2> vadas does have an incoming knows-edge so marko is returned.
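The fallback behavior of `optional()` can be modelled in plain Python. This is a minimal sketch rather than the TinkerPop implementation: the function name, the lambdas and the two adjacency maps (a hand-copied subset of the knows-edges in the "modern" toy graph) are illustrative assumptions.

```python
def optional(traverser, traversal):
    """Emit the child traversal's results if there are any, else the input itself."""
    results = list(traversal(traverser))
    return results if results else [traverser]

# Hand-copied knows-edges: marko knows vadas and josh.
knows_out = {"marko": ["vadas", "josh"]}
knows_in = {"vadas": ["marko"], "josh": ["marko"]}

print(optional("vadas", lambda v: knows_out.get(v, [])))  # ['vadas']
print(optional("vadas", lambda v: knows_in.get(v, [])))   # ['marko']
```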
`optional` is particularly useful for lifting entire graphs when used in conjunction with `path` or `tree`.
[gremlin-groovy,modern]
----
g.V().hasLabel('person').optional(out('knows').optional(out('created'))).path() <1>
----
<1> Returns the paths of everybody followed by who they know followed by what they created.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#optional-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`optional(Traversal)`]
[[or-step]]
=== Or Step
The `or()`-step ensures that at least one of the provided traversals yields a result (*filter*). Please see
<<and-step,`and()`>> for and-semantics.
[NOTE, caption=Python]
====
The term `or` is a reserved word in Python, and therefore must be referred to in Gremlin with `or_()`.
====
[gremlin-groovy,modern]
----
g.V().or(
__.outE('created'),
__.inE('created').count().is(gt(1))).
values('name')
----
The `or()`-step can take an arbitrary number of traversals. At least one of the traversals must produce at least one
output for the original traverser to pass to the next step.
An link:http://en.wikipedia.org/wiki/Infix_notation[infix notation] can be used as well.
[gremlin-groovy,modern]
----
g.V().where(outE('created').or().outE('knows')).values('name')
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#or-org.apache.tinkerpop.gremlin.process.traversal.Traversal...-++[`or(Traversal...)`]
[[order-step]]
=== Order Step
When the objects of the traversal stream need to be sorted, `order()`-step (*map*) can be leveraged.
[gremlin-groovy,modern]
----
g.V().values('name').order()
g.V().values('name').order().by(desc)
g.V().hasLabel('person').order().by('age', asc).values('name')
----
One of the most traversed objects in a traversal is an `Element`. An element can have properties associated with it
(i.e. key/value pairs). In many situations, it is desirable to sort an element traversal stream according to a
comparison of their properties.
[gremlin-groovy,modern]
----
g.V().values('name')
g.V().order().by('name',asc).values('name')
g.V().order().by('name',desc).values('name')
----
The `order()`-step allows the user to provide an arbitrary number of comparators for primary, secondary, etc. sorting.
In the example below, the primary ordering is based on the outgoing created-edge count. The secondary ordering is
based on the age of the person.
[gremlin-groovy,modern]
----
g.V().hasLabel('person').order().by(outE('created').count(), asc).
by('age', asc).values('name')
g.V().hasLabel('person').order().by(outE('created').count(), asc).
by('age', desc).values('name')
----
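Primary and secondary ordering can be modelled in plain Python by chaining stable sorts. The sketch below is an illustration, not the `order()` implementation: the helper `order_by` is hypothetical, and the tuples are hand-copied from the "modern" toy graph as (name, age, number of outgoing created-edges).

```python
people = [("marko", 29, 1), ("vadas", 27, 0), ("josh", 32, 2), ("peter", 35, 1)]

def order_by(rows, *comparators):
    """Sort rows by (key function, descending flag) pairs; earlier pairs win."""
    ordered = list(rows)
    # Stable sorts applied from the last comparator to the first
    # produce a multi-level ordering.
    for key, descending in reversed(comparators):
        ordered.sort(key=key, reverse=descending)
    return ordered

def created_count(person):
    return person[2]

def person_age(person):
    return person[1]

asc_asc = order_by(people, (created_count, False), (person_age, False))
asc_desc = order_by(people, (created_count, False), (person_age, True))
print([p[0] for p in asc_asc])   # ['vadas', 'marko', 'peter', 'josh']
print([p[0] for p in asc_desc])  # ['vadas', 'peter', 'marko', 'josh']
```

Only marko and peter tie on the primary comparator, so they are the only pair whose relative position depends on the secondary one.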
Randomizing the order of the traversers at a particular point in the traversal is possible with `Order.shuffle`.
[gremlin-groovy,modern]
----
g.V().hasLabel('person').order().by(shuffle)
g.V().hasLabel('person').order().by(shuffle)
----
It is possible to use `order(local)` to order the current local object and not the entire traversal stream. This works for
`Collection`- and `Map`-type objects. For any other object, the object is returned unchanged.
[gremlin-groovy,modern]
----
g.V().values('age').fold().order(local).by(desc) <1>
g.V().values('age').order(local).by(desc) <2>
g.V().groupCount().by(inE().count()).order(local).by(values, desc) <3>
g.V().groupCount().by(inE().count()).order(local).by(keys, asc) <4>
----
<1> The ages are gathered into a list and then that list is sorted in decreasing order.
<2> The ages are not gathered and thus `order(local)` is "ordering" single integers and thus, does nothing.
<3> The `groupCount()` map is ordered by its values in decreasing order.
<4> The `groupCount()` map is ordered by its keys in increasing order.
NOTE: The `values` and `keys` enums are from `Column` which is used to select "columns" from a `Map`, `Map.Entry`, or `Path`.
If a property key does not exist, then it will be treated as `null` which will sort it first for `Order.asc` and last
for `Order.desc`.
[gremlin-groovy,modern]
----
g.V().order().by("age").elementMap()
----
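The placement of elements that lack the sort key can be modelled in plain Python, with a missing dictionary key standing in for a missing property. This is a sketch of the rule stated above, not TinkerPop code: the function name is hypothetical and the dictionaries are hand-copied from the "modern" toy graph.

```python
vertices = [
    {"name": "marko", "age": 29}, {"name": "vadas", "age": 27},
    {"name": "lop"}, {"name": "josh", "age": 32},
    {"name": "ripple"}, {"name": "peter", "age": 35},
]

def order_by_property(elements, key, descending=False):
    """Sort dicts by a key; a missing key counts as None, the lowest value."""
    def sort_key(element):
        value = element.get(key)
        # (False, ...) sorts before (True, value), so elements without the
        # key come first when ascending and last when descending.
        return (value is not None, value if value is not None else 0)
    return sorted(elements, key=sort_key, reverse=descending)

print([v["name"] for v in order_by_property(vertices, "age")])
print([v["name"] for v in order_by_property(vertices, "age", descending=True)])
```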
NOTE: Prior to version 3.3.4, ordering was defined by `Order.incr` for ascending order and `Order.decr` for descending
order. Those tokens were deprecated and eventually removed in 3.5.0.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#order--++[`order()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#order-org.apache.tinkerpop.gremlin.process.traversal.Scope-++[`order(Scope)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Order.html++[`Order`]
[[pagerank-step]]
=== PageRank Step
The `pageRank()`-step (*map*/*sideEffect*) calculates link:http://en.wikipedia.org/wiki/PageRank[PageRank] using
<<pagerankvertexprogram,`PageRankVertexProgram`>>.
IMPORTANT: The `pageRank()`-step is a `VertexComputing`-step and as such, can only be used against a graph that
supports `GraphComputer` (OLAP).
[gremlin-groovy,modern]
----
g = traversal().withEmbedded(graph).withComputer()
g.V().pageRank().with(PageRank.propertyName, 'friendRank').values('friendRank')
g.V().hasLabel('person').
pageRank().
with(PageRank.edges, __.outE('knows')).
with(PageRank.propertyName, 'friendRank').
order().by('friendRank',desc).
elementMap('name','friendRank')
----
Note the use of the `with()` modulating step which provides configuration options to the algorithm. It takes
configuration keys from the `PageRank` class, which is automatically imported to the Gremlin Console.
The <<explain-step,`explain()`>>-step can be used to understand how the traversal is compiled into multiple
`GraphComputer` jobs.
[gremlin-groovy,modern]
----
g = traversal().withEmbedded(graph).withComputer()
g.V().hasLabel('person').
pageRank().
with(PageRank.edges, __.outE('knows')).
with(PageRank.propertyName, 'friendRank').
order().by('friendRank',desc).
elementMap('name','friendRank').explain()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#pageRank--++[`pageRank()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#pageRank-double-++[`pageRank(double)`]
[[path-step]]
=== Path Step
A traverser is transformed as it moves through a series of steps within a traversal. The history of the traverser is
realized by examining its path with `path()`-step (*map*).
image::path-step.png[width=650]
[gremlin-groovy,modern]
----
g.V().out().out().values('name')
g.V().out().out().values('name').path()
----
If edges are required in the path, then be sure to traverse those edges explicitly.
[gremlin-groovy,modern]
----
g.V().outE().inV().outE().inV().path()
----
It is possible to post-process the elements of the path in a round-robin fashion via `by()`.
[gremlin-groovy,modern]
----
g.V().out().out().path().by('name').by('age')
----
Finally, because post-processing is `by()`-based, nothing prevents triggering yet another traversal. In the traversal
below, for each element of the path traversed thus far, if it's a person (as determined by its `person` label),
then get all of their creations, else if it's a creation, get all the people that created it.
[gremlin-groovy,modern]
----
g.V().out().out().path().by(
choose(hasLabel('person'),
out('created').values('name'),
__.in('created').values('name')).fold())
----
It's possible to limit the path using the <<to-step,`to()`>> or <<from-step,`from()`>> step modulators.
[gremlin-groovy,modern]
----
g.V().has('person','name','vadas').as('e').
in('knows').
out('knows').where(neq('e')).
path().by('name') <1>
g.V().has('person','name','vadas').as('e').
in('knows').as('m').
out('knows').where(neq('e')).
path().to('m').by('name') <2>
g.V().has('person','name','vadas').as('e').
in('knows').as('m').
out('knows').where(neq('e')).
path().from('m').by('name') <3>
----
<1> Obtain the full path from vadas to josh.
<2> Save the middle node, marko, and use the `to()` modulator to show only the path from vadas to marko.
<3> Use the `from()` modulator to show only the path from marko to josh.
WARNING: Generating path information is expensive as the history of the traverser is stored into a Java list. With
numerous traversers, there are numerous lists. Moreover, in an OLAP <<graphcomputer,`GraphComputer`>> environment
this becomes exceedingly prohibitive as there are traversers emanating from all vertices in the graph in parallel.
In OLAP there are optimizations provided for traverser populations, but when paths are calculated (and each traverser
is unique due to its history), then these optimizations are no longer possible.
[[path-data-structure]]
==== Path Data Structure
The `Path` data structure is an ordered list of objects, where each object is associated with a `Set<String>` of
labels. An example is presented below to demonstrate both the `Path` API as well as how a traversal yields labeled paths.
image::path-data-structure.png[width=350]
[gremlin-groovy,modern]
----
path = g.V(1).as('a').has('name').as('b').
out('knows').out('created').as('c').
has('name','ripple').values('name').as('d').
identity().as('e').path().next()
path.size()
path.objects()
path.labels()
path.a
path.b
path.c
path.d == path.e
----
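The shape of a labeled path can be modelled with a small plain-Python class. This is only a sketch of the data structure described above, not the TinkerPop `Path` interface: the class and its methods `extend`, `label_head` and `get` are hypothetical, each label is assumed to be used once, and the strings stand in for the vertices and value of the example traversal.

```python
class Path:
    """Ordered objects, each paired with a set of labels."""

    def __init__(self):
        self.objects = []
        self.labels = []

    def extend(self, obj, *labels):
        """Append a new object together with its labels."""
        self.objects.append(obj)
        self.labels.append(set(labels))
        return self

    def label_head(self, *labels):
        """Add labels to the most recent object without growing the path."""
        self.labels[-1].update(labels)
        return self

    def get(self, label):
        """Return the object that carries the given label."""
        for obj, labels in zip(self.objects, self.labels):
            if label in labels:
                return obj
        raise KeyError(label)

    def __len__(self):
        return len(self.objects)

path = (Path()
        .extend("v[1]", "a")     # V(1).as('a')
        .label_head("b")         # has('name').as('b') keeps the same object
        .extend("v[4]")          # out('knows'), unlabeled
        .extend("v[5]", "c")     # out('created').as('c')
        .extend("ripple", "d")   # values('name').as('d')
        .label_head("e"))        # identity().as('e') keeps the same object

print(len(path))                         # 4
print(path.get("d") == path.get("e"))    # True
```

Steps that do not change the current object only add labels to the last entry, which is why the path has four entries even though five labels were declared.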
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#path--++[`path()`]
[[peerpressure-step]]
=== PeerPressure Step
The `peerPressure()`-step (*map*/*sideEffect*) clusters vertices using <<peerpressurevertexprogram,`PeerPressureVertexProgram`>>.
IMPORTANT: The `peerPressure()`-step is a `VertexComputing`-step and as such, can only be used against a graph that supports `GraphComputer` (OLAP).
[gremlin-groovy,modern]
----
g = traversal().withEmbedded(graph).withComputer()
g.V().peerPressure().with(PeerPressure.propertyName, 'cluster').values('cluster')
g.V().hasLabel('person').
peerPressure().
with(PeerPressure.propertyName, 'cluster').
group().
by('cluster').
by('name')
----
Note the use of the `with()` modulating step which provides configuration options to the algorithm. It takes
configuration keys from the `PeerPressure` class, which is automatically imported to the Gremlin Console.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#peerPressure--++[`peerPressure()`]
[[profile-step]]
=== Profile Step
The `profile()`-step (*sideEffect*) exists to allow developers to profile their traversals to determine statistical
information like step runtime, counts, etc.
WARNING: Profiling a Traversal will impede the Traversal's performance. This overhead is mostly excluded from the
profile results, but durations are not exact. Thus, durations are best considered in relation to each other.
[gremlin-groovy,modern]
----
g.V().out('created').repeat(both()).times(3).hasLabel('person').values('age').sum().profile()
----
The `profile()`-step generates a `TraversalMetrics` sideEffect object that contains the following information:
* `Step`: A step within the traversal being profiled.
* `Count`: The number of _represented_ traversers that passed through the step.
* `Traversers`: The number of traversers that passed through the step.
* `Time (ms)`: The total time the step was actively executing its behavior.
* `% Dur`: The percentage of total time spent in the step.
image:gremlin-exercise.png[width=120,float=left] It is important to understand the difference between "Count"
and "Traversers". Traversers can be merged and as such, when two traversers are "the same" they may be aggregated
into a single traverser. That new traverser has a `Traverser.bulk()` that is the sum of the two merged traverser
bulks. On the other hand, the `Count` represents the sum of all `Traverser.bulk()` results and thus, expresses the
number of "represented" (not enumerated) traversers. `Traversers` will always be less than or equal to `Count`.
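The relationship between the two columns can be illustrated with a small, self-contained Java sketch. `BulkSketch` is a hypothetical helper, not TinkerPop's `TraverserSet`; it only assumes that traversers located at the same object are merged and that their bulks are added.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of bulking; not TinkerPop's TraverserSet.
final class BulkSketch {
    // Merge traversers that sit at the same object: key is the object, value is the summed bulk.
    static Map<String, Long> bulk(List<String> locations) {
        Map<String, Long> merged = new LinkedHashMap<>();
        for (String location : locations) {
            merged.merge(location, 1L, Long::sum);
        }
        return merged;
    }

    // "Traversers": the number of enumerated (merged) traversers.
    static long traversers(Map<String, Long> merged) {
        return merged.size();
    }

    // "Count": the number of represented traversers, i.e. the sum of all bulks.
    static long count(Map<String, Long> merged) {
        long total = 0L;
        for (long b : merged.values()) {
            total += b;
        }
        return total;
    }
}
```

Four traversers located at `lop`, `lop`, `ripple`, and `lop` merge into two traversers (bulks 3 and 1), so `Traversers` is 2 while `Count` remains 4.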
For traversal compilation information, please see <<explain-step,`explain()`>>-step.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#profile--++[`profile()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#profile-java.lang.String-++[`profile(String)`]
[[project-step]]
=== Project Step
The `project()`-step (*map*) projects the current object into a `Map<String,Object>` keyed by provided labels. It is similar
to <<select-step,`select()`>>-step, save that instead of retrieving and modulating historic traverser state, it modulates
the current state of the traverser.
[gremlin-groovy,modern]
----
g.V().has('name','marko').
project('id', 'name', 'out', 'in').
by(id).
by('name').
by(outE().count()).
by(inE().count())
g.V().has('name','marko').
project('name', 'friendsNames').
by('name').
by(out('knows').values('name').fold())
g.V().out('created').
project('a','b').
by('name').
by(__.in('created').count()).
order().by(select('b'),desc).
select('a')
----
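Conceptually, `project()` pairs each key with one `by()` modulator and applies that modulator to the current object. The plain-Java sketch below illustrates that pairing only; `ProjectSketch` is a hypothetical name, and requiring exactly one function per key is a simplification of this sketch rather than a statement about how Gremlin treats missing `by()` modulators.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of project(); not a TinkerPop class.
final class ProjectSketch {
    // Build a map whose i-th key is filled by applying the i-th function to the current object.
    static <S> Map<String, Object> project(S current, List<String> keys, List<Function<S, Object>> bys) {
        if (keys.size() != bys.size()) {
            throw new IllegalArgumentException("This sketch expects one function per key");
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            result.put(keys.get(i), bys.get(i).apply(current));
        }
        return result;
    }
}
```

Note that every function receives the same current object; nothing is read from the traverser's history, which is the distinction from `select()` drawn above.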
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#project-java.lang.String-java.lang.String...-++[`project(String,String...)`]
[[program-step]]
=== Program Step
The `program()`-step (*map*/*sideEffect*) is the "lambda" step for `GraphComputer` jobs. The step takes a
<<vertexprogram,`VertexProgram`>> as an argument and will process the incoming graph accordingly. Thus, the user
can create their own `VertexProgram` and have it execute within a traversal. The configuration provided to the
vertex program includes:
* `gremlin.vertexProgramStep.rootTraversal` is a serialization of a `PureTraversal` form of the root traversal.
* `gremlin.vertexProgramStep.stepId` is the step string id of the `program()`-step being executed.
The user supplied `VertexProgram` can leverage that information accordingly within their vertex program. Example uses
are provided below.
WARNING: Developing a `VertexProgram` is for expert users. Moreover, developing one that can be used effectively within
a traversal requires yet more expertise. This information is intended for advanced users with a deep understanding of the
mechanics of Gremlin OLAP (<<graphcomputer,`GraphComputer`>>).
[source,java]
----
private TraverserSet<Object> haltedTraversers;

public void loadState(Graph graph, Configuration configuration) {
  VertexProgram.super.loadState(graph, configuration);
  this.traversal = PureTraversal.loadState(configuration, VertexProgramStep.ROOT_TRAVERSAL, graph);
  this.programStep = new TraversalMatrix<>(this.traversal.get()).getStepById(configuration.getString(ProgramVertexProgramStep.STEP_ID));
  // if the traversal sideEffects will be used in the computation, add them as memory compute keys
  this.memoryComputeKeys.addAll(MemoryTraversalSideEffects.getMemoryComputeKeys(this.traversal.get()));
  // if master-traversal traversers may be propagated, create a memory compute key
  this.memoryComputeKeys.add(MemoryComputeKey.of(TraversalVertexProgram.HALTED_TRAVERSERS, Operator.addAll, false, false));
  // returns an empty traverser set if there are no halted traversers
  this.haltedTraversers = TraversalVertexProgram.loadHaltedTraversers(configuration);
}

public void storeState(Configuration configuration) {
  VertexProgram.super.storeState(configuration);
  // if halted traversers is null or empty, it does nothing
  TraversalVertexProgram.storeHaltedTraversers(configuration, this.haltedTraversers);
}

public void setup(Memory memory) {
  if(!this.haltedTraversers.isEmpty()) {
    // do what you like with the halted master traversal traversers
  }
  // once used, no need to keep that information around (master)
  this.haltedTraversers = null;
}

public void execute(Vertex vertex, Messenger messenger, Memory memory) {
  // once used, no need to keep that information around (workers)
  if(null != this.haltedTraversers)
    this.haltedTraversers = null;

  if(vertex.property(TraversalVertexProgram.HALTED_TRAVERSERS).isPresent()) {
    // haltedTraversers in execute() represent worker-traversal traversers
    // for example, from a traversal of the form g.V().out().program(...)
    TraverserSet<Object> haltedTraversers = vertex.value(TraversalVertexProgram.HALTED_TRAVERSERS);
    // create a new halted traverser set that can be used by the next OLAP job in the chain
    // these are worker-traversers that are distributed throughout the graph
    TraverserSet<Object> newHaltedTraversers = new TraverserSet<>();
    haltedTraversers.forEach(traverser -> {
      newHaltedTraversers.add(traverser.split(traverser.get().toString(), this.programStep));
    });
    vertex.property(VertexProperty.Cardinality.single, TraversalVertexProgram.HALTED_TRAVERSERS, newHaltedTraversers);
    // it is possible to create master-traversers that are localized to the master traversal (this is how results are ultimately delivered back to the user)
    memory.add(TraversalVertexProgram.HALTED_TRAVERSERS,
               new TraverserSet<>(this.traversal.get().getTraverserGenerator().generate("an example", this.programStep, 1L)));
  }
}

public boolean terminate(Memory memory) {
  // the master-traversal will have halted traversers
  assert memory.exists(TraversalVertexProgram.HALTED_TRAVERSERS);
  TraverserSet<String> haltedTraversers = memory.get(TraversalVertexProgram.HALTED_TRAVERSERS);
  // it will only have the traversers sent to the master traversal via memory
  assert haltedTraversers.stream().map(Traverser::get).filter(s -> s.equals("an example")).findAny().isPresent();
  // it will not contain the worker traversers distributed throughout the vertices
  assert !haltedTraversers.stream().map(Traverser::get).filter(s -> !s.equals("an example")).findAny().isPresent();
  return true;
}
----
NOTE: The test case `ProgramTest` in `gremlin-test` has an example vertex program called `TestProgram` that demonstrates
all the various ways in which traversal and traverser information is propagated within a vertex program and ultimately
usable by other vertex programs (including `TraversalVertexProgram`) down the line in an OLAP compute chain.
Finally, the example below runs `PageRankVertexProgram` through `program()` rather than through the <<pagerank-step,`pageRank()`>>-step.
[gremlin-groovy,modern]
----
g = traversal().withEmbedded(graph).withComputer()
g.V().hasLabel('person').
program(PageRankVertexProgram.build().property('rank').create(graph)).
order().by('rank', asc).
elementMap('name', 'rank')
----
[[properties-step]]
=== Properties Step
The `properties()`-step (*map*) extracts properties from an `Element` in the traversal stream.
[gremlin-groovy,theCrew]
----
g.V(1).properties()
g.V(1).properties('location').valueMap()
g.V(1).properties('location').has('endTime').valueMap()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#properties-java.lang.String...-++[`properties(String...)`]
[[propertymap-step]]
=== PropertyMap Step
The `propertyMap()`-step yields a Map representation of the properties of an element.
[gremlin-groovy,modern]
----
g.V().propertyMap()
g.V().propertyMap('age')
g.V().propertyMap('age','blah')
g.E().propertyMap()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#propertyMap-java.lang.String...-++[`propertyMap(String...)`]
[[range-step]]
=== Range Step
As traversers propagate through the traversal, it is possible to only allow a certain number of them to pass through
with `range()`-step (*filter*). While the low end of the range has not been reached, objects are iterated but not
emitted. When within the low (inclusive) and high (exclusive) range, traversers are emitted. When above the high range,
the traversal breaks out of iteration. Finally, the use of `-1` on the high range will emit all remaining traversers
once the low end of the range has been reached.
[gremlin-groovy,modern]
----
g.V().range(0,3)
g.V().range(1,3)
g.V().range(1, -1)
g.V().repeat(both()).times(1000000).emit().range(6,10)
----
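The low/high semantics described above can be sketched over a plain Java `Iterator`. `RangeSketch` is a hypothetical helper and not TinkerPop's range step implementation; it only assumes zero-based positions, an inclusive low bound, an exclusive high bound, and `-1` meaning "no upper bound".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of range(low, high); not a TinkerPop class.
final class RangeSketch {
    // Emit items whose zero-based position lies in [low, high); a high of -1 means "no upper bound".
    static <T> List<T> range(Iterator<T> source, long low, long high) {
        List<T> emitted = new ArrayList<>();
        long position = 0L;
        while (source.hasNext()) {
            if (high != -1L && position >= high) {
                break; // above the high end: stop pulling from upstream
            }
            T item = source.next();
            if (position >= low) {
                emitted.add(item); // within [low, high): emit
            }
            position++;
        }
        return emitted;
    }
}
```

Because the loop stops pulling from the source as soon as the high end is reached, the sketch also terminates on an unbounded source, which mirrors why the `times(1000000)` example above can return without exhausting the repeat.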
The `range()`-step can also be applied with `Scope.local`, in which case it operates on the incoming collection.
For example, it is possible to produce a `Map<String, String>` for each traversed path, but containing only the second
property value (the "b" step).
[gremlin-groovy,modern]
----
g.V().as('a').out().as('b').in().as('c').select('a','b','c').by('name').range(local,1,2)
----
The next example uses the <<the-crew-toy-graph,The Crew>> toy data set. It produces a `List<String>` containing the
second and third location for each vertex.
[gremlin-groovy,theCrew]
----
g.V().valueMap().select('location').range(local, 1, 3)
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#range-long-long-++[`range(long,long)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#range-org.apache.tinkerpop.gremlin.process.traversal.Scope-long-long-++[`range(Scope,long,long)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[read-step]]
=== Read Step
The `read()`-step is not really a "step" but a step modulator in that it modifies the functionality of the `io()`-step.
More specifically, it tells the `io()`-step that it is expected to use its configuration to read data from some
location. Please see the <<io-step,documentation>> for `io()`-step for more complete details on usage.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#read--++[`read()`]
[[repeat-step]]
=== Repeat Step
image::gremlin-fade.png[width=350]
The `repeat()`-step (*branch*) is used for looping over a traversal given some break predicate. Below are some
examples of `repeat()`-step in action.
[gremlin-groovy,modern]
----
g.V(1).repeat(out()).times(2).path().by('name') <1>
g.V().until(has('name','ripple')).
repeat(out()).path().by('name') <2>
----
<1> do-while semantics stating to do `out()` 2 times.
<2> while-do semantics stating to break if the traverser is at a vertex named "ripple".
IMPORTANT: There are two modulators for `repeat()`: `until()` and `emit()`. If `until()` comes after `repeat()` it is
do/while looping. If `until()` comes before `repeat()` it is while/do looping. If `emit()` is placed after `repeat()`,
it is evaluated on the traversers leaving the repeat-traversal. If `emit()` is placed before `repeat()`, it is
evaluated on the traversers prior to entering the repeat-traversal.
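The effect of placing `until()` before or after `repeat()` can be sketched in plain Java. `RepeatSketch` is hypothetical and is not how TinkerPop executes `repeat()`; it hard-codes the out-adjacency of the "modern" toy graph by vertex id (1=marko, 2=vadas, 3=lop, 4=josh, 5=ripple, 6=peter) and models only `repeat(out())`. The loops terminate because that adjacency is acyclic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration of until() placement; not a TinkerPop class.
final class RepeatSketch {
    // Out-adjacency of the "modern" toy graph, keyed by vertex id.
    static final Map<Integer, List<Integer>> OUT = new HashMap<>();
    static {
        OUT.put(1, Arrays.asList(3, 2, 4));
        OUT.put(4, Arrays.asList(5, 3));
        OUT.put(6, Arrays.asList(3));
    }

    static List<Integer> out(int vertex) {
        return OUT.getOrDefault(vertex, Collections.emptyList());
    }

    // until() before repeat(): test the predicate first, then loop (while-do).
    static List<Integer> whileDo(List<Integer> starts, Predicate<Integer> until) {
        List<Integer> results = new ArrayList<>();
        List<Integer> frontier = new ArrayList<>(starts);
        while (!frontier.isEmpty()) {
            List<Integer> next = new ArrayList<>();
            for (int v : frontier) {
                if (until.test(v)) {
                    results.add(v);       // predicate met: leave the loop
                } else {
                    next.addAll(out(v));  // otherwise run the repeat-traversal
                }
            }
            frontier = next;
        }
        return results;
    }

    // until() after repeat(): run the repeat-traversal once, then test (do-while).
    static List<Integer> doWhile(List<Integer> starts, Predicate<Integer> until) {
        List<Integer> results = new ArrayList<>();
        List<Integer> frontier = new ArrayList<>(starts);
        while (!frontier.isEmpty()) {
            List<Integer> next = new ArrayList<>();
            for (int v : frontier) {
                for (int w : out(v)) {
                    if (until.test(w)) {
                        results.add(w);
                    } else {
                        next.add(w);
                    }
                }
            }
            frontier = next;
        }
        return results;
    }
}
```

Starting at ripple (id 5) with the predicate "is ripple", `whileDo` returns ripple immediately, whereas `doWhile` first takes `out()`, finds no outgoing edges, and returns nothing. Starting at marko (id 1), both variants reach ripple.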
The `repeat()`-step also supports an "emit predicate", where the predicate for an empty argument `emit()` is
`true` (i.e. `emit() == emit{true}`). With `emit()`, the traverser is split in two: one copy exits the code
block and the other continues back within the code block (assuming the `until()` condition has not yet been met).
[gremlin-groovy,modern]
----
g.V(1).repeat(out()).times(2).emit().path().by('name') <1>
g.V(1).emit().repeat(out()).times(2).path().by('name') <2>
----
<1> The `emit()` comes after `repeat()` and thus, emission happens after the `repeat()` traversal is executed. Thus,
no one-vertex paths exist.
<2> The `emit()` comes before `repeat()` and thus, emission happens prior to the `repeat()` traversal being executed.
Thus, one-vertex paths exist.
The `emit()`-modulator can take an arbitrary predicate.
[gremlin-groovy,modern]
----
g.V(1).repeat(out()).times(2).emit(has('lang')).path().by('name')
----
image::repeat-step.png[width=500]
[gremlin-groovy,modern]
----
g.V(1).repeat(out()).times(2).emit().path().by('name')
----
The first time through the `repeat()`, the vertices lop, vadas, and josh are seen. Given that `loops==1`, the
traverser repeats. However, because the emit-predicate is declared true, those vertices are emitted. The next time through
`repeat()`, the vertices traversed are ripple and lop (Josh's created projects, as lop and vadas have no out edges).
Given that `loops==2`, the until-predicate fails and ripple and lop are emitted.
Therefore, the traverser has seen the vertices: lop, vadas, josh, ripple, and lop.
`repeat()`-steps may be nested inside each other or inside the `emit()` or `until()` predicates. They can also be 'named' by passing a string as the first parameter to `repeat()`. The loop counter of a named repeat step can be accessed within the looped context with `loops(loopName)`, where `loopName` is the name set when creating the `repeat()`-step.
[gremlin-groovy,modern]
----
g.V(1).
repeat(out("knows")).
until(repeat(out("created")).emit(has("name", "lop"))) <1>
g.V(6).
repeat('a', both('created').simplePath()).
emit(repeat('b', both('knows')).
until(loops('b').as('b').where(loops('a').as('b'))).
hasId(2)).dedup() <2>
----
<1> Starting from vertex 1, keep taking outgoing 'knows' edges until reaching a vertex that created 'lop'.
<2> Starting from vertex 6, keep taking created edges in either direction until the vertex is the same distance from vertex 2 over knows edges as it is from vertex 6 over created edges.
Finally, note that both `emit()` and `until()` can take a traversal and, in such situations, the predicate is
determined by `traversal.hasNext()`. A few examples are provided below.
[gremlin-groovy,modern]
----
g.V(1).repeat(out()).until(hasLabel('software')).path().by('name') <1>
g.V(1).emit(hasLabel('person')).repeat(out()).path().by('name') <2>
g.V(1).repeat(out()).until(outE().count().is(0)).path().by('name') <3>
----
<1> Starting from vertex 1, keep taking outgoing edges until a software vertex is reached.
<2> Starting from vertex 1, and in an infinite loop, emit the vertex if it is a person and then traverse the outgoing edges.
<3> Starting from vertex 1, keep taking outgoing edges until a vertex is reached that has no more outgoing edges.
WARNING: The anonymous traversals of `emit()` and `until()` (not `repeat()`) process their current objects "locally."
In OLAP, where the atomic unit of computing is the vertex and its local "star graph," it is important that the
anonymous traversals do not leave the confines of the vertex's star graph. In other words, they can not traverse to
an adjacent vertex's properties or edges.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#repeat-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`repeat(Traversal)`],
link:++https://tinkerpop.apache.org/docs/x.y.z/recipes/#looping++[`Looping Recipes`]
[[sack-step]]
=== Sack Step
image:gremlin-sacks-running.png[width=175,float=right] A traverser can contain a local data structure called a "sack".
The `sack()`-step is used to read and write sacks (*sideEffect* or *map*). Each sack of each traverser is created
when using `GraphTraversalSource.withSack(initialValueSupplier,splitOperator?,mergeOperator?)`.
* *Initial value supplier*: A `Supplier` providing the initial value of each traverser's sack.
* *Split operator*: A `UnaryOperator` that clones the traverser's sack when the traverser splits. If no split operator
is provided, then `UnaryOperator.identity()` is assumed.
* *Merge operator*: A `BinaryOperator` that unites two traversers' sacks when they are merged. If no merge operator is
provided, then traversers with sacks can not be merged.
Two trivial examples are presented below to demonstrate the *initial value supplier*. In the first example below, a
traverser is created at each vertex in the graph (`g.V()`), with a 1.0 sack (`withSack(1.0f)`), and then the sack
value is accessed (`sack()`). In the second example, a random float supplier is used to generate sack values.
[gremlin-groovy,modern]
----
g.withSack(1.0f).V().sack()
rand = new Random()
g.withSack {rand.nextFloat()}.V().sack()
----
A more complicated initial value supplier example is presented below where the sack values are used in a running
computation and then emitted at the end of the traversal. When an edge is traversed, the edge weight is multiplied
by the sack value (`sack(mult).by('weight')`). Note that the <<by-step,`by()`>>-modulator can be any arbitrary traversal.
[gremlin-groovy,modern]
----
g.withSack(1.0f).V().repeat(outE().sack(mult).by('weight').inV()).times(2)
g.withSack(1.0f).V().repeat(outE().sack(mult).by('weight').inV()).times(2).sack()
g.withSack(1.0f).V().repeat(outE().sack(mult).by('weight').inV()).times(2).path().
by().by('weight')
----
image:gremlin-sacks-standing.png[width=100,float=left] When complex objects are used (i.e. non-primitives), then a
*split operator* should be defined to ensure that each traverser gets a clone of its parent's sack. The first example
does not use a split operator and as such, the same map is propagated to all traversers (a global data structure). The
second example demonstrates how `Map.clone()` ensures that each traverser's sack contains a unique, local sack.
[gremlin-groovy,modern]
----
g.withSack {[:]}.V().out().out().
sack {m,v -> m[v.value('name')] = v.value('lang'); m}.sack() // BAD: single map
g.withSack {[:]}{it.clone()}.V().out().out().
sack {m,v -> m[v.value('name')] = v.value('lang'); m}.sack() // GOOD: cloned map
----
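The difference between the two examples comes down to whether the split operator returns the same object or a copy. The plain-Java sketch below illustrates that point; `SackSplitSketch` and its nested `Traverser` are hypothetical types that hold nothing but a sack, and they are not TinkerPop classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration of the split operator; not a TinkerPop class.
final class SackSplitSketch {
    // A minimal traverser that carries only a sack.
    static final class Traverser<S> {
        final S sack;

        Traverser(S sack) {
            this.sack = sack;
        }

        // Splitting derives the child's sack by applying the split operator to the parent's sack.
        Traverser<S> split(UnaryOperator<S> splitOperator) {
            return new Traverser<>(splitOperator.apply(sack));
        }
    }

    // Returns true when a write through the child's sack is visible in the parent's sack.
    static boolean childWriteLeaksToParent(UnaryOperator<Map<String, String>> splitOperator) {
        Map<String, String> initial = new HashMap<>();
        Traverser<Map<String, String>> parent = new Traverser<>(initial);
        Traverser<Map<String, String>> child = parent.split(splitOperator);
        child.sack.put("lop", "java");
        return parent.sack.containsKey("lop");
    }
}
```

With `UnaryOperator.identity()` the child and parent share one map, so the write leaks; with a copying operator such as `m -> new HashMap<>(m)` each traverser keeps its own map.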
NOTE: For primitives (i.e. integers, longs, floats, etc.), a split operator is not required, as primitives are
encoded in the memory address of the sack, not as a reference to an object.
If a *merge operator* is not provided, then traversers with sacks can not be bulked. However, in many situations,
merging the sacks of two traversers at the same location is algorithmically sound and good to provide so as to gain
the bulking optimization. In the examples below, the binary merge operator is `Operator.sum`. Thus, when two traversers
merge, their respective sacks are added together.
[gremlin-groovy,modern]
----
g.withSack(1.0d).V(1).out('knows').in('knows') <1>
g.withSack(1.0d).V(1).out('knows').in('knows').sack() <2>
g.withSack(1.0d, sum).V(1).out('knows').in('knows').sack() <3>
g.withSack(1.0d).V(1).local(outE('knows').barrier(normSack).inV()).in('knows').barrier() <4>
g.withSack(1.0d).V(1).local(outE('knows').barrier(normSack).inV()).in('knows').barrier().sack() <5>
g.withSack(1.0d,sum).V(1).local(outE('knows').barrier(normSack).inV()).in('knows').barrier().sack() <6>
g.withBulk(false).withSack(1.0f,sum).V(1).local(outE('knows').barrier(normSack).inV()).in('knows').barrier().sack() <7>
g.withBulk(false).withSack(1.0f).V(1).local(outE('knows').barrier(normSack).inV()).in('knows').barrier().sack() <8>
----
<1> We find vertex 1 twice because he knows two other people.
<2> Without a merge operation the sack values are 1.0.
<3> When specifying `sum` as the merge operation, the sack values are 2.0 because of bulking.
<4> Like 1, but using barrier internally.
<5> The `local(...barrier(normSack)...)` ensures that all traversers leaving vertex 1 have an evenly distributed amount of the initial 1.0 "energy" (50-50), i.e. the sack is 0.5 on each result.
<6> Like 3, but using `sum` as merge operator leads to the expected 1.0.
<7> There is now a single traverser with bulk of 2 and sack of 1.0 and thus, setting `withBulk(false)` yields the expected 1.0.
<8> Like 7, but without the `sum` operator.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#sack--++[`sack()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#sack-java.util.function.BiFunction-++[`sack(BiFunction)`]
[[sample-step]]
=== Sample Step
The `sample()`-step is useful for sampling some number of traversers from earlier in the traversal.
[gremlin-groovy,modern]
----
g.V().outE().sample(1).values('weight')
g.V().outE().sample(1).by('weight').values('weight')
g.V().outE().sample(2).by('weight').values('weight')
----
One of the more interesting use cases for `sample()` is when it is used in conjunction with <<local-step,`local()`>>.
The combination of the two steps supports the execution of link:http://en.wikipedia.org/wiki/Random_walk[random walks].
In the example below, the traversal starts at vertex 1 and selects one edge to traverse based on a probability
distribution generated by the weights of the edges. The output is always a single path because, by selecting a single
edge, the traverser never splits and continues down a single path in the graph.
[gremlin-groovy,modern]
----
g.V(1).
repeat(local(bothE().sample(1).by('weight').otherV())).
times(5)
g.V(1).
repeat(local(bothE().sample(1).by('weight').otherV())).
times(5).
path()
g.V(1).
repeat(local(bothE().sample(1).by('weight').otherV())).
times(10).
path()
----
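Choosing one edge "based on a probability distribution generated by the weights" can be sketched in plain Java as roulette-wheel selection. This is an illustration of the idea only: `WeightedSampleSketch` is a hypothetical helper, and the algorithm shown is not necessarily the one TinkerPop's sample step uses internally.

```java
import java.util.List;
import java.util.Random;

// Hypothetical illustration of weight-biased sampling; not a TinkerPop class.
final class WeightedSampleSketch {
    // Pick one index with probability proportional to its non-negative weight.
    static int sampleOne(List<Double> weights, Random random) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("Weights must sum to a positive value");
        }
        double r = random.nextDouble() * total; // uniform in [0, total)
        double cumulative = 0.0;
        for (int i = 0; i < weights.size(); i++) {
            cumulative += weights.get(i);
            if (r < cumulative) {
                return i;
            }
        }
        return weights.size() - 1; // guards against floating-point rounding
    }
}
```

An edge with weight `0.0` is never chosen by this sketch, and an edge with twice the weight of another is chosen about twice as often over many draws.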
As a clarification, note that in the above example `local()` is not strictly required as it only does the random walk
over a single vertex, but note what happens without it if multiple vertices are traversed:
[gremlin-groovy,modern]
----
g.V().repeat(bothE().sample(1).by('weight').otherV()).times(5).path()
----
The use of `local()` ensures that the traversal over `bothE()` occurs once per vertex traverser that passes through,
thus allowing one random walk per vertex.
[gremlin-groovy,modern]
----
g.V().repeat(local(bothE().sample(1).by('weight').otherV())).times(5).path()
----
So, while not strictly required, it is likely better to be explicit with the use of `local()` so that the proper intent
of the traversal is expressed.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#sample-int-++[`sample(int)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#sample-org.apache.tinkerpop.gremlin.process.traversal.Scope-int-++[`sample(Scope,int)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[select-step]]
=== Select Step
link:http://en.wikipedia.org/wiki/Functional_programming[Functional languages] make use of function composition and
lazy evaluation to create complex computations from primitive operations. This is exactly what `Traversal` does. One
of the differentiating aspects of Gremlin's data flow approach to graph processing is that the flow need not always go
"forward," but in fact, can go back to a previously seen area of computation. Examples include <<path-step,`path()`>>
as well as the `select()`-step (*map*). There are two general ways to use `select()`-step.
. Select labeled steps within a path (as defined by `as()` in a traversal).
. Select objects out of a `Map<String,Object>` flow (i.e. a sub-map).
The first use case is demonstrated via example below.
[gremlin-groovy,modern]
----
g.V().as('a').out().as('b').out().as('c') // no select
g.V().as('a').out().as('b').out().as('c').select('a','b','c')
g.V().as('a').out().as('b').out().as('c').select('a','b')
g.V().as('a').out().as('b').out().as('c').select('a','b').by('name')
g.V().as('a').out().as('b').out().as('c').select('a') <1>
----
<1> If the selection is one step, no map is returned.
When there is only one label selected, then a single object is returned. This is useful for stepping back in a
computation and easily moving forward again on the object reverted to.
[gremlin-groovy,modern]
----
g.V().out().out()
g.V().out().out().path()
g.V().as('x').out().out().select('x')
g.V().out().as('x').out().select('x')
g.V().out().out().as('x').select('x') // pointless
----
NOTE: When executing a traversal with `select()` on a standard traversal engine (i.e. OLTP), `select()` will do its
best to avoid calculating the path history and instead, will rely on a global data structure for storing the currently
selected object. As such, if only a subset of the path walked is required, `select()` should be used over the more
resource intensive <<path-step,`path()`>>-step.
When the set of keys or values (i.e. columns) of a path or map are needed, use `select(keys)` and `select(values)`,
respectively. This is especially useful when one is only interested in the top N elements in a `groupCount()`
ranking.
[gremlin-groovy]
----
g = traversal().withEmbedded(graph)
g.io('data/grateful-dead.xml').read().iterate()
g.V().hasLabel('song').out('followedBy').groupCount().by('name').
order(local).by(values,desc).limit(local, 5)
g.V().hasLabel('song').out('followedBy').groupCount().by('name').
order(local).by(values,desc).limit(local, 5).select(keys)
g.V().hasLabel('song').out('followedBy').groupCount().by('name').
order(local).by(values,desc).limit(local, 5).select(keys).unfold()
----
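The "top N keys of a count map" pattern used above (`order(local).by(values,desc).limit(local, 5).select(keys)`) has a direct analogue in Java streams. The sketch below is only that analogue; `TopKeysSketch` is a hypothetical helper and does not call into TinkerPop.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of "order by values descending, limit, select keys".
final class TopKeysSketch {
    static <K> List<K> topKeys(Map<K, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<K, Long>comparingByValue().reversed()) // order(local).by(values, desc)
                .limit(n)                                                 // limit(local, n)
                .map(Map.Entry::getKey)                                   // select(keys)
                .collect(Collectors.toList());
    }
}
```

Given counts `{a=3, b=5, c=1}` and `n = 2`, the sketch yields the keys `b` and `a`, in that order.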
Similarly, for extracting the values from a path or map.
[gremlin-groovy]
----
g = traversal().withEmbedded(graph)
g.io('data/grateful-dead.xml').read().iterate()
g.V().hasLabel('song').out('sungBy').groupCount().by('name') <1>
g.V().hasLabel('song').out('sungBy').groupCount().by('name').select(values) <2>
g.V().hasLabel('song').out('sungBy').groupCount().by('name').select(values).unfold().
groupCount().order(local).by(values,desc).limit(local, 5) <3>
----
<1> Which artist sang how many songs?
<2> Get an anonymized set of song repertoire sizes.
<3> What are the 5 most common song repertoire sizes?
WARNING: Note that `by()`-modulation is not supported with `select(keys)` and `select(values)`.
There is also an option to supply a `Pop` operation to `select()` to manipulate `List` objects in the `Traverser`:
[gremlin-groovy,modern]
----
g.V(1).as("a").repeat(out().as("a")).times(2).select(first, "a")
g.V(1).as("a").repeat(out().as("a")).times(2).select(last, "a")
g.V(1).as("a").repeat(out().as("a")).times(2).select(all, "a")
----
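When a label such as `a` is attached to several positions in a path, `Pop` decides which of the labeled objects is returned. The sketch below shows that choice in plain Java; `PopSketch` and its nested enum are hypothetical and merely borrow the names `first`, `last`, and `all` from the examples above. The input list stands for the objects carrying the label, in path order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of Pop-based selection; not TinkerPop's Pop enum.
final class PopSketch {
    enum Pop { first, last, all }

    // Choose among the objects that carry the same label, given in path order.
    static <T> Object select(Pop pop, List<T> labeled) {
        if (labeled.isEmpty()) {
            throw new IllegalArgumentException("No object carries the label");
        }
        switch (pop) {
            case first:
                return labeled.get(0);
            case last:
                return labeled.get(labeled.size() - 1);
            default:
                return new ArrayList<>(labeled); // all: every labeled object, in path order
        }
    }
}
```

For a path in which `v[1]`, `v[4]`, and `v[5]` are all labeled `a`, `first` gives `v[1]`, `last` gives `v[5]`, and `all` gives the full list.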
In addition to the previously shown examples, where `select()` was used to select an element based on a static key, `select()` can also accept a traversal
that emits a key.
WARNING: Since the key used by `select(<traversal>)` cannot be determined at compile time, the `TraversalSelectStep` enables full path tracking.
[gremlin-groovy,modern]
----
g.withSideEffect("alias", ["marko":"okram"]).V(). <1>
values("name").sack(assign). <2>
optional(select("alias").select(sack())) <3>
----
<1> Inject a name alias map and start the traversal from all vertices.
<2> Select all `name` values and store them as the current traverser's sack value.
<3> Optionally select the alias for the current name from the injected map.
[[using-where-with-select]]
==== Using Where with Select
As with <<match-step,`match()`>>-step, it is possible to use `where()` with `select()`, as `where()` is a filter that
processes `Map<String,Object>` streams.
[gremlin-groovy,modern]
----
g.V().as('a').out('created').in('created').as('b').select('a','b').by('name') <1>
g.V().as('a').out('created').in('created').as('b').
select('a','b').by('name').where('a',neq('b')) <2>
g.V().as('a').out('created').in('created').as('b').
select('a','b'). <3>
where('a',neq('b')).
where(__.as('a').out('knows').as('b')).
select('a','b').by('name')
----
<1> A standard `select()` that generates a `Map<String,Object>` of variable bindings in the path (i.e. `a` and `b`)
for the sake of a running example.
<2> The `select().by('name')` projects each binding vertex to their name property value and `where()` operates to
ensure respective `a` and `b` strings are not the same.
<3> The first `select()` projects a vertex binding set. A binding is filtered if `a` vertex equals `b` vertex. A
binding is filtered if `a` doesn't know `b`. The second and final `select()` projects the name of the vertices.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#select-org.apache.tinkerpop.gremlin.structure.Column-++[`select(Column)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#select-org.apache.tinkerpop.gremlin.process.traversal.Pop-java.lang.String-++[`select(Pop,String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#select-java.lang.String-++[`select(String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#select-java.lang.String-java.lang.String-java.lang.String...-++[`select(String,String,String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#select-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`select(Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#select-org.apache.tinkerpop.gremlin.process.traversal.Pop-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`select(Pop,Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/structure/Column.html++[`Column`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Pop.html++[`Pop`]
[[shortestpath-step]]
=== ShortestPath Step
The `shortestPath()`-step provides an easy way to find shortest non-cyclic paths in a graph. It is configurable
using the `with()`-modulator with the options given below.
IMPORTANT: The `shortestPath()`-step is a `VertexComputing`-step and as such, can only be used against a graph
that supports `GraphComputer` (OLAP).
[width="100%",cols="3,3,15,5",options="header"]
|=========================================================
| Key | Type | Description | Default
| `target` | `Traversal` | Sets a filter traversal for the end vertices (e.g. `+__.has('name','marko')+`). | all vertices (`+__.identity()+`)
| `edges` | `Traversal` or `Direction` | Sets a `Traversal` that emits the edges to traverse from the current vertex or the `Direction` to traverse during the shortest path discovery. | `Direction.BOTH`
| `distance` | `Traversal` or `String` | Sets the `Traversal` that calculates the distance for the current edge or the name of an edge property to use for the distance calculations. | `__.constant(1)`
| `maxDistance` | `Number` | Sets the distance limit for all shortest paths. | none
| `includeEdges` | `Boolean` | Whether to include edges in the result or not. | `false`
|=========================================================
[gremlin-groovy,modern]
----
g = g.withComputer()
g.V().shortestPath() <1>
g.V().has('person','name','marko').shortestPath() <2>
g.V().shortestPath().with(ShortestPath.target, __.has('name','peter')) <3>
g.V().shortestPath().
with(ShortestPath.edges, Direction.IN).
with(ShortestPath.target, __.has('name','josh')) <4>
g.V().has('person','name','marko').
shortestPath().
with(ShortestPath.target, __.has('name','josh')) <5>
g.V().has('person','name','marko').
shortestPath().
with(ShortestPath.target, __.has('name','josh')).
with(ShortestPath.distance, 'weight') <6>
g.V().has('person','name','marko').
shortestPath().
with(ShortestPath.target, __.has('name','josh')).
with(ShortestPath.includeEdges, true) <7>
----
<1> Find all shortest paths.
<2> Find all shortest paths from `marko`.
<3> Find all shortest paths to `peter`.
<4> Find all shortest in-directed paths to `josh`.
<5> Find all shortest paths from `marko` to `josh`.
<6> Find all shortest paths from `marko` to `josh` using a custom distance property.
<7> Find all shortest paths from `marko` to `josh` and include edges in the result.
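The default behavior (`Direction.BOTH`, a distance of 1 per edge, non-cyclic paths) can be pictured with a small breadth-first search. The following is only a rough plain-Python sketch and not TinkerPop code: `EDGES` and `shortest_paths` are hypothetical names, the edge list is assumed to mirror the "modern" toy graph, and the real step runs as a vertex program on a `GraphComputer` rather than in this way.

[source,python]
----
from collections import deque

# Assumed edge list mirroring the "modern" toy graph (out-vertex, in-vertex).
EDGES = [("marko", "vadas"), ("marko", "josh"), ("marko", "lop"),
         ("josh", "ripple"), ("josh", "lop"), ("peter", "lop")]

def shortest_paths(source, target):
    """Return every minimum-length path from source to target (edges undirected)."""
    neighbors = {}
    for a, b in EDGES:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    found, best = [], None
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # BFS order: every remaining path is longer than the best one
        if path[-1] == target:
            found.append(path)
            best = len(path)
            continue
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in path:  # keep the path non-cyclic
                queue.append(path + [nxt])
    return found
----

For instance, `shortest_paths("marko", "peter")` follows the two-hop route through `lop`, which is the kind of result statement 3 returns for the start vertex `marko`.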
[gremlin-groovy,modern]
----
g.inject(g.withComputer().V().shortestPath().
with(ShortestPath.distance, 'weight').
with(ShortestPath.includeEdges, true).
with(ShortestPath.maxDistance, 1).toList().toArray()).
map(unfold().values('name','weight').fold()) <1>
----
<1> Find all shortest paths using a custom distance property and limit the distance to 1. Inject the result into an OLTP `GraphTraversal` in order to be able to select properties from all elements in all paths.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#shortestPath--++[`shortestPath()`]
[[simplepath-step]]
=== SimplePath Step
image::simplepath-step.png[width=400]
When it is important that a traverser not repeat its path through the graph, `simplePath()`-step should be used
(*filter*). The <<path-data-structure,path>> information of the traverser is analyzed and if the path has repeated
objects in it, the traverser is filtered. If cyclic behavior is desired, see <<cyclicpath-step,`cyclicPath()`>>.
[gremlin-groovy,modern]
----
g.V(1).both().both()
g.V(1).both().both().simplePath()
g.V(1).both().both().simplePath().path()
g.V().out().as('a').out().as('b').out().as('c').
simplePath().by(label).
path()
g.V().out().as('a').out().as('b').out().as('c').
simplePath().
by(label).
from('b').
to('c').
path().
by('name')
----
By using the `from()` and `to()` modulators, a traversal can ensure that only certain sections of the path are acyclic.
[gremlin-groovy]
----
g.addV().property(id, 'A').as('a').
addV().property(id, 'B').as('b').
addV().property(id, 'C').as('c').
addV().property(id, 'D').as('d').
addE('link').from('a').to('b').
addE('link').from('b').to('c').
addE('link').from('c').to('d').iterate()
g.V('A').repeat(both().simplePath()).times(3).path() <1>
g.V('D').repeat(both().simplePath()).times(3).path() <2>
g.V('A').as('a').
repeat(both().simplePath().from('a')).times(3).as('b').
repeat(both().simplePath().from('b')).times(3).path() <3>
----
<1> Traverse all acyclic 3-hop paths starting from vertex `A`
<2> Traverse all acyclic 3-hop paths starting from vertex `D`
<3> Traverse all acyclic 3-hop paths starting from vertex `A` and from there again all 3-hop paths. The second path may
cross the vertices from the first path.
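The filtering rule itself is small: a traverser survives only if the inspected section of its path contains no repeated object. A minimal plain-Python sketch of that rule follows. It is not TinkerPop code; `is_simple` and `simple_path` are hypothetical names, and the `start`/`end` indices only stand in for the step labels that `from()` and `to()` accept in Gremlin.

[source,python]
----
def is_simple(path):
    """True when no object occurs more than once in the path."""
    return len(set(path)) == len(path)

def simple_path(paths, start=0, end=None):
    """Keep only paths whose section path[start:end] has no repeated objects."""
    return [p for p in paths if is_simple(p[start:end])]

# Vertex-id paths assumed for g.V(1).both().both() on the "modern" toy graph.
paths = [(1, 3, 1), (1, 3, 4), (1, 3, 6), (1, 2, 1), (1, 4, 5), (1, 4, 3), (1, 4, 1)]
kept = simple_path(paths)  # paths that return to vertex 1 are dropped
----

Restricting the inspected section, as in `simple_path(paths, start=1)`, lets a path such as `(1, 3, 1)` through because the repeated vertex lies outside the checked range.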
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#simplePath--++[`simplePath()`]
[[skip-step]]
=== Skip Step
The `skip()`-step is analogous to <<range-step,`range()`-step>>, except that the upper end of the range is set to -1 (i.e. no upper limit).
[gremlin-groovy,modern]
----
g.V().values('age').order()
g.V().values('age').order().skip(2)
g.V().values('age').order().range(2, -1)
----
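The equivalence between `skip(n)` and `range(n, -1)` can be modeled in a few lines of plain Python. This is an illustrative sketch only: `range_step` and `skip` are hypothetical helper names, and the ages are assumed to be those of the "modern" toy graph.

[source,python]
----
from itertools import islice

def range_step(stream, low, high):
    """Emit items with index in [low, high); high == -1 means no upper bound."""
    return list(islice(stream, low, None if high == -1 else high))

def skip(stream, n):
    """skip(n) modeled as range(n, -1)."""
    return range_step(stream, n, -1)

ages = sorted([29, 27, 32, 35])  # assumed ages from the "modern" toy graph
----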
The `skip()`-step can also be applied with `Scope.local`, in which case it operates on the incoming collection.
[gremlin-groovy,modern]
----
g.V().hasLabel('person').filter(outE('created')).as('p'). <1>
map(out('created').values('name').fold()).
project('person','primary','other').
by(select('p').by('name')).
by(limit(local, 1)). <2>
by(skip(local, 1)) <3>
----
<1> For each person who created something...
<2> ...select the first project (random order) as `primary` and...
<3> ...select all other projects as `other`.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#skip-long-++[`skip(long)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#skip-org.apache.tinkerpop.gremlin.process.traversal.Scope-long-++[`skip(Scope,long)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[subgraph-step]]
=== Subgraph Step
image::subgraph-logo.png[width=380]
Extracting a portion of a graph from a larger one for analysis, visualization or other purposes is a fairly common
use case for graph analysts and developers. The `subgraph()`-step (*sideEffect*) provides a way to produce an
link:http://mathworld.wolfram.com/Edge-InducedSubgraph.html[edge-induced subgraph] from virtually any traversal.
The following example demonstrates how to produce the "knows" subgraph:
[gremlin-groovy,modern]
----
subGraph = g.E().hasLabel('knows').subgraph('subGraph').cap('subGraph').next() <1>
sg = traversal().withEmbedded(subGraph)
sg.E() <2>
----
<1> As this function produces "edge-induced" subgraphs, `subgraph()` must be called at edge steps.
<2> The subgraph contains only "knows" edges.
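"Edge-induced" means that the subgraph is defined by a set of edges, and its vertices are simply the endpoints of those edges. The plain-Python sketch below models that idea outside of TinkerPop. `EDGES` and `edge_induced_subgraph` are hypothetical names, and the edge tuples are assumed to mirror the "modern" toy graph.

[source,python]
----
# Assumed edges of the "modern" toy graph: (edge id, label, out-vertex id, in-vertex id).
EDGES = [(7, "knows", 1, 2), (8, "knows", 1, 4), (9, "created", 1, 3),
         (10, "created", 4, 5), (11, "created", 4, 3), (12, "created", 6, 3)]

def edge_induced_subgraph(edges, keep):
    """Return (vertices, edges) induced by the edges for which keep(edge) is true."""
    selected = [e for e in edges if keep(e)]
    # The vertex set is derived from the selected edges, never chosen directly.
    vertices = sorted({v for _, _, out_v, in_v in selected for v in (out_v, in_v)})
    return vertices, selected

vertices, knows_edges = edge_induced_subgraph(EDGES, lambda e: e[1] == "knows")
----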
A more common subgraphing use case is to get all of the graph structure surrounding a single vertex:
[gremlin-groovy,modern]
----
subGraph = g.V(3).repeat(__.inE().subgraph('subGraph').outV()).times(3).cap('subGraph').next() <1>
sg = traversal().withEmbedded(subGraph)
sg.E()
----
<1> Starting at vertex `3`, traverse 3 steps away on in-edges, outputting all of that into the subgraph.
The above example is purposely brief so as to focus on `subgraph()` usage; however, it may not be the optimal
method for constructing the subgraph. For instance, if the graph had cycles, it would attempt to reconstruct parts
of the subgraph which are already present. Duplicates would not be created, but some unnecessary
processing would be involved. If the only interest of the traversal was to populate the subgraph, it would be better to include
`simplePath()` to filter out those cycles, as in `__.inE().subgraph('subGraph').outV().simplePath()`. From another
perspective, it might also make sense to use `dedup()` to avoid traversing the same vertices repeatedly where
two vertices share multiple edges between them, as in `__.inE().dedup().subgraph('subGraph').outV().dedup()`.
There can be multiple `subgraph()` calls within the same traversal, each operating against either the same graph
(i.e. same side-effect key) or different graphs (i.e. different side-effect keys).
[gremlin-groovy,modern]
----
t = g.V().outE('knows').subgraph('knowsG').inV().outE('created').subgraph('createdG').
inV().inE('created').subgraph('createdG').iterate()
traversal().withEmbedded(t.sideEffects.get('knowsG')).E()
traversal().withEmbedded(t.sideEffects.get('createdG')).E()
----
TinkerGraph is the ideal (and default) `Graph` into which a subgraph is extracted as it's fast, in-memory, and supports
user-supplied identifiers which can be any Java object. It is this last feature that needs some focus as many
TinkerPop-enabled graphs have complex identifier types and TinkerGraph's ability to consume those makes it a perfect
host for an incoming subgraph. However, care needs to be taken when using the elements of the TinkerGraph subgraph.
The original graph's identifiers may be preserved, but the elements of the graph are now TinkerGraph objects like
`TinkerVertex` and `TinkerEdge`. As a result, they cannot be used directly in Gremlin running against the original
graph. For example, the following traversal would likely return an error:
[source,java]
----
Vertex v = sg.V().has('name','marko').next(); <1>
List<Vertex> vertices = g.V(v).out().toList(); <2>
----
<1> Here "sg" is a reference to a TinkerGraph subgraph and "v" is a `TinkerVertex`.
<2> The `g.V(v)` has the potential to fail as "g" is the original `Graph` instance and not a TinkerGraph - it could
reject the `TinkerVertex` instance as it will not recognize it.
It is safer to wrap the `TinkerVertex` in a `ReferenceVertex` or simply reference the `id()` as follows:
[source,java]
----
Vertex v = sg.V().has('name','marko').next();
List<Vertex> vertices = g.V(v.id()).out().toList();
// OR
Vertex v = new ReferenceVertex(sg.V().has('name','marko').next());
List<Vertex> vertices = g.V(v).out().toList();
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#subgraph-java.lang.String-++[`subgraph(String)`]
[[sum-step]]
=== Sum Step
The `sum()`-step (*map*) operates on a stream of numbers and sums the numbers together to yield a result. Note that
the current traverser number is multiplied by the traverser bulk to determine how many such numbers are being
represented.
[gremlin-groovy,modern]
----
g.V().values('age').sum()
g.V().repeat(both()).times(3).values('age').sum()
----
When called as `sum(local)` it determines the sum of the current, local object (not the objects in the traversal
stream). This works for `Collection`-type objects.
[gremlin-groovy,modern]
----
g.V().values('age').fold().sum(local)
----
When there are `null` values being evaluated the `null` objects are ignored, but if all values are recognized as `null`
the return value is `null`.
[gremlin-groovy,modern]
----
g.inject(null,10, 9, null).sum()
g.inject([null,null,null]).sum(local)
----
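The two rules described in this section, bulk multiplication and `null` handling, can be combined in one small plain-Python model. This is a sketch under stated assumptions, not the TinkerPop implementation: `sum_step` is a hypothetical name and each traverser is represented as a `(value, bulk)` pair.

[source,python]
----
def sum_step(traversers):
    """Sum (value, bulk) pairs; None values are ignored, all-None yields None."""
    present = [(v, b) for v, b in traversers if v is not None]
    if not present:
        # Simplification: an empty stream is also reported as None here.
        return None
    # A traverser with bulk b stands for b identical values.
    return sum(v * b for v, b in present)
----

A traverser with value `29` and bulk `3` therefore contributes `87` to the total, exactly as three separate traversers with value `29` would.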
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#sum--++[`sum()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#sum-org.apache.tinkerpop.gremlin.process.traversal.Scope-++[`sum(Scope)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[tail-step]]
=== Tail Step
image::tail-step.png[width=530]
The `tail()`-step is analogous to <<limit-step,`limit()`>>-step, except that it emits the last `n`-objects instead of
the first `n`-objects.
[gremlin-groovy,modern]
----
g.V().values('name').order()
g.V().values('name').order().tail() <1>
g.V().values('name').order().tail(1) <2>
g.V().values('name').order().tail(3) <3>
----
<1> Last name (alphabetically).
<2> Same as statement 1.
<3> Last three names.
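Keeping only the last `n` objects of a stream needs a buffer of at most `n` elements. The plain-Python sketch below illustrates this with a bounded `deque`. The function name `tail` is used here only for illustration, and the names are assumed to be those of the "modern" toy graph.

[source,python]
----
from collections import deque

def tail(stream, n=1):
    """Return the last n objects of the stream in their original order."""
    # A deque with maxlen=n discards the oldest item whenever it overflows.
    return list(deque(stream, maxlen=n))

names = sorted(["marko", "vadas", "lop", "josh", "ripple", "peter"])
----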
The `tail()`-step can also be applied with `Scope.local`, in which case it operates on the incoming collection.
[gremlin-groovy,modern]
----
g.V().as('a').out().as('a').out().as('a').select('a').by(tail(local)).values('name') <1>
g.V().as('a').out().as('a').out().as('a').select('a').by(unfold().values('name').fold()).tail(local) <2>
g.V().as('a').out().as('a').out().as('a').select('a').by(unfold().values('name').fold()).tail(local, 2) <3>
g.V().elementMap().tail(local) <4>
----
<1> Only the most recent name from the "a" step (`List<Vertex>` becomes `Vertex`).
<2> Same result as statement 1 (`List<String>` becomes `String`).
<3> `List<String>` for each path containing the last two names from the 'a' step.
<4> `Map<String, Object>` for each vertex, but containing only the last property value.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#tail--++[`tail()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#tail-long-++[`tail(long)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#tail-org.apache.tinkerpop.gremlin.process.traversal.Scope-++[`tail(Scope)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#tail-org.apache.tinkerpop.gremlin.process.traversal.Scope-long-++[`tail(Scope,long)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html++[`Scope`]
[[timelimit-step]]
=== TimeLimit Step
In many situations, a graph traversal is less about getting an exact answer than about getting a relative ranking.
A classic example is link:http://en.wikipedia.org/wiki/Recommender_system[recommendation]. What is desired is a
relative ranking of vertices, not their absolute rank. It may also be desirable to have the traversal execute for
no more than 2 milliseconds. In such situations, the `timeLimit()`-step (*filter*) can be used.
image::timelimit-step.png[width=400]
NOTE: The method `clock(int runs, Closure code)` is a utility preloaded in the <<gremlin-console,Gremlin Console>>
that can be used to time execution of a body of code.
[gremlin-groovy,modern]
----
g.V().repeat(both().groupCount('m')).times(16).cap('m').order(local).by(values,desc).next()
clock(1) {g.V().repeat(both().groupCount('m')).times(16).cap('m').order(local).by(values,desc).next()}
g.V().repeat(timeLimit(2).both().groupCount('m')).times(16).cap('m').order(local).by(values,desc).next()
clock(1) {g.V().repeat(timeLimit(2).both().groupCount('m')).times(16).cap('m').order(local).by(values,desc).next()}
----
In essence, the relative order is respected, even though the number of traversers at each vertex is not. The primary
benefit is that the calculation is guaranteed to complete at the specified time limit (in milliseconds). Finally,
note that the internal clock of `timeLimit()`-step starts when the first traverser enters it. When the time limit is
reached, any `next()` evaluation of the step will yield a `NoSuchElementException` and any `hasNext()` evaluation will
yield `false`.
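The clock behavior described above can be sketched as a plain-Python generator. This is an illustrative model rather than TinkerPop code: `time_limit` is a hypothetical name, the injectable `clock` parameter exists only to make the sketch testable, and Python's `StopIteration` plays the role of `NoSuchElementException`.

[source,python]
----
import time

def time_limit(stream, millis, clock=time.monotonic):
    """Yield items until millis have elapsed since the first item arrived."""
    deadline = None
    for item in stream:
        if deadline is None:
            # The clock starts when the first traverser enters the step.
            deadline = clock() + millis / 1000.0
        if clock() >= deadline:
            return  # exhausted: further next() calls raise StopIteration
        yield item
----

Because the deadline is checked per item, whatever was emitted before the limit remains available to downstream steps, which is why a partial but still useful ranking comes out of the examples above.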
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#timeLimit-long-++[`timeLimit(long)`]
[[to-step]]
=== To Step
The `to()`-step is not an actual step, but instead is a "step-modulator" similar to <<as-step,`as()`>> and
<<by-step,`by()`>>. If a step is able to accept traversals or strings then `to()` is the
means by which they are added. The general pattern is `step().to()`. See <<from-step,`from()`>>-step.
The list of steps that support `to()`-modulation are: <<simplepath-step,`simplePath()`>>, <<cyclicpath-step,`cyclicPath()`>>,
<<path-step,`path()`>>, and <<addedge-step,`addE()`>>.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#to-org.apache.tinkerpop.gremlin.structure.Direction-java.lang.String...-++[`to(Direction,String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#to-java.lang.String-++[`to(String)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#to-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`to(Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#to-org.apache.tinkerpop.gremlin.structure.Vertex-++[`to(Vertex)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#toE-org.apache.tinkerpop.gremlin.structure.Direction-java.lang.String...-++[`toE(Direction,String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#toV-org.apache.tinkerpop.gremlin.structure.Direction-++[`toV(Direction)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/structure/Direction.html++[`Direction`]
[[tree-step]]
=== Tree Step
From any one element (i.e. vertex or edge), the emanating paths from that element can be aggregated to form a
link:http://en.wikipedia.org/wiki/Tree_(data_structure)[tree]. Gremlin provides `tree()`-step (*sideEffect*) for
this situation.
image::tree-step.png[width=450]
[gremlin-groovy,modern]
----
tree = g.V().out().out().tree().next()
----
It is important to see how the paths of all the emanating traversers are united to form the tree.
image::tree-step2.png[width=500]
The resultant tree data structure can then be manipulated (see `Tree` JavaDoc).
[gremlin-groovy,modern]
----
tree = g.V().out().out().tree().by('name').next()
tree['marko']
tree['marko']['josh']
tree.getObjectsAtDepth(3)
----
Note that when using `by()`-modulation, tree nodes are combined based on projection uniqueness, not on the
uniqueness of the original objects being projected. For instance:
[gremlin-groovy,modern]
----
g.V().has('name','josh').out('created').values('name').tree() <1>
g.V().has('name','josh').out('created').values('name').
tree().by('name').by(label).by() <2>
----
<1> When the `tree()` is created, vertex 3 and 5 are unique and thus, form unique branches in the tree structure.
<2> When the `tree()` is `by()`-modulated by `label`, then vertex 3 and 5 are both "software" and thus are merged to a single node in the tree.
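How paths are united into a tree, and how a projection can merge branches, can be modeled with nested dictionaries. The plain-Python sketch below is only an illustration: `build_tree`, `paths` and `LABELS` are hypothetical names, the paths are assumed to be those of `g.V().out().out()` on the "modern" toy graph, and a single projection is applied at every depth, whereas Gremlin applies `by()` modulators in round-robin fashion.

[source,python]
----
def build_tree(paths, by=lambda x: x):
    """Merge paths into nested dicts; nodes are keyed by their projection."""
    root = {}
    for path in paths:
        node = root
        for obj in path:
            # Objects with the same projection share one node at this depth.
            node = node.setdefault(by(obj), {})
    return root

# Assumed paths of g.V().out().out() on the "modern" graph, as vertex names.
paths = [("marko", "josh", "ripple"), ("marko", "josh", "lop")]
LABELS = {"marko": "person", "josh": "person", "ripple": "software", "lop": "software"}

by_name = build_tree(paths)
by_label = build_tree(paths, by=LABELS.get)
----

In `by_name` the two leaves stay separate, while in `by_label` they collapse into a single "software" node because their projections are equal.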
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#tree--++[`tree()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#tree-java.lang.String-++[`tree(String)`]
[[unfold-step]]
=== Unfold Step
If the object reaching `unfold()` (*flatMap*) is an iterator, iterable, or map, then it is unrolled into a linear
form. If not, then the object is simply emitted. Please see <<fold-step,`fold()`>> step for the inverse behavior.
[gremlin-groovy,modern]
----
g.V(1).out().fold().inject('gremlin',[1.23,2.34])
g.V(1).out().fold().inject('gremlin',[1.23,2.34]).unfold()
----
Note that `unfold()` does not recursively unroll iterators. Instead, `repeat()` can be used for recursive unrolling.
[gremlin-groovy,modern]
----
inject(1,[2,3,[4,5,[6]]])
inject(1,[2,3,[4,5,[6]]]).unfold()
inject(1,[2,3,[4,5,[6]]]).repeat(unfold()).until(count(local).is(1)).unfold()
----
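The difference between one level of unrolling and recursive unrolling can be shown with a plain-Python model. It is a sketch only: `unfold` and `unfold_deep` are hypothetical helper names, and the handling of maps (emitting their entries) is modeled with `dict.items()`.

[source,python]
----
def unfold(obj):
    """Unroll one level: lists and tuples yield items, dicts yield entries."""
    if isinstance(obj, dict):
        return list(obj.items())
    if isinstance(obj, (list, tuple)):
        return list(obj)
    return [obj]  # anything else is simply emitted

def unfold_deep(obj):
    """Recursively unroll nested lists, which a single unfold() does not do."""
    if isinstance(obj, list):
        return [leaf for item in obj for leaf in unfold_deep(item)]
    return [obj]

stream = [1, [2, 3, [4, 5, [6]]]]
one_level = [x for obj in stream for x in unfold(obj)]
----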
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#unfold--++[`unfold()`]
[[union-step]]
=== Union Step
image::union-step.png[width=650]
The `union()`-step (*branch*) supports the merging of the results of an arbitrary number of traversals. When a
traverser reaches a `union()`-step, it is copied to each of its internal traversals. The traversers emitted from `union()`
are the outputs of the respective internal traversals.
[gremlin-groovy,modern]
----
g.V(4).union(
__.in().values('age'),
out().values('lang'))
g.V(4).union(
__.in().values('age'),
out().values('lang')).path()
----
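The copy-and-merge behavior can be pictured with a plain-Python model in which each branch is a function from one traverser to a list of results. This is not TinkerPop code: `union`, `IN`, `OUT`, `AGE` and `LANG` are hypothetical names, and the data is assumed to describe the neighborhood of vertex `4` in the "modern" toy graph.

[source,python]
----
# Assumed slice of the "modern" toy graph around vertex 4 (josh).
IN = {4: [1]}
OUT = {4: [5, 3]}
AGE = {1: 29}
LANG = {3: "java", 5: "java"}

def union(traverser, *branches):
    """Copy the traverser into every branch and concatenate the branch outputs."""
    results = []
    for branch in branches:
        results.extend(branch(traverser))
    return results

result = union(4,
               lambda v: [AGE[u] for u in IN.get(v, [])],    # in().values('age')
               lambda v: [LANG[u] for u in OUT.get(v, [])])  # out().values('lang')
----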
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#union-org.apache.tinkerpop.gremlin.process.traversal.Traversal...-++[`union(Traversal...)`]
[[until-step]]
=== Until Step
The `until()`-step is not an actual step, but is instead a step modulator for <<repeat-step,`repeat()`>> (find more
documentation on `until()` there).
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#until-java.util.function.Predicate-++[`until(Predicate)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#until-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`until(Traversal)`]
[[value-step]]
=== Value Step
The `value()`-step (*map*) takes a `Property` and extracts the value from it.
[gremlin-groovy,theCrew]
----
g.V(1).properties().value()
g.V(1).properties().properties().value()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#value--++[`value()`]
[[valuemap-step]]
=== ValueMap Step
The `valueMap()`-step yields a `Map` representation of the properties of an element.
IMPORTANT: This step is the precursor to the <<elementmap-step,elementMap()-step>>. Users should typically
choose `elementMap()` unless they utilize multi-properties. `elementMap()` effectively mimics the functionality of
`valueMap(true).by(unfold())` as a single step.
[gremlin-groovy,modern]
----
g.V().valueMap()
g.V().valueMap('age')
g.V().valueMap('age','blah')
g.E().valueMap()
----
It is important to note that the map of a vertex maintains a list of values for each key. The map of an edge or
vertex-property represents a single property (not a list). The reason is that vertices in TinkerPop leverage
<<vertex-properties,vertex properties>> which support multiple values per key. Using the <<the-crew-toy-graph,
"The Crew">> toy graph, the point is made explicit.
[gremlin-groovy,theCrew]
----
g.V().valueMap()
g.V().has('name','marko').properties('location')
g.V().has('name','marko').properties('location').valueMap()
----
To turn lists of values into single items, the `by()` modulator can be used as shown below.
[gremlin-groovy,theCrew]
----
g.V().valueMap().by(unfold())
g.V().valueMap('name','location').by().by(unfold())
----
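The shape of the returned map, a list of values per key for vertices, can be modeled in plain Python. The sketch below is illustrative only: `value_map` and `by_unfold` are hypothetical names, the property data is assumed to match one vertex of "The Crew" toy graph, and `by(unfold())` is assumed to keep the first unfolded value of each list.

[source,python]
----
# Assumed multi-valued properties of one "The Crew" vertex, for illustration.
vertex = {"name": ["marko"],
          "location": ["san diego", "santa cruz", "brussels", "santa fe"]}

def value_map(props, *keys):
    """Return key -> list of values, optionally restricted to the given keys."""
    return {k: list(v) for k, v in props.items() if not keys or k in keys}

def by_unfold(vmap):
    """Model of by(unfold()): reduce each list to its first value (assumption)."""
    return {k: v[0] for k, v in vmap.items() if v}
----

Keys that the element does not have are simply absent from the result, which mirrors the `valueMap('age','blah')` example earlier in this section.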
If the `id`, `label`, `key`, and `value` of the `Element` are desired, then the `with()` modulator can be used to
trigger their insertion into the returned map.
[gremlin-groovy,theCrew]
----
g.V().hasLabel('person').valueMap().with(WithOptions.tokens)
g.V().hasLabel('person').valueMap('name').with(WithOptions.tokens, WithOptions.labels)
g.V().hasLabel('person').properties('location').valueMap().with(WithOptions.tokens, WithOptions.values)
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#valueMap-java.lang.String...-++[`valueMap(String...)`]
[[values-step]]
=== Values Step
The `values()`-step (*map*) extracts the values of properties from an `Element` in the traversal stream.
[gremlin-groovy,theCrew]
----
g.V(1).values()
g.V(1).values('location')
g.V(1).properties('location').values()
----
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#values-java.lang.String...-++[`values(String...)`]
[[vertex-steps]]
=== Vertex Steps
image::vertex-steps.png[width=350]
The vertex steps (*flatMap*) are fundamental to the Gremlin language. Via these steps, it's possible to "move" on the
graph -- i.e. traverse.
* `out(string...)`: Move to the outgoing adjacent vertices given the edge labels.
* `in(string...)`: Move to the incoming adjacent vertices given the edge labels.
* `both(string...)`: Move to both the incoming and outgoing adjacent vertices given the edge labels.
* `outE(string...)`: Move to the outgoing incident edges given the edge labels.
* `inE(string...)`: Move to the incoming incident edges given the edge labels.
* `bothE(string...)`: Move to both the incoming and outgoing incident edges given the edge labels.
* `outV()`: Move to the outgoing vertex.
* `inV()`: Move to the incoming vertex.
* `bothV()`: Move to both vertices.
* `otherV()`: Move to the vertex that was not the vertex that was moved from.
[NOTE, caption=Groovy]
====
The term `in` is a reserved word in Groovy and therefore, when used as part of an anonymous traversal, must be referred
to in Gremlin with the double underscore `__.in()`.
====
[NOTE, caption=JavaScript]
====
The term `in` is a reserved word in JavaScript, and therefore must be referred to in Gremlin with `in_()`.
====
[NOTE, caption=Python]
====
The term `in` is a reserved word in Python, and therefore must be referred to in Gremlin with `in_()`.
====
[gremlin-groovy,modern]
----
g.V(4)
g.V(4).outE() <1>
g.V(4).inE('knows') <2>
g.V(4).inE('created') <3>
g.V(4).bothE('knows','created','blah')
g.V(4).bothE('knows','created','blah').otherV()
g.V(4).both('knows','created','blah')
g.V(4).outE().inV() <4>
g.V(4).out() <5>
g.V(4).inE().outV()
g.V(4).inE().bothV()
----
<1> All outgoing edges.
<2> All incoming knows-edges.
<3> All incoming created-edges.
<4> Moving forward touching edges and vertices.
<5> Moving forward only touching vertices.
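The relationship between the vertex-to-vertex steps and the vertex-to-edge steps can be modeled over a simple edge list. The following plain-Python sketch is not TinkerPop code: all function names are hypothetical, the edge tuples are assumed to mirror the "modern" toy graph, and the order in which incoming and outgoing results are combined is an arbitrary choice of the sketch.

[source,python]
----
# Assumed edges of the "modern" toy graph: (edge id, label, out-vertex id, in-vertex id).
EDGES = [(7, "knows", 1, 2), (8, "knows", 1, 4), (9, "created", 1, 3),
         (10, "created", 4, 5), (11, "created", 4, 3), (12, "created", 6, 3)]

def _match(edge, labels):
    # An empty label list matches every edge; unknown labels match nothing.
    return not labels or edge[1] in labels

def out_e(v, *labels):
    return [e for e in EDGES if e[2] == v and _match(e, labels)]

def in_e(v, *labels):
    return [e for e in EDGES if e[3] == v and _match(e, labels)]

def both_e(v, *labels):
    return in_e(v, *labels) + out_e(v, *labels)

def out(v, *labels):
    return [e[3] for e in out_e(v, *labels)]  # same as outE().inV()

def in_(v, *labels):
    return [e[2] for e in in_e(v, *labels)]   # same as inE().outV()

def other_v(edge, came_from):
    """Return the endpoint of the edge that the traverser did not come from."""
    return edge[3] if edge[2] == came_from else edge[2]
----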
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#both-java.lang.String...-++[`both(String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#bothE-java.lang.String...-++[`bothE(String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#bothV--++[`bothV()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#in-java.lang.String...-++[`in(String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#inE-java.lang.String...-++[`inE(String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#inV--++[`inV()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#otherV--++[`otherV()`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#out-java.lang.String...-++[`out(String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#outE-java.lang.String...-++[`outE(String...)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#outV--++[`outV()`]
[[where-step]]
=== Where Step
The `where()`-step filters the current object based on either the object itself (`Scope.local`) or the path history
of the object (`Scope.global`) (*filter*). This step is typically used in conjunction with either
<<match-step,`match()`>>-step or <<select-step,`select()`>>-step, but can be used in isolation.
[gremlin-groovy,modern]
----
g.V(1).as('a').out('created').in('created').where(neq('a')) <1>
g.withSideEffect('a',['josh','peter']).V(1).out('created').in('created').values('name').where(within('a')) <2>
g.V(1).out('created').in('created').where(out('created').count().is(gt(1))).values('name') <3>
----
<1> Who are marko's collaborators, where marko cannot be his own collaborator? (predicate)
<2> Of the co-creators of marko, only keep those whose name is josh or peter. (using a sideEffect)
<3> Which of marko's collaborators have worked on more than 1 project? (using a traversal)
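Two of the filtering styles above, comparing against a labeled path object and filtering by the result of a nested traversal, can be modeled in plain Python. This is an illustrative sketch only: `CREATED`, `collaborators`, `created_count` and `busy` are hypothetical names, and the edge data is assumed to mirror the "created" edges of the "modern" toy graph.

[source,python]
----
# Assumed "created" edges of the "modern" toy graph: (person id, software id).
CREATED = [(1, 3), (4, 5), (4, 3), (6, 3)]

def collaborators(person):
    """out('created').in('created').where(neq('a')) with 'a' bound to person."""
    projects = [s for p, s in CREATED if p == person]
    co_creators = [p for s in projects for p, s2 in CREATED if s2 == s]
    # where(neq('a')): drop traversers equal to the object labeled 'a'.
    return [p for p in co_creators if p != person]

def created_count(person):
    return sum(1 for p, _ in CREATED if p == person)

# where(out('created').count().is(gt(1))): keep traversers whose nested
# traversal satisfies the predicate.
busy = [p for p in collaborators(1) if created_count(p) > 1]
----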
IMPORTANT: Please see <<using-where-with-match,`match().where()`>> and <<using-where-with-select,`select().where()`>>
for how `where()` can be used in conjunction with `Map<String,Object>` projecting steps -- i.e. `Scope.local`.
A few more examples of filtering an arbitrary object based on an anonymous traversal are provided below.
[gremlin-groovy,modern]
----
g.V().where(out('created')).values('name') <1>
g.V().out('knows').where(out('created')).values('name') <2>
g.V().where(out('created').count().is(gte(2))).values('name') <3>
g.V().where(out('knows').where(out('created'))).values('name') <4>
g.V().where(__.not(out('created'))).where(__.in('knows')).values('name') <5>
g.V().where(__.not(out('created')).and().in('knows')).values('name') <6>
g.V().as('a').out('knows').as('b').
where('a',gt('b')).
by('age').
select('a','b').
by('name') <7>
g.V().as('a').out('knows').as('b').
where('a',gt('b').or(eq('b'))).
by('age').
by('age').
by(__.in('knows').values('age')).
select('a','b').
by('name') <8>
----
<1> What are the names of the people who have created a project?
<2> What are the names of the people that are known by someone and have created a project?
<3> What are the names of the people who have created two or more projects?
<4> What are the names of the people who know someone that has created a project? (This only works in OLTP -- see the `WARNING` below)
<5> What are the names of the people who have not created anything, but are known by someone?
<6> The concatenation of `where()`-steps is the same as a single `where()`-step with an and'd clause.
<7> Marko knows josh and vadas but is only older than vadas.
<8> Marko is younger than josh, but josh knows someone equal in age to marko (which is marko).
WARNING: The anonymous traversal of `where()` processes the current object "locally". In OLAP, where the atomic unit
of computing is the vertex and its local "star graph," it is important that the anonymous traversal does not leave
the confines of the vertex's star graph. In other words, it cannot traverse to an adjacent vertex's properties or
edges.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#where-org.apache.tinkerpop.gremlin.process.traversal.P-++[`where(P)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#where-java.lang.String-org.apache.tinkerpop.gremlin.process.traversal.P-++[`where(String,P)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#where-org.apache.tinkerpop.gremlin.process.traversal.Traversal-++[`where(Traversal)`],
link:++https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/P.html++[`P`]
[[with-step]]
=== With Step
The `with()`-step is not an actual step, but is instead a "step modulator" which modifies the behavior of the step
prior to it. The `with()`-step provides additional "configuration" information to steps that implement the `Configuring`
interface. Steps that allow for this type of modulation will explicitly state so in their documentation.
[NOTE, caption=JavaScript]
====
The term `with` is a reserved word in JavaScript, and therefore must be referred to in Gremlin with `with_()`.
====
[NOTE, caption=Python]
====
The term `with` is a reserved word in Python, and therefore must be referred to in Gremlin with `with_()`.
====
[[write-step]]
=== Write Step
The `write()`-step is not really a "step" but a step modulator in that it modifies the functionality of the `io()`-step.
More specifically, it tells the `io()`-step that it is expected to use its configuration to write data to some
location. Please see the <<io-step,documentation>> for `io()`-step for more complete details on usage.
*Additional References*
link:++https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#write--++[`write()`]
[[a-note-on-predicates]]
== A Note on Predicates
A `P` is a predicate of the form `Function<Object,Boolean>`. That is, given some object, return true or false. As of
the release of TinkerPop 3.4.0, Gremlin also supports simple text predicates, which only work on `String` values. The `TextP`
text predicates extend the `P` predicates, but are specialized in that they are of the form `Function<String,Boolean>`.
The provided predicates are outlined in the table below and are used in various steps such as <<has-step,`has()`>>-step,
<<where-step,`where()`>>-step, <<is-step,`is()`>>-step, etc.
[width="100%",cols="3,15",options="header"]
|=========================================================
| Predicate | Description
| `P.eq(object)` | Is the incoming object equal to the provided object?
| `P.neq(object)` | Is the incoming object not equal to the provided object?
| `P.lt(number)` | Is the incoming number less than the provided number?
| `P.lte(number)` | Is the incoming number less than or equal to the provided number?
| `P.gt(number)` | Is the incoming number greater than the provided number?
| `P.gte(number)` | Is the incoming number greater than or equal to the provided number?
| `P.inside(number,number)` | Is the incoming number greater than the first provided number and less than the second?
| `P.outside(number,number)` | Is the incoming number less than the first provided number or greater than the second?
| `P.between(number,number)` | Is the incoming number greater than or equal to the first provided number and less than the second?
| `P.within(objects...)` | Is the incoming object in the array of provided objects?
| `P.without(objects...)` | Is the incoming object not in the array of the provided objects?
| `TextP.startingWith(string)` | Does the incoming `String` start with the provided `String`?
| `TextP.endingWith(string)` | Does the incoming `String` end with the provided `String`?
| `TextP.containing(string)` | Does the incoming `String` contain the provided `String`?
| `TextP.notStartingWith(string)` | Does the incoming `String` not start with the provided `String`?
| `TextP.notEndingWith(string)` | Does the incoming `String` not end with the provided `String`?
| `TextP.notContaining(string)` | Does the incoming `String` not contain the provided `String`?
|=========================================================
[gremlin-groovy]
----
eq(2)
not(neq(2)) <1>
not(within('a','b','c'))
not(within('a','b','c')).test('d') <2>
not(within('a','b','c')).test('a')
within(1,2,3).and(not(eq(2))).test(3) <3>
inside(1,4).or(eq(5)).test(3) <4>
inside(1,4).or(eq(5)).test(5)
between(1,2) <5>
not(between(1,2))
----
<1> The `not()` of a `P`-predicate is another `P`-predicate.
<2> `P`-predicates are arguments to various steps which internally `test()` the incoming value.
<3> `P`-predicates can be and'd together.
<4> `P`-predicates can be or'd together.
<5> `and()` is a `P`-predicate and thus, a `P`-predicate can be composed of multiple `P`-predicates.
TIP: To reduce the verbosity of predicate expressions, it is good to
`import static org.apache.tinkerpop.gremlin.process.traversal.P.*`.
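The boundary behavior of the range predicates in the table above is easy to mix up, so a minimal sketch may help. It uses only `java.util.function.Predicate` from the JDK and is not TinkerPop's `P` class; `PredicateSketch` and its helper methods are hypothetical names that simply mirror the table rows for integer values.

[source,java]
----
import java.util.function.Predicate;

// JDK-only illustration of the range predicates described in the table.
// This is NOT TinkerPop's P class; all names here are hypothetical.
final class PredicateSketch {

    // greater than the first AND less than the second (both bounds exclusive)
    static Predicate<Integer> inside(int first, int second) {
        return n -> n > first && n < second;
    }

    // less than the first OR greater than the second
    static Predicate<Integer> outside(int first, int second) {
        return n -> n < first || n > second;
    }

    // greater than or equal to the first AND less than the second
    static Predicate<Integer> between(int first, int second) {
        return n -> n >= first && n < second;
    }

    public static void main(String[] args) {
        System.out.println(inside(1, 4).test(1));            // false
        System.out.println(between(1, 4).test(1));           // true
        System.out.println(between(1, 4).test(4));           // false
        System.out.println(outside(1, 4).test(4));           // false
        // composition, analogous in spirit to inside(1,4).or(eq(5))
        Predicate<Integer> composed = inside(1, 4).or(n -> n == 5);
        System.out.println(composed.test(5));                // true
        // negation, analogous in spirit to not(between(1,4))
        System.out.println(between(1, 4).negate().test(4));  // true
    }
}
----

The only difference between `inside` and `between` in this sketch is whether the lower bound is inclusive, which matches the descriptions in the table.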
Finally, note that <<where-step,`where()`>>-step takes a `P<String>`. The provided string value refers to a variable
binding, not to the explicit string value.
[gremlin-groovy,modern]
----
g.V().as('a').both().both().as('b').count()
g.V().as('a').both().both().as('b').where('a',neq('b')).count()
----
NOTE: It is possible for graph system providers and users to extend `P` and provide new predicates. For instance, a
`regex(pattern)` could be a graph system specific `P`.
[[a-note-on-barrier-steps]]
== A Note on Barrier Steps
image:barrier.png[width=165,float=right] Gremlin is primarily a
link:http://en.wikipedia.org/wiki/Lazy_evaluation[lazy], stream processing language. This means that Gremlin fully
processes (to the best of its abilities) any traversers currently in the traversal pipeline before getting more data
from the start/head of the traversal. However, there are numerous situations in which a completely lazy computation
is not possible (or impractical). When a computation is not lazy, a "barrier step" exists. There are three types of
barriers:
. `CollectingBarrierStep`: All of the traversers prior to the step are put into a collection and then processed in
some way (e.g. ordered) prior to the collection being "drained" one-by-one to the next step. Examples
include: <<order-step,`order()`>>, <<sample-step,`sample()`>>, <<aggregate-step,`aggregate()`>>, <<barrier-step,`barrier()`>>.
. `ReducingBarrierStep`: All of the traversers prior to the step are processed by a reduce function and once all the
previous traversers are processed, a single "reduced value" traverser is emitted to the next step. Note that the path
history leading up to a reducing barrier step is destroyed given its many-to-one nature. Examples include:
<<fold-step,`fold()`>>, <<count-step,`count()`>>, <<sum-step,`sum()`>>, <<max-step,`max()`>>, <<min-step,`min()`>>.
. `SupplyingBarrierStep`: All of the traversers prior to the step are iterated (no processing) and then some provided
supplier yields a single traverser to continue to the next step. Examples include: <<cap-step,`cap()`>>.
In Gremlin OLAP (see <<traversalvertexprogram,`TraversalVertexProgram`>>), a barrier is introduced at the end of
every <<vertex-steps,adjacent vertex step>>. This means that the traversal does its best to compute as much as
possible at the current, local vertex. What it can't compute without referencing an adjacent vertex is aggregated
into a barrier collection. When there are no more traversers at the local vertex, the barriered traversers are the
messages that are propagated to remote vertices for further processing.
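The difference between lazy processing and a collecting barrier can be sketched with plain Java streams. This is only an analogy and not Gremlin's implementation: `sorted()` plays the role of a barrier such as `order()`, and the class name `BarrierSketch` is hypothetical. The log records when an element is pulled from the source and when it reaches the end of the pipeline.

[source,java]
----
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// JDK-only analogy: a lazy pipeline versus one containing a collecting barrier.
final class BarrierSketch {

    // Returns the order in which elements are pulled from the source and emitted at the end.
    static List<String> trace(boolean withBarrier) {
        List<String> log = new ArrayList<>();
        Stream<Integer> stream = Stream.of(3, 1, 2).peek(n -> log.add("pull " + n));
        if (withBarrier)
            stream = stream.sorted(); // must see every element before emitting the first one
        stream.forEach(n -> log.add("emit " + n));
        return log;
    }

    public static void main(String[] args) {
        // lazy: each element travels the whole pipeline before the next one is pulled
        System.out.println(trace(false)); // [pull 3, emit 3, pull 1, emit 1, pull 2, emit 2]
        // barrier: all elements are pulled and collected, then drained one by one
        System.out.println(trace(true));  // [pull 3, pull 1, pull 2, emit 1, emit 2, emit 3]
    }
}
----

A reducing barrier behaves like the second case, except that a single reduced value is emitted after all elements have been pulled.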
[[a-note-on-scopes]]
== A Note on Scopes
The `Scope` enum has two constants: `Scope.local` and `Scope.global`. Scope determines whether the particular step
being scoped is applied with respect to the current object (`local`) at that step or to the entire stream of objects
up to that step (`global`).
[NOTE, caption=Python]
====
The term `global` is a reserved word in Python, and therefore a `Scope` using that term must be referred to as `global_`.
====
[gremlin-groovy,modern]
----
g.V().has('name','marko').out('knows').count() <1>
g.V().has('name','marko').out('knows').fold().count() <2>
g.V().has('name','marko').out('knows').fold().count(local) <3>
g.V().has('name','marko').out('knows').fold().count(global) <4>
----
<1> Marko knows 2 people.
<2> A list of Marko's friends is created and thus, one object is counted (the single list).
<3> A list of Marko's friends is created and a `local`-count yields the number of objects in that list.
<4> `count(global)` is the same as `count()` as the default behavior for most scoped steps is `global`.
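The distinction can also be sketched outside of Gremlin. The following JDK-only analogy treats a Java stream as the traverser stream and a `List` as the folded object; `ScopeSketch`, `countGlobal` and `countLocal` are hypothetical names and not part of TinkerPop.

[source,java]
----
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// JDK-only analogy for Scope.global versus Scope.local; not TinkerPop code.
final class ScopeSketch {

    // global: count the objects flowing through the stream
    static long countGlobal(Stream<?> stream) {
        return stream.count();
    }

    // local: count inside each object (here, each list) of the stream
    static List<Integer> countLocal(Stream<? extends List<?>> stream) {
        return stream.map(list -> list.size()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> friends = List.of("vadas", "josh");
        System.out.println(countGlobal(friends.stream()));   // 2: two objects in the stream
        System.out.println(countGlobal(Stream.of(friends))); // 1: one folded list in the stream
        System.out.println(countLocal(Stream.of(friends)));  // [2]: size of that single list
    }
}
----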
The steps that support scoping are:
* <<count-step,`count()`>>: count the local collection or global stream.
* <<dedup-step, `dedup()`>>: dedup the local collection or global stream.
* <<max-step, `max()`>>: get the max value in the local collection or global stream.
* <<mean-step, `mean()`>>: get the mean value in the local collection or global stream.
* <<min-step, `min()`>>: get the min value in the local collection or global stream.
* <<order-step,`order()`>>: order the objects in the local collection or global stream.
* <<range-step, `range()`>>: clip the local collection or global stream.
* <<limit-step, `limit()`>>: clip the local collection or global stream.
* <<sample-step, `sample()`>>: sample objects from the local collection or global stream.
* <<tail-step, `tail()`>>: get the tail of the objects in the local collection or global stream.
A few more examples of the use of `Scope` are provided below:
[gremlin-groovy,modern]
----
g.V().both().group().by(label).select('software').dedup(local)
g.V().groupCount().by(label).select(values).min(local)
g.V().groupCount().by(label).order(local).by(values,desc)
g.V().fold().sample(local,2)
----
Finally, note that <<local-step,`local()`>>-step is a "hard-scoped step" that transforms any internal traversal into a
locally-scoped operation. A contrived example is provided below:
[gremlin-groovy,modern]
----
g.V().fold().local(unfold().count())
g.V().fold().count(local)
----
[[a-note-on-lambdas]]
== A Note On Lambdas
image:lambda.png[width=150,float=right] A link:http://en.wikipedia.org/wiki/Anonymous_function[lambda] is a function
that can be referenced by software and thus, passed around like any other piece of data. In Gremlin, lambdas make it
possible to generalize the behavior of a step such that custom steps can be created (on-the-fly) by the user. However,
it is advised to avoid using lambdas if possible.
[gremlin-groovy,modern]
----
g.V().filter{it.get().value('name') == 'marko'}.
flatMap{it.get().vertices(OUT,'created')}.
map {it.get().value('name')} <1>
g.V().has('name','marko').out('created').values('name') <2>
----
<1> A lambda-rich Gremlin traversal which should and can be avoided. (*bad*)
<2> The same traversal (result), but without using lambdas. (*good*)
Gremlin attempts to provide the user a comprehensive collection of steps in the hopes that the user will never need to
leverage a lambda in practice. Users are advised to leverage a lambda if and only if there is no
corresponding lambda-less step that encompasses the desired functionality. The reason is that lambdas cannot be
optimized by Gremlin's compiler strategies as they cannot be programmatically inspected (see
<<traversalstrategy,traversal strategies>>). It is also not currently possible to send a natively written lambda for
remote execution to Gremlin-Server or a driver that supports remote execution.
In many situations where a lambda could be used, either a corresponding step exists or a traversal can be provided in
its place. A `TraversalLambda` behaves like a typical lambda, but it can be optimized and it yields fewer objects than
the corresponding pure-lambda form.
[gremlin-groovy,modern]
----
g.V().out().out().path().by {it.value('name')}.
by {it.value('name')}.
by {g.V(it).in('created').values('name').fold().next()} <1>
g.V().out().out().path().by('name').
by('name').
by(__.in('created').values('name').fold()) <2>
----
<1> The length-3 paths have each of their objects transformed by a lambda. (*bad*)
<2> The length-3 paths have their objects transformed by a lambda-less step and a traversal lambda. (*good*)
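Why a lambda blocks optimization can be sketched in a few lines of plain Java. This is an illustration under stated assumptions and not TinkerPop code: `HasFilter` is a hypothetical filter described as data, and `plan` is a hypothetical optimizer that can only choose an index lookup when it is able to read the key the filter uses.

[source,java]
----
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// JDK-only illustration: a filter expressed as data can be inspected, a lambda cannot.
final class OpaqueLambdaSketch {

    // A filter described as data: an optimizer can read its key and value.
    static final class HasFilter implements Predicate<Map<String, String>> {
        final String key;
        final String value;

        HasFilter(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public boolean test(Map<String, String> element) {
            return value.equals(element.get(key));
        }
    }

    // Describes the plan a hypothetical optimizer would pick for the given filter.
    static String plan(Predicate<Map<String, String>> filter, Set<String> indexedKeys) {
        if (filter instanceof HasFilter && indexedKeys.contains(((HasFilter) filter).key))
            return "index lookup on " + ((HasFilter) filter).key;
        return "full scan"; // nothing is known about what an arbitrary lambda does
    }

    public static void main(String[] args) {
        Set<String> indexed = Set.of("name");
        System.out.println(plan(new HasFilter("name", "marko"), indexed));      // index lookup on name
        System.out.println(plan(e -> "marko".equals(e.get("name")), indexed));  // full scan
    }
}
----

Both filters accept exactly the same elements, but only the first one exposes enough structure for the plan to change.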
[[traversalstrategy]]
== TraversalStrategy
image:traversal-strategy.png[width=125,float=right] A `TraversalStrategy` analyzes a `Traversal` and, if the traversal
meets its criteria, can mutate it accordingly. Traversal strategies are executed at compile-time and form the foundation
of the Gremlin traversal machine's compiler. There are 5 categories of strategies which are itemized below:
* There is an application-level feature that can be embedded into the traversal logic (*decoration*).
* There is a more efficient way to express the traversal at the TinkerPop level (*optimization*).
* There is a more efficient way to express the traversal at the graph system/language/driver level (*provider optimization*).
* There are some final adjustments/cleanups/analyses required before executing the traversal (*finalization*).
* There are certain traversals that are not legal for the application or traversal engine (*verification*).
NOTE: The <<explain-step,`explain()`>>-step shows the user how each registered strategy mutates the traversal.
TinkerPop ships with a generous number of `TraversalStrategy` definitions, most of which are applied implicitly when
executing a gremlin traversal. Users and providers can add `TraversalStrategy` definitions for particular needs. The
following sections detail how traversal strategies are applied and defined and describe a collection of traversal
strategies that are generally useful to end-users.
=== Application
One can explicitly add or remove `TraversalStrategy` strategies on the `GraphTraversalSource` with the `withStrategies()`
and `withoutStrategies()` <<start-steps, start steps>>, see the <<readonlystrategy, ReadOnlyStrategy>> and the
<<barrier-step, barrier() step>> for examples. End users typically do this as part of issuing a gremlin traversal, either
on a locally opened graph or a remotely accessed graph. However, when configuring Gremlin Server, traversal strategies
can also be applied on exposed `GraphTraversalSource` instances and as part of an `Authorizer` implementation, see
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#authorization[Gremlin Server Authorization].
Therefore, one should keep the following in mind when modifying the list of `TraversalStrategy` strategies:
* A `TraversalStrategy` added to the traversal can be removed again later on. An example is the
`conf/gremlin-server-modern-readonly.yaml` file from the Gremlin Server distribution, which applies the `ReadOnlyStrategy`
to the `GraphTraversalSource` that remote clients can connect to. However, a remote client can in turn remove it
by applying the `withoutStrategies()` step with the `ReadOnlyStrategy`.
* When a `TraversalStrategy` of a particular type is added, it replaces any instance of its type that exists prior to
it. Multiple instances of a `TraversalStrategy` can therefore not be registered and their functionality is in no way
merged automatically. Therefore, if there is a particular strategy registered whose functionality needs to be changed,
it is important to either find and modify the existing instance or construct a new one, copying the options to keep
from the old to the new instance.
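The two rules above amount to a registry keyed by strategy type. The following JDK-only sketch illustrates that behavior; it is not how TinkerPop stores strategies internally, and `StrategyRegistrySketch` with its nested `ReadOnly` and `Partition` classes are hypothetical stand-ins.

[source,java]
----
import java.util.LinkedHashMap;
import java.util.Map;

// JDK-only illustration of "one instance per strategy type"; not TinkerPop code.
final class StrategyRegistrySketch {

    static final class ReadOnly { }

    static final class Partition {
        final String writePartition;

        Partition(String writePartition) {
            this.writePartition = writePartition;
        }
    }

    private final Map<Class<?>, Object> byType = new LinkedHashMap<>();

    // Adding a strategy replaces any existing instance of the same type; nothing is merged.
    StrategyRegistrySketch with(Object strategy) {
        byType.put(strategy.getClass(), strategy);
        return this;
    }

    // Anything that was added can be removed again later by type.
    StrategyRegistrySketch without(Class<?> type) {
        byType.remove(type);
        return this;
    }

    <T> T get(Class<T> type) {
        return type.cast(byType.get(type));
    }

    int size() {
        return byType.size();
    }

    public static void main(String[] args) {
        StrategyRegistrySketch registry = new StrategyRegistrySketch()
                .with(new ReadOnly())
                .with(new Partition("a"))
                .with(new Partition("b")); // replaces the Partition("a") instance
        System.out.println(registry.size());                              // 2
        System.out.println(registry.get(Partition.class).writePartition); // b
        System.out.println(registry.without(ReadOnly.class).size());      // 1
    }
}
----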
=== Definition
A simple `OptimizationStrategy` is the `IdentityRemovalStrategy`.
[source,java]
----
public final class IdentityRemovalStrategy extends AbstractTraversalStrategy<TraversalStrategy.OptimizationStrategy> implements TraversalStrategy.OptimizationStrategy {

    private static final IdentityRemovalStrategy INSTANCE = new IdentityRemovalStrategy();

    private IdentityRemovalStrategy() {
    }

    @Override
    public void apply(Traversal.Admin<?, ?> traversal) {
        if (traversal.getSteps().size() <= 1)
            return;

        for (IdentityStep<?> identityStep : TraversalHelper.getStepsOfClass(IdentityStep.class, traversal)) {
            if (identityStep.getLabels().isEmpty() || !(identityStep.getPreviousStep() instanceof EmptyStep)) {
                TraversalHelper.copyLabels(identityStep, identityStep.getPreviousStep(), false);
                traversal.removeStep(identityStep);
            }
        }
    }

    public static IdentityRemovalStrategy instance() {
        return INSTANCE;
    }
}
----
This strategy simply removes any `IdentityStep` steps in the traversal, as `aStep().identity().identity().bStep()`
is equivalent to `aStep().bStep()`. For traversal strategies that require other strategies to execute before or
after them, the following two methods can be defined in `TraversalStrategy` (the defaults being an
empty set). If the `TraversalStrategy` is in a particular traversal category (i.e. decoration, optimization,
provider-optimization, finalization, or verification), then priors and posts are only possible within the respective category.
[source,java]
public Set<Class<? extends S>> applyPrior();
public Set<Class<? extends S>> applyPost();
IMPORTANT: `TraversalStrategy` categories are sorted within their category and the categories are then executed in
the following order: decoration, optimization, provider optimization, finalization, and verification. If a designed strategy
does not fit cleanly into these categories, then it can implement `TraversalStrategy` and its priors and posts can reference
strategies within any category. However, such generalizations are strongly discouraged.
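The category ordering described above can be sketched as a sort over an enum whose constants are declared in execution order. This JDK-only example is a simplification that ignores priors and posts within a category; `CategoryOrderSketch` and `MyDecorationStrategy` are hypothetical names, while the other strategy names are taken from this section.

[source,java]
----
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// JDK-only illustration of category-level ordering; priors and posts are ignored here.
final class CategoryOrderSketch {

    // Declared in the order in which the categories are executed.
    enum Category { DECORATION, OPTIMIZATION, PROVIDER_OPTIMIZATION, FINALIZATION, VERIFICATION }

    // Orders strategy names by category; the sort is stable within a category.
    static List<String> executionOrder(Map<String, Category> registered) {
        List<String> names = new ArrayList<>(registered.keySet());
        names.sort(Comparator.comparing((String name) -> registered.get(name)));
        return names;
    }

    public static void main(String[] args) {
        Map<String, Category> registered = new LinkedHashMap<>();
        registered.put("ReadOnlyStrategy", Category.VERIFICATION);
        registered.put("TinkerGraphStepStrategy", Category.PROVIDER_OPTIMIZATION);
        registered.put("IdentityRemovalStrategy", Category.OPTIMIZATION);
        registered.put("MyDecorationStrategy", Category.DECORATION); // hypothetical strategy
        // [MyDecorationStrategy, IdentityRemovalStrategy, TinkerGraphStepStrategy, ReadOnlyStrategy]
        System.out.println(executionOrder(registered));
    }
}
----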
An example of a `ProviderOptimizationStrategy` is provided below.
[source,groovy]
g.V().has('name','marko')
The expression above can be executed in an `O(|V|)` or `O(log(|V|))` fashion in <<tinkergraph-gremlin,TinkerGraph>>
depending on whether an index for "name" is absent or defined, respectively.
[source,java]
----
public final class TinkerGraphStepStrategy extends AbstractTraversalStrategy<TraversalStrategy.ProviderOptimizationStrategy> implements TraversalStrategy.ProviderOptimizationStrategy {

    private static final TinkerGraphStepStrategy INSTANCE = new TinkerGraphStepStrategy();

    private TinkerGraphStepStrategy() {
    }

    @Override
    public void apply(Traversal.Admin<?, ?> traversal) {
        if (TraversalHelper.onGraphComputer(traversal))
            return;

        for (GraphStep originalGraphStep : TraversalHelper.getStepsOfClass(GraphStep.class, traversal)) {
            TinkerGraphStep<?, ?> tinkerGraphStep = new TinkerGraphStep<>(originalGraphStep);
            TraversalHelper.replaceStep(originalGraphStep, tinkerGraphStep, traversal);
            Step<?, ?> currentStep = tinkerGraphStep.getNextStep();
            while (currentStep instanceof HasStep || currentStep instanceof NoOpBarrierStep) {
                if (currentStep instanceof HasStep) {
                    for (HasContainer hasContainer : ((HasContainerHolder) currentStep).getHasContainers()) {
                        if (!GraphStep.processHasContainerIds(tinkerGraphStep, hasContainer))
                            tinkerGraphStep.addHasContainer(hasContainer);
                    }
                    TraversalHelper.copyLabels(currentStep, currentStep.getPreviousStep(), false);
                    traversal.removeStep(currentStep);
                }
                currentStep = currentStep.getNextStep();
            }
        }
    }

    public static TinkerGraphStepStrategy instance() {
        return INSTANCE;
    }
}
----
The traversal is redefined by simply taking a chain of `has()`-steps after `g.V()` (`TinkerGraphStep`) and providing
their `HasContainers` to `TinkerGraphStep`. Then it's up to `TinkerGraphStep` to determine if an appropriate index exists.
Given that the strategy uses non-TinkerPop provided steps, it should go into the `ProviderOptimizationStrategy` category
to ensure the added step does not interfere with the assumptions of the `OptimizationStrategy` strategies.
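The rewrite performed by the strategy can be pictured on a list of step names. The sketch below is a deliberately simplified, JDK-only illustration: steps are plain strings rather than `Step` objects, labels and `NoOpBarrierStep` are ignored, and `HasFoldingSketch` is a hypothetical name.

[source,java]
----
import java.util.ArrayList;
import java.util.List;

// JDK-only illustration: fold the has()-steps that directly follow V() into the graph step.
final class HasFoldingSketch {

    static List<String> fold(List<String> steps) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < steps.size()) {
            String step = steps.get(i++);
            if (step.equals("V")) {
                // collect the chain of has()-steps immediately after V()
                List<String> folded = new ArrayList<>();
                while (i < steps.size() && steps.get(i).startsWith("has("))
                    folded.add(steps.get(i++));
                result.add("V" + folded); // the graph step now carries the conditions
            } else {
                result.add(step);         // every other step is kept as-is
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> steps = List.of("V", "has(name,marko)", "has(age,29)", "out", "has(lang,java)");
        // [V[has(name,marko), has(age,29)], out, has(lang,java)]
        System.out.println(fold(steps));
    }
}
----

Only the `has()`-steps adjacent to `V` are folded; the one after `out` stays in place because it does not constrain the initial vertex lookup.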
[gremlin-groovy,modern]
----
t = g.V().has('name','marko'); null
t.toString()
t.iterate(); null
t.toString()
----
WARNING: The reason that `OptimizationStrategy` and `ProviderOptimizationStrategy` are two different categories is
that optimization strategies should only rewrite the traversal using TinkerPop steps. This ensures that the
optimizations executed at the end of the optimization strategy round are TinkerPop compliant. From there, provider
optimizations can analyze the traversal and rewrite the traversal as desired using graph system specific steps (e.g.
replacing `GraphStep.HasStep...HasStep` with `TinkerGraphStep`). If provider optimizations use graph system specific
steps and implement `OptimizationStrategy`, then other TinkerPop optimizations may fail to optimize the traversal or
misunderstand the graph system specific step behaviors (e.g. `ProviderVertexStep extends VertexStep`) and yield
incorrect semantics.
Finally, here is a complicated traversal that has various components that are optimized by the default TinkerPop strategies.
[gremlin-groovy,modern]
----
g.V().hasLabel('person'). <1>
and(has('name'), <2>
has('name','marko'),
filter(has('age',gt(20)))). <3>
match(__.as('a').has('age',lt(32)), <4>
__.as('a').repeat(outE().inV()).times(2).as('b')). <5>
where('a',neq('b')). <6>
where(__.as('b').both().count().is(gt(1))). <7>
select('b'). <8>
groupCount().
by(out().count()). <9>
explain()
----
<1> `TinkerGraphStepStrategy` pulls in `has()`-step predicates for global, graph-centric index lookups.
<2> `FilterRankStrategy` sorts filter steps by their time/space execution costs.
<3> `InlineFilterStrategy` de-nests filters to increase the likelihood of filter concatenation and aggregation.
<4> `InlineFilterStrategy` pulls out named predicates from `match()`-step to more easily allow provider strategies to use indices.
<5> `RepeatUnrollStrategy` will unroll loops and `IncidentToAdjacentStrategy` will turn `outE().inV()`-patterns into `out()`.
<6> `MatchPredicateStrategy` will pull in `where()`-steps so that they can be subjected to the `match()`-step's runtime query optimizer.
<7> `CountStrategy` will limit the traversal to only the number of traversers required for the `count().is(x)`-check.
<8> `PathRetractionStrategy` will remove paths from the traversers and increase the likelihood of bulking as path data is not required after `select('b')`.
<9> `AdjacentToIncidentStrategy` will turn `out()` into `outE()` to increase data access locality.
=== EdgeLabelVerificationStrategy
`EdgeLabelVerificationStrategy` prevents users from writing traversals that do not explicitly specify an edge
label when using steps like `out()`, `in()`, `both()` and their related `E` oriented steps, providing the
option to throw an exception, log a warning or do both when one of these steps is encountered without a label.
[source,java,tab]
----
EdgeLabelVerificationStrategy verificationStrategy = EdgeLabelVerificationStrategy.build()
    .throwException().create();
// results in VerificationException - as out() does not have a label specified
g.withStrategies(verificationStrategy).V(1).out().iterate();
----
[source,groovy]
----
// results in VerificationException - as out() does not have a label specified
g.withStrategies(new EdgeLabelVerificationStrategy(throwException: true))
.V(1).out().iterate()
----
[source,csharp]
----
// results in VerificationException - as out() does not have a label specified
g.WithStrategies(new EdgeLabelVerificationStrategy(throwException: true))
.V(1).Out().Iterate();
----
[source,javascript]
----
// results in Error - as out() does not have a label specified
g.withStrategies(new EdgeLabelVerificationStrategy(throwException: true))
.V(1).out().iterate();
----
[source,python]
----
# results in Error - as out() does not have a label specified
g.withStrategies(EdgeLabelVerificationStrategy(throwException=True)).V(1).out().iterate()
----
=== ElementIdStrategy
`ElementIdStrategy` provides control over element identifiers. Some Graph implementations, such as TinkerGraph,
allow specification of custom identifiers when creating elements:
[gremlin-groovy]
----
g = traversal().withEmbedded(TinkerGraph.open())
v = g.addV().property(id,'42a').next()
g.V('42a')
----
Other `Graph` implementations, such as Neo4j, generate element identifiers automatically and do not allow them to be assigned.
As a helper, `ElementIdStrategy` can be used to make identifier assignment possible by using vertex and edge indices
under the hood.
[gremlin-groovy]
----
graph = Neo4jGraph.open('/tmp/neo4j')
strategy = ElementIdStrategy.build().create()
g = traversal().withEmbedded(graph).withStrategies(strategy)
g.addV().property(id, '42a').id()
----
IMPORTANT: The key that is used to store the assigned identifier should be indexed in the underlying graph
database. If it is not indexed, then lookups for the elements that use these identifiers will perform a linear scan.
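The cost difference mentioned in the note can be sketched without a graph database. In the JDK-only example below a `HashMap` stands in for the index; the class `IdIndexSketch` and the property key `customId` are hypothetical and chosen purely for illustration.

[source,java]
----
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// JDK-only illustration: looking up an assigned identifier with and without an index.
final class IdIndexSketch {

    private final List<Map<String, Object>> elements = new ArrayList<>();
    private final Map<Object, Map<String, Object>> index = new HashMap<>();
    int comparisons = 0; // number of elements inspected by linear scans

    // Stores the assigned identifier as an ordinary property and indexes it.
    void add(Object assignedId) {
        Map<String, Object> element = new HashMap<>();
        element.put("customId", assignedId);
        elements.add(element);
        index.put(assignedId, element);
    }

    // Without an index every element may have to be inspected.
    Map<String, Object> findByScan(Object assignedId) {
        for (Map<String, Object> element : elements) {
            comparisons++;
            if (assignedId.equals(element.get("customId")))
                return element;
        }
        return null;
    }

    // With an index the element is located directly.
    Map<String, Object> findByIndex(Object assignedId) {
        return index.get(assignedId);
    }

    public static void main(String[] args) {
        IdIndexSketch graph = new IdIndexSketch();
        for (int i = 0; i < 1000; i++)
            graph.add("id-" + i);
        Map<String, Object> scanned = graph.findByScan("id-999");
        System.out.println(graph.comparisons);                       // 1000
        System.out.println(scanned == graph.findByIndex("id-999"));  // true
    }
}
----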
=== EventStrategy
The purpose of the `EventStrategy` is to raise events to one or more `MutationListener` objects as changes to the
underlying `Graph` occur within a `Traversal`. Such a strategy is useful for logging changes, triggering certain
actions based on change, or any application that needs notification of some mutating operation during a `Traversal`.
If the transaction is rolled back, the event queue is reset.
The following events are raised to the `MutationListener`:
* New vertex
* New edge
* Vertex property changed
* Edge property changed
* Vertex property removed
* Edge property removed
* Vertex removed
* Edge removed
To start processing events from a `Traversal` first implement the `MutationListener` interface. An example of this
implementation is the `ConsoleMutationListener` which writes output to the console for each event. The following
console session displays the basic usage:
[gremlin-groovy]
----
import org.apache.tinkerpop.gremlin.process.traversal.step.util.event.*
graph = TinkerFactory.createModern()
l = new ConsoleMutationListener(graph)
strategy = EventStrategy.build().addListener(l).create()
g = traversal().withEmbedded(graph).withStrategies(strategy)
g.addV().property('name','stephen')
g.V().has('name','stephen').
property(list, 'location', 'centreville', 'startTime', 1990, 'endTime', 2000).
property(list, 'location', 'dulles', 'startTime', 2000, 'endTime', 2006).
property(list, 'location', 'purcellville', 'startTime', 2006)
g.V().has('name','stephen').
property(set, 'location', 'purcellville', 'startTime', 2006, 'endTime', 2019)
g.E().drop()
----
By default, the `EventStrategy` is configured with an `EventQueue` that raises events as they occur within execution
of a `Step`. As such, the final line of Gremlin execution that drops all edges shows a bit of an inconsistent count,
where the removed edge count is accounted for after the event is raised. The strategy can also be configured with a
`TransactionalEventQueue` that captures the changes within a transaction and does not allow them to fire until the
transaction is committed.
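The two queueing behaviors can be contrasted in a small JDK-only sketch. It is not the `EventStrategy` implementation: `EventQueueSketch` is a hypothetical class in which a `Consumer<String>` stands in for a `MutationListener` and events are plain strings.

[source,java]
----
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// JDK-only illustration of immediate versus transactional event delivery.
final class EventQueueSketch {

    private final List<String> pending = new ArrayList<>();
    private final Consumer<String> listener;
    private final boolean transactional;

    EventQueueSketch(Consumer<String> listener, boolean transactional) {
        this.listener = listener;
        this.transactional = transactional;
    }

    // Immediate mode notifies the listener right away; transactional mode buffers the event.
    void raise(String event) {
        if (transactional)
            pending.add(event);
        else
            listener.accept(event);
    }

    // On commit the buffered events are delivered in order.
    void commit() {
        pending.forEach(listener);
        pending.clear();
    }

    // On rollback the buffered events are discarded.
    void rollback() {
        pending.clear();
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        EventQueueSketch queue = new EventQueueSketch(received::add, true);
        queue.raise("vertex added");
        queue.rollback();
        queue.raise("edge removed");
        System.out.println(received); // []
        queue.commit();
        System.out.println(received); // [edge removed]
    }
}
----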
WARNING: `EventStrategy` is not meant for usage in tracking global mutations across separate processes. In other
words, a mutation in one JVM process is not raised as an event in a different JVM process. In addition, events are
not raised when mutations occur outside of the `Traversal` context.
Another default configuration for `EventStrategy` revolves around the concept of "detachment". Graph elements are
detached from the graph as copies when passed to referring mutation events. Therefore, when adding a new `Vertex` in
TinkerGraph, the event will not contain a `TinkerVertex` but will instead include a `DetachedVertex`. This behavior
can be modified with the `detach()` method on the `EventStrategy.Builder` which accepts the following inputs: `null`
meaning no detachment and the return of the original element, `DetachedFactory` which is the same as the default
behavior, and `ReferenceFactory` which will return "reference" elements only with no properties.
IMPORTANT: If setting the `detach()` configuration to `null`, be aware that transactional graphs will likely create a
new transaction immediately following the `commit()` that raises the events. The graph elements raised in the events
may also not behave as "snapshots" at the time of their creation as they are "live" references to actual database
elements.
[[partitionstrategy]]
=== PartitionStrategy
image::partition-graph.png[width=325]
`PartitionStrategy` partitions the vertices and edges of a graph into `String` named partitions (i.e. buckets,
subgraphs, etc.). The idea behind `PartitionStrategy` is presented in the image above where each element is in a
single partition (represented by its color). Partitions can be read from, written to, and linked/joined by edges
that span one or two partitions (e.g. a tail vertex in one partition and a head vertex in another).
There are three primary configurations in `PartitionStrategy`:
. Partition Key - The property key that denotes a String value representing a partition.
. Write Partition - A `String` denoting what partition all future written elements will be in.
. Read Partitions - A `Set<String>` of partitions that can be read from.
The best way to understand `PartitionStrategy` is via example.
[gremlin-groovy]
----
graph = TinkerFactory.createModern()
strategyA = new PartitionStrategy(partitionKey: "_partition", writePartition: "a", readPartitions: ["a"])
strategyB = new PartitionStrategy(partitionKey: "_partition", writePartition: "b", readPartitions: ["b"])
gA = traversal().withEmbedded(graph).withStrategies(strategyA)
gA.addV() // this vertex has a property of {_partition:"a"}
gB = traversal().withEmbedded(graph).withStrategies(strategyB)
gB.addV() // this vertex has a property of {_partition:"b"}
gA.V()
gB.V()
----
The following examples demonstrate the above `PartitionStrategy` definition for "strategyA" in other programming
languages:
[source,java,tab]
----
PartitionStrategy strategyA = PartitionStrategy.build().partitionKey("_partition")
.writePartition("a")
.readPartitions("a").create();
----
[source,csharp]
----
PartitionStrategy strategyA = new PartitionStrategy(
partitionKey: "_partition", writePartition: "a",
readPartitions: new List<string>(){"a"});
----
[source,javascript]
----
const strategyA = new PartitionStrategy({partitionKey: "_partition", writePartition: "a", readPartitions: ["a"]});
----
[source,python]
----
strategyA = PartitionStrategy(partitionKey="_partition", writePartition="a", readPartitions=["a"])
----
Partitions may also extend to `VertexProperty` elements if the `Graph` can support meta-properties and if the
`includeMetaProperties` value is set to `true` when the `PartitionStrategy` is built. The `partitionKey` will be
stored in the meta-properties of the `VertexProperty` and blind the traversal to those properties. Please note that
the `VertexProperty` will only be hidden by way of the `Traversal` itself. For example, calling `Vertex.property(k)`
bypasses the context of the `PartitionStrategy` and will thus allow all properties to be accessed.
By writing elements to particular partitions and then restricting read partitions, the developer is able to create
multiple graphs within a single address space. Moreover, by supporting references between partitions, it is possible
to merge those multiple graphs (i.e. join partitions).
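The interplay of the partition key, write partition and read partitions can be sketched with plain collections. The example below is a JDK-only illustration and not the `PartitionStrategy` implementation: `PartitionSketch` is a hypothetical class and vertices are simple maps sharing one store.

[source,java]
----
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// JDK-only illustration: several partitioned views over one shared store.
final class PartitionSketch {

    private final List<Map<String, String>> store;
    private final String partitionKey;
    private final String writePartition;
    private final Set<String> readPartitions;

    PartitionSketch(List<Map<String, String>> store, String partitionKey,
                    String writePartition, Set<String> readPartitions) {
        this.store = store;
        this.partitionKey = partitionKey;
        this.writePartition = writePartition;
        this.readPartitions = new HashSet<>(readPartitions);
    }

    // Every written vertex is tagged with the write partition.
    void addV(String name) {
        Map<String, String> vertex = new HashMap<>();
        vertex.put("name", name);
        vertex.put(partitionKey, writePartition);
        store.add(vertex);
    }

    // Only vertices whose partition is among the read partitions are visible.
    List<String> names() {
        return store.stream()
                .filter(vertex -> readPartitions.contains(vertex.get(partitionKey)))
                .map(vertex -> vertex.get("name"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> store = new ArrayList<>();
        PartitionSketch gA = new PartitionSketch(store, "_partition", "a", Set.of("a"));
        PartitionSketch gB = new PartitionSketch(store, "_partition", "b", Set.of("b"));
        PartitionSketch gAB = new PartitionSketch(store, "_partition", "a", Set.of("a", "b"));
        gA.addV("x");
        gB.addV("y");
        System.out.println(gA.names());  // [x]
        System.out.println(gB.names());  // [y]
        System.out.println(gAB.names()); // [x, y]
    }
}
----

Widening the read partitions, as `gAB` does, is what joins the otherwise separate views.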
[[readonlystrategy]]
=== ReadOnlyStrategy
`ReadOnlyStrategy` is largely self-explanatory. A `Traversal` that has this strategy applied will throw a
`VerificationException` (a kind of `IllegalStateException`) if the `Traversal` has any mutating steps within it.
[source,java,tab]
----
ReadOnlyStrategy verificationStrategy = ReadOnlyStrategy.instance();
// results in VerificationException
g.withStrategies(verificationStrategy).addV("person").iterate();
----
[source,groovy]
----
// results in VerificationException
g.withStrategies(ReadOnlyStrategy).addV('person').iterate();
----
[source,csharp]
----
// results in VerificationException
g.WithStrategies(new ReadOnlyStrategy()).AddV("person").Iterate();
----
[source,javascript]
----
// results in Error
g.withStrategies(new ReadOnlyStrategy()).addV("person").iterate();
----
[source,python]
----
# results in Error
g.withStrategies(ReadOnlyStrategy).addV("person").iterate()
----
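Conceptually, a verification strategy of this kind inspects the traversal before it runs and rejects it if a mutating step is present. The JDK-only sketch below shows that idea on a list of step names; it is an assumption-laden simplification rather than the actual `ReadOnlyStrategy`, and both `ReadOnlySketch` and its hard-coded set of mutating step names are hypothetical.

[source,java]
----
import java.util.List;
import java.util.Set;

// JDK-only illustration of a read-only verification pass over step names.
final class ReadOnlySketch {

    // Hypothetical list of step names treated as mutating in this sketch.
    private static final Set<String> MUTATING = Set.of("addV", "addE", "property", "drop");

    static boolean isReadOnly(List<String> steps) {
        return steps.stream().noneMatch(MUTATING::contains);
    }

    // Rejects the traversal up front, before any step would be executed.
    static void verify(List<String> steps) {
        if (!isReadOnly(steps))
            throw new IllegalStateException("The traversal has a mutating step and is not read-only: " + steps);
    }

    public static void main(String[] args) {
        verify(List.of("V", "out", "values")); // passes silently
        try {
            verify(List.of("addV"));
        } catch (IllegalStateException e) {
            System.out.println("rejected");    // rejected
        }
    }
}
----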
=== ReservedKeysVerificationStrategy
`ReservedKeysVerificationStrategy` prevents traversals from adding property keys that are protected, providing the
option to throw an exception, log a warning or do both when one of these keys is encountered in a mutating step. By
default "id" and "label" are considered "reserved" but the default can be changed by building the strategy with the
`reservedKeys()` option and supplying a `Set` of keys that should trigger the `VerificationException`.
[source,java,tab]
----
ReservedKeysVerificationStrategy verificationStrategy = ReservedKeysVerificationStrategy.build()
    .throwException().create();
// results in VerificationException
g.withStrategies(verificationStrategy).addV("person").property("id",123).iterate();
----
[source,groovy]
----
// results in VerificationException
g.withStrategies(new ReservedKeysVerificationStrategy(throwException: true))
.addV('person').property("id",123).iterate()
----
[source,csharp]
----
// results in VerificationException
g.WithStrategies(new ReservedKeysVerificationStrategy(throwException: true))
.AddV("person").Property("id",123).Iterate();
----
[source,javascript]
----
// results in Error
g.withStrategies(new ReservedKeysVerificationStrategy(throwException: true))
.addV('person').property("id",123).iterate();
----
[source,python]
----
# results in Error
g.withStrategies(ReservedKeysVerificationStrategy(throwException=True)).addV('person').property("id",123).iterate()
----
=== SeedStrategy
There are a number of components of the Gremlin language that, by design, can produce non-deterministic results:
* <<coin-step,coin()>>
* <<order-step,order()>> when `Order.shuffle` is used
* <<sample-step,sample()>>
To get these steps to return deterministic results, `SeedStrategy` allows assignment of a seed value to the `Random`
operations of the steps. The following example demonstrates the random nature of `shuffle`:
[gremlin-groovy,modern]
----
g.V().values('name').fold().order(local).by(shuffle)
g.V().values('name').fold().order(local).by(shuffle)
g.V().values('name').fold().order(local).by(shuffle)
g.V().values('name').fold().order(local).by(shuffle)
g.V().values('name').fold().order(local).by(shuffle)
----
With `SeedStrategy` in place, however, the same order is applied each time:
[gremlin-groovy,modern]
----
seedStrategy = new SeedStrategy(999998L)
g.withStrategies(seedStrategy).V().values('name').fold().order(local).by(shuffle)
g.withStrategies(seedStrategy).V().values('name').fold().order(local).by(shuffle)
g.withStrategies(seedStrategy).V().values('name').fold().order(local).by(shuffle)
g.withStrategies(seedStrategy).V().values('name').fold().order(local).by(shuffle)
g.withStrategies(seedStrategy).V().values('name').fold().order(local).by(shuffle)
----
IMPORTANT: `SeedStrategy` only makes specific steps behave in a deterministic fashion and does not necessarily make
the entire traversal deterministic itself. If the underlying graph database or processing engine happens to not
guarantee iteration order, then it is possible that the final result of the traversal will appear to be
non-deterministic. In these cases, it would be necessary to enforce a deterministic iteration with `order()` prior to
these steps that make use of randomness to return results.
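The interaction between seeding and iteration order can be seen without a graph at all. The following plain Python sketch is only an analogy: it uses Python's `random.Random` rather than TinkerPop, and `seeded_shuffle` is a hypothetical helper. A fixed seed makes the shuffle repeatable for a given input order, but a different input order can still produce a different result unless the input is sorted first.
[source,python]
----
import random
def seeded_shuffle(items, seed=999998):
    """Shuffle a copy of `items` using a Random created from a fixed seed."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled
names = ["marko", "vadas", "lop", "josh", "ripple", "peter"]
reordered = list(reversed(names))  # same elements, different iteration order
# same seed and same input order: the result repeats
first = seeded_shuffle(names)
second = seeded_shuffle(names)
# same seed but a different input order: the result differs here
unstable = seeded_shuffle(reordered)
# sorting first removes the dependence on the incoming iteration order
stable_a = seeded_shuffle(sorted(names))
stable_b = seeded_shuffle(sorted(reordered))
----
Here `first == second` and `stable_a == stable_b` hold, whereas `unstable` is not equal to `first` even though the seed is identical.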
[[subraphstrategy]]
=== SubgraphStrategy
`SubgraphStrategy` is similar to `PartitionStrategy` in that it constrains a `Traversal` to certain vertices, edges,
and vertex properties as determined by a `Traversal`-based criterion defined individually for each.
[gremlin-groovy]
----
graph = TinkerFactory.createTheCrew()
g = traversal().withEmbedded(graph)
g.V().as('a').values('location').as('b'). <1>
select('a','b').by('name').by()
g = g.withStrategies(new SubgraphStrategy(vertexProperties: hasNot('endTime'))) <2>
g.V().as('a').values('location').as('b'). <3>
select('a','b').by('name').by()
g.V().as('a').values('location').as('b').
select('a','b').by('name').by().explain()
----
<1> Get all vertices and their vertex property locations.
<2> Create a `SubgraphStrategy` where vertex properties must not have an `endTime`-property (thus, the current location).
<3> Get all vertices and their current vertex property locations.
The following examples demonstrate the above `SubgraphStrategy` definition in other programming languages:
[source,java,tab]
----
g.withStrategies(SubgraphStrategy.build().vertexProperties(hasNot("endTime")).create());
----
[source,csharp]
----
g.WithStrategies(new SubgraphStrategy(vertexProperties: HasNot("endTime")));
----
[source,javascript]
----
g.withStrategies(new SubgraphStrategy({vertexProperties: hasNot("endTime")}));
----
[source,python]
----
g.withStrategies(SubgraphStrategy(vertexProperties=hasNot("endTime")))
----
IMPORTANT: This strategy is implemented such that the vertices attached to an `Edge` must both satisfy the vertex
criterion (if present) in order for the `Edge` to be considered a part of the subgraph.
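This rule can be illustrated with a small Python sketch over plain dictionaries. It is a conceptual model rather than TinkerPop code: the `edge_in_subgraph` helper, the two criteria and the sample data are hypothetical.
[source,python]
----
def edge_in_subgraph(edge, vertices, vertex_ok, edge_ok):
    """An edge belongs to the subgraph only if it passes the edge criterion
    and both of its adjacent vertices pass the vertex criterion."""
    return (edge_ok(edge)
            and vertex_ok(vertices[edge["out"]])
            and vertex_ok(vertices[edge["in"]]))
vertices = {
    1: {"label": "person", "name": "marko"},
    2: {"label": "person", "name": "stephen"},
    3: {"label": "software", "name": "gremlin"},
}
edges = [
    {"label": "develops", "out": 1, "in": 3},
    {"label": "develops", "out": 2, "in": 3},
    {"label": "uses", "out": 1, "in": 3},
]
# hypothetical criteria: exclude the vertex named "stephen", keep "develops" edges
vertex_ok = lambda v: v["name"] != "stephen"
edge_ok = lambda e: e["label"] == "develops"
kept = [e for e in edges if edge_in_subgraph(e, vertices, vertex_ok, edge_ok)]
----
Only the `develops` edge from vertex 1 survives: the second edge is dropped because one of its vertices fails the vertex criterion, and the third because it fails the edge criterion.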
The example below uses all three filters: vertex, edge, and vertex property. Person vertices must have lived in more
than three places, edges must be labeled "develops", and vertex properties must be the person's current location or a
non-location property.
[gremlin-groovy]
----
graph = TinkerFactory.createTheCrew()
g = traversal().withEmbedded(graph).withStrategies(SubgraphStrategy.build().
vertices(or(hasNot('location'),properties('location').count().is(gt(3)))).
edges(hasLabel('develops')).
vertexProperties(or(hasLabel(neq('location')),hasNot('endTime'))).create())
g.V().elementMap()
g.E().elementMap()
g.V().outE().inV().
path().
by('name').
by().
by('name')
----
=== VertexProgramDenyStrategy
Like the `ReadOnlyStrategy`, the `VertexProgramDenyStrategy` denies the execution of specific traversals. A `Traversal`
that has the `VertexProgramDenyStrategy` applied will throw an `IllegalStateException` if it uses the
`withComputer()` step. This `TraversalStrategy` can be useful for configuring `GraphTraversalSource` instances in
Gremlin Server with the `ScriptFileGremlinPlugin`.
[source,text]
----
gremlin> oltpOnly = g.withStrategies(VertexProgramDenyStrategy.instance())
==>graphtraversalsource[tinkergraph[vertices:5 edges:7], standard]
gremlin> oltpOnly.withComputer().V().elementMap()
The TraversalSource does not allow the use of a GraphComputer
Type ':help' or ':h' for help.
Display stack trace? [yN]
----
[[dsl]]
== Domain Specific Languages
Gremlin is a link:http://en.wikipedia.org/wiki/Domain-specific_language[domain specific language] (DSL) for traversing
graphs. It operates in the language of vertices, edges and properties. Typically, applications built with Gremlin are
not of the graph domain, but instead model their domain within a graph. For example, the
link:https://tinkerpop.apache.org/docs/current/images/tinkerpop-modern.png["modern" toy graph] models
software and person domain objects with the relationships between them (i.e. a person "knows" another person and a
person "created" software).
An analyst who wanted to find out if "marko" knows "josh" could write the following Gremlin:
[source,java]
----
g.V().hasLabel('person').has('name','marko').
out('knows').hasLabel('person').has('name','josh').hasNext()
----
While this method achieves the desired answer, it requires the analyst to traverse the graph in the domain language
of the graph rather than the domain language of the social network. A more natural way for the analyst to write this
traversal might be:
[source,java]
----
g.persons('marko').knows('josh').hasNext()
----
In the statement above, the traversal is written in the language of the domain, abstracting away the underlying
graph structure from the query. The two traversal results are equivalent and, indeed, the "Social DSL" produces
the same set of traversal steps as the "Graph DSL" thus producing equivalent strategy application and performance
runtimes.
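To make the equivalence concrete, here is a toy Python sketch in which DSL methods simply expand into the underlying graph steps. The `ToyTraversal` and `SocialTraversal` classes are hypothetical and only record step names; they are not how a TinkerPop DSL is built, which is language specific.
[source,python]
----
class ToyTraversal:
    """Records steps as (name, args) tuples instead of executing them."""
    def __init__(self, steps=()):
        self.steps = list(steps)
    def _add(self, name, *args):
        # return a new traversal of the same class with one more step
        return type(self)(self.steps + [(name, args)])
    def V(self):
        return self._add("V")
    def hasLabel(self, label):
        return self._add("hasLabel", label)
    def has(self, key, value):
        return self._add("has", key, value)
    def out(self, label):
        return self._add("out", label)
class SocialTraversal(ToyTraversal):
    """Social DSL steps are shorthand for sequences of graph steps."""
    def persons(self, name):
        return self.V().hasLabel("person").has("name", name)
    def knows(self, name):
        return self.out("knows").hasLabel("person").has("name", name)
graph_dsl = (ToyTraversal().V().hasLabel("person").has("name", "marko")
             .out("knows").hasLabel("person").has("name", "josh"))
social_dsl = SocialTraversal().persons("marko").knows("josh")
----
Both objects end up with the same recorded list of six steps, which is the property that lets the two forms be treated identically by anything that operates on the steps.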
To further the example of the Social DSL consider the following:
[source,java]
----
// Graph DSL - find the number of persons who created at least 2 projects
g.V().hasLabel('person').
where(outE("created").count().is(P.gte(2))).count()
// Social DSL - find the number of persons who created at least 2 projects
social.persons().where(createdAtLeast(2)).count()
// Graph DSL - determine the age of the youngest friend "marko" has
g.V().hasLabel('person').has('name','marko').
out("knows").hasLabel("person").values("age").min()
// Social DSL - determine the age of the youngest friend "marko" has
social.persons("marko").youngestFriendsAge()
----
Learn more about how to implement these DSLs in the <<gremlin-drivers-variants,Gremlin Language Variants>> section
specific to the programming language of interest.
[[translators]]
== Translators
image::gremlin-translator.png[width=1024]
There are times when it is helpful to translate Gremlin from one programming language to another. Perhaps a large Gremlin
example is found on StackOverflow written in Java, but the programming language the developer has chosen is Python.
Fortunately, TinkerPop has developed `Translator` infrastructure that will convert Gremlin from one programming
language syntax to another.
The functionality relevant to most users is a sub-function of the `Translator` infrastructure: the `ScriptTranslator`,
which takes the Gremlin `Bytecode` of a traversal and generates a `String` representation of that `Bytecode` in the
programming language syntax that the `ScriptTranslator` instance supports. The translation therefore allows Gremlin
to be converted from the host programming language of the `Translator` to another.
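The general idea of turning step instructions into a script can be sketched in a few lines of Python. This is a deliberately simplified model and not the `Translator` API: the list-of-pairs "bytecode" and the `translate` function are hypothetical, and string arguments are not escaped.
[source,python]
----
# toy "bytecode": an ordered list of (step_name, arguments) pairs
bytecode = [("V", []), ("has", ["person", "name", "marko"]), ("values", ["age"])]
def translate(source_name, steps, quote='"'):
    """Render toy bytecode as a script string using the given quote character."""
    def render(arg):
        # strings are quoted for the target syntax, everything else is printed as-is
        if isinstance(arg, str):
            return quote + arg + quote
        return str(arg)
    script = source_name
    for name, args in steps:
        script += "." + name + "(" + ",".join(render(a) for a in args) + ")"
    return script
double_quoted = translate("g", bytecode)
single_quoted = translate("g", bytecode, quote="'")
----
The same step list yields `g.V().has("person","name","marko").values("age")` in one target style and `g.V().has('person','name','marko').values('age')` in the other, which is the essence of generating different language syntaxes from one language-neutral representation.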
The following translators are available, where the first column identifies the host programming language and the
remaining columns represent the languages that Gremlin can be generated in:
[width="100%",cols="<,^,^,^,^,^",options="header"]
|=========================================================
| |Java |Groovy |Javascript |.NET |Python
|*Java* |- |X |X |X |X
|*Groovy* | |X |X | |X
|*Javascript* | |X |- | |
|*.NET* | |X | |- |
|*Python* | |X | | |-
|=========================================================
Each programming language has its own API for translation, but the pattern is quite similar from one to the next:
WARNING: While `Translator` implementations have been around for some time, they are still in their early stages from
an interface perspective. API changes may occur in the near future.
[source,java,tab]
----
// gremlin-core module
import org.apache.tinkerpop.gremlin.process.traversal.translator.*;
GraphTraversalSource g = ...;
Traversal<Vertex,Integer> t = g.V().has("person","name","marko").
where(in("knows")).
values("age").
map(Lambda.function("it.get() + 1"));
Translator.ScriptTranslator groovyTranslator = GroovyTranslator.of("g");
System.out.println(groovyTranslator.translate(t).getScript());
// OUTPUT: g.V().has("person","name","marko").where(__.in("knows")).values("age").map({it.get() + 1})
Translator.ScriptTranslator dotnetTranslator = DotNetTranslator.of("g");
System.out.println(dotnetTranslator.translate(t).getScript());
// OUTPUT: g.V().Has("person","name","marko").Where(__.In("knows")).Values<object>("age").Map<object>(Lambda.Groovy("it.get() + 1"))
Translator.ScriptTranslator pythonTranslator = PythonTranslator.of("g");
System.out.println(pythonTranslator.translate(t).getScript());
// OUTPUT: g.V().has('person','name','marko').where(__.in_('knows')).age.map(lambda: "it.get() + 1")
Translator.ScriptTranslator javascriptTranslator = JavascriptTranslator.of("g");
System.out.println(javascriptTranslator.translate(t).getScript());
// OUTPUT: g.V().has("person","name","marko").where(__.in_("knows")).values("age").map(() => "it.get() + 1")
----
[source,javascript]
----
const g = ...;
const t = g.V().has("person","name","marko").
where(in_("knows")).
values("age");
// Groovy
const translator = new gremlin.process.Translator('g');
console.log(translator.translate(t));
// OUTPUT: g.V().has('person','name','marko').where(__.in('knows')).values('age')
----
[source,python]
----
from gremlin_python.process.translator import *
g = ...
t = (g.V().has('person','name','marko').
where(__.in_("knows")).
values("age"))
# Groovy
translator = Translator().of('g')
print(translator.translate(t.bytecode))
# OUTPUT: g.V().has('person','name','marko').where(__.in('knows')).values('age')
----
[source,csharp]
----
var g = ...;
var t = g.V().Has("person", "name", "marko").Where(In("knows")).Values<int>("age");
// Groovy
var translator = GroovyTranslator.Of("g");
Console.WriteLine(translator.Translate(t));
// OUTPUT: g.V().has('person', 'name', 'marko').where(__.in('knows')).values('age')
----
The JVM-based translator has the added option of parameter extraction, where the translation process will attempt to
identify opportunities to generate an output that would replace constant values with parameters. The parameters would
then be extracted and returned as part of the `Script` object:
[source,java]
----
Traversal<Vertex,Integer> t = g.V().has("person","name","marko").
where(__.in("knows")).
values("age");
// specify true to attempt parameter extraction
Translator.ScriptTranslator translator = GroovyTranslator.of("g", true);
Script s = translator.translate(t);
System.out.println(s.getScript());
// OUTPUT: g.V().has(_args_0,_args_1,_args_2).where(__.in(_args_3)).values(_args_4)
System.out.println(s.getParameters());
// OUTPUT: Optional[{_args_0=person, _args_2=marko, _args_1=name, _args_4=age, _args_3=knows}]
----
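The extraction idea itself is simple enough to sketch in Python. The function below is a hypothetical illustration and not the `GroovyTranslator` algorithm: it borrows the `_args_N` naming from the output shown above, assumes all constants are hashable, and chooses to reuse one placeholder for a repeated value.
[source,python]
----
def translate_with_parameters(source_name, steps):
    """Replace constant arguments with `_args_N` placeholders and collect the
    extracted values, reusing a placeholder when a value repeats."""
    parameters = {}    # placeholder -> value
    placeholders = {}  # value -> placeholder
    script = source_name
    for name, args in steps:
        rendered = []
        for arg in args:
            if arg not in placeholders:
                placeholder = "_args_%d" % len(placeholders)
                placeholders[arg] = placeholder
                parameters[placeholder] = arg
            rendered.append(placeholders[arg])
        script += "." + name + "(" + ",".join(rendered) + ")"
    return script, parameters
# toy step list: (step_name, arguments) pairs
steps = [("V", []), ("has", ["person", "name", "marko"]), ("values", ["name"])]
script, parameters = translate_with_parameters("g", steps)
----
For this input, `script` is `g.V().has(_args_0,_args_1,_args_2).values(_args_1)` and `parameters` maps `_args_0`, `_args_1` and `_args_2` to `person`, `name` and `marko`. Keeping constants out of the script text means the script string stays the same when only the values change.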
The `GroovyTranslator` can take a `TypeTranslator` argument which allows some customization of how types get
converted to script form. The `DefaultTypeTranslator` is used if a specific implementation is not specified. A built-in
alternative to this implementation is the `LanguageTypeTranslator` which will prefer use of the Gremlin language
`datetime()` function rather than the JVM specific `Date` and `Timestamp` conversions. This translator can be helpful
when generating scripts that will be sent to Gremlin Server or Remote Graph Providers supporting the `datetime()` form.
The `PythonTranslator` can take a `TypeTranslator` argument to disable the syntactic sugar which the default translator
applies to converted queries. The `DefaultTypeTranslator` is used if a specific implementation is not specified.
[source,java]
----
Traversal<Vertex,String> t = g.V().range(0, 10).has("person","name","marko").
limit(2).
values("name");
// default translator
Translator.ScriptTranslator translator = PythonTranslator.of("g");
String defaultQueryTranslation = translator.translate(t).getScript();
System.out.println(defaultQueryTranslation);
// OUTPUT: g.V()[0:10].has('person','name','marko')[0:2].name
// no syntactic sugar translator
Translator.ScriptTranslator noSugarTranslator = PythonTranslator.of("g", new PythonTranslator.NoSugarTranslator(false));
String noSugarTranslation = noSugarTranslator.translate(t).getScript();
System.out.println(noSugarTranslation);
// OUTPUT: g.V().range_(0,10).has('person','name','marko').limit(2).values('name')
// With parameter extraction
Translator.ScriptTranslator noSugarTranslatorWithParameters = PythonTranslator.of("g", new PythonTranslator.NoSugarTranslator(true));
String noSugarTranslationWithParameters = noSugarTranslatorWithParameters.translate(t).getScript();
System.out.println(noSugarTranslationWithParameters);
// OUTPUT: g.V().range_(0,10).has(_args_0,_args_1,_args_2).limit(2).values(_args_1)
----