blob: ebc0e9b3e50cb112766097d835abb13aa64a03ba [file] [log] [blame]
////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
[[intro]]
= Introduction
Welcome to the Reference Documentation for Apache TinkerPop™ - the backbone for all details on how to work with
TinkerPop and the Gremlin graph traversal language. This documentation is not meant to be a "book", but a source
from which to spawn more detailed accounts of specific topics and a target to which all other resources point.
The Reference Documentation makes some general assumptions about the reader:
1. They have a sense of what a graph is - not sure? see link:http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#whygraph[Practical Gremlin - Why Graph?]
1. They know what it means for a graph system to be TinkerPop-enabled - not sure? see link:https://tinkerpop.apache.org/providers.html[TinkerPop-enabled Providers]
1. They know what the role of Gremlin is - not sure? see link:link:https://tinkerpop.apache.org/gremlin.html[Introduction to Gremlin]
Given those assumptions, it's possible to dive more quickly into the details without spending a lot of time repeating
what is written elsewhere.
It is fairly certain that readers of the Reference Documentation are coming from the most diverse software development
backgrounds that TinkerPop has ever engaged in over the decade or so of its existence. While TinkerPop holds some roots
in Java, and thus, languages bound to the Java Virtual Machine (JVM), it long ago branched out into other languages
such as Python, Javascript, .NET, and others. To compound upon that diversity, it is also seeing extensive support
from different graph systems which have chosen TinkerPop as their standard method for allowing users to interface
with their graph. Moreover, the graph systems themselves are not only separated by OLTP and OLAP style workloads, but
also by their implementation patterns, which range everywhere from being an embedded graph system to a cloud-only
graph. One might even find diversity parallel to Gremlin if considering other graph query languages.
image::gremlin-reference.png[width=1024]
Despite all this diversity and disparity, Gremlin remains the unifying interface for all these different elements of
the graph community. As a user, choosing a TinkerPop-enabled graph and using Gremlin in the correct way when building
applications shields them from change and disparity in the space. As a graph provider, choosing to become
TinkerPop-enabled not only expands the reach their system can get into different development ecosystems, but also
provides access to other query languages through bytecode compilation as seen in <<sparql-gremlin,sparql-gremlin>>.
Irrespective of the programming language being used, graph system chosen or other development background that might
be driving a user to this documentation, the critical point to remember is that "Gremlin is Gremlin is Gremlin". The
same Gremlin that is written for an OLTP query over an in-memory TinkerGraph is the same Gremlin that is written to
execute over a multi-billion edge graph using OLAP through Spark. That same Gremlin for either of those cases is
written in the same way whether using Java or Python or Javascript. The Gremlin is always fundamentally the same
aside from syntactical differences that might be language specific - e.g. the construction of a lambda in Groovy is
different than the construction of a lambda in Python or a reserved word in Javascript forces a Gremlin step to have
slightly different naming than Java.
While learning the Gremlin language and its patterns is largely agnostic to all the diversity in the space, it is not
really possible to ignore the impact of the diversity from an application development perspective and the Reference
Documentation makes an effort to try to point out where differences and inconsistencies might lie without diving too
deeply into specific graph provider implementations. Users are strongly encouraged to consult the documentation of
their chosen graph provider to understand all of the capabilities and limitations that may restrict or inhibit usage
of certain aspects of TinkerPop APIs which are defined here in this Reference Documentation.
The following introductory sections and separately referenced content will be of varying interest to different readers.
The summaries below will hopefully be helpful in directing individuals to the appropriate place to start their
learning process.
* <<graph-computing,Graph Computing>> is an introduction to what "graph computing" means to TinkerPop and describes
many of the provider and user-facing TinkerPop APIs and concepts that enable Gremlin.
* <<connecting-gremlin,Connecting Gremlin>> provides descriptions for the different modes by which users will connect
to graphs depending on their environment.
* <<basic-gremlin, Basic Gremlin>> describes how to use a connection to start writing Gremlin.
* <<staying-agnostic, Staying Agnostic>> provides tips on ways to keep Gremlin as portable as possible among different
graph providers.
New users should not ignore TinkerPop's link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/getting-started/[Getting Started]
tutorial or link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/the-gremlin-console/[The Gremlin Console] tutorial.
Both contain a large set of basic information and tips that can help readers avoid some general pitfalls early on.
Both also focus on Gremlin usage in the Gremlin Console, which tends to be a critical tool for Gremlin developers of
any development background.
More advanced and experience users will appreciate link:https://tinkerpop.apache.org/docs/x.y.z/recipes/[Gremlin Recipes]
which provide examples of common Gremlin traversal patterns.
Finally, all Gremlin developers should become familiar with
link:http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html["Practical Gremlin"] by Kelvin Lawrence. This book is
freely available and published online. It contains great examples and details that are applicable to anyone building
applications with Gremlin.
[[graph-computing]]
== Graph Computing
image::graph-computing.png[width=350]
A link:http://en.wikipedia.org/wiki/Graph_(data_structure)[graph] is a data structure composed of vertices (nodes,
dots) and edges (arcs, lines). When modeling a graph in a computer and applying it to modern data sets and practices,
the generic mathematically-oriented, binary graph is extended to support both labels and key/value properties. This
structure is known as a property graph. More formally, it is a directed, binary, attributed multi-graph. An example
property graph is diagrammed below.
[[tinkerpop-modern]]
.TinkerPop Modern
image::tinkerpop-modern.png[width=500]
TIP: Get to know this graph structure as it is used extensively throughout the documentation and in wider circles as
well. It is referred to as "TinkerPop Modern" as it is a modern variation of the original demo graph distributed with
TinkerPop0 back in 2009 (i.e. the good ol' days -- it was the best of times and it was the worst of times).
TIP: All of the toy graphs available in TinkerPop are described in
link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/the-gremlin-console/#toy-graphs[The Gremlin Console] tutorial.
Similar to computing in general, graph computing makes a distinction between *structure* (graph) and *process*
(traversal). The structure of the graph is the data model defined by a vertex/edge/property
link:http://en.wikipedia.org/wiki/Network_topology[topology]. The process of the graph is the means by which the
structure is analyzed. The typical form of graph processing is called a
link:http://en.wikipedia.org/wiki/Graph_traversal[traversal].
image:tinkerpop-enabled.png[width=135,float=left] TinkerPop's role in graph computing is to provide the appropriate
interfaces for link:https://tinkerpop.apache.org/providers.html[graph providers] and users to interact with graphs over
their structure and process. When a graph system implements the TinkerPop structure and process
link:http://en.wikipedia.org/wiki/Application_programming_interface[APIs], their technology is considered
_TinkerPop-enabled_ and becomes nearly indistinguishable from any other TinkerPop-enabled graph system save for their
respective time and space complexity. The purpose of this documentation is to describe the structure/process dichotomy
at length and in doing so, explain how to leverage TinkerPop for the sole purpose of graph system-agnostic graph
computing.
IMPORTANT: TinkerPop is licensed under the popular link:http://www.apache.org/licenses/LICENSE-2.0.html[Apache2]
free software license. However, note that the underlying graph engine used with TinkerPop may have a different
license. Thus, be sure to respect the license caveats of the graph system product.
Generally speaking, the structure or "graph" API is meant for link:https://tinkerpop.apache.org/providers.html[graph providers]
who are implementing the TinkerPop interfaces and the process or "traversal" API (i.e. Gremlin) is meant for end-users
who are utilizing a graph system from a graph provider. While the components of the process API are itemized below,
they are described in greater detail in the link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/gremlins-anatomy/[Gremlin's Anatomy]
tutorial.
.Primary components of the TinkerPop *structure* API
* `Graph`: maintains a set of vertices and edges, and access to database functions such as transactions.
* `Element`: maintains a collection of properties and a string label denoting the element type.
** `Vertex`: extends Element and maintains a set of incoming and outgoing edges.
** `Edge`: extends Element and maintains an incoming and outgoing vertex.
* `Property<V>`: a string key associated with a `V` value.
** `VertexProperty<V>`: a string key associated with a `V` value as well as a collection of `Property<U>` properties (*vertices only*)
.Primary components of the TinkerPop *process* API
* `TraversalSource`: a generator of traversals for a particular graph, link:http://en.wikipedia.org/wiki/Domain-specific_language[domain specific language] (DSL), and execution engine.
** `Traversal<S,E>`: a functional data flow process transforming objects of type `S` into object of type `E`.
*** `GraphTraversal`: a traversal DSL that is oriented towards the semantics of the raw graph (i.e. vertices, edges, etc.).
* `GraphComputer`: a system that processes the graph in parallel and potentially, distributed over a multi-machine cluster.
** `VertexProgram`: code executed at all vertices in a logically parallel manner with intercommunication via message passing.
** `MapReduce`: a computation that analyzes all vertices in the graph in parallel and yields a single reduced result.
NOTE: The TinkerPop API rides a fine line between providing concise "query language" method names and respecting
Java method naming standards. The general convention used throughout TinkerPop is that if a method is "user exposed,"
then a concise name is provided (e.g. `out()`, `path()`, `repeat()`). If the method is primarily for graph systems
providers, then the standard Java naming convention is followed (e.g. `getNextStep()`, `getSteps()`,
`getElementComputeKeys()`).
[[graph-structure]]
=== The Graph Structure
image:gremlin-standing.png[width=125,float=left] A graph's structure is the topology formed by the explicit references
between its vertices, edges, and properties. A vertex has incident edges. A vertex is adjacent to another vertex if
they share an incident edge. A property is attached to an element and an element has a set of properties. A property
is a key/value pair, where the key is always a character `String`. Conceptual knowledge of how a graph is composed is
essential to end-users working with graphs, however, as mentioned earlier, the structure API is not the appropriate
way for users to think when building applications with TinkerPop. The structure API is reserved for usage by graph
providers. Those interested in implementing the structure API to make their graph system TinkerPop enabled can learn
more about it in the link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/[Graph Provider] documentation.
[[the-graph-process]]
=== The Graph Process
image:gremlin-running.png[width=125,float=left] The primary way in which graphs are processed are via graph
traversals. The TinkerPop process API is focused on allowing users to create graph traversals in a
syntactically-friendly way over the structures defined in the previous section. A traversal is an algorithmic walk
across the elements of a graph according to the referential structure explicit within the graph data structure.
For example: _"What software does vertex 1's friends work on?"_ This English-statement can be represented in the
following algorithmic/traversal fashion:
. Start at vertex 1.
. Walk the incident knows-edges to the respective adjacent friend vertices of 1.
. Move from those friend-vertices to software-vertices via created-edges.
. Finally, select the name-property value of the current software-vertices.
Traversals in Gremlin are spawned from a `TraversalSource`. The `GraphTraversalSource` is the typical "graph-oriented"
DSL used throughout the documentation and will most likely be the most used DSL in a TinkerPop application.
`GraphTraversalSource` provides two traversal methods.
. `GraphTraversalSource.V(Object... ids)`: generates a traversal starting at vertices in the graph (if no ids are provided, all vertices).
. `GraphTraversalSource.E(Object... ids)`: generates a traversal starting at edges in the graph (if no ids are provided, all edges).
The return type of `V()` and `E()` is a `GraphTraversal`. A GraphTraversal maintains numerous methods that return
`GraphTraversal`. In this way, a `GraphTraversal` supports function composition. Each method of `GraphTraversal` is
called a step and each step modulates the results of the previous step in one of five general ways.
. `map`: transform the incoming traverser's object to another object (S &rarr; E).
. `flatMap`: transform the incoming traverser's object to an iterator of other objects (S &rarr; E*).
. `filter`: allow or disallow the traverser from proceeding to the next step (S &rarr; E &sube; S).
. `sideEffect`: allow the traverser to proceed unchanged, but yield some computational sideEffect in the process (S &rarrlp; S).
. `branch`: split the traverser and send each to an arbitrary location in the traversal (S &rarr; { S~1~ &rarr; E*, ..., S~n~ &rarr; E* } &rarr; E*).
Nearly every step in `GraphTraversal` either extends `MapStep`, `FlatMapStep`, `FilterStep`, `SideEffectStep`, or
`BranchStep`.
TIP: `GraphTraversal` is a link:http://en.wikipedia.org/wiki/Monoid[monoid] in that it is an algebraic structure
that has a single binary operation that is associative. The binary operation is function composition (i.e. method
chaining) and its identity is the step `identity()`. This is related to a
link:http://en.wikipedia.org/wiki/Monad_(functional_programming)[monad] as popularized by the functional programming
community.
Given the TinkerPop graph, the following query will return the names of all the people that the marko-vertex knows.
The following query is demonstrated using Gremlin-Groovy.
[source,groovy]
----
$ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
gremlin> graph = TinkerFactory.createModern() // <1>
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal() // <2>
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('name','marko').out('knows').values('name') // <3>
==>vadas
==>josh
----
<1> Open the toy graph and reference it by the variable `graph`.
<2> Create a graph traversal source from the graph using the standard, OLTP traversal engine. This object should be created once and then re-used.
<3> Spawn a traversal off the traversal source that determines the names of the people that the marko-vertex knows.
.The Name of The People That Marko Knows
image::tinkerpop-classic-ex1.png[width=500]
Or, if the marko-vertex is already realized with a direct reference pointer (i.e. a variable), then the traversal can
be spawned off that vertex.
[gremlin-groovy,modern]
----
marko = g.V().has('name','marko').next() <1>
g.V(marko).out('knows') <2>
g.V(marko).out('knows').values('name') <3>
----
<1> Set the variable `marko` to the vertex in the graph `g` named "marko".
<2> Get the vertices that are outgoing adjacent to the marko-vertex via knows-edges.
<3> Get the names of the marko-vertex's friends.
==== The Traverser
When a traversal is executed, the source of the traversal is on the left of the expression (e.g. vertex 1), the steps
are the middle of the traversal (e.g. `out('knows')` and `values('name')`), and the results are "traversal.next()'d"
out of the right of the traversal (e.g. "vadas" and "josh").
image::traversal-mechanics.png[width=500]
The objects propagating through the traversal are wrapped in a `Traverser<T>`. The traverser provides the means by
which steps remain stateless. A traverser maintains all the metadata about the traversal -- e.g., how many times the
traverser has gone through a loop, the path history of the traverser, the current object being traversed, etc.
Traverser metadata may be accessed by a step. A classic example is the <<path-step,`path()`>>-step.
[gremlin-groovy,modern]
----
g.V(marko).out('knows').values('name').path()
----
WARNING: Path calculation is costly in terms of space as an array of previously seen objects is stored in each path
of the respective traverser. Thus, a traversal strategy analyzes the traversal to determine if path metadata is
required. If not, then path calculations are turned off.
Another example is the <<repeat-step,`repeat()`>>-step which takes into account the number of times the traverser
has gone through a particular section of the traversal expression (i.e. a loop).
[gremlin-groovy,modern]
----
g.V(marko).repeat(out()).times(2).values('name')
----
WARNING: TinkerPop does not guarantee the order of results returned from a traversal. It only guarantees not to modify
the iteration order provided by the underlying graph. Therefore it is important to understand the order guarantees of
the graph database being used. A traversal's result is never ordered by TinkerPop unless performed explicitly by means
of <<order-step,`order()`>>-step.
[[connecting-gremlin]]
== Connecting Gremlin
It was established in the initial introductory section that _Gremlin is Gremlin is Gremlin_, meaning that irrespective
of programming language, graph system, etc. the Gremlin written is always of the same general construct making it
possible for users to move between development languages and TinkerPop-enabled graph technology easily. This quality
of Gremlin generally applies to the traversal language itself. It applies less to the way in which the user connects
to a graph to utilize Gremlin, which might differ considerably depending on the programming language or graph database
chosen.
How one connects to a graph is a multi-faceted subject that essentially divides along a simple lines determined by the
answer to this question: Where is the Gremlin Traversal Machine (GTM)? The reason that this question is so important is
because the GTM is responsible for processing traversals. One can write Gremlin traversals in any language, but without
a GTM there will be no way to execute that traversal against a TinkerPop-enabled graph. The GTM is typically in one
of the following places:
* <<connecting-embedded,Embedded>> in a Java application (i.e. Java Virtual Machine)
* <<connecting-gremlin-server,Hosted>> in <<gremlin-server,Gremlin Server>>
* <<connecting-rgp,Hosted>> by a Remote Gremlin Provider (RGP)
The following sections outline each of these models and what impact they have to using Gremlin.
[[connecting-embedded]]
=== Embedded
image:blueprints-character-1.png[width=125,float=left] TinkerPop maintains the reference implementation for the GTM,
which is written in Java and thus available for the Java Virtual Machine (JVM). This is the classic model that
TinkerPop has long been based on and many examples, blog posts and other resources on the internet will be
demonstrated in this style. It is worth noting that the embedded mode is not restricted to just Java as a programming
language. Any JVM language can take this approach and in some cases there are language specific wrappers that can help
make Gremlin more convenient to use in the style and capability of that language. Examples of these wrappers include
link:https://github.com/mpollmeier/gremlin-scala[gremlin-scala] and link:http://ogre.clojurewerkz.org/[Ogre] (for Clojure).
In this mode, users will start by creating a `Graph` instance, followed by a `GraphTraversalSource` which is the class
from which Gremlin traversals are spawned. Graphs that allow this sort of direct instantiation are obviously ones
that are JVM-based (or have a JVM-based connector) and directly implement TinkerPop interfaces.
[source,java]
Graph graph = TinkerGraph.open();
The "graph" then spawns a `GraphTraversalSource` as follows and typically, by convention, this variable is named "g":
[source,java]
----
GraphTraversalSource g = graph.traversal();
List<Vertex> vertices = g.V().toList()
----
NOTE: It may be helpful to read the link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/gremlins-anatomy/[Gremlin Anatomy]
tutorial, which describes the component parts of Gremlin to get a better understanding of the terminology before
proceeding further.
While the TinkerPop Community strives to ensure consistent behavior among all modes of usage, the embedded mode does
provide the greatest level of flexibility and control. There are a number of features that can only work if using a
JVM language. The following list outlines a number of these available options:
* Lambdas can be written in the native language which is convenient, however, it will reduce the portability of Gremlin
to do so should the need arise to switch away from the embedded mode. See more in the
<<a-note-on-lambdas,Note on Lambdas>> Section.
* Any features that involve extending TinkerPop Java interfaces - e.g. `VertexProgram`, `TraversalStrategy`, etc. are
bound to the JVM. In some cases, these features can be made accessible to non-JVM languages, but they obviously must
be initially developed for the JVM.
* Certain built-in `TraversalStrategy` implementations that rely on lambdas or other JVM-only configurations may not
be available for use any other way.
* There are no boundaries put in place by serialization (e.g. GraphSON) as embedded graphs are only dealing with
Java objects.
* Greater control of graph <<transactions,transactions>>.
* Direct access to lower-levels of the API - e.g. "structure" API methods like `Vertex` and `Edge` interface methods.
As mentioned <<graph-computing, elsewhere>> in this documentation, TinkerPop does not recommend direct usage of these
methods by end-users.
[[connecting-gremlin-server]]
=== Gremlin Server
image:rexster-character-3.png[width=125,float=left] A JVM-based graph may be hosted in TinkerPop's
<<gremlin-server,Gremlin Server>>. Gremlin Server exposes the graph as an endpoint to which different clients can
connect, essentially providing a remote GTM. Gremlin Server supports multiple methods for clients to interface with it:
* Websockets with a link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_graph_driver_provider_requirements[custom sub-protocol]
** String-based Gremlin scripts
** Bytecode-based Gremlin traversals
* HTTP for string-based scripts
Users are encouraged to use the bytecode-based approach with websockets because it allows them to write Gremlin
in the language of their choice. Connecting looks somewhat similar to the <<connecting-embedded, embedded>> approach
in that there is a need to create a `GraphTraversalSource`. In the embedded approach, the means for that object's
creation is derived from a `Graph` object which spawns it. In this case, however, the `Graph` instance exists only on
the server which means that there is no `Graph` instance to create locally. The approach is to instead create a
`GraphTraversalSource` anonymously with `AnonymousTraversalSource` and then apply some "remote" options that describe
the location of the Gremlin Server to connect to:
[source,java,tab]
----
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;
GraphTraversalSource g = traversal().withRemote('conf/remote-graph.properties');
----
[source,groovy]
----
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;
def g = traversal().withRemote('conf/remote-graph.properties')
----
[source,csharp]
----
include::../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/IntroTests.cs[tags=traversalSourceUsing]
include::../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/IntroTests.cs[tags=traversalSourceCreation]
----
[source,javascript]
----
const traversal = gremlin.process.AnonymousTraversalSource.traversal;
const g = traversal().withRemote(
new DriverRemoteConnection('ws://localhost:8182/gremlin'));
----
[source,python]
----
from gremlin_python.process.anonymous_traversal_source import traversal
g = traversal().withRemote(
DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
----
As shown in the embedded approach in the previous section, once "g" is defined, writing Gremlin is structurally and
conceptually the same irrespective of programming language.
TIP: The variable `g`, the `TraversalSource`, only needs to be instantiated once and should then be re-used.
[[connecting-gremlin-server-limitations]]
==== Limitations
The previous section on the embedded model outlined a number of areas where it has some advantages that it gains due to
the fact that the full GTM is available to the user in the language of its origin, i.e. Java. Some of those items
touch upon important concepts to focus on here.
The first of these points is serialization. When Gremlin Server receives a request, the results must be serialized to
the form requested by the client and then the client deserializes those into objects native to the language. TinkerPop
has two such formats that it uses with link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#gryo[Gryo] and
link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#graphson[GraphSON]. Gryo is a JVM-only format and thus carries the
advantage that serializing and deserializing occurs on the classes native to the JVM on both the client and server side.
As the client has full access to the same classes that the server does it basically has a full GTM on its own and
therefore has the ability to do some slightly more advanced things.
A good example is the `subgraph()`-step which returns a `Graph` instance as its result. The subgraph returned from
the server can be deserialized into an actual `Graph` instance on the client, which then means it is possible to
spawn a `GraphTraversalSource` from that to do local Gremlin traversals on the client-side. For non-JVM
<<gremlin-drivers-variants,Gremlin Language Variants>> there is no local graph to deserialize that result into and
no GTM to process Gremlin so there isn't much that can be done with such a result.
The second point is related to this issue. As there is no GTM, there is no "structure" API and thus graph elements like
`Vertex` and `Edge` are "references" only. A "reference" means that they only contain the `id` and `label` of the
element and not the properties. To be consistent, even JVM-based languages hold this limitation when talking to a
remote Gremlin Server.
IMPORTANT: Most SQL developers would not write a query as `SELECT * FROM table`. They would instead write the
individual names of the fields they wanted in place of the wildcard. Writing "good" Gremlin is no different with this
regard. Prefer explicit property key names in Gremlin unless it is completely impossible to do so.
The third and final point involves transactions. Under this model, one traversal is equivalent to a single transaction
and there is no way in TinkerPop to string together multiple traversals into the same transaction.
[[connecting-rgp]]
=== Remote Gremlin Provider
Remote Gremlin Providers (RGPs) are showing up more and more often in the graph database space. In TinkerPop terms,
this category of graph providers is defined by those who simply support the Gremlin language. Typically, these are
server-based graphs, often cloud-based, which accept Gremlin scripts or bytecode as a request and return results.
They will often implement Gremlin Server protocols, which enables TinkerPop drivers to connect to them as they would
with Gremlin Server. Therefore, the typical connection approach is identical to the method of connection presented in
the <<connecting-gremlin-server,previous section>> with the exact same caveats pointed out toward the end.
Despite leveraging TinkerPop protocols and drivers as being typical, RGPs are not required to do so to be considered
TinkerPop-enabled. RGPs may well have their own drivers and protocols that may plug into
<<gremlin-drivers-variants,Gremlin Language Variants> and may allow for more advanced options like better security,
cluster awareness, batched requests or other features. The details of these different systems are outside the scope
of this documentation, so be sure to consult their documentation for more information.
[[basic-gremlin]]
== Basic Gremlin
The `GraphTraversalSource` is basically the connection to a graph instance. That graph instance might be
<<connecting-embedded,embedded>>, hosted in <<connecting-gremlin-server,Gremlin Server>> or hosted in a
<<connecting-rgp,RGP>>, but the `GraphTraversalSource` is agnostic to that. Assuming "g" is the `GraphTraversalSource`,
getting data into the graph regardless of programming language or mode of operation is just some basic Gremlin:
[gremlin-groovy]
----
v1 = g.addV('person').property('name','marko').next()
v2 = g.addV('person').property('name','stephen').next()
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate()
----
[source,csharp]
----
include::../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/IntroTests.cs[tags=basicGremlinAdds]
----
[source,java]
----
Vertex v1 = g.addV("person").property("name","marko").next();
Vertex v2 = g.addV("person").property("name","stephen").next();
g.V(v1).addE("knows").to(v2).property("weight",0.75).iterate();
----
[source,javascript]
----
const v1 = g.addV('person').property('name','marko').next();
const v2 = g.addV('person').property('name','stephen').next();
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate();
----
[source,python]
----
v1 = g.addV('person').property('name','marko').next()
v2 = g.addV('person').property('name','stephen').next()
g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate()
----
The first two lines add a vertex each with the vertex label of "person" and the associated "name" property. The third
line adds an edge with the "knows" label between them and an associated "weight" property. Note the use of `next()`
and `iterate()` at the end of the lines - their effect as <<terminal-steps, terminal steps>> is described in
link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/the-gremlin-console/#result-iteration[The Gremlin Console Tutorial].
IMPORTANT: Writing Gremlin is just one way to load data into the graph. Some graphs may have special data loaders which
could be more efficient and make the task easier and faster. It is worth looking into those tools especially if there
is a large one-time load to do.
Retrieving this data is also a just writing a Gremlin statement:
[gremlin-groovy,existing]
----
marko = g.V().has('person','name','marko').next()
peopleMarkoKnows = g.V().has('person','name','marko').out('knows').toList()
----
[source,csharp]
----
include::../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/IntroTests.cs[tags=basicGremlinMarkoKnows]
----
[source,java]
----
Vertex marko = g.V().has("person","name","marko").next()
List<Vertex> peopleMarkoKnows = g.V().has("person","name","marko").out("knows").toList()
----
[source,javascript]
----
const marko = g.V().has('person','name','marko').next()
const peopleMarkoKnows = g.V().has('person','name','marko').out('knows').toList()
----
[source,python]
----
marko = g.V().has('person','name','marko').next()
peopleMarkoKnows = g.V().has('person','name','marko').out('knows').toList()
----
In all these examples presented so far there really isn't a lot of difference in how the Gremlin itself looks. There
are a few language syntax specific odds and ends, but for the most part Gremlin looks like Gremlin in all of the
different languages.
The library of Gremlin steps with examples for each can be found in <<traversal, The Traversal Section>>. This section
is meant as a reference guide and will not necessarily provide methods for applying Gremlin to solve particular
problems. Please see the aforementioned link:https://tinkerpop.apache.org/docs/x.y.z/#tutorials[Tutorials]
link:https://tinkerpop.apache.org/docs/x.y.z/recipes/[Recipes] and the
link:http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html[Practical Gremlin] book for that sort of information.
NOTE: A full list of helpful Gremlin resources can be found on the
link:https://tinkerpop.apache.org/docs/x.y.z/[TinkerPop Compendium] page.
[[staying-agnostic]]
== Staying Agnostic
A good deal has been written in these introductory sections on how TinkerPop enables an agnostic approach to building
graph application and that agnosticism is enabled through Gremlin. As good a job as Gremlin can do in this area, it's
evident from the <<connecting-gremlin,Connecting Gremlin>> Section that TinkerPop is just an enabler. It does not
prevent a developer from making design choices that can limit its protective power.
There are several places to be concerned when considering this issue:
* *Data types* - Different graphs will support different types of data. Something like TinkerGraph will accept any JVM
object, but another graph like Neo4j has a small tight subset of possible types. Choosing a type that is exotic or
perhaps is a custom type that only a specific graph supports might create migration friction should the need arise.
* *Schemas/Indices* - TinkerPop does not provide abstractions for schemas and/or index management. Users will work
directly with the API of the graph provider. It is considered good practice to attempt to enclose such code in a
graph provider specific class or set of classes to isolate or abstract it.
* *Extensions* - Graphs may provide extensions to the Gremlin language, which will not be designed to be compatible
with other graph providers. There may be a special helper syntax or
link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/gremlins-anatomy/#_expressions[expressions] which can make
certain features of that specific graph shine in powerful ways. Using those options is probably recommended, but users
should be aware that doing so ties them more tightly to that graph.
* *Graph specific semantics* - TinkerPop tries to enforce specific semantics through its test suite which is quite
extensive, but some graph providers may not completely respect all the semantics of the Gremlin language or
TinkerPop's model for its APIs. For the most part, that doesn't disqualify them from being any less TinkerPop-enabled
than another provider that might meet the semantics perfectly. Take care when considering a new graph and pay
attention to what it supports and does not support.
* <<graph,*Graph API*>> - The <<graph-structure, Graph API>> (also referred to as the Structure API) is not always
accessible to users. Its accessibility is dependent on the choice of graph system and programming language. It is
therefore recommended that users avoid usage of methods like `Graph.addVertex()` or `Vertex.properties()` and instead
prefer use of Gremlin with `g.addV()` or `g.V(1).properties()`.
Outside of considering these points, the best practice for ensuring the greatest level of compatibility across graphs
is to avoid <<connecting-embedded,embedded>> mode and stick to the bytecode based approaches explained in the
<<connecting-gremlin-server,Gremlin Server>> and the <<connecting-rgp,RGP>> sections above. It creates the least
opportunity to stray from the agnostic path as anything that can be done with those two modes also works in embedded
mode. If using embedded mode, simply write code as though the `Graph` instance is "remote" and not local to the JVM.
In other words, write code as though the GTM is not available locally. Taking that approach and isolating the points
of concern above makes it so that swapping graph providers largely comes down to a configuration task (i.e. modifying
configuration files to point at a different graph system).