blob: 66697dab120f1377fc84d7532ab34c219b69b190 [file] [log] [blame]
////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
image::apache-tinkerpop-logo.png[width=500,link="https://tinkerpop.apache.org"]
*x.y.z*
== Getting Started
link:https://tinkerpop.apache.org[Apache TinkerPop™] is an open source Graph Computing Framework. Within itself, TinkerPop
represents a large collection of capabilities and technologies and, in its wider ecosystem, an additionally extended
world of link:https://tinkerpop.apache.org/#graph-systems[third-party contributed] graph libraries and
systems. TinkerPop's ecosystem can appear complex to newcomers of all experience, especially when glancing at the
link:https://tinkerpop.apache.org/docs/x.y.z/reference/[reference documentation] for the first time.
So, where do you get started with TinkerPop? How do you dive in quickly and get productive? Well - Gremlin, the
most recognizable citizen of The TinkerPop, is here to help with this thirty minute tutorial. That's right - in just
thirty short minutes, you too can be fit to start building graph applications with TinkerPop. Welcome to _The
TinkerPop Workout - by Gremlin_!
image::gremlin-gym.png[width=1024]
== The First Five Minutes
It is quite possible to learn a lot in just five minutes with TinkerPop, but before doing so, a proper introduction of
your trainer is in order. Meet Gremlin!
image:gremlin-standing.png[width=125]
Gremlin helps you navigate the vertices and edges of a graph. He is essentially your query language to graph
databases, as link:http://sql2gremlin.com/[SQL] is the query language to relational databases. To tell Gremlin how
he should "traverse" the graph (i.e. what you want your query to do) you need a way to provide him commands in the
language he understands - and, of course, that language is called "Gremlin". For this task, you need one of
TinkerPop's most important tools: link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-console[The Gremlin Console].
NOTE: Are you unsure of what a vertex or edge is? That topic is covered in the <<_the_next_fifteen_minutes, next section>>,
but please allow the tutorial to get you oriented with the Gremlin Console first, so that you have an understanding of
the tool that will help you with your learning experience.
link:https://www.apache.org/dyn/closer.lua/tinkerpop/x.y.z/apache-tinkerpop-gremlin-console-x.y.z-bin.zip[Download the console],
unpackage it and start it:
[source,text]
----
$ unzip apache-tinkerpop-gremlin-console-x.y.z-bin.zip
$ cd apache-tinkerpop-gremlin-console-x.y.z
$ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>
----
TIP: Windows users may use the included `bin/gremlin.bat` file to start the Gremlin Console.
The Gremlin Console is a link:http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop[REPL environment],
which provides a nice way to learn Gremlin as you get immediate feedback for the code that you enter. This eliminates
the more complex need to "create a project" to try things out. The console is not just for "getting started" however.
You will find yourself using it for a variety of TinkerPop-related activities, such as loading data, administering
graphs, working out complex traversals, etc.
To get Gremlin to traverse a graph, you need a `Graph` instance, which holds the
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#_the_graph_structure[structure] and data of the
graph. TinkerPop is a graph abstraction layer over different graph databases and different graph processors, so there
are many `Graph` instances you can choose from to instantiate in the console. The best `Graph` instance to start with
however is link:https://tinkerpop.apache.org/docs/x.y.z/reference/#tinkergraph-gremlin[TinkerGraph]. TinkerGraph
is a fast, in-memory graph database with a small handful of configuration options, making it a good choice for beginners.
TIP: TinkerGraph is not just a toy for beginners. It is useful in analyzing subgraphs taken from a large graph,
working with a small static graph that doesn't change much, writing unit tests and other use cases where the graph
can fit in memory.
TIP: For purposes of "getting started", resist the temptation to dig into more complex databases that have lots of
configuration options or to delve into how to get link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-server[Gremlin Server]
working properly. Focusing on the basics, presented in this guide, builds a good foundation for all the other things
TinkerPop offers.
To make your learning process even easier, start with one of TinkerPop's "toy" graphs. These are "small" graphs
designed to provide a quick start into querying. It is good to get familiar with them, as almost all TinkerPop
documentation is based on them and when you need help and have to come to the
link:http://groups.google.com/group/gremlin-users[mailing list], a failing example put in the context of the toy graphs
can usually get you a fast answer to your problem.
TIP: When asking questions on the mailing list or StackOverflow about Gremlin, it's always helpful to include a
sample graph so that those attempting to answer your question understand exactly what kind of graph you have and can
focus their energies on a good answer rather than trying to build sample data themselves. The sample graph should just
be a simple Gremlin script that can be copied and pasted into a Gremlin Console session.
For your first graph, use the "Modern" graph which looks like this:
image:tinkerpop-modern.png[width=500]
It can be instantiated in the console this way:
[gremlin-groovy]
----
graph = TinkerFactory.createModern()
g = graph.traversal()
----
The first command creates a `Graph` instance named `graph`, which thus provides a reference to the data you want
Gremlin to traverse. Unfortunately, just having `graph` doesn't provide Gremlin enough context to do his job. You
also need something called a `TraversalSource`, which is generated by the second command. The `TraversalSource`
provides additional information to Gremlin (such as the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#traversalstrategy[traversal strategies]
to apply and the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graphcomputer[traversal engine] to use) which
provides him guidance on how to execute his trip around the `Graph`.
With your `TraversalSource` `g` available it is now possible to ask Gremlin to traverse the `Graph`:
[gremlin-groovy,modern]
----
g.V() <1>
g.V(1) <2>
g.V(1).values('name') <3>
g.V(1).outE('knows') <4>
g.V(1).outE('knows').inV().values('name') <5>
g.V(1).out('knows').values('name') <6>
g.V(1).out('knows').has('age', gt(30)).values('name') <7>
----
<1> Get all the vertices in the `Graph`.
<2> Get the vertex with the unique identifier of "1".
<3> Get the value of the `name` property on the vertex with the unique identifier of "1".
<4> Get the edges with the label "knows" for the vertex with the unique identifier of "1".
<5> Get the names of the people that the vertex with the unique identifier of "1" "knows".
<6> Note that when one uses `outE().inV()` as shown in the previous command, this can be shortened to just `out()`
(similar to `inE().outV()` and `in()` for incoming edges).
<7> Get the names of the people vertex "1" knows who are over the age of 30.
TIP: The variable `g`, the `TraversalSource`, only needs to be instantiated once and should then be re-used.
IMPORTANT: A `Traversal` is essentially an `Iterator` so if you have code like `x = g.V()`, the `x` does not contain
the results of the `g.V()` query. Rather, that statement assigns an `Iterator` value to `x`. To get your results,
you would then need to iterate through `x`. This understanding is *important* because in the context of the console
typing `g.V()` instantly returns a value. The console does some magic for you by noticing that `g.V()` returns
an `Iterator` and then automatically iterates the results. In short, when writing Gremlin outside of the console
always remember that you must iterate your `Traversal` manually in some way for it to do anything. The concept of
"iterating your traversal" is described further in link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/the-gremlin-console/[The Gremlin Console Tutorial].
In this first five minutes with Gremlin, you've gotten the Gremlin Console installed, instantiated a `Graph` and
`TraversalSource`, written some traversals and hopefully learned something about TinkerPop in general. You've only
scratched the surface of what there is to know, but those accomplishments will help enable your understanding of the
more detailed sections to come.
== The Next Fifteen Minutes
In the first five minutes of _The TinkerPop Workout - by Gremlin_, you learned some basics for traversing graphs. Of
course, there wasn't much discussion about what a graph is. A graph is a collection of vertices (i.e. nodes, dots)
and edges (i.e. relationships, lines), where a vertex is an entity which represents some domain object (e.g. a person,
a place, etc.) and an edge represents the relationship between two vertices.
image:modern-edge-1-to-3-1.png[width=300]
The diagram above shows a graph with two vertices, one with a unique identifier of "1" and another with a unique
identifier of "3". There is an edge connecting the two with a unique identifier of "9". It is important to consider
that the edge has a direction which goes _out_ from vertex "1" and _in_ to vertex "3".
IMPORTANT: Most TinkerPop implementations do not allow for identifier assignment. They will rather assign
their own identifiers and ignore assigned identifiers that you attempt to assign to them.
A graph with elements that just have identifiers does not make for much of a database. To give some meaning to
this basic structure, vertices and edges can each be given labels to categorize them.
image:modern-edge-1-to-3-2.png[width=300]
You can now see that a vertex "1" is a "person" and vertex "3" is a "software" vertex. They are joined by a "created"
edge which allows you to see that a "person created software". The "label" and the "id" are reserved attributes of
vertices and edges, but you can add your own arbitrary properties as well:
image:modern-edge-1-to-3-3.png[width=325]
This model is referred to as a _property graph_ and it provides a flexible and intuitive way in which to model your
data.
=== Creating a Graph
As intuitive as it is to you, it is perhaps more intuitive to Gremlin himself, as vertices, edges and properties make
up the very elements of his existence. It is indeed helpful to think of our friend, Gremlin, moving about a graph when
developing traversals, as picturing his position as the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#_the_traverser[traverser]
helps orient where you need him to go next. Let's use the two vertex, one edge graph we've been discussing above
as an example. First, you need to create this graph:
[gremlin-groovy]
----
graph = TinkerGraph.open()
g = graph.traversal()
v1 = g.addV("person").property(id, 1).property("name", "marko").property("age", 29).next()
v2 = g.addV("software").property(id, 3).property("name", "lop").property("lang", "java").next()
g.addE("created").from(v1).to(v2).property(id, 9).property("weight", 0.4)
----
There are a number of important things to consider in the above code. First, recall that `id` is
"reserved" for special usage in TinkerPop and is a member of the enum, `T`. The "keys" supplied to the creation
method are link:https://docs.oracle.com/javase/8/docs/technotes/guides/language/static-import.html[statically imported]
to the console, which allows you to access them without having to specify their owning enum. Think of `id` as a
shorthand form that enables a more fluid code style. You would normally refer to it as `T.id`, so without
that static importing you would instead have to write:
[gremlin-groovy]
----
graph = TinkerGraph.open()
g = graph.traversal()
v1 = g.addV("person").property(T.id, 1).property("name", "marko").property("age", 29).next()
v2 = g.addV("software").property(T.id, 3).property("name", "lop").property("lang", "java").next()
g.addE("created").from(v1).to(v2).property(T.id, 9).property("weight", 0.4)
----
NOTE: The fully qualified name for `T` is `org.apache.tinkerpop.gremlin.structure.T`. Another important static import
that is often seen in Gremlin comes from `+org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__+`, which allows
for the creation of link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graph-traversal-steps[anonymous traversals].
Second, don't forget that you are working with TinkerGraph which allows for identifier assignment. That is _not_ the
case with most graph databases.
Finally, the label for an `Edge` is required and is thus part of the method signature of `addEdge()`. This usage of `addEdge` is
creating an edge that goes _out_ of `v1` and into `v2` with a label of "created".
=== Graph Traversal - Staying Simple
Now that Gremlin knows where the graph data is, you can ask him to get you some data from it by doing a traversal,
which you can think of as executing some link:https://tinkerpop.apache.org/docs/x.y.z/reference/#the-graph-process[process]
over the structure of the graph. We can form our question in English and then translate it to Gremlin. For this
initial example, let's ask Gremlin: "What software has Marko created?"
To answer this question, we would want Gremlin to:
. Find "marko" in the graph
. Walk along the "created" edges to "software" vertices
. Select the "name" property of the "software" vertices
The English-based steps above largely translate to Gremlin's position in the graph and to the steps we need to take
to ask him to answer our question. By stringing these steps together, we form a `Traversal` or the sequence of programmatic
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graph-traversal-steps[steps] Gremlin needs to perform
in order to get you an answer.
Let's start with finding "marko". This operation is a filtering step as it searches the full set of vertices to match
those that have the "name" property value of "marko". This can be done with the
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#has-step[has()] step as follows:
[gremlin-groovy,modern]
----
g.V().has('name','marko')
----
NOTE: The variable `g` is the `TraversalSource`, which was introduced in the "The First Five Minutes". The
`TraversalSource` is created with `graph.traversal()` and is the object used to spawn new traversals.
We can picture this traversal in our little graph with Gremlin sitting on vertex "1".
image:modern-edge-1-to-3-1-gremlin.png[width=325]
When Gremlin is on a vertex or an edge, he has access to all the properties that are available to that element.
IMPORTANT: The above query iterates all the vertices in the graph to get its answer. That's fine for our little example,
but for multi-million or billion edge graphs that is a big problem. To solve this problem, you should look to use
indices. TinkerPop does not provide an abstraction for index management. You should consult the documentation of the
graph you have chosen and utilize its native API to create indices which will then speed up these types of lookups. Your
traversals will remain unchanged however, as the indices will be used transparently at execution time.
Now that Gremlin has found "marko", he can now consider the next step in the traversal where we ask him to "walk"
along "created" edges to "software" vertices. As described earlier, edges have direction, so we have to tell Gremlin
what direction to follow. In this case, we want him to traverse on outgoing edges from the "marko" vertex. For this,
we use the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#vertex-steps[outE] step.
[gremlin-groovy,modern]
----
g.V().has('name','marko').outE('created')
----
At this point, you can picture Gremlin moving from the "marko" vertex to the "created" edge.
image:modern-edge-1-to-3-2-gremlin.png[width=325]
To get to the vertex on the other end of the edge, you need to tell Gremlin to move from the edge to the incoming
vertex with `inV()`.
[gremlin-groovy,modern]
----
g.V().has('name','marko').outE('created').inV()
----
You can now picture Gremlin on the "software" vertex as follows:
image:modern-edge-1-to-3-3-gremlin.png[width=325]
As you are not asking Gremlin to do anything with the properties of the "created" edge, you can simplify the
statement above with:
[gremlin-groovy,modern]
----
g.V().has('name','marko').out('created')
----
image:modern-edge-1-to-3-4-gremlin.png[width=325]
Finally, now that Gremlin has reached the "software that Marko created", he has access to the properties of the
"software" vertex and you can therefore ask Gremlin to extract the value of the "name" property as follows:
[gremlin-groovy,modern]
----
g.V().has('name','marko').out('created').values('name')
----
You should now be able to see the connection Gremlin has to the structure of the graph and how Gremlin maneuvers from
vertices to edges and so on. Your ability to string together steps to ask Gremlin to do more complex things, depends
on your understanding of these basic concepts.
=== Graph Traversal - Increasing Complexity
Armed with the knowledge from the previous section, let's ask Gremlin to perform some more difficult traversal tasks.
There's not much more that can be done with the "baby" graph we had, so let's return to the "modern" toy graph from
the "five minutes section". Recall that you can create this `Graph` and establish a `TraversalSource` with:
[gremlin-groovy]
----
graph = TinkerFactory.createModern()
g = graph.traversal()
----
Earlier we'd used the `has()`-step to tell Gremlin how to find the "marko" vertex. Let's look at some other ways to
use `has()`. What if we wanted Gremlin to find the "age" values of both "vadas" and "marko"? In this case we could
use the `within` comparator with `has()` as follows:
[gremlin-groovy,modern]
----
g.V().has('name',within('vadas','marko')).values('age')
----
It is worth noting that `within` is statically imported from `P` to the Gremlin Console (much like `T` is, as described
earlier).
NOTE: The fully qualified name for `P` is `org.apache.tinkerpop.gremlin.process.traversal.P`.
If we wanted to ask Gremlin the average age of "vadas" and "marko" we could use the
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#mean-step[mean()] step as follows:
[gremlin-groovy,modern]
----
g.V().has('name',within('vadas','marko')).values('age').mean()
----
Another method of filtering is seen in the use of the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#where-step[where]
step. We know how to find the "software" that "marko" created:
[gremlin-groovy,modern]
----
g.V().has('name','marko').out('created')
----
image:gremlin-on-software-vertex.png[width=325,float=right] Let's extend on that query to try to learn who "marko"
collaborates with when it comes to the software he created. In other words, let's try to answer the question of: "Who
are the people that marko develops software with?" To do that, we should first picture Gremlin where we left him in
the previous query. He was standing on the "software" vertex. To find out who "created" that "software", we need to
have Gremlin traverse back _in_ along the "created" edges to find the "person" vertices tied to it.
TIP: The nature of Gremlin leads to long lines of code. Readability can be greatly improved by using line spacing and
indentation. See the link:https://tinkerpop.apache.org/docs/x.y.z/recipes/#style-guide[Style Guide] for recommendations
on what well formatted Gremlin should look like.
[gremlin-groovy,modern]
----
g.V().has('name','marko').
out('created').in('created').
values('name')
----
So that's nice, we can see that "peter", "josh" and "marko" are all responsible for creating "v[3]", which is the
"software" vertex named "lop". Of course, we already know about the involvement of "marko" and it seems strange to say that
"marko" collaborates with himself, so excluding "marko" from the results seems logical. The following traversal
handles that exclusion:
[gremlin-groovy,modern]
----
g.V().has('name','marko').as('exclude').
out('created').in('created').
where(neq('exclude')).
values('name')
----
We made two additions to the traversal to make it exclude "marko" from the results. First, we added the
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#as-step[as()] step. The `as()`-step is not really a "step",
but a "step modulator" - something that adds features to a step or the traversal. Here, the `as('exclude')` labels
the `has()`-step with the name "exclude" and all values that pass through that step are held in that label for later
use. In this case, the "marko" vertex is the only vertex to pass through that point, so it is held in "exclude".
The other addition that was made was the `where()`-step which is a filter step like `has()`. The `where()` is
positioned after the `in()`-step that has "person" vertices, which means that the `where()` filter is occurring
on the list of "marko" collaborators. The `where()` specifies that the "person" vertices passing through it should
not equal (i.e. `neq()`) the contents of the "exclude" label. As it just contains the "marko" vertex, the `where()`
filters out the "marko" that we get when we traverse back _in_ on the "created" edges.
You will find many uses of `as()`. Here it is in combination with link:https://tinkerpop.apache.org/docs/x.y.z/reference/#select-step[select]:
[gremlin-groovy,modern]
----
g.V().as('a').out().as('b').out().as('c').
select('a','b','c')
----
In the above example, we tell Gremlin to iterate through all vertices and traverse _out_ twice from each. Gremlin
will label each vertex in that path with "a", "b" and "c", respectively. We can then use `select` to extract the
contents of that label.
Another common but important step is the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#group-step[group()]
step and its related step modulator called link:https://tinkerpop.apache.org/docs/x.y.z/reference/#by-step[by()]. If
we wanted to ask Gremlin to group all the vertices in the graph by their vertex label we could do:
[gremlin-groovy,modern]
----
g.V().group().by(label)
----
The use of `by()` here provides the mechanism by which to do the grouping. In this case, we've asked Gremlin to
use the `label` (which, again, is an automatic static import from `T` in the console). We can't really tell much
about our distribution though because we just have unique identifiers of vertices as output. To make that nicer we
could ask Gremlin to get us the value of the "name" property from those vertices, by supplying another `by()`
modulator to `group()` to transform the values.
[gremlin-groovy,modern]
----
g.V().group().by(label).by('name')
----
In this section, you have learned a bit more about what property graphs are and how Gremlin interacts with them.
You also learned how to envision Gremlin moving about a graph and how to use some of the more complex, but commonly
utilized traversal steps. You are now ready to think about TinkerPop in terms of its wider applicability to
graph computing.
== The Final Ten Minutes
In these final ten minutes of _The TinkerPop Workout - by Gremlin_ we'll look at TinkerPop from a higher level and
introduce different features it provides to help orient you to some of the project's technology ecosystem. In this
way, you can identify areas of interest and dig into the details from there.
=== Why TinkerPop?
image:provider-integration.png[float=right,width=350] The goal of TinkerPop, as a Graph Computing Framework, is to make it
easy for developers to create graph applications by providing APIs and tools that simplify their endeavors. One of
the fundamental aspects to what TinkerPop offers in this area lies in the fact that TinkerPop is an abstraction layer
over different graph databases and different graph processors. As an abstraction layer, TinkerPop provides a way to
avoid vendor lock-in to a specific database or processor. This capability provides immense value to developers who
are thus afforded options in their architecture and development because:
* They can try different implementations using the same code to decide which is best for their environment.
* They can grow into a particular implementation if they so desire - start with a graph that is designed to scale
within a single machine and then later switch to a graph that is designed to scale horizontally.
* They can feel more confident in graph technology choices, as advances in the state of different provider
implementations are behind TinkerPop APIs, which open the possibility to switch providers with limited impact.
TinkerPop has always had the vision of being an abstraction over different graph databases. That much
is not new and dates back to TinkerPop 1.x. It is in TinkerPop 3.x however that we see the introduction of the notion
that TinkerPop is also an abstraction over different graph processors like link:http://spark.apache.org[Spark] and
link:http://giraph.apache.org/[Giraph]. The scope of this tutorial does not permit it to delve into
"graph processors", but the short story is that the same Gremlin statement we wrote in the examples above can be
executed to run in distributed fashion over Spark or Hadoop. The changes required to the code to do this are not
in the traversal itself, but in the definition of the `TraversalSource`. You can again see why we encourage graph
operations to be executed through that class as opposed to just using `Graph`. You can read more about these
features in this section on link:https://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-gremlin[hadoop-gremlin].
TIP: To maintain an abstraction over `Graph` creation use `GraphFactory.open()` to construct new instances. See
the documentation for individual `Graph` implementations to learn about the configuration options to provide.
=== Loading Data
image:gremlin-to-the-7.png[width=100,float=left] There are many strategies for getting data into your graph. As you are
just getting started, let's look at the more simple methods aimed at "smaller" graphs. A "small" graph, in this
context, is one that has fewer than ten million edges. The most direct way to load this data is to write a Groovy script
that can be executed in the Gremlin Console, a tool that you should be well familiar with at this point. For our
example, let's use the link:http://snap.stanford.edu/data/wiki-Vote.html[Wikipedia Vote Network] data set which
contains 7,115 vertices and 103,689 edges.
[source,text]
----
$ curl -L -O http://snap.stanford.edu/data/wiki-Vote.txt.gz
$ gunzip wiki-Vote.txt.gz
----
The data is contained in a tab-delimited structure where vertices are Wikipedia users and edges from one user to
another imply a "vote" relationship. Here is the script to parse the file and generate the `Graph` instance using
TinkerGraph:
[source,groovy]
----
graph = TinkerGraph.open()
graph.createIndex('userId', Vertex.class) <1>
g = graph.traversal()
getOrCreate = { id ->
g.V().has('userId', id).
fold().
coalesce(unfold(),
addV('user').property('userId', id)).next() <2>
}
new File('wiki-Vote.txt').eachLine {
if (!it.startsWith("#")){
(fromVertex, toVertex) = it.split('\t').collect(getOrCreate) <3>
g.addE('votesFor').from(fromVertex).to(toVertex).iterate()
}
}
----
<1> To ensure fast lookups of vertices, we need an index. The `createIndex()` method is a method native to
TinkerGraph. Please consult your graph databases' documentation for their index creation approaches.
<2> This "get or create" traversal gets a a vertex if it already exists; otherwise, it creates it. It uses `coalesce()` in
a clever way by first determining if the list of vertices produced by the previous `fold()` has anything in it by
testing the result of `unfold()`. If `unfold()` returns nothing then that vertex doesn't exist and the subsequent
`addV()` inner traversal can be called to create it.
<3> We are iterating each line of the `wiki-Vote.txt` file and this line splits the line on the delimiter, then
uses some neat Groovy syntax to apply the `getOrCreate()` function to each of the two `userId` fields encountered in
the line and stores those vertices in the `fromVertex` and `toVertex` variables respectively.
NOTE: While this is a tab-delimited structure, this same pattern can be applied
to any data source you require and Groovy tends to have nice libraries that can help make working with data
link:https://thinkaurelius.wordpress.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/[quite enjoyable].
WARNING: Take care if using a `Graph` implementation that supports
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#transactions[transactions]. As TinkerGraph does not, there is
no need to `commit()`. If your `Graph` does support transactions, intermediate commits during load will need to be
applied.
To load larger data sets you should read about the
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#clonevertexprogram[CloneVertexProgram], which provides a
generalized method for loading graphs of virtually any size and consider the native bulk loading features of the
underlying graph database that you've chosen.
=== Gremlin Server
image:gremlin-server-protocol.png[width=325,float=right] link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-server[Gremlin Server]
provides a way to remotely execute Gremlin scripts against one or more `Graph` instances hosted within it. It does
this by exposing different endpoints, such as link:https://tinkerpop.apache.org/docs/x.y.z/reference/#_connecting_via_http[HTTP]
and link:https://tinkerpop.apache.org/docs/x.y.z/reference/#connecting-via-java[WebSocket], which allow a request
containing a Gremlin script to be processed with results returned.
[source,text]
----
$ curl -L -O https://www.apache.org/dist/tinkerpop/x.y.z/apache-tinkerpop-gremlin-server-x.y.z-bin.zip
$ unzip apache-tinkerpop-gremlin-server-x.y.z-bin.zip
$ cd apache-tinkerpop-gremlin-server-x.y.z
$ bin/gremlin-server.sh conf/gremlin-server-rest-modern.yaml
[INFO] GremlinServer -
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-rest-modern.yaml
...
[INFO] GremlinServer$1 - Channel started at port 8182.
----
[source,text]
$ curl -X POST -d "{\"gremlin\":\"g.V(x).out().values('name')\", \"language\":\"gremlin-groovy\", \"bindings\":{\"x\":1}}" "http://localhost:8182"
[source,json]
----
{
"requestId": "f67dbfff-b33a-4ae3-842d-c6e7c97b246b",
"status": {
"message": "",
"code": 200,
"attributes": {
"@type": "g:Map",
"@value": []
}
},
"result": {
"data": {
"@type": "g:List",
"@value": ["lop", "vadas", "josh"]
},
"meta": {
"@type": "g:Map",
"@value": []
}
}
}
----
IMPORTANT: Take careful note of the use of "bindings" in the arguments on the request. These are variables that are
applied to the script on execution and is essentially a way to parameterize your scripts. This "parameterization" is
critical to link:https://tinkerpop.apache.org/docs/x.y.z/reference/#_best_practices[performance]. Whenever
possible, parameterize your queries.
As mentioned earlier, Gremlin Server can also be configured with a WebSocket endpoint. This endpoint has an
embedded link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_graph_driver_provider_requirements[subprotocol] that allow a
compliant driver to communicate with it. TinkerPop supplies a
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#connecting-via-java[reference driver] written in Java, but
there are drivers developed by both TinkerPop and third-parties for other link:https://tinkerpop.apache.org/#language-drivers[languages]
such as Python, Javascript, etc. Gremlin Server therefore represents the method by which non-JVM languages can
interact with TinkerPop.
=== Conclusion
...and that is the end of _The TinkerPop Workout - by Gremlin_. You are hopefully feeling more confident in your
TinkerPop skills and have a good overview of what the stack has to offer, as well as some entry points to further
research within the reference documentation. Welcome to The TinkerPop!