
 ////
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 ////

 image::apache-tinkerpop-logo.png[width=500,link="https://tinkerpop.apache.org"]

 *x.y.z*

 == Getting Started

 link:https://tinkerpop.apache.org[Apache TinkerPop™] is an open source Graph Computing Framework. Within itself, TinkerPop
 represents a large collection of capabilities and technologies and, in its wider ecosystem, an additionally extended
 world of link:https://tinkerpop.apache.org/#graph-systems[third-party contributed] graph libraries and
 systems. TinkerPop's ecosystem can appear complex to newcomers of all experience levels, especially when glancing at the
 link:https://tinkerpop.apache.org/docs/x.y.z/reference/[reference documentation] for the first time.

 So, where do you get started with TinkerPop? How do you dive in quickly and get productive? Well - Gremlin, the
 most recognizable citizen of The TinkerPop, is here to help with this thirty-minute tutorial. That's right - in just
 thirty short minutes, you too can be fit to start building graph applications with TinkerPop. Welcome to _The
 TinkerPop Workout - by Gremlin_!

 image::gremlin-gym.png[width=1024]

 == The First Five Minutes

 It is quite possible to learn a lot in just five minutes with TinkerPop, but before doing so, a proper introduction of
 your trainer is in order. Meet Gremlin!

 image:gremlin-standing.png[width=125]

 Gremlin helps you navigate the vertices and edges of a graph. He is essentially your query language to graph
 databases, as link:http://sql2gremlin.com/[SQL] is the query language to relational databases. To tell Gremlin how
 he should "traverse" the graph (i.e. what you want your query to do) you need a way to provide him commands in the
 language he understands - and, of course, that language is called "Gremlin". For this task, you need one of
 TinkerPop's most important tools: link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-console[The Gremlin Console].

 NOTE: Are you unsure of what a vertex or edge is? That topic is covered in the <<_the_next_fifteen_minutes, next section>>,
 but please allow the tutorial to get you oriented with the Gremlin Console first, so that you have an understanding of
 the tool that will help you with your learning experience.

 link:https://www.apache.org/dyn/closer.lua/tinkerpop/x.y.z/apache-tinkerpop-gremlin-console-x.y.z-bin.zip[Download the console],
 unpack it and start it:

 [source,text]
 ----
 $ unzip apache-tinkerpop-gremlin-console-x.y.z-bin.zip
 $ cd apache-tinkerpop-gremlin-console-x.y.z
 $ bin/gremlin.sh

          \,,,/
          (o o)
 -----oOOo-(3)-oOOo-----
 plugin activated: tinkerpop.server
 plugin activated: tinkerpop.utilities
 plugin activated: tinkerpop.tinkergraph
 gremlin>
 ----

 TIP: Windows users may use the included `bin/gremlin.bat` file to start the Gremlin Console.

 The Gremlin Console is a link:http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop[REPL environment],
 which provides a nice way to learn Gremlin as you get immediate feedback for the code that you enter. This eliminates
 the more complex need to "create a project" to try things out. The console is not just for "getting started" however.
 You will find yourself using it for a variety of TinkerPop-related activities, such as loading data, administering
 graphs, working out complex traversals, etc.

 To get Gremlin to traverse a graph, you need a `Graph` instance, which holds the
 link:https://tinkerpop.apache.org/docs/x.y.z/reference/#_the_graph_structure[structure] and data of the
 graph. TinkerPop is a graph abstraction layer over different graph databases and different graph processors, so there
 are many `Graph` instances you can choose from to instantiate in the console. The best `Graph` instance to start with
 however is link:https://tinkerpop.apache.org/docs/x.y.z/reference/#tinkergraph-gremlin[TinkerGraph]. TinkerGraph
 is a fast, in-memory graph database with a small handful of configuration options, making it a good choice for beginners.

 TIP: TinkerGraph is not just a toy for beginners. It is useful in analyzing subgraphs taken from a large graph,
 working with a small static graph that doesn't change much, writing unit tests and other use cases where the graph
 can fit in memory.

 TIP: For purposes of "getting started", resist the temptation to dig into more complex databases that have lots of
 configuration options or to delve into how to get link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-server[Gremlin Server]
 working properly. Focusing on the basics, presented in this guide, builds a good foundation for all the other things
 TinkerPop offers.

 To make your learning process even easier, start with one of TinkerPop's "toy" graphs. These are "small" graphs
 designed to provide a quick start into querying. It is good to get familiar with them, as almost all TinkerPop
 documentation is based on them and when you need help and have to come to the
 link:http://groups.google.com/group/gremlin-users[mailing list], a failing example put in the context of the toy graphs
 can usually get you a fast answer to your problem.

 TIP: When asking questions on the mailing list or StackOverflow about Gremlin, it's always helpful to include a
 sample graph so that those attempting to answer your question understand exactly what kind of graph you have and can
 focus their energies on a good answer rather than trying to build sample data themselves. The sample graph should just
 be a simple Gremlin script that can be copied and pasted into a Gremlin Console session.

 For your first graph, use the "Modern" graph which looks like this:

 image:tinkerpop-modern.png[width=500]

 It can be instantiated in the console this way:

 [gremlin-groovy]
 ----
 graph = TinkerFactory.createModern()
 g = graph.traversal()
 ----

 The first command creates a `Graph` instance named `graph`, which thus provides a reference to the data you want
 Gremlin to traverse. Unfortunately, just having `graph` doesn't provide Gremlin enough context to do his job. You
 also need something called a `TraversalSource`, which is generated by the second command. The `TraversalSource`
 provides additional information to Gremlin (such as the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#traversalstrategy[traversal strategies]
 to apply and the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graphcomputer[traversal engine] to use) which
 provides him guidance on how to execute his trip around the `Graph`.

 With your `TraversalSource` `g` available it is now possible to ask Gremlin to traverse the `Graph`:

 [gremlin-groovy,modern]
 ----
 g.V()    <1>
 g.V(1)    <2>
 g.V(1).values('name')    <3>
 g.V(1).outE('knows')    <4>
 g.V(1).outE('knows').inV().values('name')    <5>
 g.V(1).out('knows').values('name')    <6>
 g.V(1).out('knows').has('age', gt(30)).values('name')    <7>
 ----

 <1> Get all the vertices in the `Graph`.
 <2> Get the vertex with the unique identifier of "1".
 <3> Get the value of the `name` property on the vertex with the unique identifier of "1".
 <4> Get the edges with the label "knows" for the vertex with the unique identifier of "1".
 <5> Get the names of the people that the vertex with the unique identifier of "1" "knows".
 <6> Note that when one uses `outE().inV()` as shown in the previous command, this can be shortened to just `out()`
 (similar to `inE().outV()` and `in()` for incoming edges).
 <7> Get the names of the people vertex "1" knows who are over the age of 30.

 TIP: The variable `g`, the `TraversalSource`, only needs to be instantiated once and should then be re-used.
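
 The incoming direction mentioned in the callouts above works the same way. As a small sketch against the "modern"
 graph (assuming a Gremlin Console session), the two traversals below both answer the question "who created vertex 3?":

 [source,groovy]
 ----
 graph = TinkerFactory.createModern()
 g = graph.traversal()

 // long form: step onto the incoming "created" edges, then to the vertex each edge comes out of
 g.V(3).inE('created').outV().values('name')

 // short form: in() skips the edges and lands directly on the adjacent vertices
 g.V(3).in('created').values('name')
 ----

 Both traversals return the same three names: "marko", "josh" and "peter".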

 IMPORTANT: A `Traversal` is essentially an `Iterator` so if you have code like `x = g.V()`, the `x` does not contain
 the results of the `g.V()` query.  Rather, that statement assigns an `Iterator` value to `x`. To get your results,
 you would then need to iterate through `x`. This understanding is *important* because in the context of the console
 typing `g.V()` instantly returns a value. The console does some magic for you by noticing that `g.V()` returns
 an `Iterator` and then automatically iterates the results. In short, when writing Gremlin outside of the console
 always remember that you must iterate your `Traversal` manually in some way for it to do anything. The concept of
 "iterating your traversal" is described further in link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/the-gremlin-console/[The Gremlin Console Tutorial].

 In this first five minutes with Gremlin, you've gotten the Gremlin Console installed, instantiated a `Graph` and
 `TraversalSource`, written some traversals and hopefully learned something about TinkerPop in general. You've only
 scratched the surface of what there is to know, but those accomplishments will help enable your understanding of the
 more detailed sections to come.
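
 Before moving on, the earlier point about iterating a traversal can be made concrete. Outside the console, a terminal
 step is what triggers iteration. The sketch below assumes a Gremlin Console session with the "modern" graph and shows
 four common terminal steps; the variable names and the "city" property are illustrative only.

 [source,groovy]
 ----
 graph = TinkerFactory.createModern()
 g = graph.traversal()

 // next() pulls a single result out of the traversal
 firstName = g.V(1).values('name').next()

 // toList() iterates the traversal to the end and collects every result
 allNames = g.V().values('name').toList()

 // hasNext() reports whether at least one result is available
 anyoneNamedNobody = g.V().has('name', 'nobody').hasNext()

 // iterate() runs the traversal for its side-effects and returns no results
 g.V(1).property('city', 'santa fe').iterate()
 ----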

 == The Next Fifteen Minutes

 In the first five minutes of _The TinkerPop Workout - by Gremlin_, you learned some basics for traversing graphs. Of
 course, there wasn't much discussion about what a graph is. A graph is a collection of vertices (i.e. nodes, dots)
 and edges (i.e. relationships, lines), where a vertex is an entity which represents some domain object (e.g. a person,
 a place, etc.) and an edge represents the relationship between two vertices.

 image:modern-edge-1-to-3-1.png[width=300]

 The diagram above shows a graph with two vertices, one with a unique identifier of "1" and another with a unique
 identifier of "3". There is an edge connecting the two with a unique identifier of "9". It is important to consider
 that the edge has a direction which goes _out_ from vertex "1" and _in_ to vertex "3".

 IMPORTANT: Most TinkerPop implementations do not allow for identifier assignment. They will instead assign
 their own identifiers and ignore any identifiers that you attempt to assign.

 A graph with elements that just have identifiers does not make for much of a database. To give some meaning to
 this basic structure, vertices and edges can each be given labels to categorize them.

 image:modern-edge-1-to-3-2.png[width=300]

 You can now see that a vertex "1" is a "person" and vertex "3" is a "software" vertex. They are joined by a "created"
 edge which allows you to see that a "person created software". The "label" and the "id" are reserved attributes of
 vertices and edges, but you can add your own arbitrary properties as well:

 image:modern-edge-1-to-3-3.png[width=325]

 This model is referred to as a _property graph_ and it provides a flexible and intuitive way in which to model your
 data.

 === Creating a Graph

 As intuitive as it is to you, it is perhaps more intuitive to Gremlin himself, as vertices, edges and properties make
 up the very elements of his existence. It is indeed helpful to think of our friend, Gremlin, moving about a graph when
 developing traversals, as picturing his position as the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#_the_traverser[traverser]
 helps orient where you need him to go next. Let's use the two-vertex, one-edge graph we've been discussing above
 as an example. First, you need to create this graph:

 [gremlin-groovy]
 ----
 graph = TinkerGraph.open()
 g = graph.traversal()
 v1 = g.addV("person").property(id, 1).property("name", "marko").property("age", 29).next()
 v2 = g.addV("software").property(id, 3).property("name", "lop").property("lang", "java").next()
 g.addE("created").from(v1).to(v2).property(id, 9).property("weight", 0.4)
 ----

 There are a number of important things to consider in the above code. First, recall that `id` is
 "reserved" for special usage in TinkerPop and is a member of the enum, `T`. The "keys" supplied to the creation
 method are link:https://docs.oracle.com/javase/8/docs/technotes/guides/language/static-import.html[statically imported]
 to the console, which allows you to access them without having to specify their owning enum. Think of `id` as a
 shorthand form that enables a more fluid code style. You would normally refer to it as `T.id`, so without
 that static importing you would instead have to write:

 [gremlin-groovy]
 ----
 graph = TinkerGraph.open()
 g = graph.traversal()
 v1 = g.addV("person").property(T.id, 1).property("name", "marko").property("age", 29).next()
 v2 = g.addV("software").property(T.id, 3).property("name", "lop").property("lang", "java").next()
 g.addE("created").from(v1).to(v2).property(T.id, 9).property("weight", 0.4)
 ----

 NOTE: The fully qualified name for `T` is `org.apache.tinkerpop.gremlin.structure.T`. Another important static import
 that is often seen in Gremlin comes from `+org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__+`, which allows
 for the creation of link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graph-traversal-steps[anonymous traversals].

 Second, don't forget that you are working with TinkerGraph which allows for identifier assignment. That is _not_ the
 case with most graph databases.

 Finally, the label for an `Edge` is required and is thus part of the method signature of `addE()`. This usage of `addE()`
 creates an edge that goes _out_ of `v1` and _in_ to `v2` with a label of "created".
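
 One way to confirm that direction is to inspect the edge itself. The following is a minimal sketch that rebuilds the
 two-vertex graph in a Gremlin Console session (where `T` is already imported) and checks both ends of the edge. The
 variable `e` is introduced here for illustration.

 [source,groovy]
 ----
 graph = TinkerGraph.open()
 g = graph.traversal()
 v1 = g.addV('person').property(T.id, 1).property('name', 'marko').next()
 v2 = g.addV('software').property(T.id, 3).property('name', 'lop').next()
 e = g.addE('created').from(v1).to(v2).property(T.id, 9).next()

 e.outVertex().id()                      // 1: the vertex the edge goes out of
 e.inVertex().id()                       // 3: the vertex the edge comes in to
 g.V(1).out('created').values('name')    // follows the edge in its direction and reaches "lop"
 g.V(3).out('created').values('name')    // no results: vertex 3 has no outgoing "created" edge
 ----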

 === Graph Traversal - Staying Simple

 Now that Gremlin knows where the graph data is, you can ask him to get you some data from it by doing a traversal,
 which you can think of as executing some link:https://tinkerpop.apache.org/docs/x.y.z/reference/#the-graph-process[process]
 over the structure of the graph. We can form our question in English and then translate it to Gremlin. For this
 initial example, let's ask Gremlin: "What software has Marko created?"

 To answer this question, we would want Gremlin to:

 . Find "marko" in the graph
 . Walk along the "created" edges to "software" vertices
 . Select the "name" property of the "software" vertices

 The English-based steps above largely translate to Gremlin's position in the graph and to the steps we need to take
 to ask him to answer our question. By stringing these steps together, we form a `Traversal` or the sequence of programmatic
 link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graph-traversal-steps[steps] Gremlin needs to perform
 in order to get you an answer.

 Let's start with finding "marko". This operation is a filtering step as it searches the full set of vertices to match
 those that have the "name" property value of "marko". This can be done with the
 link:https://tinkerpop.apache.org/docs/x.y.z/reference/#has-step[has()] step as follows:

 [gremlin-groovy,modern]
 ----
 g.V().has('name','marko')
 ----

 NOTE: The variable `g` is the `TraversalSource`, which was introduced in "The First Five Minutes". The
 `TraversalSource` is created with `graph.traversal()` and is the object used to spawn new traversals.

 We can picture this traversal in our little graph with Gremlin sitting on vertex "1".

 image:modern-edge-1-to-3-1-gremlin.png[width=325]

 When Gremlin is on a vertex or an edge, he has access to all the properties that are available to that element.

 IMPORTANT: The above query iterates all the vertices in the graph to get its answer. That's fine for our little example,
 but for multi-million or billion edge graphs that is a big problem. To solve this problem, you should look to use
 indices. TinkerPop does not provide an abstraction for index management. You should consult the documentation of the
 graph you have chosen and utilize its native API to create indices which will then speed up these types of lookups. Your
 traversals will remain unchanged however, as the indices will be used transparently at execution time.
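
 For TinkerGraph specifically, index creation looks like the sketch below. `createIndex()` and `getIndexedKeys()` are
 methods on TinkerGraph itself rather than part of the TinkerPop API, so other graph systems will use different calls.
 The traversal is the same one shown earlier.

 [source,groovy]
 ----
 graph = TinkerFactory.createModern()
 graph.createIndex('name', Vertex.class)    // TinkerGraph-specific: index vertices by "name"
 graph.getIndexedKeys(Vertex.class)         // lists the vertex property keys that are indexed
 g = graph.traversal()
 g.V().has('name','marko')                  // unchanged traversal; TinkerGraph can now use the index
 ----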

 Now that Gremlin has found "marko", he can now consider the next step in the traversal where we ask him to "walk"
 along "created" edges to "software" vertices. As described earlier, edges have direction, so we have to tell Gremlin
 what direction to follow. In this case, we want him to traverse on outgoing edges from the "marko" vertex. For this,
 we use the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#vertex-steps[outE] step.

 [gremlin-groovy,modern]
 ----
 g.V().has('name','marko').outE('created')
 ----

 At this point, you can picture Gremlin moving from the "marko" vertex to the "created" edge.

 image:modern-edge-1-to-3-2-gremlin.png[width=325]

 To get to the vertex on the other end of the edge, you need to tell Gremlin to move from the edge to the incoming
 vertex with `inV()`.

 [gremlin-groovy,modern]
 ----
 g.V().has('name','marko').outE('created').inV()
 ----

 You can now picture Gremlin on the "software" vertex as follows:

 image:modern-edge-1-to-3-3-gremlin.png[width=325]

 As you are not asking Gremlin to do anything with the properties of the "created" edge, you can simplify the
 statement above with:

 [gremlin-groovy,modern]
 ----
 g.V().has('name','marko').out('created')
 ----

 image:modern-edge-1-to-3-4-gremlin.png[width=325]

 Finally, now that Gremlin has reached the "software that Marko created", he has access to the properties of the
 "software" vertex and you can therefore ask Gremlin to extract the value of the "name" property as follows:

 [gremlin-groovy,modern]
 ----
 g.V().has('name','marko').out('created').values('name')
 ----

 You should now be able to see the connection Gremlin has to the structure of the graph and how Gremlin maneuvers from
 vertices to edges and so on. Your ability to string together steps to ask Gremlin to do more complex things, depends
 on your understanding of these basic concepts.
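
 The `out()` shorthand is only appropriate when the edge itself is of no interest. As a sketch against the "modern"
 graph (assuming the console's static imports, which supply `gt`), the traversal below stops on the "created" edges
 to filter on their "weight" property before moving on to the vertices:

 [source,groovy]
 ----
 graph = TinkerFactory.createModern()
 g = graph.traversal()

 // filter on the edge while Gremlin is standing on it, then continue to the incoming vertex
 g.V().has('name','josh').
   outE('created').has('weight', gt(0.5d)).
   inV().values('name')
 ----

 In the "modern" graph, "josh" has two "created" edges, with weights of 1.0 (to "ripple") and 0.4 (to "lop"), so only
 "ripple" is returned.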

 === Graph Traversal - Increasing Complexity

 Armed with the knowledge from the previous section, let's ask Gremlin to perform some more difficult traversal tasks.
 There's not much more that can be done with the "baby" graph we had, so let's return to the "modern" toy graph from
 "The First Five Minutes" section. Recall that you can create this `Graph` and establish a `TraversalSource` with:

 [gremlin-groovy]
 ----
 graph = TinkerFactory.createModern()
 g = graph.traversal()
 ----

 Earlier we'd used the `has()`-step to tell Gremlin how to find the "marko" vertex. Let's look at some other ways to
 use `has()`. What if we wanted Gremlin to find the "age" values of both "vadas" and "marko"? In this case we could
 use the `within` comparator with `has()` as follows:

 [gremlin-groovy,modern]
 ----
 g.V().has('name',within('vadas','marko')).values('age')
 ----

 It is worth noting that `within` is statically imported from `P` to the Gremlin Console (much like `T` is, as described
 earlier).

 NOTE: The fully qualified name for `P` is `org.apache.tinkerpop.gremlin.process.traversal.P`.
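
 `P` offers other predicates beyond `within`. A brief sketch, again assuming the console's static imports and the
 "modern" graph:

 [source,groovy]
 ----
 graph = TinkerFactory.createModern()
 g = graph.traversal()

 g.V().has('age', gt(30)).values('name')             // older than 30
 g.V().has('age', between(27, 32)).values('name')    // 27 <= age < 32
 g.V().hasLabel('person').
   has('name', without('marko')).values('name')      // every person except "marko"
 ----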

 If we wanted to ask Gremlin the average age of "vadas" and "marko" we could use the
 link:https://tinkerpop.apache.org/docs/x.y.z/reference/#mean-step[mean()] step as follows:

 [gremlin-groovy,modern]
 ----
 g.V().has('name',within('vadas','marko')).values('age').mean()
 ----

 Another method of filtering is seen in the use of the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#where-step[where]
 step. We know how to find the "software" that "marko" created:

 [gremlin-groovy,modern]
 ----
 g.V().has('name','marko').out('created')
 ----

 image:gremlin-on-software-vertex.png[width=325,float=right] Let's extend that query to try to learn who "marko"
 collaborates with when it comes to the software he created. In other words, let's try to answer the question of: "Who
 are the people that marko develops software with?" To do that, we should first picture Gremlin where we left him in
 the previous query.  He was standing on the "software" vertex. To find out who "created" that "software", we need to
 have Gremlin traverse back _in_ along the "created" edges to find the "person" vertices tied to it.

 TIP: The nature of Gremlin leads to long lines of code. Readability can be greatly improved by using line spacing and
 indentation. See the link:https://tinkerpop.apache.org/docs/x.y.z/recipes/#style-guide[Style Guide] for recommendations
 on what well formatted Gremlin should look like.

 [gremlin-groovy,modern]
 ----
 g.V().has('name','marko').
   out('created').in('created').
   values('name')
 ----

 So that's nice, we can see that "peter", "josh" and "marko" are all responsible for creating "v[3]", which is the
 "software" vertex named "lop". Of course, we already know about the involvement of "marko" and it seems strange to say that
 "marko" collaborates with himself, so excluding "marko" from the results seems logical. The following traversal
 handles that exclusion:

 [gremlin-groovy,modern]
 ----
 g.V().has('name','marko').as('exclude').
   out('created').in('created').
   where(neq('exclude')).
   values('name')
 ----

 We made two additions to the traversal to make it exclude "marko" from the results. First, we added the
 link:https://tinkerpop.apache.org/docs/x.y.z/reference/#as-step[as()] step. The `as()`-step is not really a "step",
 but a "step modulator" - something that adds features to a step or the traversal. Here, the `as('exclude')` labels
 the `has()`-step with the name "exclude" and all values that pass through that step are held in that label for later
 use. In this case, the "marko" vertex is the only vertex to pass through that point, so it is held in "exclude".

 The other addition that was made was the `where()`-step which is a filter step like `has()`. The `where()` is
 positioned after the `in()`-step that has "person" vertices, which means that the `where()` filter is occurring
 on the list of "marko" collaborators. The `where()` specifies that the "person" vertices passing through it should
 not equal (i.e. `neq()`) the contents of the "exclude" label. As it just contains the "marko" vertex, the `where()`
 filters out the "marko" that we get when we traverse back _in_ on the "created" edges.

 You will find many uses of `as()`. Here it is in combination with link:https://tinkerpop.apache.org/docs/x.y.z/reference/#select-step[select]:

 [gremlin-groovy,modern]
 ----
 g.V().as('a').out().as('b').out().as('c').
   select('a','b','c')
 ----

 In the above example, we tell Gremlin to iterate through all vertices and traverse _out_ twice from each. Gremlin
 will label each vertex in that path with "a", "b" and "c", respectively. We can then use `select` to extract the
 contents of those labels.
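
 The output of `select()` can be made more readable with the `by()` step modulator, which is covered in more detail
 below. As a sketch, the following variation projects each labeled vertex to its "name" property rather than showing
 the vertex itself:

 [source,groovy]
 ----
 graph = TinkerFactory.createModern()
 g = graph.traversal()

 g.V().as('a').out().as('b').out().as('c').
   select('a','b','c').
     by('name')
 ----

 In the "modern" graph only "marko" has paths that go two steps _out_, through "josh" to "ripple" and to "lop", so two
 maps are returned.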

 Another common but important step is the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#group-step[group()]
 step and its related step modulator called link:https://tinkerpop.apache.org/docs/x.y.z/reference/#by-step[by()]. If
 we wanted to ask Gremlin to group all the vertices in the graph by their vertex label we could do:

 [gremlin-groovy,modern]
 ----
 g.V().group().by(label)
 ----

 The use of `by()` here provides the mechanism by which to do the grouping. In this case, we've asked Gremlin to
 use the `label` (which, again, is an automatic static import from `T` in the console). We can't really tell much
 about our distribution though because we just have unique identifiers of vertices as output. To make that nicer we
 could ask Gremlin to get us the value of the "name" property from those vertices, by supplying another `by()`
 modulator to `group()` to transform the values.

 [gremlin-groovy,modern]
 ----
 g.V().group().by(label).by('name')
 ----
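
 When only the size of each group matters, `groupCount()` is a closely related step. A minimal sketch on the same
 graph, assuming the console's static import of `label`:

 [source,groovy]
 ----
 graph = TinkerFactory.createModern()
 g = graph.traversal()

 g.V().groupCount().by(label)    // counts vertices per label: 4 "person", 2 "software"
 g.E().groupCount().by(label)    // counts edges per label: 2 "knows", 4 "created"
 ----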

 In this section, you have learned a bit more about what property graphs are and how Gremlin interacts with them.
 You also learned how to envision Gremlin moving about a graph and how to use some of the more complex, but commonly
 utilized traversal steps. You are now ready to think about TinkerPop in terms of its wider applicability to
 graph computing.

 == The Final Ten Minutes

 In these final ten minutes of _The TinkerPop Workout - by Gremlin_ we'll look at TinkerPop from a higher level and
 introduce different features it provides to help orient you to some of the project's technology ecosystem. In this
 way, you can identify areas of interest and dig into the details from there.

 === Why TinkerPop?

 image:provider-integration.png[float=right,width=350] The goal of TinkerPop, as a Graph Computing Framework, is to make it
 easy for developers to create graph applications by providing APIs and tools that simplify their endeavors. One of
 the fundamental aspects to what TinkerPop offers in this area lies in the fact that TinkerPop is an abstraction layer
 over different graph databases and different graph processors. As an abstraction layer, TinkerPop provides a way to
 avoid vendor lock-in to a specific database or processor. This capability provides immense value to developers who
 are thus afforded options in their architecture and development because:

 * They can try different implementations using the same code to decide which is best for their environment.
 * They can grow into a particular implementation if they so desire - start with a graph that is designed to scale
 within a single machine and then later switch to a graph that is designed to scale horizontally.
 * They can feel more confident in graph technology choices, as advances in the state of different provider
 implementations are behind TinkerPop APIs, which open the possibility to switch providers with limited impact.

 TinkerPop has always had the vision of being an abstraction over different graph databases. That much
 is not new and dates back to TinkerPop 1.x. It is in TinkerPop 3.x however that we see the introduction of the notion
 that TinkerPop is also an abstraction over different graph processors like link:http://spark.apache.org[Spark] and
 link:http://giraph.apache.org/[Giraph]. The scope of this tutorial does not permit it to delve into
 "graph processors", but the short story is that the same Gremlin statement we wrote in the examples above can be
 executed to run in distributed fashion over Spark or Hadoop. The changes required to the code to do this are not
 in the traversal itself, but in the definition of the `TraversalSource`. You can again see why we encourage graph
 operations to be executed through that class as opposed to just using `Graph`. You can read more about these
 features in this section on link:https://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-gremlin[hadoop-gremlin].

 TIP: To maintain an abstraction over `Graph` creation use `GraphFactory.open()` to construct new instances. See
 the documentation for individual `Graph` implementations to learn about the configuration options to provide.
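
 A minimal sketch of that approach follows. It assumes a Gremlin Console session and uses TinkerGraph as the
 implementation; switching providers would then mean changing the configuration (the class name plus any
 provider-specific keys) rather than the code.

 [source,groovy]
 ----
 // the "gremlin.graph" key names the Graph implementation class to instantiate
 conf = ['gremlin.graph': 'org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph']
 graph = GraphFactory.open(conf)
 g = graph.traversal()
 ----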

 === Loading Data

 image:gremlin-to-the-7.png[width=100,float=left] There are many strategies for getting data into your graph. As you are
 just getting started, let's look at the simpler methods aimed at "smaller" graphs. A "small" graph, in this
 context, is one that has fewer than ten million edges. The most direct way to load this data is to write a Groovy script
 that can be executed in the Gremlin Console, a tool that you should be well familiar with at this point. For our
 example, let's use the link:http://snap.stanford.edu/data/wiki-Vote.html[Wikipedia Vote Network] data set which
 contains 7,115 vertices and 103,689 edges.

 [source,text]
 ----
 $ curl -L -O http://snap.stanford.edu/data/wiki-Vote.txt.gz
 $ gunzip wiki-Vote.txt.gz
 ----

 The data is contained in a tab-delimited structure where vertices are Wikipedia users and edges from one user to
 another imply a "vote" relationship. Here is the script to parse the file and generate the `Graph` instance using
 TinkerGraph:

 [source,groovy]
 ----
 graph = TinkerGraph.open()
 graph.createIndex('userId', Vertex.class) <1>

 g = graph.traversal()

 getOrCreate = { id ->
   g.V().has('userId', id).
     fold().
     coalesce(unfold(),
              addV('user').property('userId', id)).next()  <2>
 }

 new File('wiki-Vote.txt').eachLine {
   if (!it.startsWith("#")){
     (fromVertex, toVertex) = it.split('\t').collect(getOrCreate) <3>
     g.addE('votesFor').from(fromVertex).to(toVertex).iterate()
   }
 }
 ----

 <1> To ensure fast lookups of vertices, we need an index. The `createIndex()` method is a method native to
 TinkerGraph. Please consult your graph database's documentation for its index creation approach.
 <2> This "get or create" traversal gets a vertex if it already exists; otherwise, it creates it. It uses `coalesce()` in
 a clever way by first determining if the list of vertices produced by the previous `fold()` has anything in it by
 testing the result of `unfold()`. If `unfold()` returns nothing then that vertex doesn't exist and the subsequent
 `addV()` inner traversal can be called to create it.
 <3> We are iterating each line of the `wiki-Vote.txt` file and this line splits the line on the delimiter, then
 uses some neat Groovy syntax to apply the `getOrCreate()` function to each of the two `userId` fields encountered in
 the line and stores those vertices in the `fromVertex` and `toVertex` variables respectively.

 NOTE: While this is a tab-delimited structure, this same pattern can be applied
 to any data source you require and Groovy tends to have nice libraries that can help make working with data
 link:https://thinkaurelius.wordpress.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/[quite enjoyable].

 WARNING: Take care if using a `Graph` implementation that supports
 link:https://tinkerpop.apache.org/docs/x.y.z/reference/#transactions[transactions]. As TinkerGraph does not, there is
 no need to `commit()`.  If your `Graph` does support transactions, intermediate commits during load will need to be
 applied.
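
 As a rough sketch of what intermediate commits could look like, the loading script can be wrapped in a closure that
 commits every `batchSize` edges when the `Graph` reports transaction support. The `loadVotes` helper and its
 `batchSize` parameter are hypothetical names introduced for this example, and a suitable batch size depends on the
 graph database in use. With TinkerGraph the commit calls are simply skipped.

 [source,groovy]
 ----
 graph = TinkerGraph.open()
 graph.createIndex('userId', Vertex.class)
 g = graph.traversal()

 getOrCreate = { id ->
   g.V().has('userId', id).
     fold().
     coalesce(unfold(),
              addV('user').property('userId', id)).next()
 }

 // hypothetical helper: loads a tab-delimited edge file and returns the number of edges added
 loadVotes = { File file, int batchSize ->
   def supportsTx = graph.features().graph().supportsTransactions()
   def count = 0
   file.eachLine {
     if (!it.startsWith('#')) {
       def (fromVertex, toVertex) = it.split('\t').collect(getOrCreate)
       g.addE('votesFor').from(fromVertex).to(toVertex).iterate()
       count++
       if (supportsTx && count % batchSize == 0) g.tx().commit()    // intermediate commit
     }
   }
   if (supportsTx) g.tx().commit()                                   // commit the remainder
   count
 }
 ----

 The helper would then be called as `loadVotes(new File('wiki-Vote.txt'), 10000)`.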

 To load larger data sets you should read about the
 link:https://tinkerpop.apache.org/docs/x.y.z/reference/#clonevertexprogram[CloneVertexProgram], which provides a
 generalized method for loading graphs of virtually any size, and consider the native bulk loading features of the
 underlying graph database that you've chosen.

 === Gremlin Server

 image:gremlin-server-protocol.png[width=325,float=right] link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-server[Gremlin Server]
 provides a way to remotely execute Gremlin scripts against one or more `Graph` instances hosted within it. It does
 this by exposing different endpoints, such as link:https://tinkerpop.apache.org/docs/x.y.z/reference/#_connecting_via_http[HTTP]
 and link:https://tinkerpop.apache.org/docs/x.y.z/reference/#connecting-via-java[WebSocket], which allow a request
 containing a Gremlin script to be processed with results returned.

 [source,text]
 ----
 $ curl -L -O https://www.apache.org/dist/tinkerpop/x.y.z/apache-tinkerpop-gremlin-server-x.y.z-bin.zip
 $ unzip apache-tinkerpop-gremlin-server-x.y.z-bin.zip
 $ cd apache-tinkerpop-gremlin-server-x.y.z
 $ bin/gremlin-server.sh conf/gremlin-server-rest-modern.yaml
 [INFO] GremlinServer -
          \,,,/
          (o o)
 -----oOOo-(3)-oOOo-----

 [INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-rest-modern.yaml
 ...
 [INFO] GremlinServer$1 - Channel started at port 8182.
 ----

 [source,text]
 $ curl -X POST -d "{\"gremlin\":\"g.V(x).out().values('name')\", \"language\":\"gremlin-groovy\", \"bindings\":{\"x\":1}}" "http://localhost:8182"

 [source,json]
 ----
 {
 	"requestId": "f67dbfff-b33a-4ae3-842d-c6e7c97b246b",
 	"status": {
 		"message": "",
 		"code": 200,
 		"attributes": {
 			"@type": "g:Map",
 			"@value": []
 		}
 	},
 	"result": {
 		"data": {
 			"@type": "g:List",
 			"@value": ["lop", "vadas", "josh"]
 		},
 		"meta": {
 			"@type": "g:Map",
 			"@value": []
 		}
 	}
 }
 ----

 IMPORTANT: Take careful note of the use of "bindings" in the arguments on the request. These are variables that are
 applied to the script on execution and are essentially a way to parameterize your scripts. This "parameterization" is
 critical to link:https://tinkerpop.apache.org/docs/x.y.z/reference/#_best_practices[performance].  Whenever
 possible, parameterize your queries.
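
 To illustrate the idea, the sketch below builds two request bodies in plain Groovy that share one script string and
 differ only in their bindings. The `buildRequest` helper is a hypothetical name introduced for this example; the JSON
 keys are the ones used in the `curl` request above.

 [source,groovy]
 ----
 import groovy.json.JsonOutput

 // hypothetical helper: the script text stays constant, only the bindings vary
 buildRequest = { String script, Map bindings ->
   JsonOutput.toJson([gremlin: script, language: 'gremlin-groovy', bindings: bindings])
 }

 script = "g.V(x).out().values('name')"
 requestForMarko = buildRequest(script, [x: 1])    // x is bound to vertex id 1
 requestForJosh  = buildRequest(script, [x: 4])    // same script, x is bound to vertex id 4
 ----

 Either body can be sent with the same `curl -X POST` command shown earlier.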

 As mentioned earlier, Gremlin Server can also be configured with a WebSocket endpoint. This endpoint has an
 embedded link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_graph_driver_provider_requirements[subprotocol] that allows a
 compliant driver to communicate with it.  TinkerPop supplies a
 link:https://tinkerpop.apache.org/docs/x.y.z/reference/#connecting-via-java[reference driver] written in Java, but
 there are drivers developed by both TinkerPop and third-parties for other link:https://tinkerpop.apache.org/#language-drivers[languages]
 such as Python, JavaScript, etc. Gremlin Server therefore represents the method by which non-JVM languages can
 interact with TinkerPop.

 === Conclusion

 ...and that is the end of _The TinkerPop Workout - by Gremlin_. You are hopefully feeling more confident in your
 TinkerPop skills and have a good overview of what the stack has to offer, as well as some entry points to further
 research within the reference documentation. Welcome to The TinkerPop!