| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| |
| :docinfo: shared |
| :docinfodir: ../../ |
| |
| image::apache-tinkerpop-logo.png[width=500,link="https://tinkerpop.apache.org"] |
| |
| *x.y.z* |
| |
| == Getting Started |
| |
| link:https://tinkerpop.apache.org[Apache TinkerPop™] is an open source Graph Computing Framework. Within itself, TinkerPop |
| represents a large collection of capabilities and technologies and, in its wider ecosystem, an additionally extended |
| world of link:https://tinkerpop.apache.org/#graph-systems[third-party contributed] graph libraries and |
| systems. TinkerPop's ecosystem can appear complex to newcomers of all experience levels, especially when glancing at the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/[reference documentation] for the first time. |
| |
| So, where do you get started with TinkerPop? How do you dive in quickly and get productive? Well ... Gremlin, the |
| most recognizable citizen of The TinkerPop, is here to help with this thirty-minute tutorial. That's right: in just |
| thirty short minutes, you too can be fit to start building graph applications with TinkerPop. Welcome to _The |
| TinkerPop Workout — by Gremlin_! |
| |
| image::gremlin-gym.png[width=1024] |
| |
| == The First Five Minutes |
| |
| It is quite possible to learn a lot in just five minutes with TinkerPop, but before doing so, a proper introduction of |
| your trainer is in order. Meet Gremlin! |
| |
| image:gremlin-standing.png[width=125] |
| |
| Gremlin helps you navigate the vertices and edges of a graph. He is essentially your query language to graph |
| databases, as link:http://sql2gremlin.com/[SQL] is the query language to relational databases. To tell Gremlin how |
| he should "traverse" the graph (i.e., what you want your query to do) you need a way to provide him commands in the |
| language he understands — and, of course, that language is called "Gremlin". For this task, you need one of |
| TinkerPop's most important tools: link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-console[The Gremlin Console]. |
| |
| NOTE: Are you unsure of what a vertex or edge is? That topic is covered in the <<_the_next_fifteen_minutes, next section>>, |
| but please allow the tutorial to get you oriented with the Gremlin Console first, so that you have an understanding of |
| the tool that will help you with your learning experience. |
| |
| link:https://www.apache.org/dyn/closer.lua/tinkerpop/x.y.z/apache-tinkerpop-gremlin-console-x.y.z-bin.zip[Download the console], |
| unpack it and start it: |
| |
| [source,text] |
| ---- |
| $ unzip apache-tinkerpop-gremlin-console-x.y.z-bin.zip |
| $ cd apache-tinkerpop-gremlin-console-x.y.z |
| $ bin/gremlin.sh |
| |
|          \,,,/ |
|          (o o) |
| -----oOOo-(3)-oOOo----- |
| plugin activated: tinkerpop.server |
| plugin activated: tinkerpop.utilities |
| plugin activated: tinkerpop.tinkergraph |
| gremlin> |
| ---- |
| |
| TIP: Windows users may use the included `bin/gremlin.bat` file to start the Gremlin Console. |
| |
| The Gremlin Console is a link:http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop[REPL environment], |
| which provides a nice way to learn Gremlin as you get immediate feedback for the code that you enter. This eliminates |
| the more complex need to "create a project" to try things out. The console is not just for "getting started", however. |
| You will find yourself using it for a variety of TinkerPop-related activities, such as loading data, administering |
| graphs and working out complex traversals. |
| |
| To get Gremlin to traverse a graph, you need a `TraversalSource` instance, which holds a reference to a |
| `Graph` instance, which in turn holds the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graph-structure[structure] and data of the |
| graph. TinkerPop is a graph abstraction layer over different graph databases and different graph processors, so there |
| are many `Graph` instances link:https://tinkerpop.apache.org/#graph-systems[you can choose from] to instantiate a |
| connection to in the console. The best `Graph` instance to start with, however, is |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#tinkergraph-gremlin[TinkerGraph]. TinkerGraph is a fast, |
| in-memory graph database with a small handful of configuration options, making it a good choice for beginners. |
| |
| TIP: TinkerGraph is not just a toy for beginners. It is useful in analyzing subgraphs taken from a large graph, |
| working with a small static graph that doesn't change much, writing unit tests and other use cases where the graph |
| can fit in memory. |
| |
| TIP: For purposes of "getting started", resist the temptation to dig into more complex databases that have lots of |
| configuration options or to delve into how to get link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-server[Gremlin Server] |
| working properly. Focusing on the basics, presented in this guide, builds a good foundation for all the other things |
| TinkerPop offers. |
| |
| To make your learning process even easier, start with one of TinkerPop's "toy" graphs. These are "small" graphs |
| designed to provide a quick start into querying. It is good to get familiar with them, as almost all TinkerPop |
| documentation is based on them and when you need help and have to come to the |
| link:http://groups.google.com/group/gremlin-users[mailing list], a failing example put in the context of the toy graphs |
| can usually get you a fast answer to your problem. |
| |
| TIP: When asking questions on the mailing list or Stack Overflow about Gremlin, it's always helpful to |
| link:https://stackoverflow.com/questions/51388315/gremlin-choose-one-item-at-random[include a sample graph] so that |
| those attempting to answer your question understand exactly what kind of graph you have and can focus their energies |
| on providing a good, tested answer rather than trying to build sample data themselves. The sample graph should just be a simple |
| Gremlin script that can be copied and pasted into a Gremlin Console session. |
| |
| For your first graph, use the "Modern" graph, which looks like this: |
| |
| image:tinkerpop-modern.png[width=500] |
| |
| It can be instantiated in the console this way: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerFactory.createModern() |
| g = traversal().withEmbedded(graph) |
| ---- |
| |
| The first command creates a `Graph` instance named `graph`, which thus provides a reference to the data you want |
| Gremlin to traverse. Unfortunately, just having `graph` doesn't provide Gremlin enough context to do his job. You |
| also need something called a `TraversalSource`, which is generated by the second command. The `TraversalSource` |
| provides additional information to Gremlin (such as the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#traversalstrategy[traversal strategies] |
| to apply and the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graphcomputer[traversal engine] to use) which |
| provides him guidance on how to execute his trip around the `Graph`. |
| |
| There are several ways to create a `TraversalSource`. The example above uses the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#connecting-embedded[embedded] style and is an approach |
| restricted to languages using the Java Virtual Machine (JVM). Other methods are similar in form, but are not the focus of |
| this tutorial. See the Reference Documentation for more information on the different ways of |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#connecting-gremlin[connecting with Gremlin]. |
| |
| NOTE: The `traversal()` method is statically imported from the `AnonymousTraversalSource` class so that it can be used |
| in a more fluent fashion. There are common imports for all languages that support Gremlin to make it easier to read |
| and to write (link:https://tinkerpop.apache.org/docs/x.y.z/reference/#java-imports[Java], |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#python-imports[Python], |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#dotnet-imports[.NET], |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#javascript-imports[JavaScript]). |
| |
| With your `TraversalSource` `g` available it is now possible to ask Gremlin to traverse the `Graph`: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V() <1> |
| g.V(1) <2> |
| g.V(1).values('name') <3> |
| g.V(1).outE('knows') <4> |
| g.V(1).outE('knows').inV().values('name') <5> |
| g.V(1).out('knows').values('name') <6> |
| g.V(1).out('knows').has('age', gt(30)).values('name') <7> |
| ---- |
| |
| <1> Get all the vertices in the `Graph`. |
| <2> Get the vertex with the unique identifier of "1". |
| <3> Get the value of the `name` property on the vertex with the unique identifier of "1". |
| <4> Get the edges with the label "knows" for the vertex with the unique identifier of "1". |
| <5> Get the names of the people whom the vertex with the unique identifier of "1" "knows". |
| <6> Note that when one uses `outE().inV()` as shown in the previous command, this can be shortened to just `out()` |
| (similar to `inE().outV()` and `in()` for incoming edges). |
| <7> Get the names of the people vertex "1" knows who are over the age of 30. |
| |
| TIP: The variable `g`, the `TraversalSource`, only needs to be instantiated once and should then be re-used. |
| |
| IMPORTANT: A `Traversal` is essentially an `Iterator` so if you have code like `x = g.V()`, the `x` does not contain |
| the results of the `g.V()` query. Rather, that statement assigns an `Iterator` value to `x`. To get your results, |
| you would then need to iterate through `x`. This understanding is *important* because in the context of the console |
| typing `g.V()` instantly returns a value. The console does some magic for you by noticing that `g.V()` returns |
| an `Iterator` and then automatically iterates the results. In short, when writing Gremlin outside of the console |
| always remember that you must iterate your `Traversal` manually in some way for it to do anything. The concept of |
| "iterating your traversal" is described further in link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/the-gremlin-console/[The Gremlin Console Tutorial]. |
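
As a minimal sketch of what "iterating your traversal" means outside the console, the following Groovy script assumes that the TinkerGraph libraries are on the classpath and that it is run as a standalone script rather than pasted into the console (where the first assignment would be iterated automatically). The variable names `t`, `first`, `rest` and `names` are illustrative only.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal

graph = TinkerFactory.createModern()
g = traversal().withEmbedded(graph)

// Assigning a traversal executes nothing: t is just an Iterator waiting to be pulled.
t = g.V().hasLabel('person').values('name')
first = t.next()       // pulls exactly one result
rest = t.toList()      // drains whatever is left
assert !t.hasNext()    // the traversal is now exhausted

// Most application code simply ends the traversal with a terminal step.
names = g.V().hasLabel('person').values('name').toList()
assert names.size() == 4

// A mutating traversal that is never iterated has no effect ...
g.V().has('person', 'name', 'marko').property('visited', true)
assert g.V().has('visited', true).count().next() == 0

// ... until a terminal step such as iterate() runs it for its side effects.
g.V().has('person', 'name', 'marko').property('visited', true).iterate()
assert g.V().has('visited', true).count().next() == 1
```

The choice between `next()`, `toList()` and `iterate()` depends on whether you want one result, all results, or only the side effects of the traversal.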
| |
| In this first five minutes with Gremlin, you've gotten the Gremlin Console installed, instantiated a `Graph` and |
| `TraversalSource`, written some traversals and hopefully learned something about TinkerPop in general. You've only |
| scratched the surface of what there is to know, but those accomplishments will help enable your understanding of the |
| more detailed sections to come. |
| |
| == The Next Fifteen Minutes |
| |
| In the first five minutes of _The TinkerPop Workout — by Gremlin_, you learned some basics for traversing graphs. Of |
| course, there wasn't much discussion about what a graph is. A graph is a collection of vertices (i.e., nodes, dots) |
| and edges (i.e., relationships, lines), where a vertex is an entity which represents some domain object (e.g., a person or |
| a place) and an edge represents the relationship between two vertices. |
| |
| image:modern-edge-1-to-3-1.png[width=300] |
| |
| The diagram above shows a graph with two vertices, one with a unique identifier of "1" and another with a unique |
| identifier of "3". There is an edge connecting the two with a unique identifier of "9". It is important to consider |
| that the edge has a direction, which goes _out_ from vertex "1" and _in_ to vertex "3". |
| |
| IMPORTANT: Most TinkerPop implementations do not allow for identifier assignment. They will instead assign |
| their own identifiers and ignore any identifiers that you attempt to assign to them. |
| |
| A graph with elements that just have identifiers does not make for much of a database. To give some meaning to |
| this basic structure, vertices and edges can each be given labels to categorize them. |
| |
| image:modern-edge-1-to-3-2.png[width=300] |
| |
| You can now see that vertex "1" is a "person" and vertex "3" is a "software" vertex. They are joined by a "created" |
| edge which allows you to see that a "person created software". The "label" and the "id" are reserved attributes of |
| vertices and edges, but you can add your own arbitrary properties as well: |
| |
| image:modern-edge-1-to-3-3.png[width=325] |
| |
| This model is referred to as a _property graph_ and it provides a flexible and intuitive way in which to model your |
| data. |
| |
| === Creating a Graph |
| |
| As intuitive as it is to you, it is perhaps more intuitive to Gremlin himself, as vertices, edges and properties make |
| up the very elements of his existence. It is indeed helpful to think of our friend, Gremlin, moving about a graph when |
| developing traversals, as picturing his position as the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#_the_traverser[traverser] |
| helps orient where you need him to go next. Let's use the two-vertex, one-edge graph we've been discussing above |
| as an example. First, you need to create this graph: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerGraph.open() |
| g = traversal().withEmbedded(graph) |
| v1 = g.addV("person").property(id, 1).property("name", "marko").property("age", 29).next() |
| v2 = g.addV("software").property(id, 3).property("name", "lop").property("lang", "java").next() |
| g.addE("created").from(v1).to(v2).property(id, 9).property("weight", 0.4) |
| ---- |
| |
| There are a number of important things to consider in the above code. First, recall that `id` is |
| "reserved" for special usage in TinkerPop. It is a member of the enum, `T`. The members of `T` are |
| link:https://docs.oracle.com/javase/8/docs/technotes/guides/language/static-import.html[statically imported] |
| to the console, which allows you to access them without having to specify their owning enum. Think of `id` as a |
| shorthand form that enables a more fluid code style. You would normally refer to it as `T.id`, so without |
| that static import you would instead have to write: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerGraph.open() |
| g = traversal().withEmbedded(graph) |
| v1 = g.addV("person").property(T.id, 1).property("name", "marko").property("age", 29).next() |
| v2 = g.addV("software").property(T.id, 3).property("name", "lop").property("lang", "java").next() |
| g.addE("created").from(v1).to(v2).property(T.id, 9).property("weight", 0.4) |
| ---- |
| |
| NOTE: On the JVM, the fully qualified name for `T` is `org.apache.tinkerpop.gremlin.structure.T`. Another important |
| static import that is often seen in Gremlin comes from `+org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__+`, |
| which allows for the creation of link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graph-traversal-steps[anonymous traversals]. |
| You can find the analogous variations of `T` and `+__+` for other Gremlin languages by viewing the "Common Imports" |
| sections for your programming language of interest in the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-drivers-variants[Reference Documentation]. |
| |
| Second, don't forget that you are working with TinkerGraph, which allows for identifier assignment. That is _not_ the |
| case with most graph databases. |
| |
| Finally, the label for an `Edge` is required and is thus supplied as the argument to `addE()`. This usage of `addE()`, |
| modulated by `from()` and `to()`, creates an edge that goes _out_ of `v1` and _in_ to `v2` with a label of "created". |
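
To make the direction of the "created" edge concrete, here is a small sketch that rebuilds the two-vertex graph and checks it from both ends. It assumes the TinkerGraph libraries are on the classpath. The variable `e` and the assertions are additions for illustration and are not part of the tutorial's own listing.

```groovy
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal

graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
v1 = g.addV('person').property(T.id, 1).property('name', 'marko').property('age', 29).next()
v2 = g.addV('software').property(T.id, 3).property('name', 'lop').property('lang', 'java').next()
// next() returns the new Edge so that it can be inspected below
e = g.addE('created').from(v1).to(v2).property(T.id, 9).property('weight', 0.4).next()

// The edge leaves v1 and arrives at v2 ...
assert e.label() == 'created'
assert e.outVertex().id() == 1
assert e.inVertex().id() == 3

// ... so out() from "marko" reaches "lop", and in() from "lop" reaches "marko".
assert g.V(v1).out('created').values('name').toList() == ['lop']
assert g.V(v2).in('created').values('name').toList() == ['marko']

// Walking out() from the software vertex finds nothing, because the edge points the other way.
assert g.V(v2).out('created').toList().isEmpty()
```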
| |
| === Graph Traversal - Staying Simple |
| |
| Now that Gremlin knows where the graph data is, you can ask him to get you some data from it by doing a traversal, |
| which you can think of as executing some link:https://tinkerpop.apache.org/docs/x.y.z/reference/#the-graph-process[process] |
| over the structure of the graph. We can form our question in English and then translate it to Gremlin. For this |
| initial example, let's ask Gremlin: "What software has Marko created?" |
| |
| To answer this question, we would want Gremlin to: |
| |
| . Find "marko" in the graph |
| . Walk along the "created" edges to "software" vertices |
| . Select the "name" property of the "software" vertices |
| |
| The English-based steps above largely translate to Gremlin's position in the graph and to the steps we need to take |
| to ask him to answer our question. By stringing these steps together, we form a `Traversal` or the sequence of |
| programmatic link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graph-traversal-steps[steps] Gremlin needs to |
| perform in order to get you an answer. |
| |
| Let's start with finding "marko". This operation is a filtering step as it searches the full set of vertices to match |
| those that have the "name" property value of "marko". This can be done with the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#has-step[has()] step as follows: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko') |
| ---- |
| |
| NOTE: The variable `g` is the `TraversalSource`, which was introduced in "The First Five Minutes". The |
| `TraversalSource` is created with `traversal().withEmbedded(graph)` and is the object used to spawn new traversals. |
| |
| This bit of Gremlin can be improved and made more |
| link:https://tinkerpop.apache.org/docs/x.y.z/recipes/#unspecified-label-in-global-vertex-lookup[idiomatically pleasing] |
| by including the vertex label as part of the filter to ensure that the "name" property key refers to a "person" vertex. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('person','name','marko') |
| ---- |
| |
| We can picture this traversal in our little graph with Gremlin sitting on vertex "1". |
| |
| image:modern-edge-1-to-3-1-gremlin.png[width=325] |
| |
| When Gremlin is on a vertex or an edge, he has access to all the properties that are available to that element. |
| |
| IMPORTANT: The above query iterates *all* the vertices in the graph to get its answer. That's fine for our little example, |
| but for multi-million- or billion-edge graphs that is a big problem. To solve this problem, you should look to use |
| indices. TinkerPop does not provide an abstraction for index management. You should consult the documentation of the |
| graph you have chosen and utilize its native API to create indices which will then speed up these types of lookups. Your |
| traversals will remain unchanged, however, as the indices will be used transparently at execution time. |
| |
| Now that Gremlin has found "marko", he can consider the next step in the traversal where we ask him to "walk" |
| along "created" edges to "software" vertices. As described earlier, edges have direction, so we have to tell Gremlin |
| what direction to follow. In this case, we want him to traverse on outgoing edges from the "marko" vertex. For this, |
| we use the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#vertex-steps[outE] step. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('person','name','marko').outE('created') |
| ---- |
| |
| At this point, you can picture Gremlin moving from the "marko" vertex to the "created" edge. |
| |
| image:modern-edge-1-to-3-2-gremlin.png[width=325] |
| |
| To get to the vertex on the other end of the edge, you need to tell Gremlin to move from the edge to the incoming |
| vertex with `inV()`. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('person','name','marko').outE('created').inV() |
| ---- |
| |
| You can now picture Gremlin on the "software" vertex as follows: |
| |
| image:modern-edge-1-to-3-3-gremlin.png[width=325] |
| |
| As you are not asking Gremlin to do anything with the properties of the "created" edge, you can simplify the |
| statement above with: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('person','name','marko').out('created') |
| ---- |
| |
| image:modern-edge-1-to-3-4-gremlin.png[width=325] |
| |
| Finally, now that Gremlin has reached the "software that Marko created", he has access to the properties of the |
| "software" vertex and you can therefore ask Gremlin to extract the value of the "name" property as follows: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('person','name','marko').out('created').values('name') |
| ---- |
| |
| You should now be able to see the connection Gremlin has to the structure of the graph and how Gremlin maneuvers from |
| vertices to edges and so on. Your ability to string together steps to ask Gremlin to do more complex things depends |
| on your understanding of these basic concepts. |
| |
| === Graph Traversal - Increasing Complexity |
| |
| Armed with the knowledge from the previous section, let's ask Gremlin to perform some more difficult traversal tasks. |
| There's not much more that can be done with the "baby" graph we had, so let's return to the "modern" toy graph from |
| the "First Five Minutes" section. Recall that you can create this `Graph` and establish a `TraversalSource` with: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerFactory.createModern() |
| g = traversal().withEmbedded(graph) |
| ---- |
| |
| Earlier we'd used the `has()`-step to tell Gremlin how to find the "marko" vertex. Let's look at some other ways to |
| use `has()`. What if we wanted Gremlin to find the "age" values of both "vadas" and "marko"? In this case we could |
| use the `within` comparator with `has()` as follows: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('person','name',within('vadas','marko')).values('age') |
| ---- |
| |
| It is worth noting that `within` is statically imported from `P` to the Gremlin Console (much like `T` is, as described |
| earlier). |
| |
| NOTE: On the JVM, the fully qualified name for `P` is `org.apache.tinkerpop.gremlin.process.traversal.P`. You can find |
| the analogous variation of `P` for other Gremlin languages by viewing the "Common Imports" sections for your programming |
| language of interest in the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-drivers-variants[Reference Documentation]. |
| |
| If we wanted to ask Gremlin the average age of "vadas" and "marko" we could use the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#mean-step[mean()] step as follows: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('person','name',within('vadas','marko')).values('age').mean() |
| ---- |
| |
| Another method of filtering is seen in the use of the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#where-step[where] |
| step. We know how to find the "software" that "marko" created: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('person','name','marko').out('created') |
| ---- |
| |
| image:gremlin-on-software-vertex.png[width=325,float=right] Let's extend that query to try to learn who "marko" |
| collaborates with when it comes to the software he created. In other words, let's try to answer the question: "Who |
| are the people that marko develops software with?" To do that, we should first picture Gremlin where we left him in |
| the previous query. He was standing on the "software" vertex. To find out who "created" that "software", we need to |
| have Gremlin traverse back _in_ along the "created" edges to find the "person" vertices tied to it. |
| |
| TIP: The nature of Gremlin leads to long lines of code. Readability can be greatly improved by using line spacing and |
| indentation. See the link:https://tinkerpop.apache.org/docs/x.y.z/recipes/#style-guide[Style Guide] for recommendations |
| on what well-formatted Gremlin should look like. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('person','name','marko'). |
|   out('created').in('created'). |
|   values('name') |
| ---- |
| |
| So that's nice, we can see that "peter", "josh" and "marko" are all responsible for creating "v[3]", which is the |
| "software" vertex named "lop". Of course, we already know about the involvement of "marko" and it seems strange to say that |
| "marko" collaborates with himself, so excluding "marko" from the results seems logical. The following traversal |
| handles that exclusion: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('person','name','marko').as('exclude'). |
|   out('created').in('created'). |
|   where(neq('exclude')). |
|   values('name') |
| ---- |
| |
| We made two additions to the traversal to make it exclude "marko" from the results. First, we added the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#as-step[as()] step. The `as()`-step is not really a "step", |
| but a "step modulator" - something that adds features to a step or the traversal. Here, the `as('exclude')` labels |
| the `has()`-step with the name "exclude" and all values that pass through that step are held in that label for later |
| use. In this case, the "marko" vertex is the only vertex to pass through that point, so it is held in "exclude". |
| |
| The other addition that was made was the `where()`-step, which is a filter step like `has()`. The `where()` is |
| positioned after the `in()`-step that has "person" vertices, which means that the `where()` filter is occurring |
| on the list of "marko" collaborators. The `where()` specifies that the "person" vertices passing through it should |
| not equal (i.e., `neq()`) the contents of the "exclude" label. As it just contains the "marko" vertex, the `where()` |
| filters out the "marko" that we get when we traverse back _in_ on the "created" edges. |
| |
| You will find many uses of `as()`. Here it is in combination with link:https://tinkerpop.apache.org/docs/x.y.z/reference/#select-step[select]: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().as('a').out().as('b').out().as('c'). |
|   select('a','b','c') |
| ---- |
| |
| In the above example, we tell Gremlin to iterate through all vertices and traverse _out_ twice from each. Gremlin |
| will label each vertex in that path with "a", "b" and "c", respectively. We can then use `select` to extract the |
| contents of those labels. |
| |
| Another common but important step is the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#group-step[group()] |
| step and its related step modulator called link:https://tinkerpop.apache.org/docs/x.y.z/reference/#by-step[by()]. If |
| we wanted to ask Gremlin to group all the vertices in the graph by their vertex label we could do: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().group().by(label) |
| ---- |
| |
| The use of `by()` here provides the mechanism by which to do the grouping. In this case, we've asked Gremlin to |
| use the `label` (which, again, is an automatic static import from `T` in the console). We can't really tell much |
| about our distribution though because we just have unique identifiers of vertices as output. To make that nicer we |
| could ask Gremlin to get us the value of the "name" property from those vertices, by supplying another `by()` |
| modulator to `group()` to transform the values. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().group().by(label).by('name') |
| ---- |
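
As a related sketch, `groupCount()` can stand in for `group()` when only the size of each group matters. This step is not covered in the tutorial text above and is shown here only as an illustration. The script assumes the TinkerGraph libraries are on the classpath, and the variable names `counts` and `namesByLabel` are hypothetical.

```groovy
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal

graph = TinkerFactory.createModern()
g = traversal().withEmbedded(graph)

// The first by() picks the grouping key, the second by() transforms the grouped values.
namesByLabel = g.V().group().by(T.label).by('name').next()
assert namesByLabel['software'] as Set == ['lop', 'ripple'] as Set
assert namesByLabel['person'] as Set == ['marko', 'vadas', 'josh', 'peter'] as Set

// groupCount() keeps only the number of elements per key.
counts = g.V().groupCount().by(T.label).next()
assert counts == [person: 4L, software: 2L]
```

The explicit `T.label` is used so that the snippet also works where the console's static imports are not available.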
| |
| In this section, you have learned a bit more about what property graphs are and how Gremlin interacts with them. |
| You also learned how to envision Gremlin moving about a graph and how to use some of the more complex, but commonly |
| utilized traversal steps. You are now ready to think about TinkerPop in terms of its wider applicability to |
| graph computing. |
| |
| == The Final Ten Minutes |
| |
| In these final ten minutes of _The TinkerPop Workout — by Gremlin_, we'll look at TinkerPop from a higher level and |
| introduce different features it provides to help orient you to some of the project's technology ecosystem. In this |
| way, you can identify areas of interest and dig into the details from there. |
| |
| === Why TinkerPop? |
| |
| image:provider-integration.png[float=right,width=350] The goal of TinkerPop, as a Graph Computing Framework, is to |
| make it easy for developers to create graph applications by providing APIs and tools that simplify their endeavors. |
| One of the fundamental aspects of what TinkerPop offers in this area lies in the fact that TinkerPop is an abstraction |
| layer over different graph databases and different graph processors. As an abstraction layer, TinkerPop provides a way |
| to avoid vendor lock-in to a specific database or processor. This capability provides immense value to developers who |
| are thus afforded options in their architecture and development because: |
| |
| * They can try different implementations using the same code to decide which is best for their environment. |
| * They can grow into a particular implementation if they so desire; e.g., start with a graph that is designed to scale |
| within a single machine and then later switch to a graph that is designed to scale horizontally. |
| * They can feel more confident in graph technology choices, as advances in the state of different provider |
| implementations are behind TinkerPop APIs, which open the possibility to switch providers with limited impact. |
| |
| TinkerPop has always had the vision of being an abstraction over different graph databases. That much |
| is not new and dates back to TinkerPop 1.x. It is in TinkerPop 3.x, however, that we see the introduction of the notion |
| that TinkerPop is also an abstraction over different graph processors like link:http://spark.apache.org[Spark]. The |
| scope of this tutorial does not permit it to delve into "graph processors", but the short story is that the same |
| Gremlin statement we wrote in the examples above can be executed to run in distributed fashion over Spark or Hadoop. |
| The changes required to the code to do this are not in the traversal itself, but in the definition of the |
| `TraversalSource`. You can again see why we encourage graph operations to be executed through that class as opposed |
| to just using `Graph`. You can read more about these features in this section on |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-gremlin[hadoop-gremlin]. |
| |
| TIP: To maintain an abstraction over `Graph` creation, use `GraphFactory.open()` to construct new instances. See |
| the documentation for individual `Graph` implementations to learn about the configuration options to provide. |
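
A minimal sketch of that approach, assuming the TinkerGraph libraries are on the classpath: the map below names the `Graph` implementation under the `gremlin.graph` key, and switching providers would mean changing that configuration rather than the traversal code. The variable name `conf` is illustrative, and other providers require their own additional settings.

```groovy
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal

// 'gremlin.graph' tells GraphFactory which Graph implementation to instantiate.
conf = ['gremlin.graph': 'org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph']
graph = GraphFactory.open(conf)
g = traversal().withEmbedded(graph)

// The code from here on does not mention TinkerGraph at all.
g.addV('person').property('name', 'marko').iterate()
assert graph.getClass().simpleName == 'TinkerGraph'
assert g.V().has('person', 'name', 'marko').count().next() == 1
```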
| |
| === Loading Data |
| |
| image:gremlin-to-the-7.png[width=100,float=left] There are many strategies for getting data into your graph. As you are |
| just getting started, let's look at the simpler methods aimed at "smaller" graphs. A "small" graph, in this |
| context, is one that has fewer than ten million edges. The most direct way to load this data is to write a Groovy script |
| that can be executed in the Gremlin Console, a tool that you should be well familiar with at this point. For our |
| example, let's use the link:http://snap.stanford.edu/data/wiki-Vote.html[Wikipedia Vote Network] data set, which |
| contains 7,115 vertices and 103,689 edges. |
| |
| [source,text] |
| ---- |
| $ curl -L -O http://snap.stanford.edu/data/wiki-Vote.txt.gz |
| $ gunzip wiki-Vote.txt.gz |
| ---- |
| |
| The data is contained in a tab-delimited structure in which vertices are Wikipedia users and edges from one user to |
| another imply a "vote" relationship. Here is the script to parse the file and generate the `Graph` instance using |
| TinkerGraph: |
| |
| [source,groovy] |
| ---- |
| graph = TinkerGraph.open() |
| graph.createIndex('userId', Vertex.class) <1> |
| |
| g = traversal().withEmbedded(graph) |
| |
| getOrCreate = { id -> |
|   g.V().has('user','userId', id). |
|     fold(). |
|     coalesce(unfold(), |
|              addV('user').property('userId', id)).next() <2> |
| } |
| |
| new File('wiki-Vote.txt').eachLine { |
|   if (!it.startsWith("#")){ |
|     (fromVertex, toVertex) = it.split('\t').collect(getOrCreate) <3> |
|     g.addE('votesFor').from(fromVertex).to(toVertex).iterate() |
|   } |
| } |
| ---- |
| |
| <1> To ensure fast lookups of vertices, we need an index. The `createIndex()` method is specific to |
| TinkerGraph. Consult your graph database's documentation for its approach to index creation. |
| <2> This "get or create" traversal gets a vertex if it already exists; otherwise, it creates it. It uses `coalesce()` in |
| a clever way by first determining whether the list of vertices produced by the previous `fold()` has anything in it by |
| testing the result of `unfold()`. If `unfold()` returns nothing then that vertex doesn't exist and the subsequent |
| `addV()` inner traversal can be called to create it. |
| <3> As we iterate over each line of the `wiki-Vote.txt` file, this statement splits the line on the tab delimiter, |
| uses `collect()` to apply the `getOrCreate()` function to each of the two `userId` fields, and then uses Groovy's |
| multiple assignment to store the resulting vertices in the `fromVertex` and `toVertex` variables, respectively. |
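| |
| Once the script finishes, a quick sanity check is to count what was loaded. This sketch assumes the `g` defined in the |
| script above; if the load succeeded, the two counts should agree with the vertex and edge figures quoted for the data |
| set earlier in this section. |
| |
| [source,groovy] |
| ---- |
| // number of 'user' vertices created by getOrCreate |
| vertexCount = g.V().hasLabel('user').count().next() |
| |
| // number of 'votesFor' edges, one per non-comment line in the file |
| edgeCount = g.E().hasLabel('votesFor').count().next() |
| ---- |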
| |
| NOTE: Although this file is tab-delimited, the same pattern can be applied |
| to any data source you require. Groovy tends to have nice libraries that can help make working with data |
| link:https://thinkaurelius.wordpress.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/[quite enjoyable]. |
| |
| WARNING: Take care if using a `Graph` implementation that supports |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#transactions[transactions]. As TinkerGraph does not, there is |
| no need to `commit()`. If your `Graph` does support transactions, you will need to issue intermediate commits during |
| the load. |
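| |
| One way to handle the intermediate commits mentioned in the warning above is to commit after every fixed number of |
| edges. The following is only a sketch: it assumes that `g` was created from a `Graph` that supports transactions (it |
| will not work with TinkerGraph as opened above), it reuses the `getOrCreate` closure from the load script, and the |
| `batchSize` value is an arbitrary illustration rather than a recommendation. |
| |
| [source,groovy] |
| ---- |
| batchSize = 10000 // hypothetical value; tune for your graph database |
| count = 0 |
| |
| new File('wiki-Vote.txt').eachLine { |
|   if (!it.startsWith("#")) { |
|     (fromVertex, toVertex) = it.split('\t').collect(getOrCreate) |
|     g.addE('votesFor').from(fromVertex).to(toVertex).iterate() |
|     // commit once every batchSize edges to keep each transaction small |
|     if (++count % batchSize == 0) g.tx().commit() |
|   } |
| } |
| |
| // commit whatever remains from the final, partial batch |
| g.tx().commit() |
| ---- |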
| |
| To load larger data sets, you should read about the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#clonevertexprogram[CloneVertexProgram], which provides a |
| generalized method for loading graphs of virtually any size, and consider the native bulk loading features of the |
| underlying graph database that you've chosen. |
| |
| === Gremlin in Other Programming Languages |
| |
| This tutorial focused on Gremlin usage within the |
| link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/the-gremlin-console/[Gremlin Console], which means that the |
| examples were Groovy-based and oriented toward the JVM. Gremlin, however, is far from being a Java-only library. |
| TinkerPop natively supports a number of different programming languages, making it possible to execute all of the |
| examples presented in this tutorial with little modification. These different language implementations of Gremlin are |
| referred to as link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-drivers-variants[Gremlin Language Variants] and |
| they help make Gremlin more accessible and easier to use for those who do not use Java as their primary programming |
| language. |
| |
| [gremlin-groovy] |
| ---- |
| v1 = g.addV('person').property('name','marko').next() |
| v2 = g.addV('person').property('name','stephen').next() |
| g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate() |
| ---- |
| [source,csharp] |
| ---- |
| Vertex v1 = g.AddV("person").Property("name","marko").Next(); |
| Vertex v2 = g.AddV("person").Property("name","stephen").Next(); |
| g.V(v1).AddE("knows").To(v2).Property("weight",0.75).Iterate(); |
| ---- |
| [source,java] |
| ---- |
| Vertex v1 = g.addV("person").property("name","marko").next(); |
| Vertex v2 = g.addV("person").property("name","stephen").next(); |
| g.V(v1).addE("knows").to(v2).property("weight",0.75).iterate(); |
| ---- |
| [source,javascript] |
| ---- |
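| // next() and iterate() return Promises, so await them (inside an async function) |
| const v1 = (await g.addV('person').property('name','marko').next()).value; |
| const v2 = (await g.addV('person').property('name','stephen').next()).value; |
| await g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate(); |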
| ---- |
| [source,python] |
| ---- |
| v1 = g.addV('person').property('name','marko').next() |
| v2 = g.addV('person').property('name','stephen').next() |
| g.V(v1).addE('knows').to(v2).property('weight',0.75).iterate() |
| ---- |
| [source,go] |
| ---- |
| v1, err := g.AddV("person").Property("name", "marko").Next() |
| v2, err := g.AddV("person").Property("name", "stephen").Next() |
| g.V(v1).AddE("knows").To(v2).Property("weight", 0.75).Iterate() |
| ---- |
| |
| === Conclusion |
| |
| ...and that is the end of _The TinkerPop Workout — by Gremlin_. You are hopefully feeling more confident in your |
| TinkerPop skills and have a good overview of what the stack has to offer, as well as some entry points to further |
| research within the reference documentation. Welcome to The TinkerPop! |