| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| |
| image::apache-tinkerpop-logo.png[width=500] |
| |
| Getting Started |
| =============== |
| |
| link:http://tinkerpop.com[Apache TinkerPop] is an open source Graph Computing Framework. Within itself, TinkerPop |
| represents a large collection of capabilities and technologies and, in its wider ecosystem, an additionally extended |
| world of link:http://tinkerpop.incubator.apache.org/#graph-systems[third-party contributed] graph libraries and |
| systems. TinkerPop's ecosystem can appear complex to newcomers of all experience levels, especially when glancing at the |
| link:http://tinkerpop.incubator.apache.org/docs/x.y.z/index.html[reference documentation] for the first time. |
| |
| So, where do you get started with TinkerPop? How do you dive in quickly and get productive? Well - Gremlin, the |
| most recognizable citizen of The TinkerPop, is here to help with this thirty-minute tutorial. That's right - in just |
| thirty short minutes, you too can be fit to start building graph applications with TinkerPop. Welcome to _The |
| TinkerPop Workout - by Gremlin_! |
| |
| image::gremlin-gym.png[width=1024] |
| |
| The First Five Minutes |
| ---------------------- |
| |
| It is quite possible to learn a lot in just five minutes with TinkerPop, but before doing so, a proper introduction of |
| your trainer is in order. Meet Gremlin! |
| |
| image:gremlin-standing.png[width=125] |
| |
| Gremlin helps you navigate the vertices and edges of a graph. He is essentially your query language to graph |
| databases, as link:http://sql2gremlin.com/[SQL] is the query language to relational databases. To tell Gremlin how |
| he should "traverse" the graph (i.e. what you want your query to do) you need a way to provide him commands in the |
| language he understands - and, of course, that language is called "Gremlin". For this task, you need one of |
| TinkerPop's most important tools: link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#gremlin-console[The Gremlin Console]. |
| |
| Download the console, unpack it and start it: |
| |
| [source,text] |
| ---- |
| $ curl -L -O https://www.apache.org/dist/incubator/tinkerpop/x.y.z/apache-gremlin-console-x.y.z-bin.zip |
| $ unzip apache-gremlin-console-x.y.z-bin.zip |
| $ cd apache-gremlin-console-x.y.z |
| $ bin/gremlin.sh |
| |
|          \,,,/ |
|          (o o) |
| -----oOOo-(3)-oOOo----- |
| plugin activated: tinkerpop.server |
| plugin activated: tinkerpop.utilities |
| plugin activated: tinkerpop.tinkergraph |
| gremlin> |
| ---- |
| |
| TIP: Windows users may use the included `bin/gremlin.bat` file to start the Gremlin Console. |
| |
| The Gremlin Console is a link:http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop[REPL environment], |
| which provides a nice way to learn Gremlin as you get immediate feedback for the code that you enter. This eliminates |
| the more complex need to "create a project" to try things out. The console is not just for "getting started" however. |
| You will find yourself using it for a variety of TinkerPop-related activities, such as loading data, administering |
| graphs, working out complex traversals, etc. |
| |
| To get Gremlin to traverse a graph, you need a `Graph` instance, which holds the |
| link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#_the_graph_structure[structure] and data of the |
| graph. TinkerPop is a graph abstraction layer over different graph databases and different graph processors, so there |
| are many `Graph` instances you can choose from to instantiate in the console. The best `Graph` instance to start with |
| however is link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#tinkergraph-gremlin[TinkerGraph]. TinkerGraph |
| is a fast, in-memory graph database with a small handful of configuration options, making it a good choice for beginners. |
| |
| TIP: TinkerGraph is not just a toy for beginners. It is useful in analyzing subgraphs taken from a large graph, |
| working with a small static graph that doesn't change much, writing unit tests and other use cases where the graph |
| can fit in memory. |
| |
| TIP: Resist the temptation to "get started" with more complex databases like link:http://thinkaurelius.github.io/titan/[Titan] |
| or to delve into how to get link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#gremlin-server[Gremlin Server] |
| working properly. Focusing on the basics, presented in this guide, builds a good foundation for all the other things |
| TinkerPop offers. |
| |
| To make your process even easier, start with one of TinkerPop's "toy" graphs. These are "small" graphs designed to |
| provide a quick start into querying. It is good to get familiar with them, as almost all TinkerPop documentation is based |
| on them and when you need help and have to come to the link:http://groups.google.com/group/gremlin-users[mailing list], |
| a failing example put in the context of the toy graphs can usually get you a fast answer to your problem. |
| |
| For your first graph, use the "Modern" graph which looks like this: |
| |
| image:tinkerpop-modern.png[width=500] |
| |
| It can be instantiated in the console this way: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerFactory.createModern() |
| g = graph.traversal() |
| ---- |
| |
| The first command creates a `Graph` instance named `graph`, which thus provides a reference to the data you want |
| Gremlin to traverse. Unfortunately, just having `graph` doesn't provide Gremlin enough context to do his job. You |
| also need something called a `TraversalSource`, which is generated by the second command. The `TraversalSource` |
| provides additional information to Gremlin (such as the link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#traversalstrategy[traversal strategies] |
| to apply and the link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#graphcomputer[traversal engine] to use) which |
| provides him guidance on how to execute his trip around the `Graph`. |
| |
| With your `TraversalSource` `g` available it is now possible to ask Gremlin to traverse the `Graph`: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V() <1> |
| g.V(1) <2> |
| g.V(1).values('name') <3> |
| g.V(1).outE('knows') <4> |
| g.V(1).outE('knows').inV().values('name') <5> |
| g.V(1).out('knows').values('name') <6> |
| g.V(1).out('knows').has('age', gt(30)).values('name') <7> |
| ---- |
| |
| <1> Get all the vertices in the `Graph`. |
| <2> Get a vertex with the unique identifier of "1". |
| <3> Get the value of the `name` property on vertex with the unique identifier of "1". |
| <4> Get the edges with the label "knows" for the vertex with the unique identifier of "1". |
| <5> Get the names of the people that the vertex with the unique identifier of "1" "knows". |
| <6> Note that when one uses `outE().inV()` as shown in the previous command, this can be shortened to just `out()` |
| (similar to `inE().outV()` and `in()` for incoming edges). |
| <7> Get the names of the people vertex "1" knows who are over the age of 30. |
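| |
| As a quick check of the shorthand for incoming edges, the following sketch compares the long and short forms on the |
| "Modern" graph. It assumes a Gremlin Console session; the variable names `longForm` and `shortForm` are only |
| illustrative. |
| |
| [source,groovy] |
| ---- |
| graph = TinkerFactory.createModern() |
| g = graph.traversal() |
| |
| // long form: step onto the incoming "created" edges of vertex 3, then to the vertex each edge comes out of |
| longForm = g.V(3).inE('created').outV().values('name').toList() |
| |
| // short form: in() hops straight to the same vertices |
| shortForm = g.V(3).in('created').values('name').toList() |
| |
| // both forms should name the same people, whatever order they come back in |
| assert (longForm as Set) == (shortForm as Set) |
| ---- |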
| |
| In this first five minutes with Gremlin, you've gotten the Gremlin Console installed, instantiated a `Graph` and |
| `TraversalSource`, written some traversals and hopefully learned something about TinkerPop in general. You've only |
| scratched the surface of what there is to know, but those accomplishments will help enable your understanding of the |
| detailed sections to come. |
| |
| The Next Fifteen Minutes |
| ------------------------ |
| |
| In the first five minutes of _The TinkerPop Workout - by Gremlin_, you learned some basics for traversing graphs. Of |
| course, there wasn't much discussion about what a graph is. A graph is a collection of vertices (i.e. nodes, dots) |
| and edges (i.e. relationships, lines), where a vertex is an entity which represents some domain object (e.g. a person, |
| a place, etc.) and an edge represents the relationship between two vertices. |
| |
| image:modern-edge-1-to-3-1.png[width=300] |
| |
| The diagram above shows a graph with two vertices, one with a unique identifier of "1" and another with a unique |
| identifier of "3". There is an edge connecting the two with a unique identifier of "9". It is important to consider |
| that the edge has a direction which goes _out_ from vertex "1" and _in_ to vertex "3". |
| |
| IMPORTANT: Most TinkerPop implementations do not allow for identifier assignment. They will instead assign |
| their own identifiers and ignore any identifiers that you attempt to assign to them. |
| |
| A graph with elements that just have identifiers does not make for much of a database. To give some meaning to |
| this basic structure, vertices and edges can each be given labels to categorize them. |
| |
| image:modern-edge-1-to-3-2.png[width=300] |
| |
| You can now see that a vertex "1" is a "person" and vertex "3" is a "software" vertex. They are joined by a "created" |
| edge which allows you to see that a "person created software". The "label" and the "id" are reserved attributes of |
| vertices and edges, but you can add your own arbitrary properties as well: |
| |
| image:modern-edge-1-to-3-3.png[width=325] |
| |
| This model is referred to as a _property graph_ and it provides a flexible and intuitive way in which to model your |
| data. |
| |
| Creating a Graph |
| ^^^^^^^^^^^^^^^^ |
| |
| As intuitive as it is to you, it is perhaps more intuitive to Gremlin himself, as vertices, edges and properties make |
| up the very elements of his existence. It is indeed helpful to think of our friend, Gremlin, moving about a graph when |
| developing traversals, as picturing his position as the link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#_the_traverser[traverser] |
| helps orient where you need him to go next. Let's use the two-vertex, one-edge graph we've been discussing above |
| as an example. First, you need to create this graph: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerGraph.open() |
| v1 = graph.addVertex(id, 1, label, "person", "name", "marko", "age", 29) |
| v2 = graph.addVertex(id, 3, label, "software", "name", "lop", "lang", "java") |
| v1.addEdge("created", v2, id, 9, "weight", 0.4) |
| ---- |
| |
| There are a number of important things to consider in the above code. First, recall that `id` and `label` are |
| "reserved" for special usage in TinkerPop. Those "keys" supplied to the creation method are statically imported to |
| the console. You would normally refer to them as `T.id` and `T.label`. |
| |
| NOTE: The fully qualified name for `T` is `org.apache.tinkerpop.gremlin.structure.T`. |
| |
| Second, don't forget that you are working with TinkerGraph which allows for identifier assignment. That is _not_ the |
| case with most graph databases. |
| |
| Finally, the label for an `Edge` is required and is thus part of the method signature of `addEdge()`. It is the first |
| parameter supplied, followed by the `Vertex` to which `v1` should be connected. Therefore, this usage of `addEdge` is |
| creating an edge that goes _out_ of `v1` and into `v2` with a label of "created". |
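| |
| To see those points in one place, here is a minimal sketch that builds the same graph with the fully spelled out |
| `T.id` and `T.label` keys and then inspects the edge. It assumes the Gremlin Console (where `T` and `TinkerGraph` |
| are already imported); the variable `e` is introduced here purely for illustration. |
| |
| [source,groovy] |
| ---- |
| graph = TinkerGraph.open() |
| v1 = graph.addVertex(T.id, 1, T.label, "person", "name", "marko", "age", 29) |
| v2 = graph.addVertex(T.id, 3, T.label, "software", "name", "lop", "lang", "java") |
| |
| // addEdge() returns the new Edge, so keep a reference to it |
| e = v1.addEdge("created", v2, T.id, 9, "weight", 0.4) |
| |
| assert e.label() == "created"   // the label was the first argument |
| assert e.outVertex() == v1      // the edge goes out of v1 ... |
| assert e.inVertex() == v2       // ... and in to v2 |
| assert v1.id() == 1             // TinkerGraph kept the assigned identifier |
| ---- |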
| |
| Graph Traversal - Staying Simple |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| Now that Gremlin knows where the graph data is, you can ask him to get you some data from it by doing a traversal, |
| which you can think of as executing some link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#_the_graph_process[process] |
| over the structure of the graph. We can form our question in English and then translate it to Gremlin. For this |
| initial example, let's ask Gremlin: "What software has Marko created?" |
| |
| To answer this question, we would want Gremlin to: |
| |
| . Find "marko" in the graph |
| . Walk along the "created" edges to "software" vertices |
| . Select the "name" property of the "software" vertices |
| |
| The English-based steps above largely translate to Gremlin's position in the graph and to the steps we need to take |
| to ask him to answer our question. By stringing these steps together, we form a `Traversal` or the sequence of programmatic |
| link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#graph-traversal-steps[steps] Gremlin needs to perform |
| in order to get you an answer. |
| |
| Let's start with finding "marko". This operation is a filtering step as it searches the full set of vertices to match |
| those that have the "name" property value of "marko". This can be done with the |
| link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#has-step[has()] step as follows: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko') |
| ---- |
| |
| NOTE: The variable `g` is the `TraversalSource`, which was introduced in "The First Five Minutes". The |
| `TraversalSource` is created with `graph.traversal()` and is the object used to spawn new traversals. |
| |
| We can picture this traversal in our little graph with Gremlin sitting on vertex "1". |
| |
| image:modern-edge-1-to-3-1-gremlin.png[width=325] |
| |
| When Gremlin is on a vertex or an edge, he has access to all the properties that are available to that element. |
| |
| IMPORTANT: The above query iterates all the vertices in the graph to get its answer. That's fine for our little example, |
| but for multi-million or billion edge graphs that is a big problem. To solve this problem, you should look to use |
| indices. TinkerPop does not provide an abstraction for index management. You should consult the documentation of the |
| graph you have chosen and utilize its native API to create indices which will then speed up these types of lookups. Your |
| traversals will remain unchanged however, as the indices will be used transparently at execution time. |
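| |
| With TinkerGraph, for instance, the index could be created as in the sketch below. `createIndex()` and |
| `getIndexedKeys()` are specific to TinkerGraph, so treat this as an illustration of the idea rather than a portable |
| recipe; other graphs have their own index APIs. |
| |
| [source,groovy] |
| ---- |
| graph = TinkerFactory.createModern() |
| |
| // TinkerGraph-only call: index the "name" property key on vertices |
| graph.createIndex('name', Vertex.class) |
| assert graph.getIndexedKeys(Vertex.class).contains('name') |
| |
| g = graph.traversal() |
| |
| // the traversal is written exactly as before; the index is picked up behind the scenes |
| g.V().has('name','marko') |
| ---- |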
| |
| Now that Gremlin has found "marko", he can consider the next step in the traversal where we ask him to "walk" |
| along "created" edges to "software" vertices. As described earlier, edges have direction, so we have to tell Gremlin |
| what direction to follow. In this case, we want him to traverse on outgoing edges from the "marko" vertex. For this, |
| we use the link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#vertex-steps[outE] step. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko').outE('created') |
| ---- |
| |
| At this point, you can picture Gremlin moving from the "marko" vertex to the "created" edge. |
| |
| image:modern-edge-1-to-3-2-gremlin.png[width=325] |
| |
| To get to the vertex on the other end of the edge, you need to tell Gremlin to move from the edge to the incoming |
| vertex with `inV()`. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko').outE('created').inV() |
| ---- |
| |
| You can now picture Gremlin on the "software" vertex as follows: |
| |
| image:modern-edge-1-to-3-3-gremlin.png[width=325] |
| |
| As you are not asking Gremlin to do anything with the properties of the "created" edge, you can simplify the |
| statement above with: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko').out('created') |
| ---- |
| |
| Finally, now that Gremlin has reached the "software that Marko created", he has access to the properties of the |
| "software" vertex and you can therefore ask Gremlin to extract the value of the "name" property as follows: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko').out('created').values('name') |
| ---- |
| |
| You should now be able to see the connection Gremlin has to the structure of the graph and how Gremlin maneuvers from |
| vertices to edges and so on. Your ability to string together steps to ask Gremlin to do more complex things depends |
| on your understanding of these basic concepts. |
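| |
| One way to check your mental picture of Gremlin's movements is the `path()` step, which reports the elements he |
| visited along the way. The sketch below assumes the "Modern" graph in a Gremlin Console session; the variable `p` is |
| just for illustration. |
| |
| [source,groovy] |
| ---- |
| graph = TinkerFactory.createModern() |
| g = graph.traversal() |
| |
| // vertex -> edge -> vertex: the three stops described above |
| p = g.V().has('name','marko').outE('created').inV().path().next() |
| |
| assert p.size() == 3                       // "marko" vertex, "created" edge, "software" vertex |
| assert p.get(0).value('name') == 'marko'   // where Gremlin started |
| assert p.get(2).value('name') == 'lop'     // where Gremlin ended up |
| ---- |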
| |
| Graph Traversal - Increasing Complexity |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| Armed with the knowledge from the previous section, let's ask Gremlin to perform some more difficult traversal tasks. |
| There's not much more that can be done with the "baby" graph we had, so let's return to the "modern" toy graph from |
| "The First Five Minutes" section. Recall that you can create this `Graph` and establish a `TraversalSource` with: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerFactory.createModern() |
| g = graph.traversal() |
| ---- |
| |
| Earlier we'd used the `has()` step to tell Gremlin how to find the "marko" vertex. Let's look at some other ways to |
| use `has()`. What if we wanted Gremlin to find the "age" values of both "vadas" and "marko"? In this case we could |
| use the `within` comparator with `has()` as follows: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name',within('vadas','marko')).values('age') |
| ---- |
| |
| It is worth noting that `within` is statically imported from `P` to the Gremlin Console (much like `T` is, as described |
| earlier). |
| |
| NOTE: The fully qualified name for `P` is `org.apache.tinkerpop.gremlin.process.traversal.P`. |
| |
| If we wanted to ask Gremlin the average age of "vadas" and "marko" we could use the |
| link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#mean-step[mean()] step as follows: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name',within('vadas','marko')).values('age').mean() |
| ---- |
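| |
| `P` offers other comparators that plug into `has()` in the same way. The following sketch, which assumes the "Modern" |
| graph and the console's static imports from `P`, tries `gt`, `between` and `without`; the result variable names are |
| illustrative only. |
| |
| [source,groovy] |
| ---- |
| graph = TinkerFactory.createModern() |
| g = graph.traversal() |
| |
| // people older than 30 |
| older = g.V().has('age', gt(30)).values('name').toList() |
| |
| // between() includes the lower bound and excludes the upper one: 27 <= age < 32 |
| younger = g.V().has('age', between(27, 32)).values('name').toList() |
| |
| // without() is the opposite of within() |
| others = g.V().hasLabel('person').has('name', without('vadas','marko')).values('name').toList() |
| |
| assert (older as Set) == (['josh','peter'] as Set) |
| assert (younger as Set) == (['vadas','marko'] as Set) |
| assert (others as Set) == (['josh','peter'] as Set) |
| ---- |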
| |
| Another method of filtering is seen in the use of the link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#where-step[where] |
| step. We know how to find the "software" that "marko" created: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko').out('created') |
| ---- |
| |
| image:gremlin-on-software-vertex.png[width=350,float=right] Let's extend that query to try to learn who "marko" |
| collaborates with when it comes to the software he created. In other words, let's try to answer the question: "Who |
| are the people that marko develops software with?" To do that, we should first picture Gremlin where we left him in |
| the previous query. He was standing on the "software" vertex. To find out who "created" that "software", we need to |
| have Gremlin traverse back _in_ along the "created" edges to find the "person" vertices tied to it. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko').out('created').in('created').values('name') |
| ---- |
| |
| So that's nice, we can see that "peter", "josh" and "marko" are all responsible for creating "lop". Of course, we already |
| know about the involvement of "marko" and it seems strange to say that "marko" collaborates with himself, so excluding |
| "marko" from the results seems logical. The following traversal handles that exclusion: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko').as('exclude').out('created').in('created').where(neq('exclude')).values('name') |
| ---- |
| |
| We made two additions to the traversal to make it exclude "marko" from the results. First, we added the |
| link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#as-step[as()] step. The `as()` step is not really a "step", |
| but a "step modulator" - something that adds features to a step or the traversal. Here, the `as('exclude')` labels |
| the `has()` step with the name "exclude" and all values that pass through that step are held in that label for later |
| use. In this case, the "marko" vertex is the only vertex to pass through that point, so it is held in "exclude". |
| |
| The other addition that was made was the `where()` step which is a filter step like `has()`. The `where()` is |
| positioned after the `in()` step that has "person" vertices, which means that the `where()` filter is occurring |
| on the list of "marko" collaborators. The `where()` specifies that the "person" vertices passing through it should |
| not equal (i.e. `neq()`) the contents of the "exclude" label. As it just contains the "marko" vertex, the `where()` |
| filters out the "marko" that we get when we traverse back _in_ on the "created" edges. |
| |
| You will find many uses of `as()`. Here it is in combination with link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#select-step[select]: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().as('a').out().as('b').out().as('c').select('a','b','c') |
| ---- |
| |
| In the above example, we tell Gremlin to iterate through all vertices and traverse _out_ twice from each. Gremlin |
| will label each vertex in that path with "a", "b" and "c", respectively. We can then use `select` to extract the |
| contents of those labels. |
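| |
| The labeled vertices can also be transformed on the way out by following `select` with a `by()` modulator. A minimal |
| sketch, assuming the "Modern" graph in a Gremlin Console session (the variable `rows` is illustrative): |
| |
| [source,groovy] |
| ---- |
| graph = TinkerFactory.createModern() |
| g = graph.traversal() |
| |
| // by('name') replaces each selected vertex with the value of its "name" property |
| rows = g.V().as('a').out().as('b').out().as('c').select('a','b','c').by('name').toList() |
| |
| // in this graph only "marko" can reach two hops out, and only through "josh" |
| assert rows.size() == 2 |
| assert rows.every { it['a'] == 'marko' && it['b'] == 'josh' } |
| assert (rows.collect { it['c'] } as Set) == (['ripple','lop'] as Set) |
| ---- |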
| |
| Another common but important step is the link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#group-step[group()] |
| step and its related step modulator called link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#by-step[by()]. If |
| we wanted to ask Gremlin to group all the vertices in the graph by their vertex label we could do: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().group().by(label) |
| ---- |
| |
| The use of `by()` here provides the mechanism by which to do the grouping. In this case, we've asked Gremlin to |
| use the `label` (which, again, is an automatic static import from `T` in the console). We can't really tell much |
| about our distribution though because we just have vertex unique identifiers as output. To make that nicer we |
| could ask Gremlin to get us the value of the "name" property from those vertices, by supplying another `by()` |
| modulator to `group()` to transform the values. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().group().by(label).by('name') |
| ---- |
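| |
| If only the size of each group matters, the related `groupCount()` step can do the counting. The sketch below assumes |
| the "Modern" graph and the console's static import of `label` from `T`; the variable `counts` is illustrative. |
| |
| [source,groovy] |
| ---- |
| graph = TinkerFactory.createModern() |
| g = graph.traversal() |
| |
| // a single map keyed by vertex label, with the number of vertices carrying that label as the value |
| counts = g.V().groupCount().by(label).next() |
| |
| assert counts['person'] == 4 |
| assert counts['software'] == 2 |
| ---- |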
| |
| In this section, you have learned a bit more about what property graphs are and how Gremlin interacts with them. |
| You also learned how to envision Gremlin moving about a graph and how to use some of the more complex, but commonly |
| utilized traversal steps. You are now ready to think about TinkerPop in terms of its wider applicability to |
| graph computing. |
| |
| The Final Ten Minutes |
| --------------------- |
| |
| In these final ten minutes of _The TinkerPop Workout - by Gremlin_ we'll look at TinkerPop from a higher level and |
| introduce different features of the stack in order to orient you with what it offers. In this way, you can |
| identify areas of interest and dig into the details from there. |
| |
| Why TinkerPop? |
| ^^^^^^^^^^^^^^ |
| |
| image:provider-integration.png[float=right,width=350] The goal of TinkerPop, as a Graph Computing Framework, is to make it |
| easy for developers to create graph applications by providing APIs and tools that simplify their endeavors. One of |
| the fundamental aspects to what TinkerPop offers in this area lies in the fact that TinkerPop is an abstraction layer |
| over different graph databases and different graph processors. As an abstraction layer, TinkerPop provides a way to |
| avoid vendor lock-in to a specific database or processor. This capability provides immense value to developers who |
| are thus afforded options in their architecture and development because: |
| |
| * They can try different implementations using the same code to decide which is best for their environment. |
| * They can grow into a particular implementation if they so desire - start with a graph that is designed to scale |
| within a single machine and then later switch to a graph that is designed to scale horizontally. |
| * They can feel more confident in graph technology choices, as advances in the state of different provider |
| implementations sit behind TinkerPop APIs, which opens the possibility of switching providers with limited impact. |
| |
| TinkerPop has always had the vision of being an abstraction over different graph databases. That much |
| is not new and dates back to TinkerPop 1.x. It is in TinkerPop 3.x however that we see the introduction of the notion |
| that TinkerPop is also an abstraction over different graph processors like link:http://spark.apache.org[Spark] and |
| link:http://giraph.apache.org/[Giraph]. The scope of this tutorial does not permit it to delve into |
| "graph processors", but the short story is that the same Gremlin statement we wrote in the examples above can be |
| executed to run in distributed fashion over Spark or Hadoop. The changes required to the code to do this are not |
| in the traversal itself, but in the definition of the `TraversalSource`. You can again see why we encourage graph |
| operations to be executed through that class as opposed to just using `Graph`. You can read more about these |
| features in this section on link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#hadoop-gremlin[hadoop-gremlin]. |
| |
| TIP: To maintain an abstraction over `Graph` creation use `GraphFactory.open()` to construct new instances. See |
| the documentation for individual `Graph` implementations to learn about the configuration options to provide. |
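| |
| A minimal sketch of that approach follows. It passes the configuration as a `Map` whose `gremlin.graph` key names the |
| `Graph` class to instantiate; a real application would more likely keep such settings in a properties file. The |
| variable name `conf` is illustrative. |
| |
| [source,groovy] |
| ---- |
| import org.apache.tinkerpop.gremlin.structure.util.GraphFactory |
| |
| // the one required setting: which Graph implementation to open |
| conf = ['gremlin.graph': 'org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph'] |
| |
| graph = GraphFactory.open(conf) |
| g = graph.traversal() |
| |
| // switching providers now means changing the configuration rather than this code |
| assert graph.getClass().getSimpleName() == 'TinkerGraph' |
| ---- |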
| |
| Loading Data |
| ^^^^^^^^^^^^ |
| |
| image:gremlin-to-the-7.png[width=100,float=left] There are many strategies for getting data into your graph. As you are just |
| getting started, let's look at the simpler methods aimed at "smaller" graphs. A "small" graph, in this context, is |
| one that has fewer than ten million edges. The most direct way to load this data is to write a Groovy script that can |
| be executed in the Gremlin Console, a tool that you should be well familiar with at this point. For our example, let's |
| use the link:http://snap.stanford.edu/data/wiki-Vote.html[Wikipedia Vote Network] data set which contains 7,115 |
| vertices and 103,689 edges. |
| |
| [source,text] |
| ---- |
| $ curl -L -O http://snap.stanford.edu/data/wiki-Vote.txt.gz |
| $ gunzip wiki-Vote.txt.gz |
| ---- |
| |
| The data is contained in a tab-delimited structure where vertices are Wikipedia users and an edge from one user to |
| another implies a "vote" relationship. Here is the script to parse the file and generate the `Graph` instance using |
| TinkerGraph: |
| |
| [source,groovy] |
| ---- |
| graph = TinkerGraph.open() |
| graph.createIndex('userId', Vertex.class) <1> |
| |
| g = graph.traversal() |
| |
| getOrCreate = { id -> |
|   g.V().has('userId', id).tryNext().orElseGet{ g.addV('userId', id).next() } |
| } |
| |
| new File('wiki-Vote.txt').eachLine { |
|   if (!it.startsWith("#")){ |
|     (fromVertex, toVertex) = it.split('\t').collect(getOrCreate) <2> |
|     fromVertex.addEdge('votesFor', toVertex) |
|   } |
| } |
| ---- |
| |
| <1> To ensure fast lookups of vertices, we need an index. The `createIndex()` method is a method native to |
| TinkerGraph. Please consult your graph database's documentation for its index creation approach. |
| <2> We are iterating each line of the `wiki-Vote.txt` file and this line splits the line on the delimiter, then |
| uses some neat Groovy syntax to apply the `getOrCreate()` function to each of the two `userId` fields encountered in |
| the line and stores those vertices in the `fromVertex` and `toVertex` variables respectively. |
| |
| NOTE: While this is a tab-delimited structure, this same pattern can be applied |
| to any data source you require and Groovy tends to have nice libraries that can help make working with data |
| link:http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/[quite enjoyable]. |
| |
| WARNING: Take care if using a `Graph` implementation that supports |
| link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#transactions[transactions]. As TinkerGraph does not, there is |
| no need to `commit()`. If your `Graph` does support transactions, intermediate commits during load will need to be |
| applied. |
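| |
| As a rough sketch of what intermediate commits might look like, the loop from the script above could be adapted as |
| follows. It reuses `graph` and `getOrCreate` from that script; `supportsTx`, `batchSize` and `counter` are |
| hypothetical names, and the batch size of 10000 is an arbitrary starting point rather than a recommendation. |
| |
| [source,groovy] |
| ---- |
| // ask the graph whether it supports transactions at all (TinkerGraph reports false) |
| supportsTx = graph.features().graph().supportsTransactions() |
| batchSize = 10000   // hypothetical value; tune for your graph |
| counter = 0 |
| |
| new File('wiki-Vote.txt').eachLine { |
|   if (!it.startsWith("#")) { |
|     (fromVertex, toVertex) = it.split('\t').collect(getOrCreate) |
|     fromVertex.addEdge('votesFor', toVertex) |
|     // commit every batchSize edges so that a single transaction does not grow without bound |
|     if (supportsTx && (++counter % batchSize == 0)) graph.tx().commit() |
|   } |
| } |
| |
| // commit whatever remains from the last partial batch |
| if (supportsTx) graph.tx().commit() |
| ---- |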
| |
| To load larger data sets you should read about the |
| link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#bulkloadervertexprogram[BulkLoaderVertexProgram] (BLVP), which |
| provides a generalized method for loading graphs of virtually any size. |
| |
| Gremlin Server |
| ^^^^^^^^^^^^^^ |
| |
| image:gremlin-server-protocol.png[width=325,float=right] link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#gremlin-server[Gremlin Server] |
| provides a way to remotely execute Gremlin scripts against one or more `Graph` instances hosted within it. It does |
| this by exposing different endpoints, such as link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#_connecting_via_rest[REST] |
| and link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#_connecting_via_java[websockets], which allow a request |
| containing a Gremlin script to be processed with results returned. |
| |
| [source,text] |
| ---- |
| $ curl -L -O https://www.apache.org/dist/incubator/tinkerpop/x.y.z/apache-gremlin-server-x.y.z-bin.zip |
| $ unzip apache-gremlin-server-x.y.z-bin.zip |
| $ cd apache-gremlin-server-x.y.z |
| $ bin/gremlin-server.sh conf/gremlin-server-rest-modern.yaml |
| [INFO] GremlinServer - |
|          \,,,/ |
|          (o o) |
| -----oOOo-(3)-oOOo----- |
| |
| [INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-rest-modern.yaml |
| ... |
| [INFO] GremlinServer$1 - Channel started at port 8182. |
| ---- |
| |
| [source,text] |
| $ curl -X POST -d "{\"gremlin\":\"g.V(x).out().values('name')\", \"language\":\"gremlin-groovy\", \"bindings\":{\"x\":1}}" "http://localhost:8182" |
| |
| [source,json] |
| ---- |
| { |
| "requestId": "abe3be05-1e86-481a-85e0-c59ad8a37c6b", |
| "status": { |
| "message": "", |
| "code": 200, |
| "attributes": {} |
| }, |
| "result": { |
| "data": [ |
| "lop", |
| "vadas", |
| "josh" |
| ], |
| "meta": {} |
| } |
| } |
| ---- |
| |
| IMPORTANT: Take careful note of the use of "bindings" in the arguments on the request. These are variables that are |
| applied to the script on execution and are essentially a way to parameterize your scripts. This "parameterization" is |
| critical to link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#_best_practices[performance]. Whenever |
| possible, parameterize your queries. |
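| |
| Escaping the quotes of such a request body by hand is error prone. As a sketch, the same body could be assembled |
| with Groovy's `JsonOutput`, keeping the script text constant and moving the changing value into `bindings`. The |
| variable names `body` and `parsed` are illustrative, and sending the request is left out. |
| |
| [source,groovy] |
| ---- |
| import groovy.json.JsonOutput |
| import groovy.json.JsonSlurper |
| |
| // the script refers to the variable x; its value travels separately in "bindings" |
| body = JsonOutput.toJson([ |
|   gremlin : "g.V(x).out().values('name')", |
|   language: 'gremlin-groovy', |
|   bindings: [x: 1] |
| ]) |
| |
| // parse the text back to confirm the structure that would be sent as the POST body |
| parsed = new JsonSlurper().parseText(body) |
| assert parsed.gremlin == "g.V(x).out().values('name')" |
| assert parsed.bindings.x == 1 |
| ---- |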
| |
| As mentioned earlier, Gremlin Server can also be configured with a websockets endpoint. This endpoint has an |
| embedded link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#_developing_a_driver[subprotocol] that allows a |
| compliant driver to communicate with it. TinkerPop supplies a |
| link:http://tinkerpop.incubator.apache.org/docs/x.y.z/#_connecting_via_java[reference driver] written in Java, but |
| there are drivers developed by third-parties for other link:http://tinkerpop.incubator.apache.org/#graph-libraries[languages] |
| such as Python, JavaScript, etc. Gremlin Server therefore represents the method by which non-JVM languages can |
| interact with TinkerPop. |
| |
| Conclusion |
| ^^^^^^^^^^ |
| |
| ...and that is the end of _The TinkerPop Workout - by Gremlin_. You are hopefully feeling more confident in your |
| TinkerPop skills and have a good overview of what the stack has to offer, as well as some entry points to further |
| research within the reference documentation. Welcome to The TinkerPop! |