| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| |
| [[long-traversals]] |
| == Long Traversals |
| |
| It can be tempting to generate long traversals, e.g. to create a set of vertices and edges based on information that |
| resides within an application. For example, let's consider two lists - one that contains information about persons and |
| another that contains information about the relationship between these persons. To illustrate the problem we will |
| create two list with a few random map entries. |
| |
| [gremlin-groovy] |
| ---- |
| :set max-iteration 10 |
| rnd = new Random(123) ; x = [] |
| persons = (1..100).collect {["id": it, "name": "person ${it}", "age": rnd.nextInt(40) + 20]} |
| relations = (1..500).collect {[rnd.nextInt(persons.size()), rnd.nextInt(persons.size())]}. |
| unique().grep {it[0] != it[1] && !x.contains(it.reverse())}.collect { |
| x << it |
| minAge = Math.min(persons[it[0]].age, persons[it[1]].age) |
| knowsSince = new Date().year + 1900 - rnd.nextInt(minAge) |
| ["from": persons[it[0]].id, "to": persons[it[1]].id, "since": knowsSince] |
| } |
| [ "Number of persons": persons.size() |
| , "Number of unique relationships": relations.size() ] |
| ---- |
| |
| Now, to create the `person` vertices and the `knows` edges between them it may look like a good idea to generate a |
| single graph-mutating traversal, just like this: |
| |
| [gremlin-groovy] |
| ---- |
| t = g |
| for (person in persons) { |
| t = t.addV("person"). |
| property(id, person.id). |
| property("name", person.name). |
| property("age", person.age).as("p${person.id}") |
| } ; [] |
| for (relation in relations) { |
| t = t.addE("knows").property("since", relation.since). |
| from("p${relation.from}"). |
| to("p${relation.to}") |
| } ; [] |
| traversalAsString = org.apache.tinkerpop.gremlin.process.traversal.translator.GroovyTranslator.of("g").translate(t.bytecode).getScript() ; [] |
| [ "Traversal String Length": traversalAsString.length() |
| , "Traversal Preview": traversalAsString.replaceFirst(/^(.{104}).*(.{64})$/, '$1 ... $2') ] |
| ---- |
| |
| However, this kind of traversal does not scale and it's prone to produce a `StackOverflowError`. This error can hardly be prevented |
| as it's a limit imposed by the JVM. The stack size can be increased using the `-Xss` JVM option, but that's not how the problem that's |
| discussed here, should be solved. The proper way to accomplish the same thing as in the traversal above is to inject the lists into |
| the traversal and process them from there. |
| |
| [gremlin-groovy] |
| ---- |
| g.withSideEffect("rels", relations). |
| inject(persons).sideEffect( |
| unfold(). |
| addV("person"). |
| property(id, select("id")). |
| property("name", select("name")). |
| property("age", select("age")). |
| group("m"). |
| by(id). |
| by(unfold())). |
| select("rels").unfold().as("r"). |
| addE("knows"). |
| from(select("m").select(select("r").select("from"))). |
| to(select("m").select(select("r").select("to"))). |
| property("since", select("since")).iterate() |
| g |
| ---- |
| |
| Obviously, these traversals are more complicated, but the number of steps is known and thus it's the best way to |
| prevent an unexpected `StackOverflowError`. Furthermore, shorter traversals reduce the (de)serialization costs when |
| such a traversal is send over the wire to a Gremlin Server. |
| |
| NOTE: Although the example was based on a graph-mutating traversal, the same rules apply for read-only and mixed traversals. |
| |
| [[unspecified-keys-and-labels]] |
| == Unspecified Keys and Labels |
| |
| Some Gremlin steps have optional arguments that represent keys (e.g. `elementMap()`, valueMap()`) or labels (e.g. |
| `out()`). In the prototyping phase of a projects it's often convenient to use these steps without any arguments. |
| However, in production code this is bad idea and keys and labels should always be specified. Not only does it make the |
| traversal easier to read for others, but it also ensures that the application will not break if the schema changes at |
| one point and the queries return completely different results. |
| |
| The following code block shows a few examples that are good for prototyping or graph discovery. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has("person","name","marko").out() |
| g.V().has("person","name","marko").out("created").elementMap() |
| g.V().has("software","name","ripple").inE().has("weight", gte(0.5)).outV().properties() |
| ---- |
| |
| The next code block shows the same queries, but with specified keys and labels. |
| |
| [gremlin-groovy,existing] |
| ---- |
| g.V().has("person","name","marko").out("created","knows") |
| g.V().has("person","name","marko").out("created").elementMap("name","lang") |
| g.V().has("software","name","ripple").inE("created").has("weight", gte(0.5)).outV(). |
| properties("name","age") |
| ---- |
| |
| [[unnecessary-steps]] |
| == Unnecessary Steps |
| |
| There are quite a few steps and patterns that can be combined into a much shorter form. TinkerPop is trying to optimize queries, by |
| rewriting such patterns automatically using traversal optimization strategies. These strategies, however, do have a few preconditions |
| and under certain circumstance they will not attempt to rewrite a traversal. For example, if the traversal has path computations |
| enabled (e.g. by using certain steps, such as `path()`, `simplePath()`, `otherV()`, etc.), then the assumption is that all steps are |
| required in order to produce the desired path. |
| |
| An often seen anti-pattern is the one that explicitly traverses to an edge and then to a vertex without using any filters. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().hasLabel("person").outE("created").inV().dedup() <1> |
| g.V().hasLabel("software").inE("created").outV().count() <2> |
| ---- |
| |
| <1> The `created` edge is never really needed as the traversal only asks for all things that were created by all persons in the graph. |
| These "things" are only represented by the adjacent vertices, not the edges. |
| <2> This traversals counts the persons in the graph who created software. The interesting thing about this query is that it actually |
| doesn't need to traverse all the way to the `person` vertices to count them. In this case it's sufficient to count the edges |
| between the `software` and `person` vertices. The performance of this query pretty much depends on the particular provider |
| implementation, but counting incident edges is usually much faster than counting adjacent vertices. |
| |
| The next code block shows the two aforementioned queries properly rewritten. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().hasLabel("person").out("created").dedup() |
| g.V().hasLabel("software").inE("created").count() |
| ---- |
| |
| Another anti-pattern that is commonly seen is the chaining of `where()`-steps using predicates. Consider the following traversal: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().as('a'). |
| both().where(lt('a')).by(id).as('b'). |
| both().where(lt('a')).by(id).where(gt('b')).by(id).as('c'). |
| not(both().where(eq('a'))). |
| select('a','b','c'). |
| by('name') |
| ---- |
| |
| Ignoring the anti-patterns that were discussed before, there's not much wrong with the traversal, but note the two chained `where()`-steps |
| (`where(lt('a')).by(id).where(gt('b'))).by(id)`). Both steps compare the id of the current vertex with the id of a previous vertex. These |
| two conditions can be combined on the predicate level. |
| |
| [gremlin-groovy,existing] |
| ---- |
| g.V().as('a'). |
| both().where(lt('a')).by(id).as('b'). |
| both().where(lt('a').and(gt('b'))).by(id).as('c'). |
| not(both().where(eq('a'))). |
| select('a','b','c'). |
| by('name') |
| ---- |
| |
| The `profile()` output of both queries should make clear why this is better than using two `where()`-steps. |
| |
| [gremlin-groovy,existing] |
| ---- |
| g.V().as('a'). |
| both().where(lt('a')).by(id).as('b'). |
| both().where(lt('a')).by(id).where(gt('b')).by(id).as('c'). |
| not(both().where(eq('a'))). |
| select('a','b','c'). |
| by('name'). |
| profile() |
| g.V().as('a'). |
| both().where(lt('a')).by(id).as('b'). |
| both().where(lt('a').and(gt('b'))).by(id).as('c'). |
| not(both().where(eq('a'))). |
| select('a','b','c'). |
| by('name'). |
| profile() |
| ---- |
| |
| [[unspecified-label-in-global-vertex-lookup]] |
| == Unspecified Label in Global Vertex lookup |
| |
| The severity of the anti-pattern described in this section heavily depends on the provider implementation. Throughout the TinkerPop |
| documentation the code samples often use traversals that start like this: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko') |
| ---- |
| |
| This is totally fine for TinkerGraph as it uses a very simplified indexing schema, e.g. every vertex that has a certain property is stored in |
| the same index. However, providers may prefer to use separate indexes for different vertex labels. This becomes more important as graphs grow |
| much larger over time (which is not what TinkerGraph is meant to do). Hence, any traversal that's going to be used in production code should |
| also specify the vertex label to prevent the query engine from searching every index for the provided property value. |
| |
| The easy fix for the initially mentioned query follows in the code block below. |
| |
| [gremlin-groovy,existing] |
| ---- |
| g.V().hasLabel('person').has('name','marko') <1> |
| g.V().has('person','name','marko') <2> |
| ---- |
| |
| <1> With the specified label the traversal still returns the same result, but it's much safer to use across different providers. |
| <2> Same as statement 1, but a much shorter form to improve readability. |
| |
| [[steps-instead-of-tokens]] |
| == Steps Instead of Tokens |
| |
| NOTE: As of 3.5.0, `ByModulatorOptimizationStrategy` is present to automatically translate this anti-pattern to their |
| more performant versions for most cases however, it is still best to write Gremlin according to the contents that follow. |
| |
| When child traversals contain a single step, there's a good chance that the step can be replaced with a token. These |
| tokens are translated into optimized traversals that execute much faster then their step traversal pendants. A few |
| examples of single step child traversals are shown in the following code block. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().groupCount().by(label()) |
| g.V().group().by(label()).by(id().fold()) |
| g.V().project("id","label"). |
| by(id()). |
| by(label()) |
| g.V().choose(label()). |
| option("person", project("person").by(values("name"))). |
| option("software", project("product").by(values("name"))) |
| ---- |
| |
| With tokens used instead of steps the traversals become a little shorter and more readable. |
| |
| [gremlin-groovy,existing] |
| ---- |
| g.V().groupCount().by(label) |
| g.V().group().by(label).by(id) <1> |
| g.V().project("id","label"). |
| by(id). |
| by(label) |
| g.V().choose(label). |
| option("person", project("person").by("name")). |
| option("software", project("product").by("name")) <2> |
| ---- |
| |
| <1> Note, that tokens use a `fold()` reducer by default. |
| <2> `by("name")` doesn't use a token, but falls into the same category as the String `"name"` is translated into an optimized traversal. |
| |
| [has-traversal] |
| == has() and Traversal Arguments |
| |
| There is an understandable assumption that the `has(String,Traversal)` overload indicates that the value returned by |
| the `Traversal` argument will be used as the comparative value for the specified property key. There are often similar |
| assumptions that values of `P` can take a `Traversal` argument to achieve a similar end as in |
| `has(String, eq(Traversal))`. Unfortunately, neither of these work as assumed. |
| |
| Starting with the latter issue of `P` and `Traversal` it should be noted that while `P` values take `Object` and thus |
| a `Traversal` it does not mean the `Traversal` will be resolved to a result that will be comparable. `P` will rather |
| do a compare on the raw `Traversal` object which of course will always return `false` (unless for some odd reason you |
| happen to store that `Traversal` object in your graph): |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name', eq(constant('josh'))) |
| eq(constant('josh')) |
| ---- |
| |
| As for the former issue with `has(String,Traversal)`, this requires a bit more explanation. The `Traversal` object is |
| meant to be treated as a `Predicate`, meaning that if it returns a value the `has()` will allow the traverser to pass: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name', constant('josh')) <1> |
| g.V().has('name', constant('josh').is('xyz')) <2> |
| ---- |
| |
| <1> `constant()` always returns a value so all vertices pass through the `has()` |
| <2> By adding `is()` this `Traversal` will no longer return a value so no vertices pass through the `has()` |
| |
| These examples are a bit contrived for sake of demonstration, but the common pattern folks attempt appears as follows: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.withSideEffect('x',[name: 'josh']).V().has('name', select('x').select('name')) |
| ---- |
| |
| The above example represents a commonly seen mistake where we try to dynamically inject the value "josh" from a |
| `Map` stored in a side-effect named "x". As we can see, since `select('x').select('name')` returns a value the `has()` |
| succeeds for every single vertex which is unexpected. The correct way to do this dynamic injection is with `where()` |
| as in the following example: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.withSideEffect('x',[name: 'josh']).V().as('a').where('a',eq('x')).by('name') |
| ---- |
| |
| As a final note on this topic, it's worth noting how `has(String,Traversal)` can be used. Note that the traverser that |
| starts the `Traversal` argument is the `Property` value being compared. Therefore, if we wanted to find all the |
| vertices that had the "name" of "josh" we would do: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name', is('josh')) |
| ---- |
| |