| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| [[collections]] |
| == Collections |
| |
| image:gremlin-collections.png[width=400] |
| |
| Lists and maps form the basis for much of the processing in Gremlin traversals. They are core to how side-effects |
| are typically held and how results are generally produced. Being able to pick them apart and reformat them is sometimes |
| required. This need to shape the data within a traversal may arise both at the |
| link:https://tinkerpop.apache.org/docs/current/reference/#terminal-steps[terminal step] of the traversal (technically |
| just prior to the terminal step) or in the middle of a traversal. Considering the former, a transformation just prior |
| to iteration will get the result into the form required by the application which would remove the need for additional |
| application level manipulation. Moreover, a transformation at this stage may reduce the size of the payload being |
| returned which could be useful in remote applications. Examining the latter, there may be times where a `List` or `Map` |
| requires some mid-traversal transformation so as to continue with the general logic of the traversal itself. For |
| example, a traversal at some point might produce a `Map` of `List` objects where the lists contain vertices, where each |
| `List` might need to be sorted by some criteria and then the top item for each extracted to become the basis for the |
| continued traversal. Executing transformations for either of these types of situations can be made possible with the |
| patterns described in this section. |
| |
| The appearance of a `List` as a traverser in Gremlin usually arises as a result of a `fold()` operation, but may also |
| appear by way of some side-effect steps like `store()`: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().fold() |
| g.V().store('a').cap('a') |
| ---- |
| |
| It is worth noting that while a `Path` is not technically a `List` it does present like one and can be manipulated in |
| similar fashion to lists: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().out().out().path() |
| ---- |
| |
| These examples are obviously trivial and there are other ways that a traverser might end up in a `List` form, but, at |
| this moment, the point here is to focus less on how to get a `List` and more on how to manipulate one within the |
| Gremlin language. The examples going forward will also be similarly contrived insofar as producing a usable `List` to |
| manipulate. Bear in mind that it may be quite possible to get the same end results of these examples using more direct |
| means than what is demonstrated. |
| |
| It may seem simple, but the most obvious choice to modifying what is in a list is to simply `unfold()` the `List`: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().fold().unfold().values('name') |
| g.V().store('a').cap('a').unfold().values('name') |
| ---- |
| |
| The above examples show that `unfold()` works quite well when you don't want to preserve the `List` structure of the |
| traverser as it just flattens `List` traversers to the traversal stream. The above examples only have one `List` as a |
| result, but consider what happens when there is more than one: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().union(fold(),fold()) |
| g.V().union(fold(),fold()).unfold().values('name') |
| ---- |
| |
| The two separate `List` traversers are flattened to a single traversal stream and all the results are mixed together. |
| While this approach may be acceptable, there are many cases where it might not be so. To preserve the individual |
| structure of the `List` traversers "locally" `unfold()` the lists to transform them: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V(). |
| union(fold(),fold()). |
| local(unfold().values('name').fold()) |
| ---- |
| |
| The call to `local()` executes its anonymous sub-traversal over each individual `List` iterator and as the |
| sub-traversal ends with a `fold()`-step, the results are reduced back into a `List` to preserve the original structure, |
| thus maintaining two traverser results. |
| |
| This pattern for unfolding and folding `List` traversers ends up having other applications: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().union(limit(3).fold(),tail(3).fold()) <1> |
| g.V().union(limit(3).fold(),tail(3).fold()). |
| local(unfold(). <2> |
| order(). |
| by(bothE().count(),desc). |
| limit(1). |
| fold()) |
| g.V().union(limit(3).fold(),tail(3).fold()). <3> |
| local(unfold(). |
| has('age',gte(29)). |
| values('age'). |
| mean()) |
| ---- |
| |
| <1> The output consists of two `List` traversers. |
| <2> For each `List` of vertices, order them by their number of edges, and choose the first one which will be the one |
| with the highest degree (i.e. number of edges). By ending with `fold()` the `List` traverser structure is preserved |
| thus returning two `List` objects. Consider this a method for choosing a "max" or a highly ranked vertex. In this case |
| the rank was determined by the number of edges, but it could have just as easily been determined by a vertex property, |
| edge property, a calculated value, etc. - one simply needs to alter the `by()`-step modulator to `order()`. |
| <3> For each `List` of vertices, filter that `List` to only include vertices that have an "age" property with a |
| value greater than or equal to "29" and then average the results of each list. More generally, consider how this |
| approach performs some kind of reducing calculation on each `List` traverser. In this case, an average was calculated, |
| but it might also have been a `sum()`, `count()` or similar operation that reduced the list to a single calculated |
| value. |
| |
| So far, this section has focused on what to do with a `List` traverser once there is one present and there have been |
| fairly contrived examples for how to produce one in the first place. The use of `fold()` has been used most frequently |
| at this point to achieve list creation and that step should be recalled whenever there is a need to reduce some |
| traversal stream to an actual `List`. Of course, it may become necessary to more manually construct a `List`, |
| especially in cases where the expected output of the traversal is composed of one or more ordered results in the |
| form of a `List`. For example, consider the following three traversals: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko').values('age') <1> |
| g.V().has('name','marko'). <2> |
| repeat(out()). |
| until(has('lang','java')). |
| path(). |
| by('name') |
| g.V().has('name','marko'). <3> |
| repeat(outE().inV()). |
| until(has('lang','java')). |
| path(). |
| local(unfold(). |
| has('weight'). |
| values('weight'). |
| mean()) |
| ---- |
| |
| <1> Get the age of "marko" |
| <2> get the "name" values of the vertices in the collected paths that traverse out from "marko" to any vertex with |
| the "lang" of "java". |
| <3> Get the average of the "weight" values of edges in the collected paths that traverse out from "marko" to any vertex |
| with the "lang" of "java". Note the use of the earlier defined pattern that used `local()` in conjunction with |
| `unfold()`. In this case it filters out vertices from the `Path` as they are not relevant as the concern is only with |
| the "weight" property on the edges. |
| |
| For purposes of this example, the three traversals above happen to represent three pieces of data that are required by |
| an application. It is plain to note that all of the above traversals hold a similar pattern that starts with |
| "getting 'marko'" and, in the case of the latter two, traversing on outgoing edges away from him and collecting data |
| from that path. Ideally, all three of these traversals should execute as one to prevent having to submit three separate |
| traversals, thus incurring additional query execution costs for what amounts to be largely the same underlying data but |
| with different transformations applied. The goal here would be to return the results of this data as a `List` with |
| three results (i.e. triple) that could then be submitted once by the application. The following example demonstrates |
| the use of `store()` to aid in construction of this `List`: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko').as('v'). <1> |
| store('a'). <2> |
| by('age'). |
| repeat(outE().as('e').inV().as('v')). <3> |
| until(has('lang','java')). |
| aggregate('b'). <4> |
| by(select(all,'v').unfold().values('name').fold()). |
| aggregate('c'). <5> |
| by(select(all,'e').unfold().values('weight').mean()). |
| fold(). <6> |
| store('a'). <7> |
| by(cap('b')). |
| store('a'). <8> |
| by(cap('c')). |
| cap('a') |
| ---- |
| |
| <1> Get the "marko" vertex and label that step as "v". |
| <2> Store the first "age" of "marko" as the first item in the `List` called "a", which will ultimately be the result. |
| <3> Execute the traversal away from "marko" and continue to traverse on outgoing edges until the vertex has the value |
| of "java" for the "lang" property. Note the labels of "e" and "v". Note that "e" will contain a `List` of all of the |
| edges that have been traversed and "v" will contain a `List` of all the vertices that have been traversed. |
| <4> The incoming traverser to `aggregate('b')` are vertices that terminate the `repeat()` (i.e. those with the "lang" |
| of "java"). Note however that the `by()` modulator overrides that traverser completely by starting a fresh stream of |
| the list of vertices in "v". Those vertices are unfolded to retrieve the name property from each and then are reduced |
| with `fold()` back into a list to be stored in the side-effected named "b". |
| <5> A similar use of `aggregate()` as the previous step, though this one turns "e" into a stream of edges to calculate |
| the `mean()` to store in a `List` called "c". Note that `aggregate()` was used here instead of `store()`, as |
| the former is an eager collection of the elements in the stream (`store()` is lazy) and will force the traversal to be |
| iterated up to that point before moving forward. Without that eager collection, "v" and "e" would not contain the |
| complete information required for the production of "b" and "c". |
| <6> Adding `fold()`-step here is a bit of a trick. To see the trick, copy and paste all lines of Gremlin up to but |
| not including this `fold()`-step and run them against the "modern" graph. The output is three vertices and if the |
| `profile()`-step was added one would also see that the traversal contained three traversers. These three traversers |
| with a vertex in each one were produced from the `repeat()`-step (i.e. those vertices that had the "lang" of "java" |
| when traversing away from "marko"). The `aggregate()`-steps are side-effects and just allow the traversers to pass |
| through them unchanged. The `fold()` obviously converts those three traversers to a single `List` to make one |
| traverser with a `List` inside. That means that the remaining steps following the `fold()` will only be executed one |
| time each instead of three, which, as will be shown, is critical to the proper result. |
| <7> The single traverser with the `List` of three vertices in it passes to `store()`. The `by()` modulator presents an |
| override telling Gremlin to ignore the `List` of three vertices and simply grab the "b" side effect created earlier and |
| stick that into "a" as part of the result. The `List` with three vertices passes out unchanged as `store()` is a |
| side-effect step. |
| <8> Again, the single traverser with the `List` of three vertices passes to `store()` and again, the `by()` modulator |
| presents an override to include "c" into the result. |
| |
| All of the above code and explanation show that `store()` can be used to construct `List` objects as side-effects |
| which can then be used as a result. Note that `aggregate()` can be used to similar effect, should it make sense that |
| lazy `List` creation is not acceptable with respect to the nature of the traversal. An interesting sub-pattern that |
| emerges here is that the `by()`-step can modulate its step to completely override the current traverser and ignore its |
| contents for purpose of that step. This ability to override a traverser acts as a powerful and flexible tool as it |
| means that each traverser can effectively become a completely different object as determined by a sub-traversal. |
| |
| Another interesting method for `List` creation was demonstrated a bit earlier but not examined in detail - the use of |
| `union()`. It was shown earlier in the following context where it helped create a `List` of two lists of three |
| vertices each: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().union(limit(3).fold(),tail(3).fold()) |
| ---- |
| |
| By folding the results of `union()`, it becomes possible to essentially construct lists with arbitrary traversal |
| results. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V(). |
| local(union(identity(), <1> |
| bothE().count()). |
| fold()) |
| g.V(). |
| store('x'). |
| by(union(select('x').count(local), <2> |
| identity(), |
| bothE().count()). |
| fold()). |
| cap('x') |
| ---- |
| |
| <1> For each vertex, create a "pair" (i.e. a `List` of two objects) of the vertex itself and its edge count. |
| <2> For each vertex, create a "triple" (i.e. a `List`of three objects) of the index of the vertex (starting at zero), |
| the vertex itself and its edge count. |
| |
| The pattern here is to use `union()` in conjunction with `fold()`. As explained earlier, the `fold()` operation reduces |
| the stream from `union()` to a single `List` that is then fed forward to the next step in the traversal. |
| |
| Now that `List` patterns have been explained, there can now be some attention on `Map`. One of the most common ways |
| to end up with a `Map` is with `valueMap()`: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko').valueMap('name','age') |
| ---- |
| |
| The problem is that unless the graph is making use of multi-properties, there is little need to have the value of each |
| property stored as a `List`. One way to unwrap this value from the list is to avoid having it there in the first place |
| by avoiding use of `valueMap()`: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko'). |
| local(properties('name','age'). |
| group().by(key()).by(value())) |
| ---- |
| |
| Interestingly, it's worth looking at how to process the output of `valueMap()` to attain this output as the approach is |
| generally applicable to processing any `Map` instances with any sorts of values: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().has('name','marko'). |
| valueMap('name','age'). |
| unfold(). |
| group(). |
| by(keys). |
| by(select(values).unfold()) |
| ---- |
| |
| The code above, basically deconstructs then reconstructs the `Map`. The key to the pattern is to first `unfold()` the |
| `Map` into its key and value entries. Then for each key and value produce a new `Map` using `group()` where the key |
| for that map is the key of the entry (those are obviously unique as you picked them out of the `valueMap()`) and the |
| value is simply the `unfold()` of the list of values in each entry. Recall that the `select(values).unfold()` only |
| returns one value (i.e. the first) not only because there is only one, but also because `by()` will only call `next()` |
| on that sub-traversal (it does not iterate the entire thing). |
| |
| Generally speaking, a `Map` constructed as part of `group()` or `project()` will already be in the form required as |
| the `by()` modulators would be written in such a fashion as to produce that final output. It would be unnecessary to |
| deconstruct/reconstruct it. Be certain that there isn't a way to re-write the `group()` or `project()` to get the |
| desired output before taking this approach. |
| |
| In the following case, `project()` is used to create a `Map` that does not meet this requirement as it contains some |
| unavoidable extraneous keys in the output `Map`: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V(). |
| project('name','age','lang'). |
| by('name'). |
| by(coalesce(values('age'),constant('n/a'))). |
| by(coalesce(values('lang'),constant('n/a'))) |
| ---- |
| |
| The use of `coalesce()` works around the problem where "age" and "lang" are not necessarily property keys present on |
| every single vertex in the traversal stream. When the "age" or "lang" are not present, the constant of "n/a" is |
| supplied. While this may be an acceptable output, it is possible to shape the `Map` to be "nicer": |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V(). |
| project('name','age','lang'). |
| by('name'). |
| by(coalesce(values('age'),constant('n/a'))). |
| by(coalesce(values('lang'),constant('n/a'))). |
| local(unfold(). |
| filter(select(values).is(P.neq('n/a'))). |
| group(). |
| by(keys). |
| by(values)) |
| ---- |
| |
| The additional steps above `unfold()` the `Map` to key-value entries and filter the values for "n/a" and remove them |
| prior to reconstructing the `Map` with the method shown earlier. To go a step further, apply the pattern presented |
| earlier to flatten `List` values within a `Map`: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V(). |
| project('name','age','lang'). |
| by('name'). |
| by(coalesce(values('age'),constant('n/a'))). |
| by(coalesce(values('lang'),constant('n/a'))). |
| local(unfold(). |
| filter(select(values).is(P.neq('n/a'))). |
| group(). |
| by(keys). |
| by(select(values).unfold())) |
| ---- |
| |
| As there may be a desire to remove entries from a `Map`, there may also be the need to add keys to a `Map`. The pattern |
| here involves the use of a `union()` that returns the `Map` instances which can be flattened to entries and then |
| reconstructed as a new `Map` that has been merged together: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V(). |
| has('name','marko'). |
| union(project('degree'). <1> |
| by(bothE().count()), |
| valueMap(true)). |
| unfold(). <2> |
| group(). |
| by(keys). |
| by(select(values).unfold()) |
| ---- |
| |
| <1> The `valueMap(true)` of a `Vertex` can be extended with the "degree" of the `Vertex` by performing a `union()` of |
| the two traversals that produce that output (both produce `Map` objects). |
| <2> The `unfold()`-step is used to decompose the `Map` objects into key/value entries that are then constructed back |
| into a single new `Map` given the patterns shown earlier. The `Map` objects of both traversals in the `union()` are |
| essentially merged together. |
| |
| When using this pattern, it is important to recognize that if there are non-unique keys produced by the traversals |
| supplied to `union()`, they will overwrite one another given the final `by()` modulator above. If changed to |
| `by(select(values).unfold().fold())` they will merge to produce a `List` of values. Of course, that change will bring |
| a `List` back for all the values of the new `Map`. With some added logic the `Map` values can be flattened out of |
| `List` instances when necessary: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V(). |
| has('name','marko'). |
| union(valueMap(true), |
| project('age'). |
| by(constant(100))). |
| unfold(). |
| group(). |
| by(keys). |
| by(select(values). |
| unfold(). |
| fold(). |
| choose(count(local).is(eq(1)), unfold())) |
| ---- |