| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| [[shortest-path]] |
| == Shortest Path |
| |
| image:shortest-path.png[width=300] |
| |
| When working with a graph, it is often necessary to identify the |
| link:https://en.wikipedia.org/wiki/Shortest_path_problem[shortest path] between two identified vertices. The following |
| is a simple example that identifies the shortest path between vertex "1" and vertex "5" while traversing over out edges: |
| |
| [gremlin-groovy] |
| ---- |
| g.addV().property(id, 1).as('1'). |
| addV().property(id, 2).as('2'). |
| addV().property(id, 3).as('3'). |
| addV().property(id, 4).as('4'). |
| addV().property(id, 5).as('5'). |
| addE('knows').from('1').to('2'). |
| addE('knows').from('2').to('4'). |
| addE('knows').from('4').to('5'). |
| addE('knows').from('2').to('3'). |
| addE('knows').from('3').to('4').iterate() |
| g.V(1).repeat(out().simplePath()).until(hasId(5)).path().limit(1) <1> |
| g.V(1).repeat(out().simplePath()).until(hasId(5)).path().count(local) <2> |
| g.V(1).repeat(out().simplePath()).until(hasId(5)).path(). |
| group().by(count(local)).next() <3> |
| ---- |
| |
| <1> The traversal starts at vertex with the identifier of "1" and repeatedly traverses on out edges "until" it finds a |
| vertex with an identifier of "5". The inclusion of `simplePath` within the `repeat` is present to filter out repeated |
| paths. The traversal terminates with `limit` in this case as the first path returned will be the shortest one. Of |
| course, it is possible for there to be more than one path in the graph of the same length (i.e. two or more paths of |
| length three), but this example is not considering that. |
| <2> It might be interesting to know the path lengths for all paths between vertex "1" and "5". |
| <3> Alternatively, one might wish to do a path length distribution over all the paths. |
| |
| The following code block demonstrates how the shortest path from `v[1]` to `v[5]` can be queried in OLAP, using the `shortestPath()` step. |
| |
| [gremlin-groovy,existing] |
| ---- |
| g = g.withComputer() |
| g.V(1).shortestPath(). |
| with(ShortestPath.edges, Direction.OUT). |
| with(ShortestPath.target, hasId(5)) |
| ---- |
| |
| The previous example defines the length of the path by the number of vertices in the path, but the "path" might also |
| be measured by data within the graph itself. The following example use the same graph structure as the previous example, |
| but includes a "weight" on the edges, that will be used to help determine the "cost" of a particular path: |
| |
| [gremlin-groovy] |
| ---- |
| g.addV().property(id, 1).as('1'). |
| addV().property(id, 2).as('2'). |
| addV().property(id, 3).as('3'). |
| addV().property(id, 4).as('4'). |
| addV().property(id, 5).as('5'). |
| addE('knows').from('1').to('2').property('weight', 1.25). |
| addE('knows').from('2').to('4').property('weight', 1.5). |
| addE('knows').from('4').to('5').property('weight', 0.25). |
| addE('knows').from('2').to('3').property('weight', 0.25). |
| addE('knows').from('3').to('4').property('weight', 0.25).iterate() |
| g.V(1).repeat(out().simplePath()).until(hasId(5)).path(). |
| group().by(count(local)).next() <1> |
| g.V(1).repeat(outE().inV().simplePath()).until(hasId(5)). |
| path().by(coalesce(values('weight'), |
| constant(0.0))). |
| map(unfold().sum()) <2> |
| g.V(1).repeat(outE().inV().simplePath()).until(hasId(5)). |
| path().by(constant(0.0)).by('weight').map(unfold().sum()) <3> |
| g.V(1).repeat(outE().inV().simplePath()).until(hasId(5)). |
| path().as('p'). |
| map(unfold().coalesce(values('weight'), |
| constant(0.0)).sum()).as('cost'). |
| select('cost','p') <4> |
| ---- |
| |
| <1> Note that the shortest path as determined by the structure of the graph is the same. |
| <2> Calculate the "cost" of the path as determined by the weight on the edges. As the "weight" data is on the edges |
| between the vertices, it is necessary to change the contents of the `repeat` step to use `outE().inV()` so that the |
| edge is included in the path. The path is then post-processed with a `by` modulator that extracts the "weight" value. |
| The traversal uses `coalesce` as there is a mixture of vertices and edges in the path and the traversal is only |
| interested in edge elements that can return a "weight" property. The final part of the traversal executes a map |
| function over each path, unfolding it and summing the weights. |
| <3> The same traversal as the one above it, but avoids the use of `coalesce` with the use of two `by` modulators. The |
| `by` modulator is applied in a round-robin fashion, so the first `by` will always apply to a vertex (as it is the first |
| item in every path) and the second `by` will always apply to an edge (as it always follows the vertex in the path). |
| <4> The output of the previous examples of the "cost" wasn't terribly useful as it didn't include which path had the |
| calculated cost. With some slight modifications given the use of `select` it becomes possible to include the path in |
| the output. Note that the path with the lowest "cost" actually has a longer path length as determined by the graph |
| structure. |
| |
| The next code block demonstrates how the `shortestPath()` step can be used in OLAP to determine the shortest weighted path. |
| |
| [gremlin-groovy,existing] |
| ---- |
| g = g.withComputer() |
| g.V(1).shortestPath(). |
| with(ShortestPath.edges, Direction.OUT). |
| with(ShortestPath.distance, 'weight'). |
| with(ShortestPath.target, hasId(5)) |
| ---- |
| |
| The following query illustrates how `select(<traversal>)` can be used to find all shortest weighted undirected paths |
| in the modern toy graph. |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.withSack(0.0).V().as("from"). <1> |
| repeat(bothE(). |
| sack(sum). |
| by("weight"). |
| otherV(). <2> |
| where(neq("from")).as("to"). <3> |
| group("m"). <4> |
| by(select("from","to")). |
| by(sack().min()). |
| filter(project("s","x"). <5> |
| by(sack()). |
| by(select("m").select(select("from","to"))). |
| where("s", eq("x"))). |
| group("p"). <6> |
| by(select("from", "to")). |
| by(map(union(path().by("name").by("weight"), |
| sack()).fold())). |
| barrier()). |
| cap("p").unfold(). |
| order(). <7> |
| by(select(keys).select("from").id()). |
| by(select(keys).select("to").id()). |
| barrier(). |
| dedup(). <8> |
| by(select(keys).select(values).order(local).by(id)) |
| ---- |
| |
| <1> Start the traversal from all vertices with an initial sack value of 0. |
| <2> Traverse into all directions and sum up the edge weight values. |
| <3> Filter out the initial start vertex. |
| <4> For the current start and end vertex, update the minimum sack value (weighted length of the path). |
| <5> Compare the current weighted path length to the current minimum weighted path length between the 2 vertices. Eliminate traversers that found a path that is longer than the current shortest path. |
| <6> Update the path and weighted path length for the current start and end vertex pair. |
| <7> Order the output by the start vertex id and then the end vertex id (for better readability). |
| <8> Deduplicate vertex pairs (the shortest path from `v[1]` to `v[6]` is the same as the path from `v[6]` to `v[1]`). |
| |
| Again, this can be translated into an OLAP query using the `shortestPath()` step. |
| |
| [gremlin-groovy,existing] |
| ---- |
| result = g.withComputer().V(). |
| shortestPath(). |
| with(ShortestPath.distance, 'weight'). |
| with(ShortestPath.includeEdges, true). |
| filter(count(local).is(gt(1))). |
| group(). |
| by(project('from','to'). |
| by(limit(local, 1)). |
| by(tail(local, 1))). |
| unfold(). |
| order(). |
| by(select(keys).select('from').id()). |
| by(select(keys).select('to').id()).toList() |
| ---- |
| |
| The obvious difference in the result is the absence of property values in the OLAP result. Since OLAP traversers are not |
| allowed to leave the local star graph, it's not possible to have the exact same result in an OLAP query. However, the determined |
| shortest paths can be passed back into the OLTP `GraphTraversalSource`, which can then be used to query the values. |
| |
| [gremlin-groovy,existing] |
| ---- |
| g.withSideEffect('v', []). <1> |
| inject(result.toArray()).as('kv').select(values). |
| unfold(). |
| map(unfold().as('v_or_e'). |
| coalesce(V().where(eq('v_or_e')).store('v'), |
| select('v').tail(local, 1).bothE().where(eq('v_or_e'))). |
| values('name','weight'). |
| fold()). |
| group(). |
| by(select('kv').select(keys)).unfold(). |
| order(). |
| by(select(keys).select('from').id()). |
| by(select(keys).select('to').id()).toList() |
| ---- |
| |
| <1> The side-effect `v` is used to keep track of the last processed vertex, hence it needs to be an order-preserving list. Without this explicit definition `v` would become a `BulkSet` which doesn't preserve the insert order. |