blob: 1e25740ffe542504b9480ac40b643eeb92d8260b [file] [log] [blame]
////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
[[cycle-detection]]
== Cycle Detection
A cycle occurs in a graph where a path loops back on itself to the originating vertex. For example, in the graph
depicted below Gremlin could be use to detect the cycle among vertices `A-B-C`.
image:graph-cycle.png[width=250]
[gremlin-groovy]
----
g.addV().property(id,'a').as('a').
addV().property(id,'b').as('b').
addV().property(id,'c').as('c').
addV().property(id,'d').as('d').
addE('knows').from('a').to('b').
addE('knows').from('b').to('c').
addE('knows').from('c').to('a').
addE('knows').from('a').to('d').
addE('knows').from('c').to('d').iterate()
g.V().as('a').repeat(out().simplePath()).times(2).
where(out().as('a')).path() <1>
g.V().as('a').repeat(out().simplePath()).times(2).
where(out().as('a')).path().
dedup().by(unfold().order().by(id).dedup().fold()) <2>
----
<1> Gremlin starts its traversal from a vertex labeled "a" and traverses `out()` from each vertex filtering on the
`simplePath`, which removes paths with repeated objects. The steps going `out()` are repeated twice as in this case
the length of the cycle is known to be three and there is no need to exceed that. The traversal filters with a
`where()` to see only return paths that end with where it started at "a".
<2> The previous query returned the `A-B-C` cycle, but it returned three paths which were all technically the same
cycle. It returned three, because there was one for each vertex that started the cycle (i.e. one for `A`, one for `B`
and one for `C`). This next line introduce deduplication to only return unique cycles.
The above case assumed that the need was to only detect cycles over a path length of three.
It also respected the directionality of the edges by only considering outgoing ones.
Also note that the traversal above won't detect self-loops (vertices directly connected to
themselves). To do so, you would need to `.emit()` a Traverser before the repeat()-loop.
[gremlin-groovy]
----
g.addV().property(id,'a').as('a').
addV().property(id,'b').as('b').
addV().property(id,'c').as('c').
addV().property(id,'d').as('d').
addE('knows').from('a').to('b').
addE('knows').from('b').to('c').
addE('knows').from('c').to('a').
addE('knows').from('a').to('d').
addE('knows').from('c').to('d').
addE('self').from('a').to('a').iterate()
g.V().as('a').
emit().
repeat(outE().inV().simplePath()).
times(2).
outE().inV().where(eq('a')).
path().
by(id).
by(label)
----
What would need to change to detect cycles of arbitrary length over both incoming and
outgoing edges, in the modern graph?
[gremlin-groovy,modern]
----
g.V().as('a').repeat(both().simplePath()).emit(loops().is(gt(1))).
both().where(eq('a')).path().
dedup().by(unfold().order().by(id).dedup().fold())
----
An interesting type of cycle is known as the Eulerian circuit which is a path taken in a graph where each edge is
visited once and the path starts and ends with the same vertex. Consider the following graph, representative of an
imaginary but geographically similar link:https://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg[Königsberg]
that happens to have an eighth bridge (the diagram depicts edge direction but direction won't be considered in the traversal):
image:eulerian-circuit.png[width=500]
Gremlin can detect if such a cycle exists with:
[gremlin-groovy]
----
g.addV().property(id, 'blue').as('b').
addV().property(id, 'orange').as('o').
addV().property(id, 'red').as('r').
addV().property(id, 'green').as('g').
addE('bridge').from('g').to('b').
addE('bridge').from('g').to('o').
addE('bridge').from('g').to('r').
addE('bridge').from('g').to('r').
addE('bridge').from('o').to('b').
addE('bridge').from('o').to('b').
addE('bridge').from('o').to('r').
addE('bridge').from('o').to('r').iterate()
g.V().sideEffect(outE("bridge").aggregate("bridges")).barrier(). <1>
repeat(bothE(). <2>
or(__.not(select('e')),
__.not(filter(__.as('x').select(all, 'e').unfold(). <3>
where(eq('x'))))).as('e').
otherV()).
until(select(all, 'e').count(local).as("c"). <4>
select("bridges").count(local).where(eq("c"))).hasNext()
----
<1> Gather all the edges in a "bridges" side effect.
<2> As mentioned earlier with the diagram, directionality is ignored as the traversal uses `bothE` and, later, `otherV`.
<3> In continually traversing over both incoming and outgoing edges, this path is only worth continuing if the edges
traversed thus far are only traversed once. That set of edges is maintained in "e".
<4> The traversal should repeat until the number of edges traversed in "e" is equal to the total number gathered in
the first step above, which would mean that the complete circuit has been made.
Unlike Königsberg, with just seven bridges, a Eulerian circuit exists in the case with an eighth bridge. The first
detected circuit can be displayed with:
[gremlin-groovy,existing]
----
g.V().sideEffect(outE("bridge").aggregate("bridges")).barrier().
repeat(bothE().or(__.not(select('e')),
__.not(filter(__.as('x').select(all, 'e').unfold().
where(eq('x'))))).as('e').otherV()).
until(select(all, 'e').count(local).as("c").
select("bridges").count(local).where(eq("c"))).limit(1).
path().by(id).by(constant(" -> ")).
map {String.join("", it.get().objects())}
----