| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| [[connected-components]] |
| == Connected Components |
| |
| Gremlin can be used to find link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[connected components] |
| in a graph. Consider the following graph which has three connected components: |
| |
| image:connected-components.png[width=600] |
| |
| [gremlin-groovy] |
| ---- |
| g.addV().property(id, "A").as("a"). |
| addV().property(id, "B").as("b"). |
| addV().property(id, "C").as("c"). |
| addV().property(id, "D").as("d"). |
| addV().property(id, "E").as("e"). |
| addV().property(id, "F"). |
| addE("link").from("a").to("b"). |
| addE("link").from("b").to("c"). |
| addE("link").from("d").to("e").iterate() |
| ---- |
| |
| One way to detect the various subgraphs would be to do something like this: |
| |
| [gremlin-groovy,existing] |
| ---- |
| g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()). <1> |
| path().aggregate("p"). <2> |
| unfold().dedup(). <3> |
| map(__.as("v").select("p").unfold(). <4> |
| filter(unfold().where(eq("v"))). |
| unfold().dedup().order().by(id).fold()). |
| dedup() <5> |
| ---- |
| |
| <1> Iterate all vertices and repeatedly traverse over both incoming and outgoing edges (TinkerPop doesn't support |
| unidirectional graphs directly so it must be simulated by ignoring the direction with `both`). Note the use of `emit` |
| prior to `repeat` as this allows for return of a single length path. |
| <2> Aggregate the `path()` of the emitted vertices to "p". It is within these paths that the list of connected |
| components will be identified. Obviously the paths list are duplicative in the sense that they contains different |
| paths traveled over the same vertices. |
| <3> Unroll the elements in the path list with `unfold` and `dedup`. |
| <4> Use the first vertex in each path to filter against the paths stored in "p". When a path is found that has the |
| vertex in it, dedup the vertices in the path, order it by the identifier. Each path output from this `map` step |
| represents a connected component. |
| <5> The connected component list is duplicative given the nature of the paths in "p", but now that the vertices within |
| the paths are ordered, a final `dedup` will make the list of connective components unique. |
| |
| NOTE: This is a nice example of where running smaller pieces of a large Gremlin statement make it easier to see what |
| is happening at each step. Consider running this example one line at a time (or perhaps even in a step at a time) to |
| see the output at each point. |
| |
| While the above approach returns results nicely, the traversal doesn't appear to work with OLAP. A less efficient |
| approach, but one more suited for OLAP execution looks quite similar but does not use `dedup` as heavily (thus |
| `GraphComputer` is forced to analyze far more paths): |
| |
| [gremlin-groovy,existing] |
| ---- |
| g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()). |
| aggregate("p").by(path()).cap("p").unfold().limit(local, 1). |
| map(__.as("v").select("p").unfold(). |
| filter(unfold().where(eq("v"))). |
| unfold().dedup().order().by(id).fold() |
| ).toSet() |
| ---- |