| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| [[neo4j-gremlin]] |
| == Neo4j-Gremlin |
| |
| [source,xml] |
| ---- |
| <dependency> |
| <groupId>org.apache.tinkerpop</groupId> |
| <artifactId>neo4j-gremlin</artifactId> |
| <version>x.y.z</version> |
| </dependency> |
| <!-- neo4j-tinkerpop-api-impl is NOT Apache 2 licensed - more information below --> |
| <dependency> |
| <groupId>org.neo4j</groupId> |
| <artifactId>neo4j-tinkerpop-api-impl</artifactId> |
| <version>0.7-3.2.3</version> |
| </dependency> |
| ---- |
| |
| link:http://neo4j.com[Neo4j, Inc.] is the developer of the OLTP-based link:http://neo4j.com[Neo4j graph database]. |
| |
| WARNING: Unless under a commercial agreement with Neo4j, Inc., Neo4j is licensed under the |
| link:http://en.wikipedia.org/wiki/Affero_General_Public_License[AGPL]. The `neo4j-gremlin` module is licensed Apache2 |
| because it only references the Apache2-licensed Neo4j API (not its implementation). Note that neither the |
| <<gremlin-console,Gremlin Console>> nor <<gremlin-server,Gremlin Server>> is distributed with the Neo4j implementation |
| binaries. To access the binaries, use the `:install` command to download them from the |
| link:http://search.maven.org/[Maven Central Repository]. |
| |
| TIP: For configuring Grape, the dependency resolver of Groovy, please refer to the <<gremlin-applications,Gremlin Applications>> section. |
| |
| [source,groovy] |
| ---- |
| gremlin> :install org.apache.tinkerpop neo4j-gremlin x.y.z |
| ==>Loaded: [org.apache.tinkerpop, neo4j-gremlin, x.y.z] - restart the console to use [tinkerpop.neo4j] |
| gremlin> :q |
| ... |
| gremlin> :plugin use tinkerpop.neo4j |
| ==>tinkerpop.neo4j activated |
| gremlin> graph = Neo4jGraph.open('/tmp/neo4j') |
| ==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]] |
| ---- |
| |
| TIP: To host Neo4j in <<gremlin-server,Gremlin Server>>, the dependencies must first be "installed" or otherwise |
| copied to the Gremlin Server path. The automated method for doing this is to execute |
| `bin/gremlin-server.sh install org.apache.tinkerpop neo4j-gremlin x.y.z`. Once installed, the Gremlin Server |
| configuration file must be edited to include the `Neo4jGremlinPlugin` as shown in `conf/gremlin-server-neo4j.yaml`. |
| |
| === Indices |
| |
| Neo4j 2.x indices leverage vertex labels to partition the index space. TinkerPop does not provide method interfaces |
| for defining schemas/indices for the underlying graph system. Thus, in order to create indices, it is necessary to |
| call the Neo4j API directly. |
| |
| NOTE: `Neo4jGraphStep` will attempt to discern which indices to use when executing a traversal of the form `g.V().has()`. |
| |
| The Gremlin-Console session below demonstrates Neo4j indices. For more information, please refer to the Neo4j documentation: |
| |
| * Manipulating indices with link:http://neo4j.com/docs/developer-manual/current/#query-schema-index[Cypher]. |
| * Manipulating indices with the Neo4j link:http://neo4j.com/docs/stable/tutorials-java-embedded-new-index.html[Java API]. |
| |
| [gremlin-groovy] |
| ---- |
| graph = Neo4jGraph.open('/tmp/neo4j') |
| g = traversal().withEmbedded(graph) |
| graph.cypher("CREATE INDEX ON :person(name)") |
| graph.tx().commit() <1> |
| g.addV('person').property('name','marko') |
| g.addV('dog').property('name','puppy') |
| g.V().hasLabel('person').has('name','marko').values('name') |
| graph.close() |
| ---- |
| |
| <1> Schema mutations must happen in a different transaction than graph mutations. |
| |
| The session below demonstrates the runtime benefits of indices. It also shows that, even when no index is defined |
| (only vertex labels), a linear scan of the vertex-label partition is still faster than a linear scan of all vertices. |
| |
| [gremlin-groovy] |
| ---- |
| graph = Neo4jGraph.open('/tmp/neo4j') |
| g = traversal().withEmbedded(graph) |
| g.io('data/grateful-dead.xml').read().iterate() |
| g.tx().commit() |
| clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()} <1> |
| graph.cypher("CREATE INDEX ON :artist(name)") <2> |
| g.tx().commit() |
| Thread.sleep(5000) <3> |
| clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()} <4> |
| clock(1000) {g.V().has('name','Garcia').iterate()} <5> |
| graph.cypher("DROP INDEX ON :artist(name)") <6> |
| g.tx().commit() |
| graph.close() |
| ---- |
| |
| <1> Find all artists whose name is Garcia, which does a linear scan of the artist vertex-label partition. |
| <2> Create an index for all artist vertices on their name property. |
| <3> Neo4j indices are eventually consistent, so this stalls to give the index time to populate itself. |
| <4> Find all artists whose name is Garcia, which uses the pre-defined schema index. |
| <5> Find all vertices whose name is Garcia, which requires a linear scan of all the data in the graph. |
| <6> Drop the created index. |
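
The three lookup strategies in the callouts above can be compared with a toy cost model. The sketch below is only an illustration in plain Java: `LookupCostSketch` and its methods are hypothetical names, the sample vertices are made up, and the "cost" is simply the number of vertices examined. It does not call Neo4j or reflect how `Neo4jGraphStep` is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical cost model: counts vertices examined, does not call Neo4j.
final class LookupCostSketch {
    // Vertex names grouped by label, standing in for the vertex-label partitions.
    private final Map<String, List<String>> namesByLabel = new HashMap<>();
    // Labels for which an index on "name" is assumed to exist.
    private final Set<String> indexedLabels = new HashSet<>();

    void addVertex(String label, String name) {
        namesByLabel.computeIfAbsent(label, k -> new ArrayList<>()).add(name);
    }

    // Stands in for CREATE INDEX ON :label(name).
    void createIndex(String label) {
        indexedLabels.add(label);
    }

    // g.V().has('name', ...): no label is given, so every vertex is examined.
    int examinedWithoutLabel() {
        int total = 0;
        for (List<String> partition : namesByLabel.values()) {
            total += partition.size();
        }
        return total;
    }

    // g.V().hasLabel(label).has('name', name): partition scan, or index lookup if indexed.
    int examinedWithLabel(String label, String name) {
        List<String> partition = namesByLabel.getOrDefault(label, Collections.emptyList());
        if (!indexedLabels.contains(label)) {
            return partition.size(); // linear scan of one partition
        }
        return Collections.frequency(partition, name); // only matching entries are touched
    }

    public static void main(String[] args) {
        LookupCostSketch graph = new LookupCostSketch();
        graph.addVertex("artist", "Garcia");
        graph.addVertex("artist", "Weir");
        graph.addVertex("song", "DARK STAR");
        graph.addVertex("song", "TRUCKIN");
        System.out.println(graph.examinedWithoutLabel());                // 4
        System.out.println(graph.examinedWithLabel("artist", "Garcia")); // 2
        graph.createIndex("artist");
        System.out.println(graph.examinedWithLabel("artist", "Garcia")); // 1
    }
}
```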
| |
| === Cypher |
| |
| image::gremlin-loves-cypher.png[width=400] |
| |
| Neo4j, Inc. (formerly Neo Technology) is the creator of the graph pattern-match query language link:https://neo4j.com/developer/cypher-query-language/[Cypher]. |
| It is possible to leverage Cypher from within Gremlin by using the `Neo4jGraph.cypher()` graph traversal method. |
| |
| [gremlin-groovy] |
| ---- |
| graph = Neo4jGraph.open('/tmp/neo4j') |
| g = traversal().withEmbedded(graph) |
| g.io('data/tinkerpop-modern.kryo').read().iterate() |
| graph.cypher('MATCH (a {name:"marko"}) RETURN a') |
| graph.cypher('MATCH (a {name:"marko"}) RETURN a').select('a').out('knows').values('name') |
| graph.close() |
| ---- |
| |
| Thus, as with the <<match-step,`match()`>>-step in Gremlin, it is possible to do a declarative pattern match and then |
| move back into imperative Gremlin. |
| |
| TIP: For those developers using <<gremlin-server,Gremlin Server>> against Neo4j, it is possible to do Cypher queries |
| by simply placing the Cypher string in `graph.cypher(...)` before submission to the server. |
| |
| === Multi-Label |
| |
| TinkerPop requires every `Element` (i.e. every `Vertex`, `Edge`, and `VertexProperty`) to have a single, immutable |
| string label. In Neo4j, a `Node` (vertex) can have an |
| link:http://neo4j.com/docs/developer-manual/current/#graphdb-neo4j-labels[arbitrary number of labels] while a `Relationship` |
| (edge) can have one and only one. Furthermore, in Neo4j, `Node` labels are mutable while `Relationship` labels are |
| not. In order to handle this mismatch, three `Neo4jVertex`-specific methods exist in Neo4j-Gremlin. |
| |
| [source,java] |
| public Set<String> labels() // get all the labels of the vertex |
| public void addLabel(String label) // add a label to the vertex |
| public void removeLabel(String label) // remove a label from the vertex |
| |
| An example use case is presented below. |
| |
| [gremlin-groovy] |
| ---- |
| graph = Neo4jGraph.open('/tmp/neo4j') |
| g = traversal().withEmbedded(graph) |
| vertex = (Neo4jVertex) g.addV('human::animal').next() <1> |
| vertex.label() <2> |
| vertex.labels() <3> |
| vertex.addLabel('organism') <4> |
| vertex.label() |
| vertex.removeLabel('human') <5> |
| vertex.labels() |
| vertex.addLabel('organism') <6> |
| vertex.labels() |
| vertex.removeLabel('human') <7> |
| vertex.label() |
| g.V().has(label,'organism') <8> |
| g.V().has(label,of('organism')) <9> |
| g.V().has(label,of('organism')).has(label,of('animal')) |
| g.V().has(label,of('organism').and(of('animal'))) |
| graph.close() |
| ---- |
| |
| <1> Typecasting to a `Neo4jVertex` is only required in Java. |
| <2> The standard `Vertex.label()` method returns all the labels in alphabetical order concatenated using `::`. |
| <3> `Neo4jVertex.labels()` method returns the individual labels as a set. |
| <4> `Neo4jVertex.addLabel()` method adds a single label. |
| <5> `Neo4jVertex.removeLabel()` method removes a single label. |
| <6> Labels are unique and thus duplicate labels don't exist. |
| <7> If a label that does not exist is removed, nothing happens. |
| <8> `P.eq()` does a full string match and should only be used if multi-labels are not leveraged. |
| <9> `LabelP.of()` is specific to `Neo4jGraph` and used for multi-label matching. |
| |
| IMPORTANT: `LabelP.of()` is only required if multi-labels are leveraged. `LabelP.of()` is used when |
| filtering/looking-up vertices by their label(s) as the standard `P.eq()` does a direct match on the `::`-representation |
| of `vertex.label()`. |
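
The difference between the two predicates can be modelled in plain Java. The sketch below is illustrative only: `MultiLabelSketch`, `concatenated`, `eqMatches`, and `ofMatches` are hypothetical names and are not part of Neo4j-Gremlin. The only behavior it takes from this section is that `label()` joins the alphabetically sorted labels with `::`, that `P.eq()` compares against that whole string, and that `LabelP.of()` matches an individual label.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of multi-label matching; not the Neo4j-Gremlin implementation.
final class MultiLabelSketch {
    static final String DELIMITER = "::";

    // Models Vertex.label(): labels sorted alphabetically and joined with "::".
    static String concatenated(Set<String> labels) {
        return String.join(DELIMITER, new TreeSet<>(labels));
    }

    // Models P.eq(): a full string match against the concatenated label.
    static boolean eqMatches(Set<String> labels, String wanted) {
        return concatenated(labels).equals(wanted);
    }

    // Models LabelP.of(): true when any single label equals the wanted one.
    static boolean ofMatches(Set<String> labels, String wanted) {
        return labels.contains(wanted);
    }

    public static void main(String[] args) {
        Set<String> labels = new TreeSet<>(Arrays.asList("organism", "animal"));
        System.out.println(concatenated(labels));          // animal::organism
        System.out.println(eqMatches(labels, "organism")); // false
        System.out.println(ofMatches(labels, "organism")); // true
    }
}
```

In this model a vertex labelled `animal::organism` is missed by an equality test on `organism` but found by the per-label test, which is the reason `LabelP.of()` is needed once a vertex carries more than one label.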
| |
| === Configuration |
| |
| The previous examples showed how to create a `Neo4jGraph` with the default configuration, but Neo4j has many other |
| options to initialize it that are native to Neo4j. In order to expose those, `Neo4jGraph` has an `open(Configuration)` |
| method which takes a standard Apache Configuration object. The same can be said of the standard method for creating |
| `Graph` instances with `GraphFactory`. Each configuration key that Neo4j has must simply be prefixed with |
| `gremlin.neo4j.conf.` and the suffix configuration key will be passed through to Neo4j. |
| |
| NOTE: Gremlin Server uses `GraphFactory` to instantiate the `Graph` instances it manages, so the example below is also |
| relevant for that purpose. |
| |
| For example, a standard configuration file called `neo4j.properties` that sets the Neo4j |
| `dbms.index_sampling.background_enabled` setting might look like: |
| |
| [source,properties] |
| ---- |
| gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph |
| gremlin.neo4j.directory=/tmp/neo4j |
| gremlin.neo4j.conf.dbms.index_sampling.background_enabled=true |
| ---- |
| |
| which can then be used as follows: |
| |
| [source,text] |
| ---- |
| gremlin> graph = GraphFactory.open('neo4j.properties') |
| ==>neo4jgraph[community single [/tmp/neo4j]] |
| gremlin> g = traversal().withEmbedded(graph) |
| ==>graphtraversalsource[neo4jgraph[community single [/tmp/neo4j]], standard] |
| ---- |
| |
| Having this ability to set standard Neo4j configurations makes it possible to better control the initialization of |
| Neo4j itself and provides the ability to enable certain features that would not otherwise be accessible. |
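
The prefix convention can be pictured as a simple key filter. The following is a minimal sketch in plain Java, assuming only what is stated above: keys starting with `gremlin.neo4j.conf.` are forwarded to Neo4j with the prefix removed. `Neo4jConfPrefixSketch` and `neo4jSettings` are hypothetical names, and the code is not the way `Neo4jGraph` actually processes its configuration.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical model of the prefix convention; not Neo4jGraph's own code.
final class Neo4jConfPrefixSketch {
    static final String PREFIX = "gremlin.neo4j.conf.";

    // Collects the keys that would be handed to Neo4j, with the prefix stripped.
    static Map<String, String> neo4jSettings(Properties props) {
        Map<String, String> settings = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                settings.put(key.substring(PREFIX.length()), props.getProperty(key));
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph");
        props.setProperty("gremlin.neo4j.directory", "/tmp/neo4j");
        props.setProperty("gremlin.neo4j.conf.dbms.index_sampling.background_enabled", "true");
        // Only the prefixed key survives, now under its native Neo4j name.
        System.out.println(neo4jSettings(props)); // {dbms.index_sampling.background_enabled=true}
    }
}
```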
| |
| === Bolt Configuration |
| |
| While `Neo4jGraph` enables Gremlin based queries, users may find it helpful to also be able to connect to that graph |
| with native Neo4j drivers and other tools from that space. It is possible to enable the |
| link:https://boltprotocol.org/[Bolt Protocol] as a way to do this: |
| |
| [source,properties] |
| ---- |
| gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph |
| gremlin.neo4j.directory=/tmp/neo4j |
| gremlin.neo4j.conf.dbms.connector.0.type=BOLT |
| gremlin.neo4j.conf.dbms.connector.0.enabled=true |
| gremlin.neo4j.conf.dbms.connector.0.address=localhost:7687 |
| ---- |
| |
| This configuration is especially relevant to Gremlin Server where one might want to connect to the same graph instance |
| with both Gremlin and Cypher. |
| |
| [source,text] |
| ---- |
| gremlin> :install org.neo4j.driver neo4j-java-driver 1.7.2 |
| ==>Loaded: [org.neo4j.driver, neo4j-java-driver, 1.7.2] |
| ... // restart Gremlin Console |
| gremlin> import org.neo4j.driver.v1.* |
| ==>org.apache.tinkerpop.gremlin.structure.*, org.apache.tinkerpop.gremlin.structure.util.*, ... org.neo4j.driver.v1.* |
| gremlin> driver = GraphDatabase.driver( "bolt://localhost:7687", AuthTokens.basic("neo4j", "neo4j")) |
| Oct 28, 2019 3:28:20 PM org.neo4j.driver.internal.logging.JULogger info |
| INFO: Direct driver instance 1385140107 created for server address localhost:7687 |
| ==>org.neo4j.driver.internal.InternalDriver@528f8f8b |
| gremlin> session = driver.session() |
| ==>org.neo4j.driver.internal.NetworkSession@f3fcd59 |
| gremlin> session.run( "CREATE (a:person {name: {name}, age: {age}})", |
| ......1> Values.parameters("name", "stephen", "age", 29)) |
| gremlin> :remote connect tinkerpop.server conf/remote.yaml |
| ==>Configured localhost/127.0.0.1:8182 |
| gremlin> :remote console |
| ==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182] - type ':remote console' to return to local mode |
| gremlin> g.V().elementMap() |
| ==>{id=0, label=person, name=stephen, age=29} |
| ---- |
| |
| === High Availability Configuration |
| |
| image:neo4j-ha.png[width=400,float=right] TinkerPop supports running Neo4j with its fault tolerant master-slave |
| replication configuration, referred to as its |
| link:http://neo4j.com/docs/operations-manual/current/#_neo4j_cluster_install[High Availability (HA) cluster]. From the |
| TinkerPop perspective, configuring for HA is not that different from configuring for embedded mode as shown above. The |
| main difference is the usage of HA configuration options that enable the cluster. Once connected to a cluster, usage |
| from the TinkerPop perspective is largely the same. |
| |
| In configuring for HA, the most important thing to realize is that all Neo4j HA settings are simply passed through the |
| TinkerPop configuration settings given to the `GraphFactory.open()` or `Neo4jGraph.open()` methods. For example, to |
| provide the all-important `ha.server_id` configuration option through TinkerPop, simply prefix that key with the |
| TinkerPop Neo4j key of `gremlin.neo4j.conf`. |
| |
| The following properties file is one of the three configuration files required to set up a simple three-node HA |
| cluster on a single machine: |
| |
| [source,properties] |
| ---- |
| gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph |
| gremlin.neo4j.directory=/tmp/neo4j.server1 |
| gremlin.neo4j.conf.ha.server_id=1 |
| gremlin.neo4j.conf.ha.initial_hosts=localhost:5001\,localhost:5002\,localhost:5003 |
| gremlin.neo4j.conf.ha.host.coordination=localhost:5001 |
| gremlin.neo4j.conf.ha.host.data=localhost:6001 |
| ---- |
| |
| Assuming the intent is to configure this cluster completely within TinkerPop (perhaps within three separate Gremlin |
| Server instances), the other two configuration files will be quite similar. The second will be: |
| |
| [source,properties] |
| ---- |
| gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph |
| gremlin.neo4j.directory=/tmp/neo4j.server2 |
| gremlin.neo4j.conf.ha.server_id=2 |
| gremlin.neo4j.conf.ha.initial_hosts=localhost:5001\,localhost:5002\,localhost:5003 |
| gremlin.neo4j.conf.ha.host.coordination=localhost:5002 |
| gremlin.neo4j.conf.ha.host.data=localhost:6002 |
| ---- |
| |
| and the third will be: |
| |
| [source,properties] |
| ---- |
| gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph |
| gremlin.neo4j.directory=/tmp/neo4j.server3 |
| gremlin.neo4j.conf.ha.server_id=3 |
| gremlin.neo4j.conf.ha.initial_hosts=localhost:5001\,localhost:5002\,localhost:5003 |
| gremlin.neo4j.conf.ha.host.coordination=localhost:5003 |
| gremlin.neo4j.conf.ha.host.data=localhost:6003 |
| ---- |
| |
| IMPORTANT: The backslashes in the values provided to `gremlin.neo4j.conf.ha.initial_hosts` prevent that configuration |
| setting from being interpreted as a `List`. |
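
The three files above differ only in the server id and the ports derived from it, so they can be generated rather than written by hand. The sketch below is one hypothetical way to do that in plain Java. `HaConfigSketch` and `render` are illustrative names, and the port scheme (`5000 + id` for coordination, `6000 + id` for data) is simply read off the examples above rather than being anything Neo4j mandates.

```java
// Hypothetical helper that renders the per-server properties shown above.
final class HaConfigSketch {
    static String render(int serverId, int clusterSize) {
        StringBuilder hosts = new StringBuilder();
        for (int i = 1; i <= clusterSize; i++) {
            if (i > 1) {
                hosts.append("\\,"); // escaped comma keeps the value from being read as a List
            }
            hosts.append("localhost:").append(5000 + i);
        }
        return String.join("\n",
            "gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph",
            "gremlin.neo4j.directory=/tmp/neo4j.server" + serverId,
            "gremlin.neo4j.conf.ha.server_id=" + serverId,
            "gremlin.neo4j.conf.ha.initial_hosts=" + hosts,
            "gremlin.neo4j.conf.ha.host.coordination=localhost:" + (5000 + serverId),
            "gremlin.neo4j.conf.ha.host.data=localhost:" + (6000 + serverId)) + "\n";
    }

    public static void main(String[] args) {
        // Print the contents of the three configuration files, one after another.
        for (int id = 1; id <= 3; id++) {
            System.out.println(render(id, 3));
        }
    }
}
```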
| |
| Create three separate Gremlin Server configuration files and point each at one of these Neo4j files. Since these Gremlin |
| Server instances will be running on the same machine, ensure that each Gremlin Server instance has a unique `port` |
| setting in that Gremlin Server configuration file. Start each Gremlin Server instance to bring the HA cluster online. |
| |
| NOTE: `Neo4jGraph` instances will block until all nodes join the cluster. |
| |
| Neither Gremlin Server nor Neo4j will share transactions across the cluster. Be sure to either use Gremlin Server |
| managed transactions or, if using a session without that option, ensure that all requests are being routed to the |
| same server. |
| |
| This example discussed use of Gremlin Server to demonstrate the HA configuration, but it is also easy to set up with |
| three Gremlin Console instances. Simply start three Gremlin Console instances and use `GraphFactory` to read those |
| configuration files to form the cluster. Furthermore, keep in mind that it is possible to have a Gremlin Console join |
| a cluster handled by two Gremlin Servers or Neo4j Enterprise. The only limits as to how the configuration can be |
| utilized are prescribed by Neo4j itself. Please refer to their |
| link:http://neo4j.com/docs/operations-manual/current/#ha-setup-tutorial[documentation] for more information on how |
| this feature works. |