| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| |
| :docinfo: shared |
| :docinfodir: ../../ |
| :toc-position: left |
| |
| image::apache-tinkerpop-logo.png[width=500,link="https://tinkerpop.apache.org"] |
| |
| *x.y.z* |
| |
| = Introduction |
| |
| image:tinkerpop-cityscape.png[] |
| |
| This document discusses Apache TinkerPop™ implementation details that are most useful to developers who implement |
| TinkerPop interfaces and the Gremlin language. This document may also be helpful to Gremlin users who simply want a |
| deeper understanding of how TinkerPop works and what the behavioral semantics of Gremlin are. The |
| <<providers,Provider Section>> outlines the various integration and extension points that TinkerPop has while the |
| <<gremlin-semantics,Gremlin Semantics Section>> documents the Gremlin language itself. |
| |
| Providers who rely on the TinkerPop execution engine generally receive the behaviors described in the Gremlin Semantics |
| section for free, but those who develop their own engine or extend certain features should refer to that |
| section for the details required for a consistent Gremlin experience. |
| |
| [[providers]] |
| = Provider Documentation |
| |
| TinkerPop exposes a set of interfaces, protocols, and tests that make it possible for third parties to build |
| libraries and systems that plug into the TinkerPop stack. TinkerPop refers to those third parties as "providers" and |
| this documentation is designed to help providers understand what is involved in developing code on these lower levels |
| of the TinkerPop API. |
| |
| This document attempts to address the needs of the different providers that have been identified: |
| |
| * Graph System Provider |
| ** Graph Database Provider |
| ** Graph Processor Provider |
| * Graph Driver Provider |
| * Graph Language Provider |
| * Graph Plugin Provider |
| |
| [[graph-system-provider-requirements]] |
| == Graph System Provider Requirements |
| |
| image:tinkerpop-enabled.png[width=140,float=left] At the core of TinkerPop 3.x is a Java API. The implementation of this |
| core API and its validation via the `gremlin-test` suite is all that is required of a graph system provider wishing to |
| provide a TinkerPop-enabled graph engine. Once a graph system has a valid implementation, then all the applications |
| provided by TinkerPop (e.g. Gremlin Console, Gremlin Server, etc.) and 3rd-party developers (e.g. Gremlin-Scala, |
| Gremlin-JS, etc.) will integrate properly. Finally, please feel free to use the logo on the left to promote your |
| TinkerPop implementation. |
| |
| [[graph-structure-api]] |
| === Graph Structure API |
| |
| The graph structure API of TinkerPop provides the interfaces necessary to create a TinkerPop enabled system and |
| exposes the basic components of a property graph to include `Graph`, `Vertex`, `Edge`, `VertexProperty` and `Property`. |
| The structure API can be used directly as follows: |
| |
| [source,java] |
| ---- |
| Graph graph = TinkerGraph.open(); <1> |
| Vertex marko = graph.addVertex(T.label, "person", T.id, 1, "name", "marko", "age", 29); <2> |
| Vertex vadas = graph.addVertex(T.label, "person", T.id, 2, "name", "vadas", "age", 27); |
| Vertex lop = graph.addVertex(T.label, "software", T.id, 3, "name", "lop", "lang", "java"); |
| Vertex josh = graph.addVertex(T.label, "person", T.id, 4, "name", "josh", "age", 32); |
| Vertex ripple = graph.addVertex(T.label, "software", T.id, 5, "name", "ripple", "lang", "java"); |
| Vertex peter = graph.addVertex(T.label, "person", T.id, 6, "name", "peter", "age", 35); |
| marko.addEdge("knows", vadas, T.id, 7, "weight", 0.5f); <3> |
| marko.addEdge("knows", josh, T.id, 8, "weight", 1.0f); |
| marko.addEdge("created", lop, T.id, 9, "weight", 0.4f); |
| josh.addEdge("created", ripple, T.id, 10, "weight", 1.0f); |
| josh.addEdge("created", lop, T.id, 11, "weight", 0.4f); |
| peter.addEdge("created", lop, T.id, 12, "weight", 0.2f); |
| ---- |
| |
| <1> Create a new in-memory `TinkerGraph` and assign it to the variable `graph`. |
| <2> Create a vertex along with a set of key/value pairs with `T.label` being the vertex label and `T.id` being the vertex id. |
| <3> Create an edge along with a set of key/value pairs with the edge label being specified as the first argument. |
| |
| In the above code all the vertices are created first and then their respective edges. There are two "accessor tokens": |
| `T.id` and `T.label`. When either of these, along with a set of other key/value pairs, is provided to |
| `Graph.addVertex(Object...)` or `Vertex.addEdge(String,Vertex,Object...)`, the respective element is created along |
| with the provided key/value pair properties appended to it. |
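
The key/value convention can be illustrated with a small, self-contained sketch. It is not TinkerPop code: `KeyValueSketch`, its `Token` enum (a stand-in for `T`), and `parse` are hypothetical names used only for illustration, and the default label of `vertex` is an assumption of the sketch. A real implementation would normally rely on the helper classes that TinkerPop already provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for how a provider might split the varargs of
// addVertex(Object...) into accessor tokens and ordinary properties.
class KeyValueSketch {
    // Stand-in for the T.id and T.label accessor tokens.
    enum Token { ID, LABEL }

    static final class Parsed {
        Object id;                   // null means "let the graph assign an id"
        String label = "vertex";     // assumed default label for this sketch
        final Map<String, Object> properties = new LinkedHashMap<>();
    }

    static Parsed parse(final Object... keyValues) {
        if (keyValues.length % 2 != 0)
            throw new IllegalArgumentException("key/value arguments must come in pairs");
        final Parsed parsed = new Parsed();
        for (int i = 0; i < keyValues.length; i += 2) {
            final Object key = keyValues[i];
            final Object value = keyValues[i + 1];
            if (key == Token.ID)
                parsed.id = value;
            else if (key == Token.LABEL)
                parsed.label = (String) value;
            else if (key instanceof String)
                parsed.properties.put((String) key, value);
            else
                throw new IllegalArgumentException("keys must be a String or an accessor token");
        }
        return parsed;
    }
}
```

Calling `KeyValueSketch.parse(Token.LABEL, "person", Token.ID, 1, "name", "marko")` separates the label and id from the single `name` property, which mirrors what the first vertex in the example above asks the graph to do.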
| |
| Below is a sequence of basic graph mutation operations represented in Java: |
| |
| image:basic-mutation.png[width=240,float=right] |
| [source,java] |
| ---- |
| // create a new graph |
| Graph graph = TinkerGraph.open(); |
| // add a software vertex with a name property |
| Vertex gremlin = graph.addVertex(T.label, "software", |
|     "name", "gremlin"); <1> |
| // only one vertex should exist |
| assert(IteratorUtils.count(graph.vertices()) == 1); |
| // no edges should exist as none have been created |
| assert(IteratorUtils.count(graph.edges()) == 0); |
| // add a new property |
| gremlin.property("created",2009); <2> |
| // add a new software vertex to the graph |
| Vertex blueprints = graph.addVertex(T.label, "software", |
|     "name", "blueprints"); <3> |
| // connect gremlin to blueprints via a dependsOn-edge |
| gremlin.addEdge("dependsOn",blueprints); <4> |
| // now there are two vertices and one edge |
| assert(IteratorUtils.count(graph.vertices()) == 2); |
| assert(IteratorUtils.count(graph.edges()) == 1); |
| // add a property to blueprints |
| blueprints.property("created",2010); <5> |
| // remove that property |
| blueprints.property("created").remove(); <6> |
| // connect gremlin to blueprints via encapsulates |
| gremlin.addEdge("encapsulates",blueprints); <7> |
| assert(IteratorUtils.count(graph.vertices()) == 2); |
| assert(IteratorUtils.count(graph.edges()) == 2); |
| // removing a vertex removes all its incident edges as well |
| blueprints.remove(); <8> |
| gremlin.remove(); <9> |
| // the graph is now empty |
| assert(IteratorUtils.count(graph.vertices()) == 0); |
| assert(IteratorUtils.count(graph.edges()) == 0); |
| // tada! |
| ---- |
| |
| The above code samples are just examples of how the structure API can be used to access a graph. Those APIs are then |
| used internally by the process API (i.e. Gremlin) to access any graph that implements those structure API interfaces |
| to execute queries. Typically, the structure API methods are not used directly by end-users. |
| |
| === Implementing Gremlin-Core |
| |
| The classes that a graph system provider should focus on implementing are itemized below. It is a good idea to study |
| the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#tinkergraph-gremlin[TinkerGraph] (in-memory OLTP and OLAP |
| in `tinkergraph-gremlin`), link:https://tinkerpop.apache.org/docs/x.y.z/reference/#neo4j-gremlin[Neo4jGraph] |
| (OLTP w/ transactions in `neo4j-gremlin`) and/or |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-gremlin[HadoopGraph] (OLAP in `hadoop-gremlin`) |
| implementations for ideas and patterns. |
| |
| . Online Transactional Processing Graph Systems (*OLTP*) |
| .. Structure API: `Graph`, `Element`, `Vertex`, `Edge`, `Property` and `Transaction` (if transactions are supported). |
| .. Process API: `TraversalStrategy` instances for optimizing Gremlin traversals to the provider's graph system (e.g. `TinkerGraphStepStrategy`). |
| . Online Analytical Processing Graph Systems (*OLAP*) |
| .. Everything required of OLTP is required of OLAP (but not vice versa). |
| .. GraphComputer API: `GraphComputer`, `Messenger`, `Memory`. |
| |
| Please consider the following implementation notes: |
| |
| * Use `StringHelper` to ensure that the `toString()` representations of classes are consistent with other |
| implementations. |
| * Ensure that your implementation's `Features` (Graph, Vertex, etc.) are correct so that test cases handle particulars |
| accordingly. |
| * Use the numerous static method helper classes such as `ElementHelper`, `GraphComputerHelper`, `VertexProgramHelper`, etc. |
| * There are a number of default methods on the provided interfaces that are semantically correct. However, if they are |
| not efficient for the implementation, override them. |
| * Implement the `structure/` package interfaces first and then, if desired, the interfaces in the `process/` |
| package. |
| * `ComputerGraph` is a `Wrapper` system that ensures proper semantics during a GraphComputer computation. |
| * The link:https://tinkerpop.apache.org/javadocs/x.y.z/core/[javadoc] is often a good resource in understanding |
| expectations from both the user's perspective as well as the graph provider's perspective. Also consider examining |
| the javadoc of TinkerGraph which is often well annotated and the interfaces and classes of the test suite itself. |
| |
| [[oltp-implementations]] |
| ==== OLTP Implementations |
| |
| image:pipes-character-1.png[width=110,float=right] The most important interfaces to implement are in the `structure/` |
| package. These include interfaces like `Graph`, `Vertex`, `Edge`, `Property`, `Transaction`, etc. The |
| `StructureStandardSuite` will ensure that the semantics of the methods implemented are correct. Moreover, there are |
| numerous `Exceptions` classes with static exceptions that should be thrown by the graph system so that all the |
| exceptions and their messages are consistent amongst all TinkerPop implementations. |
| |
| The following bullets provide some tips to consider when implementing the structure interfaces: |
| |
| * `Graph` |
| ** Be sure the `Graph` implementation is named as `XXXGraph` (e.g. TinkerGraph, Neo4jGraph, HadoopGraph, etc.). |
| ** This implementation needs to be `GraphFactory` compatible which means that the implementation should have a static |
| `Graph open(Configuration)` method where the `Configuration` is an Apache Commons class of that name. Alternatively, the |
| `Graph` implementation can have the `GraphFactoryClass` annotation which specifies a class with that static |
| `Graph open(Configuration)` method. |
| * `VertexProperty` |
| ** This interface is both a `Property` and an `Element` as `VertexProperty` is a first-class graph element in that it |
| can have its own properties (i.e. meta-properties). Even if the implementation does not intend to support |
| meta-properties, the `VertexProperty` needs to be implemented as an `Element`. `VertexProperty` should return an empty |
| iterable for properties if meta-properties are not supported. |
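
The reason for the static `open` method is easier to see with a reduced model of a reflective factory. The sketch below is not the real `GraphFactory`: `MiniGraphFactory` and `MyGraph` are hypothetical, a plain `Map` stands in for the Apache Commons `Configuration`, and the `gremlin.graph` key is assumed to be the property that names the graph class.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical graph exposing the static open(...) entry point. A real
// provider would accept an Apache Commons Configuration rather than a Map.
class MyGraph {
    final Map<String, Object> configuration;

    private MyGraph(final Map<String, Object> configuration) {
        this.configuration = configuration;
    }

    public static MyGraph open(final Map<String, Object> configuration) {
        return new MyGraph(configuration);
    }
}

// Hypothetical, much-simplified model of a reflective graph factory.
class MiniGraphFactory {
    static Object open(final Map<String, Object> configuration) {
        final Object className = configuration.get("gremlin.graph");
        if (!(className instanceof String))
            throw new IllegalArgumentException("configuration must name a graph class under gremlin.graph");
        try {
            // Locate the class by name and call its static open method.
            final Class<?> graphClass = Class.forName((String) className);
            final Method open = graphClass.getMethod("open", Map.class);
            return open.invoke(null, configuration);
        } catch (final ReflectiveOperationException e) {
            throw new IllegalStateException("could not open graph " + className, e);
        }
    }
}
```

Because the factory only knows the class name, it has no constructor signature to rely on. A static method with a fixed name and parameter type gives it a single, predictable entry point.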
| |
| [[olap-implementations]] |
| ==== OLAP Implementations |
| |
| image:furnace-character-1.png[width=110,float=right] Implementing the OLAP interfaces may be a bit more complicated. |
| Note that before OLAP interfaces are implemented, it is necessary for the OLTP interfaces to be, at a minimum, |
| implemented as specified in <<oltp-implementations,OLTP Implementations>>. A summary of each required interface |
| implementation is presented below: |
| |
| . `GraphComputer`: A fluent builder for specifying an isolation level, a VertexProgram, and any number of MapReduce jobs to be submitted. |
| . `Memory`: A global blackboard for ANDing, ORing, INCRing, and SETing values for specified keys. |
| . `Messenger`: The system that collects and distributes messages being propagated by vertices executing the VertexProgram application. |
| . `MapReduce.MapEmitter`: The system that collects key/value pairs being emitted by the MapReduce application's map-phase. |
| . `MapReduce.ReduceEmitter`: The system that collects key/value pairs being emitted by the MapReduce application's combine- and reduce-phases. |
| |
| NOTE: The VertexProgram and MapReduce interfaces in the `process/computer/` package are not required by the graph |
| system. Instead, these are interfaces to be implemented by application developers writing VertexPrograms and MapReduce jobs. |
| |
| IMPORTANT: TinkerPop provides two OLAP implementations: |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#tinkergraph-gremlin[TinkerGraphComputer] (TinkerGraph), |
| and |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#sparkgraphcomputer[SparkGraphComputer] (Hadoop). |
| Given the complexity of the OLAP system, it is good to study and copy many of the patterns used in these reference |
| implementations. |
| |
| ===== Implementing GraphComputer |
| |
| image:furnace-character-3.png[width=150,float=right] The most complex method in GraphComputer is the `submit()`-method. The method must do the following: |
| |
| . Ensure the GraphComputer has not already been executed. |
| . Ensure that there is at least a VertexProgram or one MapReduce job. |
| . If there is a VertexProgram, validate that it can execute on the GraphComputer given the respectively defined features. |
| . Create the Memory to be used for the computation. |
| . Execute the VertexProgram.setup() method once and only once. |
| . Execute the VertexProgram.execute() method for each vertex. |
| . Execute the VertexProgram.terminate() method once per round and, if it returns false, repeat VertexProgram.execute(). |
| . When VertexProgram.terminate() returns true, move to MapReduce job execution. |
| . MapReduce jobs are not required to be executed in any specified order. |
| . For each Vertex, execute MapReduce.map(). Then (if defined) execute MapReduce.combine() and MapReduce.reduce(). |
| . Update Memory with runtime information. |
| . Construct a new `ComputerResult` containing the compute Graph and Memory. |
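
The control flow of the steps above can be sketched in plain Java. This is a simplified model under stated assumptions, not the TinkerPop API: `MiniVertexProgram` and `MiniGraphComputer` are hypothetical, memory is a plain `Map`, feature validation is omitted, and the MapReduce stage is reduced to a comment.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, pared-down stand-in for a vertex program.
interface MiniVertexProgram<V> {
    void setup(Map<String, Object> memory);
    void execute(V vertex, Map<String, Object> memory);
    boolean terminate(Map<String, Object> memory);   // true means "stop iterating"
}

// Hypothetical, pared-down stand-in for a graph computer.
class MiniGraphComputer<V> {
    private final List<V> vertices;
    private MiniVertexProgram<V> program;
    private boolean executed = false;

    MiniGraphComputer(final List<V> vertices) {
        this.vertices = vertices;
    }

    MiniGraphComputer<V> program(final MiniVertexProgram<V> program) {
        this.program = program;
        return this;
    }

    Map<String, Object> submit() {
        if (this.executed)
            throw new IllegalStateException("the computer has already been executed");
        if (this.program == null)
            throw new IllegalStateException("no vertex program was provided");
        this.executed = true;

        final Map<String, Object> memory = new HashMap<>();
        final long start = System.currentTimeMillis();
        this.program.setup(memory);                      // once and only once
        int iteration = 0;
        do {
            for (final V vertex : this.vertices)
                this.program.execute(vertex, memory);    // once per vertex per round
            iteration++;
        } while (!this.program.terminate(memory));       // checked once per round
        // MapReduce jobs would be executed here, in any order.
        memory.put("iteration", iteration);              // runtime information
        memory.put("runtime", System.currentTimeMillis() - start);
        return memory;
    }
}
```

With two vertices and a program whose `terminate` returns true on its third call, `setup` runs once, `execute` runs six times, and the returned memory records three iterations. The sketch lets vertices see each other's memory updates immediately, which a real implementation must not do.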
| |
| ===== Implementing Memory |
| |
| image:gremlin-brain.png[width=175,float=left] The Memory object is initially defined by `VertexProgram.setup()`. |
| The memory data is available in the first round of the `VertexProgram.execute()` method. Each Vertex, when executing |
| the VertexProgram, can update the Memory in its round. However, the update is not seen by the other vertices until |
| the next round. At the end of the first round, all the updates are aggregated and the new memory data is available |
| on the second round. This process repeats until the VertexProgram terminates. |
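
One way to get this round-based visibility is to buffer writes and fold them into the readable state between rounds. The sketch below assumes values are `long` counters aggregated by summation. `RoundMemory` is a hypothetical class, not the TinkerPop `Memory` interface, and it ignores the thread-safety that a parallel implementation would need.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a round-based memory restricted to long counters.
class RoundMemory {
    private final Map<String, Long> current = new HashMap<>();   // readable during this round
    private final Map<String, Long> pending = new HashMap<>();   // updates written during this round

    // Used by setup-style code to seed values visible in the first round.
    void set(final String key, final long value) {
        this.current.put(key, value);
    }

    // Reads always see the state as of the end of the previous round.
    long get(final String key) {
        return this.current.getOrDefault(key, 0L);
    }

    // Writes are buffered and aggregated by summation.
    void add(final String key, final long delta) {
        this.pending.merge(key, delta, Long::sum);
    }

    // Called once between rounds to fold buffered updates into the readable state.
    void completeRound() {
        this.pending.forEach((key, delta) -> this.current.merge(key, delta, Long::sum));
        this.pending.clear();
    }
}
```

A vertex that calls `add` during a round still reads the old value from `get` until `completeRound` has run, which is the behavior the paragraph above describes.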
| |
| ===== Implementing Messenger |
| |
| The Messenger object is similar to the Memory object in that a vertex can read and write to the Messenger. However, |
| the data it reads are the messages sent to the vertex in the previous round and the data it writes are the messages |
| that will be readable by the receiving vertices in the subsequent round. |
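
The same buffering idea applies to messages. The following single-threaded sketch keeps one inbox for reading and one outbox for writing and swaps them between rounds. `RoundMessageBoard` is a hypothetical name, and addressing vertices by an `Object` id is an assumption made to keep the example small.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: vertices are addressed by id and messages are of type M.
class RoundMessageBoard<M> {
    private Map<Object, List<M>> inbox = new HashMap<>();    // sent last round, readable now
    private Map<Object, List<M>> outbox = new HashMap<>();   // sent this round, readable next round

    // Messages that were sent to this vertex in the previous round.
    List<M> receive(final Object vertexId) {
        return this.inbox.getOrDefault(vertexId, Collections.emptyList());
    }

    // Queue a message that the target vertex can read in the next round.
    void send(final Object targetVertexId, final M message) {
        this.outbox.computeIfAbsent(targetVertexId, id -> new ArrayList<>()).add(message);
    }

    // Called once between rounds: deliver what was sent and start an empty outbox.
    void completeRound() {
        this.inbox = this.outbox;
        this.outbox = new HashMap<>();
    }
}
```

Messages live for exactly one round in this model: a message sent in round `n` is readable in round `n + 1` and gone in round `n + 2`.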
| |
| ===== Implementing MapReduce Emitters |
| |
| image:hadoop-logo-notext.png[width=150,float=left] The MapReduce framework in TinkerPop is similar to the model |
| popularized by link:http://hadoop.apache.org[Hadoop]. The primary difference is that all Mappers process the vertices |
| of the graph, not an arbitrary key/value pair. However, the vertices' edges cannot be accessed -- only their |
| properties. This greatly reduces the amount of data needed to be pushed through the MapReduce engine as any edge |
| information required can be computed in the VertexProgram.execute() method. Moreover, at this stage, vertices cannot |
| be mutated, only their token and property data read. A Gremlin OLAP system needs to provide implementations for |
| two particular classes: `MapReduce.MapEmitter` and `MapReduce.ReduceEmitter`. TinkerGraph's implementation is provided |
| below which demonstrates the simplicity of the algorithm (especially when the data is all within the same JVM). |
| |
| [source,java] |
| ---- |
| public class TinkerMapEmitter<K, V> implements MapReduce.MapEmitter<K, V> { |
| |
|     public Map<K, Queue<V>> reduceMap; |
|     public Queue<KeyValue<K, V>> mapQueue; |
|     private final boolean doReduce; |
| |
|     public TinkerMapEmitter(final boolean doReduce) { <1> |
|         this.doReduce = doReduce; |
|         if (this.doReduce) |
|             this.reduceMap = new ConcurrentHashMap<>(); |
|         else |
|             this.mapQueue = new ConcurrentLinkedQueue<>(); |
|     } |
| |
|     @Override |
|     public void emit(K key, V value) { |
|         if (this.doReduce) |
|             this.reduceMap.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).add(value); <2> |
|         else |
|             this.mapQueue.add(new KeyValue<>(key, value)); <3> |
|     } |
| |
|     protected void complete(final MapReduce<K, V, ?, ?, ?> mapReduce) { |
|         if (!this.doReduce && mapReduce.getMapKeySort().isPresent()) { <4> |
|             final Comparator<K> comparator = mapReduce.getMapKeySort().get(); |
|             final List<KeyValue<K, V>> list = new ArrayList<>(this.mapQueue); |
|             Collections.sort(list, Comparator.comparing(KeyValue::getKey, comparator)); |
|             this.mapQueue.clear(); |
|             this.mapQueue.addAll(list); |
|         } else if (mapReduce.getMapKeySort().isPresent()) { |
|             final Comparator<K> comparator = mapReduce.getMapKeySort().get(); |
|             final List<Map.Entry<K, Queue<V>>> list = new ArrayList<>(); |
|             list.addAll(this.reduceMap.entrySet()); |
|             Collections.sort(list, Comparator.comparing(Map.Entry::getKey, comparator)); |
|             this.reduceMap = new LinkedHashMap<>(); |
|             list.forEach(entry -> this.reduceMap.put(entry.getKey(), entry.getValue())); |
|         } |
|     } |
| } |
| ---- |
| |
| <1> If the MapReduce job has a reduce, then use one data structure (`reduceMap`), else use another (`mapQueue`). The |
| difference being that a reduction requires a grouping by key and therefore, the `Map<K,Queue<V>>` definition. If no |
| reduction/grouping is required, then a simple `Queue<KeyValue<K,V>>` can be leveraged. |
| <2> If reduce is to follow, then add the value to the queue for its key; `computeIfAbsent` creates that queue the |
| first time the key is seen. |
| <3> If no reduce is to follow, then simply append a KeyValue to the queue. |
| <4> When the map phase is complete, any map-result sorting required can be executed at this point. |
| |
| [source,java] |
| ---- |
| public class TinkerReduceEmitter<OK, OV> implements MapReduce.ReduceEmitter<OK, OV> { |
| |
|     protected Queue<KeyValue<OK, OV>> reduceQueue = new ConcurrentLinkedQueue<>(); |
| |
|     @Override |
|     public void emit(final OK key, final OV value) { |
|         this.reduceQueue.add(new KeyValue<>(key, value)); |
|     } |
| |
|     protected void complete(final MapReduce<?, ?, OK, OV, ?> mapReduce) { |
|         if (mapReduce.getReduceKeySort().isPresent()) { |
|             final Comparator<OK> comparator = mapReduce.getReduceKeySort().get(); |
|             final List<KeyValue<OK, OV>> list = new ArrayList<>(this.reduceQueue); |
|             Collections.sort(list, Comparator.comparing(KeyValue::getKey, comparator)); |
|             this.reduceQueue.clear(); |
|             this.reduceQueue.addAll(list); |
|         } |
|     } |
| } |
| ---- |
| |
| The method `MapReduce.reduce()` is defined as: |
| |
| [source,java] |
| public void reduce(final OK key, final Iterator<OV> values, final ReduceEmitter<OK, OV> emitter) { ... } |
| |
| In other words, for the TinkerGraph implementation, iterate through the entrySet of the `reduceMap` and call the |
| `reduce()` method on each entry. The `reduce()` method can emit key/value pairs which are simply aggregated into a |
| `Queue<KeyValue<OK,OV>>` in an analogous fashion to `TinkerMapEmitter` when no reduce is to follow. These two emitters |
| are tied together in `TinkerGraphComputer.submit()`. |
| |
| [source,java] |
| ---- |
| ... |
| for (final MapReduce mapReduce : mapReducers) { |
|     if (mapReduce.doStage(MapReduce.Stage.MAP)) { |
|         final TinkerMapEmitter<?, ?> mapEmitter = new TinkerMapEmitter<>(mapReduce.doStage(MapReduce.Stage.REDUCE)); |
|         final SynchronizedIterator<Vertex> vertices = new SynchronizedIterator<>(this.graph.vertices()); |
|         workers.setMapReduce(mapReduce); |
|         workers.mapReduceWorkerStart(MapReduce.Stage.MAP); |
|         workers.executeMapReduce(workerMapReduce -> { |
|             while (true) { |
|                 final Vertex vertex = vertices.next(); |
|                 if (null == vertex) return; |
|                 workerMapReduce.map(ComputerGraph.mapReduce(vertex), mapEmitter); |
|             } |
|         }); |
|         workers.mapReduceWorkerEnd(MapReduce.Stage.MAP); |
| |
|         // sort results if a map output sort is defined |
|         mapEmitter.complete(mapReduce); |
| |
|         // no need to run combiners as this is single machine |
|         if (mapReduce.doStage(MapReduce.Stage.REDUCE)) { |
|             final TinkerReduceEmitter<?, ?> reduceEmitter = new TinkerReduceEmitter<>(); |
|             final SynchronizedIterator<Map.Entry<?, Queue<?>>> keyValues = new SynchronizedIterator((Iterator) mapEmitter.reduceMap.entrySet().iterator()); |
|             workers.mapReduceWorkerStart(MapReduce.Stage.REDUCE); |
|             workers.executeMapReduce(workerMapReduce -> { |
|                 while (true) { |
|                     final Map.Entry<?, Queue<?>> entry = keyValues.next(); |
|                     if (null == entry) return; |
|                     workerMapReduce.reduce(entry.getKey(), entry.getValue().iterator(), reduceEmitter); |
|                 } |
|             }); |
|             workers.mapReduceWorkerEnd(MapReduce.Stage.REDUCE); |
|             reduceEmitter.complete(mapReduce); // sort results if a reduce output sort is defined |
|             mapReduce.addResultToMemory(this.memory, reduceEmitter.reduceQueue.iterator()); <1> |
|         } else { |
|             mapReduce.addResultToMemory(this.memory, mapEmitter.mapQueue.iterator()); <2> |
|         } |
|     } |
| } |
| ... |
| ---- |
| |
| <1> Note that the final results of the reducer are provided to the Memory as specified by the application developer's |
| `MapReduce.addResultToMemory()` implementation. |
| <2> If there is no reduce stage, the map-stage results are inserted into Memory as specified by the application |
| developer's `MapReduce.addResultToMemory()` implementation. |
| |
| ==== Hadoop-Gremlin Usage |
| |
| Hadoop-Gremlin is centered around `InputFormats` and `OutputFormats`. If a 3rd-party graph system provider wishes to |
| leverage Hadoop-Gremlin (and its respective `GraphComputer` engines), then they need to provide, at minimum, a |
| Hadoop2 `InputFormat<NullWritable,VertexWritable>` for their graph system. If the provider wishes to persist computed |
| results back to their graph system (and not just to HDFS via a `FileOutputFormat`), then a graph system specific |
| `OutputFormat<NullWritable,VertexWritable>` must be developed as well. |
| |
| Conceptually, `HadoopGraph` is a wrapper around a `Configuration` object. There is no "data" in the `HadoopGraph` as |
| the `InputFormat` specifies where and how to get the graph data at OLAP (and OLTP) runtime. Thus, `HadoopGraph` is a |
| small object with little overhead. Graph system providers should realize `HadoopGraph` as the gateway to the OLAP |
| features offered by Hadoop-Gremlin. For example, a graph system specific `Graph.compute(Class<? extends GraphComputer> |
| graphComputerClass)`-method may look as follows: |
| |
| [source,java] |
| ---- |
| public <C extends GraphComputer> C compute(final Class<C> graphComputerClass) throws IllegalArgumentException { |
|     try { |
|         if (AbstractHadoopGraphComputer.class.isAssignableFrom(graphComputerClass)) |
|             return graphComputerClass.getConstructor(HadoopGraph.class).newInstance(this); |
|         else |
|             throw Graph.Exceptions.graphDoesNotSupportProvidedGraphComputer(graphComputerClass); |
|     } catch (final Exception e) { |
|         throw new IllegalArgumentException(e.getMessage(),e); |
|     } |
| } |
| ---- |
| |
| Note that the configurations for Hadoop are assumed to be in the `Graph.configuration()` object. If this is not the |
| case, then the `Configuration` provided to `HadoopGraph.open()` should be dynamically created within the |
| `compute()`-method. It is in the provided configuration that `HadoopGraph` gets the various properties which |
| determine how to read and write data to and from Hadoop. For instance, `gremlin.hadoop.graphReader` and |
| `gremlin.hadoop.graphWriter`. |
| |
| ===== GraphFilterAware Interface |
| |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graph-filter[Graph filters] are used by OLAP processors to only pull a subgraph of the full graph from the graph data source. For instance, the |
| example below constructs a `GraphFilter` that will only pull the "knows"-graph amongst people into the `GraphComputer` |
| for processing. |
| |
| [source,java] |
| ---- |
| graph.compute().vertices(hasLabel("person")).edges(bothE("knows")) |
| ---- |
| |
| If the provider has a custom `InputRDD`, they can implement `GraphFilterAware` and that graph filter will be provided to their |
| `InputRDD` at load time. For providers that use an `InputFormat`, the graph filter can be accessed from the configuration |
| as such: |
| |
| [source,java] |
| ---- |
| if (configuration.containsKey(Constants.GREMLIN_HADOOP_GRAPH_FILTER)) |
|     this.graphFilter = VertexProgramHelper.deserialize(configuration, Constants.GREMLIN_HADOOP_GRAPH_FILTER); |
| ---- |
| |
| ===== PersistResultGraphAware Interface |
| |
| A graph system provider's `OutputFormat` should implement the `PersistResultGraphAware` interface which |
| determines which persistence options are available to the user. For the standard file-based `OutputFormats` provided |
| by Hadoop-Gremlin (e.g. link:++https://tinkerpop.apache.org/docs/x.y.z/reference/#gryo-io-format++[`GryoOutputFormat`], link:++https://tinkerpop.apache.org/docs/x.y.z/reference/#graphson-io-format++[`GraphSONOutputFormat`], |
| and link:++https://tinkerpop.apache.org/docs/x.y.z/reference/#script-io-format++[`ScriptInputOutputFormat`]) `ResultGraph.ORIGINAL` is not supported as the original graph |
| data files are not random access and are, in essence, immutable. Thus, these file-based `OutputFormats` only support |
| `ResultGraph.NEW` which creates a copy of the data specified by the `Persist` enum. |
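
The rule for file-based formats can be sketched as a single predicate. Everything below is a local stand-in so that the example compiles on its own: the `Persist` constants, the interface name `PersistAwareSketch`, and the method name `supportsResultGraphPersistCombination` are assumptions made for illustration and should be checked against the javadoc before being relied upon.

```java
// Local stand-ins so the sketch compiles without TinkerPop on the classpath.
enum ResultGraph { ORIGINAL, NEW }
enum Persist { NOTHING, VERTEX_PROPERTIES, EDGES }

// Hypothetical model of the decision an OutputFormat has to expose.
interface PersistAwareSketch {
    boolean supportsResultGraphPersistCombination(ResultGraph resultGraph, Persist persist);
}

// A file-based output format cannot rewrite its input files in place,
// so it accepts every Persist option but only for a NEW result graph.
class FileBasedOutputSketch implements PersistAwareSketch {
    @Override
    public boolean supportsResultGraphPersistCombination(final ResultGraph resultGraph, final Persist persist) {
        return resultGraph == ResultGraph.NEW;
    }
}
```

A provider whose storage supports in-place updates would return true for `ORIGINAL` as well, possibly limited to particular `Persist` values.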
| |
| [[io-implementations]] |
| ==== IO Implementations |
| |
| If a `Graph` requires custom serializers for IO to work properly, implement the `Graph.io` method. A typical example |
| of where a `Graph` would require such custom serializers is if its identifier system uses non-primitive values, |
| such as OrientDB's `Rid` class. From basic serialization of a single `Vertex` all the way up the stack to Gremlin |
| Server, the need to know how to handle these complex identifiers is an important requirement. |
| |
| The first step to implementing custom serializers is to implement the `IoRegistry` interface and register the |
| custom classes and serializers to it. Each `Io` implementation has different requirements for what it expects from the |
| `IoRegistry`: |
| |
| * *GraphML* - No custom serializers expected/allowed. |
| * *GraphSON* - Register a Jackson `SimpleModule`. The `SimpleModule` encapsulates specific classes to be serialized, |
| so it does not need to be registered to a specific class in the `IoRegistry` (use `null`). |
| * *Gryo* - Expects registration of one of three objects: |
| ** Register just the custom class with a `null` Kryo `Serializer` implementation - this class will use default "field-level" Kryo serialization. |
| ** Register the custom class with a specific Kryo `Serializer` implementation. |
| ** Register the custom class with a `Function<Kryo, Serializer>` for those cases where the Kryo `Serializer` requires the `Kryo` instance to get constructed. |
| |
| This implementation should provide a zero-arg constructor as the stack may require instantiation via reflection. |
| Consider extending `AbstractIoRegistry` for convenience as follows: |
| |
| [source,java] |
| ---- |
| public class MyGraphIoRegistry extends AbstractIoRegistry { |
|     public MyGraphIoRegistry() { |
|         register(GraphSONIo.class, null, new MyGraphSimpleModule()); |
|         register(GryoIo.class, MyGraphIdClass.class, new MyGraphIdSerializer()); |
|     } |
| } |
| ---- |
| |
| In the `Graph.io` method, provide the `IoRegistry` object to the supplied `Builder` and call the `create` method to |
| return that `Io` instance as follows: |
| |
| [source,java] |
| ---- |
| public <I extends Io> I io(final Io.Builder<I> builder) { |
|     return (I) builder.graph(this).registry(myGraphIoRegistry).create(); |
| } |
| ---- |
| |
| In this way, `Graph` implementations can pre-configure custom serializers for IO interactions and users will not need |
| to know about those details. Following this pattern will ensure proper execution of the test suite as well as |
| simplified usage for end-users. |
| |
| IMPORTANT: Proper implementation of IO is critical to successful `Graph` operations in Gremlin Server. The Test Suite |
| does have "serialization" tests that provide some assurance that an implementation is working properly, but those |
| tests cannot make assertions against any specifics of a custom serializer. It is the responsibility of the |
| implementer to test the specifics of their custom serializers. |
| |
| TIP: Consider separating serializer code into its own module, if possible, so that clients that use the `Graph` |
| implementation remotely don't need a full dependency on the entire `Graph` - just the IO components and related |
| classes being serialized. |
| |
| There is an important implication to consider with the addition of a custom serializer. Presumably, the custom |
| serializer was written for the JVM to be deployed with a `Graph` instance. For example, a graph may expose a |
| geographical type like a `Point` or something similar. Users who expect to deserialize back to a `Point` would |
| need to have the library that contains `Point` and the "`PointSerializer`" class available |
| to them. In cases where that deployment approach is not desirable, it is possible to coerce a class like `Point` to |
| a type that is already in the list of types supported in TinkerPop. For example, `Point` could be coerced one-way to |
| `Map` of keys "x" and "y". Of course, on the client side, users would have to construct a `Map` for a `Point` which |
| isn't quite as user-friendly. |
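
A minimal sketch of that one-way coercion follows. The `Point` class and the `PointCoercion` helper are hypothetical and exist only to show the trade-off: the server can turn a `Point` into a `Map`, but a client that never sees the `Point` class has to build the `Map` by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical provider type; not part of TinkerPop.
final class Point {
    final double x;
    final double y;

    Point(final double x, final double y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical helper showing both sides of the coercion.
class PointCoercion {
    // Server side: send only types that every client already understands.
    static Map<String, Object> toMap(final Point point) {
        final Map<String, Object> map = new LinkedHashMap<>();
        map.put("x", point.x);
        map.put("y", point.y);
        return map;
    }

    // Client side: without the Point class, a "point" has to be assembled manually.
    static Map<String, Object> clientSidePoint(final double x, final double y) {
        final Map<String, Object> map = new LinkedHashMap<>();
        map.put("x", x);
        map.put("y", y);
        return map;
    }
}
```

The coercion avoids shipping a custom serializer to clients, at the cost of losing the type name and any validation the `Point` class would have enforced.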
| |
| If doing a type coercion is not desired, then it is important to remember that writing a `Point` class and related |
| serializer in Java is not sufficient for full support of Gremlin, as users of non-JVM Gremlin Language Variants (GLV) |
| will not be able to consume them. Getting full support would mean writing similar classes for each GLV. While |
| developing those classes is not hard, it also means more code to support. |
| |
| ===== Supporting Gremlin-Python IO |
| |
| The serialization system of Gremlin-Python provides ways to add new types by creating serializers and deserializers in |
| Python and registering them with the `RemoteConnection`. |
| |
| [source,python] |
| ---- |
| class MyType(object): |
|     GRAPHSON_PREFIX = "providerx" |
|     GRAPHSON_BASE_TYPE = "MyType" |
|     GRAPHSON_TYPE = GraphSONUtil.formatType(GRAPHSON_PREFIX, GRAPHSON_BASE_TYPE) |
| |
|     def __init__(self, x, y): |
|         self.x = x |
|         self.y = y |
| |
|     @classmethod |
|     def objectify(cls, value, reader): |
|         return cls(value['x'], value['y']) |
| |
|     @classmethod |
|     def dictify(cls, value, writer): |
|         return GraphSONUtil.typedValue(cls.GRAPHSON_BASE_TYPE, |
|                                        {'x': value.x, 'y': value.y}, |
|                                        cls.GRAPHSON_PREFIX) |
| |
| graphson_reader = GraphSONReader({MyType.GRAPHSON_TYPE: MyType}) |
| graphson_writer = GraphSONWriter({MyType: MyType}) |
| |
| connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g', |
|                                     graphson_reader=graphson_reader, |
|                                     graphson_writer=graphson_writer) |
| ---- |
| |
| ===== Supporting Gremlin.Net IO |
| |
| The serialization system of Gremlin.Net provides ways to add new types by creating serializers and deserializers in |
| any .NET language and registering them with the `GremlinClient`. |
| |
| [source,csharp] |
| ---- |
| include::../../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Dev/Provider/IndexTests.cs[tags=myTypeSerialization] |
| |
| include::../../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Dev/Provider/IndexTests.cs[tags=supportingGremlinNetIO] |
| ---- |
| |
| [[remoteconnection-implementations]] |
| ==== RemoteConnection Implementations |
| |
| A `RemoteConnection` is an interface that is important for usage on traversal sources configured using the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#connecting-via-drivers[with()] option. A `Traversal` |
| that is generated from that source will apply a `RemoteStrategy` which will inject a `RemoteStep` to its end. That |
| step will then send the `GremlinLang` of the `Traversal` over the `RemoteConnection` to get the results that it will |
| iterate. |
| |
| There is one method to implement on `RemoteConnection`: |
| |
| [source,java] |
| public <E> CompletableFuture<RemoteTraversal<?, E>> submitAsync(final GremlinLang gremlinLang) throws RemoteConnectionException; |
| |
| Note that it returns a `RemoteTraversal`. This interface should also be implemented and in most cases implementers can |
| simply extend the `AbstractRemoteTraversal`. |
| |
| TinkerPop provides the `DriverRemoteConnection` as a useful and |
| link:https://github.com/apache/tinkerpop/blob/x.y.z/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/remote[example implementation]. |
| `DriverRemoteConnection` serializes the `Traversal` as GremlinLang and then submits it for remote processing on |
| Gremlin Server. Gremlin Server parses the script into a `Traversal` against a configured `Graph` instance and then |
| iterates the results back as it would normally do. |
| |
| Implementing `RemoteConnection` is not something routinely done for those implementing `gremlin-core`. It is only |
| something required if there is a need to exploit remote traversal submission. If a graph provider has a "graph server" |
| similar to Gremlin Server that can accept traversal-based requests on its own protocol, then that would be one example |
| of a reason to implement this interface. |
| |
| [[bulk-import-export]] |
| ==== Bulk Import Export |
| |
| When it comes to doing "bulk" operations, the diverse nature of the available graph databases and their specific |
| capabilities prevents TinkerPop from generalizing that capability well. TinkerPop thus maintains |
| two positions on the concept of import and export: |
| |
| 1. TinkerPop refers users to the bulk import/export facilities of specific graph providers as they tend to be more |
| efficient and easier to use than the options TinkerPop has tried to generalize in the past. |
| 2. TinkerPop encourages graph providers to expose those capabilities via `g.io()` and the `IoStep` by way of a |
| `TraversalStrategy`. |
| |
| That said, for graph providers that don't have a special bulk loading feature, they can either rely on the default |
| OLTP (single-threaded) `GraphReader` and `GraphWriter` options that are embedded in `IoStep` or get a basic bulk loader |
| from TinkerPop using the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#clonevertexprogram[CloneVertexProgram]. |
| Simply provide an `InputFormat` and `OutputFormat` that can be referenced by a `HadoopGraph` instance as discussed |
| in the link:https://tinkerpop.apache.org/docs/x.y.z/reference/#clonevertexprogram[Reference Documentation]. |
| |
| [[validating-with-gremlin-test]] |
| === Validating with Gremlin-Test |
| |
| image:gremlin-edumacated.png[width=225] |
| |
| [source,xml] |
| ---- |
| <!-- gremlin-test contains test classes and gherkin tests --> |
| <dependency> |
| <groupId>org.apache.tinkerpop</groupId> |
| <artifactId>gremlin-test</artifactId> |
| <version>x.y.z</version> |
| </dependency> |
| <!-- TinkerGraph is required for validating subgraph() tests --> |
| <dependency> |
| <groupId>org.apache.tinkerpop</groupId> |
| <artifactId>tinkergraph-gremlin</artifactId> |
| <version>x.y.z</version> |
| </dependency> |
| ---- |
| |
| TinkerPop provides `gremlin-test`, a technology test kit, that helps validate provider implementations. There is a |
| Gherkin-based test suite and a JVM-based test suite. The Gherkin suite is meant to test Gremlin operations and semantics |
| while the JVM-based suite is meant for lower-level `Graph` implementation validation and for Gremlin use cases that |
| apply to embedded operations (i.e. running Gremlin in the same JVM as the `Graph` implementation). |
| |
| ==== JVM Test Suite |
| |
| IMPORTANT: 4.0.0-beta.1 Release - The final form of the TinkerPop test suite for 4.0 is not wholly settled, but going |
| forward providers should focus on implementing the <<gherkin-tests-suite>> as opposed to the JVM suite. |
| |
| The JVM test suite is useful to graph system implementers who want to validate that their `Graph` implementation is |
| working properly and that Gremlin features specific to embedded use cases are behaving as expected. These tests do not |
| validate that all the semantics of the Gremlin query language are properly operating. Providers should implement the |
| <<gherkin-tests-suite>> for that type of validation. |
| |
| If you are a remote graph database only or don't implement the `Graph` API but simply expose access to the Gremlin |
| language then it likely makes sense to skip implementing this suite and only implement the Gherkin suite. On the other |
| hand, if you do implement the `Graph` API or `GraphComputer` API, it likely makes sense to implement both these tests |
| and the Gherkin suite. |
| |
| To implement the JVM tests, provide test case implementations as shown below, where `XXX` denotes the name of the |
| graph implementation (e.g. TinkerGraph, Neo4jGraph, HadoopGraph, etc.). |
| |
| [source,java] |
| ---- |
| // Structure API tests which validate the core functionality of the Graph implementation. |
| // Start with this one because, generally speaking, if this suite passes then the others |
| // will mostly follow suit. Moreover, the structure tests operate on lower-level aspects |
| // of the Graph interfaces which are much easier to debug than Gremlin level tests. |
| @RunWith(StructureStandardSuite.class) |
| @GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class) |
| public class XXXStructureStandardTest {} |
| |
| // Process API tests cover embedded Gremlin use cases |
| @RunWith(ProcessEmbeddedStandardSuite.class) |
| @GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class) |
| public class XXXProcessStandardTest {} |
| |
| @RunWith(ProcessEmbeddedComputerSuite.class) |
| @GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class) |
| public class XXXProcessComputerTest {} |
| ---- |
| |
| IMPORTANT: It is as important to look at "ignored" tests as it is to look at ones that fail. The `gremlin-test` |
| suite utilizes the `Feature` implementation exposed by the `Graph` to determine which tests to execute. If a test |
| utilizes features that are not supported by the graph, the suite will ignore that test. While that may be fine, implementers |
| should validate that the ignored tests are appropriately bypassed and that there are no mistakes in their feature |
| definitions. Moreover, implementers should consider filling gaps in their own test suites, especially when |
| IO-related tests are being ignored. |
| |
| TIP: If it is expensive to construct a new `Graph` instance, consider implementing `GraphProvider.getStaticFeatures()` |
| which can help by caching a static feature set for instances produced by that `GraphProvider` and allow the test suite |
| to avoid that construction cost if the test is ignored. |
| |
| The only test-class that requires any code investment is the `GraphProvider` implementation class. This class is |
| used by the test suite to construct `Graph` configurations and instances and provides information about the |
| implementation itself. In most cases, it is best to simply extend `AbstractGraphProvider` as it provides many |
| default implementations of the `GraphProvider` interface. |
| |
| Finally, specify the test suites that will be supported by the `Graph` implementation using the `@Graph.OptIn` |
| annotation. See the `TinkerGraph` implementation below as an example: |
| |
| [source,java] |
| ---- |
| @Graph.OptIn(Graph.OptIn.SUITE_STRUCTURE_STANDARD) |
| @Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD) |
| @Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER) |
| public class TinkerGraph implements Graph { |
| ---- |
| |
| Only include annotations for the suites the implementation will support. Note that implementing the suite, but |
| not specifying the appropriate annotation will prevent the suite from running (an obvious error message will appear |
| in this case when running the mis-configured suite). |
| |
| There are times when there may be a specific test in the suite that the implementation cannot support (despite the |
| features it implements) or should not otherwise be executed. It is possible for implementers to "opt-out" of a test |
| by using the `@Graph.OptOut` annotation. This annotation can be applied to either a `Graph` instance or a |
| `GraphProvider` instance (the latter would typically be used for "opting out" for a particular `Graph` configuration |
| that was under test). The following is an example of this annotation usage as taken from `HadoopGraph`: |
| |
| [source,java] |
| ---- |
| @Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD) |
| @Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER) |
| @Graph.OptOut( |
| test = "org.apache.tinkerpop.gremlin.process.graph.step.map.MatchTest$Traversals", |
| method = "g_V_matchXa_hasXname_GarciaX__a_inXwrittenByX_b__a_inXsungByX_bX", |
| reason = "Hadoop-Gremlin is OLAP-oriented and for OLTP operations, linear-scan joins are required. This particular tests takes many minutes to execute.") |
| @Graph.OptOut( |
| test = "org.apache.tinkerpop.gremlin.process.graph.step.map.MatchTest$Traversals", |
| method = "g_V_matchXa_inXsungByX_b__a_inXsungByX_c__b_outXwrittenByX_d__c_outXwrittenByX_e__d_hasXname_George_HarisonX__e_hasXname_Bob_MarleyXX", |
| reason = "Hadoop-Gremlin is OLAP-oriented and for OLTP operations, linear-scan joins are required. This particular tests takes many minutes to execute.") |
| @Graph.OptOut( |
| test = "org.apache.tinkerpop.gremlin.process.computer.GraphComputerTest", |
| method = "shouldNotAllowBadMemoryKeys", |
| reason = "Hadoop does a hard kill on failure and stops threads which stops test cases. Exception handling semantics are correct though.") |
| @Graph.OptOut( |
| test = "org.apache.tinkerpop.gremlin.process.computer.GraphComputerTest", |
| method = "shouldRequireRegisteringMemoryKeys", |
| reason = "Hadoop does a hard kill on failure and stops threads which stops test cases. Exception handling semantics are correct though.") |
| public class HadoopGraph implements Graph { |
| ---- |
| |
| The above examples show how to ignore individual tests. It is also possible to: |
| |
| * Ignore an entire test case (i.e. all the methods within the test) by setting the `method` to "*". |
| * Ignore a "base" test class such that tests that extend from those classes will all be ignored. |
| * Ignore a `GraphComputer` test based on the type of `GraphComputer` being used. Specify the "computer" attribute on |
| the `OptOut` (which is an array specification) which should have a value of the `GraphComputer` implementation class |
| that should ignore that test. This attribute should be left empty for "standard" execution and by default all |
| `GraphComputer` implementations will be included in the `OptOut` so if there are multiple implementations, explicitly |
| specify the ones that should be excluded. |
| |
| Also note that some of the tests in the Gremlin Test Suite are parameterized tests and require an additional level of |
| specificity to be properly ignored. To ignore these types of tests, examine the name template of the parameterized |
| tests. It is defined by a Java annotation that looks like this: |
| |
| [source,java] |
| @Parameterized.Parameters(name = "expect({0})") |
| |
| The annotation above shows that the name of each parameterized test will be prefixed with "expect" and have |
| parentheses wrapped around the first parameter (at index 0) value supplied to each test. This information can |
| only be garnered by studying the test set up itself. Once the pattern is determined and the specific unique name of |
| the parameterized test is identified, add it to the `specific` property on the `OptOut` annotation in addition to |
| the other arguments. |
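
JUnit 4's `Parameterized` runner expands the name template with `java.text.MessageFormat` (after substituting its own `{index}` placeholder), so the expansion can be reproduced with the JDK alone. The sketch below assumes a hypothetical first parameter value of `someValue`; the real value must come from the test's parameter set.

```java
import java.text.MessageFormat;

// Hypothetical helper that mirrors how a name template such as "expect({0})" is expanded.
final class ParameterizedNames {
    private ParameterizedNames() {}

    // {0} is replaced by the first parameter, {1} by the second, and so on.
    static String expand(final String template, final Object... parameters) {
        return MessageFormat.format(template, parameters);
    }
}
```

Under that assumption, `ParameterizedNames.expand("expect({0})", "someValue")` yields `expect(someValue)`, which is the kind of string that goes into the `specific` property. Note that `MessageFormat` applies locale-specific formatting to numeric parameters, so the exact name should always be confirmed against the test output.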
| |
| These annotations help provide users a level of transparency into test suite compliance (via the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#describe-graph[describeGraph()] utility function). They also |
| allow implementers to have a lot of flexibility in terms of how they wish to support TinkerPop. For example, maybe |
| there is a single test case that prevents an implementer from claiming support of a `Feature`. The implementer could |
| choose to either not support the `Feature` or to support it but "opt-out" of the test with a "reason" as to why so |
| that users understand the limitation. |
| |
| IMPORTANT: Before using `OptOut` be sure that the reason for using it is sound and that it is more of a last resort. |
| It is possible that a test from the suite doesn't properly represent the expectations of a feature, is too broad or |
| narrow for the semantics it is trying to enforce or simply contains a bug. Please consider raising issues in the |
| developer mailing list with such concerns before assuming `OptOut` is the only answer. |
| |
| IMPORTANT: There are no tests that specifically validate complete compliance with Gremlin Server. Generally speaking, |
| a `Graph` that passes the full Test Suite should be compliant with Gremlin Server. The one area where problems can |
| occur is in serialization. Always ensure that IO is properly implemented, that custom serializers are tested fully |
| and ultimately integration test the `Graph` with an actual Gremlin Server instance. |
| |
| WARNING: Configuring tests to run in parallel might result in errors that are difficult to debug as there is some |
| shared state in test execution around graph configuration. It is therefore recommended that parallelism be turned |
| off for the test suite (the Maven Surefire Plugin is configured this way by default). It may also be important to |
| include this setting, `<reuseForks>false</reuseForks>`, in the Surefire configuration if tests are failing in an |
| unexplainable way. |
| |
| WARNING: For graph implementations that require a schema, take note that TinkerPop tests were originally developed |
| without too much concern for these types of graphs. While most tests utilize the standard toy graphs there are |
| instances where tests will utilize their own independent schema that stands alone from all other tests. It may be |
| necessary to create schemas specific to certain tests in those situations. |
| |
| TIP: When running the `gremlin-test` suite against your implementation, you may need to set `build.dir` as a JVM |
| system property, depending on your project layout. Some tests require this to find a writable directory for |
| creating temporary files. The value is typically set to the project build directory. For example, using the Maven |
| Surefire Plugin, this is done via the configuration `argLine` with `-Dbuild.dir=${project.build.directory}`. |
| |
| ===== Checking Resource Leaks |
| |
| The TinkerPop query engine retrieves data by interfacing with the provider using iterators. These iterators (depending |
| on the provider) may hold up resources in the underlying storage layer and hence, it is critical to close them after |
| the query is finished. |
| |
| TinkerPop can detect such resource leaks when you run the |
| Gremlin-Test suites against your implementation. To enable this leak detection, providers should increment the |
| `StoreIteratorCounter` whenever a resource is opened and decrement it when it is closed. A reference implementation |
| is provided with TinkerGraph as `TinkerGraphIterator.java`. |
| |
| Assertions for leak detection are enabled by default when running the test suite. They can be temporarily disabled |
| by way of a system property - simply set `-DtestIteratorLeaks=false`. |
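
The increment-on-open, decrement-on-close pattern can be sketched with the JDK alone. In this sketch `OpenIteratorCounter` is a hypothetical stand-in for TinkerPop's `StoreIteratorCounter`, and `CountingIterator` is an illustrative wrapper rather than the actual `TinkerGraphIterator`; a provider would call the real counter from its own iterator classes.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for StoreIteratorCounter: tracks how many iterators are open.
final class OpenIteratorCounter {
    private static final AtomicLong OPEN = new AtomicLong();

    private OpenIteratorCounter() {}

    static void increment() { OPEN.incrementAndGet(); }
    static void decrement() { OPEN.decrementAndGet(); }
    static long openCount() { return OPEN.get(); }
}

// Wraps a storage-level iterator so that opening and closing are both counted.
final class CountingIterator<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> delegate;
    private boolean closed = false;

    CountingIterator(final Iterator<T> delegate) {
        this.delegate = delegate;
        OpenIteratorCounter.increment(); // resource opened
    }

    @Override
    public boolean hasNext() {
        return !closed && delegate.hasNext();
    }

    @Override
    public T next() {
        if (closed) throw new NoSuchElementException("iterator is closed");
        return delegate.next();
    }

    @Override
    public void close() {
        // Guard against double-close so the counter is decremented exactly once.
        if (!closed) {
            closed = true;
            OpenIteratorCounter.decrement(); // resource released
        }
    }
}
```

If every iterator is closed, the counter returns to its starting value once a query finishes; a non-zero difference points at a leak.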
| |
| [[gherkin-tests-suite]] |
| ==== Gherkin Test Suite |
| |
| The Gherkin Test Suite is a language agnostic set of tests that verify Gremlin semantics. It provides a unified set of |
| tests that validate many TinkerPop components internally. The tests themselves can be found in `gremlin-tests/features` |
| (link:https://github.com/apache/tinkerpop/tree/x.y.z/gremlin-test/src/main/resources/org/apache/tinkerpop/gremlin/test/features[here]) |
| with their syntax described in the |
| link:https://tinkerpop.apache.org/docs/x.y.z/dev/developer/#gremlin-language-test-cases[TinkerPop Developer Documentation]. |
| |
| TinkerPop provides some infrastructure, for JVM based graphs, to help make it easier for providers to implement these |
| tests against their implementations. This infrastructure is built on `cucumber-java` which is a dependency of |
| `gremlin-test`. There are two main components to implementing the tests: |
| |
| 1. An implementation of `org.apache.tinkerpop.gremlin.features.World`, which is defined in `gremlin-test`. |
| 2. A JUnit test class that will act as the runner for the tests with the appropriate annotations. |
| |
| TIP: It may be helpful to get familiar with link:https://cucumber.io/docs/installation/java/[Cucumber] before |
| proceeding with an implementation. |
| |
| The `World` implementation provides context to the tests and allows providers to intercept test events that might be |
| important to proper execution specific to their implementations. The most important part of implementing `World` is |
| properly implementing the `GraphTraversalSource getGraphTraversalSource(GraphData)` method, which supplies the |
| `GraphTraversalSource` that each test executes against. |
| |
| The JUnit test class is really just the test runner. It is a simple class which must include some Cucumber annotations. |
| The following is just an example as taken from TinkerGraph: |
| |
| [source,java] |
| ---- |
| @RunWith(Cucumber.class) |
| @CucumberOptions( |
| tags = "not @GraphComputerOnly and not @AllowNullPropertyValues", |
| glue = { "org.apache.tinkerpop.gremlin.features" }, |
| features = { "classpath:/org/apache/tinkerpop/gremlin/test/features" }, |
| plugin = {"progress", "junit:target/cucumber.xml"}, |
| objectFactory = GuiceFactory.class) |
| ---- |
| |
| The `@CucumberOptions` that are used are mostly implementation specific, so it will be up to the provider to make some |
| choices as to what is right for their environment. For TinkerGraph, it needed to ignore Gherkin tests with the |
| `@GraphComputerOnly` and `@AllowNullPropertyValues` tags (the full list of possible tags can be found |
| link:https://tinkerpop.apache.org/docs/x.y.z/dev/developer/#gherkin-tags[here]), since this configuration was for OLTP |
| tests and the graph was configured without support for storing `null` in the graph. The "glue" will be the same for all |
| test implementers as it refers to a package containing TinkerPop's test infrastructure in `gremlin-test` (unless of |
| course, a provider needs to develop their own infrastructure for some reason). The "features" is the path to the actual |
| Gherkin test files that should be made available locally. The files can be referenced on the classpath assuming |
| `gremlin-test` is a dependency. The "plugin" defines a JUnit style output, which happens to be understood by Maven. |
| |
| The "objectFactory" is the last component. Cucumber relies on dependency injection to get a `World` implementation into |
| the test infrastructure. Providers may choose from multiple available implementations, but TinkerPop chose to use |
| Guice. To follow this approach include the following dependency: |
| |
| [source,xml] |
| ---- |
| <dependency> |
| <groupId>com.google.inject</groupId> |
| <artifactId>guice</artifactId> |
| <version>4.2.3</version> |
| <scope>test</scope> |
| </dependency> |
| ---- |
| |
| Following the Neo4jGraph implementation, there are two classes to construct: |
| |
| [source,java] |
| ---- |
| public class ServiceModule extends AbstractModule { |
| @Override |
| protected void configure() { |
| bind(World.class).to(Neo4jGraphWorld.class); |
| } |
| } |
| |
| public class WorldInjectorSource implements InjectorSource { |
| @Override |
| public Injector getInjector() { |
| return Guice.createInjector(Stage.PRODUCTION, CucumberModules.createScenarioModule(), new ServiceModule()); |
| } |
| } |
| ---- |
| |
| The key here is that the `Neo4jGraphWorld` implementation gets bound to `World` in the `ServiceModule` and there is |
| a `WorldInjectorSource` that specifies the `ServiceModule` to Cucumber. As a final step, the provider's test resources |
| need a `cucumber.properties` file with an entry that specifies the `InjectorSource` so that Guice can find it. Here |
| is the example taken from Neo4jGraph where the `WorldInjectorSource` is an inner class of `Neo4jGraphFeatureTest` |
| itself. |
| |
| [source,text] |
| ---- |
| guice.injector-source=org.apache.tinkerpop.gremlin.neo4j.Neo4jGraphFeatureTest$WorldInjectorSource |
| ---- |
| |
| In the event that a single `World` configuration is insufficient, it may be necessary to develop a custom |
| `ObjectFactory`. An easy way to do this is to create a class that extends from the `AbstractGuiceFactory` in |
| `gremlin-test` and provide that class to the `@CucumberOptions`. This approach does rely on the `ServiceLoader` which |
| means it will be important to include an `io.cucumber.core.backend.ObjectFactory` file in `META-INF/services` and an |
| entry that registers the custom implementation. Please see the TinkerGraph test code for further information on this |
| approach. |
| |
| If implementing the Gherkin tests, providers can choose to opt-in to the slimmed down version of the normal JVM process |
| test suite to help alleviate test duplication between the two frameworks: |
| |
| [source,java] |
| ---- |
| @Graph.OptIn(Graph.OptIn.SUITE_PROCESS_LIMITED_STANDARD) |
| @Graph.OptIn(Graph.OptIn.SUITE_PROCESS_LIMITED_COMPUTER) |
| ---- |
| |
| === Accessibility via GremlinPlugin |
| |
| image:gremlin-plugin.png[width=100,float=left] The applications distributed with TinkerPop do not distribute with |
| any graph system implementations besides TinkerGraph. If your implementation is stored in a Maven repository (e.g. |
| Maven Central Repository), then it is best to provide a <<gremlin-plugins,`GremlinPlugin`>> implementation so the respective jars can be |
| downloaded as and when required by the user. Neo4j's GremlinPlugin is provided below for reference. |
| |
| [source,java] |
| ---- |
| include::{basedir}/neo4j-gremlin/src/main/java/org/apache/tinkerpop/gremlin/neo4j/jsr223/Neo4jGremlinPlugin.java[] |
| ---- |
| |
| With the above plugin implementations, users can now download respective binaries for Gremlin Console, Gremlin Server, etc. |
| |
| [source,groovy] |
| gremlin> g = Neo4jGraph.open('/tmp/neo4j') |
| No such property: Neo4jGraph for class: groovysh_evaluate |
| Display stack trace? [yN] |
| gremlin> :install org.apache.tinkerpop neo4j-gremlin x.y.z |
| ==>loaded: [org.apache.tinkerpop, neo4j-gremlin, …] |
| gremlin> :plugin use tinkerpop.neo4j |
| ==>tinkerpop.neo4j activated |
| gremlin> g = Neo4jGraph.open('/tmp/neo4j') |
| ==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]] |
| |
| === In-Depth Implementations |
| |
| image:gremlin-painting.png[width=200,float=right] The graph system implementation details presented thus far are |
| minimum requirements necessary to yield a valid TinkerPop implementation. However, there are other areas that a |
| graph system provider can tweak to provide an implementation more optimized for their underlying graph engine. Typical |
| areas of focus include: |
| |
| * Traversal Strategies: A link:https://tinkerpop.apache.org/docs/x.y.z/reference/#traversalstrategy[TraversalStrategy] |
| can be used to alter a traversal prior to its execution. A typical example is converting a pattern of |
| `g.V().has('name','marko')` into a global index lookup for all vertices with name "marko". In this way, an `O(|V|)` |
| lookup becomes an `O(log(|V|))` lookup. Please review `TinkerGraphStepStrategy` for ideas. |
| * Step Implementations: Every link:https://tinkerpop.apache.org/docs/x.y.z/reference/#graph-traversal-steps[step] is |
| ultimately referenced by the `GraphTraversal` interface. It is possible to extend `GraphTraversal` to use a graph |
| system specific step implementation. Note that while it is sometimes possible to develop custom step implementations |
| by extending from a TinkerPop step (typically, `AddVertexStep` and other `Mutating` steps), it's important to |
| consider that doing so introduces some greater risk for code breaks on upgrades as opposed to other areas of the code |
| base. As steps are more internal features of TinkerPop, they might be subject to breaking API and behavioral changes |
| that would be less likely to be accepted by more public facing interfaces. |
| |
| == Graph Driver Provider Requirements |
| |
| image::gremlin-server-protocol.png[width=325] |
| |
| One of the roles for link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-server[Gremlin Server] is to |
| provide a bridge from TinkerPop to non-JVM languages (e.g. Go, Python, etc.). Developers can build language bindings |
| (or drivers) that provide a way to submit Gremlin scripts to Gremlin Server and get back results. Given the |
| extensible nature of Gremlin Server, it is difficult to provide an authoritative guide to developing a driver. |
| It is however possible to describe the core communication protocol using the standard out-of-the-box configuration |
| which should provide enough information to develop a driver for a specific language. |
| |
| Gremlin Server is distributed with a configuration that utilizes HTTP with a custom API. Under this configuration, |
| Gremlin Server accepts requests containing a Gremlin script, evaluates that script and then streams back the results in |
| HTTP chunks. |
| |
| Let's use the incoming request to process the Gremlin script of `g.V()` as an example. Gremlin Server evaluates that |
| script, getting an `Iterator` of vertices as a result, and steps through each `Vertex` within it. The vertices are |
| batched together into an HTTP chunk. Each response is serialized given the requested serializer type (GraphBinary is |
| recommended) and written back to the requesting client immediately. Gremlin Server does not wait for the entire result |
| to be iterated before sending back a response. It will send the responses as they are realized. |
| |
| This approach allows for the processing of large result sets without having to serialize the entire result into memory |
| for the response. It places a bit of a burden on the developer of the driver however, because it becomes necessary to |
| provide a way to reconstruct the entire result on the client side from all of the individual responses that Gremlin |
| Server returns for a single request. Again, this description of Gremlin Server's "flow" is related to the |
| out-of-the-box configuration. It is quite possible to construct other flows that might be more amenable to a |
| particular language or style of processing. |
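
The client-side burden of reassembling a result can be sketched as a small accumulator. This is an assumption-laden illustration: `ResultAccumulator` and its callback names are hypothetical, and it presumes the driver's transport layer has already deserialized each HTTP chunk into a batch of results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical driver-side helper that rebuilds one complete result from many chunks.
final class ResultAccumulator<T> {
    private final List<T> results = new ArrayList<>();
    private final CompletableFuture<List<T>> complete = new CompletableFuture<>();

    // Called once per deserialized chunk, in arrival order.
    synchronized void onChunk(final List<T> batch) {
        results.addAll(batch);
    }

    // Called when the final chunk of the response has been received.
    synchronized void onEnd() {
        complete.complete(Collections.unmodifiableList(new ArrayList<>(results)));
    }

    // Called if the response fails part way through the stream.
    synchronized void onError(final Throwable cause) {
        complete.completeExceptionally(cause);
    }

    // Completes only after onEnd() or onError() has been called.
    CompletableFuture<List<T>> future() {
        return complete;
    }
}
```

A driver could equally expose results as they arrive instead of buffering them; buffering is shown only because it is the simplest way to hand back "the entire result" for a single request.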
| |
| NOTE: TinkerPop provides a test server which may be useful for testing drivers. Details can be found |
| link:https://tinkerpop.apache.org/docs/current/dev/developer/#gremlin-socket-server-tests[here]. |
| |
| It is recommended but not required that a driver include a `User-Agent` header as part of any HTTP request to Gremlin |
| Server. Gremlin Server uses the user agent in building usage metrics as well as debugging. The standard format for |
| connection user agents is: |
| |
| [[user-agent-format]] |
| `"[Application Name] [GLV Name].[Version] [Language Runtime Version] [OS].[Version] [CPU Architecture]"` |
| For example: |
| `"MyTestApplication Gremlin-Java.3.5.4 11.0.16.1 Mac_OS_X.12.6.1 aarch64"` |
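
A driver might assemble this string as in the following sketch. The `UserAgentFormat` class is hypothetical; replacing spaces with underscores and falling back to `NotAvailable` for a missing value are assumptions inferred from the examples in this document rather than a stated rule.

```java
// Hypothetical helper for building a user agent in the format described above.
final class UserAgentFormat {
    private UserAgentFormat() {}

    static String format(final String applicationName, final String glvName, final String glvVersion,
                         final String runtimeVersion, final String osName, final String osVersion,
                         final String cpuArchitecture) {
        return String.join(" ",
                token(applicationName),
                token(glvName) + "." + token(glvVersion),
                token(runtimeVersion),
                token(osName) + "." + token(osVersion),
                token(cpuArchitecture));
    }

    // Fills in the runtime and platform parts from standard JVM system properties.
    static String forCurrentJvm(final String applicationName, final String glvName, final String glvVersion) {
        return format(applicationName, glvName, glvVersion,
                System.getProperty("java.version"),
                System.getProperty("os.name"),
                System.getProperty("os.version"),
                System.getProperty("os.arch"));
    }

    // Assumption: spaces inside a token become underscores and a missing value becomes "NotAvailable".
    private static String token(final String value) {
        if (value == null || value.trim().isEmpty()) return "NotAvailable";
        return value.trim().replace(' ', '_');
    }
}
```

With the values from the example above, `format("MyTestApplication", "Gremlin-Java", "3.5.4", "11.0.16.1", "Mac OS X", "12.6.1", "aarch64")` reproduces the example string exactly.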
| |
| The following section provides an in-depth description of the TinkerPop HTTP API. The HTTP API is used for |
| communicating the requests and responses that were described earlier. |
| |
| === HTTP API |
| |
| This section describes the TinkerPop HTTP API which should be implemented by both graph system providers and graph |
| driver providers. There is only one endpoint that currently needs to be supported which is `POST /gremlin`. This |
| endpoint is a Gremlin evaluator which takes in a Gremlin script request and responds with the serialized results. The |
| formats below use a bit of pseudo-JSON to help represent request and response bodies. The actual format of the request |
| and response bodies will be determined by the serializers defined via the "Accept" and "Content-Type" headers. As a |
| result, a generic type definition in this document like "number" could translate to a "long" for a serializer that |
| supports types like GraphBinary. |
| |
| ==== HTTP Request |
| |
| To formulate a request to Gremlin Server, a `RequestMessage` needs to be constructed. The `RequestMessage` is a |
| generalized representation of a request. This message can be serialized in any fashion that is supported by Gremlin |
| Server, which by default is GraphBinary. An HTTP request that contains a `RequestMessage` has the following form: |
| |
| [source,text] |
| ---- |
| POST /gremlin HTTP/1.1 |
| Accept: <mimetype> |
| Content-Type: <mimetype> |
| Gremlin-Hints: <hints> |
| |
| { |
| "gremlin": string, |
| "timeoutMs": number, |
| "bindings": object, |
| "g": string, |
| "language" : string, |
| "materializeProperties": string, |
| "bulkResults": boolean |
| } |
| ---- |
| |
| An actual, complete request might look like the following: |
| |
| [source,text] |
| ---- |
| POST /gremlin HTTP/1.1 |
| content-length: 61 |
| host: 127.0.0.1 |
| content-type: application/vnd.gremlin-v4.0+json |
| accept-encoding: deflate |
| accept: application/vnd.graphbinary-v4.0 |
| user-agent: NotAvailable Gremlin-Java.4.0.0 11.0.25 Windows_11.10.0 amd64 |
| { |
| "gremlin": "g.V()", |
| "language": "gremlin-lang" |
| } |
| ---- |
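
A request of this shape can be built with the JDK's `java.net.http` API (Java 11 or later), as in the sketch below. The `GremlinHttpRequests` class, the hand-rolled JSON escaping, and the choice of `application/vnd.gremlin-v4.0+json` for both headers are illustrative assumptions; a real driver would use a proper serializer. `Content-Length` and `Host` are not set here because the JDK client manages those headers itself.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds, but does not send, a POST /gremlin request.
final class GremlinHttpRequests {
    private static final String JSON_MIME = "application/vnd.gremlin-v4.0+json";

    private GremlinHttpRequests() {}

    // Minimal JSON body with the required "gremlin" key and an explicit "language".
    static String jsonBody(final String gremlin, final String language) {
        return "{\"gremlin\":\"" + escape(gremlin) + "\",\"language\":\"" + escape(language) + "\"}";
    }

    // Escapes only backslashes and double quotes; control characters are not handled in this sketch.
    private static String escape(final String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static HttpRequest build(final URI endpoint, final String gremlin) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", JSON_MIME)
                .header("Accept", JSON_MIME)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(gremlin, "gremlin-lang")))
                .build();
    }
}
```

The resulting `HttpRequest` can be passed to `HttpClient.send` or `HttpClient.sendAsync`; reading the response as a stream rather than as a single string keeps the benefit of chunked results.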
| |
| ===== Expected Request HTTP Headers |
| |
| [width="100%",cols="3,10,3,3",options="header"] |
| |========================================================= |
| |Name |Description |Required |Default |
| |Accept |Serializer MIME types supported for the response. Must be a mimetype (see <<serializers>>). |No |`application/vnd.gremlin-v4.0+json;types=false` |
| |Accept-Encoding |The requested compression algorithm of the response. Valid values: `deflate`. |No |N/A |
| |Authorization |Header used with Basic authorization. |No |N/A |
| |Content-Length |The size of the payload |Yes |N/A |
| |Content-Type |The MIME type of the serialized body |No |None |
| |Gremlin-Hints |A semi-colon separated list of key/value pair metadata that could be helpful to the server in processing a particular request in some way. Each entry must be one of the hints listed in the table below. |No |N/A |
| |User-Agent |The user agent. Follow the format specified by <<user-agent-format, user agent format>>. |No |<<user-agent-format, user agent format>> |
| |========================================================= |
| |
| ===== Request Header Value Options |
| |
| [width="100%",cols="2,10",options="header"] |
| |========================================================= |
| |Name |Options |
| |mimetype |A MIME type listed in <<serializers>>. |
| |hints |`mutations`: one of `yes`, `no`, or `unknown`. Indicates whether the Gremlin contains steps that can mutate the graph. |
| |========================================================= |
| |
| The body of the request should be a `RequestMessage`, which is a `Map`. The `RequestMessage` should be serialized using |
| the serializer specified by the `Content-Type` header. The following key/value pairs are allowed in a |
| `RequestMessage`: |
| |
| ===== Request Message Format |
| |
| [width="100%",cols="3,10,3,3",options="header"] |
| |========================================================= |
| |Key |Description |Value |Required |
| |gremlin |The Gremlin query to execute. |String containing script |Yes |
| |timeoutMs |The maximum time a query is allowed to execute in milliseconds. |Number between 0 and 2^31-1 |No |
| |bindings |A map used during query execution. Its usage depends on "language". For "gremlin-groovy", these are the variable bindings. For "gremlin-lang", these are the parameter bindings. |Object (Map) |No |
| |g |The name of the graph traversal source to which the query applies. Default: "g" |String containing traversal source name |No |
| |language |The name of the ScriptEngine to use to parse the Gremlin query. Default: "gremlin-lang" |String containing ScriptEngine name |No |
| |materializeProperties |Whether to include all properties for results. One of "tokens" or "all". |String |No |
| |bulkResults |Whether the results should be bulked by the server (only applies to GraphBinary) |Boolean |No |
| |========================================================= |
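| |
| As a rough illustration of the request shape described above, the following sketch assembles such a request with the |
| JDK's `java.net.http` API (Java 11+) without sending it. The class name `GremlinRequestSketch`, the endpoint supplied by |
| the caller, and the hand-rolled JSON escaping are assumptions made for this example and are not part of TinkerPop; a |
| real driver would use a proper serializer. `Content-Length` is not set explicitly because the JDK client derives it |
| from the body. |
| |
| [source,java] |
| ---- |
| import java.net.URI; |
| import java.net.http.HttpRequest; |
| import java.nio.charset.StandardCharsets; |
| |
| // Hypothetical helper for illustration only; not part of TinkerPop. |
| final class GremlinRequestSketch { |
|     // MIME types copied from the request example and the documented Accept default. |
|     static final String CONTENT_TYPE = "application/vnd.gremlin-v4.0+json"; |
|     static final String ACCEPT = "application/vnd.gremlin-v4.0+json;types=false"; |
| |
|     // Builds a RequestMessage body that carries only the "gremlin" and "language" keys. |
|     static String body(String gremlin) { |
|         // Escapes backslashes and double quotes only, which is enough for this sketch. |
|         String escaped = gremlin.replace("\\", "\\\\").replace("\"", "\\\""); |
|         return "{\"gremlin\":\"" + escaped + "\",\"language\":\"gremlin-lang\"}"; |
|     } |
| |
|     // Assembles (but does not send) a POST request for the given endpoint. |
|     static HttpRequest build(URI endpoint, String gremlin) { |
|         return HttpRequest.newBuilder(endpoint) |
|                 .header("Content-Type", CONTENT_TYPE) |
|                 .header("Accept", ACCEPT) |
|                 .POST(HttpRequest.BodyPublishers.ofString(body(gremlin), StandardCharsets.UTF_8)) |
|                 .build(); |
|     } |
| } |
| ---- |
| |
| A caller might invoke `GremlinRequestSketch.build(URI.create("http://localhost:8182/gremlin"), "g.V()")`, where the host |
| and port are assumed values for a locally running server, and pass the result to an `HttpClient`. |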
| |
| ==== HTTP Response |
| |
| When Gremlin Server receives that request, it decodes it according to the MIME type given in the `Content-Type` header |
| and executes it using the `ScriptEngine` specified by the `language` field. For the example request above, it evaluates |
| the script `g.V()` with any `bindings` supplied in the request and streams the results back in HTTP chunks. When the |
| chunks are combined, they form a single `ResponseMessage`. The HTTP response containing the `ResponseMessage` has the |
| following form: |
| |
| [source,text] |
| ---- |
| HTTP/1.1 200 |
| Content-Type: <mimetype> |
| Transfer-Encoding: chunked |
| Gremlin-RequestId: <uuid> |
| |
| { |
| "result": list, |
| "status": object |
| } |
| ---- |
| |
| NOTE: While this response message is expected for all serialized responses, there may be some errors that are not |
| serialized. In that case, the `Content-Type` of the response should be `application/json` and the JSON should contain a |
| `message` key. |
| |
| ===== Response Message Format |
| |
| [width="100%",cols="2,10a",options="header"] |
| |========================================================= |
| |Key |Description |
| |result |A map that contains the result data. |
| [width="100%",cols="3,10,3,3",options="header"] |
| !========================================================= |
| !Key !Description !Value !Required |
| !data !A list of result objects. !Array !Yes |
| !========================================================= |
| |status |A map that contains the status of the result. |
| [width="100%",cols="3,10,3,3",options="header"] |
| !========================================================= |
| !Key !Description !Value !Required |
| !code !The actual <<http-status-codes,status code>> of the result. !Number !Yes |
| !exception !The class of the exception if an error occurred. !String !No |
| !message !The error message if an error occurred. !String !No |
| !========================================================= |
| |========================================================= |
| |
| ===== Expected Response HTTP Headers |
| |
| [width="100%",cols="3,10,3,3",options="header"] |
| |========================================================= |
| |Name |Description |Required |Default |
| |Content-Type |The MIME type of the serialized body which is based on the request's `Accept` header. May also be "application/json". |Yes |N/A |
| |Gremlin-RequestId |The server generated UUID that is used as a request ID. |Yes |N/A |
| |Transfer-Encoding |The server should attempt to chunk all responses. |No |"chunked" |
| |========================================================= |
| |
| ===== Response Header Value Options |
| |
| [width="100%",cols="3,10",options="header"] |
| |========================================================= |
| |Name |Options |
| |mimetype |A MIME type listed in <<serializers>>. |
| |uuid |A randomly generated UUID string. |
| |========================================================= |
| |
| [[http-status-codes]] |
| ===== Response Status Codes |
| |
| The following table details the HTTP status codes that Gremlin Server will send: |
| |
| [width="100%",cols="1,3,9",options="header"] |
| |========================================================= |
| |Code |Name |Description |
| |200 |SUCCESS |The server successfully processed a request to completion - there are no messages remaining in this stream. |
| |204 |NO CONTENT |The server processed the request but there is no result to return (e.g. an `Iterator` with no elements) - there are no messages remaining in this stream. |
| |206 |PARTIAL CONTENT |The server successfully returned some content, but there is more in the stream to arrive - wait for a `SUCCESS` to signify the end of the stream. |
| |400 |BAD REQUEST |There was a problem with the HTTP request. |
| |401 |UNAUTHORIZED |The request attempted to access resources that the requesting user did not have access to. |
| |403 |FORBIDDEN |The server could authenticate the request, but will not fulfill it. |
| |404 |NOT FOUND |The server was unable to find the requested resource. |
| |405 |METHOD NOT ALLOWED |The request used an unsupported method. The server only supports POST. |
| |413 |REQUEST ENTITY TOO LARGE |The request was too large or the query could not be compiled due to size limitations. |
| |500 |INTERNAL SERVER ERROR |A general server error occurred that prevented the request from being processed. |
| |505 |HTTP VERSION NOT SUPPORTED |A server error indicating that an unsupported version of HTTP is being used. Only HTTP/1.1 is supported. |
| |========================================================= |
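| |
| One way a driver might act on the codes in the table above is to reduce them to a handful of outcomes. The sketch below |
| is only an illustration: the class `StatusCodeSketch` and the `Outcome` names are hypothetical and are not defined by |
| TinkerPop, and codes not listed in the table are grouped by their HTTP range as an assumption. |
| |
| [source,java] |
| ---- |
| // Hypothetical helper for illustration only; not part of TinkerPop. |
| final class StatusCodeSketch { |
|     // Outcome names are invented for this sketch. |
|     enum Outcome { COMPLETE, EMPTY, MORE_TO_COME, CLIENT_ERROR, SERVER_ERROR, UNKNOWN } |
| |
|     // Maps a status code onto the action a driver would take next. |
|     static Outcome classify(int code) { |
|         switch (code) { |
|             case 200: |
|                 return Outcome.COMPLETE;      // stream finished with results |
|             case 204: |
|                 return Outcome.EMPTY;         // stream finished without results |
|             case 206: |
|                 return Outcome.MORE_TO_COME;  // keep reading until a 200 arrives |
|             default: |
|                 if (code >= 400 && code < 500) { |
|                     return Outcome.CLIENT_ERROR;  // e.g. 400, 401, 403, 404, 405, 413 |
|                 } |
|                 if (code >= 500 && code < 600) { |
|                     return Outcome.SERVER_ERROR;  // e.g. 500, 505 |
|                 } |
|                 return Outcome.UNKNOWN;       // assumption: anything else is unexpected |
|         } |
|     } |
| } |
| ---- |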
| |
| ===== Trailing Headers |
| |
| Error responses will have trailing headers in addition to the status object in the response body. This information is |
| duplicated and should be the same, so graph driver providers should use whichever is easier for them. The trailers, |
| however, will only contain the `Status` and `Exception` without the `Message`. |
| |
| ==== HTTP Examples |
| |
| For examples of actual requests and responses, take a look at the IO documentation for |
| link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#_requestmessage[GraphSON requests] and |
| link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#_responsemessage[GraphSON responses]. |
| |
| === HTTP Request Interceptor |
| |
| A graph driver may support HTTP request interception, which provides a means for the user of your graph driver to update |
| the headers and body of the HTTP request before it is sent to the server. This enables use cases where a graph system |
| provider's server implementation has additional capabilities that aren't included in the base Gremlin Server. Although |
| every graph system provider is expected to support the protocol defined by the TinkerPop HTTP API, this doesn't |
| preclude them from including additional functionality. Be aware that if you choose not to provide this functionality, |
| then your graph driver may not have access to some graph providers' features, or, possibly, it may not be able to |
| connect at all. |
| |
| === Authentication and Authorization |
| |
| By default, Gremlin Server only supports |
| link:https://en.wikipedia.org/wiki/Basic_access_authentication[basic HTTP authentication]. This is handled by the |
| `HttpBasicAuthenticationHandler` which is the only `AbstractAuthenticationHandler` provided with the Gremlin Server. |
| Other common HTTP authentication schemes that are sent via an HTTP header can be supported by implementing a custom |
| `AbstractAuthenticationHandler`. Because the communication protocol is HTTP/1.1, authentication should be header-based |
| and should not include negotiation. |
| |
| When basic authentication is enabled, an incoming request is intercepted before it is evaluated by the `ScriptEngine`. |
| The request is examined for an `Authorization` header. If one doesn't exist, then a "401 Unauthorized" error response |
| is returned. |
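| |
| On the client side, the value of the `Authorization` header for the basic scheme is the word `Basic` followed by the |
| Base64 encoding of `username:password`. The sketch below shows one way to compute it; the class and method names are |
| hypothetical, and UTF-8 is assumed as the character encoding for the credentials. |
| |
| [source,java] |
| ---- |
| import java.nio.charset.StandardCharsets; |
| import java.util.Base64; |
| |
| // Hypothetical helper for illustration only; not part of TinkerPop. |
| final class BasicAuthSketch { |
|     // Returns the value to send in the Authorization header. |
|     static String authorizationValue(String username, String password) { |
|         // Credentials are joined with a colon and then Base64 encoded. |
|         String credentials = username + ":" + password; |
|         byte[] raw = credentials.getBytes(StandardCharsets.UTF_8); // UTF-8 is an assumption |
|         return "Basic " + Base64.getEncoder().encodeToString(raw); |
|     } |
| } |
| ---- |
| |
| Because Base64 is an encoding rather than encryption, such a header should only be sent over a protected transport. |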
| |
| In addition to authenticating users at the start of a connection, Gremlin Server allows providers to authorize users on |
| a per-request basis. If |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#_configuring_2[a Java class is configured] that implements the |
| link:https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/server/authz/Authorizer.html[Authorizer interface], |
| Gremlin Server passes each request to this `Authorizer`. The `Authorizer` can deny authorization for the request by |
| throwing an exception, in which case Gremlin Server returns `UNAUTHORIZED` (status code `401`) to the client. The |
| `Authorizer` authorizes the request by returning the original request or the request with some additional constraints. |
| Gremlin Server then proceeds with the returned request and returns its result to the client. More details on |
| implementing authorization can be found in the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#security[reference documentation for Gremlin Server security]. |
| |
| NOTE: While Gremlin Server supports this authorization feature, it is not a feature that TinkerPop requires of graph |
| providers as part of the agreement between client and server. |
| |
| [[serializers]] |
| === Serializers |
| |
| In order to serialize and deserialize the requests and responses, your graph driver will need to implement |
| link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#graphbinary[GraphBinary]. The Gremlin Server is capable of |
| returning both GraphBinary and GraphSON; however, GraphBinary is a more compact format, which can lead to increased |
| performance as fewer bytes need to be sent over the wire. For this reason, drivers only need to support GraphBinary. |
| link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#graphson[GraphSON] can be used by applications that only support JSON serialization. |
| |
| The following table lists the serializers supported by the Gremlin Server and their MIME types. These MIME types should |
| be used in the `Content-Type` and `Accept` HTTP headers. |
| |
| [width="100%",cols="3,5,5",options="header"] |
| |========================================================= |
| |Name |Description |MIME type |
| |Untyped GraphSON 4.0 |A JSON-based graph format |application/vnd.gremlin-v4.0+json;types=false |
| |Typed GraphSON 4.0 |A JSON-based graph format with embedded type information used for serialization |application/vnd.gremlin-v4.0+json;types=true |
| |GraphBinary 4.0 |A binary graph format |application/vnd.graphbinary-v4.0 |
| |========================================================= |
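| |
| A driver typically needs to map the MIME type it sends or receives onto a serializer. The sketch below captures the |
| table above as an enum with a simple lookup. The names `SerializerMimeTypes`, `Format`, and `fromHeader` are |
| hypothetical, and the normalization (dropping spaces and ignoring case) is an assumption of this example, not |
| documented server behavior. |
| |
| [source,java] |
| ---- |
| import java.util.Locale; |
| import java.util.Optional; |
| |
| // Hypothetical lookup helper for illustration only; not part of TinkerPop. |
| final class SerializerMimeTypes { |
|     // One constant per row of the serializer table. |
|     enum Format { |
|         GRAPHSON_UNTYPED("application/vnd.gremlin-v4.0+json;types=false"), |
|         GRAPHSON_TYPED("application/vnd.gremlin-v4.0+json;types=true"), |
|         GRAPHBINARY("application/vnd.graphbinary-v4.0"); |
| |
|         final String mimeType; |
| |
|         Format(String mimeType) { |
|             this.mimeType = mimeType; |
|         } |
|     } |
| |
|     // Returns the matching format, or empty when the header value is not in the table. |
|     static Optional<Format> fromHeader(String headerValue) { |
|         // Assumption: spaces and letter case are not significant. |
|         String normalized = headerValue.replace(" ", "").toLowerCase(Locale.ROOT); |
|         for (Format format : Format.values()) { |
|             if (format.mimeType.equals(normalized)) { |
|                 return Optional.of(format); |
|             } |
|         } |
|         return Optional.empty(); |
|     } |
| } |
| ---- |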
| |
| ==== IO Tests |
| |
| The IO test suite is a collection of files that contain the expected outcome of serialization of certain types. These |
| tests can be used to determine if a particular serializer has been correctly implemented. In general, a driver should |
| be able to "round trip" each of these types. That is, it should be able to both read from and write to those exact same |
| bytes. Not all programming languages provide library types that will match the specification of the corresponding type |
| defined by the serializer. In this case, it is not possible to completely round trip that type and you may skip that |
| test. The GraphBinary test files can be found |
| link:https://github.com/apache/tinkerpop/tree/x.y.z/gremlin-test/src/main/resources/org/apache/tinkerpop/gremlin/structure/io/graphbinary[here]. |
| The link:https://github.com/apache/tinkerpop/blob/x.y.z/gremlin-util/src/test/java/org/apache/tinkerpop/gremlin/structure/io/AbstractTypedCompatibilityTest.java[Java implementation] |
| can be used as a reference on how these files can be used and its |
| link:https://github.com/apache/tinkerpop/blob/x.y.z/gremlin-util/src/test/java/org/apache/tinkerpop/gremlin/structure/io/Model.java[model] |
| shows the Java representation of those files. |
| |
| [[gremlin-plugins]] |
| == Gremlin Plugins |
| |
| image:gremlin-plugin.png[width=125] |
| |
| Plugins provide a way to expand the features of a `GremlinScriptEngine`, which stands at the core of both Gremlin |
| Console and Gremlin Server. Providers may wish to create plugins for a variety of reasons, but some common examples |
| include: |
| |
| * Initialize the `GremlinScriptEngine` application with important classes so that the user doesn't need to type their |
| own imports. |
| * Place specific objects in the bindings of the `GremlinScriptEngine` for the convenience of the user. |
| * Bootstrap the `GremlinScriptEngine` with custom functions so that they are ready for usage at startup. |
| |
| The first step to developing a plugin is to implement the link:https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/jsr223/GremlinPlugin.html[GremlinPlugin] |
| interface: |
| |
| [source,java] |
| ---- |
| include::{basedir}/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/GremlinPlugin.java[] |
| ---- |
| |
| The simplest plugin, and the one most commonly implemented, will likely be one that just provides a list of classes |
| for import. This type of plugin is the easiest way for implementers of the TinkerPop Structure and Process APIs to |
| make their implementations available to users. The TinkerGraph implementation has just such a plugin: |
| |
| [source,java] |
| ---- |
| include::{basedir}/tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/jsr223/TinkerGraphGremlinPlugin.java[] |
| ---- |
| |
| This plugin extends from the abstract base class of link:https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/jsr223/AbstractGremlinPlugin.html[AbstractGremlinPlugin] |
| which provides some default implementations of the `GremlinPlugin` methods. Those who extend it only need to supply |
| the name of the module and a list of link:https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/jsr223/Customizer.html[Customizer] |
| instances to apply to the `GremlinScriptEngine`. In this case, the TinkerGraph plugin just needs an |
| link:https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/jsr223/ImportCustomizer.html[ImportCustomizer] |
| which describes the list of classes to import when the plugin is activated and applied to the `GremlinScriptEngine`. |
| |
| The `ImportCustomizer` is just one of several provided `Customizer` implementations that can be used in conjunction |
| with plugin development: |
| |
| * link:https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/jsr223/BindingsCustomizer.html[BindingsCustomizer] - Inject a key/value pair into the global bindings of the `GremlinScriptEngine` instances |
| * link:https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/jsr223/ImportCustomizer.html[ImportCustomizer] - Add imports to a `GremlinScriptEngine` |
| * link:https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/jsr223/ScriptCustomizer.html[ScriptCustomizer] - Execute a script on a `GremlinScriptEngine` at startup |
| |
| Individual `GremlinScriptEngine` instances may have their own `Customizer` instances that can be used only with that |
| engine - e.g. `gremlin-groovy` has some that are specific to controlling the Groovy compiler configuration. Developing |
| a new `Customizer` implementation is not really possible without changes to TinkerPop, as the framework is not designed |
| to respond to external ones. The base `Customizer` implementations listed above should cover most needs. |
| |
| A `GremlinPlugin` must support one of two instantiation models so that it can be instantiated from configuration files |
| for use in various situations - e.g. Gremlin Server. The first option is to use a static initializer given a method |
| with the following signature: |
| |
| [source,java] |
| ---- |
| public static GremlinPlugin instance() |
| ---- |
| |
| The limitation with this approach is that it does not provide a way to supply any configuration to the plugin so it |
| tends to only be useful for fairly simplistic plugins. The more advanced approach is to provide a "builder" given a |
| method with the following signature: |
| |
| [source,java] |
| ---- |
| public static Builder build() |
| ---- |
| |
| It doesn't really matter what kind of class is returned from `build` so long as it follows a "Builder" pattern, where |
| methods on that object return an instance of itself, so that builder methods can be chained together prior to calling |
| a final `create` method as follows: |
| |
| [source,java] |
| ---- |
| public GremlinPlugin create() |
| ---- |
| |
| Please see the link:https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/jsr223/ImportGremlinPlugin.html[ImportGremlinPlugin] |
| for an example of what implementing a `Builder` might look like in this context. |
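| |
| The two instantiation models described above can be sketched side by side. To keep the example compilable without |
| TinkerPop on the classpath, the `Plugin` interface below is a stand-in for `GremlinPlugin` that exposes only a name |
| accessor; `ExamplePlugin`, its `greeting` setting, and the plugin name are likewise hypothetical. |
| |
| [source,java] |
| ---- |
| // Hypothetical sketch for illustration only; a real plugin would implement GremlinPlugin. |
| final class PluginInstantiationSketch { |
|     // Stand-in for GremlinPlugin, reduced to the name accessor. |
|     interface Plugin { |
|         String getName(); |
|     } |
| |
|     static final class ExamplePlugin implements Plugin { |
|         private final String greeting; |
| |
|         private ExamplePlugin(String greeting) { |
|             this.greeting = greeting; |
|         } |
| |
|         @Override |
|         public String getName() { |
|             return "example.my-plugin"; // follows the namespace.plugin-name pattern |
|         } |
| |
|         String greeting() { |
|             return greeting; |
|         } |
| |
|         // Model 1: static initializer, no configuration possible. |
|         public static ExamplePlugin instance() { |
|             return new ExamplePlugin("hello"); |
|         } |
| |
|         // Model 2: builder whose methods return the builder itself so calls can be chained. |
|         public static Builder build() { |
|             return new Builder(); |
|         } |
| |
|         static final class Builder { |
|             private String greeting = "hello"; |
| |
|             public Builder greeting(String greeting) { |
|                 this.greeting = greeting; |
|                 return this; |
|             } |
| |
|             // Final step of the chain: produce the configured plugin. |
|             public ExamplePlugin create() { |
|                 return new ExamplePlugin(greeting); |
|             } |
|         } |
|     } |
| } |
| ---- |
| |
| With this shape, `ExamplePlugin.instance()` yields the default plugin, while |
| `ExamplePlugin.build().greeting("hi").create()` yields a configured one. |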
| |
| Note that each plugin provides a unique name that follows a namespaced pattern of _namespace_._plugin-name_ |
| (e.g. "tinkerpop.hadoop" - "tinkerpop" is the reserved namespace for TinkerPop maintained plugins). |
| |
| For plugins that will work with Gremlin Console, there is one other step to follow to ensure that the `GremlinPlugin` |
| will work there. The console loads `GremlinPlugin` instances via link:http://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html[ServiceLoader] |
| and therefore needs a resource file added to the jar file where the plugin exists. Add a file called |
| `org.apache.tinkerpop.gremlin.jsr223.GremlinPlugin` to `META-INF/services`. In the case of the TinkerGraph |
| plugin above, that file will have this line in it: |
| |
| [source,java] |
| ---- |
| include::{basedir}/tinkergraph-gremlin/src/main/resources/META-INF/services/org.apache.tinkerpop.gremlin.jsr223.GremlinPlugin[] |
| ---- |
| |
| Once the plugin is packaged, there are two ways to test it out: |
| |
| . Copy the jar and its dependencies to the Gremlin Console path and start it. It is preferable that the plugin is |
| copied to the `/ext/_plugin_name_` directory. |
| . Start Gremlin Console and try the `:install` command: `:install com.company my-plugin 1.0.0`. |
| |
| In either case, once one of these two approaches is taken, the jars and their dependencies are available to the |
| Console. The next step is to "activate" the plugin by doing `:plugin use my-plugin`, where "my-plugin" refers to the |
| name of the plugin to activate. |
| |
| NOTE: When `:install` is used, logging dependencies related to link:http://www.slf4j.org/[SLF4J] are filtered out so as |
| not to introduce multiple logger bindings (which generate warning messages in the logs). |
| |
| Plugins can also tie into the `:remote` and `:submit` commands. Recall that a `:remote` represents a different |
| context within which Gremlin is executed, when issued with `:submit`. It is encouraged to use this integration point |
| when possible, as opposed to registering new commands that can otherwise follow the `:remote` and `:submit` pattern. |
| To expose this integration point as part of a plugin, implement the `RemoteAcceptor` interface: |
| |
| [source,java] |
| ---- |
| include::{basedir}/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/console/RemoteAcceptor.java[] |
| ---- |
| |
| The `RemoteAcceptor` can be bound to a `GremlinPlugin` by adding a `ConsoleCustomizer` implementation to the list of |
| `Customizer` instances that are returned from the `GremlinPlugin`. The `ConsoleCustomizer` will only be executed when |
| in use with the Gremlin Console plugin host. Simply instantiate and return a `RemoteAcceptor` in the |
| `ConsoleCustomizer.getRemoteAcceptor(GremlinShellEnvironment)` method. Generally speaking, each call to |
| `getRemoteAcceptor(GremlinShellEnvironment)` should produce a new instance of a `RemoteAcceptor`. |
| |
| TIP: Be good to the users of plugins and prevent dependency conflicts. Maintaining a conflict free plugin is most |
| easily done by using the link:http://maven.apache.org/enforcer/maven-enforcer-plugin/[Maven Enforcer Plugin]. |
| |
| TIP: Consider binding the plugin's minor version to the TinkerPop minor version so that it's easy for users to figure |
| out plugin compatibility. Otherwise, clearly document a compatibility matrix for the plugin somewhere that users can |
| find it. |
| |
| include::gremlin-semantics.asciidoc[] |
| |
| include::policies.asciidoc[] |