| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| [[tinkergraph-gremlin]] |
| == TinkerGraph-Gremlin |
| |
| [source,xml] |
| ---- |
| <dependency> |
| <groupId>org.apache.tinkerpop</groupId> |
| <artifactId>tinkergraph-gremlin</artifactId> |
| <version>x.y.z</version> |
| </dependency> |
| ---- |
| |
| image:tinkerpop-character.png[width=100,float=left] TinkerGraph is a single machine, in-memory (with optional |
| persistence), non-transactional graph engine that provides both OLTP and OLAP functionality. It is deployed with |
| TinkerPop and serves as the reference implementation for other providers to study in order to understand the |
| semantics of the various methods of the TinkerPop API. Its status as a reference implementation does not however imply |
| that it is not suitable for production. TinkerGraph has many practical use cases in production applications and their |
| development. Some examples of TinkerGraph use cases include: |
| |
| * Ad-hoc analysis of large immutable graphs that fit in memory. |
| * Extract subgraphs, from larger graphs that don't fit in memory, into TinkerGraph for further analysis or other |
| purposes. |
| * Use TinkerGraph as a sandbox to develop and debug complex traversals by simulating data from a larger graph inside |
| a TinkerGraph. |
| |
| Constructing a simple graph using TinkerGraph in Java is presented below: |
| |
| [source,java] |
| ---- |
| Graph graph = TinkerGraph.open(); |
| GraphTraversalSource g = traversal().withEmbedded(graph); |
| Vertex marko = g.addV("person").property("name","marko").property("age",29).next(); |
| Vertex lop = g.addV("software").property("name","lop").property("lang","java").next(); |
| g.addE("created").from(marko).to(lop).property("weight",0.6d).iterate(); |
| ---- |
| |
| The above Gremlin creates two vertices named "marko" and "lop" and connects them via a created-edge with a weight=0.6 |
| property. The addition of these two vertices and the edge between them could also be done in a single Gremlin statement |
| as follows: |
| |
| [source,java] |
| ---- |
| g.addV("person").property("name","marko").property("age",29).as("m"). |
| addV("software").property("name","lop").property("lang","java").as("l"). |
| addE("created").from("m").to("l").property("weight",0.6d).iterate(); |
| ---- |
| |
| IMPORTANT: Pay attention to the fact that traversals end with `next()` or `iterate()`. These methods advance the |
| objects in the traversal stream and without those methods, the traversal does nothing. Review the |
| link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/the-gremlin-console/#result-iteration[Result Iteration Section] |
| of The Gremlin Console tutorial for more information. |
| |
| Next, the graph can be queried as such. |
| |
| [source,java] |
| g.V().has("name","marko").out("created").values("name") |
| |
| The `g.V().has("name","marko")` part of the query can be executed in two ways. |
| |
| * A linear scan of all vertices filtering out those vertices that don't have the name "marko" |
| * A `O(log(|V|))` index lookup for all vertices with the name "marko" |
| |
| Given the initial graph construction in the first code block, no index was defined and thus, a linear scan is executed. |
| However, if the graph was constructed as such, then an index lookup would be used. |
| |
| [source,java] |
| Graph g = TinkerGraph.open(); |
| g.createIndex("name",Vertex.class) |
| |
| The execution times for a vertex lookup by property is provided below for both no-index and indexed version of |
| TinkerGraph over the Grateful Dead graph. |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerGraph.open() |
| g = traversal().withEmbedded(graph) |
| g.io('data/grateful-dead.xml').read().iterate() |
| clock(1000) {g.V().has('name','Garcia').iterate()} <1> |
| graph = TinkerGraph.open() |
| g = traversal().withEmbedded(graph) |
| graph.createIndex('name',Vertex.class) |
| g.io('data/grateful-dead.xml').read().iterate() |
| clock(1000){g.V().has('name','Garcia').iterate()} <2> |
| ---- |
| |
| <1> Determine the average runtime of 1000 vertex lookups when no `name`-index is defined. |
| <2> Determine the average runtime of 1000 vertex lookups when a `name`-index is defined. |
| |
| IMPORTANT: Each graph system will have different mechanism by which indices and schemas are defined. TinkerPop |
| does not require any conformance in this area. In TinkerGraph, the only definitions are around indices. With other |
| graph systems, property value types, indices, edge labels, etc. may be required to be defined _a priori_ to adding |
| data to the graph. |
| |
| NOTE: TinkerGraph is distributed with Gremlin Server and is therefore automatically available to it for configuration. |
| |
| === Data Types |
| |
| TinkerGraph can store any Java `Object` for a property value. It is therefore important to take note of the types of |
| the values that are being used and it is often best to be explicit in terms of exactly what type is being used, |
| especially in the case of numbers. |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerGraph.open() |
| g = traversal().withEmbedded(graph) |
| g.addV().property('vp2',0.65780294) |
| g.addV().property('vp2',0.65780294f) |
| g.addV().property('vp2',0.65780294d) |
| g.V().has('vp2',0.65780294) <1> |
| g.V().has('vp2',0.65780294f) <2> |
| g.V().has('vp2',0.65780294d) <3> |
| ---- |
| |
| <1> In Gremlin Console, `0.65780294` actually evaluates to a `BigDecimal`, which won't match the specifically typed |
| `float` property value. |
| <2> The explicit `float` will only match the `float` property value. |
| <3> The explicit `double` will only match the `double` and `BigDecimal` values. |
| |
| Unlike other graphs, the above demonstration shows that TinkerGraph does not do any form of type coercion (except for |
| type coercion related to element identifiers as described in the <<next section,tinkergraph-configuration>>). |
| |
| [[tinkergraph-configuration]] |
| === Configuration |
| |
| TinkerGraph has several settings that can be provided on creation via `Configuration` object: |
| |
| [width="100%",cols="2,10",options="header"] |
| |========================================================= |
| |Property |Description |
| |gremlin.graph |`org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph` |
| |gremlin.tinkergraph.vertexIdManager |The `IdManager` implementation to use for vertices. |
| |gremlin.tinkergraph.edgeIdManager |The `IdManager` implementation to use for edges. |
| |gremlin.tinkergraph.vertexPropertyIdManager |The `IdManager` implementation to use for vertex properties. |
| |gremlin.tinkergraph.defaultVertexPropertyCardinality |The default `VertexProperty.Cardinality` to use when `Vertex.property(k,v)` is called. |
| |gremlin.tinkergraph.allowNullPropertyValues |A boolean value that determines whether or not `null` property values are allowed and defaults to `false`. |
| |gremlin.tinkergraph.graphLocation |The path and file name for where TinkerGraph should persist the graph data. If a |
| value is specified here, the `gremlin.tinkergraph.graphFormat` should also be specified. If this value is not |
| included (default), then the graph will stay in-memory and not be loaded/persisted to disk. |
| |gremlin.tinkergraph.graphFormat |The format to use to serialize the graph which may be one of the following: |
| `graphml`, `graphson`, `gryo`, or a fully qualified class name that implements Io.Builder interface (which allows for |
| external third party graph reader/writer formats to be used for persistence). |
| If a value is specified here, then the `gremlin.tinkergraph.graphLocation` should |
| also be specified. If this value is not included (default), then the graph will stay in-memory and not be |
| loaded/persisted to disk. |
| |========================================================= |
| |
| NOTE: To use <<tinkergraph-gremlin-tx, transactions>>, configure `gremlin.graph` as |
| `org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerTransactionGraph`. |
| |
| The `IdManager` settings above refer to how TinkerGraph will control identifiers for vertices, edges and vertex |
| properties. There are several options for each of these settings: `ANY`, `LONG`, `INTEGER`, `UUID`, or the fully |
| qualified class name of an `IdManager` implementation on the classpath. When not specified, the default values |
| for all settings is `ANY`, meaning that the graph will work with any object on the JVM as the identifier and will |
| generate new identifiers from `Long` when the identifier is not user supplied. TinkerGraph will also expect the |
| user to understand the types used for identifiers when querying, meaning that `g.V(1)` and `g.V(1L)` could return |
| two different vertices. `LONG`, `INTEGER` and `UUID` settings will try to coerce identifier values to the expected |
| type as well as generate new identifiers with that specified type. |
| |
| TIP: Setting the `IdManager` to `ANY` also allows `String` type ID values to be used. |
| |
| If the TinkerGraph is configured for persistence with `gremlin.tinkergraph.graphLocation` and |
| `gremlin.tinkergraph.graphFormat`, then the graph will be written to the specified location with the specified |
| format when `Graph.close()` is called. In addition, if these settings are present, TinkerGraph will attempt to |
| load the graph from the specified location. |
| |
| IMPORTANT: If choosing `graphson` as the `gremlin.tinkergraph.graphFormat`, be sure to also establish the various |
| `IdManager` settings as well to ensure that identifiers are properly coerced to the appropriate types as GraphSON |
| can lose the identifier's type during serialization (i.e. it will assume `Integer` when the default for TinkerGraph |
| is `Long`, which could lead to load errors that result in a message like, "Vertex with id already exists"). |
| |
| It is important to consider the data being imported to TinkerGraph with respect to `defaultVertexPropertyCardinality` |
| setting. For example, if a `.gryo` file is known to contain multi-property data, be sure to set the default |
| cardinality to `list` or else the data will import as `single`. Consider the following: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerGraph.open() |
| g = traversal().withEmbedded(graph) |
| g.io("data/tinkerpop-crew.kryo").read().iterate() |
| g.V().properties() |
| conf = new BaseConfiguration() |
| conf.setProperty("gremlin.tinkergraph.defaultVertexPropertyCardinality","list") |
| graph = TinkerGraph.open(conf) |
| g = traversal().withEmbedded(graph) |
| g.io("data/tinkerpop-crew.kryo").read().iterate() |
| g.V().properties() |
| ---- |
| |
| [[tinkergraph-gremlin-tx]] |
| === Transactions |
| |
| `TinkerGraph` includes optional transaction support and thread-safety through the `TinkerTransactionGraph` class. |
| The default configuration of TinkerGraph remains non-transactional. |
| |
| NOTE: This feature was first made available in TinkerPop 3.7.0. |
| |
| ==== Transaction Semantics |
| |
| `TinkerTransactionGraph` only has support for `ThreadLocal` transactions, so embedded graph transactions may not be fully |
| supported. You can think of the transaction as belonging to a thread, any traversals executed within the same thread |
| will share the same transaction even if you attempt to start a new transaction. |
| |
| `TinkerTransactionGraph` provides the `read committed` transaction isolation level. This means that it will always try to |
| guard against dirty reads. While you may notice stricter isolation semantics in some cases, you should not depend on |
| this behavior as it may change in the future. |
| |
| `TinkerTransactionGraph` employs optimistic locking as its locking strategy. This reduces complexity in the design as |
| there are fewer timeouts that the user needs to manage. However, a consequence of this approach is that a transaction |
| will throw a `TransactionException` if two different transactions attempt to lock the same element (see "Best Practices" |
| below). |
| |
| [[testing-remote-providers]] |
| ==== Testing Remote Providers |
| |
| These transaction semantics described above may not fit use cases for some production scenarios that require strict |
| ACID-like transactions. Therefore, it is recommended that `TinkerTransactionGraph` be used as a `Graph` for test |
| environments where you still require access to a `Graph` that supports transactions. `TinkerTransactionGraph` does fully |
| support TinkerPop's `Transaction` interface which still makes it a useful `Graph` for exploring the |
| <<transactions,Transaction API>>. |
| |
| A common scenario where this sort of testing is helpful is with <<connecting-rgp, Remote Graph Providers>>, where |
| developing unit tests might be hard against a graph service. Instead, configure `TinkerTransactionGraph`, either in an |
| embedded style if using Java or with Gremlin Server for other cases. |
| |
| [source,java] |
| ---- |
| // consider this class that returns the results of some Gremlin. by constructing the |
| // GraphService in a way that takes a GraphTraversalSource it becomes possible to |
| // execute getPersons() under any graph system. |
| public class GraphService { |
| private final GraphTraversalSource g; |
| |
| public GraphService(GraphTraversalSource g) { |
| this.g = g; |
| } |
| |
| public List<Vertex> getPersons() { |
| return g.V().hasLabel("person").toList(); |
| } |
| } |
| |
| // when writing tests for the GraphService it becomes possible to configure the test |
| // to run in a variety of scenarios. here we decide that TinkerTransactionGraph is a |
| // suitable test graph replacement for our actual production graph. |
| public class GraphServiceTest { |
| private static final TinkerTransactionGraph graph = TinkerTransactionGraph().open(); |
| private static final GraphTraversalSource g = traversal.withEmbedded(graph); |
| private static final GraphService service = new GraphService(g); |
| |
| @Test |
| public void shouldGetPersons() { |
| final List<Vertex> persons = service.getPersons(); |
| assertEquals(6, persons.size()); |
| } |
| } |
| |
| // or perhaps, since we're using a remote graph provider, we feel it would be better to |
| // start Gremlin Server with a TinkerTransactionGraph configured using a docker container, |
| // embedding it directly in our tests or running it as a separate process like: |
| // |
| // bin/gremlin-server.sh conf/gremlin-server-transaction.yaml |
| // |
| // and then connect to it with a driver in more of an integration test style. obviously, |
| // with this approach you could also configure your production graph directly or use custom |
| // build options to trigger different test configurations for a more dynamic approach |
| public class GraphServiceTest { |
| private static final GraphTraversalSource g = traversal.withRemote( |
| new DriverRemoteConnection('ws://localhost:8182/gremlin')); |
| private static final GraphService service = new GraphService(g); |
| |
| @Test |
| public void shouldGetPersons() { |
| final List<Vertex> persons = service.getPersons(); |
| assertEquals(6, persons.size()); |
| } |
| } |
| ---- |
| |
| WARNING: There can be subtle behavioral differences between TinkerGraph and the graph ultimately intended for use. |
| Be aware of the differences when writing tests to ensure that you are testing behaviors of your applications |
| appropriately. |
| |
| ==== Best Practices |
| |
| Errors can occur before a transaction gets committed. Specifically for `TinkerTransactionGraph`, you may encounter many |
| `TransactionException` errors in a highly concurrent environment due its optimistic approach to locking. Users should |
| follow the try-catch-rollback pattern described in the |
| link:https://tinkerpop.apache.org/docs/x.y.z/reference/#transactions[transactions] section in combination with |
| exponential backoff based retries to mitigate this issue. |
| |
| ==== Performance Considerations |
| |
| While transactions impose minimal impact for mutating workloads, users should expect performance degradation for |
| read-only work relative to the non-transactional configuration. However, its approach to locking |
| (write-only, optimistic) and its in-memory nature, TinkerTransactionGraph is likely faster than other `Graph` |
| implementations that support transactions. |
| |
| ==== Examples |
| |
| Constructing a simple graph using `TinkerTransactionGraph` in Java is presented below: |
| |
| [source,java] |
| ---- |
| Graph graph = TinkerTransactionGraph.open(); |
| g = traversal().withEmbedded(graph) |
| GraphTraversalSource gtx = g.tx().begin(); |
| |
| try { |
| Vertex marko = gtx.addV("person").property("name","marko").property("age",29).next(); |
| Vertex lop = gtx.addV("software").property("name","lop").property("lang","java").next(); |
| gtx.addE("created").from(marko).to(lop).property("weight",0.6d).iterate(); |
| |
| gtx.tx().commit(); |
| } catch (Exception ex) { |
| gtx.tx().rollback(); |
| } |
| ---- |
| |
| The above Gremlin creates two vertices named "marko" and "lop" and connects them via a created-edge with a weight=0.6 |
| property. In case of any errors `rollback()` will be called and no changes will be performed. |
| |
| To use the embedded TinkerTransactionGraph in Gremlin Console: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerTransactionGraph.open() <1> |
| g = traversal().withEmbedded(graph) <2> |
| g.addV('test').property('name','one') |
| g.tx().commit() <3> |
| g.V().valueMap() |
| g.addV('test').property('name','two') <4> |
| g.V().valueMap() |
| g.tx().rollback() <5> |
| g.V().valueMap() |
| ---- |
| |
| <1> Open transactional graph. |
| <2> Spawn a GraphTraversalSource with transactional graph. |
| <3> Commit the add vertex operation |
| <4> Add a second vertex without committing |
| <5> Rollback the change |