////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
[[tinkergraph-gremlin]]
== TinkerGraph-Gremlin
[source,xml]
----
<dependency>
   <groupId>org.apache.tinkerpop</groupId>
   <artifactId>tinkergraph-gremlin</artifactId>
   <version>x.y.z</version>
</dependency>
----
image:tinkerpop-character.png[width=100,float=left] TinkerGraph is a single-machine, in-memory (with optional
persistence), non-transactional (by default) graph engine that provides both OLTP and OLAP functionality. It is
deployed with TinkerPop and serves as the reference implementation for other providers to study in order to understand
the semantics of the various methods of the TinkerPop API. Its status as a reference implementation does not, however,
imply that it is unsuitable for production. TinkerGraph has many practical use cases in production applications and
in their development. Some examples of TinkerGraph use cases include:

* Ad-hoc analysis of large immutable graphs that fit in memory.
* Extracting subgraphs from larger graphs that don't fit in memory into TinkerGraph for further analysis or other
purposes.
* Using TinkerGraph as a sandbox to develop and debug complex traversals by simulating data from a larger graph inside
a TinkerGraph.

Constructing a simple graph using TinkerGraph in Java is presented below:

[source,java]
----
Graph graph = TinkerGraph.open();
GraphTraversalSource g = traversal().withEmbedded(graph);
Vertex marko = g.addV("person").property("name","marko").property("age",29).next();
Vertex lop = g.addV("software").property("name","lop").property("lang","java").next();
g.addE("created").from(marko).to(lop).property("weight",0.6d).iterate();
----
The above Gremlin creates two vertices named "marko" and "lop" and connects them with a `created` edge that has a
`weight` property of 0.6. The addition of these two vertices and the edge between them could also be done in a single
Gremlin statement as follows:

[source,java]
----
g.addV("person").property("name","marko").property("age",29).as("m").
  addV("software").property("name","lop").property("lang","java").as("l").
  addE("created").from("m").to("l").property("weight",0.6d).iterate();
----
IMPORTANT: Note that these traversals end with `next()` or `iterate()`. These methods advance the objects in the
traversal stream; without them, the traversal does nothing. Review the
link:https://tinkerpop.apache.org/docs/x.y.z/tutorials/the-gremlin-console/#result-iteration[Result Iteration Section]
of The Gremlin Console tutorial for more information.

Next, the graph can be queried as follows:

[source,java]
----
g.V().has("name","marko").out("created").values("name")
----

The `g.V().has("name","marko")` part of the query can be executed in one of two ways:

* A linear scan of all vertices, filtering out those vertices that don't have the name "marko".
* An `O(log(|V|))` index lookup for all vertices with the name "marko".

Given the graph construction in the first code block, no index was defined and thus a linear scan is executed.
However, if the graph is constructed as follows, an index lookup is used instead:

[source,java]
----
TinkerGraph graph = TinkerGraph.open();
graph.createIndex("name", Vertex.class);
----

The execution times for a vertex lookup by property are provided below for both the non-indexed and the indexed
versions of TinkerGraph over the Grateful Dead graph.

[gremlin-groovy]
----
graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
g.io('data/grateful-dead.xml').read().iterate()
clock(1000) {g.V().has('name','Garcia').iterate()} <1>
graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
graph.createIndex('name',Vertex.class)
g.io('data/grateful-dead.xml').read().iterate()
clock(1000){g.V().has('name','Garcia').iterate()} <2>
----
<1> Determine the average runtime of 1000 vertex lookups when no `name`-index is defined.
<2> Determine the average runtime of 1000 vertex lookups when a `name`-index is defined.
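
The difference between the two lookup strategies can be illustrated with plain Java. The following is a minimal
sketch, not TinkerGraph's implementation: the class `PropertyIndexSketch` and its methods are hypothetical, vertices
are modeled as maps of property keys to values, and a `HashMap` stands in for the index purely for simplicity.

[source,java]
----
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy vertex store contrasting a linear scan with an index lookup.
// This is an illustration only and does not reflect TinkerGraph's internals.
class PropertyIndexSketch {
    // vertex id (list position) -> property map
    private final List<Map<String, Object>> vertices = new ArrayList<>();
    // indexed property value -> ids of the vertices holding that value
    private final Map<Object, List<Integer>> index = new HashMap<>();
    private final String indexedKey;

    PropertyIndexSketch(String indexedKey) {
        this.indexedKey = indexedKey;
    }

    int addVertex(Map<String, Object> properties) {
        int id = vertices.size();
        vertices.add(properties);
        Object value = properties.get(indexedKey);
        if (value != null) {
            // keep the index up to date on every insert
            index.computeIfAbsent(value, v -> new ArrayList<>()).add(id);
        }
        return id;
    }

    // no index: every vertex must be inspected
    List<Integer> linearScan(String key, Object value) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < vertices.size(); id++) {
            if (value.equals(vertices.get(id).get(key))) {
                ids.add(id);
            }
        }
        return ids;
    }

    // indexed key: jump straight to the matching vertices, otherwise fall back to a scan
    List<Integer> indexLookup(String key, Object value) {
        if (!key.equals(indexedKey)) {
            return linearScan(key, value);
        }
        return List.copyOf(index.getOrDefault(value, List.of()));
    }

    public static void main(String[] args) {
        PropertyIndexSketch graph = new PropertyIndexSketch("name");
        graph.addVertex(Map.of("name", "marko", "age", 29));
        graph.addVertex(Map.of("name", "lop", "lang", "java"));
        System.out.println(graph.linearScan("name", "marko"));   // [0]
        System.out.println(graph.indexLookup("name", "marko"));  // [0]
    }
}
----

Both calls return the same identifiers; the indexed path simply avoids touching vertices whose `name` does not match,
which is the kind of work the benchmark above measures.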

IMPORTANT: Each graph system has its own mechanism for defining indices and schemas. TinkerPop
does not require any conformance in this area. In TinkerGraph, the only definitions are around indices. With other
graph systems, property value types, indices, edge labels, etc. may be required to be defined _a priori_ to adding
data to the graph.

NOTE: TinkerGraph is distributed with Gremlin Server and is therefore automatically available to it for configuration.

=== Data Types
TinkerGraph can store any Java `Object` as a property value. It is therefore important to take note of the types of
the values being used, and it is often best to be explicit about exactly which type is used, especially in the
case of numbers.

[gremlin-groovy]
----
graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
g.addV().property('vp2',0.65780294)
g.addV().property('vp2',0.65780294f)
g.addV().property('vp2',0.65780294d)
g.V().has('vp2',0.65780294) <1>
g.V().has('vp2',0.65780294f) <2>
g.V().has('vp2',0.65780294d) <3>
----
<1> In Gremlin Console, `0.65780294` actually evaluates to a `BigDecimal`, which won't match the specifically typed
`float` property value.
<2> The explicit `float` will only match the `float` property value.
<3> The explicit `double` will only match the `double` and `BigDecimal` values.

Unlike some other graphs, TinkerGraph does not perform any form of type coercion, as the above demonstration shows
(except for type coercion related to element identifiers as described in the <<tinkergraph-configuration,next section>>).

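
The mismatch between the `float` and `double` values is not specific to TinkerGraph. The plain-Java sketch below
(the class name `NumericLiteralSketch` is hypothetical) shows that the same decimal literal produces different stored
values depending on its type, so a lookup that does not coerce types cannot match them.

[source,java]
----
// Plain JDK illustration, not TinkerGraph code: one literal, two different stored values.
class NumericLiteralSketch {
    // compares a float and a double by numeric value after widening the float
    static boolean sameValue(float f, double d) {
        return (double) f == d;
    }

    public static void main(String[] args) {
        Object asFloat = 0.65780294f;   // boxed as java.lang.Float
        Object asDouble = 0.65780294d;  // boxed as java.lang.Double

        // equals() on boxed numbers also requires the same class
        System.out.println(asFloat.equals(asDouble));             // false
        // a float keeps roughly 7 significant digits, so the widened value differs
        System.out.println(sameValue(0.65780294f, 0.65780294d));  // false
        // 0.5 is exactly representable in both types
        System.out.println(sameValue(0.5f, 0.5d));                // true
        // shows the value the float actually holds
        System.out.println((double) 0.65780294f);
    }
}
----
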
[[tinkergraph-configuration]]
=== Configuration
TinkerGraph has several settings that can be provided on creation via a `Configuration` object:

[width="100%",cols="2,10",options="header"]
|=========================================================
|Property |Description
|gremlin.graph |`org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph`
|gremlin.tinkergraph.vertexIdManager |The `IdManager` implementation to use for vertices.
|gremlin.tinkergraph.edgeIdManager |The `IdManager` implementation to use for edges.
|gremlin.tinkergraph.vertexPropertyIdManager |The `IdManager` implementation to use for vertex properties.
|gremlin.tinkergraph.defaultVertexPropertyCardinality |The default `VertexProperty.Cardinality` to use when `Vertex.property(k,v)` is called.
|gremlin.tinkergraph.allowNullPropertyValues |A boolean value that determines whether or not `null` property values are allowed and defaults to `false`.
|gremlin.tinkergraph.graphLocation |The path and file name for where TinkerGraph should persist the graph data. If a
value is specified here, the `gremlin.tinkergraph.graphFormat` should also be specified. If this value is not
included (default), then the graph will stay in-memory and not be loaded/persisted to disk.
|gremlin.tinkergraph.graphFormat |The format to use to serialize the graph, which may be one of the following:
`graphml`, `graphson`, `gryo`, or a fully qualified class name that implements the `Io.Builder` interface (which
allows external third-party graph reader/writer formats to be used for persistence).
If a value is specified here, then the `gremlin.tinkergraph.graphLocation` should
also be specified. If this value is not included (default), then the graph will stay in-memory and not be
loaded/persisted to disk.
|=========================================================
NOTE: To use <<tinkergraph-gremlin-tx, transactions>>, configure `gremlin.graph` as
`org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerTransactionGraph`.
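
As a sketch of how the settings in the table fit together, the following plain-Java helper (hypothetical, not part of
TinkerGraph) collects them into a `java.util.Properties` object, the same key/value form used by `.properties`
configuration files, and enforces the rule that `gremlin.tinkergraph.graphLocation` and
`gremlin.tinkergraph.graphFormat` are given together. The `LONG` id managers and the file path are arbitrary example
choices.

[source,java]
----
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper assembling TinkerGraph settings; TinkerGraph does not ship this class.
class TinkerGraphSettingsSketch {
    static Properties settings(String location, String format) {
        // persistence needs both a location and a format, or neither
        if ((location == null) != (format == null)) {
            throw new IllegalArgumentException(
                "graphLocation and graphFormat should be specified together");
        }
        Properties props = new Properties();
        props.setProperty("gremlin.graph",
            "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph");
        props.setProperty("gremlin.tinkergraph.vertexIdManager", "LONG");
        props.setProperty("gremlin.tinkergraph.edgeIdManager", "LONG");
        if (location != null) {
            props.setProperty("gremlin.tinkergraph.graphLocation", location);
            props.setProperty("gremlin.tinkergraph.graphFormat", format);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // example path only; any writable location would do
        settings("/tmp/tinkergraph.kryo", "gryo").store(out, "TinkerGraph settings");
        System.out.print(out);
    }
}
----

Passing `null` for both arguments yields an in-memory configuration with no persistence keys at all.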

The `IdManager` settings above control how TinkerGraph handles identifiers for vertices, edges and vertex
properties. There are several options for each of these settings: `ANY`, `LONG`, `INTEGER`, `UUID`, or the fully
qualified class name of an `IdManager` implementation on the classpath. When not specified, the default value
for each setting is `ANY`, meaning that the graph will work with any object on the JVM as the identifier and will
generate new identifiers from `Long` when the identifier is not user supplied. TinkerGraph will also expect the
user to understand the types used for identifiers when querying, meaning that `g.V(1)` and `g.V(1L)` could return
two different vertices. The `LONG`, `INTEGER` and `UUID` settings will try to coerce identifier values to the expected
type as well as generate new identifiers with that specified type.

TIP: Setting the `IdManager` to `ANY` also allows `String` type ID values to be used.
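
The difference between `ANY` and a coercing setting such as `LONG` can be sketched with plain Java collections. This
is a hypothetical illustration rather than TinkerGraph's `IdManager` code: under `ANY`-like behavior an `Integer` 1 and
a `Long` 1 are distinct identifiers, while a `LONG`-like coercion maps both (and the string "1") to the same key.

[source,java]
----
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of identifier handling, not TinkerGraph's IdManager code.
class IdCoercionSketch {
    // LONG-like coercion: numbers and numeric strings become Long
    static Long toLong(Object id) {
        if (id instanceof Long) return (Long) id;
        if (id instanceof Number) return ((Number) id).longValue();
        if (id instanceof String) return Long.parseLong((String) id);
        throw new IllegalArgumentException("cannot coerce " + id + " to Long");
    }

    // ANY-like behavior: identifiers are compared exactly as supplied
    static int distinctAnyIds(Object... ids) {
        return new HashSet<>(Arrays.asList(ids)).size();
    }

    // LONG-like behavior: identifiers are coerced before they are compared
    static int distinctLongIds(Object... ids) {
        Set<Long> coerced = new HashSet<>();
        for (Object id : ids) {
            coerced.add(toLong(id));
        }
        return coerced.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctAnyIds(1, 1L));        // 2: Integer 1 and Long 1 differ
        System.out.println(distinctLongIds(1, 1L, "1"));  // 1: all coerce to Long 1
    }
}
----
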

If TinkerGraph is configured for persistence with `gremlin.tinkergraph.graphLocation` and
`gremlin.tinkergraph.graphFormat`, then the graph will be written to the specified location in the specified
format when `Graph.close()` is called. In addition, if these settings are present, TinkerGraph will attempt to
load the graph from the specified location when it is opened.

IMPORTANT: If choosing `graphson` as the `gremlin.tinkergraph.graphFormat`, be sure to also establish the various
`IdManager` settings to ensure that identifiers are properly coerced to the appropriate types, as GraphSON
can lose the identifier's type during serialization (for example, it will assume `Integer` when the default for
TinkerGraph is `Long`, which could lead to load errors that result in a message like, "Vertex with id already exists").

It is important to consider the data being imported to TinkerGraph with respect to the
`defaultVertexPropertyCardinality` setting. For example, if a Gryo file is known to contain multi-property data, be
sure to set the default cardinality to `list` or else the data will import as `single`. Consider the following:

[gremlin-groovy]
----
graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
g.io("data/tinkerpop-crew.kryo").read().iterate()
g.V().properties()
conf = new BaseConfiguration()
conf.setProperty("gremlin.tinkergraph.defaultVertexPropertyCardinality","list")
graph = TinkerGraph.open(conf)
g = traversal().withEmbedded(graph)
g.io("data/tinkerpop-crew.kryo").read().iterate()
g.V().properties()
----
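
The effect of the default cardinality can be modeled with a small, hypothetical Java class (not TinkerGraph's code):
with `single`, a later value for the same key replaces the earlier one, while with `list` every value is kept, which
is what multi-property data such as the repeated `location` values in the crew graph relies on.

[source,java]
----
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one vertex's properties; not TinkerGraph's implementation.
class CardinalitySketch {
    enum Cardinality { SINGLE, LIST }

    private final Map<String, List<Object>> properties = new HashMap<>();
    private final Cardinality defaultCardinality;

    CardinalitySketch(Cardinality defaultCardinality) {
        this.defaultCardinality = defaultCardinality;
    }

    // loosely mirrors Vertex.property(k,v) applying the default cardinality
    void property(String key, Object value) {
        List<Object> values = properties.computeIfAbsent(key, k -> new ArrayList<>());
        if (defaultCardinality == Cardinality.SINGLE) {
            values.clear();  // single: the new value replaces any existing one
        }
        values.add(value);   // under LIST, earlier values are kept
    }

    List<Object> values(String key) {
        return properties.getOrDefault(key, List.of());
    }

    public static void main(String[] args) {
        for (Cardinality c : Cardinality.values()) {
            CardinalitySketch vertex = new CardinalitySketch(c);
            vertex.property("location", "san diego");
            vertex.property("location", "santa cruz");
            vertex.property("location", "brussels");
            System.out.println(c + " -> " + vertex.values("location"));
        }
        // SINGLE -> [brussels]
        // LIST -> [san diego, santa cruz, brussels]
    }
}
----
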
[[tinkergraph-gremlin-tx]]
=== Transactions
`TinkerGraph` includes optional transaction support and thread-safety through the `TinkerTransactionGraph` class.
The default configuration of TinkerGraph remains non-transactional.

NOTE: This feature was first made available in TinkerPop 3.7.0.

==== Transaction Semantics

`TinkerTransactionGraph` only supports `ThreadLocal` transactions, so embedded graph transactions may not be fully
supported. Think of a transaction as belonging to a thread: any traversals executed within the same thread
share the same transaction, even if you attempt to start a new transaction.

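
Thread-bound transaction state of this kind is commonly built on `java.lang.ThreadLocal`. The sketch below is a
hypothetical illustration of the semantics described above, not `TinkerTransactionGraph`'s implementation: calls on
the same thread always see the same pending changes, and another thread gets its own.

[source,java]
----
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of thread-bound transaction state; not TinkerTransactionGraph code.
class ThreadBoundTxSketch {
    // each thread lazily receives its own list of pending (uncommitted) changes
    private static final ThreadLocal<List<String>> PENDING = ThreadLocal.withInitial(ArrayList::new);

    // "starting" a transaction on a thread that already has one returns the existing state
    static List<String> currentTx() {
        return PENDING.get();
    }

    static void addVertex(String name) {
        currentTx().add(name);
    }

    static List<String> commit() {
        List<String> committed = new ArrayList<>(currentTx());
        PENDING.remove();  // the next call on this thread starts a fresh transaction
        return committed;
    }

    public static void main(String[] args) throws InterruptedException {
        addVertex("marko");
        // a separate thread writes into its own, separate transaction
        Thread other = new Thread(() -> addVertex("lop"));
        other.start();
        other.join();
        System.out.println(commit());  // [marko]
    }
}
----
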
`TinkerTransactionGraph` provides the `read committed` transaction isolation level, which means that it will always
try to guard against dirty reads. While you may notice stricter isolation semantics in some cases, you should not
depend on this behavior, as it may change in the future.

`TinkerTransactionGraph` employs optimistic locking as its locking strategy. This simplifies the design, as there
are fewer timeouts for the user to manage. However, a consequence of this approach is that a `TransactionException`
is thrown when two different transactions attempt to lock the same element (see "Best Practices" below).

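
Optimistic locking is often implemented with version checks at commit time. The following hypothetical sketch (class,
method and exception names are illustrative and do not mirror `TinkerTransactionGraph`'s internals) shows the failure
mode described above: two transactions read the same element version, the first commit succeeds, and the second fails
because its version is stale.

[source,java]
----
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical version-based optimistic locking; not TinkerTransactionGraph code.
class OptimisticLockSketch {
    // illustrative stand-in for a conflict error such as TransactionException
    static final class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    // element id -> version number, bumped on every successful commit
    private final ConcurrentHashMap<String, Long> versions = new ConcurrentHashMap<>();

    // a transaction records the version it saw; no lock is held while it works
    long read(String elementId) {
        return versions.computeIfAbsent(elementId, id -> 0L);
    }

    // atomic compare-and-set: succeeds only if nobody changed the element since it was read
    void commit(String elementId, long versionRead) {
        if (!versions.replace(elementId, versionRead, versionRead + 1)) {
            throw new ConflictException(elementId + " was modified by another transaction");
        }
    }

    public static void main(String[] args) {
        OptimisticLockSketch store = new OptimisticLockSketch();
        long txA = store.read("v1");  // both transactions read version 0
        long txB = store.read("v1");
        store.commit("v1", txA);      // first writer wins, version becomes 1
        try {
            store.commit("v1", txB);  // second writer holds a stale version
        } catch (ConflictException e) {
            System.out.println("txB failed: " + e.getMessage());
        }
    }
}
----
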
[[testing-remote-providers]]
==== Testing Remote Providers
The transaction semantics described above may not fit some production scenarios that require strict
ACID-like transactions. Therefore, it is recommended that `TinkerTransactionGraph` be used as a `Graph` in test
environments that still require access to a `Graph` that supports transactions. Because `TinkerTransactionGraph`
fully supports TinkerPop's `Transaction` interface, it remains a useful `Graph` for exploring the
<<transactions,Transaction API>>.

A common scenario where this sort of testing is helpful is with <<connecting-rgp, Remote Graph Providers>>, where
developing unit tests against a hosted graph service can be hard. Instead, configure `TinkerTransactionGraph`, either
in an embedded style if using Java or with Gremlin Server for other cases.

[source,java]
----
import java.util.List;
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerTransactionGraph;
import org.junit.Test;
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;
import static org.junit.Assert.assertEquals;

// consider this class that returns the results of some Gremlin. by constructing the
// GraphService in a way that takes a GraphTraversalSource it becomes possible to
// execute getPersons() under any graph system.
public class GraphService {
    private final GraphTraversalSource g;

    public GraphService(GraphTraversalSource g) {
        this.g = g;
    }

    public List<Vertex> getPersons() {
        return g.V().hasLabel("person").toList();
    }
}

// when writing tests for the GraphService it becomes possible to configure the test
// to run in a variety of scenarios. here we decide that TinkerTransactionGraph is a
// suitable test graph replacement for our actual production graph.
public class GraphServiceTest {
    private static final TinkerTransactionGraph graph = TinkerTransactionGraph.open();
    private static final GraphTraversalSource g = traversal().withEmbedded(graph);
    private static final GraphService service = new GraphService(g);

    @Test
    public void shouldGetPersons() {
        // assumes test data containing six "person" vertices was loaded beforehand
        final List<Vertex> persons = service.getPersons();
        assertEquals(6, persons.size());
    }
}

// or perhaps, since we're using a remote graph provider, we feel it would be better to
// start Gremlin Server with a TinkerTransactionGraph configured using a docker container,
// embedding it directly in our tests or running it as a separate process like:
//
// bin/gremlin-server.sh conf/gremlin-server-transaction.yaml
//
// and then connect to it with a driver in more of an integration test style. obviously,
// with this approach you could also configure your production graph directly or use custom
// build options to trigger different test configurations for a more dynamic approach
public class GraphServiceIntegrationTest {
    private static final GraphTraversalSource g = traversal().withRemote(
            DriverRemoteConnection.using("localhost", 8182, "g"));
    private static final GraphService service = new GraphService(g);

    @Test
    public void shouldGetPersons() {
        final List<Vertex> persons = service.getPersons();
        assertEquals(6, persons.size());
    }
}
----
WARNING: There can be subtle behavioral differences between TinkerGraph and the graph ultimately intended for use.
Be aware of the differences when writing tests to ensure that you are testing behaviors of your applications
appropriately.

==== Best Practices

Errors can occur before a transaction gets committed. Specifically for `TinkerTransactionGraph`, you may encounter many
`TransactionException` errors in a highly concurrent environment due to its optimistic approach to locking. Users should
follow the try-catch-rollback pattern described in the
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#transactions[transactions] section in combination with
retries using exponential backoff to mitigate this issue.
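
A minimal sketch of such a retry loop, assuming a hypothetical helper named `RetryWithBackoff`: the unit of work is
passed as a `Callable`, and only exceptions of the given type are retried, with a randomized delay whose upper bound
doubles on each attempt. With `TinkerTransactionGraph`, the `Callable` would wrap the begin/commit/rollback pattern and
`retryOn` would be `TransactionException`; both are left to the caller here so the sketch stays free of TinkerPop
dependencies.

[source,java]
----
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical retry helper with exponential backoff and full jitter; not a TinkerPop API.
class RetryWithBackoff {
    static <T> T call(Callable<T> work, Class<? extends Exception> retryOn,
                      int maxAttempts, long baseDelayMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (Exception e) {
                // give up on unexpected errors or once the attempt budget is spent
                if (!retryOn.isInstance(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // upper bound grows as base, 2x base, 4x base, ... (shift capped to avoid overflow)
                long cap = baseDelayMillis << Math.min(attempt - 1, 20);
                // "full jitter": sleep a random time between 0 and cap milliseconds
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // fails twice with a simulated conflict, then succeeds
        String result = call(() -> {
            if (++attempts[0] < 3) {
                throw new IllegalStateException("simulated conflict");
            }
            return "committed";
        }, IllegalStateException.class, 5, 10);
        System.out.println(result + " after " + attempts[0] + " attempts");  // committed after 3 attempts
    }
}
----

The random delay keeps concurrent writers that collided once from retrying in lockstep and colliding again.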

==== Performance Considerations

While transactions have minimal impact on mutating workloads, users should expect performance degradation for
read-only work relative to the non-transactional configuration. However, given its approach to locking
(write-only, optimistic) and its in-memory nature, `TinkerTransactionGraph` is likely faster than other `Graph`
implementations that support transactions.

==== Examples

Constructing a simple graph using `TinkerTransactionGraph` in Java is presented below:

[source,java]
----
Graph graph = TinkerTransactionGraph.open();
GraphTraversalSource g = traversal().withEmbedded(graph);
GraphTraversalSource gtx = g.tx().begin();
try {
    Vertex marko = gtx.addV("person").property("name","marko").property("age",29).next();
    Vertex lop = gtx.addV("software").property("name","lop").property("lang","java").next();
    gtx.addE("created").from(marko).to(lop).property("weight",0.6d).iterate();
    gtx.tx().commit();
} catch (Exception ex) {
    // undo every change made within this transaction
    gtx.tx().rollback();
}
----
The above Gremlin creates two vertices named "marko" and "lop" and connects them with a `created` edge that has a
`weight` property of 0.6. If any error occurs, `rollback()` is called and none of the changes are applied.

To use the embedded `TinkerTransactionGraph` in Gremlin Console:

[gremlin-groovy]
----
graph = TinkerTransactionGraph.open() <1>
g = traversal().withEmbedded(graph) <2>
g.addV('test').property('name','one')
g.tx().commit() <3>
g.V().valueMap()
g.addV('test').property('name','two') <4>
g.V().valueMap()
g.tx().rollback() <5>
g.V().valueMap()
----
<1> Open a transactional graph.
<2> Create a `GraphTraversalSource` for the transactional graph.
<3> Commit the vertex addition.
<4> Add a second vertex without committing.
<5> Roll back the uncommitted change.