////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
[[graph]]
The Graph
=========
image::gremlin-standing.png[width=125]
Features
--------
A `Feature` implementation describes the capabilities of a `Graph` instance. This interface is implemented by graph
system providers for two purposes:
. It tells users the capabilities of their `Graph` instance.
. It allows the features they do comply with to be tested against the Gremlin Test Suite (tests for features they do not comply with are "ignored").
The following example in the Gremlin Console shows how to print all the features of a `Graph`:
[gremlin-groovy]
----
graph = TinkerGraph.open()
graph.features()
----
A common pattern for using features is to check their support prior to performing an operation:
[gremlin-groovy]
----
graph.features().graph().supportsTransactions()
graph.features().graph().supportsTransactions() ? g.tx().commit() : "no tx"
----
TIP: To ensure provider agnostic code, always check feature support prior to usage of a particular function. In that
way, the application can behave gracefully in case a particular implementation is provided at runtime that does not
support a function being accessed.
WARNING: Assignments of a `GraphStrategy` can alter the base features of a `Graph` in dynamic ways, such that checks
against a `Feature` may not always reflect the behavior exhibited when the `GraphStrategy` is in use.
[[vertex-properties]]
Vertex Properties
-----------------
image:vertex-properties.png[width=215,float=left] TinkerPop3 introduces the concept of a `VertexProperty<V>`. All the
properties of a `Vertex` are a `VertexProperty`. A `VertexProperty` implements `Property` and as such, it has a
key/value pair. However, `VertexProperty` also implements `Element` and thus, can have a collection of key/value
pairs. Moreover, while an `Edge` can only have one property of key "name" (for example), a `Vertex` can have multiple
"name" properties. With the inclusion of vertex properties, two features are introduced which ultimately advance the
graph modeler's toolkit:
. Multiple properties (*multi-properties*): a vertex property key can have multiple values (i.e. a vertex can have
multiple "name" properties).
. Properties on properties (*meta-properties*): a vertex property can have properties (i.e. a vertex property can
have key/value data associated with it).
A collection of use cases is itemized below:
. *Permissions*: Vertex properties can have key/value ACL-type permission information associated with them.
. *Auditing*: When a vertex property is manipulated, it can have key/value information attached to it saying who the
creator, deleter, etc. are.
. *Provenance*: The "name" of a vertex can be declared by multiple users.
A running example using vertex properties is provided below to demonstrate and explain the API.
[gremlin-groovy]
----
graph = TinkerGraph.open()
g = graph.traversal(standard())
v = g.addV('name','marko','name','marko a. rodriguez').next()
g.V(v).properties().count()
g.V(v).properties('name').count() <1>
g.V(v).properties()
g.V(v).properties('name')
g.V(v).properties('name').hasValue('marko')
g.V(v).properties('name').hasValue('marko').property('acl','private') <2>
g.V(v).properties('name').hasValue('marko a. rodriguez')
g.V(v).properties('name').hasValue('marko a. rodriguez').property('acl','public')
g.V(v).properties('name').has('acl','public').value()
g.V(v).properties('name').has('acl','public').drop() <3>
g.V(v).properties('name').has('acl','public').value()
g.V(v).properties('name').has('acl','private').value()
g.V(v).properties()
g.V(v).properties().properties() <4>
g.V(v).properties().property('date',2014) <5>
g.V(v).properties().property('creator','stephen')
g.V(v).properties().properties()
g.V(v).properties('name').valueMap()
g.V(v).property('name','okram') <6>
g.V(v).properties('name')
g.V(v).values('name') <7>
----
<1> A vertex can have zero or more properties with the same key associated with it.
<2> A vertex property can have standard key/value properties attached to it.
<3> Vertex property removal is identical to property removal.
<4> It is possible to get the properties of a vertex property.
<5> A vertex property can have any number of key/value properties attached to it.
<6> `property(...)` will remove all existing properties with the same key before adding the new single property (see `VertexProperty.Cardinality`).
<7> If only the value of a property is needed, then `values()` can be used.
If the concept of vertex properties is difficult to grasp, then it may be best to think of vertex properties in terms
of "literal vertices." A vertex can have an edge to a "literal vertex" that has a single key/value pair -- e.g.
"value=okram." The edge that points to that literal vertex has an edge-label of "name." The properties on the edge
represent the literal vertex's properties. The "literal vertex" cannot have any other edges to it (only one from the
associated vertex).
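
The multi-property and meta-property structure can also be pictured as a plain data structure. The following is a minimal sketch under stated assumptions, not TinkerPop's implementation: `PropertyModel`, `VProp`, `add` and `setSingle` are hypothetical names introduced only for illustration, and `setSingle` mimics the "replace all values for the key" behavior described in the callouts above.

[source,java]
----
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the properties of one vertex; this is not the TinkerPop API.
final class PropertyModel {

    // One vertex property: a value plus its own key/value pairs (meta-properties).
    static final class VProp {
        final Object value;
        final Map<String, Object> meta = new LinkedHashMap<>();

        VProp(final Object value) {
            this.value = value;
        }
    }

    // Each key maps to a list of vertex properties (multi-properties).
    private final Map<String, List<VProp>> properties = new LinkedHashMap<>();

    // Appends another value under the key, keeping any existing values.
    VProp add(final String key, final Object value) {
        final VProp p = new VProp(value);
        properties.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
        return p;
    }

    // Replaces every existing value under the key with a single new one.
    VProp setSingle(final String key, final Object value) {
        properties.remove(key);
        return add(key, value);
    }

    // Returns the vertex properties stored under the key (empty if there are none).
    List<VProp> get(final String key) {
        return properties.getOrDefault(key, Collections.emptyList());
    }

    public static void main(final String[] args) {
        final PropertyModel v = new PropertyModel();
        v.add("name", "marko").meta.put("acl", "private");
        v.add("name", "marko a. rodriguez").meta.put("acl", "public");
        System.out.println(v.get("name").size());   // prints 2

        v.setSingle("name", "okram");
        System.out.println(v.get("name").size());   // prints 1
    }
}
----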
[[the-crew-toy-graph]]
TIP: A toy graph demonstrating all of the new TinkerPop3 graph structure features is available at
`TinkerFactory.createTheCrew()` and `data/tinkerpop-crew*`. This graph demonstrates multi-properties and meta-properties.
.TinkerPop Crew
image::the-crew-graph.png[width=685]
[gremlin-groovy,theCrew]
----
g.V().as('a').
properties('location').as('b').
hasNot('endTime').as('c').
select('a','b','c').by('name').by(value).by('startTime') // determine the current location of each person
g.V().has('name','gremlin').inE('uses').
order().by('skill',incr).as('a').
outV().as('b').
select('a','b').by('skill').by('name') // rank the users of gremlin by their skill level
----
Graph Variables
---------------
TinkerPop3 introduces the concept of `Graph.Variables`. Variables are key/value pairs associated with the graph
itself -- in essence, a `Map<String,Object>`. These variables are intended to store metadata about the graph. Example
use cases include:
* *Schema information*: What do the namespace prefixes resolve to and when was the schema last modified?
* *Global permissions*: What are the access rights for particular groups?
* *System user information*: Who are the admins of the system?
An example of graph variables in use is presented below:
[gremlin-groovy]
----
graph = TinkerGraph.open()
graph.variables()
graph.variables().set('systemAdmins',['stephen','peter','pavel'])
graph.variables().set('systemUsers',['matthias','marko','josh'])
graph.variables().keys()
graph.variables().get('systemUsers')
graph.variables().get('systemUsers').get()
graph.variables().remove('systemAdmins')
graph.variables().keys()
----
IMPORTANT: Graph variables are not intended to be subject to heavy, concurrent mutation nor to be used in complex
computations. The intention is to have a location to store data about the graph for administrative purposes.
[[transactions]]
Graph Transactions
------------------
image:gremlin-coins.png[width=100,float=right] A link:http://en.wikipedia.org/wiki/Database_transaction[database transaction]
represents a unit of work to execute against the database. Transactions are controlled by an implementation of the
`Transaction` interface and that object can be obtained from the `Graph` interface using the `tx()` method. It is
important to note that the `Transaction` object does not represent a "transaction" itself. It merely exposes the
methods for working with transactions (e.g. committing, rolling back, etc).
Most `Graph` implementations that `supportsTransactions` will implement an "automatic" `ThreadLocal` transaction,
which means that when a read or write occurs after the `Graph` is instantiated a transaction is automatically
started within that thread. There is no need to manually call a method to "create" or "start" a transaction. Simply
modify the graph as required and call `graph.tx().commit()` to apply changes or `graph.tx().rollback()` to undo them.
When the next read or write action occurs against the graph, a new transaction will be started within that current
thread of execution.
When using transactions in this fashion, especially in a web application (e.g. a REST server), it is important to ensure
that transactions do not leak from one request to the next. In other words, unless a client is somehow bound via
session to process every request on the same server thread, every request must be committed or rolled back at the end
of the request. Ensuring that each request encapsulates a transaction guarantees that a future request processed
on a server thread starts in a fresh transactional state and does not have access to the remains of one from an
earlier request. A good strategy is to roll back a transaction at the start of a request, so that if a
transactional leak does somehow occur between requests, the new request is still assured a fresh transaction.
TIP: The `tx()` method is on the `Graph` interface, but it is also available on the `TraversalSource` spawned from a
`Graph`. Calls to `TraversalSource.tx()` are proxied through to the underlying `Graph` as a convenience.
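
The strategy of rolling back at the start of a request and then committing or rolling back at its end can be sketched in plain Java. This is only an illustration under stated assumptions: `RequestScopedTx` is a hypothetical stand-in for the transactional state bound to one server thread, and none of its methods come from the TinkerPop API.

[source,java]
----
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the transactional state bound to one server thread.
// None of these names come from the TinkerPop API.
final class RequestScopedTx {

    private final List<String> pending = new ArrayList<>();     // staged, uncommitted changes
    private final List<String> committed = new ArrayList<>();   // applied changes

    void write(final String change) {
        pending.add(change);
    }

    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    void rollback() {
        pending.clear();
    }

    List<String> committed() {
        return new ArrayList<>(committed);
    }

    // Runs one request inside its own transaction boundary.
    void handleRequest(final Consumer<RequestScopedTx> work) {
        rollback();              // start fresh: drop anything an earlier request leaked
        try {
            work.accept(this);
            commit();            // success: apply the changes of this request
        } catch (RuntimeException e) {
            rollback();          // failure: discard the changes of this request
            throw e;
        }
    }

    public static void main(final String[] args) {
        final RequestScopedTx tx = new RequestScopedTx();
        tx.write("leaked-change");                    // an earlier request never finished its transaction
        tx.handleRequest(t -> t.write("order-1"));
        System.out.println(tx.committed());           // prints [order-1]
    }
}
----

The leading `rollback()` is what protects the new request from the leaked change. The `try`/`catch` ensures the request never leaves its own work behind either.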
Configuring
~~~~~~~~~~~
Determining when a transaction starts is dependent upon the behavior assigned to the `Transaction`. It is up to the
`Graph` implementation to determine the default behavior and unless the implementation doesn't allow it, the behavior
itself can be altered via these `Transaction` methods:
[source,java]
----
public Transaction onReadWrite(final Consumer<Transaction> consumer);
public Transaction onClose(final Consumer<Transaction> consumer);
----
Providing a `Consumer` function to `onReadWrite` allows definition of how a transaction starts when a read or a write
occurs. `Transaction.READ_WRITE_BEHAVIOR` contains pre-defined `Consumer` functions to supply to the `onReadWrite`
method. It has two options:
* `AUTO` - automatic transactions where the transaction is started implicitly by the read or write operation
* `MANUAL` - manual transactions where it is up to the user to explicitly open a transaction, throwing an exception
if the transaction is not open
Providing a `Consumer` function to `onClose` allows configuration of how a transaction is handled when
`Transaction.close()` is called. `Transaction.CLOSE_BEHAVIOR` has several pre-defined options that can be supplied to
this method:
* `COMMIT` - automatically commit an open transaction
* `ROLLBACK` - automatically rollback an open transaction
* `MANUAL` - throw an exception if a transaction is open, forcing the user to explicitly close the transaction
IMPORTANT: As transactions are `ThreadLocal` in nature, so are the transaction configurations for `onReadWrite` and
`onClose`.
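
To make the idea of a `Consumer`-based behavior concrete, the following plain-Java sketch models the two read/write options described above. It is an assumption-laden illustration rather than TinkerPop's source: `ReadWriteBehaviorSketch` and its nested `Tx` class are hypothetical, and only the general shape (a `Consumer` that is run before each read or write) is taken from the text.

[source,java]
----
import java.util.function.Consumer;

// Hypothetical model of pluggable read/write behavior; not the TinkerPop API.
final class ReadWriteBehaviorSketch {

    // Starts a transaction implicitly if none is open.
    static final Consumer<Tx> AUTO = tx -> {
        if (!tx.isOpen()) {
            tx.open();
        }
    };

    // Refuses to read or write unless the caller opened a transaction first.
    static final Consumer<Tx> MANUAL = tx -> {
        if (!tx.isOpen()) {
            throw new IllegalStateException("Open a transaction before attempting to read/write");
        }
    };

    static final class Tx {
        private boolean open;
        private Consumer<Tx> readWrite = AUTO;

        boolean isOpen() {
            return open;
        }

        void open() {
            open = true;
        }

        void commit() {
            open = false;
        }

        // Swaps the behavior that runs before each read or write.
        Tx onReadWrite(final Consumer<Tx> consumer) {
            this.readWrite = consumer;
            return this;
        }

        // Every write first runs the configured behavior.
        void write() {
            readWrite.accept(this);
        }
    }

    public static void main(final String[] args) {
        final Tx tx = new Tx();
        tx.write();                          // AUTO opens the transaction implicitly
        System.out.println(tx.isOpen());     // prints true
        tx.commit();

        tx.onReadWrite(MANUAL);
        try {
            tx.write();                      // MANUAL rejects the write: nothing is open
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
----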
Once there is an understanding of how transactions are configured, most of the rest of the `Transaction` interface
is self-explanatory. Note that <<neo4j-gremlin,Neo4j-Gremlin>> is used for the examples to follow as TinkerGraph does
not support transactions.
[source,groovy]
----
gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
gremlin> graph.features()
==>FEATURES
> GraphFeatures
>-- Transactions: true <1>
>-- Computer: false
>-- Persistence: true
...
gremlin> graph.tx().onReadWrite(Transaction.READ_WRITE_BEHAVIOR.AUTO) <2>
==>org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$Neo4jTransaction@1c067c0d
gremlin> graph.addVertex("name","stephen") <3>
==>v[0]
gremlin> graph.tx().commit() <4>
==>null
gremlin> graph.tx().onReadWrite(Transaction.READ_WRITE_BEHAVIOR.MANUAL) <5>
==>org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$Neo4jTransaction@1c067c0d
gremlin> graph.tx().isOpen()
==>false
gremlin> graph.addVertex("name","marko") <6>
Open a transaction before attempting to read/write the transaction
gremlin> graph.tx().open() <7>
==>null
gremlin> graph.addVertex("name","marko") <8>
==>v[1]
gremlin> graph.tx().commit()
==>null
----
<1> Check `features` to ensure that the graph supports transactions.
<2> By default, `Neo4jGraph` is configured with "automatic" transactions, so it is set here for demonstration purposes only.
<3> When the vertex is added, the transaction is automatically started. From this point, more mutations can be staged
or other read operations executed in the context of that open transaction.
<4> Calling `commit` finalizes the transaction.
<5> Change transaction behavior to require manual control.
<6> Adding a vertex now results in failure because the transaction was not explicitly opened.
<7> Explicitly open a transaction.
<8> Adding a vertex now succeeds as the transaction was manually opened.
NOTE: It may be important to consult the documentation of the `Graph` implementation when it comes to the specifics of
how transactions will behave. TinkerPop allows some latitude in this area and implementations may not have the exact
same behaviors and link:https://en.wikipedia.org/wiki/ACID[ACID] guarantees.
Retries
~~~~~~~
There are times when transactions fail. Failure may be indicative of some permanent condition, but other failures
might simply require the transaction to be retried for possible future success. The `Transaction` object also exposes
a method for executing automatic transaction retries:
[gremlin-groovy]
----
graph = Neo4jGraph.open('/tmp/neo4j')
graph.tx().submit {it.addVertex("name","josh")}.retry(10)
graph.tx().submit {it.addVertex("name","daniel")}.exponentialBackoff(10)
graph.close()
----
As shown above, the `submit` method takes a `Function<Graph, R>` which is the unit of work to execute and possibly
retry on failure. The method returns a `Transaction.Workload` object which has a number of default methods for common
retry strategies. It is also possible to supply a custom retry function if a default one does not suit the required
purpose.
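
As a rough picture of what a retry strategy with exponential backoff does, consider the plain-Java sketch below. It is not the `Transaction.Workload` implementation: `RetrySketch`, its `exponentialBackoff` signature and the choice to double the delay after each failure are assumptions made for illustration.

[source,java]
----
import java.util.function.Supplier;

// Hypothetical retry helper; the names and the doubling policy are illustrative assumptions.
final class RetrySketch {

    // Runs the work up to `tries` times, doubling the delay after each failed attempt.
    static <R> R exponentialBackoff(final Supplier<R> work, final int tries, final long initialDelayMs) {
        if (tries < 1) {
            throw new IllegalArgumentException("tries must be at least 1");
        }
        long delay = initialDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= tries; attempt++) {
            try {
                return work.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == tries) {
                    break;                               // out of attempts: give up
                }
                try {
                    Thread.sleep(delay);                 // wait before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(ie);
                }
                delay *= 2;
            }
        }
        throw last;                                      // surface the final failure
    }

    public static void main(final String[] args) {
        final int[] calls = {0};
        final String result = exponentialBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "committed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");  // prints committed after 3 attempts
    }
}
----

A real unit of work would also need to roll back any partial changes before each new attempt. That step is omitted here to keep the retry loop visible.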
Threaded Transactions
~~~~~~~~~~~~~~~~~~~~~
Most `Graph` implementations that support transactions do so in a `ThreadLocal` manner, where the current transaction
is bound to the current thread of execution. Consider the following example to demonstrate:
[source,java]
----
graph.addVertex("name","stephen");
Thread t1 = new Thread(() -> {
graph.addVertex("name","josh");
});
Thread t2 = new Thread(() -> {
graph.addVertex("name","marko");
});
t1.start();
t2.start();
t1.join();
t2.join();
graph.tx().commit();
----
The above code shows three vertices added to `graph` in three different threads: the current thread, `t1` and
`t2`. One might expect that by the time this body of code finished executing, there would be three vertices
persisted to the `Graph`. However, given the `ThreadLocal` nature of transactions, there really were three separate
transactions created in that body of code (i.e. one for each thread of execution) and the only one committed was the
one containing the first call to `addVertex` in the primary thread of execution. The other two calls to that method
within `t1` and `t2` were never committed and were thus orphaned.
A `Graph` that `supportsThreadedTransactions` is one that allows for a `Graph` to operate outside of that constraint,
thus allowing multiple threads to operate within the same transaction. Therefore, if there was a need to have three
different threads operating within the same transaction, the above code could be re-written as follows:
[source,java]
----
Graph threaded = graph.tx().newThreadedTx();
threaded.addVertex("name","stephen");
Thread t1 = new Thread(() -> {
threaded.addVertex("name","josh");
});
Thread t2 = new Thread(() -> {
threaded.addVertex("name","marko");
});
t1.start();
t2.start();
t1.join();
t2.join();
threaded.tx().commit();
----
In the above case, the call to `graph.tx().newThreadedTx()` creates a new `Graph` instance that is unbound from the
`ThreadLocal` transaction, thus allowing each thread to operate on it in the same context. In this case, there would
be three separate vertices persisted to the `Graph`.
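
The difference between the two examples above can be reproduced without a graph database. The following plain-Java sketch models thread-bound staging with a `ThreadLocal` list and shared staging with a single list handed to every thread. All names (`ThreadedTxSketch`, `Shared`, `runAndJoin`) are hypothetical and the model makes no claim about how any `Graph` implementation stores its transactions.

[source,java]
----
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of thread-bound versus shared transactions; not the TinkerPop API.
final class ThreadedTxSketch {

    private final ThreadLocal<List<String>> perThread = ThreadLocal.withInitial(ArrayList::new);
    private final List<String> committed = Collections.synchronizedList(new ArrayList<>());

    // Thread-bound mode: each thread stages changes in its own private list.
    void addVertex(final String name) {
        perThread.get().add(name);
    }

    // Commits only what the calling thread has staged.
    void commit() {
        committed.addAll(perThread.get());
        perThread.get().clear();
    }

    int committedCount() {
        return committed.size();
    }

    // Shared mode: one staging list that every thread holding this object writes to.
    final class Shared {
        private final List<String> staged = Collections.synchronizedList(new ArrayList<>());

        void addVertex(final String name) {
            staged.add(name);
        }

        void commit() {
            synchronized (staged) {
                committed.addAll(staged);
                staged.clear();
            }
        }
    }

    // Starts both tasks on their own threads and waits for them to finish.
    static void runAndJoin(final Runnable a, final Runnable b) {
        final Thread t1 = new Thread(a);
        final Thread t2 = new Thread(b);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    static int threadBoundDemo() {
        final ThreadedTxSketch graph = new ThreadedTxSketch();
        graph.addVertex("stephen");
        runAndJoin(() -> graph.addVertex("josh"), () -> graph.addVertex("marko"));
        graph.commit();                  // only the calling thread's change is committed
        return graph.committedCount();
    }

    static int sharedDemo() {
        final ThreadedTxSketch graph = new ThreadedTxSketch();
        final Shared threaded = graph.new Shared();
        threaded.addVertex("stephen");
        runAndJoin(() -> threaded.addVertex("josh"), () -> threaded.addVertex("marko"));
        threaded.commit();               // all three changes are committed together
        return graph.committedCount();
    }

    public static void main(final String[] args) {
        System.out.println(threadBoundDemo());   // prints 1
        System.out.println(sharedDemo());        // prints 3
    }
}
----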
Gremlin I/O
-----------
image:gremlin-io.png[width=250,float=right] The task of getting data in and out of `Graph` instances is the job of
the Gremlin I/O packages. Gremlin I/O provides two interfaces for reading and writing `Graph` instances: `GraphReader`
and `GraphWriter`. These interfaces expose methods that support:
* Reading and writing an entire `Graph`
* Reading and writing a `Traversal<Vertex>` as adjacency list format
* Reading and writing a single `Vertex` (with and without associated `Edge` objects)
* Reading and writing a single `Edge`
* Reading and writing a single `VertexProperty`
* Reading and writing a single `Property`
* Reading and writing an arbitrary `Object`
In all cases, these methods operate in the currency of `InputStream` and `OutputStream` objects, allowing graphs and
their related elements to be written to and read from files, byte arrays, etc. The `Graph` interface offers the `io`
method, which provides access to "reader/writer builder" objects that are pre-configured with serializers provided by
the `Graph`, as well as helper methods for the various I/O capabilities. Unless there are very advanced requirements
for the serialization process, it is always best to utilize the methods on the `Io` interface to construct
`GraphReader` and `GraphWriter` instances, as the implementation may provide some custom settings that would otherwise
have to be configured manually by the user to do the serialization.
It is up to the implementations of the `GraphReader` and `GraphWriter` interfaces to choose the methods they
implement and the manner in which they work together. The only semantics enforced and expected is that the write
methods should produce output that is compatible with the corresponding read method (e.g. the output of
`writeVertices` should be readable as input to `readVertices` and the output of `writeProperty` should be readable as
input to `readProperty`).
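
The symmetry expected between a write method and its read counterpart can be illustrated with ordinary Java streams. The sketch below is not a `GraphReader` or `GraphWriter` implementation. `RoundTripSketch` and its simple binary layout are assumptions chosen only to show output from a writer being consumed by the matching reader.

[source,java]
----
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical writer/reader pair; the binary layout is an illustrative assumption.
final class RoundTripSketch {

    // Writes a key/value property as two length-prefixed strings.
    static void writeProperty(final OutputStream out, final String key, final String value) throws IOException {
        final DataOutputStream data = new DataOutputStream(out);
        data.writeUTF(key);
        data.writeUTF(value);
        data.flush();
    }

    // Reads back exactly what writeProperty produced, in the same order.
    static String[] readProperty(final InputStream in) throws IOException {
        final DataInputStream data = new DataInputStream(in);
        return new String[] { data.readUTF(), data.readUTF() };
    }

    // Writes to an in-memory stream and immediately reads the bytes back.
    static String[] roundTrip(final String key, final String value) {
        try {
            final ByteArrayOutputStream out = new ByteArrayOutputStream();
            writeProperty(out, key, value);
            return readProperty(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        final String[] property = roundTrip("name", "marko");
        System.out.println(property[0] + "=" + property[1]);   // prints name=marko
    }
}
----

Because both methods work only with `InputStream` and `OutputStream`, the same pair could be pointed at a file or a network socket without change.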
GraphML Reader/Writer
~~~~~~~~~~~~~~~~~~~~~
image:gremlin-graphml.png[width=350,float=left] The link:http://graphml.graphdrawing.org/[GraphML] file format is a
common XML-based representation of a graph. It is widely supported by graph-related tools and libraries making it a
solid interchange format for TinkerPop. In other words, if the intent is to work with graph data in conjunction with
applications outside of TinkerPop, GraphML may be the best choice to do that. Common use cases might be:
* Generate a graph using link:https://networkx.github.io/[NetworkX], export it with GraphML and import it to TinkerPop.
* Produce a subgraph and export it to GraphML to be consumed by and visualized in link:https://gephi.org/[Gephi].
* Migrate the data of an entire graph to a different graph database not supported by TinkerPop.
As GraphML is a specification for the serialization of an entire graph and not the individual elements of a graph,
methods that support input and output of single vertices, edges, etc. are not supported.
CAUTION: GraphML is a "lossy" format in that it only supports primitive values for properties and does not have
support for `Graph` variables. It will use `toString` to serialize property values outside of those primitives.
CAUTION: GraphML, as a specification, allows for `<edge>` and `<node>` elements to appear in any order. The
`GraphMLReader` will support that, however, that capability comes with a limitation. TinkerPop does not allow the
vertex label to be changed after the vertex has been created. Therefore, if an `<edge>` element comes before the
`<node>` the label on the vertex will be ignored. It is thus better to order `<node>` elements in the GraphML to
appear before all `<edge>` elements if vertex labels are important to the graph.
The following code shows how to write a `Graph` instance to a file called `tinkerpop-modern.xml` and then how to read
that file back into a different instance:
[source,java]
----
final Graph graph = TinkerFactory.createModern();
graph.io(IoCore.graphml()).writeGraph("tinkerpop-modern.xml");
final Graph newGraph = TinkerGraph.open();
newGraph.io(IoCore.graphml()).readGraph("tinkerpop-modern.xml");
----
If a custom configuration is required, then have the `Graph` generate a `GraphReader` or `GraphWriter` "builder" instance:
[source,java]
----
final Graph graph = TinkerFactory.createModern();
try (final OutputStream os = new FileOutputStream("tinkerpop-modern.xml")) {
graph.io(IoCore.graphml()).writer().normalize(true).create().writeGraph(os, graph);
}
final Graph newGraph = TinkerGraph.open();
try (final InputStream stream = new FileInputStream("tinkerpop-modern.xml")) {
newGraph.io(IoCore.graphml()).reader().vertexIdKey("name").create().readGraph(stream, newGraph);
}
----
[[graphson-reader-writer]]
GraphSON Reader/Writer
~~~~~~~~~~~~~~~~~~~~~~
image:gremlin-graphson.png[width=350,float=left] GraphSON is a link:http://json.org/[JSON]-based format extended
from earlier versions of TinkerPop. It is important to note that TinkerPop3's GraphSON is not backwards compatible
with prior TinkerPop GraphSON versions. GraphSON has some support from graph-related applications outside of TinkerPop,
but it is generally best used in two cases:
* A text format of the graph or its elements is desired (e.g. debugging, usage in source control, etc.)
* The graph or its elements need to be consumed by code that is not JVM-based (e.g. JavaScript, Python, .NET, etc.)
GraphSON supports all of the `GraphReader` and `GraphWriter` interface methods and can therefore read or write an
entire `Graph`, vertices, arbitrary objects, etc. The following code shows how to write a `Graph` instance to a file
called `tinkerpop-modern.json` and then how to read that file back into a different instance:
[source,java]
----
final Graph graph = TinkerFactory.createModern();
graph.io(IoCore.graphson()).writeGraph("tinkerpop-modern.json");
final Graph newGraph = TinkerGraph.open();
newGraph.io(IoCore.graphson()).readGraph("tinkerpop-modern.json");
----
If a custom configuration is required, then have the `Graph` generate a `GraphReader` or `GraphWriter` "builder" instance:
[source,java]
----
final Graph graph = TinkerFactory.createModern();
try (final OutputStream os = new FileOutputStream("tinkerpop-modern.json")) {
final GraphSONMapper mapper = graph.io(IoCore.graphson()).mapper().normalize(true).create();
graph.io(IoCore.graphson()).writer().mapper(mapper).create().writeGraph(os, graph);
}
final Graph newGraph = TinkerGraph.open();
try (final InputStream stream = new FileInputStream("tinkerpop-modern.json")) {
newGraph.io(IoCore.graphson()).reader().vertexIdKey("name").create().readGraph(stream, newGraph);
}
----
One of the important configuration options of the `GraphSONReader` and `GraphSONWriter` is the ability to embed type
information into the output. By embedding the types, it becomes possible to serialize a graph without losing type
information that might be important when being consumed by another source. The importance of this concept is
demonstrated in the following example where a single `Vertex` is written to GraphSON using the Gremlin Console:
[gremlin-groovy]
----
graph = TinkerFactory.createModern()
g = graph.traversal()
f = new FileOutputStream("vertex-1.json")
graph.io(graphson()).writer().create().writeVertex(f, g.V(1).next(), BOTH)
f.close()
----
The following GraphSON example shows the output of `GraphSONWriter.writeVertex()` with associated edges:
[source,json]
----
{
"id": 1,
"label": "person",
"outE": {
"created": [
{
"id": 9,
"inV": 3,
"properties": {
"weight": 0.4
}
}
],
"knows": [
{
"id": 7,
"inV": 2,
"properties": {
"weight": 0.5
}
},
{
"id": 8,
"inV": 4,
"properties": {
"weight": 1
}
}
]
},
"properties": {
"name": [
{
"id": 0,
"value": "marko"
}
],
"age": [
{
"id": 1,
"value": 29
}
]
}
}
----
The vertex properly serializes to valid JSON but note that a consuming application will not automatically know how to
interpret the numeric values. In coercing those Java values to JSON, such information is lost.
With a minor change to the construction of the `GraphSONWriter` the lossy nature of GraphSON can be avoided:
[gremlin-groovy]
----
graph = TinkerFactory.createModern()
g = graph.traversal()
f = new FileOutputStream("vertex-1.json")
mapper = graph.io(graphson()).mapper().embedTypes(true).create()
graph.io(graphson()).writer().mapper(mapper).create().writeVertex(f, g.V(1).next(), BOTH)
f.close()
----
In the above code, the `embedTypes` option is set to `true` and the output below shows the difference:
[source,json]
----
{
"@class": "java.util.HashMap",
"id": 1,
"label": "person",
"outE": {
"@class": "java.util.HashMap",
"created": [
"java.util.ArrayList",
[
{
"@class": "java.util.HashMap",
"id": 9,
"inV": 3,
"properties": {
"@class": "java.util.HashMap",
"weight": 0.4
}
}
]
],
"knows": [
"java.util.ArrayList",
[
{
"@class": "java.util.HashMap",
"id": 7,
"inV": 2,
"properties": {
"@class": "java.util.HashMap",
"weight": 0.5
}
},
{
"@class": "java.util.HashMap",
"id": 8,
"inV": 4,
"properties": {
"@class": "java.util.HashMap",
"weight": 1
}
}
]
]
},
"properties": {
"@class": "java.util.HashMap",
"name": [
"java.util.ArrayList",
[
{
"@class": "java.util.HashMap",
"id": [
"java.lang.Long",
0
],
"value": "marko"
}
]
],
"age": [
"java.util.ArrayList",
[
{
"@class": "java.util.HashMap",
"id": [
"java.lang.Long",
1
],
"value": 29
}
]
]
}
}
----
The ambiguity of components of the GraphSON is now removed by the `@class` property, which contains Java class
information for the data it is associated with. The `@class` property is used for all non-final types, with the
exception of a small number of "natural" types (String, Boolean, Integer, and Double) which can be correctly inferred
from JSON typing. While the output is more verbose, it comes with the security of not losing type information. While
non-JVM languages won't be able to consume this information automatically, at least there is a hint as to how the
values should be coerced back into the correct types in the target language.
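
The loss of type information, and the way an embedded class name prevents it, can be shown with a few lines of plain Java. The sketch below does not use the GraphSON serializers. `TypeEmbeddingSketch`, `plain` and `typed` are hypothetical helpers that only imitate the shape of the two outputs above for scalar values, and they do not escape strings.

[source,java]
----
// Hypothetical rendering helpers that imitate the two output styles for scalar values.
// They are not part of GraphSON and perform no string escaping.
final class TypeEmbeddingSketch {

    // Renders a value as plain JSON; the Java type of a number is not recorded.
    static String plain(final Object value) {
        return value instanceof String ? "\"" + value + "\"" : String.valueOf(value);
    }

    // Renders a value with its class name unless JSON typing already implies the type.
    static String typed(final Object value) {
        final boolean natural = value instanceof String || value instanceof Boolean
                || value instanceof Integer || value instanceof Double;
        return natural
                ? plain(value)
                : "[\"" + value.getClass().getName() + "\", " + plain(value) + "]";
    }

    public static void main(final String[] args) {
        System.out.println(plain(1L));     // prints 1
        System.out.println(plain(1));      // prints 1
        System.out.println(typed(1L));     // prints ["java.lang.Long", 1]
        System.out.println(typed(29));     // prints 29
    }
}
----

In the plain form a `Long` and an `Integer` are indistinguishable, which is exactly the ambiguity a consumer faces. In the typed form only the values that JSON cannot describe on its own carry the extra class name.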
[[gryo-reader-writer]]
Gryo Reader/Writer
~~~~~~~~~~~~~~~~~~
image:gremlin-kryo.png[width=400,float=left] link:https://github.com/EsotericSoftware/kryo[Kryo] is a popular
serialization package for the JVM. Gremlin-Kryo is a binary `Graph` serialization format for use on the JVM by JVM
languages. It is designed to be space efficient, non-lossy and is promoted as the standard format to use when working
with graph data inside of the TinkerPop stack. A list of common use cases is presented below:
* Migration from one Gremlin Structure implementation to another (e.g. `TinkerGraph` to `Neo4jGraph`)
* Serialization of individual graph elements to be sent over the network to another JVM.
* Backups of in-memory graphs or subgraphs.
CAUTION: When migrating between Gremlin Structure implementations, Kryo may not lose data, but it is important to
consider the features of each `Graph` and whether or not the data types supported in one will be supported in the
other. Failure to do so may result in errors.
Kryo supports all of the `GraphReader` and `GraphWriter` interface methods and can therefore read or write an entire
`Graph`, vertices, edges, etc. The following code shows how to write a `Graph` instance to a file called
`tinkerpop-modern.kryo` and then how to read that file back into a different instance:
[source,java]
----
final Graph graph = TinkerFactory.createModern();
graph.io(IoCore.gryo()).writeGraph("tinkerpop-modern.kryo");
final Graph newGraph = TinkerGraph.open();
newGraph.io(IoCore.gryo()).readGraph("tinkerpop-modern.kryo");
----
If a custom configuration is required, then have the `Graph` generate a `GraphReader` or `GraphWriter` "builder" instance:
[source,java]
----
final Graph graph = TinkerFactory.createModern();
try (final OutputStream os = new FileOutputStream("tinkerpop-modern.kryo")) {
graph.io(IoCore.gryo()).writer().create().writeGraph(os, graph);
}
final Graph newGraph = TinkerGraph.open();
try (final InputStream stream = new FileInputStream("tinkerpop-modern.kryo")) {
newGraph.io(IoCore.gryo()).reader().vertexIdKey("name").create().readGraph(stream, newGraph);
}
----
NOTE: The preferred extension for file names produced by Gryo is `.kryo`.
TinkerPop2 Data Migration
~~~~~~~~~~~~~~~~~~~~~~~~~
image:data-migration.png[width=300,float=right] For those using TinkerPop2, migrating to TinkerPop3 will mean a number
of programming changes, but may also require a migration of the data depending on the graph implementation. For
example, trying to open `TinkerGraph` data from TinkerPop2 with TinkerPop3 code will not work, however opening a
TinkerPop2 `Neo4jGraph` with a TinkerPop3 `Neo4jGraph` should work provided there aren't Neo4j version compatibility
mismatches preventing the read.
If such a situation arises that a particular TinkerPop2 `Graph` cannot be read by TinkerPop3, a "legacy" data
migration approach exists. The migration involves writing the TinkerPop2 `Graph` to GraphSON, then reading it to
TinkerPop3 with the `LegacyGraphSONReader` (a limited implementation of the `GraphReader` interface).
The following represents an example migration of the "classic" toy graph. In this example, the "classic" graph is
saved to GraphSON using TinkerPop2.
[source,groovy]
----
gremlin> Gremlin.version()
==>2.5.z
gremlin> graph = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> GraphSONWriter.outputGraph(graph,'/tmp/tp2.json',GraphSONMode.EXTENDED)
==>null
----
The above console session uses the `gremlin-groovy` distribution from TinkerPop2. It is important to generate the
`tp2.json` file using the `EXTENDED` mode as it will include data types when necessary which will help limit
"lossiness" on the TinkerPop3 side when imported. Once `tp2.json` is created, it can then be imported to a TinkerPop3
`Graph`.
[source,groovy]
----
gremlin> Gremlin.version()
==>x.y.z
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> r = LegacyGraphSONReader.build().create()
==>org.apache.tinkerpop.gremlin.structure.io.graphson.LegacyGraphSONReader@64337702
gremlin> r.readGraph(new FileInputStream('/tmp/tp2.json'), graph)
==>null
gremlin> g = graph.traversal(standard())
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.E()
==>e[11][4-created->3]
==>e[12][6-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
==>e[9][1-created->3]
==>e[10][4-created->5]
----
Namespace Conventions
---------------------
End users, <<implementations,graph system providers>>, <<graphcomputer,`GraphComputer`>> algorithm designers,
<<gremlin-plugins,GremlinPlugin>> creators, etc. all leverage properties on elements to store information. There are
a few conventions that should be respected when naming property keys to ensure that these stakeholders do not
conflict with one another.
* End users are granted the _flat namespace_ (e.g. `name`, `age`, `location`) to key their properties and label their elements.
* Graph system providers are granted the _hidden namespace_ (e.g. `~metadata`) to key their properties and labels.
Data keyed as such is only accessible via the graph system implementation and no other stakeholders are granted read
or write access to data prefixed with "~" (see `Graph.Hidden`). Test coverage and exceptions exist to ensure that
graph systems respect this hard boundary.
* <<vertexprogram,`VertexProgram`>> and <<mapreduce,`MapReduce`>> developers should, like `GraphStrategy` developers,
leverage _qualified namespaces_ particular to their domain (e.g. `mydomain.myvertexprogram.computedata`).
* `GremlinPlugin` creators should prefix their plugin name with their domain (e.g. `mydomain.myplugin`).
IMPORTANT: TinkerPop uses `tinkerpop.` and `gremlin.` as the prefixes for provided strategies, vertex programs, map
reduce implementations, and plugins.
The only truly protected namespace is the _hidden namespace_ provided to graph systems. From there, it's up to
engineers to respect the namespacing conventions presented.
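
The "~" prefix convention can be enforced with a handful of string helpers. The sketch below is a plain-Java illustration of that convention and is not the `Graph.Hidden` source: the class name `HiddenKeys` and the `validateUserKey` helper are hypothetical.

[source,java]
----
// Hypothetical helpers for the "~" hidden-key convention; not the TinkerPop source.
final class HiddenKeys {

    private static final String HIDDEN_PREFIX = "~";

    // Moves a key into the hidden namespace (no change if it is already hidden).
    static String hide(final String key) {
        return isHidden(key) ? key : HIDDEN_PREFIX + key;
    }

    // Strips the hidden prefix (no change if the key is not hidden).
    static String unHide(final String key) {
        return isHidden(key) ? key.substring(HIDDEN_PREFIX.length()) : key;
    }

    static boolean isHidden(final String key) {
        return key.startsWith(HIDDEN_PREFIX);
    }

    // Rejects user-supplied keys that intrude on the hidden namespace.
    static void validateUserKey(final String key) {
        if (isHidden(key)) {
            throw new IllegalArgumentException("Property key can not be a hidden key: " + key);
        }
    }

    public static void main(final String[] args) {
        System.out.println(hide("metadata"));       // prints ~metadata
        System.out.println(unHide("~metadata"));    // prints metadata
        validateUserKey("name");                    // accepted: flat namespace
        try {
            validateUserKey("~metadata");           // rejected: reserved for graph systems
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
----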