| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| [[graph]] |
| = The Graph |
| |
| image::gremlin-standing.png[width=125] |
| |
| The <<intro,Introduction>> discussed the diversity of TinkerPop-enabled graphs, with special attention paid to the |
| different <<connecting-gremlin,connection models>>, and how TinkerPop makes it possible to bridge that diversity in |
| an <<staying-agnostic,agnostic>> manner. This section deals with elements of the Graph API, which was noted |
| as an API to avoid when trying to build an agnostic system. The Graph API refers to the core elements that compose |
| the <<graph-computing,structure of a graph>> within the Gremlin Traversal Machine (GTM), such as the `Graph`, `Vertex` |
| and `Edge` Java interfaces. |
| |
| To maintain the most portable code, users should only reference these interfaces. To "reference" an interface simply |
| means to use it as a pointer. For `Graph`, that means holding a pointer to the location of graph data and then using it to |
| spawn `GraphTraversalSource` instances so as to write Gremlin: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerGraph.open() |
| g = graph.traversal() |
| g.addV('person') |
| ---- |
| |
| In the above example, "graph" is the `Graph` interface produced by calling `open()` on `TinkerGraph` which creates the |
| instance. Note that while the end intent of the code is to create a "person" vertex, it does not use the APIs on |
| `Graph` to do that - e.g. `graph.addVertex(T.label,'person')`. |
| |
| Even if the developer desired to use the `graph.addVertex()` method there are only a handful of scenarios where it is |
| possible: |
| |
| * The application is being developed on the JVM and the developer is using <<connecting-embedded, embedded>> mode |
| * The architecture includes Gremlin Server and the user is sending Gremlin scripts to the server |
| * The graph system chosen is a <<connecting-rgp, Remote Gremlin Provider>> and they expose the Graph API via scripts |
| |
| Note that Gremlin Language Variants force developers to use the Graph API by reference. There is no `addVertex()` |
| method available to GLVs on their respective `Graph` instances, nor are their graph elements filled with data at the |
| call of `properties()`. Developing applications to meet this lowest common denominator in API usage will go a long |
| way to making that application portable across TinkerPop-enabled systems. |
| |
| When considering the sub-sections that follow, recall that they are all generally bound to the Graph API. |
| They are described here for reference and in some sense backward compatibility with older recommended models of |
| development. In the future, the contents of this section will become less and less relevant. |
| |
| == Features |
| |
| A `Feature` implementation describes the capabilities of a `Graph` instance. This interface is implemented by graph |
| system providers for two purposes: |
| |
| . It tells users the capabilities of their `Graph` instance. |
| . It allows the features they do comply with to be tested against the Gremlin Test Suite (tests for features that are not supported are "ignored"). |
| |
| The following example in the Gremlin Console shows how to print all the features of a `Graph`: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerGraph.open() |
| graph.features() |
| ---- |
| |
| A common pattern for using features is to check their support prior to performing an operation: |
| |
| [gremlin-groovy] |
| ---- |
| graph.features().graph().supportsTransactions() |
| graph.features().graph().supportsTransactions() ? g.tx().commit() : "no tx" |
| ---- |
| |
| TIP: To ensure provider agnostic code, always check feature support prior to usage of a particular function. In that |
| way, the application can behave gracefully in case a particular implementation is provided at runtime that does not |
| support a function being accessed. |
| |
| WARNING: Features of reference graphs which are used to connect to remote graphs do not reflect the features of the |
| graph to which they connect. They reflect the features of the instantiated graph itself, which will likely be quite |
| different considering that reference graphs will typically be immutable. |
| |
| [[vertex-properties]] |
| == Vertex Properties |
| |
| image:vertex-properties.png[width=215,float=left] TinkerPop introduces the concept of a `VertexProperty<V>`. All the |
| properties of a `Vertex` are a `VertexProperty`. A `VertexProperty` implements `Property` and as such, it has a |
| key/value pair. However, `VertexProperty` also implements `Element` and thus, can have a collection of key/value |
| pairs. Moreover, while an `Edge` can only have one property of key "name" (for example), a `Vertex` can have multiple |
| "name" properties. With the inclusion of vertex properties, two features are introduced which ultimately advance the |
| graph modeler's toolkit: |
| |
| . Multiple properties (*multi-properties*): a vertex property key can have multiple values. For example, a vertex can |
| have multiple "name" properties. |
| . Properties on properties (*meta-properties*): a vertex property can have properties (i.e. a vertex property can |
| have key/value data associated with it). |
| |
| Possible use cases for meta-properties: |
| |
| . *Permissions*: Vertex properties can have key/value ACL-type permission information associated with them. |
| . *Auditing*: When a vertex property is manipulated, it can have key/value information attached to it saying who the |
| creator, deleter, etc. are. |
| . *Provenance*: The "name" of a vertex can be declared by multiple users. For example, there may be multiple spellings |
| of a name from different sources. |
| |
| A running example using vertex properties is provided below to demonstrate and explain the API. |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerGraph.open() |
| g = graph.traversal() |
| v = g.addV().property('name','marko').property('name','marko a. rodriguez').next() |
| g.V(v).properties('name').count() <1> |
| v.property(list, 'name', 'm. a. rodriguez') <2> |
| g.V(v).properties('name').count() |
| g.V(v).properties() |
| g.V(v).properties('name') |
| g.V(v).properties('name').hasValue('marko') |
| g.V(v).properties('name').hasValue('marko').property('acl','private') <3> |
| g.V(v).properties('name').hasValue('marko a. rodriguez') |
| g.V(v).properties('name').hasValue('marko a. rodriguez').property('acl','public') |
| g.V(v).properties('name').has('acl','public').value() |
| g.V(v).properties('name').has('acl','public').drop() <4> |
| g.V(v).properties('name').has('acl','public').value() |
| g.V(v).properties('name').has('acl','private').value() |
| g.V(v).properties() |
| g.V(v).properties().properties() <5> |
| g.V(v).properties().property('date',2014) <6> |
| g.V(v).properties().property('creator','stephen') |
| g.V(v).properties().properties() |
| g.V(v).properties('name').valueMap() |
| g.V(v).property('name','okram') <7> |
| g.V(v).properties('name') |
| g.V(v).values('name') <8> |
| ---- |
| |
| <1> A vertex can have zero or more properties with the same key associated with it. |
| <2> If a property is added with a cardinality of `Cardinality.list`, an additional property with the provided key will be added. |
| <3> A vertex property can have standard key/value properties attached to it. |
| <4> Vertex property removal is identical to property removal. |
| <5> Gets the meta-properties of each vertex property. |
| <6> A vertex property can have any number of key/value properties attached to it. |
| <7> `property(...)` will remove all existing properties with that key before adding the new single property (see `VertexProperty.Cardinality`). |
| <8> If only the value of a property is needed, then `values()` can be used. |
| |
| If the concept of vertex properties is difficult to grasp, then it may be best to think of vertex properties in terms |
| of "literal vertices." A vertex can have an edge to a "literal vertex" that has a single value key/value -- e.g. |
| "value=okram." The edge that points to that literal vertex has an edge-label of "name." The properties on the edge |
| represent the literal vertex's properties. The "literal vertex" can not have any other edges to it (only one from the |
| associated vertex). |
| |
| [[the-crew-toy-graph]] |
| TIP: A toy graph demonstrating all of the new TinkerPop graph structure features is available at |
| `TinkerFactory.createTheCrew()` and `data/tinkerpop-crew*`. This graph demonstrates multi-properties and meta-properties. |
| |
| .TinkerPop Crew |
| image::the-crew-graph.png[width=685] |
| |
| [gremlin-groovy,theCrew] |
| ---- |
| g.V().as('a'). |
| properties('location').as('b'). |
| hasNot('endTime').as('c'). |
| select('a','b','c').by('name').by(value).by('startTime') // determine the current location of each person |
| g.V().has('name','gremlin').inE('uses'). |
| order().by('skill',asc).as('a'). |
| outV().as('b'). |
| select('a','b').by('skill').by('name') // rank the users of gremlin by their skill level |
| ---- |
| |
| == Graph Variables |
| |
| `Graph.Variables` are key/value pairs associated with the graph itself -- in essence, a `Map<String,Object>`. These |
| variables are intended to store metadata about the graph. Example use cases include: |
| |
| * *Schema information*: What do the namespace prefixes resolve to and when was the schema last modified? |
| * *Global permissions*: What are the access rights for particular groups? |
| * *System user information*: Who are the admins of the system? |
| |
| An example of graph variables in use is presented below: |
| |
| [gremlin-groovy] |
| ---- |
| graph = TinkerGraph.open() |
| graph.variables() |
| graph.variables().set('systemAdmins',['stephen','peter','pavel']) |
| graph.variables().set('systemUsers',['matthias','marko','josh']) |
| graph.variables().keys() |
| graph.variables().get('systemUsers') |
| graph.variables().get('systemUsers').get() |
| graph.variables().remove('systemAdmins') |
| graph.variables().keys() |
| ---- |
| |
| IMPORTANT: Graph variables are not intended to be subject to heavy, concurrent mutation nor to be used in complex |
| computations. The intention is to have a location to store data about the graph for administrative purposes. |
| |
| WARNING: Attempting to set graph variables in a reference graph will not promote them to the remote graph. Typically, |
| a reference graph has immutable features and will not support this feature. |
| |
| [[transactions]] |
| == Graph Transactions |
| |
| image:gremlin-coins.png[width=100,float=right] A link:http://en.wikipedia.org/wiki/Database_transaction[database transaction] |
| represents a unit of work to execute against the database. Transactions in TinkerPop can be considered in several |
| contexts: transactions for <<connecting-embedded,embedded graphs>> via the Graph API, |
| transactions for <<connecting-gremlin-server,Gremlin Server>> and transactions within |
| <<connecting-rgp,Remote Gremlin Providers>>. For those following recommended patterns, the concepts presented in the |
| embedded section should generally be of little interest and are present mainly for reference. Utilizing those |
| transactional features will greatly reduce the portability of an application's Gremlin code. |
| |
| [[tx-embedded]] |
| === Embedded |
| |
| When on the JVM using an <<connecting-embedded,embedded graph>>, there is considerable flexibility for working with |
| transactions. With the Graph API, transactions are controlled by an implementation of the `Transaction` interface and |
| that object can be obtained from the `Graph` interface using the `tx()` method. It is important to note that the |
| `Transaction` object does not represent a "transaction" itself. It merely exposes the methods for working with |
| transactions (e.g. committing, rolling back, etc). |
| |
| Most `Graph` implementations that `supportsTransactions` will implement an "automatic" `ThreadLocal` transaction, |
| which means that when a read or write occurs after the `Graph` is instantiated, a transaction is automatically |
| started within that thread. There is no need to manually call a method to "create" or "start" a transaction. Simply |
| modify the graph as required and call `graph.tx().commit()` to apply changes or `graph.tx().rollback()` to undo them. |
| When the next read or write action occurs against the graph, a new transaction will be started within that current |
| thread of execution. |
| |
| When using transactions in this fashion, especially in a web application (e.g. an HTTP server), it is important to ensure |
| that transactions do not leak from one request to the next. In other words, unless a client is somehow bound via |
| session to process every request on the same server thread, every request must be committed or rolled back at the end |
| of the request. By ensuring that the request encapsulates a transaction, it ensures that a future request processed |
| on a server thread is starting in a fresh transactional state and will not have access to the remains of one from an |
| earlier request. A good strategy is to roll back a transaction at the start of a request, so that if it so happens that |
| a transactional leak does occur between requests somehow, a fresh transaction is assured by the fresh request. |
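
The leak described above can be illustrated with a toy model that uses only the JDK. This is a sketch under stated assumptions, not TinkerPop code: `RequestTxDemo`, `stage`, `rollback` and `pendingCount` are hypothetical names, and a `ThreadLocal` list stands in for the uncommitted state of a thread-bound transaction. A single-thread executor plays the role of a server worker thread that is reused across requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model only: none of these names are TinkerPop APIs.
class RequestTxDemo {
    // Uncommitted changes staged by the current thread.
    static final ThreadLocal<List<String>> PENDING = ThreadLocal.withInitial(ArrayList::new);

    static void stage(String change) { PENDING.get().add(change); }
    static void rollback() { PENDING.get().clear(); }
    static int pendingCount() { return PENDING.get().size(); }

    // Runs two "requests" on the same worker thread. The first request never
    // commits or rolls back. Returns the number of stale changes that the
    // second request finds when its own work begins.
    static int staleChangesSeenBySecondRequest(boolean rollbackAtStart) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Request 1: stages a change and ends without closing the transaction.
            worker.submit(() -> stage("half-finished write")).get();
            // Request 2: optionally rolls back first, then inspects the state.
            return worker.submit(() -> {
                if (rollbackAtStart) rollback();
                return pendingCount();
            }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(staleChangesSeenBySecondRequest(false)); // prints 1
        System.out.println(staleChangesSeenBySecondRequest(true));  // prints 0
    }
}
```

Without the defensive rollback, the second request inherits the first request's staged change. With it, the second request starts from a clean state regardless of how the previous request ended.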
| |
| TIP: The `tx()` method is on the `Graph` interface, but it is also available on the `TraversalSource` spawned from a |
| `Graph`. Calls to `TraversalSource.tx()` are proxied through to the underlying `Graph` as a convenience. |
| |
| WARNING: TinkerPop provides for basic transaction control, however, like many aspects of TinkerPop, it is up to the |
| graph system provider to choose the specific aspects of how their implementation will work and how it fits into the |
| TinkerPop stack. Be sure to understand the transaction semantics of the specific graph implementation that is being |
| utilized as it may present differing functionality than described here. |
| |
| ==== Configuring |
| |
| Determining when a transaction starts is dependent upon the behavior assigned to the `Transaction`. It is up to the |
| `Graph` implementation to determine the default behavior and unless the implementation doesn't allow it, the behavior |
| itself can be altered via these `Transaction` methods: |
| |
| [source,java] |
| ---- |
| public Transaction onReadWrite(Consumer<Transaction> consumer); |
| |
| public Transaction onClose(Consumer<Transaction> consumer); |
| ---- |
| |
| Providing a `Consumer` function to `onReadWrite` allows definition of how a transaction starts when a read or a write |
| occurs. `Transaction.READ_WRITE_BEHAVIOR` contains pre-defined `Consumer` functions to supply to the `onReadWrite` |
| method. It has two options: |
| |
| * `AUTO` - automatic transactions where the transaction is started implicitly by the read or write operation |
| * `MANUAL` - manual transactions where it is up to the user to explicitly open a transaction, throwing an exception |
| if the transaction is not open |
| |
| Providing a `Consumer` function to `onClose` allows configuration of how a transaction is handled when |
| `Transaction.close()` is called. `Transaction.CLOSE_BEHAVIOR` has several pre-defined options that can be supplied to |
| this method: |
| |
| * `COMMIT` - automatically commit an open transaction |
| * `ROLLBACK` - automatically roll back an open transaction |
| * `MANUAL` - throw an exception if a transaction is open, forcing the user to explicitly close the transaction |
| |
| IMPORTANT: As transactions are `ThreadLocal` in nature, so are the transaction configurations for `onReadWrite` and |
| `onClose`. |
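
To make the role of the `Consumer` concrete, the following JDK-only sketch models the idea with a hypothetical `ToyTx` class. It is an illustration of the pattern, not the TinkerPop implementation: `ToyTx`, its `AUTO` and `MANUAL` constants and the `write` method are assumptions made for this example. The point it shows is that the configured behavior runs before every read or write and decides whether a transaction is opened implicitly or an error is raised.

```java
import java.util.function.Consumer;

// Toy model only: ToyTx mimics the shape of the onReadWrite configuration
// and is not a TinkerPop class.
class ReadWriteBehaviorDemo {
    static class ToyTx {
        // Hypothetical behaviors, analogous to AUTO and MANUAL.
        static final Consumer<ToyTx> AUTO = tx -> {
            if (!tx.isOpen()) tx.open();
        };
        static final Consumer<ToyTx> MANUAL = tx -> {
            if (!tx.isOpen()) {
                throw new IllegalStateException("Open a transaction before attempting to read/write");
            }
        };

        private boolean open;
        private Consumer<ToyTx> readWriteBehavior = AUTO;

        boolean isOpen() { return open; }
        void open() { open = true; }
        void commit() { open = false; }

        ToyTx onReadWrite(Consumer<ToyTx> behavior) {
            readWriteBehavior = behavior;
            return this;
        }

        // Every read or write passes through the configured behavior first.
        void write(String change) {
            readWriteBehavior.accept(this);
        }
    }

    // Returns true if a write succeeds under the given behavior without an
    // explicit call to open().
    static boolean writeSucceedsWithoutOpen(Consumer<ToyTx> behavior) {
        ToyTx tx = new ToyTx().onReadWrite(behavior);
        try {
            tx.write("add a person vertex");
            return tx.isOpen();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(writeSucceedsWithoutOpen(ToyTx.AUTO));   // prints true
        System.out.println(writeSucceedsWithoutOpen(ToyTx.MANUAL)); // prints false
    }
}
```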
| |
| Once there is an understanding for how transactions are configured, most of the rest of the `Transaction` interface |
| is self-explanatory. Note that <<neo4j-gremlin,Neo4j-Gremlin>> is used for the examples to follow as TinkerGraph does |
| not support transactions. |
| |
| [source,groovy] |
| ---- |
| gremlin> graph = Neo4jGraph.open('/tmp/neo4j') |
| ==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]] |
| gremlin> g = graph.traversal() |
| ==>graphtraversalsource[neo4jgraph[community single [/tmp/neo4j]], standard] |
| gremlin> graph.features() |
| ==>FEATURES |
| > GraphFeatures |
| >-- Transactions: true <1> |
| >-- Computer: false |
| >-- Persistence: true |
| ... |
| gremlin> g.tx().onReadWrite(Transaction.READ_WRITE_BEHAVIOR.AUTO) <2> |
| ==>org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$Neo4jTransaction@1c067c0d |
| gremlin> g.addV("person").property("name","stephen") <3> |
| ==>v[0] |
| gremlin> g.tx().commit() <4> |
| ==>null |
| gremlin> g.tx().onReadWrite(Transaction.READ_WRITE_BEHAVIOR.MANUAL) <5> |
| ==>org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$Neo4jTransaction@1c067c0d |
| gremlin> g.tx().isOpen() |
| ==>false |
| gremlin> g.addV("person").property("name","marko") <6> |
| Open a transaction before attempting to read/write the transaction |
| gremlin> g.tx().open() <7> |
| ==>null |
| gremlin> g.addV("person").property("name","marko") <8> |
| ==>v[1] |
| gremlin> g.tx().commit() |
| ==>null |
| ---- |
| |
| <1> Check `features` to ensure that the graph supports transactions. |
| <2> By default, `Neo4jGraph` is configured with "automatic" transactions, so it is set here for demonstration purposes only. |
| <3> When the vertex is added, the transaction is automatically started. From this point, more mutations can be staged |
| or other read operations executed in the context of that open transaction. |
| <4> Calling `commit` finalizes the transaction. |
| <5> Change transaction behavior to require manual control. |
| <6> Adding a vertex now results in failure because the transaction was not explicitly opened. |
| <7> Explicitly open a transaction. |
| <8> Adding a vertex now succeeds as the transaction was manually opened. |
| |
| NOTE: It may be important to consult the documentation of the `Graph` implementation you are using when it comes to the |
| specifics of how transactions will behave. TinkerPop allows some latitude in this area and implementations may not have |
| the exact same behaviors and link:https://en.wikipedia.org/wiki/ACID[ACID] guarantees. |
| |
| ==== Threaded Transactions |
| |
| Most `Graph` implementations that support transactions do so in a `ThreadLocal` manner, where the current transaction |
| is bound to the current thread of execution. Consider the following example to demonstrate: |
| |
| [source,java] |
| ---- |
| GraphTraversalSource g = graph.traversal(); |
| g.addV("person").property("name","stephen").iterate(); |
| |
| Thread t1 = new Thread(() -> { |
|     g.addV("person").property("name","josh").iterate(); |
| }); |
| |
| Thread t2 = new Thread(() -> { |
|     g.addV("person").property("name","marko").iterate(); |
| }); |
| |
| t1.start(); |
| t2.start(); |
| |
| t1.join(); |
| t2.join(); |
| |
| g.tx().commit(); |
| ---- |
| |
| The above code shows three vertices added to `graph` in three different threads: the current thread, `t1` and |
| `t2`. One might expect that by the time this body of code finished executing, that there would be three vertices |
| persisted to the `Graph`. However, given the `ThreadLocal` nature of transactions, there really were three separate |
| transactions created in that body of code (i.e. one for each thread of execution) and the only one committed was the |
| first call to `addV()` in the primary thread of execution. The other two calls to that method within `t1` and `t2` |
| were never committed and thus orphaned. |
| |
| A `Graph` that `supportsThreadedTransactions` is one that allows for a `Graph` to operate outside of that constraint, |
| thus allowing multiple threads to operate within the same transaction. Therefore, if there was a need to have three |
| different threads operating within the same transaction, the above code could be re-written as follows: |
| |
| [source,java] |
| ---- |
| Graph threaded = graph.tx().createThreadedTx(); |
| // spawn the traversal source from the threaded graph so that all threads share one transaction |
| GraphTraversalSource g = threaded.traversal(); |
| g.addV("person").property("name","stephen").iterate(); |
| |
| Thread t1 = new Thread(() -> { |
|     g.addV("person").property("name","josh").iterate(); |
| }); |
| |
| Thread t2 = new Thread(() -> { |
|     g.addV("person").property("name","marko").iterate(); |
| }); |
| |
| t1.start(); |
| t2.start(); |
| |
| t1.join(); |
| t2.join(); |
| |
| g.tx().commit(); |
| ---- |
| |
| In the above case, the call to `graph.tx().createThreadedTx()` creates a new `Graph` instance that is unbound from the |
| `ThreadLocal` transaction, thus allowing each thread to operate on it in the same context. In this case, there would |
| be three separate vertices persisted to the `Graph`. |
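
The difference between the two models can be reproduced without a graph at all. The sketch below is a JDK-only toy and makes no claim about how any provider implements threaded transactions: `ThreadedTxDemo` and `persistedCount` are hypothetical names, a `ThreadLocal` list stands in for the thread-bound transaction, and a single shared list stands in for a transaction that is not bound to a thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Toy model only: none of these names are TinkerPop APIs.
class ThreadedTxDemo {
    // Stages one change on the current thread and one on each of two helper
    // threads, then "commits" from the current thread. Returns how many
    // staged changes that commit can see.
    static int persistedCount(boolean sharedAcrossThreads) {
        List<String> shared = new CopyOnWriteArrayList<>();
        ThreadLocal<List<String>> threadBound = ThreadLocal.withInitial(ArrayList::new);

        // Stage a change in whichever transaction the calling thread sees.
        Consumer<String> stage =
            name -> (sharedAcrossThreads ? shared : threadBound.get()).add(name);

        stage.accept("stephen");
        Thread t1 = new Thread(() -> stage.accept("josh"));
        Thread t2 = new Thread(() -> stage.accept("marko"));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Only the changes visible to the committing thread are persisted.
        List<String> committed = sharedAcrossThreads ? shared : threadBound.get();
        return committed.size();
    }

    public static void main(String[] args) {
        System.out.println(persistedCount(false)); // prints 1
        System.out.println(persistedCount(true));  // prints 3
    }
}
```

In the thread-bound case, the changes staged in `t1` and `t2` are never visible to the committing thread, which mirrors the orphaned vertices described earlier.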
| |
| [[tx-gremlin-server]] |
| === Gremlin Server |
| |
| The available capability for transactions with <<gremlin-server,Gremlin Server>> is dependent upon the method of |
| interaction that is used. The preferred method for <<connecting-gremlin-server,interacting with Gremlin Server>> |
| is via websockets and bytecode based requests. In this mode of operation, each Gremlin traversal that is executed will |
| be treated as a single transaction. Traversals that fail will have their transaction rolled back and successful |
| iteration of a traversal will conclude with a transactional commit. How the graph hosted in Gremlin Server reacts to |
| those commands is dependent on the graph chosen and it is therefore important to understand the transactional semantics |
| of that graph when developing an application. |
| |
| Gremlin Server also has the option to accept Gremlin-based scripts. The scripting approach provides access to the |
| Graph API and thus also the transactional model described in the <<tx-embedded,embedded>> section. Therefore a single |
| script can have the ability to execute multiple transactions per request with complete control provided to the |
| developer to commit or roll back transactions as needed. |
| |
| There are two methods for sending scripts to Gremlin Server: sessionless and session-based. With sessionless requests |
| there will always be an attempt to close the transaction at the end of the request with a commit if there are no errors |
| or a rollback if there is a failure. It is therefore unnecessary to close transactions manually within scripts |
| themselves. By default, session-based requests do not have this quality. The transaction will be held open on the |
| server until the user closes it manually. There is an option to have automatic transaction management for sessions. |
| More information on this topic can be found in the <<considering-transactions,Considering Transactions>> Section and |
| the <<sessions,Considering Sessions>> Section. |
| |
| While those sections provide some additional details, the short advice is to avoid scripts when possible and prefer |
| bytecode based requests. |
| |
| [[tx-rgp]] |
| === Remote Gremlin Providers |
| |
| At this time, transactional patterns for Remote Gremlin Providers are largely in line with Gremlin Server. Most |
| offer bytecode or script based sessionless requests, which have automatic transaction management, such that a |
| successful traversal will commit and a failing traversal will roll back. As most of these RGPs do not |
| expose `Graph` instances, access to lower level transactional functions, even in a sessionless fashion, is not |
| typically allowed. The nature of what a "transaction" means will be dependent on the RGP, as is the case with any |
| TinkerPop-enabled graph system, so it is important to consult that system's documentation for more details. |
| |
| == Namespace Conventions |
| |
| End users, <<implementations,graph system providers>>, <<graphcomputer,`GraphComputer`>> algorithm designers, |
| <<gremlin-plugins,GremlinPlugin>> creators, etc. all leverage properties on elements to store information. There are |
| a few conventions that should be respected when naming property keys to ensure that the keys chosen by these |
| stakeholders do not conflict. |
| |
| * End users are granted the _flat namespace_ (e.g. `name`, `age`, `location`) to key their properties and label their elements. |
| * Graph system providers are granted the _hidden namespace_ (e.g. `~metadata`) to key their properties and labels. |
| Data keyed as such is only accessible via the graph system implementation and no other stakeholders are granted read |
| or write access to data prefixed with "~" (see `Graph.Hidden`). Test coverage and exceptions exist to ensure that |
| graph systems respect this hard boundary. |
| * <<vertexprogram,`VertexProgram`>> and <<mapreduce,`MapReduce`>> developers should leverage _qualified namespaces_ |
| particular to their domain (e.g. `mydomain.myvertexprogram.computedata`). |
| * `GremlinPlugin` creators should prefix their plugin name with their domain (e.g. `mydomain.myplugin`). |
| |
| IMPORTANT: TinkerPop uses `tinkerpop.` and `gremlin.` as the prefixes for provided strategies, vertex programs, map |
| reduce implementations, and plugins. |
| |
| The only truly protected namespace is the _hidden namespace_ provided to graph systems. From there, it's up to |
| engineers to respect the namespacing conventions presented. |
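
As a rough illustration of these conventions, the following JDK-only sketch sorts property keys into the namespaces described in this section. `KeyNamespaceDemo` and `classify` are hypothetical helpers written for this example and are not part of TinkerPop. Treating any other key that contains a "." as qualified is a simplifying assumption, and `gremlin.example` is a made-up key.

```java
// Hypothetical helper illustrating the naming conventions; not part of TinkerPop.
class KeyNamespaceDemo {
    enum Namespace { HIDDEN, RESERVED, QUALIFIED, FLAT }

    static Namespace classify(String key) {
        // Hidden keys are prefixed with "~" and belong to graph system providers.
        if (key.startsWith("~")) return Namespace.HIDDEN;
        // TinkerPop itself uses the "tinkerpop." and "gremlin." prefixes.
        if (key.startsWith("tinkerpop.") || key.startsWith("gremlin.")) return Namespace.RESERVED;
        // Assumption: any other dotted key is a domain-qualified key.
        if (key.contains(".")) return Namespace.QUALIFIED;
        // Everything else is in the flat namespace granted to end users.
        return Namespace.FLAT;
    }

    public static void main(String[] args) {
        String[] keys = {"name", "~metadata", "mydomain.myvertexprogram.computedata", "gremlin.example"};
        for (String key : keys) {
            System.out.println(key + " -> " + classify(key));
        }
    }
}
```

A check of this kind could be used by an application to reject user-supplied keys that stray outside the flat namespace, although only the hidden namespace is actually enforced by graph systems.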