docs/src/reference/the-graph.asciidoc - tinkerpop - Git at Google

 ////
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 ////
 [[graph]]
 = The Graph

 image::gremlin-standing.png[width=125]

 The <<intro,Introduction>> discussed the diversity of TinkerPop-enabled graphs, with special attention paid to the
 different <<connecting-gremlin,connection models>>, and how TinkerPop makes it possible to bridge that diversity in
 an <<staying-agnostic,agnostic>> manner. This particular section deals with elements of the Graph API which was noted
 as an API to avoid when trying to build an agnostic system. The Graph API refers to the core elements of what composes
 the <<graph-computing,structure of a graph>> within the Gremlin Traversal Machine (GTM), such as the `Graph`, `Vertex`
 and `Edge` Java interfaces.

 To maintain the most portable code, users should only reference these interfaces. To "reference", simply means to
 utilize it as a pointer. For `Graph`, that means holding a pointer to the location of graph data and then using it to
 spawn `GraphTraversalSource` instances so as to write Gremlin:

 [gremlin-groovy]
 ----
 graph = TinkerGraph.open()
 g = graph.traversal()
 g.addV('person')
 ----

 In the above example, "graph" is the `Graph` interface produced by calling `open()` on `TinkerGraph` which creates the
 instance. Note that while the end intent of the code is to create a "person" vertex, it does not use the APIs on
 `Graph` to do that - e.g. `graph.addVertex(T.label,'person')`.

 Even if the developer desired to use the `graph.addVertex()` method there are only a handful of scenarios where it is
 possible:

 * The application is being developed on the JVM and the developer is using <<connecting-embedded, embedded>> mode
 * The architecture includes Gremlin Server and the user is sending Gremlin scripts to the server
 * The graph system chosen is a <<connecting-rgp, Remote Gremlin Provider>> and they expose the Graph API via scripts

 Note that Gremlin Language Variants force developers to use the Graph API by reference. There is no `addVertex()`
 method available to GLVs on their respective `Graph` instances, nor are their graph elements filled with data at the
 call of `properties()`. Developing applications to meet this lowest common denominator in API usage will go a long
 way to making that application portable across TinkerPop-enabled systems.

 When considering the remaining sub-sections that follow, recall that they are all generally bound to the Graph API.
 They are described here for reference and in some sense backward compatibility with older recommended models of
 development. In the future, the contents of this section will become less and less relevant.

 == Features

 A `Feature` implementation describes the capabilities of a `Graph` instance. This interface is implemented by graph
 system providers for two purposes:

 . It tells users the capabilities of their `Graph` instance.
 . It allows the features they do comply with to be tested against the Gremlin Test Suite - tests that do not comply are "ignored").

 The following example in the Gremlin Console shows how to print all the features of a `Graph`:

 [gremlin-groovy]
 ----
 graph = TinkerGraph.open()
 graph.features()
 ----

 A common pattern for using features is to check their support prior to performing an operation:

 [gremlin-groovy]
 ----
 graph.features().graph().supportsTransactions()
 graph.features().graph().supportsTransactions() ? g.tx().commit() : "no tx"
 ----

 TIP: To ensure provider agnostic code, always check feature support prior to usage of a particular function.  In that
 way, the application can behave gracefully in case a particular implementation is provided at runtime that does not
 support a function being accessed.

 WARNING: Features of reference graphs which are used to connect to remote graphs do not reflect the features of the
 graph to which it connects. It reflects the features of instantiated graph itself, which will likely be quite
 different considering that reference graphs will typically be immutable.

 [[vertex-properties]]
 == Vertex Properties

 image:vertex-properties.png[width=215,float=left] TinkerPop introduces the concept of a `VertexProperty<V>`. All the
 properties of a `Vertex` are a `VertexProperty`. A `VertexProperty` implements `Property` and as such, it has a
 key/value pair. However, `VertexProperty` also implements `Element` and thus, can have a collection of key/value
 pairs. Moreover, while an `Edge` can only have one property of key "name" (for example), a `Vertex` can have multiple
 "name" properties. With the inclusion of vertex properties, two features are introduced which ultimately advance the
 graph modelers toolkit:

 . Multiple properties (*multi-properties*): a vertex property key can have multiple values.  For example, a vertex can
 have multiple "name" properties.
 . Properties on properties (*meta-properties*): a vertex property can have properties (i.e. a vertex property can
 have key/value data associated with it).

 Possible use cases for meta-properties:

 . *Permissions*: Vertex properties can have key/value ACL-type permission information associated with them.
 . *Auditing*: When a vertex property is manipulated, it can have key/value information attached to it saying who the
 creator, deletor, etc. are.
 . *Provenance*: The "name" of a vertex can be declared by multiple users.  For example, there may be multiple spellings
 of a name from different sources.

 A running example using vertex properties is provided below to demonstrate and explain the API.

 [gremlin-groovy]
 ----
 graph = TinkerGraph.open()
 g = graph.traversal()
 v = g.addV().property('name','marko').property('name','marko a. rodriguez').next()
 g.V(v).properties('name').count() <1>
 v.property(list, 'name', 'm. a. rodriguez') <2>
 g.V(v).properties('name').count()
 g.V(v).properties()
 g.V(v).properties('name')
 g.V(v).properties('name').hasValue('marko')
 g.V(v).properties('name').hasValue('marko').property('acl','private') <3>
 g.V(v).properties('name').hasValue('marko a. rodriguez')
 g.V(v).properties('name').hasValue('marko a. rodriguez').property('acl','public')
 g.V(v).properties('name').has('acl','public').value()
 g.V(v).properties('name').has('acl','public').drop() <4>
 g.V(v).properties('name').has('acl','public').value()
 g.V(v).properties('name').has('acl','private').value()
 g.V(v).properties()
 g.V(v).properties().properties() <5>
 g.V(v).properties().property('date',2014) <6>
 g.V(v).properties().property('creator','stephen')
 g.V(v).properties().properties()
 g.V(v).properties('name').valueMap()
 g.V(v).property('name','okram') <7>
 g.V(v).properties('name')
 g.V(v).values('name') <8>
 ----

 <1> A vertex can have zero or more properties with the same key associated with it.
 <2> If a property is added with a cardinality of `Cardinality.list`, an additional property with the provided key will be added.
 <3> A vertex property can have standard key/value properties attached to it.
 <4> Vertex property removal is identical to property removal.
 <5> Gets the meta-properties of each vertex property.
 <6> A vertex property can have any number of key/value properties attached to it.
 <7> `property(...)` will remove all existing key'd properties before adding the new single property (see `VertexProperty.Cardinality`).
 <8> If only the value of a property is needed, then `values()` can be used.

 If the concept of vertex properties is difficult to grasp, then it may be best to think of vertex properties in terms
 of "literal vertices." A vertex can have an edge to a "literal vertex" that has a single value key/value -- e.g.
 "value=okram." The edge that points to that literal vertex has an edge-label of "name." The properties on the edge
 represent the literal vertex's properties. The "literal vertex" can not have any other edges to it (only one from the
 associated vertex).

 [[the-crew-toy-graph]]
 TIP: A toy graph demonstrating all of the new TinkerPop graph structure features is available at
 `TinkerFactory.createTheCrew()` and `data/tinkerpop-crew*`. This graph demonstrates multi-properties and meta-properties.

 .TinkerPop Crew
 image::the-crew-graph.png[width=685]

 [gremlin-groovy,theCrew]
 ----
 g.V().as('a').
       properties('location').as('b').
       hasNot('endTime').as('c').
       select('a','b','c').by('name').by(value).by('startTime') // determine the current location of each person
 g.V().has('name','gremlin').inE('uses').
       order().by('skill',asc).as('a').
       outV().as('b').
       select('a','b').by('skill').by('name') // rank the users of gremlin by their skill level
 ----

 == Graph Variables

 `Graph.Variables` are key/value pairs associated with the graph itself -- in essence, a `Map<String,Object>`. These
 variables are intended to store metadata about the graph. Example use cases include:

  * *Schema information*: What do the namespace prefixes resolve to and when was the schema last modified?
  * *Global permissions*: What are the access rights for particular groups?
  * *System user information*: Who are the admins of the system?

 An example of graph variables in use is presented below:

 [gremlin-groovy]
 ----
 graph = TinkerGraph.open()
 graph.variables()
 graph.variables().set('systemAdmins',['stephen','peter','pavel'])
 graph.variables().set('systemUsers',['matthias','marko','josh'])
 graph.variables().keys()
 graph.variables().get('systemUsers')
 graph.variables().get('systemUsers').get()
 graph.variables().remove('systemAdmins')
 graph.variables().keys()
 ----

 IMPORTANT: Graph variables are not intended to be subject to heavy, concurrent mutation nor to be used in complex
 computations. The intention is to have a location to store data about the graph for administrative purposes.

 WARNING: Attempting to set graph variables in a reference graph will not promote them to the remote graph. Typically,
 a reference graph has immutable features and will not support this features.

 [[transactions]]
 == Graph Transactions

 image:gremlin-coins.png[width=100,float=right] A link:http://en.wikipedia.org/wiki/Database_transaction[database transaction]
 represents a unit of work to execute against the database. Transactions in TinkerPop can be considered in several
 contexts: transactions for <<connecting-embedded,embedded graphs>> via the Graph API,
 transactions for <<connecting-gremlin-server,Gremlin Server>> and transactions within
 <<connecting-rgp,Remote Gremlin Providers>>. For those following recommended patterns, the concepts presented in the
 embedded section should generally be of little interest and are present mainly for reference. Utilizing those
 transactional features will greatly reduce the portability of an application's Gremlin code.

 [[tx-embedded]]
 === Embedded

 When on the JVM using an <<connecting-embedded,embedded graph>>, there is considerable flexibility for working with
 transactions. With the Graph API, transactions are controlled by an implementation of the `Transaction` interface and
 that object can be obtained from the `Graph` interface using the `tx()` method.  It is important to note that the
 `Transaction` object does not represent a "transaction" itself.  It merely exposes the methods for working with
 transactions (e.g. committing, rolling back, etc).

 Most `Graph` implementations that `supportsTransactions` will implement an "automatic" `ThreadLocal` transaction,
 which means that when a read or write occurs after the `Graph` is instantiated, a transaction is automatically
 started within that thread.  There is no need to manually call a method to "create" or "start" a transaction.  Simply
 modify the graph as required and call `graph.tx().commit()` to apply changes or `graph.tx().rollback()` to undo them.
 When the next read or write action occurs against the graph, a new transaction will be started within that current
 thread of execution.

 When using transactions in this fashion, especially in web application (e.g. HTTP server), it is important to ensure
 that transactions do not leak from one request to the next.  In other words, unless a client is somehow bound via
 session to process every request on the same server thread, every request must be committed or rolled back at the end
 of the request.  By ensuring that the request encapsulates a transaction, it ensures that a future request processed
 on a server thread is starting in a fresh transactional state and will not have access to the remains of one from an
 earlier request. A good strategy is to rollback a transaction at the start of a request, so that if it so happens that
 a transactional leak does occur between requests somehow, a fresh transaction is assured by the fresh request.

 TIP: The `tx()` method is on the `Graph` interface, but it is also available on the `TraversalSource` spawned from a
 `Graph`.  Calls to `TraversalSource.tx()` are proxied through to the underlying `Graph` as a convenience.

 WARNING: TinkerPop provides for basic transaction control, however, like many aspects of TinkerPop, it is up to the
 graph system provider to choose the specific aspects of how their implementation will work and how it fits into the
 TinkerPop stack. Be sure to understand the transaction semantics of the specific graph implementation that is being
 utilized as it may present differing functionality than described here.

 ==== Configuring

 Determining when a transaction starts is dependent upon the behavior assigned to the `Transaction`.  It is up to the
 `Graph` implementation to determine the default behavior and unless the implementation doesn't allow it, the behavior
 itself can be altered via these `Transaction` methods:

 [source,java]
 ----
 public Transaction onReadWrite(Consumer<Transaction> consumer);

 public Transaction onClose(Consumer<Transaction> consumer);
 ----

 Providing a `Consumer` function to `onReadWrite` allows definition of how a transaction starts when a read or a write
 occurs. `Transaction.READ_WRITE_BEHAVIOR` contains pre-defined `Consumer` functions to supply to the `onReadWrite`
 method.  It has two options:

 * `AUTO` - automatic transactions where the transaction is started implicitly to the read or write operation
 * `MANUAL` - manual transactions where it is up to the user to explicitly open a transaction, throwing an exception
 if the transaction is not open

 Providing a `Consumer` function to `onClose` allows configuration of how a transaction is handled when
 `Transaction.close()` is called.  `Transaction.CLOSE_BEHAVIOR` has several pre-defined options that can be supplied to
 this method:

 * `COMMIT` - automatically commit an open transaction
 * `ROLLBACK` - automatically rollback an open transaction
 * `MANUAL` - throw an exception if a transaction is open, forcing the user to explicitly close the transaction

 IMPORTANT: As transactions are `ThreadLocal` in nature, so are the transaction configurations for `onReadWrite` and
 `onClose`.

 Once there is an understanding for how transactions are configured, most of the rest of the `Transaction` interface
 is self-explanatory. Note that <<neo4j-gremlin,Neo4j-Gremlin>> is used for the examples to follow as TinkerGraph does
 not support transactions.

 [source,groovy]
 ----
 gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
 ==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
 gremlin> g = graph.traversal()
 ==>graphtraversalsource[neo4jgraph[community single [/tmp/neo4j]], standard]
 gremlin> graph.features()
 ==>FEATURES
 > GraphFeatures
 >-- Transactions: true  <1>
 >-- Computer: false
 >-- Persistence: true
 ...
 gremlin> g.tx().onReadWrite(Transaction.READ_WRITE_BEHAVIOR.AUTO) <2>
 ==>org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$Neo4jTransaction@1c067c0d
 gremlin> g.addV("person").("name","stephen")  <3>
 ==>v[0]
 gremlin> g.tx().commit() <4>
 ==>null
 gremlin> g.tx().onReadWrite(Transaction.READ_WRITE_BEHAVIOR.MANUAL) <5>
 ==>org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph$Neo4jTransaction@1c067c0d
 gremlin> g.tx().isOpen()
 ==>false
 gremlin> g.addV("person").("name","marko") <6>
 Open a transaction before attempting to read/write the transaction
 gremlin> g.tx().open() <7>
 ==>null
 gremlin> g.addV("person").("name","marko") <8>
 ==>v[1]
 gremlin> g.tx().commit()
 ==>null
 ----

 <1> Check `features` to ensure that the graph supports transactions.
 <2> By default, `Neo4jGraph` is configured with "automatic" transactions, so it is set here for demonstration purposes only.
 <3> When the vertex is added, the transaction is automatically started.  From this point, more mutations can be staged
 or other read operations executed in the context of that open transaction.
 <4> Calling `commit` finalizes the transaction.
 <5> Change transaction behavior to require manual control.
 <6> Adding a vertex now results in failure because the transaction was not explicitly opened.
 <7> Explicitly open a transaction.
 <8> Adding a vertex now succeeds as the transaction was manually opened.

 NOTE: It may be important to consult the documentation of the `Graph` implementation you are using when it comes to the
 specifics of how transactions will behave.  TinkerPop allows some latitude in this area and implementations may not have
 the exact same behaviors and link:https://en.wikipedia.org/wiki/ACID[ACID] guarantees.

 ==== Threaded Transactions

 Most `Graph` implementations that support transactions do so in a `ThreadLocal` manner, where the current transaction
 is bound to the current thread of execution. Consider the following example to demonstrate:

 [source,java]
 ----
 GraphTraversalSource g = graph.traversal();
 g.addV("person").("name","stephen").iterate();

 Thread t1 = new Thread(() -> {
     g.addV("person").("name","josh").iterate();
 });

 Thread t2 = new Thread(() -> {
     g.addV("person").("name","marko").iterate();
 });

 t1.start()
 t2.start()

 t1.join()
 t2.join()

 g.tx().commit();
 ----

 The above code shows three vertices added to `graph` in three different threads: the current thread, `t1` and
 `t2`.  One might expect that by the time this body of code finished executing, that there would be three vertices
 persisted to the `Graph`.  However, given the `ThreadLocal` nature of transactions, there really were three separate
 transactions created in that body of code (i.e. one for each thread of execution) and the only one committed was the
 first call to `addV()` in the primary thread of execution.  The other two calls to that method within `t1` and `t2`
 were never committed and thus orphaned.

 A `Graph` that `supportsThreadedTransactions` is one that allows for a `Graph` to operate outside of that constraint,
 thus allowing multiple threads to operate within the same transaction.  Therefore, if there was a need to have three
 different threads operating within the same transaction, the above code could be re-written as follows:

 [source,java]
 ----
 Graph threaded = graph.tx().createThreadedTx();
 GraphTraversalSource g = graph.traversal();
 g.addV("person").("name","stephen").iterate();

 Thread t1 = new Thread(() -> {
     threaded.addV("person").("name","josh").iterate();
 });

 Thread t2 = new Thread(() -> {
     threaded.addV("person").("name","marko").iterate();
 });

 t1.start()
 t2.start()

 t1.join()
 t2.join()

 g.tx().commit();
 ----

 In the above case, the call to `graph.tx().createThreadedTx()` creates a new `Graph` instance that is unbound from the
 `ThreadLocal` transaction, thus allowing each thread to operate on it in the same context.  In this case, there would
 be three separate vertices persisted to the `Graph`.

 [[tx-gremlin-server]]
 === Gremlin Server

 The available capability for transactions with <<gremlin-server,Gremlin Server>> is dependent upon the method of
 interaction that is used. The preferred method for <<connecting-gremlin-server,interacting with Gremlin Server>>
 is via websockets and bytecode based requests. In this mode of operations each Gremlin traversal that is executed will
 be treated as a single transaction. Traversals that fail will have their transaction rolled back and successful
 iteration of a traversal will conclude with a transactional commit. How the graph hosted in Gremlin Server reacts to
 those commands is dependent on the graph chosen and it is therefore important to understand the transactional semantics
 of that graph when developing an application.

 Gremlin Server also has the option to accept Gremlin-based scripts. The scripting approach provides access to the
 Graph API and thus also the transactional model described in the <<tx-embedded,embedded>> section. Therefore a single
 script can have the ability to execute multiple transactions per request with complete control provided to the
 developer to commit or rollback transactions as needed.

 There are two methods for sending scripts to Gremlin Server: sessionless and session-based. With sessionless requests
 there will always be an attempt to close the transaction at the end of the request with a commit if there are no errors
 or a rollback if there is a failure. It is therefore unnecessary to close transactions manually within scripts
 themselves. By default, session-based requests do not have this quality. The transaction will be held open on the
 server until the user closes it manually. There is an option to have automatic transaction management for sessions.
 More information on this topic can be found in the <<considering-transactions,Considering Transactions>> Section and
 the <<sessions,Considering Sessions>> Section.

 While those sections provide some additional details, the short advice is to avoid scripts when possible and prefer
 bytecode based requests.

 [[tx-rgp]]
 === Remote Gremlin Providers

 At this time, transactional patterns for Remote Gremlin Providers are largely in line with Gremlin Server. Most
 offer bytecode or script based sessionless requests, which have automatic transaction management, such that a
 successful traversal will commit on success and a failing traversal will rollback. As most of these RGPs do not
 expose a `Graph` instances, access to lower level transactional functions even in a sessionless fashion are not
 typically allowed. The nature of what a "transaction" means will be dependent on the RGP as is the case with any
 TinkerPop-enabled graph system, so it is important to consult that systems documentation for more details.

 == Namespace Conventions

 End users, <<implementations,graph system providers>>, <<graphcomputer,`GraphComputer`>> algorithm designers,
 <<gremlin-plugins,GremlinPlugin>> creators, etc. all leverage properties on elements to store information. There are
 a few conventions that should be respected when naming property keys to ensure that conflicts between these
 stakeholders do not conflict.

 * End users are granted the _flat namespace_ (e.g. `name`, `age`, `location`) to key their properties and label their elements.
 * Graph system providers are granted the _hidden namespace_ (e.g. `~metadata`) to key their properties and labels.
 Data keyed as such is only accessible via the graph system implementation and no other stakeholders are granted read
 nor write access to data prefixed with "~" (see `Graph.Hidden`). Test coverage and exceptions exist to ensure that
 graph systems respect this hard boundary.
 * <<vertexprogram,`VertexProgram`>> and <<mapreduce,`MapReduce`>> developers should leverage _qualified namespaces_
 particular to their domain (e.g. `mydomain.myvertexprogram.computedata`).
 * `GremlinPlugin` creators should prefix their plugin name with their domain (e.g. `mydomain.myplugin`).

 IMPORTANT: TinkerPop uses `tinkerpop.` and `gremlin.` as the prefixes for provided strategies, vertex programs, map
 reduce implementations, and plugins.

 The only truly protected namespace is the _hidden namespace_ provided to graph systems. From there, it's up to
 engineers to respect the namespacing conventions presented.