| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| image::apache-tinkerpop-logo.png[width=500,link="https://tinkerpop.apache.org"] |
| |
| *4.0.0* |
| |
| :toc-position: left |
| |
| = TinkerPop 4.x Design Ideas |
| |
| Apache TinkerPop™ 4.x is not a version considered on the immediate horizon, but there are often points in the day to |
| day development of TinkerPop 3.x where there are changes of importance, novelty and usefulness that are so big that |
| they could only be implemented under a major new version. This document is meant to track these concepts as they |
| develop, so that at some point in the future they can be referenced in a single place. |
| |
| There is no particular layout or style to this document. Simple bullet points, open questions posed as single |
| sentences, or fully structured document headers and content are all acceptable. The main point is to capture ideas |
| for future consideration when 4.x becomes the agenda of the day for The TinkerPop. |
| |
| image:tp4-think.png[] |
| |
| == The Main Features |
| |
| TinkerPop4 should focus on the most successful aspects of TinkerPop3 and it should avoid the traps realized in TinkerPop3. |
| These items include: |
| |
| * The concept of Gremlin as both a virtual machine and language. |
| ** A standard bytecode specification should be provided. |
| ** A standard machine architecture should be provided. |
| * The concept of Gremlin language variants. |
| ** It should be easy to create Gremlin variants in every major programming language. |
| ** A standard template should be followed for all languages. |
| ** Apache TinkerPop should provide variants in all major programming languges. |
| * The concept of `Traversal` as the sole means of interacting with the graph. |
| ** The role of Blueprints should be significantly reduced. |
| ** The role of Gremlin should be significantly increased. |
| |
| == Hiding Blueprints |
| |
| Originally from the link:https://lists.apache.org/thread.html/b4d80072ad36849b4e9cd3308f87115660574e3e7a4abb7ee68e959b@%3Cdev.tinkerpop.apache.org%3E[mailing list]: |
| |
| Throughout our documentation we show uses of the “Blueprints API” (i.e. Graph/Vertex/Edge/etc. classes & methods) as |
| well as the use of the Traversal API (i.e. Gremlin). |
| |
| Enabling users to have two ways of interacting with the graph system has its problems: |
| |
| 1. The DetachedXXX problem — how much data should a returned vertex/edge/etc. have associated with it? |
| 2. `graph.addVertex()` and `g.addV()` — which should I use? The first is faster but is not recommended. |
| 3. `SubgraphStrategy` leaking — I get subgraphs with Gremlin, but can then directly interact with the vertex objects to see more than I should. |
| 4. `VertexProgram` model — I write traversals with Traversal API, but then develop VertexPrograms with the Blueprints API. That’s weird. |
| 5. GremlinServer returning fat objects — Serializers are created property-rich vertices and edges. The awkward HaltedTraversalStrategy solution. |
| 6. … various permutations of these source problems. |
| |
| In TinkerPop4 the solution might be as follows: |
| |
| There should be two “Graph APIs.” |
| |
| 1. Provider Graph API: This is the current Blueprints API with `Graph.addVertex()`, `Vertex.edges()`, `Edge.inVertex()`, etc. |
| 2. User Graph API: This is a ReferenceXXX API. |
| |
| The first API is well known, but the second bears further discussion. `ReferenceGraph` is simply a reference/dummy/proxy |
| to the provider Graph API. `ReferenceGraph` has the following API: |
| |
| * `ReferenceGraph.open()` |
| * `ReferenceGraph.close()` |
| * `ReferenceGraph.tx()` // assuming we like the current transaction model (??) |
| * `ReferenceGraph.traversal()` |
| |
| That is it. What does this entail? Assume the following traversal: |
| |
| [source,java] |
| ---- |
| g = ReferenceGraph.open(config).traversal() |
| g.V(1).out(‘knows’) |
| ---- |
| |
| `ReferenceGraph` is almost like a `RemoteGraph` (`RemoteStrategy`) in that it makes a connection (remote or inter-JVM) |
| to the provider Graph API. When `g.V(1).out(‘knows’)` executes, it is really sending the bytecode to the provider Graph |
| for execution (as specified by the config of `ReferenceGraph.open()`). Thus, once it hits the provider's graph, |
| `ProviderVertex`, `ProviderEdge`, etc. are the objects being processed. However, what the traversal’s `Iterator<Vertex>` |
| returns is `ReferenceVertex`! That is, it never returns `ProviderVertex`. In this way, regardless if the user is |
| going “over the wire” or within the same JVM or against a different provider’s graph database or from |
| Gremlin-Python/C#/etc., all the vertices are simply ‘reference vertices’ (id + label). This makes it so that users |
| never interact with the graph element objects themselves directly. They can ONLY interact with the graph via |
| traversals! At most they can `ReferenceVertex.id()` and `ReferenceVertex.label()`. Thats it, — no mutations, not |
| walking edges, nada! And moreover, since ReferenceXXX has enough information to re-attach to the source graph, they |
| can always do the following to get more information: |
| |
| [source,java] |
| ---- |
| v = g.V(1).out(‘knows’).next() |
| g.V(v).values(‘name’) |
| ---- |
| |
| This split into two Graph APIs will enables us to make a hard boundary between what the provider (vendor) needs to |
| implement and what the user (developer) gets to access. |
| |
| === Comments |
| |
| There is a question mark next to `ReferenceGraph.tx()` - Transactions are a bit of an open question for future versions |
| of TinkerPop and likely deserve their own section in this document. The model used for last three version of TinkerPop |
| now is rooted in the Neo4j approach to transactions and is often more trouble than it should be for us and providers. |
| Distributed transactions are a challenge and don't apply to every provider. Transactions are further complicated by |
| GLVs. The idea of local subgraphs for mutations and transaction management might be good but that goes against having |
| just `ReferenceGraph`. |
| |
| == Gremlin Language Subset |
| |
| On link:https://issues.apache.org/jira/browse/TINKERPOP-1417[TINKERPOP-1417], it was suggested that we "Create a |
| Gremlin language subset that is easy to implement on any VM". Implementing the Gremlin VM in another language is |
| pretty straightforward. However, its a lot of code.. all these steps implementations. One thing we could do to make |
| it easy for database providers not on the JVM (e.g. ArangoDB and C) is to create "Gremlito" (Gremlin--). This language |
| subset wouldn't support side-effects, sacks, match, etc. Basically, just simple traversal steps and reducing barrier |
| terminals. |
| |
| Thus: |
| |
| * out, in, both, values, outE, inV, id, label, etc. |
| * repeat |
| * select, project |
| * where, has, limit, range, is, dedup |
| * path, simplePath, cyclicPath |
| * groupCount, sum, group, count, max, min, etc. (reducing barriers) |
| |
| === Comments |
| |
| This has an interesting potential impact on GLVs because "Little Gremlin" could be implemented within them for |
| client-side traversals over remote subgraphs, where the subgraph is like a remote transaction. All graph mutations |
| essentially build a subgraph which is merged into the primary graph. That subgraph is effectively the "transaction". |
| Build it locally then submit it remotely and have the server sort out the merging. It's perhaps the most natural way |
| to load data. With "Gremlinito" you then get the added power of being able to traverse a local subgraph. |
| |
| == Testing Framework |
| |
| Consider a testing framework based on the Gherkin tests from 3.x and a `gremlin-test` parent module for all the test |
| framework modules that will potentially be needed. In 3.x, we found situation where having test modules beyond |
| `gremlin-test` would have been helpful, so a parent module that held all those would probably be smart. |