blob: 7d551ff2fc73e134c725f6bb938a8540a01051af [file] [log] [blame]
////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
= TinkerPop 4.0.0
image::gremlins-wildest-dreams.png[width=185]
*Gremlin's Wildest Dreams*
== TinkerPop 4.0.0
*Release Date: NOT OFFICIALLY RELEASED YET*
Please see the link:https://github.com/apache/tinkerpop/blob/4.0.0/CHANGELOG.asciidoc#release-4-0-0[changelog] for a
complete list of all the modifications that are part of this release.
=== Upgrading for Users
==== SLF4j 2.x
TinkerPop has generally upgraded to SLF4j 2.x which brings with it some important changes to log initialization which
may affect user applications that are upgrading from TinkerPop 3.x. Please see the
[SLF4j documentation](https://www.slf4j.org/faq.html#changesInVersion200) that explains the differences and how they
might apply.
=== Upgrading for Providers
==== Graph System Providers
==== Graph Driver Providers
== TinkerPop 4.0.0-beta.1
*Release Date: January 17, 2025*
Please see the link:https://github.com/apache/tinkerpop/blob/4.0.0-beta.1/CHANGELOG.asciidoc#release-4-0-0-beta-1[changelog] for a
complete list of all the modifications that are part of this release.
NOTE: 4.0.0-beta.1 is a beta/milestone release. It is for meant as a preview version to try out the new HTTP API
features in the server and drivers (Java/Python only). As this is a beta version only, you can expect breaking changes to
occur in future betas for 4.0.0 on the way to its General Availability release. Items that have important
limitations and constraints pertinent to this beta will be highlighted through the documentation inside an
"IMPORTANT" box that starts with "4.0 Beta Release".
=== Upgrading for Users
[[result-bulking-from-server]]
==== Result Bulking from Server
In previous versions, when a traversal is submitted through the DriverRemoteConnection (DRC) via the Bytecode processor,
the results from the server were bulked as Traverser, which provides a form of result optimization across the wire.
Starting with 4.0, with the removal of Bytecode and Traverser serializer, this optimization is now achieved via
`GraphBinaryV4` response message serialization, and can be controlled through cluster setting or per request option.
Per request option setting will always override cluster settings, and regardless of cluster or request option settings,
bulking will only occur if the script processing language is set to `gremlin-lang` and the serializer is set to `GraphBinaryV4`.
[source,java]
----
// cluster setting
Cluster cluster = Cluster.build().bulkResults(true).create();
// per request option
GraphTraversalSource g = traversal().with(DriverRemoteConnection.using(cluster));
List result = g.with("language", "gremlin-lang").with("bulkResults", true).inject(1).toList();
----
By default, the cluster setting of `bulkResults` is false. To remain consistent with previous behavior, remote traversal
submitted through the DRC will always send a request option setting `bulkResults` to `true`. This implies that if `gremlin-lang`
script engine and `GraphBinaryV4` serializer are used, then server will bulk results before sending regardless of cluster setting,
and can only be disabled via per request option.
==== BulkSet Behavior Changes
Starting with 4.0, steps which return BulkSet (e.g. `aggregate()`) will have results returned in different format
depending on embedded or remote usage.
For embedded cases, a BulkSet will be returned as before.
For remote cases, BulkSets will now be expanded into Lists upon deserialization with `gremlin-driver`. All other GLVs
already expanded BulkSet to List prior to TinkerPop 4. Each element in the BulkSet will appear in the list the same
number of times as specified by its bulk value.
==== Configuration changes
This is a placeholder to summarize configuration-related changes.
* Gremlin Server
** `maxContentLength` setting has been renamed to `maxRequestContentLength`.
* Java GLV
** `enableCompression` setting has been removed.
** `bulkResults` has been added. See <<result-bulking-from-server, result bulking>> for more details.
** `maxContentLength` setting has been renamed to `maxResponseContentLength` and now blocks incoming responses that are
too large based on total response size.
** `addInterceptorAfter`, `addInterceptorBefore`, `addInterceptor` and `removeInterceptor` have been added.
See <<java-requestinterceptor, Changes to Java RequestInterceptor>> for more details.
* Python GLV
** `enableCompression` setting has been removed.
** `bulk_results` has been added. See <<result-bulking-from-server, result bulking>> for more details.
** `interceptors` has been added to the Python GLV to modify HTTP requests. See
<<python-requestinterceptor, Addition of Python interceptor>> for more details.
** `message_serializer` setting has been separated into `request_serializer` and `response_serializer`. See
<<python-requestinterceptor, Addition of Python interceptor>> for more details.
** `username` and `password` settings for the Python GLV has been replaced with `auth`.
** `kerberized_service` setting has been removed.
** `session` setting has been removed.
==== Removal of :remote and :submit Console Commands
The `:remote` and `:submit` commands, used to establish connections and send scripts to a remote `gremlin-server`, are
no longer supported in `gremlin-console`. If sending a traversal, users should instead create a `RemoteTraversalSource`
(see <<simplified-g-creation, Simplification to g Creation for examples>>).
While a `RemoteTraversalSource` is the preferred method of executing remote traversals from `gremlin-console`, it
remains possible to send scripts directly using a `gremlin-driver` `Client`.
[gremlin-groovy]
----
cluster = Cluster.open('conf/remote.yaml')
client = cluster.connect()
client.submit("g.V()")
client.close()
cluster.close()
----
==== Removal of Gephi Console Plugin
The Gephi Plugin for `gremlin-console` is no longer supported and has been removed. Apache TinkerPop no longer maintains
any graph visualization tool, however there are a number of community-led alternatives available, many of which are
listed on the link:https://tinkerpop.apache.org/community.html#powered-by[TinkerPop community page].
[[java-requestinterceptor]]
==== Changes to Java RequestInterceptor
Because the underlying transport has been changed from WebSockets to HTTP, the usage of the `RequestInterceptor` has
changed as well. The `RequestInterceptor` will now be run per request and will allow you to completely modify the HTTP
request that is sent to the server. `Cluster` has four new methods added to it: `addInterceptorAfter`,
`addInterceptorBefore`, `removeInterceptor` and `addInterceptor`. Each interceptor requires a name as it will be used
to insert new interceptors in different positions.
The interceptors work with a new class called HttpRequest. This is just a basic abstraction over a request but it also
contains some useful strings for common headers. The initial `HttpRequest` that is passed to the first interceptor will
contain a `RequestMessage`. `RequestMessage` is immutable and only certain keys can be added to them. If you want to
customize the body by adding other fields, you will need to make a different copy of the `RequestMessage` or completely
change the body to contain a different data type. The final interceptor must return a `HttpRequest` whose body contains
a `byte[]`.
After the initial HTTP request is generated, the interceptors will be called in order to allow the request to be
modified. After each `RequestInterceptor` is run, the request is updated with the data from the final `HttpRequest` and
that is sent to the endpoint. There is a default interceptor added to every `Cluster` called "serializer". This
interceptor is responsible for serializing the request body is which what the server normally expects. This is intended
to be an advanced customization technique that should only be used when needed.
[[python-requestinterceptor]]
==== Addition of Python interceptor
HTTP interceptors have been added to `gremlin-python` to enable capability similar to that of Java GLV. These
interceptors can be passed into either a `DriverRemoteConnection` or a `Client` using the interceptors parameter. An
interceptor is a `Callable` that accepts one argument which is the HTTP request (dictionary containing header, payload
and auth) or a list/tuple of these functions. The interceptors will run after the request serializer has run but before
any auth functions run so the HTTP request may still get modified after your interceptors are run. In situations where
you don't want the payload to be serialized, the `message_serializer` has been split into a `request_serializer` and a
`response_serializer`. Simply set the `request_serializer` to `None` and this will prevent the `RequestMessage` from
being serialized. Again, this is expected to be an advanced feature so some knowledge of implementation details will be
required to make this work. For example, you'll need to know what payload formats are accepted by `aiohttp` for the
request to be sent.
==== Changes to deserialization for gremlin-javascript
Starting from this version, `gremlin-javascript` will deserialize `Set` data into a ECMAScript 2015 Set. Previously,
these were deserialized into arrays.
==== Gremlin Grammar Changes
A number of changes have been introduced to the Gremlin grammar to help make it be more consistent and easier to use.
*`new` keyword is now optional*
The `new` keyword is now optional in all cases where it was previously used. Both of the following examples are now
valid syntax with the second being the preferred form going forward:
[source,groovy]
----
g.V().withStrategies(new SubgraphStrategy(vertices: __.hasLabel('person')))
g.V().withStrategies(SubgraphStrategy(vertices: __.hasLabel('person')))
----
In a future version, it is likely that the `new` keyword will be removed entirely from the grammar.
*Refined variable support*
The Gremlin grammar allows variables to be used in various places. Unlike Groovy, from which the Gremlin grammar is
partially derived and which allows variables to be used for any argument to a method, Gremlin only allows for variables
to be used when they refer to particular types. In making this change it did mean that all enums like, `Scope`, `Pop`,
`Order`, etc. can no longer be used in that way and can therefore only be recognized as literal values.
*Supports withoutStrategies()*
The `withoutStrategies()` configuration step is now supported syntax for the grammar. While this option is not commonly
used it is still a part of the Gremlin language and there are times where it is helpful to have this fine grained
control over how a traversal works.
[source,groovy]
----
g.V().withoutStrategies(CountStrategy)
----
See: link:https://issues.apache.org/jira/browse/TINKERPOP-2862[TINKERPOP-2862],
link:https://issues.apache.org/jira/browse/TINKERPOP-3046[TINKERPOP-3046]
==== Renamed none() to discard()
The `none()` step, which was primarily used by `iterate()` to discard traversal results in remote contexts, has been
renamed to `discard()`. In its place is a new list filtering step `none()`, which takes a predicate as an argument and
passes lists with no elements matching the predicate.
==== Improved handling of integer overflows
Integer overflows caused by addition and multiplication operations will throw an exception instead of being silently
skipped with incorrect result.
==== SeedStrategy Construction
The `SeedStrategy` public constructor has been removed for Java and has been replaced by the builder pattern common
to all strategies. This change was made to ensure that the `SeedStrategy` could be constructed in a consistent manner.
==== Removal of `gremlin-archetype`
`gremlin-archetype`, which contained example projects demonstrating the use cases of TinkerPop, has been removed in
favor of newer sample applications which can be found in each GLV's `examples` folder.
==== Improved Translators
The various Java `Translator` implementations allowing conversion of Gremlin traversals to string forms in various
languages have been modified considerably. First, they have been moved from to the
`org.apache.tinkerpop.gremlin.language.translator` package, because they now depend on the ANTLR grammar in
`gremlin-language` to handled the translation process. Making this change allowed for a more accurate translation of
Gremlin that doesn't need to rely on reflection and positional arguments to determine which step was intended for use.
Another important change was the introduction of specific translators for Groovy and Java. While Groovy translation
tends to work for most Java cases, there is syntax specific to Groovy where it does not. With a specific Java
translator, the translation process can be more accurate and less error prone.
The syntax for the translators has simplified as well. The translator function now takes a Gremlin string and a target
language to translate to. Consider the following example:
[source,text]
----
gremlin> GremlinTranslator.translate("g.V().out('knows')", Translator.GO)
==>g.V().Out("knows")
----
Further note that Gremlin language variants produce `gremlin-language` compliant strings directly since bytecode was
removed. As a result, all translators in .NET, Python, Go and Javascript have been removed.
See: link:https://issues.apache.org/jira/browse/TINKERPOP-3028[TINKERPOP-3028]
==== Change to `OptionsStrategy` in `gremlin-python`
The `\\__init__()` syntax has been updated to be both more pythonic and more aligned to the `gremlin-lang` syntax.
Previously, `OptionsStrategy()` took a single argument `options` which was a `dict` of all options to be set.
Now, all options should be set directly as keyword arguments.
For example:
[source,python]
----
# 3.7 and before:
g.with_strategies(OptionsStrategy(options={'key1': 'value1', 'key2': True}))
# 4.x and newer:
g.with_strategies(OptionsStrategy(key1='value1', key2=True))
myOptions = {'key1': 'value1', 'key2': True}
# 3.7 and before:
g.with_strategies(OptionsStrategy(options=myOptions))
# 4.x and newer:
g.with_strategies(OptionsStrategy(**myOptions))
----
==== Custom Traversal Strategy Construction
Traversal strategy construction has been updated such that it is no longer required to have concrete classes for each
strategy being added to a graph traversal (use of concrete classes remains viable and is recommended for "native"
TinkerPop strategies). To use strategies without a concrete class, `TraversalStrategyProxy` can be used in Java, and
`TraversalStrategy` in Python.
All the following examples will produce the script `g.withStrategies(new MyStrategy(config1:'my value',config2:123))`:
[source,java]
----
Map<String, Object> configMap = new LinkedHashMap<>();
configMap.put("config1", "my value");
configMap.put("config2", 123);
TraversalStrategy strategyProxy = new TraversalStrategyProxy("MyStrategy", new MapConfiguration(configMap));
GraphTraversal traversal = g.withStrategies(strategyProxy);
----
[source,python]
----
g.with_strategies(TraversalStrategy(
strategy_name='MyStrategy',
config1='my value',
config2=123
))
----
==== Changes to Serialization
The GLVs will only support GraphBinaryV4 and GraphSON support will be removed. This means that the serializer option
that was available in most GLVs has been removed. GraphBinary is a more compact format and has support for the same
types. This should lead to increased performance for users upgrading from any version of GraphSON to GraphBinary.
The number of serializable types has been reduced in V4. For example, only a single temporal type remains. You have two
options when trying to work with data types whose serializer has been removed: first, you can attempt to convert the
data to another type that still have a serializer or, second, the type may have been too specific and therefore removed
in which case your provider should have a Provider Defined Type (PDT) for it. See the next paragraph for information on
PDTs.
Custom serializers have also been removed so if you previously included those as part of your application, they should
now be removed. In its place, PDTs have been introduced. In particular, there is the Primitive PDT and the Composite
PDT. Primitive PDTs are string-based representations of a primitive type supported by your provider. Composite types
contain a map of fields. You should consult your provider's documentation to determine what types of fields a
particular PDT may contain.
==== Changes to Authentication and Authorization
With the move to HTTP, the only authentication option supported out-of-the-box is HTTP basic access authentication
(username/password). The SASL-based authentication mechanisms are no longer supported (e.g. Kerberos). Your graph
system provider may choose to implement other authentication mechanisms over HTTP which you would have to use via a
request interceptor. Refer to your provider's documentation to determine if other authentication mechanisms are
available.
==== Transactions Disabled
IMPORTANT: 4.0.0-beta.1 Release - Transactions are currently disabled and use of `tx()` will return an error.
==== Removal of Sessions
Support for sessions has been removed. All requests are now "sessionless" with no shared state across subsequent requests.
==== Result Bulking Changes
Previous versions of the Gremlin Server would attempt to "bulk" the result if bytecode was used in the request. This
"bulking" increased performance by sending similar results once with a count of occurrences. Starting in 4.0, Gremlin
Server will bulk based on a newly introduced `bulked` field in the `RequestMessage`. It only applies to GraphBinary and
`gremlin-lang` requests and other requests won't be bulked. This can be toggled in the language variants by setting a
boolean value with `enableBulkedResult()` in the `Cluster` settings.
==== Gremlin Java Changes
Connection pooling has been updated to work with HTTP. Previously, connections could only be opened one at a time, but
this has changed and now many connections can be opened at the same time. This supports bursty workloads where many
queries may be issued within a short period of time. Connections are no longer closed based on how "busy" they are
based on the `minInProcessPerConnection` and `minSimultaneousUsagePerConnection`, rather they are closed based on an
idle timeout called `idleConnectionTimeout`. Because the number of connections can increase much faster and connections
are closed based on a timeout, the `minConnectionPoolSize` option has been removed and there may be zero connections
available if the driver has been idle for a while.
The Java driver can currently handle a response that is a maximum of 2^31-1 (`Integer.MAX_VALUE`) bytes in size.
Queries that return more data will have to be separated into multiple queries that return less data.
==== GValue
Parameterization was first added to gremlin-lang in TinkerPop 3.7.0, however was limited in that variables were
immediately resolved to literals during the compilation of a gremlin script. This direct resolution of variables limited
providers ability to detect and optimize recurring query patterns.
With this update, a new class `GValue` is introduced which wraps a parameter name and value. Steps which will benefit
from parameterization have been given overloads to accept GValues. Users can pass GValue's into their traversals to
inject parameters.
A `GValue` wraps a parameter name and value, and can be provided as input to parameterizable steps when building a
`GraphTraversal`. The following examples will produce a gremlin script of `g.V().hasLabel(label)` with a parameter map
of `["label": "person"]`:
[source,java]
----
g.V().hasLabel(GValue.of("label", "person"));
----
[source,python]
----
g.V().has_label(GValue('label', 'person'))
----
Use of `GValue` in traversals with repeated patterns may lead to improved performance in certain graphs. Consult the
documentation for your specific graph provider for recommendations on how to best utilize `GValue` in traversals.
A new `DefaultVariableResolver` has also been introduced with this change. The grammar will now resolve variables into
`GValue` which are passed to the constructed traversal, instead of directly resolving them to literals. The old variable
resolution behavior can still be obtained via the `DirectVariableResolver` if desired.
==== Gremlin Server Default Language
`gremlin-lang` is now the default language and script engine in Gremlin-Server, replacing `gremlin-groovy`. Users may
still explicitly set the `language` field of a request message to `"gremlin-groovy"` in cases where groovy scripts are
required.
=== Upgrading for Providers
==== Renaming NoneStep to DiscardStep
NoneStep, which was primarily used by `iterate()` to discard traversal results in remote contexts, has been renamed to
DiscardStep. In its place is a new list filtering NoneStep, which takes a predicate as an argument and passes lists with
no elements matching the predicate.
==== Changes to Serialization
The V4 versions of GraphBinary and GraphSON are being introduced. Support for the older versions of GraphBinary (V1)
and GraphSON (V1-3) is removed. Upon the full release of 4.0, the GLVs will only use GraphBinary, however, the Gremlin
Server will support both GraphSON and GraphBinary. The following is a list of the major changes to the GraphBinary
format:
* Removed type serializers:
** Period
** Date
** TimeStamp
** Instant
** ZonedDateTime
** OffsetTime
** LocalDateTime
** LocalDate
** LocalTime
** MonthDay
** YearMonth
** Year
** ZoneOffset
** BulkSet
** Class
** Binding
** Bytecode
** Barrier
** Cardinality
** Column
** Operator
** Order
** Pick
** Pop
** Scope
** DT
** Lambda
** P
** Traverser
** TextP
** TraversalStrategy
** Metrics
** TraversalMetrics
** InetAddress
* OffsetDatetime has been renamed to Datetime. This type maps to `OffsetDateTime` in Java and a `datetime` in Python.
* Byte is redefined from being unsigned byte to a signed byte.
* List has a `0x02` value_flag used to denote bulking.
* Map has a `0x02` value_flag used to denote ordering.
* `Element` (Vertex, Edge, VertexProperty) labels have been changed from `String` to `List` of `String`.
* `Element` (Vertex, Edge, VertexProperty) properties are no longer null and are `List` of `Property`.
* Custom is replaced with Provider Defined Types
One of the biggest differences is in datetime support. Previously, in the Java implementation, `java.util.Date`,
`java.sql.Timestamp` and most types from the `java.time` package had serializers. This isn't the case in GraphSON 4
as only `java.time.OffsetDateTime` is supported. Java provides methods to convert amongst these classes so they should
be used to convert your data to and from `java.time.OffsetDateTime`.
The `GraphSONSerializerProvider` is not used in GraphSON 4. The `GraphSONSerializerProvider` uses the
`ToStringSerializer` for any unknown type and was used in previous GraphSON versions. Because GraphSON 4 is only
intended to serialize specific types and not used as a general serializer, GraphSON 4 serializers will throw an error
when encountering unknown types.
==== Graph System Providers
===== AbstractAuthenticatorHandler Constructor
The deprecated one-arg constructor for `AbstractAuthenticationHandler` has been removed along with two-arg constructors
for the implementations. Gremlin Server formerly supported the two-arg `Authenticator`, and `Settings` constructor for
instantiating new custom instances. It now expects implementations of `AbstractAuthenticationHandler` to use a
three-arg constructor that takes `Authenticator`, `Authorizer`, and `Settings`.
===== GraphManager Changes
The `beforeQueryStart()`, `onQueryError()`, and `onQuerySuccess()` of `GraphManager` have been removed. These were
originally intended to give providers more insight into when execution occurs in the server and the outcome of that
execution. However, they depended on `RequestMessage` containing a Request ID, which isn't the case anymore.
===== Gremlin Server Updates
The `OpProcessor` extension point of the server has been removed. In order to extend the functionality of the Gremlin
Server, you have to implement your own `Channelizer`.
If you are a provider that makes use of the Gremlin Server, you may need to update server configuration YAML files that
you provide to your users. With the change from WebSockets to HTTP, some of the previous default values are invalid and
some of the fields no longer exist. See link:https://tinkerpop.apache.org/docs/4.0.0/reference/#_configuring_2[options]
for an updated list. One of the most important changes is to the `Channelizer` configuration as only the
`HttpChannelizer` remains and the rest have been removed.
===== Gremlin Server Default Language
`gremlin-lang` is now the default language and script engine in Gremlin-Server, replacing `gremlin-groovy`. Users may
still explicitly set the `language` field of a request message to `"gremlin-groovy"` in cases where groovy scripts are
required.
==== Graph Driver Providers
===== Application Layer Protocol Support
HTTP/1.1 is now the only supported application-layer protocol and WebSockets support is dropped. Please follow the
instructions in the
link:https://tinkerpop.apache.org/docs/4.0.0/dev/provider/#_graph_driver_provider_requirements[provider documentation]
for more detailed information. The subprotocol remains fairly similar but has been adjusted to work better with HTTP.
Also, the move to HTTP means that SASL has been removed as an authentication mechanism and only HTTP basic remains.
===== Request Interceptor
It is strongly recommended that every graph driver provider give a way for users to intercept and modify the HTTP
request before it is sent off to the server. This capability is needed in cases where the graph system provider has
additional functionality that can be enabled by modifying the HTTP request.