////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
[[gremlin-applications]]
= Gremlin Applications
Gremlin applications represent tools that are built on top of the core APIs to help expose common functionality to
users when working with graphs. There are two key applications:
. Gremlin Console - A link:http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop[REPL] environment for
interactive development and analysis
. Gremlin Server - A server that hosts a Gremlin Traversal Machine thus enabling remote Gremlin execution
image:gremlin-lab-coat.png[width=310,float=left] Gremlin is designed to be extensible, making it possible for users
and graph system/language providers to customize it to their needs. Such extensibility is also found in the Gremlin
Console and Server, where a universal plugin system makes it possible to extend their capabilities. One of the
important aspects of the plugin system is the ability to help the user install the plugins through the command line
thus automating the process of gathering dependencies and other error-prone activities.
The process of plugin installation is handled by link:http://www.groovy-lang.org/Grape[Grape], which helps resolve
dependencies into the classpath. It is therefore important to ensure that Grape is properly configured in order to
use the automated capabilities of plugin installation. Grape is configured by `~/.groovy/grapeConfig.xml` and
generally speaking, if that file is not present, the default settings will suffice. However, they will not suffice
if a required dependency is not in one of the default configured repositories. Please see the
link:http://www.groovy-lang.org/Grape#Grape-CustomizeIvysettings[Customize Ivy settings] section of the Grape documentation for more details on
the defaults. For current TinkerPop plugins and dependencies, the following configuration, which is also the default
for Ivy, should be acceptable:
[source,xml]
----
<ivysettings>
  <settings defaultResolver="downloadGrapes"/>
  <resolvers>
    <chain name="downloadGrapes" returnFirst="true">
      <filesystem name="cachedGrapes">
        <ivy pattern="${user.home}/.groovy/grapes/[organisation]/[module]/ivy-[revision].xml"/>
        <artifact pattern="${user.home}/.groovy/grapes/[organisation]/[module]/[type]s/[artifact]-[revision](-[classifier]).[ext]"/>
      </filesystem>
      <ibiblio name="localm2" root="${user.home.url}/.m2/repository/" checkmodified="true" changingPattern=".*" changingMatcher="regexp" m2compatible="true"/>
      <ibiblio name="jcenter" root="https://jcenter.bintray.com/" m2compatible="true"/>
      <ibiblio name="ibiblio" m2compatible="true"/>
    </chain>
  </resolvers>
</ivysettings>
----
TIP: Please see the link:https://tinkerpop.apache.org/docs/x.y.z/dev/developer/#groovy-environment[Developer Documentation]
for additional configuration options when working with "snapshot" releases.
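When a dependency fails to resolve, it can help to confirm which repositories Grape will actually consult. The following is a minimal Python sketch, not part of TinkerPop, that lists the resolver chains found in an Ivy settings document shaped like the one above. The helper names `list_resolvers` and `load_grape_config` and the shortened `SAMPLE` document (including its `localm2` root) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Shortened, illustrative copy of the Ivy settings structure shown above.
# The "localm2" root is a made-up path used only for this example.
SAMPLE = """
<ivysettings>
  <settings defaultResolver="downloadGrapes"/>
  <resolvers>
    <chain name="downloadGrapes" returnFirst="true">
      <filesystem name="cachedGrapes"/>
      <ibiblio name="localm2" root="file:///home/user/.m2/repository/" m2compatible="true"/>
      <ibiblio name="jcenter" root="https://jcenter.bintray.com/" m2compatible="true"/>
      <ibiblio name="ibiblio" m2compatible="true"/>
    </chain>
  </resolvers>
</ivysettings>
"""


def list_resolvers(xml_text):
    """Return (chain name, [resolver names]) pairs in declaration order."""
    root = ET.fromstring(xml_text)
    chains = []
    for chain in root.iter("chain"):
        names = [child.get("name") for child in chain]
        chains.append((chain.get("name"), names))
    return chains


def load_grape_config(path=None):
    """Return the resolver chains from grapeConfig.xml, or None if the file is absent."""
    if path is None:
        path = Path.home() / ".groovy" / "grapeConfig.xml"
    path = Path(path)
    if not path.exists():
        return None  # no file present: Grape uses its default settings
    return list_resolvers(path.read_text(encoding="utf-8"))


print(list_resolvers(SAMPLE))
```

Calling `load_grape_config()` without arguments inspects the current user's `~/.groovy/grapeConfig.xml`; a return value of `None` means no file was found there.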
[[gremlin-console]]
== Gremlin Console
image:gremlin-console.png[width=325,float=right] The Gremlin Console is an interactive terminal or
link:http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop[REPL] that can be used to traverse graphs
and interact with the data that they contain. It represents the most common method for performing ad hoc graph
analysis, small to medium sized data loading projects and other exploratory functions. The Gremlin Console is
highly extensible, featuring a rich plugin system that allows new tools, commands,
link:http://en.wikipedia.org/wiki/Domain-specific_language[DSLs], etc. to be exposed to users.
To start the Gremlin Console, run `gremlin.sh` or `gremlin.bat`:
[source,text]
----
$ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(4)-oOOo-----
plugin loaded: tinkerpop.server
plugin loaded: tinkerpop.utilities
plugin loaded: tinkerpop.tinkergraph
gremlin>
----
NOTE: If the above plugins are not loaded then they will need to be enabled or else certain examples will not work.
If using the standard Gremlin Console distribution, then the plugins should be enabled by default. See below for
more information on the `:plugin use` command to manually enable plugins. These plugins, with the exception of
`tinkerpop.tinkergraph`, cannot be removed from the Console as they are a part of the `gremlin-console.jar` itself.
These plugins can only be deactivated.
The Gremlin Console is loaded and ready for commands. Recall that the console hosts the Gremlin-Groovy language.
Please review link:http://www.groovy-lang.org/[Groovy] for help on Groovy-related constructs. In short, Groovy is a
superset of Java. What works in Java, works in Groovy. However, Groovy provides many shorthands to make it easier
to interact with the Java API. Moreover, Gremlin provides many neat shorthands to make it easier to express paths
through a property graph.
[gremlin-groovy]
----
i = 'goodbye'
j = 'self'
i + " " + j
"${i} ${j}"
----
The "toy" graph provides a way to get started with Gremlin quickly.
[gremlin-groovy]
----
g = traversal().with(TinkerFactory.createModern())
g.V()
g.V().values('name')
g.V().has('name','marko').out('knows').values('name')
----
TIP: When using Gremlin-Groovy in a Groovy class file, add `static { GremlinLoader.load() }` to the head of the file.
=== Console Commands
In addition to the standard commands of the link:http://groovy-lang.org/groovysh.html[Groovy Shell], Gremlin adds
some other useful operations. The following table outlines the most commonly used commands:
[width="100%",cols="3,^2,10",options="header"]
|=========================================================
|Command |Alias |Description
|:help |:? |Displays list of commands and descriptions. When followed by a command name, it will display more specific help on that particular item.
|:exit |:x |Ends the Console session.
|import |:i |Import a class into the Console session.
|:cls |:C |Clear the screen of the Console.
|:clear |:c |Sometimes the Console can get into a state where the command buffer no longer understands input (e.g. a misplaced `(` or `}`). Use this command to clear that buffer.
|:load |:l |Load a file or URL into the command buffer for execution.
|:install |:+ |Imports a Maven library and its dependencies into the Console.
|:uninstall |:- |Removes a Maven library and its dependencies. A restart of the console is required for removal to fully take effect.
|:plugin |:pin |Plugin management functions to list, activate and deactivate available plugins.
|=========================================================
NOTE: The Console also exposes the `:record` command which is inherited from the Groovy Shell.
=== Interrupting Evaluations
If there is some input that is taking too long to evaluate or to iterate through, use `ctrl+c` to attempt to interrupt
that process. It is an "attempt" in the sense that the long running process is only informed of the interruption by
the user and must respond to it (as with any call to `interrupt()` on a `Thread`). A `Traversal` will typically respond
to such requests as do most commands.
[source,text]
----
gremlin> java.util.stream.IntStream.range(0, 1000).iterator()
==>0
==>1
==>2
==>3
==>4
...
==>348
==>349
==>350
==>351
==>352
Execution interrupted by ctrl+c
gremlin>
----
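The cooperative nature of interruption can be illustrated outside of the Console. The sketch below is a Python analogy, not Console code: a worker thread is only "informed" of the stop request through a flag and must check that flag itself, much as a JVM thread must react to `interrupt()`. The function and variable names are hypothetical.

```python
import threading
import time


def count_until_interrupted(stop_requested, results, limit=1_000_000):
    """Append integers to results, checking the stop flag between items."""
    for i in range(limit):
        if stop_requested.is_set():  # the worker has to look at the flag itself
            return
        results.append(i)
        time.sleep(0.001)  # stand-in for real per-item work


stop_requested = threading.Event()
results = []
worker = threading.Thread(target=count_until_interrupted, args=(stop_requested, results))
worker.start()
time.sleep(0.05)       # let some results accumulate
stop_requested.set()   # comparable to pressing ctrl+c
worker.join(timeout=5)
print(f"stopped after {len(results)} results; still running: {worker.is_alive()}")
```

A worker that never checked the flag would simply run to completion, which is why an interruption can only ever be an attempt.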
[[console-preferences]]
=== Console Preferences
Preferences are set with `:set name value`. Values can contain spaces when quoted. All preferences are reset by `:purge preferences`.
[width="100%",cols="3,^2,10",options="header"]
|=========================================================
|Preference |Type |Description
|max-iteration | int | Controls the maximum number of results that the Console will display. Default: 100 results.
|colors | bool | Enable ANSI color rendering. Default: true
|warnings | bool | Enable display of remote execution warnings. Default: true
|gremlin.color | colors | Color of the ASCII art gremlin on startup.
|info.color | colors | Color of "info" type messages.
|error.color | colors | Color of "error" type messages.
|vertex.color | colors | Color of vertices in results.
|edge.color | colors | Color of edges in results.
|string.color | colors | Color of strings in results.
|number.color | colors | Color of numbers in results.
|T.color | colors| Color of Tokens in results.
|input.prompt.color | colors | Color of the input prompt.
|result.prompt.color | colors | Color of the result prompt.
|input.prompt | string | Text of the input prompt.
|result.prompt | string | Text of the result prompt.
|result.indicator.null | string | Text of the void/no results indicator - setting to empty string (i.e. "" at the
command line) will print no result line in these cases.
|=========================================================
Colors can contain a comma-separated combination of one each of foreground, background, and attribute.
[width="100%",cols="3,^2,10",options="header"]
|=========================================================
|Foreground |Background |Attributes
|black|bg_black|bold
|blue|bg_blue|faint
|cyan|bg_cyan|underline
|green|bg_green|
|magenta|bg_magenta|
|red|bg_red|
|white|bg_white|
|yellow|bg_yellow|
|=========================================================
Example:
[source,text]
----
:set gremlin.color bg_black,green,bold
----
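The "one each" rule can be made concrete with a small validator. This is a Python sketch based only on the table above; it is not how the Console parses the value, and details such as whitespace or case handling are assumptions. The name `parse_color_spec` is hypothetical.

```python
FOREGROUND = {"black", "blue", "cyan", "green", "magenta", "red", "white", "yellow"}
BACKGROUND = {"bg_" + name for name in FOREGROUND}
ATTRIBUTES = {"bold", "faint", "underline"}


def parse_color_spec(spec):
    """Split a spec such as 'bg_black,green,bold' into its three optional parts."""
    parsed = {"foreground": None, "background": None, "attribute": None}
    for token in (t.strip() for t in spec.split(",")):
        if token in FOREGROUND:
            kind = "foreground"
        elif token in BACKGROUND:
            kind = "background"
        elif token in ATTRIBUTES:
            kind = "attribute"
        else:
            raise ValueError(f"unknown color token: {token!r}")
        if parsed[kind] is not None:
            # At most one foreground, one background and one attribute.
            raise ValueError(f"more than one {kind} given: {spec!r}")
        parsed[kind] = token
    return parsed


print(parse_color_spec("bg_black,green,bold"))
```

A value such as `red,blue` is rejected by this sketch because it names two foreground colors.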
=== Dependencies and Plugin Usage
The Gremlin Console can dynamically load external code libraries and make them available to the user. Furthermore,
those dependencies may contain Gremlin plugins which can expand the language, provide useful functions, etc. These
important console features are managed by the `:install` and `:plugin` commands.
The following Gremlin Console session demonstrates the basics of these features:
[source,groovy]
----
gremlin> :plugin list <1>
==>tinkerpop.server[active]
==>tinkerpop.utilities[active]
==>tinkerpop.sugar
==>tinkerpop.tinkergraph[active]
gremlin> :plugin use tinkerpop.sugar <2>
==>tinkerpop.sugar activated
gremlin> :install org.apache.tinkerpop neo4j-gremlin x.y.z <3>
==>loaded: [org.apache.tinkerpop, neo4j-gremlin, x.y.z]
gremlin> :plugin list <4>
==>tinkerpop.server[active]
==>tinkerpop.utilities[active]
==>tinkerpop.sugar
==>tinkerpop.tinkergraph[active]
==>tinkerpop.neo4j
gremlin> :plugin use tinkerpop.neo4j <5>
==>tinkerpop.neo4j activated
gremlin> :plugin list <6>
==>tinkerpop.server[active]
==>tinkerpop.sugar[active]
==>tinkerpop.utilities[active]
==>tinkerpop.neo4j[active]
==>tinkerpop.tinkergraph[active]
----
<1> Show a list of "available" plugins. The list of "available" plugins is determined by the classes available on
the Console classpath. Plugins need to be "active" for their features to be available.
<2> To make a plugin "active" execute the `:plugin use` command and specify the name of the plugin to enable.
<3> Sometimes there are external dependencies that would be useful within the Console. To bring those in, execute
`:install` and specify the Maven coordinates for the dependency.
<4> Note that there is a "tinkerpop.neo4j" plugin available, but it is not yet "active".
<5> Again, to use the "tinkerpop.neo4j" plugin, it must be made "active" with `:plugin use`.
<6> Now when the plugin list is displayed, the "tinkerpop.neo4j" plugin is displayed as "active".
WARNING: Plugins must be compatible with the version of the Gremlin Console (or Gremlin Server) being used. Attempts
to use incompatible versions cannot be guaranteed to work. Moreover, be prepared for dependency conflicts in
third-party plugins that may only be resolved via manual jar removal from the `ext/{plugin}` directory.
TIP: It is possible to manage plugin activation and deactivation by manually editing the `ext/plugins.txt` file which
contains the class names of the "active" plugins. It is also possible to clear dependencies added by `:install` by
deleting them from the `ext` directory.
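Editing `ext/plugins.txt` by hand can also be scripted. The Python sketch below assumes the file holds one plugin class name per line, which is an assumption about the layout rather than a documented format, and it runs against a temporary file with made-up class names so that it is safe to try. The helper names are hypothetical.

```python
from pathlib import Path
import tempfile


def read_active_plugins(path):
    """Return the non-empty lines of a plugins.txt-style file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def deactivate_plugin(path, class_name):
    """Rewrite the file without class_name; return True if it was present."""
    active = read_active_plugins(path)
    remaining = [name for name in active if name != class_name]
    Path(path).write_text("".join(name + "\n" for name in remaining), encoding="utf-8")
    return len(remaining) != len(active)


# Demonstration on a temporary file; the class names are invented for the example.
with tempfile.TemporaryDirectory() as tmp:
    plugins_file = Path(tmp) / "plugins.txt"
    plugins_file.write_text(
        "com.example.FirstPlugin\ncom.example.SecondPlugin\n", encoding="utf-8"
    )
    removed = deactivate_plugin(plugins_file, "com.example.SecondPlugin")
    remaining = read_active_plugins(plugins_file)
    print(removed, remaining)
```

As with manual edits, a change of this kind would only be picked up the next time the Console starts.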
[[execution-mode]]
=== Execution Mode
For automated tasks and batch executions of Gremlin, it can be useful to execute Gremlin scripts in "execution" mode
from the command line. Consider the following file named `gremlin.groovy`:
[source,groovy]
----
graph = TinkerFactory.createModern()
g = traversal().with(graph)
g.V().each { println it }
----
This script creates the toy graph and then iterates through all its vertices printing each to the system out. To
execute this script from the command line, `gremlin.sh` has the `-e` option used as follows:
[source,bash]
----
$ bin/gremlin.sh -e gremlin.groovy
v[1]
v[2]
v[3]
v[4]
v[5]
v[6]
----
It is also possible to pass arguments to scripts. Any parameters following the file name specification are treated
as arguments to the script. They are collected into a list and passed in as a variable called "args". The following
Gremlin script is exactly like the previous one, but it makes use of the "args" option to filter the vertices printed
to system out:
[source,groovy]
----
graph = TinkerFactory.createModern()
g = traversal().with(graph)
g.V().has('name',args[0]).each { println it }
----
When executed from the command line a parameter can be supplied:
[source,bash]
----
$ bin/gremlin.sh -e gremlin.groovy marko
v[1]
$ bin/gremlin.sh -e gremlin.groovy vadas
v[2]
----
It is also possible to pass multiple scripts by specifying multiple `-e` options. The scripts will execute in the order
in which they are specified. Note that only the arguments from the last script executed will be preserved in the console.
Finally, if the arguments conflict with the reserved flags to which `gremlin.sh` responds, double quotes can be used to
wrap all the arguments to the option:
[source,bash]
----
$ bin/gremlin.sh -e "gremlin.groovy -e -i --color"
----
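When `gremlin.sh` is launched from another program, the same quoting concern applies to the argument list. The following Python sketch builds that list for a single script: ordinary arguments are passed as separate tokens, while arguments that look like flags are joined with the script name into one token, which is what the double quotes achieve in the shell example above. The helper name and the "starts with a dash" test are assumptions for illustration.

```python
import shlex


def execution_command(script, args=(), launcher="bin/gremlin.sh"):
    """Build the argument list for running one script in execution mode."""
    args = list(args)
    if any(arg.startswith("-") for arg in args):
        # Flag-like arguments: hand the script and its arguments to -e as ONE token.
        return [launcher, "-e", " ".join([script, *args])]
    return [launcher, "-e", script, *args]


plain = execution_command("gremlin.groovy", ["marko"])
wrapped = execution_command("gremlin.groovy", ["-e", "-i", "--color"])
print(shlex.join(plain))
print(shlex.join(wrapped))
```

The resulting lists can be handed to `subprocess.run` directly. `shlex.join` is used here only to display them; it renders the wrapped form with single quotes, which a POSIX shell treats the same as the double quotes above for this string.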
[[interactive-mode]]
=== Interactive Mode
The Gremlin Console can be started in an "interactive" mode. Interactive mode is like <<execution-mode, execution mode>>
but the console will not exit at the completion of the script, even if the script completes unsuccessfully. In such a
case, it will simply stop processing on the line of the script that failed. In this way, the state of the console
is such that a user could examine the state of things up to the point of failure, which might make the script easier to
debug.
In addition to debugging, interactive mode is a helpful way for users to initialize their console environment to
avoid otherwise repetitive typing. For example, a user who spends a lot of time working with the TinkerPop "modern"
graph might create a script called `init.groovy` like:
[source,groovy]
----
graph = TinkerFactory.createModern()
g = traversal().with(graph)
----
and then start Gremlin Console as follows:
[source,text]
----
$ bin/gremlin.sh -i init.groovy
\,,,/
(o o)
-----oOOo-(4)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> g.V()
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
----
Note that the user can now reference `g` (and `graph` for that matter) at startup without having to directly type that
variable initialization code into the console.
As in execution mode, it is also possible to pass multiple scripts by specifying multiple `-i` options. See the
<<execution-mode, Execution Mode Section>> for more information on the specifics of that capability.
[[gremlin-console-docker-image]]
=== Docker Image
The Gremlin Console can also be started as a link:https://hub.docker.com/r/tinkerpop/gremlin-console/[Docker image]:
[source,text]
----
$ docker run -it tinkerpop/gremlin-console:x.y.z
Feb 25, 2018 3:47:24 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
\,,,/
(o o)
-----oOOo-(4)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>
----
The Docker image offers the same options as the standalone Console. It can be used, for example, to execute scripts:
[source,bash]
----
$ docker run -it tinkerpop/gremlin-console:x.y.z -e gremlin.groovy
v[1]
v[2]
v[3]
v[4]
v[5]
v[6]
----
[[gremlin-server]]
== Gremlin Server
image:gremlin-server.png[width=400,float=right] Gremlin Server provides a way to remotely execute Gremlin against one
or more `Graph` instances hosted within it. The benefits of using Gremlin Server include:
* Allows any Gremlin Structure-enabled graph (i.e. implements the `Graph` API on the JVM) to exist as a standalone
server, which in turn lets multiple clients communicate with the same graph database.
* Enables execution of ad hoc queries through remotely submitted Gremlin.
* Provides a method for non-JVM languages which may not have a Gremlin Traversal Machine (e.g. Python, JavaScript, Go, etc.)
to communicate with the TinkerPop stack on the JVM.
* Exposes numerous methods for extension and customization to include serialization options, remote commands, etc.
NOTE: Gremlin Server is the replacement for link:https://github.com/tinkerpop/rexster[Rexster].
NOTE: Please see the link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/[Provider Documentation] for information
on how to develop a driver for Gremlin Server.
By default, communication with Gremlin Server occurs over HTTP/1.1. The TinkerPop HTTP API is described in the
link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_http_api[HTTP provider documentation].
WARNING: Gremlin Server allows for the execution of remotely submitted "scripts" (i.e. arbitrary code sent by a client
to the server). Developers should consider the security implications involved in running Gremlin Server without the
appropriate precautions. Please review the <<security,Security Section>> and more specifically, the
<<script-execution,Script Execution Section>> for more information.
[[starting-gremlin-server]]
=== Starting Gremlin Server
Gremlin Server comes packaged with a script called `bin/gremlin-server.sh` to get it started (use `gremlin-server.bat`
on Windows):
[source,text]
----
$ bin/gremlin-server.sh conf/gremlin-server-modern.yaml
[INFO] GremlinServer
\,,,/
(o o)
-----oOOo-(4)-oOOo-----
[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-modern.yaml
[INFO] MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
[INFO] DefaultGraphManager - Graph [graph] was successfully configured via [conf/tinkergraph-empty.properties].
[INFO] ServerGremlinExecutor - Initialized Gremlin thread pool. Threads in pool named with pattern gremlin-*
[INFO] ServerGremlinExecutor - Initialized GremlinExecutor and preparing GremlinScriptEngines instances.
[INFO] ServerGremlinExecutor - Initialized gremlin-groovy GremlinScriptEngine and registered metrics
[INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
[INFO] GremlinServer - Executing start up LifeCycleHook
[INFO] Logger$info - Loading 'modern' graph data.
[INFO] GremlinServer - idleConnectionTimeout was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
[INFO] GremlinServer - keepAliveInterval was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v4.0+json with org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV4
[INFO] AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV4
[INFO] AbstractChannelizer - Configured application/vnd.graphbinary-v4.0 with org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV4
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 4 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
----
Gremlin Server is configured by the provided link:http://www.yaml.org/[YAML] file `conf/gremlin-server-modern.yaml`.
That file tells Gremlin Server many things such as:
* The host and port to serve on
* Thread pool sizes
* Where to report metrics gathered by the server
* The serializers to make available
* The Gremlin `ScriptEngine` instances to expose and external dependencies to inject into them
* `Graph` instances to expose
The log messages that printed above show a number of things, but most importantly, there is a `Graph` instance named
`graph` that is exposed in Gremlin Server. This graph is an in-memory TinkerGraph and was empty at the start of the
server. An initialization script at `scripts/generate-modern.groovy` was executed during startup. Its contents are
as follows:
[source,groovy]
----
include::{basedir}/gremlin-server/scripts/generate-modern.groovy[]
----
The script above initializes a `Map` and assigns two key/values to it. The first, assigned to "hook", defines a
`LifeCycleHook` for Gremlin Server. The "hook" provides a way to tie script code into the Gremlin Server startup and
shutdown sequences. The `LifeCycleHook` has two methods that can be implemented: `onStartUp` and `onShutDown`.
These events are called once at Gremlin Server start and once at Gremlin Server stop. This is an important point
because code outside of the "hook" is executed for each `ScriptEngine` creation (multiple may be created when
"sessions" are enabled) and therefore the `LifeCycleHook` provides a way to ensure that a script is only executed a
single time. In this case, the startup hook loads the "modern" graph into the empty TinkerGraph instance, preparing
it for use. The second key/value pair assigned to the `Map`, named "g", defines a `TraversalSource` from the `Graph`
bound to the "graph" variable in the YAML configuration file. This variable `g`, as well as any other variable
assigned to the `Map`, will be made available as variables for future remote script executions. In more general
terms, any key/value pairs assigned to a `Map` returned from the initialization script will become variables that
are global to all requests. In addition, any functions that are defined will be cached for future use.
WARNING: Transactions on graphs in initialization scripts are not closed automatically after the script finishes
executing. It is up to the script to properly commit or rollback transactions in the script itself.
[[connecting-via-drivers]]
=== Connecting via Drivers
image:rexster-connect.png[width=180,float=right] TinkerPop offers client-side drivers for Gremlin Server that
communicate via the TinkerPop HTTP API in a variety of languages:
* <<gremlin-dotnet,C#>>
* <<gremlin-go,Go>>
* <<gremlin-java,Java>>
* <<gremlin-javascript,JavaScript>>
* <<gremlin-python,Python>>
These drivers provide methods to send Gremlin-based requests and get back traversal results as a response. The requests
are script-based or traversal-based. Traversals are internally translated into a `gremlin-lang` compatible script.
Traversals are still the recommended way to send Gremlin because they integrate better with your language of choice. The
difference between sending scripts and sending traversals is demonstrated below in some basic examples:
[source,java,tab]
----
// script
Cluster cluster = Cluster.open();
Client client = cluster.connect();
Map<String,Object> params = new HashMap<>();
params.put("name","marko");
List<Result> list = client.submit("g.V().has('person','name',name).out('knows')", params).all().get();
// traversal
GraphTraversalSource g = traversal().with(DriverRemoteConnection.using("localhost",8182,"g"));
List<Vertex> list = g.V().has("person","name","marko").out("knows").toList();
----
[source,groovy]
----
// script
def cluster = Cluster.open()
def client = cluster.connect()
def list = client.submit("g.V().has('person','name',name).out('knows')", [name: "marko"]).all().get();
// traversal
def g = traversal().with(DriverRemoteConnection.using("localhost",8182,"g"))
def list = g.V().has('person','name','marko').out('knows').toList()
----
[source,csharp]
----
include::../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinApplicationsTests.cs[tags=connectingViaDrivers]
----
[source,javascript]
----
// script
const client = new Client('ws://localhost:8182/gremlin', { traversalSource: "g" });
const conn = client.open();
const list = conn.submit("g.V().has('person','name',name).out('knows')",{name: 'marko'}).then(function (response) { ... });
// traversal
const g = gtraversal().with(new DriverRemoteConnection('ws://localhost:8182/gremlin'));
const list = g.V().has("person","name","marko").out("knows").toList();
----
[source,python]
----
# script
client = Client('ws://localhost:8182/gremlin', 'g')
list = client.submit("g.V().has('person','name',name).out('knows')",{'name': 'marko'}).all()
# traversal
g = traversal().with(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
list = g.V().has("person","name","marko").out("knows").toList()
----
[source,go]
----
// script
client, err := NewClient("ws://localhost:8182/gremlin")
resultSet, err := client.SubmitWithOptions("g.V().has('person','name',name).out('knows')",
new(RequestOptionsBuilder).AddBinding("name", "marko").Create())
result, err := resultSet.All()
// traversal
remote, err := NewDriverRemoteConnection("ws://localhost:8182/gremlin")
g := Traversal_().With(remote)
list, err := g.V().Has("person", "name", "marko").Out("knows").ToList()
----
The advantage of traversals over scripts should be apparent from the above examples. Scripts are just strings that are
embedded in code (in the above examples, the strings are Groovy-based) whereas traversal based requests are themselves
code written in the native language of use. Obviously, the advantage of the Gremlin being actual code is that there
are checks (e.g. compile-time, auto-complete and other IDE support, language level checks, etc.) that help validate the
Gremlin during the development process.
When sending requests to the server, it is important to remember that the result of the request must be serializable
by both the server and the driver. If the server cannot serialize the result, or if what the server serializes is not
recognized by the serializer used by the driver, there will be an error. The most common cases for seeing serialization
problems include:
* Connecting to a graph that requires custom serializers, such as the ones JanusGraph provides for its relation
identifier. Always take time to get to know the graph database that's been chosen to determine if there are custom
serializers that need to be registered to the server or the driver.
* Driver versions that don't match server versions can sometimes create scenarios where serialization failures will
present themselves. TinkerPop typically does the most testing on drivers and servers of the same version and therefore
has the greatest confidence where those versions match. When possible, try to align the driver version with the server
version.
* Groovy scripts can return anything since they have full access to the JVM. While a simple non-Gremlin traversal script
like "1+1" simply returns a number, which is perfectly serializable, it is just as easy to send a script like
"graph.openManagement()", which is a JanusGraph API and returns an object that is not serializable, resulting in an error.
TinkerPop makes an effort to ensure a high level of consistency among the drivers and their features, but there are
differences in capabilities and features as they are each developed independently. The Java driver was the first and
is therefore the most advanced. Please see the related documentation for the driver of interest for more information
and details in the <<gremlin-drivers-variants,Gremlin Drivers and Variants>> Section of this documentation.
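The last serialization case listed above, a script that returns an arbitrary object, has a close analogy in any serializer. The sketch below uses Python's standard `json` module purely as a stand-in; it does not involve the TinkerPop serializers, and `ManagementHandle` is a made-up class representing a server-side object with no registered serializer.

```python
import json


def try_serialize(value):
    """Return (True, json_text) on success or (False, error message) on failure."""
    try:
        return True, json.dumps(value)
    except TypeError as exc:
        return False, str(exc)


class ManagementHandle:
    """Stand-in for an object that the serializer does not know how to write."""


number_result = try_serialize(1 + 1)
object_result = try_serialize(ManagementHandle())
print(number_result)
print(object_result[0])
```

The number serializes without issue while the custom object is rejected, which mirrors the difference between a script that returns a plain value and one that returns an implementation-specific object.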
[[connecting-via-console]]
=== Connecting via Console
Gremlin Console can be used to send remote traversals to a running Gremlin Server. Start Gremlin Console as
follows:
[source,text]
----
$ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(4)-oOOo-----
gremlin>
----
Remote traversals can then be constructed as follows.
[gremlin-groovy]
----
g = traversal().with('conf/remote-graph.properties')
g.V().values('name')
g.V().has('name','marko').out('created').values('name')
g.E().label().groupCount()
g.close()
----
A remote traversal source can also be constructed with a `DriverRemoteConnection` (DRC), and a `Client` can be used to submit scripts:
[source, text]
----
// with DRC
g = traversal().with(DriverRemoteConnection.using("localhost",8182,"g"))
g.V().count()
// with client
client = Cluster.build("localhost").port(8182).create().connect()
client.submit("g.V().count()")
----
[[connecting-via-http]]
=== Connecting via HTTP
image:gremlin-rexster.png[width=225,float=left] The HTTP endpoint provides for a communication protocol familiar to
most developers, with a wide support of programming languages, tools and libraries for accessing it. As a result, HTTP
provides a fast way to get started with Gremlin Server.
Gremlin Server implements the link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_http_api[TinkerPop HTTP API].
It provides a single endpoint which allows for the submission of a Gremlin script as a request. For each request, it
returns a response containing the serialized results of that script.
The `HttpChannelizer` is already configured in the `gremlin-server-modern.yaml` file that is packaged with the Gremlin
Server distribution. To utilize it, start Gremlin Server as follows:
[source,text]
----
bin/gremlin-server.sh conf/gremlin-server-modern.yaml
----
Once the server has started, issue a request. Here's an example with link:http://curl.haxx.se/[cURL]:
[source,text]
----
curl -X POST -d "{\"gremlin\":\"100-1\"}" "http://localhost:8182"
----
returns:
[source,js]
----
{
  "result": {
    "data": {
      "@type": "g:List",
      "@value": [
        {
          "@type": "g:Int32",
          "@value": 99
        }
      ]
    }
  },
  "status": {
    "code": 200
  }
}
----
`POST` is the only supported method for this endpoint. This means that `GET` with query parameters is not supported.
It is also preferred that Gremlin scripts be parameterized when possible via `bindings`:
[source,text]
----
curl -X POST -d "{\"gremlin\":\"100-x\", \"bindings\":{\"x\":1}}" "http://localhost:8182"
----
The `bindings` argument is a `Map` of variables where the keys become available as variables in the Gremlin script.
Note that parameterization of requests is critical to performance, as repeated script compilation can be avoided on
each request.
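The same parameterized request can be issued from code. The following Python sketch uses only the standard library to build the `POST` body with `gremlin` and `bindings` keys as in the cURL example. The helper names are hypothetical, and the explicit `Content-Type: application/json` header is an assumption that does not appear in the cURL command above.

```python
import json
import urllib.request


def build_request(gremlin, bindings=None, url="http://localhost:8182"):
    """Create a POST request carrying a Gremlin script and optional bindings."""
    payload = {"gremlin": gremlin}
    if bindings:
        payload["bindings"] = bindings
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, see lead-in
        method="POST",
    )


def submit(gremlin, bindings=None, url="http://localhost:8182"):
    """Send the request and decode a JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(gremlin, bindings, url)) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request("100-x", {"x": 1})
print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Keeping the script text constant and varying only the `bindings` map is what allows the server to reuse a previously compiled script, as described above.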
NOTE: It is possible to pass bindings via `GET` based requests. Query string arguments prefixed with "bindings." will
be treated as parameters, where that prefix will be removed and the value following the period will become the
parameter name. In other words, `bindings.x` will create a parameter named "x" that can be referenced in the submitted
Gremlin script. The caveat is that these arguments will always be treated as `String` values. To ensure that data
types are preserved or to pass complex objects such as lists or maps, use `POST` which will at least support the
allowed JSON data types.
NOTE: The Gremlin Server doesn't support link:https://en.wikipedia.org/wiki/HTTP_pipelining[HTTP pipelining]. Attempts
to use this feature will cause the server to throw an error and may lead to results being sent out-of-order.
Passing the `Accept` header with a valid MIME type will trigger the server to return the result in a particular format.
Note that in addition to the formats available given the server's `serializers` configuration, there is also a basic
`text/plain` format which produces a text representation of results similar to the Gremlin Console:
[source,text]
----
$ curl -H "Accept:text/plain" -X POST -d "{\"gremlin\":\"g.V()\"}" "http://localhost:8182"
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
----
Finally, as Gremlin Server can host multiple `ScriptEngine` instances (e.g. `gremlin-groovy`, `nashorn`), it is
possible to define the language to utilize to process the request:
[source,text]
curl -X POST -d "{\"gremlin\":\"100-x\", \"language\":\"gremlin-groovy\", \"bindings\":{\"x\":1}}" "http://localhost:8182"
By default this value is set to `gremlin-groovy`. If using a `GET` operation, this value can be set as a query
string argument by setting the `language` key.
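For clients that do not use cURL, the `Accept` header and the `language` key can be set in the same way from Java. The sketch below uses the JDK's `java.net.http` client (Java 11 or later). The class name `GremlinHttpExample` is hypothetical, the `Content-Type` header is an assumption that does not appear in the cURL examples, and the script is inserted into the body without JSON escaping, so it is only suitable for simple scripts such as `g.V()`.
[source,java]
----
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical example class, not part of TinkerPop.
class GremlinHttpExample {

    // Builds the POST request without sending it; the URL matches the cURL examples.
    static HttpRequest request(String accept, String language, String gremlin) {
        String body = "{\"gremlin\":\"" + gremlin + "\",\"language\":\"" + language + "\"}";
        return HttpRequest.newBuilder(URI.create("http://localhost:8182"))
                .header("Accept", accept)
                // Assumption: the body is declared as JSON.
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = request("text/plain", "gremlin-groovy", "g.V()");
        // Requires a running Gremlin Server; send() throws an exception otherwise.
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());
    }
}
----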
[[server-configuring]]
=== Configuring
The `gremlin-server.sh` file serves multiple purposes. It can be used to "install" dependencies to the Gremlin
Server path. For example, to be able to configure and use other `Graph` implementations, the dependencies must be
made available to Gremlin Server. To do this, use the `install` switch and supply the Maven coordinates for the
dependency to "install". For example, to use Neo4j in Gremlin Server:
[source,text]
----
bin/gremlin-server.sh install org.apache.tinkerpop neo4j-gremlin x.y.z
----
This command will "grab" the appropriate dependencies and copy them to the `ext` directory of Gremlin Server, which
will then allow them to be "used" the next time the server is started. To uninstall dependencies, simply delete them
from the `ext` directory.
`bin/gremlin-server.sh` has several other options.
[width="100%",cols="3,10",options="header"]
|=========================================================
|Parameter|Description
|start|Start the server in the background.
|stop|Shutdown the server.
|restart|Shutdown a running server then start it again.
|status|Check if the server is running.
|console|Start the server in the foreground. Use ^C to kill it.
|install <group> <artifact> <version>| Install dependencies into the server. "-i" exists for backwards compatibility but is deprecated.
|<conf file>| Start the server in the foreground using the provided YAML config file.
|=========================================================
The `bin/gremlin-server.sh` script can be customized with environment variables in `bin/gremlin-server.conf`.
[width="100%",cols="3,10",options="header"]
|=========================================================
|Variable |Description
|DEBUG| Enable debugging of the startup script
|GREMLIN_HOME| The Gremlin Server install directory. Use this if the script has trouble finding itself.
|GREMLIN_YAML| The default server YAML file (conf/gremlin-server.yaml)
|LOG_DIR| Location of gremlin.log where stdout/stderr are captured (logs/)
|PID_DIR| Location of gremlin.pid
|RUNAS| User to run the server as
|JAVA_HOME| Java install location. Will use $JAVA_HOME/bin/java
|JAVA_OPTIONS| Options passed to the JVM
|=========================================================
As mentioned earlier, Gremlin Server is configured through a YAML file. By default, Gremlin Server will look for a
file called `conf/gremlin-server.yaml` to configure itself on startup. To override this default, set GREMLIN_YAML in
`bin/gremlin-server.conf` or supply the file to use to `bin/gremlin-server.sh` as in:
[source,text]
----
bin/gremlin-server.sh conf/gremlin-server-min.yaml
----
WARNING: On Windows, gremlin-server.bat will always start in the foreground. When no parameter is provided, it will
start with the default `conf/gremlin-server.yaml` file.
The following table describes the various YAML configuration options that Gremlin Server expects:
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Key |Description |Default
|authentication.authenticator |The fully qualified classname of an `Authenticator` implementation to use. If this setting is not present, then authentication is effectively disabled. |`AllowAllAuthenticator`
|authentication.authenticationHandler | The fully qualified classname of an `AbstractAuthenticationHandler` implementation to use. If this setting is not present, but the `authentication.authenticator` is, it will use that authenticator with the default `AbstractAuthenticationHandler` implementation for the specified `Channelizer` |_none_
|authentication.config |A `Map` of configuration settings to be passed to the `Authenticator` when it is constructed. The settings available are dependent on the implementation. |_none_
|authorization.authorizer |The fully qualified classname of an `Authorizer` implementation to use. |_none_
|authorization.config |A `Map` of configuration settings to be passed to the `Authorizer` when it is constructed. The settings available are dependent on the implementation. |_none_
|channelizer |The fully qualified classname of the `Channelizer` implementation to use. A `Channelizer` is a "channel initializer" which Gremlin Server uses to define the type of processing pipeline to use. By allowing different `Channelizer` implementations, Gremlin Server can support different communication protocols (e.g. HTTP). |`HttpChannelizer`
|enableAuditLog |The `AuthenticationHandler`, `AuthorizationHandler` and processors can issue audit logging messages with the authenticated user, remote socket address and requests with a gremlin query. For privacy reasons, the default value of this setting is false. The audit logging messages are logged at the INFO level via the `audit.org.apache.tinkerpop.gremlin.server` logger, which can be configured using the `logback.xml` file. |_false_
|graphManager |The fully qualified classname of the `GraphManager` implementation to use. A `GraphManager` is a class that adheres to the TinkerPop `GraphManager` interface, allowing custom implementations for storing and managing graph references, as well as defining custom methods to open and close graph instantiations. To prevent Gremlin Server from starting when all graphs fail, the `CheckedGraphManager` can be used.|`DefaultGraphManager`
|graphs |A `Map` of `Graph` configuration files where the key of the `Map` becomes the name to which the `Graph` will be bound and the value is the file name of a `Graph` configuration file. |_none_
|gremlinPool |The number of "Gremlin" threads available to execute actual scripts in a `ScriptEngine`. This pool represents the workers available to handle blocking operations in Gremlin Server. When set to `0`, Gremlin Server will use the value provided by `Runtime.availableProcessors()`. |0
|host |The name of the host to bind the server to. |localhost
|idleConnectionTimeout |Time in milliseconds that the server will allow a channel to not receive requests from a client before it automatically closes. If enabled, the value provided should typically exceed the amount of time given to `keepAliveInterval`. Note that while this value is to be provided as milliseconds it will resolve to second precision. Set this value to `0` to disable this feature. |0
|keepAliveInterval |Time in milliseconds that the server will allow a channel to not send responses to a client before it sends a "ping" to see if it is still present. If it is present, the client should respond with a "pong" which will thus reset the `idleConnectionTimeout` and keep the channel open. If enabled, this number should be smaller than the value provided to the `idleConnectionTimeout`. Note that while this value is to be provided as milliseconds it will resolve to second precision. Set this value to `0` to disable this feature. |0
|maxAccumulationBufferComponents |Maximum number of request components that can be aggregated for a message. |1024
|maxChunkSize |The maximum length of the content or each chunk. If the content length exceeds this value, the transfer encoding of the decoded request will be converted to 'chunked' and the content will be split into multiple `HttpContent` objects. If the transfer encoding of the HTTP request is 'chunked' already, each chunk will be split into smaller chunks if the length of the chunk exceeds this value. |8192
|maxRequestContentLength |The maximum length of the aggregated content for a request message. Works in concert with `maxChunkSize` where chunked requests are accumulated back into a single message. A request exceeding this size will return a `413 - Request Entity Too Large` status code. |10485760
|maxHeaderSize |The maximum length of all headers. |8192
|maxInitialLineLength |The maximum length of the initial line (e.g. "GET / HTTP/1.0") processed in a request, which essentially controls the maximum length of the submitted URI. |4096
|maxParameters |The maximum number of parameters that can be passed on a request. Larger numbers may impact performance for scripts. This configuration only applies to the `HttpChannelizer`. |16
|maxWorkQueueSize |The maximum size the general processing queue can grow before the `gremlinPool` starts to reject requests. |8192
|metrics.consoleReporter.enabled |Turns on console reporting of metrics. |false
|metrics.consoleReporter.interval |Time in milliseconds between reports of metrics to console. |180000
|metrics.csvReporter.enabled |Turns on CSV reporting of metrics. |false
|metrics.csvReporter.fileName |The file to write metrics to. |_none_
|metrics.csvReporter.interval |Time in milliseconds between reports of metrics to file. |180000
|metrics.gangliaReporter.addressingMode |Set to `MULTICAST` or `UNICAST`. |_none_
|metrics.gangliaReporter.enabled |Turns on Ganglia reporting of metrics. Additional link:https://tinkerpop.apache.org/docs/x.y.z/reference/#metrics[setup] is required. |false
|metrics.gangliaReporter.host |Define the Ganglia host to report Metrics to. |localhost
|metrics.gangliaReporter.interval |Time in milliseconds between reports of metrics for Ganglia. |180000
|metrics.gangliaReporter.port |Define the Ganglia port to report Metrics to. |8649
|metrics.graphiteReporter.enabled |Turns on Graphite reporting of metrics. Additional link:https://tinkerpop.apache.org/docs/x.y.z/reference/#metrics[setup] is required. |false
|metrics.graphiteReporter.host |Define the Graphite host to report Metrics to. |localhost
|metrics.graphiteReporter.interval |Time in milliseconds between reports of metrics for Graphite. |180000
|metrics.graphiteReporter.port |Define the Graphite port to report Metrics to. |2003
|metrics.graphiteReporter.prefix |Define a "prefix" to append to metrics keys reported to Graphite. |_none_
|metrics.jmxReporter.enabled |Turns on JMX reporting of metrics. |false
|metrics.slf4jReporter.enabled |Turns on SLF4j reporting of metrics. |false
|metrics.slf4jReporter.interval |Time in milliseconds between reports of metrics to SLF4j. |180000
|port |The port to bind the server to. |8182
|resultIterationBatchSize |Defines the size in which the result of a request is "batched" back to the client. In other words, if set to `1`, then a result that had ten items in it would get each result sent back individually. If set to `2` the same ten results would come back in five batches of two each. |64
|scriptEngines |A `Map` of `ScriptEngine` implementations to expose through Gremlin Server, where the key is the name given by the `ScriptEngine` implementation. The key must match the name exactly for the `ScriptEngine` to be constructed. The value paired with this key is itself a `Map` of configuration for that `ScriptEngine`. If this value is not set, it will default to "gremlin-lang". |_gremlin-lang_
|scriptEngines.<name>.imports |A comma separated list of classes/packages to make available to the `ScriptEngine`. |_none_
|scriptEngines.<name>.staticImports |A comma separated list of "static" imports to make available to the `ScriptEngine`. |_none_
|scriptEngines.<name>.scripts |A comma separated list of script files to execute on `ScriptEngine` initialization. `Graph` and `TraversalSource` instance references produced from scripts will be stored globally in Gremlin Server, therefore it is possible to use initialization scripts to add Traversal Strategies or create entirely new `Graph` instances all together. Instantiating a `LifeCycleHook` in a script provides a way to execute scripts when Gremlin Server starts and stops.|_none_
|scriptEngines.<name>.config |A `Map` of configuration settings for the `ScriptEngine`. These settings are dependent on the `ScriptEngine` implementation being used. |_none_
|evaluationTimeout |The amount of time in milliseconds before a request evaluation and iteration of result times out. This feature can be turned off by setting the value to `0`. |30000
|serializers |A `List` of `Map` settings, where each `Map` represents a `MessageSerializer` implementation to use along with its configuration. If this value is not set, then Gremlin Server will configure with GraphSON and GraphBinary but will not register any `ioRegistries` for configured graphs. |_empty_
|serializers[X].className |The full class name of the `MessageSerializer` implementation. |_none_
|serializers[X].config |A `Map` containing `MessageSerializer` specific configurations. |_none_
|ssl.enabled |Determines if SSL is turned on or not. |false
|ssl.keyStore |The private key in JKS or PKCS#12 format. |_none_
|ssl.keyStorePassword |The password of the `keyStore` if it is password-protected. |_none_
|ssl.keyStoreType |`PKCS12` |_none_
|ssl.needClientAuth | Optional. One of NONE, REQUIRE. Enables client certificate authentication at the enforcement level specified. Can be used in combination with Authenticator. |_none_
|ssl.sslCipherSuites |The list of JSSE ciphers to support for SSL connections. If specified, only the ciphers that are listed and supported will be enabled. If not specified, the JVM default is used. |_none_
|ssl.sslEnabledProtocols |The list of SSL protocols to support for SSL connections. If specified, only the protocols that are listed and supported will be enabled. If not specified, the JVM default is used. |_none_
|ssl.trustStore |Required when needClientAuth is REQUIRE. Trusted certificates for verifying the remote endpoint's certificate. If this value is not provided and SSL is enabled, the default `TrustManager` will be used, which will have a set of common public certificates installed to it. |_none_
|ssl.trustStorePassword |The password of the `trustStore` if it is password-protected |_none_
|strictTransactionManagement |Set to `true` to require `aliases` to be submitted on every request, where the `aliases` become the scope of transaction management. |false
|threadPoolBoss |The number of threads available to Gremlin Server for accepting connections. Should always be set to `1`. |1
|threadPoolWorker |The number of threads available to Gremlin Server for processing non-blocking reads and writes. |1
|useEpollEventLoop |Try to use epoll event loops (works only on Linux) instead of Netty NIO. |false
|writeBufferHighWaterMark | If the number of bytes in the network send buffer exceeds this value then the channel is no longer writeable, accepting no additional writes until buffer is drained and the `writeBufferLowWaterMark` is met. |65536
|writeBufferLowWaterMark | Once the number of bytes queued in the network send buffer exceeds the `writeBufferHighWaterMark`, the channel will not become writeable again until the buffer is drained and it drops below this value. |32768
|=========================================================
See the <<metrics,Metrics>> section for more information on how to configure Ganglia and Graphite.
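The batching arithmetic described for `resultIterationBatchSize` in the table above can be illustrated with a short Java sketch. This is not the server's implementation; the class `ResultBatching` and its method are hypothetical and only show how a result of ten items divides into batches for different sizes.
[source,java]
----
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Gremlin Server code.
class ResultBatching {

    // Splits results into consecutive batches of at most batchSize items.
    static <T> List<List<T>> batches(List<T> results, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < results.size(); i += batchSize) {
            int end = Math.min(i + batchSize, results.size());
            out.add(new ArrayList<>(results.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ten = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(batches(ten, 1).size());  // 10 batches of one item
        System.out.println(batches(ten, 2).size());  // 5 batches of two items
        System.out.println(batches(ten, 64).size()); // 1 batch holding all ten items
    }
}
----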
==== Serialization
Gremlin Server can accept requests and return results using different serialization formats. Serializers implement the
`MessageSerializer` interface. In doing so, they express the list of mime types they expect to support. When
configuring multiple serializers it is possible for two or more serializers to support the same mime type. Such a
situation may be common with a generic mime type such as `application/json`. Serializers are added in the order that
they are encountered in the configuration file and the first one added for a specific mime type will not be overridden
by other serializers that also support it.
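The "first one wins" rule can be sketched in a few lines of Java. The code below is an illustration under stated assumptions rather than the server's actual registration logic: `SerializerRegistrySketch`, `Entry`, the serializer names `first` and `second`, and the mime type `application/vnd.example+json` are all hypothetical.
[source,java]
----
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Gremlin Server code.
class SerializerRegistrySketch {

    // Stand-in for a configured serializer: a name plus the mime types it supports.
    static final class Entry {
        final String name;
        final List<String> mimeTypes;

        Entry(String name, List<String> mimeTypes) {
            this.name = name;
            this.mimeTypes = mimeTypes;
        }
    }

    // Walks the entries in configuration order; the first entry to claim a mime type keeps it.
    static Map<String, String> resolve(List<Entry> configured) {
        Map<String, String> byMimeType = new LinkedHashMap<>();
        for (Entry e : configured) {
            for (String mime : e.mimeTypes) {
                byMimeType.putIfAbsent(mime, e.name);
            }
        }
        return byMimeType;
    }

    public static void main(String[] args) {
        List<Entry> configured = List.of(
                new Entry("first", List.of("application/json")),
                new Entry("second", List.of("application/json", "application/vnd.example+json")));
        // {application/json=first, application/vnd.example+json=second}
        System.out.println(resolve(configured));
    }
}
----
Swapping the two entries in `configured` would hand `application/json` to `second`, which is why the order in the configuration file matters.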
The format of the serialization is configured by the `serializers` setting described in the table above. Note that
some serializers have additional configuration options as defined by the `serializers[X].config` setting. The
`config` setting is a `Map` where the keys and values get passed to the serializer at its initialization. The
available and/or expected keys are dependent on the serializer being used. Gremlin Server comes packaged with two
different serializers: GraphSON and GraphBinary.
WARNING: Irrespective of the serialization format chosen, it is highly recommended that the serialization format is
specified explicitly. For example, prefer `application/vnd.gremlin-v3.0+json` to `application/json`. The drivers
tend to take care of this issue internally, but for all other mechanisms it is best to ensure the `Accept` type is
defined this way to avoid possible breaking changes or unexpected results, as defaults may vary from server to server.
WARNING: When connecting with drivers, never try to specify a serialization format that does not have embedded types.
The drivers are designed to use that type information to properly produce results in the programming language's type
system and may not function correctly without it. Generally speaking, `GraphBinary` is always the best choice for the
drivers.
===== GraphSON
The GraphSON serializer produces human-readable output in JSON format and is a good configuration choice for those
trying to use TinkerPop from non-JVM languages. JSON obviously has wide support across virtually all major
programming languages and can be consumed by a wide variety of tools. The format itself is described in the
link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#graphson[IO Documentation]. The following table shows the
available GraphSON serializers that can be configured:
[width="100%",cols="2,2,4,4",options="header"]
|=========================================================
|Version |Embedded Types |Mime Type |Class
|4.0 |yes |`application/vnd.gremlin-v4.0+json` |`GraphSONMessageSerializerV4`
|4.0 |no |`application/vnd.gremlin-v4.0+json;types=false` |`GraphSONMessageSerializerV4`
|=========================================================
The above serializer classes can be found in the `org.apache.tinkerpop.gremlin.util.ser` package of `gremlin-util`.
NOTE: Gremlin can produce results that cannot be serialized with untyped GraphSON as the result simply cannot fit
the structure JSON inherently allows. A simple example would be `g.V().groupCount()` which returns a `Map`. A `Map`
is no problem for JSON, but the key to this `Map` is a `Vertex`, which is a complex object, and cannot be a key in
JSON which only allows `String` keys. Untyped GraphSON will simply convert the `Vertex` to a `String` for the purpose of
serialization and as a result that data and type are lost. If this information is needed, switch to a typed format or
adjust the Gremlin query in some way to return it in a different form that fits JSON structure.
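The loss of information described in the note above can be reproduced without any graph at all. The Java sketch below is only an analogy: `FakeVertex` is a hypothetical stand-in and not the TinkerPop `Vertex` interface, and the exact string that untyped GraphSON produces for a key is an assumption here. It shows that once keys are flattened to `String`, the receiver has no way of recovering the original object or its type.
[source,java]
----
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not GraphSON serializer code.
class UntypedKeySketch {

    // Stand-in for a graph vertex; not the TinkerPop Vertex interface.
    static final class FakeVertex {
        final int id;

        FakeVertex(int id) {
            this.id = id;
        }

        @Override
        public String toString() {
            return "v[" + id + "]";
        }
    }

    // JSON object keys must be strings, so every key is flattened with String.valueOf.
    static Map<String, Long> toJsonKeys(Map<?, Long> counts) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Map.Entry<?, Long> e : counts.entrySet()) {
            out.put(String.valueOf(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Object, Long> counts = new LinkedHashMap<>();
        counts.put(new FakeVertex(1), 1L);
        counts.put(new FakeVertex(2), 1L);
        System.out.println(toJsonKeys(counts)); // {v[1]=1, v[2]=1}, keys are now plain strings
    }
}
----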
Configuring GraphSON in the Gremlin Server configuration looks like this:
[source,yaml]
----
- { className: org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV4 }
----
Gremlin Server is configured by default with GraphSON 4.0 as shown above. It has the following configuration option:
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Key |Description |Default
|ioRegistries |A list of `IoRegistry` implementations to be applied to the serializer. |_none_
|=========================================================
It is possible for two serializers to bind to the same mime type such as `application/json`. When such conflicts arise,
Gremlin Server will use the order of the serializers to determine priority such that the first serializer to bind to a
type will be used and the others ignored. The following log message will indicate how the server is ultimately configured:
NOTE: The below examples use GraphSON 1.0 and GraphSON 3.0 for example purposes, however, they are no longer available.
[source,text]
----
[INFO] AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV1
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+json with org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV3
[INFO] AbstractChannelizer - application/json already has org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV1 configured - it will not be replaced by org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV3, change order of serialization configuration if this is not desired.
----
Given the above, using GraphSON 3.0 under this configuration will require that the user specify the type:
[source,text]
----
$ curl -X POST -d "{\"gremlin\":\"100-1\"}" "http://localhost:8182"
{"requestId":"f8720ad9-2c8b-4eef-babe-21792a3e3157","status":{"message":"","code":200,"attributes":{}},"result":{"data":[99],"meta":{}}}
$ curl -H "Accept:application/vnd.gremlin-v3.0+json" -X POST -d "{\"gremlin\":\"100-1\"}" "http://localhost:8182"
{"requestId":"9fdf0892-d86c-41f2-94b5-092785c473eb","status":{"message":"","code":200,"attributes":{"@type":"g:Map","@value":[]}},"result":{"data":{"@type":"g:List","@value":[{"@type":"g:Int32","@value":99}]},"meta":{"@type":"g:Map","@value":[]}}}
----
[[server-graphbinary]]
===== GraphBinary
GraphBinary is a binary serialization format suitable for object trees, designed to reduce serialization overhead on
both the client and the server, as well as limiting the size of the payload that is transmitted over the wire. The
format itself is described in the link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#graphbinary[IO Documentation].
[source,yaml]
- { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV4 }
It has the MIME type of `application/vnd.graphbinary-v4.0` and the following configuration options:
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Key |Description |Default
|ioRegistries |A list of `IoRegistry` implementations to be applied to the serializer. |_none_
|builder |Name of the `TypeSerializerRegistry.Builder` instance to be used to construct the `TypeSerializerRegistry`. |_none_
|=========================================================
As described above, there are multiple ways in which to register serializers for GraphBinary-based serialization.
[[metrics]]
==== Metrics
Gremlin Server produces metrics about its operations that can yield some insight into how it is performing. These
metrics are exposed in a variety of ways:
* Directly to the console where Gremlin Server is running
* CSV file
* link:http://ganglia.info/[Ganglia]
* link:http://graphite.wikidot.com/[Graphite]
* link:http://www.slf4j.org/[SLF4j]
* link:https://en.wikipedia.org/wiki/Java_Management_Extensions[JMX]
The configuration of each of these outputs is described in the Gremlin Server <<server-configuring, Configuring>> section.
Note that Graphite and Ganglia are not included as part of the Gremlin Server distribution and must be installed
to the server manually.
[source,text]
----
bin/gremlin-server.sh install com.codahale.metrics metrics-ganglia 3.0.2
bin/gremlin-server.sh install com.codahale.metrics metrics-graphite 3.0.2
----
WARNING: Gremlin Server is built to work with Metrics 3.0.2. Usage of other versions may lead to unexpected problems.
NOTE: Installing Ganglia will include `org.acplt:oncrpc`, which is an LGPL licensed dependency.
Regardless of the output, the metrics gathered are the same. Each metric is prefixed with
`org.apache.tinkerpop.gremlin.server.GremlinServer` and the following metrics are reported:
* `channels.paused` - The current number of open channels (HTTP and Websocket) that have their writes to buffer paused
when the `writeBufferHighWaterMark` configuration is exceeded.
* `channels.total` - The current number of open channels (HTTP and Websocket).
* `channels.write-pauses` - The total number of pauses across all channels (HTTP and Websocket) to buffer writes where
the `writeBufferHighWaterMark` configuration is exceeded, with mean rate, as well as the 1, 5, and 15-minute rates.
* `engine-name.session.session-id.*` - Metrics related to different `GremlinScriptEngine` instances configured for
session-based requests where "engine-name" will be the actual name of the engine, such as "gremlin-groovy" and
"session-id" will be the identifier for the session itself.
* `engine-name.sessionless.*` - Metrics related to different `GremlinScriptEngine` instances configured for sessionless
requests where "engine-name" will be the actual name of the engine, such as "gremlin-groovy".
* `errors` - The number of total errors, mean rate, as well as the 1, 5, and 15-minute error rates.
* `op.eval` - The number of script evaluations, mean rate, 1, 5, and 15 minute rates, minimum, maximum, median, mean,
and standard deviation evaluation times, as well as the 75th, 95th, 98th, 99th and 99.9th percentile evaluation times
(note that these times apply to both sessionless and in-session requests).
* `op.traversal` - The number of `Traversal` bytecode-based executions, mean rate, 1, 5, and 15 minute rates, minimum,
maximum, median, mean, and standard deviation evaluation times, as well as the 75th, 95th, 98th, 99th and 99.9th
percentile evaluation times.
* `sessions` - The number of sessions open at the time the metric was last measured.
* `user-agent.*` - Counts the number of connection requests from clients providing a given user agent.
NOTE: Gremlin Server has a limit of 10000 unique user agents to be tracked by metrics. If this cap is exceeded
any additional unique user agents will be counted as `user-agent.other`.
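The effect of the cap on tracked user agents can be sketched as follows. This is a simplified Java illustration and not the server's metrics code: the class `UserAgentCounterSketch` is hypothetical, and the cap is passed in as a parameter so that the behavior can be observed with a small number instead of 10000.
[source,java]
----
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Gremlin Server code.
class UserAgentCounterSketch {
    private final int maxUniqueAgents;
    private final Map<String, Long> counts = new HashMap<>();
    private int tracked = 0;

    UserAgentCounterSketch(int maxUniqueAgents) {
        this.maxUniqueAgents = maxUniqueAgents;
    }

    // Counts one request; once the cap is reached, unseen agents are counted as "user-agent.other".
    void record(String userAgent) {
        String name = "user-agent." + userAgent;
        if (!counts.containsKey(name)) {
            if (tracked >= maxUniqueAgents) {
                name = "user-agent.other";
            } else {
                tracked++;
            }
        }
        counts.merge(name, 1L, Long::sum);
    }

    // Returns the current count for a metric name, or 0 if it was never recorded.
    long count(String metricName) {
        return counts.getOrDefault(metricName, 0L);
    }

    public static void main(String[] args) {
        UserAgentCounterSketch metrics = new UserAgentCounterSketch(2);
        for (String agent : new String[] {"a", "b", "c", "a", "d"}) {
            metrics.record(agent);
        }
        System.out.println(metrics.count("user-agent.a"));     // 2
        System.out.println(metrics.count("user-agent.other")); // 2
    }
}
----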
==== As A Service
Gremlin Server can be configured to run as a service.
===== Init.d (SysV)
Link `bin/gremlin-server.sh` to `init.d`.
Be sure to set RUNAS to the service user in `bin/gremlin-server.conf`.
[source,bash]
----
# Install
ln -s /path/to/apache-tinkerpop-gremlin-server-x.y.z/bin/gremlin-server.sh /etc/init.d/gremlin-server
# Systems with chkconfig/service. E.g. Fedora, Red Hat
chkconfig --add gremlin-server
# Start
service gremlin-server start
# Or call directly
/etc/init.d/gremlin-server restart
----
===== Systemd
To install, copy the service template below to /etc/systemd/system/gremlin.service
and update the paths `/path/to/apache-tinkerpop-gremlin-server` with the actual install path of Gremlin Server.
[source,bash]
----
[Unit]
Description=Apache TinkerPop Gremlin Server daemon
Documentation=https://tinkerpop.apache.org/
After=network.target
[Service]
Type=forking
ExecStart=/path/to/apache-tinkerpop-gremlin-server/bin/gremlin-server.sh start
ExecStop=/path/to/apache-tinkerpop-gremlin-server/bin/gremlin-server.sh stop
PIDFile=/path/to/apache-tinkerpop-gremlin-server/run/gremlin.pid
[Install]
WantedBy=multi-user.target
----
Enable the service with `systemctl enable gremlin`.
Start the service with `systemctl start gremlin`.
[[security]]
=== Security
image:gremlin-server-secure.png[width=175,float=right] Gremlin Server provides several features that aid in the
security of the graphs that it exposes. In particular it supports SSL for transport layer security, authentication,
authorization and protective measures against malicious script execution. Client SSL options are described in the
<<gremlin-drivers-variants, "Gremlin Drivers and Variants">> sections with varying capability depending on the driver
chosen. Script execution options are covered <<script-execution, "at the end of this section">>. This section
starts with authentication.
Gremlin Server supports HTTP basic authentication, but by default it is configured to allow all requests to be
processed (i.e. no authentication). To enable authentication, Gremlin Server must be configured with an
`Authenticator` implementation in its YAML file. Gremlin Server comes packaged with an implementation called
`SimpleAuthenticator` for plain text authentication using HTTP BASIC.
==== Plain text authentication
The `SimpleAuthenticator` supports handling basic authentication requests from HTTP clients. It validates
username/password pairs against a graph database, which must be provided to it as part of the configuration.
[source,yaml]
authentication: {
authenticator: org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator,
config: {
credentialsDb: conf/tinkergraph-credentials.properties}}
A quick way to get started with the `SimpleAuthenticator` is to use TinkerGraph for the "credentials graph" and the
"sample" credential graph that is packaged with the server. To secure the transport for the credentials,
SSL should be enabled. For this Quick Start, a self-signed certificate will be created but this should not
be used in a production environment.
Generate the self-signed SSL certificate:
[source,text]
----
$ keytool -genkey -alias localhost -keyalg RSA -keystore server.jks
Enter keystore password:
Re-enter new password:
What is your first and last name?
[Unknown]: localhost
What is the name of your organizational unit?
[Unknown]:
What is the name of your organization?
[Unknown]:
What is the name of your City or Locality?
[Unknown]:
What is the name of your State or Province?
[Unknown]:
What is the two-letter country code for this unit?
[Unknown]:
Is CN=localhost, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?
[no]: yes
Enter key password for <localhost>
(RETURN if same as keystore password):
----
Next, uncomment the `keyStore` and `keyStorePassword` lines in `conf/gremlin-server-secure.yaml`.
[source,yaml]
----
ssl: {
enabled: true,
sslEnabledProtocols: [TLSv1.2],
keyStore: server.jks,
keyStorePassword: changeit
}
----
[source,text]
----
$ bin/gremlin-server.sh conf/gremlin-server-secure.yaml
[INFO] GremlinServer -
\,,,/
(o o)
-----oOOo-(4)-oOOo-----
[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-secure.yaml
...
[INFO] AbstractChannelizer - SSL enabled
[INFO] SimpleAuthenticator - Initializing authentication with the org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator
[INFO] SimpleAuthenticator - CredentialGraph initialized at CredentialGraph{graph=tinkergraph[vertices:1 edges:0]}
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
----
When SSL is enabled on the server, it must also be enabled on the client when connecting. To connect to
Gremlin Server with the <<gremlin-java,`gremlin-driver`>>, set the `credentials`, `enableSsl`, and `trustStore`
when constructing the `Cluster`.
[source,java]
Cluster cluster = Cluster.build().credentials("stephen", "password")
.enableSsl(true).trustStore("server.jks").create();
If connecting with Gremlin Console, which utilizes `gremlin-driver` for remote script execution, use the provided
`conf/remote-secure.yaml` file when defining the remote. That file contains configuration for the username and
password as well as enablement of SSL from the client side. Be sure to configure the trustStore if using self-signed
certificates.
Similarly, Gremlin Server can be configured for REST and security. Follow the steps above for configuring the SSL
certificate.
[source,text]
----
$ bin/gremlin-server.sh conf/gremlin-server-rest-secure.yaml
[INFO] GremlinServer -
\,,,/
(o o)
-----oOOo-(4)-oOOo-----
[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-rest-secure.yaml
...
[INFO] AbstractChannelizer - SSL enabled
[INFO] SimpleAuthenticator - Initializing authentication with the org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator
[INFO] SimpleAuthenticator - CredentialGraph initialized at CredentialGraph{graph=tinkergraph[vertices:1 edges:0]}
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
----
Once the server has started, issue a request that passes the credentials in an `Authorization` header, as described in
link:http://tools.ietf.org/html/rfc2617#section-2[RFC2617]. Here is an HTTP Basic authentication example with cURL:
[source,text]
----
curl -X POST --insecure -u stephen:password -d "{\"gremlin\":\"100-1\"}" "https://localhost:8182"
----
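
For clients that do not build the header automatically, the following is a minimal sketch of what cURL's `-u` option
produces under the HTTP Basic scheme: the word `Basic` followed by the Base64 encoding of `username:password`, sent in
the `Authorization` header. The class and method names are hypothetical and only the JDK is used.

[source,java]
----
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of TinkerPop.
public class BasicAuthHeader {

    /** Builds the value of the HTTP "Authorization" header for the Basic scheme. */
    public static String basicAuth(final String username, final String password) {
        final String pair = username + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(final String[] args) {
        // The printed line can be passed to any HTTP client as a request header.
        System.out.println("Authorization: " + basicAuth("stephen", "password"));
    }
}
----

Base64 is an encoding and not encryption, which is why Basic authentication should only be used over SSL.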
[[credentials-dsl]]
==== Credentials Graph DSL
The "credentials graph", which has been mentioned in previous sections, is used by Gremlin Server to hold the list of
users who can authenticate to the server. It is possible to use virtually any `Graph` instance for this task as long
as it conforms to a defined schema. The credentials graph stores users as vertices with the `label` of "user". Each
"user" vertex has two properties, `username` and `password`, both of which are `String` values. The password must
never be stored in plain text; it should be hashed.
IMPORTANT: Be sure to define an index on the `username` property, as this will be used for lookups. If supported by
the `Graph`, consider specifying a unique constraint as well.
To aid with the management of a credentials graph, Gremlin Server provides a Gremlin Console plugin for adding and
removing users. Because the plugin adheres to the schema, graphs managed with it remain compatible with Gremlin
Server. As a plugin, it works naturally in the Gremlin Console as an extension of its capabilities (though one could
use it programmatically, if desired). This plugin is distributed with the Gremlin Console so it does not have to be
"installed". It does, however, need to be activated:
[source,groovy]
----
gremlin> :plugin use tinkerpop.credentials
==>tinkerpop.credentials activated
----
The following example demonstrates its usage:
[gremlin-groovy]
----
graph = TinkerGraph.open()
graph.createIndex("username",Vertex.class)
credentials = traversal(CredentialTraversalSource.class).with(graph)
credentials.user("stephen","password")
credentials.user("daniel","better-password")
credentials.user("marko","rainbow-dash")
credentials.users("marko").elementMap()
credentials.users().count()
credentials.users("daniel").drop()
credentials.users().count()
----
NOTE: The Credentials DSL is built using TinkerPop's DSL Annotation Processor described <<gremlin-java-dsl,here>>.
IMPORTANT: The above example uses an empty in-memory TinkerGraph to demonstrate the API of the DSL. That data is not
retained and therefore cannot be used with Gremlin Server. Either configure TinkerGraph to persist its data, persist
it manually (e.g. write the graph data to Gryo) once changes are complete, or use a persistent graph to hold the
credentials and configure Gremlin Server accordingly.
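
As a sketch of the first option, a TinkerGraph properties file along the following lines would load the credentials
from a Gryo file at startup and write them back when the graph is closed. It assumes TinkerGraph's
`gremlin.tinkergraph.graphLocation` and `gremlin.tinkergraph.graphFormat` settings; the file location is
illustrative and should be checked against the TinkerGraph documentation for the version in use.

[source,properties]
----
# Illustrative credentials graph configuration; the data file path is an assumption.
gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
gremlin.tinkergraph.vertexIdManager=LONG
gremlin.tinkergraph.graphLocation=data/credentials.kryo
gremlin.tinkergraph.graphFormat=gryo
----

Such a file would then be referenced by the `credentialsDb` setting of the authenticator configuration.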
[[authorization]]
==== Authorization
While authentication determines which clients can connect to Gremlin Server, authorization regulates which elements
of the exposed graphs a specific user is allowed to create, read, update or delete (CRUD). Authorization in Gremlin
Server can take place at two points. Before execution, a user request can be allowed or denied based on the
presence of operations such as:
* reading from a GraphTraversalSource
* writing to a GraphTraversalSource
* presence of lambdas in bytecode
* script execution
* `VertexProgram` execution (OLAP)
* removal or modification of `TraversalStrategy` instances
During execution the applied traversal strategies influence the results and side-effects of a given query.
IMPORTANT: Authorization is a feature of Gremlin Server, but is not implemented as an element of the server protocol
and therefore Remote Graph Providers may not have this feature or may not implement it in this particular way. Please
consult the documentation of the graph you are using to determine what authorization features it supports.
===== Mechanisms
Gremlin Server supports three mechanisms to configure authorization:
. With the `ScriptFileGremlinPlugin`, a Groovy script is configured that instantiates the `GraphTraversalSource`
instances that can be accessed by client requests. Using the `withStrategies()` Gremlin
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#start-steps[start step], one can apply so-called
link:https://tinkerpop.apache.org/docs/x.y.z/reference/#traversalstrategy[TraversalStrategy instances] to these
`GraphTraversalSource` instances, some of which can serve for authorization purposes (`ReadOnlyStrategy`,
`LambdaRestrictionStrategy`, `VertexProgramRestrictionStrategy`, `SubgraphStrategy`, `PartitionStrategy`,
`EdgeLabelVerificationStrategy`), provided that users are not allowed to remove or modify these `TraversalStrategy`
instances afterwards. The `ScriptFileGremlinPlugin` is found in the yaml configuration file for Gremlin Server:
+
[source,yaml]
----
scriptEngines: {
  gremlin-groovy: {
    plugins: {
      org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
----
. Administrators can configure an authorizer class, an implementation of the `Authorizer` interface. An authorizer receives
a request before it is executed and can decide to pass or deny it, based on information about the requesting user that
it either has available itself or retrieves from an external system.
. Apart from passing or denying requests, an `Authorizer` implementation can actively modify the request, in particular
add the `TraversalStrategy` instances mentioned in item 1.
IMPORTANT: This section is written with Gremlin bytecode requests in mind. Realizing authorization for script requests
is hardly feasible, because such requests get full access to Gremlin Server's execution environment. Although the section
<<script-execution>> explains how client access to this environment can be restricted, it is not possible to deny
execution of `GraphFactory.open()` or `GraphTraversalSource.getGraph()` methods without resorting to TinkerPop
implementation details (that is, internal APIs that can change without notice).
IMPORTANT: 4.0.0-beta.1 Release - These authorization mechanisms don't currently apply due to the removal of Bytecode.
The three mechanisms for authorization each have their merits in terms of simplicity and flexibility. The table below
gives an overview.
[width="95%",cols="5,2,2,4",options="header"]
|=========================================================
|Type (mechanism) |GraphTraversalSources |Groups |Bytecode analysis
|Implicit (init script) | all accessible |one |`withStrategies()`
|Passive (pass/deny) | selected access |few |hybrid
|Active (inject) |selected access |many |hybrid
|=========================================================
With implicit authorization (only adding restricting `TraversalStrategy` instances in the initialization script of
Gremlin Server) all authenticated users can access all hosted `GraphTraversalSource` instances and all face the same
restrictions. Each authorization policy would then require a separate Gremlin Server instance with an authenticator
that restricts access to a group of users (that is, the authenticator assists in authorization).
The other extreme is the active authorization solution that injects the restricting `Strategies` into the user request,
following a policy that takes into account both the authenticated user and the original request. While this solution is
the most flexible and can support an almost unlimited number of authorization policies, it is somewhat complex to
implement. In particular, applying the `SubgraphStrategy` requires knowledge about the schema of the graph.
The passive authorization solution perhaps provides a middle ground to start implementing authorization. This
solution assumes that the `SubgraphStrategy` is applied in the Gremlin Server initialization script, because compliance
with a subgraph restriction can only be determined during the actual execution of the gremlin traversal. Note that the
same graph can be reused with different `SubgraphStrategies`. Now, authorization policies can be defined in terms of
accessible `GraphTraversalSources` and the authorizer can simply match the requested access to a `GraphTraversalSource`
against the policies applicable to the authenticated user. As with the active authorization solution, other restrictions
such as read-only access can be either applied at authorization time as policy in the authorizer itself or at request
execution time as a result of an applied `Strategy` (denoted as 'hybrid' bytecode analysis in the table). A code
example pursuing the former option is provided in the <<authz-code-example, next section>>.
IMPORTANT: 4.0.0-beta.1 Release - passive and active authorization mechanisms are not supported in this beta
because bytecode has been replaced with GremlinLang.
NOTE: Gremlin Server is not shipped with `Authorizer` implementations, because these would heavily depend on the external
systems to integrate with, e.g. link:https://ldap.com/directory-servers/[LDAP systems] or
link:https://ranger.apache.org/[Apache Ranger]. However, third-party implementations can be
offered as <<gremlin-plugins, gremlin plugins>>.
[[authz-code-example]]
===== Code example
The two Java classes below provide an example implementation of the `Authorizer` interface; they originate from
link:https://github.com/apache/tinkerpop/tree/x.y.z/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/authz[Gremlin Server's test package].
If you copy the files into a project, build them into a jar and add the jar to Gremlin Server's CLASSPATH, you can use
them by adding the following to Gremlin Server's yaml configuration file:
[source, yaml]
----
authentication: {
authenticator: org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator,
config: {
credentialsDb: conf/tinkergraph-credentials.properties}}
authorization: {
authorizer: org.yourpackage.AllowListAuthorizer,
config: {
authorizationAllowList: your/path/allow-list.yaml}}
----
The `AllowListAuthorizer` supports granting groups of users access to statically configured `GraphTraversalSource`
instances and to the "sandbox", where sandbox means that the group is allowed anything unless restricted by Gremlin
Server's <<script-execution,sandbox>>. For denying mutating steps and OLAP operations in bytecode requests, the
`AllowListAuthorizer` relies on the `ReadOnlyStrategy` and `VertexProgramRestrictionStrategy` being present in the
`GraphTraversalSource`. However, it always denies the use of lambdas in bytecode requests unless the user has the
"sandbox" grant. It uses the `BytecodeHelper.getLambdaLanguage()` method to detect these.
The grants to groups of users can be configured in a simple yaml file. In addition to the special value "sandbox" for
a grant for string based requests and lambdas, the special value "anonymous" can be used to denote any user.
[source,java]
----
package org.yourpackage;

import org.apache.tinkerpop.gremlin.util.message.RequestMessage;
import org.apache.tinkerpop.gremlin.process.computer.traversal.strategy.verification.VertexProgramRestrictionStrategy;
import org.apache.tinkerpop.gremlin.process.traversal.Bytecode;
import org.apache.tinkerpop.gremlin.process.traversal.TraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.SubgraphStrategy;
import org.apache.tinkerpop.gremlin.process.traversal.strategy.verification.ReadOnlyStrategy;
import org.apache.tinkerpop.gremlin.process.traversal.util.BytecodeHelper;
import org.apache.tinkerpop.gremlin.server.Settings.AuthorizationSettings;
import org.apache.tinkerpop.gremlin.server.auth.AuthenticatedUser;
// The two imports below are needed because this class lives outside of Gremlin Server's authz package.
import org.apache.tinkerpop.gremlin.server.authz.AuthorizationException;
import org.apache.tinkerpop.gremlin.server.authz.Authorizer;

import java.util.*;

/**
 * Authorizes a user per request, based on a list that grants access to {@link TraversalSource} instances for
 * bytecode requests and to gremlin server's sandbox for string requests and lambdas. The {@link
 * AuthorizationSettings}.config must have an authorizationAllowList entry that contains the name of a YAML file.
 * This authorizer is for demonstration purposes only. It does not scale well in the number of users regarding
 * memory usage and administrative burden.
 */
public class AllowListAuthorizer implements Authorizer {

    public static final String SANDBOX = "sandbox";
    public static final String REJECT_BYTECODE = "User not authorized for bytecode requests on %s";
    public static final String REJECT_LAMBDA = "lambdas";
    public static final String REJECT_MUTATE = "the ReadOnlyStrategy";
    public static final String REJECT_OLAP = "the VertexProgramRestrictionStrategy";
    public static final String REJECT_SUBGRAPH = "the SubgraphStrategy";
    public static final String REJECT_STRING = "User not authorized for string-based requests.";
    public static final String KEY_AUTHORIZATION_ALLOWLIST = "authorizationAllowList";

    // Collections derived from the list with allowed users for fast lookups
    private final Map<String, List<String>> usernamesByTraversalSource = new HashMap<>();
    private final Set<String> usernamesSandbox = new HashSet<>();

    /**
     * This method is called once upon system startup to initialize the {@code AllowListAuthorizer}.
     */
    @Override
    public void setup(final Map<String,Object> config) {
        AllowList allowList;
        final String file = (String) config.get(KEY_AUTHORIZATION_ALLOWLIST);

        try {
            allowList = AllowList.read(file);
        } catch (Exception e) {
            // Keep the original exception as the cause so that the reason for the failure is not lost
            throw new IllegalArgumentException(String.format("Failed to read list with allowed users from %s", file), e);
        }
        for (Map.Entry<String, List<String>> entry : allowList.grants.entrySet()) {
            if (!entry.getKey().equals(SANDBOX)) {
                usernamesByTraversalSource.put(entry.getKey(), new ArrayList<>());
            }
            for (final String group : entry.getValue()) {
                if (allowList.groups.get(group) == null) {
                    throw new RuntimeException(String.format("Group '%s' not defined in file with allowed users.", group));
                }
                if (entry.getKey().equals(SANDBOX)) {
                    usernamesSandbox.addAll(allowList.groups.get(group));
                } else {
                    usernamesByTraversalSource.get(entry.getKey()).addAll(allowList.groups.get(group));
                }
            }
        }
    }

    /**
     * Checks whether a user is authorized to have a gremlin bytecode request from a client answered and raises an
     * {@link AuthorizationException} if this is not the case. For a request to be authorized, the user must either
     * have a grant for the requested {@link TraversalSource}, without using lambdas, mutating steps or OLAP, or have a
     * sandbox grant.
     *
     * @param user {@link AuthenticatedUser} that needs authorization.
     * @param bytecode The gremlin {@link Bytecode} request to authorize the user for.
     * @param aliases A {@link Map} with a single key/value pair that maps the name of the {@link TraversalSource} in the
     *                {@link Bytecode} request to name of one configured in Gremlin Server.
     * @return The original or modified {@link Bytecode} to be used for further processing.
     */
    @Override
    public Bytecode authorize(final AuthenticatedUser user, final Bytecode bytecode, final Map<String, String> aliases) throws AuthorizationException {
        final Set<String> usernames = new HashSet<>();

        for (final String resource: aliases.values()) {
            // A TraversalSource without any grant has no entry in the map, so fall back to an empty list
            usernames.addAll(usernamesByTraversalSource.getOrDefault(resource, Collections.emptyList()));
        }
        final boolean userHasTraversalSourceGrant = usernames.contains(user.getName()) || usernames.contains(AuthenticatedUser.ANONYMOUS_USERNAME);
        final boolean userHasSandboxGrant = usernamesSandbox.contains(user.getName()) || usernamesSandbox.contains(AuthenticatedUser.ANONYMOUS_USERNAME);
        final boolean runsLambda = BytecodeHelper.getLambdaLanguage(bytecode).isPresent();
        final boolean touchesReadOnlyStrategy = bytecode.toString().contains(ReadOnlyStrategy.class.getSimpleName());
        final boolean touchesOLAPRestriction = bytecode.toString().contains(VertexProgramRestrictionStrategy.class.getSimpleName());
        // This element becomes obsolete after resolving TINKERPOP-2473 for allowing only a single instance of each traversal strategy.
        final boolean touchesSubgraphStrategy = bytecode.toString().contains(SubgraphStrategy.class.getSimpleName());

        final List<String> rejections = new ArrayList<>();
        if (runsLambda) {
            rejections.add(REJECT_LAMBDA);
        }
        if (touchesReadOnlyStrategy) {
            rejections.add(REJECT_MUTATE);
        }
        if (touchesOLAPRestriction) {
            rejections.add(REJECT_OLAP);
        }
        if (touchesSubgraphStrategy) {
            rejections.add(REJECT_SUBGRAPH);
        }
        String rejectMessage = REJECT_BYTECODE;
        if (rejections.size() > 0) {
            rejectMessage += " using " + String.join(", ", rejections);
        }
        rejectMessage += ".";

        if ( (!userHasTraversalSourceGrant || runsLambda || touchesOLAPRestriction || touchesReadOnlyStrategy || touchesSubgraphStrategy) && !userHasSandboxGrant) {
            throw new AuthorizationException(String.format(rejectMessage, aliases.values()));
        }
        return bytecode;
    }

    /**
     * Checks whether a user is authorized to have a script request from a gremlin client answered and raises an
     * {@link AuthorizationException} if this is not the case.
     *
     * @param user {@link AuthenticatedUser} that needs authorization.
     * @param msg {@link RequestMessage} in which the {@link org.apache.tinkerpop.gremlin.util.Tokens}.ARGS_GREMLIN argument can contain an arbitrary succession of script statements.
     */
    public void authorize(final AuthenticatedUser user, final RequestMessage msg) throws AuthorizationException {
        if (!usernamesSandbox.contains(user.getName())) {
            throw new AuthorizationException(REJECT_STRING);
        }
    }
}
----
[source,java]
----
package org.yourpackage;

import org.yaml.snakeyaml.TypeDescription;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.constructor.Constructor;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;
import java.util.Map;

/**
 * AllowList for the AllowListAuthorizer as configured by a YAML file.
 */
public class AllowList {

    /**
     * Holds lists of groups by grant. A grant is either a TraversalSource name or the "sandbox" value. With the
     * sandbox grant users can access all TraversalSource instances and execute groovy scripts as string based
     * requests or as lambda functions, only limited by Gremlin Server's sandbox definition.
     */
    public Map<String, List<String>> grants;

    /**
     * Holds lists of user names by groupname. The "anonymous" user name can be used to denote any user.
     */
    public Map<String, List<String>> groups;

    /**
     * Read a configuration from a YAML file into an {@link AllowList} object.
     *
     * @param file the location of an AllowList YAML configuration file
     * @return The {@link AllowList} created from the file
     */
    public static AllowList read(final String file) throws Exception {
        final Constructor constructor = new Constructor(AllowList.class);
        final TypeDescription allowListDescription = new TypeDescription(AllowList.class);
        allowListDescription.putMapPropertyType("grants", String.class, Object.class);
        allowListDescription.putMapPropertyType("groups", String.class, Object.class);
        constructor.addTypeDescription(allowListDescription);
        final Yaml yaml = new Yaml(constructor);

        // try-with-resources closes the stream once the file has been parsed
        try (InputStream stream = new FileInputStream(new File(file))) {
            return yaml.loadAs(stream, AllowList.class);
        }
    }
}
----
allow-list.yaml:
[source,yaml]
----
grants: {
gclassic: [groupclassic],
gmodern: [groupmodern],
gcrew: [groupclassic, groupmodern],
ggrateful: [groupgrateful],
sandbox: [groupsandbox]
}
groups: {
groupclassic: [userclassic],
groupmodern: [usermodern, stephen],
groupsink: [usersink],
groupgrateful: [anonymous],
groupsandbox: [usersandbox, marko]
}
----
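
To make the effect of this file concrete, the following JDK-only sketch mirrors the way `setup()` expands grants into
user names and how the "anonymous" entry opens a grant to any user. It is an illustration rather than server code:
the `AllowListResolver` class and its methods are hypothetical, and the maps are filled in by hand instead of being
read from YAML.

[source,java]
----
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of grant resolution; not part of TinkerPop.
public class AllowListResolver {
    public static final String ANONYMOUS = "anonymous";

    private final Map<String, List<String>> grants;
    private final Map<String, List<String>> groups;

    public AllowListResolver(final Map<String, List<String>> grants, final Map<String, List<String>> groups) {
        this.grants = grants;
        this.groups = groups;
    }

    /** Expands the groups listed for a grant into the set of user names that hold it. */
    public Set<String> usersFor(final String grant) {
        final Set<String> users = new HashSet<>();
        for (final String group : grants.getOrDefault(grant, List.of())) {
            final List<String> members = groups.get(group);
            if (members == null) {
                throw new IllegalArgumentException("Group '" + group + "' not defined");
            }
            users.addAll(members);
        }
        return users;
    }

    /** A user holds a grant if listed explicitly or if the grant is open to "anonymous". */
    public boolean isGranted(final String user, final String grant) {
        final Set<String> users = usersFor(grant);
        return users.contains(user) || users.contains(ANONYMOUS);
    }

    /** The same content as the allow-list.yaml shown above. */
    public static AllowListResolver sample() {
        return new AllowListResolver(
                Map.of("gclassic", List.of("groupclassic"),
                       "gmodern", List.of("groupmodern"),
                       "gcrew", List.of("groupclassic", "groupmodern"),
                       "ggrateful", List.of("groupgrateful"),
                       "sandbox", List.of("groupsandbox")),
                Map.of("groupclassic", List.of("userclassic"),
                       "groupmodern", List.of("usermodern", "stephen"),
                       "groupsink", List.of("usersink"),
                       "groupgrateful", List.of("anonymous"),
                       "groupsandbox", List.of("usersandbox", "marko")));
    }

    public static void main(final String[] args) {
        final AllowListResolver resolver = sample();
        System.out.println("stephen on gmodern: " + resolver.isGranted("stephen", "gmodern"));
        System.out.println("stephen on gclassic: " + resolver.isGranted("stephen", "gclassic"));
        System.out.println("anyone on ggrateful: " + resolver.isGranted("someone", "ggrateful"));
    }
}
----

With this file, "stephen" reaches `gmodern` and `gcrew` through `groupmodern`, every user reaches `ggrateful`
through the "anonymous" entry, and `groupsink` is defined but holds no grant at all.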
[[script-execution]]
==== Protecting Script Execution
It is important to remember that Gremlin Server exposes `GremlinScriptEngine` instances that allow for remote execution
of arbitrary code on the server. This situation can represent a security risk or, at a minimum, provide
ways for "bad" scripts to be inadvertently executed. A simple example of a "valid" Gremlin script that would cause
problems is `while(true) {}`, which would consume a thread in the Gremlin pool indefinitely, thus
preventing it from serving other requests. Sending enough of these kinds of scripts would eventually consume all
available threads and Gremlin Server would stop responding.
Scripts have access to the full power of their language and the JVM on which they are running. This means that they
can access certain APIs that have nothing to do with Gremlin itself, such as `java.lang.System` or the `java.io`
and `java.net` packages. Scripts offer developers a lot of flexibility, but that flexibility comes at the cost
of safety. A Gremlin Server instance that is not secured appropriately poses a significant security risk.
The previous sections discussed methods for securing Gremlin Server through authentication and encryption, which is a
good first step in protection. Another layer of protection comes in the form of specific configurations for the
`GremlinGroovyScriptEngine`. A user can configure the script engine with a `GroovyCompilerGremlinPlugin`
implementation. Consider the basic configuration from the Gremlin Server YAML file:
[source,yaml]
----
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
----
This configuration can be expanded to include the `GroovyCompilerGremlinPlugin`:
[source,yaml]
----
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample-secure.groovy]},
               org.apache.tinkerpop.gremlin.groovy.jsr223.GroovyCompilerGremlinPlugin: {enableThreadInterrupt: true}}}}
----
This configuration sets up the script engine to ensure that loops (like `while`) respect interrupt requests.
With this configuration in place, a remote execution such as the following now times out rather than consuming the
thread continuously:
[source,groovy]
----
client = Cluster.build("localhost").port(8182).create().connect()
==>org.apache.tinkerpop.gremlin.driver.Client$ClusteredClient@42ff9a77
client.submit("while(true) { }")
org.apache.tinkerpop.gremlin.driver.exception.ResponseException: A timeout occurred during traversal evaluation of [RequestMessage{, fields={bindings={}, language=gremlin-groovy, batchSize=64}, gremlin=while(true) { }}] - consider increasing the limit given to evaluationTimeout
----
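
The reason an interruption check is needed can be shown with plain Java. The sketch below is not TinkerPop code and
its names are hypothetical: a busy loop runs on a worker thread, the caller gives up after a timeout and interrupts
the worker, and the loop only ends because it checks the interrupt flag on every iteration. A loop without that check
would keep its thread occupied after the timeout.

[source,java]
----
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; Gremlin Server's own timeout handling is more involved.
public class InterruptibleLoop {

    /** Returns true if the loop stopped after the timeout triggered an interrupt. */
    public static boolean runWithTimeout(final long timeoutMillis) throws Exception {
        final ExecutorService pool = Executors.newSingleThreadExecutor();
        final CountDownLatch stopped = new CountDownLatch(1);
        final Future<?> future = pool.submit(() -> {
            // Stand-in for `while(true) { }` with an interruption check on each iteration
            while (!Thread.currentThread().isInterrupted()) {
                Thread.onSpinWait();
            }
            stopped.countDown();
        });
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false; // the loop ended on its own, which is not expected here
        } catch (TimeoutException e) {
            future.cancel(true); // sets the interrupt flag on the worker thread
            return stopped.await(5, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(final String[] args) throws Exception {
        System.out.println("loop released its thread: " + runWithTimeout(200));
    }
}
----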
The `GroovyCompilerGremlinPlugin` has a number of configuration options:
[width="100%",cols="3,10a",options="header"]
|=========================================================
|Customizer |Description
|`compilation` |Allows for three configurations: `COMPILE_STATIC`, `TYPE_CHECKED` or `NONE` (default). When configured with `COMPILE_STATIC` or `TYPE_CHECKED` it applies `CompileStatic` or `TypeChecked` annotations (respectively) to incoming scripts thus removing dynamic dispatch. More information about static compilation can be found link:http://docs.groovy-lang.org/latest/html/documentation/#_static_compilation[here] and additional information on `TypeChecked` usage can be found link:http://docs.groovy-lang.org/latest/html/documentation/#_the_code_typechecked_code_annotation[here].
|`compilerConfigurationOptions` |Allows configuration of the Groovy `CompilerConfiguration` object by taking a `Map` of key/value pairs where the "key" is a property to set on the `CompilerConfiguration`.
|`enableThreadInterrupt` |Injects checks for thread interruption, thus allowing the script to potentially respect calls to `Thread.interrupt()`
|`expectedCompilationTime` |The amount of time in milliseconds a script is allowed to compile before a warning message is sent to the logs.
|`globalFunctionCacheEnabled` |Determines if the global function cache is enabled. By default, this value is `true` - described in more detail in the <<gremlin-server-cache,Cache Management>> Section.
|`classMapCacheSpecification` |The cache specification for the `GremlinGroovyScriptEngine` class map cache - described in more detail in the <<gremlin-server-cache,Cache Management>> Section.
|`extensions` | This setting is for use when `compilation` is configured with `COMPILE_STATIC` or `TYPE_CHECKED` and accepts a comma separated list of link:http://docs.groovy-lang.org/latest/html/documentation/#Typecheckingextensions-Workingwithextensions[type checking extensions] that can have the effect of securing calls to various methods.
|=========================================================
NOTE: Consult the latest link:http://docs.groovy-lang.org/latest/html/documentation/#_typing[Groovy Documentation]
for information on the differences between the various compilation options. It is important to understand the impact that
these configurations will have on submitted scripts before enabling this feature.
IMPORTANT: TinkerPop does not offer an end-to-end out-of-the-box solution to perfectly protect against bad actors
submitting nefarious scripts. The configurations to follow which discuss the `SimpleSandboxExtension` and
`FileSandboxExtension` are meant to represent example implementations that users and providers can gain some
inspiration from in developing their own solutions. Please consult the documentation of your TinkerPop implementation
to determine how scripts are "secured" as many providers have taken their own approaches to solving this problem.
Securing scripts (i.e. preventing access to certain methods) is more complicated. As an example,
TinkerPop implemented some basic "sandbox" implementations as described in this
link:https://melix.github.io/blog/2015/03/sandboxing.html[blog post] to demonstrate a method by which script
security could be achieved. Consider the following configuration of the `GroovyCompilerGremlinPlugin`:
[source,yaml]
----
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.groovy.jsr223.GroovyCompilerGremlinPlugin: {enableThreadInterrupt: true, compilation: COMPILE_STATIC, extensions: org.apache.tinkerpop.gremlin.groovy.jsr223.customizer.SimpleSandboxExtension},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample-secure.groovy]}}}}
----
This configuration uses the `SimpleSandboxExtension`, which blocks calls to methods on the `System` class, thereby
preventing someone from remotely killing the server:
[source,groovy]
----
gremlin> :> System.exit(0)
Script8.groovy: 1: [Static type checking] - Not authorized to call this method: java.lang.System#exit(int)
@ line 1, column 1.
System.exit(0)
^
1 error
----
The `SimpleSandboxExtension` is by no means a "complete" implementation protecting against all manner of nefarious
scripts, but it does provide an example for how such a capability might be implemented. A slightly more advanced
example is offered in the `FileSandboxExtension` which uses a configuration file to allow certain classes and methods.
The configuration file is YAML-based and an example is presented as follows:
[source,yaml]
----
autoTypeUnknown: true
methodWhiteList:
- java\.lang\.Boolean.*
- java\.lang\.Byte.*
- java\.lang\.Character.*
- java\.lang\.Double.*
- java\.lang\.Enum.*
- java\.lang\.Float.*
- java\.lang\.Integer.*
- java\.lang\.Long.*
- java\.lang\.Math.*
- java\.lang\.Number.*
- java\.lang\.Object.*
- java\.lang\.Short.*
- java\.lang\.String.*
- java\.lang\.StringBuffer.*
- java\.lang\.System#currentTimeMillis\(\)
- java\.lang\.System#nanoTime\(\)
- java\.lang\.Throwable.*
- java\.lang\.Void.*
- java\.util\..*
- org\.codehaus\.groovy\.runtime\.DefaultGroovyMethods.*
- org\.codehaus\.groovy\.runtime\.InvokerHelper#runScript\(java\.lang\.Class,java\.lang\.String\[\]\)
- org\.codehaus\.groovy\.runtime\.StringGroovyMethods.*
- groovy\.lang\.Script#<init>\(groovy.lang.Binding\)
- org\.apache\.tinkerpop\.gremlin\.structure\..*
- org\.apache\.tinkerpop\.gremlin\.process\..*
- org\.apache\.tinkerpop\.gremlin\.process\.computer\..*
- org\.apache\.tinkerpop\.gremlin\.process\.computer\.bulkloading\..*
- org\.apache\.tinkerpop\.gremlin\.process\.computer\.clustering\.peerpressure\..*
- org\.apache\.tinkerpop\.gremlin\.process\.computer\.ranking\.pagerank\..*
- org\.apache\.tinkerpop\.gremlin\.process\.computer\.traversal\..*
- org\.apache\.tinkerpop\.gremlin\.process\.traversal\..*
- org\.apache\.tinkerpop\.gremlin\.process\.traversal\.dsl\.graph\..*
- org\.apache\.tinkerpop\.gremlin\.process\.traversal\.engine\..*
- org\.apache\.tinkerpop\.gremlin\.server\.util\.LifeCycleHook.*
staticVariableTypes:
  graph: org.apache.tinkerpop.gremlin.structure.Graph
  g: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource
----
There are three keys in this configuration file that control different aspects of the sandbox:
. `autoTypeUnknown` - When set to `true`, unresolved variables are typed as `Object`.
. `methodWhiteList` - A list of regex patterns that are matched against method descriptors to determine whether a
method can be executed. The method descriptor consists of the fully-qualified name of the class that declares the
method, followed by the method name and its parameter types. For example, `Math.ceil` would have a descriptor of
`java.lang.Math#ceil(double)`.
. `staticVariableTypes` - A list of variables that will be used in the `ScriptEngine` for which the types are
always known. In the above example, the variable "graph" will always be bound to a `Graph` instance.
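
The matching idea behind `methodWhiteList` can be sketched with the JDK's regex classes. The class below is
hypothetical and is not the `FileSandboxExtension` implementation. In particular, it assumes that a pattern must match
the whole descriptor, which may differ from how the extension applies its patterns.

[source,java]
----
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration of allow-list matching on method descriptors.
public class MethodAllowList {
    private final List<Pattern> patterns;

    public MethodAllowList(final List<String> regexes) {
        this.patterns = regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    /** A descriptor is allowed if at least one pattern matches it entirely (an assumption of this sketch). */
    public boolean isAllowed(final String descriptor) {
        return patterns.stream().anyMatch(p -> p.matcher(descriptor).matches());
    }

    /** A small subset of the patterns from the configuration file above. */
    public static MethodAllowList sample() {
        return new MethodAllowList(List.of(
                "java\\.lang\\.Math.*",
                "java\\.lang\\.System#currentTimeMillis\\(\\)",
                "java\\.util\\..*"));
    }

    public static void main(final String[] args) {
        final MethodAllowList allowList = sample();
        for (final String descriptor : List.of(
                "java.lang.Math#ceil(double)",
                "java.lang.System#currentTimeMillis()",
                "java.lang.System#exit(int)")) {
            System.out.println(descriptor + " allowed: " + allowList.isAllowed(descriptor));
        }
    }
}
----

Because `java.lang.System` is listed only with two specific methods, a call such as `System.exit(0)` matches no
pattern and is rejected, while every method of `java.lang.Math` is accepted.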
At Gremlin Server startup, the `FileSandboxExtension` looks in the root of the Gremlin Server installation directory for a
file called `sandbox.yaml` and configures itself. To use a file in a different location, set the
`gremlinServerSandbox` system property to the location of the file (e.g. `-DgremlinServerSandbox=conf/my-sandbox.yaml`).
A final thought on the topic of `GroovyCompilerGremlinPlugin` implementation is that it is not just for
"security" (though it is demonstrated in that capacity here). It can be used for a variety of features that
can fine tune the Groovy compilation process. Read more about compilation customization in the
link:http://docs.groovy-lang.org/latest/html/documentation/#compilation-customizers[Groovy Documentation].
=== Best Practices
The following sections define best practices for working with Gremlin Server.
==== Tuning
image:gremlin-handdrawn.png[width=120,float=right] Tuning Gremlin Server for a particular environment may require some
simple trial-and-error, but the following represent some basic guidelines that might be useful:
* Gremlin Server defaults to a very modest maximum heap size. Consider increasing this value for non-trivial uses.
Maximum heap size (`-Xmx`) is defined with the `JAVA_OPTIONS` setting in `gremlin-server.conf`.
* TinkerPop tends to discourage the use of link:https://tinkerpop.apache.org/docs/x.y.z/recipes/#long-traversals[long traversals]
as they can introduce performance problems in some cases and in others simply fail with a `StackOverflowError`. Aside
from restructuring the traversal into multiple commands or stream based inserts, it may sometimes make sense to simply
increase the stack size of the JVM for Gremlin Server by configuring an `-Xss` setting in `JAVA_OPTIONS` of
`gremlin-server.conf`.
* If Gremlin Server is processing scripts or lambdas in bytecode requests, consider fine tuning the JVM's handling of
the metaspace size. Consider modifying the `-XX:MetaspaceSize`, `-XX:MaxMetaspaceSize`, and related settings given the
expected workload. More discussion on this topic can be found in the <<parameterized-scripts,Parameterized Scripts>>
Section below.
* When configuring the size of `threadPoolWorker` start with the default of `1` and increment by one as needed to a
maximum of `2*number of cores`.
* The "right" size of the `gremlinPool` setting is somewhat dependent on the type of requests that will be processed
by Gremlin Server. As requests arrive to Gremlin Server they are decoded and queued to be processed by threads in
this pool. When this pool is exhausted of threads, Gremlin Server will continue to accept incoming requests, but
the queue will continue to grow. If left to grow too large, the server will begin to slow. When tuning around
this setting, consider whether the bulk of the scripts being processed will be "fast" or "slow", where "fast"
generally means being measured in the low hundreds of milliseconds and "slow" means anything longer than that.
* Requests that are "slow" can really hurt Gremlin Server if they are not properly accounted for. Since these requests
block a thread until the job is complete or successfully interrupted, many long-running requests will eventually consume
the `gremlinPool` preventing other requests from getting processed from the queue.
** To limit the impact of this problem, consider properly setting the `evaluationTimeout` to something "sane".
In other words, test the traversals being sent to Gremlin Server and determine the maximum time they take to evaluate
and iterate over results, then set the timeout value accordingly. Also, consider setting a shorter global timeout for
requests and then using longer per-request timeouts for the specific requests that are expected to take longer.
** Note that `evaluationTimeout` can only attempt to interrupt the evaluation on timeout. It allows Gremlin
Server to "ignore" the result of that evaluation, which means the thread in the `gremlinPool` that did the evaluation
may still be consumed after the timeout if interruption does not succeed on the thread.
* When using sessions, there are different options to consider depending on the `Channelizer` implementation being
used:
** `WebSocketChannelizer` and `WsAndHttpChannelizer` - Both of these channelizers use the `gremlinPool` only for
sessionless requests and construct a single threaded pool for each session created. In this way, these channelizers
tend to optimize sessions to be long-lived. For short-lived sessions, which may be typical when using bytecode based
remote transactions, quickly creating and destroying these sessions can be expensive. It is likely that there will be
increased garbage collection times and frequency as well as a general increase in overall server processing.
* Graph element serialization for `Vertex` and `Edge` can be expensive, as their data structures are complex given the
possible existence of multi-properties and meta-properties. When returning data from Gremlin Server only return the
data that is required. For example, if only two properties of a `Vertex` are needed then simply return the two rather
than returning the entire `Vertex` object itself. Even with an entire `Vertex`, it is typically much faster to issue
the query as `g.V(1).elementMap()` than `g.V(1)`, as the former returns a `Map` of the same data as a `Vertex`, but
without all the associated structure which can slow the response.
* Gremlin Server writes responses to a buffer held in direct memory prior to flushing them to the TCP socket. If the
logs show `OutOfDirectMemoryError`, particularly when the `channels.write-pauses` <<metrics,metric>> is high, it is
likely caused by this buffer being filled. The buffer can fill when clients are slow to consume results being sent to
them (e.g. network problems, underpowered client instances, etc.). Gremlin Server will attempt to throttle the speed at
which the buffer gets filled by pausing writes for any channel that exceeds its allowed buffer space allotment as
determined by the `writeBufferHighWaterMark` and `writeBufferLowWaterMark` described in the
<<server-configuring,Server Configuration Section>>. Pauses obviously increase latency, but they do so for the benefit of
server stability, allowing the server to keep serving channels whose clients have no trouble consuming results.
** Write pauses are generally considered a natural part of server operations, though a continuous amount of pausing
means that threads used for query execution are tied up and are therefore preventing the processing of other requests.
As a result, requests may begin to queue which further adds to server load and potential latency. Increasing the
`writeBufferHighWaterMark` and `writeBufferLowWaterMark` settings could allow the server to delay pauses at the expense
of direct memory and therefore allow more requests to be handled by freeing those query execution threads.
** Client applications should be selective in their retries. Quickly resending a query that triggered an
`OutOfDirectMemoryError` without giving the server time to recover will just further burden a taxed system. Even retry
systems that use exponential back-off may not be suitable for these cases as early retries may land too quickly and
therefore just queue another heavy request.
** Consider the shape of query results as they can have an impact on server performance. The "shape" refers to the form
of the result given the query. For example, `g.V()` and `g.V().fold()` both return the same results (i.e. all the
vertices in the graph) but the former returns them one at a time in a stream and the latter collects them all in
memory in a `List` and then returns the one `List` result. Writing queries in ways that allow results to stream
(which only applies to websockets) is preferable and will allow the server to perform better. Another aspect of "shape"
can come into play when returning data of individual graph elements. For example, the `g.V()` form of query will stream,
but if each `Vertex` returned has lots of properties (e.g. properties with large strings or heavy blobs), this could
trigger scenarios where each streamed batch immediately exceeds `writeBufferHighWaterMark`. Simply exceeding the
`writeBufferHighWaterMark` may not trigger a pause as the server may quickly flush the buffer before the next batch, but
one could see how easily a write pause could be triggered in that state. It could make sense to configure a smaller
`batchSize` for query results that have heavy individual objects in them as that would reduce the byte size of the
batch and allow buffer flushes to happen more often (though that may be a cost in and of itself).
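The caveat about `evaluationTimeout` in the list above (a timeout can only attempt an interrupt, so a pool thread may stay occupied) can be illustrated with plain JDK executors. The following is a minimal sketch and not Gremlin Server code: `TimeoutSketch` and `workerBusyAfterTimeout` are hypothetical names, a single-thread executor stands in for the `gremlinPool`, and the 100 millisecond wait stands in for the configured timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a pool thread evaluating a "slow" request.
class TimeoutSketch {

    // Returns true if the worker thread is still occupied after the timeout fired.
    static boolean workerBusyAfterTimeout(boolean honorsInterrupt) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        try {
            Future<?> evaluation = pool.submit(() -> {
                started.countDown();
                boolean done = false;
                while (!done) {
                    try {
                        release.await();        // simulates long-running work
                        done = true;
                    } catch (InterruptedException e) {
                        // a cooperative task stops here; an uncooperative one keeps going
                        done = honorsInterrupt;
                    }
                }
                finished.countDown();
            });
            started.await();                    // make sure the task is really running
            try {
                evaluation.get(100, TimeUnit.MILLISECONDS);  // plays the role of the timeout
            } catch (TimeoutException e) {
                evaluation.cancel(true);        // can only *attempt* an interrupt
            }
            // Give the worker a moment to react, then check whether it is free again.
            boolean freed = finished.await(500, TimeUnit.MILLISECONDS);
            return !freed;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            release.countDown();                // let an uncooperative task finish
            pool.shutdown();
        }
    }
}
```

Under these assumptions, a task that honors interruption frees its thread shortly after the timeout, while one that swallows the interrupt keeps the thread occupied even though the caller has already moved on. This is why slow requests should be accounted for when sizing the pool rather than relying on the timeout alone.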
[[parameterized-scripts]]
==== Parameterized Scripts
image:gremlin-parameterized.png[width=150,float=left] If using `GremlinGroovyScriptEngine` in Gremlin
Server, it is imperative to use script parameterization. Period. There are at least two good
reasons for doing so: script caching and protection from "Gremlin injection" (conceptually the same as the notion of
SQL injection).
IMPORTANT: It is possible to use the `GremlinLangScriptEngine` in Gremlin Server as opposed to the
`GremlinGroovyScriptEngine`. The former makes use of `gremlin-language` and its ANTLR grammar for parsing Gremlin
scripts. This processing is different from the processing performed by Groovy and therefore spares users from the
concerns of this section. When considering parameterization, users should also consider the graph database they are
using to determine if it has native mechanisms that preclude the need for parameterization.
With respect to caching, Gremlin Server caches all scripts that are passed to it. The cache is keyed based on the
hash of the script. Therefore `g.V(1)` and `g.V(2)` will be recognized as two separate scripts in the cache. If that
script is parameterized to `g.V(x)` where `x` is passed as a parameter from the client, there will be no additional
compilation cost for future requests on that script. Compilation of a script should be considered "expensive" and
avoided when possible.
IMPORTANT: The parameterized script of `g.V(x)` is keyed in the cache differently than `g.V(y)` or even `g.V( x )`.
Scripts must be exact string matches for recompilation to be avoided.
[source,java]
----
Cluster cluster = Cluster.open();
Client client = cluster.connect();
Map<String,Object> params = new HashMap<>();
params.put("x",4);
client.submit("[1,2,3,x]", params);
----
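The effect of parameterization on the script cache can be seen in a toy model. The sketch below is only an illustration under simplifying assumptions: `ScriptCacheSketch` is a hypothetical class, it keys a `HashMap` on the exact script text rather than on a hash, and a counter stands in for the expensive compilation step.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a script cache keyed on the exact script text (hypothetical class).
class ScriptCacheSketch {
    private final Map<String, Object> compiled = new HashMap<>();
    private int compilations = 0;

    // "Evaluates" a script; only the script text decides whether a compile is needed.
    void submit(String script, Map<String, Object> params) {
        compiled.computeIfAbsent(script, s -> {
            compilations++;          // stands in for the expensive compilation
            return new Object();     // stands in for the compiled class
        });
    }

    int compilations() {
        return compilations;
    }

    // Submits one request per id, either inlined in the text or passed as a parameter.
    static int compilationsFor(boolean parameterized, int requests) {
        ScriptCacheSketch cache = new ScriptCacheSketch();
        for (int id = 0; id < requests; id++) {
            if (parameterized) {
                Map<String, Object> params = new HashMap<>();
                params.put("x", id);
                cache.submit("g.V(x)", params);
            } else {
                cache.submit("g.V(" + id + ")", new HashMap<>());
            }
        }
        return cache.compilations();
    }
}
```

In this model, one hundred requests with the identifier inlined cause one hundred compilations, while the parameterized form compiles once. Submitting `g.V(x)` and `g.V( x )` still counts as two compilations because the strings differ.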
The more parameters that are used in a script the more expensive the compilation step becomes. Gremlin Server has an
`OpProcessor` setting called `maxParameters`, which is mentioned in the <<opprocessor-configurations,OpProcessor Configuration>>
section. It controls the maximum number of parameters that can be passed to the server for script evaluation purposes.
Use of this setting can prevent accidental long-running compilations, which individually are not terribly oppressive to
the server, but taken as a group under high concurrency would be considered detrimental.
On the topic of Gremlin injection, note that it is possible to take advantage of Gremlin scripts in the same fashion
as SQL scripts that are submitted as strings. When using string building patterns for queries without proper input
scrubbing, it would be quite simple to do:
[source,java]
----
String lbl = "person";
String nodeId = "mary').next();g.V().drop().iterate();g.V().has('id', 'thomas";
String query = "g.addV('" + lbl + "').property('identifier','" + nodeId + "')";
client.submit(query);
----
The above case would `drop()` all vertices in the graph. By using script parameterization, there is a different outcome
in that the `nodeId` string is not treated as something executable, but rather as a literal string that just becomes
part of the "identifier" for the vertex on insertion:
[source,java]
----
String lbl = "person";
String nodeId = "mary').next();g.V().drop().iterate();g.V().has('id', 'thomas";
String query = "g.addV(lbl).property('identifier',nodeId)";
Map<String,Object> params = new HashMap<>();
params.put("lbl", lbl);
params.put("nodeId", nodeId);
client.submit(query, params);
----
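To make the difference concrete, it can help to look at the script text that actually reaches the server in each case. The following sketch uses only string handling and no driver classes. `InjectionSketch` and its members are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper contrasting the two ways of building a request.
class InjectionSketch {

    // Unsafe: user input becomes part of the script text that is compiled and run.
    static String concatenated(String label, String nodeId) {
        return "g.addV('" + label + "').property('identifier','" + nodeId + "')";
    }

    // Safe: the script text is constant; user input travels only as parameter values.
    static final String PARAMETERIZED = "g.addV(lbl).property('identifier',nodeId)";

    static Map<String, Object> bindings(String label, String nodeId) {
        Map<String, Object> params = new HashMap<>();
        params.put("lbl", label);
        params.put("nodeId", nodeId);
        return params;
    }
}
```

With the malicious `nodeId` shown above, `concatenated("person", nodeId)` produces the text `g.addV('person').property('identifier','mary').next();g.V().drop().iterate();g.V().has('id', 'thomas')`, which is three statements rather than one. The parameterized script text never changes, and the hostile string only appears as a value in the bindings map.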
Groovy scripts create classes which get loaded to the JVM metaspace and to a `Class` cache. For those using script
parameterization, a typical application should not generate an overabundance of pressure on these two components of
Gremlin Server's memory footprint. On the other hand, it's not too hard to imagine a situation where problems might
emerge:
* An application use case makes parameterization impossible and therefore all scripts are unique.
* There is a bug in an application's parameterization code that is actually instead producing unique scripts.
* A long running Gremlin Server takes lots of non-parameterized scripts from Gremlin Console or similar tools.
In these sorts of cases, Gremlin Server's performance can be affected adversely as without some additional configuration
the metaspace will grow indefinitely (possibly along with the general heap) triggering longer and more frequent rounds
of garbage collection (GC). Some tuning of JVM settings can help abate this issue.
As a first guard against this problem consider setting the `-XX:SoftRefLRUPolicyMSPerMB` to release soft references
earlier. The `ScriptEngine` cache for created `Class` objects uses soft references and if the workload expectation is
such that cache hits will be low there is little need to keep such references around.
Perhaps the more important guards are related to the JVM metaspace. Start by setting the initial size of this space
with `-XX:MetaspaceSize`. When this value is exceeded it will trigger a GC round - it is essentially a threshold for
GC. The growth of this value can be capped with `-XX:MaxMetaspaceSize` (this value is unlimited by default). In an ideal
situation (i.e. parameterization), the `-XX:MetaspaceSize` should have a large enough setting so as to avoid early GC
rounds for metaspace, but outside of an ideal world (i.e. non-parameterization) it may not be smart to make this number
too large. Making the setting too large (and thus the `-XX:MaxMetaspaceSize` even larger) may trigger longer GC rounds
when they inevitably arrive.
In addition to those two metaspace settings it may also be useful to consider the following additional options:
* `MinMetaspaceFreeRatio` - When the percentage for committed space available for class metadata is less than this
value, then the threshold of metaspace GC will be raised, but only if the incremental size of the threshold meets the
requirement set by `MinMetaspaceExpansion`. A larger number should make the metaspace grow more aggressively.
* `MaxMetaspaceFreeRatio` - When the percentage for committed space available for class metadata is more than this
value, then the threshold of metaspace GC will be lowered, but only if the incremental size of the threshold meets the
requirement set by `MaxMetaspaceExpansion`. A larger number should reduce the chance of the metaspace shrinking.
* `MinMetaspaceExpansion` - The minimum size by which the metaspace is expanded after a metaspace GC round.
* `MaxMetaspaceExpansion` - If the incremental size exceeds `MinMetaspaceExpansion` but is less than
`MaxMetaspaceExpansion`, then the incremental size is `MaxMetaspaceExpansion`. If the incremental size exceeds
`MaxMetaspaceExpansion`, then the incremental size is `MinMetaspaceExpansion` plus the original incremental size.
There really aren't any general guidelines for how to initially set these values. Using profiling tools to examine GC
trends is likely the best way to understand how a particular workload is affecting the metaspace and its relation to
GC. Getting these settings "right" however will help ensure much more predictable Gremlin Server operations.
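Short of a full profiler, the standard JDK management beans offer a rough view of metaspace usage and GC activity. The sketch below is one possible probe and not part of Gremlin Server: `MetaspaceProbe` is a hypothetical class, and it assumes a HotSpot-style JVM that reports a memory pool named `Metaspace`. It inspects the JVM it runs in, so observing Gremlin Server would require running it inside that process or reading the same beans over JMX.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical probe built on the standard java.lang.management beans.
class MetaspaceProbe {

    // Returns the usage of the pool named "Metaspace" (HotSpot naming), or null if absent.
    static MemoryUsage metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    // Total number of collections reported by all collectors since JVM start.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // One-line summary; "max" is -1 when no -XX:MaxMetaspaceSize limit is defined.
    static String report() {
        MemoryUsage usage = metaspaceUsage();
        if (usage == null) {
            return "no Metaspace pool reported by this JVM";
        }
        return String.format("metaspace used=%d committed=%d max=%d gcCount=%d",
                usage.getUsed(), usage.getCommitted(), usage.getMax(), totalGcCount());
    }
}
```

Sampling `report()` periodically while replaying a representative workload gives a first impression of whether metaspace keeps growing and how often collections occur, before any of the settings above are changed.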
==== Properties of Elements
It was mentioned above at the start of this "Best Practices" section that serialization of graph elements (i.e.
`Vertex`, `Edge`, and `VertexProperty`) can be expensive and that it is best to only return the data that is required
by the requesting system. This point begs for further clarification as there are a number of ways to use and configure
Gremlin Server which might influence its interpretation.
To begin to discuss these nuances, first consider the method of making requests to Gremlin Server: script or traversal.
For scripts, that will mean that users are sending a string representation of Gremlin to the server directly through a
driver or over HTTP. For bytecode, users will utilize a <<gremlin-drivers-variants, Gremlin GLV>> which will
construct bytecode for them and submit the request to the server upon iteration of their traversal.
In either case, it is important to also consider the method of "detachment". Detachment refers to the manner in which
a graph element is disconnected from the graph for purpose of serialization. Depending on the case and configuration,
graph elements may be detached with or without properties. Elements that include properties are generally referred
to as "detached elements" and those without properties are "reference elements".
With the type of request and detachment model in mind, it is now possible to discuss how best to consider element
properties in relation to them all in concert.
By default, Gremlin Server configuration returns all properties.
To manage properties for each request you can use the <<configuration-steps-with,with()>> configuration option
`materializeProperties`
[source,groovy]
----
g.with('materializeProperties', 'tokens').V()
----
The `tokens` value for the `materializeProperties` means that only `id` and `label` should be returned.
Another option, `all`, can be used to indicate that all properties should be returned and is the default value.
In some cases it can be inconvenient to load Elements with properties due to large data size or for compatibility reasons.
That can be solved by utilizing `ReferenceElementStrategy` when creating the out-of-the-box `GraphTraversalSource`.
As the name suggests, this means that elements will be detached by reference and will therefore not have properties
included. The relevant configuration from the Gremlin Server initialization script looks like this:
[source,groovy]
----
globals << [g : traversal().with(graph).withStrategies(ReferenceElementStrategy)]
----
This configuration is global to Gremlin Server and therefore all methods of connection will always return elements
without properties. If this strategy is not included, then elements will be returned with properties.
Ultimately, the detachment model should have little impact to Gremlin usage if the best practice of specifying only
the data required by the application is adhered to.
The best practice of requesting only the data the application needs:
[source,java]
----
// script-based request: the Gremlin string is evaluated on the server
Cluster cluster = Cluster.open();
Client client = cluster.connect();
ResultSet results = client.submit("g.V().hasLabel('person').elementMap('name')");
// traversal-based request through a remote GraphTraversalSource
GraphTraversalSource g = traversal().with("conf/remote-graph.properties");
List<Map<Object,Object>> maps = g.V().hasLabel("person").elementMap("name").toList();
----
Both of the above requests return a list of `Map` instances that contain the `id`, `label` and the "name" property.
*Compatibility*
*It is not recommended to use 3.6.x or below driver versions with 3.7.x or above Gremlin Server*, as some older drivers do not construct
graph elements with properties and thus are not designed to handle the returned properties by default; however, compatibility
can be achieved by configuring `ReferenceElementStrategy` in the server such that properties are not returned.
The per-request configuration option `materializeProperties` is not supported by older driver versions.
Also note that older drivers of different language variants will handle incoming properties differently with different
serializers used. Drivers using `GraphSON` serializers will remain compatible, but may encounter deserialization errors
with `GraphBinary`. Below is a table documenting GLV behaviors using `GraphBinary` when properties are returned by the
default 3.7.x server, as well as if `ReferenceElementStrategy` is configured (i.e. mimic the behavior
of a 3.6.x server). This can be observed with the results of `g.V().next()`. Note that only `gremlin-driver`
and `gremlin-javascript` have the `properties` attribute in the Element objects; all other GLVs only have `id` and `label`.
[cols="1,1,1"]
|===
|3.6.x drivers with `GraphBinary` |Behavior with default 3.7.x Server | Behavior with `ReferenceElementStrategy`
|`gremlin-driver`
|Properties returned as empty iterator
|Properties returned as empty iterator
|`gremlin-dotnet`
|Skips properties in Elements
|Skips properties in Elements
|`gremlin-javascript`
|Deserialization error
|Properties returned as empty list
|`gremlin-python`
|Deserialization error
|Skips properties in Elements
|`gremlin-go`
|Deserialization error
|Skips properties in Elements
|===
TIP: Consider utilizing `ReferenceElementStrategy` whenever creating a `GraphTraversalSource` in Java to ensure
the most portable Gremlin.
NOTE: For those interested, please see link:https://lists.apache.org/thread.html/e959e85d4f8b3d46d281f2742a6e574c7d27c54bfc52f802f7c04af3%40%3Cdev.tinkerpop.apache.org%3E[this post]
to the TinkerPop dev list which outlines the full history of this issue and related concerns.
[[gremlin-server-cache]]
==== Cache Management
If Gremlin Server processes a large number of unique scripts, the global function cache will grow beyond the memory
available to Gremlin Server and an `OutOfMemoryError` will loom. Script parameterization goes a long way to solving
this problem and running out of memory should not be an issue for those cases. If it is a problem or if there is no
script parameterization due to a given use case, it is possible to better control the nature of the global function
cache from the client side, by issuing scripts with a parameter to help define how the garbage collector should treat
the references.
The parameter is called `#jsr223.groovy.engine.keep.globals` and has four options:
* `hard` - available in the cache for the life of the JVM (default when not specified).
* `soft` - retained until memory is "low" and should be reclaimed before an `OutOfMemoryError` is thrown.
* `weak` - garbage collected even when memory is abundant.
* `phantom` - removed immediately after being evaluated by the `ScriptEngine`.
By specifying an option other than `hard`, an `OutOfMemoryError` in Gremlin Server should be avoided. Of course,
this approach will come with the downside that functions could be garbage collected and thus removed from the
cache, forcing Gremlin Server to recompile if that script is encountered again.
[source,java]
----
Cluster cluster = Cluster.open();
Client client = cluster.connect();
Map<String,Object> params = new HashMap<>();
params.put("#jsr223.groovy.engine.keep.globals", "soft");
client.submit("def addItUp(x,y){x+y}", params);
----
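The four option names mirror the reference strengths of the JVM, and their behavior can be explored with the classes in `java.lang.ref`. The sketch below is an analogy under that assumption and not the cache implementation itself. `KeepGlobalsSketch` is a hypothetical class, and `hard` has no wrapper because it corresponds to an ordinary strong reference.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical mapping from option name to the java.lang.ref type of the same name.
class KeepGlobalsSketch {

    static Reference<Object> wrap(String option, Object value, ReferenceQueue<Object> queue) {
        switch (option) {
            case "soft":
                return new SoftReference<>(value, queue);    // cleared only under memory pressure
            case "weak":
                return new WeakReference<>(value, queue);    // cleared at the next GC once unreachable
            case "phantom":
                return new PhantomReference<>(value, queue); // get() never returns the referent
            default:
                throw new IllegalArgumentException("no Reference wrapper for: " + option);
        }
    }
}
```

While the wrapped object is still strongly reachable, the soft and weak wrappers return it from `get()`, whereas the phantom wrapper always returns `null`. That matches the description of `phantom` as an entry that is effectively gone as soon as evaluation completes.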
In cases where maintaining the expense of the global function cache is unnecessary this cache can be disabled with the
`globalFunctionCacheEnabled` configuration on the `GroovyCompilerGremlinPlugin`.
Gremlin Server also has a "class map" cache which holds compiled scripts which helps avoid recompilation costs on
future requests. This cache can be tuned in the Gremlin Server configuration with the `GroovyCompilerGremlinPlugin`
in the following fashion:
[source,yaml]
----
scriptEngines: {
  gremlin-groovy: {
    plugins: { ...
      org.apache.tinkerpop.gremlin.groovy.jsr223.GroovyCompilerGremlinPlugin: {classMapCacheSpecification: "initialCapacity=1000,maximumSize=10000"},
      ...}
----
The specifics for this comma delimited format can be found
link:https://static.javadoc.io/com.github.ben-manes.caffeine/caffeine/2.6.2/com/github/benmanes/caffeine/cache/CaffeineSpec.html[here].
By default, the cache is set to `softValues` which means they are garbage collected in a globally least-recently-used
manner as memory gets low. For production systems, a more predictable strategy is likely preferable, as shown
above with the use of `maximumSize`.
[[considering-transactions]]
==== Considering Transactions
Gremlin Server performs automated transaction handling for "sessionless" requests (i.e. no state between requests) and
for "in-session" requests with that feature enabled. It will automatically commit or rollback transactions depending
on the success or failure of the request.
IMPORTANT: Understand the transactional capabilities of the graph configured in Gremlin Server when using sessions. For
example, a basic `TinkerGraph` in its non-transactional form won't be able to rollback a failed traversal, therefore it
is quite possible to get partial updates if the first part of a traversal succeeds and the rest fails.
Another aspect of Transaction Management that should be considered is the usage of the `strictTransactionManagement`
setting. It is `false` by default, but when set to `true`, it forces the user to pass `aliases` for all requests.
The aliases are then used to determine which graphs will have their transactions closed for that request. Running
Gremlin Server in this configuration should be more efficient when there are multiple graphs being hosted as
Gremlin Server will only close transactions on the graphs specified by the `aliases`. Keeping this setting `false`
will simply have Gremlin Server close transactions on all graphs for every request.
[[request-retry]]
==== Request Retry
The server has the ability to instruct the client that an error condition is transient and that the client should
simply retry the request later. In the event a client detects a `ResponseStatusCode` of `SERVER_ERROR_TEMPORARY`,
which is error code `596`, the client may choose to retry that request. Note that drivers do not have the ability to
automatically retry and that it is up to the application to provide such logic.
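Since retry logic is left to the application, a small wrapper is usually enough. The following is a minimal sketch under stated assumptions: `RetrySketch`, `withRetry` and `TemporaryServerException` are hypothetical names, and the exception stands in for whatever the application uses to signal a response with `SERVER_ERROR_TEMPORARY`. The doubling delay is just one possible back-off policy.

```java
import java.util.function.Supplier;

// Hypothetical application-side retry wrapper; not part of any TinkerPop driver.
class RetrySketch {

    // Hypothetical marker for a response carrying SERVER_ERROR_TEMPORARY (code 596).
    static class TemporaryServerException extends RuntimeException {
        TemporaryServerException(String message) {
            super(message);
        }
    }

    // Retries only on the temporary error, waiting longer before each new attempt.
    static <T> T withRetry(Supplier<T> request, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return request.get();
            } catch (TemporaryServerException e) {
                if (attempt >= maxAttempts) {
                    throw e;                           // give up and surface the error
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                delay *= 2;                            // simple exponential back-off
            }
        }
    }
}
```

Any other failure propagates immediately, so only errors that the server has marked as transient are retried. A production version would likely cap the delay and add jitter so that many clients do not retry in lockstep.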
[[gremlin-server-docker-image]]
=== Docker Image
The Gremlin Server can also be started as a link:https://hub.docker.com/r/tinkerpop/gremlin-server/[Docker image]:
[source,text]
----
$ docker run tinkerpop/gremlin-server:x.y.z
[INFO] GremlinServer -
\,,,/
(o o)
-----oOOo-(4)-oOOo-----
[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server.yaml
...
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 4 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
----
By default, Gremlin Server listens on port 8182. So that port needs to be exposed if it should be reachable on the host:
[source,bash]
----
$ docker run -p 8182:8182 tinkerpop/gremlin-server:x.y.z
----
Arguments provided with `docker run` are forwarded to the script that starts Gremlin Server. This allows, for example,
the use of an alternative config file:
[source,bash]
----
$ docker run tinkerpop/gremlin-server:x.y.z conf/gremlin-server-secure.yaml
----
[[gremlin-plugins]]
== Gremlin Plugins
image:gremlin-plugin.png[width=125]
Plugins provide a way to expand the features of Gremlin Console and Gremlin Server. The following sections describe
the plugins that are available directly from TinkerPop. Please see the
link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#gremlin-plugins[Provider Documentation] for information on
how to develop custom plugins.
[[credentials-plugin]]
=== Credentials Plugin
image:gremlin-server.png[width=200,float=left] xref:gremlin-server[Gremlin Server] supports an authentication model
where user credentials are stored inside of a `Graph` instance. This database can be managed with the
xref:credentials-dsl[Credentials DSL], which can be installed in the console via the Credentials Plugin. This plugin
is packaged with the console, but is not enabled by default.
[source,groovy]
gremlin> :plugin use tinkerpop.credentials
==>tinkerpop.credentials activated
This plugin imports the appropriate classes for managing the credentials graph.
[[graph-plugins]]
=== Graph Plugins
This section does not refer to a specific Gremlin Plugin, but a class of them. Graph Plugins are typically created by
graph providers to make it easy to integrate their graph systems into Gremlin Console and Gremlin Server. As TinkerPop
provides two reference `Graph` implementations in <<tinkergraph-gremlin,TinkerGraph>> and <<neo4j-gremlin,Neo4j>>,
there is also one Gremlin Plugin for each of them.
The TinkerGraph plugin is installed and activated in the Gremlin Console by default and the sample configurations that
are supplied with the Gremlin Server distribution include the `TinkerGraphGremlinPlugin` as part of the default setup.
If using Neo4j, however, the plugin must be installed manually. Instructions for doing so can be found in the
<<neo4j-gremlin,Neo4j>> section.
[[hadoop-plugin]]
=== Hadoop Plugin
image:hadoop-logo-notext.png[width=100,float=left] The Hadoop Plugin installs as part of `hadoop-gremlin` and provides
a number of imports and utility functions to the environment within which it is used. Those classes and functions
provide the basis for supporting <<graphcomputer,OLAP based traversals>> with Gremlin. This plugin is defined in
greater detail in the <<hadoop-gremlin,Hadoop-Gremlin>> section.
[[server-plugin]]
=== Server Plugin
image:gremlin-server.png[width=200,float=left] xref:gremlin-server[Gremlin Server] remotely executes Gremlin scripts
that are submitted to it. The Server Plugin provides a way to submit scripts to Gremlin Server for remote
processing. Read more about the plugin and how it works in the Gremlin Server section on
<<connecting-via-console,Connecting via Console>>.
NOTE: This plugin is typically only useful to the Gremlin Console and is enabled there by default.
The Server Plugin for remoting with the Gremlin Console should not be confused with a plugin of similar name that is
used by the server. `GremlinServerGremlinPlugin` is typically only configured in Gremlin Server and provides a number
of imports that are required for writing <<starting-gremlin-server,initialization scripts>>.
[[spark-plugin]]
=== Spark Plugin
image:spark-logo.png[width=175,float=left] The Spark Plugin installs as part of `spark-gremlin` and provides
a number of imports and utility functions to the environment within which it is used. Those classes and functions
provide the basis for supporting <<graphcomputer,OLAP based traversals>> using link:http://spark.apache.org[Spark].
This plugin is defined in greater detail in the <<sparkgraphcomputer,SparkGraphComputer>> section and is typically
installed in conjunction with the <<hadoop-plugin,Hadoop-Plugin>>.
[[sugar-plugin]]
=== Sugar Plugin
image:gremlin-sugar.png[width=120,float=left] In previous versions of Gremlin-Groovy, there were numerous
link:http://en.wikipedia.org/wiki/Syntactic_sugar[syntactic sugars] that users could rely on to make their traversals
more succinct. Unfortunately, many of these conventions made use of link:http://docs.oracle.com/javase/tutorial/reflect/[Java reflection]
and thus, were not performant. In TinkerPop, these conveniences have been removed in support of the standard
Gremlin-Groovy syntax being both in line with Gremlin-Java syntax as well as always being the most performant
representation. However, for those users that would like to use the previous syntactic sugars (as well as new ones),
there is `SugarGremlinPlugin` (a.k.a Gremlin-Groovy-Sugar).
IMPORTANT: It is important that the sugar plugin is loaded in a Gremlin Console session prior to any manipulations of
the respective TinkerPop objects as Groovy will cache unavailable methods and properties.
[source,groovy]
----
gremlin> :plugin use tinkerpop.sugar
==>tinkerpop.sugar activated
----
TIP: When using Sugar in a Groovy class file, add `static { SugarLoader.load() }` to the head of the file. Note that
`SugarLoader.load()` will automatically call `GremlinLoader.load()`.
==== Graph Traversal Methods
If a `GraphTraversal` property is unknown and there is a corresponding method with said name off of `GraphTraversal`
then the property is assumed to be a method call. This enables the user to omit `( )` from the method name. However,
if the property does not reference a `GraphTraversal` method, then it is assumed to be a call to `values(property)`.
[gremlin-groovy,modern]
----
g.V <1>
g.V.name <2>
g.V.outE.weight <3>
----
<1> There is no need for the parentheses in `g.V()`.
<2> The traversal is interpreted as `g.V().values('name')`.
<3> A chain of zero-argument step calls with a property value call.
==== Range Queries
The `[x]` and `[x..y]` range operators in Groovy translate to `RangeStep` calls.
[gremlin-groovy,modern]
----
g.V[0..2]
g.V[0..<2]
g.V[2]
----
==== Logical Operators
The `&` and `|` operators are overloaded in `SugarGremlinPlugin`. When used, they introduce the `AndStep` and `OrStep`
markers into the traversal. See <<and-step,`and()`>> and <<or-step,`or()`>> for more information.
[gremlin-groovy,modern]
----
g.V.where(outE('knows') & outE('created')).name <1>
t = g.V.where(outE('knows') | inE('created')).name; null <2>
t.toString()
t
t.toString()
----
<1> Introducing the `AndStep` with the `&` operator.
<2> Introducing the `OrStep` with the `|` operator.
==== Traverser Methods
It is rare that a user will ever interact with a `Traverser` directly. However, if they do, some method redirects exist
to make it easy.
[gremlin-groovy,modern]
----
g.V().map{it.get().value('name')} // conventional
g.V.map{it.name} // sugar
----
[[utilities-plugin]]
=== Utilities Plugin
The Utilities Plugin provides various functions, helper methods and imports of external classes that are useful in
the console.
NOTE: The Utilities Plugin is enabled in the Gremlin Console by default.
[[describe-graph]]
==== Describe Graph
A good implementation of the Gremlin APIs will validate their features against the
link:../dev/provider/#validating-with-gremlin-test[Gremlin test suite]. To learn more about a specific
implementation's compliance with the test suite, use the `describeGraph` function. The following shows the output
for `HadoopGraph`:
[gremlin-groovy,modern]
----
describeGraph(HadoopGraph)
----
[[gremlin-mcp]]
=== Gremlin MCP
Gremlin MCP integrates Apache TinkerPop with the Model Context Protocol (MCP) so that MCP‑capable assistants (for
example, desktop chat clients that support MCP) can discover your graph, run Gremlin traversals and exchange graph data
through a small set of well‑defined tools. It allows users to “talk to your graph” while keeping full Gremlin power
available when they or the assistant need it.
MCP is an open protocol that lets assistants call server‑hosted tools in a structured way. Each tool has a name, an
input schema, and a result schema. When connected to a Gremlin MCP server, the assistant can:
* Inspect the server’s health and connection to a Gremlin data source
* Discover the graph’s schema (labels, properties, relationships, counts)
* Execute Gremlin traversals
The Gremlin MCP server sits alongside Gremlin Server (or any TinkerPop‑compatible endpoint) and forwards tool calls to
the graph via standard Gremlin traversals.
IMPORTANT: This MCP server is designed for development and trusted environments.
WARNING: Gremlin MCP can modify the graph to which it is connected. To prevent such changes, ensure that Gremlin MCP is
configured to work against a read-only instance of the graph. Graphs hosted in Gremlin Server can enforce this by
configuring their traversal source with `withStrategies(ReadOnlyStrategy)`.
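
As a minimal sketch of that protection, the following console session applies `ReadOnlyStrategy` to a traversal source. It assumes the TinkerGraph "modern" toy graph purely for illustration, and the property values are made up. A Gremlin Server deployment would apply the same strategy wherever its traversal source is created, which is typically an initialization script.

[source,groovy]
----
// Assumption: run in the Gremlin Console with TinkerGraph on the classpath.
graph = TinkerFactory.createModern()
g = graph.traversal().withStrategies(ReadOnlyStrategy.instance())

// Read traversals are unaffected by the strategy.
g.V().hasLabel('person').count().next()

// A traversal with a mutating step is expected to fail verification
// when it is iterated, before any change reaches the graph.
try {
    g.addV('person').property('name', 'zed').iterate()
} catch (Exception e) {
    println "write rejected: ${e.message}"
}
----

With the strategy in place, reads behave as before, while a traversal containing a mutating step should be rejected rather than applied.
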
WARNING: Gremlin MCP executes global graph traversals to help it understand the schema and gather statistics. On a large
graph these queries will be costly. If you are trying Gremlin MCP, please try it with a smaller subset of your graph for
experimentation purposes.
MCP defines a simple request/response model for invoking named tools. A tool declares its input and output schema so an
assistant can construct valid calls and reason about results. The Gremlin MCP server implements several tools and, when
invoked by an MCP client, translates those calls to Gremlin traversals against a configured Gremlin endpoint. The
endpoint is typically Gremlin Server, but it can be any graph system that implements the same protocols.
TIP: Gremlin MCP does not replace Gremlin itself. It complements it by helping assistants discover data and propose
traversals. You can always provide an explicit traversal when you know what you want.
The Gremlin MCP server exposes these tools:
* `get_graph_status` — Returns basic health and connectivity information for the backing Gremlin data source.
* `get_graph_schema` — Discovers vertex labels, edge labels, property keys, and relationship patterns. Low‑cardinality
properties may be surfaced as enums to encourage valid values in queries.
* `run_gremlin_query` — Executes an arbitrary Gremlin traversal and returns JSON results.
* `refresh_schema_cache` — Forces schema discovery to run again when the graph has changed.
==== Schema discovery
Schema discovery is the foundation that lets humans and AI assistants reason about a graph without prior tribal
knowledge. By automatically mapping the graph’s structure and commonly observed patterns, it produces a concise,
trustworthy description that accelerates onboarding, improves the quality of suggested traversals, and reduces
trial‑and‑error against production data. For assistants, a discovered schema becomes the guidance layer for planning
valid queries, generating meaningful filters, and explaining results in natural language. For operators, it offers safer
and more efficient interactions by avoiding blind exploratory scans, enabling caching and change detection, and
providing hooks to steer what should or shouldn’t be surfaced (for example, excluding sensitive or non‑categorical
fields). In short, schema discovery turns an opaque dataset into an actionable contract between your graph and the tools
that use it.
Schema discovery uses Gremlin traversals and sampling to uncover the following information about the graph:
* Labels - Vertex and edge labels are collected and de‑duplicated.
* Properties - For each label, a sample of elements is inspected to list observed property keys.
* Counts (optional) - Approximate counts can be included per label.
* Relationship patterns - Connectivity is derived from the labels of edges and their incident vertices.
* Enums - Properties with a small set of distinct values may be surfaced as enumerations to promote precise filters.
==== Executing traversals
When the assistant needs to answer a question, a common sequence is:
. Optionally, call `get_graph_status`.
. Retrieve (or reuse) schema via `get_graph_schema`.
. Formulate a traversal and call `run_gremlin_query`.
. Present results and, if required, refine the traversal.
For example, the assistant may execute a traversal like the following:
[source,groovy]
----
// list the names of everyone known by a person over 30
g.V().hasLabel('person').has('age', gt(30)).out('knows').values('name')
----
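
The sequence above maps onto MCP `tools/call` requests. The sketch below builds them as plain JSON-RPC 2.0 messages in Groovy to show their general shape. The `query` argument name and the empty argument maps are assumptions made for illustration. The actual input schema of each tool is whatever the server advertises to the client, and in practice the MCP client constructs and sends these messages on the assistant's behalf.

[source,groovy]
----
import groovy.json.JsonOutput

// Hypothetical helper: wraps a tool name and its arguments in an MCP tools/call request.
def toolCall = { int id, String tool, Map args ->
    [jsonrpc: '2.0', id: id, method: 'tools/call', params: [name: tool, arguments: args]]
}

def requests = [
    toolCall(1, 'get_graph_status', [:]),   // optional health check
    toolCall(2, 'get_graph_schema', [:]),   // retrieve (or reuse) the schema
    // Assumption: the traversal is passed as a string under a 'query' key.
    toolCall(3, 'run_gremlin_query',
             [query: "g.V().hasLabel('person').has('age', gt(30)).out('knows').values('name')"])
]

// Print each request as pretty-printed JSON.
requests.each { println JsonOutput.prettyPrint(JsonOutput.toJson(it)) }
----
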
==== Configuring an MCP Client
The MCP client is responsible for launching the Gremlin MCP server and providing connection details for the Gremlin
endpoint the server should use.
Basic connection settings:
* `GREMLIN_MCP_ENDPOINT` — `host:port` or `host:port/traversal_source` for the target Gremlin Server or compatible endpoint (default traversal source: `g`)
* `GREMLIN_MCP_USE_SSL` — set to `true` when TLS is required by the endpoint (default: `false`)
* `GREMLIN_MCP_USERNAME` / `GREMLIN_MCP_PASSWORD` — credentials when authentication is enabled (optional)
* `GREMLIN_MCP_IDLE_TIMEOUT` — idle connection timeout in seconds (default: `300`)
* `GREMLIN_MCP_LOG_LEVEL` — logging verbosity for troubleshooting: `error`, `warn`, `info`, or `debug` (default: `info`)
Advanced schema discovery and performance tuning:
* `GREMLIN_MCP_ENUM_DISCOVERY_ENABLED` — enable enum property discovery (default: `true`)
* `GREMLIN_MCP_ENUM_CARDINALITY_THRESHOLD` — max distinct values for a property to be considered an enum (default: `10`)
* `GREMLIN_MCP_ENUM_PROPERTY_DENYLIST` — comma-separated property names to exclude from enum detection (default: `id,pk,name,description,startDate,endDate,timestamp,createdAt,updatedAt`)
* `GREMLIN_MCP_SCHEMA_MAX_ENUM_VALUES` — limit the number of enum values returned per property in the schema (default: `10`)
* `GREMLIN_MCP_SCHEMA_INCLUDE_SAMPLE_VALUES` — include small example values for properties in the schema (default: `false`)
* `GREMLIN_MCP_SCHEMA_INCLUDE_COUNTS` — include approximate vertex/edge label counts in the schema (default: `false`)
The enum-related settings warrant additional explanation. Treating only truly categorical
properties as enums prevents misleading suggestions and sensitive data exposure in assistant‑facing schemas. Without a
denylist and related controls, low‑sample snapshots can make non‑categorical fields like IDs, timestamps, or free text
appear “enum‑like,” degrading query guidance and result explanations. By explicitly excluding such keys, the schema
remains focused on meaningful categories (e.g., status or type), which improves AI query formulation, reduces noise, and
avoids surfacing unstable or private values. It also streamlines schema discovery by skipping properties that would
create large or frequently changing value sets, improving performance and stability.
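
To make the interplay of the threshold and the denylist concrete, here is a small, self-contained Groovy sketch. It is not the server's implementation: the `samples` map and the selection logic are hypothetical and only mirror the rules described above, using the documented default threshold and a subset of the default denylist.

[source,groovy]
----
// Values mirror GREMLIN_MCP_ENUM_CARDINALITY_THRESHOLD and part of
// GREMLIN_MCP_ENUM_PROPERTY_DENYLIST as documented above.
def threshold = 10
def denylist = ['id', 'pk', 'name', 'description', 'timestamp'] as Set

// Hypothetical sampled property values, keyed by property name.
def samples = [
    status: ['active', 'inactive', 'active'],   // few distinct values, not denied
    name  : ['marko', 'vadas', 'josh'],         // few distinct values, but denied
    score : (1..25).toList()                    // too many distinct values
]

// Keep properties that are not denied and stay within the threshold.
def enums = samples.findAll { key, values ->
    !(key in denylist) && values.toSet().size() <= threshold
}.collectEntries { key, values -> [key, values.toSet().sort()] }

assert enums == [status: ['active', 'inactive']]
----

Here `name` looks enum-like in a small sample but is excluded by the denylist, which is the situation the denylist exists to handle.
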
Consult the MCP client documentation for how environment variables are supplied and how tool calls are approved and
presented to the user.
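
As a rough illustration, several MCP clients accept a JSON configuration along the following lines. The `mcpServers` layout, the server key `gremlin`, and the `command` and `args` values are assumptions that depend on the client and on how the Gremlin MCP server is installed. Only the environment variable names come from the lists above.

[source,json]
----
{
  "mcpServers": {
    "gremlin": {
      "command": "npx",
      "args": ["gremlin-mcp"],
      "env": {
        "GREMLIN_MCP_ENDPOINT": "localhost:8182/g",
        "GREMLIN_MCP_USE_SSL": "false",
        "GREMLIN_MCP_LOG_LEVEL": "info",
        "GREMLIN_MCP_SCHEMA_INCLUDE_COUNTS": "false"
      }
    }
  }
}
----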