////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
[[gremlin-applications]]
= Gremlin Applications
Gremlin applications represent tools that are built on top of the core APIs to help expose common functionality to
users when working with graphs. There are two key applications:
. Gremlin Console - A link:http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop[REPL] environment for
interactive development and analysis
. Gremlin Server - A server that hosts a Gremlin Traversal Machine thus enabling remote Gremlin execution
image:gremlin-lab-coat.png[width=310,float=left] Gremlin is designed to be extensible, making it possible for users
and graph system/language providers to customize it to their needs. Such extensibility is also found in the Gremlin
Console and Server, where a universal plugin system makes it possible to extend their capabilities. One important
aspect of the plugin system is that it helps the user install plugins from the command line, automating the
gathering of dependencies and other error-prone activities.
The process of plugin installation is handled by link:http://www.groovy-lang.org/Grape[Grape], which helps resolve
dependencies into the classpath. It is therefore important to ensure that Grape is properly configured in order to
use the automated capabilities of plugin installation. Grape is configured by `~/.groovy/grapeConfig.xml`. Generally
speaking, if that file is not present, the default settings will suffice unless a required dependency is missing from
the default configured repositories. Please see the
link:http://www.groovy-lang.org/Grape#Grape-CustomizeIvysettings[Customize Ivy settings] section of the Grape documentation for more details on
the defaults. For current TinkerPop plugins and dependencies, the following configuration, which is also the default
for Ivy, should be acceptable:
[source,xml]
----
<ivysettings>
<settings defaultResolver="downloadGrapes"/>
<resolvers>
<chain name="downloadGrapes" returnFirst="true">
<filesystem name="cachedGrapes">
<ivy pattern="${user.home}/.groovy/grapes/[organisation]/[module]/ivy-[revision].xml"/>
<artifact pattern="${user.home}/.groovy/grapes/[organisation]/[module]/[type]s/[artifact]-[revision](-[classifier]).[ext]"/>
</filesystem>
<ibiblio name="localm2" root="${user.home.url}/.m2/repository/" checkmodified="true" changingPattern=".*" changingMatcher="regexp" m2compatible="true"/>
<ibiblio name="jcenter" root="https://jcenter.bintray.com/" m2compatible="true"/>
<ibiblio name="ibiblio" m2compatible="true"/>
</chain>
</resolvers>
</ivysettings>
----
TIP: Please see the link:https://tinkerpop.apache.org/docs/x.y.z/dev/developer/#groovy-environment[Developer Documentation]
for additional configuration options when working with "snapshot" releases.
[[gremlin-console]]
== Gremlin Console
image:gremlin-console.png[width=325,float=right] The Gremlin Console is an interactive terminal or
link:http://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop[REPL] that can be used to traverse graphs
and interact with the data that they contain. It represents the most common method for performing ad hoc graph
analysis, small to medium sized data loading projects and other exploratory functions. The Gremlin Console is
highly extensible, featuring a rich plugin system that allows new tools, commands,
link:http://en.wikipedia.org/wiki/Domain-specific_language[DSLs], etc. to be exposed to users.
To start the Gremlin Console, run `gremlin.sh` or `gremlin.bat`:
[source,text]
----
$ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>
----
NOTE: If the above plugins are not loaded then they will need to be enabled or else certain examples will not work.
If using the standard Gremlin Console distribution, then the plugins should be enabled by default. See below for
more information on the `:plugin use` command to manually enable plugins. These plugins, with the exception of
`tinkerpop.tinkergraph`, cannot be removed from the Console as they are a part of the `gremlin-console.jar` itself.
These plugins can only be deactivated.
The Gremlin Console is loaded and ready for commands. Recall that the console hosts the Gremlin-Groovy language.
Please review link:http://www.groovy-lang.org/[Groovy] for help on Groovy-related constructs. In short, Groovy is a
superset of Java. What works in Java, works in Groovy. However, Groovy provides many shorthands to make it easier
to interact with the Java API. Moreover, Gremlin provides many neat shorthands to make it easier to express paths
through a property graph.
[gremlin-groovy]
----
i = 'goodbye'
j = 'self'
i + " " + j
"${i} ${j}"
----
The "toy" graph provides a way to get started with Gremlin quickly.
[gremlin-groovy]
----
g = TinkerFactory.createModern().traversal()
g.V()
g.V().values('name')
g.V().has('name','marko').out('knows').values('name')
----
TIP: When using Gremlin-Groovy in a Groovy class file, add `static { GremlinLoader.load() }` to the head of the file.
=== Console Commands
In addition to the standard commands of the link:http://groovy-lang.org/groovysh.html[Groovy Shell], Gremlin adds
some other useful operations. The following table outlines the most commonly used commands:
[width="100%",cols="3,^2,10",options="header"]
|=========================================================
|Command |Alias |Description
|:help |:? |Displays list of commands and descriptions. When followed by a command name, it will display more specific help on that particular item.
|:exit |:x |Ends the Console session.
|import |:i |Import a class into the Console session.
|:cls |:C |Clear the screen of the Console.
|:clear |:c |Sometimes the Console can get into a state where the command buffer no longer understands input (e.g. a misplaced `(` or `}`). Use this command to clear that buffer.
|:load |:l |Load a file or URL into the command buffer for execution.
|:install |:+ |Imports a Maven library and its dependencies into the Console.
|:uninstall |:- |Removes a Maven library and its dependencies. A restart of the console is required for removal to fully take effect.
|:plugin |:pin |Plugin management functions to list, activate and deactivate available plugins.
|:remote |:rem |Configures a "remote" context where Gremlin or results of Gremlin will be processed via usage of `:submit`.
|:submit |:> |Submit Gremlin to the currently active context defined by `:remote`.
|:bytecode |:bc |Provides options for translating and evaluating `Bytecode` for debugging purposes.
|=========================================================
Many of the above commands are described elsewhere or are generally self-explanatory, but the `:bytecode` command
could use some additional explanation. The following code shows example usage:
[source,text]
----
gremlin> :bytecode from g.V().out('knows') <1>
==>{"@type":"g:Bytecode","@value":{"step":[["V"],["out","knows"]]}}
gremlin> :bytecode translate g {"@type":"g:Bytecode","@value":{"step":[["V"],["out","knows"]]}} <2>
==>g.V().out("knows")
----
<1> Generates a GraphSON 3.0 representation of the traversal as bytecode.
<2> Converts bytecode in GraphSON 3.0 format to a traversal string.
NOTE: The Console does expose the `:record` command which is inherited from the Groovy Shell. This command works well
with local commands, but may record session outputs differently for `:remote` commands. If there is a need to use
`:record` it may be best to manually create a `Cluster` object and issue commands that way so that they evaluate
locally in the shell.
=== Interrupting Evaluations
If there is some input that is taking too long to evaluate or to iterate through, use `ctrl+c` to attempt to interrupt
that process. It is an "attempt" in the sense that the long running process is only informed of the interruption by
the user and must respond to it (as with any call to `interrupt()` on a `Thread`). A `Traversal` will typically respond
to such requests as do most commands, including `:remote` operations.
[source,text]
----
gremlin> java.util.stream.IntStream.range(0, 1000).iterator()
==>0
==>1
==>2
==>3
==>4
...
==>348
==>349
==>350
==>351
==>352
Execution interrupted by ctrl+c
gremlin>
----
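The cooperative nature of interruption is a property of the JVM rather than of the Console. The following plain Java
sketch illustrates it under stated assumptions: the class `InterruptSketch` and its method `countUntilInterrupted`
are hypothetical names used only for illustration and are not part of TinkerPop. The worker stops only because it
checks its own interrupt flag, which is roughly what a long-running evaluation must do to honor `ctrl+c`.
[source,java]
----
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration class; not part of TinkerPop or the Gremlin Console.
class InterruptSketch {

    // Starts a worker that counts until it notices an interrupt request, then
    // requests the interruption and returns how many iterations the worker completed.
    static long countUntilInterrupted() {
        AtomicLong iterations = new AtomicLong();
        Thread worker = new Thread(() -> {
            // The worker has to check the flag itself; interrupt() does not stop it forcibly.
            while (!Thread.currentThread().isInterrupted()) {
                iterations.incrementAndGet();
            }
        });
        worker.start();
        try {
            // Wait until the worker has made some progress.
            while (iterations.get() == 0) {
                Thread.sleep(1);
            }
            worker.interrupt(); // only sets the worker's interrupt flag
            worker.join();      // returns once the worker has responded and finished
        } catch (InterruptedException e) {
            worker.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("caller was interrupted while waiting", e);
        }
        return iterations.get();
    }

    public static void main(String[] args) {
        System.out.println("iterations before the worker stopped: " + countUntilInterrupted());
    }
}
----
Code that never checks the flag, and never calls a blocking method that does, would keep running after the
`interrupt()` call, which is why the Console can only attempt an interruption.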
[[console-preferences]]
=== Console Preferences
Preferences are set with `:set name value`. Values can contain spaces when quoted. All preferences are reset by `:purge preferences`.
[width="100%",cols="3,^2,10",options="header"]
|=========================================================
|Preference |Type |Description
|max-iteration | int | Controls the maximum number of results that the Console will display. Default: 100 results.
|colors | bool | Enable ANSI color rendering. Default: true
|warnings | bool | Enable display of remote execution warnings. Default: true
|gremlin.color | colors | Color of the ASCII art gremlin on startup.
|info.color | colors | Color of "info" type messages.
|error.color | colors | Color of "error" type messages.
|vertex.color | colors | Color of vertices in results.
|edge.color | colors | Color of edges in results.
|string.color | colors | Color of strings in results.
|number.color | colors | Color of numbers in results.
|T.color | colors| Color of Tokens in results.
|input.prompt.color | colors | Color of the input prompt.
|result.prompt.color | colors | Color of the result prompt.
|input.prompt | string | Text of the input prompt.
|result.prompt | string | Text of the result prompt.
|result.indicator.null | string | Text of the void/no results indicator - setting to empty string (i.e. "" at the
command line) will print no result line in these cases.
|=========================================================
Colors can contain a comma-separated combination of one each of foreground, background, and attribute.
[width="100%",cols="3,^2,10",options="header"]
|=========================================================
|Foreground |Background |Attributes
|black|bg_black|bold
|blue|bg_blue|faint
|cyan|bg_cyan|underline
|green|bg_green|
|magenta|bg_magenta|
|red|bg_red|
|white|bg_white|
|yellow|bg_yellow|
|=========================================================
Example:
[source,text]
----
:set gremlin.color bg_black,green,bold
----
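The combination rule can be expressed as a small check. The following Java sketch is only an illustration of the
tables above and is not how the Console parses colors: the class `ColorSpecSketch` is hypothetical, and it assumes
that a value may name at most one foreground, one background and one attribute.
[source,java]
----
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; the Console's own color handling may accept more than this.
class ColorSpecSketch {
    // Names copied from the Foreground and Attributes columns of the table above.
    static final List<String> FOREGROUNDS =
            Arrays.asList("black", "blue", "cyan", "green", "magenta", "red", "white", "yellow");
    static final List<String> ATTRIBUTES = Arrays.asList("bold", "faint", "underline");

    // Returns true if every name is known and each category appears at most once (an assumption).
    static boolean isValid(String spec) {
        int foregrounds = 0, backgrounds = 0, attributes = 0;
        for (String raw : spec.split(",")) {
            String name = raw.trim();
            if (FOREGROUNDS.contains(name)) {
                foregrounds++;
            } else if (name.startsWith("bg_") && FOREGROUNDS.contains(name.substring(3))) {
                backgrounds++; // background names are the foreground names prefixed with "bg_"
            } else if (ATTRIBUTES.contains(name)) {
                attributes++;
            } else {
                return false;  // unknown color or attribute name
            }
        }
        return foregrounds <= 1 && backgrounds <= 1 && attributes <= 1;
    }

    public static void main(String[] args) {
        System.out.println(isValid("bg_black,green,bold"));
        System.out.println(isValid("green,red"));
    }
}
----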
=== Dependencies and Plugin Usage
The Gremlin Console can dynamically load external code libraries and make them available to the user. Furthermore,
those dependencies may contain Gremlin plugins which can expand the language, provide useful functions, etc. These
important console features are managed by the `:install` and `:plugin` commands.
The following Gremlin Console session demonstrates the basics of these features:
[source,groovy]
----
gremlin> :plugin list <1>
==>tinkerpop.server[active]
==>tinkerpop.gephi
==>tinkerpop.utilities[active]
==>tinkerpop.sugar
==>tinkerpop.tinkergraph[active]
gremlin> :plugin use tinkerpop.sugar <2>
==>tinkerpop.sugar activated
gremlin> :install org.apache.tinkerpop neo4j-gremlin x.y.z <3>
==>loaded: [org.apache.tinkerpop, neo4j-gremlin, x.y.z]
gremlin> :plugin list <4>
==>tinkerpop.server[active]
==>tinkerpop.gephi
==>tinkerpop.utilities[active]
==>tinkerpop.sugar[active]
==>tinkerpop.tinkergraph[active]
==>tinkerpop.neo4j
gremlin> :plugin use tinkerpop.neo4j <5>
==>tinkerpop.neo4j activated
gremlin> :plugin list <6>
==>tinkerpop.server[active]
==>tinkerpop.gephi
==>tinkerpop.sugar[active]
==>tinkerpop.utilities[active]
==>tinkerpop.neo4j[active]
==>tinkerpop.tinkergraph[active]
----
<1> Show a list of "available" plugins. The list of "available" plugins is determined by the classes available on
the Console classpath. Plugins need to be "active" for their features to be available.
<2> To make a plugin "active" execute the `:plugin use` command and specify the name of the plugin to enable.
<3> Sometimes there are external dependencies that would be useful within the Console. To bring those in, execute
`:install` and specify the Maven coordinates for the dependency.
<4> Note that there is a "tinkerpop.neo4j" plugin available, but it is not yet "active".
<5> Again, to use the "tinkerpop.neo4j" plugin, it must be made "active" with `:plugin use`.
<6> Now when the plugin list is displayed, the "tinkerpop.neo4j" plugin is displayed as "active".
WARNING: Plugins must be compatible with the version of the Gremlin Console (or Gremlin Server) being used. Attempts
to use incompatible versions cannot be guaranteed to work. Moreover, be prepared for dependency conflicts in
third-party plugins that may only be resolved via manual jar removal from the `ext/{plugin}` directory.
TIP: It is possible to manage plugin activation and deactivation by manually editing the `ext/plugins.txt` file which
contains the class names of the "active" plugins. It is also possible to clear dependencies added by `:install` by
deleting them from the `ext` directory.
[[execution-mode]]
=== Execution Mode
For automated tasks and batch executions of Gremlin, it can be useful to execute Gremlin scripts in "execution" mode
from the command line. Consider the following file named `gremlin.groovy`:
[source,groovy]
----
graph = TinkerFactory.createModern()
g = graph.traversal()
g.V().each { println it }
----
This script creates the toy graph and then iterates through all its vertices, printing each to standard output. To
execute this script from the command line, use the `-e` option of `gremlin.sh` as follows:
[source,bash]
----
$ bin/gremlin.sh -e gremlin.groovy
v[1]
v[2]
v[3]
v[4]
v[5]
v[6]
----
It is also possible to pass arguments to scripts. Any parameters following the file name specification are treated
as arguments to the script. They are collected into a list and passed in as a variable called `args`. The following
Gremlin script is exactly like the previous one, but it makes use of `args` to filter the vertices printed
to standard output:
[source,groovy]
----
graph = TinkerFactory.createModern()
g = graph.traversal()
g.V().has('name',args[0]).each { println it }
----
When executed from the command line a parameter can be supplied:
[source,bash]
----
$ bin/gremlin.sh -e gremlin.groovy marko
v[1]
$ bin/gremlin.sh -e gremlin.groovy vadas
v[2]
----
It is also possible to pass multiple scripts by specifying multiple `-e` options. The scripts will execute in the order
in which they are specified. Note that only the arguments from the last script executed will be preserved in the console.
Finally, if the arguments conflict with the reserved flags to which `gremlin.sh` responds, double quotes can be used to
wrap all the arguments to the option:
[source,bash]
----
$ bin/gremlin.sh -e "gremlin.groovy -e -i --color"
----
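The way a quoted option value maps to a script and its arguments can be pictured as follows. This Java sketch is an
assumption about the observable behavior described above, not the Console's parsing code: the class
`ScriptArgsSketch` is hypothetical, and it assumes that tokens are separated by whitespace, with the first token
naming the script and the rest becoming `args`.
[source,java]
----
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; it models the described behavior, not the Console internals.
class ScriptArgsSketch {

    // The first whitespace-separated token is taken to be the script file.
    static String scriptName(String optionValue) {
        return optionValue.trim().split("\\s+")[0];
    }

    // Every remaining token is collected, in order, into the list the script sees as "args".
    static List<String> scriptArgs(String optionValue) {
        String[] tokens = optionValue.trim().split("\\s+");
        return Arrays.asList(tokens).subList(1, tokens.length);
    }

    public static void main(String[] args) {
        String optionValue = "gremlin.groovy -e -i --color";
        System.out.println("script: " + scriptName(optionValue));
        System.out.println("args:   " + scriptArgs(optionValue));
    }
}
----
Because the whole value is quoted, `-e`, `-i` and `--color` reach the script as ordinary arguments instead of being
read as flags for `gremlin.sh`.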
[[interactive-mode]]
=== Interactive Mode
The Gremlin Console can be started in an "interactive" mode. Interactive mode is like <<execution-mode, execution mode>>
but the console will not exit at the completion of the script, even if the script completes unsuccessfully. In such a
case, it will simply stop processing on the line of the script that failed. In this way, the state of the console
is such that a user could examine the state of things up to the point of failure, which might make the script easier to
debug.
In addition to debugging, interactive mode is a helpful way for users to initialize their console environment to
avoid otherwise repetitive typing. For example, a user who spends a lot of time working with the TinkerPop "modern"
graph might create a script called `init.groovy` like:
[source,groovy]
----
graph = TinkerFactory.createModern()
g = graph.traversal()
----
and then start Gremlin Console as follows:
[source,text]
----
$ bin/gremlin.sh -i init.groovy
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> g.V()
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
----
Note that the user can now reference `g` (and `graph` for that matter) at startup without having to directly type that
variable initialization code into the console.
As in execution mode, it is also possible to pass multiple scripts by specifying multiple `-i` options. See the
<<execution-mode, Execution Mode Section>> for more information on the specifics of that capability.
[[gremlin-console-docker-image]]
=== Docker Image
The Gremlin Console can also be started as a link:https://hub.docker.com/r/tinkerpop/gremlin-console/[Docker image]:
[source,text]
----
$ docker run -it tinkerpop/gremlin-console:x.y.z
Feb 25, 2018 3:47:24 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>
----
The Docker image offers the same options as the standalone Console. It can be used, for example, to execute scripts:
[source,bash]
----
$ docker run -it tinkerpop/gremlin-console:x.y.z -e gremlin.groovy
v[1]
v[2]
v[3]
v[4]
v[5]
v[6]
----
[[gremlin-server]]
== Gremlin Server
image:gremlin-server.png[width=400,float=right] Gremlin Server provides a way to remotely execute Gremlin against one
or more `Graph` instances hosted within it. The benefits of using Gremlin Server include:
* Allows any Gremlin Structure-enabled graph (i.e. implements the `Graph` API on the JVM) to exist as a standalone
server, which in turn allows multiple clients to communicate with the same graph database.
* Enables execution of ad hoc queries through remotely submitted Gremlin.
* Provides a method for non-JVM languages which may not have a Gremlin Traversal Machine (e.g. Python, JavaScript, etc.)
to communicate with the TinkerPop stack on the JVM.
* Exposes numerous methods for extension and customization to include serialization options, remote commands, etc.
NOTE: Gremlin Server is the replacement for link:https://github.com/tinkerpop/rexster[Rexster].
NOTE: Please see the link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/[Provider Documentation] for information
on how to develop a driver for Gremlin Server.
By default, communication with Gremlin Server occurs over link:http://en.wikipedia.org/wiki/WebSocket[WebSocket] and
exposes a custom sub-protocol for interacting with the server.
WARNING: Gremlin Server allows for the execution of remotely submitted "scripts" (i.e. arbitrary code sent by a client
to the server). Developers should consider the security implications involved in running Gremlin Server without the
appropriate precautions. Please review the <<security,Security Section>> and more specifically, the
<<script-execution,Script Execution Section>> for more information.
[[starting-gremlin-server]]
=== Starting Gremlin Server
Gremlin Server comes packaged with a script called `bin/gremlin-server.sh` to get it started (use `gremlin-server.bat`
on Windows):
[source,text]
----
$ bin/gremlin-server.sh conf/gremlin-server-modern.yaml
[INFO] GremlinServer
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-modern.yaml
[INFO] MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
[INFO] DefaultGraphManager - Graph [graph] was successfully configured via [conf/tinkergraph-empty.properties].
[INFO] ServerGremlinExecutor - Initialized Gremlin thread pool. Threads in pool named with pattern gremlin-*
[INFO] ServerGremlinExecutor - Initialized GremlinExecutor and preparing GremlinScriptEngines instances.
[INFO] ServerGremlinExecutor - Initialized gremlin-groovy GremlinScriptEngine and registered metrics
[INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
[INFO] OpLoader - Adding the standard OpProcessor.
[INFO] OpLoader - Adding the session OpProcessor.
[INFO] OpLoader - Adding the traversal OpProcessor.
[INFO] TraversalOpProcessor - Initialized cache for TraversalOpProcessor with size 1000 and expiration time of 600000 ms
[INFO] GremlinServer - Executing start up LifeCycleHook
[INFO] Logger$info - Loading 'modern' graph data.
[INFO] GremlinServer - idleConnectionTimeout was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
[INFO] GremlinServer - keepAliveInterval was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
[WARN] AbstractChannelizer - The org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0 serialization class is deprecated.
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
[WARN] AbstractChannelizer - The org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0 serialization class is deprecated.
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
[INFO] AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
[INFO] AbstractChannelizer - Configured application/vnd.graphbinary-v1.0 with org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1
[INFO] AbstractChannelizer - Configured application/vnd.graphbinary-v1.0-stringd with org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 4 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
----
Gremlin Server is configured by the provided link:http://www.yaml.org/[YAML] file `conf/gremlin-server-modern.yaml`.
That file tells Gremlin Server many things such as:
* The host and port to serve on
* Thread pool sizes
* Where to report metrics gathered by the server
* The serializers to make available
* The Gremlin `ScriptEngine` instances to expose and external dependencies to inject into them
* `Graph` instances to expose
The log messages that printed above show a number of things, but most importantly, there is a `Graph` instance named
`graph` that is exposed in Gremlin Server. This graph is an in-memory TinkerGraph and was empty at the start of the
server. An initialization script at `scripts/generate-modern.groovy` was executed during startup. Its contents are
as follows:
[source,groovy]
----
include::{basedir}/gremlin-server/scripts/generate-modern.groovy[]
----
The script above initializes a `Map` and assigns two key/values to it. The first, assigned to "hook", defines a
`LifeCycleHook` for Gremlin Server. The "hook" provides a way to tie script code into the Gremlin Server startup and
shutdown sequences. The `LifeCycleHook` has two methods that can be implemented: `onStartUp` and `onShutDown`.
These events are called once at Gremlin Server start and once at Gremlin Server stop. This is an important point
because code outside of the "hook" is executed for each `ScriptEngine` creation (multiple may be created when
"sessions" are enabled) and therefore the `LifeCycleHook` provides a way to ensure that a script is only executed a
single time. In this case, the startup hook loads the "modern" graph into the empty TinkerGraph instance, preparing
it for use. The second key/value pair assigned to the `Map`, named "g", defines a `TraversalSource` from the `Graph`
bound to the "graph" variable in the YAML configuration file. This variable `g`, as well as any other variable
assigned to the `Map`, will be made available as variables for future remote script executions. In more general
terms, any key/value pairs assigned to a `Map` returned from the initialization script will become variables that
are global to all requests. In addition, any functions that are defined will be cached for future use.
WARNING: Transactions on graphs in initialization scripts are not closed automatically after the script finishes
executing. It is up to the script to properly commit or rollback transactions in the script itself.
[[connecting-via-drivers]]
=== Connecting via Drivers
image:rexster-connect.png[width=180,float=right] TinkerPop offers client-side drivers for the Gremlin Server WebSocket
sub-protocol in a variety of languages:
* <<gremlin-dotnet,C#>>
* <<gremlin-java,Java>>
* <<gremlin-javascript,JavaScript>>
* <<gremlin-python,Python>>
These drivers provide methods to send Gremlin-based requests and get back traversal results as a response. The requests
may be script-based or bytecode-based. As discussed earlier in the <<connecting-gremlin-server,introduction>>, the
recommendation is to use bytecode-based requests. The differences between sending scripts and sending bytecode are
demonstrated below in some basic examples:
[source,java,tab]
----
// script
Cluster cluster = Cluster.open();
Client client = cluster.connect();
Map<String,Object> params = new HashMap<>();
params.put("name","marko");
List<Result> list = client.submit("g.V().has('person','name',name).out('knows')", params).all().get();
// bytecode
GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"));
List<Vertex> list = g.V().has("person","name","marko").out("knows").toList();
----
[source,groovy]
----
// script
def cluster = Cluster.open()
def client = cluster.connect()
def list = client.submit("g.V().has('person','name',name).out('knows')", [name: "marko"]).all().get();
// bytecode
def g = traversal().withRemote(DriverRemoteConnection.using("localhost",8182,"g"))
def list = g.V().has('person','name','marko').out('knows').toList()
----
[source,csharp]
----
include::../../../gremlin-dotnet/test/Gremlin.Net.IntegrationTest/Docs/Reference/GremlinApplicationsTests.cs[tags=connectingViaDrivers]
----
[source,javascript]
----
// script
const client = new Client('ws://localhost:8182/gremlin', { traversalSource: "g" });
const conn = client.open();
const list = conn.submit("g.V().has('person','name',name).out('knows')",{name: 'marko'}).then(function (response) { ... });
// bytecode
const g = gtraversal().withRemote(new DriverRemoteConnection('ws://localhost:8182/gremlin'));
const list = g.V().has("person","name","marko").out("knows").toList();
----
[source,python]
----
# script
client = Client('ws://localhost:8182/gremlin', 'g')
list = client.submit("g.V().has('person','name',name).out('knows')",{'name': 'marko'}).all().result()
# bytecode
g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
list = g.V().has("person","name","marko").out("knows").toList()
----
The advantage of bytecode over scripts should be apparent from the above examples. Scripts are just strings that are
embedded in code (in the above examples, the strings are Groovy-based), whereas bytecode-based requests are themselves
code written in the native language of use. The advantage of the Gremlin being actual code is that there
are checks (e.g. compile-time, auto-complete and other IDE support, language level checks, etc.) that help validate the
Gremlin during the development process.
TinkerPop makes an effort to ensure a high level of consistency among the drivers and their features, but there are
differences in capabilities and features as they are each developed independently. The Java driver was the first and
is therefore the most advanced. For more information on a particular driver, please see its documentation in the
<<gremlin-drivers-variants,Gremlin Drivers and Variants>> section.
[[connecting-via-console]]
=== Connecting via Console
With Gremlin Server running it is now possible to issue some scripts to it for processing. Start Gremlin Console as
follows:
[source,text]
----
$ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
gremlin>
----
The console has the notion of a "remote", which represents a place outside the console to which a script is sent to
be evaluated in some other context (e.g. Gremlin Server, Hadoop, etc.). To create a remote in the console,
do the following:
[gremlin-groovy]
----
:remote connect tinkerpop.server conf/remote.yaml
----
The `:remote` command shown above displays the current status of the remote connection. This command can also be
used to configure a new connection and change other related settings. To actually send a script to the server a
different command is required:
[gremlin-groovy]
----
:> g.V().values('name')
:> g.V().has('name','marko').out('created').values('name')
:> g.E().label().groupCount()
result
:remote close
----
The `:>` command, which is a shorthand for `:submit`, sends the script to the server to execute there. Results are
wrapped in a `Result` object, which is just a holder for each individual result. The `class` shows the data type
of the contained value. Note that the last script sent was supposed to return a `Map`, but its `class` is
`java.lang.String`. By default, the connection is configured to only return text results. In other words,
Gremlin Server is using `toString` to serialize all results back to the console. This enables virtually any
object on the server to be returned to the console, but it doesn't allow the opportunity to work with this data
in any way in the console itself. A different configuration of the `:remote` is required to get the results back
as "objects":
[gremlin-groovy]
----
:remote connect tinkerpop.server conf/remote-objects.yaml <1>
:remote list <2>
:> g.E().label().groupCount() <3>
m = result[0].object <4>
m.sort {it.value}
script = """
g.V().hasLabel('person').
out('knows').
out('created').
group().
by('name')
"""
:> @script <5>
:remote close
----
<1> This configuration file specifies that results should be deserialized back into an `Object` in the console with
the caveat being that the server and console both know how to serialize and deserialize the result to be returned.
<2> There are now two configured remote connections. The one marked by an asterisk is the one that was just created
and denotes the current one that `:submit` will react to.
<3> When the script is executed again, the `class` is no longer shown to be a `java.lang.String`. It is instead a `java.util.HashMap`.
<4> The last result of a remote script is always stored in the reserved variable `result`, which allows access to
the `Result` and by virtue of that, the `Map` itself.
<5> If the submission requires multiple lines to express, then a multi-line string can be created. The `:>` command
realizes that the user is referencing a variable via `@` and submits the string script.
TIP: In Groovy, `""" text """` is a convenient way to create a multi-line string and works well in concert with
`:> @variable`. Note that this model of submitting a string variable works for all `:>` based plugins, not just Gremlin Server.
WARNING: Not all values that can be returned from a Gremlin script end up being serializable. For example,
submitting `:> graph` will return a `Graph` instance and in most cases those are not serializable by Gremlin Server
and will return a serialization error. It should be noted that `TinkerGraph`, as a convenience for shipping around
small sub-graphs, is serializable from Gremlin Server.
An alternative connection syntax allows the `Cluster` to be constructed directly in the console, as
opposed to supplying a static YAML file.
[gremlin-groovy]
----
cluster = Cluster.open()
:remote connect tinkerpop.server cluster
----
The Gremlin Server `:remote config` command for the driver has the following configuration options:
[width="100%",cols="3,10a",options="header"]
|=========================================================
|Command |Description
|alias |
[width="100%",cols="3,10",options="header"]
!=========================================================
!Option !Description
! _pairs_ !A set of key/value alias/binding pairs to apply to requests.
!`reset` !Clears any aliases that were supplied in previous configurations of the remote.
!`show` !Shows the current set of aliases which is returned as a `Map`
!=========================================================
|timeout |Specifies the length of time in milliseconds the Console will wait for a response from the server. Specify
"none" to have no timeout. By default, this setting uses "none".
|=========================================================
[[console-aliases]]
==== Aliases
The `alias` configuration command for the Gremlin Server `:remote` can be useful in situations where there are
multiple `Graph` or `TraversalSource` instances on the server, as it becomes possible to rename them from the client
for purposes of execution within the context of a script. Therefore, it becomes possible to submit commands this way:
[gremlin-groovy]
----
:remote connect tinkerpop.server conf/remote-objects.yaml
:remote config alias x g
:> x.E().label().groupCount()
:remote close
----
[[console-sessions]]
==== Sessions
A `:remote` created in the following fashion will be "sessionless", meaning each script issued to the server with
`:>` will be encased in a transaction and no state will be maintained from one request to the next.
[source,groovy]
----
gremlin> :remote connect tinkerpop.server conf/remote-objects.yaml
==>Configured localhost/127.0.0.1:8182
----
In other words, the transaction will be automatically committed (or rolled back on error) and any variables declared
in that script will be forgotten for the next request. See the section on <<sessions, "Considering Sessions">>
for more information on that topic.
To connect the remote with a session, the `connect` command takes an additional argument as follows:
[gremlin-groovy]
----
:remote connect tinkerpop.server conf/remote.yaml session
:> x = 1
:> y = 2
:> x + y
:remote close
----
With the above command a session gets created with a random UUID for a session identifier. It is also possible to
assign a custom session identifier by adding it as the last argument to the `:remote` command above. There is also the
option to replace "session" with "session-managed" to create a session that will auto-manage transactions (i.e. each
request will occur within the bounds of a transaction). In this way, the state of bound variables is maintained
between requests, but there is no longer a need to manually manage the transactional scope of the graph.
[[console-remote-console]]
==== Remote Console
Previous examples have shown usage of the `:>` command to send scripts to Gremlin Server. The Gremlin Console also
supports an additional method for doing this which can be more convenient when the intention is to exclusively
work with a remote connection to the server.
[gremlin-groovy]
----
:remote connect tinkerpop.server conf/remote.yaml session
:remote console
x = 1
y = 2
x + y
:remote console
:remote close
----
In the above example, the `:remote console` command is executed. It places the console in a state where the `:>` is
no longer required. Each script line is automatically submitted to Gremlin Server for evaluation. The
variables `x` and `y` that were defined don't exist locally; they only exist on the server. In this sense,
putting the console in this mode is like opening a window to a session on Gremlin Server.
TIP: When using `:remote console` there is not much point to using a configuration that uses a serializer that returns
actual data. In other words, using a configuration like the one inside of `conf/remote-objects.yaml` isn't typically
useful as in this mode the result will only ever be displayed but not used. Using a serializer configuration like
the one in `conf/remote.yaml` should perform better.
NOTE: Console commands, those that begin with a colon (e.g. `:x`, `:remote`) do not execute remotely when in this mode.
They are all still evaluated locally.
=== Connecting via HTTP
image:gremlin-rexster.png[width=225,float=left] While the default behavior for Gremlin Server is to provide a
WebSocket-based connection, it can also be configured to support a plain HTTP web service.
The HTTP endpoint provides for a communication protocol familiar to most developers, with a wide support of
programming languages, tools and libraries for accessing it. As a result, HTTP provides a fast way to get started
with Gremlin Server. It also may represent an easier upgrade path from link:https://github.com/tinkerpop/rexster[Rexster]
as the API for the endpoint is very similar to Rexster's link:https://github.com/tinkerpop/rexster/wiki/Gremlin-Extension[Gremlin Extension].
IMPORTANT: TinkerPop provides and supports this HTTP endpoint as a convenience and for legacy reasons, but users should
prefer the recommended approach of bytecode-based requests as described in the <<connecting-gremlin,Connecting Gremlin>>
section.
Gremlin Server provides for a single HTTP endpoint - a Gremlin evaluator - which allows the submission of a Gremlin
script as a request. For each request, it returns a response containing the serialized results of that script.
To enable this endpoint, Gremlin Server needs to be configured with the `HttpChannelizer`, which replaces the default.
The `WsAndHttpChannelizer` may also be configured to enable both WebSockets and the REST endpoint in the configuration
file:
[source,yaml]
channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer
[source,yaml]
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
The `HttpChannelizer` is already configured in the `gremlin-server-rest-modern.yaml` file that is packaged with the Gremlin
Server distribution. To utilize it, start Gremlin Server as follows:
[source,text]
bin/gremlin-server.sh conf/gremlin-server-rest-modern.yaml
Once the server has started, issue a request. Here's an example with link:http://curl.haxx.se/[cURL]:
[source,text]
$ curl "http://localhost:8182?gremlin=100-1"
which returns:
[source,js]
{
"result":{"data":99,"meta":{}},
"requestId":"0581cdba-b152-45c4-80fa-3d36a6eecf1c",
"status":{"code":200,"attributes":{},"message":""}
}
The above example showed a `GET` operation, but the preferred method for this endpoint is `POST`:
[source,text]
curl -X POST -d "{\"gremlin\":\"100-1\"}" "http://localhost:8182"
which returns:
[source,js]
{
"result":{"data":99,"meta":{}},
"requestId":"ef2fe16c-441d-4e13-9ddb-3c7b5dfb10ba",
"status":{"code":200,"attributes":{},"message":""}
}
It is also preferred that Gremlin scripts be parameterized when possible via `bindings`:
[source,text]
curl -X POST -d "{\"gremlin\":\"100-x\", \"bindings\":{\"x\":1}}" "http://localhost:8182"
The `bindings` argument is a `Map` of variables where the keys become available as variables in the Gremlin script.
Note that parameterization of requests is critical to performance, as repeated script compilation can be avoided on
each request.
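As a rough sketch of the same parameterized request issued from Java, the following builds the JSON body and a `POST` request using only the JDK's `java.net.http` client (Java 11+). The class and method names (`GremlinHttpPost`, `buildBody`, `buildRequest`) are hypothetical and not part of TinkerPop, the `Content-Type` header is an assumption, and the hand-rolled JSON handling only covers integer bindings. A real application would more likely use a JSON library.

[source,java]
----
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class GremlinHttpPost {

    // Escapes backslashes and double quotes so the text can sit inside a JSON string.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds {"gremlin":"...","bindings":{...}} for integer-valued bindings only.
    static String buildBody(String gremlin, Map<String, Integer> bindings) {
        String pairs = bindings.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + e.getValue())
                .collect(Collectors.joining(","));
        return "{\"gremlin\":" + quote(gremlin) + ",\"bindings\":{" + pairs + "}}";
    }

    // Creates the POST request without sending it.
    static HttpRequest buildRequest(String url, String body) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Integer> bindings = new LinkedHashMap<>();
        bindings.put("x", 1);
        String body = buildBody("100-x", bindings);
        System.out.println(body);
        // Sending requires a running Gremlin Server with the HTTP endpoint enabled,
        // so the request is only issued when a URL is passed as the first argument.
        if (args.length > 0) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(buildRequest(args[0], body), HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }
}
----

Because the script text stays constant while only the `bindings` values change, repeated requests of this shape can reuse the compiled script on the server.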
NOTE: It is possible to pass bindings via `GET` based requests. Query string arguments prefixed with "bindings." will
be treated as parameters, where that prefix will be removed and the value following the period will become the
parameter name. In other words, `bindings.x` will create a parameter named "x" that can be referenced in the submitted
Gremlin script. The caveat is that these arguments will always be treated as `String` values. To ensure that data
types are preserved or to pass complex objects such as lists or maps, use `POST` which will at least support the
allowed JSON data types.
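The `GET` form with `bindings.`-prefixed query arguments might be assembled as in the following sketch, which only constructs the URI and does not contact a server. The class `GremlinHttpGet` and its methods are hypothetical helpers, and the example script assumes a traversal source named `g` is bound on the server. Since every binding arrives as a `String`, the script is written to expect one.

[source,java]
----
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class GremlinHttpGet {

    // URL-encodes a single query string component.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Builds a GET URI where each binding becomes a "bindings.<name>" query argument.
    static URI buildUri(String base, String gremlin, Map<String, String> bindings) {
        StringBuilder query = new StringBuilder("gremlin=").append(encode(gremlin));
        for (Map.Entry<String, String> e : bindings.entrySet()) {
            query.append("&bindings.").append(encode(e.getKey()))
                 .append("=").append(encode(e.getValue()));
        }
        return URI.create(base + "?" + query);
    }

    public static void main(String[] args) {
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("x", "marko"); // always received by the server as a String
        System.out.println(buildUri("http://localhost:8182", "g.V().has('name',x)", bindings));
    }
}
----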
Finally, as Gremlin Server can host multiple `ScriptEngine` instances (e.g. `gremlin-groovy`, `nashorn`), it is
possible to define the language to utilize to process the request:
[source,text]
curl -X POST -d "{\"gremlin\":\"100-x\", \"language\":\"gremlin-groovy\", \"bindings\":{\"x\":1}}" "http://localhost:8182"
By default this value is set to `gremlin-groovy`. If using a `GET` operation, this value can be set as a query
string argument by setting the `language` key.
WARNING: Consider the size of the result of a submitted script being returned from the HTTP endpoint. A script
that iterates thousands of results will serialize each of those in memory into a single JSON result set. It is
quite possible that such a script will generate `OutOfMemoryError` exceptions on the server. Consider the default
WebSocket configuration, which supports streaming, if that type of use case is required.
=== Configuring
The `gremlin-server.sh` file serves multiple purposes. It can be used to "install" dependencies to the Gremlin
Server path. For example, to be able to configure and use other `Graph` implementations, the dependencies must be
made available to Gremlin Server. To do this, use the `install` switch and supply the Maven coordinates for the
dependency to "install". For example, to use Neo4j in Gremlin Server:
[source,text]
----
bin/gremlin-server.sh install org.apache.tinkerpop neo4j-gremlin x.y.z
----
This command will "grab" the appropriate dependencies and copy them to the `ext` directory of Gremlin Server, which
will then allow them to be "used" the next time the server is started. To uninstall dependencies, simply delete them
from the `ext` directory.
`bin/gremlin-server.sh` has several other options.
[width="100%",cols="3,10",options="header"]
|=========================================================
|Parameter|Description
|start|Start the server in the background.
|stop|Shutdown the server.
|restart|Shutdown a running server then start it again.
|status|Check if the server is running.
|console|Start the server in the foreground. Use ^C to kill it.
|install <group> <artifact> <version>| Install dependencies into the server. "-i" exists for backwards compatibility but is deprecated.
|<conf file>| Start the server in the foreground using the provided YAML config file.
|=========================================================
The `bin/gremlin-server.sh` script can be customized with environment variables in `bin/gremlin-server.conf`.
[width="100%",cols="3,10",options="header"]
|=========================================================
|Variable |Description
|DEBUG| Enable debugging of the startup script
|GREMLIN_HOME| The Gremlin Server install directory. Use this if the script has trouble finding itself.
|GREMLIN_YAML| The default server YAML file (conf/gremlin-server.yaml)
|LOG_DIR| Location of gremlin.log where stdout/stderr are captured (logs/)
|PID_DIR| Location of gremlin.pid
|RUNAS| User to run the server as
|JAVA_HOME| Java install location. Will use $JAVA_HOME/bin/java
|JAVA_OPTIONS| Options passed to the JVM
|=========================================================
As mentioned earlier, Gremlin Server is configured through a YAML file. By default, Gremlin Server will look for a
file called `conf/gremlin-server.yaml` to configure itself on startup. To override this default, set `GREMLIN_YAML` in
`bin/gremlin-server.conf` or supply the file to use to `bin/gremlin-server.sh` as in:
[source,text]
----
bin/gremlin-server.sh conf/gremlin-server-min.yaml
----
WARNING: On Windows, gremlin-server.bat will always start in the foreground. When no parameter is provided, it will
start with the default `conf/gremlin-server.yaml` file.
The following table describes the various YAML configuration options that Gremlin Server expects:
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Key |Description |Default
|authentication.authenticator |The fully qualified classname of an `Authenticator` implementation to use. If this setting is not present, then authentication is effectively disabled. |`AllowAllAuthenticator`
|authentication.authenticationHandler | The fully qualified classname of an `AbstractAuthenticationHandler` implementation to use. If this setting is not present, but the `authentication.authenticator` is, it will use that authenticator with the default `AbstractAuthenticationHandler` implementation for the specified `Channelizer` |_none_
|authentication.config |A `Map` of configuration settings to be passed to the `Authenticator` when it is constructed. The settings available are dependent on the implementation. |_none_
|authentication.enableAuditLog |The available authenticators can issue audit logging messages, binding the authenticated user to their remote socket address and binding requests with a gremlin query to the remote socket address. For privacy reasons, the default value of this setting is false. The audit logging messages are logged at the INFO level via the `audit.org.apache.tinkerpop.gremlin.server` logger, which can be configured using the log4j.properties file. |false
|channelizer |The fully qualified classname of the `Channelizer` implementation to use. A `Channelizer` is a "channel initializer" which Gremlin Server uses to define the type of processing pipeline to use. By allowing different `Channelizer` implementations, Gremlin Server can support different communication protocols (e.g. WebSocket). |`WebSocketChannelizer`
|graphManager |The fully qualified classname of the `GraphManager` implementation to use. A `GraphManager` is a class that adheres to the TinkerPop `GraphManager` interface, allowing custom implementations for storing and managing graph references, as well as defining custom methods to open and close graph instantiations. It is important to note that the TinkerPop HTTP and WebSocketChannelizers auto-commit and auto-rollback based on the graphs stored in the graphManager upon script execution completion. |`DefaultGraphManager`
|graphs |A `Map` of `Graph` configuration files where the key of the `Map` becomes the name to which the `Graph` will be bound and the value is the file name of a `Graph` configuration file. |_none_
|gremlinPool |The number of "Gremlin" threads available to execute actual scripts in a `ScriptEngine`. This pool represents the workers available to handle blocking operations in Gremlin Server. When set to `0`, Gremlin Server will use the value provided by `Runtime.availableProcessors()`. |0
|host |The name of the host to bind the server to. |localhost
|idleConnectionTimeout |Time in milliseconds that the server will allow a channel to not receive requests from a client before it automatically closes. If enabled, the value provided should typically exceed the amount of time given to `keepAliveInterval`. Note that while this value is to be provided as milliseconds it will resolve to second precision. Set this value to `0` to disable this feature. |0
|keepAliveInterval |Time in milliseconds that the server will allow a channel to not send responses to a client before it sends a "ping" to see if it is still present. If it is present, the client should respond with a "pong" which will thus reset the `idleConnectionTimeout` and keep the channel open. If enabled, this number should be smaller than the value provided to the `idleConnectionTimeout`. Note that while this value is to be provided as milliseconds it will resolve to second precision. Set this value to `0` to disable this feature. |0
|maxAccumulationBufferComponents |Maximum number of request components that can be aggregated for a message. |1024
|maxChunkSize |The maximum length of the content or each chunk. If the content length exceeds this value, the transfer encoding of the decoded request will be converted to 'chunked' and the content will be split into multiple `HttpContent` objects. If the transfer encoding of the HTTP request is 'chunked' already, each chunk will be split into smaller chunks if the length of the chunk exceeds this value. |8192
|maxContentLength |The maximum length of the aggregated content for a message. Works in concert with `maxChunkSize` where chunked requests are accumulated back into a single message. A request exceeding this size will return a `413 - Request Entity Too Large` status code. A response exceeding this size will raise an internal exception. |65536
|maxHeaderSize |The maximum length of all headers. |8192
|maxInitialLineLength |The maximum length of the initial line (e.g. "GET / HTTP/1.0") processed in a request, which essentially controls the maximum length of the submitted URI. |4096
|metrics.consoleReporter.enabled |Turns on console reporting of metrics. |false
|metrics.consoleReporter.interval |Time in milliseconds between reports of metrics to console. |180000
|metrics.csvReporter.enabled |Turns on CSV reporting of metrics. |false
|metrics.csvReporter.fileName |The file to write metrics to. |_none_
|metrics.csvReporter.interval |Time in milliseconds between reports of metrics to file. |180000
|metrics.gangliaReporter.addressingMode |Set to `MULTICAST` or `UNICAST`. |_none_
|metrics.gangliaReporter.enabled |Turns on Ganglia reporting of metrics. Additional link:https://tinkerpop.apache.org/docs/x.y.z/reference/#metrics[setup] is required. |false
|metrics.gangliaReporter.host |Define the Ganglia host to report Metrics to. |localhost
|metrics.gangliaReporter.interval |Time in milliseconds between reports of metrics for Ganglia. |180000
|metrics.gangliaReporter.port |Define the Ganglia port to report Metrics to. |8649
|metrics.graphiteReporter.enabled |Turns on Graphite reporting of metrics. Additional link:https://tinkerpop.apache.org/docs/x.y.z/reference/#metrics[setup] is required. |false
|metrics.graphiteReporter.host |Define the Graphite host to report Metrics to. |localhost
|metrics.graphiteReporter.interval |Time in milliseconds between reports of metrics for Graphite. |180000
|metrics.graphiteReporter.port |Define the Graphite port to report Metrics to. |2003
|metrics.graphiteReporter.prefix |Define a "prefix" to prepend to metrics keys reported to Graphite. |_none_
|metrics.jmxReporter.enabled |Turns on JMX reporting of metrics. |false
|metrics.slf4jReporter.enabled |Turns on SLF4j reporting of metrics. |false
|metrics.slf4jReporter.interval |Time in milliseconds between reports of metrics to SLF4j. |180000
|port |The port to bind the server to. |8182
|processors |A `List` of `Map` settings, where each `Map` represents an `OpProcessor` implementation to use along with its configuration. |_none_
|processors[X].className |The full class name of the `OpProcessor` implementation. |_none_
|processors[X].config |A `Map` containing `OpProcessor` specific configurations. |_none_
|resultIterationBatchSize |Defines the size in which the result of a request is "batched" back to the client. In other words, if set to `1`, then a result that had ten items in it would get each result sent back individually. If set to `2` the same ten results would come back in five batches of two each. |64
|scriptEngines |A `Map` of `ScriptEngine` implementations to expose through Gremlin Server, where the key is the name given by the `ScriptEngine` implementation. The key must match the name exactly for the `ScriptEngine` to be constructed. The value paired with this key is itself a `Map` of configuration for that `ScriptEngine`. If this value is not set, it will default to "gremlin-groovy". |_gremlin-groovy_
|scriptEngines.<name>.imports |A comma separated list of classes/packages to make available to the `ScriptEngine`. |_none_
|scriptEngines.<name>.staticImports |A comma separated list of "static" imports to make available to the `ScriptEngine`. |_none_
|scriptEngines.<name>.scripts |A comma separated list of script files to execute on `ScriptEngine` initialization. `Graph` and `TraversalSource` instance references produced from scripts will be stored globally in Gremlin Server, therefore it is possible to use initialization scripts to add Traversal Strategies or create entirely new `Graph` instances altogether. Instantiating a `LifeCycleHook` in a script provides a way to execute scripts when Gremlin Server starts and stops.|_none_
|scriptEngines.<name>.config |A `Map` of configuration settings for the `ScriptEngine`. These settings are dependent on the `ScriptEngine` implementation being used. |_none_
|evaluationTimeout |The amount of time in milliseconds before a script evaluation and iteration of result times out. This feature can be turned off by setting the value to `0`. |30000
|serializers |A `List` of `Map` settings, where each `Map` represents a `MessageSerializer` implementation to use along with its configuration. If this value is not set, then Gremlin Server will configure with GraphSON and Gryo but will not register any `ioRegistries` for configured graphs. |_empty_
|serializers[X].className |The full class name of the `MessageSerializer` implementation. |_none_
|serializers[X].config |A `Map` containing `MessageSerializer` specific configurations. |_none_
|ssl.enabled |Determines if SSL is turned on or not. |false
|ssl.keyStore |The private key in JKS or PKCS#12 format. |_none_
|ssl.keyStorePassword |The password of the `keyStore` if it is password-protected. |_none_
|ssl.keyStoreType |`JKS` (Java 8 default) or `PKCS12` (Java 9+ default) |_none_
|ssl.needClientAuth | Optional. One of NONE, REQUIRE. Enables client certificate authentication at the enforcement level specified. Can be used in combination with Authenticator. |_none_
|ssl.sslCipherSuites |The list of JSSE ciphers to support for SSL connections. If specified, only the ciphers that are listed and supported will be enabled. If not specified, the JVM default is used. |_none_
|ssl.sslEnabledProtocols |The list of SSL protocols to support for SSL connections. If specified, only the protocols that are listed and supported will be enabled. If not specified, the JVM default is used. |_none_
|ssl.trustStore |Required when needClientAuth is REQUIRE. Trusted certificates for verifying the remote endpoint's certificate. If this value is not provided and SSL is enabled, the default `TrustManager` will be used. |_none_
|ssl.trustStorePassword |The password of the `trustStore` if it is password-protected |_none_
|strictTransactionManagement |Set to `true` to require `aliases` to be submitted on every request, where the `aliases` become the scope of transaction management. |false
|threadPoolBoss |The number of threads available to Gremlin Server for accepting connections. Should always be set to `1`. |1
|threadPoolWorker |The number of threads available to Gremlin Server for processing non-blocking reads and writes. |1
|useEpollEventLoop |Try to use epoll event loops (works only on Linux) instead of Netty NIO. |false
|writeBufferHighWaterMark | If the number of bytes in the network send buffer exceeds this value then the channel is no longer writeable, accepting no additional writes until buffer is drained and the `writeBufferLowWaterMark` is met. |65536
|writeBufferLowWaterMark | Once the number of bytes queued in the network send buffer exceeds the `writeBufferHighWaterMark`, the channel will not become writeable again until the buffer is drained and it drops below this value. |65536
|=========================================================
See the <<metrics,Metrics>> section for more information on how to configure Ganglia and Graphite.
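The batching arithmetic described for `resultIterationBatchSize` in the table above can be illustrated with a short sketch. This is not Gremlin Server's implementation, only a model of how a result of ten items divides into batches for a given size. The class name `BatchSketch` is hypothetical.

[source,java]
----
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {

    // Splits results into consecutive batches of at most batchSize items.
    static <T> List<List<T>> batches(List<T> results, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < results.size(); i += batchSize) {
            int end = Math.min(i + batchSize, results.size());
            out.add(new ArrayList<>(results.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> results = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(batches(results, 1).size());  // 10 batches of one item each
        System.out.println(batches(results, 2).size());  // 5 batches of two items each
        System.out.println(batches(results, 64).size()); // 1 batch at the default size
    }
}
----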
[[opprocessor-configurations]]
==== OpProcessor Configurations
An `OpProcessor` provides a way to plug-in handlers to Gremlin Server's processing flow. Gremlin Server itself uses
this plug-in system to expose its packaged functionality. Configurations can be supplied to an
`OpProcessor` through the `processors` key in the Gremlin Server configuration file. Each `OpProcessor` can take a
`Map` of arguments which are specific to a particular implementation:
[source,yaml]
----
processors:
- { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
----
The following sub-sections describe the configurations for each `OpProcessor` implementation supplied with Gremlin
Server.
===== SessionOpProcessor
The `SessionOpProcessor` provides a way to interact with Gremlin Server over a <<sessions,session>>.
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Name |Description |Default
|globalFunctionCacheEnabled |Determines if the script engine cache for global functions is enabled and behaves as an override to the plugin specific setting of the same name. |true
|maxParameters |Maximum number of parameters that can be passed on the request. |16
|perGraphCloseTimeout |Time in milliseconds to wait for each configured graph to close any open transactions when the session is killed. |10000
|sessionTimeout |Time in milliseconds before a session will time out. |28800000
|=========================================================
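The timeouts in this table are expressed in milliseconds, which can be hard to read at a glance. As a small aid when choosing values, the following sketch converts a `java.time.Duration` into the number to place in the YAML file. The class `TimeoutMillis` and its methods are hypothetical helpers, not part of Gremlin Server.

[source,java]
----
import java.time.Duration;

public class TimeoutMillis {

    // Converts a human-readable duration into the millisecond value used in the YAML file.
    static long toConfigMillis(Duration duration) {
        return duration.toMillis();
    }

    // Formats a single "key: value" entry for an OpProcessor config map.
    static String configEntry(String key, Duration duration) {
        return key + ": " + toConfigMillis(duration);
    }

    public static void main(String[] args) {
        // The default sessionTimeout corresponds to eight hours.
        System.out.println(configEntry("sessionTimeout", Duration.ofHours(8)));          // sessionTimeout: 28800000
        // The default perGraphCloseTimeout corresponds to ten seconds.
        System.out.println(configEntry("perGraphCloseTimeout", Duration.ofSeconds(10))); // perGraphCloseTimeout: 10000
    }
}
----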
===== StandardOpProcessor
The `StandardOpProcessor` provides a way to interact with Gremlin Server without use of sessions and is the default
method for processing script evaluation requests.
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Name |Description |Default
|maxParameters |Maximum number of parameters that can be passed on the request. |16
|=========================================================
[[traversalopprocessor]]
===== TraversalOpProcessor
The `TraversalOpProcessor` provides a way to accept traversals configured via <<connecting-via-drivers,withRemote()>>.
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Name |Description |Default
|cacheExpirationTime |Time in milliseconds before side-effects from a `Traversal` will be evicted. |60000
|cacheMaxSize |The maximum number of entries in the side-effect cache. |1000
|=========================================================
If there is no intention to gather side-effects from traversals, the `cacheMaxSize` can be set to zero to disable the
cache.
[[security]]
==== Security
image:gremlin-server-secure.png[width=175,float=right] Gremlin Server provides for several features that aid in the
security of the graphs that it exposes. In particular it supports SSL for transport layer security, protective
measures against malicious script execution, and authentication. Client SSL options are described in the
<<gremlin-drivers-variants, "Gremlin Drivers and Variants">> section with varying capability depending on the driver
chosen. Script execution options are covered <<script-execution, "at the end of this section">>. This section
starts with authentication.
Gremlin Server supports a pluggable authentication framework using
link:https://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer[SASL] (Simple Authentication and
Security Layer). Depending on the client used to connect to Gremlin Server, different authentication
mechanisms are available, as shown in the table below.
[width="70%",cols="3,5,3",options="header"]
|=========================================================
|Client |Authentication mechanism |Availability
|HTTP |BASIC |3.0.0-incubating
1.3+v|Gremlin-Java/
Gremlin-Console |PLAIN SASL (username/password) |3.0.0-incubating
|Pluggable SASL |3.0.0-incubating
|GSSAPI SASL (Kerberos) |3.3.0
|Gremlin.NET |PLAIN SASL |3.3.0
|Gremlin-Python |PLAIN SASL |3.2.2
|Gremlin.Net |PLAIN SASL |3.2.7
|Gremlin-Javascript |PLAIN SASL |3.3.0
|=========================================================
By default, Gremlin Server is configured to allow all requests to be processed (i.e. no authentication). To enable
authentication, Gremlin Server must be configured with an `Authenticator` implementation in its YAML file. Gremlin
Server comes packaged with two implementations called `SimpleAuthenticator` for plain text authentication using HTTP
BASIC or PLAIN SASL and `Krb5Authenticator` for Kerberos authentication using GSSAPI SASL.
===== Plain text authentication
The `SimpleAuthenticator` implements the "PLAIN" SASL mechanism (i.e. plain text) to authenticate a request. It also
supports handling basic authentication requests from HTTP clients. It validates
username/password pairs against a graph database, which must be provided to it as part of the configuration.
[source,yaml]
authentication: {
authenticator: org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator,
config: {
credentialsDb: conf/tinkergraph-credentials.properties}}
A quick way to get started with the `SimpleAuthenticator` is to use TinkerGraph for the "credentials graph" and the
"sample" credential graph that is packaged with the server. To secure the transport for the credentials,
SSL should be enabled. For this Quick Start, a self-signed certificate will be created but this should not
be used in a production environment.
Generate the self-signed SSL certificate:
[source,text]
----
$ keytool -genkey -alias localhost -keyalg RSA -keystore server.jks
Enter keystore password:
Re-enter new password:
What is your first and last name?
[Unknown]: localhost
What is the name of your organizational unit?
[Unknown]:
What is the name of your organization?
[Unknown]:
What is the name of your City or Locality?
[Unknown]:
What is the name of your State or Province?
[Unknown]:
What is the two-letter country code for this unit?
[Unknown]:
Is CN=localhost, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?
[no]: yes
Enter key password for <localhost>
(RETURN if same as keystore password):
----
Next, uncomment the `keyStore` and `keyStorePassword` lines in `conf/gremlin-server-secure.yaml`.
[source,yaml]
----
ssl: {
enabled: true,
sslEnabledProtocols: [TLSv1.2],
keyStore: server.jks,
keyStorePassword: changeit
}
----
[source,text]
----
$ bin/gremlin-server.sh conf/gremlin-server-secure.yaml
[INFO] GremlinServer -
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-secure.yaml
...
[INFO] AbstractChannelizer - SSL enabled
[INFO] SimpleAuthenticator - Initializing authentication with the org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator
[INFO] SimpleAuthenticator - CredentialGraph initialized at CredentialGraph{graph=tinkergraph[vertices:1 edges:0]}
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
----
When SSL is enabled on the server, it must also be enabled on the client when connecting. To connect to
Gremlin Server with the <<gremlin-java,`gremlin-driver`>>, set the `credentials`, `enableSsl`, and `trustStore`
when constructing the `Cluster`.
[source,java]
Cluster cluster = Cluster.build().credentials("stephen", "password")
.enableSsl(true).trustStore("server.jks").create();
If connecting with Gremlin Console, which utilizes `gremlin-driver` for remote script execution, use the provided
`conf/remote-secure.yaml` file when defining the remote. That file contains configuration for the username and
password as well as enablement of SSL from the client side. Be sure to configure the trustStore if using self-signed
certificates.
Similarly, Gremlin Server can be configured for REST and security. Follow the steps above for configuring the SSL
certificate.
[source,text]
----
$ bin/gremlin-server.sh conf/gremlin-server-rest-secure.yaml
[INFO] GremlinServer -
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-rest-secure.yaml
...
[INFO] AbstractChannelizer - SSL enabled
[INFO] SimpleAuthenticator - Initializing authentication with the org.apache.tinkerpop.gremlin.server.auth.SimpleAuthenticator
[INFO] SimpleAuthenticator - CredentialGraph initialized at CredentialGraph{graph=tinkergraph[vertices:1 edges:0]}
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
----
Once the server has started, issue a request passing the credentials with an `Authorization` header, as described in link:http://tools.ietf.org/html/rfc2617#section-2[RFC2617]. Here's an HTTP Basic authentication example with cURL:
[source,text]
curl -X POST --insecure -u stephen:password -d "{\"gremlin\":\"100-1\"}" "https://localhost:8182"
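The `-u` option in the cURL call above builds the Basic authentication header automatically. For clients that must set the header by hand, the following is a minimal sketch of how that value is formed under RFC 2617, using only the JDK. The class and method names are hypothetical and not part of TinkerPop.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not a TinkerPop class.
final class BasicAuthHeader {

    // Builds the value of the HTTP "Authorization" header for Basic authentication:
    // the literal "Basic " followed by the base64 encoding of "username:password".
    static String basicAuthValue(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Same credentials as the cURL example above.
        System.out.println("Authorization: " + basicAuthValue("stephen", "password"));
    }
}
```

Base64 is an encoding, not encryption, which is why this form of authentication should only be used together with SSL as configured above.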
[[credentials-dsl]]
===== Credentials Graph DSL
The "credentials graph", which has been mentioned in previous sections, is used by Gremlin Server to hold the list of
users who can authenticate to the server. It is possible to use virtually any `Graph` instance for this task as long
as it complies with a defined schema. The credentials graph stores users as vertices with the `label` of "user". Each
"user" vertex has two properties: `username` and `password`. Naturally, these are both `String` values. The password
must not be stored in plain text and should be hashed.
IMPORTANT: Be sure to define an index on the `username` property, as this will be used for lookups. If supported by
the `Graph`, consider specifying a unique constraint as well.
To aid with the management of a credentials graph, Gremlin Server provides a Gremlin Console plugin for adding and
removing users in a way that adheres to the schema, thus ensuring compatibility with Gremlin Server. As a plugin, it
works naturally in the Gremlin Console as an extension of its capabilities (though it can also be used
programmatically, if desired). This plugin is distributed with the Gremlin Console so it does not have to be
"installed". It does, however, need to be activated:
[source,groovy]
gremlin> :plugin use tinkerpop.credentials
==>tinkerpop.credentials activated
Please see the example usage as follows:
[gremlin-groovy]
----
graph = TinkerGraph.open()
graph.createIndex("username",Vertex.class)
credentials = graph.traversal(CredentialTraversalSource.class)
credentials.user("stephen","password")
credentials.user("daniel","better-password")
credentials.user("marko","rainbow-dash")
credentials.users("marko").elementMap()
credentials.users().count()
credentials.users("daniel").drop()
credentials.users().count()
----
NOTE: The Credentials DSL is built using TinkerPop's DSL Annotation Processor described <<gremlin-java-dsl,here>>.
IMPORTANT: In the above example, an empty in-memory TinkerGraph was used to demonstrate the API of the DSL.
This data will not be retained and therefore will not be usable with Gremlin Server. Either configure
TinkerGraph to persist that data or manually persist it (e.g. write the graph data to Gryo) once changes are
complete. Alternatively, use a persistent graph to hold the credentials and configure Gremlin Server accordingly.
[[krb5authenticator]]
===== Kerberos Authentication
The `Krb5Authenticator` implements the "GSSAPI" SASL mechanism (i.e. Kerberos) to authenticate a request from a Gremlin
client. It can be applied in an existing Kerberos environment and validates whether a
link:https://www.roguelynn.com/words/explain-like-im-5-kerberos/[valid authentication proof and service ticket are
offered].
[source,yaml]
authentication: {
className: org.apache.tinkerpop.gremlin.server.auth.Krb5Authenticator,
config: {
principal: gremlinserver/hostname.your.org@YOUR.REALM,
keytab: /etc/security/keytabs/gremlinserver.service.keytab}}
`Krb5Authenticator` needs a Kerberos service principal and a keytab that holds the secret key for that principal. The keytab
location and service name (e.g. `gremlinserver`) are free to be chosen, but Gremlin clients have to specify this service name
as the `protocol`. For Gremlin-Console the `protocol` is an entry in the `remote.yaml` file; for Gremlin-java the client builder
has a `protocol()` method.
In addition to the `protocol`, the Gremlin client needs to specify a `jaasEntry`, an entry in the
link:https://en.wikipedia.org/wiki/Java_Authentication_and_Authorization_Service[JAAS] configuration file. Gremlin-Console
comes with a sample gremlin-jaas.conf file with a `GremlinConsole` jaasEntry:
[source, jaas]
GremlinConsole {
com.sun.security.auth.module.Krb5LoginModule required
doNotPrompt=true
useTicketCache=true;
};
This configuration tells Gremlin-Console to pass authentication requests from Gremlin Server to the Krb5LoginModule, which is
part of the Java standard library. The Krb5LoginModule does not prompt the user for a username and password but uses the ticket cache that
is normally refreshed when a user logs in to a host within the Kerberos realm.
Finally, the Gremlin client needs the location of the JAAS configuration file to be passed as a system property to the JVM. For
Gremlin-Console the easiest way to do this is to pass it to the run script via the `JAVA_OPTIONS` environment variable:
[source, bash]
export JAVA_OPTIONS="$JAVA_OPTIONS -Djava.security.auth.login.config=conf/gremlin-jaas.conf"
[[script-execution]]
===== Protecting Script Execution
It is important to remember that Gremlin Server exposes `GremlinScriptEngine` instances that allow for remote execution
of arbitrary code on the server. Obviously, this situation can represent a security risk or, more minimally, provide
ways for "bad" scripts to be inadvertently executed. A simple example of a "valid" Gremlin script that would cause
some problems would be, `while(true) {}`, which would consume a thread in the Gremlin pool indefinitely, thus
preventing it from serving other requests. Sending enough of these kinds of scripts would eventually consume all
available threads and Gremlin Server would stop responding.
Scripts have access to the full power of their language and the JVM on which they are running. This means that they
can access certain APIs that have nothing to do with Gremlin itself, such as `java.lang.System` or the `java.io`
and `java.net` packages. Scripts offer developers a lot of flexibility, but having that flexibility comes at the cost
of safety. A Gremlin Server instance that is not secured appropriately poses a significant security risk.
The previous sections discussed methods for securing Gremlin Server through authentication and encryption, which is a
good first step in protection. Another layer of protection comes in the form of specific configurations for the
`GremlinGroovyScriptEngine`. A user can configure the script engine with a `GroovyCompilerGremlinPlugin`
implementation. Consider the basic configuration from the Gremlin Server YAML file:
[source,yaml]
----
scriptEngines: {
gremlin-groovy: {
plugins: { org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
----
This configuration can be expanded to include the `GroovyCompilerGremlinPlugin`:
[source,yaml]
----
scriptEngines: {
gremlin-groovy: {
plugins: { org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample-secure.groovy]},
org.apache.tinkerpop.gremlin.groovy.jsr223.GroovyCompilerGremlinPlugin: {enableThreadInterrupt: true}}}}
----
This configuration sets up the script engine to ensure that loops (like `while`) will respect interrupt requests.
With this configuration in place, a remote execution such as the following now times out rather than consuming the
thread continuously:
[source,groovy]
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
gremlin> :> while(true) { }
==>Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [while(true) {}]
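The reason interrupt checks matter can be shown with plain JDK threads: a busy loop only ends on interruption if it looks at the thread's interrupt flag. The sketch below is a hypothetical illustration of that idea and is not how Gremlin Server or Groovy implements it; the class and method names are assumptions.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo; it does not use any TinkerPop or Groovy API.
final class InterruptAwareLoop {

    // Submits a busy loop that checks the interrupt flag on every iteration, waits up to
    // timeoutMillis for it, then cancels it. Returns true if the worker thread terminated.
    static boolean stopsAfterTimeout(long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> task = pool.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // Spin. Without the interrupt check in the loop condition this would never end.
            }
        });
        try {
            task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            // cancel(true) interrupts the worker thread; the loop observes the flag and exits.
            task.cancel(true);
        }
        pool.shutdown();
        try {
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsAfterTimeout(200));
    }
}
```

A loop written as `while (true) {}` has no such check, so the interrupt is never observed and the thread stays busy.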
The `GroovyCompilerGremlinPlugin` has a number of configuration options:
[width="100%",cols="3,10a",options="header"]
|=========================================================
|Customizer |Description
|`compilation` |Allows for three configurations: `COMPILE_STATIC`, `TYPE_CHECKED` or `NONE` (default). When configured with `COMPILE_STATIC` or `TYPE_CHECKED` it applies `CompileStatic` or `TypeChecked` annotations (respectively) to incoming scripts thus removing dynamic dispatch. More information about static compilation can be found link:http://docs.groovy-lang.org/latest/html/documentation/#_static_compilation[here] and additional information on `TypeChecked` usage can be found link:http://docs.groovy-lang.org/latest/html/documentation/#_the_code_typechecked_code_annotation[here].
|`compilerConfigurationOptions` |Allows configuration of the Groovy `CompilerConfiguration` object by taking a `Map` of key/value pairs where the "key" is a property to set on the `CompilerConfiguration`.
|`enableThreadInterrupt` |Injects checks for thread interruption, thus allowing the script to potentially respect calls to `Thread.interrupt()`
|`expectedCompilationTime` |The amount of time in milliseconds a script is allowed to compile before a warning message is sent to the logs.
|`globalFunctionCacheEnabled` |Determines if the global function cache is enabled. By default, this value is `true` - described in more detail in the <<gremlin-server-cache,Cache Management>> Section.
|`classMapCacheSpecification` |The cache specification for the `GremlinGroovyScriptEngine` class map cache - described in more detail in the <<gremlin-server-cache,Cache Management>> Section.
|`extensions` | This setting is for use when `compilation` is configured with `COMPILE_STATIC` or `TYPE_CHECKED` and accepts a comma separated list of link:http://docs.groovy-lang.org/latest/html/documentation/#Typecheckingextensions-Workingwithextensions[type checking extensions] that can have the effect of securing calls to various methods.
|=========================================================
NOTE: Consult the latest link:http://docs.groovy-lang.org/latest/html/documentation/#_typing[Groovy Documentation]
for information on the differences between the `compilation` options. It is important to understand the impact that
these configurations will have on submitted scripts before enabling this feature.
To provide some basic out-of-the-box protections against troublesome scripts, the following configuration can be used:
[source,yaml]
----
scriptEngines: {
gremlin-groovy: {
plugins: { org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.groovy.jsr223.GroovyCompilerGremlinPlugin: {enableThreadInterrupt: true, compilation: COMPILE_STATIC, extensions: org.apache.tinkerpop.gremlin.groovy.jsr223.customizer.SimpleSandboxExtension},
org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample-secure.groovy]}}}}
----
This configuration uses the `SimpleSandboxExtension`, which blacklists calls to methods on the `System` class,
thereby preventing someone from remotely killing the server:
[source,groovy]
----
gremlin> :> System.exit(0)
Script8.groovy: 1: [Static type checking] - Not authorized to call this method: java.lang.System#exit(int)
@ line 1, column 1.
System.exit(0)
^
1 error
----
The `SimpleSandboxExtension` is by no means a "complete" implementation protecting against all manner of nefarious
scripts, but it does provide an example for how such a capability might be implemented. A more complete implementation
is offered in the `FileSandboxExtension` which uses a configuration file to white list certain classes and methods.
The configuration file is YAML-based and an example is presented as follows:
[source,yaml]
----
autoTypeUnknown: true
methodWhiteList:
- java\.lang\.Boolean.*
- java\.lang\.Byte.*
- java\.lang\.Character.*
- java\.lang\.Double.*
- java\.lang\.Enum.*
- java\.lang\.Float.*
- java\.lang\.Integer.*
- java\.lang\.Long.*
- java\.lang\.Math.*
- java\.lang\.Number.*
- java\.lang\.Object.*
- java\.lang\.Short.*
- java\.lang\.String.*
- java\.lang\.StringBuffer.*
- java\.lang\.System#currentTimeMillis\(\)
- java\.lang\.System#nanoTime\(\)
- java\.lang\.Throwable.*
- java\.lang\.Void.*
- java\.util\..*
- org\.codehaus\.groovy\.runtime\.DefaultGroovyMethods.*
- org\.codehaus\.groovy\.runtime\.InvokerHelper#runScript\(java\.lang\.Class,java\.lang\.String\[\]\)
- org\.codehaus\.groovy\.runtime\.StringGroovyMethods.*
- groovy\.lang\.Script#<init>\(groovy.lang.Binding\)
- org\.apache\.tinkerpop\.gremlin\.structure\..*
- org\.apache\.tinkerpop\.gremlin\.process\..*
- org\.apache\.tinkerpop\.gremlin\.process\.computer\..*
- org\.apache\.tinkerpop\.gremlin\.process\.computer\.bulkloading\..*
- org\.apache\.tinkerpop\.gremlin\.process\.computer\.clustering\.peerpressure\..*
- org\.apache\.tinkerpop\.gremlin\.process\.computer\.ranking\.pagerank\..*
- org\.apache\.tinkerpop\.gremlin\.process\.computer\.traversal\..*
- org\.apache\.tinkerpop\.gremlin\.process\.traversal\..*
- org\.apache\.tinkerpop\.gremlin\.process\.traversal\.dsl\.graph\..*
- org\.apache\.tinkerpop\.gremlin\.process\.traversal\.engine\..*
- org\.apache\.tinkerpop\.gremlin\.server\.util\.LifeCycleHook.*
staticVariableTypes:
graph: org.apache.tinkerpop.gremlin.structure.Graph
g: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource
----
There are three keys in this configuration file that control different aspects of the sandbox:
. `autoTypeUnknown` - When set to `true`, unresolved variables are typed as `Object`.
. `methodWhiteList` - A white list of classes and methods that follow a regex pattern which can then be matched against
method descriptors to determine if they can be executed. The method descriptor is the fully-qualified class name
of the method, its name and parameters. For example, `Math.ceil` would have a descriptor of
`java.lang.Math#ceil(double)`.
. `staticVariableTypes` - A list of variables that will be used in the `ScriptEngine` for which the types are
always known. In the above example, the variable "graph" will always be bound to a `Graph` instance.
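To make the `methodWhiteList` matching concrete, the following sketch checks method descriptors against a few of the patterns from the sample file using `java.util.regex`. It assumes that a descriptor must match a pattern in full; the class and method names are hypothetical and this is not the `FileSandboxExtension` implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of white list matching; not TinkerPop code.
final class MethodWhiteListSketch {

    // A few entries copied from the sample sandbox configuration above.
    static List<Pattern> sampleWhiteList() {
        return Arrays.asList(
                Pattern.compile("java\\.lang\\.Math.*"),
                Pattern.compile("java\\.lang\\.System#currentTimeMillis\\(\\)"),
                Pattern.compile("java\\.util\\..*"));
    }

    // Returns true if the descriptor fully matches at least one white list pattern
    // (full-match semantics are an assumption made for this sketch).
    static boolean isAllowed(List<Pattern> whiteList, String descriptor) {
        for (Pattern pattern : whiteList) {
            if (pattern.matcher(descriptor).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Pattern> whiteList = sampleWhiteList();
        System.out.println(isAllowed(whiteList, "java.lang.Math#ceil(double)"));
        System.out.println(isAllowed(whiteList, "java.lang.System#exit(int)"));
    }
}
```

With these patterns `java.lang.Math#ceil(double)` is accepted, while `java.lang.System#exit(int)` is rejected because only two specific `System` methods are listed.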
At Gremlin Server startup, the `FileSandboxExtension` looks in the root of the Gremlin Server installation directory for
a file called `sandbox.yaml` and configures itself. To use a file in a different location, set the
`gremlinServerSandbox` system property to the location of the file (e.g. `-DgremlinServerSandbox=conf/my-sandbox.yaml`).
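The lookup described above amounts to reading a system property with a default value. A minimal sketch of that pattern follows; the class and constant names are hypothetical and the actual resolution logic inside `FileSandboxExtension` may differ.

```java
// Hypothetical illustration of "system property with a default"; not TinkerPop code.
final class SandboxLocation {

    static final String PROPERTY = "gremlinServerSandbox";
    static final String DEFAULT_FILE = "sandbox.yaml";

    // Returns the configured sandbox file location, falling back to the default file name
    // when the system property has not been set.
    static String resolve() {
        return System.getProperty(PROPERTY, DEFAULT_FILE);
    }

    public static void main(String[] args) {
        // Run with -DgremlinServerSandbox=conf/my-sandbox.yaml to see the override.
        System.out.println(resolve());
    }
}
```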
The `FileSandboxExtension` provides a basic configurable security function in Gremlin Server. More complex
sandboxing implementations can be developed by using this white listing model and extending from the
`AbstractSandboxExtension`.
A final thought on the topic of `GroovyCompilerGremlinPlugin` implementation is that it is not just for
"security" (though it is demonstrated in that capacity here). It can be used for a variety of features that
can fine tune the Groovy compilation process. Read more about compilation customization in the
link:http://docs.groovy-lang.org/latest/html/documentation/#compilation-customizers[Groovy Documentation].
==== Serialization
Gremlin Server can accept requests and return results using different serialization formats. Serializers implement the
`MessageSerializer` interface. In doing so, they express the list of mime types they expect to support. When
configuring multiple serializers it is possible for two or more serializers to support the same mime type. Such a
situation may be common with a generic mime type such as `application/json`. Serializers are added in the order that
they are encountered in the configuration file and the first one added for a specific mime type will not be overridden
by other serializers that also support it.
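The "first one wins" rule can be pictured as a map from mime type to serializer that never overwrites an existing entry. The sketch below uses placeholder serializer names and is only an illustration of the rule, not Gremlin Server's registration code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of first-registered-wins; not TinkerPop code.
final class FirstSerializerWins {

    // Each row is {serializerName, mimeType, mimeType, ...} in configuration order.
    // Returns the serializer chosen for each mime type.
    static Map<String, String> register(String[][] serializersInOrder) {
        Map<String, String> byMimeType = new LinkedHashMap<>();
        for (String[] row : serializersInOrder) {
            String serializerName = row[0];
            for (int i = 1; i < row.length; i++) {
                // putIfAbsent keeps the serializer that claimed the mime type first.
                byMimeType.putIfAbsent(row[i], serializerName);
            }
        }
        return byMimeType;
    }

    public static void main(String[] args) {
        String[][] config = {
                {"SerializerA", "application/json"},
                {"SerializerB", "application/json", "application/vnd.example+json"}
        };
        System.out.println(register(config));
    }
}
```

Here `SerializerA` keeps `application/json` because it appears first, while `SerializerB` is still used for the mime type that only it supports.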
The format of the serialization is configured by the `serializers` setting described in the table above. Note that
some serializers have additional configuration options as defined by the `serializers[X].config` setting. The
`config` setting is a `Map` where the keys and values get passed to the serializer at its initialization. The
available and/or expected keys are dependent on the serializer being used. Gremlin Server comes packaged with three
different serializers: GraphSON, Gryo, and GraphBinary.
===== GraphSON
The GraphSON serializer produces human readable output in JSON format and is a good configuration choice for those
trying to use TinkerPop from non-JVM languages. JSON obviously has wide support across virtually all major
programming languages and can be consumed by a wide variety of tools. The format itself is described in the
link:https://tinkerpop.apache.org/docs/current/dev/io/#graphson[IO Documentation].
[source,yaml]
----
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0 }
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0 }
----
The above configuration represents the default serialization under the `application/json` MIME type and produces JSON
consistent with standard JSON data types. It has the following configuration option:
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Key |Description |Default
|ioRegistries |A list of `IoRegistry` implementations to be applied to the serializer. |_none_
|=========================================================
[source,yaml]
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0 }
When the standard JSON data types are not enough (e.g. need to identify the difference between `double` and `float`
data types), the above configuration will embed types into the JSON itself. The type embedding uses standard Java
type names, so interpretation from non-JVM languages will be required. It has the MIME type of
`application/vnd.gremlin-v1.0+json` and the following configuration options:
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Key |Description |Default
|ioRegistries |A list of `IoRegistry` implementations to be applied to the serializer. |_none_
|=========================================================
===== Gryo
The Gryo serializer utilizes Kryo-based serialization which produces a binary output. This format is best consumed
by JVM-based languages. The format itself is described in the
link:https://tinkerpop.apache.org/docs/current/dev/io/#gryo[IO Documentation].
[source,yaml]
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerGremlinV1d0 }
It has the MIME type of `application/vnd.gremlin-v1.0+gryo` and the following configuration options:
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Key |Description |Default
|bufferSize |The maximum size of the Kryo buffer for use on a single object being serialized. Increasing this value will correct `KryoException` errors that complain of "Buffer too small". |_4096_
|classResolverSupplier |The fully qualified classname of a custom `Supplier<ClassResolver>` which will be used when constructing `Kryo` instances. There is no direct default for this setting, but without a setting the `GryoClassResolver` is used. |_none_
|custom |A list of classes with custom kryo `Serializer` implementations related to them in the form of `<class>;<serializer-class>`. |_none_
|ioRegistries |A list of `IoRegistry` implementations to be applied to the serializer. |_none_
|serializeResultToString |When set to `true`, results are serialized by first calling `toString()` on each object in the result list resulting in an extended MIME Type of `application/vnd.gremlin-v1.0+gryo-stringd`. When set to `false` Kryo-based serialization is applied. |_false_
|=========================================================
As described above, there are multiple ways in which to register serializers for Kryo-based serialization. Note
that the `ioRegistries` setting is applied first, followed by the `custom` setting.
Those configuring or implementing a `Supplier<ClassResolver>` should consider this an "advanced" option that is typically
important to use cases where server types need to be coerced to client types (i.e. a type is available on the server
but not on the client). Implementations should typically instantiate `ClassResolver` implementations that are
extensions of the `GryoClassResolver` as this class is important to most serialization tasks in TinkerPop.
===== GraphBinary
GraphBinary is a binary serialization format suitable for object trees, designed to reduce serialization overhead on
both the client and the server, as well as limiting the size of the payload that is transmitted over the wire. The
format itself is described in the link:https://tinkerpop.apache.org/docs/current/dev/io/#graphbinary[IO Documentation].
IMPORTANT: GraphBinary is currently only supported on the JVM.
[source,yaml]
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1 }
It has the MIME type of `application/vnd.graphbinary-v1.0` and the following configuration options:
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Key |Description |Default
|custom |A list of classes with custom serializer implementations related to them in the form of `<class>;<serializer-class>`. |_none_
|ioRegistries |A list of `IoRegistry` implementations to be applied to the serializer. |_none_
|builder |Name of the `TypeSerializerRegistry.Builder` instance to be used to construct the `TypeSerializerRegistry`. |_none_
|=========================================================
As described above, there are multiple ways in which to register serializers for GraphBinary-based serialization. Note
that the `ioRegistries` setting is applied first, followed by the `custom` setting.
[[metrics]]
==== Metrics
Gremlin Server produces metrics about its operations that can yield some insight into how it is performing. These
metrics are exposed in a variety of ways:
* Directly to the console where Gremlin Server is running
* CSV file
* link:http://ganglia.info/[Ganglia]
* link:http://graphite.wikidot.com/[Graphite]
* link:http://www.slf4j.org/[SLF4j]
* link:https://en.wikipedia.org/wiki/Java_Management_Extensions[JMX]
The configuration of each of these outputs is described in the Gremlin Server <<_configuring_2, Configuring>> section.
Note that Graphite and Ganglia are not included as part of the Gremlin Server distribution and must be installed
to the server manually.
[source,text]
----
bin/gremlin-server.sh install com.codahale.metrics metrics-ganglia 3.0.2
bin/gremlin-server.sh install com.codahale.metrics metrics-graphite 3.0.2
----
WARNING: Gremlin Server is built to work with Metrics 3.0.2. Usage of other versions may lead to unexpected problems.
NOTE: Installing Ganglia will include `org.acplt:oncrpc`, which is an LGPL licensed dependency.
Regardless of the output, the metrics gathered are the same. Each metric is prefixed with
`org.apache.tinkerpop.gremlin.server.GremlinServer` and the following metrics are reported:
* `sessions` - the number of sessions open at the time the metric was last measured.
* `errors` - the number of total errors, mean rate, as well as the 1, 5, and 15-minute error rates.
* `op.eval` - the number of script evaluations, mean rate, 1, 5, and 15 minute rates, minimum, maximum, median, mean,
and standard deviation evaluation times, as well as the 75th, 95th, 98th, 99th and 99.9th percentile evaluation times
(note that these times apply to both sessionless and in-session requests).
* `op.traversal` - the number of `Traversal` executions, mean rate, 1, 5, and 15 minute rates, minimum, maximum, median,
mean, and standard deviation evaluation times, as well as the 75th, 95th, 98th, 99th and 99.9th percentile evaluation
times.
* `engine-name.session.session-id.*` - metrics related to different `GremlinScriptEngine` instances configured for
session-based requests where "engine-name" will be the actual name of the engine, such as "gremlin-groovy" and
"session-id" will be the identifier for the session itself.
* `engine-name.sessionless.*` - metrics related to different `GremlinScriptEngine` instances configured for sessionless
requests where "engine-name" will be the actual name of the engine, such as "gremlin-groovy".
==== As A Service
Gremlin Server can be configured to run as a service.
===== Init.d (SysV)
Link `bin/gremlin-server.sh` to `init.d`.
Be sure to set `RUNAS` to the service user in `bin/gremlin-server.conf`.
[source,bash]
----
# Install
ln -s /path/to/apache-tinkerpop-gremlin-server-x.y.z/bin/gremlin-server.sh /etc/init.d/gremlin-server
# Systems with chkconfig/service. E.g. Fedora, Red Hat
chkconfig --add gremlin-server
# Start
service gremlin-server start
# Or call directly
/etc/init.d/gremlin-server restart
----
===== Systemd
To install, copy the service template below to `/etc/systemd/system/gremlin-server.service`
and update the paths `/path/to/apache-tinkerpop-gremlin-server` with the actual install path of Gremlin Server.
[source,bash]
----
[Unit]
Description=Apache TinkerPop Gremlin Server daemon
Documentation=https://tinkerpop.apache.org/
After=network.target
[Service]
Type=forking
ExecStart=/path/to/apache-tinkerpop-gremlin-server/bin/gremlin-server.sh start
ExecStop=/path/to/apache-tinkerpop-gremlin-server/bin/gremlin-server.sh stop
PIDFile=/path/to/apache-tinkerpop-gremlin-server/run/gremlin.pid
[Install]
WantedBy=multi-user.target
----
Enable the service with `systemctl enable gremlin-server`.
Start the service with `systemctl start gremlin-server`.
=== Best Practices
The following sections define best practices for working with Gremlin Server.
==== Tuning
image:gremlin-handdrawn.png[width=120,float=right] Tuning Gremlin Server for a particular environment may require some simple trial-and-error, but the following represent some basic guidelines that might be useful:
* Gremlin Server defaults to a very modest maximum heap size. Consider increasing this value for non-trivial uses.
Maximum heap size (`-Xmx`) is defined with the `JAVA_OPTIONS` setting in `gremlin-server.conf`.
* If Gremlin Server is processing scripts or lambdas in bytecode requests, consider fine tuning the JVM's handling of
the metaspace size. Consider modifying the `-XX:MetaspaceSize`, `-XX:MaxMetaspaceSize`, and related settings given the
expected workload. More discussion on this topic can be found in the <<parameterized-scripts,Parameterized Scripts>>
Section below.
* When configuring the size of `threadPoolWorker` start with the default of `1` and increment by one as needed to a
maximum of `2*number of cores`.
* The "right" size of the `gremlinPool` setting is somewhat dependent on the type of scripts that will be processed
by Gremlin Server. As requests arrive to Gremlin Server they are decoded and queued to be processed by threads in
this pool. When this pool is exhausted of threads, Gremlin Server will continue to accept incoming requests, but
the queue will continue to grow. If left to grow too large, the server will begin to slow. When tuning around
this setting, consider whether the bulk of the scripts being processed will be "fast" or "slow", where "fast"
generally means being measured in the low hundreds of milliseconds and "slow" means anything longer than that.
* Scripts that are "slow" can really hurt Gremlin Server if they are not properly accounted for. `ScriptEngine`
evaluations are blocking operations that aren't always easily interrupted, so once a "slow" script is being evaluated in
the context of a `ScriptEngine` it must finish its work. Lots of "slow" scripts will eventually consume the
`gremlinPool` preventing other scripts from getting processed from the queue.
** To limit the impact of this problem, consider properly setting the `evaluationTimeout` to something "sane".
In other words, test the traversals being sent to Gremlin Server and determine the maximum time they take to evaluate
and iterate over results, then set the timeout value accordingly.
** Note that `evaluationTimeout` can only attempt to interrupt the evaluation on timeout. It allows Gremlin
Server to "ignore" the result of that evaluation, which means the thread in the `gremlinPool` that did the evaluation
may still be consumed after the timeout if interruption does not succeed on the thread.
* Graph element serialization for `Vertex` and `Edge` can be expensive, as their data structures are complex given the
possible existence of multi-properties and meta-properties. When returning data from Gremlin Server only return the
data that is required. For example, if only two properties of a `Vertex` are needed then simply return the two rather
than returning the entire `Vertex` object itself. Even with an entire `Vertex`, it is typically much faster to issue
the query as `g.V(1).elementMap()` than `g.V(1)`, as the former returns a `Map` of the same data as a `Vertex`, but
without all the associated structure which can slow the response.
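The `threadPoolWorker` guideline above (start at `1`, grow to at most `2*number of cores`) can be expressed as a small calculation. The helper below is a hypothetical sketch for picking a candidate value; the names are assumptions and Gremlin Server does not apply this clamp itself.

```java
// Hypothetical sizing helper for illustration; not TinkerPop code.
final class WorkerPoolSizing {

    // Upper bound suggested by the guideline: twice the number of cores.
    static int maxWorkers(int cores) {
        return 2 * cores;
    }

    // Keeps a requested worker count within [1, 2 * cores].
    static int clamp(int requested, int cores) {
        return Math.max(1, Math.min(requested, maxWorkers(cores)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores + ", maximum threadPoolWorker=" + maxWorkers(cores));
    }
}
```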
[[parameterized-scripts]]
==== Parameterized Scripts
image:gremlin-parameterized.png[width=150,float=left] Use script parameterization. Period. There are at least two good
reasons for doing so: script caching and protection from "Gremlin injection" (conceptually the same as the notion of
SQL injection).
With respect to caching, Gremlin Server caches all scripts that are passed to it. The cache is keyed on a
hash of the script. Therefore `g.V(1)` and `g.V(2)` will be recognized as two separate scripts in the cache. If that
script is parameterized to `g.V(x)` where `x` is passed as a parameter from the client, there will be no additional
compilation cost for future requests on that script. Compilation of a script should be considered "expensive" and
avoided when possible.
IMPORTANT: The parameterized script of `g.V(x)` is keyed in the cache differently than `g.V(y)` or even `g.V( x )`.
Scripts must be exact string matches for recompilation to be avoided.
[source,java]
----
Cluster cluster = Cluster.open();
Client client = cluster.connect();
Map<String,Object> params = new HashMap<>();
params.put("x",4);
client.submit("[1,2,3,x]", params);
----
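The effect of exact string matching on the cache can be sketched with a plain map that counts how often a "compilation" happens. This is a simplified, hypothetical model that keys on the script text directly rather than a hash and compiles nothing; it is not the `GremlinGroovyScriptEngine` cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a script cache; not TinkerPop code.
final class ScriptCacheSketch {

    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    private final AtomicInteger compilations = new AtomicInteger();

    // Returns the cached "compiled" script, compiling only on a cache miss.
    Object compile(String script) {
        return cache.computeIfAbsent(script, s -> {
            compilations.incrementAndGet();
            return new Object(); // stand-in for a compiled script class
        });
    }

    int compilationCount() {
        return compilations.get();
    }

    public static void main(String[] args) {
        ScriptCacheSketch literals = new ScriptCacheSketch();
        literals.compile("g.V(1)");
        literals.compile("g.V(2)");
        System.out.println("literal scripts compiled: " + literals.compilationCount());

        ScriptCacheSketch parameterized = new ScriptCacheSketch();
        parameterized.compile("g.V(x)");
        parameterized.compile("g.V(x)");
        System.out.println("parameterized scripts compiled: " + parameterized.compilationCount());
    }
}
```

In this model two literal scripts cost two compilations, while repeated submissions of `g.V(x)` cost one. Submitting `g.V( x )` would add another entry because the text differs.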
The more parameters that are used in a script the more expensive the compilation step becomes. Gremlin Server has an
`OpProcessor` setting called `maxParameters`, which is mentioned in the <<opprocessor-configurations,OpProcessor Configuration>>
section. It controls the maximum number of parameters that can be passed to the server for script evaluation purposes.
Use of this setting can prevent accidental long-running compilations, which individually are not terribly oppressive to
the server, but taken as a group under high concurrency would be considered detrimental.
On the topic of Gremlin injection, note that it is possible to take advantage of Gremlin scripts in the same fashion
as SQL scripts that are submitted as strings. When using string building patterns for queries without proper input
scrubbing, it would be quite simple to do:
[source,java]
----
String lbl = "person";
String nodeId = "mary').next();g.V().drop().iterate();g.V().has('id', 'thomas";
String query = "g.addV('" + lbl + "').property('identifier','" + nodeId + "')";
client.submit(query);
----
The above case would `drop()` all vertices in the graph. By using script parameterization, there is a different outcome
in that the `nodeId` string is not treated as something executable, but rather as a literal string that just becomes
part of the "identifier" for the vertex on insertion:
[source,java]
----
String lbl = "person";
String nodeId = "mary').next();g.V().drop().iterate();g.V().has('id', 'thomas";
String query = "g.addV(lbl).property('identifier',nodeId)";
Map<String,Object> params = new HashMap<>();
params.put("lbl",lbl);
params.put("nodeId",nodeId);
client.submit(query, params);
----
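To make the difference concrete, the following sketch prints the script text that each approach would send. It contacts no server and uses only the JDK; the `InjectionSketch` class and its method names are hypothetical.

```java
// Shows only the script text each approach would produce; no server is contacted.
// Class and method names are hypothetical.
class InjectionSketch {
    // Unsafe: user input is spliced directly into the script text.
    static String concatenated(String lbl, String nodeId) {
        return "g.addV('" + lbl + "').property('identifier','" + nodeId + "')";
    }

    // Safe: the script text is constant and the values travel separately as parameters.
    static String parameterized() {
        return "g.addV(lbl).property('identifier',nodeId)";
    }

    public static void main(String[] args) {
        String nodeId = "mary').next();g.V().drop().iterate();g.V().has('id', 'thomas";
        // The concatenated script now contains three statements, one of which is drop().
        System.out.println(concatenated("person", nodeId));
        // The parameterized script is unchanged regardless of the input values.
        System.out.println(parameterized());
    }
}
```

The first line of output contains `g.V().drop().iterate()` as an executable statement, while the second never changes with the input.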
Gremlin injection should not be possible with `Bytecode` based traversals - only scripts - because `Bytecode`
traversals will treat all arguments as literal values. There is potential for concern if lambda based steps are
utilized as they execute arbitrary code, which is string based, but configuring `TraversalSource` instances with
`LambdaRestrictionStrategy`, which prevents lambdas altogether, using a graph that does not allow lambdas at all, or
configuring appropriate <<script-execution,sandbox options>> in Gremlin Server (or such options available to the graph
database in use) should each help mitigate problems related to this issue.
Scripts create classes which get loaded to the JVM metaspace and to a `Class` cache. For those using script
parameterization, a typical application should not generate an overabundance of pressure on these two components of
Gremlin Server's memory footprint. On the other hand, it's not too hard to imagine a situation where problems might
emerge:
* An application use case makes parameterization impossible and therefore all scripts are unique.
* There is a bug in an application's parameterization code that is actually instead producing unique scripts.
* A long running Gremlin Server takes lots of non-parameterized scripts from Gremlin Console or similar tools.
In these sorts of cases, Gremlin Server's performance can be affected adversely as without some additional configuration
the metaspace will grow indefinitely (possibly along with the general heap) triggering longer and more frequent rounds
of garbage collection (GC). Some tuning of JVM settings can help abate this issue.
As a first guard against this problem consider setting the `-XX:SoftRefLRUPolicyMSPerMB` to release soft references
earlier. The `ScriptEngine` cache for created `Class` objects uses soft references and if the workload expectation is
such that cache hits will be low there is little need to keep such references around.
Perhaps the more important guards are related to the JVM metaspace. Start by setting the initial size of this space
with `-XX:MetaspaceSize`. When this value is exceeded it will trigger a GC round - it is essentially a threshold for
GC. The growth of this value can be capped with `-XX:MaxMetaspaceSize` (this value is unlimited by default). In an ideal
situation (i.e. parameterization), the `-XX:MetaspaceSize` should have a large enough setting so as to avoid early GC
rounds for metaspace, but outside of an ideal world (i.e. non-parameterization) it may not be smart to make this number
too large. Making the setting too large (and thus the `-XX:MaxMetaspaceSize` even larger) may trigger longer GC rounds
when they inevitably arrive.
In addition to those two metaspace settings it may also be useful to consider the following additional options:
* `MinMetaspaceFreeRatio` - When the percentage for committed space available for class metadata is less than this
value, then the threshold of metaspace GC will be raised, but only if the incremental size of the threshold meets the
requirement set by `MinMetaspaceExpansion`. A larger number should make the metaspace grow more aggressively.
* `MaxMetaspaceFreeRatio` - When the percentage for committed space available for class metadata is more than this
value, then the threshold of metaspace GC will be lowered, but only if the incremental size of the threshold meets the
requirement set by `MaxMetaspaceExpansion`. A larger number should reduce the chance of the metaspace shrinking.
* `MinMetaspaceExpansion` - The minimum size by which the metaspace is expanded after a metaspace GC round.
* `MaxMetaspaceExpansion` - If the incremental size exceeds `MinMetaspaceExpansion` but is less than
`MaxMetaspaceExpansion`, then the incremental size is `MaxMetaspaceExpansion`. If the incremental size exceeds
`MaxMetaspaceExpansion`, then the incremental size is `MinMetaspaceExpansion` plus the original incremental size.
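The increment rule described by the last two options can be sketched as a small function. This is an illustration of the rule as worded above, not JVM source code: the class name is hypothetical, the sizes in `main` are arbitrary round numbers rather than JVM defaults, and the handling of the exact boundary values (`<=`) is an assumption.

```java
// Sketch of the metaspace increment rule as described in the list above.
// Hypothetical class; boundary handling (<=) is an assumption.
class MetaspaceIncrementSketch {
    static long increment(long requested, long minExpansion, long maxExpansion) {
        if (requested <= minExpansion) {
            return minExpansion;             // never expand by less than the minimum
        } else if (requested <= maxExpansion) {
            return maxExpansion;             // between min and max: use the maximum
        } else {
            return requested + minExpansion; // above max: requested size plus the minimum
        }
    }

    public static void main(String[] args) {
        long min = 256;   // arbitrary units, chosen only for illustration
        long max = 4096;
        System.out.println(increment(100, min, max));  // prints 256
        System.out.println(increment(1000, min, max)); // prints 4096
        System.out.println(increment(5000, min, max)); // prints 5256
    }
}
```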
There really aren't any general guidelines for how to initially set these values. Using profiling tools to examine GC
trends is likely the best way to understand how a particular workload is affecting the metaspace and its relation to
GC. Getting these settings "right" however will help ensure much more predictable Gremlin Server operations.
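Where a full profiler is not at hand, the JDK's management beans give a rough view of the same trends. The sketch below reads metaspace usage and GC counts for the JVM it runs in, so to observe Gremlin Server it would have to run inside that process or be adapted to a remote JMX connection. The `MetaspaceProbe` class is a hypothetical name, and matching the pool by the name "Metaspace" assumes a HotSpot-style JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Reads metaspace usage and GC counts of the current JVM via standard MXBeans.
// Hypothetical class name; the pool name match assumes a HotSpot-style JVM.
class MetaspaceProbe {
    // Bytes used by the first pool whose name contains "Metaspace", or -1 if none is exposed.
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Metaspace")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) {
                    return usage.getUsed();
                }
            }
        }
        return -1L;
    }

    // Total number of collections reported by all collectors so far.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("metaspace used (bytes): " + metaspaceUsedBytes());
        System.out.println("GC rounds so far: " + totalGcCount());
    }
}
```

Sampling these two values periodically under a representative workload shows whether metaspace keeps growing and how often collections are triggered.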
IMPORTANT: A lambda used in a bytecode-based request will be treated as a script, so issues related to raw script-based
requests apply equally well to lambda-bytecode requests.
==== Properties of Elements
It was mentioned above at the start of this "Best Practices" section that serialization of graph elements (i.e.
`Vertex`, `Edge`, and `VertexProperty`) can be expensive and that it is best to only return the data that is required
by the requesting system. This point begs for further clarification as there are a number of ways to use and configure
Gremlin Server which might influence its interpretation.
To begin to discuss these nuances, first consider the method of making requests to Gremlin Server: script or bytecode.
For scripts, that will mean that users are sending a string representation of Gremlin to the server directly through a
driver over websockets or through HTTP. For bytecode, users will utilize a <<gremlin-drivers-variants, Gremlin GLV>>
which will construct bytecode for them and submit the request to the server upon iteration of their traversal.
In either case, it is important to also consider the method of "detachment". Detachment refers to the manner in which
a graph element is disconnected from the graph for purpose of serialization. Depending on the case and configuration,
graph elements may be detached with or without properties. Cases where they include properties are generally referred
to as "detached elements" and cases where properties are not included are "reference elements".
With the type of request and detachment model in mind, it is now possible to discuss how best to consider element
properties in relation to them all in concert.
By default, Gremlin Server sample configurations utilize `ReferenceElementStrategy` when creating the out-of-the-box
`GraphTraversalSource`. As the name suggests, this means that elements will be detached by reference and will
therefore not have properties included. The relevant configuration from the Gremlin Server initialization script looks
like this:
[source,groovy]
----
globals << [g : graph.traversal().withStrategies(ReferenceElementStrategy.instance())]
----
This configuration is global to Gremlin Server and therefore all methods of connection will always return elements
without properties. If this strategy is not included, then there are other considerations to take into account such as
the connection type (i.e. script or bytecode) and the serializer.
For script-based requests, users should take care when returning graph elements. By default, elements will be returned
as "detached elements" and depending on the serializer being used those detached elements may or may not have their
properties carried with them. Gryo and GraphSON serializers will write all properties in the return payload in this
case but GraphBinary will not. Therefore, script-based requests that use Gryo or GraphSON should definitely follow the
best practice of only returning the data required by the application.
NOTE: Gryo does have the exception for the `GryoMessageSerializerGremlinV1d0` with the `serializeResultToString`
option enabled, which will simply convert all results using the Java `toString()` method prior to serialization and
is typically only used by the Gremlin Console for remote sessions where the actual object from the server is not of use.
For bytecode-based requests, graph elements have reference detachment and thus only return the `id` and `label` of
the elements. While this approach alleviates a potential performance problem that the script approach exposes, it is
still important to follow the practice of being specific about the data that is required by the requesting application
as it won't arrive on the client side without that declaration.
Ultimately, the detachment model should have little impact to Gremlin usage if the best practice of specifying only
the data required by the application is adhered to. In other words, while there may be a difference in the contents
of return values for these traversals:
[source,java]
----
// properties returned from g.V().hasLabel('person') because this is using the
// Script API with full detachment
Cluster cluster = Cluster.open();
Client client = cluster.connect();
ResultSet results = client.submit("g.V().hasLabel('person')");
// no properties returned from g.V().hasLabel("person") because this is using
// Bytecode API with reference detachment
GraphTraversalSource g = traversal().withRemote("conf/remote-graph.properties");
List<Vertex> vertices = g.V().hasLabel("person").toList();
----
There is no difference if re-written using the best practice of requesting only the data the application needs:
[source,java]
----
Cluster cluster = Cluster.open();
Client client = cluster.connect();
ResultSet results = client.submit("g.V().hasLabel('person').elementMap('name')");
GraphTraversalSource g = traversal().withRemote("conf/remote-graph.properties");
List<Map<Object,Object>> maps = g.V().hasLabel("person").elementMap("name").toList();
----
Both of the above requests return a list of `Map` instances that contain the `id`, `label` and the "name" property.
TIP: The example graph configurations pre-packaged with Gremlin Server utilize `ReferenceElementStrategy`,
which converts all graph elements to references, by initializing "g" using
`withStrategies(ReferenceElementStrategy.instance())`. Consider utilizing `ReferenceElementStrategy` whenever creating
a `GraphTraversalSource` in Java to ensure the most portable Gremlin.
NOTE: For those interested, please see link:https://lists.apache.org/thread.html/e959e85d4f8b3d46d281f2742a6e574c7d27c54bfc52f802f7c04af3%40%3Cdev.tinkerpop.apache.org%3E[this post]
to the TinkerPop dev list which outlines the full history of this issue and related concerns.
[[gremlin-server-cache]]
==== Cache Management
If Gremlin Server processes a large number of unique scripts, the global function cache will grow beyond the memory
available to Gremlin Server and an `OutOfMemoryError` will loom. Script parameterization goes a long way to solving
this problem and running out of memory should not be an issue for those cases. If it is a problem or if there is no
script parameterization due to a given use case (perhaps with the use of <<sessions,sessions>>), it is possible to
better control the nature of the global function cache from the client side, by issuing scripts with a parameter to
help define how the garbage collector should treat the references.
The parameter is called `#jsr223.groovy.engine.keep.globals` and has four options:
* `hard` - available in the cache for the life of the JVM (default when not specified).
* `soft` - retained until memory is "low" and should be reclaimed before an `OutOfMemoryError` is thrown.
* `weak` - garbage collected even when memory is abundant.
* `phantom` - removed immediately after being evaluated by the `ScriptEngine`.
By specifying an option other than `hard`, an `OutOfMemoryError` in Gremlin Server should be avoided. Of course,
this approach will come with the downside that functions could be garbage collected and thus removed from the
cache, forcing Gremlin Server to recompile if that script is encountered again.
[source,java]
----
Cluster cluster = Cluster.open();
Client client = cluster.connect();
Map<String,Object> params = new HashMap<>();
params.put("#jsr223.groovy.engine.keep.globals", "soft");
client.submit("def addItUp(x,y){x+y}", params);
----
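Because the option is passed as a plain string, a misspelled value is easy to send by accident. One way to guard against that on the client side is a small helper that validates the option before adding it to the request parameters. The `KeepGlobalsSketch` class and `withKeepGlobals` method are hypothetical and not part of `gremlin-driver`; only the parameter name and its four options come from the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side helper; not part of gremlin-driver.
class KeepGlobalsSketch {
    static final String KEY = "#jsr223.groovy.engine.keep.globals";
    static final Set<String> OPTIONS =
            new HashSet<>(Arrays.asList("hard", "soft", "weak", "phantom"));

    // Returns a copy of the given parameters with the reference-type option added.
    static Map<String, Object> withKeepGlobals(Map<String, Object> params, String option) {
        if (!OPTIONS.contains(option)) {
            throw new IllegalArgumentException("Unknown option: " + option);
        }
        Map<String, Object> copy = new HashMap<>(params);
        copy.put(KEY, option);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> params = withKeepGlobals(new HashMap<>(), "soft");
        System.out.println(params);
        // The resulting map would then be passed as the second argument to client.submit(...).
    }
}
```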
In cases where the expense of maintaining the global function cache is unnecessary, this cache can be disabled with the
`globalFunctionCacheEnabled` configuration on the `GroovyCompilerGremlinPlugin`.
Gremlin Server also has a "class map" cache which holds compiled scripts which helps avoid recompilation costs on
future requests. This cache can be tuned in the Gremlin Server configuration with the `GroovyCompilerGremlinPlugin`
in the following fashion:
[source,yaml]
----
scriptEngines: {
gremlin-groovy: {
plugins: { ...
org.apache.tinkerpop.gremlin.groovy.jsr223.GroovyCompilerGremlinPlugin: {classMapCacheSpecification: "initialCapacity=1000,maximumSize=10000"},
...}
----
The specifics for this comma delimited format can be found
link:https://static.javadoc.io/com.github.ben-manes.caffeine/caffeine/2.6.2/com/github/benmanes/caffeine/cache/CaffeineSpec.html[here].
By default, the cache is set to `softValues`, which means that entries are garbage collected in a globally
least-recently-used manner as memory gets low. For production systems, a more predictable strategy is likely
preferable, as shown above with the use of `maximumSize`.
[[sessions]]
==== Considering Sessions
The preferred approach for issuing script-based requests to Gremlin Server is to do so in a sessionless manner. The
concept of "sessionless" refers to a request that is completely encapsulated within a single transaction, such that
the script in the request starts with a new transaction and ends with a closed transaction. Sessionless requests have
automatic transaction management handled by Gremlin Server, thus automatically opening and closing transactions as
previously described. The downside to the sessionless approach is that the entire script to be executed must be known
at the time of submission so that it can all be executed at once. This requirement makes it difficult for some use
cases where more control over the transaction is desired.
For such use cases, Gremlin Server supports sessions. With sessions, the user is in complete control of the start
and end of the transaction. This feature comes with some additional expense to consider:
* Initialization scripts will be executed for each session created so any expense related to them will be incurred
each time a session is constructed.
* There will be one script cache per session, which obviously increases memory requirements. The cache is not shared,
so as to ensure that a session has isolation from other session environments. As a result, if the same script is
executed in each session the same compilation cost will be paid for each session it is executed in.
* Each session will require its own thread pool with a single thread in it - this ensures that transactional
boundaries are managed properly from one request to the next.
* If there are multiple Gremlin Server instances, communication from the client to the server must be bound to the
server that the session was initialized in. Gremlin Server does not share session state as the transactional context
of a `Graph` is bound to the thread it was initialized in.
To connect to a session with Java via the `gremlin-driver`, it is necessary to create a `SessionedClient` from the
`Cluster` object:
[source,java]
----
Cluster cluster = Cluster.open(); <1>
Client client = cluster.connect("sessionName"); <2>
----
<1> Opens a reference to `localhost` as <<gremlin-java,previously shown>>.
<2> Creates a `SessionedClient` given the configuration options of the Cluster. The `connect()` method is given a
`String` value that becomes the unique name of the session. It is often best to simply use a `UUID` to represent
the session.
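Generating such a session name needs nothing beyond the JDK. In the sketch below the `SessionNameSketch` class is a hypothetical name, and the `connect()` call is shown only as a comment because it requires `gremlin-driver` on the classpath.

```java
import java.util.UUID;

// Generates a unique session name; class name is hypothetical.
class SessionNameSketch {
    static String newSessionName() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String sessionName = newSessionName();
        System.out.println(sessionName);
        // With gremlin-driver available, the name would be passed to the cluster:
        // Client client = cluster.connect(sessionName);
    }
}
```

A random `UUID` avoids accidental collisions between clients that would otherwise end up sharing one session.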
It is also possible to have Gremlin Server manage the transactions as is done with sessionless requests. The user is
in control of enabling this feature when creating the `SessionedClient`:
[source,java]
----
Cluster cluster = Cluster.open();
Client client = cluster.connect("sessionName", true);
----
Specifying `true` to the `connect()` method signifies that the `client` should make each request as one encapsulated
in a transaction. With this configuration of `client` there is no need to close a transaction manually.
When using this mode of the `SessionedClient` it is important to recognize that global variable state for the session
may not be rolled back on failure, depending on where the failure occurs. For example, sending the following script would
create a variable "x" in global session scope that would be accessible on the next request:
[source,groovy]
x = 1
However, sending this script which explicitly throws an exception:
[source,groovy]
y = 2
throw new RuntimeException()
will result in an obvious failure during script evaluation and "y" will not be available to the next request. The
complication arises where the script evaluates successfully, but fails during result iteration or serialization. For
example, this script:
[source,groovy]
a = 1
g.addV()
would successfully evaluate and return a `Traversal`. The variable "a" would be available on the next request. However,
if there was a failure in transaction management on the call to `commit()`, "a" would still be available to the next
request.
A session is a "heavier" approach than the simple "request/response" approach of sessionless requests, but is sometimes
necessary for a given use case.
[[considering-transactions]]
==== Considering Transactions
Gremlin Server performs automated transaction handling for "sessionless" requests (i.e. no state between requests) and
for "in-session" requests with that feature enabled. It will automatically commit or rollback transactions depending
on the success or failure of the request.
Another aspect of Transaction Management that should be considered is the usage of the `strictTransactionManagement`
setting. It is `false` by default, but when set to `true`, it forces the user to pass `aliases` for all requests.
The aliases are then used to determine which graphs will have their transactions closed for that request. Running
Gremlin Server in this configuration should be more efficient when there are multiple graphs being hosted as
Gremlin Server will only close transactions on the graphs specified by the `aliases`. Keeping this setting `false`
will simply have Gremlin Server close transactions on all graphs for every request.
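The difference between the two modes can be modeled in a few lines. This is a toy model of the behavior described above and not Gremlin Server code: the `TransactionScopeSketch` class, the graph names, and the simplification that each alias maps directly to a graph name are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of which graphs have their transactions closed after a request.
// Hypothetical names; aliases are simplified to alias -> graph name.
class TransactionScopeSketch {
    static Set<String> graphsToClose(boolean strict, Set<String> hosted, Map<String, String> aliases) {
        if (!strict) {
            return new TreeSet<>(hosted); // non-strict: every hosted graph
        }
        if (aliases.isEmpty()) {
            throw new IllegalArgumentException("aliases are required in strict mode");
        }
        Set<String> scoped = new TreeSet<>(aliases.values());
        scoped.retainAll(hosted);         // strict: only the graphs named by the aliases
        return scoped;
    }

    public static void main(String[] args) {
        Set<String> hosted = new TreeSet<>(Arrays.asList("graphA", "graphB", "graphC"));
        Map<String, String> aliases = Collections.singletonMap("g", "graphB");
        System.out.println(graphsToClose(true, hosted, aliases));  // prints [graphB]
        System.out.println(graphsToClose(false, hosted, aliases)); // prints [graphA, graphB, graphC]
    }
}
```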
[[considering-state]]
==== Considering State
With HTTP and any sessionless requests, there is no variable state maintained between requests. Therefore,
when <<connecting-via-console,connecting with the console>>, for example, it is not possible to create a variable in
one command and then expect to access it in the next:
[source,groovy]
----
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
gremlin> :> x = 2
==>2
gremlin> :> 2 + x
No such property: x for class: Script4
Display stack trace? [yN] n
----
The same behavior would be seen with HTTP or when using sessionless requests through one of the Gremlin Server drivers.
If maintaining state between requests is desirable, then <<sessions,consider sessions>>.
There is an exception to this notion of state not existing between requests and that is globally defined functions.
All functions created via scripts are global to the server.
[source,groovy]
----
gremlin> :> def subtractIt(int x, int y) { x - y }
==>null
gremlin> :> subtractIt(8,7)
==>1
----
If this behavior is not desirable there are several options. A first option would be to consider using sessions. Each
session gets its own `ScriptEngine`, which maintains its own isolated cache of global functions, whereas sessionless
requests use a single function cache. A second option would be to define functions as closures:
[source,groovy]
----
gremlin> :> multiplyIt = { int x, int y -> x * y }
==>Script7$_run_closure1@6b24f3ab
gremlin> :> multiplyIt(7, 8)
No signature of method: org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.multiplyIt() is applicable for argument types: (java.lang.Integer, java.lang.Integer) values: [7, 8]
Display stack trace? [yN]
----
When the function is declared this way, the function is viewed by the `ScriptEngine` as a variable rather than a global
function and since sessionless requests don't maintain state, the function is forgotten for the next request. A final
option would be to manage the `ScriptEngine` cache manually:
[source,bourne]
----
$ curl -X POST -d "{\"gremlin\":\"def divideIt(int x, int y){ x / y }\",\"bindings\":{\"#jsr223.groovy.engine.keep.globals\":\"phantom\"}}" "http://localhost:8182"
{"requestId":"97fe1467-a943-45ea-8fd6-9e889a6c9381","status":{"message":"","code":200,"attributes":{}},"result":{"data":[null],"meta":{}}}
$ curl -X POST -d "{\"gremlin\":\"divideIt(8, 2)\"}" "http://localhost:8182"
{"message":"Error encountered evaluating script: divideIt(8, 2)"}
----
In the above HTTP-based requests, the bindings contain a special parameter that tells the `ScriptEngine` cache to
immediately forget the script after execution. In this way, the function does not end up being globally available.
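The same request body can be assembled from Java. The sketch below builds, but does not send, a request equivalent to the first `curl` call above using `java.net.http` (Java 11 or later). The `PhantomRequestSketch` class and its methods are hypothetical names, and the escaping helper covers only backslashes and double quotes, so it is not a general JSON encoder.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds, but does not send, a request equivalent to the first curl call above.
// Requires Java 11+; class and method names are hypothetical.
class PhantomRequestSketch {
    // Escapes only backslashes and double quotes; not a general JSON encoder.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // JSON body carrying the script and the special "phantom" binding.
    static String body(String script) {
        return "{\"gremlin\":\"" + jsonEscape(script) + "\","
             + "\"bindings\":{\"#jsr223.groovy.engine.keep.globals\":\"phantom\"}}";
    }

    static HttpRequest request(String script) {
        return HttpRequest.newBuilder(URI.create("http://localhost:8182"))
                .POST(HttpRequest.BodyPublishers.ofString(body(script)))
                .build();
    }

    public static void main(String[] args) {
        System.out.println(body("def divideIt(int x, int y){ x / y }"));
    }
}
```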
[[gremlin-server-docker-image]]
=== Docker Image
The Gremlin Server can also be started as a link:https://hub.docker.com/r/tinkerpop/gremlin-server/[Docker image]:
[source,text]
----
$ docker run tinkerpop/gremlin-server:x.y.z
[INFO] GremlinServer -
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server.yaml
...
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 4 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
----
By default, Gremlin Server listens on port 8182. So that port needs to be exposed if it should be reachable on the host:
[source,bash]
----
$ docker run -p 8182:8182 tinkerpop/gremlin-server:x.y.z
----
Arguments provided with `docker run` are forwarded to the script that starts Gremlin Server. This allows, for example,
the use of an alternative config file:
[source,bash]
----
$ docker run tinkerpop/gremlin-server:x.y.z conf/gremlin-server-secure.yaml
----
[[gremlin-plugins]]
== Gremlin Plugins
image:gremlin-plugin.png[width=125]
Plugins provide a way to expand the features of Gremlin Console and Gremlin Server. The following sections describe
the plugins that are available directly from TinkerPop. Please see the
link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#gremlin-plugins[Provider Documentation] for information on
how to develop custom plugins.
[[credentials-plugin]]
=== Credentials Plugin
image:gremlin-server.png[width=200,float=left] xref:gremlin-server[Gremlin Server] supports an authentication model
where user credentials are stored inside of a `Graph` instance. This database can be managed with the
xref:credentials-dsl[Credentials DSL], which can be installed in the console via the Credentials Plugin. This plugin
is packaged with the console, but is not enabled by default.
[source,groovy]
gremlin> :plugin use tinkerpop.credentials
==>tinkerpop.credentials activated
This plugin imports the appropriate classes for managing the credentials graph.
[[gephi-plugin]]
=== Gephi Plugin
image:gephi-logo.png[width=200, float=left] link:http://gephi.org/[Gephi] is an interactive visualization,
exploration, and analysis platform for graphs. The link:https://gephi.org/plugins/#/plugin/graphstreaming[Graph Streaming]
plugin for Gephi provides an API that can be leveraged to stream graph data to a running Gephi application. The Gephi
plugin for Gremlin Console utilizes this API to allow for graph and traversal visualization.
IMPORTANT: These instructions have been tested with Gephi 0.9.1 and Graph Streaming plugin 1.0.3.
The following instructions assume that Gephi has been downloaded and installed. It further assumes that the Graph
Streaming plugin has been installed (`Tools > Plugins`). The following instructions explain how to visualize a
`Graph` and `Traversal`.
In Gephi, create a new project with `File > New Project`. In the lower left view, click the "Streaming" tab, open the
Master drop down, and right click `Master Server > Start` which starts the Graph Streaming server in Gephi and by
default accepts requests at `http://localhost:8080/workspace1`:
image::gephi-start-server.png[width=800]
IMPORTANT: The Gephi Streaming Plugin doesn't detect port conflicts and will appear to start successfully
even if something is already active on the port it wants to use (which is 8080 by default). Be sure
that nothing is running on the port that Gephi will be using before starting the plugin. Failing to do
this produces behavior where the console will appear to submit requests to Gephi successfully but nothing will
render.
WARNING: Do not skip the `File > New Project` step as it may prevent a newly started Gephi application from fully
enabling the streaming tab.
Start the xref:gremlin-console[Gremlin Console] and activate the Gephi plugin:
[gremlin-groovy]
----
:plugin use tinkerpop.gephi
graph = TinkerFactory.createModern()
:remote connect tinkerpop.gephi
:> graph
----
The above Gremlin session activates the Gephi plugin, creates the "modern" `TinkerGraph`, uses the `:remote` command
to setup a connection to the Graph Streaming server in Gephi (with default parameters that will be explained below),
and then uses `:submit` which sends the vertices and edges of the graph to the Gephi Streaming Server. The resulting
graph appears in Gephi as displayed in the left image below.
image::gephi-graph-submit.png[width=800]
NOTE: Issuing `:> graph` again will clear the Gephi workspace and then re-write the graph. To manually empty the
workspace do `:> clear`.
Now that the graph is visualized in Gephi, it is possible to link:https://gephi.github.io/users/tutorial-layouts/[apply a layout algorithm],
change the size and/or color of vertices and edges, and display labels/properties of interest. Further information
can be found in Gephi's tutorial on link:https://gephi.github.io/users/tutorial-visualization/[Visualization].
After applying the Fruchterman Reingold layout, increasing the node size, decreasing the edge scale, and displaying
the id, name, and weight attributes the graph looks as displayed in the right image above.
Visualization of a `Traversal` has a different approach as the visualization occurs as the `Traversal` is executing,
thus showing a real-time view of its execution. A `Traversal` must be "configured" to operate in this format and for
that it requires use of the `visualTraversal` option on the `config` function of the `:remote` command:
[gremlin-groovy,modern]
----
:remote config visualTraversal graph <1>
traversal = vg.V(2).in().out('knows').
has('age',gt(30)).outE('created').
has('weight',gt(0.5d)).inV();[] <2>
:> traversal <3>
----
<1> Configure a "visual traversal" from your "graph" - this must be a `Graph` instance. This command will create a
new `TraversalSource` called "vg" that must be used to visualize any spawned traversals in Gephi.
<2> Define the traversal to be visualized. Note that ending the line with `;[]` simply prevents iteration of
the traversal before it is submitted.
<3> Submit the `Traversal` to visualize to Gephi.
When the `:>` line is called, each step of the `Traversal` that produces or filters vertices generates events to
Gephi. The events update the color and size of the vertices at that step with `startRGBColor` and `startSize`
respectively. After the first step visualization, it sleeps for the configured `stepDelay` in milliseconds. On the
second step, it decays the configured `colorToFade` of all the previously visited vertices in prior steps, by
multiplying the current `colorToFade` value for each vertex with the `colorFadeRate`. Setting the `colorFadeRate`
value to `1.0` will prevent the color decay. The screenshots below show how the visualization evolves over the four
steps:
image::gephi-traversal.png[width=1200]
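The color decay is a repeated multiplication, which can be worked through with the default values from the configuration table below (`colorToFade` of `g`, a starting green value of `1.0`, and a `colorFadeRate` of `0.7`). The sketch assumes a vertex is visited once and not revisited in later steps; the `ColorFadeSketch` class is a hypothetical name and not part of the plugin.

```java
import java.util.Locale;

// Worked arithmetic for the color decay; hypothetical class, not plugin code.
class ColorFadeSketch {
    // Value of the faded channel after the given number of later steps,
    // assuming the vertex is not visited again in the meantime.
    static double fadedValue(double start, double colorFadeRate, int laterSteps) {
        return start * Math.pow(colorFadeRate, laterSteps);
    }

    public static void main(String[] args) {
        // Green channel of a vertex visited in the first of four steps.
        for (int laterSteps = 0; laterSteps <= 3; laterSteps++) {
            System.out.println(String.format(Locale.ROOT, "%.3f", fadedValue(1.0, 0.7, laterSteps)));
        }
        // prints 1.000, 0.700, 0.490, 0.343 on separate lines
    }
}
```

With a `colorFadeRate` of `1.0` the value stays at its starting level, which is why that setting disables the fade.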
To get a sense of how the visualization configuration parameters affect the output, see the example below:
[gremlin-groovy,modern]
----
:remote config startRGBColor [0.0,0.3,1.0]
:remote config colorToFade b
:remote config colorFadeRate 0.5
:> traversal
----
image::gephi-traversal-config.png[width=400]
The visualization configuration above starts with a blue color now (most recently visited), fading the blue color
(so that dark green remains on oldest visited), and fading the blue color more quickly so that the gradient from dark
green to blue across steps has higher contrast. The following table provides a more detailed description of the
Gephi plugin configuration parameters as accepted via the `:remote config` command:
[width="100%",cols="3,10,^2",options="header"]
|=========================================================
|Parameter |Description |Default
|workspace |The name of the workspace that your Graph Streaming server is started for. |workspace1
|host |The host on which the Graph Streaming server is running. |localhost
|port |The port number of the URL that the Graph Streaming server is listening on. |8080
|sizeDecrementRate |The rate at which the size of an element decreases on each step of the visualization. |0.33
|stepDelay |The amount of time in milliseconds to pause between step visualizations. |1000
|startRGBColor |A size 3 float array of RGB color values which define the starting color to update most recently visited nodes with. |[0.0,1.0,0.5]
|startSize |The size an element should be when it is most recently visited. |20
|colorToFade |A single char from the set `{r,g,b,R,G,B}` determining which color to fade for vertices visited in prior steps |g
|colorFadeRate |A float value in the range `(0.0,1.0]` which is multiplied against the current `colorToFade` value for prior vertices; a `1.0` value effectively turns off the color fading of prior step visited vertices |0.7
|visualTraversal |Creates a `TraversalSource` variable in the Console named `vg` which can be used for visualizing traversals. This configuration option takes two parameters. The first is required and is the name of the `Graph` instance variable that will generate the `TraversalSource`. The second parameter is the variable name that the `TraversalSource` should have when referenced in the Console. If left unspecified, this value defaults to `vg`. |vg
|=========================================================
NOTE: This plugin is typically only useful to the Gremlin Console and is enabled there by default.
The instructions above assume that the `Graph` instance being visualized is local to the Gremlin Console. It makes that
assumption because the Gephi plugin requires a locally held `Graph`. If the intent is to visualize a `Graph` instance
hosted in Gremlin Server or a TinkerPop-enabled graph that can only be connected to in a "remote" fashion, then it
is still possible to use the Gephi plugin, but the requirement for a locally held `Graph` remains the same. To use
the Gephi plugin in these situations simply use <<subgraph-step,subgraph()-step>> to extract the portion of the remote
graph that will be visualized. Use of that step will return a `TinkerGraph` instance to the Gremlin Console at which
point it can be used locally with the Gephi plugin. The following example demonstrates the general steps:
[source,text]
----
gremlin> :remote connect tinkerpop.server conf/remote-objects.yaml <1>
...
gremlin> :> g.E().hasLabel('knows').subgraph('subGraph').cap('subGraph') <2>
...
gremlin> graph = result[0].object <3>
...
----
<1> Be sure to connect with a serializer configured to return objects and not their `toString()` representation which
is discussed in more detail in the <<connecting-via-console, Connecting Via Console>> Section.
<2> Use the `:>` command to subgraph the remote graph as needed.
<3> The `TinkerGraph` of that previous traversal can be found in the `result` object and now that the `Graph` is local
to the Gremlin Console, it can be used with Gephi as shown in the prior instruction set.
[[graph-plugins]]
=== Graph Plugins
This section does not refer to a specific Gremlin Plugin, but to a class of them. Graph Plugins are typically created by
graph providers to make it easy to integrate their graph systems into Gremlin Console and Gremlin Server. As TinkerPop
provides two reference `Graph` implementations in <<tinkergraph-gremlin,TinkerGraph>> and <<neo4j-gremlin,Neo4j>>,
there is also one Gremlin Plugin for each of them.
The TinkerGraph plugin is installed and activated in the Gremlin Console by default and the sample configurations that
are supplied with the Gremlin Server distribution include the `TinkerGraphGremlinPlugin` as part of the default setup.
If using Neo4j, however, the plugin must be installed manually. Instructions for doing so can be found in the
<<neo4j-gremlin,Neo4j>> section.
[[hadoop-plugin]]
=== Hadoop Plugin
image:hadoop-logo-notext.png[width=100,float=left] The Hadoop Plugin installs as part of `hadoop-gremlin` and provides
a number of imports and utility functions to the environment within which it is used. Those classes and functions
provide the basis for supporting <<graphcomputer,OLAP based traversals>> with Gremlin. This plugin is defined in
greater detail in the <<hadoop-gremlin,Hadoop-Gremlin>> section.
[[server-plugin]]
=== Server Plugin
image:gremlin-server.png[width=200,float=left] xref:gremlin-server[Gremlin Server] remotely executes Gremlin scripts
that are submitted to it. The Server Plugin provides a way to submit scripts to Gremlin Server for remote
processing. Read more about the plugin and how it works in the Gremlin Server section on
<<connecting-via-console,Connecting via Console>>.
NOTE: This plugin is typically only useful to the Gremlin Console and is enabled there by default.
The Server Plugin for remoting with the Gremlin Console should not be confused with a plugin of similar name that is
used by the server. `GremlinServerGremlinPlugin` is typically only configured in Gremlin Server and provides a number
of imports that are required for writing <<starting-gremlin-server,initialization scripts>>.
[[spark-plugin]]
=== Spark Plugin
image:spark-logo.png[width=175,float=left] The Spark Plugin installs as part of `spark-gremlin` and provides
a number of imports and utility functions to the environment within which it is used. Those classes and functions
provide the basis for supporting <<graphcomputer,OLAP based traversals>> using link:http://spark.apache.org[Spark].
This plugin is defined in greater detail in the <<sparkgraphcomputer,SparkGraphComputer>> section and is typically
installed in conjunction with the <<hadoop-plugin,Hadoop-Plugin>>.
[[sugar-plugin]]
=== Sugar Plugin
image:gremlin-sugar.png[width=120,float=left] In previous versions of Gremlin-Groovy, there were numerous
link:http://en.wikipedia.org/wiki/Syntactic_sugar[syntactic sugars] that users could rely on to make their traversals
more succinct. Unfortunately, many of these conventions made use of link:http://docs.oracle.com/javase/tutorial/reflect/[Java reflection]
and thus were not performant. In TinkerPop, these conveniences have been removed so that the standard Gremlin-Groovy
syntax is both in line with Gremlin-Java8 syntax and always the most performant representation. However, for those
users who would like to use the previous syntactic sugars (as well as new ones), there is `SugarGremlinPlugin`
(a.k.a. Gremlin-Groovy-Sugar).
IMPORTANT: Load the Sugar Plugin in a Gremlin Console session before any manipulation of the respective TinkerPop
objects, because Groovy caches unavailable methods and properties.
[source,groovy]
----
gremlin> :plugin use tinkerpop.sugar
==>tinkerpop.sugar activated
----
TIP: When using Sugar in a Groovy class file, add `static { SugarLoader.load() }` to the head of the file. Note that
`SugarLoader.load()` will automatically call `GremlinLoader.load()`.
==== Graph Traversal Methods
If an unknown property is accessed on a `GraphTraversal` and `GraphTraversal` has a method with that name, then the
property access is treated as a call to that method. This enables the user to omit `( )` from the method name. However,
if the property does not match a `GraphTraversal` method, then it is treated as a call to `values(property)`.
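
The lookup rule above can be sketched in plain Java without any TinkerPop dependency. This is only an illustration of the decision being made and not the plugin's actual implementation: `SugarDispatchSketch`, `MiniTraversal` and `resolve` are hypothetical names, and `MiniTraversal` stands in for `GraphTraversal` with just two steps.

[source,java]
----
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;

public class SugarDispatchSketch {

    // Hypothetical stand-in for GraphTraversal with a couple of steps.
    public static class MiniTraversal {
        public String outE() { return "outE()"; }
        public String inE() { return "inE()"; }
    }

    // Names of the methods declared on the stand-in class, collected once.
    private static final Set<String> METHOD_NAMES = new HashSet<>();
    static {
        for (Method m : MiniTraversal.class.getDeclaredMethods()) {
            METHOD_NAMES.add(m.getName());
        }
    }

    // Returns the call that an unknown property access would be redirected to:
    // a method call if the name matches a method, otherwise values(property).
    static String resolve(String property) {
        if (METHOD_NAMES.contains(property)) {
            return property + "()";
        }
        return "values('" + property + "')";
    }

    public static void main(String[] args) {
        System.out.println(resolve("outE"));   // prints outE()
        System.out.println(resolve("weight")); // prints values('weight')
    }
}
----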
[gremlin-groovy,modern]
----
g.V <1>
g.V.name <2>
g.V.outE.weight <3>
----
<1> There is no need for the parentheses in `g.V()`.
<2> The traversal is interpreted as `g.V().values('name')`.
<3> A chain of zero-argument step calls with a property value call.
==== Range Queries
The `[x]` and `[x..y]` range operators in Groovy translate to `RangeStep` calls.
[gremlin-groovy,modern]
----
g.V[0..2]
g.V[0..<2]
g.V[2]
----
==== Logical Operators
The `&` and `|` operators are overloaded in `SugarGremlinPlugin`. When used, they introduce the `AndStep` and `OrStep`
markers into the traversal. See <<and-step,`and()`>> and <<or-step,`or()`>> for more information.
[gremlin-groovy,modern]
----
g.V.where(outE('knows') & outE('created')).name <1>
t = g.V.where(outE('knows') | inE('created')).name; null <2>
t.toString()
t
t.toString()
----
<1> Introducing the `AndStep` with the `&` operator.
<2> Introducing the `OrStep` with the `|` operator.
==== Traverser Methods
It is rare that a user will ever interact with a `Traverser` directly. However, for those who do, some method redirects
exist to make it easier.
[gremlin-groovy,modern]
----
g.V().map{it.get().value('name')} // conventional
g.V.map{it.name} // sugar
----
[[utilities-plugin]]
=== Utilities Plugin
The Utilities Plugin provides various functions, helper methods and imports of external classes that are useful in
the console.
NOTE: The Utilities Plugin is enabled in the Gremlin Console by default.
[[describe-graph]]
==== Describe Graph
A good implementation of the Gremlin APIs will validate its features against the
link:../dev/provider/#validating-with-gremlin-test[Gremlin test suite]. To learn more about a specific
implementation's compliance with the test suite, use the `describeGraph` function. The following shows the output
for `HadoopGraph`:
[gremlin-groovy,modern]
----
describeGraph(HadoopGraph)
----