docs/src/dev/future/proposal-declarative-match-step-9.asciidoc - tinkerpop - Git at Google

 ////
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 ////

 == *Specification for the declarative `match()` step*

 This document outlines the specification for a new, declarative
 `match()` step in Gremlin. This step is designed to replace the existing
 imperative `match()` step, which has proven difficult for providers to
 optimize and for users to utilize effectively. The new step leverages
 familiar declarative graph pattern matching syntax to provide a more
 powerful, intuitive, and optimizable query experience.

 === 1. Motivation

 The current `match()` step in Gremlin is imperative, which requires
 graph providers to translate complex traversal logic into an optimizable
 form, a notoriously difficult task. Consequently, its adoption is low,
 and its performance is often suboptimal.

 The proposed solution introduces a new `match()` step that accepts a
 query string from a standard declarative graph query language. This
 allows the underlying database to use its native query planner and
 optimizer to execute the pattern match efficiently, while Gremlin
 retains its role in composing the declarative query with broader
 imperative traversals.

 '''''

 === 2. Core Concept

 The `match()` step introduces a declarative pattern matching clause into
 a Gremlin traversal. The variables bound within the pattern are not
 returned directly but are added to the traversal’s path history, making
 them accessible to subsequent steps like `select()`.

 The `match()` step can be used as both a start step on a
 `GraphTraversalSource` and as a mid-traversal step.

 === 3. Declarative Language

 The `match()` step is language-agnostic by design, but it will
 standardize on a default language to ensure portability.

 * *Default Language:* A restricted, read-only subset of *GQL* will be
 the default language. This subset will primarily support `MATCH` and
 `WHERE` clauses. The `RETURN` clause will *not* be supported in the
 default implementation (see Section 6). A provider implementing the
 default GQL does not need to be specified via a modulator.
 ** Example:
 `g.match("MATCH (p:Person WHERE p.name = 'Stephen')-[:knows]->(friend)")`
 * *Provider-Specific Languages:* Providers may support other declarative
 languages (e.g., Cypher, GSQL, SQL++) via the `queryLanguage` modulator.
 ** Example: `g.match("...").with("queryLanguage", "GSQL")`

 To aid vendors, the TinkerPop project should consider providing a
 reference ANTLR4 grammar for the default GQL dialect.

 '''''

 === 4. Parameterization

 Parameterized queries are supported to prevent query injection and improve performance by enabling query
 plan caching. If such exist, parameters are
 supplied using the `Map` argument as part of the match query step.

 * *Example:*
 +
 [source,groovy]
 ----
 g.match("MATCH (p:Person WHERE p.name = $personName)", Map.of("personName", "Stephen"))
 ----

 If parameters are not explicitly provided via
 `Map`, an implicit lookup on remote server bindings may be performed.

 '''''

 === 5. Execution Semantics

 The `match()` step behaves similarly to the `V()` and `E()` steps.

 * *Start Step:* When used as a start step (`g.match(...)`), it executes
 the pattern match against the entire graph.
 * *Mid-Traversal Step:* When used mid-traversal
 (`g.V(1).out().match(...)`), it *does not* operate on the incoming
 traversers. Instead, like `V()`, it ``resets'' the traversal and
 executes its pattern match against the *entire graph for each input
 parameter*. The incoming traversers are disregarded.

 The step is restricted to executing *idempotent (read-only)* queries.

 '''''

 === 6. Return Value and Data Access

 A key aspect of this new design is its seamless integration with
 Gremlin’s existing projection and path-management mechanisms.

 * *`match()` Step Return Value:* The `match()` step itself does not emit
 any elements. It produces an `Optional.empty()`.
 * *Accessing Matched Data:* All variables bound in the `MATCH` clause
 (e.g., `(p)`, `(friend)` in the examples) are automatically bound to the
 traversal’s path history. This data is then accessed using the
 `select()` step. This design eliminates the need for a `RETURN` clause
 and allows the user to continue chaining Gremlin steps naturally.

 ==== *Example Flow*

 Consider a query to find a person and then, using Gremlin, calculate the
 sum of the weights of their outgoing `:knows` edges.

 *Query:*

 [source,groovy]
 ----
 g.match("MATCH (n:Person {name:'Cole'})-[e:knows]->(friend)")
  .select("e")
  .values("weight")
  .sum()
 ----

 *Execution Breakdown:*

 [arabic]
 . `match("...")`: Finds all patterns matching a person named `Cole' with
 an outgoing `knows` edge. For each match, it binds the vertex `n`, the
 edge `e`, and the vertex `friend` to the path history. It returns
 `Optional.empty()`.
 . `select("e")`: It projects the element associated with the label `e`.
 The traversal stream now contains the `knows` edges.
 . `values("weight").sum()`: Standard Gremlin steps that operate on the
 stream of edges to calculate the sum of their `weight` properties.

 This approach is elegant, highly composable, and allows providers to
 optimize the declarative query by analyzing subsequent `select()` steps
 to determine which variables are actually needed.

 === 6. Deprecation of Existing `match()` Step

 The existing, imperative `match()` step will be marked as *deprecated*
 upon the introduction of this new specification and will be removed in a
 future major release of TinkerPop.
	////
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	////

	== Specification for the declarative `match()` step

	This document outlines the specification for a new, declarative
	`match()` step in Gremlin. This step is designed to replace the existing
	imperative `match()` step, which has proven difficult for providers to
	optimize and for users to utilize effectively. The new step leverages
	familiar declarative graph pattern matching syntax to provide a more
	powerful, intuitive, and optimizable query experience.

	=== 1. Motivation

	The current `match()` step in Gremlin is imperative, which requires
	graph providers to translate complex traversal logic into an optimizable
	form, a notoriously difficult task. Consequently, its adoption is low,
	and its performance is often suboptimal.

	The proposed solution introduces a new `match()` step that accepts a
	query string from a standard declarative graph query language. This
	allows the underlying database to use its native query planner and
	optimizer to execute the pattern match efficiently, while Gremlin
	retains its role in composing the declarative query with broader
	imperative traversals.

	'''''

	=== 2. Core Concept

	The `match()` step introduces a declarative pattern matching clause into
	a Gremlin traversal. The variables bound within the pattern are not
	returned directly but are added to the traversal’s path history, making
	them accessible to subsequent steps like `select()`.

	The `match()` step can be used as both a start step on a
	`GraphTraversalSource` and as a mid-traversal step.

	=== 3. Declarative Language

	The `match()` step is language-agnostic by design, but it will
	standardize on a default language to ensure portability.

	* Default Language: A restricted, read-only subset of GQL will be
	the default language. This subset will primarily support `MATCH` and
	`WHERE` clauses. The `RETURN` clause will not be supported in the
	default implementation (see Section 6). A provider implementing the
	default GQL does not need to be specified via a modulator.
	** Example:
	`g.match("MATCH (p:Person WHERE p.name = 'Stephen')-[:knows]->(friend)")`
	* Provider-Specific Languages: Providers may support other declarative
	languages (e.g., Cypher, GSQL, SQL++) via the `queryLanguage` modulator.
	** Example: `g.match("...").with("queryLanguage", "GSQL")`

	To aid vendors, the TinkerPop project should consider providing a
	reference ANTLR4 grammar for the default GQL dialect.

	'''''

	=== 4. Parameterization

	Parameterized queries are supported to prevent query injection and improve performance by enabling query
	plan caching. If such exist, parameters are
	supplied using the `Map` argument as part of the match query step.

	* Example:
	+
	[source,groovy]
	----
	g.match("MATCH (p:Person WHERE p.name = $personName)", Map.of("personName", "Stephen"))
	----

	If parameters are not explicitly provided via
	`Map`, an implicit lookup on remote server bindings may be performed.

	'''''

	=== 5. Execution Semantics

	The `match()` step behaves similarly to the `V()` and `E()` steps.

	* Start Step: When used as a start step (`g.match(...)`), it executes
	the pattern match against the entire graph.
	* Mid-Traversal Step: When used mid-traversal
	(`g.V(1).out().match(...)`), it does not operate on the incoming
	traversers. Instead, like `V()`, it ``resets'' the traversal and
	executes its pattern match against the *entire graph for each input
	parameter*. The incoming traversers are disregarded.

	The step is restricted to executing idempotent (read-only) queries.

	'''''

	=== 6. Return Value and Data Access

	A key aspect of this new design is its seamless integration with
	Gremlin’s existing projection and path-management mechanisms.

	* `match()` Step Return Value: The `match()` step itself does not emit
	any elements. It produces an `Optional.empty()`.
	* Accessing Matched Data: All variables bound in the `MATCH` clause
	(e.g., `(p)`, `(friend)` in the examples) are automatically bound to the
	traversal’s path history. This data is then accessed using the
	`select()` step. This design eliminates the need for a `RETURN` clause
	and allows the user to continue chaining Gremlin steps naturally.

	==== Example Flow

	Consider a query to find a person and then, using Gremlin, calculate the
	sum of the weights of their outgoing `:knows` edges.

	Query:

	[source,groovy]
	----
	g.match("MATCH (n:Person {name:'Cole'})-[e:knows]->(friend)")
	.select("e")
	.values("weight")
	.sum()
	----

	Execution Breakdown:

	[arabic]
	. `match("...")`: Finds all patterns matching a person named `Cole' with
	an outgoing `knows` edge. For each match, it binds the vertex `n`, the
	edge `e`, and the vertex `friend` to the path history. It returns
	`Optional.empty()`.
	. `select("e")`: It projects the element associated with the label `e`.
	The traversal stream now contains the `knows` edges.
	. `values("weight").sum()`: Standard Gremlin steps that operate on the
	stream of edges to calculate the sum of their `weight` properties.

	This approach is elegant, highly composable, and allows providers to
	optimize the declarative query by analyzing subsequent `select()` steps
	to determine which variables are actually needed.

	=== 6. Deprecation of Existing `match()` Step

	The existing, imperative `match()` step will be marked as deprecated
	upon the introduction of this new specification and will be removed in a
	future major release of TinkerPop.