blob: ec4925a74f91fe33f73d21d9a7e09368e693300d [file] [log] [blame]
////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
image::apache-tinkerpop-logo.png[width=500,link="https://tinkerpop.apache.org"]
*x.y.z - Proposal 1*
= Equality, Equivalence, Comparability and Orderability Semantics
== Status
Existing Gremlin behavior documented here is being migrated to Provider Documentation. Tests for enforcement of this
existing behavior and suggestions for modifications are being scoped for 3.6.0. Open issues described in this document
are being set aside for further discussion.
== Motivation
How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound
like a trivial question at first, when looking under the surface many interesting questions arise, including aspects
around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly
different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around
comparability and ordering across different logical types, as well as around the identity of elements (such as vertex
and edge properties).
TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and
published for the different types of equality and comparison operations as of today. Rather, what equals what and how
values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used,
and hence may vary from context to context. We believe that a concise definition of these concepts is critical for
both TinkerPop users — who need to be able to reason about the outcome of their queries — as well as custom
implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop
should provide a complete and cohesive semantics for equality / comparison such that all Graph Providers can easily
ensure that their query processing approach aligns with the TinkerPop implementation. Helping users and
implementers alike, this will help increase the adoption of Gremlin as a query language.
This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle
equality / comparison in different contexts. Motivated by different examples of the status today, we formalize
different notions of equality and comparability and describe the contexts in which they apply. While the semantics
that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to
fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and
proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and
documentable.
=== Examples
Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined
behavior in TinkerPop today:
==== Underspecified or Undocumented Behavior
Consider an equality check such as
[source]
----
gremlin> g.V().has(id, 19)
----
Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an
ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value
19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting
rules apply, in other cases such as
[source]
----
gremlin> g.V().property(numericValue).dedup()
----
the two values above would always be treated as different entities.
==== The behavior that is inherently driven by Java:
Another example is equality over composite type.
[source]
----
gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
----
This query compares two `BulkSet` objects produced by cap-Step. But the comparison is Java dependent and we don’t have
a clear definition of how the comparison works for this kind of types. Same as `Map`, e.g.
[source]
----
gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
----
and even we have comparison over `Map.Entry` which is Java dependent type.
[source]
----
gremlin> g.V().group().unfold().order()
class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
----
==== Incompleteness Rendres Unexpected Results
A query which tries to determine the order across multiple types fails today.
[source]
----
// Properties have values of Integer and String.
gremlin> g.V().values("some property").order()
class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
// This query aims to order a heterogeneous result set
gremlin> g.V().union(out(), outE()).order()
class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
----
It would be more helpful for users to define the complete order across types and returns a result instead of throwing
an exception.
==== Inconsistencies Results
Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases
consistently at this moment.
[source]
----
gremlin> g.V("1").properties("key")
==>vp[key→0]
// NaN == 0 holds true for this equality check.
gremlin> g.V("1").has("key", Double.NaN)
==>v[1]
gremlin> g.V("1").properties()
==>vp[key→Infinity]
// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
// then the type casting fails. This is observed when using Java11.
gremlin> g.V("1").has("key", gt(0.0))
Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
----
In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered,
which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and
reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in
TinkerPop / Gremlin.
== Conceptualization of Equality and Comparison
In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the
formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we
now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs.
orderability, which constitute two flavors of these concepts tailored to their usage in different concepts. We
summarize and contrast these concepts in the following subsections; more technical details and discussion of edge
cases can be found in the technical appendix.
=== Proposed semantics
==== Equality vs. Equivalence
Equality defines when two values are considered equal in the context of database lookups and predicates, while
equivalence defines value collation semantics in the context of, for instance, deduplication. For instance,
equivalence over two values `a := Double.NaN` and `b:= Double.NaN` is true, but equality would (in our proposal) be
defined as false; the rational here (which is commonly found in query and programming languages) is that comparing two
"unknown" numbers — which is a frequent use case for NaN, cannot certainly be identified as equal in comparison, but it
typically makes sense to group them together in, for instance, aggregations.
Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is
always either `true` or `false` (in particular, it never returns nulltype` or throws an exception). The details on
equality and equivalence are sketched in the following two subsections, respectively.
===== Equality
* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns `true` for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
Example:
[code]
----
// equality over 2 ids
gremlin> g.V().has(id, "some id")
// equality over vertices
gremlin> g.V().as("v").out().out().where(eq("v"))
----
* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different
numerical type if they cast into the exactly same same value of the lowest common super type.
* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
* Equality checks always return `true` or `false`. They never result in `nulltype` output, undefined behavior, nor do
they ever throw an error.
===== Equivalence
* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
Example:
[code]
----
// deduplication needs equivalence over 2 property values
gremlin> g.V().dedup().by("name")
// grouping by equivalence over 2 property values
gremlin> g.V().group().by("age")
----
* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are
always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account).
* For Number,
** Because type promotion is not effective, if the types are different then two numbers are never equivalent
** NaN is not equal to NaN, but equivalent to each other
* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
* Like equality, equivalence checks always return `true` or `false`. They never result in `nulltype` output, undefined behavior, nor do they ever throw an error.
==== Comparability vs. Orderability
Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range
comparisons (rather than exact comparison). For the 2 values of the same type (except for NaN), comparability is
stronger than orderability in the sense that everything that every order between two values that holds `true` w.r.t.
comparability also holds `true` w.r.t. orderability, but not vice versa. Comparability is what is being used in range
predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is
complete within a given type, but returns `nulltype` if the two types are considered incomparable (e.g., an integer
cannot be compared to a string). Orderability fills these gaps, by providing a stable sort order over mixed type
results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will
never return `nulltype` or throw an exception.
More details on comparability and orderability are sketched in the following two subsections, respectively.
===== Comparability
* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
Example:
[code]
----
// comparison over 2 property values
gremlin> g.E().has("weight", gt(1))
----
* For numbers,
** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
* Comparison should not result in undefined behavior, but can return `nulltype` if and only if we are comparing
incomparable data types. How this `nulltype` result is handled is Graph provider dependent.
* Otherwise Comparison does return `true` or `false`
===== Orderability
* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
* Orderability must not result in `nulltype` / undefined behavior.
* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to
determine the order of those two. This inevitably leads to the requirement to define the order across different data
types. For the detailed order across types, see appendix.
* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
* The concept of equivalence is used to determine if the 2 values are at the same position
* When the position is identical, which value comes first (in other words, whether it should perform stable sort)
depends on graph providers' implementation.
* For values of the same type, comparability can be used to determine which comes first except for `NaN` in Number.
For a different type, we have a dedicated order as described in the section below.
===== Mapping table for TinkerPop operators
Shown as below is a table for which notion proposed above each TinkerPop construct used.
[%header]
|================
|Construct|Concept
|P.eq |Equality
|P.neq |Equality
|P.within |Equality
|P.without|Equality
|P.lt |Comparability
|P.gt |Comparability
|P.lte |Equality, Comparability
|P.gte |Equality, Comparability
|P.inside |Comparability
|P.outside|Comparability
|P.between|Equality, Comparability
|================
== What would change ?
=== Semantics
In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced
in this proposal. Therefore these semantics should be published in the official TinkerPop documentation.
=== Behavioral changes
==== Equality
* NaN +
JDK11 seems to produce a different error from JDK8 when it comes to `BigDecimal` comparisons that hit NaN and such. For
JDK8 they seem to produce `NumberFormatException` but for JDK11 you get stuff like:
[code]
----
gremlin> g.V().has("key", Float.NaN)
Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
----
When `Double` / `Float` is stored, it always throws. With the proposed change, it wouldn't throw but because `NaN` is
not equal to any numbers this returns empty result.
* BigDecimal +
Equality around `BigDecimal` and special values which cannot be parsed as `Integer` such as `NaN`, `INF` should not
produce exceptions and should filter.
[code]
----
gremlin> g.addV().property('key',Float.NaN)
==>v[0]
gremlin> g.addV().property('key',1.0f)
==>v[2]
gremlin> g.V().has('key',Float.NaN)
==>v[0]
gremlin> g.V().has('key',1.0f)
==>v[2]
gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
==>1.0
gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
java.lang.NumberFormatException
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
java.lang.NumberFormatException
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> g.V().has('key',1.0) // proposed
==>v[2]
gremlin> g.V().values("key").is(eq(1.0)) // proposed
==>1.0
----
==== Comparability
* NaN +
Comparing on `NaN` should return no results.
[code]
----
gremlin> g.addV().property('key',-5)
==>v[0]
gremlin> g.addV().property('key',0)
==>v[2]
gremlin> g.addV().property('key',5)
==>v[4]
gremlin> g.addV().property('key',Double.NaN)
==>v[6]
gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
==>-5
==>0
==>NaN
gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
==>0
==>5
==>NaN
gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
==>-5
gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
==>5
gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
==>NaN
gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
==>NaN
gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
----
* Comparability throws exception today but based on the proposal, it returns `nulltype` when comparing incompatible types.
** When `Vertex` / `Edge` / `VertexProperty` is compared, today it throws but it should return `nulltype`.
** When `nulltype` is compared, today it throws an exception but it should return `nulltype`.
==== Equivalence
TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
==== Orderability
* Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in
ordering. The proposed change is to be able to order any types.
[code]
----
gremlin> g.V().order(). // 3.5.x
org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
Type ':help' or ':h' for help.
Display stack trace? [yN]n
gremlin> g.V().order() // proposed
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
==>v[2]
==>marko
gremlin> g.addV().property("key", 100)
==>v[0]
gremlin> g.addV().property("key", "100000")
==>v[2]
gremlin> g.V().values('key').order() // 3.5.x
java.lang.Integer cannot be cast to java.lang.String
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> g.V().values('key').order() // proposed
==>100
==>100000
----
== Open Questions
* Should we take type-promotion into account in terms of equivalence ? +
[code]
----
// In this case below,
gremlin> g.V().property()
==>[key:1.0]
==>[key:1]
// which is more natural, whether we don't de-duplicate them
gremlin> g.V().property().dedup()
==>[key:1.0]
==>[key:1]
// or de-dup them
gremlin> g.V().property().dedup()
==>[key:1.0]
----
If de-duping, there is another question which value we should filter out. We need to define priority over types in `Number`.
Also note that TinkerPop is Java based and we have `Double.NaN` and `Float.NaN`, `±Double.INF` and `±Float.INF`. Not
adhering type casting means, for example, Double.NaN and Float.NaN is not de-duplicated / grouped according to the semantics.
* `Map.Entry` is Java dependent type. Instead of defining semantics for `Map.Entry`, do we introduce a concept of like
key-value tuple for it to generalize?
* Today we have `Date` type but don’t we need timezone aware `DateTime` type as well?
* Some graph providers may not support `BigDecimal`. Do we leave how TinkerPop deals with `BigDecimal` to Graph Providers?
* Which should be more reasonable, `nulltype` eq `nulltype` is true or false?
** If it is true, it may be respected in JOIN operation
* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do
these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that
enforce the behaviors that we’ve outlined.
== Technical Appendix
=== Types
First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based so as a
primitive type, we are using the following types:
* Boolean: true or false
* Integer
** int8
** int16 (short)
** int32
** int64 (long)
** uint8 (byte)
** uint16
** uint32
** uint64
** BigInt
* Float
** float32
** float64 (double)
*** In Double and Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
** BigFloat
* String / Char
* UUID
* Date
* `nulltype`
** It denotes the "undefined" value.
Graph providers may not support all of these types depending on the architecture and implementation. Therefore
TinkerPop must provide a way for Graph providers to override the behavior while it has its own default behavior. Also
when some types are not supported Graph providers needs to map unsupported types into supported types internally. This
mapping can be done in either information-preserving manner or non-preserving manner. Graph providers must tell which
mapping they support through `Graph.Features` as well as which types they support.
* Which atomic types are supported
** Boolean, Integer, Float, String, UUID and Date
** TinkerPop by default supports all of them
* Which integer types are supported
** int8, int16, int32, int64, uint8, uint16, uint32, uint64, BigInt
** TinkerPop by default supports uint8 (byte), int16 (short), int32 (int), int64 (long) and BigInt
* Which float types are supported
** float32, float64 and BigFloat
** TinkerPop by defualt supports all as float, double, and BigDecimal in Java
In addition to these, there are composite types as follows:
* Vertex
* Edge
* VertexProperty
* Property
** Edge property
** Vertex meta property
* PropertyKey
* PropertyValue
* Label
* ID
* Path
* List
* Map
* Set / BulkSet
* Map.Entry (obtained from unfolding a Map)
=== Type Casting
We do type casting a.k.a type promotion for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger,
and BigDecimal. Here is the rule how types are promoted:
* If at least one is BigDecimal then compare as BigDecimal
* If at least one is BigInteger then compare as BigInteger
* If at least one is Double then compare as Double
* If one of them is a Float, then convert both to floating type of highest common bit denomination
** If another value is Long or Double, we need 64bit so convert both to Double
** Otherwise convert both to Float
* If at least one is Long then compare as Long
* If at least one is Integer then compare as Integer
* If at least one is Short then compare as Short
* If at least one is Byte then compare as Byte
BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type
casting for these 2 types can vary depending on a Graph provider.
=== Equality
==== Primitive types
===== Number
Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
* If either one of LHS or RHS is Number and another isn't, eq returns `false`.
* If both LHS and RHS are Number, it follows the type casting described above and then check the equality.
* Example for edge cases:
** -0.0 eq 0.0 = `true`
** +0.0 eq 0.0 = `true`
** -0.0 eq +0.0 = `true`
** NaN eq NaN = `false`
** +INF eq +INF = `true`
** -INF eq -INF = `true`
** -INF eq +INF = `false`
* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere the type promotion.
===== Boolean
* If either one of LHS or RHS is Boolean and another isn't, return `false`
* True != False, True == True, False == False
===== String
* If either one of LHS or RHS is String and another isn't, return `false`
* We assume the common graphical order over unicode strings.
* LHS and RHS needs to be lexicographically equal for LHS eq RHS == `true` for String.
===== UUID
* UUID is evaluated based on its String representation.
* However, for example, UUID("b46d37e9-755c-477e-9ab6-44aabea51d50") and String "b46d37e9-755c-477e-9ab6-44aabea51d50" are not equal to each other.
===== Date
* If either one of LHS or RHS is Date and another isn't, return `false`
* LHS eq RHS == `true` when both LHS and RHS value are numerically identical in Unix Epoch time.
===== `nulltype`
* If either one of LHS or RHS is `nulltype` and another isn't, return `false`
* If both LHS and RHS are `nulltype`, return `true`
==== Composite types
For all of them, if LHS and RHS is not of the same data type, equality returns `false`. The following semantics applied when both LHS and RHS has the data type.
===== Vertex / Edge / VertexProperty
They are considered as Element family in TinkerPop and if 2 elements have the same type and have the same ID, they are considered as equal.
===== Property
If key and value are same, 2 properties are equal.
===== PropertyKey
key is String type so Equality for String type applies.
===== PropertyValue
Any type, so Equality for a corresponding type applies.
===== ID
Any type, so Equality for a corresponding type applies.
===== Label
label is String type so Equality for String type applies.
===== Path
2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal.
===== List
* Two lists are equal if they contain the same (equal to each other) elements in the same order.
===== Map
* Two maps are equal when a Set of key-value pairs from those 2 maps are equal to each other. A key-value pair is equal to another pair if and only if both its key and value are equal to each other.
===== Set
* Two sets are equal if they contain the same (equal to each other) elements.
=== Equivalence
Equivalence is identical to Equality, except for the cases listed below.
==== Primitive types
===== Number
* Unlike Equality, we *don't do* type casting for Equivalence.
** If the type is different, they are not equivalent.
*** +INF^^double is not equivalent to +INF^^float
*** NaN^^double is not equivalent to NaN^^float
** 123 and 123.0 are equal but not equivalent to each other
* -0.0, 0.0, and +0.0 are not equivalent to each other
** -0.0 is equivalent to -0.0
** 0.0 is equivalent to 0.0
** +0.0 is equivalent to +0.0
* -INF and +INF are not equivalent to each other
** -INF is equivalent to -INF
** +INF is equivalent to +INF
** They are equialavlent to each other irrespective to its underlying type, so in Java, for example, Double.POSITIVE_INFINITY is equivalent to Float.POSITIVE_INFINITY.
* NaN is not equivalent to any other numbers
** NaN *is equivalent to* NaN irrespective to its underlying type, so in Java, for example, Double.NaN is equivalent to Float.NaN.
===== `nulltype`
* `nulltype` is not equivalent to any other values
* `nulltype` is equivalent to `nulltype`
=== Comparability
==== Primitive types
===== Number
* If either one of LHS or RHS is Numbers and another isn’t, throw an Exception. This comes first before the handling for each type.
* If both LHS and RHS are Numbers, try the type casting, and then compare 2 values.
* For -0.0, 0.0, +0.0, lt and gt returns `false` and lte, gte returns `true` because they are "equal" in this semantics.
* -INF < +INF
* Any comparison between NaN and any numbers (including NaN) should return `false` +
https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3
* IF `nulltype` and NaN is compared it should return `nulltype` as their "type" is different and they are not comparable.
===== Boolean
* If either one of LHS or RHS is Boolean and another isn’t, throws an Exception
* False < True
===== String
* If either one of LHS or RHS is String and another isn’t, returns `nulltype`.
* We assume the common lexicographical order over unicode strings
* LHS and RHS are compared lexicographically
* UUID is evaluated based on its String representation.
===== UUID
* UUID is evaluated based on its String representation.
* However, for example, UUID("b46d37e9-755c-477e-9ab6-44aabea51d50") and String "b46d37e9-755c-477e-9ab6-44aabea51d50" cannot be compared with each other, hence comparing them returns `nulltype`.
===== Date
* If either one of LHS or RHS is Date and another isn’t, throw an Exception
* Compare LHS and RHS based on chronological order, i.e. numerical order in timestamp.
===== `nulltype`
* `nulltype` is not comparable, if the LHS or RHS is `nulltype` then the comparison result is `nulltype`.
==== Composite types
For all of them, if LHS and RHS is not of the same data type, equality returns `false`. The following semantics applied when both LHS and RHS has the data type.
===== Vertex / Edge / VertexProperty
They are not comparable, return `nulltype`.
===== Property
It it not comparable, return `nulltype`.
===== PropertyKey
Comparability of String applies.
===== PropertyValue
Property values are of any primitive types defined, so Comparability for a corresponding type applies.
===== ID
IDs are of any primitive types defined, so Comparability for a corresponding type applies.
===== Label
Comparability of String applies.
===== Path
It it not comparable, throw an Exception.
===== List
It it not comparable, throw an Exception.
===== Map
It it not comparable, throw an Exception.
===== Map.Entry
It it not comparable, throw an Exception.
===== Set
It it not comparable, throw an Exception.
=== Orderability
To sort across any types of values, we define the order between each type as follows:
(In this order, ID, label, property key and property value are considered as a part of primitive types)
* `nulltype`
* Boolean
* Number
* Date
* String
* Vertex
* Edge
* VertexProperty
* Property
* Path
* List
* Map
==== Primitive types
===== Number
* Same applies as Comparability. Exceptions are as below:
** NaN is ordered at a larger index among all Numbers. i.e. after +INF.
* We do type promotion for orderability as we do for comparability.
===== Boolean
* False < True
===== String
* String value is ordered lexicographically
===== UUID
* UUID is ordered lexicographically based on its String representation
===== Date
* Date value is ordered chronologically
===== `nulltype`
* `nulltype` is before all value types
==== Composite types
===== Vertex / Edge / VertexProperty
They are ordered by their ID. The ID is chosen internally by the implementation, so ordering is implementation specific, but is guaranteed to be stable.
===== Property
They are ordered by property key. If the key is equal, then property value is used as the 2nd key.
===== PropertyKey
Comparability of String applies.
===== PropertyValue
Property values are of any primitive types defined, so orderability for a corresponding type applies.
===== ID
IDs are of any primitive types defined, so orderability for a corresponding type applies.
===== Label
Comparability of String applies.
===== Path
* Orderability of the 1st element in the Path applies. Empty Path should come first.
* If the 1st element is tie, then check the next element, and so on.
* If one Path exhausts the element fast then it comes earlier in the order.
===== List
* Orderability of the 1st element in the List applies.
* Empty List should come first.
* If the 1st element is tie, then check the next element, and so on.
* If one List exhausts the element fast then it comes earlier in the order.
===== Map
* For 2 maps, get the 1st entry (a key-value pair) from both, the orderability between them decides the order of the maps.
* If the 1st entry is tie, then we pick the 2nd one and repeat the process until we determine the order.
** So the orderability of Map depends on in which order they return an entry. It is implementation dependent and undefined in this semantics.
* If one Map exhausts an entry earlier than another, then it comes earlier in the order.
===== Map.Entry
* First check the orderability of their key.
* If the key ties, then check the orderability of their value.
===== Set
* For 2 sets, get the 1st item from both, the orderbaility between them decides the order of the sets.
* If the 1st item is tie, we pick the 2nd one and so on until we determine the order.
** So the orderability of Set depends on in which order they return an item. It is implementation dependent and undefined in this semantics.
* If one Set exhausts an item earlier than another, then it comes earlier in the order.