blob: 63393d46b52758e3f89264c99edbcba97bfb1231 [file] [log] [blame]
////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
[[pagination]]
== Pagination
image:gremlin-paging.png[float=left,width=330]In most database applications, it is oftentimes desirable to return
discrete blocks of data for a query rather than all of the data that the total results would contain. This approach to
returning data is referred to as "pagination" and typically involves a situation where the client executing the query
can specify the start position and end position (or the amount of data to return in lieu of the end position)
representing the block of data to return. In this way, one could return the first ten records of one hundred, then the
second ten records and so on, until potentially all one hundred were returned.
In Gremlin, a basic approach to paging would look something like the following:
[gremlin-groovy,modern]
----
g.V().hasLabel('person').fold() <1>
g.V().hasLabel('person').
fold().as('persons','count').
select('persons','count').
by(range(local, 0, 2)).
by(count(local)) <2>
g.V().hasLabel('person').
fold().as('persons','count').
select('persons','count').
by(range(local, 2, 4)).
by(count(local)) <3>
----
<1> Gets all the "person" vertices.
<2> Gets the first two "person" vertices and includes the total number of vertices so that the client knows how many
it has to page through.
<3> Gets the final two "person" vertices.
From a functional perspective, the above example shows a fairly standard paging model. Unfortunately, there is a
problem. To get the total number of vertices, the traversal must first `fold()` them, which iterates out
the traversal bringing them all into memory. If the number of "person" vertices is large, that step could lead to a
long running traversal and perhaps one that would simply run out of memory prior to completion. There is no shortcut
to getting a total count without doing a full iteration of the traversal. If the requirement for a total count is
removed then the traversals become more simple:
[gremlin-groovy,modern]
----
g.V().hasLabel('person').range(0,2)
g.V().hasLabel('person').range(2,4)
----
NOTE: The first traversal above could also be written as `g.V().hasLabel('person').limit(2)`.
In this case, there is no way to know the total count so the only way to know if the end of the results have been
reached is to count the results from each paged result to see if there's less than the number expected or simply zero
results. In that case, further requests for additional pages would be unnecessary. Of course, this approach is not
free of problems either. Most graph databases will not optimize the `range()`-step, meaning that the second traversal
will repeat the iteration of the first two vertices to get to the second set of two vertices. In other words, for the
second traversal, the graph will still read four vertices even though there was only a request for two.
The only way to completely avoid that problem is to re-use the same traversal instance:
[gremlin-groovy,modern]
----
t = g.V().hasLabel('person');[]
t.next(2)
t.next(2)
----
A further consideration relates to the order in which results are returned. TinkerPop does not guarantee that the
order of the items returned on the same traversal will be the same order each time the traversal is iterated.
TinkerPop only guarantees that it does not re-shuffle the order provided by the underlying graph database. This
guarantee has two implications:
1. Iteration order is dependent on the underlying graph database. Some graphs may guarantee ordering and some may not
and still some others may guarantee ordering but only under certain conditions. Consult the documentation of the
graph database for more information on this.
2. Use `order()`-step to make iteration order explicit if guarantees are required.