| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| [[pagination]] |
| == Pagination |
| |
| image:gremlin-paging.png[float=left,width=330]In most database applications, it is oftentimes desirable to return |
| discrete blocks of data for a query rather than all of the data that the total results would contain. This approach to |
| returning data is referred to as "pagination" and typically involves a situation where the client executing the query |
| can specify the start position and end position (or the amount of data to return in lieu of the end position) |
| representing the block of data to return. In this way, one could return the first ten records of one hundred, then the |
| second ten records and so on, until potentially all one hundred were returned. |
| |
| In Gremlin, a basic approach to paging would look something like the following: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().hasLabel('person').fold() <1> |
| g.V().hasLabel('person'). |
| fold().as('persons','count'). |
| select('persons','count'). |
| by(range(local, 0, 2)). |
| by(count(local)) <2> |
| g.V().hasLabel('person'). |
| fold().as('persons','count'). |
| select('persons','count'). |
| by(range(local, 2, 4)). |
| by(count(local)) <3> |
| ---- |
| |
| <1> Gets all the "person" vertices. |
| <2> Gets the first two "person" vertices and includes the total number of vertices so that the client knows how many |
| it has to page through. |
| <3> Gets the final two "person" vertices. |
| |
| From a functional perspective, the above example shows a fairly standard paging model. Unfortunately, there is a |
| problem. To get the total number of vertices, the traversal must first `fold()` them, which iterates out |
| the traversal bringing them all into memory. If the number of "person" vertices is large, that step could lead to a |
| long running traversal and perhaps one that would simply run out of memory prior to completion. There is no shortcut |
| to getting a total count without doing a full iteration of the traversal. If the requirement for a total count is |
| removed then the traversals become more simple: |
| |
| [gremlin-groovy,modern] |
| ---- |
| g.V().hasLabel('person').range(0,2) |
| g.V().hasLabel('person').range(2,4) |
| ---- |
| |
| NOTE: The first traversal above could also be written as `g.V().hasLabel('person').limit(2)`. |
| |
| In this case, there is no way to know the total count so the only way to know if the end of the results have been |
| reached is to count the results from each paged result to see if there's less than the number expected or simply zero |
| results. In that case, further requests for additional pages would be unnecessary. Of course, this approach is not |
| free of problems either. Most graph databases will not optimize the `range()`-step, meaning that the second traversal |
| will repeat the iteration of the first two vertices to get to the second set of two vertices. In other words, for the |
| second traversal, the graph will still read four vertices even though there was only a request for two. |
| |
| The only way to completely avoid that problem is to re-use the same traversal instance: |
| |
| [gremlin-groovy,modern] |
| ---- |
| t = g.V().hasLabel('person');[] |
| t.next(2) |
| t.next(2) |
| ---- |
| |
| A further consideration relates to the order in which results are returned. TinkerPop does not guarantee that the |
| order of the items returned on the same traversal will be the same order each time the traversal is iterated. |
| TinkerPop only guarantees that it does not re-shuffle the order provided by the underlying graph database. This |
| guarantee has two implications: |
| |
| 1. Iteration order is dependent on the underlying graph database. Some graphs may guarantee ordering and some may not |
| and still some others may guarantee ordering but only under certain conditions. Consult the documentation of the |
| graph database for more information on this. |
| 2. Use `order()`-step to make iteration order explicit if guarantees are required. |