| ## Paging |
| |
| When a query returns many rows, it would be inefficient to return them |
| as a single response message. Instead, the driver breaks the results |
| into *pages* which get returned as they are needed. |
| |
| ### Setting the fetch size |
| |
| The *fetch size* specifies how many rows will be returned at once by |
| Cassandra (in other words, it's the size of each page). |
| |
| You can set a default fetch size globally for a `Cluster` instance: |
| |
| ```java |
| // At initialization: |
| Cluster cluster = Cluster.builder() |
|     .addContactPoint("127.0.0.1") |
|     .withQueryOptions(new QueryOptions().setFetchSize(2000)) |
|     .build(); |
| |
| // Or at runtime: |
| cluster.getConfiguration().getQueryOptions().setFetchSize(2000); |
| ``` |
| |
| The fetch size can also be set on a statement: |
| |
| ```java |
| Statement statement = new SimpleStatement("your query"); |
| statement.setFetchSize(2000); |
| ``` |
| |
| If the fetch size is set on a statement, it will take precedence; |
| otherwise, the cluster-wide value (which defaults to 5000) will be used. |
| |
| Note that setting a fetch size doesn't guarantee that Cassandra will |
| always return exactly that number of rows: a page may contain slightly |
| more or fewer rows. |
| |
| ### Result set iteration |
| |
| The fetch size limits the number of results that are returned in one |
| page; if you iterate past that, the driver will run background queries |
| to fetch subsequent pages. Here's an example with a fetch size of 20: |
| |
| ```ditaa |
|     client        Session                          Cassandra  |
|     --+--------------+---------------------------------+----- |
|       |execute(query)|                                 |      |
|       |------------->|                                 |      |
|       |              | query rows 1 to 20              |      |
|       |              |-------------------------------->|      |
|       |              |                                 |      |
|       |              |create                           |      |
|       |              |------>ResultSet                 |      |
|       |                          |                     |      |
| +-----+--------+-----------------+-+                   |      |
| |For i in 1..20|                 | |                   |      |
| +--------------+                 | |                   |      |
| |     |        get next row      | |                   |      |
| |     |------------------------->| |                   |      |
| |     |          row i           | |                   |      |
| |     |<-------------------------| |                   |      |
| |     |                          | |                   |      |
| +-----+--------------------------+-+                   |      |
|       |                          |                     |      |
|       |                          |                     |      |
|       |        get next row      |                     |      |
|       |------------------------->|                     |      |
|       |                          | query rows 21 to 40 |      |
|       |                          |-------------------->|      |
|       |          row 21          |                     |      |
|       |<-------------------------|                     |      |
| ``` |
| |
| By default, the background fetch happens at the last moment, when there |
| are no more "local" rows available. If you need finer control, the |
| [ResultSet][result_set] interface provides the following methods: |
| |
| * `getAvailableWithoutFetching()` and `isFullyFetched()` to check the |
| current state; |
| * `fetchMoreResults()` to force a page fetch. |
| |
| Here's how you could use these methods to pre-fetch the next page in |
| advance, in order to avoid the performance hit at the end of each page: |
| |
| ```java |
| ResultSet rs = session.execute("your query"); |
| for (Row row : rs) { |
|     if (rs.getAvailableWithoutFetching() == 100 && !rs.isFullyFetched()) |
|         rs.fetchMoreResults(); // this is asynchronous |
|     // Process the row ... |
|     System.out.println(row); |
| } |
| ``` |
| |
| If you use paging with the async API, you'll also want to use those |
| methods to avoid triggering synchronous fetches unintentionally; see |
| [async paging](../async/#async-paging). |
| |
| |
| ### Saving and reusing the paging state |
| |
| Sometimes it is convenient to save the paging state in order to restore |
| it later. For example, consider a stateless web service that displays a |
| list of results with a link to the next page. When the user clicks that |
| link, we want to run the exact same query, except that the iteration |
| should start where we stopped on the previous page. |
| |
| To do so, the driver exposes a [PagingState][paging_state] object that represents |
| where we were in the result set when the last page was fetched: |
| |
| ```java |
| ResultSet resultSet = session.execute("your query"); |
| // iterate the result set... |
| PagingState pagingState = resultSet.getExecutionInfo().getPagingState(); |
| ``` |
| |
| This object can be serialized to a `String` or a byte array: |
| |
| ```java |
| String string = pagingState.toString(); |
| byte[] bytes = pagingState.toBytes(); |
| ``` |
| |
| This serialized form can be saved in some form of persistent storage to |
| be reused later. In our web service example, we would probably save the |
| string version as a query parameter in the URL to the next page |
| (`http://myservice.com/results?page=<...>`). When that value is |
| retrieved later, we can deserialize it and reinject it in a statement: |
| |
| ```java |
| PagingState pagingState = PagingState.fromString(string); |
| Statement st = new SimpleStatement("your query"); |
| st.setPagingState(pagingState); |
| ResultSet rs = session.execute(st); |
| ``` |
| |
| Note that the paging state can only be reused with the exact same |
| statement (same query string, same parameters). Also, it is an opaque |
| value that is only meant to be collected, stored and reused. If you try |
| to modify its contents or reuse it with a different statement, the |
| driver will raise an error. |
| |
| Putting it all together, here's a more comprehensive example |
| implementation for our web service: |
| |
| ```java |
| final int RESULTS_PER_PAGE = 100; |
| |
| Statement st = new SimpleStatement("your query"); |
| st.setFetchSize(RESULTS_PER_PAGE); |
| |
| String requestedPage = extractPagingStateStringFromURL(); |
| // This will be absent for the first page |
| if (requestedPage != null) { |
|     st.setPagingState( |
|         PagingState.fromString(requestedPage)); |
| } |
| |
| ResultSet rs = session.execute(st); |
| PagingState nextPage = rs.getExecutionInfo().getPagingState(); |
| |
| // Note that we don't rely on RESULTS_PER_PAGE, since Cassandra might |
| // not have respected it, or we might be at the end of the result set |
| int remaining = rs.getAvailableWithoutFetching(); |
| for (Row row : rs) { |
|     renderInResponse(row); |
|     if (--remaining == 0) { |
|         break; |
|     } |
| } |
| |
| // This will be null if there are no more pages |
| if (nextPage != null) { |
|     renderNextPageLink(nextPage.toString()); |
| } |
| ``` |
| |
| [result_set]: http://docs.datastax.com/en/drivers/java/3.2/com/datastax/driver/core/ResultSet.html |
| [paging_state]: http://docs.datastax.com/en/drivers/java/3.2/com/datastax/driver/core/PagingState.html |
| |
| |
| Due to internal implementation details, `PagingState` instances are not |
| portable across [native protocol](../native_protocol/) versions. This |
| could become a problem in the following scenario: |
| |
| * you're using the driver 2.0.x and Cassandra 2.0.x, and therefore |
| native protocol v2; |
| * a user bookmarks a link to your web service that contains a serialized |
| paging state; |
| * you upgrade your server stack to use the driver 2.1.x and Cassandra |
| 2.1.x, so you're now using protocol v3; |
| * the user tries to reload their bookmark, but the paging state was |
| serialized with protocol v2, so trying to reuse it will fail. |
| |
| If this is not acceptable for you, you might want to consider the unsafe |
| API described in the next section. |
| |
| #### Unsafe API |
| |
| As an alternative to the standard API, there are two methods that |
| manipulate a raw `byte[]` instead of a `PagingState` object: |
| |
| * [ExecutionInfo#getPagingStateUnsafe()][gpsu] |
| * [Statement#setPagingStateUnsafe(byte[])][spsu] |
| |
| These low-level methods perform no validation on their arguments; |
| therefore nothing protects you from reusing a paging state that was |
| generated from a different statement, or altered in any way. This could |
| result in sending a corrupt paging state to Cassandra, with |
| unpredictable consequences (ranging from wrong results to a query |
| failure). |
| |
| There are two situations where you might want to use the unsafe API: |
| |
| * you never expose the paging state to end users and you are confident |
| that it won't get altered; |
| * you want portability across protocol versions and/or you prefer |
| implementing your own validation logic (for example, signing the raw |
| state with a private key). |
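
The signing idea in the last bullet could look like the following minimal sketch. It relies only on the JDK and uses an HMAC-SHA256 with a shared secret instead of an asymmetric private key, which is a substitution made here for brevity. The `PagingStateSigner` class, its methods and the `<state>.<signature>` token layout are hypothetical and not part of the driver. The raw bytes would come from `getPagingStateUnsafe()`, and the verified bytes would be passed back to `setPagingStateUnsafe(byte[])`.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;

/** Hypothetical helper (not part of the driver): signs and verifies raw paging states. */
final class PagingStateSigner {
    private static final String ALGORITHM = "HmacSHA256";
    private final SecretKeySpec key;

    PagingStateSigner(byte[] secret) {
        this.key = new SecretKeySpec(secret, ALGORITHM);
    }

    private byte[] hmac(byte[] data) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(key);
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 is not available", e);
        }
    }

    /** Returns a URL-safe token of the form "<state>.<signature>". */
    String sign(byte[] rawPagingState) {
        Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();
        return encoder.encodeToString(rawPagingState)
            + "." + encoder.encodeToString(hmac(rawPagingState));
    }

    /** Returns the raw paging state, or throws if the token is malformed or was altered. */
    byte[] verify(String token) {
        String[] parts = token.split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed token");
        }
        Base64.Decoder decoder = Base64.getUrlDecoder();
        byte[] state = decoder.decode(parts[0]);
        byte[] signature = decoder.decode(parts[1]);
        // MessageDigest.isEqual compares in constant time
        if (!MessageDigest.isEqual(hmac(state), signature)) {
            throw new IllegalArgumentException("Invalid signature");
        }
        return state;
    }
}
```

This only detects tampering. To also reject a paging state reused with a different statement, one option would be to include the query string and parameters in the signed data.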
| |
| [gpsu]: http://www.datastax.com/drivers/java/3.2/com/datastax/driver/core/ExecutionInfo.html#getPagingStateUnsafe-- |
| [spsu]: http://www.datastax.com/drivers/java/3.2/com/datastax/driver/core/Statement.html#setPagingStateUnsafe-byte:A- |
| |
| ### Offset queries |
| |
| Saving the paging state works well when you only let the user move from |
| one page to the next. But it doesn't allow random jumps (like "go |
| directly to page 10"), because you can't fetch a page unless you have |
| the paging state of the previous one. Such a feature would require |
| *offset queries*, but they are not natively supported by Cassandra (see |
| [CASSANDRA-6511](https://issues.apache.org/jira/browse/CASSANDRA-6511)). |
| The rationale is that offset queries are inherently inefficient (the |
| performance will always be linear in the number of rows skipped), so the |
| Cassandra team doesn't want to encourage their use. |
| |
| If you really want offset queries, you can emulate them client-side. |
| You'll still get linear performance, but maybe that's acceptable for |
| your use case. For instance, if each page holds 10 rows and you show at |
| most 20 pages, reaching the last page means skipping the 19 previous |
| ones, so you'll fetch at most 190 extra rows, which doesn't sound like |
| a big deal. |
| |
| Here is how it works if the page size is 10, the fetch size is 50, and |
| the user asks for page 12 (rows 110 to 119): |
| |
| * execute the statement a first time (the result set contains rows 0 to |
| 49, but you're not going to use them, only the paging state); |
| * execute the statement a second time with the paging state from the |
| first query; |
| * execute the statement a third time with the paging state from the |
| second query. The result set now contains rows 100 to 149; |
| * skip the first 10 rows of the iterator. Read the next 10 rows and |
| discard the remaining ones. |
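
The arithmetic behind these steps can be sketched as follows. This is a simplified illustration that assumes every execution returns exactly `fetchSize` rows, which Cassandra does not guarantee; a robust implementation would count the rows actually returned. The `OffsetPlan` class and its members are hypothetical names, not driver API.

```java
/** Hypothetical helper: works out how to reach a page when emulating offset queries. */
final class OffsetPlan {
    /** Executions whose rows are ignored; only their paging state is used. */
    final int executionsToSkip;
    /** Rows to skip at the start of the result set that contains the requested page. */
    final int rowsToSkip;

    private OffsetPlan(int executionsToSkip, int rowsToSkip) {
        this.executionsToSkip = executionsToSkip;
        this.rowsToSkip = rowsToSkip;
    }

    /**
     * @param pageNumber 1-based page requested by the user
     * @param pageSize   number of rows displayed per page
     * @param fetchSize  fetch size set on the statement (assumed to be honored exactly)
     */
    static OffsetPlan forPage(int pageNumber, int pageSize, int fetchSize) {
        if (pageNumber < 1 || pageSize < 1 || fetchSize < 1) {
            throw new IllegalArgumentException("All arguments must be positive");
        }
        // 0-based index of the first row of the requested page
        long firstRow = (long) (pageNumber - 1) * pageSize;
        return new OffsetPlan((int) (firstRow / fetchSize), (int) (firstRow % fetchSize));
    }

    /** Rows read but not displayed when jumping to the last allowed page. */
    static long maxSkippedRows(int pageSize, int maxPages) {
        return (long) (maxPages - 1) * pageSize;
    }
}
```

For page 12 with a page size of 10 and a fetch size of 50, `OffsetPlan.forPage(12, 10, 50)` gives 2 executions to skip and 10 rows to skip in the third result set, which matches the steps above. `maxSkippedRows(10, 20)` gives the 190 extra rows mentioned earlier.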
| |
| You'll want to experiment with the fetch size to find the best balance: |
| too small means many background queries; too big means bigger messages |
| and too many unneeded rows returned (we picked 50 above for the sake of |
| example, but it's probably too small -- the default is 5000). |
| |
| Again, offset queries are inefficient by nature. Emulating them |
| client-side is a compromise when you think you can get away with the |
| performance hit. We recommend that you: |
| |
| * test your code at scale with the expected query patterns, to make sure |
| that your assumptions are correct; |
| * set a hard limit on the highest possible page number, to prevent |
| malicious users from triggering queries that would skip a huge number |
| of rows. |
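
The hard limit from the last recommendation might be enforced when parsing the request, before any query is executed. The sketch below is one possible approach; `PageLimit`, `MAX_PAGE` and `parsePageNumber` are hypothetical names, and the value 20 is only an example to tune for your application.

```java
/** Hypothetical helper: validates the page number requested by a client. */
final class PageLimit {
    /** Example limit only; choose a value based on your own tests. */
    static final int MAX_PAGE = 20;

    /** Returns a page number in [1, MAX_PAGE]; a missing value means the first page. */
    static int parsePageNumber(String rawValue) {
        if (rawValue == null) {
            return 1;
        }
        int page;
        try {
            page = Integer.parseInt(rawValue.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Page number must be an integer", e);
        }
        if (page < 1 || page > MAX_PAGE) {
            throw new IllegalArgumentException(
                "Page number must be between 1 and " + MAX_PAGE);
        }
        return page;
    }
}
```

Rejecting the request outright, rather than silently clamping to `MAX_PAGE`, keeps the behavior predictable for legitimate clients.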