blob: 0647dbc3bb397962a71c3b11d7653cf05752964a [file] [log] [blame] [view]
# Auto Pagination in Drill
Remote APIs frequently implement some sort of pagination as a way of limiting results. However, if you are performing bulk data analysis, it is necessary to reassemble the
data into one larger dataset. Drill's auto-pagination features allow this to happen in the background, so that the user will get clean data back.
To use a paginator, you simply have to configure the paginator in the connection for the particular API.
## Words of Caution
While extremely powerful, the auto-pagination feature has the potential to run afoul of APIs rate limits and even potentially DDoS an API. Please use with extreme care.
## Rate Limits
When using automatic pagination, you may encounter APIs that have burst limits or other limits
as to the maximum number of requests in a minute or other amount of time. Drill allows you to
set a `retryDelay` parameter which is the number of milliseconds that Drill should wait before
resending the request. This defaults to 1 second. This option is set in the configuration for
the HTTP plugin.
## Offset Pagination
Offset Pagination uses commands similar to SQL which has a `LIMIT` and an `OFFSET`. With an offset paginator, let's say you want 200 records and the page size is 50 records, the offset paginator will break up your query into 4 requests as shown below:
* myapi.com?limit=50&offset=0
* myapi.com?limit=50?offset=50
* myapi.com?limit=50&offset=100
* myapi.com?limit=50&offset=150
### Configuring Offset Pagination
To configure an offset paginator, simply add the following to the configuration for your connection.
```json
"paginator": {
"limitParam": "<limit>",
"offsetParam": "<offset>",
"pageSize": 100,
"method": "OFFSET"
}
```
## Page Pagination
Page pagination is very similar to offset pagination except instead of using an `OFFSET` it uses a page number.
```json
"paginator": {
"pageParam": "page",
"pageSizeParam": "per_page",
"pageSize": 100,
"method": "PAGE"
}
```
In either case, the `pageSize` parameter should be set to the maximum page size allowable by the API. This will minimize the number of requests Drill is making.
## Index / KeySet Pagination
Index or KeySet pagination is when the API itself returns values to generate the next page.
Consider an API that returned data like this:
```json
{
"companies": [
...
],
"has-more": true,
"index": 3849945478
}
```
In this case, the `has-more` parameter is a boolean value which indicates whether or not there are more pages. The `index` parameter gets appended to the URL in question to generate the next page.
`https://api.myapi.com/paged?properties=name&properties=website&offset=3856722038`
There is a slight variant of this where the API will return the actual URL of the next page.
There are three possible parameters:
* `hasMoreParam`: This is the name of the boolean parameter which indicates whether the API has more pages.
* `indexParam`: The parameter name of the key or offset that will be used to generate the next page.
* `nextPageParam`: The parameter name which returns a complete URL of the next page.
** Note: Index / Keyset Pagination is only implemented for APIs that return JSON **
## Header Index Pagination
Header index pagination is used when the API in question returns a link to the next page in the response header. Shopify is one such example of an API that does this.
The only configuration option is the `nextPageParam` which is the parameter that Drill should look for in the response header.
```json
"paginator": {
"nextPageParam": "page",
"method": "HEADER_INDEX"
}
```