blob: a4da8234f01d5116afb99b9a43b28e2e9af3a5d5 [file] [log] [blame]
= Exporting Result Sets
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
It's possible to export fully sorted result sets using a special <<query-re-ranking.adoc#,rank query parser>> and <<response-writers.adoc#,response writer>> specifically designed to work together to handle scenarios that involve sorting and exporting millions of records.
This feature uses a stream sorting technique that begins to send records within milliseconds and continues to stream results until the entire result set has been sorted and exported.
The cases where this functionality may be useful include: session analysis, distributed merge joins, time series roll-ups, aggregations on high cardinality fields, fully distributed field collapsing, and sort based stats.
== Field Requirements
All the fields being sorted and exported must have docValues set to true. For more information, see the section on <<docvalues.adoc#,DocValues>>.
== The /export RequestHandler
The `/export` request handler with the appropriate configuration is one of Solr's out-of-the-box request handlers - see <<implicit-requesthandlers.adoc#,Implicit RequestHandlers>> for more information.
Note that this request handler's properties are defined as "invariants", which means they cannot be overridden by other properties passed at another time (such as at query time).
== Requesting Results Export
You can use `/export` to make requests to export the result set of a query.
All queries must include `sort` and `fl` parameters, or the query will return an error. Filter queries are also supported.
An optional parameter `batchSize` determines the size of the internal buffers for partial results. The default value is `30000` but users may want to specify smaller values to limit the memory use (at the cost of degraded performance) or higher values to improve export performance (the relationship is not linear and larger values don't bring proportionally larger performance increases).
The supported response writers are `json` and `javabin`. For backward compatibility reasons `wt=xsort` is also supported as input, but `wt=xsort` behaves same as `wt=json`. The default output format is `json`.
Here is an example of an export request of some indexed log data:
[source,text]
----
http://localhost:8983/solr/core_name/export?q=my-query&sort=severity+desc,timestamp+desc&fl=severity,timestamp,msg
----
=== Specifying the Sort Criteria
The `sort` property defines how documents will be sorted in the exported result set. Results can be sorted by any field that has a field type of int,long, float, double, string. The sort fields must be single valued fields.
The export performance will get slower as you add more sort fields. If there is enough physical memory available outside of the JVM to load up the sort fields then the performance will be linearly slower with addition of sort fields.
It can get worse otherwise.
=== Specifying the Field List
The `fl` property defines the fields that will be exported with the result set. Any of the field types that can be sorted (i.e., int, long, float, double, string, date, boolean) can be used in the field list. The fields can be single or multi-valued. However, returning scores and wildcards are not supported at this time.
=== Specifying the Local Streaming Expression
The optional `expr` property defines a <<streaming-expressions.adoc#,stream expression>> that allows documents to be processed locally before they are exported in the result set.
Expressions have to use a special `input()` stream that represents original results from the `/export` handler. Output from the stream expression then becomes the output from the `/export` handler. The `&streamLocalOnly=true` flag is always set for this streaming expression.
Only stream <<stream-decorator-reference.adoc#,decorators>> and <<stream-evaluator-reference.adoc#,evaluators>> are supported in these expressions - using any of the <<stream-source-reference.adoc#,source>> expressions except for the pre-defined `input()` will result in an error.
Using stream expressions with the `/export` handler may result in dramatic performance improvements due to the local in-memory reduction of the number of documents to be returned.
Here's an example of using `top` decorator for returning only top N results:
[source,text]
----
http://localhost:8983/solr/core_name/export?q=my-query&sort=timestamp+desc,&fl=timestamp,reporter,severity&expr=top(n=2,input(),sort="timestamp+desc")
----
(Note that the sort spec in the `top` decorator must match the sort spec in the
handler parameter).
Here's an example of using `unique` decorator:
[source,text]
----
http://localhost:8983/solr/core_name/export?q=my-query&sort=reporter+desc,&fl=reporter&expr=unique(input(),over="reporter")
----
(Note that the `over` parameter must use one of the fields requested in the `fl` parameter).
== Distributed Support
See the section <<streaming-expressions.adoc#,Streaming Expressions>> for distributed support.