solr/solr-ref-guide/src/collapse-and-expand-results.adoc - lucene-solr - Git at Google

 = Collapse and Expand Results
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
 // regarding copyright ownership.  The ASF licenses this file
 // to you under the Apache License, Version 2.0 (the
 // "License"); you may not use this file except in compliance
 // with the License.  You may obtain a copy of the License at
 //
 //   http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing,
 // software distributed under the License is distributed on an
 // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.

 The Collapsing query parser and the Expand component combine to form an approach to grouping documents for field collapsing in search results.

 The Collapsing query parser groups documents (collapsing the result set) according to your parameters, while the Expand component provides access to documents in the collapsed group for use in results display or other processing by a client application. Collapse & Expand can together do what the older <<result-grouping.adoc#,Result Grouping>> (`group=true`) does for _most_ use-cases but not all. Collapse and Expand are not supported when Result Grouping is enabled. Generally, you should prefer Collapse & Expand.

 [IMPORTANT]
 ====
 In order to use these features with SolrCloud, the documents must be located on the same shard. To ensure document co-location, you can define the `router.name` parameter as `compositeId` when creating the collection. For more information on this option, see the section <<shards-and-indexing-data-in-solrcloud.adoc#document-routing,Document Routing>>.
 ====

 == Collapsing Query Parser

 The `CollapsingQParser` is really a _post filter_ that provides more performant field collapsing than Solr's standard approach when the number of distinct groups in the result set is high. This parser collapses the result set to a single document per group before it forwards the result set to the rest of the search components. So all downstream components (faceting, highlighting, etc.) will work with the collapsed result set.

 The CollapsingQParserPlugin fully supports the QueryElevationComponent.

 === Collapsing Query Parser Options

 The CollapsingQParser accepts the following local parameters:

 `field`::
 The field that is being collapsed on. The field must be a single valued String, Int or Float-type of field.

 `min` or `max`::
 Selects the group head document for each group based on which document has the min or max value of the specified numeric field or <<function-queries.adoc#,function query>>.
 +
 At most only one of the `min`, `max`, or `sort` (see below) parameters may be specified.
 +
 If none are specified, the group head document of each group will be selected based on the highest scoring document in that group. The default is none.

 `sort`::
 Selects the group head document for each group based on which document comes first according to the specified <<common-query-parameters.adoc#sort-parameter,sort string>>.
 +
 At most only one of the `min`, `max`, (see above) or `sort` parameters may be specified.
 +
 If none are specified, the group head document of each group will be selected based on the highest scoring document in that group. The default is none.

 `nullPolicy`::
 There are three available null policies:
 +
 * `ignore`: removes documents with a null value in the collapse field. This is the default.
 * `expand`: treats each document with a null value in the collapse field as a separate group.
 * `collapse`: collapses all documents with a null value into a single group using either highest score, or minimum/maximum.
 +
 The default is `ignore`.

 `hint`::
 +
 There are two hint options available:
 +
 `top_fc`::: This stands for top level FieldCache.
 +
 The `hint=top_fc` hint is only available when collapsing on String fields. `top_fc` usually provides the best query time speed but takes the longest to warm on startup or following a commit. `top_fc` will also result in having the collapsed field cached in memory twice if it's used for faceting or sorting. For very high cardinality (high distinct count) fields, `top_fc` may not fare so well.
 +
 `hint=block`::: This indicates that the field being collapsed on is suitable for the optimzed <<#block-collapsing,Block Collapse>> logic described below.
 +
 The default is none.


 `size`::
 Sets the initial size of the collapse data structures when collapsing on a *numeric field only*.
 +
 The data structures used for collapsing grow dynamically when collapsing on numeric fields. Setting the size above the number of results expected in the result set will eliminate the resizing cost.
 +
 The default is 100,000.

 `collectElevatedDocsWhenCollapsing`::
 In combination with the <<collapse-and-expand-results.adoc#collapsing-query-parser,Collapse Query Parser>> all elevated docs are visible at the beginning of the result set.
 If this parameter is `false`, only the representative is visible if the elevated docs has the same collapse key (default is `true`).


 === Sample Usage Syntax

 Collapse on `group_field` selecting the document in each group with the highest scoring document:

 [source,text]
 ----
 fq={!collapse field=group_field}
 ----

 Collapse on `group_field` selecting the document in each group with the minimum value of `numeric_field`:

 [source,text]
 ----
 fq={!collapse field=group_field min=numeric_field}
 ----

 Collapse on `group_field` selecting the document in each group with the maximum value of `numeric_field`:

 [source,text]
 ----
 fq={!collapse field=group_field max=numeric_field}
 ----

 Collapse on `group_field` selecting the document in each group with the maximum value of a function. Note that the *cscore()* function can be used with the min/max options to use the score of the current document being collapsed.

 [source,text]
 ----
 fq={!collapse field=group_field max=sum(cscore(),numeric_field)}
 ----

 Collapse on `group_field` with a null policy so that all docs that do not have a value in the `group_field` will be treated as a single group. For each group, the selected document will be based first on a `numeric_field`, but ties will be broken by score:

 [source,text]
 ----
 fq={!collapse field=group_field nullPolicy=collapse sort='numeric_field asc, score desc'}
 ----

 Collapse on `group_field` with a hint to use the top level field cache:

 [source,text]
 ----
 fq={!collapse field=group_field hint=top_fc}
 ----

 Collapse with custom `cost` which defaults to `100`
 [source,text]
 ----
 fq={!collapse cost=1000 field=group_field}
 ----

 === Block Collapsing

 When collapsing on the `\_root_` field, using `nullPolicy=expand` or `nullPolicy=ignore`, the Collapsing Query Parser can take advantage of the fact that all docs with identical field values are adjacent to each other in the index in a single <<indexing-nested-documents.adoc#,"block" of nested documents>>. This allows the collapsing logic to be much faster and more memory efficient.

 The default collapsing logic must keep track of all group head documents -- for all groups encountered so far -- until it has evaluated all documents, because each document it considers may become the new group head of any group.

 When collapsing on the `\_root_` field however, the logic knows that as it scans over the index, it will never encounter any new documents in a group that it has previously processed.

 This more efficient logic can also be used with other `collapseField` values, via the `hint=block` local param.  This can be useful when you have deeply nested documents and you'd like to collapse on a field that does not contain identical values for all documents with a common `\_root_` but is a unique and identical value for sets of contiguous documents under a common `\_root_`.  For example: searching for "grand child" documents and collapsing on a field that is unique per "child document"

 [CAUTION]
 ====
 Specifing `hint=block` when collapsing on a field that is not unique per contiguous block of documents is not supported and may fail in unexpected ways; including the possibility of silently returning incorrect results.

 The implementation does not offer any safeguards against misuse on an unsupported field, since doing so would require the same group level tracking as the non-Block collapsing implementation -- defeating the purpose of this optimization.
 ====

 == Expand Component

 The ExpandComponent can be used to expand the groups that were collapsed by the CollapsingQParserPlugin.

 Example usage with the CollapsingQParserPlugin:

 [source,text]
 ----
 q=foo&fq={!collapse field=ISBN}
 ----

 In the query above, the CollapsingQParserPlugin will collapse the search results on the _ISBN_ field. The main search results will contain the highest ranking document from each book.

 The ExpandComponent can now be used to expand the results so you can see the documents grouped by ISBN. For example:

 [source,text]
 ----
 q=foo&fq={!collapse field=ISBN}&expand=true
 ----

 [IMPORTANT]
 ====
 When used with CollapsingQParserPlugin and there are multiple collapse groups, the field is chosen from the group with least cost.
 If there are multiple collapse groups with same cost then the first specified one is chosen.
 ====

 When enabled, the ExpandComponent adds a new section to the search output labeled `expanded`.

 Inside the `expanded` section there is a _map_ with each group head pointing to the expanded documents that are within the group. As applications iterate the main collapsed result set, they can access the _expanded_ map to retrieve the expanded groups.

 The ExpandComponent has the following parameters:

 `expand`::
 When `true`, the ExpandComponent is enabled.

 `expand.field`::
 Field on which expand documents need to be populated. When `expand=true`, either this parameter needs to be specified or should be used with CollapsingQParserPlugin.
 When both are specified, this parameter is given higher priority.

 `expand.sort`::
 Orders the documents within the expanded groups. The default is `score desc`.

 `expand.rows`::
 The number of rows to display in each group. The default is 5 rows.
 +
 [IMPORTANT]
 ====
 When `expand.rows=0`, only the number of documents found for each expanded value is returned. Hence, scores won't be computed even if requested and `maxScore` is set to 0.
 ====

 `expand.q`::
 Overrides the main query (`q`), determines which documents to include in the main group. The default is to use the main query.

 `expand.fq`::
 Overrides main filter queries (`fq`), determines which documents to include in the main group. The default is to use the main filter queries.

 `expand.nullGroup`::
 Indicates if an expanded group can be returned containing documents with no value in the expanded field.
 This option only _enables_ support for returning a "null" expanded group.
 As with all expanded groups, it will only exist if the main group includes corresponding documents for it to expand (via `collapse` using either `nullPolicy=collapse` or `nullPolicy=expand`; or via `expand.q`) _and_ documents are found that belong in this expanded group.
 The default value is `false`.
	= Collapse and Expand Results
	// Licensed to the Apache Software Foundation (ASF) under one
	// or more contributor license agreements. See the NOTICE file
	// distributed with this work for additional information
	// regarding copyright ownership. The ASF licenses this file
	// to you under the Apache License, Version 2.0 (the
	// "License"); you may not use this file except in compliance
	// with the License. You may obtain a copy of the License at
	//
	// http://www.apache.org/licenses/LICENSE-2.0
	//
	// Unless required by applicable law or agreed to in writing,
	// software distributed under the License is distributed on an
	// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	// KIND, either express or implied. See the License for the
	// specific language governing permissions and limitations
	// under the License.

	The Collapsing query parser and the Expand component combine to form an approach to grouping documents for field collapsing in search results.

	The Collapsing query parser groups documents (collapsing the result set) according to your parameters, while the Expand component provides access to documents in the collapsed group for use in results display or other processing by a client application. Collapse & Expand can together do what the older <<result-grouping.adoc#,Result Grouping>> (`group=true`) does for _most_ use-cases but not all. Collapse and Expand are not supported when Result Grouping is enabled. Generally, you should prefer Collapse & Expand.

	[IMPORTANT]
	====
	In order to use these features with SolrCloud, the documents must be located on the same shard. To ensure document co-location, you can define the `router.name` parameter as `compositeId` when creating the collection. For more information on this option, see the section <<shards-and-indexing-data-in-solrcloud.adoc#document-routing,Document Routing>>.
	====

	== Collapsing Query Parser

	The `CollapsingQParser` is really a _post filter_ that provides more performant field collapsing than Solr's standard approach when the number of distinct groups in the result set is high. This parser collapses the result set to a single document per group before it forwards the result set to the rest of the search components. So all downstream components (faceting, highlighting, etc.) will work with the collapsed result set.

	The CollapsingQParserPlugin fully supports the QueryElevationComponent.

	=== Collapsing Query Parser Options

	The CollapsingQParser accepts the following local parameters:

	`field`::
	The field that is being collapsed on. The field must be a single valued String, Int or Float-type of field.

	`min` or `max`::
	Selects the group head document for each group based on which document has the min or max value of the specified numeric field or <<function-queries.adoc#,function query>>.
	+
	At most only one of the `min`, `max`, or `sort` (see below) parameters may be specified.
	+
	If none are specified, the group head document of each group will be selected based on the highest scoring document in that group. The default is none.

	`sort`::
	Selects the group head document for each group based on which document comes first according to the specified <<common-query-parameters.adoc#sort-parameter,sort string>>.
	+
	At most only one of the `min`, `max`, (see above) or `sort` parameters may be specified.
	+
	If none are specified, the group head document of each group will be selected based on the highest scoring document in that group. The default is none.

	`nullPolicy`::
	There are three available null policies:
	+
	* `ignore`: removes documents with a null value in the collapse field. This is the default.
	* `expand`: treats each document with a null value in the collapse field as a separate group.
	* `collapse`: collapses all documents with a null value into a single group using either highest score, or minimum/maximum.
	+
	The default is `ignore`.

	`hint`::
	+
	There are two hint options available:
	+
	`top_fc`::: This stands for top level FieldCache.
	+
	The `hint=top_fc` hint is only available when collapsing on String fields. `top_fc` usually provides the best query time speed but takes the longest to warm on startup or following a commit. `top_fc` will also result in having the collapsed field cached in memory twice if it's used for faceting or sorting. For very high cardinality (high distinct count) fields, `top_fc` may not fare so well.
	+
	`hint=block`::: This indicates that the field being collapsed on is suitable for the optimzed <<#block-collapsing,Block Collapse>> logic described below.
	+
	The default is none.


	`size`::
	Sets the initial size of the collapse data structures when collapsing on a numeric field only.
	+
	The data structures used for collapsing grow dynamically when collapsing on numeric fields. Setting the size above the number of results expected in the result set will eliminate the resizing cost.
	+
	The default is 100,000.

	`collectElevatedDocsWhenCollapsing`::
	In combination with the <<collapse-and-expand-results.adoc#collapsing-query-parser,Collapse Query Parser>> all elevated docs are visible at the beginning of the result set.
	If this parameter is `false`, only the representative is visible if the elevated docs has the same collapse key (default is `true`).


	=== Sample Usage Syntax

	Collapse on `group_field` selecting the document in each group with the highest scoring document:

	[source,text]
	----
	fq={!collapse field=group_field}
	----

	Collapse on `group_field` selecting the document in each group with the minimum value of `numeric_field`:

	[source,text]
	----
	fq={!collapse field=group_field min=numeric_field}
	----

	Collapse on `group_field` selecting the document in each group with the maximum value of `numeric_field`:

	[source,text]
	----
	fq={!collapse field=group_field max=numeric_field}
	----

	Collapse on `group_field` selecting the document in each group with the maximum value of a function. Note that the cscore() function can be used with the min/max options to use the score of the current document being collapsed.

	[source,text]
	----
	fq={!collapse field=group_field max=sum(cscore(),numeric_field)}
	----

	Collapse on `group_field` with a null policy so that all docs that do not have a value in the `group_field` will be treated as a single group. For each group, the selected document will be based first on a `numeric_field`, but ties will be broken by score:

	[source,text]
	----
	fq={!collapse field=group_field nullPolicy=collapse sort='numeric_field asc, score desc'}
	----

	Collapse on `group_field` with a hint to use the top level field cache:

	[source,text]
	----
	fq={!collapse field=group_field hint=top_fc}
	----

	Collapse with custom `cost` which defaults to `100`
	[source,text]
	----
	fq={!collapse cost=1000 field=group_field}
	----

	=== Block Collapsing

	When collapsing on the `\_root_` field, using `nullPolicy=expand` or `nullPolicy=ignore`, the Collapsing Query Parser can take advantage of the fact that all docs with identical field values are adjacent to each other in the index in a single <<indexing-nested-documents.adoc#,"block" of nested documents>>. This allows the collapsing logic to be much faster and more memory efficient.

	The default collapsing logic must keep track of all group head documents -- for all groups encountered so far -- until it has evaluated all documents, because each document it considers may become the new group head of any group.

	When collapsing on the `\_root_` field however, the logic knows that as it scans over the index, it will never encounter any new documents in a group that it has previously processed.

	This more efficient logic can also be used with other `collapseField` values, via the `hint=block` local param. This can be useful when you have deeply nested documents and you'd like to collapse on a field that does not contain identical values for all documents with a common `\_root_` but is a unique and identical value for sets of contiguous documents under a common `\_root_`. For example: searching for "grand child" documents and collapsing on a field that is unique per "child document"

	[CAUTION]
	====
	Specifing `hint=block` when collapsing on a field that is not unique per contiguous block of documents is not supported and may fail in unexpected ways; including the possibility of silently returning incorrect results.

	The implementation does not offer any safeguards against misuse on an unsupported field, since doing so would require the same group level tracking as the non-Block collapsing implementation -- defeating the purpose of this optimization.
	====

	== Expand Component

	The ExpandComponent can be used to expand the groups that were collapsed by the CollapsingQParserPlugin.

	Example usage with the CollapsingQParserPlugin:

	[source,text]
	----
	q=foo&fq={!collapse field=ISBN}
	----

	In the query above, the CollapsingQParserPlugin will collapse the search results on the _ISBN_ field. The main search results will contain the highest ranking document from each book.

	The ExpandComponent can now be used to expand the results so you can see the documents grouped by ISBN. For example:

	[source,text]
	----
	q=foo&fq={!collapse field=ISBN}&expand=true
	----

	[IMPORTANT]
	====
	When used with CollapsingQParserPlugin and there are multiple collapse groups, the field is chosen from the group with least cost.
	If there are multiple collapse groups with same cost then the first specified one is chosen.
	====

	When enabled, the ExpandComponent adds a new section to the search output labeled `expanded`.

	Inside the `expanded` section there is a _map_ with each group head pointing to the expanded documents that are within the group. As applications iterate the main collapsed result set, they can access the _expanded_ map to retrieve the expanded groups.

	The ExpandComponent has the following parameters:

	`expand`::
	When `true`, the ExpandComponent is enabled.

	`expand.field`::
	Field on which expand documents need to be populated. When `expand=true`, either this parameter needs to be specified or should be used with CollapsingQParserPlugin.
	When both are specified, this parameter is given higher priority.

	`expand.sort`::
	Orders the documents within the expanded groups. The default is `score desc`.

	`expand.rows`::
	The number of rows to display in each group. The default is 5 rows.
	+
	[IMPORTANT]
	====
	When `expand.rows=0`, only the number of documents found for each expanded value is returned. Hence, scores won't be computed even if requested and `maxScore` is set to 0.
	====

	`expand.q`::
	Overrides the main query (`q`), determines which documents to include in the main group. The default is to use the main query.

	`expand.fq`::
	Overrides main filter queries (`fq`), determines which documents to include in the main group. The default is to use the main filter queries.

	`expand.nullGroup`::
	Indicates if an expanded group can be returned containing documents with no value in the expanded field.
	This option only _enables_ support for returning a "null" expanded group.
	As with all expanded groups, it will only exist if the main group includes corresponding documents for it to expand (via `collapse` using either `nullPolicy=collapse` or `nullPolicy=expand`; or via `expand.q`) _and_ documents are found that belong in this expanded group.
	The default value is `false`.