blob: cc3e3faf95d108802fcbaaf50c99f642c82f9070 [file] [log] [blame]
= Transforming Result Documents
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
Document Transformers modify the information returned about documents in the results of a query.
== Using Document Transformers
When executing a request, a document transformer can be used by including it in the `fl` parameter using square brackets, for example:
[source,plain]
----
fl=id,name,score,[shard]
----
Some transformers allow, or require, local parameters which can be specified as key value pairs inside the brackets:
[source,plain]
----
fl=id,name,score,[explain style=nl]
----
As with regular fields, you can change the key used when a Transformer adds a field to a document via a prefix:
[source,plain]
----
fl=id,name,score,my_val_a:[value v=42 t=int],my_val_b:[value v=7 t=float]
----
The sections below discuss exactly what these various transformers do.
== Available Transformers
=== [value] - ValueAugmenterFactory
Modifies every document to include the exact same value, as if it were a stored field in every document:
[source,plain]
----
q=*:*&fl=id,greeting:[value v='hello']&wt=xml
----
The above query would produce results like the following:
[source,xml]
----
<result name="response" numFound="32" start="0">
<doc>
<str name="id">1</str>
<str name="greeting">hello</str></doc>
</doc>
...
----
By default, values are returned as a String, but a `t` parameter can be specified using a value of `int`, `float`, `double`, or `date` to force a specific return type:
[source,plain]
----
q=*:*&fl=id,my_number:[value v=42 t=int],my_string:[value v=42]
----
In addition to using these request parameters, you can configure additional named instances of ValueAugmenterFactory, or override the default behavior of the existing `[value]` transformer in your `solrconfig.xml` file:
[source,xml]
----
<transformer name="mytrans2" class="org.apache.solr.response.transform.ValueAugmenterFactory" >
<int name="value">5</int>
</transformer>
<transformer name="value" class="org.apache.solr.response.transform.ValueAugmenterFactory" >
<double name="defaultValue">5</double>
</transformer>
----
The `value` option forces an explicit value to always be used, while the `defaultValue` option provides a default that can still be overridden using the `v` and `t` local parameters.
=== [explain] - ExplainAugmenterFactory
Augments each document with an inline explanation of its score exactly like the information available about each document in the debug section:
[source,plain]
----
q=features:cache&fl=id,[explain style=nl]
----
Supported values for `style` are `text`, `html`, and `nl` which returns the information as structured data. Here is the output of the above request using `style=nl`:
[source,json]
----
{ "response":{"numFound":2,"start":0,"docs":[
{
"id":"6H500F0",
"[explain]":{
"match":true,
"value":1.052226,
"description":"weight(features:cache in 2) [DefaultSimilarity], result of:",
"details":[{
}]}}]}}
----
A default style can be configured by specifying an `args` parameter in your `solrconfig.xml` configuration:
[source,xml]
----
<transformer name="explain" class="org.apache.solr.response.transform.ExplainAugmenterFactory" >
<str name="args">nl</str>
</transformer>
----
=== [child] - ChildDocTransformerFactory
This transformer returns all <<indexing-nested-documents.adoc#,descendant documents>> of each parent document matching your query. This is useful when you have indexed nested child documents and want to retrieve the child documents for the relevant parent documents for any type of search query.
Note that this transformer can be used even when the query used to match the result documents is not a <<other-parsers.adoc#block-join-query-parsers,Block Join query>>.
[source,plain]
----
q=book_title:Solr&fl=id,[child childFilter=doc_type:chapter limit=100]
----
If the documents involved include a `\_nest_path_` field, then it is used to re-create the hierarchical structure of the descendent documents using the original pseudo-field names the documents were indexed with, otherwise the descendent documents are returned as a flat list of <<indexing-nested-documents#indexing-anonymous-children,anonymous children>>.
`childFilter`::
A query to filter which child documents should be included. This can be particularly useful when you have multiple levels of hierarchical documents. The default is all children.
`limit`::
The maximum number of child documents to be returned per parent document. The default is `10`.
`fl`::
The field list which the transformer is to return. The default is the top level `fl`).
+
There is a further limitation in which the fields here should be a subset of those specified by the top level `fl` parameter.
`parentFilter`::
Serves the same purpose as the `of`/`which` params in `{!child}`/`{!parent}` query parsers: to
identify the set of "all parents" for the purpose of identifying the beginning & end of each
nested document block. This recently became fully optional and appears to be obsolete.
It is likely to be removed in a future Solr release, so _if you find it has some use, let the
project know!_
[TIP]
====
.Experimental `childFilter` Syntax
When a `\_nest_path_` field is defined, the `childFilter` option supports an experimental syntax to combine a "path syntax" restriction with a more traditional filtering query.
*This syntax is triggered by including a `/` seperated path structure prior to a query that includes a `:` character.*
When the "path" begins with a `/` character, it restricts matches to documents that have that exist "path" of nested pseudo-field documents, starting at the Root document of the block (even if the document being transformed is not a Root level document)
Some Examples:
* `childFilter="/skus/\*:*"`
** Matches any documents that are descendents of the current document and have a "nested path" of `/skus` -- but not any children of those `skus`
* childFilter="/skus/color_s:RED"
** Matches any documents that are descendents of the current document; match `color_s:RED`; and have a "nested path" of `/skus` -- but not any children of those `skus`
* `childFilter="/skus/manuals/\*:*"`
** Matches any documents that are descendents of the current document and have a "nested path" of `/skus/manuals` -- but not any children of those `manuals`
When paths do not start with a `/` they are treated as "path suffixes":
* `childFilter="manuals/\*:*"`
** Matches any documents that are descendents of the current document and have a "nested path" that ends with "manuals", regardless of how deeply nested they are -- but not any children of those `manuals`
====
=== [shard] - ShardAugmenterFactory
This transformer adds information about what shard each individual document came from in a distributed request.
ShardAugmenterFactory does not support any request parameters, or configuration options.
=== [docid] - DocIdAugmenterFactory
This transformer adds the internal Lucene document id to each document this is primarily only useful for debugging purposes.
DocIdAugmenterFactory does not support any request parameters, or configuration options.
=== [elevated] and [excluded]
These transformers are available only when using the <<the-query-elevation-component.adoc#,Query Elevation Component>>.
* `[elevated]` annotates each document to indicate if it was elevated or not.
* `[excluded]` annotates each document to indicate if it would have been excluded - this is only supported if you also use the `markExcludes` parameter.
[source,plain]
----
fl=id,[elevated],[excluded]&excludeIds=GB18030TEST&elevateIds=6H500F0&markExcludes=true
----
[source,json]
----
{ "response":{"numFound":32,"start":0,"docs":[
{
"id":"6H500F0",
"[elevated]":true,
"[excluded]":false},
{
"id":"GB18030TEST",
"[elevated]":false,
"[excluded]":true},
{
"id":"SP2514N",
"[elevated]":false,
"[excluded]":false},
]}}
----
=== [json] / [xml]
These transformers replace a field value containing a string representation of a valid XML or JSON structure with the actual raw XML or JSON structure instead of just the string value. Each applies only to the specific writer, such that `[json]` only applies to `wt=json` and `[xml]` only applies to `wt=xml`.
[source,plain]
----
fl=id,source_s:[json]&wt=json
----
=== [subquery]
This transformer executes a separate query per transforming document passing document fields as an input for subquery parameters. It's usually used with `{!join}` and `{!parent}` query parsers, and is intended to be an improvement for `[child]`.
* It must be given an unique name: `fl=*,children:[subquery]`
* There might be a few of them, e.g., `fl=*,sons:[subquery],daughters:[subquery]`.
* Every `[subquery]` occurrence adds a field into a result document with the given name, the value of this field is a document list, which is a result of executing subquery using document fields as an input.
* Subquery will use the `/select` search handler by default, and will return an error if `/select` is not configured. This can be changed by supplying `foo.qt` parameter.
Here is how it looks like using various formats:
.XML
[source,xml]
----
<result name="response" numFound="2" start="0">
<doc>
<int name="id">1</int>
<arr name="title">
<str>vdczoypirs</str>
</arr>
<result name="children" numFound="1" start="0">
<doc>
<int name="id">2</int>
<arr name="title">
<str>vdczoypirs</str>
</arr>
</doc>
</result>
</doc>
...
----
.JSON
[source,json]
----
{ "response":{
"numFound":2, "start":0,
"docs":[
{
"id":1,
"subject":["parentDocument"],
"title":["xrxvomgu"],
"children":{
"numFound":1, "start":0,
"docs":[
{ "id":2,
"cat":["childDocument"]
}
]
}}]}}
----
.SolrJ
[source,java]
----
SolrDocumentList subResults = (SolrDocumentList)doc.getFieldValue("children");
----
==== Subquery Result Fields
To appear in subquery document list, a field should be specified in both `fl` parameters: in the main `fl` (despite the main result documents have no this field), and in subquery's `fl` (e.g., `foo.fl`).
Wildcards can be used in one or both of these parameters. For example, if field `title` should appear in categories subquery, it can be done via one of these ways:
[source,plain]
----
fl=...title,categories:[subquery]&categories.fl=title&categories.q=...
fl=...title,categories:[subquery]&categories.fl=*&categories.q=...
fl=...*,categories:[subquery]&categories.fl=title&categories.q=...
fl=...*,categories:[subquery]&categories.fl=*&categories.q=...
----
==== Subquery Parameters Shift
If a subquery is declared as `fl=*,foo:[subquery]`, subquery parameters are prefixed with the given name and period. For example:
[source,plain]
q=*:*&fl=*,**foo**:[subquery]&**foo.**q=to be continued&**foo.**rows=10&**foo.**sort=id desc
==== Document Field as an Input for Subquery Parameters
It's necessary to pass some document field values as a parameter for subquery. It's supported via an implicit *`row.__fieldname__`* parameter, and can be (but might not only) referred via local parameters syntax:
[source,plain,subs="quotes"]
q=name:john&fl=name,id,depts:[subquery]&depts.q={!terms f=id **v=$row.dept_id**}&depts.rows=10
Here departments are retrieved per every employee in search result. We can say that it's like SQL `join ON emp.dept_id=dept.id`.
Note, when a document field has multiple values they are concatenated with a comma by default. This can be changed with the local parameter `foo:[subquery separator=' ']`, this mimics *`{!terms}`* to work smoothly with it.
To log substituted subquery request parameters, add the corresponding parameter names, as in: `depts.logParamsList=q,fl,rows,**row.dept_id**`
==== Cores and Collections in SolrCloud
Use `foo:[subquery fromIndex=departments]` to invoke subquery on another core on the same node. This is what `{!join}` does for non-SolrCloud mode. But with SolrCloud, just (and only) explicitly specify its native parameters like `collection, shards` for subquery, for example:
[source,plain,subs="quotes"]
q=\*:*&fl=\*,foo:[subquery]&foo.q=cloud&**foo.collection**=departments
[IMPORTANT]
====
If subquery collection has a different unique key field name (such as `foo_id` instead of `id` in the primary collection), add the following parameters to accommodate this difference:
[source,plain]
foo.fl=id:foo_id&foo.distrib.singlePass=true
Otherwise you'll get `NullPointerException` from `QueryComponent.mergeIds`.
====
=== [geo] - Geospatial formatter
Formats spatial data from a spatial field using a designated format type name. Two inner parameters are required: `f` for the field name, and `w` for the format name. Example: `geojson:[geo f=mySpatialField w=GeoJSON]`.
Normally you'll simply be consistent in choosing the format type you want by setting the `format` attribute on the spatial field type to `WKT` or `GeoJSON` – see the section <<spatial-search.adoc#,Spatial Search>> for more information. If you are consistent, it'll come out the way you stored it. This transformer offers a convenience to transform the spatial format to something different on retrieval.
In addition, this feature is very useful with the `RptWithGeometrySpatialField` to avoid double-storage of the potentially large vector geometry. This transformer will detect that field type and fetch the geometry from an internal compact binary representation on disk (in docValues), and then format it as desired. As such, you needn't mark the field as stored, which would be redundant. In a sense this double-storage between docValues and stored-value storage isn't unique to spatial but with polygonal geometry it can be a lot of data, and furthermore you'd like to avoid storing it in a verbose format (like GeoJSON or WKT).
=== [features] - LTRFeatureLoggerTransformerFactory
The "LTR" prefix stands for <<learning-to-rank.adoc#,Learning To Rank>>. This transformer returns the values of features and it can be used for feature extraction and feature logging.
[source,plain]
----
fl=id,[features store=yourFeatureStore]
----
This will return the values of the features in the `yourFeatureStore` store.
[source,plain]
----
fl=id,[features]&rq={!ltr model=yourModel}
----
If you use `[features]` together with an Learning-To-Rank reranking query then the values of the features in the reranking model (`yourModel`) will be returned.