blob: eb5d2b2e1f6eb0badeabd8f7e480987bc73cb720 [file] [log] [blame]
= Learning To Rank
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
With the *Learning To Rank* (or *LTR* for short) contrib module you can configure and run machine learned ranking models in Solr.
The module also supports feature extraction inside Solr. The only thing you need to do outside Solr is train your own ranking model.
== Learning to Rank Concepts
=== Re-Ranking
Re-Ranking allows you to run a simple query for matching documents and then re-rank the top N documents using the scores from a different, more complex query. This page describes the use of *LTR* complex queries, information on other rank queries included in the Solr distribution can be found on the <<query-re-ranking.adoc#,Query Re-Ranking>> page.
=== Learning To Rank Models
In information retrieval systems, https://en.wikipedia.org/wiki/Learning_to_rank[Learning to Rank] is used to re-rank the top N retrieved documents using trained machine learning models. The hope is that such sophisticated models can make more nuanced ranking decisions than standard ranking functions like https://en.wikipedia.org/wiki/Tf%E2%80%93idf[TF-IDF] or https://en.wikipedia.org/wiki/Okapi_BM25[BM25].
==== Ranking Model
A ranking model computes the scores used to rerank documents. Irrespective of any particular algorithm or implementation, a ranking model's computation can use three types of inputs:
* parameters that represent the scoring algorithm
* features that represent the document being scored
* features that represent the query for which the document is being scored
==== Interleaving
Interleaving is an approach to Online Search Quality evaluation that allows to compare two models interleaving their results in the final ranked list returned to the user.
* currently only the Team Draft Interleaving algorithm is supported (and its implementation assumes all results are from the same shard)
==== Feature
A feature is a value, a number, that represents some quantity or quality of the document being scored or of the query for which documents are being scored. For example documents often have a 'recency' quality and 'number of past purchases' might be a quantity that is passed to Solr as part of the search query.
==== Normalizer
Some ranking models expect features on a particular scale. A normalizer can be used to translate arbitrary feature values into normalized values e.g., on a 0..1 or 0..100 scale.
=== Training Models
==== Feature Engineering
The LTR contrib module includes several feature classes as well as support for custom features. Each feature class's javadocs contain an example to illustrate use of that class. The process of https://en.wikipedia.org/wiki/Feature_engineering[feature engineering] itself is then entirely up to your domain expertise and creativity.
[cols=",,,",options="header",]
|===
|Feature |Class |Example parameters |<<External Feature Information>>
|field length |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/FieldLengthFeature.html[FieldLengthFeature] |`{"field":"title"}` |not (yet) supported
|field value |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/FieldValueFeature.html[FieldValueFeature] |`{"field":"hits"}` |not (yet) supported
|original score |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/OriginalScoreFeature.html[OriginalScoreFeature] |`{}` |not applicable
|solr query |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/SolrFeature.html[SolrFeature] |`{"q":"{!func}` `recip(ms(NOW,last_modified)` `,3.16e-11,1,1)"}` |supported
|solr filter query |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/SolrFeature.html[SolrFeature] |`{"fq":["{!terms f=category}book"]}` |supported
|solr query + filter query |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/SolrFeature.html[SolrFeature] |`{"q":"{!func}` `recip(ms(NOW,last_modified),` `3.16e-11,1,1)",` `"fq":["{!terms f=category}book"]}` |supported
|value |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/ValueFeature.html[ValueFeature] |`{"value":"$\{userFromMobile}","required":true}` |supported
|(custom) |(custom class extending {solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/Feature.html[Feature]) | |
|===
[cols=",,",options="header",]
|===
|Normalizer |Class |Example parameters
|Identity |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/norm/IdentityNormalizer.html[IdentityNormalizer] |`{}`
|MinMax |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/norm/MinMaxNormalizer.html[MinMaxNormalizer] |`{"min":"0", "max":"50" }`
|Standard |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/norm/StandardNormalizer.html[StandardNormalizer] |`{"avg":"42","std":"6"}`
|(custom) |(custom class extending {solr-javadocs}/solr-ltr/org/apache/solr/ltr/norm/Normalizer.html[Normalizer]) |
|===
==== Feature Extraction
The ltr contrib module includes a <<transforming-result-documents.adoc#,[features>> transformer] to support the calculation and return of feature values for https://en.wikipedia.org/wiki/Feature_extraction[feature extraction] purposes including and especially when you do not yet have an actual reranking model.
==== Feature Selection and Model Training
Feature selection and model training take place offline and outside Solr. The ltr contrib module supports two generalized forms of models as well as custom models. Each model class's javadocs contain an example to illustrate configuration of that class. In the form of JSON files your trained model or models (e.g., different models for different customer geographies) can then be directly uploaded into Solr using provided REST APIs.
[cols=",,",options="header",]
|===
|General form |Class |Specific examples
|Linear |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/model/LinearModel.html[LinearModel] |RankSVM, Pranking
|Multiple Additive Trees |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.html[MultipleAdditiveTreesModel] |LambdaMART, Gradient Boosted Regression Trees (GBRT)
|Neural Network |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/model/NeuralNetworkModel.html[NeuralNetworkModel] |RankNet
|(wrapper) |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/model/DefaultWrapperModel.html[DefaultWrapperModel] |(not applicable)
|(custom) |(custom class extending {solr-javadocs}/solr-ltr/org/apache/solr/ltr/model/AdapterModel.html[AdapterModel]) |(not applicable)
|(custom) |(custom class extending {solr-javadocs}/solr-ltr/org/apache/solr/ltr/model/LTRScoringModel.html[LTRScoringModel]) |(not applicable)
|===
== Quick Start with LTR
The `"techproducts"` example included with Solr is pre-configured with the plugins required for learning-to-rank, but they are disabled by default.
To enable the plugins, please specify the `solr.ltr.enabled` JVM System Property when running the example:
[source,bash]
----
bin/solr start -e techproducts -Dsolr.ltr.enabled=true
----
=== Uploading Features
To upload features in a `/path/myFeatures.json` file, please run:
[source,bash]
----
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' --data-binary "@/path/myFeatures.json" -H 'Content-type:application/json'
----
To view the features you just uploaded please open the following URL in a browser:
[source,text]
http://localhost:8983/solr/techproducts/schema/feature-store/_DEFAULT_
.Example: /path/myFeatures.json
[source,json]
----
[
{
"name" : "documentRecency",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
"q" : "{!func}recip( ms(NOW,last_modified), 3.16e-11, 1, 1)"
}
},
{
"name" : "isBook",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
"fq": ["{!terms f=cat}book"]
}
},
{
"name" : "originalScore",
"class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
"params" : {}
}
]
----
=== Extracting Features
To extract features as part of a query, add `[features]` to the `fl` parameter, for example:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&fl=id,score,[features]
The output will include feature values as a comma-separated list, resembling the output shown here:
[source,json]
----
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"test",
"fl":"id,score,[features]"}},
"response":{"numFound":2,"start":0,"maxScore":1.959392,"docs":[
{
"id":"GB18030TEST",
"score":1.959392,
"[features]":"documentRecency=0.020893794,isBook=0.0,originalScore=1.959392"},
{
"id":"UTF8TEST",
"score":1.5513437,
"[features]":"documentRecency=0.020893794,isBook=0.0,originalScore=1.5513437"}]
}}
----
=== Uploading a Model
To upload the model in a `/path/myModel.json` file, please run:
[source,bash]
----
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' --data-binary "@/path/myModel.json" -H 'Content-type:application/json'
----
To view the model you just uploaded please open the following URL in a browser:
[source,text]
http://localhost:8983/solr/techproducts/schema/model-store
.Example: /path/myModel.json
[source,json]
----
{
"class" : "org.apache.solr.ltr.model.LinearModel",
"name" : "myModel",
"features" : [
{ "name" : "documentRecency" },
{ "name" : "isBook" },
{ "name" : "originalScore" }
],
"params" : {
"weights" : {
"documentRecency" : 1.0,
"isBook" : 0.1,
"originalScore" : 0.5
}
}
}
----
=== Running a Rerank Query
To rerank the results of a query, add the `rq` parameter to your search, for example:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel reRankDocs=100}&fl=id,score
The addition of the `rq` parameter will not change the output of the search.
To obtain the feature values computed during reranking, add `[features]` to the `fl` parameter, for example:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel reRankDocs=100}&fl=id,score,[features]
The output will include feature values as a comma-separated list, resembling the output shown here:
[source,json]
----
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"test",
"fl":"id,score,[features]",
"rq":"{!ltr model=myModel reRankDocs=100}"}},
"response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
{
"id":"GB18030TEST",
"score":1.0005897,
"[features]":"documentRecency=0.020893792,isBook=0.0,originalScore=1.959392"},
{
"id":"UTF8TEST",
"score":0.79656565,
"[features]":"documentRecency=0.020893792,isBook=0.0,originalScore=1.5513437"}]
}}
----
=== Running a Rerank Query Interleaving Two Models
To rerank the results of a query, interleaving two models (myModelA, myModelB) add the `rq` parameter to your search, passing two models in input, for example:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score
To obtain the model that interleaving picked for a search result, computed during reranking, add `[interleaving]` to the `fl` parameter, for example:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score,[interleaving]
The output will include the model picked for each search result, resembling the output shown here:
[source,json]
----
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"test",
"fl":"id,score,[interleaving]",
"rq":"{!ltr model=myModelA model=myModelB reRankDocs=100}"}},
"response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
{
"id":"GB18030TEST",
"score":1.0005897,
"[interleaving]":"myModelB"},
{
"id":"UTF8TEST",
"score":0.79656565,
"[interleaving]":"myModelA"}]
}}
----
=== Running a Rerank Query Interleaving a Model with the Original Ranking
When approaching Search Quality Evaluation with interleaving it may be useful to compare a model with the original ranking.
To rerank the results of a query, interleaving a model with the original ranking, add the `rq` parameter to your search, passing the special inbuilt `_OriginalRanking_` model identifier as one model and your comparison model as the other model, for example:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=_OriginalRanking_ model=myModel reRankDocs=100}&fl=id,score
The addition of the `rq` parameter will not change the output of the search.
To obtain the model that interleaving picked for a search result, computed during reranking, add `[interleaving]` to the `fl` parameter, for example:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=_OriginalRanking_ model=myModel reRankDocs=100}&fl=id,score,[interleaving]
The output will include the model picked for each search result, resembling the output shown here:
[source,json]
----
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"test",
"fl":"id,score,[features]",
"rq":"{!ltr model=_OriginalRanking_ model=myModel reRankDocs=100}"}},
"response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
{
"id":"GB18030TEST",
"score":1.0005897,
"[interleaving]":"_OriginalRanking_"},
{
"id":"UTF8TEST",
"score":0.79656565,
"[interleaving]":"myModel"}]
}}
----
=== Running a Rerank Query with Interleaving Passing a Specific Algorithm
To rerank the results of a query, interleaving two models using a specific algorithm, add the `interleavingAlgorithm` local parameter to the ltr query parser, for example:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100 interleavingAlgorithm=TeamDraft}&fl=id,score
Currently the only (and default) algorithm supported is 'TeamDraft'.
=== External Feature Information
The {solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/ValueFeature.html[ValueFeature] and {solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/SolrFeature.html[SolrFeature] classes support the use of external feature information, `efi` for short.
==== Uploading Features
To upload features in a `/path/myEfiFeatures.json` file, please run:
[source,bash]
----
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' --data-binary "@/path/myEfiFeatures.json" -H 'Content-type:application/json'
----
To view the features you just uploaded please open the following URL in a browser:
[source,text]
http://localhost:8983/solr/techproducts/schema/feature-store/myEfiFeatureStore
.Example: /path/myEfiFeatures.json
[source,json]
----
[
{
"store" : "myEfiFeatureStore",
"name" : "isPreferredManufacturer",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : { "fq" : [ "{!field f=manu}${preferredManufacturer}" ] }
},
{
"store" : "myEfiFeatureStore",
"name" : "userAnswerValue",
"class" : "org.apache.solr.ltr.feature.ValueFeature",
"params" : { "value" : "${answer:42}" }
},
{
"store" : "myEfiFeatureStore",
"name" : "userFromMobileValue",
"class" : "org.apache.solr.ltr.feature.ValueFeature",
"params" : { "value" : "${fromMobile}", "required" : true }
},
{
"store" : "myEfiFeatureStore",
"name" : "userTextCat",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : { "q" : "{!field f=cat}${text}" }
}
]
----
As an aside, you may have noticed that the `myEfiFeatures.json` example uses `"store":"myEfiFeatureStore"` attributes: read more about feature `store` in the <<LTR Lifecycle>> section of this page.
==== Extracting Features
To extract `myEfiFeatureStore` features as part of a query, add `efi.*` parameters to the `[features]` part of the `fl` parameter, for example:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&fl=id,cat,manu,score,[features store=myEfiFeatureStore efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=1]
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&fl=id,cat,manu,score,[features store=myEfiFeatureStore efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=0 efi.answer=13]
==== Uploading a Model
To upload the model in a `/path/myEfiModel.json` file, please run:
[source,bash]
----
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' --data-binary "@/path/myEfiModel.json" -H 'Content-type:application/json'
----
To view the model you just uploaded please open the following URL in a browser:
[source,text]
http://localhost:8983/solr/techproducts/schema/model-store
.Example: /path/myEfiModel.json
[source,json]
----
{
"store" : "myEfiFeatureStore",
"name" : "myEfiModel",
"class" : "org.apache.solr.ltr.model.LinearModel",
"features" : [
{ "name" : "isPreferredManufacturer" },
{ "name" : "userAnswerValue" },
{ "name" : "userFromMobileValue" },
{ "name" : "userTextCat" }
],
"params" : {
"weights" : {
"isPreferredManufacturer" : 0.2,
"userAnswerValue" : 1.0,
"userFromMobileValue" : 1.0,
"userTextCat" : 0.1
}
}
}
----
==== Running a Rerank Query
To obtain the feature values computed during reranking, add `[features]` to the `fl` parameter and `efi.*` parameters to the `rq` parameter, for example:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myEfiModel efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=1}&fl=id,cat,manu,score,[features]
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myEfiModel efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=0 efi.answer=13}&fl=id,cat,manu,score,[features]
Notice the absence of `efi.*` parameters in the `[features]` part of the `fl` parameter.
==== Extracting Features While Reranking
To extract features for `myEfiFeatureStore` features while still reranking with `myModel`:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel}&fl=id,cat,manu,score,[features store=myEfiFeatureStore efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=1]
Notice the absence of `efi.\*` parameters in the `rq` parameter (because `myModel` does not use `efi` feature) and the presence of `efi.*` parameters in the `[features]` part of the `fl` parameter (because `myEfiFeatureStore` contains `efi` features).
Read more about model evolution in the <<LTR Lifecycle>> section of this page.
=== Training Example
Example training data and a demo `train_and_upload_demo_model.py` script can be found in the `solr/contrib/ltr/example` folder in the https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=tree;f=solr/contrib/ltr/example[Apache lucene-solr Git repository] (mirrored on https://github.com/apache/lucene-solr/tree/releases/lucene-solr/{solr-docs-version}.0/solr/contrib/ltr/example[github.com]). This example folder is not shipped in the Solr binary release.
== Installation of LTR
The ltr contrib module requires the `dist/solr-ltr-*.jar` JARs.
== LTR Configuration
Learning-To-Rank is a contrib module and therefore its plugins must be configured in `solrconfig.xml`.
=== Minimum Requirements
* Include the required contrib JARs. Note that by default paths are relative to the Solr core so they may need adjustments to your configuration, or an explicit specification of the `$solr.install.dir`.
+
[source,xml]
----
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-ltr-\d.*\.jar" />
----
* Declaration of the `ltr` query parser.
+
[source,xml]
----
<queryParser name="ltr" class="org.apache.solr.ltr.search.LTRQParserPlugin"/>
----
* Configuration of the feature values cache.
+
[source,xml]
----
<cache name="QUERY_DOC_FV"
class="solr.search.LRUCache"
size="4096"
initialSize="2048"
autowarmCount="4096"
regenerator="solr.search.NoOpRegenerator" />
----
* Declaration of the `[features]` transformer.
+
[source,xml]
----
<transformer name="features" class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
<str name="fvCacheName">QUERY_DOC_FV</str>
</transformer>
----
* Declaration of the `[interleaving]` transformer.
+
[source,xml]
----
<transformer name="interleaving" class="org.apache.solr.ltr.response.transform.LTRInterleavingTransformerFactory"/>
----
=== Advanced Options
==== LTRThreadModule
A thread module can be configured for the query parser and/or the transformer to parallelize the creation of feature weights. For details, please refer to the {solr-javadocs}/solr-ltr/org/apache/solr/ltr/LTRThreadModule.html[LTRThreadModule] javadocs.
==== Feature Vector Customization
The features transformer returns dense CSV values such as `featureA=0.1,featureB=0.2,featureC=0.3,featureD=0.0`.
For sparse CSV output such as `featureA:0.1 featureB:0.2 featureC:0.3` you can customize the {solr-javadocs}/solr-ltr/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.html[feature logger transformer] declaration in `solrconfig.xml` as follows:
[source,xml]
----
<transformer name="features" class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
<str name="fvCacheName">QUERY_DOC_FV</str>
<str name="defaultFormat">sparse</str>
<str name="csvKeyValueDelimiter">:</str>
<str name="csvFeatureSeparator"> </str>
</transformer>
----
==== Implementation and Contributions
How does Solr Learning-To-Rank work under the hood?::
Please refer to the `ltr` {solr-javadocs}/solr-ltr/org/apache/solr/ltr/package-summary.html[javadocs] for an implementation overview.
How could I write additional models and/or features?::
Contributions for further models, features, normalizers and interleaving algorithms are welcome. Related links:
+
* {solr-javadocs}/solr-ltr/org/apache/solr/ltr/model/LTRScoringModel.html[LTRScoringModel javadocs]
* {solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/Feature.html[Feature javadocs]
* {solr-javadocs}/solr-ltr/org/apache/solr/ltr/norm/Normalizer.html[Normalizer javadocs]
* {solr-javadocs}/solr-ltr/org/apache/solr/ltr/interleaving/Interleaving.html[Interleaving javadocs]
* https://cwiki.apache.org/confluence/display/solr/HowToContribute
* https://cwiki.apache.org/confluence/display/LUCENE/HowToContribute
== LTR Lifecycle
=== Feature Stores
It is recommended that you organise all your features into stores which are akin to namespaces:
* Features within a store must be named uniquely.
* Across stores identical or similar features can share the same name.
* If no store name is specified then the default `\_DEFAULT_` feature store will be used.
To discover the names of all your feature stores:
[source,text]
http://localhost:8983/solr/techproducts/schema/feature-store
To inspect the content of the `commonFeatureStore` feature store:
[source,text]
http://localhost:8983/solr/techproducts/schema/feature-store/commonFeatureStore
=== Models
* A model uses features from exactly one feature store.
* If no store is specified then the default `\_DEFAULT_` feature store will be used.
* A model need not use all the features defined in a feature store.
* Multiple models can use the same feature store.
To extract features for `currentFeatureStore` 's features:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&fl=id,score,[features store=currentFeatureStore]
To extract features for `nextFeatureStore` features whilst reranking with `currentModel` based on `currentFeatureStore`:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=currentModel reRankDocs=100}&fl=id,score,[features store=nextFeatureStore]
To view all models:
[source,text]
http://localhost:8983/solr/techproducts/schema/model-store
To delete the `currentModel` model:
[source,bash]
----
curl -XDELETE 'http://localhost:8983/solr/techproducts/schema/model-store/currentModel'
----
IMPORTANT: A feature store may be deleted only when there are no models using it.
To delete the `currentFeatureStore` feature store:
[source,bash]
----
curl -XDELETE 'http://localhost:8983/solr/techproducts/schema/feature-store/currentFeatureStore'
----
==== Using Large Models
With SolrCloud, large models may fail to upload due to the limitation of ZooKeeper's buffer. In this case, `DefaultWrapperModel` may help you to separate the model definition from uploaded file.
Assuming that you consider to use a large model placed at `/path/to/models/myModel.json` through `DefaultWrapperModel`.
[source,json]
----
{
"store" : "largeModelsFeatureStore",
"name" : "myModel",
"class" : ...,
"features" : [
...
],
"params" : {
...
}
}
----
First, add the directory to Solr's resource paths with a <<libs.adoc#lib-directives-in-solrconfig,`<lib/>` directive>>:
[source,xml]
----
<lib dir="/path/to" regex="models" />
----
Then, configure `DefaultWrapperModel` to wrap `myModel.json`:
[source,json]
----
{
"store" : "largeModelsFeatureStore",
"name" : "myWrapperModel",
"class" : "org.apache.solr.ltr.model.DefaultWrapperModel",
"params" : {
"resource" : "myModel.json"
}
}
----
`myModel.json` will be loaded during the initialization and be able to use by specifying `model=myWrapperModel`.
NOTE: No `"features"` are configured in `myWrapperModel` because the features of the wrapped model (`myModel`) will be used; also note that the `"store"` configured for the wrapper model must match that of the wrapped model i.e., in this example the feature store called `largeModelsFeatureStore` is used.
CAUTION: `<lib dir="/path/to/models" regex=".*\.json" />` doesn't work as expected in this case, because `SolrResourceLoader` considers given resources as JAR if `<lib />` indicates files.
As an alternative to the above-described `DefaultWrapperModel`, it is possible to <<setting-up-an-external-zookeeper-ensemble#increasing-the-file-size-limit,increase ZooKeeper's file size limit>>.
=== Applying Changes
The feature store and the model store are both <<managed-resources.adoc#,Managed Resources>>. Changes made to managed resources are not applied to the active Solr components until the Solr collection (or Solr core in single server mode) is reloaded.
=== LTR Examples
==== One Feature Store, Multiple Ranking Models
* `leftModel` and `rightModel` both use features from `commonFeatureStore` and the only different between the two models is the weights attached to each feature.
* Conventions used:
** `commonFeatureStore.json` file contains features for the `commonFeatureStore` feature store
** `leftModel.json` file contains model named `leftModel`
** `rightModel.json` file contains model named `rightModel`
** The model's features and weights are sorted alphabetically by name, this makes it easy to see what the commonalities and differences between the two models are.
** The stores features are sorted alphabetically by name, this makes it easy to lookup features used in the models
.Example: /path/commonFeatureStore.json
[source,json]
----
[
{
"store" : "commonFeatureStore",
"name" : "documentRecency",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
"q" : "{!func}recip( ms(NOW,last_modified), 3.16e-11, 1, 1)"
}
},
{
"store" : "commonFeatureStore",
"name" : "isBook",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
"fq": [ "{!terms f=category}book" ]
}
},
{
"store" : "commonFeatureStore",
"name" : "originalScore",
"class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
"params" : {}
}
]
----
.Example: /path/leftModel.json
[source,json]
----
{
"store" : "commonFeatureStore",
"name" : "leftModel",
"class" : "org.apache.solr.ltr.model.LinearModel",
"features" : [
{ "name" : "documentRecency" },
{ "name" : "isBook" },
{ "name" : "originalScore" }
],
"params" : {
"weights" : {
"documentRecency" : 0.1,
"isBook" : 1.0,
"originalScore" : 0.5
}
}
}
----
.Example: /path/rightModel.json
[source,json]
----
{
"store" : "commonFeatureStore",
"name" : "rightModel",
"class" : "org.apache.solr.ltr.model.LinearModel",
"features" : [
{ "name" : "documentRecency" },
{ "name" : "isBook" },
{ "name" : "originalScore" }
],
"params" : {
"weights" : {
"documentRecency" : 1.0,
"isBook" : 0.1,
"originalScore" : 0.5
}
}
}
----
==== Model Evolution
* `linearModel201701` uses features from `featureStore201701`
* `treesModel201702` uses features from `featureStore201702`
* `linearModel201701` and `treesModel201702` and their feature stores can co-exist whilst both are needed.
* When `linearModel201701` has been deleted then `featureStore201701` can also be deleted.
* Conventions used:
** `<store>.json` file contains features for the `<store>` feature store
** `<model>.json` file contains model name `<model>`
** a 'generation' id (e.g., `YYYYMM` year-month) is part of the feature store and model names
** The model's features and weights are sorted alphabetically by name, this makes it easy to see what the commonalities and differences between the two models are.
** The stores features are sorted alphabetically by name, this makes it easy to see what the commonalities and differences between the two feature stores are.
.Example: /path/featureStore201701.json
[source,json]
----
[
{
"store" : "featureStore201701",
"name" : "documentRecency",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
"q" : "{!func}recip( ms(NOW,last_modified), 3.16e-11, 1, 1)"
}
},
{
"store" : "featureStore201701",
"name" : "isBook",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
"fq": [ "{!terms f=category}book" ]
}
},
{
"store" : "featureStore201701",
"name" : "originalScore",
"class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
"params" : {}
}
]
----
.Example: /path/linearModel201701.json
[source,json]
----
{
"store" : "featureStore201701",
"name" : "linearModel201701",
"class" : "org.apache.solr.ltr.model.LinearModel",
"features" : [
{ "name" : "documentRecency" },
{ "name" : "isBook" },
{ "name" : "originalScore" }
],
"params" : {
"weights" : {
"documentRecency" : 0.1,
"isBook" : 1.0,
"originalScore" : 0.5
}
}
}
----
.Example: /path/featureStore201702.json
[source,json]
----
[
{
"store" : "featureStore201702",
"name" : "isBook",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params" : {
"fq": [ "{!terms f=category}book" ]
}
},
{
"store" : "featureStore201702",
"name" : "originalScore",
"class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
"params" : {}
}
]
----
.Example: /path/treesModel201702.json
[source,json]
----
{
"store" : "featureStore201702",
"name" : "treesModel201702",
"class" : "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
"features" : [
{ "name" : "isBook" },
{ "name" : "originalScore" }
],
"params" : {
"trees" : [
{
"weight" : "1",
"root" : {
"feature" : "isBook",
"threshold" : "0.5",
"left" : { "value" : "-100" },
"right" : {
"feature" : "originalScore",
"threshold" : "10.0",
"left" : { "value" : "50" },
"right" : { "value" : "75" }
}
}
},
{
"weight" : "2",
"root" : {
"value" : "-10"
}
}
]
}
}
----
== Additional LTR Resources
* "Learning to Rank in Solr" presentation at Lucene/Solr Revolution 2015 in Austin:
** Slides: http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
** Video: https://www.youtube.com/watch?v=M7BKwJoh96s
* The importance of Online Testing in Learning To Rank:
** Blog: https://sease.io/2020/04/the-importance-of-online-testing-in-learning-to-rank-part-1.html
** Blog: https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html