This README file is only about this example directory's content.
Please refer to the Solr Reference Guide's section on Learning To Rank for broader information on Learning to Rank (LTR) with Apache Solr.
To run the example, start Solr with the techproducts example collection and the LTR plugin enabled:

./bin/solr -e techproducts -Dsolr.ltr.enabled=true
Download and install liblinear
Change contrib/ltr/example/config.json “trainingLibraryLocation” to point to the train directory where you installed liblinear.
Alternatively, leave the config.json file unchanged and create a soft-link to your liblinear directory, e.g.
ln -s /Users/YourNameHere/Downloads/liblinear-2.1 ./contrib/ltr/example/liblinear
Run the example training script:

cd contrib/ltr/example
python train_and_upload_demo_model.py -c config.json
This script deploys your features from config.json “solrFeaturesFile” to Solr. Then it takes the relevance judged query-document pairs of “userQueriesFile” and merges them with the features extracted from Solr into a training file. That file is used to train a linear model, which is then deployed to Solr for you to rerank results.
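For illustration only, the sketch below approximates what the feature-extraction and merge step does. It is not the demo script itself: the Solr URL, the feature store name and the output format details are assumptions.

```python
# Hypothetical sketch of the merge step: for each judged (query, document, relevance)
# pair, ask Solr to compute the uploaded LTR features and write a liblinear-style line.
import requests

SOLR = "http://localhost:8983/solr/techproducts/query"  # assumed local techproducts core

def extract_features(query, doc_id, store="exampleFeatureStore"):  # store name is an assumption
    params = {
        "q": "id:%s" % doc_id,
        "fl": 'id,[features store=%s efi.user_query="%s"]' % (store, query),
        "wt": "json",
    }
    docs = requests.get(SOLR, params=params).json()["response"]["docs"]
    # The [features] transformer typically returns a dense string such as "feat1=0.0,feat2=1.2".
    return docs[0]["[features]"] if docs else None

def write_training_line(out, relevance, features):
    # liblinear expects "label index:value index:value ..."
    values = [pair.split("=")[1] for pair in features.split(",")]
    out.write("%s %s\n" % (relevance, " ".join("%d:%s" % (i + 1, v) for i, v in enumerate(values))))
```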
Search and rerank the results using the trained model:

http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr%20model=exampleModel%20reRankDocs=25%20efi.user_query=%27test%27}&fl=price,score,name
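The same reranking query can also be issued from Python; a minimal sketch, assuming Solr is running locally and the model was deployed as exampleModel:

```python
# Minimal sketch: issue the reranking query above with the requests library.
import requests

params = {
    "q": "test",
    "rq": "{!ltr model=exampleModel reRankDocs=25 efi.user_query='test'}",
    "fl": "price,score,name",
}
resp = requests.get("http://localhost:8983/solr/techproducts/query", params=params)
for doc in resp.json()["response"]["docs"]:
    print(doc)
```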
In order to train a learning to rank model you need training data. Training data is what teaches the model what the appropriate weight for each feature is. In general training data is a collection of queries with associated documents and what their ranking/score should be. As an example:
hard drive|SP2514N        |0.6|CLICK_LOGS
hard drive|6H500F0        |0.3|CLICK_LOGS
hard drive|F8V7067-APL-KIT|0.0|CLICK_LOGS
hard drive|IW-02          |0.0|CLICK_LOGS
ipod      |MA147LL/A      |1.0|HUMAN_JUDGEMENT
ipod      |F8V7067-APL-KIT|0.5|HUMAN_JUDGEMENT
ipod      |IW-02          |0.5|HUMAN_JUDGEMENT
ipod      |6H500F0        |0.0|HUMAN_JUDGEMENT
The columns in the example represent:
the user query;
a unique id for a document in the response;
a score representing the relevance of that document (not necessarily between zero and one);
the source, i.e., whether the training record was produced from interaction data (CLICK_LOGS) or from human judgements (HUMAN_JUDGEMENT).
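For illustration, a short sketch of reading records in this pipe-separated format; it is a hypothetical helper, not part of the example scripts:

```python
# Hypothetical helper: parse pipe-separated training records like the example above.
from collections import namedtuple

TrainingRecord = namedtuple("TrainingRecord", ["query", "doc_id", "relevance", "source"])

def read_training_data(path):
    records = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            query, doc_id, relevance, source = [col.strip() for col in line.split("|")]
            records.append(TrainingRecord(query, doc_id, float(relevance), source))
    return records
```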
You might collect data for use with your machine learning algorithm relying on:
the interactions of your users with the search results (interaction data, e.g., click logs);
explicit relevance judgements produced by human judges.
There are many ways of preparing interaction data for training a model, and it is outside the scope of this readme to provide a complete review of all the techniques. In the following we illustrate a simple way for obtaining training data from simple interaction data.
Simple interaction data will be a log file generated by your application after it has talked to Solr. The log will contain two different types of record:
user-id, query, responses, where responses is a list of the unique document ids returned for a query. Example:

diego, hard drive, [SP2514N,6H500F0,F8V7067-APL-KIT,IW-02]
user-id, query, document-id, click. Example:

christine, hard drive, SP2514N
diego,     hard drive, SP2514N
michael,   hard drive, SP2514N
joshua,    hard drive, IW-02
Given a log composed of records like these, a simple way to produce a training dataset is to group on the query field and then assign to each document a relevance score equal to its number of clicks:
hard drive|SP2514N        |3|CLICK_LOGS
hard drive|IW-02          |1|CLICK_LOGS
hard drive|6H500F0        |0|CLICK_LOGS
hard drive|F8V7067-APL-KIT|0|CLICK_LOGS
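A rough sketch of this aggregation in Python, assuming the click log is a comma-separated file of user-id, query, document-id records and that the documents returned for each query are known from the response log (the file names and formats here are assumptions):

```python
# Hypothetical sketch: turn click records into training data by counting clicks
# per (query, document) pair; documents that were never clicked get a score of 0.
from collections import Counter

def clicks_to_training_data(click_log_path, returned_docs_by_query, out_path):
    clicks = Counter()
    with open(click_log_path) as f:
        for line in f:
            user_id, query, doc_id = [col.strip() for col in line.split(",")]
            clicks[(query, doc_id)] += 1
    with open(out_path, "w") as out:
        for query, doc_ids in returned_docs_by_query.items():
            for doc_id in doc_ids:
                out.write("%s|%s|%d|CLICK_LOGS\n" % (query, doc_id, clicks[(query, doc_id)]))

# Example usage, mirroring the records above:
# clicks_to_training_data(
#     "clicks.log",
#     {"hard drive": ["SP2514N", "6H500F0", "F8V7067-APL-KIT", "IW-02"]},
#     "training.txt",
# )
```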
This is a really trivial way to generate a training dataset, and in many settings it might not produce great results. Indeed, it is a well-known fact that clicks are biased: users tend to click on the first result proposed for a query, even if it is not relevant. A click on a document in position five could be considered more important than a click on a document in position one, because the user took the effort to browse the results list down to position five.
Some approaches take into account the time spent on the clicked document (if the user spent only two seconds on the document and then clicked on other documents in the list, probably she did not intend to click that document).
There are many papers proposing techniques for removing this bias or for taking the click positions into account; a good survey is Click Models for Web Search, by Chuklin, Markov and de Rijke.
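One very simple way to take positions into account, shown below purely as an illustration (this particular weighting is an assumption on our part, not a technique taken from the survey), is to weight each click by how deep in the results list the clicked document was shown:

```python
# Illustrative only: give clicks further down the results list a larger weight.
# The record format and the log-based weighting scheme are assumptions.
import math
from collections import defaultdict

def position_weighted_scores(click_records, results_by_query):
    """click_records: iterable of (query, doc_id) clicks.
    results_by_query: {query: [document ids in the order they were shown]}"""
    scores = defaultdict(float)
    for query, doc_id in click_records:
        position = results_by_query[query].index(doc_id) + 1  # 1-based rank
        scores[(query, doc_id)] += math.log(1 + position)     # deeper click, larger weight
    return scores
```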
Another way to get training data is to ask human judges to label documents. Producing human judgements is in general more expensive, but the quality of the resulting dataset can be better than one produced from interaction data. It is worth noting that human judgements can also be produced using a crowdsourcing platform, which allows you to show human workers documents associated with a query and to collect relevance labels. Usually a human worker is shown a query together with a list of results, and the task consists of assigning a relevance label to each document (e.g., Perfect, Excellent, Good, Fair, Not relevant). Training data can then be obtained by translating the labels into numeric scores (e.g., Perfect = 4, Excellent = 3, Good = 2, Fair = 1, Not relevant = 0).
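For illustration, a small sketch of translating such labels into the pipe-separated training format used above; the label scale follows the example mapping, while the function and file names are hypothetical:

```python
# Hypothetical sketch: convert human judgement labels into numeric training records.
LABEL_SCORES = {
    "Perfect": 4,
    "Excellent": 3,
    "Good": 2,
    "Fair": 1,
    "Not relevant": 0,
}

def judgements_to_training_data(judgements, out_path):
    """judgements: iterable of (query, doc_id, label) tuples."""
    with open(out_path, "w") as out:
        for query, doc_id, label in judgements:
            out.write("%s|%s|%d|HUMAN_JUDGEMENT\n" % (query, doc_id, LABEL_SCORES[label]))

# Example usage:
# judgements_to_training_data([("ipod", "MA147LL/A", "Perfect")], "training.txt")
```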