| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <html> |
| <head> |
| <META http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
| <meta content="Apache Forrest" name="Generator"> |
| <meta name="Forrest-version" content="0.8"> |
| <meta name="Forrest-skin-name" content="lucene"> |
| <title> |
| Apache Lucene - Scoring |
| </title> |
| <link type="text/css" href="skin/basic.css" rel="stylesheet"> |
| <link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet"> |
| <link media="print" type="text/css" href="skin/print.css" rel="stylesheet"> |
| <link type="text/css" href="skin/profile.css" rel="stylesheet"> |
| <script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script> |
| <link rel="shortcut icon" href="images/favicon.ico"> |
| </head> |
| <body onload="init()"> |
| <script type="text/javascript">ndeSetTextSize();</script> |
| <div id="top"> |
| <!--+ |
| |breadtrail |
| +--> |
| <div class="breadtrail"> |
| <a href="http://www.apache.org/">Apache</a> > <a href="http://lucene.apache.org/">Lucene</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script> |
| </div> |
| <!--+ |
| |header |
| +--> |
| <div class="header"> |
| <!--+ |
| |start group logo |
| +--> |
| <div class="grouplogo"> |
| <a href="http://lucene.apache.org/"><img class="logoImage" alt="Lucene" src="http://www.apache.org/images/asf_logo_simple.png" title="Apache Lucene"></a> |
| </div> |
| <!--+ |
| |end group logo |
| +--> |
| <!--+ |
| |start Project Logo |
| +--> |
| <div class="projectlogo"> |
| <a href="http://lucene.apache.org/java/"><img class="logoImage" alt="Lucene" src="http://lucene.apache.org/images/lucene_green_300.gif" title="Apache Lucene is a high-performance, full-featured text search engine library written entirely in |
| Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform."></a> |
| </div> |
| <!--+ |
| |end Project Logo |
| +--> |
| <!--+ |
| |start Search |
| +--> |
| <div class="searchbox"> |
| <form action="http://search.lucidimagination.com/p:lucene" method="get" class="roundtopsmall"> |
| <input onFocus="getBlank (this, 'Search the site with Lucene');" size="25" name="q" id="query" type="text" value="Search the site with Lucene"> |
| <input name="Search" value="Search" type="submit"> |
| </form> |
| <div style="position: relative; top: -5px; left: -10px">Powered by <a href="http://www.lucidimagination.com" style="color: #033268">Lucid Imagination</a> |
| </div> |
| </div> |
| <!--+ |
| |end search |
| +--> |
| <!--+ |
| |start Tabs |
| +--> |
| <ul id="tabs"> |
| <li class="current"> |
| <a class="selected" href="http://lucene.apache.org/java/docs/">Main</a> |
| </li> |
| <li> |
| <a class="unselected" href="http://wiki.apache.org/lucene-java">Wiki</a> |
| </li> |
| <li class="current"> |
| <a class="selected" href="index.html">Lucene 2.9.2 Documentation</a> |
| </li> |
| </ul> |
| <!--+ |
| |end Tabs |
| +--> |
| </div> |
| </div> |
| <div id="main"> |
| <div id="publishedStrip"> |
| <!--+ |
| |start Subtabs |
| +--> |
| <div id="level2tabs"></div> |
| <!--+ |
| |end Endtabs |
| +--> |
| <script type="text/javascript"><!-- |
| document.write("Last Published: " + document.lastModified); |
| // --></script> |
| </div> |
| <!--+ |
| |breadtrail |
| +--> |
| <div class="breadtrail"> |
| |
| |
| </div> |
| <!--+ |
| |start Menu, mainarea |
| +--> |
| <!--+ |
| |start Menu |
| +--> |
| <div id="menu"> |
| <div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div> |
| <div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;"> |
| <div class="menuitem"> |
| <a href="index.html">Overview</a> |
| </div> |
| <div onclick="SwitchMenu('menu_1.1.2', 'skin/')" id="menu_1.1.2Title" class="menutitle">Changes</div> |
| <div id="menu_1.1.2" class="menuitemgroup"> |
| <div class="menuitem"> |
| <a href="changes/Changes.html">Core</a> |
| </div> |
| <div class="menuitem"> |
| <a href="changes/Contrib-Changes.html">Contrib</a> |
| </div> |
| </div> |
| <div onclick="SwitchMenu('menu_1.1.3', 'skin/')" id="menu_1.1.3Title" class="menutitle">Javadocs</div> |
| <div id="menu_1.1.3" class="menuitemgroup"> |
| <div class="menuitem"> |
| <a href="api/all/index.html">All</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/core/index.html">Core</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/demo/index.html">Demo</a> |
| </div> |
| <div onclick="SwitchMenu('menu_1.1.3.4', 'skin/')" id="menu_1.1.3.4Title" class="menutitle">Contrib</div> |
| <div id="menu_1.1.3.4" class="menuitemgroup"> |
| <div class="menuitem"> |
| <a href="api/contrib-analyzers/index.html">Analyzers</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-smartcn/index.html">Smart Chinese Analyzer</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-ant/index.html">Ant</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-bdb/index.html">Bdb</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-bdb-je/index.html">Bdb-je</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-benchmark/index.html">Benchmark</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-collation/index.html">Collation</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-fast-vector-highlighter/index.html">Fast Vector Highlighter</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-highlighter/index.html">Highlighter</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-instantiated/index.html">Instantiated</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-lucli/index.html">Lucli</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-memory/index.html">Memory</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-misc/index.html">Miscellaneous</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-queries/index.html">Queries</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-queryparser/index.html">Query Parser Framework</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-regex/index.html">Regex</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-remote/index.html">Remote</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-snowball/index.html">Snowball</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-spatial/index.html">Spatial</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-spellchecker/index.html">Spellchecker</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-surround/index.html">Surround</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-swing/index.html">Swing</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-wikipedia/index.html">Wikipedia</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-wordnet/index.html">Wordnet</a> |
| </div> |
| <div class="menuitem"> |
| <a href="api/contrib-xml-query-parser/index.html">XML Query Parser</a> |
| </div> |
| </div> |
| </div> |
| <div class="menuitem"> |
| <a href="contributions.html">Contributions</a> |
| </div> |
| <div class="menuitem"> |
| <a href="http://wiki.apache.org/lucene-java/LuceneFAQ">FAQ</a> |
| </div> |
| <div class="menuitem"> |
| <a href="fileformats.html">File Formats</a> |
| </div> |
| <div class="menuitem"> |
| <a href="gettingstarted.html">Getting Started</a> |
| </div> |
| <div class="menuitem"> |
| <a href="lucene-contrib/index.html">Lucene Contrib</a> |
| </div> |
| <div class="menuitem"> |
| <a href="queryparsersyntax.html">Query Syntax</a> |
| </div> |
| <div class="menupage"> |
| <div class="menupagetitle">Scoring</div> |
| </div> |
| <div class="menuitem"> |
| <a href="http://wiki.apache.org/lucene-java">Wiki</a> |
| </div> |
| </div> |
| <div id="credit"></div> |
| <div id="roundbottom"> |
| <img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div> |
| <!--+ |
| |alternative credits |
| +--> |
| <div id="credit2"></div> |
| </div> |
| <!--+ |
| |end Menu |
| +--> |
| <!--+ |
| |start content |
| +--> |
| <div id="content"> |
| <div title="Portable Document Format" class="pdflink"> |
| <a class="dida" href="scoring.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br> |
| PDF</a> |
| </div> |
| <h1> |
| Apache Lucene - Scoring |
| </h1> |
| <div id="minitoc-area"> |
| <ul class="minitoc"> |
| <li> |
| <a href="#Introduction">Introduction</a> |
| </li> |
| <li> |
| <a href="#Scoring">Scoring</a> |
| <ul class="minitoc"> |
| <li> |
| <a href="#Fields and Documents">Fields and Documents</a> |
| </li> |
| <li> |
| <a href="#Score Boosting">Score Boosting</a> |
| </li> |
| <li> |
| <a href="#Understanding the Scoring Formula">Understanding the Scoring Formula</a> |
| </li> |
| <li> |
| <a href="#The Big Picture">The Big Picture</a> |
| </li> |
| <li> |
| <a href="#Query Classes">Query Classes</a> |
| </li> |
| <li> |
| <a href="#Changing Similarity">Changing Similarity</a> |
| </li> |
| </ul> |
| </li> |
| <li> |
| <a href="#Changing your Scoring -- Expert Level">Changing your Scoring -- Expert Level</a> |
| </li> |
| <li> |
| <a href="#Appendix">Appendix</a> |
| <ul class="minitoc"> |
| <li> |
| <a href="#Algorithm">Algorithm</a> |
| </li> |
| </ul> |
| </li> |
| </ul> |
| </div> |
| |
| |
| <a name="N10013"></a><a name="Introduction"></a> |
| <h2 class="boxed">Introduction</h2> |
| <div class="section"> |
| <p>Lucene scoring is the heart of why we all love Lucene. It is blazingly fast and it hides almost all of the complexity from the user. |
| In a nutshell, it works. At least, that is, until it doesn't work, or doesn't work as one would expect it to |
| work. Then we are left digging into Lucene internals or asking for help on java-user@lucene.apache.org to figure out why a document with five of our query terms |
| scores lower than a different document with only one of the query terms. </p> |
| <p>While this document won't answer your specific scoring issues, it will, hopefully, point you to the places that can |
| help you figure out the what and why of Lucene scoring.</p> |
| <p>Lucene scoring uses a combination of the |
| <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space Model (VSM) of Information |
| Retrieval</a> and the <a href="http://en.wikipedia.org/wiki/Standard_Boolean_model">Boolean model</a> |
| to determine |
| how relevant a given Document is to a User's query. In general, the idea behind the VSM is that the more |
| times a query term appears in a document relative to |
| the number of times the term appears in all the documents in the collection, the more relevant that |
| document is to the query. It uses the Boolean model to first narrow down the documents that need to |
| be scored based on the use of boolean logic in the Query specification. Lucene also adds some |
| capabilities and refinements onto this model to support boolean and fuzzy searching, but it |
| essentially remains a VSM-based system at heart. |
| For some valuable references on VSM and IR in general refer to the |
| <a href="http://wiki.apache.org/lucene-java/InformationRetrieval">Lucene Wiki IR references</a>. |
| </p> |
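| <p>The VSM idea above can be illustrated with a minimal sketch. It assumes a term frequency factor of |
| sqrt(freq) and an inverse document frequency factor of 1 + ln(numDocs / (docFreq + 1)), which is how the |
| default Similarity is commonly described; the class and method names are hypothetical and this is not |
| Lucene code.</p> |

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy tf-idf weight, not Lucene's implementation.
final class TfIdfSketch {
    // Assumed term-frequency factor: grows with the square root of the raw count.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumed inverse-document-frequency factor: rarer terms get a larger value.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Counts how often a term occurs in one tokenized document.
    static int count(List<String> doc, String term) {
        int n = 0;
        for (String t : doc) {
            if (t.equals(term)) n++;
        }
        return n;
    }

    // Weight of a term in the document at index docId, relative to the whole collection.
    static double weight(List<List<String>> docs, int docId, String term) {
        int docFreq = 0;
        for (List<String> d : docs) {
            if (d.contains(term)) docFreq++;
        }
        return tf(count(docs.get(docId), term)) * idf(docFreq, docs.size());
    }

    // A tiny hypothetical collection of three tokenized documents.
    static List<List<String>> sampleDocs() {
        return Arrays.asList(
                Arrays.asList("lucene", "scoring", "lucene"),
                Arrays.asList("lucene", "index"),
                Arrays.asList("query", "parser"));
    }

    public static void main(String[] args) {
        List<List<String>> docs = sampleDocs();
        System.out.println("doc 0, 'lucene': " + weight(docs, 0, "lucene"));
        System.out.println("doc 1, 'lucene': " + weight(docs, 1, "lucene"));
        System.out.println("doc 1, 'index':  " + weight(docs, 1, "index"));
    }
}
```

| <p>In this toy collection, a term that occurs more often in a document, or in fewer documents overall, |
| receives a larger weight.</p> |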
| <p>The rest of this document will cover <a href="#Scoring">Scoring</a> basics and how to change your |
| <a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a>. Next it will cover ways you can |
| customize the Lucene internals in <a href="#Changing your Scoring -- Expert Level">Changing your Scoring |
| -- Expert Level</a> which gives details on implementing your own |
| <a href="api/core/org/apache/lucene/search/Query.html">Query</a> class and related functionality. Finally, we |
| will finish up with some reference material in the <a href="#Appendix">Appendix</a>. |
| </p> |
| </div> |
| |
| <a name="N10045"></a><a name="Scoring"></a> |
| <h2 class="boxed">Scoring</h2> |
| <div class="section"> |
| <p>Scoring is very much dependent on the way documents are indexed, |
| so it is important to understand indexing (see |
| <a href="gettingstarted.html">Apache Lucene - Getting Started Guide</a> |
| and the Lucene |
| <a href="fileformats.html">file formats</a>) |
| before continuing on with this section. It is also assumed that readers know how to use the |
| <a href="api/core/org/apache/lucene/search/Searcher.html#explain(Query query, int doc)">Searcher.explain(Query query, int doc)</a> functionality, |
| which can go a long way toward explaining why a score is returned. |
| </p> |
| <a name="N10059"></a><a name="Fields and Documents"></a> |
| <h3 class="boxed">Fields and Documents</h3> |
| <p>In Lucene, the objects we are scoring are |
| <a href="api/core/org/apache/lucene/document/Document.html">Documents</a>. A Document is a collection |
| of |
| <a href="api/core/org/apache/lucene/document/Field.html">Fields</a>. Each Field has semantics about how |
| it is created and stored (e.g. tokenized, untokenized, raw data, compressed, etc.). It is important to |
| note that Lucene scoring works on Fields and then combines the results to return Documents. This is |
| important because two Documents with the exact same content, but one having the content in two Fields |
| and the other in one Field, will return different scores for the same query due to length normalization |
| (assuming the |
| <a href="api/core/org/apache/lucene/search/DefaultSimilarity.html">DefaultSimilarity</a> |
| on the Fields). |
| </p> |
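| <p>As a rough illustration of why splitting content across Fields changes the score, the sketch below |
| assumes a length normalization of 1 / sqrt(numTerms), which is how the DefaultSimilarity is usually |
| described. The class name and term counts are hypothetical.</p> |

```java
// Illustrative only: compares the assumed length norm of one long field
// with the norm of each of two shorter fields holding the same content.
final class LengthNormSketch {
    // Assumed length normalization: shorter fields get a larger factor.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        int totalTerms = 10;
        // All ten terms indexed in a single field.
        System.out.println("one field of 10 terms:  " + lengthNorm(totalTerms));
        // The same ten terms split evenly over two fields.
        System.out.println("each field of 5 terms:  " + lengthNorm(totalTerms / 2));
    }
}
```

| <p>Because each shorter Field carries a larger norm, a match in one of the two short Fields contributes |
| more than the same match in the single long Field.</p> |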
| <a name="N1006E"></a><a name="Score Boosting"></a> |
| <h3 class="boxed">Score Boosting</h3> |
| <p>Lucene allows influencing search results by "boosting" at more than one level: |
| <ul> |
| |
| <li> |
| <b>Document level boosting</b> |
| - while indexing - by calling |
| <a href="api/core/org/apache/lucene/document/Document.html#setBoost(float)">document.setBoost()</a> |
| before a document is added to the index. |
| </li> |
| |
| <li> |
| <b>Document's Field level boosting</b> |
| - while indexing - by calling |
| <a href="api/core/org/apache/lucene/document/Fieldable.html#setBoost(float)">field.setBoost()</a> |
| before adding a field to the document (and before adding the document to the index). |
| </li> |
| |
| <li> |
| <b>Query level boosting</b> |
| - during search, by setting a boost on a query clause, calling |
| <a href="api/core/org/apache/lucene/search/Query.html#setBoost(float)">Query.setBoost()</a>. |
| </li> |
| |
| </ul> |
| |
| </p> |
| <p>Indexing time boosts are preprocessed for storage efficiency and written to |
| the directory (when writing the document) in a single byte (!) as follows: |
| For each field of a document, all boosts of that field |
| (i.e. all boosts under the same field name in that doc) are multiplied. |
| The result is multiplied by the boost of the document, |
| and also multiplied by a "field length norm" value |
| that represents the length of that field in that doc |
| (so shorter fields are automatically boosted up). |
| The result is encoded as a single byte |
| (with some precision loss of course) and stored in the directory. |
| The similarity object in effect at indexing computes the length-norm of the field. |
| </p> |
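| <p>The multiplication described above can be sketched as follows. This is a simplified model that |
| leaves out the single-byte encoding step; the class and method names are hypothetical, and the |
| 1 / sqrt(numTerms) length norm is an assumption about the similarity in effect.</p> |

```java
// Illustrative only: composes an index-time norm from boosts and field length.
final class IndexTimeNormSketch {
    // Multiplies the document boost, every boost set under the same field name,
    // and an assumed length norm of 1 / sqrt(numTerms).
    static float norm(float docBoost, float[] fieldBoosts, int numTerms) {
        float product = docBoost;
        for (float boost : fieldBoosts) {
            product *= boost;
        }
        return product * (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // No boosts, four terms in the field.
        System.out.println(norm(1.0f, new float[] {1.0f}, 4));
        // Document boost 2.0, two field instances boosted 1.5 and 2.0, four terms.
        System.out.println(norm(2.0f, new float[] {1.5f, 2.0f}, 4));
    }
}
```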
| <p>The composition of this 1-byte representation of norms |
| (that is, indexing time multiplication of field boosts &amp; doc boost &amp; field-length-norm) |
| is nicely described in |
| <a href="api/core/org/apache/lucene/document/Fieldable.html#setBoost(float)">Fieldable.setBoost()</a>. |
| </p> |
| <p>Encoding and decoding of the resulting float norm in a single byte are done by the |
| static methods of the class Similarity: |
| <a href="api/core/org/apache/lucene/search/Similarity.html#encodeNorm(float)">encodeNorm()</a> and |
| <a href="api/core/org/apache/lucene/search/Similarity.html#decodeNorm(byte)">decodeNorm()</a>. |
| Due to loss of precision, it is not guaranteed that decode(encode(x)) = x, |
| e.g. decode(encode(0.89)) = 0.75. |
| At scoring (search) time, this norm is brought into the score of document |
| as <b>norm(t, d)</b>, as shown by the formula in |
| <a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a>. |
| </p> |
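| <p>To show why a one-byte float cannot round-trip exactly, here is a minimal sketch of a small-float |
| encoding that keeps only the exponent and the top bits of the mantissa. It is modelled on a common |
| description of this kind of scheme and is an assumption, not a copy of the Similarity code, so exact |
| values may differ from what Lucene produces. The class and method names are hypothetical.</p> |

```java
// Illustrative only: a lossy one-byte float encoding, not Lucene's own code.
final class OneByteNormSketch {
    // Offset that maps the smallest representable exponent to byte value 0.
    private static final int ZERO_POINT = (63 - 15) << 3;

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Drop the low 21 bits: sign, exponent and the top two explicit mantissa bits remain.
        int small = bits >> 21;
        if (small <= ZERO_POINT) {
            // Zero and negative values map to 0, tiny positive values to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            // Too large to represent: clamp to the largest code.
            return (byte) -1;
        }
        return (byte) (small - ZERO_POINT);
    }

    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, 0.3f, 0.1f};
        for (float x : samples) {
            System.out.println(x + " -> " + decode(encode(x)));
        }
    }
}
```

| <p>Values that need more mantissa bits than the byte can hold are rounded down to the nearest |
| representable value, which is the precision loss described above.</p> |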
| <a name="N100B1"></a><a name="Understanding the Scoring Formula"></a> |
| <h3 class="boxed">Understanding the Scoring Formula</h3> |
| <p> |
| This scoring formula is described in the |
| <a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a> class. Please take the time to study this formula, as it contains much of the information about how the |
| basics of Lucene scoring work, especially the |
| <a href="api/core/org/apache/lucene/search/TermQuery.html">TermQuery</a>. |
| </p> |
| <a name="N100C2"></a><a name="The Big Picture"></a> |
| <h3 class="boxed">The Big Picture</h3> |
| <p>OK, so the tf-idf formula and the |
| <a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a> |
| are great for understanding the basics of Lucene scoring, but what really drives Lucene scoring is |
| the use of, and interaction between, the |
| <a href="api/core/org/apache/lucene/search/Query.html">Query</a> classes, as created by each application in |
| response to a user's information need. |
| </p> |
| <p>In this regard, Lucene offers a wide variety of <a href="api/core/org/apache/lucene/search/Query.html">Query</a> implementations, most of which are in the |
| <a href="api/core/org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a> package. |
| These implementations can be combined in a wide variety of ways to provide complex querying |
| capabilities along with |
| information about where matches took place in the document collection. The <a href="#Query Classes">Query</a> |
| section below |
| highlights some of the more important Query classes. For information on the other ones, see the |
| <a href="api/core/org/apache/lucene/search/package-summary.html">package summary</a>. For details on implementing |
| your own Query class, see <a href="#Changing your Scoring -- Expert Level">Changing your Scoring -- |
| Expert Level</a> below. |
| </p> |
| <p>Once a Query has been created and submitted to the |
| <a href="api/core/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>, the scoring process |
| begins. (See the <a href="#Appendix">Appendix</a> Algorithm section for more notes on the process.) After some infrastructure setup, |
| control finally passes to the <a href="api/core/org/apache/lucene/search/Weight.html">Weight</a> implementation and its |
| <a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a> instance. In the case of any type of |
| <a href="api/core/org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>, scoring is handled by the |
| <a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanQuery.java?view=log">BooleanWeight2</a> (link goes to ViewVC BooleanQuery java code which contains the BooleanWeight2 inner class), |
| unless the |
| <a href="api/core/org/apache/lucene/search/Weight.html#scoresDocsOutOfOrder()"> |
| Weight#scoresDocsOutOfOrder()</a> method returns true, |
| in which case the |
| <a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanQuery.java?view=log">BooleanWeight</a> |
| (link goes to ViewVC BooleanQuery java code, which contains the BooleanWeight inner class) from the 1.4 version of Lucene is used. |
| See <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/CHANGES.txt">CHANGES.txt</a> under release 1.9 RC1 for more information on choosing which Scorer to use. |
| </p> |
| <p> |
| Assuming the use of the BooleanWeight2, a |
| BooleanScorer2 is created by bringing together |
| all of the |
| <a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>s from the sub-clauses of the BooleanQuery. |
| When the BooleanScorer2 is asked to score, it delegates its work to an internal Scorer based on the type |
| of clauses in the Query. This internal Scorer essentially loops over the sub scorers and sums the scores |
| provided by each scorer while factoring in the coord() score. |
| <!-- Do we want to fill in the details of the counting sum scorer, disjunction scorer, etc.? --> |
| </p> |
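| <p>The summing step can be sketched as below. This is a simplified model rather than the BooleanScorer2 |
| code: it assumes coord() is the fraction of clauses that matched, and it uses NaN as a hypothetical |
| marker for a clause that did not match the document.</p> |

```java
// Illustrative only: sums sub-clause scores and applies an assumed coord() factor.
final class CoordSumSketch {
    // Assumed coordination factor: matched clauses divided by total clauses.
    static float coord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }

    // subScores[i] is the score of clause i for one document,
    // or Float.NaN if that clause did not match the document.
    static float score(float[] subScores) {
        float sum = 0.0f;
        int matched = 0;
        for (float s : subScores) {
            if (!Float.isNaN(s)) {
                sum += s;
                matched++;
            }
        }
        return matched == 0 ? 0.0f : sum * coord(matched, subScores.length);
    }

    public static void main(String[] args) {
        // Two of four clauses match.
        System.out.println(score(new float[] {1.0f, Float.NaN, 1.0f, Float.NaN}));
        // Both of two clauses match.
        System.out.println(score(new float[] {0.5f, 0.5f}));
    }
}
```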
| <a name="N1011A"></a><a name="Query Classes"></a> |
| <h3 class="boxed">Query Classes</h3> |
| <p>For information on the Query Classes, refer to the |
| <a href="api/core/org/apache/lucene/search/package-summary.html#query">search package javadocs</a>. |
| |
| </p> |
| <a name="N10127"></a><a name="Changing Similarity"></a> |
| <h3 class="boxed">Changing Similarity</h3> |
| <p>One of the ways of changing the scoring characteristics of Lucene is to change the similarity factors. For information on |
| how to do this, see the |
| <a href="api/core/org/apache/lucene/search/package-summary.html#changingSimilarity">search package javadocs</a>. |
| </p> |
| </div> |
| |
| <a name="N10134"></a><a name="Changing your Scoring -- Expert Level"></a> |
| <h2 class="boxed">Changing your Scoring -- Expert Level</h2> |
| <div class="section"> |
| <p>At a much deeper level, one can affect scoring by implementing one's own Query classes (and related scoring classes). To learn more |
| about how to do this, refer to the |
| <a href="api/core/org/apache/lucene/search/package-summary.html#scoring">search package javadocs</a>. |
| |
| </p> |
| </div> |
| |
| |
| <a name="N10141"></a><a name="Appendix"></a> |
| <h2 class="boxed">Appendix</h2> |
| <div class="section"> |
| <a name="N10146"></a><a name="Algorithm"></a> |
| <h3 class="boxed">Algorithm</h3> |
| <p>This section is mostly notes on stepping through the Scoring process and serves as |
| fertilizer for the earlier sections.</p> |
| <p>In the typical search application, a |
| <a href="api/core/org/apache/lucene/search/Query.html">Query</a> |
| is passed to the |
| <a href="api/core/org/apache/lucene/search/Searcher.html">Searcher</a>, |
| beginning the scoring process. |
| </p> |
| <p>Once inside the Searcher, a |
| <a href="api/core/org/apache/lucene/search/Collector.html">Collector</a> |
| is used for the scoring and sorting of the search results. |
| These important objects are involved in a search: |
| <ol> |
| |
| <li>The |
| <a href="api/core/org/apache/lucene/search/Weight.html">Weight</a> |
| object of the Query. The Weight object is an internal representation of the Query that |
| allows the Query to be reused by the Searcher. |
| </li> |
| |
| <li>The Searcher that initiated the call.</li> |
| |
| <li>A |
| <a href="api/core/org/apache/lucene/search/Filter.html">Filter</a> |
| for limiting the result set. Note, the Filter may be null. |
| </li> |
| |
| <li>A |
| <a href="api/core/org/apache/lucene/search/Sort.html">Sort</a> |
| object for specifying how to sort the results if the standard score based sort method is not |
| desired. |
| </li> |
| |
| </ol> |
| |
| </p> |
| <p> Assuming we are not sorting (since sorting doesn't |
| affect the raw Lucene score), |
| we call one of the search methods of the Searcher, passing in the |
| <a href="api/core/org/apache/lucene/search/Weight.html">Weight</a> |
| object created by Searcher.createWeight(Query), |
| <a href="api/core/org/apache/lucene/search/Filter.html">Filter</a> |
| and the number of results we want. This method |
| returns a |
| <a href="api/core/org/apache/lucene/search/TopDocs.html">TopDocs</a> |
| object, which is an internal collection of search results. |
| The Searcher creates a |
| <a href="api/core/org/apache/lucene/search/TopScoreDocCollector.html">TopScoreDocCollector</a> |
| and passes it along with the Weight and Filter to another expert search method (for more on the |
| <a href="api/core/org/apache/lucene/search/Collector.html">Collector</a> |
| mechanism, see |
| <a href="api/core/org/apache/lucene/search/Searcher.html">Searcher</a>). |
| The TopScoreDocCollector uses a |
| <a href="api/core/org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a> |
| to collect the top results for the search. |
| </p> |
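| <p>Collecting the top results with a priority queue can be sketched as follows. The example uses |
| java.util.PriorityQueue rather than Lucene's own PriorityQueue class, and the Hit and topN names are |
| hypothetical; it only illustrates the bounded-heap technique.</p> |

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: keeps the n best-scoring documents using a bounded min-heap.
final class TopNSketch {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Returns the n highest-scoring hits, best first. n must be at least 1.
    static List<Hit> topN(float[] scoresByDoc, int n) {
        // Min-heap on score: the head is always the weakest hit currently kept.
        PriorityQueue<Hit> queue = new PriorityQueue<Hit>(n, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(a.score, b.score);
            }
        });
        for (int doc = 0; doc < scoresByDoc.length; doc++) {
            Hit hit = new Hit(doc, scoresByDoc[doc]);
            if (queue.size() < n) {
                queue.add(hit);
            } else if (hit.score > queue.peek().score) {
                // The new hit beats the weakest kept hit, so replace it.
                queue.poll();
                queue.add(hit);
            }
        }
        List<Hit> result = new ArrayList<Hit>(queue);
        // Order best first; ties are broken by lower document number.
        Collections.sort(result, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                int byScore = Float.compare(b.score, a.score);
                return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        for (Hit h : topN(new float[] {0.2f, 0.9f, 0.5f, 0.7f}, 2)) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

| <p>Only n hits are held in memory at any time, so the cost per collected document stays logarithmic in n |
| regardless of how many documents match.</p> |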
| <p>If a Filter is being used, some initial setup is done to determine which docs to include. Otherwise, |
| we ask the Weight for |
| a |
| <a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a> |
| for the |
| <a href="api/core/org/apache/lucene/index/IndexReader.html">IndexReader</a> |
| of the current searcher and we proceed by |
| calling the score method on the |
| <a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>. |
| </p> |
| <p>At last, we are actually going to score some documents. The score method takes in the Collector |
| (most likely the TopScoreDocCollector or TopFieldCollector) and does its business. |
| Of course, here is where things get involved. The |
| <a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a> |
| that is returned by the |
| <a href="api/core/org/apache/lucene/search/Weight.html">Weight</a> |
| object depends on what type of Query was submitted. In most real world applications with multiple |
| query terms, |
| the |
| <a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a> |
| is going to be a |
| <a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer2.java?view=log">BooleanScorer2</a> |
| (see the section on customizing your scoring for info on changing this). |
| |
| </p> |
| <p>Assuming a BooleanScorer2 scorer, we first initialize the Coordinator, which is used to apply the |
| coord() factor. We then |
| get an internal Scorer based on the required, optional and prohibited parts of the query. |
| Using this internal Scorer, the BooleanScorer2 then proceeds |
| into a while loop based on the Scorer#next() method. The next() method advances to the next document |
| matching the query. This is an |
| abstract method in the Scorer class and is thus overridden by all derived |
| implementations. <!-- DOUBLE CHECK THIS -->If you have a simple OR query |
| your internal Scorer is most likely a DisjunctionSumScorer, which essentially combines the scores |
| from the sub scorers of the OR'd terms.</p> |
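| <p>The loop over sub scorers for a simple OR query can be sketched as below. This is a simplified model, |
| not the DisjunctionSumScorer code: each hypothetical SubScorer walks a sorted array of document numbers |
| and contributes one fixed score per matching document.</p> |

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: document-at-a-time OR over sorted postings, summing clause scores.
final class DisjunctionSumSketch {
    // Sentinel returned once a sub scorer has run out of documents.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class SubScorer {
        private final int[] sortedDocs;
        private final float score;
        private int pos = 0;

        SubScorer(int[] sortedDocs, float score) {
            this.sortedDocs = sortedDocs;
            this.score = score;
        }

        // Document the scorer is currently positioned on.
        int doc() {
            return pos < sortedDocs.length ? sortedDocs[pos] : NO_MORE_DOCS;
        }

        // Moves to the next matching document.
        void next() {
            pos++;
        }
    }

    // Visits matching documents in increasing order and sums the scores
    // of every sub scorer positioned on the current document.
    static Map<Integer, Float> scoreAll(SubScorer[] subs) {
        Map<Integer, Float> result = new LinkedHashMap<Integer, Float>();
        while (true) {
            int current = NO_MORE_DOCS;
            for (SubScorer s : subs) {
                current = Math.min(current, s.doc());
            }
            if (current == NO_MORE_DOCS) {
                break; // every sub scorer is exhausted
            }
            float sum = 0.0f;
            for (SubScorer s : subs) {
                if (s.doc() == current) {
                    sum += s.score;
                    s.next();
                }
            }
            result.put(current, sum);
        }
        return result;
    }

    public static void main(String[] args) {
        SubScorer[] subs = {
            new SubScorer(new int[] {1, 3, 5}, 1.0f),
            new SubScorer(new int[] {3, 4}, 0.5f)
        };
        System.out.println(scoreAll(subs));
    }
}
```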
| </div> |
| |
| </div> |
| <!--+ |
| |end content |
| +--> |
| <div class="clearboth"> </div> |
| </div> |
| <div id="footer"> |
| <!--+ |
| |start bottomstrip |
| +--> |
| <div class="lastmodified"> |
| <script type="text/javascript"><!-- |
| document.write("Last Published: " + document.lastModified); |
| // --></script> |
| </div> |
| <div class="copyright"> |
| Copyright © |
| 2006 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a> |
| </div> |
| <!--+ |
| |end bottomstrip |
| +--> |
| </div> |
| </body> |
| </html> |