
 <!DOCTYPE html>
 <!--[if IE]><![endif]-->
 <html>

   <head>
     <meta charset="utf-8">
     <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
     <title>Namespace Lucene.Net.Search.Similarities
    | Apache Lucene.NET 4.8.0-beta00009 Documentation </title>
     <meta name="viewport" content="width=device-width">
     <meta name="title" content="Namespace Lucene.Net.Search.Similarities
    | Apache Lucene.NET 4.8.0-beta00009 Documentation ">
     <meta name="generator" content="docfx 2.56.0.0">

     <link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico">
     <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css">
     <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css">
     <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css">
     <meta property="docfx:navrel" content="toc.html">
     <meta property="docfx:tocrel" content="core/toc.html">

     <meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/">

   </head>
   <body data-spy="scroll" data-target="#affix" data-offset="120">
     <div id="wrapper">
       <header>

         <nav id="autocollapse" class="navbar ng-scope" role="navigation">
           <div class="container">
             <div class="navbar-header">
               <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar">
                 <span class="sr-only">Toggle navigation</span>
                 <span class="icon-bar"></span>
                 <span class="icon-bar"></span>
                 <span class="icon-bar"></span>
               </button>

               <a class="navbar-brand" href="/">
                 <img id="logo" class="svg" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/lucene-net-color.png" alt="">
               </a>
             </div>
             <div class="collapse navbar-collapse" id="navbar">
               <form class="navbar-form navbar-right" role="search" id="search">
                 <div class="form-group">
                   <input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off">
                 </div>
               </form>
             </div>
           </div>
         </nav>

         <div class="subnav navbar navbar-default">
           <div class="container hide-when-search">
             <ul class="level0 breadcrumb">
                 <li>
                     <a href="https://lucenenet.apache.org/docs/4.8.0-beta00009/">API</a>
                      <span id="breadcrumb">
                         <ul class="breadcrumb">
                           <li></li>
                         </ul>
                     </span>
                 </li>
             </ul>
           </div>
         </div>
       </header>
       <div class="container body-content">

         <div id="search-results">
           <div class="search-list"></div>
           <div class="sr-items">
             <p><i class="glyphicon glyphicon-refresh index-loading"></i></p>
           </div>
           <ul id="pagination"></ul>
         </div>
       </div>
       <div role="main" class="container body-content hide-when-search">

         <div class="sidenav hide-when-search">
           <a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a>
           <div class="sidetoggle collapse" id="sidetoggle">
             <div id="sidetoc"></div>
           </div>
         </div>
         <div class="article row grid-right">
           <div class="col-md-10">
             <article class="content wrap" id="_content" data-uid="Lucene.Net.Search.Similarities">

   <h1 id="Lucene_Net_Search_Similarities" data-uid="Lucene.Net.Search.Similarities" class="text-break">Namespace Lucene.Net.Search.Similarities
   </h1>
   <div class="markdown level0 summary"><!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->
 <p>This package contains the various ranking models that can be used in Lucene. The
 abstract class <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> serves
 as the base for ranking functions. For searching, users can employ the models
 already implemented or create their own by extending one of the classes in this
 package.</p>
 <h2 id="table-of-contents">Table Of Contents</h2>
 <ol>
 <li><a href="#summary-of-the-ranking-methods">Summary of the Ranking Methods</a></li>
 <li><a href="#changing-similarity">Changing the Similarity</a></li>
 </ol>
 <h2 id="summary-of-the-ranking-methods">Summary of the Ranking Methods</h2>
 <p><a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html">DefaultSimilarity</a> is the original Lucene scoring function. It is based on a highly optimized <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space Model</a>. For more information, see <a class="xref" href="Lucene.Net.Search.Similarities.TFIDFSimilarity.html">TFIDFSimilarity</a>.</p>
 <p><a class="xref" href="Lucene.Net.Search.Similarities.BM25Similarity.html">BM25Similarity</a> is an optimized implementation of the successful Okapi BM25 model.</p>
 <p><a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html">SimilarityBase</a> provides a basic implementation of the Similarity contract and exposes a highly simplified interface, which makes it an ideal starting point for new ranking functions. Lucene ships the following methods built on <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html">SimilarityBase</a>:</p>
 <ul>
 <li>Amati and Rijsbergen&#39;s <a class="xref" href="Lucene.Net.Search.Similarities.DFRSimilarity.html">DFR</a> framework;</li>
 <li>Clinchant and Gaussier&#39;s <a class="xref" href="Lucene.Net.Search.Similarities.IBSimilarity.html">Information-based models</a> for IR;</li>
 <li>The implementation of two <a class="xref" href="Lucene.Net.Search.Similarities.LMSimilarity.html">language models</a> from Zhai and Lafferty&#39;s paper.</li>
 </ul>
 <p>Since <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html">SimilarityBase</a> is not optimized to the same extent as <a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html">DefaultSimilarity</a> and <a class="xref" href="Lucene.Net.Search.Similarities.BM25Similarity.html">BM25Similarity</a>, a difference in performance is to be expected when using the methods listed above. However, optimizations can always be implemented in subclasses; see <a href="#changing-similarity">below</a>.</p>
 <h2 id="changing-similarity">Changing Similarity</h2>
 <p>Chances are the available Similarities are sufficient for all your searching needs. However, in some applications it may be necessary to customize your <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> implementation. For instance, some applications do not need to distinguish between shorter and longer documents (see <a href="http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967">a &quot;fair&quot; similarity</a>).</p>
 <p>To change <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a>, one must do so for both indexing and searching, and the change must happen before either of these actions takes place. Although in theory nothing stops you from changing it mid-stream, the resulting behavior is not well-defined.</p>
 <p>To make this change, implement your own <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> (you will likely want to subclass an existing method, be it <a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html">DefaultSimilarity</a> or a descendant of <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html">SimilarityBase</a>), and then register the new class by setting <a class="xref" href="Lucene.Net.Index.IndexWriterConfig.html">IndexWriterConfig.Similarity</a> before indexing and <a class="xref" href="Lucene.Net.Search.IndexSearcher.html">IndexSearcher.Similarity</a> before searching.</p>
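 <p>As a minimal sketch of the registration step described above (not taken from the Lucene.NET sources), the same <code>Similarity</code> instance can be assigned to both the writer configuration and the searcher. The field name <code>body</code>, the sample text, the in-memory <code>RAMDirectory</code> and the use of <code>StandardAnalyzer</code> (which ships in the separate Lucene.Net.Analysis.Common package) are assumptions made for illustration only.</p>
 <pre><code class="lang-csharp">using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Similarities;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class ChangeSimilarityExample
{
    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Use one Similarity for both indexing and searching.
        Similarity similarity = new BM25Similarity();

        using (var directory = new RAMDirectory())
        using (var analyzer = new StandardAnalyzer(version))
        {
            var config = new IndexWriterConfig(version, analyzer);
            config.Similarity = similarity; // set before any document is indexed

            using (var writer = new IndexWriter(directory, config))
            {
                var doc = new Document();
                doc.Add(new TextField("body", "ranking models in lucene", Field.Store.YES));
                writer.AddDocument(doc);
                writer.Commit();
            }

            using (var reader = DirectoryReader.Open(directory))
            {
                var searcher = new IndexSearcher(reader);
                searcher.Similarity = similarity; // set before any search runs

                TopDocs hits = searcher.Search(new TermQuery(new Term("body", "ranking")), 10);
                Console.WriteLine("Total hits: " + hits.TotalHits);
            }
        }
    }
}
</code></pre>
 <p>Sharing a single instance is a design choice of this sketch; what matters is that the writer and the searcher use the same scoring model.</p>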
 <h3 id="extending-linkplain-orgapachelucenesearchsimilaritiessimilaritybase">Extending SimilarityBase</h3>
 <p>The easiest way to quickly implement a new ranking method is to extend <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html">SimilarityBase</a>, which provides basic implementations for the low-level Similarity methods. Subclasses are only required to implement the <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html#methods">Score(BasicStats, Single, Single)</a> and <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html">ToString()</a> methods.</p>
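 <p>The following sketch shows what such a subclass might look like. The class name <code>SimpleTfIdfSimilarity</code> and its formula are hypothetical and chosen only for illustration; the sketch also assumes that <code>Score</code> is declared as a public abstract method taking <code>(BasicStats, float, float)</code>, which should be checked against the <code>SimilarityBase</code> reference for the version in use.</p>
 <pre><code class="lang-csharp">using System;
using Lucene.Net.Search.Similarities;

// Hypothetical ranking function: saturating term frequency times a smoothed idf.
public class SimpleTfIdfSimilarity : SimilarityBase
{
    public override float Score(BasicStats stats, float freq, float docLen)
    {
        // Corpus-level statistics are supplied by SimilarityBase through BasicStats.
        double numDocs = stats.NumberOfDocuments;
        double docFreq = stats.DocFreq;

        // Rare terms get a larger weight; the +1 terms avoid division by zero.
        double idf = Math.Log(1.0 + numDocs / (docFreq + 1.0));

        // Term frequency saturates towards 1 as freq grows.
        double tf = freq / (freq + 1.0);

        // docLen is ignored here, so long and short documents are treated alike.
        return stats.TotalBoost * (float)(tf * idf);
    }

    public override string ToString()
    {
        return "SimpleTfIdf";
    }
}
</code></pre>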
 <p>Another option is to extend one of the <a href="#summary-of-the-ranking-methods">frameworks</a> based on <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html">SimilarityBase</a>. These Similarities are implemented modularly, e.g. <a class="xref" href="Lucene.Net.Search.Similarities.DFRSimilarity.html">DFRSimilarity</a> delegates computation of the three parts of its formula to the classes <a class="xref" href="Lucene.Net.Search.Similarities.BasicModel.html">BasicModel</a>, <a class="xref" href="Lucene.Net.Search.Similarities.AfterEffect.html">AfterEffect</a> and <a class="xref" href="Lucene.Net.Search.Similarities.Normalization.html">Normalization</a>. Instead of subclassing the Similarity, one can simply introduce a new basic model and tell <a class="xref" href="Lucene.Net.Search.Similarities.DFRSimilarity.html">DFRSimilarity</a> to use it.</p>
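 <p>A sketch of that modular approach follows. <code>SmoothedIdfModel</code> is a hypothetical basic model written for this example (its formula resembles an inverse document frequency weighting), and the sketch assumes that <code>BasicModel</code> requires only <code>Score(BasicStats, float)</code> and <code>ToString()</code> to be overridden.</p>
 <pre><code class="lang-csharp">using System;
using Lucene.Net.Search.Similarities;

// Hypothetical basic model: normalized term frequency weighted by a smoothed idf.
public class SmoothedIdfModel : BasicModel
{
    public override float Score(BasicStats stats, float tfn)
    {
        double numDocs = stats.NumberOfDocuments;
        double docFreq = stats.DocFreq;

        // Logarithm to base 2, in line with Inf1 = -log2(Prob1).
        double idf = Math.Log((numDocs + 1.0) / (docFreq + 0.5)) / Math.Log(2.0);
        return tfn * (float)idf;
    }

    public override string ToString()
    {
        return "SmoothedIdf";
    }
}

public static class CustomDfrExample
{
    public static Similarity Create()
    {
        // Only the basic model is new; the after effect and the
        // normalization are built-in components.
        return new DFRSimilarity(new SmoothedIdfModel(), new AfterEffectL(), new NormalizationH2());
    }
}
</code></pre>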
 <h3 id="changing-linkplain-orgapachelucenesearchsimilaritiesdefaultsimilarity">Changing DefaultSimilarity</h3>
 <p>If you are interested in use cases for changing your similarity, see the Lucene users&#39; mailing list at <a href="http://www.gossamer-threads.com/lists/lucene/java-user/39125">Overriding Similarity</a>. In summary, here are a few use cases:</p>
 <ol>
 <li><p>The <code>SweetSpotSimilarity</code> in <code>Lucene.Net.Misc</code> gives small increases as the frequency increases a small amount and then greater increases when you hit the &quot;sweet spot&quot;, i.e. where you think the frequency of terms is more significant.</p></li>
 <li><p>Overriding tf: in some applications, it doesn&#39;t matter what the score of a document is as long as a matching term occurs. In these cases people have overridden Similarity to return 1 from the <code>Tf()</code> method.</p></li>
 <li><p>Changing length normalization: by overriding <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html#methods">ComputeNorm(FieldInvertState)</a>, it is possible to discount how the length of a field contributes to a score. In <a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html">DefaultSimilarity</a>, lengthNorm = 1 / (numTerms in field)^0.5, but if one changes this to be 1 / (numTerms in field), all fields will be treated <a href="http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967">&quot;fairly&quot;</a>.</p></li>
 </ol>
 <p>In general, Chris Hostetter sums it up best in saying (from <a href="http://www.gossamer-threads.com/lists/lucene/java-user/39125#39125">the Lucene users&#39; mailing list</a>):</p>
 <blockquote><p>[One would override the Similarity in] ... any situation where you know more about your data than just that it&#39;s &quot;text&quot; is a situation where it <em>might</em> make sense to override your Similarity method.</p>
 </blockquote>
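 <p>Two of the use cases above can be sketched as a subclass of <code>DefaultSimilarity</code>. The class name <code>FairSimilarity</code> is hypothetical. The sketch overrides <code>LengthNorm(FieldInvertState)</code> rather than <code>ComputeNorm</code>, on the assumption that the TF-IDF family computes its norm from <code>LengthNorm</code>; verify this against the <code>TFIDFSimilarity</code> reference before relying on it.</p>
 <pre><code class="lang-csharp">using Lucene.Net.Index;
using Lucene.Net.Search.Similarities;

// Hypothetical similarity combining a flat tf with a "fair" length norm.
public class FairSimilarity : DefaultSimilarity
{
    // Overriding tf: a matching term contributes the same amount
    // no matter how often it occurs in the document.
    public override float Tf(float freq)
    {
        return freq &gt; 0 ? 1.0f : 0.0f;
    }

    // Changing length normalization: 1 / numTerms instead of 1 / sqrt(numTerms).
    public override float LengthNorm(FieldInvertState state)
    {
        int numTerms = DiscountOverlaps ? state.Length - state.NumOverlap : state.Length;
        return numTerms &gt; 0 ? state.Boost / numTerms : 0.0f;
    }
}
</code></pre>
 <p>Because norms are written at index time, a change to the length normalization only affects documents indexed with the new class.</p>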
 </div>
   <div class="markdown level0 conceptual"></div>
   <div class="markdown level0 remarks"></div>
     <h3 id="classes">Classes
   </h3>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.AfterEffect.html">AfterEffect</a></h4>
       <section><p>This class acts as the base class for the implementations of the <em>first
 normalization of the informative content</em> in the DFR framework. This
 component is also called the <em>after effect</em> and is defined by the
 formula <em>Inf<sub>2</sub> = 1 - Prob<sub>2</sub></em>, where
 <em>Prob<sub>2</sub></em> measures the <em>information gain</em>.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div><p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.AfterEffect.NoAfterEffect.html">AfterEffect.NoAfterEffect</a></h4>
       <section><p>Implementation used when there is no aftereffect. </p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.AfterEffectB.html">AfterEffectB</a></h4>
       <section><p>Model of the information gain based on the ratio of two Bernoulli processes.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.AfterEffectL.html">AfterEffectL</a></h4>
       <section><p>Model of the information gain based on Laplace&apos;s law of succession.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.BasicModel.html">BasicModel</a></h4>
       <section><p>This class acts as the base class for the specific <em>basic model</em>
 implementations in the DFR framework. Basic models compute the
 <em>informative content Inf<sub>1</sub> = -log<sub>2</sub>Prob<sub>1</sub>
 </em>.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelBE.html">BasicModelBE</a></h4>
       <section><p>Limiting form of the Bose-Einstein model. The formula used in Lucene differs
 slightly from the one in the original paper: <code>F</code> is increased by <code>tfn+1</code>
 and <code>N</code> is increased by <code>F</code>.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div><p><p>
 NOTE: in some corner cases this model may give poor performance with Normalizations that
 return large values for <code>tfn</code> such as <a class="xref" href="Lucene.Net.Search.Similarities.NormalizationH3.html">NormalizationH3</a>. Consider using the
 geometric approximation (<a class="xref" href="Lucene.Net.Search.Similarities.BasicModelG.html">BasicModelG</a>) instead, which provides the same relevance
 but with fewer practical problems.</p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelD.html">BasicModelD</a></h4>
       <section><p>Implements the approximation of the binomial model with the divergence
 for DFR. The formula used in Lucene differs slightly from the one in the
 original paper: to avoid underflow for small values of <code>N</code> and
 <code>F</code>, <code>N</code> is increased by <code>1</code> and
 <code>F</code> is always increased by <code>tfn+1</code>.
 <p>
 WARNING: for terms that do not meet the expected random distribution
 (e.g. stopwords), this model may give poor performance, such as
 abnormally high scores for low tf values.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelG.html">BasicModelG</a></h4>
       <section><p>Geometric as limiting form of the Bose-Einstein model.  The formula used in Lucene differs
 slightly from the one in the original paper: <code>F</code> is increased by <code>1</code>
 and <code>N</code> is increased by <code>F</code>.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelIF.html">BasicModelIF</a></h4>
       <section><p>An approximation of the <em>I(n<sub>e</sub>)</em> model.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelIn.html">BasicModelIn</a></h4>
       <section><p>The basic tf-idf model of randomness.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelIne.html">BasicModelIne</a></h4>
       <section><p>Tf-idf model of randomness, based on a mixture of Poisson and inverse
 document frequency.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelP.html">BasicModelP</a></h4>
       <section><p>Implements the Poisson approximation for the binomial model for DFR.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div><p><p>
 WARNING: for terms that do not meet the expected random distribution
 (e.g. stopwords), this model may give poor performance, such as
 abnormally high scores for low tf values.</p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.BasicStats.html">BasicStats</a></h4>
       <section><p>Stores all statistics commonly used by ranking methods.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.BM25Similarity.html">BM25Similarity</a></h4>
       <section><p>BM25 Similarity. Introduced in Stephen E. Robertson, Steve Walker,
 Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. Okapi at TREC-3.
 In Proceedings of the Third <strong>T</strong>ext <strong>RE</strong>trieval <strong>C</strong>onference (TREC 1994).
 Gaithersburg, USA, November 1994.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html">DefaultSimilarity</a></h4>
       <section><p>Expert: Default scoring implementation which encodes (<a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html#Lucene_Net_Search_Similarities_DefaultSimilarity_EncodeNormValue_System_Single_">EncodeNormValue(Single)</a>)
 norm values as a single byte before being stored. At search time,
 the norm byte value is read from the index
 <a class="xref" href="Lucene.Net.Store.Directory.html">Directory</a> and
 decoded (<a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html#Lucene_Net_Search_Similarities_DefaultSimilarity_DecodeNormValue_System_Int64_">DecodeNormValue(Int64)</a>) back to a float <em>norm</em> value.
 This encoding/decoding, while reducing index size, comes with the price of
 precision loss - it is not guaranteed that <em>Decode(Encode(x)) = x</em>. For
 instance, <em>Decode(Encode(0.89)) = 0.75</em>.
 <p>
 Compression of norm values to a single byte saves memory at search time,
 because once a field is referenced at search time, its norms - for all
 documents - are maintained in memory.
 <p>
 The rationale supporting such lossy compression of norm values is that, given
 how difficult (and imprecise) it is for users to express their true information
 need in a query, only big differences matter.
 <p>
 Last, note that search time is too late to modify this <em>norm</em> part of
 scoring, e.g. by using a different <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> for search.</p>
 </section>
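 <p>The precision loss described for <code>DefaultSimilarity</code> can be observed with a short round trip through the two methods named above. This is an illustrative sketch; the value <code>0.89</code> is the one used in the description, and the program simply prints whatever the library returns.</p>
 <pre><code class="lang-csharp">using System;
using Lucene.Net.Search.Similarities;

public static class NormPrecisionExample
{
    public static void Main()
    {
        var similarity = new DefaultSimilarity();

        float original = 0.89f;

        // Encode the norm as it would be stored in the index (one byte of information).
        long encoded = similarity.EncodeNormValue(original);

        // Decode it as it would be read back at search time.
        float decoded = similarity.DecodeNormValue(encoded);

        // The decoded value is generally not equal to the original.
        Console.WriteLine(original + " -&gt; " + encoded + " -&gt; " + decoded);
    }
}
</code></pre>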
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.DFRSimilarity.html">DFRSimilarity</a></h4>
       <section><p>Implements the <em>divergence from randomness (DFR)</em> framework
 introduced in Gianni Amati and Cornelis Joost Van Rijsbergen. 2002.
 Probabilistic models of information retrieval based on measuring the
 divergence from randomness. ACM Trans. Inf. Syst. 20, 4 (October 2002),
 357-389.
 <p>The DFR scoring formula is composed of three separate components: the
 <em>basic model</em>, the <em>aftereffect</em> and an additional
 <em>normalization</em> component, represented by the classes
 <a class="xref" href="Lucene.Net.Search.Similarities.BasicModel.html">BasicModel</a>, <a class="xref" href="Lucene.Net.Search.Similarities.AfterEffect.html">AfterEffect</a> and <a class="xref" href="Lucene.Net.Search.Similarities.Normalization.html">Normalization</a>,
 respectively. The names of these classes were chosen to match the names of
 their counterparts in the Terrier IR engine.</p>
 <p>To construct a <a class="xref" href="Lucene.Net.Search.Similarities.DFRSimilarity.html">DFRSimilarity</a>, you must specify the implementations for
 all three components of DFR:
 <table><thead><tr><th>Component</th><th>Implementations</th></tr></thead><tbody><tr><td><a class="xref" href="Lucene.Net.Search.Similarities.BasicModel.html">BasicModel</a>: Basic model of information content</td><td>
 <ul><li><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelBE.html">BasicModelBE</a>: Limiting form of Bose-Einstein</li><li><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelG.html">BasicModelG</a>: Geometric approximation of Bose-Einstein</li><li><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelP.html">BasicModelP</a>: Poisson approximation of the Binomial</li><li><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelD.html">BasicModelD</a>: Divergence approximation of the Binomial</li><li><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelIn.html">BasicModelIn</a>: Inverse document frequency</li><li><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelIne.html">BasicModelIne</a>: Inverse expected document frequency [mixture of Poisson and IDF]</li><li><a class="xref" href="Lucene.Net.Search.Similarities.BasicModelIF.html">BasicModelIF</a>: Inverse term frequency [approximation of I(ne)]</li></ul>
 </td></tr><tr><td><a class="xref" href="Lucene.Net.Search.Similarities.AfterEffect.html">AfterEffect</a>: First normalization of information gain</td><td>
 <ul><li><a class="xref" href="Lucene.Net.Search.Similarities.AfterEffectL.html">AfterEffectL</a>: Laplace&apos;s law of succession</li><li><a class="xref" href="Lucene.Net.Search.Similarities.AfterEffectB.html">AfterEffectB</a>: Ratio of two Bernoulli processes</li><li><a class="xref" href="Lucene.Net.Search.Similarities.AfterEffect.NoAfterEffect.html">AfterEffect.NoAfterEffect</a>: no first normalization</li></ul>
 </td></tr><tr><td><a class="xref" href="Lucene.Net.Search.Similarities.Normalization.html">Normalization</a>: Second (length) normalization</td><td>
 <ul><li><a class="xref" href="Lucene.Net.Search.Similarities.NormalizationH1.html">NormalizationH1</a>: Uniform distribution of term frequency</li><li><a class="xref" href="Lucene.Net.Search.Similarities.NormalizationH2.html">NormalizationH2</a>: term frequency density inversely related to length</li><li><a class="xref" href="Lucene.Net.Search.Similarities.NormalizationH3.html">NormalizationH3</a>: term frequency normalization provided by Dirichlet prior</li><li><a class="xref" href="Lucene.Net.Search.Similarities.NormalizationZ.html">NormalizationZ</a>: term frequency normalization provided by a Zipfian relation</li><li><a class="xref" href="Lucene.Net.Search.Similarities.Normalization.NoNormalization.html">Normalization.NoNormalization</a>: no second normalization</li></ul>
 </td></tr></tbody></table></p>
 <p>
 <p>Note that <em>qtf</em>, the multiplicity of term-occurrence in the query,
 is not handled by this implementation.
 </p> </p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
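 <p>As an illustration of the three-component construction described for <code>DFRSimilarity</code>, the sketch below builds two configurations from the built-in components. The particular combinations and the factory class name are arbitrary choices for this example, not recommendations, and the sketch assumes that the nested <code>NoAfterEffect</code> and <code>NoNormalization</code> types expose public parameterless constructors.</p>
 <pre><code class="lang-csharp">using Lucene.Net.Search.Similarities;

public static class DfrConfigurations
{
    // Geometric basic model, Laplace after effect, H2 length normalization.
    public static Similarity CreateFull()
    {
        return new DFRSimilarity(
            new BasicModelG(),
            new AfterEffectL(),
            new NormalizationH2());
    }

    // Same basic model with both normalization steps switched off.
    public static Similarity CreateBasicModelOnly()
    {
        return new DFRSimilarity(
            new BasicModelG(),
            new AfterEffect.NoAfterEffect(),
            new Normalization.NoNormalization());
    }
}
</code></pre>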
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.Distribution.html">Distribution</a></h4>
       <section><p>The probabilistic distribution used to model term occurrence
 in information-based models.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.DistributionLL.html">DistributionLL</a></h4>
       <section><p>Log-logistic distribution.
 <p>Unlike for DFR, the natural logarithm is used, as
 it is faster to compute and the original paper does not express any
 preference for a specific base.</p></p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.DistributionSPL.html">DistributionSPL</a></h4>
       <section><p>The smoothed power-law (SPL) distribution for the information-based framework
 that is described in the original paper.
 <p>Unlike for DFR, the natural logarithm is used, as
 it is faster to compute and the original paper does not express any
 preference for a specific base.</p></p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.IBSimilarity.html">IBSimilarity</a></h4>
       <section><p>Provides a framework for the family of information-based models, as described
 in Stéphane Clinchant and Eric Gaussier. 2010. Information-based
 models for ad hoc IR. In Proceeding of the 33rd international ACM SIGIR
 conference on Research and development in information retrieval (SIGIR &apos;10).
 ACM, New York, NY, USA, 234-241.
 <p>The retrieval function is of the form <em>RSV(q, d) = ∑
 -x<sup>q</sup><sub>w</sub> log Prob(X<sub>w</sub> &gt;=
 t<sup>d</sup><sub>w</sub> | λ<sub>w</sub>)</em>, where
 <ul><li><em>x<sup>q</sup><sub>w</sub></em> is the query boost;</li><li><em>X<sub>w</sub></em> is a random variable that counts the occurrences
         of word <em>w</em>;</li><li><em>t<sup>d</sup><sub>w</sub></em> is the normalized term frequency;</li><li><em>λ<sub>w</sub></em> is a parameter.</li></ul>
 </p>
 <p>The framework described in the paper has many similarities to the DFR
 framework (see <a class="xref" href="Lucene.Net.Search.Similarities.DFRSimilarity.html">DFRSimilarity</a>). It is possible that the two
 Similarities will be merged at some point.</p>
 <p>To construct an <a class="xref" href="Lucene.Net.Search.Similarities.IBSimilarity.html">IBSimilarity</a>, you must specify the implementations for
 all three components of the Information-Based model.
 <table><thead><tr><th>Component</th><th>Implementations</th></tr></thead><tbody><tr><td><a class="xref" href="Lucene.Net.Search.Similarities.IBSimilarity.html#Lucene_Net_Search_Similarities_IBSimilarity_Distribution">Distribution</a>: Probabilistic distribution used to
             model term occurrence</td><td>
 <ul><li><a class="xref" href="Lucene.Net.Search.Similarities.DistributionLL.html">DistributionLL</a>: Log-logistic</li><li><a class="xref" href="Lucene.Net.Search.Similarities.DistributionSPL.html">DistributionSPL</a>: Smoothed power-law</li></ul>
 </td></tr><tr><td><a class="xref" href="Lucene.Net.Search.Similarities.IBSimilarity.html#Lucene_Net_Search_Similarities_IBSimilarity_Lambda">Lambda</a>: λ<sub>w</sub> parameter of the
     probability distribution</td><td>
 <ul><li><a class="xref" href="Lucene.Net.Search.Similarities.LambdaDF.html">LambdaDF</a>: <code>N<sub>w</sub>/N</code> or average
         number of documents where w occurs</li><li><a class="xref" href="Lucene.Net.Search.Similarities.LambdaTTF.html">LambdaTTF</a>: <code>F<sub>w</sub>/N</code> or
         average number of occurrences of w in the collection</li></ul>
 </td></tr><tr><td><a class="xref" href="Lucene.Net.Search.Similarities.IBSimilarity.html#Lucene_Net_Search_Similarities_IBSimilarity_Normalization">Normalization</a>: Term frequency normalization</td><td>Any supported DFR normalization (listed in
 <a class="xref" href="Lucene.Net.Search.Similarities.DFRSimilarity.html">DFRSimilarity</a>)
 </td></tr></tbody></table>
 </p></p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
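 <p>A sketch of constructing an <code>IBSimilarity</code> from its three components is shown below. The chosen combination (smoothed power-law distribution, <code>LambdaTTF</code>, H2 normalization) and the factory class name are assumptions made for illustration.</p>
 <pre><code class="lang-csharp">using Lucene.Net.Search.Similarities;

public static class IbConfiguration
{
    public static Similarity Create()
    {
        return new IBSimilarity(
            new DistributionSPL(),   // probabilistic distribution of term occurrence
            new LambdaTTF(),         // lambda parameter of that distribution
            new NormalizationH2());  // any DFR term frequency normalization
    }
}
</code></pre>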
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.Lambda.html">Lambda</a></h4>
       <section><p>The <em>lambda (λ<sub>w</sub>)</em> parameter in information-based
 models.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.LambdaDF.html">LambdaDF</a></h4>
       <section><p>Computes lambda as <code>(docFreq+1) / (numberOfDocuments+1)</code>.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.LambdaTTF.html">LambdaTTF</a></h4>
       <section><p>Computes lambda as <code>(totalTermFreq+1) / (numberOfDocuments+1)</code>.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
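 <p>The two lambda formulas can be worked through by hand. The helper methods below are hypothetical stand-ins written for this example (they are not the library classes) and read each formula as a ratio in which both the numerator and the denominator are incremented by one; the collection statistics are made up.</p>
 <pre><code class="lang-csharp">using System;

public static class LambdaWorkedExample
{
    // Stand-in for the LambdaDF formula: (docFreq + 1) / (numberOfDocuments + 1).
    public static float LambdaDf(long docFreq, long numberOfDocuments)
    {
        return (docFreq + 1f) / (numberOfDocuments + 1f);
    }

    // Stand-in for the LambdaTTF formula: (totalTermFreq + 1) / (numberOfDocuments + 1).
    public static float LambdaTtf(long totalTermFreq, long numberOfDocuments)
    {
        return (totalTermFreq + 1f) / (numberOfDocuments + 1f);
    }

    public static void Main()
    {
        // A term found in 9 of 99 documents, with 24 occurrences overall.
        Console.WriteLine(LambdaDf(9, 99));   // (9 + 1) / (99 + 1) = 10 / 100
        Console.WriteLine(LambdaTtf(24, 99)); // (24 + 1) / (99 + 1) = 25 / 100
    }
}
</code></pre>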
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.LMDirichletSimilarity.html">LMDirichletSimilarity</a></h4>
       <section><p>Bayesian smoothing using Dirichlet priors. From Chengxiang Zhai and John
 Lafferty. 2001. A study of smoothing methods for language models applied to
 Ad Hoc information retrieval. In Proceedings of the 24th annual international
 ACM SIGIR conference on Research and development in information retrieval
 (SIGIR &apos;01). ACM, New York, NY, USA, 334-342.
 <p>
 The formula as defined in the paper assigns a negative score to documents that
 contain the term, but with fewer occurrences than predicted by the collection
 language model. The Lucene implementation returns <code>0</code> for such
 documents.
 </p></p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.LMJelinekMercerSimilarity.html">LMJelinekMercerSimilarity</a></h4>
       <section><p>Language model based on the Jelinek-Mercer smoothing method. From Chengxiang
 Zhai and John Lafferty. 2001. A study of smoothing methods for language
 models applied to Ad Hoc information retrieval. In Proceedings of the 24th
 annual international ACM SIGIR conference on Research and development in
 information retrieval (SIGIR &apos;01). ACM, New York, NY, USA, 334-342.
 <p>The model has a single parameter, λ. According to said paper, the
 optimal value depends on both the collection and the query. The optimal value
 is around <code>0.1</code> for title queries and <code>0.7</code> for long queries.</p></p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
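 <p>The smoothing parameter is passed to the constructor. The sketch below creates one Jelinek-Mercer instance for each of the two query types mentioned above, plus a Dirichlet-smoothed model with its default parameter; the class and method names are hypothetical.</p>
 <pre><code class="lang-csharp">using Lucene.Net.Search.Similarities;

public static class LanguageModelConfigurations
{
    // Lambda around 0.1, the value reported as suitable for short (title) queries.
    public static Similarity ForTitleQueries()
    {
        return new LMJelinekMercerSimilarity(0.1f);
    }

    // Lambda around 0.7, the value reported as suitable for long queries.
    public static Similarity ForLongQueries()
    {
        return new LMJelinekMercerSimilarity(0.7f);
    }

    // Bayesian smoothing with Dirichlet priors, using the library default for mu.
    public static Similarity Dirichlet()
    {
        return new LMDirichletSimilarity();
    }
}
</code></pre>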
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.LMSimilarity.html">LMSimilarity</a></h4>
       <section><p>Abstract superclass for language modeling Similarities. The following inner
 types are introduced:
 <ul><li><a class="xref" href="Lucene.Net.Search.Similarities.LMSimilarity.LMStats.html">LMSimilarity.LMStats</a>, which defines a new statistic, the probability that
         the collection language model generates the current term;</li><li><a class="xref" href="Lucene.Net.Search.Similarities.LMSimilarity.ICollectionModel.html">LMSimilarity.ICollectionModel</a>, which is a strategy interface for objects that
         compute the collection language model <code>p(w|C)</code>;</li><li><a class="xref" href="Lucene.Net.Search.Similarities.LMSimilarity.DefaultCollectionModel.html">LMSimilarity.DefaultCollectionModel</a>, an implementation of that interface that
         computes the term probability as the number of occurrences of the term in the
         collection, divided by the total number of tokens.</li></ul>
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.LMSimilarity.DefaultCollectionModel.html">LMSimilarity.DefaultCollectionModel</a></h4>
       <section><p>Models <code>p(w|C)</code> as the number of occurrences of the term in the
 collection, divided by the total number of tokens <code>+ 1</code>.</p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.LMSimilarity.LMStats.html">LMSimilarity.LMStats</a></h4>
       <section><p>Stores the collection distribution of the current term. </p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.MultiSimilarity.html">MultiSimilarity</a></h4>
       <section><p>Implements the CombSUM method for combining evidence from multiple
 similarity values described in: Joseph A. Shaw, Edward A. Fox.
 In Text REtrieval Conference (1993), pp. 243-252
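 <p>As a rough illustration of CombSUM, the sketch below adds up the scores that several scoring functions assigned to the same documents. The function name and the dictionary-based input are assumptions for this example and do not mirror the Lucene.NET types.</p>
<pre><code class="lang-python">def comb_sum(score_lists):
    """Combine per-document scores from several scorers by summing them.

    score_lists -- a list of dicts, each mapping a document id to the score
                   that one scoring function assigned to that document
    """
    combined = {}
    for scores in score_lists:
        for doc_id, score in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + score
    return combined

# Hypothetical scores from two different similarity functions.
combined = comb_sum([
    {"doc-a": 1.0, "doc-b": 0.5},
    {"doc-a": 0.25, "doc-b": 2.0},
])
</code></pre>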
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.Normalization.html">Normalization</a></h4>
       <section><p>This class acts as the base class for the implementations of the term
 frequency normalization methods in the DFR framework.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.Normalization.NoNormalization.html">Normalization.NoNormalization</a></h4>
       <section><p>Implementation used when there is no normalization. </p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.NormalizationH1.html">NormalizationH1</a></h4>
       <section><p>Normalization model that assumes a uniform distribution of the term frequency.
 <p>While this model is parameterless in the
 <a href="http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.742">
 original article</a>, <a href="http://dl.acm.org/citation.cfm?id=1835490">
 information-based models</a> (see <a class="xref" href="Lucene.Net.Search.Similarities.IBSimilarity.html">IBSimilarity</a>) introduced a
 multiplying factor.
 The default value for the <code>c</code> parameter is <code>1</code>.</p></p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.NormalizationH2.html">NormalizationH2</a></h4>
       <section><p>Normalization model in which the term frequency is inversely related to the
 length.
 <p>While this model is parameterless in the
 <a href="http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.742">
 original article</a>, the <a href="http://theses.gla.ac.uk/1570/">thesis</a>
 introduces the parameterized variant.
 The default value for the <code>c</code> parameter is <code>1</code>.</p></p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.NormalizationH3.html">NormalizationH3</a></h4>
       <section><p>Dirichlet Priors normalization
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.NormalizationZ.html">NormalizationZ</a></h4>
       <section><p>Pareto-Zipf Normalization
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.PerFieldSimilarityWrapper.html">PerFieldSimilarityWrapper</a></h4>
       <section><p>Provides the ability to use a different <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> for different fields.
 <p>
 Subclasses should implement <a class="xref" href="Lucene.Net.Search.Similarities.PerFieldSimilarityWrapper.html#Lucene_Net_Search_Similarities_PerFieldSimilarityWrapper_Get_System_String_">Get(String)</a> to return an appropriate
 <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> (for example, using field-specific parameter values) for the field.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a></h4>
       <section><p>Similarity defines the components of Lucene scoring.
 <p>
 Expert: Scoring API.
 <p>
 This is a low-level API; you should only extend it if you want to implement
 an information retrieval <em>model</em>.  If you are instead looking for a convenient way
 to alter Lucene&apos;s scoring, consider extending a higher-level implementation
 such as <a class="xref" href="Lucene.Net.Search.Similarities.TFIDFSimilarity.html">TFIDFSimilarity</a>, which implements the vector space model with this API, or
 just tweaking the default implementation: <a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html">DefaultSimilarity</a>.
 <p>
 Similarity determines how Lucene weights terms, and Lucene interacts with
 this class at both <a href="#indextime">index-time</a> and
 <a href="#querytime">query-time</a>.
 <p>
 <a name="indextime"></a>
 At indexing time, the indexer calls <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html#Lucene_Net_Search_Similarities_Similarity_ComputeNorm_Lucene_Net_Index_FieldInvertState_">ComputeNorm(FieldInvertState)</a>, allowing
 the <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> implementation to set a per-document value for the field that will
 be later accessible via <a class="xref" href="Lucene.Net.Index.AtomicReader.html#Lucene_Net_Index_AtomicReader_GetNormValues_System_String_">GetNormValues(String)</a>.  Lucene makes no assumption
 about what is in this norm, but it is most useful for encoding length normalization
 information.
 <p>
 Implementations should carefully consider how the normalization is encoded: while
 Lucene&apos;s classical <a class="xref" href="Lucene.Net.Search.Similarities.TFIDFSimilarity.html">TFIDFSimilarity</a> encodes a combination of index-time boost
 and length normalization information with <a class="xref" href="Lucene.Net.Util.SmallSingle.html">SmallSingle</a> into a single byte, this
 might not be suitable for all purposes.
 <p>
 Many formulas require the use of average document length, which can be computed via a
 combination of <a class="xref" href="Lucene.Net.Search.CollectionStatistics.html#Lucene_Net_Search_CollectionStatistics_SumTotalTermFreq">SumTotalTermFreq</a> and
 <a class="xref" href="Lucene.Net.Search.CollectionStatistics.html#Lucene_Net_Search_CollectionStatistics_MaxDoc">MaxDoc</a> or <a class="xref" href="Lucene.Net.Search.CollectionStatistics.html#Lucene_Net_Search_CollectionStatistics_DocCount">DocCount</a>,
 depending upon whether the average should reflect field sparsity.
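 <p>The choice between the two denominators can be sketched as plain arithmetic. The function and parameter names below are hypothetical; the three numeric inputs stand for the values of the statistics named above, and the sketch assumes all of them are available.</p>
<pre><code class="lang-python">def average_field_length(sum_total_term_freq, max_doc, doc_count,
                         only_docs_with_field):
    """Average number of tokens per document for one field.

    sum_total_term_freq  -- total number of tokens in the field
    max_doc              -- total number of documents
    doc_count            -- number of documents that contain the field
    only_docs_with_field -- if True, average over doc_count; otherwise
                            average over every document (max_doc)
    """
    denominator = doc_count if only_docs_with_field else max_doc
    return sum_total_term_freq / denominator
</code></pre>
 <p>For a sparse field the two averages can differ considerably: with 1000 tokens spread over 40 of 100 documents, the first choice gives 25 tokens per document and the second gives 10.</p>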
 <p>
 Additional scoring factors can be stored in named
 <a class="xref" href="Lucene.Net.Documents.NumericDocValuesField.html">NumericDocValuesField</a>s and accessed
 at query-time with <a class="xref" href="Lucene.Net.Index.AtomicReader.html#Lucene_Net_Index_AtomicReader_GetNumericDocValues_System_String_">GetNumericDocValues(String)</a>.
 <p>
 Finally, using index-time boosts (either via folding into the normalization byte or
 via <a class="xref" href="Lucene.Net.Index.DocValues.html">DocValues</a>) is an inefficient way to boost the scores of different fields if the
 boost will be the same for every document. Instead, the Similarity can simply take a constant
 boost parameter <em>C</em>, and <a class="xref" href="Lucene.Net.Search.Similarities.PerFieldSimilarityWrapper.html">PerFieldSimilarityWrapper</a> can return different
 instances with different boosts depending upon field name.
 <p>
 <a name="querytime"></a>
 At query-time, Queries interact with the Similarity via these steps:
 <ol><li>The <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html#Lucene_Net_Search_Similarities_Similarity_ComputeWeight_System_Single_Lucene_Net_Search_CollectionStatistics_Lucene_Net_Search_TermStatistics___">ComputeWeight(Single, CollectionStatistics, TermStatistics[])</a> method is called a single time,
       allowing the implementation to compute any statistics (such as IDF, average document length, etc)
       across <em>the entire collection</em>. The <a class="xref" href="Lucene.Net.Search.TermStatistics.html">TermStatistics</a> and <a class="xref" href="Lucene.Net.Search.CollectionStatistics.html">CollectionStatistics</a> passed in
       already contain all of the raw statistics involved, so a <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> can freely use any combination
       of statistics without causing any additional I/O. Lucene makes no assumption about what is
       stored in the returned <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.SimWeight.html">Similarity.SimWeight</a> object.</li><li>The query normalization process occurs a single time: <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.SimWeight.html#Lucene_Net_Search_Similarities_Similarity_SimWeight_GetValueForNormalization">GetValueForNormalization()</a>
       is called for each query leaf node, <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html#Lucene_Net_Search_Similarities_Similarity_QueryNorm_System_Single_">QueryNorm(Single)</a> is called for the top-level
       query, and finally <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.SimWeight.html#Lucene_Net_Search_Similarities_Similarity_SimWeight_Normalize_System_Single_System_Single_">Normalize(Single, Single)</a> passes down the normalization value
       and any top-level boosts (e.g. from enclosing <a class="xref" href="Lucene.Net.Search.BooleanQuery.html">BooleanQuery</a>s).</li><li>For each segment in the index, the <a class="xref" href="Lucene.Net.Search.Query.html">Query</a> obtains a <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.SimScorer.html">Similarity.SimScorer</a> by calling <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html#Lucene_Net_Search_Similarities_Similarity_GetSimScorer_Lucene_Net_Search_Similarities_Similarity_SimWeight_Lucene_Net_Index_AtomicReaderContext_">GetSimScorer(Similarity.SimWeight, AtomicReaderContext)</a>.
       The scorer&apos;s <code>Score</code> method is then called for each matching document.</li></ol>
 <p>
 <a name="explaintime"></a>
 When <a class="xref" href="Lucene.Net.Search.IndexSearcher.html#Lucene_Net_Search_IndexSearcher_Explain_Lucene_Net_Search_Query_System_Int32_">Explain(Query, Int32)</a> is called, queries consult the Similarity&apos;s SimScorer for an
 explanation of how it computed its score. The query passes in the document id and an explanation of how the frequency
 was computed.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.Similarity.SimScorer.html">Similarity.SimScorer</a></h4>
       <section><p>API for scoring &quot;sloppy&quot; queries such as <a class="xref" href="Lucene.Net.Search.TermQuery.html">TermQuery</a>,
 <a class="xref" href="Lucene.Net.Search.Spans.SpanQuery.html">SpanQuery</a>, and <a class="xref" href="Lucene.Net.Search.PhraseQuery.html">PhraseQuery</a>.
 <p>
 Frequencies are floating-point values: an approximate
 within-document frequency adjusted for &quot;sloppiness&quot; by
 <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.SimScorer.html#Lucene_Net_Search_Similarities_Similarity_SimScorer_ComputeSlopFactor_System_Int32_">ComputeSlopFactor(Int32)</a>.</p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.Similarity.SimWeight.html">Similarity.SimWeight</a></h4>
 <section><p>Stores the weight for a query across the indexed collection. This abstract
 implementation is empty; descendants of <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> should
 subclass <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.SimWeight.html">Similarity.SimWeight</a> and define the statistics they require in the
 subclass. Examples include idf, average field length, etc.</p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html">SimilarityBase</a></h4>
       <section><p>A subclass of <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> that provides a simplified API for its
 descendants. Subclasses are only required to implement the <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html#Lucene_Net_Search_Similarities_SimilarityBase_Score_Lucene_Net_Search_Similarities_BasicStats_System_Single_System_Single_">Score(BasicStats, Single, Single)</a>
 and <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html#Lucene_Net_Search_Similarities_SimilarityBase_ToString">ToString()</a> methods. Implementing
 <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html#Lucene_Net_Search_Similarities_SimilarityBase_Explain_Lucene_Net_Search_Explanation_Lucene_Net_Search_Similarities_BasicStats_System_Int32_System_Single_System_Single_">Explain(Explanation, BasicStats, Int32, Single, Single)</a> is optional,
 inasmuch as <a class="xref" href="Lucene.Net.Search.Similarities.SimilarityBase.html">SimilarityBase</a> already provides a basic explanation of the score
 and the term frequency. However, implementers of a subclass are encouraged to
 include as much detail about the scoring method as possible.
 <p>
 Note: multi-word queries such as phrase queries are scored in a different way
 than Lucene&apos;s default ranking algorithm: whereas it &quot;fakes&quot; an IDF value for
 the phrase as a whole (since it does not know it), this class instead scores
 phrases as a summation of the individual term scores.
 <p>
 <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.TFIDFSimilarity.html">TFIDFSimilarity</a></h4>
       <section><p>Implementation of <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> with the Vector Space Model.
 <p>
 Expert: Scoring API.
 <p>TFIDFSimilarity defines the components of Lucene scoring.
 Overriding computation of these components is a convenient
 way to alter Lucene scoring.</p>
 <p>Suggested reading:
 <a href="http://nlp.stanford.edu/IR-book/html/htmledition/queries-as-vectors-1.html">
 Introduction To Information Retrieval, Chapter 6</a>.

 <p>The following describes how Lucene scoring evolves from
 underlying information retrieval models to (efficient) implementation.
 We first briefly describe the <em>VSM Score</em>,
 then derive from it <em>Lucene&apos;s Conceptual Scoring Formula</em>,
 from which, finally, evolves <em>Lucene&apos;s Practical Scoring Function</em>
 (the latter is connected directly with Lucene classes and methods).

 <p>Lucene combines
 <a href="http://en.wikipedia.org/wiki/Standard_Boolean_model">
 Boolean model (BM) of Information Retrieval</a>
 with
 <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">
 Vector Space Model (VSM) of Information Retrieval</a> -
 documents &quot;approved&quot; by BM are scored by VSM.

 <p>In VSM, documents and queries are represented as
 weighted vectors in a multi-dimensional space,
 where each distinct index term is a dimension,
 and weights are
 <a href="http://en.wikipedia.org/wiki/Tfidf">Tf-idf</a> values.

 <p>VSM does not require weights to be <em>Tf-idf</em> values,
 but <em>Tf-idf</em> values are believed to produce search results of high quality,
 and so Lucene uses <em>Tf-idf</em>.
 <em>Tf</em> and <em>Idf</em> are described in more detail below,
 but for now, for completeness, let&apos;s just say that
 for a given term <em>t</em> and document (or query) <em>x</em>,
 <em>Tf(t,x)</em> varies with the number of occurrences of term <em>t</em> in <em>x</em>
 (when one increases so does the other) and
 <em>idf(t)</em> similarly varies with the inverse of the
 number of index documents containing term <em>t</em>.

 <p><em>VSM score</em> of document <em>d</em> for query <em>q</em> is the
 <a href="http://en.wikipedia.org/wiki/Cosine_similarity">
 Cosine Similarity</a>
 of the weighted query and document vectors <em>V(q)</em> and <em>V(d)</em>:
 <p>
 <table><tbody><tr><td>
 <table><tbody><tr><td>cosine-similarity(q,d)   =<br>
 <table><tbody><tr><td><small>V(q) · V(d)</small></td><td></td></tr><tr><td>–––––––––</td><td></td></tr><tr><td><small>|V(q)| |V(d)|</small></td><td></td></tr></tbody></table>
 </td><td></td></tr></tbody></table>
 </td><td></td></tr><tr><td>VSM Score</td><td></td></tr></tbody></table>
 <p>


 <p>Where <em>V(q)</em> · <em>V(d)</em> is the
 <a href="http://en.wikipedia.org/wiki/Dot_product">dot product</a>
 of the weighted vectors,
 and <em>|V(q)|</em> and <em>|V(d)|</em> are their
 <a href="http://en.wikipedia.org/wiki/Euclidean_norm#Euclidean_norm">Euclidean norms</a>.</p>
 <p>Note: the above equation can be viewed as the dot product of
 the normalized weighted vectors, in the sense that dividing
 <em>V(q)</em> by its euclidean norm is normalizing it to a unit vector.
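 <p>The equivalence stated in the note above can be checked with a small sketch. The vectors are represented as plain lists of weights, and all function names are hypothetical helpers for this example rather than Lucene.NET APIs.</p>
<pre><code class="lang-python">import math

def dot(u, v):
    """Dot product of two equally sized weight vectors."""
    return sum(a * b for a, b in zip(u, v))

def euclidean_norm(v):
    """Euclidean norm |V| of a weight vector."""
    return math.sqrt(dot(v, v))

def unit(v):
    """Scale a vector to unit length by dividing by its Euclidean norm."""
    n = euclidean_norm(v)
    return [a / n for a in v]

def cosine_similarity(q, d):
    """VSM score: V(q) . V(d) divided by |V(q)| |V(d)|."""
    return dot(q, d) / (euclidean_norm(q) * euclidean_norm(d))

q = [1.0, 0.0, 2.0]
d = [2.0, 1.0, 1.0]
direct = cosine_similarity(q, d)
via_unit_vectors = dot(unit(q), unit(d))
</code></pre>
 <p>Up to floating-point rounding, <code>direct</code> and <code>via_unit_vectors</code> are the same number, which is the sense in which the cosine similarity is the dot product of the normalized vectors.</p>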

 <p>Lucene refines <em>VSM score</em> for both search quality and usability:
 <ul><li>Normalizing <em>V(d)</em> to the unit vector is known to be problematic in that
  it removes all document length information.
  For some documents removing this info is probably ok,
  e.g. a document made by duplicating a certain paragraph <em>10</em> times,
  especially if that paragraph is made of distinct terms.
  But for a document which contains no duplicated paragraphs,
  this might be wrong.
  To avoid this problem, a different document length normalization
  factor is used, which normalizes to a vector equal to or larger
  than the unit vector: <em>doc-len-norm(d)</em>.
 </li><li>At indexing, users can specify that certain documents are more
 important than others, by assigning a document boost.
 For this, the score of each document is also multiplied by its boost value
 <em>doc-boost(d)</em>.
 </li><li>Lucene is field based: each query term applies to a single
 field, document length normalization is by the length of that field,
 and in addition to the document boost there are also document field boosts.
 </li><li>The same field can be added to a document during indexing several times,
 and so the boost of that field is the multiplication of the boosts of
 the separate additions (or parts) of that field within the document.
 </li><li>At search time users can specify boosts to each query, sub-query, and
 each query term, hence the contribution of a query term to the score of
 a document is multiplied by the boost of that query term <em>query-boost(q)</em>.
 </li><li>A document may match a multi term query without containing all
 the terms of that query (this is correct for some of the queries),
 and users can further reward documents matching more query terms
 through a coordination factor, which is usually larger when
 more terms are matched: <em>coord-factor(q,d)</em>.
 </li></ul>

 <p>Under the simplifying assumption of a single field in the index,
 we get <em>Lucene&apos;s Conceptual scoring formula</em>:

 <p>
 <table><tbody><tr><td>
 <table><tbody><tr><td>
             score(q,d)   =<br><font color="#FF9933">coord-factor(q,d)</font> ·<br><font color="#CCCC00">query-boost(q)</font> ·<br>
 <table><tbody><tr><td><small><font color="#993399">V(q) · V(d)</font></small></td><td></td></tr><tr><td>–––––––––</td><td></td></tr><tr><td><small><font color="#FF33CC">|V(q)|</font></small></td><td></td></tr></tbody></table>

   ·   <font color="#3399FF">doc-len-norm(d)</font>
   ·   <font color="#3399FF">doc-boost(d)</font>
 </td><td></td></tr></tbody></table>
 </td><td></td></tr><tr><td>Lucene Conceptual Scoring Formula</td><td></td></tr></tbody></table>
 <p>


 <p>The conceptual formula is a simplification in the sense that (1) terms and documents
 are fielded and (2) boosts are usually per query term rather than per query.

 <p>We now describe how Lucene implements this conceptual scoring formula, and
 derive from it <em>Lucene&apos;s Practical Scoring Function</em>.

 <p>For efficient score computation some scoring components
 are computed and aggregated in advance:

 <ul><li><em>Query-boost</em> for the query (actually for each query term)
  is known when search starts.
 </li><li>Query Euclidean norm <em>|V(q)|</em> can be computed when search starts,
 as it is independent of the document being scored.
 From a search optimization perspective, it is a valid question
 why bother to normalize the query at all, because all
 scored documents will be multiplied by the same <em>|V(q)|</em>,
 and hence document ranks (their order by score) will not
 be affected by this normalization.
 There are two good reasons to keep this normalization:
 <ul><li>Recall that
 <a href="http://en.wikipedia.org/wiki/Cosine_similarity">
 Cosine Similarity</a> can be used to find how similar
 two documents are. One can use Lucene for e.g.
 clustering, and use a document as a query to compute
 its similarity to other documents.
 In this use case it is important that the score of document <em>d3</em>
 for query <em>d1</em> is comparable to the score of document <em>d3</em>
 for query <em>d2</em>. In other words, scores of a document for two
 distinct queries should be comparable.
 There are other applications that may require this.
 And this is exactly what normalizing the query vector <em>V(q)</em>
 provides: comparability (to a certain extent) of two or more queries.
 </li><li>Applying query normalization on the scores helps to keep the
 scores around the unit vector, hence preventing loss of score data
 because of floating point precision limitations.
 </li></ul>
 </li><li>Document length norm <em>doc-len-norm(d)</em> and document
 boost <em>doc-boost(d)</em> are known at indexing time.
 They are computed in advance and their multiplication
 is saved as a single value in the index: <em>norm(d)</em>.
 (In the equations below, <em>norm(t in d)</em> means <em>norm(field(t) in doc d)</em>
 where <em>field(t)</em> is the field associated with term <em>t</em>.)
 </li></ul>

 <p><em>Lucene&apos;s Practical Scoring Function</em> is derived from the above.
 The color codes demonstrate how it relates
 to those of the <em>conceptual</em> formula:

 <p>
 <table><tbody><tr><td>
 <table><tbody><tr><td>
             score(q,d)   =<br><a href="#formula_coord"><font color="#FF9933">coord(q,d)</font></a>   ·<br><a href="#formula_queryNorm"><font color="#FF33CC">queryNorm(q)</font></a>   ·<br><big><big><big>∑</big></big></big>
 <big><big>(</big></big>
 <a href="#formula_tf"><font color="#993399">tf(t in d)</font></a>   ·<br><a href="#formula_idf"><font color="#993399">idf(t)</font></a><sup>2</sup>   ·<br><a href="#formula_termBoost"><font color="#CCCC00">t.Boost</font></a>   ·<br><a href="#formula_norm"><font color="#3399FF">norm(t,d)</font></a>
 <big><big>)</big></big>
 </td><td></td></tr><tr><td><small>t in q</small></td><td></td></tr></tbody></table>
 </td><td></td></tr><tr><td>Lucene Practical Scoring Function</td><td></td></tr></tbody></table>

 <p> where
 <ol><li>
 <a name="formula_tf"></a>
 <strong><em>tf(t in d)</em></strong>
 correlates to the term&apos;s <em>frequency</em>,
 defined as the number of times term <em>t</em> appears in the currently scored document <em>d</em>.
 Documents that have more occurrences of a given term receive a higher score.
 Note that <em>tf(t in q)</em> is assumed to be <em>1</em> and therefore it does not appear in this equation.
 However, if a query contains the same term twice, there will be
 two term-queries with that same term, and hence the computation would still be correct (although
 not very efficient).
 The default computation for <em>tf(t in d)</em> in
 DefaultSimilarity (<a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html#Lucene_Net_Search_Similarities_DefaultSimilarity_Tf_System_Single_">Tf(Single)</a>) is:

 <p>
 <table><tbody><tr><td>
             tf(t in d)   =<br>
             frequency<sup><big>½</big></sup>
 </td><td></td></tr></tbody></table>
 <p>

 <p></li><li>
 <a name="formula_idf"></a>
 <strong><em>idf(t)</em></strong> stands for Inverse Document Frequency. This value
 correlates to the inverse of <em>DocFreq</em>
 (the number of documents in which the term <em>t</em> appears).
 This means rarer terms give a higher contribution to the total score.
 <em>idf(t)</em> appears for <em>t</em> in both the query and the document,
 hence it is squared in the equation.
 The default computation for <em>idf(t)</em> in
 DefaultSimilarity (<a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html#Lucene_Net_Search_Similarities_DefaultSimilarity_Idf_System_Int64_System_Int64_">Idf(Int64, Int64)</a>) is:<p>
 <p>
 <table><tbody><tr><td>idf(t)   =  1 + log <big>(</big>
 <table><tbody><tr><td><small>NumDocs</small></td><td></td></tr><tr><td>–––––––––</td><td></td></tr><tr><td><small>DocFreq+1</small></td><td></td></tr></tbody></table>
 <big>)</big></td><td></td></tr></tbody></table>
 <p>

 <p></li><li>
 <a name="formula_coord"></a>
 <strong><em>coord(q,d)</em></strong>
 is a score factor based on how many of the query terms are found in the specified document.
 Typically, a document that contains more of the query&apos;s terms will receive a higher score
 than another document with fewer query terms.
 This is a search time factor computed in
 coord(q,d) (<a class="xref" href="Lucene.Net.Search.Similarities.TFIDFSimilarity.html#Lucene_Net_Search_Similarities_TFIDFSimilarity_Coord_System_Int32_System_Int32_">Coord(Int32, Int32)</a>)
 by the Similarity in effect at search time.
 <p>
 </li><li><strong>
 <a name="formula_queryNorm"></a>
 <em>queryNorm(q)</em>
 </strong>
 is a normalizing factor used to make scores between queries comparable.
 This factor does not affect document ranking (since all ranked documents are multiplied by the same factor),
 but rather just attempts to make scores from different queries (or even different indexes) comparable.
 This is a search time factor computed by the Similarity in effect at search time.<p>
 <p>The default computation in
 DefaultSimilarity (<a class="xref" href="Lucene.Net.Search.Similarities.DefaultSimilarity.html#Lucene_Net_Search_Similarities_DefaultSimilarity_QueryNorm_System_Single_">QueryNorm(Single)</a>)
 produces a <a href="http://en.wikipedia.org/wiki/Euclidean_norm#Euclidean_norm">Euclidean norm</a>:</p>
 <p>
 <table><tbody><tr><td>
            queryNorm(q)    =<br>           queryNorm(sumOfSquaredWeights)
              =<br>
 <table><tbody><tr><td><big>1</big></td><td></td></tr><tr><td><big>––––––––––––––</big></td><td></td></tr><tr><td>sumOfSquaredWeights<sup><big>½</big></sup></td><td></td></tr></tbody></table>
 </td><td></td></tr></tbody></table>
 <p>

 <p>The sum of squared weights (of the query terms) is
 computed by the query <a class="xref" href="Lucene.Net.Search.Weight.html">Weight</a> object.
 For example, a <a class="xref" href="Lucene.Net.Search.BooleanQuery.html">BooleanQuery</a>
 computes this value as:</p>
 <p><p>
 <table><tbody><tr><td>
            sumOfSquaredWeights   =<br>           q.Boost <sup><big>2</big></sup>
             ·
 <big><big><big>∑</big></big></big>
 <big><big>(</big></big>
 <a href="#formula_idf">idf(t)</a>  ·
 <a href="#formula_termBoost">t.Boost</a>
 <big><big>) <sup>2</sup> </big></big>
 </td><td></td></tr><tr><td><small>t in q</small></td><td></td></tr></tbody></table>
 where sumOfSquaredWeights is <a class="xref" href="Lucene.Net.Search.Weight.html#Lucene_Net_Search_Weight_GetValueForNormalization">GetValueForNormalization()</a> and
 q.Boost is <a class="xref" href="Lucene.Net.Search.Query.html#Lucene_Net_Search_Query_Boost">Boost</a>
 <p>
 </li><li>
 <a name="formula_termBoost"></a>
 <strong><em>t.Boost</em></strong>
 is a search time boost of term <em>t</em> in the query <em>q</em> as
 specified in the query text
 (see &quot;Boosting a Term&quot; in the classic query parser&apos;s query syntax documentation),
 or as set by application calls to
 <a class="xref" href="Lucene.Net.Search.Query.html#Lucene_Net_Search_Query_Boost">Boost</a>.
 Note that there is no direct API for accessing the boost of one term in a multi-term query;
 instead, multiple terms are represented in a query as multiple
 <a class="xref" href="Lucene.Net.Search.TermQuery.html">TermQuery</a> objects,
 and so the boost of a term in the query is accessible through the sub-query&apos;s
 <a class="xref" href="Lucene.Net.Search.Query.html#Lucene_Net_Search_Query_Boost">Boost</a>.
 <p>
 </li><li>
 <a name="formula_norm"></a>
 <strong><em>norm(t,d)</em></strong> encapsulates a few (indexing time) boost and length factors:<p>
 <p><ul><li><strong>Field boost</strong> - set
 <a class="xref" href="Lucene.Net.Documents.Field.html#Lucene_Net_Documents_Field_Boost">Boost</a>
 before adding the field to a document.
 </li><li><strong>lengthNorm</strong> - computed
 when the document is added to the index in accordance with the number of tokens
 of this field in the document, so that shorter fields contribute more to the score.
 LengthNorm is computed by the <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> class in effect at indexing.
 </li></ul>
 The <a class="xref" href="Lucene.Net.Search.Similarities.TFIDFSimilarity.html#Lucene_Net_Search_Similarities_TFIDFSimilarity_ComputeNorm_Lucene_Net_Index_FieldInvertState_">ComputeNorm(FieldInvertState)</a> method is responsible for
 combining all of these factors into a single <span class="xref">System.Single</span>.</p>
 <p><p>
 When a document is added to the index, all the above factors are multiplied.
 If the document has multiple fields with the same name, all their boosts are multiplied together:</p>
 <p><p>
 <table><tbody><tr><td>
            norm(t,d)   =<br>           lengthNorm
             ·
 <big><big><big>∏</big></big></big><a class="xref" href="Lucene.Net.Index.IIndexableField.html#Lucene_Net_Index_IIndexableField_Boost">Boost</a></td><td></td></tr><tr><td><small>field <em><strong>f</strong></em> in <em>d</em> named as <em><strong>t</strong></em></small></td><td></td></tr></tbody></table>
 Note that search time is too late to modify this <em>norm</em> part of scoring,
 e.g. by using a different <a class="xref" href="Lucene.Net.Search.Similarities.Similarity.html">Similarity</a> for search.
 </li></ol></p>
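 <p>The factors listed above can be put together in a short numeric sketch. It transcribes the formulas on this page into plain arithmetic so that the pieces can be tried on small numbers; it is not the Lucene.NET implementation, and every function and parameter name is a hypothetical helper chosen for this example.</p>
<pre><code class="lang-python">import math

def tf(freq):
    """tf(t in d): square root of the within-document frequency."""
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    """idf(t) = 1 + log(NumDocs / (DocFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def sum_of_squared_weights(query_boost, terms):
    """q.Boost squared times the sum over query terms of (idf * t.Boost) squared.

    terms -- list of (idf, term_boost) pairs, one per query term
    """
    return query_boost ** 2 * sum((i * b) ** 2 for i, b in terms)

def query_norm(sum_sq_weights):
    """queryNorm(q) = 1 / sqrt(sumOfSquaredWeights)."""
    return 1.0 / math.sqrt(sum_sq_weights)

def practical_score(coord, qnorm, matches):
    """coord * queryNorm * sum of tf * idf squared * t.Boost * norm.

    matches -- list of (freq, idf, term_boost, norm) tuples, one per
               query term that occurs in the document being scored
    """
    return coord * qnorm * sum(
        tf(freq) * i ** 2 * b * n for freq, i, b, n in matches
    )
</code></pre>
 <p>For instance, a single-term query whose term has <code>idf</code> 1 and boost 1 gives a sum of squared weights of 1 and therefore a query norm of 1, so the score reduces to <code>coord · tf · norm</code> for that term.</p>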
 </section>
     <h3 id="interfaces">Interfaces
   </h3>
       <h4><a class="xref" href="Lucene.Net.Search.Similarities.LMSimilarity.ICollectionModel.html">LMSimilarity.ICollectionModel</a></h4>
       <section><p>A strategy for computing the collection language model. </p>
 </section>
 </article>
           </div>

           <div class="hidden-sm col-md-2" role="complementary">
             <div class="sideaffix">
               <nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
               <!-- <p><a class="back-to-top" href="#top">Back to top</a><p> -->
               </nav>
             </div>
           </div>
         </div>
       </div>

       <footer>
         <div class="grad-bottom"></div>
         <div class="footer">
           <div class="container">
             <span class="pull-right">
               <a href="#top">Back to top</a>
             </span>
             Copyright © 2020 Licensed to the Apache Software Foundation (ASF)

           </div>
         </div>
       </footer>
     </div>

     <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script>
     <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script>
     <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script>
   </body>
 </html>