 /*
  * Licensed to the Apache Software Foundation (ASF) under one or more
  * contributor license agreements.  See the NOTICE file distributed with
  * this work for additional information regarding copyright ownership.
  * The ASF licenses this file to You under the Apache License, Version 2.0
  * (the "License"); you may not use this file except in compliance with
  * the License.  You may obtain a copy of the License at
  *
  *     http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */

 /**
  * This package contains the various ranking models that can be used in Lucene. The
  * abstract class {@link org.apache.lucene.search.similarities.Similarity} serves
  * as the base for ranking functions. For searching, users can employ the models
  * already implemented or create their own by extending one of the classes in this
  * package.
  *
  * <h2>Table Of Contents</h2>
  *     <ol>
  *         <li><a href="#sims">Summary of the Ranking Methods</a></li>
  *         <li><a href="#changingSimilarity">Changing the Similarity</a></li>
  *     </ol>
  *
  *
  * <a name="sims"></a>
  * <h2>Summary of the Ranking Methods</h2>
  *
  * <p>{@link org.apache.lucene.search.similarities.BM25Similarity} is an optimized
  * implementation of the successful Okapi BM25 model.
  *
  * <p>{@link org.apache.lucene.search.similarities.ClassicSimilarity} is the original Lucene
  * scoring function. It is based on the
  * <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space Model</a>. For more
  * information, see {@link org.apache.lucene.search.similarities.TFIDFSimilarity}.
  *
  * <p>{@link org.apache.lucene.search.similarities.SimilarityBase} provides a basic
  * implementation of the Similarity contract and exposes a highly simplified
  * interface, which makes it an ideal starting point for new ranking functions.
  * Lucene ships the following methods built on
  * {@link org.apache.lucene.search.similarities.SimilarityBase}:
  *
  * <a name="framework"></a>
  * <ul>
  *   <li>Amati and van Rijsbergen's {@linkplain org.apache.lucene.search.similarities.DFRSimilarity DFR} framework;</li>
  *   <li>Clinchant and Gaussier's {@linkplain org.apache.lucene.search.similarities.IBSimilarity Information-based models}
  *     for IR;</li>
  *   <li>Two {@linkplain org.apache.lucene.search.similarities.LMSimilarity language models} from
  *   Zhai and Lafferty's paper;</li>
  *   <li>{@linkplain org.apache.lucene.search.similarities.DFISimilarity Divergence from independence} models as described
  *   in "IRRA at TREC 2012" (Dinçer).</li>
  * </ul>
  *
  * <p>Since {@link org.apache.lucene.search.similarities.SimilarityBase} is not
  * optimized to the same extent as
  * {@link org.apache.lucene.search.similarities.ClassicSimilarity} and
  * {@link org.apache.lucene.search.similarities.BM25Similarity}, a difference in
  * performance is to be expected when using the methods listed above. However,
  * optimizations can always be implemented in subclasses; see
  * <a href="#changingSimilarity">below</a>.
  *
  * <a name="changingSimilarity"></a>
  * <h2>Changing the Similarity</h2>
  *
  * <p>Chances are the available Similarities are sufficient for all
  *     your searching needs.
  *     However, in some applications it may be necessary to customize your <a
  *         href="Similarity.html">Similarity</a> implementation. For instance, some
  *     applications do not need to distinguish between shorter and longer documents
  *     and could set BM25's {@link org.apache.lucene.search.similarities.BM25Similarity#BM25Similarity(float,float) b}
  *     parameter to {@code 0}.
  *
  * <p>To change {@link org.apache.lucene.search.similarities.Similarity}, one must do so for both indexing and
  *     searching, and the change must be made before
  *     either of these actions takes place. Although in theory nothing stops you from changing mid-stream, the
  *     resulting behavior is not well-defined.
  *
  * <p>To make this change, implement your own {@link org.apache.lucene.search.similarities.Similarity} (likely
  *     you'll want to simply subclass {@link org.apache.lucene.search.similarities.SimilarityBase}), and
  *     then register the new class by calling
  *     {@link org.apache.lucene.index.IndexWriterConfig#setSimilarity(Similarity)}
  *     before indexing and
  *     {@link org.apache.lucene.search.IndexSearcher#setSimilarity(Similarity)}
  *     before searching.
  *
  * <h3>Tuning {@linkplain org.apache.lucene.search.similarities.BM25Similarity}</h3>
  * <p>{@link org.apache.lucene.search.similarities.BM25Similarity} has
  * two parameters that may be tuned:
  * <ul>
  *   <li><tt>k1</tt>, which calibrates term frequency saturation and must be
  *   non-negative. A value of {@code 0} causes term frequency to be ignored
  *   entirely, so documents are scored only on the <tt>IDF</tt>
  *   of the matched terms. Higher values of <tt>k1</tt> increase the impact of
  *   term frequency on the final score. Default value is {@code 1.2}.</li>
  *   <li><tt>b</tt>, which controls how much document length should normalize
  *   term frequency values and must be in {@code [0, 1]}. A value of {@code 0}
  *   disables length normalization completely. Default value is {@code 0.75}.</li>
  * </ul>
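  *
  * <p>As a rough illustration of how the two parameters interact, the sketch below
  * evaluates the textbook Okapi BM25 term-frequency factor in plain Java. It is not
  * Lucene's implementation: the class {@code Bm25TfSketch}, the method {@code tfFactor}
  * and the sample document lengths are hypothetical, chosen only for this example, and
  * {@link org.apache.lucene.search.similarities.BM25Similarity} may differ in details
  * such as constant factors and how document length is encoded.
  *
```java
// Textbook Okapi BM25 term-frequency factor; an illustration, not Lucene's code.
final class Bm25TfSketch {
  private Bm25TfSketch() {}

  // freq      : occurrences of the term in the document (assumed positive)
  // docLen    : length of the document
  // avgDocLen : average document length in the collection
  // k1        : term frequency saturation, non-negative
  // b         : strength of length normalization, in [0, 1]
  static double tfFactor(double freq, double docLen, double avgDocLen, double k1, double b) {
    // With b = 0 this is always 1, so document length has no effect.
    double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
    // With k1 = 0 this collapses to freq / freq = 1 for any positive freq.
    return freq * (k1 + 1.0) / (freq + k1 * lengthNorm);
  }

  public static void main(String[] args) {
    double avg = 100.0; // hypothetical average document length
    System.out.println("defaults, tf=1:  " + tfFactor(1, 100, avg, 1.2, 0.75));
    System.out.println("defaults, tf=10: " + tfFactor(10, 100, avg, 1.2, 0.75));
    System.out.println("k1=0, tf=10:     " + tfFactor(10, 100, avg, 0.0, 0.75));
    System.out.println("b=0, long doc:   " + tfFactor(3, 500, avg, 1.2, 0.0));
    System.out.println("b=0, short doc:  " + tfFactor(3, 50, avg, 1.2, 0.0));
  }
}
```
  *
  * <p>In this sketch, {@code k1 = 0} yields the same factor for every positive term
  * frequency, leaving only the <tt>IDF</tt> to differentiate documents, while
  * {@code b = 0} yields the same factor for short and long documents.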
  *
  * <h3>Extending {@linkplain org.apache.lucene.search.similarities.SimilarityBase}</h3>
  * <p>
  * The easiest way to quickly implement a new ranking method is to extend
  * {@link org.apache.lucene.search.similarities.SimilarityBase}, which provides
  * basic implementations of the low-level parts of the Similarity contract. Subclasses are only required to
  * implement the {@link org.apache.lucene.search.similarities.SimilarityBase#score(BasicStats, double, double)}
  * and {@link org.apache.lucene.search.similarities.SimilarityBase#toString()}
  * methods.
  *
  * <p>Another option is to extend one of the <a href="#framework">frameworks</a>
  * based on {@link org.apache.lucene.search.similarities.SimilarityBase}. These
  * Similarities are implemented modularly, e.g.
  * {@link org.apache.lucene.search.similarities.DFRSimilarity} delegates
  * computation of the three parts of its formula to the classes
  * {@link org.apache.lucene.search.similarities.BasicModel},
  * {@link org.apache.lucene.search.similarities.AfterEffect} and
  * {@link org.apache.lucene.search.similarities.Normalization}. Instead of
  * subclassing the Similarity, one can simply introduce a new basic model and tell
  * {@link org.apache.lucene.search.similarities.DFRSimilarity} to use it.
  *
  */
 package org.apache.lucene.search.similarities;