lucene/core/src/java/org/apache/lucene/search/similarities/package-info.java - lucene-solr - Git at Google

 /*
  * Licensed to the Apache Software Foundation (ASF) under one or more
  * contributor license agreements.  See the NOTICE file distributed with
  * this work for additional information regarding copyright ownership.
  * The ASF licenses this file to You under the Apache License, Version 2.0
  * (the "License"); you may not use this file except in compliance with
  * the License.  You may obtain a copy of the License at
  *
  *     http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */

 /**
  * This package contains the various ranking models that can be used in Lucene. The abstract class
  * {@link org.apache.lucene.search.similarities.Similarity} serves as the base for ranking
  * functions. For searching, users can employ the models already implemented or create their own by
  * extending one of the classes in this package.
  *
  * <h2>Table Of Contents</h2>
  *
  * <ol>
  *   <li><a href="#sims">Summary of the Ranking Methods</a>
  *   <li><a href="#changingSimilarity">Changing the Similarity</a>
  * </ol>
  *
  * <a id="sims"></a>
  *
  * <h2>Summary of the Ranking Methods</h2>
  *
  * <p>{@link org.apache.lucene.search.similarities.BM25Similarity} is an optimized implementation of
  * the successful Okapi BM25 model.
  *
  * <p>{@link org.apache.lucene.search.similarities.ClassicSimilarity} is the original Lucene scoring
  * function. It is based on the <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector
  * Space Model</a>. For more information, see {@link
  * org.apache.lucene.search.similarities.TFIDFSimilarity}.
  *
  * <p>{@link org.apache.lucene.search.similarities.SimilarityBase} provides a basic implementation
  * of the Similarity contract and exposes a highly simplified interface, which makes it an ideal
  * starting point for new ranking functions. Lucene ships the following methods built on {@link
  * org.apache.lucene.search.similarities.SimilarityBase}:
  *
  * <p><a id="framework"></a>
  *
  * <ul>
  *   <li>Amati and Rijsbergen's {@linkplain org.apache.lucene.search.similarities.DFRSimilarity DFR}
  *       framework;
  *   <li>Clinchant and Gaussier's {@linkplain org.apache.lucene.search.similarities.IBSimilarity
  *       Information-based models} for IR;
  *   <li>The implementation of two {@linkplain org.apache.lucene.search.similarities.LMSimilarity
  *       language models} from Zhai and Lafferty's paper.
  *   <li>{@linkplain org.apache.lucene.search.similarities.DFISimilarity Divergence from
  *       independence} models as described in "IRRA at TREC 2012" (Dinçer).
  *   <li>
  * </ul>
  *
  * Since {@link org.apache.lucene.search.similarities.SimilarityBase} is not optimized to the same
  * extent as {@link org.apache.lucene.search.similarities.ClassicSimilarity} and {@link
  * org.apache.lucene.search.similarities.BM25Similarity}, a difference in performance is to be
  * expected when using the methods listed above. However, optimizations can always be implemented in
  * subclasses; see <a href="#changingSimilarity">below</a>.
  *
  * <p><a id="changingSimilarity"></a>
  *
  * <h2>Changing Similarity</h2>
  *
  * <p>Chances are the available Similarities are sufficient for all your searching needs. However,
  * in some applications it may be necessary to customize your <a
  * href="Similarity.html">Similarity</a> implementation. For instance, some applications do not need
  * to distinguish between shorter and longer documents and could set BM25's {@link
  * org.apache.lucene.search.similarities.BM25Similarity#BM25Similarity(float,float) b} parameter to
  * {@code 0}.
  *
  * <p>To change {@link org.apache.lucene.search.similarities.Similarity}, one must do so for both
  * indexing and searching, and the changes must happen before either of these actions take place.
  * Although in theory there is nothing stopping you from changing mid-stream, it just isn't
  * well-defined what is going to happen.
  *
  * <p>To make this change, implement your own {@link
  * org.apache.lucene.search.similarities.Similarity} (likely you'll want to simply subclass {@link
  * org.apache.lucene.search.similarities.SimilarityBase}), and then register the new class by
  * calling {@link org.apache.lucene.index.IndexWriterConfig#setSimilarity(Similarity)} before
  * indexing and {@link org.apache.lucene.search.IndexSearcher#setSimilarity(Similarity)} before
  * searching.
  *
  * <h3>Tuning {@linkplain org.apache.lucene.search.similarities.BM25Similarity}</h3>
  *
  * <p>{@link org.apache.lucene.search.similarities.BM25Similarity} has two parameters that may be
  * tuned:
  *
  * <ul>
  *   <li><code>k1</code>, which calibrates term frequency saturation and must be positive or null. A
  *       value of {@code 0} makes term frequency completely ignored, making documents scored only
  *       based on the value of the <code>IDF</code> of the matched terms. Higher values of <code>k1
  *       </code> increase the impact of term frequency on the final score. Default value is {@code
  *       1.2}.
  *   <li><code>b</code>, which controls how much document length should normalize term frequency
  *       values and must be in {@code [0, 1]}. A value of {@code 0} disables length normalization
  *       completely. Default value is {@code 0.75}.
  * </ul>
  *
  * <h3>Extending {@linkplain org.apache.lucene.search.similarities.SimilarityBase}</h3>
  *
  * <p>The easiest way to quickly implement a new ranking method is to extend {@link
  * org.apache.lucene.search.similarities.SimilarityBase}, which provides basic implementations for
  * the low level . Subclasses are only required to implement the {@link
  * org.apache.lucene.search.similarities.SimilarityBase#score(BasicStats, double, double)} and
  * {@link org.apache.lucene.search.similarities.SimilarityBase#toString()} methods.
  *
  * <p>Another option is to extend one of the <a href="#framework">frameworks</a> based on {@link
  * org.apache.lucene.search.similarities.SimilarityBase}. These Similarities are implemented
  * modularly, e.g. {@link org.apache.lucene.search.similarities.DFRSimilarity} delegates computation
  * of the three parts of its formula to the classes {@link
  * org.apache.lucene.search.similarities.BasicModel}, {@link
  * org.apache.lucene.search.similarities.AfterEffect} and {@link
  * org.apache.lucene.search.similarities.Normalization}. Instead of subclassing the Similarity, one
  * can simply introduce a new basic model and tell {@link
  * org.apache.lucene.search.similarities.DFRSimilarity} to use it.
  */
 package org.apache.lucene.search.similarities;
	/*
	* Licensed to the Apache Software Foundation (ASF) under one or more
	* contributor license agreements. See the NOTICE file distributed with
	* this work for additional information regarding copyright ownership.
	* The ASF licenses this file to You under the Apache License, Version 2.0
	* (the "License"); you may not use this file except in compliance with
	* the License. You may obtain a copy of the License at
	*
	* http://www.apache.org/licenses/LICENSE-2.0
	*
	* Unless required by applicable law or agreed to in writing, software
	* distributed under the License is distributed on an "AS IS" BASIS,
	* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	* See the License for the specific language governing permissions and
	* limitations under the License.
	*/

	/**
	* This package contains the various ranking models that can be used in Lucene. The abstract class
	* {@link org.apache.lucene.search.similarities.Similarity} serves as the base for ranking
	* functions. For searching, users can employ the models already implemented or create their own by
	* extending one of the classes in this package.
	*
	* <h2>Table Of Contents</h2>
	*
	* <ol>
	* <li><a href="#sims">Summary of the Ranking Methods</a>
	* <li><a href="#changingSimilarity">Changing the Similarity</a>
	* </ol>
	*
	* <a id="sims"></a>
	*
	* <h2>Summary of the Ranking Methods</h2>
	*
	* <p>{@link org.apache.lucene.search.similarities.BM25Similarity} is an optimized implementation of
	* the successful Okapi BM25 model.
	*
	* <p>{@link org.apache.lucene.search.similarities.ClassicSimilarity} is the original Lucene scoring
	* function. It is based on the <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector
	* Space Model</a>. For more information, see {@link
	* org.apache.lucene.search.similarities.TFIDFSimilarity}.
	*
	* <p>{@link org.apache.lucene.search.similarities.SimilarityBase} provides a basic implementation
	* of the Similarity contract and exposes a highly simplified interface, which makes it an ideal
	* starting point for new ranking functions. Lucene ships the following methods built on {@link
	* org.apache.lucene.search.similarities.SimilarityBase}:
	*
	* <p><a id="framework"></a>
	*
	* <ul>
	* <li>Amati and Rijsbergen's {@linkplain org.apache.lucene.search.similarities.DFRSimilarity DFR}
	* framework;
	* <li>Clinchant and Gaussier's {@linkplain org.apache.lucene.search.similarities.IBSimilarity
	* Information-based models} for IR;
	* <li>The implementation of two {@linkplain org.apache.lucene.search.similarities.LMSimilarity
	* language models} from Zhai and Lafferty's paper.
	* <li>{@linkplain org.apache.lucene.search.similarities.DFISimilarity Divergence from
	* independence} models as described in "IRRA at TREC 2012" (Dinçer).
	* <li>
	* </ul>
	*
	* Since {@link org.apache.lucene.search.similarities.SimilarityBase} is not optimized to the same
	* extent as {@link org.apache.lucene.search.similarities.ClassicSimilarity} and {@link
	* org.apache.lucene.search.similarities.BM25Similarity}, a difference in performance is to be
	* expected when using the methods listed above. However, optimizations can always be implemented in
	* subclasses; see <a href="#changingSimilarity">below</a>.
	*
	* <p><a id="changingSimilarity"></a>
	*
	* <h2>Changing Similarity</h2>
	*
	* <p>Chances are the available Similarities are sufficient for all your searching needs. However,
	* in some applications it may be necessary to customize your <a
	* href="Similarity.html">Similarity</a> implementation. For instance, some applications do not need
	* to distinguish between shorter and longer documents and could set BM25's {@link
	* org.apache.lucene.search.similarities.BM25Similarity#BM25Similarity(float,float) b} parameter to
	* {@code 0}.
	*
	* <p>To change {@link org.apache.lucene.search.similarities.Similarity}, one must do so for both
	* indexing and searching, and the changes must happen before either of these actions take place.
	* Although in theory there is nothing stopping you from changing mid-stream, it just isn't
	* well-defined what is going to happen.
	*
	* <p>To make this change, implement your own {@link
	* org.apache.lucene.search.similarities.Similarity} (likely you'll want to simply subclass {@link
	* org.apache.lucene.search.similarities.SimilarityBase}), and then register the new class by
	* calling {@link org.apache.lucene.index.IndexWriterConfig#setSimilarity(Similarity)} before
	* indexing and {@link org.apache.lucene.search.IndexSearcher#setSimilarity(Similarity)} before
	* searching.
	*
	* <h3>Tuning {@linkplain org.apache.lucene.search.similarities.BM25Similarity}</h3>
	*
	* <p>{@link org.apache.lucene.search.similarities.BM25Similarity} has two parameters that may be
	* tuned:
	*
	* <ul>
	* <li><code>k1</code>, which calibrates term frequency saturation and must be positive or null. A
	* value of {@code 0} makes term frequency completely ignored, making documents scored only
	* based on the value of the <code>IDF</code> of the matched terms. Higher values of <code>k1
	* </code> increase the impact of term frequency on the final score. Default value is {@code
	* 1.2}.
	* <li><code>b</code>, which controls how much document length should normalize term frequency
	* values and must be in {@code [0, 1]}. A value of {@code 0} disables length normalization
	* completely. Default value is {@code 0.75}.
	* </ul>
	*
	* <h3>Extending {@linkplain org.apache.lucene.search.similarities.SimilarityBase}</h3>
	*
	* <p>The easiest way to quickly implement a new ranking method is to extend {@link
	* org.apache.lucene.search.similarities.SimilarityBase}, which provides basic implementations for
	* the low level . Subclasses are only required to implement the {@link
	* org.apache.lucene.search.similarities.SimilarityBase#score(BasicStats, double, double)} and
	* {@link org.apache.lucene.search.similarities.SimilarityBase#toString()} methods.
	*
	* <p>Another option is to extend one of the <a href="#framework">frameworks</a> based on {@link
	* org.apache.lucene.search.similarities.SimilarityBase}. These Similarities are implemented
	* modularly, e.g. {@link org.apache.lucene.search.similarities.DFRSimilarity} delegates computation
	* of the three parts of its formula to the classes {@link
	* org.apache.lucene.search.similarities.BasicModel}, {@link
	* org.apache.lucene.search.similarities.AfterEffect} and {@link
	* org.apache.lucene.search.similarities.Normalization}. Instead of subclassing the Similarity, one
	* can simply introduce a new basic model and tell {@link
	* org.apache.lucene.search.similarities.DFRSimilarity} to use it.
	*/
	package org.apache.lucene.search.similarities;