src/Lucene.Net/Search/Package.html - lucenenet - Git at Google

 <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
 <html>
 <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <meta name="Author" content="Doug Cutting">
    <meta content="Grant Ingersoll"  name="Author">
 </head>
 <body>
 Code to search indices.

 <h2>Table Of Contents</h2>
 <p>
     <ol>
         <li><a href = "#search">Search Basics</a></li>
         <li><a href = "#query">The Query Classes</a></li>
         <li><a href = "#scoring">Changing the Scoring</a></li>
     </ol>
 </p>
 <a name = "search"></a>
 <h2>Search</h2>
 <p>
 Search over indices.

 Applications usually call {@link
 org.apache.lucene.search.Searcher#search(Query)} or {@link
 org.apache.lucene.search.Searcher#search(Query,Filter)}.

     <!-- FILL IN MORE HERE -->
 </p>
 <a name = "query"></a>
 <h2>Query Classes</h2>
 <h4>
     <a href = "TermQuery.html">TermQuery</a>
 </h4>

 <p>Of the various implementations of
     <a href = "Query.html">Query</a>, the
     <a href = "TermQuery.html">TermQuery</a>
     is the easiest to understand and the most often used in applications. A <a
         href="TermQuery.html">TermQuery</a> matches all the documents that contain the
     specified
     <a href = "index/Term.html">Term</a>,
     which is a word that occurs in a certain
     <a href = "document/Field.html">Field</a>.
     Thus, a <a href = "TermQuery.html">TermQuery</a> identifies and scores all
     <a href = "document/Document.html">Document</a>s that have a <a
         href="../document/Field.html">Field</a> with the specified string in it.
     Constructing a <a
         href="TermQuery.html">TermQuery</a>
     is as simple as:
     <pre>
         TermQuery tq = new TermQuery(new Term("fieldName", "term"));
     </pre>In this example, the <a href = "Query.html">Query</a> identifies all <a
         href="../document/Document.html">Document</a>s that have the <a
         href="../document/Field.html">Field</a> named <tt>"fieldName"</tt>
     containing the word <tt>"term"</tt>.
 </p>
 <h4>
     <a href = "BooleanQuery.html">BooleanQuery</a>
 </h4>

 <p>Things start to get interesting when one combines multiple
     <a href = "TermQuery.html">TermQuery</a> instances into a <a
         href="BooleanQuery.html">BooleanQuery</a>.
     A <a href = "BooleanQuery.html">BooleanQuery</a> contains multiple
     <a href = "BooleanClause.html">BooleanClause</a>s,
     where each clause contains a sub-query (<a href = "Query.html">Query</a>
     instance) and an operator (from <a
         href="BooleanClause.Occur.html">BooleanClause.Occur</a>)
     describing how that sub-query is combined with the other clauses:
     <ol>

         <li><p>SHOULD &mdash; Use this operator when a clause can occur in the result set, but is not required.
             If a query is made up of all SHOULD clauses, then every document in the result
             set matches at least one of these clauses.</p></li>

         <li><p>MUST &mdash; Use this operator when a clause is required to occur in the result set. Every
             document in the result set will match
             all such clauses.</p></li>

         <li><p>MUST NOT &mdash; Use this operator when a
             clause must not occur in the result set. No
             document in the result set will match
             any such clauses.</p></li>
     </ol>
     Boolean queries are constructed by adding two or more
     <a href = "BooleanClause.html">BooleanClause</a>
     instances. If too many clauses are added, a <a href = "BooleanQuery.TooManyClauses.html">TooManyClauses</a>
     exception will be thrown during searching. This most often occurs
     when a <a href = "Query.html">Query</a>
     is rewritten into a <a href = "BooleanQuery.html">BooleanQuery</a> with many
     <a href = "TermQuery.html">TermQuery</a> clauses,
     for example by <a href = "WildcardQuery.html">WildcardQuery</a>.
     The default setting for the maximum number
     of clauses 1024, but this can be changed via the
     static method <a href = "BooleanQuery.html#setMaxClauseCount(int)">setMaxClauseCount</a>
     in <a href = "BooleanQuery.html">BooleanQuery</a>.
 </p>

 <h4>Phrases</h4>

 <p>Another common search is to find documents containing certain phrases. This
     is handled two different ways:
     <ol>
         <li>
             <p><a href = "PhraseQuery.html">PhraseQuery</a>
                 &mdash; Matches a sequence of
                 <a href = "index/Term.html">Terms</a>.
                 <a href = "PhraseQuery.html">PhraseQuery</a> uses a slop factor to determine
                 how many positions may occur between any two terms in the phrase and still be considered a match.</p>
         </li>
         <li>
             <p><a href = "spans/SpanNearQuery.html">SpanNearQuery</a>
                 &mdash; Matches a sequence of other
                 <a href = "spans/SpanQuery.html">SpanQuery</a>
                 instances. <a href = "spans/SpanNearQuery.html">SpanNearQuery</a> allows for
                 much more
                 complicated phrase queries since it is constructed from other <a
                     href="spans/SpanQuery.html">SpanQuery</a>
                 instances, instead of only <a href = "TermQuery.html">TermQuery</a>
                 instances.</p>
         </li>
     </ol>
 </p>
 <h4>
     <a href = "RangeQuery.html">RangeQuery</a>
 </h4>

 <p>The
     <a href = "RangeQuery.html">RangeQuery</a>
     matches all documents that occur in the
     exclusive range of a lower
     <a href = "index/Term.html">Term</a>
     and an upper
     <a href = "index/Term.html">Term</a>.
     For example, one could find all documents
     that have terms beginning with the letters <tt>a</tt> through <tt>c</tt>. This type of <a
         href="Query.html">Query</a> is frequently used to
     find
     documents that occur in a specific date range.
 </p>
 <h4>
     <a href = "PrefixQuery.html">PrefixQuery</a>,
     <a href = "WildcardQuery.html">WildcardQuery</a>
 </h4>

 <p>While the
     <a href = "PrefixQuery.html">PrefixQuery</a>
     has a different implementation, it is essentially a special case of the
     <a href = "WildcardQuery.html">WildcardQuery</a>.
     The <a href = "PrefixQuery.html">PrefixQuery</a> allows an application
     to identify all documents with terms that begin with a certain string. The <a
         href="WildcardQuery.html">WildcardQuery</a> generalizes this by allowing
     for the use of <tt>*</tt> (matches 0 or more characters) and <tt>?</tt> (matches exactly one character) wildcards.
     Note that the <a href = "WildcardQuery.html">WildcardQuery</a> can be quite slow. Also
     note that
     <a href = "WildcardQuery.html">WildcardQuery</a> should
     not start with <tt>*</tt> and <tt>?</tt>, as these are extremely slow.
 	To remove this protection and allow a wildcard at the beginning of a term, see method
 	<a href = "queryParser/QueryParser.html#setAllowLeadingWildcard(boolean)">setAllowLeadingWildcard</a> in
 	<a href = "queryParser/QueryParser.html">QueryParser</a>.
 </p>
 <h4>
     <a href = "FuzzyQuery.html">FuzzyQuery</a>
 </h4>

 <p>A
     <a href = "FuzzyQuery.html">FuzzyQuery</a>
     matches documents that contain terms similar to the specified term. Similarity is
     determined using
     <a href = "http://en.wikipedia.org//wiki/Levenshtein">Levenshtein (edit) distance</a>.
     This type of query can be useful when accounting for spelling variations in the collection.
 </p>
 <a name = "changingSimilarity"></a>
 <h2>Changing Similarity</h2>

 <p>Chances are <a href = "DefaultSimilarity.html">DefaultSimilarity</a> is sufficient for all
     your searching needs.
     However, in some applications it may be necessary to customize your <a
         href="Similarity.html">Similarity</a> implementation. For instance, some
     applications do not need to
     distinguish between shorter and longer documents (see <a
         href="http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967">a "fair" similarity</a>).</p>

 <p>To change <a href = "Similarity.html">Similarity</a>, one must do so for both indexing and
     searching, and the changes must happen before
     either of these actions take place. Although in theory there is nothing stopping you from changing mid-stream, it
     just isn't well-defined what is going to happen.
 </p>

 <p>To make this change, implement your own <a href = "Similarity.html">Similarity</a> (likely
     you'll want to simply subclass
     <a href = "DefaultSimilarity.html">DefaultSimilarity</a>) and then use the new
     class by calling
     <a href = "index/IndexWriter.html#setSimilarity(org.apache.lucene.search.Similarity)">IndexWriter.setSimilarity</a>
     before indexing and
     <a href = "Searcher.html#setSimilarity(org.apache.lucene.search.Similarity)">Searcher.setSimilarity</a>
     before searching.
 </p>

 <p>
     If you are interested in use cases for changing your similarity, see the Lucene users's mailing list at <a
         href="http://www.nabble.com/Overriding-Similarity-tf2128934.html">Overriding Similarity</a>.
     In summary, here are a few use cases:
     <ol>
         <li><p><a href = "api/org/apache/lucene/misc/SweetSpotSimilarity.html">SweetSpotSimilarity</a> &mdash; <a
                 href="api/org/apache/lucene/misc/SweetSpotSimilarity.html">SweetSpotSimilarity</a> gives small increases
             as the frequency increases a small amount
             and then greater increases when you hit the "sweet spot", i.e. where you think the frequency of terms is
             more significant.</p></li>
         <li><p>Overriding tf &mdash; In some applications, it doesn't matter what the score of a document is as long as a
             matching term occurs. In these
             cases people have overridden Similarity to return 1 from the tf() method.</p></li>
         <li><p>Changing Length Normalization &mdash; By overriding <a
                 href="Similarity.html#lengthNorm(java.lang.String,%20int)">lengthNorm</a>,
             it is possible to discount how the length of a field contributes
             to a score. In <a href = "DefaultSimilarity.html">DefaultSimilarity</a>,
             lengthNorm = 1 / (numTerms in field)^0.5, but if one changes this to be
             1 / (numTerms in field), all fields will be treated
             <a href = "http://www.gossamer-threads.com//lists/lucene/java-user/38967#38967">"fairly"</a>.</p></li>
     </ol>
     In general, Chris Hostetter sums it up best in saying (from <a
         href="http://www.gossamer-threads.com/lists/lucene/java-user/39125#39125">the Lucene users's mailing list</a>):
     <blockquote>[One would override the Similarity in] ... any situation where you know more about your data then just
         that
         it's "text" is a situation where it *might* make sense to to override your
         Similarity method.</blockquote>
 </p>
 <a name = "scoring"></a>
 <h2>Changing Scoring &mdash; Expert Level</h2>

 <p>Changing scoring is an expert level task, so tread carefully and be prepared to share your code if
     you want help.
 </p>

 <p>With the warning out of the way, it is possible to change a lot more than just the Similarity
     when it comes to scoring in Lucene. Lucene's scoring is a complex mechanism that is grounded by
     <span >three main classes</span>:
     <ol>
         <li>
             <a href = "Query.html">Query</a> &mdash; The abstract object representation of the
             user's information need.</li>
         <li>
             <a href = "Weight.html">Weight</a> &mdash; The internal interface representation of
             the user's Query, so that Query objects may be reused.</li>
         <li>
             <a href = "Scorer.html">Scorer</a> &mdash; An abstract class containing common
             functionality for scoring. Provides both scoring and explanation capabilities.</li>
     </ol>
     Details on each of these classes, and their children, can be found in the subsections below.
 </p>
 <h4>The Query Class</h4>
     <p>In some sense, the
         <a href = "Query.html">Query</a>
         class is where it all begins. Without a Query, there would be
         nothing to score. Furthermore, the Query class is the catalyst for the other scoring classes as it
         is often responsible
         for creating them or coordinating the functionality between them. The
         <a href = "Query.html">Query</a> class has several methods that are important for
         derived classes:
         <ol>
             <li>createWeight(Searcher searcher) &mdash; A
                 <a href = "Weight.html">Weight</a> is the internal representation of the
                 Query, so each Query implementation must
                 provide an implementation of Weight. See the subsection on <a
                     href="#The Weight Interface">The Weight Interface</a> below for details on implementing the Weight
                 interface.</li>
             <li>rewrite(IndexReader reader) &mdash; Rewrites queries into primitive queries. Primitive queries are:
                 <a href = "TermQuery.html">TermQuery</a>,
                 <a href = "BooleanQuery.html">BooleanQuery</a>, <span
                     >OTHERS????</span></li>
         </ol>
     </p>
 <h4>The Weight Interface</h4>
     <p>The
         <a href = "Weight.html">Weight</a>
         interface provides an internal representation of the Query so that it can be reused. Any
         <a href = "Searcher.html">Searcher</a>
         dependent state should be stored in the Weight implementation,
         not in the Query class. The interface defines six methods that must be implemented:
         <ol>
             <li>
                 <a href = "Weight.html#getQuery()">Weight#getQuery()</a> &mdash; Pointer to the
                 Query that this Weight represents.</li>
             <li>
                 <a href = "Weight.html#getValue()">Weight#getValue()</a> &mdash; The weight for
                 this Query. For example, the TermQuery.TermWeight value is
                 equal to the idf^2 * boost * queryNorm <!-- DOUBLE CHECK THIS --></li>
             <li>
                 <a href = "Weight.html#sumOfSquaredWeights()">
                     Weight#sumOfSquaredWeights()</a> &mdash; The sum of squared weights. For TermQuery, this is (idf *
                 boost)^2</li>
             <li>
                 <a href = "Weight.html#normalize(float)">
                     Weight#normalize(float)</a> &mdash; Determine the query normalization factor. The query normalization may
                 allow for comparing scores between queries.</li>
             <li>
                 <a href = "Weight.html#scorer(IndexReader)">
                     Weight#scorer(IndexReader)</a> &mdash; Construct a new
                 <a href = "Scorer.html">Scorer</a>
                 for this Weight. See
                 <a href = "#The Scorer Class">The Scorer Class</a>
                 below for help defining a Scorer. As the name implies, the
                 Scorer is responsible for doing the actual scoring of documents given the Query.
             </li>
             <li>
                 <a href = "Weight.html#explain(IndexReader, int)">
                     Weight#explain(IndexReader, int)</a> &mdash; Provide a means for explaining why a given document was
                 scored
                 the way it was.</li>
         </ol>
     </p>
 <h4>The Scorer Class</h4>
     <p>The
         <a href = "Scorer.html">Scorer</a>
         abstract class provides common scoring functionality for all Scorer implementations and
         is the heart of the Lucene scoring process. The Scorer defines the following abstract methods which
         must be implemented:
         <ol>
             <li>
                 <a href = "Scorer.html#next()">Scorer#next()</a> &mdash; Advances to the next
                 document that matches this Query, returning true if and only
                 if there is another document that matches.</li>
             <li>
                 <a href = "Scorer.html#doc()">Scorer#doc()</a> &mdash; Returns the id of the
                 <a href = "document/Document.html">Document</a>
                 that contains the match. It is not valid until next() has been called at least once.
             </li>
             <li>
                 <a href = "Scorer.html#score()">Scorer#score()</a> &mdash; Return the score of the
                 current document. This value can be determined in any
                 appropriate way for an application. For instance, the
                 <a href = "http://svn.apache.org//viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/TermScorer.java?view=log">TermScorer</a>
                 returns the tf * Weight.getValue() * fieldNorm.
             </li>
             <li>
                 <a href = "Scorer.html#skipTo(int)">Scorer#skipTo(int)</a> &mdash; Skip ahead in
                 the document matches to the document whose id is greater than
                 or equal to the passed in value. In many instances, skipTo can be
                 implemented more efficiently than simply looping through all the matching documents until
                 the target document is identified.</li>
             <li>
                 <a href = "Scorer.html#explain(int)">Scorer#explain(int)</a> &mdash; Provides
                 details on why the score came about.</li>
         </ol>
     </p>
 <h4>Why would I want to add my own Query?</h4>

     <p>In a nutshell, you want to add your own custom Query implementation when you think that Lucene's
         aren't appropriate for the
         task that you want to do. You might be doing some cutting edge research or you need more information
         back
         out of Lucene (similar to Doug adding SpanQuery functionality).</p>
 <h4>Examples</h4>
     <p >FILL IN HERE</p>

 </body>
 </html>
	<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
	<html>
	<head>
	<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
	<meta name="Author" content="Doug Cutting">
	<meta content="Grant Ingersoll" name="Author">
	</head>
	<body>
	Code to search indices.

	<h2>Table Of Contents</h2>
	<p>
	<ol>
	<li><a href = "#search">Search Basics</a></li>
	<li><a href = "#query">The Query Classes</a></li>
	<li><a href = "#scoring">Changing the Scoring</a></li>
	</ol>
	</p>
	<a name = "search"></a>
	<h2>Search</h2>
	<p>
	Search over indices.

	Applications usually call {@link
	org.apache.lucene.search.Searcher#search(Query)} or {@link
	org.apache.lucene.search.Searcher#search(Query,Filter)}.

	<!-- FILL IN MORE HERE -->
	</p>
	<a name = "query"></a>
	<h2>Query Classes</h2>
	<h4>
	<a href = "TermQuery.html">TermQuery</a>
	</h4>

	<p>Of the various implementations of
	<a href = "Query.html">Query</a>, the
	<a href = "TermQuery.html">TermQuery</a>
	is the easiest to understand and the most often used in applications. A <a
	href="TermQuery.html">TermQuery</a> matches all the documents that contain the
	specified
	<a href = "index/Term.html">Term</a>,
	which is a word that occurs in a certain
	<a href = "document/Field.html">Field</a>.
	Thus, a <a href = "TermQuery.html">TermQuery</a> identifies and scores all
	<a href = "document/Document.html">Document</a>s that have a <a
	href="../document/Field.html">Field</a> with the specified string in it.
	Constructing a <a
	href="TermQuery.html">TermQuery</a>
	is as simple as:
	<pre>
	TermQuery tq = new TermQuery(new Term("fieldName", "term"));
	</pre>In this example, the <a href = "Query.html">Query</a> identifies all <a
	href="../document/Document.html">Document</a>s that have the <a
	href="../document/Field.html">Field</a> named <tt>"fieldName"</tt>
	containing the word <tt>"term"</tt>.
	</p>
	<h4>
	<a href = "BooleanQuery.html">BooleanQuery</a>
	</h4>

	<p>Things start to get interesting when one combines multiple
	<a href = "TermQuery.html">TermQuery</a> instances into a <a
	href="BooleanQuery.html">BooleanQuery</a>.
	A <a href = "BooleanQuery.html">BooleanQuery</a> contains multiple
	<a href = "BooleanClause.html">BooleanClause</a>s,
	where each clause contains a sub-query (<a href = "Query.html">Query</a>
	instance) and an operator (from <a
	href="BooleanClause.Occur.html">BooleanClause.Occur</a>)
	describing how that sub-query is combined with the other clauses:
	<ol>

	<li><p>SHOULD — Use this operator when a clause can occur in the result set, but is not required.
	If a query is made up of all SHOULD clauses, then every document in the result
	set matches at least one of these clauses.</p></li>

	<li><p>MUST — Use this operator when a clause is required to occur in the result set. Every
	document in the result set will match
	all such clauses.</p></li>

	<li><p>MUST NOT — Use this operator when a
	clause must not occur in the result set. No
	document in the result set will match
	any such clauses.</p></li>
	</ol>
	Boolean queries are constructed by adding two or more
	<a href = "BooleanClause.html">BooleanClause</a>
	instances. If too many clauses are added, a <a href = "BooleanQuery.TooManyClauses.html">TooManyClauses</a>
	exception will be thrown during searching. This most often occurs
	when a <a href = "Query.html">Query</a>
	is rewritten into a <a href = "BooleanQuery.html">BooleanQuery</a> with many
	<a href = "TermQuery.html">TermQuery</a> clauses,
	for example by <a href = "WildcardQuery.html">WildcardQuery</a>.
	The default setting for the maximum number
	of clauses 1024, but this can be changed via the
	static method <a href = "BooleanQuery.html#setMaxClauseCount(int)">setMaxClauseCount</a>
	in <a href = "BooleanQuery.html">BooleanQuery</a>.
	</p>

	<h4>Phrases</h4>

	<p>Another common search is to find documents containing certain phrases. This
	is handled two different ways:
	<ol>
	<li>
	<p><a href = "PhraseQuery.html">PhraseQuery</a>
	— Matches a sequence of
	<a href = "index/Term.html">Terms</a>.
	<a href = "PhraseQuery.html">PhraseQuery</a> uses a slop factor to determine
	how many positions may occur between any two terms in the phrase and still be considered a match.</p>
	</li>
	<li>
	<p><a href = "spans/SpanNearQuery.html">SpanNearQuery</a>
	— Matches a sequence of other
	<a href = "spans/SpanQuery.html">SpanQuery</a>
	instances. <a href = "spans/SpanNearQuery.html">SpanNearQuery</a> allows for
	much more
	complicated phrase queries since it is constructed from other <a
	href="spans/SpanQuery.html">SpanQuery</a>
	instances, instead of only <a href = "TermQuery.html">TermQuery</a>
	instances.</p>
	</li>
	</ol>
	</p>
	<h4>
	<a href = "RangeQuery.html">RangeQuery</a>
	</h4>

	<p>The
	<a href = "RangeQuery.html">RangeQuery</a>
	matches all documents that occur in the
	exclusive range of a lower
	<a href = "index/Term.html">Term</a>
	and an upper
	<a href = "index/Term.html">Term</a>.
	For example, one could find all documents
	that have terms beginning with the letters <tt>a</tt> through <tt>c</tt>. This type of <a
	href="Query.html">Query</a> is frequently used to
	find
	documents that occur in a specific date range.
	</p>
	<h4>
	<a href = "PrefixQuery.html">PrefixQuery</a>,
	<a href = "WildcardQuery.html">WildcardQuery</a>
	</h4>

	<p>While the
	<a href = "PrefixQuery.html">PrefixQuery</a>
	has a different implementation, it is essentially a special case of the
	<a href = "WildcardQuery.html">WildcardQuery</a>.
	The <a href = "PrefixQuery.html">PrefixQuery</a> allows an application
	to identify all documents with terms that begin with a certain string. The <a
	href="WildcardQuery.html">WildcardQuery</a> generalizes this by allowing
	for the use of <tt>*</tt> (matches 0 or more characters) and <tt>?</tt> (matches exactly one character) wildcards.
	Note that the <a href = "WildcardQuery.html">WildcardQuery</a> can be quite slow. Also
	note that
	<a href = "WildcardQuery.html">WildcardQuery</a> should
	not start with <tt>*</tt> and <tt>?</tt>, as these are extremely slow.
	To remove this protection and allow a wildcard at the beginning of a term, see method
	<a href = "queryParser/QueryParser.html#setAllowLeadingWildcard(boolean)">setAllowLeadingWildcard</a> in
	<a href = "queryParser/QueryParser.html">QueryParser</a>.
	</p>
	<h4>
	<a href = "FuzzyQuery.html">FuzzyQuery</a>
	</h4>

	<p>A
	<a href = "FuzzyQuery.html">FuzzyQuery</a>
	matches documents that contain terms similar to the specified term. Similarity is
	determined using
	<a href = "http://en.wikipedia.org//wiki/Levenshtein">Levenshtein (edit) distance</a>.
	This type of query can be useful when accounting for spelling variations in the collection.
	</p>
	<a name = "changingSimilarity"></a>
	<h2>Changing Similarity</h2>

	<p>Chances are <a href = "DefaultSimilarity.html">DefaultSimilarity</a> is sufficient for all
	your searching needs.
	However, in some applications it may be necessary to customize your <a
	href="Similarity.html">Similarity</a> implementation. For instance, some
	applications do not need to
	distinguish between shorter and longer documents (see <a
	href="http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967">a "fair" similarity</a>).</p>

	<p>To change <a href = "Similarity.html">Similarity</a>, one must do so for both indexing and
	searching, and the changes must happen before
	either of these actions take place. Although in theory there is nothing stopping you from changing mid-stream, it
	just isn't well-defined what is going to happen.
	</p>

	<p>To make this change, implement your own <a href = "Similarity.html">Similarity</a> (likely
	you'll want to simply subclass
	<a href = "DefaultSimilarity.html">DefaultSimilarity</a>) and then use the new
	class by calling
	<a href = "index/IndexWriter.html#setSimilarity(org.apache.lucene.search.Similarity)">IndexWriter.setSimilarity</a>
	before indexing and
	<a href = "Searcher.html#setSimilarity(org.apache.lucene.search.Similarity)">Searcher.setSimilarity</a>
	before searching.
	</p>

	<p>
	If you are interested in use cases for changing your similarity, see the Lucene users's mailing list at <a
	href="http://www.nabble.com/Overriding-Similarity-tf2128934.html">Overriding Similarity</a>.
	In summary, here are a few use cases:
	<ol>
	<li><p><a href = "api/org/apache/lucene/misc/SweetSpotSimilarity.html">SweetSpotSimilarity</a> — <a
	href="api/org/apache/lucene/misc/SweetSpotSimilarity.html">SweetSpotSimilarity</a> gives small increases
	as the frequency increases a small amount
	and then greater increases when you hit the "sweet spot", i.e. where you think the frequency of terms is
	more significant.</p></li>
	<li><p>Overriding tf — In some applications, it doesn't matter what the score of a document is as long as a
	matching term occurs. In these
	cases people have overridden Similarity to return 1 from the tf() method.</p></li>
	<li><p>Changing Length Normalization — By overriding <a
	href="Similarity.html#lengthNorm(java.lang.String,%20int)">lengthNorm</a>,
	it is possible to discount how the length of a field contributes
	to a score. In <a href = "DefaultSimilarity.html">DefaultSimilarity</a>,
	lengthNorm = 1 / (numTerms in field)^0.5, but if one changes this to be
	1 / (numTerms in field), all fields will be treated
	<a href = "http://www.gossamer-threads.com//lists/lucene/java-user/38967#38967">"fairly"</a>.</p></li>
	</ol>
	In general, Chris Hostetter sums it up best in saying (from <a
	href="http://www.gossamer-threads.com/lists/lucene/java-user/39125#39125">the Lucene users's mailing list</a>):
	<blockquote>[One would override the Similarity in] ... any situation where you know more about your data then just
	that
	it's "text" is a situation where it might make sense to to override your
	Similarity method.</blockquote>
	</p>
	<a name = "scoring"></a>
	<h2>Changing Scoring — Expert Level</h2>

	<p>Changing scoring is an expert level task, so tread carefully and be prepared to share your code if
	you want help.
	</p>

	<p>With the warning out of the way, it is possible to change a lot more than just the Similarity
	when it comes to scoring in Lucene. Lucene's scoring is a complex mechanism that is grounded by
	<span >three main classes</span>:
	<ol>
	<li>
	<a href = "Query.html">Query</a> — The abstract object representation of the
	user's information need.</li>
	<li>
	<a href = "Weight.html">Weight</a> — The internal interface representation of
	the user's Query, so that Query objects may be reused.</li>
	<li>
	<a href = "Scorer.html">Scorer</a> — An abstract class containing common
	functionality for scoring. Provides both scoring and explanation capabilities.</li>
	</ol>
	Details on each of these classes, and their children, can be found in the subsections below.
	</p>
	<h4>The Query Class</h4>
	<p>In some sense, the
	<a href = "Query.html">Query</a>
	class is where it all begins. Without a Query, there would be
	nothing to score. Furthermore, the Query class is the catalyst for the other scoring classes as it
	is often responsible
	for creating them or coordinating the functionality between them. The
	<a href = "Query.html">Query</a> class has several methods that are important for
	derived classes:
	<ol>
	<li>createWeight(Searcher searcher) — A
	<a href = "Weight.html">Weight</a> is the internal representation of the
	Query, so each Query implementation must
	provide an implementation of Weight. See the subsection on <a
	href="#The Weight Interface">The Weight Interface</a> below for details on implementing the Weight
	interface.</li>
	<li>rewrite(IndexReader reader) — Rewrites queries into primitive queries. Primitive queries are:
	<a href = "TermQuery.html">TermQuery</a>,
	<a href = "BooleanQuery.html">BooleanQuery</a>, <span
	>OTHERS????</span></li>
	</ol>
	</p>
	<h4>The Weight Interface</h4>
	<p>The
	<a href = "Weight.html">Weight</a>
	interface provides an internal representation of the Query so that it can be reused. Any
	<a href = "Searcher.html">Searcher</a>
	dependent state should be stored in the Weight implementation,
	not in the Query class. The interface defines six methods that must be implemented:
	<ol>
	<li>
	<a href = "Weight.html#getQuery()">Weight#getQuery()</a> — Pointer to the
	Query that this Weight represents.</li>
	<li>
	<a href = "Weight.html#getValue()">Weight#getValue()</a> — The weight for
	this Query. For example, the TermQuery.TermWeight value is
	equal to the idf^2 * boost * queryNorm <!-- DOUBLE CHECK THIS --></li>
	<li>
	<a href = "Weight.html#sumOfSquaredWeights()">
	Weight#sumOfSquaredWeights()</a> — The sum of squared weights. For TermQuery, this is (idf *
	boost)^2</li>
	<li>
	<a href = "Weight.html#normalize(float)">
	Weight#normalize(float)</a> — Determine the query normalization factor. The query normalization may
	allow for comparing scores between queries.</li>
	<li>
	<a href = "Weight.html#scorer(IndexReader)">
	Weight#scorer(IndexReader)</a> — Construct a new
	<a href = "Scorer.html">Scorer</a>
	for this Weight. See
	<a href = "#The Scorer Class">The Scorer Class</a>
	below for help defining a Scorer. As the name implies, the
	Scorer is responsible for doing the actual scoring of documents given the Query.
	</li>
	<li>
	<a href = "Weight.html#explain(IndexReader, int)">
	Weight#explain(IndexReader, int)</a> — Provide a means for explaining why a given document was
	scored
	the way it was.</li>
	</ol>
	</p>
	<h4>The Scorer Class</h4>
	<p>The
	<a href = "Scorer.html">Scorer</a>
	abstract class provides common scoring functionality for all Scorer implementations and
	is the heart of the Lucene scoring process. The Scorer defines the following abstract methods which
	must be implemented:
	<ol>
	<li>
	<a href = "Scorer.html#next()">Scorer#next()</a> — Advances to the next
	document that matches this Query, returning true if and only
	if there is another document that matches.</li>
	<li>
	<a href = "Scorer.html#doc()">Scorer#doc()</a> — Returns the id of the
	<a href = "document/Document.html">Document</a>
	that contains the match. It is not valid until next() has been called at least once.
	</li>
	<li>
	<a href = "Scorer.html#score()">Scorer#score()</a> — Return the score of the
	current document. This value can be determined in any
	appropriate way for an application. For instance, the
	<a href = "http://svn.apache.org//viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/TermScorer.java?view=log">TermScorer</a>
	returns the tf * Weight.getValue() * fieldNorm.
	</li>
	<li>
	<a href = "Scorer.html#skipTo(int)">Scorer#skipTo(int)</a> — Skip ahead in
	the document matches to the document whose id is greater than
	or equal to the passed in value. In many instances, skipTo can be
	implemented more efficiently than simply looping through all the matching documents until
	the target document is identified.</li>
	<li>
	<a href = "Scorer.html#explain(int)">Scorer#explain(int)</a> — Provides
	details on why the score came about.</li>
	</ol>
	</p>
	<h4>Why would I want to add my own Query?</h4>

	<p>In a nutshell, you want to add your own custom Query implementation when you think that Lucene's
	aren't appropriate for the
	task that you want to do. You might be doing some cutting edge research or you need more information
	back
	out of Lucene (similar to Doug adding SpanQuery functionality).</p>
	<h4>Examples</h4>
	<p >FILL IN HERE</p>

	</body>
	</html>