src/contrib/Highlighter/Package.html - lucenenet - Git at Google

 <!--

  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.

 -->

 <html>
     <body>
         The highlight package contains classes to provide "keyword in context" features
         typically used to highlight search terms in the text of results pages. The
         Highlighter class is the central component and can be used to extract the most
         interesting sections of a piece of text and highlight them, with the help of
         Fragmenter, FragmentScorer, Formatter classes.
         <h2>Example Usage</h2>
         <pre>
 	IndexSearcher searcher = new IndexSearcher(ramDir);
 	Query query = QueryParser.Parse("Kenne*", FIELD_NAME, analyzer);
 	query = query.Rewrite(reader); //required to expand search terms
 	Hits hits = searcher.Search(query);

 	Highlighter highlighter = new Highlighter(this, new QueryScorer(query));
 	for (int i = 0; i &lt; hits.Length(); i++)
 	{
 		String text = hits.Doc(i).Get(FIELD_NAME);
 		TokenStream tokenStream = analyzer.TokenStream(FIELD_NAME, new StringReader(text));
 		// Get 3 best fragments and seperate with a "..."
 		String result = highlighter.GetBestFragments(tokenStream, text, 3, "...");
 		System.Out.Console.WriteLine(result);
 	}
 </pre>
         <h2>New features 06/02/2005</h2>
         This release adds options for encoding (thanks to Nicko Cadell). An "IEncoder"
         implementation such as the new SimpleHTMLIEncoder class can be passed to the
         highlighter to encode all those non-xhtml standard characters such as &amp;
         into legal values. This simple class may not suffice for some languages -
         Commons Lang has an implementation that could be used: escapeHtml(String) in
         http://svn.apache.org/viewcvs.cgi/jakarta/commons/proper/lang/trunk/src/java/org/apache/commons/lang/StringEscapeUtils.java?rev=137958&view=markup
         <h2>New features 22/12/2004</h2>
         This release adds some new capabilities:
         <ol>
             <li>
                 Faster highlighting using Term vector support</li>
             <li>
                 New formatting options to use color intensity to show informational value</li>
             <li>
                 Options for better summarization by using term IDF scores to influence fragment
                 selection</li>
         </ol>
         <p>
             The highlighter takes a TokenStream as input. Until now these streams have
             typically been produced using an Analyzer but the new class TokenSources
             provides helper methods for obtaining TokenStreams from the new TermVector
             position support (see latest CVS version).</p>
         <p>The new class GradientFormatter can use a scale of colors to highlight terms
             according to their score. A subtle use of color can help emphasise the reasons
             for matching (useful when doing "MoreLikeThis" queries and you want to see what
             the basis of the similarities are).</p>
         <p>The QueryScorer class has a new constructor which can use an IndexReader to
             derive the IDF (inverse document frequency) for each term in order to
             influcence the score. This is useful for helping to extracting the most
             significant sections of a document and in supplying scores used by the new
             GradientFormatter to color significant words more strongly. The
             QueryScorer.getMaxWeight method is useful when passed to the GradientFormatter
             constructor to define the top score which is associated with the top color.</p>
     </body>
 </html>
	<!--

	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.

	-->

	<html>
	<body>
	The highlight package contains classes to provide "keyword in context" features
	typically used to highlight search terms in the text of results pages. The
	Highlighter class is the central component and can be used to extract the most
	interesting sections of a piece of text and highlight them, with the help of
	Fragmenter, FragmentScorer, Formatter classes.
	<h2>Example Usage</h2>
	<pre>
	IndexSearcher searcher = new IndexSearcher(ramDir);
	Query query = QueryParser.Parse("Kenne*", FIELD_NAME, analyzer);
	query = query.Rewrite(reader); //required to expand search terms
	Hits hits = searcher.Search(query);

	Highlighter highlighter = new Highlighter(this, new QueryScorer(query));
	for (int i = 0; i < hits.Length(); i++)
	{
	String text = hits.Doc(i).Get(FIELD_NAME);
	TokenStream tokenStream = analyzer.TokenStream(FIELD_NAME, new StringReader(text));
	// Get 3 best fragments and seperate with a "..."
	String result = highlighter.GetBestFragments(tokenStream, text, 3, "...");
	System.Out.Console.WriteLine(result);
	}
	</pre>
	<h2>New features 06/02/2005</h2>
	This release adds options for encoding (thanks to Nicko Cadell). An "IEncoder"
	implementation such as the new SimpleHTMLIEncoder class can be passed to the
	highlighter to encode all those non-xhtml standard characters such as &
	into legal values. This simple class may not suffice for some languages -
	Commons Lang has an implementation that could be used: escapeHtml(String) in
	http://svn.apache.org/viewcvs.cgi/jakarta/commons/proper/lang/trunk/src/java/org/apache/commons/lang/StringEscapeUtils.java?rev=137958&view=markup
	<h2>New features 22/12/2004</h2>
	This release adds some new capabilities:
	<ol>
	<li>
	Faster highlighting using Term vector support</li>
	<li>
	New formatting options to use color intensity to show informational value</li>
	<li>
	Options for better summarization by using term IDF scores to influence fragment
	selection</li>
	</ol>
	<p>
	The highlighter takes a TokenStream as input. Until now these streams have
	typically been produced using an Analyzer but the new class TokenSources
	provides helper methods for obtaining TokenStreams from the new TermVector
	position support (see latest CVS version).</p>
	<p>The new class GradientFormatter can use a scale of colors to highlight terms
	according to their score. A subtle use of color can help emphasise the reasons
	for matching (useful when doing "MoreLikeThis" queries and you want to see what
	the basis of the similarities are).</p>
	<p>The QueryScorer class has a new constructor which can use an IndexReader to
	derive the IDF (inverse document frequency) for each term in order to
	influcence the score. This is useful for helping to extracting the most
	significant sections of a document and in supplying scores used by the new
	GradientFormatter to color significant words more strongly. The
	QueryScorer.getMaxWeight method is useful when passed to the GradientFormatter
	constructor to define the top score which is associated with the top color.</p>
	</body>
	</html>