blob: 56733458dd466c3d67ae8adcf4d1792c58cfe566 [file] [log] [blame]
<!DOCTYPE html>
<!--[if IE]><![endif]-->
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Namespace Lucene.Net.Search.VectorHighlight
| Apache Lucene.NET 4.8.0-beta00013 Documentation </title>
<meta name="viewport" content="width=device-width">
<meta name="title" content="Namespace Lucene.Net.Search.VectorHighlight
| Apache Lucene.NET 4.8.0-beta00013 Documentation ">
<meta name="generator" content="docfx 2.56.2.0">
<link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css">
<meta property="docfx:navrel" content="toc.html">
<meta property="docfx:tocrel" content="highlighter/toc.html">
<meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/">
</head>
<body data-spy="scroll" data-target="#affix" data-offset="120">
<span id="forkongithub"><a href="https://github.com/apache/lucenenet" target="_blank">Fork me on GitHub</a></span>
<div id="wrapper">
<header>
<nav id="autocollapse" class="navbar ng-scope" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">
<img id="logo" class="svg" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/lucene-net-color.png" alt="">
</a>
</div>
<div class="collapse navbar-collapse" id="navbar">
<form class="navbar-form navbar-right" role="search" id="search">
<div class="form-group">
<input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off">
</div>
</form>
</div>
</div>
</nav>
<div class="subnav navbar navbar-default">
<div class="container hide-when-search">
<ul class="level0 breadcrumb">
<li>
<a href="https://lucenenet.apache.org/docs/4.8.0-beta00009/">API</a>
<span id="breadcrumb">
<ul class="breadcrumb">
<li></li>
</ul>
</span>
</li>
</ul>
</div>
</div>
</header>
<div class="container body-content">
<div id="search-results">
<div class="search-list"></div>
<div class="sr-items">
<p><i class="glyphicon glyphicon-refresh index-loading"></i></p>
</div>
<ul id="pagination"></ul>
</div>
</div>
<div role="main" class="container body-content hide-when-search">
<div class="sidenav hide-when-search">
<a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a>
<div class="sidetoggle collapse" id="sidetoggle">
<div id="sidetoc"></div>
</div>
</div>
<div class="article row grid-right">
<div class="col-md-10">
<article class="content wrap" id="_content" data-uid="Lucene.Net.Search.VectorHighlight">
<h1 id="Lucene_Net_Search_VectorHighlight" data-uid="Lucene.Net.Search.VectorHighlight" class="text-break">Namespace Lucene.Net.Search.VectorHighlight
</h1>
<div class="markdown level0 summary"><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>This is an another highlighter implementation.</p>
<h2 id="features">Features</h2>
<ul>
<li><p>fast for large docs</p>
</li>
<li><p>support N-gram fields</p>
</li>
<li><p>support phrase-unit highlighting with slops</p>
</li>
<li><p>support multi-term (includes wildcard, range, regexp, etc) queries</p>
</li>
<li><p>need Java 1.5</p>
</li>
<li><p>highlight fields need to be stored with Positions and Offsets</p>
</li>
<li><p>take into account query boost and/or IDF-weight to score fragments</p>
</li>
<li><p>support colored highlight tags</p>
</li>
<li><p>pluggable FragListBuilder / FieldFragList</p>
</li>
<li><p>pluggable FragmentsBuilder</p>
</li>
</ul>
<h2 id="algorithm">Algorithm</h2>
<p>To explain the algorithm, let&#39;s use the following sample text (to be highlighted) and user query:</p>
<table border="1">
<tr>
<td><strong>Sample Text</strong></td>
<td>Lucene is a search engine library.</td>
</tr>
<tr>
<td><strong>User Query</strong></td>
<td>Lucene^2 OR &quot;search library&quot;~1</td>
</tr>
</table>
<p>The user query is a BooleanQuery that consists of TermQuery(&quot;Lucene&quot;) with boost of 2 and PhraseQuery(&quot;search library&quot;) with slop of 1.</p>
<p>For your convenience, here is the offsets and positions info of the sample text.</p>
<pre><code>+--------+-----------------------------------+
| | 1111111111222222222233333|
| offset|01234567890123456789012345678901234|
+--------+-----------------------------------+
|document|Lucene is a search engine library. |
+--------*-----------------------------------+
|position|0 1 2 3 4 5 |
+--------*-----------------------------------+
</code></pre><h3 id="step-1">Step 1.</h3>
<p>In Step 1, Fast Vector Highlighter generates <a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldQuery.QueryPhraseMap.html">FieldQuery.QueryPhraseMap</a> from the user query. <code>QueryPhraseMap</code> consists of the following members:</p>
<pre><code>public class QueryPhraseMap {
boolean terminal;
int slop; // valid if terminal == true and phraseHighlight == true
float boost; // valid if terminal == true
Map&lt;String, QueryPhraseMap&gt; subMap;
}
</code></pre><p><code>QueryPhraseMap</code> has subMap. The key of the subMap is a term text in the user query and the value is a subsequent <code>QueryPhraseMap</code>. If the query is a term (not phrase), then the subsequent <code>QueryPhraseMap</code> is marked as terminal. If the query is a phrase, then the subsequent <code>QueryPhraseMap</code> is not a terminal and it has the next term text in the phrase.</p>
<p>From the sample user query, the following <code>QueryPhraseMap</code> will be generated:</p>
<pre><code> QueryPhraseMap
+--------+-+ +-------+-+
|&quot;Lucene&quot;|o+-&gt;|boost=2|*| * : terminal
+--------+-+ +-------+-+
</code></pre><p>+--------+-+ +---------+-+ +-------+------+-+
|&quot;search&quot;|o+-&gt;|&quot;library&quot;|o+-&gt;|boost=1|slop=1|*|
+--------+-+ +---------+-+ +-------+------+-+</p>
<h3 id="step-2">Step 2.</h3>
<p>In Step 2, Fast Vector Highlighter generates <a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldTermStack.html">FieldTermStack</a>. Fast Vector Highlighter uses term vector data (must be stored <a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00013/api/core/Lucene.Net.Documents.FieldType.html">#setStoreTermVectorOffsets(boolean)</a> and <a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00013/api/core/Lucene.Net.Documents.FieldType.html">#setStoreTermVectorPositions(boolean)</a>) to generate it. <code>FieldTermStack</code> keeps the terms in the user query. Therefore, in this sample case, Fast Vector Highlighter generates the following <code>FieldTermStack</code>:</p>
<pre><code> FieldTermStack
+------------------+
|&quot;Lucene&quot;(0,6,0) |
+------------------+
|&quot;search&quot;(12,18,3) |
+------------------+
|&quot;library&quot;(26,33,5)|
+------------------+
where : &quot;termText&quot;(startOffset,endOffset,position)
</code></pre><h3 id="step-3">Step 3.</h3>
<p>In Step 3, Fast Vector Highlighter generates <a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldPhraseList.html">FieldPhraseList</a> by reference to <code>QueryPhraseMap</code> and <code>FieldTermStack</code>.</p>
<pre><code> FieldPhraseList
+----------------+-----------------+---+
|&quot;Lucene&quot; |[(0,6)] |w=2|
+----------------+-----------------+---+
|&quot;search library&quot;|[(12,18),(26,33)]|w=1|
+----------------+-----------------+---+
</code></pre><p>The type of each entry is <code>WeightedPhraseInfo</code> that consists of an array of terms offsets and weight. </p>
<h3 id="step-4">Step 4.</h3>
<p>In Step 4, Fast Vector Highlighter creates <code>FieldFragList</code> by reference to <code>FieldPhraseList</code>. In this sample case, the following <code>FieldFragList</code> will be generated:</p>
<pre><code> FieldFragList
+---------------------------------+
|&quot;Lucene&quot;[(0,6)] |
|&quot;search library&quot;[(12,18),(26,33)]|
|totalBoost=3 |
+---------------------------------+
</code></pre><p>The calculation for each <code>FieldFragList.WeightedFragInfo.totalBoost</code> (weight)<br>depends on the implementation of <code>FieldFragList.add( ... )</code>:</p>
<pre><code> public void add( int startOffset, int endOffset, List&lt;WeightedPhraseInfo&gt; phraseInfoList ) {
float totalBoost = 0;
List&lt;SubInfo&gt; subInfos = new ArrayList&lt;SubInfo&gt;();
for( WeightedPhraseInfo phraseInfo : phraseInfoList ){
subInfos.add( new SubInfo( phraseInfo.getText(), phraseInfo.getTermsOffsets(), phraseInfo.getSeqnum() ) );
totalBoost += phraseInfo.getBoost();
}
getFragInfos().add( new WeightedFragInfo( startOffset, endOffset, subInfos, totalBoost ) );
}
</code></pre><p>The used implementation of <code>FieldFragList</code> is noted in <code>BaseFragListBuilder.createFieldFragList( ... )</code>:</p>
<pre><code> public FieldFragList createFieldFragList( FieldPhraseList fieldPhraseList, int fragCharSize ){
return createFieldFragList( fieldPhraseList, new SimpleFieldFragList( fragCharSize ), fragCharSize );
}
</code></pre><p> Currently there are basically to approaches available: </p>
<ul>
<li><p><code>SimpleFragListBuilder using SimpleFieldFragList</code>: <em>sum-of-boosts</em>-approach. The totalBoost is calculated by summarizing the query-boosts per term. Per default a term is boosted by 1.0</p>
</li>
<li><p><code>WeightedFragListBuilder using WeightedFieldFragList</code>: <em>sum-of-distinct-weights</em>-approach. The totalBoost is calculated by summarizing the IDF-weights of distinct terms.</p>
</li>
</ul>
<p>Comparison of the two approaches:</p>
<table border="1">
<caption>
query = das alte testament (The Old Testament)
</caption>
<tr><th>Terms in fragment</th><th>sum-of-distinct-weights</th><th>sum-of-boosts</th></tr>
<tr><td>das alte testament</td><td>5.339621</td><td>3.0</td></tr>
<tr><td>das alte testament</td><td>5.339621</td><td>3.0</td></tr>
<tr><td>das testament alte</td><td>5.339621</td><td>3.0</td></tr>
<tr><td>das alte testament</td><td>5.339621</td><td>3.0</td></tr>
<tr><td>das testament</td><td>2.9455688</td><td>2.0</td></tr>
<tr><td>das alte</td><td>2.4759595</td><td>2.0</td></tr>
<tr><td>das das das das</td><td>1.5015357</td><td>4.0</td></tr>
<tr><td>das das das</td><td>1.3003681</td><td>3.0</td></tr>
<tr><td>das das</td><td>1.061746</td><td>2.0</td></tr>
<tr><td>alte</td><td>1.0</td><td>1.0</td></tr>
<tr><td>alte</td><td>1.0</td><td>1.0</td></tr>
<tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
<tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
<tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
<tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
<tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
</table>
<h3 id="step-5">Step 5.</h3>
<p>In Step 5, by using <code>FieldFragList</code> and the field stored data, Fast Vector Highlighter creates highlighted snippets!</p>
</div>
<div class="markdown level0 conceptual"></div>
<div class="markdown level0 remarks"></div>
<h3 id="classes">Classes
</h3>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.BaseFragListBuilder.html">BaseFragListBuilder</a></h4>
<section><p>A abstract implementation of <a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragListBuilder.html">IFragListBuilder</a>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.BaseFragmentsBuilder.html">BaseFragmentsBuilder</a></h4>
<section><p>Base <a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragmentsBuilder.html">IFragmentsBuilder</a> implementation that supports colored pre/post
tags and multivalued fields.
<p>
Uses <a class="xref" href="Lucene.Net.Search.VectorHighlight.IBoundaryScanner.html">IBoundaryScanner</a> to determine fragments.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FastVectorHighlighter.html">FastVectorHighlighter</a></h4>
<section><p>Another highlighter implementation.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldFragList.html">FieldFragList</a></h4>
<section><p>FieldFragList has a list of &quot;frag info&quot; that is used by <a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragmentsBuilder.html">IFragmentsBuilder</a> class
to create fragments (snippets).</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldFragList.WeightedFragInfo.html">FieldFragList.WeightedFragInfo</a></h4>
<section><p>List of term offsets + weight for a frag info</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldFragList.WeightedFragInfo.SubInfo.html">FieldFragList.WeightedFragInfo.SubInfo</a></h4>
<section><p>Represents the list of term offsets for some text</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldPhraseList.html">FieldPhraseList</a></h4>
<section><p>FieldPhraseList has a list of WeightedPhraseInfo that is used by FragListBuilder
to create a FieldFragList object.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldPhraseList.WeightedPhraseInfo.html">FieldPhraseList.WeightedPhraseInfo</a></h4>
<section><p>Represents the list of term offsets and boost for some text</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldPhraseList.WeightedPhraseInfo.Toffs.html">FieldPhraseList.WeightedPhraseInfo.Toffs</a></h4>
<section><p>Term offsets (start + end)</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldQuery.html">FieldQuery</a></h4>
<section><p><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldQuery.html">FieldQuery</a> breaks down query object into terms/phrases and keeps
them in a <a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldQuery.QueryPhraseMap.html">FieldQuery.QueryPhraseMap</a> structure.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldQuery.QueryPhraseMap.html">FieldQuery.QueryPhraseMap</a></h4>
<section><p>Internal structure of a query for highlighting: represents
a nested query structure</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldTermStack.html">FieldTermStack</a></h4>
<section><p><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldTermStack.html">FieldTermStack</a> is a stack that keeps query terms in the specified field
of the document to be highlighted.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldTermStack.TermInfo.html">FieldTermStack.TermInfo</a></h4>
<section><p>Single term with its position/offsets in the document and IDF weight.
It is <span class="xref">System.IComparable&lt;T&gt;</span> but considers only position.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.ScoreOrderFragmentsBuilder.html">ScoreOrderFragmentsBuilder</a></h4>
<section><p>An implementation of FragmentsBuilder that outputs score-order fragments.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.ScoreOrderFragmentsBuilder.ScoreComparer.html">ScoreOrderFragmentsBuilder.ScoreComparer</a></h4>
<section><p><span class="xref">System.Collections.Generic.IComparer&lt;T&gt;</span> for <a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldFragList.WeightedFragInfo.html">FieldFragList.WeightedFragInfo</a> by boost, breaking ties
by offset.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.SimpleBoundaryScanner.html">SimpleBoundaryScanner</a></h4>
<section><p>Simple boundary scanner implementation that divides fragments
based on a set of separator characters.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.SimpleFieldFragList.html">SimpleFieldFragList</a></h4>
<section><p>A simple implementation of <a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldFragList.html">FieldFragList</a>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.SimpleFragListBuilder.html">SimpleFragListBuilder</a></h4>
<section><p>A simple implementation of <a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragListBuilder.html">IFragListBuilder</a>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.SimpleFragmentsBuilder.html">SimpleFragmentsBuilder</a></h4>
<section><p>A simple implementation of FragmentsBuilder.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.SingleFragListBuilder.html">SingleFragListBuilder</a></h4>
<section><p>An implementation class of <a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragListBuilder.html">IFragListBuilder</a> that generates one <a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldFragList.WeightedFragInfo.html">FieldFragList.WeightedFragInfo</a> object.
Typical use case of this class is that you can get an entire field contents
by using both of this class and <a class="xref" href="Lucene.Net.Search.VectorHighlight.SimpleFragmentsBuilder.html">SimpleFragmentsBuilder</a>.
<p>
<pre><code>FastVectorHighlighter h = new FastVectorHighlighter( true, true,
new SingleFragListBuilder(), new SimpleFragmentsBuilder() );</code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.WeightedFieldFragList.html">WeightedFieldFragList</a></h4>
<section><p>A weighted implementation of <a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldFragList.html">FieldFragList</a>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.WeightedFragListBuilder.html">WeightedFragListBuilder</a></h4>
<section><p>A weighted implementation of <a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragListBuilder.html">IFragListBuilder</a>.</p>
</section>
<h3 id="interfaces">Interfaces
</h3>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.IBoundaryScanner.html">IBoundaryScanner</a></h4>
<section><p>Finds fragment boundaries: pluggable into <a class="xref" href="Lucene.Net.Search.VectorHighlight.BaseFragmentsBuilder.html">BaseFragmentsBuilder</a></p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragListBuilder.html">IFragListBuilder</a></h4>
<section><p><a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragListBuilder.html">IFragListBuilder</a> is an interface for <a class="xref" href="Lucene.Net.Search.VectorHighlight.FieldFragList.html">FieldFragList</a> builder classes.
A <a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragListBuilder.html">IFragListBuilder</a> class can be plugged in to <a class="xref" href="Lucene.Net.Search.Highlight.Highlighter.html">Highlighter</a>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragmentsBuilder.html">IFragmentsBuilder</a></h4>
<section><p><a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragmentsBuilder.html">IFragmentsBuilder</a> is an interface for fragments (snippets) builder classes.
A <a class="xref" href="Lucene.Net.Search.VectorHighlight.IFragmentsBuilder.html">IFragmentsBuilder</a> class can be plugged in to
<a class="xref" href="Lucene.Net.Search.VectorHighlight.FastVectorHighlighter.html">FastVectorHighlighter</a>.</p>
</section>
</article>
</div>
<div class="hidden-sm col-md-2" role="complementary">
<div class="sideaffix">
<div class="contribution">
<ul class="nav">
<li>
<a href="https://github.com/apache/lucenenet/blob/docs/4.8.0-beta00013/src/Lucene.Net.Highlighter/VectorHighlight/package.md/#L2" class="contribution-link">Improve this Doc</a>
</li>
</ul>
</div>
<nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
<!-- <p><a class="back-to-top" href="#top">Back to top</a><p> -->
</nav>
</div>
</div>
</div>
</div>
<footer>
<div class="grad-bottom"></div>
<div class="footer">
<div class="container">
<span class="pull-right">
<a href="#top">Back to top</a>
</span>
Copyright © 2020 The Apache Software Foundation, Licensed under the <a href='http://www.apache.org/licenses/LICENSE-2.0' target='_blank'>Apache License, Version 2.0</a><br> <small>Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation. <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</small>
</div>
</div>
</footer>
</div>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script>
</body>
</html>