<!DOCTYPE html>
<!--[if IE]><![endif]-->
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Class HyphenatedWordsFilter
| Apache Lucene.NET 4.8.0-beta00013 Documentation </title>
<meta name="viewport" content="width=device-width">
<meta name="title" content="Class HyphenatedWordsFilter
| Apache Lucene.NET 4.8.0-beta00013 Documentation ">
<meta name="generator" content="docfx 2.56.2.0">
<link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css">
<meta property="docfx:navrel" content="toc.html">
<meta property="docfx:tocrel" content="analysis-common/toc.html">
<meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/">
</head>
<body data-spy="scroll" data-target="#affix" data-offset="120">
<span id="forkongithub"><a href="https://github.com/apache/lucenenet" target="_blank">Fork me on GitHub</a></span>
<div id="wrapper">
<header>
<nav id="autocollapse" class="navbar ng-scope" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">
<img id="logo" class="svg" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/lucene-net-color.png" alt="">
</a>
</div>
<div class="collapse navbar-collapse" id="navbar">
<form class="navbar-form navbar-right" role="search" id="search">
<div class="form-group">
<input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off">
</div>
</form>
</div>
</div>
</nav>
<div class="subnav navbar navbar-default">
<div class="container hide-when-search">
<ul class="level0 breadcrumb">
<li>
<a href="https://lucenenet.apache.org/docs/4.8.0-beta00009/">API</a>
<span id="breadcrumb">
<ul class="breadcrumb">
<li></li>
</ul>
</span>
</li>
</ul>
</div>
</div>
</header>
<div class="container body-content">
<div id="search-results">
<div class="search-list"></div>
<div class="sr-items">
<p><i class="glyphicon glyphicon-refresh index-loading"></i></p>
</div>
<ul id="pagination"></ul>
</div>
</div>
<div role="main" class="container body-content hide-when-search">
<div class="sidenav hide-when-search">
<a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a>
<div class="sidetoggle collapse" id="sidetoggle">
<div id="sidetoc"></div>
</div>
</div>
<div class="article row grid-right">
<div class="col-md-10">
<article class="content wrap" id="_content" data-uid="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter">
<h1 id="Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter" data-uid="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter" class="text-break">Class HyphenatedWordsFilter
</h1>
<div class="markdown level0 summary"><p>When plain text is extracted from documents, many words are often hyphenated and broken across
two lines. This is common in documents that use narrow text columns, such as newsletters.
To increase search efficiency, this filter joins hyphenated words that were broken across two lines back together.
This filter should be used at index time only.
Example field definition in schema.xml:</p>
<pre><code>&lt;fieldtype name=&quot;text&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer type=&quot;index&quot;>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.SynonymFilterFactory&quot; synonyms=&quot;index_synonyms.txt&quot; ignoreCase=&quot;true&quot; expand=&quot;false&quot;/>
&lt;filter class=&quot;solr.StopFilterFactory&quot; ignoreCase=&quot;true&quot;/>
&lt;filter class=&quot;solr.HyphenatedWordsFilterFactory&quot;/>
&lt;filter class=&quot;solr.WordDelimiterFilterFactory&quot; generateWordParts=&quot;1&quot; generateNumberParts=&quot;1&quot; catenateWords=&quot;1&quot; catenateNumbers=&quot;1&quot; catenateAll=&quot;0&quot;/>
&lt;filter class=&quot;solr.LowerCaseFilterFactory&quot;/>
&lt;filter class=&quot;solr.RemoveDuplicatesTokenFilterFactory&quot;/>
&lt;/analyzer>
&lt;analyzer type=&quot;query&quot;>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.SynonymFilterFactory&quot; synonyms=&quot;synonyms.txt&quot; ignoreCase=&quot;true&quot; expand=&quot;true&quot;/>
&lt;filter class=&quot;solr.StopFilterFactory&quot; ignoreCase=&quot;true&quot;/>
&lt;filter class=&quot;solr.WordDelimiterFilterFactory&quot; generateWordParts=&quot;1&quot; generateNumberParts=&quot;1&quot; catenateWords=&quot;0&quot; catenateNumbers=&quot;0&quot; catenateAll=&quot;0&quot;/>
&lt;filter class=&quot;solr.LowerCaseFilterFactory&quot;/>
&lt;filter class=&quot;solr.RemoveDuplicatesTokenFilterFactory&quot;/>
&lt;/analyzer>
&lt;/fieldtype></code></pre>
</div>
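<p>The joining behavior described above can be modeled on plain strings. The following is a minimal sketch, not the library's implementation: <code>HyphenJoinSketch</code> and <code>JoinHyphenated</code> are hypothetical names, and keeping the hyphen on a fragment that ends the input is an assumption about how a dangling fragment is handled.</p>
<pre><code class="lang-csharp hljs">using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Lucene.NET: models how whitespace-separated
// fragments split by an end-of-line hyphen can be put back together.
public static class HyphenJoinSketch
{
    public static List&lt;string&gt; JoinHyphenated(IEnumerable&lt;string&gt; tokens)
    {
        var result = new List&lt;string&gt;();
        var pending = new StringBuilder();
        bool hasPending = false;

        foreach (string token in tokens)
        {
            if (token.Length &gt; 0 &amp;&amp; token[token.Length - 1] == '-')
            {
                // First part of a broken word: buffer it without the hyphen.
                pending.Append(token, 0, token.Length - 1);
                hasPending = true;
            }
            else if (hasPending)
            {
                // Final part of a broken word: emit the joined term.
                pending.Append(token);
                result.Add(pending.ToString());
                pending.Clear();
                hasPending = false;
            }
            else
            {
                // Ordinary token: pass it through unchanged.
                result.Add(token);
            }
        }

        if (hasPending)
        {
            // Assumption: a fragment left over at the end keeps its hyphen.
            pending.Append('-');
            result.Add(pending.ToString());
        }
        return result;
    }
}</code></pre>
<p>For example, <code>HyphenJoinSketch.JoinHyphenated(new[] { "ecologi-", "cal", "news" })</code> yields the two terms <code>ecological</code> and <code>news</code>.</p>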
<div class="markdown level0 conceptual"></div>
<div class="inheritance">
<h5>Inheritance</h5>
<div class="level0"><span class="xref">System.Object</span></div>
<div class="level1"><span class="xref">Lucene.Net.Util.AttributeSource</span></div>
<div class="level2"><span class="xref">Lucene.Net.Analysis.TokenStream</span></div>
<div class="level3"><span class="xref">Lucene.Net.Analysis.TokenFilter</span></div>
<div class="level4"><span class="xref">HyphenatedWordsFilter</span></div>
</div>
<div classs="implements">
<h5>Implements</h5>
<div><span class="xref">System.IDisposable</span></div>
</div>
<div class="inheritedMembers">
<h5>Inherited Members</h5>
<div>
<span class="xref">Lucene.Net.Analysis.TokenFilter.m_input</span>
</div>
<div>
<span class="xref">Lucene.Net.Analysis.TokenFilter.End()</span>
</div>
<div>
<a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00013/api/core/Lucene.Net.Analysis.TokenFilter.html#Lucene_Net_Analysis_TokenFilter_Dispose_System_Boolean_">TokenFilter.Dispose(Boolean)</a>
</div>
<div>
<span class="xref">Lucene.Net.Analysis.TokenStream.Dispose()</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.GetAttributeFactory()</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator()</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator()</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute)</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.AddAttribute&lt;T&gt;()</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.HasAttributes</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.HasAttribute&lt;T&gt;()</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.GetAttribute&lt;T&gt;()</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.ClearAttributes()</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.CaptureState()</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State)</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.GetHashCode()</span>
</div>
<div>
<a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00013/api/core/Lucene.Net.Util.AttributeSource.html#Lucene_Net_Util_AttributeSource_Equals_System_Object_">AttributeSource.Equals(Object)</a>
</div>
<div>
<a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00013/api/core/Lucene.Net.Util.AttributeSource.html#Lucene_Net_Util_AttributeSource_ReflectAsString_System_Boolean_">AttributeSource.ReflectAsString(Boolean)</a>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector)</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.CloneAttributes()</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource)</span>
</div>
<div>
<span class="xref">Lucene.Net.Util.AttributeSource.ToString()</span>
</div>
<div>
<span class="xref">System.Object.Equals(System.Object, System.Object)</span>
</div>
<div>
<span class="xref">System.Object.GetType()</span>
</div>
<div>
<span class="xref">System.Object.MemberwiseClone()</span>
</div>
<div>
<span class="xref">System.Object.ReferenceEquals(System.Object, System.Object)</span>
</div>
</div>
<h6><strong>Namespace</strong>: <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.html">Lucene.Net.Analysis.Miscellaneous</a></h6>
<h6><strong>Assembly</strong>: Lucene.Net.Analysis.Common.dll</h6>
<h5 id="Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter_syntax">Syntax</h5>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">public sealed class HyphenatedWordsFilter : TokenFilter, IDisposable</code></pre>
</div>
<h3 id="constructors">Constructors
</h3>
<span class="small pull-right mobile-hide">
<a href="https://github.com/NightOwl888/lucenenet/blob/fix/apidocs-layout/src/Lucene.Net.Analysis.Common/Analysis/Miscellaneous/HyphenatedWordsFilter.cs/#L66">View Source</a>
</span>
<a id="Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter__ctor_" data-uid="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.#ctor*"></a>
<h4 id="Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter__ctor_Lucene_Net_Analysis_TokenStream_" data-uid="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.#ctor(Lucene.Net.Analysis.TokenStream)">HyphenatedWordsFilter(TokenStream)</h4>
<div class="markdown level1 summary"><p>Creates a new <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.html">HyphenatedWordsFilter</a></p>
</div>
<div class="markdown level1 conceptual"></div>
<h5 class="decalaration">Declaration</h5>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">public HyphenatedWordsFilter(TokenStream @in)</code></pre>
</div>
<h5 class="parameters">Parameters</h5>
<table class="table table-bordered table-striped table-condensed">
<thead>
<tr>
<th>Type</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="xref">Lucene.Net.Analysis.TokenStream</span></td>
<td><span class="parametername">in</span></td>
<td><p><span class="xref">Lucene.Net.Analysis.TokenStream</span> that will be filtered </p>
</td>
</tr>
</tbody>
</table>
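<p>A minimal sketch of how the constructor might be used in code rather than through a schema file. It assumes the Lucene.Net.Analysis.Common package is referenced, picks <code>WhitespaceTokenizer</code> to mirror the schema example in the class summary, and uses a made-up sample string; the class name <code>HyphenatedWordsFilterUsage</code> is hypothetical.</p>
<pre><code class="lang-csharp hljs">using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class HyphenatedWordsFilterUsage
{
    public static void Main()
    {
        // Illustrative input: two words broken across lines by a hyphen.
        var reader = new StringReader("ecologi-\ncal develop-\nment");

        // The whitespace tokenizer keeps the trailing hyphen on each fragment,
        // so the filter can recognize the broken words.
        TokenStream stream = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
        stream = new HyphenatedWordsFilter(stream);

        // Obtain the attribute reference once, before consumption starts.
        ICharTermAttribute term = stream.AddAttribute&lt;ICharTermAttribute&gt;();

        try
        {
            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
        finally
        {
            stream.Dispose();
        }
    }
}</code></pre>
<p>With this input, the terms written to the console would be expected to be <code>ecological</code> and <code>development</code>.</p>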
<h3 id="methods">Methods
</h3>
<span class="small pull-right mobile-hide">
<a href="https://github.com/NightOwl888/lucenenet/blob/fix/apidocs-layout/src/Lucene.Net.Analysis.Common/Analysis/Miscellaneous/HyphenatedWordsFilter.cs/#L96">View Source</a>
</span>
<a id="Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter_IncrementToken_" data-uid="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.IncrementToken*"></a>
<h4 id="Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter_IncrementToken" data-uid="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.IncrementToken">IncrementToken()</h4>
<div class="markdown level1 summary"><p>Consumers (e.g., <a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00013/api/core/Lucene.Net.Index.IndexWriter.html">IndexWriter</a>) use this method to advance the stream to
the next token. Implementing classes must implement this method and update
the appropriate <a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00013/api/core/Lucene.Net.Util.Attribute.html">Attribute</a>s with the attributes of the next
token.</p>
<p>
The producer must make no assumptions about the attributes after the method
has returned: the caller may arbitrarily change them. If the producer
needs to preserve the state for subsequent calls, it can use
<span class="xref">Lucene.Net.Util.AttributeSource.CaptureState()</span> to create a copy of the current attribute state.</p>
<p>
This method is called for every token of a document, so an efficient
implementation is crucial for good performance. To avoid calls to
<span class="xref">Lucene.Net.Util.AttributeSource.AddAttribute&lt;T&gt;()</span> and <span class="xref">Lucene.Net.Util.AttributeSource.GetAttribute&lt;T&gt;()</span>,
references to all <a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00013/api/core/Lucene.Net.Util.Attribute.html">Attribute</a>s that this stream uses should be
retrieved during instantiation.</p>
<p>
To ensure that filters and consumers know which attributes are available,
the attributes must be added during instantiation. Filters and consumers
are not required to check for availability of attributes in
<a class="xref" href="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.html#Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter_IncrementToken">IncrementToken()</a>.</p>
</div>
<div class="markdown level1 conceptual"></div>
<h5 class="decalaration">Declaration</h5>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">public override bool IncrementToken()</code></pre>
</div>
<h5 class="returns">Returns</h5>
<table class="table table-bordered table-striped table-condensed">
<thead>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="xref">System.Boolean</span></td>
<td><p>false for end of stream; true otherwise </p>
</td>
</tr>
</tbody>
</table>
<h5 class="overrides">Overrides</h5>
<div><span class="xref">Lucene.Net.Analysis.TokenStream.IncrementToken()</span></div>
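<p>The producer-side advice above (retrieve attribute references during instantiation, and use <code>CaptureState()</code> to keep state between calls) could look like the following sketch. <code>RepeatTokenFilter</code> is a hypothetical filter written only for illustration; it is not part of Lucene.NET and is unrelated to how <code>HyphenatedWordsFilter</code> itself is implemented.</p>
<pre><code class="lang-csharp hljs">using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical filter: emits every input token twice, the second time
// at the same position as the first.
public sealed class RepeatTokenFilter : TokenFilter
{
    // Attribute reference retrieved once, during instantiation.
    private readonly IPositionIncrementAttribute posInc;

    // Copy of the attribute state of the token to repeat, if any.
    private AttributeSource.State saved;

    public RepeatTokenFilter(TokenStream input)
        : base(input)
    {
        posInc = AddAttribute&lt;IPositionIncrementAttribute&gt;();
    }

    public override bool IncrementToken()
    {
        if (saved != null)
        {
            // The caller may have changed the attributes since the last call,
            // so restore the captured copy before emitting the repeat.
            RestoreState(saved);
            saved = null;
            posInc.PositionIncrement = 0;
            return true;
        }

        if (!m_input.IncrementToken())
        {
            return false; // end of stream
        }

        saved = CaptureState();
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        saved = null;
    }
}</code></pre>
<p>Because the filter never calls <code>AddAttribute&lt;T&gt;()</code> inside <code>IncrementToken()</code>, the per-token path only touches fields that were set up in the constructor.</p>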
<span class="small pull-right mobile-hide">
<a href="https://github.com/NightOwl888/lucenenet/blob/fix/apidocs-layout/src/Lucene.Net.Analysis.Common/Analysis/Miscellaneous/HyphenatedWordsFilter.cs/#L158">View Source</a>
</span>
<a id="Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter_Reset_" data-uid="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.Reset*"></a>
<h4 id="Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter_Reset" data-uid="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.Reset">Reset()</h4>
<div class="markdown level1 summary"><p>This method is called by a consumer before it begins consumption using
<a class="xref" href="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.html#Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter_IncrementToken">IncrementToken()</a>.</p>
<p>
Resets this stream to a clean state. Stateful implementations must implement
this method so that they can be reused, just as if they had been created fresh.</p>
<p>
If you override this method, always call <code>base.Reset()</code>, otherwise
some internal state will not be correctly reset (e.g., <span class="xref">Lucene.Net.Analysis.Tokenizer</span> will
throw <span class="xref">System.InvalidOperationException</span> on further usage).</p>
</div>
<div class="markdown level1 conceptual"></div>
<h5 class="decalaration">Declaration</h5>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">public override void Reset()</code></pre>
</div>
<h5 class="overrides">Overrides</h5>
<div><span class="xref">Lucene.Net.Analysis.TokenFilter.Reset()</span></div>
<h5 id="Lucene_Net_Analysis_Miscellaneous_HyphenatedWordsFilter_Reset_remarks">Remarks</h5>
<div class="markdown level1 remarks"><p><strong>NOTE:</strong>
The default implementation chains the call to the input <span class="xref">Lucene.Net.Analysis.TokenStream</span>, so
be sure to call <code>base.Reset()</code> when overriding this method.</p>
</div>
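<p>As an illustration of the note above, a stateful filter might override <code>Reset()</code> as sketched here. <code>FirstNTokensFilter</code> and its <code>maxTokens</code> parameter are hypothetical and exist only for this example; the point is that <code>base.Reset()</code> is called so the wrapped input stream is reset as well.</p>
<pre><code class="lang-csharp hljs">using Lucene.Net.Analysis;

// Hypothetical filter, not part of Lucene.NET: passes through at most
// maxTokens tokens and then reports end of stream.
public sealed class FirstNTokensFilter : TokenFilter
{
    private readonly int maxTokens;
    private int seen; // per-stream state that must be cleared on reuse

    public FirstNTokensFilter(TokenStream input, int maxTokens)
        : base(input)
    {
        this.maxTokens = maxTokens;
    }

    public override bool IncrementToken()
    {
        if (seen &gt;= maxTokens)
        {
            return false;
        }
        if (!m_input.IncrementToken())
        {
            return false;
        }
        seen++;
        return true;
    }

    public override void Reset()
    {
        base.Reset(); // chains the call to the wrapped input stream
        seen = 0;     // then clear this filter's own state
    }
}</code></pre>
<p>Without the <code>seen = 0</code> line, a reused instance would keep counting from the previous document; without <code>base.Reset()</code>, the underlying tokenizer would not be reset.</p>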
<h3 id="implements">Implements</h3>
<div>
<span class="xref">System.IDisposable</span>
</div>
</article>
</div>
<div class="hidden-sm col-md-2" role="complementary">
<div class="sideaffix">
<div class="contribution">
<ul class="nav">
<li>
<a href="https://github.com/apache/lucenenet/blob/fix/apidocs-layout/src/Lucene.Net.Analysis.Common/Analysis/Miscellaneous/HyphenatedWordsFilter.cs/#L52" class="contribution-link">View Source</a>
</li>
</ul>
</div>
<nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
<!-- <p><a class="back-to-top" href="#top">Back to top</a><p> -->
</nav>
</div>
</div>
</div>
</div>
<footer>
<div class="grad-bottom"></div>
<div class="footer">
<div class="container">
<span class="pull-right">
<a href="#top">Back to top</a>
</span>
Copyright © 2020 The Apache Software Foundation, Licensed under the <a href='http://www.apache.org/licenses/LICENSE-2.0' target='_blank'>Apache License, Version 2.0</a><br> <small>Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation. <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</small>
</div>
</div>
</footer>
</div>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script>
</body>
</html>