<!DOCTYPE html>
<!--[if IE]><![endif]-->
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Namespace Lucene.Net.Analysis.Miscellaneous
| Apache Lucene.NET 4.8.0-beta00010 Documentation </title>
<meta name="viewport" content="width=device-width">
<meta name="title" content="Namespace Lucene.Net.Analysis.Miscellaneous
| Apache Lucene.NET 4.8.0-beta00010 Documentation ">
<meta name="generator" content="docfx 2.56.0.0">
<link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css">
<meta property="docfx:navrel" content="toc.html">
<meta property="docfx:tocrel" content="analysis-common/toc.html">
<meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/">
</head>
<body data-spy="scroll" data-target="#affix" data-offset="120">
<div id="wrapper">
<div role="main" class="container body-content hide-when-search">
<div class="article row grid-right">
<div class="col-md-10">
<article class="content wrap" id="_content" data-uid="Lucene.Net.Analysis.Miscellaneous">
<h1 id="Lucene_Net_Analysis_Miscellaneous" data-uid="Lucene.Net.Analysis.Miscellaneous" class="text-break">Namespace Lucene.Net.Analysis.Miscellaneous
</h1>
<div class="markdown level0 summary"><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>Miscellaneous TokenStreams</p>
</div>
<div class="markdown level0 conceptual"></div>
<div class="markdown level0 remarks"></div>
<h3 id="classes">Classes
</h3>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ASCIIFoldingFilter.html">ASCIIFoldingFilter</a></h4>
<section><p>This class converts alphabetic, numeric, and symbolic Unicode characters
which are not in the first 128 ASCII characters (the &quot;Basic Latin&quot; Unicode
block) into their ASCII equivalents, if they exist.</p>
<p>
Characters from the following Unicode blocks are converted; however, only
those characters with reasonable ASCII alternatives are converted:</p>
<ul>
<li>C1 Controls and Latin-1 Supplement: <a href="http://www.unicode.org/charts/PDF/U0080.pdf">http://www.unicode.org/charts/PDF/U0080.pdf</a></li>
<li>Latin Extended-A: <a href="http://www.unicode.org/charts/PDF/U0100.pdf">http://www.unicode.org/charts/PDF/U0100.pdf</a></li>
<li>Latin Extended-B: <a href="http://www.unicode.org/charts/PDF/U0180.pdf">http://www.unicode.org/charts/PDF/U0180.pdf</a></li>
<li>Latin Extended Additional: <a href="http://www.unicode.org/charts/PDF/U1E00.pdf">http://www.unicode.org/charts/PDF/U1E00.pdf</a></li>
<li>Latin Extended-C: <a href="http://www.unicode.org/charts/PDF/U2C60.pdf">http://www.unicode.org/charts/PDF/U2C60.pdf</a></li>
<li>Latin Extended-D: <a href="http://www.unicode.org/charts/PDF/UA720.pdf">http://www.unicode.org/charts/PDF/UA720.pdf</a></li>
<li>IPA Extensions: <a href="http://www.unicode.org/charts/PDF/U0250.pdf">http://www.unicode.org/charts/PDF/U0250.pdf</a></li>
<li>Phonetic Extensions: <a href="http://www.unicode.org/charts/PDF/U1D00.pdf">http://www.unicode.org/charts/PDF/U1D00.pdf</a></li>
<li>Phonetic Extensions Supplement: <a href="http://www.unicode.org/charts/PDF/U1D80.pdf">http://www.unicode.org/charts/PDF/U1D80.pdf</a></li>
<li>General Punctuation: <a href="http://www.unicode.org/charts/PDF/U2000.pdf">http://www.unicode.org/charts/PDF/U2000.pdf</a></li>
<li>Superscripts and Subscripts: <a href="http://www.unicode.org/charts/PDF/U2070.pdf">http://www.unicode.org/charts/PDF/U2070.pdf</a></li>
<li>Enclosed Alphanumerics: <a href="http://www.unicode.org/charts/PDF/U2460.pdf">http://www.unicode.org/charts/PDF/U2460.pdf</a></li>
<li>Dingbats: <a href="http://www.unicode.org/charts/PDF/U2700.pdf">http://www.unicode.org/charts/PDF/U2700.pdf</a></li>
<li>Supplemental Punctuation: <a href="http://www.unicode.org/charts/PDF/U2E00.pdf">http://www.unicode.org/charts/PDF/U2E00.pdf</a></li>
<li>Alphabetic Presentation Forms: <a href="http://www.unicode.org/charts/PDF/UFB00.pdf">http://www.unicode.org/charts/PDF/UFB00.pdf</a></li>
<li>Halfwidth and Fullwidth Forms: <a href="http://www.unicode.org/charts/PDF/UFF00.pdf">http://www.unicode.org/charts/PDF/UFF00.pdf</a></li>
</ul>
<p>
See: <a href="http://en.wikipedia.org/wiki/Latin_characters_in_Unicode">http://en.wikipedia.org/wiki/Latin_characters_in_Unicode</a></p>
<p>
For example, &apos;&agrave;&apos; will be replaced by &apos;a&apos;.</p>
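<p>A minimal usage sketch, assuming the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced. The class name <code>AsciiFoldingSketch</code>, the helper <code>Fold</code>, and the choice of a whitespace tokenizer as the upstream source are illustrative assumptions, not part of the library.</p>

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class AsciiFoldingSketch
{
    // Hypothetical helper: splits on whitespace, folds each token to ASCII,
    // and returns the resulting terms in order.
    public static IList<string> Fold(string text)
    {
        var terms = new List<string>();
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));
        using (TokenStream stream = new ASCIIFoldingFilter(source))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset(); // must be called before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

<p>With this helper, <code>AsciiFoldingSketch.Fold("déjà vu")</code> returns the terms <code>deja</code> and <code>vu</code>.</p>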
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ASCIIFoldingFilterFactory.html">ASCIIFoldingFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ASCIIFoldingFilter.html">ASCIIFoldingFilter</a>.</p>
<pre><code>&lt;fieldType name=&quot;text_ascii&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.ASCIIFoldingFilterFactory&quot; preserveOriginal=&quot;false&quot;/>
&lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.CapitalizationFilter.html">CapitalizationFilter</a></h4>
<section><p>A filter to apply normal capitalization rules to tokens. It will make the first letter
capital and the rest lower case.</p>
<p>
This filter is particularly useful for building nice-looking facet parameters. This filter
is not appropriate if you intend to use a prefix query.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.CapitalizationFilterFactory.html">CapitalizationFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.CapitalizationFilter.html">CapitalizationFilter</a>.</p>
<p>
The factory takes parameters:</p>
<ul>
<li>&quot;onlyFirstWord&quot; - should only the first word be capitalized, or all of the words?</li>
<li>&quot;keep&quot; - a keep word list. The words to keep, separated by whitespace.</li>
<li>&quot;keepIgnoreCase&quot; - true or false. If true, the keep list will be considered case-insensitive.</li>
<li>&quot;forceFirstLetter&quot; - force the first letter to be capitalized even if it is in the keep list.</li>
<li>&quot;okPrefix&quot; - do not change word capitalization if a word begins with something in this list.
For example, if &quot;McK&quot; is on the okPrefix list, the word &quot;McKinley&quot; should not be changed to
&quot;Mckinley&quot;.</li>
<li>&quot;minWordLength&quot; - how long the word needs to be to get capitalization applied. If the
minWordLength is 3, &quot;and&quot; becomes &quot;And&quot; but &quot;or&quot; stays &quot;or&quot;.</li>
<li>&quot;maxWordCount&quot; - if the token contains more than maxWordCount words, the capitalization is
assumed to be correct.</li>
</ul>
<pre><code>&lt;fieldType name=&quot;text_cptlztn&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.CapitalizationFilterFactory&quot; onlyFirstWord=&quot;true&quot;
keep=&quot;java solr lucene&quot; keepIgnoreCase=&quot;false&quot;
okPrefix=&quot;McK McD McA&quot;/>
&lt;/analyzer>
&lt;/fieldType></code></pre>
<p>Since: Solr 1.3</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.CodepointCountFilter.html">CodepointCountFilter</a></h4>
<section><p>Removes words that are too long or too short from the stream.</p>
<p>
Note: Length is calculated as the number of Unicode code points.
</p>
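<p>The difference between counting code points (this filter) and counting UTF-16 code units (as <code>LengthFilter</code> does) only shows up for characters outside the Basic Multilingual Plane. The following is a plain .NET sketch of the two measures, not the library's implementation; <code>TokenLengthSketch</code> and its methods are hypothetical names.</p>

```csharp
public static class TokenLengthSketch
{
    // Length in UTF-16 code units: what string.Length reports in .NET.
    public static int Utf16Length(string token)
    {
        return token.Length;
    }

    // Length in Unicode code points: a well-formed surrogate pair counts once.
    public static int CodePointLength(string token)
    {
        int count = 0;
        for (int i = 0; i < token.Length; i++)
        {
            bool isPair = char.IsHighSurrogate(token[i])
                && i + 1 < token.Length
                && char.IsLowSurrogate(token[i + 1]);
            if (isPair)
            {
                i++; // skip the low surrogate of the pair
            }
            count++;
        }
        return count;
    }
}
```

<p>For the token <code>"a\U0001F600"</code> (the letter a followed by U+1F600), <code>Utf16Length</code> returns 3 while <code>CodePointLength</code> returns 2, so a maximum of 2 would keep the token under one measure and drop it under the other.</p>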
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.CodepointCountFilterFactory.html">CodepointCountFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.CodepointCountFilter.html">CodepointCountFilter</a>. </p>
<pre><code>&lt;fieldType name=&quot;text_lngth&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.CodepointCountFilterFactory&quot; min=&quot;0&quot; max=&quot;1&quot; />
&lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.EmptyTokenStream.html">EmptyTokenStream</a></h4>
<section><p>An always exhausted token stream.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.html">HyphenatedWordsFilter</a></h4>
<section><p>When plain text is extracted from documents, many words are often hyphenated and broken across
two lines. This is often the case with documents where narrow text columns are used, such as newsletters.
In order to increase search efficiency, this filter puts hyphenated words broken across two lines back together.
This filter should be used at indexing time only.
Example field definition in schema.xml:</p>
<pre><code>&lt;fieldtype name=&quot;text&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer type=&quot;index&quot;>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.SynonymFilterFactory&quot; synonyms=&quot;index_synonyms.txt&quot; ignoreCase=&quot;true&quot; expand=&quot;false&quot;/>
&lt;filter class=&quot;solr.StopFilterFactory&quot; ignoreCase=&quot;true&quot;/>
&lt;filter class=&quot;solr.HyphenatedWordsFilterFactory&quot;/>
&lt;filter class=&quot;solr.WordDelimiterFilterFactory&quot; generateWordParts=&quot;1&quot; generateNumberParts=&quot;1&quot; catenateWords=&quot;1&quot; catenateNumbers=&quot;1&quot; catenateAll=&quot;0&quot;/>
&lt;filter class=&quot;solr.LowerCaseFilterFactory&quot;/>
&lt;filter class=&quot;solr.RemoveDuplicatesTokenFilterFactory&quot;/>
&lt;/analyzer>
&lt;analyzer type=&quot;query&quot;>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.SynonymFilterFactory&quot; synonyms=&quot;synonyms.txt&quot; ignoreCase=&quot;true&quot; expand=&quot;true&quot;/>
&lt;filter class=&quot;solr.StopFilterFactory&quot; ignoreCase=&quot;true&quot;/>
&lt;filter class=&quot;solr.WordDelimiterFilterFactory&quot; generateWordParts=&quot;1&quot; generateNumberParts=&quot;1&quot; catenateWords=&quot;0&quot; catenateNumbers=&quot;0&quot; catenateAll=&quot;0&quot;/>
&lt;filter class=&quot;solr.LowerCaseFilterFactory&quot;/>
&lt;filter class=&quot;solr.RemoveDuplicatesTokenFilterFactory&quot;/>
&lt;/analyzer>
&lt;/fieldtype></code></pre>
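<p>Outside of a schema file, the same filter can be applied in code. This is a minimal sketch, assuming the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced; <code>HyphenatedWordsSketch</code> and <code>Rejoin</code> are hypothetical names, and the whitespace tokenizer is only one possible upstream source.</p>

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class HyphenatedWordsSketch
{
    // Hypothetical helper: rejoins words that were hyphenated across a line break.
    public static IList<string> Rejoin(string text)
    {
        var terms = new List<string>();
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));
        using (TokenStream stream = new HyphenatedWordsFilter(source))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

<p>Given the input <code>"ecologi-\ncal research"</code>, the tokens <code>ecologi-</code> and <code>cal</code> are expected to be merged into <code>ecological</code>, followed by <code>research</code>.</p>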
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilterFactory.html">HyphenatedWordsFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.HyphenatedWordsFilter.html">HyphenatedWordsFilter</a>.</p>
<pre><code>&lt;fieldType name=&quot;text_hyphn&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.HyphenatedWordsFilterFactory&quot;/>
&lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.KeepWordFilter.html">KeepWordFilter</a></h4>
<section><p>A <span class="xref">Lucene.Net.Analysis.TokenFilter</span> that only keeps tokens with text contained in the
required words. This filter behaves like the inverse of <a class="xref" href="Lucene.Net.Analysis.Core.StopFilter.html">StopFilter</a>.</p>
<p>Since: Solr 1.3</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.KeepWordFilterFactory.html">KeepWordFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.KeepWordFilter.html">KeepWordFilter</a>. </p>
<pre><code>&lt;fieldType name=&quot;text_keepword&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.KeepWordFilterFactory&quot; words=&quot;keepwords.txt&quot; ignoreCase=&quot;false&quot;/>
&lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.KeywordMarkerFilter.html">KeywordMarkerFilter</a></h4>
<section><p>Marks terms as keywords via the <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Analysis.TokenAttributes.KeywordAttribute.html">KeywordAttribute</a>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.KeywordMarkerFilterFactory.html">KeywordMarkerFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.KeywordMarkerFilter.html">KeywordMarkerFilter</a>.</p>
<pre><code>&lt;fieldType name=&quot;text_keyword&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.KeywordMarkerFilterFactory&quot; protected=&quot;protectedkeyword.txt&quot; pattern=&quot;^.+er$&quot; ignoreCase=&quot;false&quot;/>
&lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.KeywordRepeatFilter.html">KeywordRepeatFilter</a></h4>
<section><p>This TokenFilter emits each incoming token twice, once as a keyword and once as a non-keyword; in other words, once with
<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Analysis.TokenAttributes.KeywordAttribute.html#Lucene_Net_Analysis_TokenAttributes_KeywordAttribute_IsKeyword">IsKeyword</a> set to <code>true</code> and once set to <code>false</code>.
This is useful if used with a stem filter that respects the <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Analysis.TokenAttributes.KeywordAttribute.html">KeywordAttribute</a> to index the stemmed and the
un-stemmed version of a term into the same field.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.KeywordRepeatFilterFactory.html">KeywordRepeatFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.KeywordRepeatFilter.html">KeywordRepeatFilter</a>.</p>
<p>Since <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.KeywordRepeatFilter.html">KeywordRepeatFilter</a> emits two tokens for every input token, any tokens that aren&apos;t transformed
later in the analysis chain will be in the document twice. Therefore, consider adding
<a class="xref" href="Lucene.Net.Analysis.Miscellaneous.RemoveDuplicatesTokenFilterFactory.html">RemoveDuplicatesTokenFilterFactory</a> later in the analysis chain.</p>
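<p>The chain described above can be sketched in code as follows, assuming the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced. <code>KeywordRepeatSketch</code> and <code>Analyze</code> are hypothetical names, and <code>PorterStemFilter</code> is used only as one example of a stemmer that respects the keyword attribute.</p>

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.En;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class KeywordRepeatSketch
{
    // Hypothetical helper: returns every term the chain emits, in order.
    public static IList<string> Analyze(string text)
    {
        var terms = new List<string>();
        TokenStream chain = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));
        chain = new KeywordRepeatFilter(chain);         // each token twice: keyword, then non-keyword
        chain = new PorterStemFilter(chain);            // keyword-marked tokens are left unstemmed
        chain = new RemoveDuplicatesTokenFilter(chain); // drops the copy when stemming changed nothing
        using (chain)
        {
            ICharTermAttribute term = chain.AddAttribute<ICharTermAttribute>();
            chain.Reset();
            while (chain.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            chain.End();
        }
        return terms;
    }
}
```

<p>For an input such as <code>"running fast"</code>, this chain should index both <code>running</code> and its stem <code>run</code>, while <code>fast</code> appears only once because its stemmed copy is identical and is removed as a duplicate.</p>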
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LengthFilter.html">LengthFilter</a></h4>
<section><p>Removes words that are too long or too short from the stream.</p>
<p>
Note: Length is calculated as the number of UTF-16 code units.
</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LengthFilterFactory.html">LengthFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LengthFilter.html">LengthFilter</a>. </p>
<pre><code>&lt;fieldType name=&quot;text_lngth&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.LengthFilterFactory&quot; min=&quot;0&quot; max=&quot;1&quot; />
&lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenCountAnalyzer.html">LimitTokenCountAnalyzer</a></h4>
<section><p>This <span class="xref">Lucene.Net.Analysis.Analyzer</span> limits the number of tokens while indexing. It is
a replacement for the maximum field length setting inside <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Index.IndexWriter.html">IndexWriter</a>. </p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenCountFilter.html">LimitTokenCountFilter</a></h4>
<section><p>This <span class="xref">Lucene.Net.Analysis.TokenFilter</span> limits the number of tokens while indexing. It is
a replacement for the maximum field length setting inside <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Index.IndexWriter.html">IndexWriter</a>.</p>
<p>
By default, this filter ignores any tokens in the wrapped <span class="xref">Lucene.Net.Analysis.TokenStream</span>
once the limit has been reached, which can result in <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenCountFilter.html#Lucene_Net_Analysis_Miscellaneous_LimitTokenCountFilter_Reset">Reset()</a> being
called prior to <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenCountFilter.html#Lucene_Net_Analysis_Miscellaneous_LimitTokenCountFilter_IncrementToken">IncrementToken()</a> returning <code>false</code>. For most
<span class="xref">Lucene.Net.Analysis.TokenStream</span> implementations this should be acceptable, and faster
than consuming the full stream. If you are wrapping a <span class="xref">Lucene.Net.Analysis.TokenStream</span>
which requires that the full stream of tokens be exhausted in order to
function properly, use the
<a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenCountFilter.html#Lucene_Net_Analysis_Miscellaneous_LimitTokenCountFilter__ctor_Lucene_Net_Analysis_TokenStream_System_Int32_System_Boolean_">LimitTokenCountFilter(TokenStream, Int32, Boolean)</a> consumeAllTokens
option.
</p>
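<p>A minimal sketch of the three-argument constructor, assuming the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced. <code>LimitTokenCountSketch</code> and <code>FirstTokens</code> are hypothetical names; the whitespace tokenizer is an arbitrary upstream source.</p>

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class LimitTokenCountSketch
{
    // Hypothetical helper: returns at most maxTokenCount terms from the text.
    public static IList<string> FirstTokens(string text, int maxTokenCount, bool consumeAllTokens)
    {
        var terms = new List<string>();
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));
        // consumeAllTokens = false: stop reading the wrapped stream at the limit.
        // consumeAllTokens = true:  keep reading it to the end, but emit nothing more.
        using (TokenStream stream = new LimitTokenCountFilter(source, maxTokenCount, consumeAllTokens))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

<p>Either value of <code>consumeAllTokens</code> yields the same emitted terms; for example, <code>FirstTokens("one two three four", 2, false)</code> returns <code>one</code> and <code>two</code>. The flag only changes whether the wrapped stream is read to exhaustion.</p>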
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenCountFilterFactory.html">LimitTokenCountFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenCountFilter.html">LimitTokenCountFilter</a>. </p>
<pre><code>&lt;fieldType name=&quot;text_lngthcnt&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.LimitTokenCountFilterFactory&quot; maxTokenCount=&quot;10&quot; consumeAllTokens=&quot;false&quot; />
&lt;/analyzer>
&lt;/fieldType></code></pre>
<p>
The <span class="xref">Lucene.Net.Analysis.Miscellaneous.LimitTokenCountFilterFactory.consumeAllTokens</span> property is optional and defaults to <code>false</code>.<br>See <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenCountFilter.html">LimitTokenCountFilter</a> for an explanation of its use.
</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenPositionFilter.html">LimitTokenPositionFilter</a></h4>
<section><p>This <span class="xref">Lucene.Net.Analysis.TokenFilter</span> limits its emitted tokens to those with positions that
are not greater than the configured limit.</p>
<p>
By default, this filter ignores any tokens in the wrapped <span class="xref">Lucene.Net.Analysis.TokenStream</span>
once the limit has been exceeded, which can result in <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenPositionFilter.html#Lucene_Net_Analysis_Miscellaneous_LimitTokenPositionFilter_Reset">Reset()</a> being
called prior to <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenPositionFilter.html#Lucene_Net_Analysis_Miscellaneous_LimitTokenPositionFilter_IncrementToken">IncrementToken()</a> returning <code>false</code>. For most
<span class="xref">Lucene.Net.Analysis.TokenStream</span> implementations this should be acceptable, and faster
than consuming the full stream. If you are wrapping a <span class="xref">Lucene.Net.Analysis.TokenStream</span>
which requires that the full stream of tokens be exhausted in order to
function properly, use the
<a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenPositionFilter.html#Lucene_Net_Analysis_Miscellaneous_LimitTokenPositionFilter__ctor_Lucene_Net_Analysis_TokenStream_System_Int32_System_Boolean_">LimitTokenPositionFilter(TokenStream, Int32, Boolean)</a> consumeAllTokens
option.
</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenPositionFilterFactory.html">LimitTokenPositionFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenPositionFilter.html">LimitTokenPositionFilter</a>. </p>
<pre><code>&lt;fieldType name=&quot;text_limit_pos&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.LimitTokenPositionFilterFactory&quot; maxTokenPosition=&quot;3&quot; consumeAllTokens=&quot;false&quot; />
&lt;/analyzer>
&lt;/fieldType></code></pre>
<p>
The <span class="xref">Lucene.Net.Analysis.Miscellaneous.LimitTokenPositionFilterFactory.consumeAllTokens</span> property is optional and defaults to <code>false</code>.<br>See <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.LimitTokenPositionFilter.html">LimitTokenPositionFilter</a> for an explanation of its use.
</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.Lucene47WordDelimiterFilter.html">Lucene47WordDelimiterFilter</a></h4>
<section><p>Old, broken version of <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.WordDelimiterFilter.html">WordDelimiterFilter</a>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.PatternAnalyzer.html">PatternAnalyzer</a></h4>
<section><p>Efficient Lucene analyzer/tokenizer that preferably operates on a <span class="xref">System.String</span> rather than a
<span class="xref">System.IO.TextReader</span>, that can flexibly separate text into terms via a regular expression <span class="xref">System.Text.RegularExpressions.Regex</span>
(with behaviour similar to <span class="xref">System.Text.RegularExpressions.Regex.Split(System.String)</span>),
and that combines the functionality of
<a class="xref" href="Lucene.Net.Analysis.Core.LetterTokenizer.html">LetterTokenizer</a>,
<a class="xref" href="Lucene.Net.Analysis.Core.LowerCaseTokenizer.html">LowerCaseTokenizer</a>,
<a class="xref" href="Lucene.Net.Analysis.Core.WhitespaceTokenizer.html">WhitespaceTokenizer</a>,
<a class="xref" href="Lucene.Net.Analysis.Core.StopFilter.html">StopFilter</a> into a single efficient
multi-purpose class.</p>
<p>
If you are unsure what exactly a regular expression should look like, consider
prototyping by simply trying various expressions on some test texts via
<span class="xref">System.Text.RegularExpressions.Regex.Split(System.String)</span>. Once you are satisfied, give that regex to
<a class="xref" href="Lucene.Net.Analysis.Miscellaneous.PatternAnalyzer.html">PatternAnalyzer</a>. Also see <a target="_blank" href="http://www.regular-expressions.info/">Regular Expression Tutorial</a>.
</p>
<p>
This class can be considerably faster than the &quot;normal&quot; Lucene tokenizers.
It can also serve as a building block in a compound Lucene
<span class="xref">Lucene.Net.Analysis.TokenFilter</span> chain, as in this
stemming example:</p>
<pre><code>PatternAnalyzer pat = ...
TokenStream tokenStream = new SnowballFilter(
pat.GetTokenStream(&quot;content&quot;, &quot;James is running round in the woods&quot;),
&quot;English&quot;);</code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.PatternKeywordMarkerFilter.html">PatternKeywordMarkerFilter</a></h4>
<section><p>Marks terms as keywords via the <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Analysis.TokenAttributes.KeywordAttribute.html">KeywordAttribute</a>. Each token
that matches the provided pattern is marked as a keyword by setting
<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Analysis.TokenAttributes.KeywordAttribute.html#Lucene_Net_Analysis_TokenAttributes_KeywordAttribute_IsKeyword">IsKeyword</a> to <code>true</code>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.PerFieldAnalyzerWrapper.html">PerFieldAnalyzerWrapper</a></h4>
<section><p>This analyzer is used to facilitate scenarios where different
fields require different analysis techniques. Use the dictionary
argument in <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.PerFieldAnalyzerWrapper.html#Lucene_Net_Analysis_Miscellaneous_PerFieldAnalyzerWrapper__ctor_Lucene_Net_Analysis_Analyzer_System_Collections_Generic_IDictionary_System_String_Lucene_Net_Analysis_Analyzer__">PerFieldAnalyzerWrapper(Analyzer, IDictionary&lt;String, Analyzer&gt;)</a>
to add non-default analyzers for fields.</p>
<p>Example usage:</p>
<pre><code>IDictionary&lt;string, Analyzer> analyzerPerField = new Dictionary&lt;string, Analyzer>();
analyzerPerField[&quot;firstname&quot;] = new KeywordAnalyzer();
analyzerPerField[&quot;lastname&quot;] = new KeywordAnalyzer();
PerFieldAnalyzerWrapper aWrapper =
new PerFieldAnalyzerWrapper(new StandardAnalyzer(version), analyzerPerField);</code></pre>
<p>
In this example, <a class="xref" href="Lucene.Net.Analysis.Standard.StandardAnalyzer.html">StandardAnalyzer</a> will be used for all fields except &quot;firstname&quot;
and &quot;lastname&quot;, for which <a class="xref" href="Lucene.Net.Analysis.Core.KeywordAnalyzer.html">KeywordAnalyzer</a> will be used.
</p>
<p>A <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.PerFieldAnalyzerWrapper.html">PerFieldAnalyzerWrapper</a> can be used like any other analyzer, for both indexing
and query parsing.
</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.PrefixAndSuffixAwareTokenFilter.html">PrefixAndSuffixAwareTokenFilter</a></h4>
<section><p>Links two <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.PrefixAwareTokenFilter.html">PrefixAwareTokenFilter</a>s.</p>
<p>
<strong>NOTE:</strong> This filter might not behave correctly if used with custom
<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Util.IAttribute.html">IAttribute</a>s, i.e. <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Util.IAttribute.html">IAttribute</a>s other than
the ones located in Lucene.Net.Analysis.TokenAttributes. </p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.PrefixAwareTokenFilter.html">PrefixAwareTokenFilter</a></h4>
<section><p>Joins two token streams and leaves the last token of the first stream available
to be used when updating the token values in the second stream based on that token.</p>
<p>The default implementation adds the end offset of the last prefix token to the start and end offsets of each suffix token.</p>
<p>
<strong>NOTE:</strong> This filter might not behave correctly if used with custom
<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Util.IAttribute.html">IAttribute</a>s, i.e. <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Util.IAttribute.html">IAttribute</a>s other than
the ones located in Lucene.Net.Analysis.TokenAttributes.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.RemoveDuplicatesTokenFilter.html">RemoveDuplicatesTokenFilter</a></h4>
<section><p>A <span class="xref">Lucene.Net.Analysis.TokenFilter</span> which filters out <span class="xref">Lucene.Net.Analysis.Token</span>s that have the same position and term text as the previous token in the stream.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.RemoveDuplicatesTokenFilterFactory.html">RemoveDuplicatesTokenFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.RemoveDuplicatesTokenFilter.html">RemoveDuplicatesTokenFilter</a>.</p>
<pre><code>&lt;fieldType name=&quot;text_rmdup&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.RemoveDuplicatesTokenFilterFactory&quot;/>
&lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ScandinavianFoldingFilter.html">ScandinavianFoldingFilter</a></h4>
<section><p>This filter folds Scandinavian characters åÅäæÄÆ-&gt;a and öÖøØ-&gt;o.
It also discriminates against use of double vowels aa, ae, ao, oe and oo, leaving just the first one.</p>
<p>
It is a semantically more destructive solution than <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ScandinavianNormalizationFilter.html">ScandinavianNormalizationFilter</a> but
can in addition help with matching raksmorgas as räksmörgås.</p>
<p>
blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej == blabarsyltetoj<br>
räksmörgås == ræksmørgås == ræksmörgaos == raeksmoergaas == raksmorgas</p>
<p>
Background:
Swedish åäö are in fact the same letters as Norwegian and Danish åæø and thus interchangeable
when used between these languages. They are however folded differently when people type
them on a keyboard lacking these characters.</p>
<p>
In that situation almost all Swedish people use a, a, o instead of å, ä, ö.</p>
<p>
Norwegians and Danes on the other hand usually type aa, ae and oe instead of å, æ and ø.
Some do however use a, a, o, oo, ao and sometimes permutations of everything above.</p>
<p>
This filter solves that mismatch problem, but might also cause new ones.</p>
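<p>The equivalences listed above can be checked with a short sketch, assuming the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced. <code>ScandinavianFoldingSketch</code> and <code>Fold</code> are hypothetical names; the whitespace tokenizer is an arbitrary upstream source.</p>

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class ScandinavianFoldingSketch
{
    // Hypothetical helper: returns the folded form of each whitespace-separated word.
    public static IList<string> Fold(string text)
    {
        var terms = new List<string>();
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));
        using (TokenStream stream = new ScandinavianFoldingFilter(source))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

<p>Calling <code>Fold("räksmörgås raeksmoergaas")</code> should return the same term twice, <code>raksmorgas</code>, which is why a query typed on a keyboard without these characters can still match the indexed word.</p>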
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ScandinavianFoldingFilterFactory.html">ScandinavianFoldingFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ScandinavianFoldingFilter.html">ScandinavianFoldingFilter</a>.</p>
<pre><code>&lt;fieldType name=&quot;text_scandfold&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
  &lt;analyzer>
    &lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
    &lt;filter class=&quot;solr.ScandinavianFoldingFilterFactory&quot;/>
  &lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ScandinavianNormalizationFilter.html">ScandinavianNormalizationFilter</a></h4>
<section><p>This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ
and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
<p>
It&apos;s a semantically less destructive solution than <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ScandinavianFoldingFilter.html">ScandinavianFoldingFilter</a>,
most useful when a person with a Norwegian or Danish keyboard queries a Swedish index
and vice versa. This filter does <strong>not</strong> perform the common Swedish folds of å and ä to a, nor ö to o.
<p>
blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej but not blabarsyltetoj
räksmörgås == ræksmørgås == ræksmörgaos == raeksmoergaas but not raksmorgas</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ScandinavianNormalizationFilterFactory.html">ScandinavianNormalizationFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.ScandinavianNormalizationFilter.html">ScandinavianNormalizationFilter</a>.</p>
<pre><code>&lt;fieldType name=&quot;text_scandnorm&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
  &lt;analyzer>
    &lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
    &lt;filter class=&quot;solr.ScandinavianNormalizationFilterFactory&quot;/>
  &lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.SetKeywordMarkerFilter.html">SetKeywordMarkerFilter</a></h4>
<section><p>Marks terms as keywords via the <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Analysis.TokenAttributes.KeywordAttribute.html">KeywordAttribute</a>. Each token
contained in the provided set is marked as a keyword by setting
<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Analysis.TokenAttributes.KeywordAttribute.html#Lucene_Net_Analysis_TokenAttributes_KeywordAttribute_IsKeyword">IsKeyword</a> to <code>true</code>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.SingleTokenTokenStream.html">SingleTokenTokenStream</a></h4>
<section><p>A <span class="xref">Lucene.Net.Analysis.TokenStream</span> containing a single token.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.StemmerOverrideFilter.html">StemmerOverrideFilter</a></h4>
<section><p>Provides the ability to override any <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Analysis.TokenAttributes.KeywordAttribute.html">KeywordAttribute</a> aware stemmer
with custom dictionary-based stemming.</p>
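<p>The idea of a dictionary override can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Lucene.NET API: <code>stem_with_overrides</code> and <code>naive_plural_stemmer</code> are hypothetical names, and the keyword marking is modelled simply by bypassing the stemmer for tokens found in the dictionary.</p>
<pre><code>def naive_plural_stemmer(token):
    # Hypothetical stand-in stemmer: strips one trailing 's'.
    return token[:-1] if token.endswith('s') else token


def stem_with_overrides(tokens, overrides, stemmer):
    """Apply dictionary stems first and fall back to the stemmer otherwise."""
    stemmed = []
    for token in tokens:
        if token in overrides:
            # Dictionary hit: use the custom stem and bypass the stemmer,
            # much as a keyword-marked token would be skipped downstream.
            stemmed.append(overrides[token])
        else:
            stemmed.append(stemmer(token))
    return stemmed
</code></pre>
<p>With the overrides <code>{'running': 'run', 'news': 'news'}</code>, the tokens <code>['running', 'cats', 'news']</code> become <code>['run', 'cat', 'news']</code>: the dictionary protects "news" from the stand-in stemmer.</p>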
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.StemmerOverrideFilter.Builder.html">StemmerOverrideFilter.Builder</a></h4>
<section><p>This builder builds an <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Util.Fst.FST.html">FST</a> for the <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.StemmerOverrideFilter.html">StemmerOverrideFilter</a>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.StemmerOverrideFilter.StemmerOverrideMap.html">StemmerOverrideFilter.StemmerOverrideMap</a></h4>
<section><p>A read-only 4-byte FST-backed map that allows fast case-insensitive key-value
lookups for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.StemmerOverrideFilter.html">StemmerOverrideFilter</a>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.StemmerOverrideFilterFactory.html">StemmerOverrideFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.StemmerOverrideFilter.html">StemmerOverrideFilter</a>.</p>
<pre><code>&lt;fieldType name=&quot;text_dicstem&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
  &lt;analyzer>
    &lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
    &lt;filter class=&quot;solr.StemmerOverrideFilterFactory&quot; dictionary=&quot;dictionary.txt&quot; ignoreCase=&quot;false&quot;/>
  &lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.TrimFilter.html">TrimFilter</a></h4>
<section><p>Trims leading and trailing whitespace from tokens in the stream.</p>
<p>As of Lucene 4.4, this filter no longer supports updateOffsets=true,
as it can lead to broken token streams.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.TrimFilterFactory.html">TrimFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.TrimFilter.html">TrimFilter</a>.</p>
<pre><code>&lt;fieldType name=&quot;text_trm&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
  &lt;analyzer>
    &lt;tokenizer class=&quot;solr.NGramTokenizerFactory&quot;/>
    &lt;filter class=&quot;solr.TrimFilterFactory&quot; />
  &lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.TruncateTokenFilter.html">TruncateTokenFilter</a></h4>
<section><p>A token filter that truncates terms to a specified length.
Fixed-prefix truncation, used as a stemming method, produces good results for Turkish.
F5, which keeps the first 5 characters, is reported to have produced the best results in
<a href="http://www.users.muohio.edu/canf/papers/JASIST2008offPrint.pdf">
Information Retrieval on Turkish Texts</a>.</p>
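<p>Fixed-prefix truncation is simple enough to show in a few lines of Python. The sketch below is illustrative only: <code>truncate_terms</code> is a hypothetical name, and the sketch ignores any keyword marking on tokens.</p>
<pre><code>def truncate_terms(terms, prefix_length=5):
    """Keep at most prefix_length leading characters of every term (F5 by default)."""
    return [term[:prefix_length] for term in terms]
</code></pre>
<p>For example, <code>truncate_terms(['kitaplar', 'kitap', 'ev'])</code> returns <code>['kitap', 'kitap', 'ev']</code>: the inflected form and its stem collapse to the same prefix, while shorter terms are left unchanged.</p>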
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.TruncateTokenFilterFactory.html">TruncateTokenFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.TruncateTokenFilter.html">TruncateTokenFilter</a>. The following type is recommended for &quot;<em>diacritics-insensitive search</em>&quot; for Turkish.</p>
<pre><code>&lt;fieldType name=&quot;text_tr_ascii_f5&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
  &lt;analyzer>
    &lt;tokenizer class=&quot;solr.StandardTokenizerFactory&quot;/>
    &lt;filter class=&quot;solr.ApostropheFilterFactory&quot;/>
    &lt;filter class=&quot;solr.TurkishLowerCaseFilterFactory&quot;/>
    &lt;filter class=&quot;solr.ASCIIFoldingFilterFactory&quot; preserveOriginal=&quot;true&quot;/>
    &lt;filter class=&quot;solr.KeywordRepeatFilterFactory&quot;/>
    &lt;filter class=&quot;solr.TruncateTokenFilterFactory&quot; prefixLength=&quot;5&quot;/>
    &lt;filter class=&quot;solr.RemoveDuplicatesTokenFilterFactory&quot;/>
  &lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.WordDelimiterFilter.html">WordDelimiterFilter</a></h4>
<section><p>Splits words into subwords and performs optional transformations on subword
groups. Words are split into subwords with the following rules:
<ul>
<li>split on intra-word delimiters (by default, all non-alphanumeric
characters): <code>&quot;Wi-Fi&quot;</code> → <code>&quot;Wi&quot;, &quot;Fi&quot;</code></li>
<li>split on case transitions: <code>&quot;PowerShot&quot;</code> →
<code>&quot;Power&quot;, &quot;Shot&quot;</code></li>
<li>split on letter-number transitions: <code>&quot;SD500&quot;</code> →
<code>&quot;SD&quot;, &quot;500&quot;</code></li>
<li>leading and trailing intra-word delimiters on each subword are ignored:
<code>&quot;//hello---there, &apos;dude&apos;&quot;</code> →
<code>&quot;hello&quot;, &quot;there&quot;, &quot;dude&quot;</code></li>
<li>trailing &quot;&apos;s&quot; are removed for each subword: <code>&quot;O&apos;Neil&apos;s&quot;</code> →
<code>&quot;O&quot;, &quot;Neil&quot;</code>
<ul>
<li>Note: this step isn&apos;t performed in a separate filter because of possible
subword combinations.</li>
</ul>
</li></ul>
<p>
The <strong>combinations</strong> parameter affects how subwords are combined:
<ul>
<li>combinations=&quot;0&quot; causes no subword combinations: <code>&quot;PowerShot&quot;</code> →
<code>0:&quot;Power&quot;, 1:&quot;Shot&quot;</code> (0 and 1 are the token positions)</li>
<li>combinations=&quot;1&quot; means that in addition to the subwords, maximum runs of
non-numeric subwords are catenated and produced at the same position as the
last subword in the run:
<ul>
<li><code>&quot;PowerShot&quot;</code> →
<code>0:&quot;Power&quot;, 1:&quot;Shot&quot;, 1:&quot;PowerShot&quot;</code></li>
<li><code>&quot;A&apos;s+B&apos;s&amp;C&apos;s&quot;</code> → <code>0:&quot;A&quot;, 1:&quot;B&quot;, 2:&quot;C&quot;, 2:&quot;ABC&quot;</code>
</li>
<li><code>&quot;Super-Duper-XL500-42-AutoCoder!&quot;</code> →
<code>0:&quot;Super&quot;, 1:&quot;Duper&quot;, 2:&quot;XL&quot;, 2:&quot;SuperDuperXL&quot;, 3:&quot;500&quot;, 4:&quot;42&quot;, 5:&quot;Auto&quot;, 6:&quot;Coder&quot;, 6:&quot;AutoCoder&quot;</code>
</li>
</ul>
</li></ul>
<p>
One use for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.WordDelimiterFilter.html">WordDelimiterFilter</a> is to help match words with different
subword delimiters. For example, if the source text contained &quot;wi-fi&quot;, one may
want &quot;wifi&quot;, &quot;WiFi&quot;, &quot;wi-fi&quot; and &quot;wi+fi&quot; queries to all match. One way of doing so
is to specify combinations=&quot;1&quot; in the analyzer used for indexing, and
combinations=&quot;0&quot; (the default) in the analyzer used for querying. Given that
the current <a class="xref" href="Lucene.Net.Analysis.Standard.StandardTokenizer.html">StandardTokenizer</a> immediately removes many intra-word
delimiters, it is recommended that this filter be used after a tokenizer that
does not do this (such as <a class="xref" href="Lucene.Net.Analysis.Core.WhitespaceTokenizer.html">WhitespaceTokenizer</a>).</p>
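<p>The splitting rules listed above can be approximated with a small Python sketch. It is a rough illustration rather than the filter's actual algorithm: <code>split_subwords</code> is a hypothetical name, only ASCII letters and digits are treated as word characters, and protected words, custom type tables and the combinations parameter are not modelled.</p>
<pre><code>import re


def split_subwords(text):
    """Split text into subwords on delimiters, case changes and letter/digit changes."""
    # Drop a possessive 's that ends a subword (followed by a delimiter or the end).
    text = re.sub(r"'[sS](?![A-Za-z0-9])", "", text)
    subwords = []
    # Split on intra-word delimiters: every run of non-alphanumeric characters.
    for chunk in re.split(r"[^A-Za-z0-9]+", text):
        # Within a chunk, break at lower-to-upper and letter/digit transitions.
        subwords.extend(re.findall(r"[0-9]+|[A-Z]*[a-z]+|[A-Z]+", chunk))
    return subwords
</code></pre>
<p>For instance, <code>split_subwords("Super-Duper-XL500-42-AutoCoder!")</code> returns <code>['Super', 'Duper', 'XL', '500', '42', 'Auto', 'Coder']</code>, which matches the subwords at positions 0 to 6 in the combinations example above.</p>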
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.WordDelimiterFilterFactory.html">WordDelimiterFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.WordDelimiterFilter.html">WordDelimiterFilter</a>.</p>
<pre><code>&lt;fieldType name=&quot;text_wd&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
  &lt;analyzer>
    &lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
    &lt;filter class=&quot;solr.WordDelimiterFilterFactory&quot; protected=&quot;protectedword.txt&quot;
            preserveOriginal=&quot;0&quot; splitOnNumerics=&quot;1&quot; splitOnCaseChange=&quot;1&quot;
            catenateWords=&quot;0&quot; catenateNumbers=&quot;0&quot; catenateAll=&quot;0&quot;
            generateWordParts=&quot;1&quot; generateNumberParts=&quot;1&quot; stemEnglishPossessive=&quot;1&quot;
            types=&quot;wdfftypes.txt&quot; />
  &lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.WordDelimiterIterator.html">WordDelimiterIterator</a></h4>
<section><p>A BreakIterator-like API for iterating over subwords in text, according to <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.WordDelimiterFilter.html">WordDelimiterFilter</a> rules.</p>
<div class="lucene-block lucene-internal">This is a Lucene.NET INTERNAL API, use at your own risk</div></section>
<h3 id="enums">Enums
</h3>
<h4><a class="xref" href="Lucene.Net.Analysis.Miscellaneous.WordDelimiterFlags.html">WordDelimiterFlags</a></h4>
<section><p>Configuration options for the <a class="xref" href="Lucene.Net.Analysis.Miscellaneous.WordDelimiterFilter.html">WordDelimiterFilter</a>.
<p>
LUCENENET specific - these options were passed as int constant flags in Lucene.</p>
</section>
</article>
</div>
<div class="hidden-sm col-md-2" role="complementary">
<div class="sideaffix">
<nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
<!-- <p><a class="back-to-top" href="#top">Back to top</a><p> -->
</nav>
</div>
</div>
</div>
</div>
<footer>
<div class="grad-bottom"></div>
<div class="footer">
<div class="container">
<span class="pull-right">
<a href="#top">Back to top</a>
</span>
Copyright © 2020 Licensed to the Apache Software Foundation (ASF)
</div>
</div>
</footer>
</div>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script>
</body>
</html>