 <!DOCTYPE html>
 <!--[if IE]><![endif]-->
 <html>

   <head>
     <meta charset="utf-8">
     <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
     <title>Namespace Lucene.Net.Analysis.NGram
    | Apache Lucene.NET 4.8.0-beta00014 Documentation </title>
     <meta name="viewport" content="width=device-width">
     <meta name="title" content="Namespace Lucene.Net.Analysis.NGram
    | Apache Lucene.NET 4.8.0-beta00014 Documentation ">
     <meta name="generator" content="docfx 2.56.2.0">

     <link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico">
     <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css">
     <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css">
     <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css">
     <meta property="docfx:navrel" content="toc.html">
     <meta property="docfx:tocrel" content="analysis-common/toc.html">

     <meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/">

   </head>
   <body data-spy="scroll" data-target="#affix" data-offset="120">
     <span id="forkongithub"><a href="https://github.com/apache/lucenenet" target="_blank">Fork me on GitHub</a></span>
     <div id="wrapper">
       <header>

         <nav id="autocollapse" class="navbar ng-scope" role="navigation">
           <div class="container">
             <div class="navbar-header">
               <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar">
                 <span class="sr-only">Toggle navigation</span>
                 <span class="icon-bar"></span>
                 <span class="icon-bar"></span>
                 <span class="icon-bar"></span>
               </button>

               <a class="navbar-brand" href="/">
                 <img id="logo" class="svg" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/lucene-net-color.png" alt="">
               </a>
             </div>
             <div class="collapse navbar-collapse" id="navbar">
               <form class="navbar-form navbar-right" role="search" id="search">
                 <div class="form-group">
                   <input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off">
                 </div>
               </form>
             </div>
           </div>
         </nav>

         <div class="subnav navbar navbar-default">
           <div class="container hide-when-search">
             <ul class="level0 breadcrumb">
                 <li>
                     <a href="https://lucenenet.apache.org/docs/4.8.0-beta00014/">API</a>
                      <span id="breadcrumb">
                         <ul class="breadcrumb">
                           <li></li>
                         </ul>
                     </span>
                 </li>
             </ul>
           </div>
         </div>
       </header>
       <div class="container body-content">

         <div id="search-results">
           <div class="search-list"></div>
           <div class="sr-items">
             <p><i class="glyphicon glyphicon-refresh index-loading"></i></p>
           </div>
           <ul id="pagination"></ul>
         </div>
       </div>
       <div role="main" class="container body-content hide-when-search">

         <div class="sidenav hide-when-search">
           <a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a>
           <div class="sidetoggle collapse" id="sidetoggle">
             <div id="sidetoc"></div>
           </div>
         </div>
         <div class="article row grid-right">
           <div class="col-md-10">
             <article class="content wrap" id="_content" data-uid="Lucene.Net.Analysis.NGram">

   <h1 id="Lucene_Net_Analysis_NGram" data-uid="Lucene.Net.Analysis.NGram" class="text-break">Namespace Lucene.Net.Analysis.NGram
   </h1>
   <div class="markdown level0 summary"><!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->
 <p>Character n-gram tokenizers and filters.</p>
 </div>
   <div class="markdown level0 conceptual"></div>
   <div class="markdown level0 remarks"></div>
     <h3 id="classes">Classes
   </h3>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.EdgeNGramFilterFactory.html">EdgeNGramFilterFactory</a></h4>
       <section><p>Creates new instances of <a class="xref" href="Lucene.Net.Analysis.NGram.EdgeNGramTokenFilter.html">EdgeNGramTokenFilter</a>.</p>
 <pre><code>&lt;fieldType name=&quot;text_edgngrm&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
   &lt;analyzer>
     &lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
     &lt;filter class=&quot;solr.EdgeNGramFilterFactory&quot; minGramSize=&quot;1&quot; maxGramSize=&quot;1&quot;/>
   &lt;/analyzer>
 &lt;/fieldType></code></pre>
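 <p>The field type above chains a whitespace tokenizer with an edge n-gram filter limited to one-character grams, so each whitespace-separated word should contribute only its first character. The Python sketch below models that chain on plain strings to show the expected terms; <code>whitespace_tokenize</code> and <code>edge_ngram_filter</code> are hypothetical helpers, not the Lucene.Net API.</p>
 <pre><code class="lang-python">def whitespace_tokenize(text):
    # Rough stand-in for WhitespaceTokenizerFactory: split on runs of whitespace.
    return text.split()


def edge_ngram_filter(tokens, min_gram, max_gram):
    # Front-edge grams of each token; a token shorter than min_gram yields nothing.
    grams = []
    for token in tokens:
        longest = min(max_gram, len(token))
        for size in range(min_gram, longest + 1):
            grams.append(token[:size])
    return grams


# minGramSize="1" maxGramSize="1", as in the fieldType above
terms = edge_ngram_filter(whitespace_tokenize("quick brown fox"), 1, 1)
print(terms)  # ['q', 'b', 'f']
</code></pre>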
 </section>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.EdgeNGramTokenFilter.html">EdgeNGramTokenFilter</a></h4>
       <section><p>Tokenizes the given token into n-grams of given size(s).
 <p>
 This <span class="xref">Lucene.Net.Analysis.TokenFilter</span> creates n-grams from the beginning edge or ending edge of an input token.
 </p>
 <p>As of Lucene 4.4, this filter no longer supports
 <a class="xref" href="Lucene.Net.Analysis.NGram.EdgeNGramTokenFilter.Side.html#Lucene_Net_Analysis_NGram_EdgeNGramTokenFilter_Side_BACK">BACK</a> (apply <a class="xref" href="Lucene.Net.Analysis.Reverse.ReverseStringFilter.html">ReverseStringFilter</a> both before and
 after this filter to get the same behavior), handles supplementary characters
 correctly, and no longer updates offsets.
 </p></p>
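 <p>The ReverseStringFilter workaround relies on a simple identity: the back-edge grams of a token are the reversed front-edge grams of the reversed token. A minimal Python sketch of that identity, with plain string functions standing in for the two filters (the names are hypothetical and this is not Lucene.Net code):</p>
 <pre><code class="lang-python">def front_grams(token, min_gram, max_gram):
    # Front-edge grams, counted in code points (Python str indexes code points).
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


def back_grams(token, min_gram, max_gram):
    # Reverse the token, take its front-edge grams, then reverse each gram back:
    # the same effect as a reversing filter placed before and after the edge filter.
    return [gram[::-1] for gram in front_grams(token[::-1], min_gram, max_gram)]


print(front_grams("lucene", 1, 3))  # ['l', 'lu', 'luc']
print(back_grams("lucene", 1, 3))   # ['e', 'ne', 'ene']
</code></pre>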
 </section>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.EdgeNGramTokenizer.html">EdgeNGramTokenizer</a></h4>
       <section><p>Tokenizes the input from an edge into n-grams of given size(s).
 <p>
 This <span class="xref">Lucene.Net.Analysis.Tokenizer</span> creates n-grams from the beginning edge or ending edge of the input.
 </p>
 <p>As of Lucene 4.4, this tokenizer:
 <ul><li>can handle a <code>maxGram</code> larger than 1024 chars, but beware that this increases memory usage,</li><li>doesn&apos;t trim the input,</li><li>sets the position increment of every token to 1, instead of 1 for the first token and 0 for all others,</li><li>no longer supports backward n-grams,</li><li>supports <a class="xref" href="Lucene.Net.Analysis.Util.CharTokenizer.html#Lucene_Net_Analysis_Util_CharTokenizer_IsTokenChar_System_Int32_">IsTokenChar(Int32)</a> pre-tokenization, and</li><li>correctly handles supplementary characters.</li></ul>
 </p>
 <p>Although <strong>highly</strong> discouraged, it is still possible
 to use the old behavior through <a class="xref" href="Lucene.Net.Analysis.NGram.Lucene43EdgeNGramTokenizer.html">Lucene43EdgeNGramTokenizer</a>.
 </p></p>
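 <p>A Python sketch of the 4.4+ behavior listed above, under the assumption that grams start at the beginning of each run of token characters: every gram gets a position increment of 1, offsets cover exactly the gram&apos;s characters, and the input is not trimmed. The <code>is_token_char</code> parameter is a hypothetical stand-in for overriding IsTokenChar(Int32); this models the documented output and is not the Lucene.Net implementation.</p>
 <pre><code class="lang-python">def edge_ngram_tokenize(text, min_gram, max_gram, is_token_char=lambda ch: True):
    # Returns (term, position_increment, start_offset, end_offset) tuples.
    tokens = []
    i, n = 0, len(text)
    while i != n:
        if not is_token_char(text[i]):
            i += 1  # skip characters rejected by the pre-tokenizer
            continue
        j = i
        while j != n and is_token_char(text[j]):
            j += 1  # j is now the end of this run of token characters
        for size in range(min_gram, min(max_gram, j - i) + 1):
            tokens.append((text[i:i + size], 1, i, i + size))
        i = j
    return tokens


# Default: every character is a token character, so the leading space is kept.
print(edge_ngram_tokenize(" ab", 1, 2))
# [(' ', 1, 0, 1), (' a', 1, 0, 2)]

# Pre-tokenizing on letters yields edge grams for each word.
print(edge_ngram_tokenize("big data", 1, 2, str.isalpha))
# [('b', 1, 0, 1), ('bi', 1, 0, 2), ('d', 1, 4, 5), ('da', 1, 4, 6)]
</code></pre>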
 </section>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.EdgeNGramTokenizerFactory.html">EdgeNGramTokenizerFactory</a></h4>
       <section><p>Creates new instances of <a class="xref" href="Lucene.Net.Analysis.NGram.EdgeNGramTokenizer.html">EdgeNGramTokenizer</a>.</p>
 <pre><code>&lt;fieldType name=&quot;text_edgngrm&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
   &lt;analyzer>
     &lt;tokenizer class=&quot;solr.EdgeNGramTokenizerFactory&quot; minGramSize=&quot;1&quot; maxGramSize=&quot;1&quot;/>
   &lt;/analyzer>
 &lt;/fieldType></code></pre>
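 <p>Unlike the EdgeNGramFilterFactory field type, this one has no separate tokenizer, and by default EdgeNGramTokenizer treats every character, whitespace included, as a token character. Assuming that default, the configuration above yields one gram for the whole field value rather than one per word, as this Python sketch (hypothetical helper, not the Lucene.Net API) shows:</p>
 <pre><code class="lang-python">def edge_ngram_tokenizer_default(text, min_gram, max_gram):
    # Default IsTokenChar accepts every character, so the whole value is one run.
    longest = min(max_gram, len(text))
    return [text[:size] for size in range(min_gram, longest + 1)]


# minGramSize="1" maxGramSize="1", as in the fieldType above
print(edge_ngram_tokenizer_default("quick brown fox", 1, 1))  # ['q']

# Larger grams simply run across the space.
print(edge_ngram_tokenizer_default("quick brown fox", 1, 7))
# ['q', 'qu', 'qui', 'quic', 'quick', 'quick ', 'quick b']
</code></pre>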
 </section>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.Lucene43EdgeNGramTokenizer.html">Lucene43EdgeNGramTokenizer</a></h4>
       <section><p>Old version of <a class="xref" href="Lucene.Net.Analysis.NGram.EdgeNGramTokenizer.html">EdgeNGramTokenizer</a> which doesn&apos;t correctly handle
 supplementary characters.</p>
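 <p>Supplementary characters (code points above U+FFFF) take two UTF-16 code units, and a .NET <code>char</code> holds one code unit. A tokenizer that counts <code>char</code>s can therefore cut a gram between the two halves of a surrogate pair. The Python sketch below makes the difference visible by encoding to UTF-16 explicitly; it illustrates the problem and is not the old tokenizer&apos;s code.</p>
 <pre><code class="lang-python">def utf16_units(text):
    # Split the UTF-16LE encoding into 16-bit code units (what a .NET char holds).
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[k:k + 2], "little") for k in range(0, len(data), 2)]


def is_high_surrogate(unit):
    return unit in range(0xD800, 0xDC00)


word = "\U0001D11Eab"  # G clef (U+1D11E) followed by "ab"
units = utf16_units(word)

print(len(word), len(units))  # 3 4: three code points, four UTF-16 code units
# A one-"char" gram keeps only the first code unit: an unpaired high surrogate.
print(hex(units[0]), is_high_surrogate(units[0]))  # 0xd834 True
# A one-code-point gram keeps the whole character.
print(word[:1] == "\U0001D11E")  # True
</code></pre>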
 </section>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.Lucene43NGramTokenizer.html">Lucene43NGramTokenizer</a></h4>
       <section><p>Old, broken version of <a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenizer.html">NGramTokenizer</a> (the behavior before Lucene 4.4).</p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.NGramFilterFactory.html">NGramFilterFactory</a></h4>
       <section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenFilter.html">NGramTokenFilter</a>.</p>
 <pre><code>&lt;fieldType name=&quot;text_ngrm&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
   &lt;analyzer>
     &lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
     &lt;filter class=&quot;solr.NGramFilterFactory&quot; minGramSize=&quot;1&quot; maxGramSize=&quot;2&quot;/>
   &lt;/analyzer>
 &lt;/fieldType></code></pre>
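 <p>With this configuration each whitespace-separated word is expanded into all of its 1- and 2-character grams, in the order described for NGramTokenFilter (by start offset within the word, then by length). A Python sketch of the resulting terms, using a hypothetical <code>ngram_filter</code> helper rather than the Lucene.Net API:</p>
 <pre><code class="lang-python">def ngram_filter(tokens, min_gram, max_gram):
    # All grams of each token: by start position first, then by increasing length.
    grams = []
    for token in tokens:
        for start in range(len(token)):
            longest = min(max_gram, len(token) - start)
            for size in range(min_gram, longest + 1):
                grams.append(token[start:start + size])
    return grams


# WhitespaceTokenizerFactory, then minGramSize="1" maxGramSize="2"
terms = ngram_filter("to be".split(), 1, 2)
print(terms)  # ['t', 'to', 'o', 'b', 'be', 'e']
</code></pre>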
 </section>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenFilter.html">NGramTokenFilter</a></h4>
       <section><p>Tokenizes the input into n-grams of the given size(s).
 <p>You must specify the required <span class="xref">Lucene.Net.Util.LuceneVersion</span> compatibility when
 creating an <a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenFilter.html">NGramTokenFilter</a>. As of Lucene 4.4, this token filter:
 <ul><li>handles supplementary characters correctly,</li><li>emits all n-grams for the same token at the same position,</li><li>does not modify offsets, and</li><li>sorts n-grams first by their start offset in the original token, then by
         increasing length (so with minGram=1 and maxGram=3, &quot;abc&quot; gives &quot;a&quot;, &quot;ab&quot;, &quot;abc&quot;, &quot;b&quot;, &quot;bc&quot;,
         &quot;c&quot;).</li></ul>
 </p>
 <p>You can make this filter use the old behavior by passing a version earlier than
 <a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00014/api/core/Lucene.Net.Util.LuceneVersion.html#Lucene_Net_Util_LuceneVersion_LUCENE_44">LUCENE_44</a> to the constructor, but this is not recommended because
 it produces broken <span class="xref">Lucene.Net.Analysis.TokenStream</span>s that cause highlighting
 bugs.
 </p>
 <p>If you were using this <span class="xref">Lucene.Net.Analysis.TokenFilter</span> to perform partial highlighting,
 this no longer works because the filter doesn&apos;t update offsets. Instead, change
 your analysis chain to use <a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenizer.html">NGramTokenizer</a> and, if needed,
 override <a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenizer.html#Lucene_Net_Analysis_NGram_NGramTokenizer_IsTokenChar_System_Int32_">IsTokenChar(Int32)</a> to perform pre-tokenization.
 </p></p>
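 <p>The 4.4+ behavior described above can be modeled as follows, assuming each input token arrives as a hypothetical <code>(term, start_offset, end_offset)</code> tuple with a position increment of 1: the first gram of a token keeps that increment, later grams get 0 so they stack on the same position, and every gram reports the token&apos;s original offsets. This Python sketch models the documented output; it is not the Lucene.Net class.</p>
 <pre><code class="lang-python">def ngram_token_filter(tokens, min_gram, max_gram):
    # tokens: (term, start_offset, end_offset) tuples, each at position increment 1.
    # Returns (gram, position_increment, start_offset, end_offset) tuples.
    out = []
    for term, start_offset, end_offset in tokens:
        pos_inc = 1  # the first gram takes the token's own increment
        for start in range(len(term)):
            longest = min(max_gram, len(term) - start)
            for size in range(min_gram, longest + 1):
                out.append((term[start:start + size], pos_inc, start_offset, end_offset))
                pos_inc = 0  # later grams stack on the same position


    return out


for gram in ngram_token_filter([("abc", 10, 13)], 1, 3):
    print(gram)
# ('a', 1, 10, 13)
# ('ab', 0, 10, 13)
# ('abc', 0, 10, 13)
# ('b', 0, 10, 13)
# ('bc', 0, 10, 13)
# ('c', 0, 10, 13)
</code></pre>
 <p>Since every gram reports offsets 10 to 13, a highlighter has no way to tell where &quot;bc&quot; sits inside the token, which is why partial highlighting calls for NGramTokenizer instead.</p>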
 </section>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenizer.html">NGramTokenizer</a></h4>
       <section><p>Tokenizes the input into n-grams of the given size(s).
 <p>Unlike <a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenFilter.html">NGramTokenFilter</a>, this class sets offsets so
 that the characters between startOffset and endOffset in the original stream are
 exactly the characters of the term.
 </p>
 <p>For example, with minGram=2 and maxGram=3, &quot;abcde&quot; is tokenized as:
 <table><thead><tr><th>Term</th><th>Position increment</th><th>Position length</th><th>Offsets</th></tr></thead><tbody>
 <tr><td>ab</td><td>1</td><td>1</td><td>[0,2[</td></tr>
 <tr><td>abc</td><td>1</td><td>1</td><td>[0,3[</td></tr>
 <tr><td>bc</td><td>1</td><td>1</td><td>[1,3[</td></tr>
 <tr><td>bcd</td><td>1</td><td>1</td><td>[1,4[</td></tr>
 <tr><td>cd</td><td>1</td><td>1</td><td>[2,4[</td></tr>
 <tr><td>cde</td><td>1</td><td>1</td><td>[2,5[</td></tr>
 <tr><td>de</td><td>1</td><td>1</td><td>[3,5[</td></tr>
 </tbody></table>
 </p>
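 <p>The rows of this table can be regenerated with a short Python model that emits grams by increasing start offset, then increasing length, and gives every gram a position increment and position length of 1. It assumes the default IsTokenChar(Int32), which accepts every character; <code>ngram_tokenize</code> is a hypothetical name, and this is a sketch rather than the Lucene.Net code.</p>
 <pre><code class="lang-python">def ngram_tokenize(text, min_gram, max_gram):
    # Returns (term, position_increment, position_length, start, end) tuples.
    rows = []
    for start in range(len(text)):
        longest = min(max_gram, len(text) - start)
        for size in range(min_gram, longest + 1):
            rows.append((text[start:start + size], 1, 1, start, start + size))
    return rows


# Offsets are printed as half-open intervals [start,end[ to match the table.
for term, pos_inc, pos_len, start, end in ngram_tokenize("abcde", 2, 3):
    print(f"{term:4}{pos_inc:2}{pos_len:2}  [{start},{end}[")
</code></pre>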
 <p>This tokenizer was substantially reworked in Lucene 4.4 in order to:
 <ul><li>tokenize in a streaming fashion, supporting streams larger
         than 1024 chars (the limit of the previous version),</li><li>count grams in Unicode code points instead of UTF-16 chars (and
         never split a surrogate pair),</li><li>allow pre-tokenizing the stream with <a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenizer.html#Lucene_Net_Analysis_NGram_NGramTokenizer_IsTokenChar_System_Int32_">IsTokenChar(Int32)</a>
         before computing n-grams.</li></ul>
 </p>
 <p>Additionally, this class doesn&apos;t trim trailing whitespace and emits
 tokens in a different order: tokens are now emitted by increasing start
 offset, whereas they used to be emitted by increasing length (which prevented
 support for large input streams).
 </p>
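 <p>The change in emission order is easiest to see side by side. This Python sketch lists the grams of &quot;abcd&quot; (minGram=1, maxGram=2) in the pre-4.4 order, grouped by length, and in the 4.4+ order, grouped by start offset. The helper names are hypothetical, and the sketch ignores the old version&apos;s trimming.</p>
 <pre><code class="lang-python">def grams_by_length(text, min_gram, max_gram):
    # Pre-4.4 order: every gram of one size across the whole input, then the next size.
    return [text[s:s + n]
            for n in range(min_gram, max_gram + 1)
            for s in range(len(text) - n + 1)]


def grams_by_offset(text, min_gram, max_gram):
    # 4.4+ order: all grams starting at offset 0, then offset 1, and so on.
    return [text[s:s + n]
            for s in range(len(text))
            for n in range(min_gram, min(max_gram, len(text) - s) + 1)]


print(grams_by_length("abcd", 1, 2))  # ['a', 'b', 'c', 'd', 'ab', 'bc', 'cd']
print(grams_by_offset("abcd", 1, 2))  # ['a', 'ab', 'b', 'bc', 'c', 'cd', 'd']
</code></pre>
 <p>In this model, ordering by length means the last 1-gram must be produced before the first 2-gram, so the whole input has to be available up front; ordering by start offset only needs the next <code>maxGram</code> characters at any point, which is what makes streaming possible.</p>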
 <p>Although <strong>highly</strong> discouraged, it is still possible
 to use the old behavior through <a class="xref" href="Lucene.Net.Analysis.NGram.Lucene43NGramTokenizer.html">Lucene43NGramTokenizer</a>.
 </p></p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenizerFactory.html">NGramTokenizerFactory</a></h4>
       <section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.NGram.NGramTokenizer.html">NGramTokenizer</a>.</p>
 <pre><code>&lt;fieldType name=&quot;text_ngrm&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
   &lt;analyzer>
     &lt;tokenizer class=&quot;solr.NGramTokenizerFactory&quot; minGramSize=&quot;1&quot; maxGramSize=&quot;2&quot;/>
   &lt;/analyzer>
 &lt;/fieldType></code></pre>
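 <p>Because this field type uses NGramTokenizer directly instead of a whitespace tokenizer, grams are computed over the whole field value with the default IsTokenChar(Int32), so grams that contain or consist of whitespace are produced as well. A Python sketch of the terms for a short value, under that assumption (hypothetical helper, not the Lucene.Net API):</p>
 <pre><code class="lang-python">def ngram_tokenizer_terms(text, min_gram, max_gram):
    # Default pre-tokenization accepts every character, whitespace included.
    return [text[s:s + n]
            for s in range(len(text))
            for n in range(min_gram, min(max_gram, len(text) - s) + 1)]


# minGramSize="1" maxGramSize="2", as in the fieldType above
print(ngram_tokenizer_terms("to be", 1, 2))
# ['t', 'to', 'o', 'o ', ' ', ' b', 'b', 'be', 'e']
</code></pre>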
 </section>
     <h3 id="enums">Enums
   </h3>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.EdgeNGramTokenFilter.Side.html">EdgeNGramTokenFilter.Side</a></h4>
       <section><p>Specifies which side of the input the n-gram should be generated from </p>
 </section>
       <h4><a class="xref" href="Lucene.Net.Analysis.NGram.Lucene43EdgeNGramTokenizer.Side.html">Lucene43EdgeNGramTokenizer.Side</a></h4>
       <section><p>Specifies which side of the input the n-gram should be generated from </p>
 </section>
 </article>
           </div>

           <div class="hidden-sm col-md-2" role="complementary">
             <div class="sideaffix">
               <div class="contribution">
                 <ul class="nav">
                   <li>
                     <a href="https://github.com/apache/lucenenet/blob/docs/4.8.0-beta00014/src/Lucene.Net.Analysis.Common/Analysis/NGram/package.md/#L2" class="contribution-link">Improve this Doc</a>
                   </li>
                 </ul>
               </div>
               <nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
               <!-- <p><a class="back-to-top" href="#top">Back to top</a><p> -->
               </nav>
             </div>
           </div>
         </div>
       </div>

       <footer>
         <div class="grad-bottom"></div>
         <div class="footer">
           <div class="container">
             <span class="pull-right">
               <a href="#top">Back to top</a>
             </span>
             Copyright &copy; 2021 The Apache Software Foundation, Licensed under the <a href='http://www.apache.org/licenses/LICENSE-2.0' target='_blank'>Apache License, Version 2.0</a><br> <small>Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation. <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</small>

           </div>
         </div>
       </footer>
     </div>

     <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script>
     <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script>
     <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script>
   </body>
 </html>