<!DOCTYPE html>
<!--[if IE]><![endif]-->
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Namespace Lucene.Net.Analysis
| Apache Lucene.NET 4.8.0 Documentation </title>
<meta name="viewport" content="width=device-width">
<meta name="title" content="Namespace Lucene.Net.Analysis
| Apache Lucene.NET 4.8.0 Documentation ">
<meta name="generator" content="docfx 2.47.0.0">
<link rel="shortcut icon" href="../../logo/favicon.ico">
<link rel="stylesheet" href="../../styles/docfx.vendor.css">
<link rel="stylesheet" href="../../styles/docfx.css">
<link rel="stylesheet" href="../../styles/main.css">
<meta property="docfx:navrel" content="../../toc.html">
<meta property="docfx:tocrel" content="../toc.html">
<meta property="docfx:rel" content="../../">
</head>
<body data-spy="scroll" data-target="#affix" data-offset="120">
<div id="wrapper">
<header>
<nav id="autocollapse" class="navbar ng-scope" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="../../index.html">
<img id="logo" class="svg" src="../../logo/lucene-net-color.png" alt="">
</a>
</div>
<div class="collapse navbar-collapse" id="navbar">
<form class="navbar-form navbar-right" role="search" id="search">
<div class="form-group">
<input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off">
</div>
</form>
</div>
</div>
</nav>
<div class="subnav navbar navbar-default">
<div class="container hide-when-search" id="breadcrumb">
<ul class="breadcrumb">
<li></li>
</ul>
</div>
</div>
</header>
<div class="container body-content">
<div id="search-results">
<div class="search-list"></div>
<div class="sr-items">
<p><i class="glyphicon glyphicon-refresh index-loading"></i></p>
</div>
<ul id="pagination"></ul>
</div>
</div>
<div role="main" class="container body-content hide-when-search">
<div class="sidenav hide-when-search">
<a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a>
<div class="sidetoggle collapse" id="sidetoggle">
<div id="sidetoc"></div>
</div>
</div>
<div class="article row grid-right">
<div class="col-md-10">
<article class="content wrap" id="_content" data-uid="Lucene.Net.Analysis">
<h1 id="Lucene_Net_Analysis" data-uid="Lucene.Net.Analysis" class="text-break">Namespace Lucene.Net.Analysis
</h1>
<div class="markdown level0 summary"><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>Support for testing analysis components.</p>
<p> The main classes of interest are:</p>
<ul><li><a class="xref" href="../Lucene.Net.TestFramework/Lucene.Net.Analysis.BaseTokenStreamTestCase.html">BaseTokenStreamTestCase</a>: Using its helper methods is highly recommended (especially in conjunction with <a class="xref" href="../Lucene.Net.TestFramework/Lucene.Net.Analysis.MockAnalyzer.html">MockAnalyzer</a> or <a class="xref" href="../Lucene.Net.TestFramework/Lucene.Net.Analysis.MockTokenizer.html">MockTokenizer</a>), as it contains many assertions and checks to catch bugs.</li><li><a class="xref" href="../Lucene.Net.TestFramework/Lucene.Net.Analysis.MockTokenizer.html">MockTokenizer</a>: Tokenizer for testing that serves as a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it&apos;s a great idea to test it by wrapping this tokenizer instead, for the extra checks.</li><li><a class="xref" href="../Lucene.Net.TestFramework/Lucene.Net.Analysis.MockAnalyzer.html">MockAnalyzer</a>: Analyzer for testing that uses MockTokenizer for additional verification. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it&apos;s a great idea to test it with this analyzer instead.</li></ul>
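<p>For illustration only, a minimal sketch of a test following this advice. The filter under test (<code>MyFilter</code>) and the expected tokens are hypothetical, the test uses an NUnit-style <code>[Test]</code> attribute, and the <code>AssertAnalyzesTo</code> call shown is the simplest overload:</p>
<pre><code>public class TestMyFilter : BaseTokenStreamTestCase
{
    [Test]
    public void TestSimpleTokens()
    {
        Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
        {
            // MockTokenizer adds extra state-machine checks that catch contract violations.
            Tokenizer tokenizer = new MockTokenizer(reader, MockTokenizer.WHITESPACE, false);
            return new TokenStreamComponents(tokenizer, new MyFilter(tokenizer));
        });

        // Verifies the produced terms (other overloads also check offsets, types, etc.).
        AssertAnalyzesTo(analyzer, "some input text", new[] { "some", "input", "text" });
    }
}</code></pre>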
</div>
<div class="markdown level0 conceptual"></div>
<div class="markdown level0 remarks"></div>
<h3 id="classes">Classes
</h3>
<h4><a class="xref" href="Lucene.Net.Analysis.Analyzer.html">Analyzer</a></h4>
<section><p>An <a class="xref" href="Lucene.Net.Analysis.Analyzer.html">Analyzer</a> builds <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>s, which analyze text. It thus represents a
policy for extracting index terms from text.
<p>
In order to define what analysis is done, subclasses must define their
<a class="xref" href="Lucene.Net.Analysis.TokenStreamComponents.html">TokenStreamComponents</a> in <a class="xref" href="Lucene.Net.Analysis.Analyzer.html#Lucene_Net_Analysis_Analyzer_CreateComponents_System_String_System_IO_TextReader_">CreateComponents(String, TextReader)</a>.
The components are then reused in each call to <a class="xref" href="Lucene.Net.Analysis.Analyzer.html#Lucene_Net_Analysis_Analyzer_GetTokenStream_System_String_System_IO_TextReader_">GetTokenStream(String, TextReader)</a>.
<p>
Simple example:</p>
<pre><code>Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
Tokenizer source = new FooTokenizer(reader);
TokenStream filter = new FooFilter(source);
filter = new BarFilter(filter);
return new TokenStreamComponents(source, filter);
});</code></pre>
<p>For more examples, see the <a class="xref" href="../Lucene.Net.TestFramework/Lucene.Net.Analysis.html">Lucene.Net.Analysis</a> namespace documentation.
<p>
For some concrete implementations bundled with Lucene, look in the analysis modules:
<ul><li>Common:
Analyzers for indexing content in different languages and domains.</li><li>ICU:
Exposes functionality from ICU to Apache Lucene.</li><li>Kuromoji:
Morphological analyzer for Japanese text.</li><li>Morfologik:
Dictionary-driven lemmatization for the Polish language.</li><li>Phonetic:
Analysis for indexing phonetic signatures (for sounds-alike search).</li><li>Smart Chinese:
Analyzer for Simplified Chinese, which indexes words.</li><li>Stempel:
Algorithmic Stemmer for the Polish Language.</li><li>UIMA:
Analysis integration with Apache UIMA.</li></ul></p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Analyzer.GlobalReuseStrategy.html">Analyzer.GlobalReuseStrategy</a></h4>
<section><p>Implementation of <a class="xref" href="Lucene.Net.Analysis.ReuseStrategy.html">ReuseStrategy</a> that reuses the same components for
every field. </p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Analyzer.PerFieldReuseStrategy.html">Analyzer.PerFieldReuseStrategy</a></h4>
<section><p>Implementation of <a class="xref" href="Lucene.Net.Analysis.ReuseStrategy.html">ReuseStrategy</a> that reuses components per-field by
maintaining a Map of <a class="xref" href="Lucene.Net.Analysis.TokenStreamComponents.html">TokenStreamComponents</a> per field name.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.AnalyzerWrapper.html">AnalyzerWrapper</a></h4>
<section><p>Extension to <a class="xref" href="Lucene.Net.Analysis.Analyzer.html">Analyzer</a> suitable for <a class="xref" href="Lucene.Net.Analysis.Analyzer.html">Analyzer</a>s which wrap
other <a class="xref" href="Lucene.Net.Analysis.Analyzer.html">Analyzer</a>s.
<p>
<a class="xref" href="Lucene.Net.Analysis.AnalyzerWrapper.html#Lucene_Net_Analysis_AnalyzerWrapper_GetWrappedAnalyzer_System_String_">GetWrappedAnalyzer(String)</a> allows the <a class="xref" href="Lucene.Net.Analysis.Analyzer.html">Analyzer</a>
to wrap multiple <a class="xref" href="Lucene.Net.Analysis.Analyzer.html">Analyzer</a>s which are selected on a per field basis.
<p>
<a class="xref" href="Lucene.Net.Analysis.AnalyzerWrapper.html#Lucene_Net_Analysis_AnalyzerWrapper_WrapComponents_System_String_Lucene_Net_Analysis_TokenStreamComponents_">WrapComponents(String, TokenStreamComponents)</a> allows the
<a class="xref" href="Lucene.Net.Analysis.TokenStreamComponents.html">TokenStreamComponents</a> of the wrapped <a class="xref" href="Lucene.Net.Analysis.Analyzer.html">Analyzer</a> to then be wrapped
(such as adding a new <a class="xref" href="Lucene.Net.Analysis.TokenFilter.html">TokenFilter</a> to form new <a class="xref" href="Lucene.Net.Analysis.TokenStreamComponents.html">TokenStreamComponents</a>).</p>
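<p>A minimal sketch of a subclass. The field name is illustrative, the wrapped analyzers and filter (StandardAnalyzer, KeywordAnalyzer, LowerCaseFilter) are assumed to come from the Common analysis module, and the <code>Tokenizer</code>/<code>TokenStream</code> property names on <a class="xref" href="Lucene.Net.Analysis.TokenStreamComponents.html">TokenStreamComponents</a> are assumptions:</p>
<pre><code>public sealed class MyPerFieldWrapper : AnalyzerWrapper
{
    private readonly Analyzer defaultAnalyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
    private readonly Analyzer keywordAnalyzer = new KeywordAnalyzer();

    public MyPerFieldWrapper()
        : base(Analyzer.PER_FIELD_REUSE_STRATEGY) // cache components per field name
    {
    }

    protected override Analyzer GetWrappedAnalyzer(string fieldName)
    {
        // Untokenized identifiers get the keyword analyzer; everything else the default.
        return fieldName == "id" ? keywordAnalyzer : defaultAnalyzer;
    }

    protected override TokenStreamComponents WrapComponents(string fieldName, TokenStreamComponents components)
    {
        // Wrap the sink of the wrapped analyzer with an additional TokenFilter.
        TokenStream sink = new LowerCaseFilter(LuceneVersion.LUCENE_48, components.TokenStream);
        return new TokenStreamComponents(components.Tokenizer, sink);
    }
}</code></pre>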
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.CachingTokenFilter.html">CachingTokenFilter</a></h4>
<section><p>This class can be used if the token attributes of a <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>
are intended to be consumed more than once. It caches
all token attribute states locally in a List.</p>
<p><a class="xref" href="Lucene.Net.Analysis.CachingTokenFilter.html">CachingTokenFilter</a> implements the optional method
<a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_Reset">Reset()</a>, which repositions the
stream to the first <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a>.</p>
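<p>A minimal consumption sketch (the analyzer, field name, and input text are illustrative; depending on the version, the underlying stream may need to be reset before the first pass, as done here):</p>
<pre><code>TokenStream source = analyzer.GetTokenStream("body", new StringReader("some content"));
source.Reset(); // reset the wrapped stream; the caching filter pulls from it on the first pass
TokenStream cached = new CachingTokenFilter(source);
ICharTermAttribute termAtt = cached.AddAttribute&lt;ICharTermAttribute&gt;();

// First pass: tokens are pulled from the wrapped stream and cached locally.
while (cached.IncrementToken())
{
    Console.WriteLine(termAtt.ToString());
}

// Second pass: Reset() repositions the cached stream to the first token.
cached.Reset();
while (cached.IncrementToken())
{
    // consume the same tokens again without re-running the analysis chain
}
cached.End();
cached.Dispose();</code></pre>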
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.CharFilter.html">CharFilter</a></h4>
<section><p>Subclasses of <a class="xref" href="Lucene.Net.Analysis.CharFilter.html">CharFilter</a> can be chained to filter a <span class="xref">System.IO.TextReader</span>.
They can be used as a <span class="xref">System.IO.TextReader</span> with additional offset
correction. <a class="xref" href="Lucene.Net.Analysis.Tokenizer.html">Tokenizer</a>s will automatically use <a class="xref" href="Lucene.Net.Analysis.CharFilter.html#Lucene_Net_Analysis_CharFilter_CorrectOffset_System_Int32_">CorrectOffset(Int32)</a>
if a <a class="xref" href="Lucene.Net.Analysis.CharFilter.html">CharFilter</a> subclass is used.
<p>
This class is abstract: at a minimum you must implement <span class="xref">System.IO.TextReader.Read(System.Char[], System.Int32, System.Int32)</span>,
transforming the input in some way from <a class="xref" href="Lucene.Net.Analysis.CharFilter.html#Lucene_Net_Analysis_CharFilter_m_input">m_input</a>, and <a class="xref" href="Lucene.Net.Analysis.CharFilter.html#Lucene_Net_Analysis_CharFilter_Correct_System_Int32_">Correct(Int32)</a>
to adjust the offsets to match the originals.
<p>
You can optionally provide more efficient implementations of additional methods
like <span class="xref">System.IO.TextReader.Read()</span>, but this is not required.
<p>
For examples and integration with <a class="xref" href="Lucene.Net.Analysis.Analyzer.html">Analyzer</a>, see the
<a class="xref" href="../Lucene.Net.TestFramework/Lucene.Net.Analysis.html">Lucene.Net.Analysis</a> namespace documentation.</p>
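<p>A minimal sketch of such a subclass, with the class name illustrative and the <code>Read</code>/<code>Correct</code> signatures assumed from the description above. It rewrites characters one-for-one, so offset correction is the identity function:</p>
<pre><code>public sealed class UpperCaseCharFilter : CharFilter
{
    public UpperCaseCharFilter(TextReader input)
        : base(input)
    {
    }

    public override int Read(char[] buffer, int index, int count)
    {
        // Delegate to the wrapped reader (m_input), then transform what was read.
        int read = m_input.Read(buffer, index, count);
        for (int i = 0; i &lt; read; i++)
        {
            buffer[index + i] = char.ToUpperInvariant(buffer[index + i]);
        }
        return read;
    }

    protected override int Correct(int currentOff)
    {
        return currentOff; // characters were replaced 1:1, so offsets are unchanged
    }
}</code></pre>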
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.NumericTokenStream.html">NumericTokenStream</a></h4>
<section><p><strong>Expert:</strong> this class provides a <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>
for indexing numeric values that can be used by <a class="xref" href="Lucene.Net.Search.NumericRangeQuery.html">NumericRangeQuery</a>
or <a class="xref" href="Lucene.Net.Search.NumericRangeFilter.html">NumericRangeFilter</a>.</p>
<p>Note that for simple usage, <a class="xref" href="Lucene.Net.Documents.Int32Field.html">Int32Field</a>, <a class="xref" href="Lucene.Net.Documents.Int64Field.html">Int64Field</a>,
<a class="xref" href="Lucene.Net.Documents.SingleField.html">SingleField</a> or <a class="xref" href="Lucene.Net.Documents.DoubleField.html">DoubleField</a> is
recommended. These fields disable norms and
term freqs, as they are not usually needed during
searching. If you need to change these settings, you
should use this class.
<p>Here&apos;s an example usage for a <span class="xref">System.Int32</span> field:
<pre><code> IndexableFieldType fieldType = new IndexableFieldType(TextField.TYPE_NOT_STORED)
{
OmitNorms = true,
IndexOptions = IndexOptions.DOCS_ONLY
};
Field field = new Field(name, new NumericTokenStream(precisionStep).SetInt32Value(value), fieldType);
document.Add(field);</code></pre>
<p>For optimal performance, re-use the <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> and <a class="xref" href="Lucene.Net.Documents.Field.html">Field</a> instance
for more than one document:
<pre><code> NumericTokenStream stream = new NumericTokenStream(precisionStep);
IndexableFieldType fieldType = new IndexableFieldType(TextField.TYPE_NOT_STORED)
{
OmitNorms = true,
IndexOptions = IndexOptions.DOCS_ONLY
};
Field field = new Field(name, stream, fieldType);
Document document = new Document();
document.Add(field);
foreach (var value in values) // one iteration per document; 'values' is illustrative
{
    stream.SetInt32Value(value);
writer.AddDocument(document);
}</code></pre>
<p>This stream is not intended to be used in analyzers;
rather, it is for iterating over the different precisions when
indexing a specific numeric value.</p>
<p><strong>NOTE</strong>: as token streams are only consumed once
the document is added to the index, if you index more
than one numeric field, use a separate <a class="xref" href="Lucene.Net.Analysis.NumericTokenStream.html">NumericTokenStream</a>
instance for each.</p>
<p>See <a class="xref" href="Lucene.Net.Search.NumericRangeQuery.html">NumericRangeQuery</a> for more details on the
<code>precisionStep</code> parameter as well as how numeric fields work under the hood.
</p>
<p>@since 2.9</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.NumericTokenStream.NumericTermAttribute.html">NumericTokenStream.NumericTermAttribute</a></h4>
<section><p>Implementation of <a class="xref" href="Lucene.Net.Analysis.NumericTokenStream.INumericTermAttribute.html">NumericTokenStream.INumericTermAttribute</a>.
@lucene.internal
@since 4.0</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.ReusableStringReader.html">ReusableStringReader</a></h4>
<section><p>Internal class to enable reuse of the string reader by <a class="xref" href="Lucene.Net.Analysis.Analyzer.html#Lucene_Net_Analysis_Analyzer_GetTokenStream_System_String_System_String_">GetTokenStream(String, String)</a></p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.ReuseStrategy.html">ReuseStrategy</a></h4>
<section><p>Strategy defining how <a class="xref" href="Lucene.Net.Analysis.TokenStreamComponents.html">TokenStreamComponents</a> are reused per call to
<a class="xref" href="Lucene.Net.Analysis.Analyzer.html#Lucene_Net_Analysis_Analyzer_GetTokenStream_System_String_System_IO_TextReader_">GetTokenStream(String, TextReader)</a>.</p>
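<p>For illustration, a sketch that picks a strategy explicitly. It assumes an <code>Analyzer.NewAnonymous</code> overload that accepts a <a class="xref" href="Lucene.Net.Analysis.ReuseStrategy.html">ReuseStrategy</a> and reuses the hypothetical FooTokenizer/FooFilter from the <a class="xref" href="Lucene.Net.Analysis.Analyzer.html">Analyzer</a> example above:</p>
<pre><code>// One cached TokenStreamComponents instance shared by every field:
Analyzer globalReuse = Analyzer.NewAnonymous((fieldName, reader) =>
{
    Tokenizer source = new FooTokenizer(reader);
    return new TokenStreamComponents(source, new FooFilter(source));
}, Analyzer.GLOBAL_REUSE_STRATEGY);

// One cached TokenStreamComponents instance per field name:
Analyzer perFieldReuse = Analyzer.NewAnonymous((fieldName, reader) =>
{
    Tokenizer source = new FooTokenizer(reader);
    return new TokenStreamComponents(source, new FooFilter(source));
}, Analyzer.PER_FIELD_REUSE_STRATEGY);</code></pre>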
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a></h4>
<section><p>A <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a> is an occurrence of a term from the text of a field. It consists of
a term&apos;s text, the start and end offset of the term in the text of the field,
and a type string.
<p>
The start and end offsets permit applications to re-associate a token with
its source text, e.g., to display highlighted query terms in a document
browser, or to show matching text fragments in a KWIC (KeyWord In Context)
display, etc.
<p>
The type is a string, assigned by a lexical analyzer
(a.k.a. tokenizer), naming the lexical or syntactic class that the token
belongs to. For example an end of sentence marker token might be implemented
with type &quot;eos&quot;. The default token type is &quot;word&quot;.
<p>
A Token can optionally have metadata (a.k.a. payload) in the form of a variable
length byte array. Use <a class="xref" href="Lucene.Net.Index.DocsAndPositionsEnum.html#Lucene_Net_Index_DocsAndPositionsEnum_GetPayload">GetPayload()</a> to retrieve the
payloads from the index.</p>
<p><strong>NOTE:</strong> As of 2.9, Token implements all <a class="xref" href="Lucene.Net.Util.IAttribute.html">IAttribute</a> interfaces
that are part of core Lucene and can be found in the <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.html">Lucene.Net.Analysis.TokenAttributes</a> namespace.
Even though it is not necessary to use <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a> anymore, with the new <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> API it can
be used as a convenience class that implements all <a class="xref" href="Lucene.Net.Util.IAttribute.html">IAttribute</a>s, which is especially useful
to easily switch from the old to the new <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> API.
<p><a class="xref" href="Lucene.Net.Analysis.Tokenizer.html">Tokenizer</a>s and <a class="xref" href="Lucene.Net.Analysis.TokenFilter.html">TokenFilter</a>s should try to re-use a <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a>
instance when possible for best performance, by
implementing the <a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_IncrementToken">IncrementToken()</a> API.
Failing that, to create a new <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a> you should first use
one of the constructors that starts with null text. To load
the token from a char[] use <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute.html#Lucene_Net_Analysis_TokenAttributes_ICharTermAttribute_CopyBuffer_System_Char___System_Int32_System_Int32_">CopyBuffer(Char[], Int32, Int32)</a>.
To load from a <span class="xref">System.String</span> use <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute.html#Lucene_Net_Analysis_TokenAttributes_ICharTermAttribute_SetEmpty">SetEmpty()</a> followed by
<a class="xref" href="Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute.html#Lucene_Net_Analysis_TokenAttributes_ICharTermAttribute_Append_System_String_">Append(String)</a> or <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute.html#Lucene_Net_Analysis_TokenAttributes_ICharTermAttribute_Append_System_String_System_Int32_System_Int32_">Append(String, Int32, Int32)</a>.
Alternatively you can get the <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a>&apos;s termBuffer by calling either <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute.html#Lucene_Net_Analysis_TokenAttributes_ICharTermAttribute_Buffer">Buffer</a>,
if you know that your text is shorter than the capacity of the termBuffer,
or <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute.html#Lucene_Net_Analysis_TokenAttributes_ICharTermAttribute_ResizeBuffer_System_Int32_">ResizeBuffer(Int32)</a>, if there is any possibility
that you may need to grow the buffer. Fill in the characters of your term into this
buffer, with <span class="xref">System.String.ToCharArray(System.Int32,System.Int32)</span> if loading from a string,
or with <span class="xref">System.Array.Copy(System.Array,System.Int32,System.Array,System.Int32,System.Int32)</span>,
and finally call <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute.html#Lucene_Net_Analysis_TokenAttributes_ICharTermAttribute_SetLength_System_Int32_">SetLength(Int32)</a> to
set the length of the term text. See <a target="_top" href="https://issues.apache.org/jira/browse/LUCENE-969">LUCENE-969</a>
for details.</p>
<p>Typical Token reuse patterns:
<ul><li> Copying text from a string (type is reset to <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.TypeAttribute.html#Lucene_Net_Analysis_TokenAttributes_TypeAttribute_DEFAULT_TYPE">DEFAULT_TYPE</a> if not specified):
<pre><code> return reusableToken.Reinit(string, startOffset, endOffset[, type]);</code></pre>
</li><li> Copying some text from a string (type is reset to <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.TypeAttribute.html#Lucene_Net_Analysis_TokenAttributes_TypeAttribute_DEFAULT_TYPE">DEFAULT_TYPE</a> if not specified):
<pre><code> return reusableToken.Reinit(string, 0, string.Length, startOffset, endOffset[, type]);</code></pre>
</li><li> Copying text from char[] buffer (type is reset to <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.TypeAttribute.html#Lucene_Net_Analysis_TokenAttributes_TypeAttribute_DEFAULT_TYPE">DEFAULT_TYPE</a> if not specified):
<pre><code> return reusableToken.Reinit(buffer, 0, buffer.Length, startOffset, endOffset[, type]);</code></pre>
</li><li> Copying some text from a char[] buffer (type is reset to <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.TypeAttribute.html#Lucene_Net_Analysis_TokenAttributes_TypeAttribute_DEFAULT_TYPE">DEFAULT_TYPE</a> if not specified):
<pre><code> return reusableToken.Reinit(buffer, start, end - start, startOffset, endOffset[, type]);</code></pre>
</li><li> Copying from one <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a> to another (type is reset to <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.TypeAttribute.html#Lucene_Net_Analysis_TokenAttributes_TypeAttribute_DEFAULT_TYPE">DEFAULT_TYPE</a> if not specified):
<pre><code> return reusableToken.Reinit(source.Buffer, 0, source.Length, source.StartOffset, source.EndOffset[, source.Type]);</code></pre>
</li></ul>
A few things to note:
<ul><li><a class="xref" href="Lucene.Net.Analysis.Token.html#Lucene_Net_Analysis_Token_Clear">Clear()</a> initializes all of the fields to default values. This was changed in contrast to Lucene 2.4, but should affect no one.</li><li>Because <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>s can be chained, one cannot assume that the <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a>&apos;s current type is correct.</li><li>The startOffset and endOffset represent the start and end offsets in the source text, so be careful in adjusting them.</li><li>When caching a reusable token, clone it. When injecting a cached token into a stream that can be reset, clone it again.</li></ul>
</p>
<p>
<strong>Please note:</strong> With Lucene 3.1, the <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.CharTermAttribute.html#Lucene_Net_Analysis_TokenAttributes_CharTermAttribute_ToString">ToString()</a> method had to be changed to match the
<a class="xref" href="Lucene.Net.Support.ICharSequence.html">ICharSequence</a> interface introduced by the <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute.html">ICharTermAttribute</a> interface.
This method now prints only the term text, with no additional information.
</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Token.TokenAttributeFactory.html">Token.TokenAttributeFactory</a></h4>
<section><p><strong>Expert:</strong> Creates a <a class="xref" href="Lucene.Net.Analysis.Token.TokenAttributeFactory.html">Token.TokenAttributeFactory</a> that returns <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a> instances for the basic attributes
and calls the given delegate factory for all other attributes.
@since 3.0</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.TokenFilter.html">TokenFilter</a></h4>
<section><p>A <a class="xref" href="Lucene.Net.Analysis.TokenFilter.html">TokenFilter</a> is a <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> whose input is another <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>.
<p>
This is an abstract class; subclasses must override <a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_IncrementToken">IncrementToken()</a>.</p>
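<p>A minimal sketch of a filter that rewrites the term text of each token. The class name is illustrative; the <code>m_input</code> field and the <code>Buffer</code>/<code>Length</code> members of ICharTermAttribute are as referenced elsewhere in this documentation:</p>
<pre><code>public sealed class UpperCaseTokenFilter : TokenFilter
{
    private readonly ICharTermAttribute termAtt;

    public UpperCaseTokenFilter(TokenStream input)
        : base(input)
    {
        termAtt = AddAttribute&lt;ICharTermAttribute&gt;();
    }

    public override bool IncrementToken()
    {
        if (!m_input.IncrementToken())
        {
            return false; // the wrapped stream is exhausted
        }
        char[] buffer = termAtt.Buffer;
        for (int i = 0; i &lt; termAtt.Length; i++)
        {
            buffer[i] = char.ToUpperInvariant(buffer[i]);
        }
        return true;
    }
}</code></pre>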
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Tokenizer.html">Tokenizer</a></h4>
<section><p>A <a class="xref" href="Lucene.Net.Analysis.Tokenizer.html">Tokenizer</a> is a <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> whose input is a <span class="xref">System.IO.TextReader</span>.
<p>
This is an abstract class; subclasses must override <a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_IncrementToken">IncrementToken()</a>
<p>
NOTE: Subclasses overriding <a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_IncrementToken">IncrementToken()</a> must
call <a class="xref" href="Lucene.Net.Util.AttributeSource.html#Lucene_Net_Util_AttributeSource_ClearAttributes">ClearAttributes()</a> before
setting attributes.</p>
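<p>A minimal sketch of such a subclass, with the class name illustrative. Offset and position tracking are omitted for brevity; a real tokenizer would also maintain an offset attribute and override <code>End()</code>:</p>
<pre><code>public sealed class SimpleWhitespaceTokenizer : Tokenizer
{
    private readonly ICharTermAttribute termAtt;

    public SimpleWhitespaceTokenizer(TextReader input)
        : base(input)
    {
        termAtt = AddAttribute&lt;ICharTermAttribute&gt;();
    }

    public override bool IncrementToken()
    {
        ClearAttributes(); // required before setting attributes for the new token

        // Skip leading whitespace; m_input is the protected TextReader of the base class.
        int c = m_input.Read();
        while (c != -1)
        {
            if (!char.IsWhiteSpace((char)c)) break;
            c = m_input.Read();
        }
        if (c == -1)
        {
            return false; // end of input
        }

        // Collect characters up to the next whitespace or end of input.
        termAtt.SetEmpty();
        while (true)
        {
            termAtt.Append((char)c);
            c = m_input.Read();
            if (c == -1) break;
            if (char.IsWhiteSpace((char)c)) break;
        }
        return true;
    }
}</code></pre>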
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a></h4>
<section><p>A <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> enumerates the sequence of tokens, either from
<a class="xref" href="Lucene.Net.Documents.Field.html">Field</a>s of a <a class="xref" href="Lucene.Net.Documents.Document.html">Document</a> or from query text.
<p>
This is an abstract class; concrete subclasses are:
<ul><li><a class="xref" href="Lucene.Net.Analysis.Tokenizer.html">Tokenizer</a>, a <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> whose input is a <span class="xref">System.IO.TextReader</span>; and</li><li><a class="xref" href="Lucene.Net.Analysis.TokenFilter.html">TokenFilter</a>, a <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> whose input is another
<a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>.</li></ul>
A new <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> API has been introduced with Lucene 2.9. This API
has moved from being <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a>-based to <a class="xref" href="Lucene.Net.Util.IAttribute.html">IAttribute</a>-based. While
<a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a> still exists in 2.9 as a convenience class, the preferred way
to store the information of a <a class="xref" href="Lucene.Net.Analysis.Token.html">Token</a> is to use <span class="xref">System.Attribute</span>s.
<p>
<a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> now extends <a class="xref" href="Lucene.Net.Util.AttributeSource.html">AttributeSource</a>, which provides
access to all of the token <a class="xref" href="Lucene.Net.Util.IAttribute.html">IAttribute</a>s for the <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>.
Note that only one instance per <span class="xref">System.Attribute</span> is created and reused
for every token. This approach reduces object creation and allows local
caching of references to the <span class="xref">System.Attribute</span>s. See
<a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_IncrementToken">IncrementToken()</a> for further details.
<p>
<strong>The workflow of the new <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> API is as follows:</strong>
<ol><li>Instantiation of <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>/<a class="xref" href="Lucene.Net.Analysis.TokenFilter.html">TokenFilter</a>s which add/get
attributes to/from the <a class="xref" href="Lucene.Net.Util.AttributeSource.html">AttributeSource</a>.</li><li>The consumer calls <a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_Reset">Reset()</a>.</li><li>The consumer retrieves attributes from the stream and stores local
references to all attributes it wants to access.</li><li>The consumer calls <a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_IncrementToken">IncrementToken()</a> until it returns false,
consuming the attributes after each call.</li><li>The consumer calls <a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_End">End()</a> so that any end-of-stream operations
can be performed.</li><li>The consumer calls <a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_Dispose">Dispose()</a> to release any resource when finished
using the <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>.</li></ol>
To make sure that filters and consumers know which attributes are available,
the attributes must be added during instantiation. Filters and consumers are
not required to check for availability of attributes in
<a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_IncrementToken">IncrementToken()</a>.
<p>
You can find some example code for the new API in the analysis
documentation.
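<p>For instance, a consumer following the steps above might look like the following sketch (the analyzer, field name, and text are illustrative):</p>
<pre><code>TokenStream stream = analyzer.GetTokenStream("body", new StringReader("the quick brown fox"));
try
{
    stream.Reset();                                                         // step 2
    ICharTermAttribute termAtt = stream.AddAttribute&lt;ICharTermAttribute&gt;(); // step 3: local references
    IOffsetAttribute offsetAtt = stream.AddAttribute&lt;IOffsetAttribute&gt;();

    while (stream.IncrementToken())                                         // step 4
    {
        Console.WriteLine("{0} [{1}-{2}]", termAtt.ToString(), offsetAtt.StartOffset, offsetAtt.EndOffset);
    }
    stream.End();                                                           // step 5: end-of-stream operations
}
finally
{
    stream.Dispose();                                                       // step 6: release resources
}</code></pre>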
<p>
Sometimes it is desirable to capture a current state of a <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>,
e.g., for buffering purposes (see <a class="xref" href="Lucene.Net.Analysis.CachingTokenFilter.html">CachingTokenFilter</a>,
TeeSinkTokenFilter). For this use case,
<a class="xref" href="Lucene.Net.Util.AttributeSource.html#Lucene_Net_Util_AttributeSource_CaptureState">CaptureState()</a> and <a class="xref" href="Lucene.Net.Util.AttributeSource.html#Lucene_Net_Util_AttributeSource_RestoreState_Lucene_Net_Util_AttributeSource_State_">RestoreState(AttributeSource.State)</a>
can be used.
<p>The <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a>-API in Lucene is based on the decorator pattern.
Therefore, all non-abstract subclasses must be sealed or at least have a sealed
implementation of <a class="xref" href="Lucene.Net.Analysis.TokenStream.html#Lucene_Net_Analysis_TokenStream_IncrementToken">IncrementToken()</a>! This is checked when assertions are enabled.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.TokenStreamComponents.html">TokenStreamComponents</a></h4>
<section><p>This class encapsulates the outer components of a token stream. It provides
access to the source (<a class="xref" href="Lucene.Net.Analysis.Tokenizer.html">Tokenizer</a>) and the outer end (sink), an
instance of <a class="xref" href="Lucene.Net.Analysis.TokenFilter.html">TokenFilter</a> which also serves as the
<a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> returned by
<a class="xref" href="Lucene.Net.Analysis.Analyzer.html#Lucene_Net_Analysis_Analyzer_GetTokenStream_System_String_System_IO_TextReader_">GetTokenStream(String, TextReader)</a>.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.TokenStreamToAutomaton.html">TokenStreamToAutomaton</a></h4>
<section><p>Consumes a <a class="xref" href="Lucene.Net.Analysis.TokenStream.html">TokenStream</a> and creates an <a class="xref" href="Lucene.Net.Util.Automaton.Automaton.html">Automaton</a>
where the transition labels are UTF8 bytes (or Unicode
code points if unicodeArcs is true) from the <a class="xref" href="Lucene.Net.Analysis.TokenAttributes.ITermToBytesRefAttribute.html">ITermToBytesRefAttribute</a>.
Between tokens we insert
<a class="xref" href="Lucene.Net.Analysis.TokenStreamToAutomaton.html#Lucene_Net_Analysis_TokenStreamToAutomaton_POS_SEP">POS_SEP</a> and for holes we insert <a class="xref" href="Lucene.Net.Analysis.TokenStreamToAutomaton.html#Lucene_Net_Analysis_TokenStreamToAutomaton_HOLE">HOLE</a>.</p>
<div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
<h3 id="interfaces">Interfaces
</h3>
<h4><a class="xref" href="Lucene.Net.Analysis.NumericTokenStream.INumericTermAttribute.html">NumericTokenStream.INumericTermAttribute</a></h4>
<section><p><strong>Expert:</strong> Use this attribute to get the details of the currently generated token.
@lucene.experimental
@since 4.0</p>
</section>
</article>
</div>
<div class="hidden-sm col-md-2" role="complementary">
<div class="sideaffix">
<div class="contribution">
<ul class="nav">
<li>
<a href="https://github.com/apache/lucenenet/blob/docs-4.8.0-beta00007/src/Lucene.Net.TestFramework/Analysis/package.md/#L2" class="contribution-link">Improve this Doc</a>
</li>
</ul>
</div>
<nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
<!-- <p><a class="back-to-top" href="#top">Back to top</a><p> -->
</nav>
</div>
</div>
</div>
</div>
<footer>
<div class="grad-bottom"></div>
<div class="footer">
<div class="container">
<span class="pull-right">
<a href="#top">Back to top</a>
</span>
Copyright © 2020 Licensed to the Apache Software Foundation (ASF)
</div>
</div>
</footer>
</div>
<script type="text/javascript" src="../../styles/docfx.vendor.js"></script>
<script type="text/javascript" src="../../styles/docfx.js"></script>
<script type="text/javascript" src="../../styles/main.js"></script>
</body>
</html>