<!DOCTYPE html>
<!--[if IE]><![endif]-->
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Namespace Lucene.Net.Analysis.CharFilters
| Apache Lucene.NET 4.8.0-beta00011 Documentation </title>
<meta name="viewport" content="width=device-width">
<meta name="title" content="Namespace Lucene.Net.Analysis.CharFilters
| Apache Lucene.NET 4.8.0-beta00011 Documentation ">
<meta name="generator" content="docfx 2.56.0.0">
<link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css">
<meta property="docfx:navrel" content="toc.html">
<meta property="docfx:tocrel" content="analysis-common/toc.html">
<meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/">
</head>
<body data-spy="scroll" data-target="#affix" data-offset="120">
<div id="wrapper">
<header>
<nav id="autocollapse" class="navbar ng-scope" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">
<img id="logo" class="svg" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/lucene-net-color.png" alt="">
</a>
</div>
<div class="collapse navbar-collapse" id="navbar">
<form class="navbar-form navbar-right" role="search" id="search">
<div class="form-group">
<input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off">
</div>
</form>
</div>
</div>
</nav>
<div class="subnav navbar navbar-default">
<div class="container hide-when-search">
<ul class="level0 breadcrumb">
<li>
<a href="https://lucenenet.apache.org/docs/4.8.0-beta00009/">API</a>
<span id="breadcrumb">
<ul class="breadcrumb">
<li></li>
</ul>
</span>
</li>
</ul>
</div>
</div>
</header>
<div class="container body-content">
<div id="search-results">
<div class="search-list"></div>
<div class="sr-items">
<p><i class="glyphicon glyphicon-refresh index-loading"></i></p>
</div>
<ul id="pagination"></ul>
</div>
</div>
<div role="main" class="container body-content hide-when-search">
<div class="sidenav hide-when-search">
<a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a>
<div class="sidetoggle collapse" id="sidetoggle">
<div id="sidetoc"></div>
</div>
</div>
<div class="article row grid-right">
<div class="col-md-10">
<article class="content wrap" id="_content" data-uid="Lucene.Net.Analysis.CharFilters">
<h1 id="Lucene_Net_Analysis_CharFilters" data-uid="Lucene.Net.Analysis.CharFilters" class="text-break">Namespace Lucene.Net.Analysis.CharFilters
</h1>
<div class="markdown level0 summary"><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p> Normalization of text before the tokenizer. </p>
<p> CharFilters are chainable filters that normalize text before tokenization and provide mappings between normalized text offsets and the corresponding offset in the original text. </p>
<h2>CharFilter offset mappings</h2>
<p> CharFilters modify an input stream via a series of substring replacements (including deletions and insertions) to produce an output stream. There are three possible replacement cases: the replacement string has the same length as the original substring; the replacement is shorter; and the replacement is longer. In the latter two cases (when the replacement has a different length than the original), one or more offset correction mappings are required. </p>
<p> When the replacement is shorter than the original (e.g. when the replacement is the empty string), a single offset correction mapping should be added at the replacement&#39;s end offset in the output stream. The <code>cumulativeDiff</code> parameter to the <code>AddOffCorrectMap()</code> method will be the sum of all previous replacement offset adjustments, plus the difference between the lengths of the original substring and the replacement string (a positive value). </p>
<p> When the replacement is longer than the original (e.g. when the original is the empty string), you should add as many offset correction mappings as the difference between the lengths of the replacement string and the original substring, starting at the end offset the original substring would have had in the output stream. For each of these mappings, the <code>cumulativeDiff</code> parameter to the <code>AddOffCorrectMap()</code> method will be the sum of all previous replacement offset adjustments, plus the difference between the lengths of the original substring and the replacement string so far (a negative value). </p>
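<p> The two rules above can be sketched in a few lines of C#. This is a minimal, self-contained model and not Lucene.NET code: <code>OffsetMappingSketch</code>, <code>Replacement</code>, and <code>Apply</code> are hypothetical names, and the replacements are assumed to be sorted by start offset and non-overlapping. Each entry in the returned list stands for one call to <code>AddOffCorrectMap(offset, cumulativeDiff)</code>. </p>

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical model of the offset-mapping rules; it does not use Lucene.NET.
public static class OffsetMappingSketch
{
    // One substring replacement, expressed against the ORIGINAL text.
    public readonly struct Replacement
    {
        public Replacement(int start, int length, string text)
        {
            Start = start;
            Length = length;
            Text = text;
        }

        public int Start { get; }   // start offset in the original text
        public int Length { get; }  // length of the original substring
        public string Text { get; } // replacement string (may be empty)
    }

    // Applies the replacements and returns the output text together with the
    // (offset, cumulativeDiff) pairs that would be recorded along the way.
    // Assumption: replacements are sorted by Start and do not overlap.
    public static (string Output, List<(int Offset, int CumulativeDiff)> Mappings) Apply(
        string input, IEnumerable<Replacement> replacements)
    {
        var output = new StringBuilder();
        var mappings = new List<(int Offset, int CumulativeDiff)>();
        int inputPos = 0;        // next unread position in the original text
        int cumulativeDiff = 0;  // sum of all offset adjustments so far

        foreach (var r in replacements)
        {
            // Copy the unchanged text that precedes this replacement.
            output.Append(input, inputPos, r.Start - inputPos);
            int outputStart = output.Length;
            output.Append(r.Text);
            inputPos = r.Start + r.Length;

            int diff = r.Length - r.Text.Length;
            if (diff > 0)
            {
                // Shorter replacement: one mapping at the replacement's end
                // offset in the output stream.
                cumulativeDiff += diff;
                mappings.Add((output.Length, cumulativeDiff));
            }
            else if (diff < 0)
            {
                // Longer replacement: one mapping per extra character, starting
                // at the end offset the original substring would have had.
                int firstExtra = outputStart + r.Length;
                for (int i = 0; i < -diff; i++)
                {
                    cumulativeDiff -= 1;
                    mappings.Add((firstExtra + i, cumulativeDiff));
                }
            }
            // Same length: no mapping is needed.
        }

        // Copy whatever follows the last replacement.
        output.Append(input, inputPos, input.Length - inputPos);
        return (output.ToString(), mappings);
    }
}
```

<p> For the input <code>x&amp;amp;y&amp;z</code>, replacing <code>&amp;amp;</code> (offset 1, length 5) with <code>&amp;</code> and the later <code>&amp;</code> (offset 7, length 1) with <code>and</code> yields the output <code>x&amp;yandz</code> and the mappings (2, 4), (4, 3), (5, 2): one mapping for the shorter replacement, two for the longer one. </p>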
</div>
<div class="markdown level0 conceptual"></div>
<div class="markdown level0 remarks"></div>
<h3 id="classes">Classes
</h3>
<h4><a class="xref" href="Lucene.Net.Analysis.CharFilters.BaseCharFilter.html">BaseCharFilter</a></h4>
<section><p>Base utility class for implementing a <span class="xref">Lucene.Net.Analysis.CharFilter</span>.
Subclass this, record mappings by calling
<a class="xref" href="Lucene.Net.Analysis.CharFilters.BaseCharFilter.html#Lucene_Net_Analysis_CharFilters_BaseCharFilter_AddOffCorrectMap_System_Int32_System_Int32_">AddOffCorrectMap(Int32, Int32)</a>, and then call the <code>Correct</code>
method to correct an offset.</p>
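<p>The record-then-correct pattern can be illustrated with a small stand-in class. This is a sketch under assumptions, not the BaseCharFilter implementation: <code>OffsetCorrector</code> is a hypothetical name, it only mirrors the two operations described above, and it assumes that a corrected offset is the output offset plus the cumulative diff of the last mapping recorded at or before it.</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in that models recording offset corrections and
// applying them; it does not derive from any Lucene.NET type.
public sealed class OffsetCorrector
{
    private readonly List<int> offsets = new List<int>();
    private readonly List<int> diffs = new List<int>();

    // Records that, from output offset 'off' onward, 'cumulativeDiff' must be
    // added to an output offset to get the offset in the original text.
    public void AddOffCorrectMap(int off, int cumulativeDiff)
    {
        if (offsets.Count > 0 && off < offsets[offsets.Count - 1])
        {
            throw new ArgumentException(
                "Offsets must be added in non-decreasing order.", nameof(off));
        }
        offsets.Add(off);
        diffs.Add(cumulativeDiff);
    }

    // Maps an offset in the filtered output back to the original text by
    // binary-searching for the last mapping whose offset is <= currentOff.
    public int Correct(int currentOff)
    {
        int lo = 0;
        int hi = offsets.Count - 1;
        int found = -1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (offsets[mid] <= currentOff)
            {
                found = mid;
                lo = mid + 1;
            }
            else
            {
                hi = mid - 1;
            }
        }
        // Before the first mapping, output and original offsets coincide.
        return found < 0 ? currentOff : currentOff + diffs[found];
    }
}
```

<p>With the mappings (2, 4), (4, 3) and (5, 2) recorded, <code>Correct(1)</code> returns 1, <code>Correct(2)</code> returns 6, and <code>Correct(6)</code> returns 8.</p>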
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.CharFilters.HTMLStripCharFilter.html">HTMLStripCharFilter</a></h4>
<section><p>A <span class="xref">Lucene.Net.Analysis.CharFilter</span> that wraps another <span class="xref">System.IO.TextReader</span> and attempts to strip out HTML constructs.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.CharFilters.HTMLStripCharFilterFactory.html">HTMLStripCharFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.CharFilters.HTMLStripCharFilter.html">HTMLStripCharFilter</a>. </p>
<pre><code>&lt;fieldType name=&quot;text_html&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;charFilter class=&quot;solr.HTMLStripCharFilterFactory&quot; escapedTags=&quot;a, title&quot; />
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;/analyzer>
&lt;/fieldType></code></pre>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.CharFilters.MappingCharFilter.html">MappingCharFilter</a></h4>
<section><p>Simplistic <span class="xref">Lucene.Net.Analysis.CharFilter</span> that applies the mappings
contained in a <a class="xref" href="Lucene.Net.Analysis.CharFilters.NormalizeCharMap.html">NormalizeCharMap</a> to the character
stream and corrects the resulting changes to the
offsets. Matching is greedy (the longest pattern matching at
a given point wins). The replacement is allowed to be the
empty string.</p>
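<p>Greedy matching can be illustrated with a plain dictionary. The sketch below is an assumption-laden model, not how MappingCharFilter is implemented: <code>GreedyMappingSketch</code> and <code>Apply</code> are hypothetical names, the whole input is held in memory as a string, and offset correction is left out to keep the focus on the longest-match rule.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical model of greedy (longest-match) replacement over a string.
public static class GreedyMappingSketch
{
    public static string Apply(string input, IReadOnlyDictionary<string, string> map)
    {
        // The longest key bounds how far ahead a match attempt must look.
        int maxKeyLength = 0;
        foreach (var key in map.Keys)
        {
            maxKeyLength = Math.Max(maxKeyLength, key.Length);
        }

        var output = new StringBuilder();
        int pos = 0;
        while (pos < input.Length)
        {
            bool matched = false;
            int longest = Math.Min(maxKeyLength, input.Length - pos);

            // Try the longest candidate first so the longest pattern wins.
            for (int len = longest; len >= 1; len--)
            {
                if (map.TryGetValue(input.Substring(pos, len), out string replacement))
                {
                    output.Append(replacement); // may be the empty string
                    pos += len;
                    matched = true;
                    break;
                }
            }

            if (!matched)
            {
                // No pattern starts here: copy the character unchanged.
                output.Append(input[pos]);
                pos++;
            }
        }
        return output.ToString();
    }
}
```

<p>With the mappings <code>a</code> to <code>1</code>, <code>ab</code> to <code>2</code>, and <code>c</code> to the empty string, the input <code>abac</code> becomes <code>21</code>: <code>ab</code> wins over <code>a</code> at offset 0, and <code>c</code> is deleted.</p>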
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.CharFilters.MappingCharFilterFactory.html">MappingCharFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.CharFilters.MappingCharFilter.html">MappingCharFilter</a>. </p>
<pre><code>&lt;fieldType name=&quot;text_map&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;charFilter class=&quot;solr.MappingCharFilterFactory&quot; mapping=&quot;mapping.txt&quot;/>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;/analyzer>
&lt;/fieldType></code></pre>
<p>Since Solr 1.4.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.CharFilters.NormalizeCharMap.html">NormalizeCharMap</a></h4>
<section><p>Holds a map of <span class="xref">System.String</span> input to <span class="xref">System.String</span> output, to be used
with <a class="xref" href="Lucene.Net.Analysis.CharFilters.MappingCharFilter.html">MappingCharFilter</a>. Use the <a class="xref" href="Lucene.Net.Analysis.CharFilters.NormalizeCharMap.Builder.html">NormalizeCharMap.Builder</a>
to create this.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.CharFilters.NormalizeCharMap.Builder.html">NormalizeCharMap.Builder</a></h4>
<section><p>Builds a NormalizeCharMap.</p>
<p>
Call <code>Add()</code> until you have added all the mappings, then call <code>Build()</code> to get a NormalizeCharMap.</p>
<div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div>
</section>
</article>
</div>
<div class="hidden-sm col-md-2" role="complementary">
<div class="sideaffix">
<div class="contribution">
<ul class="nav">
<li>
<a href="https://github.com/apache/lucenenet/blob/docs/4.8.0-beta00011/src/Lucene.Net.Analysis.Common/Analysis/CharFilter/package.md/#L2" class="contribution-link">Improve this Doc</a>
</li>
</ul>
</div>
<nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
<!-- <p><a class="back-to-top" href="#top">Back to top</a><p> -->
</nav>
</div>
</div>
</div>
</div>
<footer>
<div class="grad-bottom"></div>
<div class="footer">
<div class="container">
<span class="pull-right">
<a href="#top">Back to top</a>
</span>
Copyright © 2020 Licensed to the Apache Software Foundation (ASF)
</div>
</div>
</footer>
</div>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script>
</body>
</html>