<!DOCTYPE html>
<!--[if IE]><![endif]-->
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Namespace Lucene.Net.Analysis.Synonym
| Apache Lucene.NET 4.8.0-beta00010 Documentation </title>
<meta name="viewport" content="width=device-width">
<meta name="title" content="Namespace Lucene.Net.Analysis.Synonym
| Apache Lucene.NET 4.8.0-beta00010 Documentation ">
<meta name="generator" content="docfx 2.56.0.0">
<link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css">
<meta property="docfx:navrel" content="toc.html">
<meta property="docfx:tocrel" content="analysis-common/toc.html">
<meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/">
</head>
<body data-spy="scroll" data-target="#affix" data-offset="120">
<div id="wrapper">
<header>
<nav id="autocollapse" class="navbar ng-scope" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">
<img id="logo" class="svg" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/lucene-net-color.png" alt="">
</a>
</div>
<div class="collapse navbar-collapse" id="navbar">
<form class="navbar-form navbar-right" role="search" id="search">
<div class="form-group">
<input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off">
</div>
</form>
</div>
</div>
</nav>
<div class="subnav navbar navbar-default">
<div class="container hide-when-search">
<ul class="level0 breadcrumb">
<li>
<a href="https://lucenenet.apache.org/docs/4.8.0-beta00009/">API</a>
<span id="breadcrumb">
<ul class="breadcrumb">
<li></li>
</ul>
</span>
</li>
</ul>
</div>
</div>
</header>
<div class="container body-content">
<div id="search-results">
<div class="search-list"></div>
<div class="sr-items">
<p><i class="glyphicon glyphicon-refresh index-loading"></i></p>
</div>
<ul id="pagination"></ul>
</div>
</div>
<div role="main" class="container body-content hide-when-search">
<div class="sidenav hide-when-search">
<a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a>
<div class="sidetoggle collapse" id="sidetoggle">
<div id="sidetoc"></div>
</div>
</div>
<div class="article row grid-right">
<div class="col-md-10">
<article class="content wrap" id="_content" data-uid="Lucene.Net.Analysis.Synonym">
<h1 id="Lucene_Net_Analysis_Synonym" data-uid="Lucene.Net.Analysis.Synonym" class="text-break">Namespace Lucene.Net.Analysis.Synonym
</h1>
<div class="markdown level0 summary"><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>Analysis components for Synonyms.</p>
</div>
<div class="markdown level0 conceptual"></div>
<div class="markdown level0 remarks"></div>
<h3 id="classes">Classes
</h3>
<h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SolrSynonymParser.html">SolrSynonymParser</a></h4>
<section><p>Parser for the Solr synonyms format.
<ul><li> Blank lines and lines starting with &apos;#&apos; are comments.</li><li> Explicit mappings match any token sequence on the LHS of &quot;=&gt;&quot;
and replace with all alternatives on the RHS. These types of mappings
ignore the expand parameter in the constructor.
Example:<p>
<pre><code>i-pod, i pod => ipod</code></pre>
<p></li><li> Equivalent synonyms may be separated with commas and give
no explicit mapping. In this case the mapping behavior will
be taken from the expand parameter in the constructor. This allows
the same synonym file to be used in different synonym handling strategies.
Example:<p>
<pre><code>ipod, i-pod, i pod</code></pre>
<p></li><li> Multiple synonym mapping entries are merged.
Example:<p>
<pre><code> foo => foo bar
foo => baz
is equivalent to
foo => foo bar, baz</code></pre>
<p></li></ul></p>
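<p>The rules above can be loaded roughly as follows. This is a minimal sketch, not a definitive recipe: the sample rules, the choice of <code>WhitespaceAnalyzer</code>, and the <code>LuceneVersion.LUCENE_48</code> constant are assumptions made for illustration.</p>
<pre><code class="lang-csharp">using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Synonym;
using Lucene.Net.Util;

// Hypothetical sample rules in the Solr format described above.
const string rules =
    &quot;# lines starting with '#' are comments\n&quot; +
    &quot;i-pod, i pod => ipod\n&quot; +   // explicit mapping: ignores expand
    &quot;ipod, i-pod, i pod\n&quot;;      // equivalent synonyms: behavior follows expand

// Assumption: each raw synonym is tokenized on whitespace.
Analyzer analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);

// Constructor arguments are (dedup, expand, analyzer).
var parser = new SolrSynonymParser(true, true, analyzer);
using (var reader = new StringReader(rules))
{
    parser.Parse(reader);
}

// The parser is also a builder, so Build() yields the finished map.
SynonymMap map = parser.Build();
</code></pre>
<p>Passing <code>false</code> for <code>expand</code> would change only how the second rule is interpreted; the explicit mapping is unaffected.</p>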
<div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
<h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymFilter.html">SynonymFilter</a></h4>
<section><p>Matches single or multi-word synonyms in a token stream.
This token stream cannot properly handle position
increments != 1, i.e., you should place this filter before
filtering out stop words.</p>
<p>Note that with the current implementation, parsing is
greedy, so whenever multiple parses would apply, the rule
starting the earliest and parsing the most tokens wins.
For example if you have these rules:
<pre><code> a -> x
a b -> y
b c d -> z</code></pre>
Then input <code>a b c d e</code> parses to <code>y b c
d</code>, i.e. the second rule &quot;wins&quot; because it started
earliest and matched more input tokens than any other rule
starting at that point.</p>
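<p>To observe the greedy behavior directly, the three rules can be registered and the filter run over the sample input. The sketch below is illustrative only: it assumes whitespace tokenization, a Lucene.NET 4.8 build that provides <code>Analyzer.NewAnonymous</code>, and a hypothetical field name <code>body</code>. It prints whatever tokens the filter emits rather than asserting a particular result.</p>
<pre><code class="lang-csharp">using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Synonym;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Register the three rules from the text (includeOrig: false).
var builder = new SynonymMap.Builder(true);
builder.Add(new CharsRef(&quot;a&quot;), new CharsRef(&quot;x&quot;), false);
builder.Add(SynonymMap.Builder.Join(new[] { &quot;a&quot;, &quot;b&quot; }, new CharsRef()),
            new CharsRef(&quot;y&quot;), false);
builder.Add(SynonymMap.Builder.Join(new[] { &quot;b&quot;, &quot;c&quot;, &quot;d&quot; }, new CharsRef()),
            new CharsRef(&quot;z&quot;), false);
SynonymMap map = builder.Build();

// Assumption: a whitespace tokenizer feeds the synonym filter directly.
Analyzer analyzer = Analyzer.NewAnonymous((fieldName, reader) =>
{
    Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, reader);
    TokenStream result = new SynonymFilter(source, map, false); // ignoreCase: false
    return new TokenStreamComponents(source, result);
});

// Print each emitted term, one per line.
using (TokenStream ts = analyzer.GetTokenStream(&quot;body&quot;, new StringReader(&quot;a b c d e&quot;)))
{
    ICharTermAttribute term = ts.AddAttribute&lt;ICharTermAttribute>();
    ts.Reset();
    while (ts.IncrementToken())
    {
        Console.WriteLine(term.ToString());
    }
    ts.End();
}
</code></pre>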
<p>A future improvement to this filter could allow
non-greedy parsing, such that the 3rd rule would win, and
also separately allow multiple parses, such that all 3
rules would match, perhaps even on a rule by rule
basis.</p>
<p><strong>NOTE</strong>: when a match occurs, the output tokens
associated with the matching rule are &quot;stacked&quot; on top of
the input stream (if the rule had
<code>keepOrig=true</code>) and also on top of another
matched rule&apos;s output tokens. This is not a correct
solution, as really the output should be an arbitrary
graph/lattice. For example, with the above match, you
would expect an exact <span class="xref">Lucene.Net.Search.PhraseQuery</span> <code>&quot;y b
c&quot;</code> to match the parsed tokens, but it will fail to
do so. This limitation is necessary because Lucene&apos;s
<span class="xref">Lucene.Net.Analysis.TokenStream</span> (and index) cannot yet represent an arbitrary
graph.</p>
<p><strong>NOTE</strong>: If multiple incoming tokens arrive on the
same position, only the first token at that position is
used for parsing. Subsequent tokens simply pass through
and are not parsed. A future improvement would be to
allow these tokens to also be matched.</p>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymFilterFactory.html">SynonymFilterFactory</a></h4>
<section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymFilter.html">SynonymFilter</a>.</p>
<pre><code>&lt;fieldType name=&quot;text_synonym&quot; class=&quot;solr.TextField&quot; positionIncrementGap=&quot;100&quot;>
&lt;analyzer>
&lt;tokenizer class=&quot;solr.WhitespaceTokenizerFactory&quot;/>
&lt;filter class=&quot;solr.SynonymFilterFactory&quot; synonyms=&quot;synonyms.txt&quot;
format=&quot;solr&quot; ignoreCase=&quot;false&quot; expand=&quot;true&quot;
tokenizerFactory=&quot;solr.WhitespaceTokenizerFactory&quot;
[optional tokenizer factory parameters]/>
&lt;/analyzer>
&lt;/fieldType></code></pre>
<p>
An optional param name prefix of &quot;tokenizerFactory.&quot; may be used for any
init params that the <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymFilterFactory.html">SynonymFilterFactory</a> needs to pass to the specified
<a class="xref" href="Lucene.Net.Analysis.Util.TokenizerFactory.html">TokenizerFactory</a>. If the <a class="xref" href="Lucene.Net.Analysis.Util.TokenizerFactory.html">TokenizerFactory</a> expects an init parameter with
the same name as an init param used by the <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymFilterFactory.html">SynonymFilterFactory</a>, the prefix
is mandatory.
</p>
<p>
The optional <code>format</code> parameter controls how the synonyms will be parsed.
It supports the short names <code>solr</code> for <a class="xref" href="Lucene.Net.Analysis.Synonym.SolrSynonymParser.html">SolrSynonymParser</a>
and <code>wordnet</code> for <a class="xref" href="Lucene.Net.Analysis.Synonym.WordnetSynonymParser.html">WordnetSynonymParser</a>, or your own
<a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Parser.html">SynonymMap.Parser</a> class name. The default is <code>solr</code>.
A custom <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Parser.html">SynonymMap.Parser</a> is expected to have a constructor taking:
<ul><li><code><span class="xref">System.Boolean</span> dedup</code> - true if duplicates should be ignored, false otherwise</li><li><code><span class="xref">System.Boolean</span> expand</code> - true if conflation groups should be expanded, false if they are one-directional</li><li><code><span class="xref">Lucene.Net.Analysis.Analyzer</span> analyzer</code> - an analyzer used for each raw synonym</li></ul>
</p>
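<p>A custom parser with the expected constructor shape might look like the sketch below. The class name <code>TabSeparatedSynonymParser</code> and its one-pair-per-line, tab-separated file format are hypothetical, and the handling of <code>expand</code> is a simplification chosen for illustration, not the behavior of the built-in parsers.</p>
<pre><code class="lang-csharp">using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Synonym;
using Lucene.Net.Util;

// Hypothetical parser for lines of the form &quot;left&lt;TAB>right&quot;.
public sealed class TabSeparatedSynonymParser : SynonymMap.Parser
{
    private readonly bool expand;

    // Constructor shape described above: (dedup, expand, analyzer).
    public TabSeparatedSynonymParser(bool dedup, bool expand, Analyzer analyzer)
        : base(dedup, analyzer)
    {
        this.expand = expand;
    }

    public override void Parse(TextReader input)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            string[] parts = line.Split('\t');
            if (parts.Length != 2)
            {
                continue; // skip blank or malformed lines
            }

            // Run each raw synonym through the analyzer given to the constructor.
            CharsRef left = Analyze(parts[0], new CharsRef());
            CharsRef right = Analyze(parts[1], new CharsRef());

            Add(left, right, false);      // one-directional mapping
            if (expand)
            {
                Add(right, left, false);  // simplification: also map back
            }
        }
    }
}
</code></pre>
<p>Per the description above, the class name of such a parser would then be supplied as the <code>format</code> value.</p>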
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.html">SynonymMap</a></h4>
<section><p>A map of synonyms; keys and values are phrases.</p>
<div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
<h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Builder.html">SynonymMap.Builder</a></h4>
<section><p>Builds an FSTSynonymMap.</p>
<p>
Call <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Builder.html#Lucene_Net_Analysis_Synonym_SynonymMap_Builder_Add_Lucene_Net_Util_CharsRef_Lucene_Net_Util_CharsRef_System_Boolean_">Add(CharsRef, CharsRef, Boolean)</a> until you have added all the mappings, then call <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Builder.html#Lucene_Net_Analysis_Synonym_SynonymMap_Builder_Build">Build()</a> to get an FSTSynonymMap.</p>
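<p>A minimal sketch of that sequence follows. The sample terms are borrowed from the Solr format examples on this page, and the use of <code>Join</code> to assemble a multi-word phrase is one possible approach rather than the only one.</p>
<pre><code class="lang-csharp">using Lucene.Net.Analysis.Synonym;
using Lucene.Net.Util;

// dedup: true asks the builder to ignore duplicate mappings.
var builder = new SynonymMap.Builder(true);

// Single-word mapping; includeOrig: true keeps the original token too.
builder.Add(new CharsRef(&quot;ipod&quot;), new CharsRef(&quot;i-pod&quot;), true);

// Multi-word input: Join combines the words into a single CharsRef.
CharsRef phrase = SynonymMap.Builder.Join(new[] { &quot;i&quot;, &quot;pod&quot; }, new CharsRef());
builder.Add(phrase, new CharsRef(&quot;ipod&quot;), true);

// Build once all mappings have been added.
SynonymMap map = builder.Build();
</code></pre>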
<div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div>
</section>
<h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Parser.html">SynonymMap.Parser</a></h4>
<section><p>Abstraction for parsing synonym files.</p>
<div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
<h4><a class="xref" href="Lucene.Net.Analysis.Synonym.WordnetSynonymParser.html">WordnetSynonymParser</a></h4>
<section><p>Parser for the WordNet Prolog format.</p>
<p>
See <a href="http://wordnet.princeton.edu/man/prologdb.5WN.html">http://wordnet.princeton.edu/man/prologdb.5WN.html</a> for a description of the format.</p>
<div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div>
</section>
</article>
</div>
<div class="hidden-sm col-md-2" role="complementary">
<div class="sideaffix">
<nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
<!-- <p><a class="back-to-top" href="#top">Back to top</a><p> -->
</nav>
</div>
</div>
</div>
</div>
<footer>
<div class="grad-bottom"></div>
<div class="footer">
<div class="container">
<span class="pull-right">
<a href="#top">Back to top</a>
</span>
Copyright © 2020 Licensed to the Apache Software Foundation (ASF)
</div>
</div>
</footer>
</div>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script>
</body>
</html>