| <!DOCTYPE html> |
| <!--[if IE]><![endif]--> |
| <html> |
| |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> |
| <title>Namespace Lucene.Net.Analysis.Synonym |
| | Apache Lucene.NET 4.8.0-beta00013 Documentation </title> |
| <meta name="viewport" content="width=device-width"> |
| <meta name="title" content="Namespace Lucene.Net.Analysis.Synonym |
| | Apache Lucene.NET 4.8.0-beta00013 Documentation "> |
| <meta name="generator" content="docfx 2.56.2.0"> |
| |
| <link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico"> |
| <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css"> |
| <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css"> |
| <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css"> |
| <meta property="docfx:navrel" content="toc.html"> |
| <meta property="docfx:tocrel" content="analysis-common/toc.html"> |
| |
| <meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/"> |
| |
| </head> |
| <body data-spy="scroll" data-target="#affix" data-offset="120"> |
| <div id="wrapper"> |
| <header> |
| |
| <nav id="autocollapse" class="navbar ng-scope" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="navbar-brand" href="/"> |
| <img id="logo" class="svg" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/lucene-net-color.png" alt=""> |
| </a> |
| </div> |
| <div class="collapse navbar-collapse" id="navbar"> |
| <form class="navbar-form navbar-right" role="search" id="search"> |
| <div class="form-group"> |
| <input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off"> |
| </div> |
| </form> |
| </div> |
| </div> |
| </nav> |
| |
| <div class="subnav navbar navbar-default"> |
| <div class="container hide-when-search"> |
| <ul class="level0 breadcrumb"> |
| <li> |
| <a href="https://lucenenet.apache.org/docs/4.8.0-beta00009/">API</a> |
| <span id="breadcrumb"> |
| <ul class="breadcrumb"> |
| <li></li> |
| </ul> |
| </span> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </header> |
| <div class="container body-content"> |
| |
| <div id="search-results"> |
| <div class="search-list"></div> |
| <div class="sr-items"> |
| <p><i class="glyphicon glyphicon-refresh index-loading"></i></p> |
| </div> |
| <ul id="pagination"></ul> |
| </div> |
| </div> |
| <div role="main" class="container body-content hide-when-search"> |
| |
| <div class="sidenav hide-when-search"> |
| <a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a> |
| <div class="sidetoggle collapse" id="sidetoggle"> |
| <div id="sidetoc"></div> |
| </div> |
| </div> |
| <div class="article row grid-right"> |
| <div class="col-md-10"> |
| <article class="content wrap" id="_content" data-uid="Lucene.Net.Analysis.Synonym"> |
| |
| <h1 id="Lucene_Net_Analysis_Synonym" data-uid="Lucene.Net.Analysis.Synonym" class="text-break">Namespace Lucene.Net.Analysis.Synonym |
| </h1> |
| <div class="markdown level0 summary"><!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <p>Analysis components for Synonyms.</p> |
| </div> |
| <div class="markdown level0 conceptual"></div> |
| <div class="markdown level0 remarks"></div> |
| <h3 id="classes">Classes |
| </h3> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SolrSynonymParser.html">SolrSynonymParser</a></h4> |
| <section><p>Parser for the Solr synonyms format.</p> |
| <ul><li>Blank lines are ignored, and lines starting with '#' are comments.</li><li>Explicit mappings match any token sequence on the LHS of "=>" |
| and replace it with all alternatives on the RHS. These mappings |
| ignore the expand parameter in the constructor. |
| Example: |
| <pre><code>i-pod, i pod => ipod</code></pre> |
| </li><li>Equivalent synonyms may be separated with commas and give |
| no explicit mapping. In this case the mapping behavior is |
| taken from the expand parameter in the constructor. This allows |
| the same synonym file to be used with different synonym handling strategies. |
| Example: |
| <pre><code>ipod, i-pod, i pod</code></pre> |
| </li><li>Multiple synonym mapping entries are merged. |
| Example: |
| <pre><code> foo => foo bar |
| foo => baz |
| is equivalent to |
| foo => foo bar, baz</code></pre> |
| </li></ul> |
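| <p>The rule format above can be illustrated with a small standalone Python sketch. It is not the Lucene.NET implementation: the function name <code>parse_solr_synonyms</code> is hypothetical, escaping and analysis of the terms are left out, and the handling of <code>expand</code> (every term maps to every term when true, every term maps to the first term when false) is an assumption about what the constructor parameter does.</p> |

```python
def parse_solr_synonyms(text, expand=True):
    """Parse Solr-format synonym rules into {input phrase: [output phrases]}.

    Simplified illustration only: no escape handling, no analyzer.
    """
    mapping = {}

    def add(source, target):
        outputs = mapping.setdefault(source, [])
        if target not in outputs:  # repeated entries are merged, not duplicated
            outputs.append(target)

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' comments are skipped
        if "=>" in line:
            # Explicit mapping: the expand flag is ignored.
            lhs, rhs = line.split("=>", 1)
            sources = [s.strip() for s in lhs.split(",")]
            targets = [t.strip() for t in rhs.split(",")]
        else:
            # Equivalent synonyms: behavior depends on expand (assumed semantics).
            sources = [s.strip() for s in line.split(",")]
            targets = sources if expand else sources[:1]
        for source in sources:
            for target in targets:
                add(source, target)
    return mapping


rules = parse_solr_synonyms("# merged entries\n\nfoo => foo bar\nfoo => baz\n")
print(rules)  # {'foo': ['foo bar', 'baz']}
```

| <p>In this sketch the two <code>foo</code> lines end up as a single entry, which mirrors the merging rule described in the last bullet.</p> |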
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymFilter.html">SynonymFilter</a></h4> |
| <section><p>Matches single or multi-word synonyms in a token stream. |
| This token stream cannot properly handle position |
| increments != 1, i.e., you should place this filter before |
| filtering out stop words.</p> |
| <p>Note that with the current implementation, parsing is |
| greedy, so whenever multiple parses would apply, the rule |
| starting the earliest and parsing the most tokens wins. |
| For example, if you have these rules:</p> |
| |
| <pre><code> a -> x |
| a b -> y |
| b c d -> z</code></pre> |
| |
| <p>Then input <code>a b c d e</code> parses to <code>y b c d</code>, i.e., the 2nd rule "wins" because it started |
| earliest and matched more input tokens than the other rules |
| starting at that point.</p> |
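| <p>The greedy rule selection described above can be sketched in a few lines of standalone Python. This only models which rule wins at each position; it does not reproduce the filter's output token stream or its position handling, and the function name <code>greedy_matches</code> is hypothetical.</p> |

```python
def greedy_matches(tokens, rules):
    """Return (start, length, output) for each rule that wins under greedy parsing."""
    matches = []
    i = 0
    while i < len(tokens):
        best = None
        # Try every rule at position i and keep the one consuming the most tokens.
        for lhs, output in rules.items():
            n = len(lhs)
            if tuple(tokens[i:i + n]) == lhs and (best is None or n > best[1]):
                best = (i, n, output)
        if best is None:
            i += 1  # no rule starts here; move on
        else:
            matches.append(best)
            i += best[1]  # matched tokens are consumed, so no rule can start inside them
    return matches


rules = {("a",): "x", ("a", "b"): "y", ("b", "c", "d"): "z"}
print(greedy_matches("a b c d e".split(), rules))  # [(0, 2, 'y')]
```

| <p>Only the second rule is reported: it starts at the first token and is longer than the first rule, and because it consumes <code>b</code>, the third rule never gets a chance to match.</p> |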
| |
| <p>A future improvement to this filter could allow |
| non-greedy parsing, such that the 3rd rule would win, and |
| also separately allow multiple parses, such that all 3 |
| rules would match, perhaps even on a rule by rule |
| basis.</p> |
| |
| <p><strong>NOTE</strong>: when a match occurs, the output tokens |
| associated with the matching rule are "stacked" on top of |
| the input stream (if the rule had |
| <code>keepOrig=true</code>) and also on top of another |
| matched rule's output tokens. This is not a correct |
| solution, as really the output should be an arbitrary |
| graph/lattice. For example, with the above match, you |
| would expect an exact <a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00013/api/core/Lucene.Net.Search.PhraseQuery.html">PhraseQuery</a> <code>"y b |
| c"</code> to match the parsed tokens, but it will fail to |
| do so. This limitation is necessary because Lucene's |
| <span class="xref">Lucene.Net.Analysis.TokenStream</span> (and index) cannot yet represent an arbitrary |
| graph.</p> |
| |
| <p><strong>NOTE</strong>: If multiple incoming tokens arrive on the |
| same position, only the first token at that position is |
| used for parsing. Subsequent tokens simply pass through |
| and are not parsed. A future improvement would be to |
| allow these tokens to also be matched.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymFilterFactory.html">SynonymFilterFactory</a></h4> |
| <section><p>Factory for <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymFilter.html">SynonymFilter</a>.</p> |
| <pre><code>&lt;fieldType name="text_synonym" class="solr.TextField" positionIncrementGap="100"&gt; |
| &lt;analyzer&gt; |
| &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt; |
| &lt;filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" |
| format="solr" ignoreCase="false" expand="true" |
| tokenizerFactory="solr.WhitespaceTokenizerFactory" |
| [optional tokenizer factory parameters]/&gt; |
| &lt;/analyzer&gt; |
| &lt;/fieldType&gt;</code></pre> |
| |
| <p> |
| An optional param name prefix of "tokenizerFactory." may be used for any |
| init params that the <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymFilterFactory.html">SynonymFilterFactory</a> needs to pass to the specified |
| <a class="xref" href="Lucene.Net.Analysis.Util.TokenizerFactory.html">TokenizerFactory</a>. If the <a class="xref" href="Lucene.Net.Analysis.Util.TokenizerFactory.html">TokenizerFactory</a> expects an init parameter with |
| the same name as an init param used by the <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymFilterFactory.html">SynonymFilterFactory</a>, the prefix |
| is mandatory. |
| </p> |
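| <p>As a rough illustration of the prefix convention, the following standalone Python sketch separates prefixed parameters from the filter's own parameters. It is not the factory's actual code: the function name <code>split_tokenizer_args</code> and the sample parameter values are hypothetical, and the real factory may treat unprefixed leftover parameters differently.</p> |

```python
PREFIX = "tokenizerFactory."


def split_tokenizer_args(args):
    """Separate filter params from params forwarded to the tokenizer factory."""
    filter_args, tokenizer_args = {}, {}
    for name, value in args.items():
        if name.startswith(PREFIX):
            # The prefix is stripped before the parameter is forwarded.
            tokenizer_args[name[len(PREFIX):]] = value
        else:
            filter_args[name] = value
    return filter_args, tokenizer_args


# Hypothetical configuration where both sides use a parameter named "expand".
filter_args, tokenizer_args = split_tokenizer_args({
    "synonyms": "synonyms.txt",
    "expand": "true",
    "tokenizerFactory": "solr.WhitespaceTokenizerFactory",
    "tokenizerFactory.expand": "false",
})
print(filter_args["expand"], tokenizer_args["expand"])  # true false
```

| <p>The name collision on <code>expand</code> is the case in which the text above calls the prefix mandatory: without it, the two values could not be told apart.</p> |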
| <p> |
| The optional <code>format</code> parameter controls how the synonyms will be parsed. |
| It supports the short names <code>solr</code> for <a class="xref" href="Lucene.Net.Analysis.Synonym.SolrSynonymParser.html">SolrSynonymParser</a> |
| and <code>wordnet</code> for <a class="xref" href="Lucene.Net.Analysis.Synonym.WordnetSynonymParser.html">WordnetSynonymParser</a>, or your own |
| <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Parser.html">SynonymMap.Parser</a> class name. The default is <code>solr</code>. |
| A custom <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Parser.html">SynonymMap.Parser</a> is expected to have a constructor taking: |
| </p> |
| <ul><li><code><span class="xref">System.Boolean</span> dedup</code> - true if duplicates should be ignored, false otherwise</li><li><code><span class="xref">System.Boolean</span> expand</code> - true if conflation groups should be expanded, false if they are one-directional</li><li><code><span class="xref">Lucene.Net.Analysis.Analyzer</span> analyzer</code> - an analyzer used for each raw synonym</li></ul> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.html">SynonymMap</a></h4> |
| <section><p>A map of synonyms; keys and values are phrases.</p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Builder.html">SynonymMap.Builder</a></h4> |
| <section><p>Builds an FSTSynonymMap.</p> |
| <p> |
| Call <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Builder.html#Lucene_Net_Analysis_Synonym_SynonymMap_Builder_Add_Lucene_Net_Util_CharsRef_Lucene_Net_Util_CharsRef_System_Boolean_">Add(CharsRef, CharsRef, Boolean)</a> until you have added all the mappings, then call <a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Builder.html#Lucene_Net_Analysis_Synonym_SynonymMap_Builder_Build">Build()</a> to get an FSTSynonymMap.</p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Synonym.SynonymMap.Parser.html">SynonymMap.Parser</a></h4> |
| <section><p>Abstraction for parsing synonym files.</p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Synonym.WordnetSynonymParser.html">WordnetSynonymParser</a></h4> |
| <section><p>Parser for the WordNet Prolog format.</p> |
| <p> |
| See <a href="http://wordnet.princeton.edu/man/prologdb.5WN.html">http://wordnet.princeton.edu/man/prologdb.5WN.html</a> for a description of the format.</p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div> |
| </section> |
| </article> |
| </div> |
| |
| <div class="hidden-sm col-md-2" role="complementary"> |
| <div class="sideaffix"> |
| <nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix"> |
| <!-- <p><a class="back-to-top" href="#top">Back to top</a><p> --> |
| </nav> |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| <footer> |
| <div class="grad-bottom"></div> |
| <div class="footer"> |
| <div class="container"> |
| <span class="pull-right"> |
| <a href="#top">Back to top</a> |
| </span> |
| Copyright © 2020 The Apache Software Foundation, Licensed under the <a href='http://www.apache.org/licenses/LICENSE-2.0' target='_blank'>Apache License, Version 2.0</a><br> <small>Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation. <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</small> |
| |
| </div> |
| </div> |
| </footer> |
| </div> |
| |
| <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script> |
| <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script> |
| <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script> |
| </body> |
| </html> |