<!DOCTYPE html>
<!--[if IE]><![endif]-->
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Class BeiderMorseEncoder
| Apache Lucene.NET 4.8.0-beta00013 Documentation </title>
<meta name="viewport" content="width=device-width">
<meta name="title" content="Class BeiderMorseEncoder
| Apache Lucene.NET 4.8.0-beta00013 Documentation ">
<meta name="generator" content="docfx 2.56.2.0">
<link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css">
<link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css">
<meta property="docfx:navrel" content="toc.html">
<meta property="docfx:tocrel" content="analysis-phonetic/toc.html">
<meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/">
</head>
<body data-spy="scroll" data-target="#affix" data-offset="120">
<span id="forkongithub"><a href="https://github.com/apache/lucenenet" target="_blank">Fork me on GitHub</a></span>
<div id="wrapper">
<header>
<nav id="autocollapse" class="navbar ng-scope" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">
<img id="logo" class="svg" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/lucene-net-color.png" alt="">
</a>
</div>
<div class="collapse navbar-collapse" id="navbar">
<form class="navbar-form navbar-right" role="search" id="search">
<div class="form-group">
<input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off">
</div>
</form>
</div>
</div>
</nav>
<div class="subnav navbar navbar-default">
<div class="container hide-when-search">
<ul class="level0 breadcrumb">
<li>
<a href="https://lucenenet.apache.org/docs/4.8.0-beta00009/">API</a>
<span id="breadcrumb">
<ul class="breadcrumb">
<li></li>
</ul>
</span>
</li>
</ul>
</div>
</div>
</header>
<div class="container body-content">
<div id="search-results">
<div class="search-list"></div>
<div class="sr-items">
<p><i class="glyphicon glyphicon-refresh index-loading"></i></p>
</div>
<ul id="pagination"></ul>
</div>
</div>
<div role="main" class="container body-content hide-when-search">
<div class="sidenav hide-when-search">
<a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a>
<div class="sidetoggle collapse" id="sidetoggle">
<div id="sidetoc"></div>
</div>
</div>
<div class="article row grid-right">
<div class="col-md-10">
<article class="content wrap" id="_content" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder">
<h1 id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder" class="text-break">Class BeiderMorseEncoder
</h1>
<div class="markdown level0 summary"><p>Encodes strings into their Beider-Morse phonetic encoding.</p>
</div>
<div class="markdown level0 conceptual"></div>
<div class="inheritance">
<h5>Inheritance</h5>
<div class="level0"><span class="xref">System.Object</span></div>
<div class="level1"><span class="xref">BeiderMorseEncoder</span></div>
</div>
<div classs="implements">
<h5>Implements</h5>
<div><a class="xref" href="Lucene.Net.Analysis.Phonetic.Language.IStringEncoder.html">IStringEncoder</a></div>
</div>
<div class="inheritedMembers">
<h5>Inherited Members</h5>
<div>
<span class="xref">System.Object.Equals(System.Object)</span>
</div>
<div>
<span class="xref">System.Object.Equals(System.Object, System.Object)</span>
</div>
<div>
<span class="xref">System.Object.GetHashCode()</span>
</div>
<div>
<span class="xref">System.Object.GetType()</span>
</div>
<div>
<span class="xref">System.Object.MemberwiseClone()</span>
</div>
<div>
<span class="xref">System.Object.ReferenceEquals(System.Object, System.Object)</span>
</div>
<div>
<span class="xref">System.Object.ToString()</span>
</div>
</div>
<h6><strong>Namespace</strong>: <a class="xref" href="Lucene.Net.Analysis.Phonetic.Language.Bm.html">Lucene.Net.Analysis.Phonetic.Language.Bm</a></h6>
<h6><strong>Assembly</strong>: Lucene.Net.Analysis.Phonetic.dll</h6>
<h5 id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_syntax">Syntax</h5>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">public class BeiderMorseEncoder : IStringEncoder</code></pre>
</div>
<h5 id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_remarks"><strong>Remarks</strong></h5>
<div class="markdown level0 remarks"><p>Beider-Morse phonetic encodings are optimised for family names. However, they may be useful for a wide range
of words.
<p>
This encoder is intentionally mutable to allow dynamic configuration through its properties. As a result, it
may not be thread-safe. If you require a guaranteed thread-safe encoding then use
<a class="xref" href="Lucene.Net.Analysis.Phonetic.Language.Bm.PhoneticEngine.html">PhoneticEngine</a> directly.
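<p>A minimal sketch of that advice, sharing one engine across threads. The constructor <code>PhoneticEngine(NameType, RuleType, bool)</code> and its <code>Encode(string)</code> method are assumed from the Apache Commons Codec original and are not documented on this page, so verify them against the linked class reference. <code>SharedEngineExample</code> and <code>EncodeAll</code> are illustrative names.</p>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">using System.Threading.Tasks;
using Lucene.Net.Analysis.Phonetic.Language.Bm;

public static class SharedEngineExample
{
    // Assumed constructor arguments: (nameType, ruleType, concat).
    // The engine is configured once here and never reconfigured afterwards.
    private static readonly PhoneticEngine Engine =
        new PhoneticEngine(NameType.GENERIC, RuleType.APPROX, true);

    public static string[] EncodeAll(string[] names)
    {
        var results = new string[names.Length];
        // Each iteration writes only to its own slot, so no locking is needed here.
        Parallel.For(0, names.Length, i =&gt; results[i] = Engine.Encode(names[i]));
        return results;
    }
}</code></pre>
</div>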
<p>
<strong>Encoding overview</strong>
<p>
Beider-Morse phonetic encoding is a multi-step process. First, a table of rules is consulted to guess what
language the word comes from. For example, if it ends in &quot;<code>ault</code>&quot; then the encoder infers that the word is French.
Next, the word is translated into a phonetic representation using a language-specific phonetics table. Some
runs of letters can be pronounced in multiple ways, and a single run of letters may be potentially broken up
into phonemes at different places, so this stage results in a set of possible language-specific phonetic
representations. Lastly, this language-specific phonetic representation is processed by a table of rules that
re-writes it phonetically taking into account systematic pronunciation differences between languages, to move
it towards a pan-Indo-European phonetic representation. Again, sometimes there are multiple ways this could be
done and sometimes things that can be pronounced in several ways in the source language have only one way to
represent them in this average phonetic language, so the result is again a set of phonetic spellings.
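<p>The way ambiguous letter runs multiply into a set of spellings can be illustrated with a toy Cartesian product. This is not the library's rule tables: the segmentation below is hypothetical, chosen only because it reproduces the <code>Renault</code> encoding quoted in these remarks, and <code>AlternativesSketch</code> is an illustrative name.</p>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">using System;
using System.Collections.Generic;
using System.Linq;

public static class AlternativesSketch
{
    // Expands per-segment alternatives into every full spelling (a Cartesian product).
    public static List&lt;string&gt; Expand(IEnumerable&lt;string[]&gt; segments)
    {
        var spellings = new List&lt;string&gt; { "" };
        foreach (var options in segments)
        {
            spellings = spellings
                .SelectMany(prefix =&gt; options.Select(option =&gt; prefix + option))
                .ToList();
        }
        return spellings;
    }

    public static void Main()
    {
        // Hypothetical segmentation: two readings of the first run, three of the third.
        var segments = new[]
        {
            new[] { "rY", "ri" },
            new[] { "n" },
            new[] { "D", "a", "u" },
            new[] { "lt" },
        };
        Console.WriteLine(string.Join("|", Expand(segments)));
        // prints rYnDlt|rYnalt|rYnult|rinDlt|rinalt|rinult
    }
}</code></pre>
</div>
<p>Two alternatives in one position and three in another already give six spellings, which is why encodings grow quickly with the number of ambiguous runs.</p>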
<p>
Some names are treated as having multiple parts. This can be due to two things. Firstly, they may be hyphenated.
In this case, each individual hyphenated word is encoded, and then these are combined end-to-end for the final
encoding. Secondly, some names have standard prefixes, for example, &quot;<code>Mac/Mc</code>&quot; in Scottish (English)
names. Because it is sometimes ambiguous whether the prefix is intended or is an accident of the spelling, the word
is encoded once with the prefix and once without it. The resulting encoding contains one and then the other result.
<p>
<strong>Encoding format</strong>
<p>
Individual phonetic spellings of an input word are represented in upper- and lower-case Roman characters. Where
there are multiple possible phonetic representations, these are joined with a pipe (<code>|</code>) character.
If multiple hyphenated words were found, or if the word may contain a name prefix, each encoded word is placed
in parentheses and these blocks are then joined with hyphens. For example, &quot;<code>d&apos;ortley</code>&quot; has a possible
prefix. The form without prefix encodes to <code>ortlaj|ortlej</code>, while the form with prefix encodes to
<code>dortlaj|dortlej</code>. Thus, the full, combined encoding is <code>(ortlaj|ortlej)-(dortlaj|dortlej)</code>.
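<p>A downstream consumer might take this format apart as sketched below. The helper is illustrative and not part of Lucene.NET; it assumes that individual spellings never contain parentheses or hyphens, which matches the examples here but is not stated as a guarantee.</p>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">using System;
using System.Collections.Generic;

public static class EncodingParser
{
    // Splits "(a|b)-(c|d)" into [[a, b], [c, d]]; a plain "a|b" yields a single group.
    public static List&lt;string[]&gt; Parse(string encoded)
    {
        var groups = new List&lt;string[]&gt;();
        bool parenthesized = encoded.Length &gt;= 2
            ? encoded[0] == '(' ? encoded[encoded.Length - 1] == ')' : false
            : false;
        if (!parenthesized)
        {
            groups.Add(encoded.Split('|'));
            return groups;
        }
        // Blocks are joined as ")-(", so strip the outer parentheses and split on that.
        string inner = encoded.Substring(1, encoded.Length - 2);
        foreach (string block in inner.Split(new[] { ")-(" }, StringSplitOptions.None))
        {
            groups.Add(block.Split('|'));
        }
        return groups;
    }
}</code></pre>
</div>
<p>For the combined encoding quoted above, <code>Parse</code> returns two groups of two spellings each: the form without the prefix first, then the form with it.</p>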
<p>
The encoded forms are often quite a bit longer than the input strings. This is because a single input may have many
potential phonetic interpretations. For example, <code>Renault</code> encodes to
<code>rYnDlt|rYnalt|rYnult|rinDlt|rinalt|rinult</code>. The <a class="xref" href="Lucene.Net.Analysis.Phonetic.Language.Bm.RuleType.html#Lucene_Net_Analysis_Phonetic_Language_Bm_RuleType_APPROX">APPROX</a> rules will tend to produce larger
encodings as they consider a wider range of possible, approximate phonetic interpretations of the original word.
Down-stream applications may wish to further process the encoding for indexing or lookup purposes, for example, by
splitting on pipe (<code>|</code>) and indexing under each of these alternatives.
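<p>One possible shape for such downstream processing is a small in-memory lookup keyed by each alternative. <code>PhoneticIndex</code> is a hypothetical helper written for this illustration; it handles only the plain pipe-separated form, not the parenthesized multi-part form.</p>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">using System;
using System.Collections.Generic;

public sealed class PhoneticIndex
{
    private readonly Dictionary&lt;string, HashSet&lt;string&gt;&gt; byCode =
        new Dictionary&lt;string, HashSet&lt;string&gt;&gt;(StringComparer.Ordinal);

    // Registers the name under every pipe-separated alternative of its encoding.
    public void Add(string name, string encoded)
    {
        foreach (string code in encoded.Split('|'))
        {
            if (!byCode.TryGetValue(code, out var names))
            {
                names = new HashSet&lt;string&gt;();
                byCode[code] = names;
            }
            names.Add(name);
        }
    }

    // Returns the names that share at least one alternative with the query encoding.
    public ISet&lt;string&gt; Lookup(string encodedQuery)
    {
        var hits = new HashSet&lt;string&gt;();
        foreach (string code in encodedQuery.Split('|'))
        {
            if (byCode.TryGetValue(code, out var names))
            {
                hits.UnionWith(names);
            }
        }
        return hits;
    }
}</code></pre>
</div>
<p>A query matches when any one of its alternatives was indexed, so two spellings need only one phonetic reading in common to be found together.</p>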
<p>
since 1.6</p>
</div>
<h3 id="properties">Properties
</h3>
<span class="small pull-right mobile-hide">
<a href="https://github.com/NightOwl888/lucenenet/blob/fix/apidocs-layout/src/Lucene.Net.Analysis.Phonetic/Language/Bm/BeiderMorseEncoder.cs/#L137">View Source</a>
</span>
<a id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_IsConcat_" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder.IsConcat*"></a>
<h4 id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_IsConcat" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder.IsConcat">IsConcat</h4>
<div class="markdown level1 summary"><p>Gets or sets how multiple possible phonetic encodings are combined.
<code>true</code> if multiple encodings are to be combined with a &apos;|&apos;, <code>false</code> if only the first one is
to be considered.</p>
</div>
<div class="markdown level1 conceptual"></div>
<h5 class="decalaration">Declaration</h5>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">public virtual bool IsConcat { get; set; }</code></pre>
</div>
<h5 class="propertyValue">Property Value</h5>
<table class="table table-bordered table-striped table-condensed">
<thead>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="xref">System.Boolean</span></td>
<td></td>
</tr>
</tbody>
</table>
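<p>A short sketch of toggling this property. It assumes <code>BeiderMorseEncoder</code> has a public parameterless constructor, which this page does not list explicitly; <code>IsConcatExample</code> is an illustrative name.</p>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">using System;
using Lucene.Net.Analysis.Phonetic.Language.Bm;

public static class IsConcatExample
{
    public static void Main()
    {
        // Assumed: a public parameterless constructor.
        var encoder = new BeiderMorseEncoder();

        encoder.IsConcat = true;   // keep all alternatives, joined with '|'
        Console.WriteLine(encoder.Encode("Renault"));

        encoder.IsConcat = false;  // keep only the first alternative
        Console.WriteLine(encoder.Encode("Renault"));
    }
}</code></pre>
</div>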
<span class="small pull-right mobile-hide">
<a href="https://github.com/NightOwl888/lucenenet/blob/fix/apidocs-layout/src/Lucene.Net.Analysis.Phonetic/Language/Bm/BeiderMorseEncoder.cs/#L104">View Source</a>
</span>
<a id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_NameType_" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder.NameType*"></a>
<h4 id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_NameType" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder.NameType">NameType</h4>
<div class="markdown level1 summary"><p>Gets or sets the name type currently in operation. Use <a class="xref" href="Lucene.Net.Analysis.Phonetic.Language.Bm.NameType.html#Lucene_Net_Analysis_Phonetic_Language_Bm_NameType_GENERIC">GENERIC</a> unless you specifically want phonetic encodings
optimized for Ashkenazi or Sephardic Jewish family names.</p>
</div>
<div class="markdown level1 conceptual"></div>
<h5 class="decalaration">Declaration</h5>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">public virtual NameType NameType { get; set; }</code></pre>
</div>
<h5 class="propertyValue">Property Value</h5>
<table class="table table-bordered table-striped table-condensed">
<thead>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><a class="xref" href="Lucene.Net.Analysis.Phonetic.Language.Bm.NameType.html">NameType</a></td>
<td></td>
</tr>
</tbody>
</table>
<span class="small pull-right mobile-hide">
<a href="https://github.com/NightOwl888/lucenenet/blob/fix/apidocs-layout/src/Lucene.Net.Analysis.Phonetic/Language/Bm/BeiderMorseEncoder.cs/#L120">View Source</a>
</span>
<a id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_RuleType_" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder.RuleType*"></a>
<h4 id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_RuleType" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder.RuleType">RuleType</h4>
<div class="markdown level1 summary"><p>Gets or sets the rule type to apply. This will widen or narrow the range of phonetic encodings considered.
<a class="xref" href="Lucene.Net.Analysis.Phonetic.Language.Bm.RuleType.html#Lucene_Net_Analysis_Phonetic_Language_Bm_RuleType_APPROX">APPROX</a> or <a class="xref" href="Lucene.Net.Analysis.Phonetic.Language.Bm.RuleType.html#Lucene_Net_Analysis_Phonetic_Language_Bm_RuleType_EXACT">EXACT</a> for approximate or exact phonetic matches.</p>
</div>
<div class="markdown level1 conceptual"></div>
<h5 class="decalaration">Declaration</h5>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">public virtual RuleType RuleType { get; set; }</code></pre>
</div>
<h5 class="propertyValue">Property Value</h5>
<table class="table table-bordered table-striped table-condensed">
<thead>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><a class="xref" href="Lucene.Net.Analysis.Phonetic.Language.Bm.RuleType.html">RuleType</a></td>
<td></td>
</tr>
</tbody>
</table>
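<p>The two settings can be combined in an object initializer. The sketch below counts how many alternatives a name yields under a given rule type, which is one way to observe the widening and narrowing described above. It assumes a public parameterless constructor and a plain pipe-separated result with no parenthesized blocks; <code>RuleTypeExample</code> and <code>CountAlternatives</code> are illustrative names.</p>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">using Lucene.Net.Analysis.Phonetic.Language.Bm;

public static class RuleTypeExample
{
    // Encodes the name with the given rule type and counts the '|'-separated alternatives.
    public static int CountAlternatives(RuleType ruleType, string name)
    {
        var encoder = new BeiderMorseEncoder
        {
            NameType = NameType.GENERIC,
            RuleType = ruleType,
        };
        return encoder.Encode(name).Split('|').Length;
    }
}</code></pre>
</div>
<p>Comparing <code>CountAlternatives(RuleType.EXACT, name)</code> with <code>CountAlternatives(RuleType.APPROX, name)</code> on your own data shows how much each setting expands the encoding.</p>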
<h3 id="methods">Methods
</h3>
<span class="small pull-right mobile-hide">
<a href="https://github.com/NightOwl888/lucenenet/blob/fix/apidocs-layout/src/Lucene.Net.Analysis.Phonetic/Language/Bm/BeiderMorseEncoder.cs/#L87">View Source</a>
</span>
<a id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_Encode_" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder.Encode*"></a>
<h4 id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_Encode_System_String_" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder.Encode(System.String)">Encode(String)</h4>
<div class="markdown level1 summary"><p>Encodes the given string into its Beider-Morse phonetic encoding.</p>
</div>
<div class="markdown level1 conceptual"></div>
<h5 class="decalaration">Declaration</h5>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">public virtual string Encode(string source)</code></pre>
</div>
<h5 class="parameters">Parameters</h5>
<table class="table table-bordered table-striped table-condensed">
<thead>
<tr>
<th>Type</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="xref">System.String</span></td>
<td><span class="parametername">source</span></td>
<td></td>
</tr>
</tbody>
</table>
<h5 class="returns">Returns</h5>
<table class="table table-bordered table-striped table-condensed">
<thead>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="xref">System.String</span></td>
<td></td>
</tr>
</tbody>
</table>
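<p>Because the class implements <code>IStringEncoder</code>, callers can depend on the interface alone. The following sketch treats two names as phonetically related when their encodings share at least one alternative. That matching rule is an assumption made for illustration, not something the library prescribes, and the helper ignores the parenthesized multi-part form.</p>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">using System.Collections.Generic;
using System.Linq;
using Lucene.Net.Analysis.Phonetic.Language;

public static class EncodeExample
{
    // Returns true when the two encodings have at least one alternative in common.
    public static bool ShareAnEncoding(IStringEncoder encoder, string a, string b)
    {
        var first = new HashSet&lt;string&gt;(encoder.Encode(a).Split('|'));
        return encoder.Encode(b).Split('|').Any(code =&gt; first.Contains(code));
    }
}</code></pre>
</div>
<p>A call might look like <code>EncodeExample.ShareAnEncoding(new BeiderMorseEncoder(), "Renault", "Reno")</code>; whether that particular pair matches depends on the configured rule tables.</p>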
<span class="small pull-right mobile-hide">
<a href="https://github.com/NightOwl888/lucenenet/blob/fix/apidocs-layout/src/Lucene.Net.Analysis.Phonetic/Language/Bm/BeiderMorseEncoder.cs/#L155">View Source</a>
</span>
<a id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_SetMaxPhonemes_" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder.SetMaxPhonemes*"></a>
<h4 id="Lucene_Net_Analysis_Phonetic_Language_Bm_BeiderMorseEncoder_SetMaxPhonemes_System_Int32_" data-uid="Lucene.Net.Analysis.Phonetic.Language.Bm.BeiderMorseEncoder.SetMaxPhonemes(System.Int32)">SetMaxPhonemes(Int32)</h4>
<div class="markdown level1 summary"><p>Sets the maximum number of phonemes that shall be considered by the engine.
<p>
since 1.7</p>
</div>
<div class="markdown level1 conceptual"></div>
<h5 class="decalaration">Declaration</h5>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">public virtual void SetMaxPhonemes(int maxPhonemes)</code></pre>
</div>
<h5 class="parameters">Parameters</h5>
<table class="table table-bordered table-striped table-condensed">
<thead>
<tr>
<th>Type</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="xref">System.Int32</span></td>
<td><span class="parametername">maxPhonemes</span></td>
<td><p>the maximum number of phonemes returned by the engine</p>
</td>
</tr>
</tbody>
</table>
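<p>A minimal usage sketch, assuming a public parameterless constructor. The limit passed in is an arbitrary illustrative value, not a recommended setting, and <code>MaxPhonemesExample</code> is an illustrative name.</p>
<div class="codewrapper">
<pre><code class="lang-csharp hljs">using Lucene.Net.Analysis.Phonetic.Language.Bm;

public static class MaxPhonemesExample
{
    // Encodes the name after capping how many phonemes the engine considers.
    public static string EncodeCapped(string name, int maxPhonemes)
    {
        var encoder = new BeiderMorseEncoder();
        encoder.SetMaxPhonemes(maxPhonemes);
        return encoder.Encode(name);
    }
}</code></pre>
</div>
<p>A lower limit should keep encodings shorter, presumably at the cost of dropping some alternatives, so it is worth measuring the effect on lookup results before settling on a value.</p>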
<h3 id="implements">Implements</h3>
<div>
<a class="xref" href="Lucene.Net.Analysis.Phonetic.Language.IStringEncoder.html">IStringEncoder</a>
</div>
</article>
</div>
<div class="hidden-sm col-md-2" role="complementary">
<div class="sideaffix">
<div class="contribution">
<ul class="nav">
<li>
<a href="https://github.com/apache/lucenenet/blob/fix/apidocs-layout/src/Lucene.Net.Analysis.Phonetic/Language/Bm/BeiderMorseEncoder.cs/#L69" class="contribution-link">View Source</a>
</li>
</ul>
</div>
<nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
<!-- <p><a class="back-to-top" href="#top">Back to top</a><p> -->
</nav>
</div>
</div>
</div>
</div>
<footer>
<div class="grad-bottom"></div>
<div class="footer">
<div class="container">
<span class="pull-right">
<a href="#top">Back to top</a>
</span>
Copyright © 2020 The Apache Software Foundation, Licensed under the <a href='http://www.apache.org/licenses/LICENSE-2.0' target='_blank'>Apache License, Version 2.0</a><br> <small>Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation. <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</small>
</div>
</div>
</footer>
</div>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script>
<script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script>
</body>
</html>