| <!DOCTYPE html> |
| <!--[if IE]><![endif]--> |
| <html> |
| |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> |
| <title>Namespace Lucene.Net.Analysis.Compound.Hyphenation |
| | Apache Lucene.NET 4.8.0 Documentation </title> |
| <meta name="viewport" content="width=device-width"> |
| <meta name="title" content="Namespace Lucene.Net.Analysis.Compound.Hyphenation |
| | Apache Lucene.NET 4.8.0 Documentation "> |
| <meta name="generator" content="docfx 2.47.0.0"> |
| |
| <link rel="shortcut icon" href="../../logo/favicon.ico"> |
| <link rel="stylesheet" href="../../styles/docfx.vendor.css"> |
| <link rel="stylesheet" href="../../styles/docfx.css"> |
| <link rel="stylesheet" href="../../styles/main.css"> |
| <meta property="docfx:navrel" content="../../toc.html"> |
| <meta property="docfx:tocrel" content="../toc.html"> |
| |
| <meta property="docfx:rel" content="../../"> |
| |
| </head> |
| <body data-spy="scroll" data-target="#affix" data-offset="120"> |
| <div id="wrapper"> |
| <header> |
| |
| <nav id="autocollapse" class="navbar ng-scope" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="navbar-brand" href="../../index.html"> |
| <img id="logo" class="svg" src="../../logo/lucene-net-color.png" alt=""> |
| </a> |
| </div> |
| <div class="collapse navbar-collapse" id="navbar"> |
| <form class="navbar-form navbar-right" role="search" id="search"> |
| <div class="form-group"> |
| <input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off"> |
| </div> |
| </form> |
| </div> |
| </div> |
| </nav> |
| |
| <div class="subnav navbar navbar-default"> |
| <div class="container hide-when-search" id="breadcrumb"> |
| <ul class="breadcrumb"> |
| <li></li> |
| </ul> |
| </div> |
| </div> |
| </header> |
| <div class="container body-content"> |
| |
| <div id="search-results"> |
| <div class="search-list"></div> |
| <div class="sr-items"> |
| <p><i class="glyphicon glyphicon-refresh index-loading"></i></p> |
| </div> |
| <ul id="pagination"></ul> |
| </div> |
| </div> |
| <div role="main" class="container body-content hide-when-search"> |
| |
| <div class="sidenav hide-when-search"> |
| <a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a> |
| <div class="sidetoggle collapse" id="sidetoggle"> |
| <div id="sidetoc"></div> |
| </div> |
| </div> |
| <div class="article row grid-right"> |
| <div class="col-md-10"> |
| <article class="content wrap" id="_content" data-uid="Lucene.Net.Analysis.Compound.Hyphenation"> |
| |
| <h1 id="Lucene_Net_Analysis_Compound_Hyphenation" data-uid="Lucene.Net.Analysis.Compound.Hyphenation" class="text-break">Namespace Lucene.Net.Analysis.Compound.Hyphenation |
| </h1> |
| <div class="markdown level0 summary"><!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <p> The code for the compound word hyphenation is taken from the <a href="http://xmlgraphics.apache.org/fop/">Apache FOP project</a>. All credit for the hyphenation code belongs to them. </p> |
| </div> |
| <div class="markdown level0 conceptual"></div> |
| <div class="markdown level0 remarks"></div> |
| <h3 id="classes">Classes |
| </h3> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.ByteVector.html">ByteVector</a></h4> |
| <section><p>This class implements a simple byte vector with access to the underlying |
| array. |
| This class has been taken from the Apache FOP project (<a href="http://xmlgraphics.apache.org/fop/">http://xmlgraphics.apache.org/fop/</a>). It has been slightly modified. </p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.CharVector.html">CharVector</a></h4> |
| <section><p>This class implements a simple char vector with access to the underlying |
| array.</p> |
| <p>This class has been taken from the Apache FOP project (<a href="http://xmlgraphics.apache.org/fop/">http://xmlgraphics.apache.org/fop/</a>). It has been slightly modified. </p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.Hyphen.html">Hyphen</a></h4> |
| <section><p>This class represents a hyphen. A 'full' hyphen is made of 3 parts: the |
| pre-break text, the post-break text and the no-break text. If no line break is generated |
| at this position, the no-break text is used; otherwise, pre-break and |
| post-break are used. Typically, pre-break is equal to the hyphen character |
| and the others are empty. However, this general scheme supports |
| cases in some languages where words change spelling if they are split across |
| lines, like German's 'backen', which hyphenates as 'bak-ken'. This scheme comes |
| from TeX. |
| <p> |
| This class has been taken from the Apache FOP project (<a href="http://xmlgraphics.apache.org/fop/">http://xmlgraphics.apache.org/fop/</a>). It has been slightly modified. </p> |
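| <p>As a rough illustration of the three-part scheme described above, the following Java sketch models a hyphen with pre-break, post-break and no-break text. The class <code>HyphenSketch</code> and its <code>apply</code> method are hypothetical names introduced here and are not part of the Lucene.NET or FOP API; the 'backen' values follow the example in the text.</p> |

```java
// Hypothetical illustration only; not the Lucene.NET or FOP Hyphen class.
final class HyphenSketch {
    final String preBreak;   // text emitted before the line break
    final String postBreak;  // text emitted after the line break
    final String noBreak;    // text emitted when no break occurs here

    HyphenSketch(String preBreak, String postBreak, String noBreak) {
        this.preBreak = preBreak;
        this.postBreak = postBreak;
        this.noBreak = noBreak;
    }

    // Joins the text before and after the hyphenation point,
    // depending on whether a line break is generated at this position.
    String apply(String head, String tail, boolean lineBreak) {
        if (lineBreak) {
            return head + preBreak + "\n" + postBreak + tail;
        }
        return head + noBreak + tail;
    }

    public static void main(String[] args) {
        // Common case: pre-break is the hyphen character, the rest is empty.
        HyphenSketch simple = new HyphenSketch("-", "", "");
        System.out.println(simple.apply("hy", "phen", false));
        System.out.println(simple.apply("hy", "phen", true));

        // Spelling change from the text: 'backen' breaks as 'bak-ken'.
        HyphenSketch ck = new HyphenSketch("k-", "k", "ck");
        System.out.println(ck.apply("ba", "en", false));
        System.out.println(ck.apply("ba", "en", true));
    }
}
```

| <p>In this sketch the unbroken word keeps its original spelling because only the no-break text is emitted, while a line break swaps in the pre-break and post-break text instead.</p> |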
| </section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.Hyphenation.html">Hyphenation</a></h4> |
| <section><p>This class represents a hyphenated word. |
| <p> |
| This class has been taken from the Apache FOP project (<a href="http://xmlgraphics.apache.org/fop/">http://xmlgraphics.apache.org/fop/</a>). It has been slightly modified.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.HyphenationTree.html">HyphenationTree</a></h4> |
| <section><p>This tree structure stores the hyphenation patterns in an efficient way for |
| fast lookup. It provides the method to hyphenate a word. |
| <p> |
| This class has been taken from the Apache FOP project (<a href="http://xmlgraphics.apache.org/fop/">http://xmlgraphics.apache.org/fop/</a>). It has been slightly modified. </p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.PatternParser.html">PatternParser</a></h4> |
| <section><p>An XmlReader-based document handler that reads and parses hyphenation patterns from an XML |
| file. |
| <p> |
| LUCENENET: This class has been refactored from its Java counterpart to use XmlReader rather |
| than a SAX parser.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.TernaryTree.html">TernaryTree</a></h4> |
| <section><h2>Ternary Search Tree.</h2> |
| |
| <p> |
| A ternary search tree is a hybrid between a binary tree and a digital search |
| tree (trie). Keys are limited to strings. A data value of type char is stored |
| in each leaf node. It can be used as an index (or pointer) to the data. |
| Branches that only contain one key are compressed to one node by storing a |
| pointer to the trailer substring of the key. This class is intended to serve |
| as base class or helper class to implement Dictionary collections or the |
| like. Ternary trees have some nice properties, such as the following: the tree can |
| be traversed in sorted order, partial matches (wildcards) can be implemented, |
| all keys within a given distance from the target can be retrieved, etc. The |
| storage requirements are higher than a binary tree but a lot less than a |
| trie. Performance is comparable with a hash table, and sometimes it outperforms a |
| hash table (most of the time it can determine a miss faster than a hash). |
| </p> |
| |
| <p> |
| The main purpose of this Java port is to serve as a base for implementing |
| TeX's hyphenation algorithm (see The TeXbook, Appendix H). Each language |
| requires from 5000 to 15000 hyphenation patterns, which will be keys in this |
| tree. The string patterns are usually small (from 2 to 5 characters), but |
| each char in the tree is stored in a node. Thus memory usage is the main |
| concern. We will sacrifice 'elegance' to keep memory requirements to the |
| minimum. Using Java's char type as a pointer (yes, I know "pointer" is a |
| forbidden word in Java) we can keep the size of the node to just 8 bytes |
| (3 pointers and the data char). This gives room for about 65000 nodes. In my |
| tests the English patterns took 7694 nodes and the German patterns 10055 |
| nodes, so I think we are safe. |
| </p> |
| |
| <p> |
| All said, this is a map with strings as keys and char as value. Pretty |
| limited! It can be extended to a general map by using the string |
| representation of an object and using the char value as an index to an array |
| that contains the object values. |
| </p> |
| |
| <p>This class has been taken from the Apache FOP project (<a href="http://xmlgraphics.apache.org/fop/">http://xmlgraphics.apache.org/fop/</a>). It has been slightly modified. </p> |
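| <p>To make the node layout described above more concrete, here is a minimal Java sketch of a ternary search tree that uses char values as node indices, so each node costs four chars (8 bytes). It is a simplified illustration under stated assumptions, not the actual implementation: the class name <code>TernaryTreeSketch</code>, the array names and the <code>insert</code>/<code>find</code> methods are hypothetical, keys must not contain the char 0, and the branch compression mentioned in the text is omitted.</p> |

```java
// Hypothetical, simplified sketch; not the Lucene.NET TernaryTree class.
// It omits the branch compression described in the text.
final class TernaryTreeSketch {
    // One node = one slot in each array: 4 chars = 8 bytes per node.
    // Index 0 is reserved as the "null pointer".
    private final char[] lo = new char[65536]; // child for smaller chars
    private final char[] hi = new char[65536]; // child for larger chars
    private final char[] eq = new char[65536]; // next-char child, or the value at a terminator node
    private final char[] sc = new char[65536]; // split char (0 marks the end of a key)
    private char root = 0;
    private int freeNode = 1;

    void insert(String key, char value) {
        root = insert(root, key, 0, value);
    }

    private char insert(char node, String key, int pos, char value) {
        // The key is treated as if it were terminated by the char 0.
        char c = pos < key.length() ? key.charAt(pos) : 0;
        if (node == 0) {
            if (freeNode > Character.MAX_VALUE) {
                throw new IllegalStateException("out of nodes");
            }
            node = (char) freeNode++;
            sc[node] = c;
        }
        if (c < sc[node]) {
            lo[node] = insert(lo[node], key, pos, value);
        } else if (c > sc[node]) {
            hi[node] = insert(hi[node], key, pos, value);
        } else if (c == 0) {
            eq[node] = value; // the terminator node stores the data char
        } else {
            eq[node] = insert(eq[node], key, pos + 1, value);
        }
        return node;
    }

    // Returns the stored char as an int, or -1 if the key is absent.
    int find(String key) {
        char node = root;
        int pos = 0;
        while (node != 0) {
            char c = pos < key.length() ? key.charAt(pos) : 0;
            if (c < sc[node]) {
                node = lo[node];
            } else if (c > sc[node]) {
                node = hi[node];
            } else if (c == 0) {
                return eq[node];
            } else {
                node = eq[node];
                pos++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        TernaryTreeSketch tree = new TernaryTreeSketch();
        tree.insert("ab", '1');
        tree.insert("abc", '2');
        System.out.println(tree.find("abc") == '2');
        System.out.println(tree.find("a") == -1);
    }
}
```

| <p>Because a node index is a char, the sketch can address at most 65535 nodes, which mirrors the "about 65000 nodes" limit mentioned in the text. A miss is detected as soon as the search reaches index 0.</p> |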
| </section> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.TernaryTree.Iterator.html">TernaryTree.Iterator</a></h4> |
| <section><p>Enumerator for TernaryTree</p> |
| <p>LUCENENET NOTE: This differs a bit from its Java counterpart to adhere to |
| .NET IEnumerator semantics. In Java, when the <a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.TernaryTree.Iterator.html">TernaryTree.Iterator</a> is |
| instantiated, it is already positioned at the first element. However, |
| to act like a .NET IEnumerator, the initial state is undefined and considered |
| to be before the first element until <a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.TernaryTree.Iterator.html#Lucene_Net_Analysis_Compound_Hyphenation_TernaryTree_Iterator_MoveNext">MoveNext()</a> is called, which |
| returns <code>true</code> if a move took place.</p> |
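| <p>The enumerator contract described above can be sketched as follows. This is a hypothetical Java cursor over a fixed array of keys that mimics the "before the first element" starting state; the class <code>KeyCursorSketch</code> and its methods are illustrative names and not the Lucene.NET API. Throwing when the cursor is not positioned on an element is a choice made for this sketch, since the text only says the initial state is undefined.</p> |

```java
// Hypothetical cursor that mimics the .NET IEnumerator contract in Java.
final class KeyCursorSketch {
    private final String[] keys;
    private int index = -1; // before the first element until moveNext() is called

    KeyCursorSketch(String... keys) {
        this.keys = keys.clone();
    }

    // Advances to the next key; returns true if a move took place.
    boolean moveNext() {
        if (index + 1 < keys.length) {
            index++;
            return true;
        }
        index = keys.length; // past the last element
        return false;
    }

    // Only meaningful after moveNext() has returned true.
    String current() {
        if (index < 0 || index >= keys.length) {
            throw new IllegalStateException("cursor is not positioned on an element");
        }
        return keys[index];
    }

    public static void main(String[] args) {
        KeyCursorSketch cursor = new KeyCursorSketch("back", "en", "hyphen");
        // The first moveNext() call positions the cursor on the first key.
        while (cursor.moveNext()) {
            System.out.println(cursor.current());
        }
    }
}
```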
| </section> |
| <h3 id="interfaces">Interfaces |
| </h3> |
| <h4><a class="xref" href="Lucene.Net.Analysis.Compound.Hyphenation.IPatternConsumer.html">IPatternConsumer</a></h4> |
| <section><p>This interface is used to connect the XML pattern file parser to the |
| hyphenation tree. |
| <p> |
| This interface has been taken from the Apache FOP project (<a href="http://xmlgraphics.apache.org/fop/">http://xmlgraphics.apache.org/fop/</a>). It has been slightly modified.</p> |
| </section> |
| </article> |
| </div> |
| |
| <div class="hidden-sm col-md-2" role="complementary"> |
| <div class="sideaffix"> |
| <nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix"> |
| <!-- <p><a class="back-to-top" href="#top">Back to top</a><p> --> |
| </nav> |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| <footer> |
| <div class="grad-bottom"></div> |
| <div class="footer"> |
| <div class="container"> |
| <span class="pull-right"> |
| <a href="#top">Back to top</a> |
| </span> |
| Copyright © 2020 Licensed to the Apache Software Foundation (ASF) |
| |
| </div> |
| </div> |
| </footer> |
| </div> |
| |
| <script type="text/javascript" src="../../styles/docfx.vendor.js"></script> |
| <script type="text/javascript" src="../../styles/docfx.js"></script> |
| <script type="text/javascript" src="../../styles/main.js"></script> |
| </body> |
| </html> |