| <!DOCTYPE html> |
| <!--[if IE]><![endif]--> |
| <html> |
| |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> |
| <title>Namespace Lucene.Net.Search.Suggest.Fst |
| | Apache Lucene.NET 4.8.0-beta00013 Documentation </title> |
| <meta name="viewport" content="width=device-width"> |
| <meta name="title" content="Namespace Lucene.Net.Search.Suggest.Fst |
| | Apache Lucene.NET 4.8.0-beta00013 Documentation "> |
| <meta name="generator" content="docfx 2.56.2.0"> |
| |
| <link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico"> |
| <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css"> |
| <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css"> |
| <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css"> |
| <meta property="docfx:navrel" content="toc.html"> |
| <meta property="docfx:tocrel" content="suggest/toc.html"> |
| |
| <meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/"> |
| |
| </head> |
| <body data-spy="scroll" data-target="#affix" data-offset="120"> |
| <span id="forkongithub"><a href="https://github.com/apache/lucenenet" target="_blank">Fork me on GitHub</a></span> |
| <div id="wrapper"> |
| <header> |
| |
| <nav id="autocollapse" class="navbar ng-scope" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="navbar-brand" href="/"> |
| <img id="logo" class="svg" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/lucene-net-color.png" alt=""> |
| </a> |
| </div> |
| <div class="collapse navbar-collapse" id="navbar"> |
| <form class="navbar-form navbar-right" role="search" id="search"> |
| <div class="form-group"> |
| <input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off"> |
| </div> |
| </form> |
| </div> |
| </div> |
| </nav> |
| |
| <div class="subnav navbar navbar-default"> |
| <div class="container hide-when-search"> |
| <ul class="level0 breadcrumb"> |
| <li> |
| <a href="https://lucenenet.apache.org/docs/4.8.0-beta00009/">API</a> |
| <span id="breadcrumb"> |
| <ul class="breadcrumb"> |
| <li></li> |
| </ul> |
| </span> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </header> |
| <div class="container body-content"> |
| |
| <div id="search-results"> |
| <div class="search-list"></div> |
| <div class="sr-items"> |
| <p><i class="glyphicon glyphicon-refresh index-loading"></i></p> |
| </div> |
| <ul id="pagination"></ul> |
| </div> |
| </div> |
| <div role="main" class="container body-content hide-when-search"> |
| |
| <div class="sidenav hide-when-search"> |
| <a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a> |
| <div class="sidetoggle collapse" id="sidetoggle"> |
| <div id="sidetoc"></div> |
| </div> |
| </div> |
| <div class="article row grid-right"> |
| <div class="col-md-10"> |
| <article class="content wrap" id="_content" data-uid="Lucene.Net.Search.Suggest.Fst"> |
| |
| <h1 id="Lucene_Net_Search_Suggest_Fst" data-uid="Lucene.Net.Search.Suggest.Fst" class="text-break">Namespace Lucene.Net.Search.Suggest.Fst |
| </h1> |
| <div class="markdown level0 summary"><!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <p>Finite-state based autosuggest.</p> |
| </div> |
| <div class="markdown level0 conceptual"></div> |
| <div class="markdown level0 remarks"></div> |
| <h3 id="classes">Classes |
| </h3> |
| <h4><a class="xref" href="Lucene.Net.Search.Suggest.Fst.ExternalRefSorter.html">ExternalRefSorter</a></h4> |
| <section><p>Builds and iterates over sequences stored on disk.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Search.Suggest.Fst.FSTCompletion.html">FSTCompletion</a></h4> |
| <section><p>Finite state automata based implementation of "autocomplete" functionality.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Search.Suggest.Fst.FSTCompletion.Completion.html">FSTCompletion.Completion</a></h4> |
| <section><p>A single completion for a given key.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Search.Suggest.Fst.FSTCompletionBuilder.html">FSTCompletionBuilder</a></h4> |
| <section><p>Finite state automata based implementation of "autocomplete" functionality.</p> |
| <h2>Implementation details</h2> |
| |
| <p> |
| The construction step in the object finalizer works as follows: |
| <ul><li>A set of input terms and their buckets is given.</li><li>All terms in the input are prefixed with a synthetic pseudo-character |
| (code) of the weight bucket the term fell into. For example a term |
| <code>abc</code> with a discretized weight equal '1' would become |
| <code>1abc</code>.</li><li>The terms are then sorted by their raw value of UTF-8 character values |
| (including the synthetic bucket code in front).</li><li>A finite state automaton (<a class="xref" href="https://lucenenet.apache.org/docs/4.8.0-beta00013/api/core/Lucene.Net.Util.Fst.FST.html">FST</a>) is constructed from the input. The |
| root node has arcs labeled with all possible weights. We cache all these |
| arcs, highest-weight first.</li></ul> |
| |
| </p> |
| <p> |
| At runtime, in <a class="xref" href="Lucene.Net.Search.Suggest.Fst.FSTCompletion.html#Lucene_Net_Search_Suggest_Fst_FSTCompletion_DoLookup_System_String_System_Int32_">DoLookup(String, Int32)</a>, |
| the automaton is utilized as follows: |
| <ul><li>For each possible term weight encoded in the automaton (cached arcs from |
| the root above), starting with the highest one, we descend along the path of |
| the input key. If the key is not a prefix of a sequence in the automaton |
| (path ends prematurely), we exit immediately -- no completions.</li><li>Otherwise, we have found an internal automaton node that ends the key. |
| <strong>The entire subautomaton (all paths) starting from this node form the key's |
| completions.</strong> We start the traversal of this subautomaton. Every time we |
| reach a final state (arc), we add a single suggestion to the list of results |
| (the weight of this suggestion is constant and equal to the root path we |
| started from). The tricky part is that because automaton edges are sorted and |
| we scan depth-first, we can terminate the entire procedure as soon as we |
| collect enough suggestions the user requested.</li><li>In case the number of suggestions collected in the step above is still |
| insufficient, we proceed to the next (smaller) weight leaving the root node |
| and repeat the same algorithm again.</li></ul> |
| |
| <h2>Runtime behavior and performance characteristic</h2> |
| |
| The algorithm described above is optimized for finding suggestions to short |
| prefixes in a top-weights-first order. This is probably the most common use |
| case: it allows presenting suggestions early and sorts them by the global |
| frequency (and then alphabetically). |
| |
| </p> |
| <p> |
| If there is an exact match in the automaton, it is returned first on the |
| results list (even with by-weight sorting). |
| |
| </p> |
| <p> |
| Note that the maximum lookup time for <strong>any prefix</strong> is the time of |
| descending to the subtree, plus traversal of the subtree up to the number of |
| requested suggestions (because they are already presorted by weight on the |
| root level and alphabetically at any node level). |
| |
| </p> |
| <p> |
| To order alphabetically only (no ordering by priorities), use identical term |
| weights for all terms. Alphabetical suggestions are returned even if |
| non-constant weights are used, but the algorithm for doing this is |
| suboptimal. |
| |
| </p> |
| <p> |
| "alphabetically" in any of the documentation above indicates UTF-8 |
| representation order, nothing else. |
| |
| </p> |
| <p> |
| <strong>NOTE</strong>: the FST file format is experimental and subject to suddenly |
| change, requiring you to rebuild the FST suggest index. |
| |
| </p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Search.Suggest.Fst.FSTCompletionLookup.html">FSTCompletionLookup</a></h4> |
| <section><p>An adapter from <a class="xref" href="Lucene.Net.Search.Suggest.Lookup.html">Lookup</a> API to <a class="xref" href="Lucene.Net.Search.Suggest.Fst.FSTCompletion.html">FSTCompletion</a>.</p> |
| <p>This adapter differs from <a class="xref" href="Lucene.Net.Search.Suggest.Fst.FSTCompletion.html">FSTCompletion</a> in that it attempts |
| to discretize any "weights" as passed from in <a class="xref" href="Lucene.Net.Search.Suggest.IInputEnumerator.html#Lucene_Net_Search_Suggest_IInputEnumerator_Weight">Weight</a> |
| to match the number of buckets. For the rationale for bucketing, see |
| <a class="xref" href="Lucene.Net.Search.Suggest.Fst.FSTCompletion.html">FSTCompletion</a>. |
| |
| </p> |
| <p><strong>Note:</strong>Discretization requires an additional sorting pass. |
| |
| </p> |
| <p>The range of weights for bucketing/ discretization is determined |
| by sorting the input by weight and then dividing into |
| equal ranges. Then, scores within each range are assigned to that bucket. |
| |
| </p> |
| <p>Note that this means that even large differences in weights may be lost |
| during automaton construction, but the overall distinction between "classes" |
| of weights will be preserved regardless of the distribution of weights. |
| |
| </p> |
| <p>For fine-grained control over which weights are assigned to which buckets, |
| use <a class="xref" href="Lucene.Net.Search.Suggest.Fst.FSTCompletion.html">FSTCompletion</a> directly or <a class="xref" href="Lucene.Net.Search.Suggest.Tst.TSTLookup.html">TSTLookup</a>, for example. |
| |
| </p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Search.Suggest.Fst.WFSTCompletionLookup.html">WFSTCompletionLookup</a></h4> |
| <section><p>Suggester based on a weighted FST: it first traverses the prefix, |
| then walks the <em>n</em> shortest paths to retrieve top-ranked |
| suggestions. |
| <p> |
| <strong>NOTE</strong>: |
| Input weights must be between 0 and <span class="xref">System.Int32.MaxValue</span>, any |
| other values will be rejected.</p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div><p> |
| </section> |
| <h3 id="interfaces">Interfaces |
| </h3> |
| <h4><a class="xref" href="Lucene.Net.Search.Suggest.Fst.IBytesRefSorter.html">IBytesRefSorter</a></h4> |
| <section><p>Collects <span class="xref">Lucene.Net.Util.BytesRef</span> and then allows one to iterate over their sorted order. Implementations |
| of this interface will be called in a single-threaded scenario.</p> |
| </section> |
| </article> |
| </div> |
| |
| <div class="hidden-sm col-md-2" role="complementary"> |
| <div class="sideaffix"> |
| <div class="contribution"> |
| <ul class="nav"> |
| <li> |
| <a href="https://github.com/apache/lucenenet/blob/docs/4.8.0-beta00013/src/Lucene.Net.Suggest/Suggest/Fst/package.md/#L2" class="contribution-link">Improve this Doc</a> |
| </li> |
| </ul> |
| </div> |
| <nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix"> |
| <!-- <p><a class="back-to-top" href="#top">Back to top</a><p> --> |
| </nav> |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| <footer> |
| <div class="grad-bottom"></div> |
| <div class="footer"> |
| <div class="container"> |
| <span class="pull-right"> |
| <a href="#top">Back to top</a> |
| </span> |
| Copyright © 2020 The Apache Software Foundation, Licensed under the <a href='http://www.apache.org/licenses/LICENSE-2.0' target='_blank'>Apache License, Version 2.0</a><br> <small>Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation. <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</small> |
| |
| </div> |
| </div> |
| </footer> |
| </div> |
| |
| <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script> |
| <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script> |
| <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script> |
| </body> |
| </html> |