| <!DOCTYPE html> |
| <!--[if IE]><![endif]--> |
| <html> |
| |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> |
| <title>Namespace Lucene.Net.Codecs.Memory |
| | Apache Lucene.NET 4.8.0-beta00013 Documentation </title> |
| <meta name="viewport" content="width=device-width"> |
| <meta name="title" content="Namespace Lucene.Net.Codecs.Memory |
| | Apache Lucene.NET 4.8.0-beta00013 Documentation "> |
| <meta name="generator" content="docfx 2.56.2.0"> |
| |
| <link rel="shortcut icon" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/favicon.ico"> |
| <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.css"> |
| <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.css"> |
| <link rel="stylesheet" href="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.css"> |
| <meta property="docfx:navrel" content="toc.html"> |
| <meta property="docfx:tocrel" content="codecs/toc.html"> |
| |
| <meta property="docfx:rel" content="https://lucenenet.apache.org/docs/4.8.0-beta00009/"> |
| |
| </head> |
| <body data-spy="scroll" data-target="#affix" data-offset="120"> |
| <div id="wrapper"> |
| <header> |
| |
| <nav id="autocollapse" class="navbar ng-scope" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="navbar-brand" href="/"> |
| <img id="logo" class="svg" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/logo/lucene-net-color.png" alt=""> |
| </a> |
| </div> |
| <div class="collapse navbar-collapse" id="navbar"> |
| <form class="navbar-form navbar-right" role="search" id="search"> |
| <div class="form-group"> |
| <input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off"> |
| </div> |
| </form> |
| </div> |
| </div> |
| </nav> |
| |
| <div class="subnav navbar navbar-default"> |
| <div class="container hide-when-search"> |
| <ul class="level0 breadcrumb"> |
| <li> |
| <a href="https://lucenenet.apache.org/docs/4.8.0-beta00009/">API</a> |
| <span id="breadcrumb"> |
| <ul class="breadcrumb"> |
| <li></li> |
| </ul> |
| </span> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </header> |
| <div class="container body-content"> |
| |
| <div id="search-results"> |
| <div class="search-list"></div> |
| <div class="sr-items"> |
| <p><i class="glyphicon glyphicon-refresh index-loading"></i></p> |
| </div> |
| <ul id="pagination"></ul> |
| </div> |
| </div> |
| <div role="main" class="container body-content hide-when-search"> |
| |
| <div class="sidenav hide-when-search"> |
| <a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a> |
| <div class="sidetoggle collapse" id="sidetoggle"> |
| <div id="sidetoc"></div> |
| </div> |
| </div> |
| <div class="article row grid-right"> |
| <div class="col-md-10"> |
| <article class="content wrap" id="_content" data-uid="Lucene.Net.Codecs.Memory"> |
| |
| <h1 id="Lucene_Net_Codecs_Memory" data-uid="Lucene.Net.Codecs.Memory" class="text-break">Namespace Lucene.Net.Codecs.Memory |
| </h1> |
| <div class="markdown level0 summary"><!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <p>Term dictionary, DocValues or Postings formats that are read entirely into memory.</p> |
| </div> |
| <div class="markdown level0 conceptual"></div> |
| <div class="markdown level0 remarks"></div> |
| <h3 id="classes">Classes |
| </h3> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.DirectDocValuesFormat.html">DirectDocValuesFormat</a></h4> |
| <section><p>In-memory docvalues format that does no (or very little) |
| compression. Indexed values are stored on disk, but |
| then at search time all values are loaded into memory as |
| simple .NET arrays. For numeric values, it uses |
| byte[], short[], int[], long[] as necessary to fit the |
| range of the values. For binary values, there is a <span class="xref">System.Int32</span> |
| (4 bytes) overhead per value.</p> |
| <p>Limitations: |
| <ul><li>For binary and sorted fields the total space |
| required for all binary values cannot exceed about |
| 2.1 GB (see <a class="xref" href="Lucene.Net.Codecs.Memory.DirectDocValuesFormat.html#Lucene_Net_Codecs_Memory_DirectDocValuesFormat_MAX_TOTAL_BYTES_LENGTH">MAX_TOTAL_BYTES_LENGTH</a>).</li><li>For sorted set fields, the sum of the size of each |
| document's set of values cannot exceed about 2.1 B |
| values (see <a class="xref" href="Lucene.Net.Codecs.Memory.DirectDocValuesFormat.html#Lucene_Net_Codecs_Memory_DirectDocValuesFormat_MAX_SORTED_SET_ORDS">MAX_SORTED_SET_ORDS</a>). For example, |
| if every document has 10 values (10 instances of |
| <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Documents.SortedSetDocValuesField.html">SortedSetDocValuesField</a>) added, then no |
| more than ~210 M documents can be added to one |
| segment. </li></ul> |
| </p> |
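<p>As a rough illustration of the "as necessary to fit the range" rule and of the sorted-set arithmetic above, the following Python sketch picks the narrowest signed integer width for a set of values and works out the per-segment document ceiling. The names <code>narrowest_width</code> and <code>ASSUMED_MAX_ORDS</code> are hypothetical, and the limit is assumed to be roughly the maximum 32-bit signed integer; none of this is the Lucene.NET implementation.</p>

```python
def narrowest_width(values):
    """Return the smallest signed width in bytes (1, 2, 4 or 8) that holds every value."""
    lo, hi = min(values), max(values)
    for width in (1, 2, 4, 8):
        bound = 1 << (8 * width - 1)  # e.g. 128 for a single byte
        if -bound <= lo and hi < bound:
            return width
    raise ValueError("value does not fit in a signed 64-bit integer")


# Assumption: the sorted-set limit is about 2.1 billion ords, so the largest
# signed 32-bit value is used here as a stand-in for MAX_SORTED_SET_ORDS.
ASSUMED_MAX_ORDS = 2**31 - 1

# With 10 values per document, the ceiling is about a tenth of the limit.
docs_at_10_values_each = ASSUMED_MAX_ORDS // 10
```

<p>Under these assumptions, a field whose values all lie between 0 and 127 would need one byte per document, while a single value of 70000 would push the whole field to four bytes per document.</p>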
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.DirectPostingsFormat.html">DirectPostingsFormat</a></h4> |
| <section><p>Wraps <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Codecs.Lucene41.Lucene41PostingsFormat.html">Lucene41PostingsFormat</a> for on-disk |
| storage, but then at read time loads and stores all |
| terms &amp; postings directly in RAM as byte[], int[].</p> |
| <p><p><strong>WARNING</strong>: This is |
| exceptionally RAM intensive: it makes no effort to |
| compress the postings data, storing terms as separate |
| byte[] and postings as separate int[], but as a result it |
| gives a substantial increase in search performance.</p> |
| <p> |
| <p>This postings format supports <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Index.TermsEnum.html#Lucene_Net_Index_TermsEnum_Ord">Ord</a> |
| and <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Index.TermsEnum.html#Lucene_Net_Index_TermsEnum_SeekExact_System_Int64_">SeekExact(Int64)</a>.</p> |
| <p> |
| <p>Because this holds all term bytes as a single |
| byte[], you cannot have more than 2.1 GB worth of term |
| bytes in a single segment. |
| </p></p> |
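<p>To make the ord and seek-by-ord support concrete, here is a minimal Python sketch of a terms dictionary held as a sorted in-memory list. The class <code>InMemoryTerms</code> and its methods are hypothetical stand-ins for illustration only; they do not mirror the actual DirectPostingsFormat data layout.</p>

```python
from bisect import bisect_left


class InMemoryTerms:
    """Hypothetical terms dictionary: a sorted list of unique byte strings."""

    def __init__(self, terms):
        self.terms = sorted(set(terms))

    def ord_of(self, term):
        """Return the position of term in sorted order, or -1 if absent."""
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return i
        return -1

    def seek_exact(self, ord_):
        """Return the term stored at the given ord (a plain array lookup)."""
        return self.terms[ord_]
```

<p>Because every term lives in memory in sorted order, looking up a term by ord is an index operation, and finding the ord of a term is a binary search.</p>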
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div><p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.FSTOrdPostingsFormat.html">FSTOrdPostingsFormat</a></h4> |
| <section><p>FSTOrd term dictionary + Lucene41 postings base format (Lucene41PBF).</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.FSTOrdPulsing41PostingsFormat.html">FSTOrdPulsing41PostingsFormat</a></h4> |
| <section><p>FSTOrd + Pulsing41 |
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div><p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.FSTOrdTermsReader.html">FSTOrdTermsReader</a></h4> |
| <section><p>FST-based terms dictionary reader. |
| <p> |
| The FST index maps each term to its ord, and during seek |
| the ord is used to fetch metadata from a single block. |
| The term dictionary is fully memory resident. |
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.FSTOrdTermsWriter.html">FSTOrdTermsWriter</a></h4> |
| <section><p>FST-based term dict, using ord as FST output. |
| <p> |
| The FST holds the mapping between &lt;term, ord&gt;, and |
| each term's metadata is delta-encoded into a single byte block. |
| <p> |
| Typically the byte block consists of four parts: |
| <ol><li>term statistics: docFreq, totalTermFreq;</li><li>monotonic long[], e.g. the pointer to the postings list for that term;</li><li>generic byte[], e.g. other information customized by postings base.</li><li>single-level skip list to speed up metadata decoding by ord.</li></ol> |
| <p> |
| <p> |
| Files: |
| <ul><li><code>.tix</code>: <a href="#Termindex">Term Index</a></li><li><code>.tbk</code>: <a href="#Termblock">Term Block</a></li></ul> |
| </p></p> |
| <p><a name="Termindex" id="Termindex"></a> |
| <h3>Term Index</h3> |
| <p> |
| The .tix contains a list of FSTs, one for each field. |
| The FST maps a term to its corresponding ord (ordinal position) in the current field. |
| </p></p> |
| <ul><li>TermIndex(.tix) --&gt; Header, TermFST<sup>NumFields</sup>, Footer</li><li>TermFST --&gt; <a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Util.Fst.FST-1.html">FST&lt;T&gt;</a></li><li>Header --&gt; CodecHeader (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>Footer --&gt; CodecFooter (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteFooter_Lucene_Net_Store_IndexOutput_">WriteFooter(IndexOutput)</a>) </li></ul> |
| |
| <p>Notes:</p> |
| <ul><li> |
| Since terms are already sorted before writing to <a href="#Termblock">Term Block</a>, |
| their ords can be used directly to seek term metadata from the term block. |
| </li></ul> |
| |
| <a name="Termblock" id="Termblock"></a> |
| <h3>Term Block</h3> |
| <p> |
| The .tbk contains all the statistics and metadata for terms, along with a field summary (e.g. |
| per-field data such as the number of documents in the current field). For each field, there are four blocks: |
| <ul><li>statistics bytes block: contains term statistics; </li><li>metadata longs block: delta-encodes monotonic part of metadata; </li><li>metadata bytes block: encodes other parts of metadata; </li><li>skip block: contains skip data, to speed up metadata seeking and decoding</li></ul> |
| </p> |
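<p>The idea behind the metadata longs block, delta-encoding values that never decrease from one term to the next, can be sketched in Python as follows. The function names are hypothetical, and the sketch ignores the skip data and the on-disk byte layout.</p>

```python
def delta_encode(pointers):
    """Replace each non-decreasing value with its difference from the previous one."""
    deltas = []
    prev = 0
    for p in pointers:
        if p < prev:
            raise ValueError("input must be monotonically non-decreasing")
        deltas.append(p - prev)
        prev = p
    return deltas


def delta_decode(deltas):
    """Rebuild the original values by accumulating the deltas."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values
```

<p>Since postings file pointers tend to grow slowly from term to term, the deltas are small numbers that a variable-length integer encoding can store in few bytes.</p>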
| |
| <p><p>File Format:</p> |
| <ul><li>TermBlock(.tbk) --&gt; Header, <em>PostingsHeader</em>, FieldSummary, DirOffset</li><li>FieldSummary --&gt; NumFields, &lt;FieldNumber, NumTerms, SumTotalTermFreq?, SumDocFreq, |
| DocCount, LongsSize, DataBlock &gt; <sup>NumFields</sup>, Footer</li><li>DataBlock --&gt; StatsBlockLength, MetaLongsBlockLength, MetaBytesBlockLength, |
| SkipBlock, StatsBlock, MetaLongsBlock, MetaBytesBlock </li><li>SkipBlock --&gt; &lt; StatsFPDelta, MetaLongsSkipFPDelta, MetaBytesSkipFPDelta, |
| MetaLongsSkipDelta<sup>LongsSize</sup> &gt;<sup>NumTerms</sup></li><li>StatsBlock --&gt; &lt; DocFreq[Same?], (TotalTermFreq-DocFreq) ? &gt; <sup>NumTerms</sup></li><li>MetaLongsBlock --&gt; &lt; LongDelta<sup>LongsSize</sup>, BytesSize &gt; <sup>NumTerms</sup></li><li>MetaBytesBlock --&gt; Byte <sup>MetaBytesBlockLength</sup></li><li>Header --&gt; CodecHeader (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>DirOffset --&gt; Uint64 (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) </li><li>NumFields, FieldNumber, DocCount, DocFreq, |
| LongsSize --&gt; VInt (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>NumTerms, SumTotalTermFreq, SumDocFreq, StatsBlockLength, MetaLongsBlockLength, MetaBytesBlockLength, |
| StatsFPDelta, MetaLongsSkipFPDelta, MetaBytesSkipFPDelta, MetaLongsSkipStart, TotalTermFreq, |
| LongDelta --&gt; VLong (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt64_System_Int64_">WriteVInt64(Int64)</a>) </li><li>Footer --&gt; CodecFooter (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteFooter_Lucene_Net_Store_IndexOutput_">WriteFooter(IndexOutput)</a>) </li></ul> |
| <p>Notes: </p> |
| <ul><li> |
| The formats of PostingsHeader and MetaBytes are customized by the specific postings implementation: |
| they contain arbitrary per-file data (such as parameters or versioning information), and per-term data |
| (non-monotonic ones like pulsed postings data). |
| </li><li> |
| During initialization the reader loads all the blocks into memory. SkipBlock is decoded so that, during seek, |
| the term dictionary can look up file pointers directly. StatsFPDelta, MetaLongsSkipFPDelta, etc. are file offsets |
| for every SkipInterval-th term. MetaLongsSkipDelta is the difference from the previous one, which indicates |
| the value of the preceding metadata longs for every SkipInterval-th term. |
| </li><li> |
| DocFreq is the count of documents which contain the term. TotalTermFreq is the total number of occurrences of the term. |
| Usually these two values are the same for long-tail terms, so one bit is stolen from DocFreq to flag this case, |
| so that encoding of TotalTermFreq may be omitted. |
| </li></ul> |
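<p>The bit-stealing trick described in the last note can be sketched in Python as below. This is an illustration under stated assumptions: the flag is assumed to live in the lowest bit, with 1 meaning "TotalTermFreq equals DocFreq", and the output is a list of integers rather than the variable-length bytes written to the file. The function names are hypothetical.</p>

```python
def encode_stats(doc_freq, total_term_freq):
    """Encode one term's statistics as one or two non-negative integers."""
    if total_term_freq < doc_freq:
        raise ValueError("total_term_freq cannot be smaller than doc_freq")
    if total_term_freq == doc_freq:
        # Low bit set: the two values match, so the second number is omitted.
        return [(doc_freq << 1) | 1]
    # Low bit clear: the difference follows as a second number.
    return [doc_freq << 1, total_term_freq - doc_freq]


def decode_stats(values):
    """Inverse of encode_stats; returns (doc_freq, total_term_freq)."""
    it = iter(values)
    code = next(it)
    doc_freq = code >> 1
    if code & 1:
        return doc_freq, doc_freq
    return doc_freq, doc_freq + next(it)
```

<p>For a term that occurs exactly once in each of three documents, the sketch emits the single number 7; a term with ten occurrences across three documents needs two numbers, 6 and 7.</p>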
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div><p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.FSTPostingsFormat.html">FSTPostingsFormat</a></h4> |
| <section><p>FST term dictionary + Lucene41 postings base format (Lucene41PBF).</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.FSTPulsing41PostingsFormat.html">FSTPulsing41PostingsFormat</a></h4> |
| <section><p>FST + Pulsing41, test only, since |
| FST does no delta encoding here! |
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div><p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.FSTTermsReader.html">FSTTermsReader</a></h4> |
| <section><p>FST-based terms dictionary reader. |
| <p> |
| The FST directly maps each term to its metadata |
| and is memory resident. |
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.FSTTermsWriter.html">FSTTermsWriter</a></h4> |
| <section><p>FST-based term dict, using metadata as FST output. |
| <p> |
| The FST directly holds the mapping between &lt;term, metadata&gt;. |
| <p> |
| Term metadata consists of three parts: |
| <ol><li>term statistics: docFreq, totalTermFreq;</li><li>monotonic long[], e.g. the pointer to the postings list for that term;</li><li>generic byte[], e.g. other information needed by the postings reader.</li></ol> |
| <p> |
| File: |
| <ul><li><code>.tst</code>: <a href="#Termdictionary">Term Dictionary</a></li></ul> |
| </p> |
| <p> |
| <p><a name="Termdictionary" id="Termdictionary"></a> |
| <h3>Term Dictionary</h3> |
| </p> |
| <p> |
| The .tst contains a list of FSTs, one for each field. |
| The FST maps a term to its corresponding statistics (e.g. docfreq) |
| and metadata (e.g. information for postings list reader like file pointer |
| to postings list). |
| </p> |
| <p> |
| Typically the metadata is separated into two parts: |
| <ul><li> |
| Monotonic long array: Some metadata always ascends in the same order |
| as the corresponding terms. This part is used by the FST to share outputs between arcs. |
| </li><li> |
| Generic byte array: Used to store non-monotonic metadata. |
| </li></ul> |
| </p></p> |
| <p>File format: |
| <ul><li>TermsDict(.tst) --&gt; Header, <em>PostingsHeader</em>, FieldSummary, DirOffset</li><li>FieldSummary --&gt; NumFields, &lt;FieldNumber, NumTerms, SumTotalTermFreq?, |
| SumDocFreq, DocCount, LongsSize, TermFST &gt;<sup>NumFields</sup></li><li>TermFST --&gt; FST&lt;TermData&gt;</li><li>TermData --&gt; Flag, BytesSize?, LongDelta<sup>LongsSize</sup>?, Byte<sup>BytesSize</sup>?, |
| &lt; DocFreq[Same?], (TotalTermFreq-DocFreq) &gt; ? </li><li>Header --&gt; CodecHeader (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>DirOffset --&gt; Uint64 (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) </li><li>DocFreq, LongsSize, BytesSize, NumFields, |
| FieldNumber, DocCount --&gt; VInt (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>TotalTermFreq, NumTerms, SumTotalTermFreq, SumDocFreq, LongDelta --&gt; |
| VLong (<a class="xref" href="http://localhost:8080/api/core/Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt64_System_Int64_">WriteVInt64(Int64)</a>) </li></ul> |
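<p>Most fields in the format above are VInt or VLong values. The following Python sketch shows the general variable-length scheme: seven payload bits per byte, least-significant group first, with the high bit marking that more bytes follow. The helper names <code>write_vint</code> and <code>read_vint</code> are hypothetical, and the sketch accepts only non-negative values; it is not a port of the Lucene.NET methods.</p>

```python
def write_vint(value):
    """Encode a non-negative integer using 7 payload bits per byte."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)  # final byte has the high bit clear
    return bytes(out)


def read_vint(data, pos=0):
    """Decode one value starting at pos; returns (value, next position)."""
    result = 0
    shift = 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7
```

<p>Values below 128 take a single byte, which is why small deltas and counts are cheap to store in this format.</p>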
| <p>Notes:</p> |
| <ul><li> |
| The formats of PostingsHeader and the generic meta bytes are customized by the specific postings implementation: |
| they contain arbitrary per-file data (such as parameters or versioning information), and per-term data |
| (non-monotonic ones like pulsed postings data). |
| </li><li> |
| The format of TermData is determined by the FST. Typically, monotonic metadata will be dense around shallow arcs, |
| while on deeper arcs only generic bytes and term statistics exist. |
| </li><li> |
| The Flag byte indicates which parts of the metadata exist on the current arc. In particular, the monotonic part |
| is omitted when it is an array of 0s. |
| </li><li> |
| Since LongsSize is fixed per field, it is written only once, in the field summary. |
| </li></ul> |
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.MemoryDocValuesFormat.html">MemoryDocValuesFormat</a></h4> |
| <section><p>In-memory docvalues format. </p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Memory.MemoryPostingsFormat.html">MemoryPostingsFormat</a></h4> |
| <section><p>Stores terms &amp; postings (docs, positions, payloads) in |
| RAM, using an FST.</p> |
| <p><p>Note that this codec implements advance as a linear |
| scan! This means that if you store large fields in here, |
| queries that rely on advance (AND BooleanQuery, |
| PhraseQuery) will be relatively slow! |
| </p></p> |
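<p>The cost of a linear-scan advance can be seen in a small Python sketch. Both functions are hypothetical illustrations over plain sorted lists of document IDs; they do not reproduce MemoryPostingsFormat's enumerators.</p>

```python
def advance_linear(doc_ids, start, target):
    """Return the index of the first doc ID >= target, scanning one entry at a time."""
    i = start
    while i < len(doc_ids) and doc_ids[i] < target:
        i += 1
    return i


def intersect(a, b):
    """Conjunction (AND) of two sorted doc ID lists, driven by advance_linear."""
    i = j = 0
    hits = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            hits.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = advance_linear(a, i, b[j])
        else:
            j = advance_linear(b, j, a[i])
    return hits
```

<p>Each call to <code>advance_linear</code> visits every entry between the current position and the target, so a conjunction that jumps far ahead in a long postings list pays for all the skipped entries. Formats with skip data avoid most of that work.</p>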
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div><p> |
| </section> |
| </article> |
| </div> |
| |
| <div class="hidden-sm col-md-2" role="complementary"> |
| <div class="sideaffix"> |
| <nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix"> |
| <!-- <p><a class="back-to-top" href="#top">Back to top</a><p> --> |
| </nav> |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| <footer> |
| <div class="grad-bottom"></div> |
| <div class="footer"> |
| <div class="container"> |
| <span class="pull-right"> |
| <a href="#top">Back to top</a> |
| </span> |
| Copyright © 2020 The Apache Software Foundation, Licensed under the <a href='http://www.apache.org/licenses/LICENSE-2.0' target='_blank'>Apache License, Version 2.0</a><br> <small>Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation. <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</small> |
| |
| </div> |
| </div> |
| </footer> |
| </div> |
| |
| <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.vendor.js"></script> |
| <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/docfx.js"></script> |
| <script type="text/javascript" src="https://lucenenet.apache.org/docs/4.8.0-beta00009/styles/main.js"></script> |
| </body> |
| </html> |