| <!DOCTYPE html> |
| <!--[if IE]><![endif]--> |
| <html> |
| |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> |
| <title>Namespace Lucene.Net.Codecs.Lucene42 |
| | Apache Lucene.NET 4.8.0 Documentation </title> |
| <meta name="viewport" content="width=device-width"> |
| <meta name="title" content="Namespace Lucene.Net.Codecs.Lucene42 |
| | Apache Lucene.NET 4.8.0 Documentation "> |
| <meta name="generator" content="docfx 2.47.0.0"> |
| |
| <link rel="shortcut icon" href="../../logo/favicon.ico"> |
| <link rel="stylesheet" href="../../styles/docfx.vendor.css"> |
| <link rel="stylesheet" href="../../styles/docfx.css"> |
| <link rel="stylesheet" href="../../styles/main.css"> |
| <meta property="docfx:navrel" content="../../toc.html"> |
| <meta property="docfx:tocrel" content="../toc.html"> |
| |
| <meta property="docfx:rel" content="../../"> |
| |
| </head> |
| <body data-spy="scroll" data-target="#affix" data-offset="120"> |
| <div id="wrapper"> |
| <header> |
| |
| <nav id="autocollapse" class="navbar ng-scope" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="navbar-brand" href="../../index.html"> |
| <img id="logo" class="svg" src="../../logo/lucene-net-color.png" alt=""> |
| </a> |
| </div> |
| <div class="collapse navbar-collapse" id="navbar"> |
| <form class="navbar-form navbar-right" role="search" id="search"> |
| <div class="form-group"> |
| <input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off"> |
| </div> |
| </form> |
| </div> |
| </div> |
| </nav> |
| |
| <div class="subnav navbar navbar-default"> |
| <div class="container hide-when-search" id="breadcrumb"> |
| <ul class="breadcrumb"> |
| <li></li> |
| </ul> |
| </div> |
| </div> |
| </header> |
| <div class="container body-content"> |
| |
| <div id="search-results"> |
| <div class="search-list"></div> |
| <div class="sr-items"> |
| <p><i class="glyphicon glyphicon-refresh index-loading"></i></p> |
| </div> |
| <ul id="pagination"></ul> |
| </div> |
| </div> |
| <div role="main" class="container body-content hide-when-search"> |
| |
| <div class="sidenav hide-when-search"> |
| <a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a> |
| <div class="sidetoggle collapse" id="sidetoggle"> |
| <div id="sidetoc"></div> |
| </div> |
| </div> |
| <div class="article row grid-right"> |
| <div class="col-md-10"> |
| <article class="content wrap" id="_content" data-uid="Lucene.Net.Codecs.Lucene42"> |
| |
| <h1 id="Lucene_Net_Codecs_Lucene42" data-uid="Lucene.Net.Codecs.Lucene42" class="text-break">Namespace Lucene.Net.Codecs.Lucene42 |
| </h1> |
| <div class="markdown level0 summary"><!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <p>Support for testing <a class="xref" href="Lucene.Net.Codecs.Lucene42.Lucene42Codec.html">Lucene42Codec</a>.</p> |
| </div> |
| <div class="markdown level0 conceptual"></div> |
| <div class="markdown level0 remarks"></div> |
| <h3 id="classes">Classes |
| </h3> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene42.Lucene42Codec.html">Lucene42Codec</a></h4> |
| <section><p>Implements the Lucene 4.2 index format, with configurable per-field postings |
| and docvalues formats. |
| <p> |
| If you want to reuse functionality of this codec in another codec, extend |
| <a class="xref" href="Lucene.Net.Codecs.FilterCodec.html">FilterCodec</a>. |
| <p> |
| See <a class="xref" href="../Lucene.Net.TestFramework/Lucene.Net.Codecs.Lucene42.html">Lucene.Net.Codecs.Lucene42</a> package documentation for file format details. |
| <p> |
| @lucene.experimental </p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene42.Lucene42DocValuesFormat.html">Lucene42DocValuesFormat</a></h4> |
| <section><p>Lucene 4.2 DocValues format. |
| <p> |
| Encodes the four per-document value types (Numeric,Binary,Sorted,SortedSet) with seven basic strategies. |
| <p> |
| <ul><li>Delta-compressed Numerics: per-document integers written in blocks of 4096. For each block |
| the minimum value is encoded, and each entry is a delta from that minimum value.</li><li>Table-compressed Numerics: when the number of unique values is very small, a lookup table |
| is written instead. Each per-document entry is instead the ordinal to this table.</li><li>Uncompressed Numerics: when all values would fit into a single byte, and the |
| <code>acceptableOverheadRatio</code> would pack values into 8 bits per value anyway, they |
| are written as absolute values (with no indirection or packing) for performance.</li><li>GCD-compressed Numerics: when all numbers share a common divisor, such as dates, the greatest |
| common denominator (GCD) is computed, and quotients are stored using Delta-compressed Numerics.</li><li>Fixed-width Binary: one large concatenated byte[] is written, along with the fixed length. |
| Each document's value can be addressed by <code>maxDoc*length</code>.</li><li>Variable-width Binary: one large concatenated byte[] is written, along with end addresses |
| for each document. The addresses are written in blocks of 4096, with the current absolute |
| start for the block, and the average (expected) delta per entry. For each document the |
| deviation from the delta (actual - expected) is written.</li><li>Sorted: an FST mapping deduplicated terms to ordinals is written, along with the per-document |
| ordinals written using one of the numeric strategies above.</li><li>SortedSet: an FST mapping deduplicated terms to ordinals is written, along with the per-document |
| ordinal list written using one of the binary strategies above.</li></ul> |
| <p> |
| Files: |
| <ol><li><code>.dvd</code>: DocValues data</li><li><code>.dvm</code>: DocValues metadata</li></ol> |
| <ol><li><a name="dvm" id="dvm"></a> |
| <p>The DocValues metadata or .dvm file.</p> |
| <p>For DocValues field, this stores metadata, such as the offset into the |
| DocValues data (.dvd)</p> |
| <p>DocValues metadata (.dvm) --> Header,<FieldNumber,EntryType,Entry><sup>NumFields</sup>,Footer</p> |
| <ul><li>Entry --> NumericEntry | BinaryEntry | SortedEntry</li><li>NumericEntry --> DataOffset,CompressionType,PackedVersion</li><li>BinaryEntry --> DataOffset,DataLength,MinLength,MaxLength,PackedVersion?,BlockSize?</li><li>SortedEntry --> DataOffset,ValueCount</li><li>FieldNumber,PackedVersion,MinLength,MaxLength,BlockSize,ValueCount --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>DataOffset,DataLength --> Int64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) </li><li>EntryType,CompressionType --> Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) </li><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>Footer --> CodecFooter (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteFooter_Lucene_Net_Store_IndexOutput_">WriteFooter(IndexOutput)</a>) </li></ul> |
| <p>Sorted fields have two entries: a SortedEntry with the FST metadata, |
| and an ordinary NumericEntry for the document-to-ord metadata.</p> |
| <p>SortedSet fields have two entries: a SortedEntry with the FST metadata, |
| and an ordinary BinaryEntry for the document-to-ord-list metadata.</p> |
| <p>FieldNumber of -1 indicates the end of metadata.</p> |
| <p>EntryType is a 0 (NumericEntry), 1 (BinaryEntry, or 2 (SortedEntry)</p> |
| <p>DataOffset is the pointer to the start of the data in the DocValues data (.dvd)</p> |
| <p>CompressionType indicates how Numeric values will be compressed: |
| <ul><li>0 --> delta-compressed. For each block of 4096 integers, every integer is delta-encoded |
| from the minimum value within the block.</li><li>1 --> table-compressed. When the number of unique numeric values is small and it would save space, |
| a lookup table of unique values is written, followed by the ordinal for each document.</li><li>2 --> uncompressed. When the <code>acceptableOverheadRatio</code> parameter would upgrade the number |
| of bits required to 8, and all values fit in a byte, these are written as absolute binary values |
| for performance.</li><li>3 --> gcd-compressed. When all integers share a common divisor, only quotients are stored |
| using blocks of delta-encoded ints.</li></ul> |
| <p>MinLength and MaxLength represent the min and max byte[] value lengths for Binary values. |
| If they are equal, then all values are of a fixed size, and can be addressed as <code>DataOffset + (docID * length)</code>. |
| Otherwise, the binary values are of variable size, and packed integer metadata (PackedVersion,BlockSize) |
| is written for the addresses.</li><li><a name="dvd" id="dvd"></a> |
| <p>The DocValues data or .dvd file.</p> |
| <p>For DocValues field, this stores the actual per-document data (the heavy-lifting)</p> |
| <p>DocValues data (.dvd) --> Header,<NumericData | BinaryData | SortedData><sup>NumFields</sup>,Footer</p> |
| <ul><li>NumericData --> DeltaCompressedNumerics | TableCompressedNumerics | UncompressedNumerics | GCDCompressedNumerics</li><li>BinaryData --> Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) <sup>DataLength</sup>,Addresses</li><li>SortedData --> FST<Int64> (<a class="xref" href="Lucene.Net.Util.Fst.FST-1.html">FST<T></a>) </li><li>DeltaCompressedNumerics --> BlockPackedInts(blockSize=4096) (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>TableCompressedNumerics --> TableSize, Int64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) <sup>TableSize</sup>, PackedInts (<a class="xref" href="Lucene.Net.Util.Packed.PackedInt32s.html">PackedInt32s</a>) </li><li>UncompressedNumerics --> Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) <sup>maxdoc</sup></li><li>Addresses --> MonotonicBlockPackedInts(blockSize=4096) (<a class="xref" href="Lucene.Net.Util.Packed.MonotonicBlockPackedWriter.html">MonotonicBlockPackedWriter</a>) </li><li>Footer --> CodecFooter (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteFooter_Lucene_Net_Store_IndexOutput_">WriteFooter(IndexOutput)</a></li></ul> |
| <p>SortedSet entries store the list of ordinals in their BinaryData as a |
| sequences of increasing vLongs (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt64_System_Int64_">WriteVInt64(Int64)</a>), delta-encoded.</p></li></ol> |
| <p> |
| Limitations: |
| <ul><li> Binary doc values can be at most <a class="xref" href="Lucene.Net.Codecs.Lucene42.Lucene42DocValuesFormat.html#Lucene_Net_Codecs_Lucene42_Lucene42DocValuesFormat_MAX_BINARY_FIELD_LENGTH">MAX_BINARY_FIELD_LENGTH</a> in length.</li></ul> </p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene42.Lucene42FieldInfosFormat.html">Lucene42FieldInfosFormat</a></h4> |
| <section><p>Lucene 4.2 Field Infos format. |
| <p> |
| <p>Field names are stored in the field info file, with suffix <code>.fnm</code>.</p> |
| <p>FieldInfos (.fnm) --> Header,FieldsCount, <FieldName,FieldNumber, |
| FieldBits,DocValuesBits,Attributes> <sup>FieldsCount</sup></p> |
| <p>Data types: |
| <ul><li>Header --> CodecHeader <a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a></li><li>FieldsCount --> VInt <a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a></li><li>FieldName --> String <a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteString_System_String_">WriteString(String)</a></li><li>FieldBits, DocValuesBits --> Byte <a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a></li><li>FieldNumber --> VInt <a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt32_System_Int32_">WriteInt32(Int32)</a></li><li>Attributes --> IDictionary<String,String> <a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteStringStringMap_System_Collections_Generic_IDictionary_System_String_System_String__">WriteStringStringMap(IDictionary<String, String>)</a></li></ul> |
| </p> |
| Field Descriptions: |
| <ul><li>FieldsCount: the number of fields in this file.</li><li>FieldName: name of the field as a UTF-8 String.</li><li>FieldNumber: the field's number. Note that unlike previous versions of |
| Lucene, the fields are not numbered implicitly by their order in the |
| file, instead explicitly.</li><li>FieldBits: a byte containing field options. |
| <ul><li>The low-order bit is one for indexed fields, and zero for non-indexed |
| fields.</li><li>The second lowest-order bit is one for fields that have term vectors |
| stored, and zero for fields without term vectors.</li><li>If the third lowest order-bit is set (0x4), offsets are stored into |
| the postings list in addition to positions.</li><li>Fourth bit is unused.</li><li>If the fifth lowest-order bit is set (0x10), norms are omitted for the |
| indexed field.</li><li>If the sixth lowest-order bit is set (0x20), payloads are stored for the |
| indexed field.</li><li>If the seventh lowest-order bit is set (0x40), term frequencies and |
| positions omitted for the indexed field.</li><li>If the eighth lowest-order bit is set (0x80), positions are omitted for the |
| indexed field.</li></ul> |
| </li><li>DocValuesBits: a byte containing per-document value types. The type |
| recorded as two four-bit integers, with the high-order bits representing |
| <code>norms</code> options, and the low-order bits representing |
| <a class="xref" href="Lucene.Net.Index.DocValues.html">DocValues</a> options. Each four-bit integer can be decoded as such: |
| <ul><li>0: no DocValues for this field.</li><li>1: NumericDocValues. (<a class="xref" href="Lucene.Net.Index.DocValuesType.html#Lucene_Net_Index_DocValuesType_NUMERIC">NUMERIC</a>)</li><li>2: BinaryDocValues. (<a class="xref" href="Lucene.Net.Index.DocValuesType.html#Lucene_Net_Index_DocValuesType_BINARY">BINARY</a>)</li><li>3: SortedDocValues. (<a class="xref" href="Lucene.Net.Index.DocValuesType.html#Lucene_Net_Index_DocValuesType_SORTED">SORTED</a>)</li></ul> |
| </li><li>Attributes: a key-value map of codec-private attributes.</li></ul> |
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene42.Lucene42NormsFormat.html">Lucene42NormsFormat</a></h4> |
| <section><p>Lucene 4.2 score normalization format. |
| <p> |
| NOTE: this uses the same format as <a class="xref" href="Lucene.Net.Codecs.Lucene42.Lucene42DocValuesFormat.html">Lucene42DocValuesFormat</a> |
| Numeric DocValues, but with different file extensions, and passing |
| <a class="xref" href="Lucene.Net.Util.Packed.PackedInt32s.html#Lucene_Net_Util_Packed_PackedInt32s_FASTEST">FASTEST</a> for uncompressed encoding: trading off |
| space for performance. |
| <p> |
| Files: |
| <ul><li><code>.nvd</code>: DocValues data</li><li><code>.nvm</code>: DocValues metadata</li></ul></p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene42.Lucene42TermVectorsFormat.html">Lucene42TermVectorsFormat</a></h4> |
| <section><p>Lucene 4.2 term vectors format (<a class="xref" href="Lucene.Net.Codecs.TermVectorsFormat.html">TermVectorsFormat</a>). |
| <p> |
| Very similarly to <a class="xref" href="Lucene.Net.Codecs.Lucene41.Lucene41StoredFieldsFormat.html">Lucene41StoredFieldsFormat</a>, this format is based |
| on compressed chunks of data, with document-level granularity so that a |
| document can never span across distinct chunks. Moreover, data is made as |
| compact as possible: |
| <ul><li>textual data is compressed using the very light, |
| <a href="http://code.google.com/p/lz4/">LZ4</a> compression algorithm,</li><li>binary data is written using fixed-size blocks of |
| packed <span class="xref">System.Int32</span>s (<a class="xref" href="Lucene.Net.Util.Packed.PackedInt32s.html">PackedInt32s</a>).</li></ul> |
| <p> |
| Term vectors are stored using two files |
| <ul><li>a data file where terms, frequencies, positions, offsets and payloads |
| are stored,</li><li>an index file, loaded into memory, used to locate specific documents in |
| the data file.</li></ul> |
| Looking up term vectors for any document requires at most 1 disk seek. |
| <p><strong>File formats</strong> |
| <ol><li><a name="vector_data" id="vector_data"></a> |
| <p>A vector data file (extension <code>.tvd</code>). this file stores terms, |
| frequencies, positions, offsets and payloads for every document. Upon writing |
| a new segment, it accumulates data into memory until the buffer used to store |
| terms and payloads grows beyond 4KB. Then it flushes all metadata, terms |
| and positions to disk using <a href="http://code.google.com/p/lz4/">LZ4</a> |
| compression for terms and payloads and |
| blocks of packed <span class="xref">System.Int32</span>s (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) for positions.</p> |
| <p>Here is a more detailed description of the field data file format:</p> |
| <ul><li>VectorData (.tvd) --> <Header>, PackedIntsVersion, ChunkSize, <Chunk><sup>ChunkCount</sup>, Footer</li><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>PackedIntsVersion --> <a class="xref" href="Lucene.Net.Util.Packed.PackedInt32s.html#Lucene_Net_Util_Packed_PackedInt32s_VERSION_CURRENT">VERSION_CURRENT</a> as a VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>ChunkSize is the number of bytes of terms to accumulate before flushing, as a VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>ChunkCount is not known in advance and is the number of chunks necessary to store all document of the segment</li><li>Chunk --> DocBase, ChunkDocs, < NumFields >, < FieldNums >, < FieldNumOffs >, < Flags >, |
| < NumTerms >, < TermLengths >, < TermFreqs >, < Positions >, < StartOffsets >, < Lengths >, |
| < PayloadLengths >, < TermAndPayloads ></li><li>DocBase is the ID of the first doc of the chunk as a VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>ChunkDocs is the number of documents in the chunk</li><li>NumFields --> DocNumFields<sup>ChunkDocs</sup></li><li>DocNumFields is the number of fields for each doc, written as a VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) if ChunkDocs==1 and as a <a class="xref" href="Lucene.Net.Util.Packed.PackedInt32s.html">PackedInt32s</a> array otherwise</li><li>FieldNums --> FieldNumDelta<sup>TotalDistincFields</sup>, a delta-encoded list of the sorted unique field numbers present in the chunk</li><li>FieldNumOffs --> FieldNumOff<sup>TotalFields</sup>, as a <a class="xref" href="Lucene.Net.Util.Packed.PackedInt32s.html">PackedInt32s</a> array</li><li>FieldNumOff is the offset of the field number in FieldNums</li><li>TotalFields is the total number of fields (sum of the values of NumFields)</li><li>Flags --> Bit < FieldFlags ></li><li>Bit is a single bit which when true means that fields have the same options for every document in the chunk</li><li>FieldFlags --> if Bit==1: Flag<sup>TotalDistinctFields</sup> else Flag<sup>TotalFields</sup></li><li>Flag: a 3-bits int where: |
| <ul><li>the first bit means that the field has positions</li><li>the second bit means that the field has offsets</li><li>the third bit means that the field has payloads</li></ul> |
| </li><li>NumTerms --> FieldNumTerms<sup>TotalFields</sup></li><li>FieldNumTerms: the number of terms for each field, using blocks of 64 packed <span class="xref">System.Int32</span>s (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>TermLengths --> PrefixLength<sup>TotalTerms</sup> SuffixLength<sup>TotalTerms</sup></li><li>TotalTerms: total number of terms (sum of NumTerms)</li><li>PrefixLength: 0 for the first term of a field, the common prefix with the previous term otherwise using blocks of 64 packed <span class="xref">System.Int32</span>s (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>SuffixLength: length of the term minus PrefixLength for every term using blocks of 64 packed <span class="xref">System.Int32</span>s (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>TermFreqs --> TermFreqMinus1<sup>TotalTerms</sup></li><li>TermFreqMinus1: (frequency - 1) for each term using blocks of 64 packed <span class="xref">System.Int32</span>s (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>Positions --> PositionDelta<sup>TotalPositions</sup></li><li>TotalPositions is the sum of frequencies of terms of all fields that have positions</li><li>PositionDelta: the absolute position for the first position of a term, and the difference with the previous positions for following positions using blocks of 64 packed <span class="xref">System.Int32</span>s (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>StartOffsets --> (AvgCharsPerTerm<sup>TotalDistinctFields</sup>) StartOffsetDelta<sup>TotalOffsets</sup></li><li>TotalOffsets is the sum of frequencies of terms of all fields that have offsets</li><li>AvgCharsPerTerm: average number of chars per term, encoded as a float on 4 bytes. They are not present if no field has both positions and offsets enabled.</li><li>StartOffsetDelta: (startOffset - previousStartOffset - AvgCharsPerTerm * PositionDelta). previousStartOffset is 0 for the first offset and AvgCharsPerTerm is 0 if the field has no positions using blocks of 64 packed <span class="xref">System.Int32</span>s (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>Lengths --> LengthMinusTermLength<sup>TotalOffsets</sup></li><li>LengthMinusTermLength: (endOffset - startOffset - termLength) using blocks of 64 packed <span class="xref">System.Int32</span>s (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>PayloadLengths --> PayloadLength<sup>TotalPayloads</sup></li><li>TotalPayloads is the sum of frequencies of terms of all fields that have payloads</li><li>PayloadLength is the payload length encoded using blocks of 64 packed <span class="xref">System.Int32</span>s (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>TermAndPayloads --> LZ4-compressed representation of < FieldTermsAndPayLoads ><sup>TotalFields</sup></li><li>FieldTermsAndPayLoads --> Terms (Payloads)</li><li>Terms: term bytes</li><li>Payloads: payload bytes (if the field has payloads)</li><li>Footer --> CodecFooter (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteFooter_Lucene_Net_Store_IndexOutput_">WriteFooter(IndexOutput)</a>) </li></ul> |
| </li><li><a name="vector_index" id="vector_index"></a> |
| <p>An index file (extension <code>.tvx</code>).</p> |
| <ul><li>VectorIndex (.tvx) --> <Header>, <ChunkIndex>, Footer</li><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>ChunkIndex: See <a class="xref" href="Lucene.Net.Codecs.Compressing.CompressingStoredFieldsIndexWriter.html">CompressingStoredFieldsIndexWriter</a></li><li>Footer --> CodecFooter (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteFooter_Lucene_Net_Store_IndexOutput_">WriteFooter(IndexOutput)</a>) </li></ul> |
| </li></ol> |
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| </article> |
| </div> |
| |
| <div class="hidden-sm col-md-2" role="complementary"> |
| <div class="sideaffix"> |
| <div class="contribution"> |
| <ul class="nav"> |
| <li> |
| <a href="https://github.com/apache/lucenenet/blob/docs-4.8.0-beta00007/src/Lucene.Net.TestFramework/Codecs/Lucene42/package.md/#L2" class="contribution-link">Improve this Doc</a> |
| </li> |
| </ul> |
| </div> |
| <nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix"> |
| <!-- <p><a class="back-to-top" href="#top">Back to top</a><p> --> |
| </nav> |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| <footer> |
| <div class="grad-bottom"></div> |
| <div class="footer"> |
| <div class="container"> |
| <span class="pull-right"> |
| <a href="#top">Back to top</a> |
| </span> |
| Copyright © 2020 Licensed to the Apache Software Foundation (ASF) |
| |
| </div> |
| </div> |
| </footer> |
| </div> |
| |
| <script type="text/javascript" src="../../styles/docfx.vendor.js"></script> |
| <script type="text/javascript" src="../../styles/docfx.js"></script> |
| <script type="text/javascript" src="../../styles/main.js"></script> |
| </body> |
| </html> |