| <!DOCTYPE html> |
| <!--[if IE]><![endif]--> |
| <html> |
| |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> |
| <title>Namespace Lucene.Net.Codecs.Lucene40 |
| | Apache Lucene.NET 4.8.0 Documentation </title> |
| <meta name="viewport" content="width=device-width"> |
| <meta name="title" content="Namespace Lucene.Net.Codecs.Lucene40 |
| | Apache Lucene.NET 4.8.0 Documentation "> |
| <meta name="generator" content="docfx 2.47.0.0"> |
| |
| <link rel="shortcut icon" href="../../logo/favicon.ico"> |
| <link rel="stylesheet" href="../../styles/docfx.vendor.css"> |
| <link rel="stylesheet" href="../../styles/docfx.css"> |
| <link rel="stylesheet" href="../../styles/main.css"> |
| <meta property="docfx:navrel" content="../../toc.html"> |
| <meta property="docfx:tocrel" content="../toc.html"> |
| |
| <meta property="docfx:rel" content="../../"> |
| |
| </head> |
| <body data-spy="scroll" data-target="#affix" data-offset="120"> |
| <div id="wrapper"> |
| <header> |
| |
| <nav id="autocollapse" class="navbar ng-scope" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| |
| <a class="navbar-brand" href="../../index.html"> |
| <img id="logo" class="svg" src="../../logo/lucene-net-color.png" alt=""> |
| </a> |
| </div> |
| <div class="collapse navbar-collapse" id="navbar"> |
| <form class="navbar-form navbar-right" role="search" id="search"> |
| <div class="form-group"> |
| <input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off"> |
| </div> |
| </form> |
| </div> |
| </div> |
| </nav> |
| |
| <div class="subnav navbar navbar-default"> |
| <div class="container hide-when-search" id="breadcrumb"> |
| <ul class="breadcrumb"> |
| <li></li> |
| </ul> |
| </div> |
| </div> |
| </header> |
| <div class="container body-content"> |
| |
| <div id="search-results"> |
| <div class="search-list"></div> |
| <div class="sr-items"> |
| <p><i class="glyphicon glyphicon-refresh index-loading"></i></p> |
| </div> |
| <ul id="pagination"></ul> |
| </div> |
| </div> |
| <div role="main" class="container body-content hide-when-search"> |
| |
| <div class="sidenav hide-when-search"> |
| <a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a> |
| <div class="sidetoggle collapse" id="sidetoggle"> |
| <div id="sidetoc"></div> |
| </div> |
| </div> |
| <div class="article row grid-right"> |
| <div class="col-md-10"> |
| <article class="content wrap" id="_content" data-uid="Lucene.Net.Codecs.Lucene40"> |
| |
| <h1 id="Lucene_Net_Codecs_Lucene40" data-uid="Lucene.Net.Codecs.Lucene40" class="text-break">Namespace Lucene.Net.Codecs.Lucene40 |
| </h1> |
| <div class="markdown level0 summary"><!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <p>Support for testing <a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40PostingsFormat.html">Lucene40PostingsFormat</a>.</p> |
| </div> |
| <div class="markdown level0 conceptual"></div> |
| <div class="markdown level0 remarks"></div> |
| <h3 id="classes">Classes |
| </h3> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40Codec.html">Lucene40Codec</a></h4> |
| <section><p>Implements the Lucene 4.0 index format, with configurable per-field postings formats. |
| <p> |
| If you want to reuse functionality of this codec in another codec, extend |
| <a class="xref" href="Lucene.Net.Codecs.FilterCodec.html">FilterCodec</a>. |
| <p> |
| See <a class="xref" href="../Lucene.Net.TestFramework/Lucene.Net.Codecs.Lucene40.html">Lucene.Net.Codecs.Lucene40</a> package documentation for file format details.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40DocValuesFormat.html">Lucene40DocValuesFormat</a></h4> |
| <section><p>Lucene 4.0 DocValues format. |
| <p> |
| Files: |
| <ul><li><code>.dv.cfs</code>: compound container (<a class="xref" href="Lucene.Net.Store.CompoundFileDirectory.html">CompoundFileDirectory</a>)</li><li><code>.dv.cfe</code>: compound entries (<a class="xref" href="Lucene.Net.Store.CompoundFileDirectory.html">CompoundFileDirectory</a>)</li></ul> |
| Entries within the compound file: |
| <ul><li><code><segment>_<fieldNumber>.dat</code>: data values</li><li><code><segment>_<fieldNumber>.idx</code>: index into the .dat for DEREF types</li></ul> |
| <p> |
| There are several types of <a class="xref" href="Lucene.Net.Index.DocValues.html">DocValues</a> with different encodings. |
| From the perspective of filenames, all types store their values in <code>.dat</code> |
| entries within the compound file. In the case of dereferenced/sorted types, the <code>.dat</code> |
| actually contains only the unique values, and an additional <code>.idx</code> file contains |
| pointers to these unique values. |
| </p> |
| Formats: |
| <ul><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.VAR_INTS</span> .dat --> Header, PackedType, MinValue, |
| DefaultValue, PackedStream</li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_8</span> .dat --> Header, ValueSize, |
| Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) <sup>maxdoc</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_16</span> .dat --> Header, ValueSize, |
| Short (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt16_System_Int16_">WriteInt16(Int16)</a>) <sup>maxdoc</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_32</span> .dat --> Header, ValueSize, |
| Int32 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt32_System_Int32_">WriteInt32(Int32)</a>) <sup>maxdoc</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_64</span> .dat --> Header, ValueSize, |
| Int64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) <sup>maxdoc</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FLOAT_32</span> .dat --> Header, ValueSize, Float32<sup>maxdoc</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FLOAT_64</span> .dat --> Header, ValueSize, Float64<sup>maxdoc</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_STRAIGHT</span> .dat --> Header, ValueSize, |
| (Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) * ValueSize)<sup>maxdoc</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_STRAIGHT</span> .idx --> Header, TotalBytes, Addresses</li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_STRAIGHT</span> .dat --> Header, |
| (Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) * <em>variable ValueSize</em>)<sup>maxdoc</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_DEREF</span> .idx --> Header, NumValues, Addresses</li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_DEREF</span> .dat --> Header, ValueSize, |
| (Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) * ValueSize)<sup>NumValues</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_DEREF</span> .idx --> Header, TotalVarBytes, Addresses</li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_DEREF</span> .dat --> Header, |
| (LengthPrefix + Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) * <em>variable ValueSize</em>)<sup>NumValues</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_SORTED</span> .idx --> Header, NumValues, Ordinals</li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_SORTED</span> .dat --> Header, ValueSize, |
| (Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) * ValueSize)<sup>NumValues</sup></li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_SORTED</span> .idx --> Header, TotalVarBytes, Addresses, Ordinals</li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_SORTED</span> .dat --> Header, |
| (Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) * <em>variable ValueSize</em>)<sup>NumValues</sup></li></ul> |
| Data Types: |
| <ul><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>PackedType --> Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>)</li><li>MaxAddress, MinValue, DefaultValue --> Int64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) </li><li>PackedStream, Addresses, Ordinals --> <a class="xref" href="Lucene.Net.Util.Packed.PackedInt32s.html">PackedInt32s</a></li><li>ValueSize, NumValues --> Int32 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt32_System_Int32_">WriteInt32(Int32)</a>) </li><li>Float32 --> 32-bit float encoded with <span class="xref">J2N.BitConversion.SingleToRawInt32Bits(System.Single)</span> |
| then written as Int32 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt32_System_Int32_">WriteInt32(Int32)</a>) </li><li>Float64 --> 64-bit float encoded with <span class="xref">J2N.BitConversion.DoubleToRawInt64Bits(System.Double)</span> |
| then written as Int64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) </li><li>TotalBytes --> VLong (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt64_System_Int64_">WriteVInt64(Int64)</a>) </li><li>TotalVarBytes --> Int64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) </li><li>LengthPrefix --> Length of the data value as VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) (maximum |
| of 2 bytes)</li></ul> |
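<p>As a rough illustration of the Float32 and Float64 entries above, the following Python sketch reinterprets a float's IEEE 754 bit pattern as a signed integer, which is what a raw-bits conversion amounts to. The function names are hypothetical, and the big-endian format strings are an assumption made only so that packing and unpacking agree; this is not the Lucene.NET or J2N API.</p>

```python
import struct


def float32_to_raw_int32_bits(value):
    """Reinterpret the single-precision bits of value as a signed 32-bit int."""
    return struct.unpack(">i", struct.pack(">f", value))[0]


def float64_to_raw_int64_bits(value):
    """Reinterpret the double-precision bits of value as a signed 64-bit int."""
    return struct.unpack(">q", struct.pack(">d", value))[0]


print(hex(float32_to_raw_int32_bits(1.0)))  # 0x3f800000
print(hex(float64_to_raw_int64_bits(1.0)))  # 0x3ff0000000000000
```

<p>The resulting integers are what would then be handed to an Int32 or Int64 writer.</p>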
| Notes: |
| <ul><li>PackedType is 0 when compressed, 1 when the stream is written as 64-bit integers.</li><li>Addresses stores pointers to the actual byte location (indexed by docid). In the VAR_STRAIGHT |
| case, each entry can have a different length, so to determine the length, the address for docid+1 is |
| also retrieved. A sentinel address is written at the end for the VAR_STRAIGHT case, so the Addresses |
| stream contains maxdoc+1 indices. For the deduplicated VAR_DEREF case, each length |
| is encoded as a prefix to the data itself as a VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) |
| (maximum of 2 bytes).</li><li>Ordinals stores the term ID in sorted order (indexed by docid). In the FIXED_SORTED case, |
| the address into the .dat can be computed from the ordinal as |
| <code>Header+ValueSize+(ordinal*ValueSize)</code> because the byte length is fixed. |
| In the VAR_SORTED case, there is double indirection (docid -> ordinal -> address), but |
| an additional sentinel ordinal+address is always written (so there are NumValues+1 ordinals). To |
| determine the length, ord+1's address is looked up as well.</li><li><span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_STRAIGHT</span> in contrast to other straight |
| variants uses a <code>.idx</code> file to improve lookup performance. In contrast to |
| <span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_DEREF</span> it doesn't apply deduplication of the document values. |
| </li></ul> |
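<p>The sentinel address described in the notes for the VAR_STRAIGHT case can be sketched as follows. This is a simplified in-memory model, not the on-disk reader: <code>build_addresses</code> and <code>read_value</code> are hypothetical helpers, and offsets are assumed to be relative to the first data byte after the header.</p>

```python
from itertools import accumulate


def build_addresses(values):
    """Start offset of every value plus one sentinel: maxdoc + 1 entries."""
    return [0] + list(accumulate(len(v) for v in values))


def read_value(data, addresses, docid):
    """Slice one value out of the data; its length comes from the next address."""
    start = addresses[docid]
    end = addresses[docid + 1]  # the sentinel keeps this valid for the last docid
    return data[start:end]


values = [b"ab", b"", b"xyz"]
data = b"".join(values)
addresses = build_addresses(values)
print(addresses)                       # [0, 2, 2, 5]
print(read_value(data, addresses, 2))  # b'xyz'
```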
| <p> |
| Limitations: |
| <ul><li> Binary doc values can be at most <a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40DocValuesFormat.html#Lucene_Net_Codecs_Lucene40_Lucene40DocValuesFormat_MAX_BINARY_FIELD_LENGTH">MAX_BINARY_FIELD_LENGTH</a> in length.</li></ul> </p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40FieldInfosFormat.html">Lucene40FieldInfosFormat</a></h4> |
| <section><p>Lucene 4.0 Field Infos format. |
| <p> |
| <p>Field names are stored in the field info file, with suffix <tt>.fnm</tt>.</p> |
| <p>FieldInfos (.fnm) --> Header,FieldsCount, <FieldName,FieldNumber, |
| FieldBits,DocValuesBits,Attributes> <sup>FieldsCount</sup></p> |
| <p>Data types: |
| <ul><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>FieldsCount --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>FieldName --> String (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteString_System_String_">WriteString(String)</a>) </li><li>FieldBits, DocValuesBits --> Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) </li><li>FieldNumber --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt32_System_Int32_">WriteInt32(Int32)</a>) </li><li>Attributes --> IDictionary<String,String> (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteStringStringMap_System_Collections_Generic_IDictionary_System_String_System_String__">WriteStringStringMap(IDictionary<String, String>)</a>) </li></ul> |
| </p> |
| Field Descriptions: |
| <ul><li>FieldsCount: the number of fields in this file.</li><li>FieldName: name of the field as a UTF-8 String.</li><li>FieldNumber: the field's number. Note that unlike previous versions of |
| Lucene, the fields are not numbered implicitly by their order in the |
| file; each number is stored explicitly.</li><li>FieldBits: a byte containing field options. |
| <ul><li>The low-order bit is one for indexed fields, and zero for non-indexed |
| fields.</li><li>The second lowest-order bit is one for fields that have term vectors |
| stored, and zero for fields without term vectors.</li><li>If the third lowest-order bit is set (0x4), offsets are stored into |
| the postings list in addition to positions.</li><li>Fourth bit is unused.</li><li>If the fifth lowest-order bit is set (0x10), norms are omitted for the |
| indexed field.</li><li>If the sixth lowest-order bit is set (0x20), payloads are stored for the |
| indexed field.</li><li>If the seventh lowest-order bit is set (0x40), term frequencies and |
| positions are omitted for the indexed field.</li><li>If the eighth lowest-order bit is set (0x80), positions are omitted for the |
| indexed field.</li></ul> |
| </li><li>DocValuesBits: a byte containing per-document value types. The type |
| is recorded as two four-bit integers, with the high-order bits representing |
| <code>norms</code> options, and the low-order bits representing |
| <a class="xref" href="Lucene.Net.Index.DocValues.html">DocValues</a> options. Each four-bit integer can be decoded as such: |
| <ul><li>0: no DocValues for this field.</li><li>1: variable-width signed integers. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.VAR_INTS</span>)</li><li>2: 32-bit floating point values. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FLOAT_32</span>)</li><li>3: 64-bit floating point values. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FLOAT_64</span>)</li><li>4: fixed-length byte array values. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_STRAIGHT</span>)</li><li>5: fixed-length dereferenced byte array values. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_DEREF</span>)</li><li>6: variable-length byte array values. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_STRAIGHT</span>)</li><li>7: variable-length dereferenced byte array values. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_DEREF</span>)</li><li>8: 16-bit signed integers. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_16</span>)</li><li>9: 32-bit signed integers. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_32</span>)</li><li>10: 64-bit signed integers. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_64</span>)</li><li>11: 8-bit signed integers. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.FIXED_INTS_8</span>)</li><li>12: fixed-length sorted byte array values. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_FIXED_SORTED</span>)</li><li>13: variable-length sorted byte array values. (<span class="xref">Lucene.Net.Codecs.Lucene40.LegacyDocValuesType.BYTES_VAR_SORTED</span>)</li></ul> |
| </li><li>Attributes: a key-value map of codec-private attributes.</li></ul></p> |
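<p>The FieldBits and DocValuesBits layouts described above can be decoded with a few bit masks. The sketch below is illustrative only; the flag names and both function names are made up for this example and do not correspond to Lucene.NET identifiers.</p>

```python
# Masks taken from the FieldBits description; 0x08 is unused.
FIELD_BIT_FLAGS = {
    0x01: "indexed",
    0x02: "term_vectors",
    0x04: "offsets_in_postings",
    0x10: "omit_norms",
    0x20: "payloads",
    0x40: "omit_term_freq_and_positions",
    0x80: "omit_positions",
}


def decode_field_bits(bits):
    """Return the set of option names whose bit is set in the FieldBits byte."""
    return {name for mask, name in FIELD_BIT_FLAGS.items() if bits & mask}


def split_doc_values_bits(bits):
    """Return (norms_type, doc_values_type): the high and low four-bit integers."""
    return (bits >> 4) & 0x0F, bits & 0x0F


print(sorted(decode_field_bits(0x13)))  # ['indexed', 'omit_norms', 'term_vectors']
print(split_doc_values_bits(0xA1))      # (10, 1)
```

<p>In the second call, the high nibble 10 would select 64-bit signed integers for norms and the low nibble 1 would select variable-width signed integers for DocValues, following the table above.</p>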
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40LiveDocsFormat.html">Lucene40LiveDocsFormat</a></h4> |
| <section><p>Lucene 4.0 Live Documents Format. |
| <p> |
| <p>The .del file is optional, and only exists when a segment contains |
| deletions.</p> |
| <p>Although per-segment, this file is maintained exterior to compound segment |
| files.</p> |
| <p>Deletions (.del) --> Format,Header,ByteCount,BitCount, Bits | DGaps (depending |
| on Format)</p> |
| <ul><li>Format,ByteCount,BitCount --> Uint32 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt32_System_Int32_">WriteInt32(Int32)</a>) </li><li>Bits --> < Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) > <sup>ByteCount</sup></li><li>DGaps --> <DGap,NonOnesByte> <sup>NonOnesBytesCount</sup></li><li>DGap --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>NonOnesByte --> Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) </li><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li></ul> |
| <p>Format is 1: indicates cleared DGaps.</p> |
| <p>ByteCount indicates the number of bytes in Bits. It is typically |
| (SegSize/8)+1.</p> |
| <p>BitCount indicates the number of bits that are currently set in Bits.</p> |
| <p>Bits contains one bit for each document indexed. When the bit corresponding |
| to a document number is cleared, that document is marked as deleted. Bit ordering |
| is from least to most significant. Thus, if Bits contains two bytes, 0x00 and |
| 0x02, then document 9 is marked as alive (not deleted).</p> |
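<p>The bit ordering described above can be checked with a small Python sketch. <code>is_live</code> is a hypothetical helper operating on an in-memory copy of Bits; it is not part of the Lucene.NET API.</p>

```python
def is_live(bits, doc):
    """True if the bit for doc is set, i.e. the document is not deleted."""
    byte_index = doc >> 3   # eight documents per byte
    bit_index = doc & 7     # least significant bit comes first
    return bool((bits[byte_index] >> bit_index) & 1)


bits = bytes([0x00, 0x02])
print(is_live(bits, 9))  # True
print(is_live(bits, 8))  # False
```

<p>Document 9 falls in the second byte at bit position 1, which is the only bit set in 0x02.</p>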
| <p>DGaps represents sparse bit-vectors more efficiently than Bits. It is made |
| of DGaps on indexes of nonOnes bytes in Bits, and the nonOnes bytes themselves. |
| The number of nonOnes bytes in Bits (NonOnesBytesCount) is not stored.</p> |
| <p>For example, if there are 8000 bits and only bits 10,12,32 are cleared, DGaps |
| would be used:</p> |
| <p>(VInt) 1 , (byte) 20 , (VInt) 3 , (Byte) 1</p></p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40NormsFormat.html">Lucene40NormsFormat</a></h4> |
| <section><p>Lucene 4.0 Norms Format. |
| <p> |
| Files: |
| <ul><li><code>.nrm.cfs</code>: compound container (<a class="xref" href="Lucene.Net.Store.CompoundFileDirectory.html">CompoundFileDirectory</a>) </li><li><code>.nrm.cfe</code>: compound entries (<a class="xref" href="Lucene.Net.Store.CompoundFileDirectory.html">CompoundFileDirectory</a>) </li></ul> |
| Norms are implemented as DocValues, so other than file extension, norms are |
| written exactly the same way as <a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40DocValuesFormat.html">Lucene40DocValuesFormat</a>. |
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40PostingsBaseFormat.html">Lucene40PostingsBaseFormat</a></h4> |
| <section><p>Provides a <a class="xref" href="Lucene.Net.Codecs.PostingsReaderBase.html">PostingsReaderBase</a> and |
| <a class="xref" href="Lucene.Net.Codecs.PostingsWriterBase.html">PostingsWriterBase</a>.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40PostingsFormat.html">Lucene40PostingsFormat</a></h4> |
| <section><p>Lucene 4.0 Postings format. |
| <p> |
| Files: |
| <ul><li><tt>.tim</tt>: <a href="#Termdictionary">Term Dictionary</a></li><li><tt>.tip</tt>: <a href="#Termindex">Term Index</a></li><li><tt>.frq</tt>: <a href="#Frequencies">Frequencies</a></li><li><tt>.prx</tt>: <a href="#Positions">Positions</a></li></ul> |
| </p> |
| <p> |
| <a name="Termdictionary" id="Termdictionary"></a> |
| <h3>Term Dictionary</h3></p> |
| <p>The .tim file contains the list of terms in each |
| field along with per-term statistics (such as docfreq) |
| and pointers to the frequencies, positions and |
| skip data in the .frq and .prx files. |
| See <a class="xref" href="Lucene.Net.Codecs.BlockTreeTermsWriter.html">BlockTreeTermsWriter</a> for more details on the format. |
| </p> |
| |
| <p>NOTE: The term dictionary can plug into different postings implementations: |
| the postings writer/reader are actually responsible for encoding |
| and decoding the Postings Metadata and Term Metadata sections described here:</p> |
| <ul><li>Postings Metadata --> Header, SkipInterval, MaxSkipLevels, SkipMinimum</li><li>Term Metadata --> FreqDelta, SkipDelta?, ProxDelta?</li><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>SkipInterval,MaxSkipLevels,SkipMinimum --> Uint32 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt32_System_Int32_">WriteInt32(Int32)</a>) </li><li>SkipDelta,FreqDelta,ProxDelta --> VLong (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt64_System_Int64_">WriteVInt64(Int64)</a>) </li></ul> |
| <p>Notes:</p> |
| <ul><li>Header is a CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) storing the version information |
| for the postings.</li><li>SkipInterval is the fraction of TermDocs stored in skip tables. It is used to accelerate |
| <a class="xref" href="Lucene.Net.Search.DocIdSetIterator.html#Lucene_Net_Search_DocIdSetIterator_Advance_System_Int32_">Advance(Int32)</a>. Larger values result in smaller indexes, greater |
| acceleration, but fewer accelerable cases, while smaller values result in bigger indexes, |
| less acceleration (in case of a small value for MaxSkipLevels) and more accelerable cases. |
| </li><li>MaxSkipLevels is the maximum number of skip levels stored for each term in the .frq file. A |
| low value results in smaller indexes but less acceleration; a larger value results in |
| slightly larger indexes but greater acceleration. See format of .frq file for more |
| information about skip levels.</li><li>SkipMinimum is the minimum document frequency a term must have in order to write any |
| skip data at all.</li><li>FreqDelta determines the position of this term's TermFreqs within the .frq |
| file. In particular, it is the difference between the position of this term's |
| data in that file and the position of the previous term's data (or zero, for |
| the first term in the block).</li><li>ProxDelta determines the position of this term's TermPositions within the |
| .prx file. In particular, it is the difference between the position of this |
| term's data in that file and the position of the previous term's data (or zero, |
| for the first term in the block). For fields that omit position data, this will |
| be 0 since prox information is not stored.</li><li>SkipDelta determines the position of this term's SkipData within the .frq |
| file. In particular, it is the number of bytes after TermFreqs that the |
| SkipData starts. In other words, it is the length of the TermFreq data. |
| SkipDelta is only stored if DocFreq is not smaller than SkipMinimum.</li></ul> |
| <a name="Termindex" id="Termindex"></a> |
| <h3>Term Index</h3> |
| <p>The .tip file contains an index into the term dictionary, so that it can be |
| accessed randomly. See <a class="xref" href="Lucene.Net.Codecs.BlockTreeTermsWriter.html">BlockTreeTermsWriter</a> for more details on the format.</p> |
| <a name="Frequencies" id="Frequencies"></a> |
| <h3>Frequencies</h3> |
| <p>The .frq file contains the lists of documents which contain each term, along |
| with the frequency of the term in that document (except when frequencies are |
| omitted: <a class="xref" href="Lucene.Net.Index.IndexOptions.html#Lucene_Net_Index_IndexOptions_DOCS_ONLY">DOCS_ONLY</a>).</p> |
| <ul><li>FreqFile (.frq) --> Header, <TermFreqs, SkipData?> <sup>TermCount</sup></li><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>TermFreqs --> <TermFreq> <sup>DocFreq</sup></li><li>TermFreq --> DocDelta[, Freq?]</li><li>SkipData --> <<SkipLevelLength, SkipLevel> |
| <sup>NumSkipLevels-1</sup>, SkipLevel> <SkipDatum></li><li>SkipLevel --> <SkipDatum> <sup>DocFreq/(SkipInterval^(Level + |
| 1))</sup></li><li>SkipDatum --> |
| DocSkip,PayloadLength?,OffsetLength?,FreqSkip,ProxSkip,SkipChildLevelPointer?</li><li>DocDelta,Freq,DocSkip,PayloadLength,OffsetLength,FreqSkip,ProxSkip --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>SkipChildLevelPointer --> VLong (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt64_System_Int64_">WriteVInt64(Int64)</a>) </li></ul> |
| <p>TermFreqs are ordered by term (the term is implicit, from the term dictionary).</p> |
| <p>TermFreq entries are ordered by increasing document number.</p> |
| <p>DocDelta: if frequencies are indexed, this determines both the document |
| number and the frequency. In particular, DocDelta/2 is the difference between |
| this document number and the previous document number (or zero when this is the |
| first document in a TermFreqs). When DocDelta is odd, the frequency is one. |
| When DocDelta is even, the frequency is read as another VInt. If frequencies |
| are omitted, DocDelta contains the gap (not multiplied by 2) between document |
| numbers and no frequency information is stored.</p> |
| <p>For example, the TermFreqs for a term which occurs once in document seven |
| and three times in document eleven, with frequencies indexed, would be the |
| following sequence of VInts:</p> |
| <p>15, 8, 3</p> |
| <p>If frequencies were omitted (<a class="xref" href="Lucene.Net.Index.IndexOptions.html#Lucene_Net_Index_IndexOptions_DOCS_ONLY">DOCS_ONLY</a>) it would be this |
| sequence of VInts instead:</p> |
| <p>7,4</p> |
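<p>The DocDelta rule can be sketched in Python as below. The function works on the integer values that would subsequently be written as VInts, not on the encoded bytes, and <code>encode_term_freqs</code> is a hypothetical name used only for this illustration.</p>

```python
def encode_term_freqs(postings, with_freqs=True):
    """Turn (doc, freq) pairs, sorted by doc, into the sequence of VInt values."""
    out = []
    prev_doc = 0
    for doc, freq in postings:
        delta = doc - prev_doc
        prev_doc = doc
        if not with_freqs:
            out.append(delta)          # plain gap, no frequency stored
        elif freq == 1:
            out.append(delta * 2 + 1)  # odd value: frequency is one
        else:
            out.append(delta * 2)      # even value: frequency follows
            out.append(freq)
    return out


postings = [(7, 1), (11, 3)]
print(encode_term_freqs(postings))                    # [15, 8, 3]
print(encode_term_freqs(postings, with_freqs=False))  # [7, 4]
```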
| <p>DocSkip records the document number before every SkipInterval <sup>th</sup> |
| document in TermFreqs. If payloads and offsets are disabled for the term's field, then |
| DocSkip represents the difference from the previous value in the sequence. If |
| payloads and/or offsets are enabled for the term's field, then DocSkip/2 represents the |
| difference from the previous value in the sequence. In this case when |
| DocSkip is odd, then PayloadLength and/or OffsetLength are stored indicating the length of |
| the last payload/offset before the SkipInterval<sup>th</sup> document in TermPositions.</p> |
| <p>PayloadLength indicates the length of the last payload.</p> |
| <p>OffsetLength indicates the length of the last offset (endOffset-startOffset).</p> |
| <p> |
| FreqSkip and ProxSkip record the position of every SkipInterval <sup>th</sup> |
| entry in FreqFile and ProxFile, respectively. File positions are relative to |
| the start of TermFreqs and Positions for the first SkipDatum, and to the previous |
| SkipDatum in the sequence thereafter.</p> |
| <p>For example, if DocFreq=35 and SkipInterval=16, then there are two SkipData |
| entries, containing the 15 <sup>th</sup> and 31 <sup>st</sup> document numbers |
| in TermFreqs. The first FreqSkip gives the number of bytes after the beginning |
| of TermFreqs at which the 16 <sup>th</sup> SkipDatum starts, and the second the |
| number of bytes after that at which the 32 <sup>nd</sup> starts. The first ProxSkip |
| gives the number of bytes after the beginning of Positions at which the 16 |
| <sup>th</sup> SkipDatum starts, and the second the number of bytes after that |
| at which the 32 <sup>nd</sup> starts.</p> |
| <p>Each term can have multiple skip levels. The number of skip levels for a |
| term is NumSkipLevels = Min(MaxSkipLevels, |
| floor(log(DocFreq)/log(SkipInterval))). The number of SkipData entries for a |
| skip level is DocFreq/(SkipInterval^(Level + 1)), where the lowest skip level |
| is Level=0. |
| <p> |
| Example: SkipInterval = 4, MaxSkipLevels = 2, DocFreq = 35. Then skip level 0 |
| has 8 SkipData entries, containing the 3<sup>rd</sup>, 7<sup>th</sup>, |
| 11<sup>th</sup>, 15<sup>th</sup>, 19<sup>th</sup>, 23<sup>rd</sup>, |
| 27<sup>th</sup>, and 31<sup>st</sup> document numbers in TermFreqs. Skip level |
| 1 has 2 SkipData entries, containing the 15<sup>th</sup> and 31<sup>st</sup> |
| document numbers in TermFreqs. |
| <p> |
| The SkipData entries on all upper levels > 0 contain a SkipChildLevelPointer |
| referencing the corresponding SkipData entry on the level below (Level-1). In the |
| example, entry 15 on level 1 has a pointer to entry 15 on level 0, and entry 31 on |
| level 1 has a pointer to entry 31 on level 0. |
| </p> |
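<p>The skip-level arithmetic above can be checked with a small sketch. It reads the formula as floor(log(DocFreq)/log(SkipInterval)), which is an assumption chosen because it reproduces the worked example, and it uses repeated integer division instead of floating-point logarithms. The function names are hypothetical and are not part of the Lucene.NET API.</p>

```python
def num_skip_levels(doc_freq, skip_interval, max_skip_levels):
    """Integer form of Min(MaxSkipLevels, floor(log(DocFreq) / log(SkipInterval)))."""
    if skip_interval < 2:
        raise ValueError("skip_interval must be at least 2")
    levels = 0
    remaining = doc_freq
    # Each whole division by skip_interval corresponds to one more skip level.
    while remaining >= skip_interval:
        remaining //= skip_interval
        levels += 1
    return min(max_skip_levels, levels)


def skip_entries(doc_freq, skip_interval, level):
    """Zero-based posting indexes recorded by the SkipData entries of one level."""
    step = skip_interval ** (level + 1)
    count = doc_freq // step  # DocFreq / SkipInterval^(Level + 1), rounded down
    return [(k + 1) * step - 1 for k in range(count)]


# Worked example from the text: SkipInterval = 4, MaxSkipLevels = 2, DocFreq = 35.
levels = num_skip_levels(35, 4, 2)
for level in range(levels):
    print(level, skip_entries(35, 4, level))
```

<p>Every index listed for level 1 also appears in level 0, which is what lets a SkipChildLevelPointer on the upper level refer to the matching entry one level down.</p>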
| <a name="Positions" id="Positions"></a> |
| <h3>Positions</h3> |
| <p>The .prx file contains the lists of positions that each term occurs at |
| within documents. Note that fields omitting positional data do not store |
| anything into this file, and if all fields in the index omit positional data |
| then the .prx file will not exist.</p> |
| <ul><li>ProxFile (.prx) --> Header, <TermPositions> <sup>TermCount</sup></li><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>TermPositions --> <Positions> <sup>DocFreq</sup></li><li>Positions --> <PositionDelta,PayloadLength?,OffsetDelta?,OffsetLength?,PayloadData?> <sup>Freq</sup></li><li>PositionDelta,OffsetDelta,OffsetLength,PayloadLength --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>PayloadData --> byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) <sup>PayloadLength</sup></li></ul> |
| <p>TermPositions are ordered by term (the term is implicit, from the term dictionary).</p> |
| <p>Positions entries are ordered by increasing document number (the document |
| number is implicit from the .frq file).</p> |
| <p>PositionDelta is, if payloads are disabled for the term's field, the |
| difference between the position of the current occurrence in the document and |
| the previous occurrence (or zero, if this is the first occurrence in this |
| document). If payloads are enabled for the term's field, then PositionDelta/2 |
| is the difference between the current and the previous position. If payloads |
| are enabled and PositionDelta is odd, then PayloadLength is stored, indicating |
| the length of the payload at the current term position.</p> |
| <p>For example, the TermPositions for a term which occurs as the fourth term in |
| one document, and as the fifth and ninth term in a subsequent document, would |
| be the following sequence of VInts (payloads disabled):</p> |
| <p>4, 5, 4</p> |
| <p>PayloadData is metadata associated with the current term position. If |
| PayloadLength is stored at the current position, then it indicates the length |
| of this payload. If PayloadLength is not stored, then this payload has the same |
| length as the payload at the previous position.</p> |
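<p>As an illustration of the PositionDelta rules above, the following sketch encodes positions for both cases. It is a simplified model, not the Lucene.NET writer: the function names are hypothetical, PayloadData bytes are left out, and starting the remembered payload length at -1 (so that the first length is always written) is an assumption.</p>

```python
def encode_positions(positions):
    """PositionDelta values for one document when payloads are disabled."""
    deltas = []
    previous = 0  # the previous position counts as zero at the start of a document
    for position in positions:
        deltas.append(position - previous)
        previous = position
    return deltas


def encode_positions_with_payloads(occurrences, last_length=-1):
    """VInt values for (position, payload_length) pairs when payloads are enabled.

    PositionDelta carries the position difference shifted left by one bit;
    the low bit is set when a PayloadLength follows.
    """
    vints = []
    previous = 0
    for position, length in occurrences:
        delta = position - previous
        previous = position
        if length != last_length:
            # Odd PositionDelta: the payload length changed, so it is written out.
            vints.extend([(delta << 1) | 1, length])
            last_length = length
        else:
            # Even PositionDelta: same payload length as the previous position.
            vints.append(delta << 1)
    return vints


# The example from the text: position 4 in one document, 5 and 9 in the next.
term_positions = encode_positions([4]) + encode_positions([5, 9])
print(term_positions)
```

<p>With payloads disabled, the two documents together produce the sequence 4, 5, 4 given in the example.</p>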
| <p>OffsetDelta/2 is the difference between this position's startOffset and that of the |
| previous occurrence (or zero, if this is the first occurrence in this document). |
| If OffsetDelta is odd, then the length (endOffset-startOffset) differs from that of the |
| previous occurrence and an OffsetLength follows. Offset data is only written for |
| <a class="xref" href="Lucene.Net.Index.IndexOptions.html#Lucene_Net_Index_IndexOptions_DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS">DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS</a>.</p> |
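<p>Reading the offset values back can be sketched as follows. This is a simplified decoder under stated assumptions: the input holds only the OffsetDelta and OffsetLength values of one document (in the real file they are interleaved with the position and payload values), the function name is hypothetical, and the caller passes in any OffsetLength remembered from earlier occurrences.</p>

```python
def decode_offsets(vints, last_length=None):
    """Rebuild (startOffset, endOffset) pairs for one document from offset VInts."""
    offsets = []
    values = iter(vints)
    start = 0  # the previous startOffset counts as zero in a new document
    length = last_length
    for offset_delta in values:
        start += offset_delta >> 1  # OffsetDelta/2 is the startOffset difference
        if offset_delta & 1:
            length = next(values)  # odd OffsetDelta: a new OffsetLength follows
        if length is None:
            raise ValueError("no OffsetLength known for the first occurrence")
        offsets.append((start, start + length))
    return offsets


# Four occurrences: three of length 3, then one of length 7.
print(decode_offsets([1, 3, 8, 8, 9, 7]))
```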
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40PostingsReader.html">Lucene40PostingsReader</a></h4> |
| <section><p>Concrete class that reads the 4.0 frq/prox |
| postings format.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40SegmentInfoFormat.html">Lucene40SegmentInfoFormat</a></h4> |
| <section><p>Lucene 4.0 Segment info format. |
| <p> |
| Files: |
| <ul><li><tt>.si</tt>: Header, SegVersion, SegSize, IsCompoundFile, Diagnostics, Attributes, Files</li></ul> |
| </p> |
| Data types: |
| <p> |
| <ul><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>SegSize --> Int32 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt32_System_Int32_">WriteInt32(Int32)</a>) </li><li>SegVersion --> String (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteString_System_String_">WriteString(String)</a>) </li><li>Files --> ISet<String> (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteStringSet_System_Collections_Generic_ISet_System_String__">WriteStringSet(ISet<String>)</a>) </li><li>Diagnostics, Attributes --> IDictionary<String,String> (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteStringStringMap_System_Collections_Generic_IDictionary_System_String_System_String__">WriteStringStringMap(IDictionary<String, String>)</a>) </li><li>IsCompoundFile --> Int8 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) </li></ul> |
| </p> |
| Field Descriptions: |
| <p> |
| <ul><li>SegVersion is the code version that created the segment.</li><li>SegSize is the number of documents contained in the segment index.</li><li>IsCompoundFile records whether the segment is written as a compound file or |
| not. If this is -1, the segment is not a compound file. If it is 1, the segment |
| is a compound file.</li><li>Checksum contains the CRC32 checksum of all bytes in the segments_N file up |
| until the checksum. This is used to verify integrity of the file on opening the |
| index.</li><li>The Diagnostics Map is privately written by <a class="xref" href="Lucene.Net.Index.IndexWriter.html">IndexWriter</a>, as a debugging aid, |
| for each segment it creates. It includes metadata like the current Lucene |
| version, OS, .NET/Java version, why the segment was created (merge, flush, |
| addIndexes), etc.</li><li>Attributes: a key-value map of codec-private attributes.</li><li>Files is a list of files referred to by this segment.</li></ul> |
| </p></p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40SegmentInfoReader.html">Lucene40SegmentInfoReader</a></h4> |
| <section><p>Lucene 4.0 implementation of <a class="xref" href="Lucene.Net.Codecs.SegmentInfoReader.html">SegmentInfoReader</a>. |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40SegmentInfoWriter.html">Lucene40SegmentInfoWriter</a></h4> |
| <section><p>Lucene 4.0 implementation of <a class="xref" href="Lucene.Net.Codecs.SegmentInfoWriter.html">SegmentInfoWriter</a>. |
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40SkipListReader.html">Lucene40SkipListReader</a></h4> |
| <section><p>Implements the skip list reader for the 4.0 posting list format |
| that stores positions and payloads.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40StoredFieldsFormat.html">Lucene40StoredFieldsFormat</a></h4> |
| <section><p>Lucene 4.0 Stored Fields Format. |
| <p>Stored fields are represented by two files:</p> |
| <ol><li><a name="field_index" id="field_index"></a> |
| <p>The field index, or <code>.fdx</code> file.</p> |
| <p>This is used to find the location within the field data file of the fields |
| of a particular document. Because it contains fixed-length data, this file may |
| be easily accessed at random. The position of document <em>n</em>'s field data is |
| the Uint64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) at <em>n*8</em> in this file.</p> |
| <p>This contains, for each document, a pointer to its field data, as |
| follows:</p> |
| <ul><li>FieldIndex (.fdx) --> <Header>, <FieldValuesPosition> <sup>SegSize</sup></li><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>FieldValuesPosition --> Uint64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) </li></ul> |
| </li><li> |
| <p><a name="field_data" id="field_data"></a>The field data, or <code>.fdt</code> file.</p> |
| <p>This contains the stored fields of each document, as follows:</p> |
| <ul><li>FieldData (.fdt) --> <Header>, <DocFieldData> <sup>SegSize</sup></li><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>DocFieldData --> FieldCount, <FieldNum, Bits, Value> |
| <sup>FieldCount</sup></li><li>FieldCount --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>FieldNum --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>Bits --> Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) |
| <ul><li>low order bit reserved.</li><li>second bit is one for fields containing binary data</li><li>third bit reserved.</li><li>4th to 6th bit (mask: 0x7<<3) define the type of a numeric field: |
| <ul><li>all bits in the mask are cleared if the field is not numeric</li><li>1<<3: Value is Int</li><li>2<<3: Value is Long</li><li>3<<3: Value is Int as Float (as of <span class="xref">J2N.BitConversion.Int32BitsToSingle(System.Int32)</span>)</li><li>4<<3: Value is Long as Double (as of <span class="xref">J2N.BitConversion.Int64BitsToDouble(System.Int64)</span>)</li></ul> |
| </li></ul> |
| </li><li>Value --> String | BinaryValue | Int | Long (depending on Bits)</li><li>BinaryValue --> ValueSize, < Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) >^ValueSize</li><li>ValueSize --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li></ul> |
| </li></ol></p> |
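<p>The layout of the Bits byte described above can be illustrated with a short decoder. The constant and function names are hypothetical and chosen for this sketch; only the bit positions and type codes come from the list above.</p>

```python
FIELD_IS_BINARY = 1 << 1  # second bit: the field contains binary data
NUMERIC_MASK = 0x7 << 3   # 4th to 6th bit: type of a numeric field

NUMERIC_TYPES = {
    1 << 3: "Int",
    2 << 3: "Long",
    3 << 3: "Float",   # stored as the Int bit pattern of the float
    4 << 3: "Double",  # stored as the Long bit pattern of the double
}


def describe_bits(bits):
    """Return the kind of Value that follows a given Bits byte."""
    numeric = bits & NUMERIC_MASK
    if numeric:
        if numeric not in NUMERIC_TYPES:
            raise ValueError("unknown numeric type code: %#x" % numeric)
        return NUMERIC_TYPES[numeric]
    if bits & FIELD_IS_BINARY:
        return "BinaryValue"
    return "String"


for bits in (0x00, 0x02, 1 << 3, 4 << 3):
    print(hex(bits), describe_bits(bits))
```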
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40StoredFieldsReader.html">Lucene40StoredFieldsReader</a></h4> |
| <section><p>Class responsible for access to stored document fields. |
| <p> |
| It uses the <segment>.fdt and <segment>.fdx files. |
| <p> |
| <div class="lucene-block lucene-internal">This is a Lucene.NET INTERNAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40StoredFieldsWriter.html">Lucene40StoredFieldsWriter</a></h4> |
| <section><p>Class responsible for writing stored document fields. |
| <p> |
| It uses the <segment>.fdt and <segment>.fdx files. |
| <p> |
| <div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40TermVectorsFormat.html">Lucene40TermVectorsFormat</a></h4> |
| <section><p>Lucene 4.0 Term Vectors format. |
| <p>Term vector support is optional on a field-by-field basis. It consists of |
| 3 files.</p> |
| <ol><li><a name="tvx" id="tvx"></a> |
| <p>The Document Index or .tvx file.</p> |
| <p>For each document, this stores the offset into the document data (.tvd) and |
| field data (.tvf) files.</p> |
| <p>DocumentIndex (.tvx) --> Header,<DocumentPosition,FieldPosition> |
| <sup>NumDocs</sup></p> |
| <ul><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>DocumentPosition --> UInt64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) (offset in the .tvd file)</li><li>FieldPosition --> UInt64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) (offset in the .tvf file)</li></ul> |
| </li><li><a name="tvd" id="tvd"></a> |
| <p>The Document or .tvd file.</p> |
| <p>This contains, for each document, the number of fields, a list of the fields |
| with term vector info and finally a list of pointers to the field information |
| in the .tvf (Term Vector Fields) file.</p> |
| <p>The .tvd file is used to map out the fields that have term vectors stored |
| and where the field information is in the .tvf file.</p> |
| <p>Document (.tvd) --> Header,<NumFields, FieldNums, |
| FieldPositions> <sup>NumDocs</sup></p> |
| <ul><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>NumFields --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>FieldNums --> <FieldNumDelta> <sup>NumFields</sup></li><li>FieldNumDelta --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>FieldPositions --> <FieldPositionDelta> <sup>NumFields-1</sup></li><li>FieldPositionDelta --> VLong (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt64_System_Int64_">WriteVInt64(Int64)</a>) </li></ul> |
| </li><li><a name="tvf" id="tvf"></a> |
| <p>The Field or .tvf file.</p> |
| <p>This file contains, for each field that has a term vector stored, a list of |
| the terms, their frequencies and, optionally, position, offset, and payload |
| information.</p> |
| <p>Field (.tvf) --> Header,<NumTerms, Flags, TermFreqs> |
| <sup>NumFields</sup></p> |
| <ul><li>Header --> CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>NumTerms --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>Flags --> Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) </li><li>TermFreqs --> <TermText, TermFreq, Positions?, PayloadData?, Offsets?> |
| <sup>NumTerms</sup></li><li>TermText --> <PrefixLength, Suffix></li><li>PrefixLength --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>Suffix --> String (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteString_System_String_">WriteString(String)</a>) </li><li>TermFreq --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>Positions --> <PositionDelta PayloadLength?><sup>TermFreq</sup></li><li>PositionDelta --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>PayloadLength --> VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>PayloadData --> Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) <sup>NumPayloadBytes</sup></li><li>Offsets --> <VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>), VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) ><sup>TermFreq</sup></li></ul> |
| <p>Notes:</p> |
| <ul><li>The Flags byte stores whether this term vector has position, offset, and payload |
| information stored.</li><li>Term byte prefixes are shared. The PrefixLength is the number of initial |
| bytes from the previous term which must be prepended to a term's suffix |
| in order to form the term's bytes. Thus, if the previous term's text was "bone" |
| and the term is "boy", the PrefixLength is two and the suffix is "y".</li><li>PositionDelta is, if payloads are disabled for the term's field, the |
| difference between the position of the current occurrence in the document and |
| the previous occurrence (or zero, if this is the first occurrence in this |
| document). If payloads are enabled for the term's field, then PositionDelta/2 |
| is the difference between the current and the previous position. If payloads |
| are enabled and PositionDelta is odd, then PayloadLength is stored, indicating |
| the length of the payload at the current term position.</li><li>PayloadData is metadata associated with a term position. If |
| PayloadLength is stored at the current position, then it indicates the length |
| of this payload. If PayloadLength is not stored, then this payload has the same |
| length as the payload at the previous position. PayloadData encodes the |
| concatenated bytes for all of a term's occurrences.</li><li>Offsets are stored as delta-encoded VInts. The first VInt is the |
| startOffset, the second is the endOffset.</li></ul> |
| </li></ol></p> |
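<p>The shared-prefix rule for TermText in the notes above can be sketched as a small encoder and decoder. This is an illustrative model only: the function names are hypothetical, terms are assumed to be UTF-8 encoded and supplied in sorted order, and each entry is a (PrefixLength, Suffix) pair.</p>

```python
def shared_prefix_length(previous, current):
    """Number of leading bytes that two byte strings have in common."""
    count = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        count += 1
    return count


def encode_terms(terms):
    """Turn a list of terms into (PrefixLength, Suffix) pairs."""
    encoded = []
    previous = b""
    for term in terms:
        data = term.encode("utf-8")
        prefix = shared_prefix_length(previous, data)
        encoded.append((prefix, data[prefix:]))
        previous = data
    return encoded


def decode_terms(encoded):
    """Rebuild the terms by prepending bytes taken from the previous term."""
    terms = []
    previous = b""
    for prefix, suffix in encoded:
        data = previous[:prefix] + suffix
        terms.append(data.decode("utf-8"))
        previous = data
    return terms


pairs = encode_terms(["bone", "boy"])
print(pairs)
print(decode_terms(pairs))
```

<p>For the terms "bone" and "boy" the second pair has a PrefixLength of two and the suffix "y", matching the note above.</p>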
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40TermVectorsReader.html">Lucene40TermVectorsReader</a></h4> |
| <section><p>Lucene 4.0 Term Vectors reader. |
| <p> |
| It reads .tvd, .tvf, and .tvx files.</p> |
| </section> |
| <h4><a class="xref" href="Lucene.Net.Codecs.Lucene40.Lucene40TermVectorsWriter.html">Lucene40TermVectorsWriter</a></h4> |
| <section><p>Lucene 4.0 Term Vectors writer. |
| <p> |
| It writes .tvd, .tvf, and .tvx files.</p> |
| </section> |
| </article> |
| </div> |
| |
| <div class="hidden-sm col-md-2" role="complementary"> |
| <div class="sideaffix"> |
| <nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix"> |
| <!-- <p><a class="back-to-top" href="#top">Back to top</a><p> --> |
| </nav> |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| <footer> |
| <div class="grad-bottom"></div> |
| <div class="footer"> |
| <div class="container"> |
| <span class="pull-right"> |
| <a href="#top">Back to top</a> |
| </span> |
| Copyright © 2020 Licensed to the Apache Software Foundation (ASF) |
| |
| </div> |
| </div> |
| </footer> |
| </div> |
| |
| <script type="text/javascript" src="../../styles/docfx.vendor.js"></script> |
| <script type="text/javascript" src="../../styles/docfx.js"></script> |
| <script type="text/javascript" src="../../styles/main.js"></script> |
| </body> |
| </html> |