blob: a6c4a07c89e54d361b2fa91864e3164954d033a0 [file] [log] [blame]
<!DOCTYPE html>
<!--[if IE]><![endif]-->
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Namespace Lucene.Net.Codecs.Lucene45
| Apache Lucene.NET 4.8.0-beta00008 Documentation </title>
<meta name="viewport" content="width=device-width">
<meta name="title" content="Namespace Lucene.Net.Codecs.Lucene45
| Apache Lucene.NET 4.8.0-beta00008 Documentation ">
<meta name="generator" content="docfx 2.50.0.0">
<link rel="shortcut icon" href="../../logo/favicon.ico">
<link rel="stylesheet" href="../../styles/docfx.vendor.css">
<link rel="stylesheet" href="../../styles/docfx.css">
<link rel="stylesheet" href="../../styles/main.css">
<meta property="docfx:navrel" content="../../toc.html">
<meta property="docfx:tocrel" content="../toc.html">
<meta property="docfx:rel" content="../../">
</head>
<body data-spy="scroll" data-target="#affix" data-offset="120">
<div id="wrapper">
<header>
<nav id="autocollapse" class="navbar ng-scope" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="../../index.html">
<img id="logo" class="svg" src="../../logo/lucene-net-color.png" alt="">
</a>
</div>
<div class="collapse navbar-collapse" id="navbar">
<form class="navbar-form navbar-right" role="search" id="search">
<div class="form-group">
<input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off">
</div>
</form>
</div>
</div>
</nav>
<div class="subnav navbar navbar-default">
<div class="container hide-when-search" id="breadcrumb">
<ul class="breadcrumb">
<li></li>
</ul>
</div>
</div>
</header>
<div class="container body-content">
<div id="search-results">
<div class="search-list"></div>
<div class="sr-items">
<p><i class="glyphicon glyphicon-refresh index-loading"></i></p>
</div>
<ul id="pagination"></ul>
</div>
</div>
<div role="main" class="container body-content hide-when-search">
<div class="sidenav hide-when-search">
<a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a>
<div class="sidetoggle collapse" id="sidetoggle">
<div id="sidetoc"></div>
</div>
</div>
<div class="article row grid-right">
<div class="col-md-10">
<article class="content wrap" id="_content" data-uid="Lucene.Net.Codecs.Lucene45">
<h1 id="Lucene_Net_Codecs_Lucene45" data-uid="Lucene.Net.Codecs.Lucene45" class="text-break">Namespace Lucene.Net.Codecs.Lucene45
</h1>
<div class="markdown level0 summary"><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>Support for testing <a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45Codec.html">Lucene45Codec</a>.</p>
</div>
<div class="markdown level0 conceptual"></div>
<div class="markdown level0 remarks"></div>
<h3 id="classes">Classes
</h3>
<h4><a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45Codec.html">Lucene45Codec</a></h4>
<section><p>Implements the Lucene 4.5 index format, with configurable per-field postings
and docvalues formats.
<p>
If you want to reuse functionality of this codec in another codec, extend
<a class="xref" href="Lucene.Net.Codecs.FilterCodec.html">FilterCodec</a>.
<p>
See <a class="xref" href="../Lucene.Net.TestFramework/Lucene.Net.Codecs.Lucene45.html">Lucene.Net.Codecs.Lucene45</a> package documentation for file format details.
<p>
<div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div><p>
</section>
<h4><a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesConsumer.html">Lucene45DocValuesConsumer</a></h4>
<section><p>Writer for <a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesFormat.html">Lucene45DocValuesFormat</a> </p>
</section>
<h4><a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesFormat.html">Lucene45DocValuesFormat</a></h4>
<section><p>Lucene 4.5 DocValues format.
<p>
Encodes the four per-document value types (Numeric,Binary,Sorted,SortedSet) with these strategies:
<p>
<a class="xref" href="Lucene.Net.Index.DocValuesType.html#Lucene_Net_Index_DocValuesType_NUMERIC">NUMERIC</a>:
<ul><li>Delta-compressed: per-document integers written in blocks of 16k. For each block
the minimum value in that block is encoded, and each entry is a delta from that
minimum value. Each block of deltas is compressed with bitpacking. For more
information, see <a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>.</li><li>Table-compressed: when the number of unique values is very small (&lt; 256), and
when there are unused &quot;gaps&quot; in the range of values used (such as <a class="xref" href="Lucene.Net.Util.SmallSingle.html">SmallSingle</a>),
a lookup table is written instead. Each per-document entry is instead the ordinal
to this table, and those ordinals are compressed with bitpacking (<a class="xref" href="Lucene.Net.Util.Packed.PackedInt32s.html">PackedInt32s</a>).</li><li>GCD-compressed: when all numbers share a common divisor, such as dates, the greatest
common denominator (GCD) is computed, and quotients are stored using Delta-compressed Numerics.</li></ul>
<p>
<a class="xref" href="Lucene.Net.Index.DocValuesType.html#Lucene_Net_Index_DocValuesType_BINARY">BINARY</a>:
<ul><li>Fixed-width Binary: one large concatenated <span class="xref">byte[]</span> is written, along with the fixed length.
Each document&apos;s value can be addressed directly with multiplication (<code>docID * length</code>).</li><li>Variable-width Binary: one large concatenated <span class="xref">byte[]</span> is written, along with end addresses
for each document. The addresses are written in blocks of 16k, with the current absolute
start for the block, and the average (expected) delta per entry. For each document the
deviation from the delta (actual - expected) is written.</li><li>Prefix-compressed Binary: values are written in chunks of 16, with the first value written
completely and other values sharing prefixes. Chunk addresses are written in blocks of 16k,
with the current absolute start for the block, and the average (expected) delta per entry.
For each chunk the deviation from the delta (actual - expected) is written.</li></ul>
<p>
<a class="xref" href="Lucene.Net.Index.DocValuesType.html#Lucene_Net_Index_DocValuesType_SORTED">SORTED</a>:
<ul><li>Sorted: a mapping of ordinals to deduplicated terms is written as Prefix-Compressed Binary,
along with the per-document ordinals written using one of the numeric strategies above.</li></ul>
<p>
<a class="xref" href="Lucene.Net.Index.DocValuesType.html#Lucene_Net_Index_DocValuesType_SORTED_SET">SORTED_SET</a>:
<ul><li>SortedSet: a mapping of ordinals to deduplicated terms is written as Prefix-Compressed Binary,
an ordinal list and per-document index into this list are written using the numeric strategies
above.</li></ul>
<p>
Files:
<ol><li><code>.dvd</code>: DocValues data</li><li><code>.dvm</code>: DocValues metadata</li></ol>
<ol><li><a name="dvm" id="dvm"></a>
<p>The DocValues metadata or .dvm file.</p>
<p>For DocValues field, this stores metadata, such as the offset into the
DocValues data (.dvd)</p>
<p>DocValues metadata (.dvm) --&gt; Header,&lt;Entry&gt;<sup>NumFields</sup>,Footer</p>
<ul><li>Entry --&gt; NumericEntry | BinaryEntry | SortedEntry | SortedSetEntry</li><li>NumericEntry --&gt; GCDNumericEntry | TableNumericEntry | DeltaNumericEntry</li><li>GCDNumericEntry --&gt; NumericHeader,MinValue,GCD</li><li>TableNumericEntry --&gt; NumericHeader,TableSize,Int64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) <sup>TableSize</sup></li><li>DeltaNumericEntry --&gt; NumericHeader</li><li>NumericHeader --&gt; FieldNumber,EntryType,NumericType,MissingOffset,PackedVersion,DataOffset,Count,BlockSize</li><li>BinaryEntry --&gt; FixedBinaryEntry | VariableBinaryEntry | PrefixBinaryEntry</li><li>FixedBinaryEntry --&gt; BinaryHeader</li><li>VariableBinaryEntry --&gt; BinaryHeader,AddressOffset,PackedVersion,BlockSize</li><li>PrefixBinaryEntry --&gt; BinaryHeader,AddressInterval,AddressOffset,PackedVersion,BlockSize</li><li>BinaryHeader --&gt; FieldNumber,EntryType,BinaryType,MissingOffset,MinLength,MaxLength,DataOffset</li><li>SortedEntry --&gt; FieldNumber,EntryType,BinaryEntry,NumericEntry</li><li>SortedSetEntry --&gt; EntryType,BinaryEntry,NumericEntry,NumericEntry</li><li>FieldNumber,PackedVersion,MinLength,MaxLength,BlockSize,ValueCount --&gt; VInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a></li><li>EntryType,CompressionType --&gt; Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a></li><li>Header --&gt; CodecHeader (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteHeader_Lucene_Net_Store_DataOutput_System_String_System_Int32_">WriteHeader(DataOutput, String, Int32)</a>) </li><li>MinValue,GCD,MissingOffset,AddressOffset,DataOffset --&gt; Int64 (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteInt64_System_Int64_">WriteInt64(Int64)</a>) </li><li>TableSize --&gt; vInt (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt32_System_Int32_">WriteVInt32(Int32)</a>) </li><li>Footer --&gt; CodecFooter (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteFooter_Lucene_Net_Store_IndexOutput_">WriteFooter(IndexOutput)</a>) </li></ul>
<p>Sorted fields have two entries: a <a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesProducer.BinaryEntry.html">Lucene45DocValuesProducer.BinaryEntry</a> with the value metadata,
and an ordinary <a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesProducer.NumericEntry.html">Lucene45DocValuesProducer.NumericEntry</a> for the document-to-ord metadata.</p>
<p>SortedSet fields have three entries: a <a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesProducer.BinaryEntry.html">Lucene45DocValuesProducer.BinaryEntry</a> with the value metadata,
and two <a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesProducer.NumericEntry.html">Lucene45DocValuesProducer.NumericEntry</a>s for the document-to-ord-index and ordinal list metadata.</p>
<p>FieldNumber of -1 indicates the end of metadata.</p>
<p>EntryType is a 0 (<a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesProducer.NumericEntry.html">Lucene45DocValuesProducer.NumericEntry</a>) or 1 (<a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesProducer.BinaryEntry.html">Lucene45DocValuesProducer.BinaryEntry</a>)</p>
<p>DataOffset is the pointer to the start of the data in the DocValues data (.dvd)</p>
<p>NumericType indicates how Numeric values will be compressed:
<ul><li>0 --&gt; delta-compressed. For each block of 16k integers, every integer is delta-encoded
from the minimum value within the block.</li><li>1 --&gt; gcd-compressed. When all integers share a common divisor, only quotients are stored
using blocks of delta-encoded ints.</li><li>2 --&gt; table-compressed. When the number of unique numeric values is small and it would save space,
a lookup table of unique values is written, followed by the ordinal for each document.</li></ul>
<p>BinaryType indicates how Binary values will be stored:
<ul><li>0 --&gt; fixed-width. All values have the same length, addressing by multiplication.</li><li>1 --&gt; variable-width. An address for each value is stored.</li><li>2 --&gt; prefix-compressed. An address to the start of every interval&apos;th value is stored.</li></ul>
<p>MinLength and MaxLength represent the min and max byte[] value lengths for Binary values.
If they are equal, then all values are of a fixed size, and can be addressed as DataOffset + (docID * length).
Otherwise, the binary values are of variable size, and packed integer metadata (PackedVersion,BlockSize)
is written for the addresses.
<p>MissingOffset points to a <span class="xref">byte[]</span> containing a bitset of all documents that had a value for the field.
If its -1, then there are no missing values.
<p>Checksum contains the CRC32 checksum of all bytes in the .dvm file up
until the checksum. this is used to verify integrity of the file on opening the
index.</li><li><a name="dvd" id="dvd"></a>
<p>The DocValues data or .dvd file.</p>
<p>For DocValues field, this stores the actual per-document data (the heavy-lifting)</p>
<p>DocValues data (.dvd) --&gt; Header,&lt;NumericData | BinaryData | SortedData&gt;<sup>NumFields</sup>,Footer</p>
<ul><li>NumericData --&gt; DeltaCompressedNumerics | TableCompressedNumerics | GCDCompressedNumerics</li><li>BinaryData --&gt; Byte (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteByte_System_Byte_">WriteByte(Byte)</a>) <sup>DataLength</sup>,Addresses</li><li>SortedData --&gt; FST&lt;Int64&gt; (<a class="xref" href="Lucene.Net.Util.Fst.FST-1.html">FST&lt;T&gt;</a>) </li><li>DeltaCompressedNumerics --&gt; BlockPackedInts(blockSize=16k) (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>TableCompressedNumerics --&gt; PackedInts (<a class="xref" href="Lucene.Net.Util.Packed.PackedInt32s.html">PackedInt32s</a>) </li><li>GCDCompressedNumerics --&gt; BlockPackedInts(blockSize=16k) (<a class="xref" href="Lucene.Net.Util.Packed.BlockPackedWriter.html">BlockPackedWriter</a>) </li><li>Addresses --&gt; MonotonicBlockPackedInts(blockSize=16k) (<a class="xref" href="Lucene.Net.Util.Packed.MonotonicBlockPackedWriter.html">MonotonicBlockPackedWriter</a>) </li><li>Footer --&gt; CodecFooter (<a class="xref" href="Lucene.Net.Codecs.CodecUtil.html#Lucene_Net_Codecs_CodecUtil_WriteFooter_Lucene_Net_Store_IndexOutput_">WriteFooter(IndexOutput)</a>) </li></ul>
<p>SortedSet entries store the list of ordinals in their BinaryData as a
sequences of increasing vLongs (<a class="xref" href="Lucene.Net.Store.DataOutput.html#Lucene_Net_Store_DataOutput_WriteVInt64_System_Int64_">WriteVInt64(Int64)</a>), delta-encoded.</p></li></ol>
<p>
<div class="lucene-block lucene-experimental">This is a Lucene.NET EXPERIMENTAL API, use at your own risk</div></section>
<h4><a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesProducer.html">Lucene45DocValuesProducer</a></h4>
<section><p>Reader for <a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesFormat.html">Lucene45DocValuesFormat</a>. </p>
</section>
<h4><a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesProducer.BinaryEntry.html">Lucene45DocValuesProducer.BinaryEntry</a></h4>
<section><p>Metadata entry for a binary docvalues field. </p>
</section>
<h4><a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesProducer.NumericEntry.html">Lucene45DocValuesProducer.NumericEntry</a></h4>
<section><p>Metadata entry for a numeric docvalues field. </p>
</section>
<h4><a class="xref" href="Lucene.Net.Codecs.Lucene45.Lucene45DocValuesProducer.SortedSetEntry.html">Lucene45DocValuesProducer.SortedSetEntry</a></h4>
<section><p>Metadata entry for a sorted-set docvalues field. </p>
</section>
</article>
</div>
<div class="hidden-sm col-md-2" role="complementary">
<div class="sideaffix">
<div class="contribution">
<ul class="nav">
<li>
<a href="https://github.com/apache/lucenenet/blob/docs/4.8.0-beta00008/src/Lucene.Net.TestFramework/Codecs/Lucene45/package.md/#L2" class="contribution-link">Improve this Doc</a>
</li>
</ul>
</div>
<nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
<!-- <p><a class="back-to-top" href="#top">Back to top</a><p> -->
</nav>
</div>
</div>
</div>
</div>
<footer>
<div class="grad-bottom"></div>
<div class="footer">
<div class="container">
<span class="pull-right">
<a href="#top">Back to top</a>
</span>
Copyright © 2020 Licensed to the Apache Software Foundation (ASF)
</div>
</div>
</footer>
</div>
<script type="text/javascript" src="../../styles/docfx.vendor.js"></script>
<script type="text/javascript" src="../../styles/docfx.js"></script>
<script type="text/javascript" src="../../styles/main.js"></script>
</body>
</html>