Code to maintain and access indices.
Fields
xref:Lucene.Net.Index.Fields is the initial entry point into the postings APIs. It can be obtained in several ways:

    // access indexed fields for an index segment
    Fields fields = reader.fields();

    // access term vector fields for a specified document
    Fields fields = reader.getTermVectors(docid);

Fields implements Java's Iterable interface, so it's easy to enumerate the list of fields:

    // enumerate list of fields
    for (String field : fields) {
      // access the terms for this field
      Terms terms = fields.terms(field);
    }
Terms
xref:Lucene.Net.Index.Terms represents the collection of terms within a field. It exposes some metadata and statistics, and an API for enumeration.

    // metadata about the field
    System.out.println("positions? " + terms.hasPositions());
    System.out.println("offsets? " + terms.hasOffsets());
    System.out.println("payloads? " + terms.hasPayloads());

    // iterate through terms
    TermsEnum termsEnum = terms.iterator(null);
    BytesRef term = null;
    while ((term = termsEnum.next()) != null) {
      doSomethingWith(termsEnum.term());
    }

xref:Lucene.Net.Index.TermsEnum provides an iterator over the list of terms within a field, some statistics about the term, and methods to access the term's documents and positions.

    // seek to a specific term
    boolean found = termsEnum.seekExact(new BytesRef("foobar"));
    if (found) {
      // get the document frequency
      System.out.println(termsEnum.docFreq());
      // enumerate through documents
      DocsEnum docs = termsEnum.docs(null, null);
      // enumerate through documents and positions
      DocsAndPositionsEnum docsAndPositions = termsEnum.docsAndPositions(null, null);
    }
Documents
xref:Lucene.Net.Index.DocsEnum is an extension of xref:Lucene.Net.Search.DocIdSetIterator that iterates over the list of documents for a term, along with the term frequency within that document.

    int docid;
    while ((docid = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
      System.out.println(docid);
      System.out.println(docsEnum.freq());
    }
Positions
xref:Lucene.Net.Index.DocsAndPositionsEnum is an extension of xref:Lucene.Net.Index.DocsEnum that additionally allows iteration of the positions a term occurred within the document, and any additional per-position information (offsets and payload).

    int docid;
    while ((docid = docsAndPositionsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
      System.out.println(docid);
      int freq = docsAndPositionsEnum.freq();
      for (int i = 0; i < freq; i++) {
        System.out.println(docsAndPositionsEnum.nextPosition());
        System.out.println(docsAndPositionsEnum.startOffset());
        System.out.println(docsAndPositionsEnum.endOffset());
        System.out.println(docsAndPositionsEnum.getPayload());
      }
    }
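Conceptually, the Fields, Terms, documents, and positions levels described so far nest like a map of maps. The following is a minimal sketch of that shape in plain Java, not Lucene's implementation: `ToyPostings` and its methods are hypothetical names, and whitespace tokenization is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model (not a Lucene class): field -> term -> docid -> positions.
class ToyPostings {
    private final Map<String, TreeMap<String, TreeMap<Integer, List<Integer>>>> fields =
            new TreeMap<>();

    // Split the text on whitespace and record each token's position in the field.
    void addDocument(int docid, String field, String text) {
        if (text.trim().isEmpty()) {
            return; // nothing to index
        }
        String[] tokens = text.trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            fields.computeIfAbsent(field, f -> new TreeMap<>())
                  .computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                  .computeIfAbsent(docid, d -> new ArrayList<>())
                  .add(pos);
        }
    }

    // Sorted terms of one field (the role Terms and TermsEnum play in the text).
    List<String> terms(String field) {
        return new ArrayList<>(fields.getOrDefault(field, new TreeMap<>()).keySet());
    }

    // docid -> positions for one term (the role DocsAndPositionsEnum plays in the text).
    Map<Integer, List<Integer>> postings(String field, String term) {
        return fields.getOrDefault(field, new TreeMap<>())
                     .getOrDefault(term, new TreeMap<>());
    }
}
```

With this toy, indexing document 0 as "to be or not to be" in a field named "body" yields postings for "to" in document 0 at positions 0 and 4. The term frequency within a document is simply the length of its position list, which is why the loop over positions above is bounded by `freq()`.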
Term statistics
* #docFreq: Returns the number of documents that contain at least one occurrence of the term. It also counts deleted documents.
* #totalTermFreq: Returns the number of occurrences of the term across all documents. This statistic is unavailable (returns -1) if term frequencies were omitted from the index (DOCS_ONLY) for the field. Like docFreq(), it also counts occurrences that appear in deleted documents.
Field statistics
* #size: Returns the number of unique terms in the field. This statistic may be unavailable (returns -1) for some Terms implementations such as xref:Lucene.Net.Index.MultiTerms, where it cannot be efficiently computed. Note that this count also includes terms that appear only in deleted documents: when segments are merged, such terms are also merged away and the statistic is then updated.
* #getDocCount: Returns the number of documents that contain at least one occurrence of any term for this field. This can be thought of as a field-level docFreq(). Like docFreq(), it also counts deleted documents.
* #getSumDocFreq: Returns the number of postings (term-document mappings in the inverted index) for the field. This can be thought of as the sum of #docFreq across all terms in the field. Like docFreq(), it also counts postings that appear in deleted documents.
* #getSumTotalTermFreq: Returns the number of tokens for the field. This can be thought of as the sum of #totalTermFreq across all terms in the field. Like totalTermFreq(), it also counts occurrences that appear in deleted documents, and is unavailable (returns -1) if term frequencies were omitted from the index (DOCS_ONLY) for the field.
Segment statistics
Document statistics
Document statistics are available during the indexing process for an indexed field. Typically, a xref:Lucene.Net.Search.Similarities.Similarity implementation stores some of these values (possibly in a lossy way) in the normalization value for the document, in its #computeNorm method.
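To illustrate what storing a value "in a lossy way" can mean, here is a minimal sketch that squeezes a length-based normalization factor into a single byte. It assumes the statistic of interest is the field's token count. `ToyNorm` and its quantization scheme are hypothetical and are not the encoding any particular Similarity uses.

```java
// Hypothetical helper (not a Lucene class) showing a lossy one-byte norm.
class ToyNorm {
    // Quantize 1/sqrt(fieldLength), which lies in (0, 1], into 256 levels.
    static byte encode(int fieldLength) {
        if (fieldLength <= 0) {
            return 0; // treat an empty field as norm 0
        }
        double norm = 1.0 / Math.sqrt(fieldLength);
        return (byte) Math.round(norm * 255.0);
    }

    // Recover an approximation of the original factor from the stored byte.
    static double decode(byte stored) {
        return (stored & 0xFF) / 255.0;
    }
}
```

The loss shows up as collisions: in this sketch, field lengths of 1000 and 1001 tokens encode to the same byte, so the exact length cannot be recovered at search time. That trade-off buys a fixed, small per-document storage cost.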
Additional user-supplied statistics can be added to the document as DocValues fields and accessed via #getNumericDocValues.
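The term-level and field-level statistics described earlier relate to one another by simple sums. The sketch below works through them on a toy single-field index in plain Java. `ToyFieldStats` is a hypothetical class rather than part of the API, it tokenizes on whitespace, and it has no notion of deleted documents.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy model of one field: term -> (docid -> term frequency).
class ToyFieldStats {
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    void addDocument(int docid, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docid, 1, Integer::sum);
        }
    }

    // Term level: number of documents containing the term (compare #docFreq).
    int docFreq(String term) {
        return postings.getOrDefault(term, new TreeMap<>()).size();
    }

    // Term level: occurrences across all documents (compare #totalTermFreq).
    long totalTermFreq(String term) {
        long sum = 0;
        for (int freq : postings.getOrDefault(term, new TreeMap<>()).values()) {
            sum += freq;
        }
        return sum;
    }

    // Field level: documents with at least one term (compare #getDocCount).
    int docCount() {
        Set<Integer> docs = new HashSet<>();
        for (Map<Integer, Integer> perDoc : postings.values()) {
            docs.addAll(perDoc.keySet());
        }
        return docs.size();
    }

    // Field level: sum of docFreq over all terms (compare #getSumDocFreq).
    long sumDocFreq() {
        long sum = 0;
        for (String term : postings.keySet()) {
            sum += docFreq(term);
        }
        return sum;
    }

    // Field level: sum of totalTermFreq over all terms (compare #getSumTotalTermFreq).
    long sumTotalTermFreq() {
        long sum = 0;
        for (String term : postings.keySet()) {
            sum += totalTermFreq(term);
        }
        return sum;
    }
}
```

For two documents, "to be or not to be" and "not now", the toy reports a docFreq of 2 for "not", a totalTermFreq of 2 for "to", 6 postings in total, and 8 tokens in total, matching the definitions of the sums above.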