This module exposes functionality from ICU to Apache Lucene. ICU4N is a .NET library that enhances .NET's internationalization support by improving performance, keeping current with the Unicode Standard, and providing richer APIs.
[!NOTE] Since the .NET platform doesn't provide a BreakIterator class (or similar), the functionality that uses it was consolidated from Java Lucene's analyzers-icu package, xref:Lucene.Net.Analysis.Common and xref:Lucene.Net.Highlighter into this unified package.
[!WARNING] While ICU4N's BreakIterator has customizable rules, its default behavior is not the same as that of the BreakIterator in the JDK. Features of this package outside of the xref:Lucene.Net.Analysis.Icu namespace will therefore behave differently than they do in Java Lucene, and the rules may need some tweaking to fit your needs. See the Break Rules ICU documentation for details on how to customize ICU4N.Text.RuleBasedBreakIterator.
This module exposes the following functionality:
Text Analysis: For an introduction to Lucene's analysis API, see the xref:Lucene.Net.Analysis package documentation.
Text Segmentation: Tokenizes text based on properties and rules defined in Unicode.
Collation: Compare strings according to the conventions and standards of a particular language, region or country.
Normalization: Converts text to a unique, equivalent form.
Case Folding: Removes case distinctions with Unicode's Default Caseless Matching algorithm.
Search Term Folding: Removes distinctions (such as accent marks) between similar characters for a loose or fuzzy search.
Text Transformation: Transforms Unicode text in a context-sensitive fashion, for example by mapping Traditional to Simplified Chinese.
Unicode Highlighter Support
Postings Highlighter: Highlighter implementation that uses offsets from postings lists.
Vector Highlighter: An implementation of IBoundaryScanner for use with the vector highlighter in the Lucene.Net.Highlighter module.
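To make the difference between normalization, case folding, and search term folding more concrete, here is a minimal, library-independent sketch. It uses only Python's standard `unicodedata` module, not ICU4N or any API of this module, and the function names are hypothetical. The accent stripping shown is a deliberately simplified stand-in for ICU's much richer folding rules.

```python
import unicodedata


def normalize(text: str) -> str:
    """Normalization: convert text to one canonical form (NFC is assumed here)."""
    return unicodedata.normalize("NFC", text)


def case_fold(text: str) -> str:
    """Case folding: remove case distinctions via Unicode full case folding."""
    return text.casefold()


def fold_for_search(text: str) -> str:
    """Simplified search term folding: strip combining marks, then case-fold."""
    # Decompose so that accents become separate combining characters.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (accents), keeping only the base characters.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return unicodedata.normalize("NFC", without_marks.casefold())


if __name__ == "__main__":
    # "e" + combining acute accent becomes the single precomposed character.
    print(normalize("e\u0301") == "\u00e9")
    # German sharp s folds to "ss".
    print(case_fold("Stra\u00dfe"))
    # Accented and unaccented spellings fold to the same search term.
    print(fold_for_search("R\u00e9sum\u00e9") == fold_for_search("RESUME"))
```

In an index, the same folding would need to be applied to both the indexed text and the query terms so that the folded forms can match.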