uid: Lucene.Net.Analysis.Cjk summary: *content
Analyzer for Chinese, Japanese, and Korean, which indexes bigrams. This analyzer generates bigram terms, which are overlapping groups of two adjacent Han, Hiragana, Katakana, or Hangul characters.
Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.
- ChineseAnalyzer (in the Lucene.Net.Analysis.Cn namespace): Index unigrams (individual Chinese characters) as a token.
- CJKAnalyzer (in the Lucene.Net.Analysis.Cjk namespace): Index bigrams (overlapping groups of two adjacent Chinese characters) as tokens.
- SmartChineseAnalyzer (in the Lucene.Net.Analysis.SmartCn package): Index words (attempt to segment Chinese text into words) as tokens.
Example phrase: “我是中国人”
- ChineseAnalyzer: 我-是-中-国-人
- CJKAnalyzer: 我是-是中-中国-国人
- SmartChineseAnalyzer: 我-是-中国-人