---
uid: Lucene.Net.Analysis.Cn.Smart
summary: *content
---

Analyzer for Simplified Chinese, which indexes words. @lucene.experimental

Three analyzers are provided for Chinese, each of which treats Chinese text in a different way:

  • StandardAnalyzer: Index unigrams (individual Chinese characters) as tokens.

  • CJKAnalyzer (in the Lucene.Net.Analysis.Cjk namespace): Index bigrams (overlapping groups of two adjacent Chinese characters) as tokens.

  • SmartChineseAnalyzer (in this package): Index words (attempt to segment Chinese text into words) as tokens.
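The difference between the unigram and bigram schemes can be sketched in a few lines. This is a simplified Python illustration, not the Lucene.NET implementation: the function names `unigrams` and `bigrams` are hypothetical, and the sketch assumes the input is a run of Chinese characters with no punctuation or Latin text, which the real tokenizers handle separately.

```python
from typing import List


def unigrams(text: str) -> List[str]:
    """Emit every non-whitespace character as its own token
    (the StandardAnalyzer-style unigram scheme, simplified)."""
    return [ch for ch in text if not ch.isspace()]


def bigrams(text: str) -> List[str]:
    """Emit overlapping pairs of adjacent characters
    (the CJKAnalyzer-style bigram scheme, simplified)."""
    chars = unigrams(text)
    # Assumption for this sketch: a lone character is emitted unchanged.
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


if __name__ == "__main__":
    print("-".join(unigrams("我是中国人")))
    print("-".join(bigrams("我是中国人")))
```

A string of n characters yields n unigrams but n - 1 overlapping bigrams, so bigram indexing trades a slightly smaller token count for tokens that each carry two characters of context.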

Example phrase: “我是中国人” (“I am Chinese”)

  1. StandardAnalyzer: 我-是-中-国-人

  2. CJKAnalyzer: 我是-是中-中国-国人

  3. SmartChineseAnalyzer: 我-是-中国-人
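To show what "segment Chinese text into words" means, here is a deliberately simple stand-in: greedy forward maximum matching against a tiny dictionary. This is a swapped-in technique, not how SmartChineseAnalyzer works (it uses a probabilistic segmentation model with its own bundled lexicon). The names `TOY_DICTIONARY` and `forward_max_match` are hypothetical, and the dictionary is chosen only so that the sketch reproduces the example segmentation above.

```python
from typing import List, Set

# Hypothetical toy dictionary; the real analyzer ships its own large lexicon.
TOY_DICTIONARY: Set[str] = {"我", "是", "中国", "人"}


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word, falling back to a single character if nothing matches."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    print("-".join(forward_max_match("我是中国人", TOY_DICTIONARY)))
```

With this dictionary the sketch prints 我-是-中国-人. Adding “中国人” to the toy dictionary makes greedy matching return 我-是-中国人 instead, which illustrates why the choice of dictionary and segmentation strategy determines which words end up in the index.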