| If possible, use the same JRE major version at both index and search time. |
| When upgrading to a different JRE major version, consider re-indexing. |
| |
| Different JRE major versions may implement different versions of Unicode, |
| which will change the way some parts of Lucene treat your text. |
| |
| For example: with Java 1.4, LetterTokenizer will split around the character U+02C6, |
| but with Java 5 it will not. |
| This is because Java 1.4 implements Unicode 3, but Java 5 implements Unicode 4. |
| |
| For reference, JRE major versions with their corresponding Unicode versions: |
| Java 1.4, Unicode 3.0 |
| Java 5, Unicode 4.0 |
| Java 6, Unicode 4.0 |
| Java 7, Unicode 5.1 |
| |
| In general, whether or not you need to re-index largely depends upon the data that |
| you are searching, and what was changed in any given Unicode version. For example, |
| if you are completely sure that your content is limited to the "Basic Latin" range |
| of Unicode, you can safely ignore this. |
| |
| Special Notes: |
| |
| LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION |
| |
| * StandardAnalyzer will return the same results under Java 5 as it did under |
| Java 1.4. This is because it is largely independent of the runtime JRE for |
| Unicode support, (with the exception of lowercasing). However, no changes to |
| casing have occurred in Unicode 4.0 that affect StandardAnalyzer, so if you are |
| using this Analyzer you are NOT affected. |
| |
| * SimpleAnalyzer, StopAnalyzer, LetterTokenizer, LowerCaseFilter, and |
| LowerCaseTokenizer may return different results, along with many other Analyzers |
| and TokenStreams in Lucene's contrib area. If you are using one of these |
| components, you may be affected. |
| |