| # Apache Lucene Migration Guide |
| |
| ## Changed SPI lookups for codecs and analysis changed (LUCENE-7873) ## |
| |
| Due to serious problems with context class loaders in several frameworks |
| (OSGI, Java 9 Jigsaw), the lookup of Codecs, PostingsFormats, DocValuesFormats |
| and all analysis factories was changed to only inspect the current classloader |
| that defined the interface class (`lucene-core.jar`). Normal applications |
| should not encounter any issues with that change, because the application |
| classloader (unnamed module in Java 9) can load all SPIs from all JARs |
| from classpath. |
| |
| For any code that relies on the old behaviour (e.g., certain web applications |
| or components in application servers) one can manually instruct the Lucene |
| SPI implementation to also inspect the context classloader. To do this, |
| add this code to the early startup phase of your application before any |
| Apache Lucene component is used: |
| |
| ClassLoader cl = Thread.currentThread().getContextClassLoader(); |
| // Codecs: |
| PostingsFormat.reloadPostingsFormats(cl); |
| DocValuesFormat.reloadDocValuesFormats(cl); |
| Codec.reloadCodecs(cl); |
| // Analysis: |
| CharFilterFactory.reloadCharFilters(cl); |
| TokenFilterFactory.reloadTokenFilters(cl); |
| TokenizerFactory.reloadTokenizers(cl); |
| |
| This code will reload all service providers from the given class loader |
| (in our case the context class loader). Of course, instead of specifying |
| the context class loader, it is receommended to use the application's main |
| class loader or the module class loader. |
| |
| If you are migrating your project to Java 9 Jigsaw module system, keep in mind |
| that Lucene currently does not yet support `module-info.java` declarations of |
| service provider impls (`provides` statement). It is therefore recommended |
| to keep all of Lucene in one Uber-Module and not try to split Lucene into |
| several modules. As soon as Lucene will migrate to Java 9 as minimum |
| requirement, we will work on improving that. |
| |
| For OSGI, the same applies. You have to create a bundle with all of Lucene for |
| SPI to work correctly. |
| |
| ## CustomAnalyzer resources (LUCENE-7883)## |
| |
| Lucene no longer uses the context class loader when resolving resources in |
| CustomAnalyzer or ClassPathResourceLoader. Resources are only resolved |
| against Lucene's class loader by default. Please use another builder method |
| to change to a custom classloader. |
| |
| ## Query.hashCode and Query.equals are now abstract methods (LUCENE-7277) |
| |
| Any custom query subclasses should redeclare equivalence relationship according |
| to the subclass's details. See code patterns used in existing core Lucene query |
| classes for details. |
| |
| ## CompressionTools removed (LUCENE-7322) |
| |
| Per-field compression has been superseded by codec-level compression, which has |
| the benefit of being able to compress several fields, or even documents at once, |
| yielding better compression ratios. In case you would still like to compress on |
| top of the codec, you can do it on the application side by using the utility |
| classes from the java.util.zip package. |
| |
| ## Explanation.toHtml() removed (LUCENE-7360) |
| |
| Clients wishing to render Explanations as HTML should implement their own |
| utilities for this. |
| |
| ## Similarity.coord and BooleanQuery.disableCoord removed (LUCENE-7369) |
| |
| Coordination factors were a workaround for the fact that the ClassicSimilarity |
| does not have strong enough term frequency saturation. This causes disjunctions |
| to get better scores on documents that have many occurrences of a few query |
| terms than on documents that match most clauses, which is most of time |
| undesirable. The new BM25Similarity does not suffer from this problem since it |
| has better saturation for the contribution of the term frequency so the coord |
| factors have been removed from scores. Things now work as if coords were always |
| disabled when constructing boolean queries. |
| |
| ## Weight.getValueForNormalization() and Weight.normalize() removed (LUCENE-7368) |
| |
| Query normalization's goal was to make scores comparable across queries, which |
| was only implemented by the ClassicSimilarity. Since ClassicSimilarity is not |
| the default similarity anymore, this functionality has been removed. Boosts are |
| now propagated through Query#createWeight. |
| |
| ## AnalyzingQueryParser removed (LUCENE-7355) |
| |
| The functionality of AnalyzingQueryParser has been folded into the classic |
| QueryParser, which now passes terms through Analyzer#normalize when generating |
| queries. |
| |
| ## CommonQueryParserConfiguration.setLowerCaseExpandedTerms removed (LUCENE-7355) |
| |
| This option has been removed as expanded terms are now normalized through |
| Analyzer#normalize. |
| |
| ## Cache key and close listener refactoring (LUCENE-7410) |
| |
| The way to access cache keys and add close listeners has been refactored in |
| order to be less trappy. You should now use IndexReader.getReaderCacheHelper() |
| to have manage caches that take deleted docs and doc values updates into |
| account, and LeafReader.getCoreCacheHelper() to manage per-segment caches that |
| do not take deleted docs and doc values updates into account. |
| |
| ## Index-time boosts removal (LUCENE-6819) |
| |
| Index-time boosts are not supported anymore. As a replacement, index-time |
| scoring factors should be indexed in a doc value field and combined with the |
| score at query time using FunctionScoreQuery for instance. |
| |
| ## Grouping collector refactoring (LUCENE-7701) |
| |
| Groups are now defined by GroupSelector classes, making it easier to define new |
| types of groups. Rather than having term or function specific collection |
| classes, FirstPassGroupingCollector, AllGroupsCollector and |
| AllGroupHeadsCollector are now concrete classes taking a GroupSelector. |
| |
| SecondPassGroupingCollector is no longer specifically aimed at |
| collecting TopDocs for each group, but instead takes a GroupReducer that will |
| perform any type of reduction on the top groups collected on a first-pass. To |
| reproduce the old behaviour of SecondPassGroupingCollector, you should instead |
| use TopGroupsCollector. |
| |
| ## Removed legacy numerics (LUCENE-7850) |
| |
| Support for legacy numerics has been removed since legacy numerics had been |
| deprecated since Lucene 6.0. Points should be used instead, see |
| org.apache.lucene.index.PointValues for an introduction. |
| |
| ## TopDocs.totalHits is now a long (LUCENE-7872) |
| |
| TopDocs.totalHits is now a long so that TopDocs instances can be used to |
| represent top hits that have more than 2B matches. This is necessary for the |
| case that multiple TopDocs instances are merged together with TopDocs#merge as |
| they might have more than 2B matches in total. However TopDocs instances |
| returned by IndexSearcher will still have a total number of hits which is less |
| than 2B since Lucene indexes are still bound to at most 2B documents, so it |
| can safely be casted to an int in that case. |
| |
| ## PrefixAwareTokenFilter and PrefixAndSuffixAwareTokenFilter removed |
| (LUCENE-7877) |
| |
| Instead use ConcatentingTokenStream, which will allow for the use of custom |
| attributes. |
| |
| ## FieldValueQuery is renamed to DocValuesFieldExistsQuery (LUCENE-7899) |
| |
| This query matches only documents that have a value for the specified doc |
| values field. |