 # Apache Lucene Migration Guide

 ## SPI lookups for codecs and analysis changed (LUCENE-7873) ##

 Due to serious problems with context class loaders in several frameworks
 (OSGI, Java 9 Jigsaw), the lookup of Codecs, PostingsFormats, DocValuesFormats
 and all analysis factories was changed to inspect only the classloader that
 defined the interface class (`lucene-core.jar`). Normal applications
 should not encounter any issues with that change, because the application
 classloader (unnamed module in Java 9) can load all SPIs from all JARs
 on the classpath.

 For any code that relies on the old behaviour (e.g., certain web applications
 or components in application servers) one can manually instruct the Lucene
 SPI implementation to also inspect the context classloader. To do this,
 add this code to the early startup phase of your application before any
 Apache Lucene component is used:

     ClassLoader cl = Thread.currentThread().getContextClassLoader();
     // Codecs:
     PostingsFormat.reloadPostingsFormats(cl);
     DocValuesFormat.reloadDocValuesFormats(cl);
     Codec.reloadCodecs(cl);
     // Analysis:
     CharFilterFactory.reloadCharFilters(cl);
     TokenFilterFactory.reloadTokenFilters(cl);
     TokenizerFactory.reloadTokenizers(cl);

 This code will reload all service providers from the given class loader
 (in our case the context class loader). Instead of specifying the context
 class loader, it is recommended to use the application's main class loader
 or the module class loader.
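
 To see which providers a particular class loader can actually reach before
 handing it to the reload methods, a plain JDK sketch such as the following may
 help. It uses `java.util.ServiceLoader` as an analogy only; it is not Lucene's
 own SPI implementation, and the class name `SpiLookupSketch` and the method
 `providerNames` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical diagnostic helper; illustrates loader-scoped SPI lookup with the JDK only. */
final class SpiLookupSketch {
  private SpiLookupSketch() {}

  /**
   * Returns the class names of all providers of the given service that are
   * visible from the given class loader (null means the system class loader).
   */
  static <S> List<String> providerNames(Class<S> service, ClassLoader loader) {
    List<String> names = new ArrayList<>();
    // ServiceLoader resolves META-INF/services entries (and module
    // "provides" declarations) relative to the loader passed in.
    for (S provider : ServiceLoader.load(service, loader)) {
      names.add(provider.getClass().getName());
    }
    return names;
  }
}
```

 Calling `providerNames` once with `Thread.currentThread().getContextClassLoader()`
 and once with the application's main class loader shows whether the two loaders
 see the same set of providers for a service.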

 If you are migrating your project to the Java 9 Jigsaw module system, keep in
 mind that Lucene does not yet support `module-info.java` declarations of
 service provider impls (`provides` statement). It is therefore recommended
 to keep all of Lucene in one Uber-Module and not try to split Lucene into
 several modules. As soon as Lucene migrates to Java 9 as its minimum
 requirement, we will work on improving that.

 For OSGI, the same applies. You have to create a bundle with all of Lucene for
 SPI to work correctly.

 ## CustomAnalyzer resources (LUCENE-7883) ##

 Lucene no longer uses the context class loader when resolving resources in
 CustomAnalyzer or ClasspathResourceLoader. By default, resources are resolved
 only against Lucene's class loader. To resolve them against a custom
 classloader, use another builder method, such as one that accepts a
 ResourceLoader.

 ## Query.hashCode and Query.equals are now abstract methods (LUCENE-7277)

 Any custom query subclass should redeclare the equivalence relationship
 according to the subclass's details. See the code patterns used in existing
 core Lucene query classes for details.

 ## CompressionTools removed (LUCENE-7322)

 Per-field compression has been superseded by codec-level compression, which has
 the benefit of being able to compress several fields, or even documents at once,
 yielding better compression ratios. In case you would still like to compress on
 top of the codec, you can do it on the application side by using the utility
 classes from the java.util.zip package.
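
 As a minimal sketch of application-side compression with `java.util.zip`,
 the helper below deflates a byte array before it is stored and inflates it
 after it is read back. The class name `FieldCompression` and its methods are
 hypothetical and not part of Lucene; the compression level is an arbitrary
 choice.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical application-side helper built on java.util.zip only. */
final class FieldCompression {
  private FieldCompression() {}

  /** Compresses the input with DEFLATE (zlib format). */
  static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    try {
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[1024];
      while (!deflater.finished()) {
        int n = deflater.deflate(buffer);
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release native zlib resources
    }
  }

  /** Reverses compress(); throws IllegalArgumentException on corrupt or truncated data. */
  static byte[] decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[1024];
      while (!inflater.finished()) {
        int n = inflater.inflate(buffer);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          // No progress is possible: the stream ended early or needs a dictionary.
          throw new IllegalArgumentException("truncated or unsupported compressed data");
        }
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("corrupt compressed data", e);
    } finally {
      inflater.end();
    }
  }
}
```

 The compressed bytes can then be indexed like any other binary stored value,
 with `decompress` applied when the document is read back.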

 ## Explanation.toHtml() removed (LUCENE-7360)

 Clients wishing to render Explanations as HTML should implement their own
 utilities for this.

 ## Similarity.coord and BooleanQuery.disableCoord removed (LUCENE-7369)

 Coordination factors were a workaround for the fact that the ClassicSimilarity
 does not have strong enough term frequency saturation. This causes disjunctions
 to get better scores on documents that have many occurrences of a few query
 terms than on documents that match most clauses, which is undesirable most of
 the time. The new BM25Similarity does not suffer from this problem since it
 has better saturation for the contribution of the term frequency, so the coord
 factors have been removed from scores. Things now work as if coords were always
 disabled when constructing boolean queries.

 ## Weight.getValueForNormalization() and Weight.normalize() removed (LUCENE-7368)

 Query normalization's goal was to make scores comparable across queries, which
 was only implemented by the ClassicSimilarity. Since ClassicSimilarity is not
 the default similarity anymore, this functionality has been removed. Boosts are
 now propagated through Query#createWeight.

 ## AnalyzingQueryParser removed (LUCENE-7355)

 The functionality of AnalyzingQueryParser has been folded into the classic
 QueryParser, which now passes terms through Analyzer#normalize when generating
 queries.

 ## CommonQueryParserConfiguration.setLowerCaseExpandedTerms removed (LUCENE-7355)

 This option has been removed as expanded terms are now normalized through
 Analyzer#normalize.

 ## Cache key and close listener refactoring (LUCENE-7410)

 The way to access cache keys and add close listeners has been refactored in
 order to be less trappy. You should now use IndexReader.getReaderCacheHelper()
 to manage caches that take deleted docs and doc values updates into
 account, and LeafReader.getCoreCacheHelper() to manage per-segment caches that
 do not take deleted docs and doc values updates into account.

 ## Index-time boosts removal (LUCENE-6819)

 Index-time boosts are no longer supported. As a replacement, index-time
 scoring factors should be indexed in a doc value field and combined with the
 score at query time, for instance using FunctionScoreQuery.

 ## Grouping collector refactoring (LUCENE-7701)

 Groups are now defined by GroupSelector classes, making it easier to define new
 types of groups. Rather than having term-specific or function-specific
 collection classes, FirstPassGroupingCollector, AllGroupsCollector and
 AllGroupHeadsCollector are now concrete classes taking a GroupSelector.

 SecondPassGroupingCollector is no longer specifically aimed at
 collecting TopDocs for each group, but instead takes a GroupReducer that will
 perform any type of reduction on the top groups collected in the first pass. To
 reproduce the old behaviour of SecondPassGroupingCollector, you should instead
 use TopGroupsCollector.

 ## Removed legacy numerics (LUCENE-7850)

 Support for legacy numerics, which had been deprecated since Lucene 6.0, has
 been removed. Points should be used instead; see
 org.apache.lucene.index.PointValues for an introduction.

 ## TopDocs.totalHits is now a long (LUCENE-7872)

 TopDocs.totalHits is now a long so that TopDocs instances can be used to
 represent top hits that have more than 2B matches. This is necessary for the
 case that multiple TopDocs instances are merged together with TopDocs#merge as
 they might have more than 2B matches in total. However, TopDocs instances
 returned by IndexSearcher will still have a total number of hits which is less
 than 2B since Lucene indexes are still bound to at most 2B documents, so it
 can safely be cast to an int in that case.
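
 Code that still needs an int can narrow the value defensively. The sketch
 below assumes a `long` that was read from `TopDocs.totalHits`; the class and
 method names are hypothetical. `Math.toIntExact` fails loudly instead of
 silently wrapping around if the value ever comes from merged TopDocs that
 exceed the int range.

```java
/** Hypothetical helper for narrowing a hit count to an int. */
final class TotalHitsSketch {
  private TotalHitsSketch() {}

  /**
   * Narrows a total hit count to an int.
   * Throws ArithmeticException if the count does not fit,
   * which a plain (int) cast would hide by wrapping around.
   */
  static int toInt(long totalHits) {
    return Math.toIntExact(totalHits);
  }
}
```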

 ## PrefixAwareTokenFilter and PrefixAndSuffixAwareTokenFilter removed (LUCENE-7877)

 Use ConcatenatingTokenStream instead, which allows the use of custom
 attributes.

 ## FieldValueQuery is renamed to DocValuesFieldExistsQuery (LUCENE-7899)

 This query matches only documents that have a value for the specified doc
 values field.