 # Apache Lucene Migration Guide

 ## Similarity.SimScorer.computeXXXFactor methods removed (LUCENE-8014) ##

 SpanQuery and PhraseQuery now always compute their slop factor as
 1.0 / (1.0 + distance). Payload factor calculation is performed by
 PayloadDecoder in the queries module.
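
As a rough illustration of the formula above, the following standalone sketch computes the factor for a few distances. The class and method names are hypothetical and the code does not use Lucene; it only shows how quickly the factor decays as matches get looser.

```java
/** Standalone sketch of the fixed slop factor; not Lucene's own code. */
final class SlopFactorSketch {

  /** Returns 1 / (1 + distance) for a non-negative match distance. */
  static float slopFactor(int distance) {
    if (distance < 0) {
      throw new IllegalArgumentException("distance must be >= 0, got " + distance);
    }
    return 1.0f / (1.0f + distance);
  }

  public static void main(String[] args) {
    // An exact match (distance 0) keeps full weight; looser matches decay.
    for (int distance = 0; distance <= 3; distance++) {
      System.out.println("distance=" + distance + " factor=" + slopFactor(distance));
    }
  }
}
```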


 ## Scorer must produce non-negative scores (LUCENE-7996) ##

 Scorers are no longer allowed to produce negative scores. If you have custom
 query implementations, make sure their score formula can never produce
 negative scores.

 As a side-effect of this change, negative boosts are now rejected and
 FunctionScoreQuery maps negative values to 0.
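
A common way for a custom formula to go negative is a logarithm of a field value below 1. The standalone sketch below (hypothetical names, no Lucene dependency, and assuming the incoming score is itself non-negative) contrasts such a formula with two possible fixes: shifting the logarithm so it cannot go below zero, or clamping the result at 0.

```java
/** Standalone sketch of keeping a custom score formula non-negative. */
final class NonNegativeScoreSketch {

  /** Problematic: ln(x) is negative for 0 < x < 1, so the product can be negative. */
  static double rawBoost(double score, double fieldValue) {
    return score * Math.log(fieldValue);
  }

  /** Safer: ln(1 + x) is >= 0 for every x >= 0. */
  static double shiftedBoost(double score, double fieldValue) {
    return score * Math.log1p(fieldValue);
  }

  /** Last-resort guard: clamp anything negative to 0. */
  static double clampedBoost(double score, double fieldValue) {
    return Math.max(0.0, rawBoost(score, fieldValue));
  }

  public static void main(String[] args) {
    double score = 2.0;
    double fieldValue = 0.5; // hypothetical field value below 1
    System.out.println("raw     = " + rawBoost(score, fieldValue));
    System.out.println("shifted = " + shiftedBoost(score, fieldValue));
    System.out.println("clamped = " + clampedBoost(score, fieldValue));
  }
}
```

Shifting changes the shape of the boost, while clamping discards information for all values below 1, so which one fits depends on the ranking you want.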


 ## CustomScoreQuery, BoostedQuery and BoostingQuery removed (LUCENE-8099) ##

 Use FunctionScoreQuery with a DoubleValuesSource implementation instead.
 BoostedQuery and BoostingQuery may be replaced by calls to
 FunctionScoreQuery.boostByValue() and FunctionScoreQuery.boostByQuery()
 respectively. To replace more complex calculations in CustomScoreQuery, use
 the lucene-expressions module:

   SimpleBindings bindings = new SimpleBindings();
   bindings.add("score", DoubleValuesSource.SCORES);
   bindings.add("boost1", DoubleValuesSource.fromIntField("myboostfield"));
   bindings.add("boost2", DoubleValuesSource.fromIntField("myotherboostfield"));
   Expression expr = JavascriptCompiler.compile("score * (boost1 + ln(boost2))");
   FunctionScoreQuery q = new FunctionScoreQuery(inputQuery, expr.getDoubleValuesSource(bindings));

 ## Index options can no longer be changed dynamically (LUCENE-8134) ##

 Changing index options on the fly now results in an
 IllegalArgumentException. If a field is indexed
 (FieldType.indexOptions() != IndexOptions.NONE) then all documents must have
 the same index options for that field.


 ## IndexSearcher.createNormalizedWeight() removed (LUCENE-8242) ##

 Use IndexSearcher.createWeight() instead: rewrite the query first, then pass
 a boost of 1f.

 ## Memory codecs removed (LUCENE-8267) ##

 Memory codecs have been removed from the codebase (MemoryPostings, MemoryDocValues).

 ## QueryCachingPolicy.ALWAYS_CACHE removed (LUCENE-8144) ##

 Caching everything is discouraged because it disables the ability to skip
 uninteresting documents. ALWAYS_CACHE can be replaced by a
 UsageTrackingQueryCachingPolicy with an appropriate configuration.

 ## English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444) ##

 To retain the old behaviour, pass EnglishAnalyzer.ENGLISH_STOP_WORDS_SET as an argument
 to the StandardAnalyzer constructor.

 ## StandardAnalyzer.ENGLISH_STOP_WORDS_SET has been moved ##

 English stop words are now defined in EnglishAnalyzer#ENGLISH_STOP_WORDS_SET in the
 analysis-common module.

 ## TopDocs.maxScore removed ##

 TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector no longer have
 an option to compute the maximum score when sorting by field. If you need to
 know the maximum score for a query, the recommended approach is to run a
 separate query:

   TopDocs topHits = searcher.search(query, 1);
   float maxScore = topHits.scoreDocs.length == 0 ? Float.NaN : topHits.scoreDocs[0].score;

 Thanks to other optimizations that were added to Lucene 8, this query will be
 able to efficiently select the top-scoring document without having to visit
 all matches.

 ## TopFieldCollector always assumes fillFields=true ##

 Because filling sort values doesn't have a significant overhead, the fillFields
 option has been removed from TopFieldCollector factory methods. Everything
 behaves as if it were set to true.

 ## TopFieldCollector no longer takes a trackDocScores option ##

 Computing scores at collection time is less efficient than running a second
 request in order to only compute scores for documents that made it to the top
 hits. As a consequence, the trackDocScores option has been removed and can be
 replaced with the new TopFieldCollector#populateScores helper method.

 ## IndexSearcher.search(After) may return lower bounds of the hit count and TopDocs.totalHits is no longer a long ##

 Lucene 8 received optimizations for collection of top-k matches by not visiting
 all matches. However, these optimizations won't help if all matches still need
 to be visited in order to compute the total number of hits. As a consequence,
 IndexSearcher's search and searchAfter methods were changed to only count hits
 accurately up to 1,000, and TopDocs.totalHits was changed from a long to an
 object that says whether the hit count is accurate or a lower bound of the
 actual hit count.
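
Code that displays hit counts needs to distinguish the two cases. In Lucene 8 the object is org.apache.lucene.search.TotalHits, which exposes a value and a relation; the sketch below uses simplified stand-in types (hypothetical class names, no Lucene dependency) so it compiles on its own, and shows one way to render an exact count versus a lower bound.

```java
/** Standalone sketch of rendering an exact or lower-bound hit count. */
final class TotalHitsSketch {

  /** Stand-in for the relation carried by Lucene's TotalHits. */
  enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

  /** Stand-in for TotalHits: a count plus whether that count is exact. */
  static final class Hits {
    final long value;
    final Relation relation;

    Hits(long value, Relation relation) {
      this.value = value;
      this.relation = relation;
    }
  }

  /** Formats a hit count for display: exact counts as-is, lower bounds with a "+". */
  static String describe(Hits hits) {
    return hits.relation == Relation.EQUAL_TO
        ? Long.toString(hits.value)
        : hits.value + "+";
  }

  public static void main(String[] args) {
    System.out.println(describe(new Hits(42, Relation.EQUAL_TO)));
    System.out.println(describe(new Hits(1000, Relation.GREATER_THAN_OR_EQUAL_TO)));
  }
}
```

Code that previously read totalHits as a plain long should check the relation before treating the value as exact.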

 ## RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated ##

 This RAM-based directory implementation is an old piece of code that uses inefficient
 thread synchronization primitives and can be mistaken for being faster than the
 NIO-based MMapDirectory. It is deprecated and scheduled for removal in future
 versions of Lucene. (LUCENE-8467, LUCENE-8438)

 ## LeafCollector.setScorer() now takes a Scorable rather than a Scorer ##

 Scorer has a number of methods that should never be called from Collectors, for example
 those that advance the underlying iterators. To hide these, LeafCollector.setScorer()
 now takes a Scorable, an abstract class that Scorers can extend, with methods
 docID() and score(). (LUCENE-6228)

 ## Scorers must have non-null Weights ##

 If a custom Scorer implementation does not have an associated Weight, it can probably
 be replaced with a Scorable instead.

 ## Suggesters now return Long instead of long for weight() during indexing, and double instead of long at suggest time ##

 Most code should only need recompilation, possibly with some added casts.

 ## TokenStreamComponents is now final ##

 Instead of overriding TokenStreamComponents#setReader() to customise analyzer
 initialisation, you should now pass a Consumer<Reader> instance to the
 TokenStreamComponents constructor.

 ## LowerCaseTokenizer and LowerCaseTokenizerFactory have been removed ##

 LowerCaseTokenizer combined tokenization and filtering in a way that broke token
 normalization, so it and its factory have been removed. Instead, use a
 LetterTokenizer followed by a LowerCaseFilter.
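
The replacement keeps tokenization and normalization as two separate stages. The following standalone sketch models that split in plain Java; it is only an approximation with hypothetical names and does not use LetterTokenizer or LowerCaseFilter themselves, whose exact character handling may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Standalone model of "tokenize on letters, then lowercase" as two stages. */
final class LetterThenLowerCaseSketch {

  /** Stage 1: split into maximal runs of letters, leaving case untouched. */
  static List<String> letterTokens(String text) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < text.length(); ) {
      int codePoint = text.codePointAt(i);
      if (Character.isLetter(codePoint)) {
        current.appendCodePoint(codePoint);
      } else if (current.length() > 0) {
        // A non-letter ends the current token.
        tokens.add(current.toString());
        current.setLength(0);
      }
      i += Character.charCount(codePoint);
    }
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }

  /** Stage 2: lowercase each token as a separate normalization step. */
  static List<String> lowerCase(List<String> tokens) {
    List<String> normalized = new ArrayList<>(tokens.size());
    for (String token : tokens) {
      normalized.add(token.toLowerCase(Locale.ROOT));
    }
    return normalized;
  }

  public static void main(String[] args) {
    List<String> tokens = letterTokens("Hello, World-42 FOO");
    System.out.println(tokens);
    System.out.println(lowerCase(tokens));
  }
}
```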

 ## CharTokenizer no longer takes a normalizer function ##

 CharTokenizer now only performs tokenization. To perform any type of filtering,
 use a TokenFilter chain as you would with any other Tokenizer.

 ## Highlighter and FastVectorHighlighter no longer support ToParent/ToChildBlockJoinQuery ##

 To support ToParent/ToChildBlockJoinQuery, Highlighter now needs a custom
 WeightedSpanTermExtractor and FastVectorHighlighter needs a custom FieldQuery.