# Apache Lucene Migration Guide
## Similarity.SimScorer.computeXXXFactor methods removed (LUCENE-8014) ##
SpanQuery and PhraseQuery now always calculate their slop factor as (1.0 / (1.0 +
distance)). Payload factor calculation is performed by PayloadDecoder in the
queries module.
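As a rough illustration, the fixed slop factor can be written as a plain Java function. The class and method names below are hypothetical and are not part of the Lucene API; only the formula is taken from the text above.

```java
/** Illustrative sketch only; this is not a Lucene class. */
final class SlopFactorSketch {
    /** Returns 1 / (1 + distance), the slop factor described above. */
    static float slopFactor(int distance) {
        return 1.0f / (1.0f + distance);
    }

    public static void main(String[] args) {
        // The factor shrinks as the matched terms get further apart.
        for (int distance = 0; distance <= 3; distance++) {
            System.out.println(distance + " -> " + slopFactor(distance));
        }
    }
}
```

An exact match (distance 0) keeps its full weight, and each extra position of slop reduces the contribution.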
## Scorers must produce non-negative scores (LUCENE-7996) ##
Scorers are no longer allowed to produce negative scores. If you have custom
query implementations, make sure that their score formulas can never produce
negative scores.
As a side-effect of this change, negative boosts are now rejected and
FunctionScoreQuery maps negative values to 0.
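One possible way for a custom score formula to meet this requirement is to clamp its result at zero, which mirrors how FunctionScoreQuery is described as treating negative values. The helper below is a hypothetical plain-Java sketch, not a Lucene API; in some cases reworking the formula itself may be the better fix.

```java
/** Illustrative sketch only; this is not a Lucene class. */
final class NonNegativeScoreSketch {
    /** Maps any negative raw score to 0 and leaves other values unchanged. */
    static float clampToZero(float rawScore) {
        return Math.max(0f, rawScore);
    }
}
```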
## CustomScoreQuery, BoostedQuery and BoostingQuery removed (LUCENE-8099) ##
Instead use FunctionScoreQuery and a DoubleValuesSource implementation. BoostedQuery
and BoostingQuery may be replaced by calls to FunctionScoreQuery.boostByValue() and
FunctionScoreQuery.boostByQuery(). To replace more complex calculations in
CustomScoreQuery, use the lucene-expressions module:

    SimpleBindings bindings = new SimpleBindings();
    bindings.add("score", DoubleValuesSource.SCORES);
    bindings.add("boost1", DoubleValuesSource.fromIntField("myboostfield"));
    bindings.add("boost2", DoubleValuesSource.fromIntField("myotherboostfield"));
    Expression expr = JavascriptCompiler.compile("score * (boost1 + ln(boost2))");
    FunctionScoreQuery q = new FunctionScoreQuery(inputQuery, expr.getDoubleValuesSource(bindings));
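For reference, the expression `score * (boost1 + ln(boost2))` corresponds to the per-document arithmetic sketched below, assuming `ln` denotes the natural logarithm. The class and method names are hypothetical and the code uses only the JDK, so it shows the calculation rather than the Lucene query itself.

```java
/** Illustrative sketch only; this is not a Lucene class. */
final class BoostExpressionSketch {
    /**
     * Evaluates score * (boost1 + ln(boost2)) for one document.
     * Math.log is the natural logarithm.
     */
    static double combinedScore(double score, double boost1, double boost2) {
        return score * (boost1 + Math.log(boost2));
    }
}
```

Note that this formula can go negative when `boost1 + ln(boost2)` is below zero, for example with small values of `boost2`.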
## Index options can no longer be changed dynamically (LUCENE-8134) ##
Changing index options on the fly now results in an
IllegalArgumentException. If a field is indexed
(FieldType.indexOptions() != IndexOptions.NONE), then all documents must use
the same index options for that field.
## IndexSearcher.createNormalizedWeight() removed (LUCENE-8242) ##
Instead use IndexSearcher.createWeight(), rewriting the query first, and using
a boost of 1f.
## Memory codecs removed (LUCENE-8267) ##
Memory codecs have been removed from the codebase (MemoryPostings, MemoryDocValues).
## QueryCachingPolicy.ALWAYS_CACHE removed (LUCENE-8144) ##
Caching everything is discouraged as it disables the ability to skip non-interesting documents.
ALWAYS_CACHE can be replaced by a UsageTrackingQueryCachingPolicy with an appropriate config.
## English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444) ##
To retain the old behaviour, pass EnglishAnalyzer.ENGLISH_STOP_WORDS_SET as an argument
to the constructor.
## StandardAnalyzer.ENGLISH_STOP_WORDS_SET has been moved ##
English stop words are now defined in EnglishAnalyzer#ENGLISH_STOP_WORDS_SET in the
analysis-common module.
## TopDocs.maxScore removed ##
TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector no longer have
an option to compute the maximum score when sorting by field. If you need to
know the maximum score for a query, the recommended approach is to run a
separate query:

    TopDocs topHits = searcher.search(query, 1);
    float maxScore = topHits.scoreDocs.length == 0 ? Float.NaN : topHits.scoreDocs[0].score;
Thanks to other optimizations that were added to Lucene 8, this query will be
able to efficiently select the top-scoring document without having to visit
all matches.
## TopFieldCollector always assumes fillFields=true ##
Because filling sort values doesn't have a significant overhead, the fillFields
option has been removed from TopFieldCollector factory methods. Everything
behaves as if it were set to true.
## TopFieldCollector no longer takes a trackDocScores option ##
Computing scores at collection time is less efficient than running a second
request in order to only compute scores for documents that made it to the top
hits. As a consequence, the trackDocScores option has been removed and can be
replaced with the new TopFieldCollector#populateScores helper method.
## IndexSearcher.search(After) may return lower bounds of the hit count and TopDocs.totalHits is no longer a long ##
Lucene 8 optimizes the collection of top-k matches by not visiting
all matches. However, these optimizations do not help if all matches still need
to be visited in order to compute the total number of hits. As a consequence,
IndexSearcher's search and searchAfter methods were changed to only count hits
accurately up to 1,000, and TopDocs.totalHits was changed from a long to an
object that says whether the hit count is accurate or a lower bound of the
actual hit count.
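Code that displays hit counts therefore has to handle both cases. The sketch below uses hypothetical stand-in types rather than Lucene's own classes (the enum and its constant names are assumptions made for illustration) to show one way of rendering an exact count versus a lower bound.

```java
/** Illustrative sketch only; these are stand-ins, not Lucene classes. */
final class HitCountSketch {
    /** Whether the reported value is exact or only a lower bound. */
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    /** Formats a hit count for display, marking lower bounds with a trailing '+'. */
    static String describe(long value, Relation relation) {
        return relation == Relation.EQUAL_TO ? Long.toString(value) : value + "+";
    }
}
```

With this approach a result page can show "42" when the count is exact and "1000+" when only a lower bound is known.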
## RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated ##
This RAM-based directory implementation is an old piece of code that uses inefficient
thread synchronization primitives and can be mistaken for being "faster" than the NIO-based
MMapDirectory. It is deprecated and scheduled for removal in future versions of
Lucene. (LUCENE-8467, LUCENE-8438)
## LeafCollector.setScorer() now takes a Scorable rather than a Scorer ##
Scorer has a number of methods that should never be called from Collectors, for example
those that advance the underlying iterators. To hide these, LeafCollector.setScorer()
now takes a Scorable, an abstract class that Scorers can extend, with methods
docID() and score(). (LUCENE-6228)
## Scorers must have non-null Weights ##
If a custom Scorer implementation does not have an associated Weight, it can probably
be replaced with a Scorable instead.
## Suggesters now return Long instead of long for weight() during indexing, and double instead of long at suggest time ##
Most code should just require recompilation, though possibly requiring some added casts.
## TokenStreamComponents is now final ##
Instead of overriding TokenStreamComponents#setReader() to customise analyzer
initialisation, you should now pass a Consumer<Reader> instance to the
TokenStreamComponents constructor.
## LowerCaseTokenizer and LowerCaseTokenizerFactory have been removed ##
LowerCaseTokenizer combined tokenization and filtering in a way that broke token
normalization, so it and its factory have been removed. Instead, use a LetterTokenizer
followed by a LowerCaseFilter.
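The separation of concerns behind this change can be sketched in plain Java: one step only splits text into runs of letters, and a second step only lower-cases the resulting tokens. The class and method names are hypothetical, and the code models the behaviour with JDK types rather than using LetterTokenizer or LowerCaseFilter themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; this is not a Lucene class. */
final class LetterThenLowerCaseSketch {
    /** Step 1 (tokenization only): split text into maximal runs of letters. */
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Step 2 (filtering only): lower-case each token, leaving boundaries untouched. */
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}
```

Keeping the two steps independent means the lower-casing step can be reused wherever normalization is needed without re-running tokenization.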
## CharTokenizer no longer takes a normalizer function ##
CharTokenizer now only performs tokenization. To perform any type of filtering
use a TokenFilter chain as you would with any other Tokenizer.
## Highlighter and FastVectorHighlighter no longer support ToParent/ToChildBlockJoinQuery ##
To support ToParent/ToChildBlockJoinQuery, Highlighter now needs a custom
WeightedSpanTermExtractor and FastVectorHighlighter needs a custom FieldQuery.