| # Apache Lucene Migration Guide |
| |
| ## Similarity.SimScorer.computeXXXFactor methods removed (LUCENE-8014) ## |
| |
| SpanQuery and PhraseQuery now always calculate their slop factors as |
| (1.0 / (1.0 + distance)). Payload factor calculation is performed by |
| PayloadDecoder in the queries module. |
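| |
| As a rough illustration of the fixed formula above, the following standalone sketch |
| computes the slop factor for a few distances. The class and method names are |
| hypothetical and are not part of Lucene; an integer distance is assumed. |
| |
```java
public class SlopFactor {
    /**
     * Returns 1 / (1 + distance), the fixed slop factor described above.
     * Hypothetical helper for illustration only; not a Lucene API.
     */
    static float slopFactor(int distance) {
        if (distance < 0) {
            throw new IllegalArgumentException("distance must be >= 0, got " + distance);
        }
        return 1.0f / (1.0f + distance);
    }

    public static void main(String[] args) {
        // An exact match (distance 0) gets the full factor; larger distances get less.
        for (int distance = 0; distance <= 3; distance++) {
            System.out.println(distance + " -> " + slopFactor(distance));
        }
    }
}
```
| |
| If your custom Similarity previously overrode the slop computation, this is the |
| value that sloppy matches will now receive regardless of that override. |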
| |
| |
| ## Scorers must produce non-negative scores (LUCENE-7996) ## |
| |
| Scorers are no longer allowed to produce negative scores. If you have custom |
| query implementations, make sure that their score formulas can never produce |
| negative scores. |
| |
| As a side-effect of this change, negative boosts are now rejected and |
| FunctionScoreQuery maps negative values to 0. |
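| |
| One way to guard a custom score formula is to clamp its result before returning |
| it. The sketch below is a minimal standalone example under that assumption; the |
| class, the methods and the "relevance minus penalty" formula are all hypothetical |
| and do not reproduce Lucene's own implementation. |
| |
```java
public class NonNegativeScore {
    /** Hypothetical raw formula that goes negative when the penalty dominates. */
    static float rawScore(float relevance, float penalty) {
        return relevance - penalty;
    }

    /** Clamps the raw value so that the returned score is never negative or NaN. */
    static float safeScore(float relevance, float penalty) {
        float raw = rawScore(relevance, penalty);
        // The comparison is false for negatives, -0.0f and NaN, so all of them map to 0.
        return (raw > 0f) ? raw : 0f;
    }

    public static void main(String[] args) {
        System.out.println(safeScore(2.0f, 0.5f)); // formula stays positive
        System.out.println(safeScore(0.5f, 2.0f)); // formula would be negative
    }
}
```
| |
| Clamping hides the problem rather than fixing the formula, so it is best treated |
| as a safety net while the formula itself is revised. |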
| |
| |
| ## CustomScoreQuery, BoostedQuery and BoostingQuery removed (LUCENE-8099) ## |
| |
| Instead use FunctionScoreQuery and a DoubleValuesSource implementation. BoostedQuery |
| and BoostingQuery may be replaced by calls to FunctionScoreQuery.boostByValue() and |
| FunctionScoreQuery.boostByQuery(). To replace more complex calculations in |
| CustomScoreQuery, use the lucene-expressions module: |
| |
| SimpleBindings bindings = new SimpleBindings(); |
| bindings.add("score", DoubleValuesSource.SCORES); |
| bindings.add("boost1", DoubleValuesSource.fromIntField("myboostfield")); |
| bindings.add("boost2", DoubleValuesSource.fromIntField("myotherboostfield")); |
| Expression expr = JavascriptCompiler.compile("score * (boost1 + ln(boost2))"); |
| FunctionScoreQuery q = new FunctionScoreQuery(inputQuery, expr.getDoubleValuesSource(bindings)); |
| |
| ## Index options can no longer be changed dynamically (LUCENE-8134) ## |
| |
| Changing index options on the fly now results in an |
| IllegalArgumentException. If a field is indexed |
| (FieldType.indexOptions() != IndexOptions.NONE), then all documents must have |
| the same index options for that field. |
| |
| |
| ## IndexSearcher.createNormalizedWeight() removed (LUCENE-8242) ## |
| |
| Instead, rewrite the query first and then call IndexSearcher.createWeight() |
| with a boost of 1f. |
| |
| ## Memory codecs removed (LUCENE-8267) ## |
| |
| Memory codecs have been removed from the codebase (MemoryPostings, MemoryDocValues). |
| |
| ## QueryCachingPolicy.ALWAYS_CACHE removed (LUCENE-8144) ## |
| |
| Caching everything is discouraged because it disables the ability to skip uninteresting documents. |
| ALWAYS_CACHE can be replaced by a UsageTrackingQueryCachingPolicy with an appropriate configuration. |
| |
| ## English stopwords are no longer removed by default in StandardAnalyzer (LUCENE-7444) ## |
| |
| To retain the old behaviour, pass EnglishAnalyzer.ENGLISH_STOP_WORDS_SET as an argument |
| to the StandardAnalyzer constructor. |
| |
| ## StandardAnalyzer.ENGLISH_STOP_WORDS_SET has been moved ## |
| |
| English stop words are now defined in EnglishAnalyzer#ENGLISH_STOP_WORDS_SET in the |
| analysis-common module. |
| |
| ## TopDocs.maxScore removed ## |
| |
| TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector no longer have |
| an option to compute the maximum score when sorting by field. If you need to |
| know the maximum score for a query, the recommended approach is to run a |
| separate query: |
| |
| TopDocs topHits = searcher.search(query, 1); |
| float maxScore = topHits.scoreDocs.length == 0 ? Float.NaN : topHits.scoreDocs[0].score; |
| |
| Thanks to other optimizations that were added to Lucene 8, this query will be |
| able to efficiently select the top-scoring document without having to visit |
| all matches. |
| |
| ## TopFieldCollector always assumes fillFields=true ## |
| |
| Because filling sort values doesn't have a significant overhead, the fillFields |
| option has been removed from TopFieldCollector factory methods. Everything |
| behaves as if it was set to true. |
| |
| ## TopFieldCollector no longer takes a trackDocScores option ## |
| |
| Computing scores at collection time is less efficient than running a second |
| request in order to only compute scores for documents that made it to the top |
| hits. As a consequence, the trackDocScores option has been removed and can be |
| replaced with the new TopFieldCollector#populateScores helper method. |
| |
| ## IndexSearcher.search(After) may return lower bounds of the hit count and TopDocs.totalHits is no longer a long ## |
| |
| Lucene 8 received optimizations for the collection of top-k matches that avoid |
| visiting all matches. However, these optimizations won't help if all matches still need |
| to be visited in order to compute the total number of hits. As a consequence, |
| IndexSearcher's search and searchAfter methods were changed to only count hits |
| accurately up to 1,000, and TopDocs.totalHits was changed from a long to an |
| object that says whether the hit count is accurate or a lower bound of the |
| actual hit count. |
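| |
| The idea of a hit count that is either exact or a lower bound can be sketched as |
| follows. This is a standalone model, not Lucene code: the HitCountSketch, HitCount |
| and Relation types and their methods are hypothetical stand-ins, and the value |
| reported for a lower bound is an assumption made for the example. |
| |
```java
public class HitCountSketch {
    /** Says whether the reported count is exact or only a lower bound. */
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    /** Hypothetical stand-in for the object that replaces the plain long hit count. */
    static final class HitCount {
        final long value;
        final Relation relation;

        HitCount(long value, Relation relation) {
            this.value = value;
            this.relation = relation;
        }
    }

    /** Counts exactly up to the threshold; beyond it, reports the threshold as a lower bound. */
    static HitCount count(long actualMatches, long threshold) {
        if (actualMatches <= threshold) {
            return new HitCount(actualMatches, Relation.EQUAL_TO);
        }
        return new HitCount(threshold, Relation.GREATER_THAN_OR_EQUAL_TO);
    }

    /** Renders the count for display without claiming more precision than is available. */
    static String describe(HitCount hits) {
        return hits.relation == Relation.EQUAL_TO
            ? hits.value + " hits"
            : "at least " + hits.value + " hits";
    }

    public static void main(String[] args) {
        System.out.println(describe(count(42, 1000)));
        System.out.println(describe(count(5000, 1000)));
    }
}
```
| |
| Code that previously read totalHits as a plain number should check whether the |
| count is exact before displaying it or using it for pagination. |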
| |
| ## RAMDirectory, RAMFile, RAMInputStream, RAMOutputStream are deprecated ## |
| |
| This RAM-based directory implementation is an old piece of code that uses inefficient |
| thread synchronization primitives and can be mistaken for being "faster" than the NIO-based |
| MMapDirectory. It is deprecated and scheduled for removal in a future version of |
| Lucene. (LUCENE-8467, LUCENE-8438) |
| |
| ## LeafCollector.setScorer() now takes a Scorable rather than a Scorer ## |
| |
| Scorer has a number of methods that should never be called from Collectors, for example |
| those that advance the underlying iterators. To hide these, LeafCollector.setScorer() |
| now takes a Scorable, an abstract class that Scorers can extend, with methods |
| docID() and score() (LUCENE-6228). |
| |
| ## Scorers must have non-null Weights ## |
| |
| If a custom Scorer implementation does not have an associated Weight, it can probably |
| be replaced with a Scorable instead. |
| |
| ## Suggesters now return Long instead of long for weight() during indexing, and double instead of long at suggest time ## |
| |
| Most code should only require recompilation, though some added casts may be needed. |
| |
| ## TokenStreamComponents is now final ## |
| |
| Instead of overriding TokenStreamComponents#setReader() to customise analyzer |
| initialisation, you should now pass a Consumer<Reader> instance to the |
| TokenStreamComponents constructor. |
| |
| ## LowerCaseTokenizer and LowerCaseTokenizerFactory have been removed ## |
| |
| LowerCaseTokenizer combined tokenization and filtering in a way that broke token |
| normalization, so it and its factory have been removed. Instead, use a LetterTokenizer |
| followed by a LowerCaseFilter. |
| |
| ## CharTokenizer no longer takes a normalizer function ## |
| |
| CharTokenizer now only performs tokenization. To perform any type of filtering, |
| use a TokenFilter chain as you would with any other Tokenizer. |
| |
| ## Highlighter and FastVectorHighlighter no longer support ToParent/ToChildBlockJoinQuery ## |
| |
| Both Highlighter and FastVectorHighlighter need a custom WeightedSpanTermExtractor or FieldQuery, respectively, |
| in order to support ToParent/ToChildBlockJoinQuery. |