| Lucene contrib change Log |
| |
| ======================= Release 2.9.0 2009-09-23 ======================= |
| |
| Changes in runtime behavior |
| |
| * LUCENE-1505: Local lucene now uses org.apache.lucene.util.NumericUtils for all |
| number conversion. You'll need to fully re-index any previously created indexes. |
| This isn't a break in back-compatibility because local Lucene has not yet |
| been released. (Mike McCandless) |
| |
| * LUCENE-1758: ArabicAnalyzer now uses the light10 algorithm, has a refined |
| default stopword list, and lowercases non-Arabic text. |
| You'll need to fully re-index any previously created indexes. This isn't a |
| break in back-compatibility because ArabicAnalyzer has not yet been |
| released. (Robert Muir) |
| |
| |
| API Changes |
| |
| * LUCENE-1695: Update the Highlighter to use the new TokenStream API. This issue breaks backwards |
| compatibility with some public classes. If you have implemented custom Fragmenters or Scorers, |
| you will need to adjust them to work with the new TokenStream API. Rather than getting passed a |
| Token at a time, you will be given a TokenStream to init your impl with - store the Attributes |
| you are interested in locally and access them on each call to the method that used to pass a new |
| Token. Look at the included updated impls for examples. (Mark Miller) |
| |
| * LUCENE-1460: Change contrib TokenStreams/Filters to use the new |
| TokenStream API. (Robert Muir, Michael Busch) |
| |
| * LUCENE-1775, LUCENE-1903: Change remaining TokenFilters (shingle, prefix-suffix) |
| to use the new TokenStream API. ShingleFilter is much more efficient now, |
| it clones much less often and computes the tokens mostly on the fly now. |
| Also added more tests. (Robert Muir, Michael Busch, Uwe Schindler, Chris Harris) |
| |
| * LUCENE-1685: The position aware SpanScorer has become the default scorer |
| for Highlighting. The SpanScorer implementation has replaced QueryScorer |
| and the old term highlighting QueryScorer has been renamed to |
| QueryTermScorer. Multi-term queries are also now expanded by default. If |
| you were previously rewriting the query for multi-term query highlighting, |
| you should no longer do that (unless you switch to using QueryTermScorer). |
| The SpanScorer API (now QueryScorer) has also been improved to more closely |
| match the API of the previous QueryScorer implementation. (Mark Miller) |
| |
| * LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian |
| Analyzers. If you need to index text in these encodings, please use Java's |
| character set conversion facilities (InputStreamReader, etc) during I/O, |
| so that Lucene can analyze this text as Unicode instead. (Robert Muir) |
| |
| Bug fixes |
| |
| * LUCENE-1423: InstantiatedTermEnum#skipTo(Term) throws ArrayIndexOutOfBounds on empty index. |
| (Karl Wettin) |
| |
| * LUCENE-1462: InstantiatedIndexWriter did not reset pre analyzed TokenStreams the |
| same way IndexWriter does. Parts of InstantiatedIndex was not Serializable. |
| (Karl Wettin) |
| |
| * LUCENE-1510: InstantiatedIndexReader#norms methods throws NullPointerException on empty index. |
| (Karl Wettin, Robert Newson) |
| |
| * LUCENE-1514: ShingleMatrixFilter#next(Token) easily throws a StackOverflowException |
| due to recursive invocation. (Karl Wettin) |
| |
| * LUCENE-1548: Fix distance normalization in LevenshteinDistance to |
| not produce negative distances (Thomas Morton via Mike McCandless) |
| |
| * LUCENE-1490: Fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS |
| characters to only apply to the correct subset (Daniel Cheng via |
| Mike McCandless) |
| |
| * LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after |
| StandardTokenizer so that stop words with mixed case are filtered |
| out. (Rafael Cunha de Almeida, Douglas Campos via Mike McCandless) |
| |
| * LUCENE-1491: EdgeNGramTokenFilter no longer stops on tokens shorter than minimum n-gram size. |
| (Todd Teak via Otis Gospodnetic) |
| |
| * LUCENE-1683: Fixed JavaUtilRegexCapabilities (an impl used by |
| RegexQuery) to use Matcher.matches() not Matcher.lookingAt() so |
| that the regexp must match the entire string, not just a prefix. |
| (Trejkaz via Mike McCandless) |
| |
| * LUCENE-1792: Fix new query parser to set rewrite method for |
| multi-term queries. (Luis Alves, Mike McCandless via Michael Busch) |
| |
| * LUCENE-1828: Fix memory index to call TokenStream.reset() and |
| TokenStream.end(). (Tim Smith via Michael Busch) |
| |
| * LUCENE-1912: Fix fast-vector-highlighter issue when two or more |
| terms are concatenated (Koji Sekiguchi via Mike McCandless) |
| |
| New features |
| |
| * LUCENE-1531: Added support for BoostingTermQuery to XML query parser. (Karl Wettin) |
| |
| * LUCENE-1435: Added contrib/collation, a CollationKeyFilter |
| allowing you to convert tokens into CollationKeys encoded using |
| IndexableBinaryStringTools. This allows for faster RangeQuery when |
| a field needs to use a custom Collator. (Steven Rowe via Mike |
| McCandless) |
| |
| * LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now |
| read/write bz2 using Apache commons compress library. This means |
| you can download the .bz2 export from http://wikipedia.org and |
| immediately index it. (Shai Erera via Mike McCandless) |
| |
| * LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers. It |
| improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese |
| sentences properly. SmartChineseAnalyzer uses a Hidden Markov |
| Model to tokenize Chinese words in a more intelligent way. |
| (Xiaoping Gao via Mike McCandless) |
| |
| * LUCENE-1676: Added DelimitedPayloadTokenFilter class for automatically adding payloads "in-stream" (Grant Ingersoll) |
| |
| * LUCENE-1578: Support for loading unoptimized readers to the |
| constructor of InstantiatedIndex. (Karl Wettin) |
| |
| * LUCENE-1704: Allow specifying the Tidy configuration file when |
| parsing HTML docs with contrib/ant. (Keith Sprochi via Mike |
| McCandless) |
| |
| * LUCENE-1522: Added contrib/fast-vector-highlighter, a new alternative |
| highlighter. (Koji Sekiguchi via Mike McCandless) |
| |
| * LUCENE-1740: Added "analyzer" command to Lucli, enabling changing |
| the analyzer from the default StandardAnalyzer. (Bernd Fondermann |
| via Mike McCandless) |
| |
| * LUCENE-1272: Add get/setBoost to MoreLikeThis. (Jonathan |
| Leibiusky via Mike McCandless) |
| |
| * LUCENE-1745: Added constructors to JakartaRegexpCapabilities and |
| JavaUtilRegexCapabilities as well as static flags to support |
| configuring a RegexCapabilities implementation with the |
| implementation-specific modifier flags. Allows for callers to |
| customize the RegexQuery using the implementation-specific options |
| and fine tune how regular expressions are compiled and |
| matched. (Marc Zampetti zampettim@aim.com via Mike McCandless) |
| |
| * LUCENE-1567: Added a new QueryParser framework, that allows |
| implementing a new query syntax in a flexible and efficient way. |
| This new QueryParser will be moved to Lucene's core in release |
| 3.0 and will then replace the current core QueryParser, which |
| has been deprecated with this patch. |
| (Luis Alves and Adriano Campos via Michael Busch) |
| |
| * LUCENE-1486: Added ComplexPhraseQueryParser, an extension of QueryParser |
| that allows a subset of the Lucene query language to be embedded in |
| PhraseQuerys. Wildcard, Range, and Fuzzy queries, as well as limited |
| boolean logic, can be used within quote operators with this parser, ie: |
| "(jo* -john) smyth~". (Mark Harwood via Mark Miller) |
| |
| * Added web-based demo of functionality in contrib's XML Query Parser |
| packaged as War file (Mark Harwood) |
| |
| * LUCENE-1406: Added Arabic analyzer. (Robert Muir via Grant Ingersoll) |
| |
| * LUCENE-1628: Added Persian analyzer. (Robert Muir) |
| |
| * LUCENE-1813: Add option to ReverseStringFilter to mark reversed tokens. |
| (Andrzej Bialecki via Robert Muir) |
| |
| Optimizations |
| |
| * LUCENE-1643: Re-use the collation key (RawCollationKey) for |
| better performance, in ICUCollationKeyFilter. (Robert Muir via |
| Mike McCandless) |
| |
| * LUCENE-1794: Implement TokenStream reuse for contrib Analyzers, |
| and implement reset() for TokenStreams to support reuse. (Robert Muir) |
| |
| Documentation |
| |
| * LUCENE-1876: added missing package level documentation for numerous |
| contrib packages. |
| (Steven Rowe & Robert Muir) |
| |
| Build |
| |
| * LUCENE-1728: Split contrib/analyzers into common and smartcn modules. |
| Contrib/analyzers now builds an additional lucene-smartcn Jar file. All |
| smartcn classes are not included in the lucene-analyzers JAR file. |
| (Robert Muir via Simon Willnauer) |
| |
| * LUCENE-1829: Fix contrib query parser to properly create javacc files. |
| (Jan-Pascal and Luis Alves via Michael Busch) |
| |
| Test Cases |
| |
| |
| ======================= Release 2.4.0 2008-10-06 ======================= |
| |
| Changes in runtime behavior |
| |
| (None) |
| |
| API Changes |
| |
| 1. |
| |
| (None) |
| |
| Bug fixes |
| |
| 1. LUCENE-1312: Added full support for InstantiatedIndexReader#getFieldNames() |
| and tests that assert that deleted documents behaves as they should (they did). |
| (Jason Rutherglen, Karl Wettin) |
| |
| 2. LUCENE-1318: InstantiatedIndexReader.norms(String, b[], int) didn't treat |
| the array offset right. (Jason Rutherglen via Karl Wettin) |
| |
| New features |
| |
| 1. LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token filter. (Karl Wettin) |
| |
| 2. LUCENE-1142: Updated Snowball package, org.tartarus distribution revision 500. |
| Introducing Hungarian, Turkish and Romanian support, updated older stemmers |
| and optimized (reflectionless) SnowballFilter. |
| IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created using the 2.3.2 (or older) |
| might not be compatible with these updated classes as some algorithms have changed. |
| (Karl Wettin) |
| |
| 3. LUCENE-1016: TermVectorAccessor, transparent vector space access via stored vectors |
| or by resolving the inverted index. (Karl Wettin) |
| |
| Documentation |
| |
| (None) |
| |
| Build |
| |
| (None) |
| |
| Test Cases |
| |
| (None) |