| Lucene Change Log |
| |
| $Id$ |
| |
| 1.5 RC1 |
| |
| 1. Fixed a performance bug in hit sorting code, where values were not |
| correctly cached. (Aviran via cutting) |
| |
| |
| 1.4 final |
| |
| 1. Added "an" to the list of stop words in StopAnalyzer, to complement |
| the existing "a" there. Fix for bug 28960 |
| (http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis) |
| |
| 2. Added new class FieldCache to manage in-memory caches of field term |
| values. (Tim Jones) |
| |
| 3. Added overloaded getFieldQuery method to QueryParser which |
| accepts the slop factor specified for the phrase (or the default |
| phrase slop for the QueryParser instance). This allows overriding |
| methods to replace a PhraseQuery with a SpanNearQuery instead, |
| keeping the proper slop factor. (Erik Hatcher) |
| |
| 4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to |
| UTF-8 and changed the build encoding to UTF-8, to make changed files |
| compile. (Otis Gospodnetic) |
| |
| 5. Removed synchronization from term lookup under IndexReader methods |
| termFreq(), termDocs() or termPositions() to improve |
| multi-threaded performance. (cutting) |
| |
| 6. Fix a bug where obsolete segment files were not deleted on Win32. |
| |
| |
| 1.4 RC3 |
| |
| 1. Fixed several search bugs introduced by the skipTo() changes in |
| release 1.4RC1. The index file format was changed a bit, so |
| collections must be re-indexed to take advantage of the skipTo() |
| optimizations. (Christoph Goller) |
| |
| 2. Added new Document methods, removeField() and removeFields(). |
| (Christoph Goller) |
| |
| 3. Fixed inconsistencies with index closing. Indexes and directories |
| are now only closed automatically by Lucene when Lucene opened |
| them automatically. (Christoph Goller) |
| |
| 4. Added new class: FilteredQuery. (Tim Jones) |
| |
| 5. Added a new SortField type for custom comparators. (Tim Jones) |
| |
| 6. Lock obtain timed out message now displays the full path to the lock |
| file. (Daniel Naber via Erik) |
| |
| 7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting) |
| |
| 8. Fixed so that FSDirectory's locks still work when the |
| java.io.tmpdir system property is null. (cutting) |
| |
| 9. Changed FilteredTermEnum's constructor to take no parameters, |
| as the parameters were ignored anyway (bug #28858) |
| |
| 1.4 RC2 |
| |
| 1. GermanAnalyzer now throws an exception if the stopword file |
| cannot be found (bug #27987). It now uses LowerCaseFilter |
| (bug #18410) (Daniel Naber via Otis, Erik) |
| |
| 2. Fixed a few bugs in the file format documentation. (cutting) |
| |
| |
| 1.4 RC1 |
| |
| 1. Changed the format of the .tis file, so that: |
| |
| - it has a format version number, which makes it easier to |
| back-compatibly change file formats in the future. |
| |
| - the term count is now stored as a long. This was the one aspect |
| of the Lucene's file formats which limited index size. |
| |
| - a few internal index parameters are now stored in the index, so |
| that they can (in theory) now be changed from index to index, |
| although there is not yet an API to do so. |
| |
| These changes are back compatible. The new code can read old |
| indexes. But old code will not be able read new indexes. (cutting) |
| |
| 2. Added an optimized implementation of TermDocs.skipTo(). A skip |
| table is now stored for each term in the .frq file. This only |
| adds a percent or two to overall index size, but can substantially |
| speedup many searches. (cutting) |
| |
| 3. Restructured the Scorer API and all Scorer implementations to take |
| advantage of an optimized TermDocs.skipTo() implementation. In |
| particular, PhraseQuerys and conjunctive BooleanQuerys are |
| faster when one clause has substantially fewer matches than the |
| others. (A conjunctive BooleanQuery is a BooleanQuery where all |
| clauses are required.) (cutting) |
| |
| 4. Added new class ParallelMultiSearcher. Combined with |
| RemoteSearchable this makes it easy to implement distributed |
| search systems. (Jean-Francois Halleux via cutting) |
| |
| 5. Added support for hit sorting. Results may now be sorted by any |
| indexed field. For details see the javadoc for |
| Searcher#search(Query, Sort). (Tim Jones via Cutting) |
| |
| 6. Changed FSDirectory to auto-create a full directory tree that it |
| needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis) |
| |
| 7. Added a new span-based query API. This implements, among other |
| things, nested phrases. See javadocs for details. (Doug Cutting) |
| |
| 8. Added new method Query.getSimilarity(Searcher), and changed |
| scorers to use it. This permits one to subclass a Query class so |
| that it can specify it's own Similarity implementation, perhaps |
| one that delegates through that of the Searcher. (Julien Nioche |
| via Cutting) |
| |
| 9. Added MultiReader, an IndexReader that combines multiple other |
| IndexReaders. (Cutting) |
| |
| 10. Added support for term vectors. See Field#isTermVectorStored(). |
| (Grant Ingersoll, Cutting & Dmitry) |
| |
| 11. Fixed the old bug with escaping of special characters in query |
| strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665 |
| (Jean-Francois Halleux via Otis) |
| |
| 12. Added support for overriding default values for the following, |
| using system properties: |
| - default commit lock timeout |
| - default maxFieldLength |
| - default maxMergeDocs |
| - default mergeFactor |
| - default minMergeDocs |
| - default write lock timeout |
| (Otis) |
| |
| 13. Changed QueryParser.jj to allow '-' and '+' within tokens: |
| http://issues.apache.org/bugzilla/show_bug.cgi?id=27491 |
| (Morus Walter via Otis) |
| |
| 14. Changed so that the compound index format is used by default. |
| This makes indexing a bit slower, but vastly reduces the chances |
| of file handle problems. (Cutting) |
| |
| |
| 1.3 final |
| |
| 1. Added catch of BooleanQuery$TooManyClauses in QueryParser to |
| throw ParseException instead. (Erik Hatcher) |
| |
| 2. Fixed a NullPointerException in Query.explain(). (Doug Cutting) |
| |
| 3. Added a new method IndexReader.setNorm(), that permits one to |
| alter the boosting of fields after an index is created. |
| |
| 4. Distinguish between the final position and length when indexing a |
| field. The length is now defined as the total number of tokens, |
| instead of the final position, as it was previously. Length is |
| used for score normalization (Similarity.lengthNorm()) and for |
| controlling memory usage (IndexWriter.maxFieldLength). In both of |
| these cases, the total number of tokens is a better value to use |
| than the final token position. Position is used in phrase |
| searching (see PhraseQuery and Token.setPositionIncrement()). |
| |
| 5. Fix StandardTokenizer's handling of CJK characters (Chinese, |
| Japanese and Korean ideograms). Previously contiguous sequences |
| were combined in a single token, which is not very useful. Now |
| each ideogram generates a separate token, which is more useful. |
| |
| |
| 1.3 RC3 |
| |
| 1. Added minMergeDocs in IndexWriter. This can be raised to speed |
| indexing without altering the number of files, but only using more |
| memory. (Julien Nioche via Otis) |
| |
| 2. Fix bug #24786, in query rewriting. (bschneeman via Cutting) |
| |
| 3. Fix bug #16952, in demo HTML parser, skip comments in |
| javascript. (Christoph Goller) |
| |
| 4. Fix bug #19253, in demo HTML parser, add whitespace as needed to |
| output (Daniel Naber via Christoph Goller) |
| |
| 5. Fix bug #24301, in demo HTML parser, long titles no longer |
| hang things. (Christoph Goller) |
| |
| 6. Fix bug #23534, Replace use of file timestamp of segments file |
| with an index version number stored in the segments file. This |
| resolves problems when running on file systems with low-resolution |
| timestamps, e.g., HFS under MacOS X. (Christoph Goller) |
| |
| 7. Fix QueryParser so that TokenMgrError is not thrown, only |
| ParseException. (Erik Hatcher) |
| |
| 8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller) |
| |
| 9. Fixed a problem compiling TestRussianStem. (Christoph Goller) |
| |
| 10. Cleaned up some build stuff. (Erik Hatcher) |
| |
| |
| 1.3 RC2 |
| |
| 1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and |
| SegmentsReader. (Julien Nioche via otis) |
| |
| 2. Changed file locking to place lock files in |
| System.getProperty("java.io.tmpdir"), where all users are |
| permitted to write files. This way folks can open and correctly |
| lock indexes which are read-only to them. |
| |
| 3. IndexWriter: added a new method, addDocument(Document, Analyzer), |
| permitting one to easily use different analyzers for different |
| documents in the same index. |
| |
| 4. Minor enhancements to FuzzyTermEnum. |
| (Christoph Goller via Otis) |
| |
| 5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher |
| and MultiIndexSearcher to use it. |
| (Christoph Goller via Otis) |
| |
| 6. Fixed a bug in IndexWriter that returned incorrect docCount(). |
| (Christoph Goller via Otis) |
| |
| 7. Fixed SegmentsReader to eliminate the confusing and slightly different |
| behaviour of TermEnum when dealing with an enumeration of all terms, |
| versus an enumeration starting from a specific term. |
| This patch also fixes incorrect term document frequences when the same term |
| is present in multiple segments. |
| (Christoph Goller via Otis) |
| |
| 8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher) |
| |
| 9. Added support for the new "compound file" index format (Dmitry |
| Serebrennikov) |
| |
| 10. Added Locale setting to QueryParser, for use by date range parsing. |
| |
| 11. Changed IndexReader so that it can be subclassed by classes |
| outside of its package. Previously it had package-private |
| abstract methods. Also modified the index merging code so that it |
| can work on an arbitrary IndexReader implementation, and added a |
| new method, IndexWriter.addIndexes(IndexReader[]), to take |
| advantage of this. (cutting) |
| |
| 12. Added a limit to the number of clauses which may be added to a |
| BooleanQuery. The default limit is 1024 clauses. This should |
| stop most OutOfMemoryExceptions by prefix, wildcard and fuzzy |
| queries which run amok. (cutting) |
| |
| 13. Add new method: IndexReader.undeleteAll(). This undeletes all |
| deleted documents which still remain in the index. (cutting) |
| |
| |
| 1.3 RC1 |
| |
| 1. Fixed PriorityQueue's clear() method. |
| Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454 |
| (Matthijs Bomhoff via otis) |
| |
| 2. Changed StandardTokenizer.jj grammar for EMAIL tokens. |
| Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015 |
| (Dale Anson via otis) |
| |
| 3. Added the ability to disable lock creation by using disableLuceneLocks |
| system property. This is useful for read-only media, such as CD-ROMs. |
| (otis) |
| |
| 4. Added id method to Hits to be able to access the index global id. |
| Required for sorting options. |
| (carlson) |
| |
| 5. Added support for new range query syntax to QueryParser.jj. |
| (briangoetz) |
| |
| 6. Added the ability to retrieve HTML documents' META tag values to |
| HTMLParser.jj. |
| (Mark Harwood via otis) |
| |
| 7. Modified QueryParser to make it possible to programmatically specify the |
| default Boolean operator (OR or AND). |
| (Péter Halácsy via otis) |
| |
| 8. Made many search methods and classes non-final, per requests. |
| This includes IndexWriter and IndexSearcher, among others. |
| (cutting) |
| |
| 9. Added class RemoteSearchable, providing support for remote |
| searching via RMI. The test class RemoteSearchableTest.java |
| provides an example of how this can be used. (cutting) |
| |
| 10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The |
| test class TestPhrasePrefixQuery provides the usage example. |
| (Anders Nielsen via otis) |
| |
| 11. Changed the German stemming algorithm to ignore case while |
| stripping. The new algorithm is faster and produces more equal |
| stems from nouns and verbs derived from the same word. |
| (gschwarz) |
| |
| 12. Added support for boosting the score of documents and fields via |
| the new methods Document.setBoost(float) and Field.setBoost(float). |
| |
| Note: This changes the encoding of an indexed value. Indexes |
| should be re-created from scratch in order for search scores to |
| be correct. With the new code and an old index, searches will |
| yield very large scores for shorter fields, and very small scores |
| for longer fields. Once the index is re-created, scores will be |
| as before. (cutting) |
| |
| 13. Added new method Token.setPositionIncrement(). |
| |
| This permits, for the purpose of phrase searching, placing |
| multiple terms in a single position. This is useful with |
| stemmers that produce multiple possible stems for a word. |
| |
| This also permits the introduction of gaps between terms, so that |
| terms which are adjacent in a token stream will not be matched by |
| and exact phrase query. This makes it possible, e.g., to build |
| an analyzer where phrases are not matched over stop words which |
| have been removed. |
| |
| Finally, repeating a token with an increment of zero can also be |
| used to boost scores of matches on that token. (cutting) |
| |
| 14. Added new Filter class, QueryFilter. This constrains search |
| results to only match those which also match a provided query. |
| Results are cached, so that searches after the first on the same |
| index using this filter are very fast. |
| |
| This could be used, for example, with a RangeQuery on a formatted |
| date field to implement date filtering. One could re-use a |
| single QueryFilter that matches, e.g., only documents modified |
| within the last week. The QueryFilter and RangeQuery would only |
| need to be reconstructed once per day. (cutting) |
| |
| 15. Added a new IndexWriter method, getAnalyzer(). This returns the |
| analyzer used when adding documents to this index. (cutting) |
| |
| 16. Fixed a bug with IndexReader.lastModified(). Before, document |
| deletion did not update this. Now it does. (cutting) |
| |
| 17. Added Russian Analyzer. |
| (Boris Okner via otis) |
| |
| 18. Added a public, extensible scoring API. For details, see the |
| javadoc for org.apache.lucene.search.Similarity. |
| |
| 19. Fixed return of Hits.id() from float to int. (Terry Steichen via Peter). |
| |
| 20. Added getFieldNames() to IndexReader and Segment(s)Reader classes. |
| (Peter Mularien via otis) |
| |
| 21. Added getFields(String) and getValues(String) methods. |
| Contributed by Rasik Pandey on 2002-10-09 |
| (Rasik Pandey via otis) |
| |
| 22. Revised internal search APIs. Changes include: |
| |
| a. Queries are no longer modified during a search. This makes |
| it possible, e.g., to reuse the same query instance with |
| multiple indexes from multiple threads. |
| |
| b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery, |
| etc.) now work correctly with MultiSearcher, fixing bugs 12619 |
| and 12667. |
| |
| c. Boosting BooleanQuery's now works, and is supported by the |
| query parser (problem reported by Lee Mallabone). Thus a query |
| like "(+foo +bar)^2 +baz" is now supported and equivalent to |
| "(+foo^2 +bar^2) +baz". |
| |
| d. New method: Query.rewrite(IndexReader). This permits a |
| query to re-write itself as an alternate, more primitive query. |
| Most of the term-expanding query classes (PrefixQuery, |
| WildcardQuery, etc.) are now implemented using this method. |
| |
| e. New method: Searchable.explain(Query q, int doc). This |
| returns an Explanation instance that describes how a particular |
| document is scored against a query. An explanation can be |
| displayed as either plain text, with the toString() method, or |
| as HTML, with the toHtml() method. Note that computing an |
| explanation is as expensive as executing the query over the |
| entire index. This is intended to be used in developing |
| Similarity implementations, and, for good performance, should |
| not be displayed with every hit. |
| |
| f. Scorer and Weight are public, not package protected. It now |
| possible for someone to write a Scorer implementation that is |
| not in the org.apache.lucene.search package. This is still |
| fairly advanced programming, and I don't expect anyone to do |
| this anytime soon, but at least now it is possible. |
| |
| g. Added public accessors to the primitive query classes |
| (TermQuery, PhraseQuery and BooleanQuery), permitting access to |
| their terms and clauses. |
| |
| Caution: These are extensive changes and they have not yet been |
| tested extensively. Bug reports are appreciated. |
| (cutting) |
| |
| 23. Added convenience RAMDirectory constructors taking File and String |
| arguments, for easy FSDirectory to RAMDirectory conversion. |
| (otis) |
| |
| 24. Added code for manual renaming of files in FSDirectory, since it |
| has been reported that java.io.File's renameTo(File) method sometimes |
| fails on Windows JVMs. |
| (Matt Tucker via otis) |
| |
| 25. Refactored QueryParser to make it easier for people to extend it. |
| Added the ability to automatically lower-case Wildcard terms in |
| the QueryParser. |
| (Tatu Saloranta via otis) |
| |
| |
| 1.2 RC6 |
| |
| 1. Changed QueryParser.jj to have "?" be a special character which |
| allowed it to be used as a wildcard term. Updated TestWildcard |
| unit test also. (Ralf Hettesheimer via carlson) |
| |
| 1.2 RC5 |
| |
| 1. Renamed build.properties to default.properties and updated |
| the BUILD.txt document to describe how to override the |
| default.property settings without having to edit the file. This |
| brings the build process closer to Scarab's build process. |
| (jon) |
| |
| 2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis) |
| |
| 3. Updated "powered by" links. (otis) |
| |
| 4. Fixed instruction for setting up JavaCC - Bug #7017 (otis) |
| |
| 5. Added throwing exception if FSDirectory could not create diectory |
| - Bug #6914 (Eugene Gluzberg via otis) |
| |
| 6. Update MultiSearcher, MultiFieldParse, Constants, DateFilter, |
| LowerCaseTokenizer javadoc (otis) |
| |
| 7. Added fix to avoid NullPointerException in results.jsp |
| (Mark Hayes via otis) |
| |
| 8. Changed Wildcard search to find 0 or more char instead of 1 or more |
| (Lee Mallobone, via otis) |
| |
| 9. Fixed error in offset issue in GermanStemFilter - Bug #7412 |
| (Rodrigo Reyes, via otis) |
| |
| 10. Added unit tests for wildcard search and DateFilter (otis) |
| |
| 11. Allow co-existence of indexed and non-indexed fields with the same name |
| (cutting/casper, via otis) |
| |
| 12. Add escape character to query parser. |
| (briangoetz) |
| |
| 13. Applied a patch that ensures that searches that use DateFilter |
| don't throw an exception when no matches are found. (David Smiley, via |
| otis) |
| |
| 14. Fixed bugs in DateFilter and wildcardquery unit tests. (cutting, otis, carlson) |
| |
| |
| 1.2 RC4 |
| |
| 1. Updated contributions section of website. |
| Add XML Document #3 implementation to Document Section. |
| Also added Term Highlighting to Misc Section. (carlson) |
| |
| 2. Fixed NullPointerException for phrase searches containing |
| unindexed terms, introduced in 1.2RC3. (cutting) |
| |
| 3. Changed document deletion code to obtain the index write lock, |
| enforcing the fact that document addition and deletion cannot be |
| performed concurrently. (cutting) |
| |
| 4. Various documentation cleanups. (otis, acoliver) |
| |
| 5. Updated "powered by" links. (cutting, jon) |
| |
| 6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis) |
| |
| 7. Changed Term and Query to implement Serializable. (scottganyo) |
| |
| 8. Fixed to never delete indexes added with IndexWriter.addIndexes(). |
| (cutting) |
| |
| 9. Upgraded to JUnit 3.7. (otis) |
| |
| 1.2 RC3 |
| |
| 1. IndexWriter: fixed a bug where adding an optimized index to an |
| empty index failed. This was encountered using addIndexes to copy |
| a RAMDirectory index to an FSDirectory. |
| |
| 2. RAMDirectory: fixed a bug where RAMInputStream could not read |
| across more than across a single buffer boundary. |
| |
| 3. Fix query parser so it accepts queries with unicode characters. |
| (briangoetz) |
| |
| 4. Fix query parser so that PrefixQuery is used in preference to |
| WildcardQuery when there's only an asterisk at the end of the |
| term. Previously PrefixQuery would never be used. |
| |
| 5. Fix tests so they compile; fix ant file so it compiles tests |
| properly. Added test cases for Analyzers and PriorityQueue. |
| |
| 6. Updated demos, added Getting Started documentation. (acoliver) |
| |
| 7. Added 'contributions' section to website & docs. (carlson) |
| |
| 8. Removed JavaCC from source distribution for copyright reasons. |
| Folks must now download this separately from metamata in order to |
| compile Lucene. (cutting) |
| |
| 9. Substantially improved the performance of DateFilter by adding the |
| ability to reuse TermDocs objects. (cutting) |
| |
| 10. Added IndexReader methods: |
| public static boolean indexExists(String directory); |
| public static boolean indexExists(File directory); |
| public static boolean indexExists(Directory directory); |
| public static boolean isLocked(Directory directory); |
| public static void unlock(Directory directory); |
| (cutting, otis) |
| |
| 11. Fixed bugs in GermanAnalyzer (gschwarz) |
| |
| |
| 1.2 RC2, 19 October 2001: |
| - added sources to distribution |
| - removed broken build scripts and libraries from distribution |
| - SegmentsReader: fixed potential race condition |
| - FSDirectory: fixed so that getDirectory(xxx,true) correctly |
| erases the directory contents, even when the directory |
| has already been accessed in this JVM. |
| - RangeQuery: Fix issue where an inclusive range query would |
| include the nearest term in the index above a non-existant |
| specified upper term. |
| - SegmentTermEnum: Fix NullPointerException in clone() method |
| when the Term is null. |
| - JDK 1.1 compatibility fix: disabled lock files for JDK 1.1, |
| since they rely on a feature added in JDK 1.2. |
| |
| 1.2 RC1 (first Apache release), 2 October 2001: |
| - packages renamed from com.lucene to org.apache.lucene |
| - license switched from LGPL to Apache |
| - ant-only build -- no more makefiles |
| - addition of lock files--now fully thread & process safe |
| - addition of German stemmer |
| - MultiSearcher now supports low-level search API |
| - added RangeQuery, for term-range searching |
| - Analyzers can choose tokenizer based on field name |
| - misc bug fixes. |
| |
| 1.01b (last Sourceforge release), 2 July 2001 |
| . a few bug fixes |
| . new Query Parser |
| . new prefix query (search for "foo*" matches "food") |
| |
| 1.0, 2000-10-04 |
| |
| This release fixes a few serious bugs and also includes some |
| performance optimizations, a stemmer, and a few other minor |
| enhancements. |
| |
| 0.04 2000-04-19 |
| |
| Lucene now includes a grammar-based tokenizer, StandardTokenizer. |
| |
| The only tokenizer included in the previous release (LetterTokenizer) |
| identified terms consisting entirely of alphabetic characters. The |
| new tokenizer uses a regular-expression grammar to identify more |
| complex classes of terms, including numbers, acronyms, email |
| addresses, etc. |
| |
| StandardTokenizer serves two purposes: |
| |
| 1. It is a much better, general purpose tokenizer for use by |
| applications as is. |
| |
| The easiest way for applications to start using |
| StandardTokenizer is to use StandardAnalyzer. |
| |
| 2. It provides a good example of grammar-based tokenization. |
| |
| If an application has special tokenization requirements, it can |
| implement a custom tokenizer by copying the directory containing |
| the new tokenizer into the application and modifying it |
| accordingly. |
| |
| 0.01, 2000-03-30 |
| |
| First open source release. |
| |
| The code has been re-organized into a new package and directory |
| structure for this release. It builds OK, but has not been tested |
| beyond that since the re-organization. |