lucene/MIGRATE.txt - lucene-solr - Git at Google

 # Apache Lucene Migration Guide

 ## Lucene 3.x index format no longer supported

 Lucene 5 no longer supports the Lucene 3.x index format. Opening
 indexes will result in `IndexFormatTooOldException`. It is recommended
 to either reindex all your data, or upgrade the old indexes with
 the `IndexUpgrader` tool of latest Lucene 4 version (4.10.x).
 Those indexes can then be read (see next section) with Lucene 5.

 ## Support for previous Lucene 4.x index formats moved to new module

 Lucene 5 will by default only read indexes created with Lucene 5.
 To read and upgrade Lucene 4.x indexes, you must add the
 `lucene-backward-codecs.jar` to the classpath. It is recommended
 to upgrade the old indexes with the `IndexUpgrader` tool,
 so you can remove the backward-codecs module from classpath.
 This will also improve performance.

 ## All file handling APIs changed to Java 7 NIO.2 (LUCENE-5945)

 All APIs around Directory and other file-based resources were changed to make
 use of the new Java 7 NIO.2 API. It is no longer possible to pass
 java.io.File onames to FSDirectory classes. FSDirectory classes now requires
 java.nio.file.Path instances. This allows to place index directories also
 on "virtual file systems" like ZIP or TAR files. To migrate existing code
 use java.io.File#toPath().

 In addition, make sure that custom directory implementations throw the new
 IOException types, because Lucene cannot understand the old legacy
 IOExceptions (like java.io.FileNotFoundException) instead of the new ones
 like java.nio.file.NoSuchFileException.

 ## Directory and LockFactory APIs restructured (LUCENE-5953)

 Locking is now under the responsibility of the Directory implementation.
 LockFactory is only used by subclasses of BaseDirectory to delegate locking
 to an impl class. LockFactories are responsible to create a Lock on behalf
 of a BaseDirectory subclass.

 The following changes in existing code need to be done:

 - LockFactory implementations are singletons now and have no state. They only
   need to implement one method: makeLock(Directory dir, String name).
   The passed directory can be used to determine the corect file system path
   for the lock file or similar, so it knows where to create the lock.
   In addition, the factory may check with instanceof, if the lock factory
   can be used with the type of directory at all.
 - It was never really supported to place lock files outside of the index
   directory and this functionality was removed. If you still rely on this,
   you can use the following trick: Use FileSwitchDirectory and delegate
   the file extension ".lock" to another Directory instance pointing to
   another path. FileSwitchDirectory also delegates lock files based on
   the extension.
 - If you wrap another directory using FilterDirectory, you cannot make use
   of LockFactories anymore, because only BaseDirectory knows about them.
   To wrap locking, you must hook into FilterDirectory.makeLock(String name)
   and wrap the Lock instance returned, as needed. See MockDirectoryWrapper
   in lucene-test-framework for an example.
 - It is no longer allowed to pass "null" as LockFactory to FSDirectory
   implementations. You have to explicitely pass the platform default to the
   directory (currently always NativeFSLockFactory.INSTANCE, but subject to
   change!). To get the platform default, call FSLockFactory.getDefault().

 ## Removed Reader from Tokenizer constructor (LUCENE-5388)

 The constructor of Tokenizer no longer takes Reader, as this was a leftover
 from before it was reusable. See the org.apache.lucene.analysis package
 documentation for more details.

 ## Refactored Collector API (LUCENE-5299)

 The Collector API has been refactored to use a different Collector instance
 per segment. It is possible to migrate existing collectors painlessly by
 extending SimpleCollector instead of Collector: SimpleCollector is a
 specialization of Collector that returns itself as a per-segment Collector.

 ## Refactored FieldComparator API (LUCENE-5702)

 Like collectors (see above), field comparators have been refactored to
 produce a new comparator (called LeafFieldComparator) per segment. It is
 possible to migrate existing comparators painlessly by extending
 SimpleFieldComparator, which will implements both FieldComparator and
 LeafFieldComparator and return itself as a per-segment comparator.

 ## Removed ChainedFilter (LUCENE-5984)

 Users are advised to switch to BooleanFilter instead.

 ## Removed OpenBitSet (LUCENE-6010)

 OpenBitSet only differs from LongBitSet by its ability to grow automatically.
 In case growth is required, it would need to be managed externally.

 ## FunctionValues.exist() Behavior Changes due to ValueSource bug fixes (LUCENE-5961)

 Bugs fixed in several ValueSource functions may result in different behavior in
 situations where some documents do not have values for fields wrapped in other
 ValueSources.  Users who want to preserve the previous behavior may need to wrap
 their ValueSources in a "DefFunction" along with a ConstValueSource of "0.0".

 ## PayloadAttributeImpl.clone() (LUCENE-6055)

 PayloadAttributeImpl.clone() did a shallow clone which was incorrect, and was
 fixed to do a deep clone. If you require shallow cloning of the underlying bytes,
 you should override PayloadAttributeImpl.clone() to do a shallow clone instead.

 ## Removed out-of-order scoring (LUCENE-6179)

 Bulk scorers must now always collect documents in order. If you have custom
 collectors, the acceptsDocsOutOfOrder method has been removed and collectors
 can safely assume that they will be collected in order.

 ## Renamed "Atomic" to "Leaf" for segment readers (LUCENE-5569)

 AtomicReader and AtomicReaderContext are now called LeafReader and LeafReaderContext, respectively.

 ## Removed custom Analyzer per-document indexing APIs from IndexWriter (LUCENE-6212)

 These methods were removed because they are dangerous since they let
 you analyze each document arbitrarily differently, making it difficult
 to properly analyze text at query time and easy to accidentally "lose"
 search hits.  Instead, you should break out text into separate fields
 and use a different analyzer for each field with
 PerFieldAnalyzerWrapper.
	# Apache Lucene Migration Guide

	## Lucene 3.x index format no longer supported

	Lucene 5 no longer supports the Lucene 3.x index format. Opening
	indexes will result in `IndexFormatTooOldException`. It is recommended
	to either reindex all your data, or upgrade the old indexes with
	the `IndexUpgrader` tool of latest Lucene 4 version (4.10.x).
	Those indexes can then be read (see next section) with Lucene 5.

	## Support for previous Lucene 4.x index formats moved to new module

	Lucene 5 will by default only read indexes created with Lucene 5.
	To read and upgrade Lucene 4.x indexes, you must add the
	`lucene-backward-codecs.jar` to the classpath. It is recommended
	to upgrade the old indexes with the `IndexUpgrader` tool,
	so you can remove the backward-codecs module from classpath.
	This will also improve performance.

	## All file handling APIs changed to Java 7 NIO.2 (LUCENE-5945)

	All APIs around Directory and other file-based resources were changed to make
	use of the new Java 7 NIO.2 API. It is no longer possible to pass
	java.io.File onames to FSDirectory classes. FSDirectory classes now requires
	java.nio.file.Path instances. This allows to place index directories also
	on "virtual file systems" like ZIP or TAR files. To migrate existing code
	use java.io.File#toPath().

	In addition, make sure that custom directory implementations throw the new
	IOException types, because Lucene cannot understand the old legacy
	IOExceptions (like java.io.FileNotFoundException) instead of the new ones
	like java.nio.file.NoSuchFileException.

	## Directory and LockFactory APIs restructured (LUCENE-5953)

	Locking is now under the responsibility of the Directory implementation.
	LockFactory is only used by subclasses of BaseDirectory to delegate locking
	to an impl class. LockFactories are responsible to create a Lock on behalf
	of a BaseDirectory subclass.

	The following changes in existing code need to be done:

	- LockFactory implementations are singletons now and have no state. They only
	need to implement one method: makeLock(Directory dir, String name).
	The passed directory can be used to determine the corect file system path
	for the lock file or similar, so it knows where to create the lock.
	In addition, the factory may check with instanceof, if the lock factory
	can be used with the type of directory at all.
	- It was never really supported to place lock files outside of the index
	directory and this functionality was removed. If you still rely on this,
	you can use the following trick: Use FileSwitchDirectory and delegate
	the file extension ".lock" to another Directory instance pointing to
	another path. FileSwitchDirectory also delegates lock files based on
	the extension.
	- If you wrap another directory using FilterDirectory, you cannot make use
	of LockFactories anymore, because only BaseDirectory knows about them.
	To wrap locking, you must hook into FilterDirectory.makeLock(String name)
	and wrap the Lock instance returned, as needed. See MockDirectoryWrapper
	in lucene-test-framework for an example.
	- It is no longer allowed to pass "null" as LockFactory to FSDirectory
	implementations. You have to explicitely pass the platform default to the
	directory (currently always NativeFSLockFactory.INSTANCE, but subject to
	change!). To get the platform default, call FSLockFactory.getDefault().

	## Removed Reader from Tokenizer constructor (LUCENE-5388)

	The constructor of Tokenizer no longer takes Reader, as this was a leftover
	from before it was reusable. See the org.apache.lucene.analysis package
	documentation for more details.

	## Refactored Collector API (LUCENE-5299)

	The Collector API has been refactored to use a different Collector instance
	per segment. It is possible to migrate existing collectors painlessly by
	extending SimpleCollector instead of Collector: SimpleCollector is a
	specialization of Collector that returns itself as a per-segment Collector.

	## Refactored FieldComparator API (LUCENE-5702)

	Like collectors (see above), field comparators have been refactored to
	produce a new comparator (called LeafFieldComparator) per segment. It is
	possible to migrate existing comparators painlessly by extending
	SimpleFieldComparator, which will implements both FieldComparator and
	LeafFieldComparator and return itself as a per-segment comparator.

	## Removed ChainedFilter (LUCENE-5984)

	Users are advised to switch to BooleanFilter instead.

	## Removed OpenBitSet (LUCENE-6010)

	OpenBitSet only differs from LongBitSet by its ability to grow automatically.
	In case growth is required, it would need to be managed externally.

	## FunctionValues.exist() Behavior Changes due to ValueSource bug fixes (LUCENE-5961)

	Bugs fixed in several ValueSource functions may result in different behavior in
	situations where some documents do not have values for fields wrapped in other
	ValueSources. Users who want to preserve the previous behavior may need to wrap
	their ValueSources in a "DefFunction" along with a ConstValueSource of "0.0".

	## PayloadAttributeImpl.clone() (LUCENE-6055)

	PayloadAttributeImpl.clone() did a shallow clone which was incorrect, and was
	fixed to do a deep clone. If you require shallow cloning of the underlying bytes,
	you should override PayloadAttributeImpl.clone() to do a shallow clone instead.

	## Removed out-of-order scoring (LUCENE-6179)

	Bulk scorers must now always collect documents in order. If you have custom
	collectors, the acceptsDocsOutOfOrder method has been removed and collectors
	can safely assume that they will be collected in order.

	## Renamed "Atomic" to "Leaf" for segment readers (LUCENE-5569)

	AtomicReader and AtomicReaderContext are now called LeafReader and LeafReaderContext, respectively.

	## Removed custom Analyzer per-document indexing APIs from IndexWriter (LUCENE-6212)

	These methods were removed because they are dangerous since they let
	you analyze each document arbitrarily differently, making it difficult
	to properly analyze text at query time and easy to accidentally "lose"
	search hits. Instead, you should break out text into separate fields
	and use a different analyzer for each field with
	PerFieldAnalyzerWrapper.