This README describes the approach to maintaining compatibility with indices from previous versions and gives guidelines for making format changes.
Codecs and file formats are versioned according to the minor version in which they were created. For example, Lucene87Codec represents the codec used for creating Lucene 8.7 indices, and potentially later index versions too. Each segment records the name of the codec that was used to write it.
Lucene supports the ability to read segments created in older versions by maintaining old codec classes. These older codecs live in the backwards-codecs package along with their file formats. When making a change to a file format, we create fresh copies of the codec and format, and move the existing ones into backwards-codecs.
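The name-based lookup described above can be pictured with a small, self-contained sketch. It does not use Lucene's API: `SketchCodec`, `CodecRegistry`, `SegmentRecord` and `CodecLookupSketch` are hypothetical names, and the codec name strings are illustrative values modeled on the examples in this README. It only shows why an old codec class has to stay available for as long as segments that name it can still be opened.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a codec; a real codec bundles many file formats.
final class SketchCodec {
    final String name;

    SketchCodec(String name) {
        this.name = name;
    }

    // Pretend to decode a segment payload; the result shows which codec handled it.
    String read(String payload) {
        return name + " read: " + payload;
    }
}

// Hypothetical registry mapping codec names to codec instances.
final class CodecRegistry {
    private final Map<String, SketchCodec> byName = new HashMap<>();

    void register(SketchCodec codec) {
        byName.put(codec.name, codec);
    }

    SketchCodec forName(String name) {
        SketchCodec codec = byName.get(name);
        if (codec == null) {
            // This is what dropping an old codec class would look like to a reader.
            throw new IllegalArgumentException("No codec registered for name '" + name + "'");
        }
        return codec;
    }
}

// Hypothetical segment metadata: the codec name is stored next to the data.
final class SegmentRecord {
    final String codecName;
    final String payload;

    SegmentRecord(String codecName, String payload) {
        this.codecName = codecName;
        this.payload = payload;
    }
}

final class CodecLookupSketch {
    // The current codec and one retained older codec are both registered.
    static CodecRegistry defaultRegistry() {
        CodecRegistry registry = new CodecRegistry();
        registry.register(new SketchCodec("Lucene87")); // illustrative: current codec
        registry.register(new SketchCodec("Lucene80")); // illustrative: retained old codec
        return registry;
    }

    // Reading resolves the codec from the name recorded in the segment.
    static String readSegment(CodecRegistry registry, SegmentRecord segment) {
        return registry.forName(segment.codecName).read(segment.payload);
    }

    public static void main(String[] args) {
        CodecRegistry registry = defaultRegistry();
        System.out.println(readSegment(registry, new SegmentRecord("Lucene80", "old segment")));
        System.out.println(readSegment(registry, new SegmentRecord("Lucene87", "new segment")));
    }
}
```

In this sketch, a segment whose recorded name has no registered codec fails with an `IllegalArgumentException`. That failure is what keeping old codecs and their formats in backwards-codecs is meant to prevent.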
Older codecs are tested in two ways:
As an example, let's say we're making a change to the norms file format, and the current class in core is Lucene80NormsFormat. We'd perform the following steps:
Each format class maintains an internal version which is written into the file header. Generally these internal versions should not be used to make format changes. For any significant change, we prefer to use the ‘copy-on-write’ approach described above, even if it produces a fair amount of duplicated code. This keeps the versioning strategy simple and clear, and ensures that we unit test all older index formats.
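For readers unfamiliar with internal versions, here is a minimal sketch of a version number written into a file header and checked against an accepted range on read. It is not Lucene's actual header layout or API: the class `HeaderVersionSketch`, the magic number, the version constants and the format name used below are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class HeaderVersionSketch {
    // All constants here are hypothetical values chosen for the sketch.
    static final int MAGIC = 0x0badf00d;
    static final int VERSION_START = 0;
    static final int VERSION_CURRENT = 1;

    // Write a header of the form: magic, format name, internal version.
    static byte[] writeHeader(String formatName, int version) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(MAGIC);
            out.writeUTF(formatName);
            out.writeInt(version);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Validate the header and return the version found, or throw if anything is off.
    static int checkHeader(byte[] header, String expectedName, int minVersion, int maxVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(header))) {
            if (in.readInt() != MAGIC) {
                throw new IllegalStateException("Header magic mismatch");
            }
            String name = in.readUTF();
            if (!name.equals(expectedName)) {
                throw new IllegalStateException("Expected format " + expectedName + " but found " + name);
            }
            int version = in.readInt();
            if (version < minVersion || version > maxVersion) {
                throw new IllegalStateException(
                    "Version " + version + " is outside the supported range [" + minVersion + ", " + maxVersion + "]");
            }
            return version;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] header = writeHeader("SketchNorms", VERSION_CURRENT);
        int version = checkHeader(header, "SketchNorms", VERSION_START, VERSION_CURRENT);
        System.out.println("Read header with internal version " + version);
    }
}
```

A reader that branches on the returned version has to carry every historical code path inside a single class. Creating a fresh copy of the format avoids this, which is one way to see why the copy-on-write approach is preferred for significant changes.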