After update entries are saved to the log, it is the job of the update log to propagate updates across the cluster. On every node, the update log notifies the catalog manager about new update entries, and the catalog manager applies them and stores the new version of the catalog in a local cache. Below is a sequence diagram describing the apply phase:
The current implementation of the update log is based on the metastorage. Update entries of version N are stored under the catalog.update.{N} key, and the latest known version is stored under the catalog.version key. Updates are saved in a CAS (compare-and-set) manner with the condition newVersion == value(catalog.version).
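The conditional save can be illustrated with a minimal sketch. It is not the real metastorage API: the class CasAppendSketch, its in-memory map, and the method names are hypothetical. It also assumes that a new version must directly follow the stored one (the stored catalog.version equals newVersion - 1) and that a missing catalog.version counts as 0; that is one reading of the condition above, not a statement about the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the metastorage; not the real API. */
class CasAppendSketch {
    private final Map<String, String> store = new HashMap<>();

    /** Returns the value under catalog.version, or 0 if nothing was saved yet (assumption). */
    synchronized int latestVersion() {
        return Integer.parseInt(store.getOrDefault("catalog.version", "0"));
    }

    /**
     * Saves the entries of newVersion only if catalog.version still holds the
     * expected previous version. "synchronized" stands in for the atomicity
     * that a conditional metastorage update would provide.
     */
    synchronized boolean append(int newVersion, String entries) {
        if (latestVersion() != newVersion - 1) {
            return false; // condition failed: another writer got there first
        }
        store.put("catalog.update." + newVersion, entries);
        store.put("catalog.version", Integer.toString(newVersion));
        return true;
    }

    /** Returns the entries stored under catalog.update.{version}, or null. */
    synchronized String read(int version) {
        return store.get("catalog.update." + version);
    }

    public static void main(String[] args) {
        CasAppendSketch log = new CasAppendSketch();
        System.out.println(log.append(1, "CREATE TABLE t")); // true
        System.out.println(log.append(1, "CREATE TABLE u")); // false, version 1 is taken
        System.out.println(log.latestVersion());             // 1
    }
}
```

A writer whose append is rejected would typically re-read the latest version, rebuild its update on top of it, and retry.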
Over time, the log may grow very large. To address this, snapshotting was introduced to UpdateLog. When a snapshot of version N is saved, the update entries stored under catalog.update.{N} are overwritten with the catalog's snapshot of this version. All update entries of versions lower than the snapshot version are removed. The earliest available version of the catalog is tracked under the catalog.snapshot.version key.
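The effect of saving a snapshot on the stored keys can be sketched as follows. The class SnapshotSketch, its map-based storage, and its method names are hypothetical; the sketch also assumes that the earliest available version is 1 until the first snapshot is saved, and it ignores atomicity, which a real implementation would need.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the update log keys; not the real API. */
class SnapshotSketch {
    final Map<String, String> store = new HashMap<>();

    /** Stores update entries of a version and advances catalog.version. */
    void put(int version, String entries) {
        store.put("catalog.update." + version, entries);
        store.put("catalog.version", Integer.toString(version));
    }

    /** Earliest available version; assumed to be 1 before any snapshot exists. */
    int earliestVersion() {
        return Integer.parseInt(store.getOrDefault("catalog.snapshot.version", "1"));
    }

    /** Overwrites the entries of the given version with a snapshot and drops older versions. */
    void saveSnapshot(int version, String snapshot) {
        int oldEarliest = earliestVersion();
        store.put("catalog.update." + version, snapshot);
        store.put("catalog.snapshot.version", Integer.toString(version));
        for (int v = oldEarliest; v < version; v++) {
            store.remove("catalog.update." + v); // no longer needed for recovery
        }
    }

    public static void main(String[] args) {
        SnapshotSketch log = new SnapshotSketch();
        log.put(1, "CREATE TABLE t");
        log.put(2, "ADD COLUMN c");
        log.put(3, "CREATE INDEX i");
        log.saveSnapshot(2, "SNAPSHOT@2");
        System.out.println(log.store.containsKey("catalog.update.1")); // false
        System.out.println(log.store.get("catalog.update.2"));         // SNAPSHOT@2
        System.out.println(log.earliestVersion());                     // 2
    }
}
```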
During recovery, we read update entries one by one for all versions, starting with the earliest available version and ending with the version stored under the catalog.version key, and apply those update entries once again.
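The recovery loop can be sketched as below. RecoverySketch, the map standing in for the metastorage, and the applier callback are hypothetical names introduced for illustration. The sketch assumes that a missing catalog.snapshot.version means the earliest version is 1 and that a missing catalog.version means the log is empty.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical recovery routine over an in-memory key-value map; not the real API. */
class RecoverySketch {
    /** Replays every stored version in order and returns the latest recovered version. */
    static int recover(Map<String, String> store, Consumer<String> applier) {
        int earliest = Integer.parseInt(store.getOrDefault("catalog.snapshot.version", "1"));
        int latest = Integer.parseInt(store.getOrDefault("catalog.version", "0"));
        for (int v = earliest; v <= latest; v++) {
            String entries = store.get("catalog.update." + v);
            if (entries == null) {
                throw new IllegalStateException("Missing update entries for version " + v);
            }
            applier.accept(entries); // the first item may be a snapshot rather than a delta
        }
        return latest;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("catalog.snapshot.version", "2");
        store.put("catalog.version", "3");
        store.put("catalog.update.2", "SNAPSHOT@2");
        store.put("catalog.update.3", "CREATE INDEX i");

        List<String> applied = new ArrayList<>();
        int recovered = recover(store, applied::add);
        System.out.println(recovered); // 3
        System.out.println(applied);   // [SNAPSHOT@2, CREATE INDEX i]
    }
}
```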
Update log entries are serialized by custom marshallers (see UpdateLogMarshaller and MarshallableEntry for details). At the moment, backward compatibility is preserved by increasing the version of the protocol, but a more sophisticated approach may be introduced later.
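As a rough illustration of version-based compatibility, the sketch below prefixes each serialized entry with a protocol version and lets the reader branch on it. It does not reproduce the format of UpdateLogMarshaller or MarshallableEntry: the class VersionedMarshallerSketch, the two protocol versions, the entry fields, and the default schema name PUBLIC are all assumptions made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical versioned wire format; not the actual UpdateLogMarshaller format. */
class VersionedMarshallerSketch {
    static final int PROTOCOL_V1 = 1; // assumed layout: table name only
    static final int PROTOCOL_V2 = 2; // assumed layout: schema name, then table name

    /** Writes an entry in the old (V1) layout, as an older node would have. */
    static byte[] marshalV1(String table) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(PROTOCOL_V1);
            out.writeUTF(table);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Writes an entry in the current (V2) layout. */
    static byte[] marshalV2(String schema, String table) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(PROTOCOL_V2);
            out.writeUTF(schema);
            out.writeUTF(table);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads either layout by branching on the leading protocol version. */
    static String unmarshal(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readShort();
            switch (version) {
                case PROTOCOL_V1:
                    return "PUBLIC." + in.readUTF(); // field absent in V1: use a default
                case PROTOCOL_V2:
                    String schema = in.readUTF();
                    return schema + "." + in.readUTF();
                default:
                    throw new IOException("Unsupported protocol version: " + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unmarshal(marshalV1("t")));          // PUBLIC.t
        System.out.println(unmarshal(marshalV2("SALES", "t"))); // SALES.t
    }
}
```

The trade-off of this style is that every reader must keep a branch for each protocol version that may still be present in the log.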