Discussion thread: https://lists.apache.org/thread/j92bzsby9n2ozc9gcw5psgcy2026l1wm
Cursor data is stored in the metadata store (ZooKeeper or etcd). As the amount of cursor data grows, its size increases and reading it from the metadata store takes longer. Compressing the cursor data reduces both the stored size and the time needed to fetch it.
Support compressing the ManagedCursorInfo with LZ4, ZLIB, ZSTD, or SNAPPY.
[MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] + [MANAGED_CURSOR_INFO_PAYLOAD]
MAGIC_NUMBER: use 0x4778, the same magic number that is used for ledger info.
METADATA: add a message named ManagedCursorInfoMetadata to MLDataFormats.proto:
message ManagedCursorInfoMetadata {
    required CompressionType compressionType = 1;
    required int32 uncompressedSize = 2;
}
These compression types are already defined and implemented in Pulsar, so only the compression and decompression of the ManagedCursorInfo data need to be handled:
Get CursorInfo from the metadata store: check the cursor data header. If the data is compressed, parse the bytes using the compressed format; otherwise, parse the cursor data directly, as before.
Add/Update CursorInfo to the metadata store: if a compression type is specified, compress the data before writing it; otherwise, write the data to the metadata store directly.
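The read and write paths above can be sketched roughly as follows. This is a minimal illustration, not the actual Pulsar implementation. It makes several assumptions: the magic number is written as 2 bytes and METADATA_SIZE as a 4-byte big-endian int; the metadata is hand-encoded as two ints as a stand-in for the protobuf ManagedCursorInfoMetadata message; and only ZLIB (via java.util.zip) is modeled. The class name CursorInfoCodecSketch and the constant TYPE_ZLIB are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class CursorInfoCodecSketch {
    // Magic number from the proposal; the 2-byte width is an assumption.
    static final short MAGIC = 0x4778;
    // Hypothetical stand-in for a CompressionType value; only ZLIB is modeled.
    static final int TYPE_ZLIB = 1;
    // Stand-in metadata layout: compressionType (int) + uncompressedSize (int).
    static final int METADATA_SIZE = 8;

    /** Write path: magic + metadata size + metadata + compressed payload. */
    static byte[] encode(byte[] cursorInfo) {
        Deflater deflater = new Deflater();
        deflater.setInput(cursorInfo);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            compressed.write(chunk, 0, n);
        }
        deflater.end();

        byte[] payload = compressed.toByteArray();
        ByteBuffer out = ByteBuffer.allocate(2 + 4 + METADATA_SIZE + payload.length);
        out.putShort(MAGIC);
        out.putInt(METADATA_SIZE);
        out.putInt(TYPE_ZLIB);
        out.putInt(cursorInfo.length);
        out.put(payload);
        return out.array();
    }

    /** Read path: check the header; data without the magic number is returned as-is. */
    static byte[] decode(byte[] stored) {
        if (stored.length < 2) {
            return stored;
        }
        ByteBuffer in = ByteBuffer.wrap(stored);
        if (in.getShort() != MAGIC) {
            return stored; // legacy, uncompressed cursor data
        }
        int metadataSize = in.getInt();
        if (metadataSize != METADATA_SIZE) {
            throw new IllegalStateException("unexpected metadata size: " + metadataSize);
        }
        int type = in.getInt();
        int uncompressedSize = in.getInt();
        if (type != TYPE_ZLIB) {
            throw new IllegalStateException("unsupported compression type: " + type);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(stored, in.position(), in.remaining());
        byte[] result = new byte[uncompressedSize];
        int off = 0;
        try {
            while (off < result.length && !inflater.finished()) {
                int n = inflater.inflate(result, off, result.length - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // no progress possible: input is truncated
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed payload", e);
        } finally {
            inflater.end();
        }
        if (off != uncompressedSize) {
            throw new IllegalStateException("truncated compressed payload");
        }
        return result;
    }
}
```

Storing uncompressedSize in the metadata lets the reader allocate the output buffer once and detect truncated payloads. The magic-number check is what keeps cursor data written before compression was enabled readable.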
Add managedCursorInfoCompressionType to org.apache.pulsar.broker.ServiceConfiguration and org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig.
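As a rough illustration of how such a setting might be interpreted, the sketch below maps a configured string to a compression type. It is a minimal sketch under assumptions: the enum here is a local stand-in rather than Pulsar's own CompressionType, the NONE value and the treatment of a missing or blank setting as "no compression" are assumed, and the class and method names are hypothetical.

```java
import java.util.Locale;

final class CompressionTypeConfigSketch {
    // LZ4/ZLIB/ZSTD/SNAPPY come from the proposal; NONE is an assumed "disabled" value.
    enum CompressionType { NONE, LZ4, ZLIB, ZSTD, SNAPPY }

    /** Parses the configured value; null or blank is treated as NONE (assumption). */
    static CompressionType parse(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return CompressionType.NONE;
        }
        // valueOf throws IllegalArgumentException for unknown names.
        return CompressionType.valueOf(configured.trim().toUpperCase(Locale.ROOT));
    }

    /** True when the write path should compress the cursor info. */
    static boolean shouldCompress(String configured) {
        return parse(configured) != CompressionType.NONE;
    }
}
```

Rejecting unknown names at parse time surfaces a misconfigured value early instead of silently writing uncompressed data.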