PIP-389: Add Producer config compressMinMsgBodySize to improve compression performance

Background knowledge

Pulsar provides a way to compress messages before sending them to the broker[0]. This is done by setting compressionType in the producer configuration. compressionType can be set to one of the following values:

  • LZ4
  • ZLIB
  • ZSTD
  • SNAPPY

However, compressionType applies to every message sent by the producer, so even small messages are compressed.

In our tests, we found that compressing small messages is not worthwhile: the compression ratio is low and compression costs extra CPU. The zstd README describes the problem as follows:

The smaller the amount of data to compress, the more difficult it is to compress. This problem is common to all compression algorithms. [1]
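The effect can be reproduced with a minimal sketch that uses only the JDK's built-in ZLIB implementation (java.util.zip.Deflater). It is an illustration, not the Pulsar client's compression path; the class name SmallPayloadDemo and the sample payloads are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical demo class, not part of Pulsar.
final class SmallPayloadDemo {

    /** Returns how many bytes ZLIB (default level) emits for the given input. */
    static int zlibCompressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[256];
            int total = 0;
            // Drain the compressor until it reports that all output was produced.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    public static void main(String[] args) {
        byte[] small = "hello pulsar".getBytes(StandardCharsets.UTF_8);

        // A larger, repetitive payload built from the same text.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 400; i++) {
            sb.append("hello pulsar ");
        }
        byte[] large = sb.toString().getBytes(StandardCharsets.UTF_8);

        System.out.printf("small: %d -> %d bytes%n", small.length, zlibCompressedSize(small));
        System.out.printf("large: %d -> %d bytes%n", large.length, zlibCompressedSize(large));
    }
}
```

For the tiny payload, the ZLIB header and checksum alone outweigh any savings, so the output is larger than the input, while the repetitive payload shrinks substantially.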

RocketMQ has a similar configuration, compressMsgBodyOverHowmuch[2]:

/**
 * Compress message body threshold, namely, message body larger than 4k will be compressed on default.
 */
private int compressMsgBodyOverHowmuch = 1024 * 4;

[0] https://pulsar.apache.org/docs/4.0.x/concepts-messaging/#compression

[1] https://github.com/facebook/zstd?tab=readme-ov-file#the-case-for-small-data-compression

[2] https://github.com/apache/rocketmq/blob/dd62ed0f3b16919adec5d5eece21a1050dc9c5a0/client/src/main/java/org/apache/rocketmq/client/producer/DefaultMQProducer.java#L117

Motivation

The motivation of this PIP is to provide a way to improve the compression performance by skipping the compression of small messages. We want to add a new configuration compressMinMsgBodySize to the producer configuration. This configuration will allow the user to set the minimum size of the message body that will be compressed. If the message body size is less than the compressMinMsgBodySize, the message will not be compressed.

Goals

In Scope

Add a new configuration compressMinMsgBodySize to the producer configuration.

Out of Scope

Solve the compression problem of small data

High Level Design

Detailed Design

Design & Implementation Details

Add a new configuration compressMinMsgBodySize to the producer configuration. This configuration will allow the user to set the minimum size of the message body that will be compressed. If the message body size is less than the compressMinMsgBodySize, the message will not be compressed.
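The decision described above can be sketched as a single size check. This is a hedged illustration rather than the actual producer implementation: the class CompressionThreshold and the method shouldCompress are hypothetical names, and treating a body whose size equals the threshold as compressible is inferred from the "less than" wording of this proposal.

```java
// Hypothetical helper, not part of the Pulsar client API.
final class CompressionThreshold {

    private CompressionThreshold() {
    }

    /**
     * Decides whether a message body should be compressed.
     *
     * @param bodySize               size of the uncompressed message body in bytes
     * @param compressMinMsgBodySize minimum body size (bytes) at which compression applies
     * @return false when the body is smaller than the threshold, true otherwise
     */
    static boolean shouldCompress(int bodySize, int compressMinMsgBodySize) {
        if (compressMinMsgBodySize < 0) {
            throw new IllegalArgumentException("compressMinMsgBodySize must be >= 0");
        }
        // Bodies strictly smaller than the threshold are sent uncompressed.
        return bodySize >= compressMinMsgBodySize;
    }

    public static void main(String[] args) {
        int threshold = 4 * 1024; // illustrative value only, mirroring the RocketMQ default
        System.out.println(shouldCompress(100, threshold));  // small body: skip compression
        System.out.println(shouldCompress(8192, threshold)); // large body: compress
    }
}
```

With a threshold of 0 every message is compressed, which matches the current behavior when a compressionType is configured.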

Public-facing Changes

Add a new configuration compressMinMsgBodySize to the producer configuration.

Public API

NA

Binary protocol

Configuration

CLI

Metrics

NA

Monitoring

NA

Security Considerations

NA

Backward & Forward Compatibility

Upgrade

This is a new feature, and it does not affect the existing configuration.

Downgrade / Rollback

After a downgrade, the compressMinMsgBodySize configuration is no longer available. If you set it, you need to remove it from the producer configuration manually.

Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations

Alternatives

General Notes

Links