| # PIP-389: Add Producer config compressMinMsgBodySize to improve compression performance |
| |
| # Background knowledge |
| Pulsar provides a way to compress messages before sending them to the broker [0]. This is done by setting `compressionType` in the producer configuration. |
| `compressionType` can be set to one of the following values: |
| - LZ4 |
| - ZLIB |
| - ZSTD |
| - SNAPPY |
| |
| However, `compressionType` applies to every message the producer sends, so even small messages are compressed. |
| |
| In our tests, we found that compressing small messages is not worthwhile: the compression ratio is low and the compression costs extra CPU. |
| The Zstandard README describes the same problem: |
| >The smaller the amount of data to compress, the more difficult it is to compress. This problem is common to all compression algorithms. [1] |
| |
| RocketMQ has a similar configuration, `compressMsgBodyOverHowmuch` [2]: |
| >/** |
| >* Compress message body threshold, namely, message body larger than 4k will be compressed on default. |
| >*/ |
| >private int compressMsgBodyOverHowmuch = 1024 * 4; |
| |
| [0] https://pulsar.apache.org/docs/4.0.x/concepts-messaging/#compression |
| [1] https://github.com/facebook/zstd?tab=readme-ov-file#the-case-for-small-data-compression |
| [2] https://github.com/apache/rocketmq/blob/dd62ed0f3b16919adec5d5eece21a1050dc9c5a0/client/src/main/java/org/apache/rocketmq/client/producer/DefaultMQProducer.java#L117 |
| |
| # Motivation |
| |
| This PIP aims to improve compression performance by skipping compression for small messages. |
| To do so, it adds a new configuration, `compressMinMsgBodySize`, to the producer configuration. |
| This configuration will allow the user to set the minimum size of the message body that will be compressed. |
| If the message body size is less than the `compressMinMsgBodySize`, the message will not be compressed. |
| |
| # Goals |
| |
| ## In Scope |
| |
| Add a new configuration `compressMinMsgBodySize` to the producer configuration. |
| |
| ## Out of Scope |
| |
| Solving the general problem of compressing small data efficiently. |
| |
| # High Level Design |
| |
| # Detailed Design |
| |
| ## Design & Implementation Details |
| |
| Add a new configuration `compressMinMsgBodySize` to the producer configuration. |
| This configuration will allow the user to set the minimum size of the message body that will be compressed. |
| If the message body size is less than the `compressMinMsgBodySize`, the message will not be compressed. |
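The threshold rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the producer's actual code: the class `CompressionThresholdSketch`, the `Payload` holder, and the `maybeCompress` method are hypothetical names, and the JDK's `java.util.zip.Deflater` stands in for whichever codec `compressionType` selects. The 4 KiB value in `main` is borrowed from the RocketMQ default quoted earlier; this PIP does not state a default.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical sketch of the threshold check; not Pulsar's producer code.
final class CompressionThresholdSketch {

    /** A message body plus a flag recording whether it was compressed. */
    static final class Payload {
        final byte[] data;
        final boolean compressed;

        Payload(byte[] data, boolean compressed) {
            this.data = data;
            this.compressed = compressed;
        }
    }

    /**
     * Compresses the body only when it is at least compressMinMsgBodySize
     * bytes long; smaller bodies are returned unchanged.
     */
    static Payload maybeCompress(byte[] body, int compressMinMsgBodySize) {
        if (body.length < compressMinMsgBodySize) {
            // Below the threshold: skip compression entirely.
            return new Payload(body, false);
        }
        // Assumption: Deflater (zlib) stands in for the configured codec.
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(body);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int written = deflater.deflate(buffer);
                out.write(buffer, 0, written);
            }
            return new Payload(out.toByteArray(), true);
        } finally {
            deflater.end(); // release native resources
        }
    }

    public static void main(String[] args) {
        int threshold = 4 * 1024; // illustrative value, not a stated default
        byte[] small = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] large = new byte[8 * 1024]; // highly compressible zeros

        Payload a = maybeCompress(small, threshold);
        Payload b = maybeCompress(large, threshold);
        System.out.println("small: compressed=" + a.compressed + " size=" + a.data.length);
        System.out.println("large: compressed=" + b.compressed + " size=" + b.data.length);
    }
}
```

Because the payload can now be either compressed or raw, a real implementation also has to record per message which case applies, so that consumers know whether to decompress. The sketch models this with the `compressed` flag only.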
| |
| ## Public-facing Changes |
| |
| Add a new configuration `compressMinMsgBodySize` to the producer configuration. |
| |
| |
| ### Public API |
| NA |
| ### Binary protocol |
| |
| ### Configuration |
| |
| ### CLI |
| |
| ### Metrics |
| |
| NA |
| |
| # Monitoring |
| |
| NA |
| |
| # Security Considerations |
| |
| NA |
| |
| # Backward & Forward Compatibility |
| |
| ## Upgrade |
| |
| This is a new feature, and it does not affect the existing configuration. |
| |
| ## Downgrade / Rollback |
| |
| After a downgrade, the `compressMinMsgBodySize` configuration is no longer available in the producer configuration. |
| If you set it, you need to remove it manually. |
| |
| ## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations |
| |
| <!-- |
| Describe what needs to be considered in Pulsar Geo-Replication in the upgrade and possible downgrade/rollback of this feature. |
| --> |
| |
| # Alternatives |
| |
| <!-- |
| If there are alternatives that were already considered by the authors or, after the discussion, by the community, and were rejected, please list them here along with the reason why they were rejected. |
| --> |
| |
| # General Notes |
| |
| # Links |
| |
| <!-- |
| Updated afterwards |
| --> |
| * Mailing List discussion thread: https://lists.apache.org/thread/vxvy7h61hg9wlgby6lcpkm9osdk9sx20 |
| * Mailing List voting thread: https://lists.apache.org/thread/xv7x3vmycxzsrhbdo7vmssh8lxxzyxd5 |