# PIP-389: Add Producer config compressMinMsgBodySize to improve compression performance
# Background knowledge
Pulsar provides a way to compress messages before sending them to the broker [0]. This is done by setting `compressionType` in the producer configuration.
`compressionType` can be set to one of the following values:
- LZ4
- ZLIB
- ZSTD
- SNAPPY
However, `compressionType` applies to every message sent by the producer, so even small messages are compressed.
In our tests, compressing small messages was not worthwhile: the compression ratio is low and the compression step costs extra CPU.
The Zstandard documentation describes the same effect:
>The smaller the amount of data to compress, the more difficult it is to compress. This problem is common to all compression algorithms. [1]
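
The effect can be reproduced with the JDK alone. The sketch below is not Pulsar code: it uses `java.util.zip.Deflater` (a ZLIB implementation) and made-up payloads, and the class and method names are hypothetical. It only illustrates that a tiny payload can come out larger than it went in, while a large repetitive payload shrinks considerably.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustrative only: compares ZLIB output size for a tiny and a large payload.
final class SmallPayloadCompressionDemo {

    /** Returns the number of bytes ZLIB (Deflater, default level) produces for the input. */
    static int zlibCompressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[256];
            int total = 0;
            // Drain the compressor; only the byte count matters here.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Builds a payload by repeating a string (avoids String.repeat for older JDKs). */
    static byte[] repeat(String unit, int times) {
        StringBuilder sb = new StringBuilder(unit.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] small = "id=42".getBytes(StandardCharsets.UTF_8);
        byte[] large = repeat("id=42;", 1000);
        System.out.printf("small: %d -> %d bytes%n", small.length, zlibCompressedSize(small));
        System.out.printf("large: %d -> %d bytes%n", large.length, zlibCompressedSize(large));
    }
}
```

For the small payload, the ZLIB header and checksum alone outweigh any savings, so the CPU spent on compression buys nothing.
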
RocketMQ has a similar configuration, `compressMsgBodyOverHowmuch` [2]:
>/**
>* Compress message body threshold, namely, message body larger than 4k will be compressed on default.
>*/
>private int compressMsgBodyOverHowmuch = 1024 * 4;
[0] https://pulsar.apache.org/docs/4.0.x/concepts-messaging/#compression
[1] https://github.com/facebook/zstd?tab=readme-ov-file#the-case-for-small-data-compression
[2] https://github.com/apache/rocketmq/blob/dd62ed0f3b16919adec5d5eece21a1050dc9c5a0/client/src/main/java/org/apache/rocketmq/client/producer/DefaultMQProducer.java#L117
# Motivation
This PIP aims to improve compression performance by skipping compression for small messages.
We want to add a new configuration `compressMinMsgBodySize` to the producer configuration.
This configuration will allow the user to set the minimum size of the message body that will be compressed.
If the message body size is less than the `compressMinMsgBodySize`, the message will not be compressed.
# Goals
## In Scope
Add a new configuration `compressMinMsgBodySize` to the producer configuration.
## Out of Scope
Solving the general problem of compressing small data.
# High Level Design
# Detailed Design
## Design & Implementation Details
Add a new configuration `compressMinMsgBodySize` to the producer configuration.
This configuration will allow the user to set the minimum size of the message body that will be compressed.
If the message body size is less than the `compressMinMsgBodySize`, the message will not be compressed.
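
The rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Pulsar client implementation: the class `CompressMinMsgBodySizeSketch`, the `Encoded` holder, and the `encode` method are hypothetical, ZLIB via `java.util.zip.Deflater` stands in for whichever `compressionType` the producer uses, and the `compressed` flag stands in for whatever per-message metadata tells the consumer whether the payload was compressed.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical sketch of the threshold check; names are not from the Pulsar code base.
final class CompressMinMsgBodySizeSketch {

    /** Result of encoding: the payload to send and whether it was compressed. */
    static final class Encoded {
        final boolean compressed;
        final byte[] payload;

        Encoded(boolean compressed, byte[] payload) {
            this.compressed = compressed;
            this.payload = payload;
        }
    }

    /**
     * Compresses the body only if it is at least compressMinMsgBodySize bytes;
     * smaller bodies are passed through unchanged.
     */
    static Encoded encode(byte[] body, int compressMinMsgBodySize) {
        if (body.length < compressMinMsgBodySize) {
            return new Encoded(false, body); // below threshold: skip compression
        }
        return new Encoded(true, zlib(body));
    }

    /** Compresses the input with ZLIB at the default level. */
    private static byte[] zlib(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }
}
```

In this sketch, a threshold of `0` keeps the current behavior of compressing every message, and a body whose size equals the threshold is still compressed, matching the "less than" wording above.
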
## Public-facing Changes
Add a new configuration `compressMinMsgBodySize` to the producer configuration.
### Public API
NA
### Binary protocol
### Configuration
### CLI
### Metrics
NA
# Monitoring
NA
# Security Considerations
NA
# Backward & Forward Compatibility
## Upgrade
This is a new feature, and it does not affect the existing configuration.
## Downgrade / Rollback
After a downgrade, the `compressMinMsgBodySize` configuration is no longer available in the producer configuration.
If you set it, you need to remove it manually.
## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations
<!--
Describe what needs to be considered in Pulsar Geo-Replication in the upgrade and possible downgrade/rollback of this feature.
-->
# Alternatives
<!--
If there are alternatives that were already considered by the authors or, after the discussion, by the community, and were rejected, please list them here along with the reason why they were rejected.
-->
# General Notes
# Links
<!--
Updated afterwards
-->
* Mailing List discussion thread: https://lists.apache.org/thread/vxvy7h61hg9wlgby6lcpkm9osdk9sx20
* Mailing List voting thread: https://lists.apache.org/thread/xv7x3vmycxzsrhbdo7vmssh8lxxzyxd5