As we discussed along the discussion: https://lists.apache.org/thread/6jfs02ovt13mnhn441txqy5m6knw6rr8
Problem Statement: We‘ve encountered a critical issue in our Apache Pulsar clusters where brokers experience Out-Of-Memory (OOM) errors and continuous restarts under specific load patterns. This occurs when Netty channel write buffers become full, leading to a buildup of unacknowledged responses in the broker’s memory.
Background: Our clusters are configured with numerous namespaces, each containing approximately 8,000 to 10,000 topics. Our consumer applications are quite large, with each consumer using a regular expression (regex) pattern to subscribe to all topics within a namespace.
The problem manifests particularly during consumer application restarts. When a consumer restarts, it issues a getTopicsOfNamespace request. Due to the sheer number of topics, the response size is extremely large. This massive response overwhelms the socket output buffer, causing it to fill up rapidly. Consequently, the broker‘s responses get backlogged in memory, eventually leading to the broker’s OOM and subsequent restart loop.
Solution we got:
- Expose Netty channel configuration WRITE_BUFFER_WATER_MARK to pulsar conf
- Stops receive requests continuously once the Netty channel is unwritable, users can use the new config to control the threshold that limits the max bytes that are pending write.
# It relates to configuration "WriteBufferHighWaterMark" of Netty Channel Config. If the number of bytes queued in the write buffer exceeds this value, channel writable state will start to return "false". pulsarChannelWriteBufferHighWaterMark=64k # It relates to configuration "WriteBufferLowWaterMark" of Netty Channel Config. If the number of bytes queued in the write buffer is smaller than this value, channel writable state will start to return "true". pulsarChannelWriteBufferLowWaterMark=32k # Once the writer buffer is full, the channel stops dealing with new requests until it changes to writable pulsarChannelPauseReceivingRequestsIfUnwritable=false After the connection is recovered from an unreadable state, the channel will be rate-limited for a period of time to avoid overwhelming due to the backlog of requests. This parameter defines how many" requests should be allowed in the rate limiting period. pulsarChannelRateLimitingRateAfterResumeFromUnreadable=1000 After the connection is recovered from an unreadable state, the channel will be rate-limited for a period of time to avoid overwhelming due to the backlog of requests. This parameter defines how long the rate limiting should last, in seconds. Once the bytes that are waiting to be sent out reach the pulsarChannelWriteBufferHighWaterMark\", the timer will be reset. pulsarChannelRateLimitingSecondsAfterResumeFromUnreadable=5
With the settings pulsarChannelPauseReceivingRequestsIfUnwritable=false
, the behaviour is exactly the same as the previous.
After setting pulsarChannelPauseReceivingRequestsIfUnwritable
to true
, the channel state will be changed as follows.
channel.writable
to false
when there is too much data that is waiting to be sent out(the size of the data cached in ChannelOutboundBuffer
is larger than {pulsarChannelWriteBufferHighWaterMark}
)channelWritabilityChanged
autoRead
of the Netty channel
.channel.writable
to true
once the size of the data that is waiting to be sent out is less than {pulsarChannelWriteBufferLowWaterMark}
channel.autoRead
to true
).ServerCnxThrottleTracker
, which will track the “throttle count”. When a throttling condition is present, the throttle count is increased and when it‘s no more present, the count is decreased. The autoread should be switched to false when the counter value goes from 0 to 1, and only when it goes back from 1 to 0 should it be set to true again. The autoread flag is no longer controlled directly from the rate limiters. Rate limiters are only responsible for their part, and it’s ServerCnxThrottleTracker that decides when the autoread flag is toggled. See more details pip-322Name | Description | Attributes | Units |
---|---|---|---|
pulsar_server_channel_write_buf_memory_used_bytes | Counter. The memory amount that is occupied by netty write buffers | cluster | - |
Nothing.
Nothing.
Nothing.
Nothing.
Nothing.