| # PIP-443: Stop using Netty Recycler in new code |
| |
| # Background knowledge |
| |
Netty [`Recycler`](https://github.com/netty/netty/blob/netty-4.1.127.Final/common/src/main/java/io/netty/util/Recycler.java) is widely used in many places to reduce GC pressure by reusing objects. From the output of the following command:
| |
| ```bash |
| find . -name "*.java" | grep src/main | xargs grep -n "new .*Recycler" |
| ``` |
| |
| You can find: |
| - 5 usages in `pulsar-common` |
| - 10 usages in `pulsar-broker` |
| - 10 usages in `pulsar-client` |
| |
However, many of them are merely the result of cargo cult programming, which blindly assumes that allocating objects from the `Recycler` is better than allocating them from heap memory. Hence, we should first understand how it works.
| |
`Recycler` is basically a thread-local object pool that provides
- a `get` method that returns an object from the pool, or calls the user-provided `newObject` method to create a new one if the pool is empty.
- a `recycle` method that returns an object to the pool; if the pool is full, the object is simply dropped.
| |
Here is a typical implementation that wraps a `Recycler` to manage a pair of objects:
| |
| ```java |
import io.netty.util.Recycler;
import lombok.Data;

@Data // generates getters and setters for all fields
public class RecyclablePair {

    private final Recycler.Handle<RecyclablePair> handle;
    private Object x;
    private Object y;

    private RecyclablePair(Recycler.Handle<RecyclablePair> handle) {
        this.handle = handle;
    }

    public void recycle() {
        // Reset the fields before returning the object to the pool
        this.x = null;
        this.y = null;
        handle.recycle(this);
    }

    public static RecyclablePair create(Object x, Object y) {
        // Take an object from the pool (or a newly created one) and initialize it
        final RecyclablePair pair = RECYCLER.get();
        pair.x = x;
        pair.y = y;
        return pair;
    }

    private static final Recycler<RecyclablePair> RECYCLER = new Recycler<RecyclablePair>() {
        @Override
        protected RecyclablePair newObject(Handle<RecyclablePair> handle) {
            return new RecyclablePair(handle);
        }
    };
}
| ``` |
| |
| It's typically used with a "recycle-after-use" pattern like: |
| |
| ```java |
var pair = RecyclablePair.create(a, b);
process(pair);
pair.recycle();
| ``` |
| |
| or |
| |
| ```java |
var pair = RecyclablePair.create(a, b);
executor.execute(() -> {
    process(pair);
    pair.recycle();
});
| ``` |
| |
Unlike a simple thread-local pool, recycling from a different thread is supported: in that case, `recycle()` pushes the object (`pair` in the code above, via its handle) into a queue that implements [MessagePassingQueue](https://github.com/JCTools/JCTools/blob/v4.0.5/jctools-core/src/main/java/org/jctools/queues/MessagePassingQueue.java).
| |
The following snippet shows the key fields of the thread-local pool used by `Recycler`:
| |
| ```java |
| private static final class LocalPool<T> implements MessagePassingQueue.Consumer<DefaultHandle<T>> { |
| private final ArrayDeque<DefaultHandle<T>> batch; |
| private volatile Thread owner; |
| private volatile MessagePassingQueue<DefaultHandle<T>> pooledHandles; |
| ``` |
| |
When `get` is called, it first gets the thread-local `LocalPool` and applies the following logic:
| 1. Drain some handles from `pooledHandles` to `batch` if `batch` is empty. |
| 2. Poll a handle from `batch` and mark it as "claimed" (used by the current thread). |
3. If a handle is available, return the object it holds. In the example above, `RecyclablePair#create` calls `RECYCLER.get()` to obtain a pooled object and then initializes its fields.
4. Otherwise, call `newObject` to create a new object on the heap and associate it with a new handle. With the default configuration, in 7 out of 8 cases the handle is a no-op handle that will not push the object into `pooledHandles` when `recycle` is called. In the remaining 1 out of 8 cases, the handle is a `DefaultHandle` associated with this thread's `LocalPool` and can be pushed to `pooledHandles` and `batch`.
| |
When `recycle` is called and the handle is a `DefaultHandle`:
1. **Good Case**: if the object was allocated by `get()` in a `FastThreadLocalThread` and `recycle()` is called in the same thread, the handle is pushed to `batch` directly.
2. **Bad Case**: otherwise, the handle is pushed to the `pooledHandles` queue, which is less efficient (both paths are sketched below).
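
A minimal sketch of the two paths, assuming the `RecyclablePair` class above (`process`, `"a"` and `"b"` are placeholders; the exact pooling behavior depends on the `Recycler` configuration):

```java
// Requires io.netty.util.concurrent.FastThreadLocalThread from netty-common.

// Good case: allocate and recycle on the same FastThreadLocalThread, so the
// handle goes straight back to the thread-local batch.
Thread goodThread = new FastThreadLocalThread(() -> {
    RecyclablePair pair = RecyclablePair.create("a", "b");
    process(pair);
    pair.recycle();
});
goodThread.start();

// Bad case: allocate on one thread and recycle on another, so the handle has to
// travel through the MPSC pooledHandles queue.
RecyclablePair crossThread = RecyclablePair.create("a", "b");
new Thread(crossThread::recycle).start();
```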
| |
| # Motivation |
| |
There are three main issues with using `Recycler` in Pulsar.
| |
| ## Issue 1: performance overhead |
| |
As explained in the previous section, if a recyclable object like `RecyclablePair` is not allocated in a `FastThreadLocalThread`, recycling is inefficient. Here are benchmark results from a GitHub Actions Ubuntu 24.04 runner:
| |
| ``` |
| recycler.RecyclerBenchmark.testRecord thrpt 25 468034078.305 ± 1453202.080 ops/s |
| recycler.RecyclerBenchmark.testThreadLocalSingletonRecycler thrpt 25 137434124.938 ± 173674.317 ops/s |
| recycler.RecyclerBenchmark.testThreadLocalQueueRecycler thrpt 25 70585776.893 ± 418944.351 ops/s |
| recycler.RecyclerBenchmark.testRecycler thrpt 25 36061222.443 ± 368182.450 ops/s |
| ``` |
| |
| - `testRecord` allocates a simple tuple `record` object from heap memory |
- `testThreadLocalSingletonRecycler` reuses a single tuple object cached in a thread-local variable
- `testThreadLocalQueueRecycler` takes a recyclable tuple object from a thread-local queue (`ArrayDeque`) and returns it to the queue afterwards (this simulates the **Good Case** where the object is allocated in a `FastThreadLocalThread` and recycled in the same thread)
- `testRecycler` allocates a recyclable tuple object using `Recycler` and then calls `recycle`
| |
| The full benchmark code is [here](https://github.com/BewareMyPower/JavaBenchmark/blob/v0.1.0/src/main/java/io/bewaremypower/recycler/RecyclerBenchmark.java). |
| |
From the results above, allocation via the pool is roughly 7x slower than plain heap allocation in the good case and roughly 13x slower in the bad case.
| |
| ## Issue 2: error-prone usage |
| |
Compared to the performance overhead, the error-prone usage is a more serious issue. Users have to call `recycle()` explicitly for each object, much like the error-prone `malloc`/`free` pattern in C.
| |
| ```java |
var pair = RecyclablePair.create(a, b);
process(pair);
pair.recycle();
| ``` |
| |
For example, if `process` in the code above throws an exception, `recycle()` is skipped. [#13233](https://github.com/apache/pulsar/pull/13233) is a real-world fix for such an issue. This particular issue is not serious: the worst case is that the object never returns to the pool, so later `get()` calls fall back to heap allocation.
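
One way to harden such call sites (a general defensive pattern, not necessarily the exact fix applied in #13233) is to recycle in a `finally` block:

```java
RecyclablePair pair = RecyclablePair.create(a, b);
try {
    process(pair);
} finally {
    // Runs even if process() throws, so the pooled object is not leaked
    pair.recycle();
}
```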
| |
A more serious issue arises when the object is shared among different threads and `recycle()` is called twice: the second call throws an exception. [#24697](https://github.com/apache/pulsar/issues/24697) suffers from this issue and had not been fixed when this proposal was written. Since the exception is thrown by `recycle()` on the read path, it affects the client side: the consumer cannot receive messages anymore. [#24725](https://github.com/apache/pulsar/pull/24725) fixes an issue where a recyclable object is unexpectedly shared between two threads; if the object had not been recyclable, the worst case would only have been that the same batched message was acknowledged twice.
| |
| The most serious issue is that if the object is shared by two threads and one thread has recycled it, the second thread could access: |
| - the object whose fields are reset with initial values by `recycle()` |
| - or the object that has been reused in another task by `get()` |
| |
The first case might lead to an NPE, which is relatively benign because it at least shows up in logs. The second case is truly dangerous because the object is in an inconsistent state and the problem might never surface as an exception, as the sketch below illustrates.
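
A hypothetical sketch of the hazard, assuming the `RecyclablePair` class above (`process`, `a`, `b`, `executor1` and `executor2` are placeholders; the two tasks can run concurrently):

```java
RecyclablePair shared = RecyclablePair.create(a, b);

executor1.execute(() -> {
    process(shared);
    shared.recycle(); // returns the object to the pool
});

executor2.execute(() -> {
    // If this runs after executor1 has recycled the object, process() may see null
    // fields (NPE) or fields that have already been re-initialized by an unrelated
    // RecyclablePair.create() call that received the same pooled instance.
    process(shared);
    shared.recycle(); // a second recycle of the same handle may throw IllegalStateException
});
```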
| |
Since ownership of a recyclable object must be unique, move semantics can be implemented like this:
| |
| ```java |
import io.netty.util.Recycler;
import lombok.Getter;
import lombok.RequiredArgsConstructor;

@RequiredArgsConstructor
@Getter
public class MovableRecyclableObject {

    private static final Recycler<MovableRecyclableObject> RECYCLER = new Recycler<MovableRecyclableObject>() {
        @Override
        protected MovableRecyclableObject newObject(Handle<MovableRecyclableObject> handle) {
            return new MovableRecyclableObject(handle);
        }
    };

    private final Recycler.Handle<MovableRecyclableObject> handle;
    private Object object;

    public static MovableRecyclableObject create(Object object) {
        final var result = RECYCLER.get();
        result.object = object;
        return result;
    }

    // Transfer ownership: the returned instance owns the data and `this` is recycled,
    // so any later access through `this` fails fast with an NPE
    public MovableRecyclableObject move() {
        final var result = MovableRecyclableObject.create(this.object);
        this.recycle();
        return result;
    }

    public void recycle() {
        this.object = null;
        handle.recycle(this);
    }
}
| ``` |
| |
Before publishing the recyclable object to a different thread, `move()` must be called, so that if the object is accessed later from another thread, it fails fast with an NPE. However, this makes code harder to write because developers must remember to follow the pattern: forgetting to call `move()` causes neither a compile-time error nor an immediate runtime error.
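
A usage sketch of the move-before-publish pattern (using the class above; `payload`, `process` and `executor` are placeholders):

```java
var obj = MovableRecyclableObject.create(payload);
// Transfer ownership before handing the data to another thread: `obj` is recycled
// inside move(), so any later access through `obj` fails fast with an NPE.
var moved = obj.move();
executor.execute(() -> {
    process(moved.getObject());
    moved.recycle();
});
```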
| |
Without the move-before-publish pattern, we would have to make `recycle` synchronized, which adds even more performance overhead:
| |
| ```java |
| public synchronized void recycle() { |
| if (this.object != null) { |
| this.object = null; |
| handle.recycle(this); |
| } |
| } |
| ``` |
| |
| ## Issue 3: questionable benefits |
| |
How much GC pressure can be reduced by sacrificing memory allocation latency is hard to measure and has never been proven by any benchmark. When `Recycler` was introduced in Netty, Java 8 had not even been released. The main use case of `Recycler` in Netty is pooling `ByteBuf` objects to avoid time-consuming allocation of direct memory, where the buffer carries a request for users to process in a `FastThreadLocalThread`.
| |
However, Java GC has evolved a lot over the years. For example, Generational ZGC, introduced in Java 21, performs much better on tail latencies than previous garbage collectors. Here are some blog posts that verify the improvement:
| - https://www.morling.dev/blog/lower-java-tail-latencies-with-zgc/ |
| - https://netflixtechblog.com/bending-pause-times-to-your-will-with-generational-zgc-256629c9386b |
| |
It is questionable whether `Recycler` is still useful on modern Java versions. Even assuming a particular usage of `Recycler` reduces GC pressure significantly in some scenario, the benefit can easily be broken by a slight code change, which is hard to notice and verify.
| |
In Pulsar, Netty `Recycler` is typically used to wrap a group of objects so that they do not have to be passed as individual method parameters, which would make method signatures long and repetitive. For example, [`ManagedLedgerImpl#asyncReadOpEntry`](https://github.com/apache/pulsar/blob/d833b8bef21cb9e85f0f313eb9d49c7ca550fbbd/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2064) accepts an `OpReadEntry` instance, which is recyclable and includes many fields. This method is called for each read operation, which happens frequently, so in theory recycling can save heap allocations.
| |
However, as the code has grown more complicated, after being passed to the entry cache for processing, the `OpReadEntry` is wrapped in another callback object: https://github.com/apache/pulsar/blob/d833b8bef21cb9e85f0f313eb9d49c7ca550fbbd/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/RangeEntryCacheImpl.java#L349-L365
| |
Where the recycler design on `OpReadEntry` once saved the only heap allocation per read operation, it now saves just one of two. What's more, this wrapped callback is passed to `PendingReadsManager` and eventually stored as a field of a new record, `ReadEntriesCallbackWithContext`, which is added to an `ArrayList` in https://github.com/apache/pulsar/blob/d833b8bef21cb9e85f0f313eb9d49c7ca550fbbd/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/PendingReadsManager.java#L256
| |
| Now, each read operation has: |
| - 1 object allocated by Netty recycler (`OpReadEntry`) |
| - 1 object allocated from heap memory (the wrapped callback in `RangeEntryCacheImpl`) |
| - 1 object allocated from heap memory (the `ReadEntriesCallbackWithContext` object that has a field of the wrapped callback) |
| - This object is added to an `ArrayList` allocated from heap memory with at least 1 element |
| |
In the end, saving only the first of these heap allocations should not make a noticeable difference. Moreover, the JVM already has many optimizations, such as escape analysis, that can keep short-lived objects off the heap entirely.
| |
| # Goals |
| |
Establish a rule on whether to use Netty `Recycler`, to avoid code-review debates along the lines of "the recycler is good, so you should not remove it".
| |
| # High Level Design |
| |
| For new contributions, don't use Netty `Recycler`. |
| |
If you need to reduce heap allocations, consider using `FastThreadLocal` to cache a single object or a queue of objects, similar to what Netty `Recycler` does in the **Good Case** described in the **Background knowledge** section (see the sketch below). This is also more effective for normal threads than Netty `Recycler`.
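
A minimal sketch of such a `FastThreadLocal`-backed pool (the class name and the size cap are illustrative, not an existing Pulsar API):

```java
import io.netty.util.concurrent.FastThreadLocal;
import java.util.ArrayDeque;

public class PooledPair {
    // Arbitrary per-thread cap for this sketch
    private static final int MAX_POOLED = 64;

    private static final FastThreadLocal<ArrayDeque<PooledPair>> POOL = new FastThreadLocal<>() {
        @Override
        protected ArrayDeque<PooledPair> initialValue() {
            return new ArrayDeque<>();
        }
    };

    private Object x;
    private Object y;

    public static PooledPair create(Object x, Object y) {
        PooledPair pair = POOL.get().pollFirst();
        if (pair == null) {
            pair = new PooledPair(); // pool is empty: plain heap allocation
        }
        pair.x = x;
        pair.y = y;
        return pair;
    }

    // Must be called from the same thread that called create(); unlike Recycler,
    // there is no cross-thread hand-off queue.
    public void recycle() {
        this.x = null;
        this.y = null;
        ArrayDeque<PooledPair> pool = POOL.get();
        if (pool.size() < MAX_POOLED) {
            pool.addFirst(this);
        }
    }
}
```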
| |
For existing code, generally we should not spend effort removing recyclable classes. However, it is acceptable to remove them, especially when:
| - A known recycler issue is reported but the reason is hard to analyze. (example: [#24741](https://github.com/apache/pulsar/pull/24741)) |
| - The existing recyclable object makes code refactoring hard |
| |
| # Alternatives |
| |
## Add an option to control whether to use the Netty recycler
| |
This would be helpful to figure out whether GC pressure is actually reduced once the option is enabled. However, the code would become much more complicated. Additionally, measuring GC pressure is not easy: in many dashboards, GC time is mistakenly labeled "GC pause time" even though there is no actual pause. When objects aren't recycled, it is expected that some use cases will spend more CPU on GC. However, the Netty Recycler also consumes a significant share of CPU and exposes no metric of its own, so an increase in GC metrics alone would not show whether getting rid of Netty Recyclers is worthwhile.
| |
| # Links |
| |
| * Mailing List discussion thread: http://lists.apache.org/thread/bvllwld260g39v5hfsyr8gl2bwb1kxqz |
| * Mailing List voting thread: https://lists.apache.org/thread/bxonjp38p7yzdw8ytk6bkp344nd9vw45 |