| # PIP-384: ManagedLedger interface decoupling |
| |
| ## Background knowledge |
| |
| Apache Pulsar uses a component called ManagedLedger to handle persistent storage of messages. |
| |
| The ManagedLedger interfaces and implementation were initially tightly coupled, making it difficult to introduce alternative implementations or improve the architecture. |
| This PIP documents changes that have been made in the master branch for Pulsar 4.0. Pull Requests [#22891](https://github.com/apache/pulsar/pull/22891) and [#23311](https://github.com/apache/pulsar/pull/23311) have already been merged. |
| This work happened after lazy consensus on the dev mailing list based on the discussion thread ["Preparing for Pulsar 4.0: cleaning up the Managed Ledger interfaces"](https://lists.apache.org/thread/l5zjq0fb2dscys3rsn6kfl7505tbndlx). |
| There is one remaining PR [#23313](https://github.com/apache/pulsar/pull/23313) at the time of writing this document. |
| The goal of this PIP is to document the changes in this area for later reference. |
| |
| Key concepts: |
| |
| - **ManagedLedger**: A component that handles the persistent storage of messages in Pulsar. |
| - **BookKeeper**: The default storage system used by ManagedLedger. |
| - **ManagedLedgerStorage interface**: A factory for configuring and creating the `ManagedLedgerFactory` instance. [ManagedLedgerStorage.java source code](https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/storage/ManagedLedgerStorage.java) |
| - **ManagedLedgerFactory interface**: Creates and manages ManagedLedger instances. [ManagedLedgerFactory.java source code](https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/ManagedLedgerFactory.java) |
| - **ManagedLedger interface**: Handles the persistent storage of messages in Pulsar. [ManagedLedger.java source code](https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/ManagedLedger.java) |
| - **ManagedCursor interface**: Handles the persistent storage of Pulsar subscriptions and related message acknowledgements. [ManagedCursor.java source code](https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/ManagedCursor.java) |
| |
| ## Motivation |
| |
| The current ManagedLedger implementation faces several challenges: |
| |
| 1. **Tight coupling**: The interfaces are tightly coupled with their implementation, making it difficult to introduce alternative implementations. |
| |
| 2. **Limited flexibility**: The current architecture doesn't allow for easy integration of different storage systems or optimizations. |
| |
| 3. **Dependency on BookKeeper**: The ManagedLedger implementation is closely tied to BookKeeper, limiting options for alternative storage solutions. |
| |
| 4. **Complexity**: The tight coupling increases the overall complexity of the system, making it harder to maintain, test and evolve. |
| |
| 5. **Limited extensibility**: Introducing new features or optimizations often requires changes to both interfaces and implementations. |
| |
| ## Goals |
| |
| ### In Scope |
| |
| - Decouple ManagedLedger interfaces from their current implementation. |
| - Introduce a ReadOnlyManagedLedger interface. |
| - Decouple OpAddEntry and LedgerHandle from ManagedLedgerInterceptor. |
| - Enable support for multiple ManagedLedgerFactory instances. |
| - Decouple BookKeeper client from ManagedLedgerStorage. |
| - Improve overall architecture by reducing coupling between core Pulsar components and specific ManagedLedger implementations. |
| - Prepare the groundwork for alternative ManagedLedger implementations in Pulsar 4.0. |
| |
| ### Out of Scope |
| |
| - Implementing alternative ManagedLedger storage backends. |
| - Changes to external APIs or behaviors. |
| - Comprehensive JavaDocs for the interfaces. |
| |
| ## High Level Design |
| |
| 1. **Decouple interfaces from implementations**: |
| - Move required methods from implementation classes to their respective interfaces. |
| - Update code to use interfaces instead of concrete implementations. |
| |
| 2. **Introduce ReadOnlyManagedLedger interface**: |
| - Extract this interface to decouple from ReadOnlyManagedLedgerImpl. |
| - Adjust code to use the new interface where appropriate. |
| |
| 3. **Decouple ManagedLedgerInterceptor**: |
| - Introduce AddEntryOperation and LastEntryHandle interfaces. |
| - Adjust ManagedLedgerInterceptor to use these new interfaces. |
| |
| 4. **Enable multiple ManagedLedgerFactory instances**: |
| - Modify ManagedLedgerStorage interface to support multiple "storage classes". |
| - Implement BookkeeperManagedLedgerStorageClass for BookKeeper support. |
| - Update PulsarService and related classes to support multiple ManagedLedgerFactory instances. |
| - Add "storage class" to persistence policy part of the namespace level or topic level policies. |
| |
| 5. **Decouple BookKeeper client**: |
| - Move BookKeeper client creation and management to BookkeeperManagedLedgerStorageClass. |
| - Update ManagedLedgerStorage interface to remove direct BookKeeper dependencies. |
| |
| ## Detailed Design |
| |
| ### Interface Decoupling |
| |
| 1. Update ManagedLedger interface: |
| - Add methods from ManagedLedgerImpl to the interface. |
| - Remove dependencies on implementation-specific classes. |
| |
| 2. Update ManagedLedgerFactory interface: |
| - Add necessary methods from ManagedLedgerFactoryImpl. |
| - Remove dependencies on implementation-specific classes. |
| |
| 3. Update ManagedCursor interface: |
| - Add required methods from ManagedCursorImpl. |
| - Remove dependencies on implementation-specific classes. |
| |
| 4. Introduce ReadOnlyManagedLedger interface: |
| - Extract methods specific to read-only operations. |
| - Update relevant code to use this interface where appropriate. |
| |
| 5. Decouple ManagedLedgerInterceptor: |
| - Introduce AddEntryOperation interface for beforeAddEntry method. |
| - Introduce LastEntryHandle interface for onManagedLedgerLastLedgerInitialize method. |
| - Update ManagedLedgerInterceptor to use these new interfaces. |
| |
| ### Multiple ManagedLedgerFactory Instances |
| |
| 1. Update ManagedLedgerStorage interface: |
| - Add methods to support multiple storage classes. |
| - Introduce getManagedLedgerStorageClass method to retrieve specific storage implementations. |
| |
| 2. Implement BookkeeperManagedLedgerStorageClass: |
| - Create a new class implementing ManagedLedgerStorageClass for BookKeeper. |
| - Move BookKeeper client creation and management to this class. |
| |
| 3. Update PulsarService and related classes: |
| - Modify to support creation and management of multiple ManagedLedgerFactory instances. |
| - Update configuration to allow specifying different storage classes for different namespaces or topics. |
| |
| ### BookKeeper Client Decoupling |
| |
| 1. Update ManagedLedgerStorage interface: |
| - Remove direct dependencies on BookKeeper client. |
| - Introduce methods to interact with storage without exposing BookKeeper specifics. |
| |
| 2. Implement BookkeeperManagedLedgerStorageClass: |
| - Encapsulate BookKeeper client creation and management. |
| - Implement storage operations using BookKeeper client. |
| |
| 3. Update relevant code: |
| - Replace direct BookKeeper client usage with calls to ManagedLedgerStorage methods. |
| - Update configuration handling to support BookKeeper-specific settings through the new storage class. |
| |
| ## Public-facing Changes |
| |
| ### Configuration |
| |
| - Add new configuration option to specify default ManagedLedger "storage class" at broker level. |
| |
| ### API Changes |
| |
| - No major changes to external APIs are planned. |
| - The only API change is to add `managedLedgerStorageClassName` to `PersistencePolicies` which can be used by a custom `ManagedLedgerStorage` to control the ManagedLedgerFactory instance that is used for a particular namespace or topic. |
| |
| ## Backward & Forward Compatibility |
| |
| The changes are internal and don't affect external APIs or behaviors. |
| Backward compatibility is fully preserved in Apache Pulsar. |
| |
| ## Security Considerations |
| |
| The decoupling of interfaces and implementation doesn't introduce new security concerns. |
| |
| ## Links |
| |
| - Initial mailing List discussion thread: [Preparing for Pulsar 4.0: cleaning up the Managed Ledger interfaces](https://lists.apache.org/thread/l5zjq0fb2dscys3rsn6kfl7505tbndlx) |
| - Merged Pull Request #22891: [Replace dependencies on PositionImpl with Position interface](https://github.com/apache/pulsar/pull/22891) |
| - Merged Pull Request #23311: [Decouple ManagedLedger interfaces from the current implementation](https://github.com/apache/pulsar/pull/23311) |
| - Implementation Pull Request #23313: [Decouple Bookkeeper client from ManagedLedgerStorage and enable multiple ManagedLedgerFactory instances](https://github.com/apache/pulsar/pull/23313) |
| - Mailing List PIP discussion thread: https://lists.apache.org/thread/rtnktrj7tp5ppog0235t2mf9sxrdpfr8 |
| - Mailing List PIP voting thread: https://lists.apache.org/thread/4jj5dmk6jtpq05lcd6dxlkqpn7hov5gv |