[Tiered Storage] Prevent Class Loader Leak; Restore Offloader Directory Override (#9878)

### Motivation

In Pulsar 2.7.0, there is a class loader leak. It looks like https://github.com/apache/pulsar/pull/8739 fixed the leak by only loading the offloader classes for the directory configured in `broker.conf`. However, the solution in https://github.com/apache/pulsar/pull/8739 ignores the fact that an offload policy can override the offloader directory. As such, there could be a regression in 2.7.1 for users who configure multiple offloader directories.

This PR restores the directory override without reintroducing the class loader leak.

### Modifications

Update the `PulsarService` and the `PulsarConnectorCache` classes to use a map from directory strings to `Offloaders`.
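
As a rough illustration of the idea (not the code in this PR), a per-directory cache might look like the sketch below. `OffloaderCache`, `LoadedOffloaders`, and the loader function are hypothetical stand-ins; the real `Offloaders` type and the way it is loaded are not shown here. The point is only that each distinct directory string is loaded at most once, so repeated lookups reuse one class loader instead of creating a new one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for the real Offloaders type; it only records its directory.
final class LoadedOffloaders {
    final String directory;

    LoadedOffloaders(String directory) {
        this.directory = directory;
    }
}

// Hypothetical cache keyed by the raw directory string.
final class OffloaderCache {
    private final Map<String, LoadedOffloaders> byDirectory = new ConcurrentHashMap<>();
    private final Function<String, LoadedOffloaders> loader;

    OffloaderCache(Function<String, LoadedOffloaders> loader) {
        this.loader = loader;
    }

    // Loads a directory at most once; later calls return the cached instance.
    LoadedOffloaders getOrLoad(String directory) {
        return byDirectory.computeIfAbsent(directory, loader);
    }

    int size() {
        return byDirectory.size();
    }
}

class OffloaderCacheDemo {
    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        OffloaderCache cache = new OffloaderCache(dir -> {
            loads.incrementAndGet(); // count how often a directory is actually loaded
            return new LoadedOffloaders(dir);
        });

        cache.getOrLoad("./offloaders");          // broker default, loaded once
        cache.getOrLoad("./offloaders");          // served from the cache
        cache.getOrLoad("/opt/other-offloaders"); // policy override, second entry

        System.out.println(loads.get() + " loads for " + cache.size() + " directories");
    }
}
```

Because the key is the raw string, two spellings of the same directory would still produce two entries in this sketch.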

### Alternative Approaches

The new `Map` has keys of type `String`. We could instead use keys of type `Path` and normalize the paths so that `./offloaders` and `offloaders` resolve to a single class loader. However, the javadoc for `Path.normalize` carries a warning about symbolic links. As such, I went with the basic `String` approach, which might lead to some duplication of loaded classes. Below is the javadoc for `normalize`, in case it helps with the design decision.

```java
    /**
     * Returns a path that is this path with redundant name elements eliminated.
     *
     * <p> The precise definition of this method is implementation dependent but
     * in general it derives from this path, a path that does not contain
     * <em>redundant</em> name elements. In many file systems, the "{@code .}"
     * and "{@code ..}" are special names used to indicate the current directory
     * and parent directory. In such file systems all occurrences of "{@code .}"
     * are considered redundant. If a "{@code ..}" is preceded by a
     * non-"{@code ..}" name then both names are considered redundant (the
     * process to identify such names is repeated until it is no longer
     * applicable).
     *
     * <p> This method does not access the file system; the path may not locate
     * a file that exists. Eliminating "{@code ..}" and a preceding name from a
     * path may result in the path that locates a different file than the original
     * path. This can arise when the preceding name is a symbolic link.
     *
     * @return  the resulting path or this path if it does not contain
     *          redundant name elements; an empty path is returned if this path
     *          does have a root component and all name elements are redundant
     *
     * @see #getParent
     * @see #toRealPath
     */
    Path normalize();
```
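
To make the trade-off concrete, here is a small sketch that uses only `java.nio.file`; the class name `NormalizeDemo`, the helper `key`, and the directory names are illustrative and do not come from the PR. It shows that `./offloaders` and `offloaders` differ as strings but are equal after `normalize()`, and that `normalize()` collapses `..` lexically without touching the file system, which is where the symbolic link caveat comes from.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class NormalizeDemo {
    // Returns the normalized form of a directory string, usable as a map key.
    static Path key(String directory) {
        return Paths.get(directory).normalize();
    }

    public static void main(String[] args) {
        // As plain strings, the two spellings are different keys.
        System.out.println("./offloaders".equals("offloaders"));

        // After normalization, they are the same key.
        System.out.println(key("./offloaders").equals(key("offloaders")));

        // Purely lexical: "link/.." is dropped even if "link" were a symbolic
        // link whose real parent is some other directory.
        System.out.println(key("link/../offloaders"));
    }
}
```

If symbolic links matter, `Path.toRealPath` resolves them, but it accesses the file system and throws if the path does not exist, so it is a different trade-off rather than a drop-in replacement.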

### Verifying this change

This change is a code cleanup without dedicated test coverage; existing tests should cover it. If required, I can add tests.
6 files changed
tree: f180cb5554bdc514825a370964e7897a52f2e780
README.md

Pulsar is a distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.

Learn more about Pulsar at https://pulsar.apache.org

Main features

  • Horizontally scalable (Millions of independent topics and millions of messages published per second)
  • Strong ordering and consistency guarantees
  • Low latency durable storage
  • Topic and queue semantics
  • Load balancer
  • Designed for being deployed as a hosted service:
    • Multi-tenant
    • Authentication
    • Authorization
    • Quotas
    • Support mixing very different workloads
    • Optional hardware isolation
  • Keeps track of consumer cursor position
  • REST API for provisioning, admin and stats
  • Geo replication
  • Transparent handling of partitioned topics
  • Transparent batching of messages

Repositories

This repository is the main repository of Apache Pulsar. The Pulsar PMC also maintains other repositories for components in the Pulsar ecosystem, including connectors, adapters, and other language clients.

Helm Chart

Ecosystem

Clients

Dashboard & Management Tools

Documentation

CI/CD

Build Pulsar

Requirements:

  • Java 8 JDK (for building Pulsar)
    • When Pulsar is built with a JDK newer than Java 8, the resulting artifacts are not compatible with the Java 8 runtime because of known issues such as issue 8445.
  • Maven 3.6.1+

Compile and install:

$ mvn install -DskipTests

Minimal build (this skips most of the external connectors and tiered storage handlers):

$ mvn install -Pcore-modules

Run Unit Tests:

$ mvn test

Run Individual Unit Test:

$ cd module-name (e.g., pulsar-client)
$ mvn test -Dtest=unit-test-name (e.g., ConsumerBuilderImplTest)

Run Selected Test packages:

$ cd module-name (e.g., pulsar-broker)
$ mvn test -pl module-name -Dinclude=org/apache/pulsar/**/*.java

Start standalone Pulsar service:

$ bin/pulsar standalone

Check https://pulsar.apache.org for documentation and examples.

Setting up your IDE

Apache Pulsar uses Lombok, so make sure your IDE is set up with the required plugins.

IntelliJ

Configure annotation processing in IntelliJ

  1. Open the Annotation Processors settings dialog box by going to Settings -> Build, Execution, Deployment -> Compiler -> Annotation Processors.

  2. Select the following buttons:

    1. “Enable annotation processing”
    2. “Obtain processors from project classpath”
    3. “Store generated sources relative to: Module content root”
  3. Set the generated source directories to be equal to the Maven directories:

    1. Set “Production sources directory:” to “target/generated-sources/annotations”.
    2. Set “Test sources directory:” to “target/generated-test-sources/test-annotations”.
  4. Click “OK”.

  5. Install the Lombok plugin in IntelliJ.

Further configuration in IntelliJ

  • When working on the Pulsar core modules in IntelliJ, reduce the number of active projects in IntelliJ to speed up IDE actions and reduce unrelated IDE warnings.

    • In the tree view of IntelliJ's Maven UI, under “Profiles”:
      • Activate “core-modules” Maven profile
      • De-activate “main” Maven profile
      • Run the “Reload All Maven Projects” action from the Maven UI toolbar. You can also find the action by the name in the IntelliJ “Search Everywhere” window that gets activated by pressing the Shift key twice.
  • Run the “Generate Sources and Update Folders For All Projects” action from the Maven UI toolbar. You can also find the action by the name in the IntelliJ “Search Everywhere” window that gets activated by pressing the Shift key twice. Running the action takes about 10 minutes for all projects. This is faster when the “core-modules” profile is the only active profile.

IntelliJ usage tips

  • If compilation fails with missing Protobuf classes, make sure to run the “Generate Sources and Update Folders For All Projects” action.

  • Not all of the Pulsar source code compiles properly in IntelliJ, so some compilation errors are expected.

    • Use the “core-modules” profile if working on the Pulsar core modules since the source code for those modules can be compiled in IntelliJ.
    • Sometimes it might help to mark a specific project as ignored in the IntelliJ Maven UI by right-clicking the project name and selecting Ignore Projects from the menu.
    • Currently, it is not always possible to run unit tests directly from the IDE because of the compilation issues. As a workaround, individual test classes can be run by using the mvn test -Dtest=TestClassName command.

Eclipse

Follow the instructions here to configure your Eclipse setup.

Build Pulsar docs

Refer to the docs README.

Contact

Mailing lists

| Name | Scope | | | |
|:---|:---|:---|:---|:---|
| users@pulsar.apache.org | User-related discussions | Subscribe | Unsubscribe | Archives |
| dev@pulsar.apache.org | Development-related discussions | Subscribe | Unsubscribe | Archives |

Slack

Pulsar slack channel at https://apache-pulsar.slack.com/

You can self-register at https://apache-pulsar.herokuapp.com/

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Crypto Notice

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See http://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software: Pulsar uses the SSL library from Bouncy Castle written by http://www.bouncycastle.org.