PIP-326: Create a BOM to ease dependency management

Background knowledge

A Bill of Materials (BOM) is a special kind of POM that is used to control the versions of a project’s dependencies and provide a central place to define and update those versions. A BOM dependency ensure that all dependencies (both direct and transitive) are at the same version specified in the BOM.

To illustrate, consider the Spring Data BOM which declares the version for each of the published Spring Data modules. Without a BOM, consuming applications must specify the version on each of the imported Spring Data module dependencies. However, when using a BOM the version numbers can be omitted.

Motivation

The BOM provides the following benefits for consuming applications:

  1. Reduce burden by not having to specify the version in multiple locations
  2. Reduce chance of version mismatch (and therefore errors)

The burden and chance of version mismatch is directly proportional to the number of modules published by a project. Pulsar publishes many (29) modules and therefore consuming applications are likely to run into the above issues.

A concrete example of the above symptoms can be found in the Spring Boot BOM which provides a section for the list of Pulsar module dependencies as follows:

library("Pulsar", "3.1.1") {
    group("org.apache.pulsar") {
        modules = [
            "bouncy-castle-bc",
            "bouncy-castle-bcfips",
            "pulsar-client-1x-base",
            "pulsar-client-1x",
            "pulsar-client-2x-shaded",
            "pulsar-client-admin-api",
            "pulsar-client-admin-original",
            "pulsar-client-admin",
            "pulsar-client-all",
            "pulsar-client-api",
            "pulsar-client-auth-athenz",
            "pulsar-client-auth-sasl",
            "pulsar-client-messagecrypto-bc",
            "pulsar-client-original",
            "pulsar-client-tools-api",
            "pulsar-client-tools",
            "pulsar-client",
            "pulsar-common",
            "pulsar-config-validation",
            "pulsar-functions-api",
            "pulsar-functions-proto",
            "pulsar-functions-utils",
            "pulsar-io-aerospike",
            "pulsar-io-alluxio",
            "pulsar-io-aws",
            "pulsar-io-batch-data-generator",
            "pulsar-io-batch-discovery-triggerers",
            "pulsar-io-canal",
            "pulsar-io-cassandra",
            "pulsar-io-common",
            "pulsar-io-core",
            "pulsar-io-data-generator",
            "pulsar-io-debezium-core",
            "pulsar-io-debezium-mongodb",
            "pulsar-io-debezium-mssql",
            "pulsar-io-debezium-mysql",
            "pulsar-io-debezium-oracle",
            "pulsar-io-debezium-postgres",
            "pulsar-io-debezium",
            "pulsar-io-dynamodb",
            "pulsar-io-elastic-search",
            "pulsar-io-file",
            "pulsar-io-flume",
            "pulsar-io-hbase",
            "pulsar-io-hdfs2",
            "pulsar-io-hdfs3",
            "pulsar-io-http",
            "pulsar-io-influxdb",
            "pulsar-io-jdbc-clickhouse",
            "pulsar-io-jdbc-core",
            "pulsar-io-jdbc-mariadb",
            "pulsar-io-jdbc-openmldb",
            "pulsar-io-jdbc-postgres",
            "pulsar-io-jdbc-sqlite",
            "pulsar-io-jdbc",
            "pulsar-io-kafka-connect-adaptor-nar",
            "pulsar-io-kafka-connect-adaptor",
            "pulsar-io-kafka",
            "pulsar-io-kinesis",
            "pulsar-io-mongo",
            "pulsar-io-netty",
            "pulsar-io-nsq",
            "pulsar-io-rabbitmq",
            "pulsar-io-redis",
            "pulsar-io-solr",
            "pulsar-io-twitter",
            "pulsar-io",
            "pulsar-metadata",
            "pulsar-presto-connector-original",
            "pulsar-presto-connector",
            "pulsar-sql",
            "pulsar-transaction-common",
            "pulsar-websocket"
        ]
    }
}

The problem with this hardcoded approach is that the Spring Boot team is not the expert of Pulsar and this list of modules could become stale and/or invalid rather easily. A better suitor for this specification is the Pulsar team, the subject-matter-experts who know exactly what is going on with Pulsar (which modules are available and what those version(s) should be).

If there were a Pulsar BOM, the above Spring Boot dependency section would shrink down to the following:

library("Pulsar", "3.1.1") {
    group("org.apache.pulsar") {
        imports = [
            "pulsar-bom"
        ]
    }
}

It is worth noting that This is an industry best practice and more often than not, a library provides a BOM. A handful of examples can be found in the “Links” section at the bottom of this document.

Goals

Provide a Pulsar BOM in order to solve the issues listed in the motivation section.

In Scope

The intention is to create a single BOM for all published Pulsar modules. The benefit goes to consumers of the project (our users) as described in the motivation.

Out of Scope

This proposal is not attempting to create various BOMs that are tailored to specific usecases.

High Level Design

  1. From a build target, generate a list of published Pulsar modules
  2. From the list of modules, generate a BOM Maven POM file
  3. Publish the BOM artifact as any other Pulsar module is published

Detailed Design

Leaving the detailed design out of the PIP for now. There is a working prototype and more details will be revealed when and if the PIP is approved.

Public-facing Changes

NA (new addition)

Public API

The only public “API” is the newly published POM artifact.

Binary protocol

NA

Configuration

NA

CLI

NA

Metrics

NA

Monitoring

NA

Security Considerations

NA

Backward & Forward Compatibility

NA

Revert

  1. Deprecate the POM module in version m.n.p.
  2. Stop producing subsequent POM modules in version m.n+1.0

Upgrade

NA

Alternatives

Continue on as-is, not publishing a BOM.

General Notes

Links

Example OSS projects with BOMs

Threads