PIP-433: Optimize the conflicts of the replication and automatic creation mechanisms, including the automatic creation of topics and schemas

Background knowledge

Topic auto-creation through the REST API when Geo-Replication is enabled.

If namespace-level Geo-Replication is already enabled, the source broker copies REST requests that create a partitioned topic to the remote cluster.

If you enable topic-level Geo-Replication without having enabled namespace-level Geo-Replication, the source broker does the following:

  • The source broker checks whether the partitioned topic exists on the remote cluster.
  • If it exists, the source broker compares the partition counts and enables topic-level replication only if both clusters have the same number of partitions; otherwise, you get a bad request error.
  • If it does not exist, the source broker automatically creates the partitioned topic on the remote cluster with the same partition count.
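
The three bullets above can be read as a small decision procedure. The following is a minimal sketch of that procedure using an in-memory map in place of the remote cluster; the class, enum, and method names are hypothetical and are not taken from the Pulsar code base.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the topic-level check; not Pulsar broker code. */
final class TopicLevelReplicationCheck {

    /** Outcomes that mirror the three bullets above. */
    enum Outcome { ENABLED, CREATED_REMOTE_AND_ENABLED, BAD_REQUEST }

    /**
     * @param topic           name of the partitioned topic
     * @param localPartitions partition count on the source cluster
     * @param remoteTopics    stand-in for the remote cluster: topic name -> partition count
     */
    static Outcome enableTopicLevelReplication(String topic, int localPartitions,
                                               Map<String, Integer> remoteTopics) {
        Integer remotePartitions = remoteTopics.get(topic);
        if (remotePartitions == null) {
            // Topic is missing remotely: create it with the same partition count.
            remoteTopics.put(topic, localPartitions);
            return Outcome.CREATED_REMOTE_AND_ENABLED;
        }
        // Topic exists remotely: the partition counts must match.
        return remotePartitions == localPartitions ? Outcome.ENABLED : Outcome.BAD_REQUEST;
    }

    public static void main(String[] args) {
        Map<String, Integer> remote = new HashMap<>();
        remote.put("persistent://public/default/a", 4);
        System.out.println(enableTopicLevelReplication("persistent://public/default/a", 4, remote));
        System.out.println(enableTopicLevelReplication("persistent://public/default/a", 2, remote));
        System.out.println(enableTopicLevelReplication("persistent://public/default/b", 2, remote));
    }
}
```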

Topic auto-creation by clients when Geo-Replication is enabled.

  • Client of source cluster: start a consumer/producer for a topic.
  • Client of source cluster: get the partitions of the topic.
    • At this step, the client will try to get the partitioned metadata of the topic. If the topic does not exist, the broker will create the partitioned metadata automatically.
  • Client of source cluster: start internal consumers/producers for each partition.
    • At this step, the client will try to connect to the partition. If the partition does not exist, the broker will create the partitioned metadata automatically.
  • Source broker: start the geo replicator when a partition is loaded.
    • The geo replicator maintains an internal producer for the topic on the remote cluster.
    • The internal producer is a single-partition producer; it will not trigger partitioned metadata auto-creation.
    • When the internal producer starts, it verifies that the target topic on the remote cluster is either a non-partitioned topic or a partition.
    • (Highlight) Otherwise, it prints error logs and stops.
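
The start-up check of the internal producer can be sketched as follows. This is only an illustration under assumptions: the remote cluster is modelled as a set of topic names that have partitioned metadata, and the class and method names are hypothetical. The partition naming (`<topic>-partition-<index>`) follows the example name used later in this document.

```java
import java.util.Set;

/** Hypothetical model of the internal producer's start-up check; not Pulsar broker code. */
final class ReplicatorTargetCheck {

    /** Builds a partition name such as "persistent://public/default/topic-partition-0". */
    static String partitionName(String topic, int index) {
        return topic + "-partition-" + index;
    }

    /**
     * Returns true if the target is a non-partitioned topic or a single partition,
     * that is, the target name itself has no partitioned metadata on the remote cluster.
     */
    static boolean canStartInternalProducer(String targetTopic, Set<String> remotePartitionedTopics) {
        if (remotePartitionedTopics.contains(targetTopic)) {
            // Mirrors the "(Highlight)" bullet: log an error and stop.
            System.err.println("Replicator stopped: remote topic " + targetTopic + " is a partitioned topic");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> remotePartitioned = Set.of("persistent://public/default/topic");
        String partition = partitionName("persistent://public/default/topic", 0);
        System.out.println(canStartInternalProducer(partition, remotePartitioned));
        System.out.println(canStartInternalProducer("persistent://public/default/topic", remotePartitioned));
    }
}
```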

Schema replication when Geo-Replication is enabled.

The geo replicator's internal producer starts with an auto-produce schema and copies each new schema it reads from the source cluster to the remote cluster. It gets stuck once a schema is incompatible with the remote cluster.

Motivation

Issue 1: conflicting topic creation when Geo-Replication is enabled

Steps to reproduce the issue

  • Configurations of both the source cluster and the remote cluster
    • allowAutoTopicCreation: true
    • allowAutoTopicCreationType: partitioned
    • defaultNumPartitions: 2
  • The namespace public/default exists, but you have not enabled Geo-Replication for the namespace yet.
  • Start a producer for a topic persistent://public/default/topic on the source cluster.
    • This triggers the creation of a partitioned topic with 2 partitions.
  • Enable namespace-level Geo-Replication for the namespace public/default.
  • The geo replicator starts for the existing partition public/default/topic-partition-0.
  • However, the user expects the replicator to copy messages to the remote cluster.

Issue 2: Replication is stuck because the remote side does not allow schema updates

Steps to reproduce the issue

  • The topic public/default/topic has Geo-Replication enabled.
  • Users control the topic schema manually and do not allow consumers/producers to auto-update the schema on either cluster.
  • The internal producer gets stuck after the user sets a new schema on the source cluster.
  • However, the user expects the replicator to copy the schema to the remote cluster.

Goals

  1. Add an optional choice: always allow the replicator to register schemas if compatible, even if users set set-is-allow-auto-update-schema to false.
  2. Regarding auto-creation that is triggered by replication: rather than the original solution, which triggers auto-creation when the internal producer of the replicator starts, we add an admin API client to the replicator and call the admin API to create topics.
  3. Check compatibility between the two clusters when enabling namespace-level replication, which includes the following:
    • Existing topics with the same name must have the same number of partitions on both clusters, including __change_events.
    • Auto-creation policies must be the same on both clusters, including broker-level and namespace-level policies.

Detailed Design

Implementation overview

Regarding goal 2: use an admin API client to trigger topic creation on the remote side

  • Replicators will use an admin API client in addition to the Pulsar client, which was previously the only client they used.
    • Do not trigger topic auto-creation if the value of the configuration createTopicToRemoteClusterForReplication is false, which keeps the previous behavior. For more details, see PIP-370: configurable remote topic creation in geo-replication.
    • Create the topic if it does not exist on the remote side, ignoring the auto-creation policies defined at the broker level and the namespace level.
      • Difference from the previous behavior: the previous implementation relied on producer creation events to trigger topic auto-creation.
    • Print error logs if the partition counts of the source and the remote cluster differ, which keeps the previous behavior.
  • Remove the mechanism that copies REST API commands that create a partitioned topic to the remote cluster. This covers the following mechanisms; the check logic itself is kept.
    • The broker replicates the topic creation REST API request if namespace-level replication is enabled.
    • The broker triggers a topic creation request to the remote cluster when topic-level Geo-Replication is enabled.
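
The replicator-side rules above can be summarized in a short sketch. It assumes an in-memory map in place of the remote admin API, and it assumes that the partition-count check runs regardless of the createTopicToRemoteClusterForReplication setting, which the text does not state explicitly. All class, enum, and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of how a replicator prepares the remote topic; not Pulsar broker code. */
final class RemoteTopicPreparation {

    enum Result { CREATED, ALREADY_EXISTS, NOT_CREATED, PARTITION_MISMATCH }

    /**
     * @param createTopicToRemoteClusterForReplication value of the broker configuration (see PIP-370)
     * @param remoteTopics stand-in for the remote cluster: topic name -> partition count
     */
    static Result prepare(boolean createTopicToRemoteClusterForReplication, String topic,
                          int localPartitions, Map<String, Integer> remoteTopics) {
        Integer remotePartitions = remoteTopics.get(topic);
        if (remotePartitions != null) {
            if (remotePartitions != localPartitions) {
                // Kept from the previous behavior: report the mismatch in the logs.
                System.err.println("Partition mismatch for " + topic + ": local=" + localPartitions
                        + ", remote=" + remotePartitions);
                return Result.PARTITION_MISMATCH;
            }
            return Result.ALREADY_EXISTS;
        }
        if (!createTopicToRemoteClusterForReplication) {
            // Remote creation is disabled, so the replicator does not create the topic.
            return Result.NOT_CREATED;
        }
        // Explicit creation; auto-creation policies on the remote side are not consulted.
        remoteTopics.put(topic, localPartitions);
        return Result.CREATED;
    }

    public static void main(String[] args) {
        Map<String, Integer> remote = new HashMap<>();
        System.out.println(prepare(true, "persistent://public/default/topic", 2, remote));
        System.out.println(prepare(true, "persistent://public/default/topic", 2, remote));
        System.out.println(prepare(true, "persistent://public/default/topic", 3, remote));
        System.out.println(prepare(false, "persistent://public/default/other", 2, remote));
    }
}
```

The point of the sketch is that creation is an explicit call made by the replicator, so neither allowAutoTopicCreation nor the namespace-level auto-topic-creation policy appears anywhere in the decision.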

We omit an implementation overview for goals 1 and 3, since they are simple and clear enough.

Public API

Regarding Goal 1: “always allow the replicator to register schemas if compatible”

The original design of pulsar-admin namespaces set-schema-autoupdate-strategy

pulsar-admin namespaces set-schema-autoupdate-strategy
  -c, --compatibility=<strategyParam>
                        Compatibility level required for new schemas created
                          via a Producer. Possible values (Full, Backward,
                          Forward).
  -d, --disabled        Disable automatic schema updates

Add a new parameter --enable-for-replicator, which always allows the replicator to register a new schema if it is compatible. The default value is true.
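
The intended effect of the new parameter can be sketched as a single decision function. This is a hypothetical model of the rule described in Goal 1, not the broker's schema registry code; the class, method, and parameter names are illustrative only.

```java
/** Hypothetical model of the schema registration rule with --enable-for-replicator. */
final class ReplicatorSchemaPolicy {

    /**
     * @param compatible          whether the new schema passes the compatibility check
     * @param autoUpdateAllowed   whether automatic schema updates are allowed for regular clients
     * @param fromReplicator      whether the request comes from a replicator's internal producer
     * @param enableForReplicator value of the proposed --enable-for-replicator parameter
     */
    static boolean mayRegisterSchema(boolean compatible, boolean autoUpdateAllowed,
                                     boolean fromReplicator, boolean enableForReplicator) {
        if (!compatible) {
            return false; // An incompatible schema is never registered.
        }
        if (autoUpdateAllowed) {
            return true; // Regular auto-update path.
        }
        // Auto-update is disabled: only the replicator may register, and only if enabled.
        return fromReplicator && enableForReplicator;
    }

    public static void main(String[] args) {
        // Issue 2 scenario: auto-update disabled, request comes from the replicator.
        System.out.println(mayRegisterSchema(true, false, true, true));  // true
        // The same request from a regular producer is still rejected.
        System.out.println(mayRegisterSchema(true, false, false, true)); // false
    }
}
```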

Regarding Goal 2: “uses admin API client to trigger topic creation on the remote side”

pulsar-admin topics create-partitioned-topic <topic name>

  • Previous behavior:
    • It copies the creation request to the remote cluster if the topic does not exist on the remote cluster
    • It creates a partitioned topic on the local cluster
  • Behavior with this proposal:
    • It creates a partitioned topic on the local cluster.

pulsar-admin topics set-replication-clusters -c <clusters> <topic name>

  • Previous behavior:
    • It confirms that the partitioned topics on the two clusters have the same number of partitions.
    • It copies the creation request to the remote cluster if the topic does not exist on the remote cluster.
    • It sets the policy.
  • Behavior with this proposal:
    • It confirms that the partitioned topics on the two clusters have the same number of partitions.
    • It sets the policy.

Regarding Goal 3: “checks compatibility between two clusters when enabling namespace-level replication”

Brokers will perform the following additional checks when pulsar-admin namespaces set-clusters -c <clusters> <namespace> is called:

  • Auto-creation policies must be the same, including broker-level and namespace-level policies.
  • All existing topics that have the same name on both clusters must have the same number of partitions, including __change_events.
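
A minimal sketch of these two checks is shown below. It assumes that each cluster's auto-creation policy can be represented as a map of setting names to values and that topics are listed as a map of name to partition count; these representations and all names in the code are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the namespace-level compatibility check; not Pulsar broker code. */
final class NamespaceReplicationCheck {

    /** Returns a list of human-readable conflicts; an empty list means the check passes. */
    static List<String> findConflicts(Map<String, String> localPolicy, Map<String, String> remotePolicy,
                                      Map<String, Integer> localTopics, Map<String, Integer> remoteTopics) {
        List<String> conflicts = new ArrayList<>();
        if (!localPolicy.equals(remotePolicy)) {
            conflicts.add("auto-creation policies differ");
        }
        // Iterate in sorted order so the result is deterministic.
        for (Map.Entry<String, Integer> local : new TreeMap<>(localTopics).entrySet()) {
            Integer remoteCount = remoteTopics.get(local.getKey());
            // Only topics that exist on both clusters are compared.
            if (remoteCount != null && !remoteCount.equals(local.getValue())) {
                conflicts.add("partition count differs for " + local.getKey());
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        Map<String, String> policy = Map.of(
                "allowAutoTopicCreation", "true",
                "allowAutoTopicCreationType", "partitioned",
                "defaultNumPartitions", "2");
        Map<String, Integer> local = Map.of("topic", 2, "__change_events", 4);
        Map<String, Integer> remote = Map.of("topic", 3, "__change_events", 4);
        System.out.println(findConflicts(policy, policy, local, remote));
    }
}
```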

Configuration

The following configurations will no longer restrict the behavior of replication, since replicators now use an admin API client to trigger topic creation.

  • broker.conf -> allowAutoTopicCreation
  • broker.conf -> isAllowAutoUpdateSchemaEnabled
  • namespace level policy: auto-topic-creation

Metrics

Nothing.

Monitoring

Nothing.

Security Considerations

Nothing.

Backward & Forward Compatibility

There's nothing that needs to be done.

Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations

Nothing.

Alternatives

Nothing.

Links