Background Knowledge

The TableView interface provides a convenient way to access the streaming updatable dataset in a topic by offering a continuously updated key-value map view. The TableView retains the last value of the key which provides you with an almost up-to-date dataset but cannot guarantee you always get the latest data (with the latest written message).

The TableView can be used to establish a local cache of data. Additionally, clients can register consumers with TableView and specify a listener to scan the map and receive notifications whenever new messages are received. This functionality enables event-driven applications and message monitoring.

For more detailed information about the TableView, please refer to the Pulsar documentation.

Motivation

When a TableView is created, it retrieves the position of the latest written message and reads all messages from the beginning up to that fetched position. This ensures that the TableView will include any messages written prior to its creation. However, it does not guarantee that the TableView will include any newly added messages during its creation. Therefore, the value you read from a TableView instance may not be the most recent value, but you will not read an older value once a new value becomes available. It's important to note that this guarantee is not maintained across multiple TableView instances on the same topic. This means that you may receive a newer value from one instance first, and then receive an older value from another instance later. In addition, we have several other components, such as the transaction buffer snapshot and the topic policies service, that employ a similar mechanism to the TableView. This is because the TableView is not available at that time. However, we cannot replace these implementations with a TableView because they involve multiple TableView instances across brokers within the same system topic, and the data read from these TableViews is not guaranteed to be up-to-date. As a result, subsequent writes may occur based on outdated versions of the data. For example, in the transaction buffer snapshot, when a broker owns topics within a namespace, it maintains a TableView containing all the transaction buffer snapshots for those topics. It is crucial to ensure that the owner can read the most recently written transaction buffer snapshot when loading a topic (where the topic name serves as the key for the transaction buffer snapshot message). However, the current capabilities provided by TableView do not guarantee this, especially when ownership of the topic is transferred and the TableView of transaction buffer snapshots in the new owner broker is not up-to-date.

Regarding both the transaction buffer snapshot and topic policies service, updates to a key are only performed by a single writer at a given time until the topic's owner is changed. As a result, it is crucial to ensure that the last written value of this key is read prior to any subsequent writing. By guaranteeing this, all subsequent writes will consistently be based on the most up-to-date value.

The proposal will introduce a new API to refresh the table view with the latest written data on the topic, ensuring that all subsequent reads are based on the refreshed data.

tableView.refresh();
tableView.get(“key”);

After the refresh, it is ensured that all messages written prior to the refresh will be available to be read. However, it should be noted that the inclusion of newly added messages during or after the refresh is not guaranteed.

Goals

In Scope

Providing the capability to refresh the TableView to the last written message of the topic and all the subsequent reads to be conducted using either the refreshed dataset or a dataset that is even more up-to-date than the refreshed one.

Out of Scope

A static perspective of a TableView at a given moment in time Read consistency across multiple TableViews on the same topic

High-Level Design

Provide a new API for TableView to support refreshing the dataset of the TableView to the last written message.

Design & Implementation Details

Public-Facing Changes

Public API

The following changes will be added to the public API of TableView:

refreshAsync()

This new API retrieves the position of the latest written message and reads all messages from the beginning up to that fetched position. This ensures that the TableView will include any messages written prior to its refresh.

/**
*
* Refresh the table view with the latest data in the topic, ensuring that all subsequent reads are based on the refreshed data.
*
* Example usage:
*
* table.refreshAsync().thenApply(__ -> table.get(key));
*
* This function retrieves the last written message in the topic and refreshes the table view accordingly.
* Once the refresh is complete, all subsequent reads will be performed on the refreshed data or a combination of the refreshed
* data and newly published data. The table view remains synchronized with any newly published data after the refresh.
*
* |x:0|->|y:0|->|z:0|->|x:1|->|z:1|->|x:2|->|y:1|->|y:2|
*
* If a read occurs after the refresh (at the last published message |y:2|), it ensures that outdated data like x=1 is not obtained.
* However, it does not guarantee that the values will always be x=2, y=2, z=1, as the table view may receive updates with newly
* published data.
*
* |x:0|->|y:0|->|z:0|->|x:1|->|z:1|->|x:2|->|y:1|->|y:2| -> |y:3|
*
* Both y=2 or y=3 are possible. Therefore, different readers may receive different values, but all values will be equal to or newer
* than the data refreshed from the last call to the refresh method.
*/
CompletableFuture<Void> refreshAsync();

/**
* Refresh the table view with the latest data in the topic, ensuring that all subsequent reads are based on the refreshed data.
*
* @throws PulsarClientException if there is any error refreshing the table view.
*/
void refresh() throws PulsarClientException;


Monitoring

The proposed changes do not introduce any specific monitoring considerations at this time.

Security Considerations

No specific security considerations have been identified for this proposal.

Backward & Forward Compatibility

Revert

No specific revert instructions are required for this proposal.

Upgrade

No specific upgrade instructions are required for this proposal.

Alternatives

Add consistency model policy to TableView

Add new option configuration STRONG_CONSISTENCY_MODEL and EVENTUAL_CONSISTENCY_MODEL in TableViewConfigurationData. • STRONG_CONSISTENCY_MODEL: any method will be blocked until the latest value is retrieved. • EVENTUAL_CONSISTENCY_MODEL: all methods are non-blocking, but the value retrieved might not be the latest at the time point.

However, there might be some drawbacks to this approach:

  1. As read and write operations might happen simultaneously, we cannot guarantee consistency. If we provide a configuration about consistency, it might confuse users.
  2. This operation will block each get operation. We need to add more asynchronous methods.
  3. Less flexibility if users don’t want to refresh the TableView for any reads.

New method for combining the refresh and get

Another option is to add new methods for the existing methods to combine the refresh and reads. For example

CompletableFuture refreshGet(String key);

It will refresh the dataset of the TableView and perform the get operation based on the refreshed dataset. But we need to add 11 new methods to the public APIs of the TableView.

General Notes

No additional general notes have been provided.

Links

  • Mailing List discussion thread:
  • Mailing List voting thread: