id: txn-how
title: How transactions work?
sidebar_label: How transactions work?

This section describes transaction components and how the components work together. For the complete design details, see PIP-31: Transactional Streaming.

Key concepts

To understand how transactions work, you first need to know the following key concepts.

Transaction coordinator

The transaction coordinator (TC) is a module running inside a Pulsar broker.

  • It maintains the entire life cycle of transactions and prevents a transaction from getting into an incorrect status.

  • It handles transaction timeouts and ensures that a transaction is aborted once its timeout expires.
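The two responsibilities above can be pictured as a small state machine. The following is a minimal sketch, not the broker's implementation: the class `CoordinatorSketch`, its method names, the `OPEN` status, and the explicit clock argument are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; these names are hypothetical, not Pulsar broker classes.
class CoordinatorSketch {
    enum Status { OPEN, COMMITTED, ABORTED }

    private static final class Txn {
        Status status = Status.OPEN;
        final long deadlineMillis;
        Txn(long deadlineMillis) { this.deadlineMillis = deadlineMillis; }
    }

    private final Map<Long, Txn> txns = new HashMap<>();
    private long nextId = 0;

    // Opens a transaction that must end before nowMillis + timeoutMillis.
    long begin(long nowMillis, long timeoutMillis) {
        long id = nextId++;
        txns.put(id, new Txn(nowMillis + timeoutMillis));
        return id;
    }

    // Only an OPEN transaction may move to a final status; anything else is rejected.
    boolean end(long id, Status target) {
        Txn txn = txns.get(id);
        if (txn == null || txn.status != Status.OPEN || target == Status.OPEN) {
            return false;
        }
        txn.status = target;
        return true;
    }

    // Aborts every transaction that is still OPEN when its deadline has passed.
    void expire(long nowMillis) {
        for (Txn txn : txns.values()) {
            if (txn.status == Status.OPEN && nowMillis >= txn.deadlineMillis) {
                txn.status = Status.ABORTED;
            }
        }
    }

    Status statusOf(long id) { return txns.get(id).status; }
}
```

In this sketch, a second `end` call on an already committed transaction returns `false`, which is the kind of incorrect status change the coordinator is meant to prevent.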

Transaction log

All the transaction metadata persists in the transaction log. The transaction log is backed by a Pulsar topic. If the transaction coordinator crashes, it can restore the transaction metadata from the transaction log.

The transaction log stores the transaction status rather than actual messages in the transaction (the actual messages are stored in the actual topic partitions).
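Recovery from the log can be sketched as a replay. This is a simplified model under stated assumptions: the class `TxnLogSketch` is hypothetical, an in-memory list stands in for the Pulsar topic that backs the log, and statuses are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; a List stands in for the Pulsar topic behind the log.
class TxnLogSketch {
    // One log entry: which transaction moved to which status. No message payloads.
    private static final class Entry {
        final long txnId;
        final String status;
        Entry(long txnId, String status) { this.txnId = txnId; this.status = status; }
    }

    private final List<Entry> entries = new ArrayList<>();

    void append(long txnId, String status) {
        entries.add(new Entry(txnId, status));
    }

    // What a restarted coordinator would do in this model: read the log from
    // the beginning and keep the latest status seen for each transaction.
    Map<Long, String> replay() {
        Map<Long, String> latest = new HashMap<>();
        for (Entry e : entries) {
            latest.put(e.txnId, e.status);
        }
        return latest;
    }
}
```

Because the entries carry only IDs and statuses, the replay stays cheap regardless of how many messages each transaction produced.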

Transaction buffer

Messages produced to a topic partition within a transaction are stored in the transaction buffer (TB) of that topic partition. The messages in the transaction buffer are not visible to consumers until the transactions are committed. The messages in the transaction buffer are discarded when the transactions are aborted.

The transaction buffer keeps track of all ongoing and aborted transactions in memory, while the messages themselves are sent to the actual partitioned Pulsar topics. After a transaction is committed, its messages in the transaction buffer are materialized (made visible) to consumers. If the transaction is aborted, its messages in the transaction buffer are discarded.
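The visibility rule can be illustrated with a small model. It is a sketch under assumptions, not the broker's code: `TxnBufferSketch` and its methods are hypothetical names, and an in-memory list stands in for the topic partition.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only; names are hypothetical, not Pulsar broker classes.
class TxnBufferSketch {
    private static final class Entry {
        final long txnId;
        final String payload;
        Entry(long txnId, String payload) { this.txnId = txnId; this.payload = payload; }
    }

    // Stands in for the topic partition that actually holds the messages.
    private final List<Entry> partitionLog = new ArrayList<>();
    // In-memory bookkeeping of ongoing and aborted transactions.
    private final Set<Long> ongoing = new HashSet<>();
    private final Set<Long> aborted = new HashSet<>();

    void append(long txnId, String payload) {
        if (aborted.contains(txnId)) {
            throw new IllegalStateException("transaction already aborted: " + txnId);
        }
        ongoing.add(txnId);
        partitionLog.add(new Entry(txnId, payload));
    }

    void commit(long txnId) { ongoing.remove(txnId); }

    void abort(long txnId) {
        if (ongoing.remove(txnId)) {
            aborted.add(txnId);
        }
    }

    // A consumer sees a message only if its transaction is neither ongoing nor aborted.
    List<String> visibleToConsumers() {
        List<String> visible = new ArrayList<>();
        for (Entry e : partitionLog) {
            if (!ongoing.contains(e.txnId) && !aborted.contains(e.txnId)) {
                visible.add(e.payload);
            }
        }
        return visible;
    }
}
```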

Transaction ID

Transaction ID (TxnID) identifies a unique transaction in Pulsar. A transaction ID is 128 bits long. The highest 16 bits are reserved for the ID of the transaction coordinator, and the remaining bits are used for a monotonically increasing number within each transaction coordinator. This makes it easy to use the TxnID to locate a transaction after a crash.
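The bit layout described above can be sketched with two 64-bit values. This is an illustration of the layout as stated in this section only: `TxnIdSketch` is a hypothetical class, the sequence number is limited to 64 bits for simplicity, and the actual client class may arrange its fields differently.

```java
// Illustrative model only; follows the layout described in the text,
// not necessarily the layout of Pulsar's own TxnID class.
class TxnIdSketch {
    final long mostSigBits;   // upper 64 bits; the top 16 hold the coordinator ID
    final long leastSigBits;  // lower 64 bits; holds the sequence number in this sketch

    TxnIdSketch(int coordinatorId, long sequence) {
        if (coordinatorId < 0 || coordinatorId > 0xFFFF) {
            throw new IllegalArgumentException("coordinator ID must fit in 16 bits");
        }
        if (sequence < 0) {
            throw new IllegalArgumentException("sequence must be non-negative");
        }
        this.mostSigBits = ((long) coordinatorId) << 48;
        this.leastSigBits = sequence;
    }

    // Recovers the coordinator ID from the highest 16 bits.
    int coordinatorId() { return (int) (mostSigBits >>> 48); }

    long sequence() { return leastSigBits; }

    // 32 hex digits, that is, 128 bits.
    @Override
    public String toString() {
        return String.format("%016x%016x", mostSigBits, leastSigBits);
    }
}
```

Given only an ID, `coordinatorId()` tells you which coordinator owns the transaction, which is what makes a transaction straightforward to trace.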

Pending acknowledge state

The pending acknowledge state tracks the message acknowledgments made within a transaction until the transaction completes. While a message is in the pending acknowledge state, other transactions cannot acknowledge it until it is removed from that state.

The pending acknowledge state is persisted to the pending acknowledge log (cursor ledger). A new broker can restore the state from the pending acknowledge log so that acknowledgments are not lost.
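The conflict rule can be modeled in a few lines. This is a sketch under assumptions: `PendingAckSketch` and its methods are hypothetical names, message IDs are plain strings, and persistence to the pending acknowledge log is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Illustrative model only; names are hypothetical and nothing is persisted.
class PendingAckSketch {
    private final Map<String, Long> pending = new HashMap<>(); // message ID -> owning transaction
    private final Set<String> acknowledged = new HashSet<>();

    // Returns false on a conflict: the message is already acknowledged,
    // or it is pending under a different transaction.
    boolean ack(String messageId, long txnId) {
        if (acknowledged.contains(messageId)) {
            return false;
        }
        Long owner = pending.putIfAbsent(messageId, txnId);
        return owner == null || owner == txnId;
    }

    // Commit turns the transaction's pending acknowledgments into real ones.
    void commit(long txnId) {
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() == txnId) {
                acknowledged.add(e.getKey());
                it.remove();
            }
        }
    }

    // Abort releases the messages so that another transaction can acknowledge them.
    void abort(long txnId) {
        pending.values().removeIf(owner -> owner == txnId);
    }
}
```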

Data flow

At a high level, the data flow can be split into several steps:

  1. Begin a transaction.

  2. Publish messages with a transaction.

  3. Acknowledge messages with a transaction.

  4. End a transaction.

To help you debug or tune the transaction for better performance, review the following diagrams and descriptions.

1. Begin a transaction

Without transactions, a producer is created, sends messages to brokers, and the messages are stored in data logs.

Let’s walk through the steps for beginning a transaction.

2. Publish messages with a transaction

In this stage, the Pulsar client enters a transaction loop, repeating the consume-process-produce operation for all the messages that comprise the transaction. This phase can be long and may consist of multiple produce and acknowledgment requests.

Let’s walk through the steps for publishing messages with a transaction.

3. Acknowledge messages with a transaction

In this phase, the Pulsar client sends a request to the transaction coordinator, and a new subscription is acknowledged as part of the transaction.

Let’s walk through the steps for acknowledging messages with a transaction.

4. End a transaction

At the end of a transaction, the Pulsar client decides to commit or abort the transaction. The transaction can be aborted when a conflict is detected on acknowledging messages.

4.1 End transaction request

When the Pulsar client finishes a transaction, it issues an end transaction request.

Let’s walk through the steps for ending the transaction.

4.2 Finalize a transaction

The transaction coordinator starts the process of committing or aborting messages to all the partitions involved in this transaction.

Let’s walk through the steps for finalizing a transaction.

4.3 Mark a transaction as COMMITTED or ABORTED

The transaction coordinator writes the final transaction status to the transaction log to complete the transaction.

Let’s walk through the steps for marking a transaction as COMMITTED or ABORTED.
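Sections 4.2 and 4.3 together describe an ordering: the outcome is first delivered to every partition involved in the transaction, and only then is the final status written to the transaction log. The following is a minimal sketch of that ordering, not the coordinator's actual code: `FinalizeSketch`, `endTxn`, and the trace strings are hypothetical, and a list of strings stands in for the real requests and log writes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; the trace stands in for real requests and log writes.
class FinalizeSketch {
    // Records, in order, what the coordinator did.
    final List<String> trace = new ArrayList<>();

    void endTxn(long txnId, List<String> partitions, boolean commit) {
        String outcome = commit ? "COMMITTED" : "ABORTED";
        // 4.2: deliver the outcome to every partition involved in the transaction.
        for (String partition : partitions) {
            trace.add("notify " + partition + " txn=" + txnId + " " + outcome);
        }
        // 4.3: only then write the final status to the transaction log.
        trace.add("log txn=" + txnId + " " + outcome);
    }
}
```

In this model the log entry is always the last element of the trace, so a transaction is never recorded as complete before its partitions have been told the outcome.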