title: “Row Tracking” weight: 5 type: docs aliases:
Row tracking allows Paimon to track row-level tracking in a Paimon append table. Once enabled on a Paimon table, two more hidden columns will be added to the table schema:
_ROW_ID: BIGINT, this is a unique identifier for each row in the table. It is used to track the update of the row and can be used to identify the row in case of update, merge into or delete._SEQUENCE_NUMBER: BIGINT, this is field indicates which version of this record is. It actually is the snapshot-id of the snapshot that this row belongs to. It is used to track the update of the row version.Hidden columns follows the following rules:
_ROW_ID and _SEQUENCE_NUMBER will be NOT NULL._ROW_ID column should be copied to the target file. The _SEQUENCE_NUMBER field should be set to NULL if the record is changed, otherwise, copy it too._ROW_ID and _SEQUENCE_NUMBER from the data file, then we read the value columns from the data file. If they found NULL, we read from DataFileMeta to fall back to the lazy assigned values. Anyway, it has no way to be NULL.To enable row-tracking, you must config row-tracking.enabled to true in the table options when creating an append table. Consider an example via Flink SQL:
CREATE TABLE part_t ( f0 INT, f1 STRING, dt STRING ) PARTITIONED BY (dt) WITH ('row-tracking.enabled' = 'true');
Notice that:
bucket and bucket-key for the table.After creating a row-tracking table, you can insert data into it as usual. The _ROW_ID and _SEQUENCE_NUMBER columns will be automatically managed by Paimon.
CREATE TABLE t (id INT, data STRING) TBLPROPERTIES ('row-tracking.enabled' = 'true'); INSERT INTO t VALUES (11, 'a'), (22, 'b')
You can select the row tracking meta column with the following sql in spark:
SELECT id, data, _ROW_ID, _SEQUENCE_NUMBER FROM t;
You will get the following result:
+---+----+-------+----------------+ | id|data|_ROW_ID|_SEQUENCE_NUMBER| +---+----+-------+----------------+ | 11| a| 0| 1| | 22| b| 1| 1| +---+----+-------+----------------+
Then you can update and query the table again:
UPDATE t SET data = 'new-data-update' WHERE id = 11; -- Alternatively, update using the hidden row id `_ROW_ID` UPDATE t SET data = 'new-data-update' WHERE _ROW_ID = 0; SELECT id, data, _ROW_ID, _SEQUENCE_NUMBER FROM t;
You will get:
+---+---------------+-------+----------------+ | id| data|_ROW_ID|_SEQUENCE_NUMBER| +---+---------------+-------+----------------+ | 22| b| 1| 1| | 11|new-data-update| 0| 2| +---+---------------+-------+----------------+
You can also merge into the table, suppose you have a source table s that contains (22, ‘new-data-merge’) and (33, ‘c’):
MERGE INTO t USING s ON t.id = s.id WHEN MATCHED THEN UPDATE SET t.data = s.data WHEN NOT MATCHED THEN INSERT *;
You will get:
+---+---------------+-------+----------------+ | id| data|_ROW_ID|_SEQUENCE_NUMBER| +---+---------------+-------+----------------+ | 11|new-data-update| 0| 2| | 22| new-data-merge| 1| 3| | 33| c| 2| 3| +---+---------------+-------+----------------+
You can also delete from the table:
DELETE FROM t WHERE id = 11; -- Alternatively, delete using the hidden row id `_ROW_ID` DELETE FROM t WHERE _ROW_ID = 0;
You will get:
+---+---------------+-------+----------------+ | id| data|_ROW_ID|_SEQUENCE_NUMBER| +---+---------------+-------+----------------+ | 22| new-data-merge| 1| 3| | 33| c| 2| 3| +---+---------------+-------+----------------+