This document explains how to use the UPDATE command to modify data in Doris. The UPDATE command is only applicable to tables with a Unique data model.
Small-scale data updates: Ideal for scenarios where a small amount of data needs to be corrected, such as fixing erroneous fields in certain records or updating the status of specific fields (e.g., order status updates).
ETL batch processing of certain fields: Suitable for large-scale updates of a specific field, commonly seen in ETL processing scenarios. Note: Large-scale data updates should be infrequent.
The query engine uses its own filtering logic to identify the rows that need to be updated. Then, using the Unique model's Value column logic to replace old data with new data. The rows to be updated are modified and reinserted into the table to achieve row-level updates.
The UPDATE syntax in Doris is synchronous, meaning that once the UPDATE statement is successfully executed, the update operation is completed and the data is immediately visible.
The performance of the UPDATE statement is closely related to the number of rows to be updated and the efficiency of the query conditions.
Number of rows to be updated: The more rows that need to be updated, the slower the UPDATE statement will be. For small-scale updates, Doris supports a frequency similar to INSERT INTO statements. For large-scale updates, due to the long execution time, it is only suitable for infrequent calls.
Efficiency of query conditions: The UPDATE implementation first reads the rows that meet the query conditions. Therefore, if the query conditions are efficient, the UPDATE speed will be fast. Ideally, the condition columns should hit the index or partition bucket pruning, so Doris does not need to scan the entire table and can quickly locate the rows that need to be updated, thereby improving update efficiency. It is strongly recommended not to include value columns in the condition columns.
Assume there is a transaction details table with the following structure in a financial risk control scenario:
CREATE TABLE transaction_details ( transaction_id BIGINT NOT NULL, -- Unique transaction ID user_id BIGINT NOT NULL, -- User ID transaction_date DATE NOT NULL, -- Transaction date transaction_time DATETIME NOT NULL, -- Transaction time transaction_amount DECIMAL(18, 2), -- Transaction amount transaction_device STRING, -- Transaction device transaction_region STRING, -- Transaction region average_daily_amount DECIMAL(18, 2), -- Average daily transaction amount over the last 3 months recent_transaction_count INT, -- Number of transactions in the last 7 days has_dispute_history BOOLEAN, -- Whether there is a dispute history risk_level STRING -- Risk level ) UNIQUE KEY(transaction_id) DISTRIBUTED BY HASH(transaction_id) BUCKETS 16 PROPERTIES ( "replication_num" = "3", -- Number of replicas, default is 3 "enable_unique_key_merge_on_write" = "true" -- Enable MOW mode, support merge updates );
The following transaction data exists:
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ | transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | +----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ | 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL | | 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL | | 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL | | 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL | | 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL | +----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
Update the risk level of all daily transaction records according to the following risk control rules:
UPDATE transaction_details SET risk_level = CASE -- Transactions with a dispute history or in high-risk regions WHEN has_dispute_history = TRUE THEN 'high' WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high' -- Abnormal transaction amount WHEN transaction_amount > 5 * average_daily_amount THEN 'high' -- High transaction frequency in the last 7 days WHEN recent_transaction_count > 50 THEN 'high' WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium' -- Transactions during non-working hours WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium' -- Default risk level ELSE 'low' END WHERE transaction_date = '2024-11-24';
The updated data is as follows:
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ | transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | +----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ | 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low | | 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium | | 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium | | 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high | | 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high | +----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
For more detailed syntax on data updates, please refer to the UPDATE command manual. You can also enter HELP UPDATE in the MySQL client command line for more help.