blob: b7a89228b98db8ae3442d862885062ea08e11f2c [file] [log] [blame] [view]
---
{
"title": "Deleting Data with DELETE Command",
"language": "en"
}
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
The DELETE statement conditionally deletes data from a specified table or partition using the MySQL protocol.The Delete operation differs from import-based bulk deletion in that it is similar to the INSERT INTO statement, which is a synchronous process.All Delete operations are a separate import job in Doris.
The DELETE statement generally requires the specification of tables and partitions as well as deletion conditions to filter the data to be deleted, and will delete data from both the base and rollup tables.
The syntax of the DELETE statement is detailed in the [DELETE](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/DELETE) syntax. Unlike the Insert into command, Delete cannot specify `label` manually. For the concept of `label` , refer to the [Insert Into](../../data-operate/import/insert-into-manual) documentation.
### Delete by Specifying a Filter Predicate
```sql
DELETE FROM table_name [table_alias]
[PARTITION partition_name | PARTITIONS (partition_name [, partition_name])]
WHERE column_name op { value | value_list } [ AND column_name op { value | value_list } ...];
```
### Required Parameters
- table_name: Specify the table from which the data should be deleted;
- column_name: Columns belonging to table_name
- op: Logical comparison operators, optional types include: =, >, <, >=, <=, !=, in, not in
- value | value_list: Values or lists of values for logical comparisons
### Optional Parameters
- PARTITION partition_name | PARTITIONS (partition_name [, partition_name]): Specify the name of the partition in which the deletion is to be performed. If the partition does not exist in the table, an error will be reported.
- table_alias: Aliases of the Table
### Note
- When using the table model Aggregate, you can only specify conditions on the key column.
- If the selected key column does not exist in a rollup, it cannot be deleted.
- Conditions can only be related to each other by "and". If you want an "or" relationship, you need to write the conditions in two separate DELETE statements;
- If the table is partitioned, you need to specify the partition. If not, doris will infer the partition from the condition.In two cases, doris cannot infer the partition from the condition:
- The condition does not contain a partition column
- The op for the partition column is "not in". When the partition table does not specify a partition, or a partition cannot be inferred from the condition, you need to set the session variable `delete_without_partition` to true, in which case delete is applied to all partitions.
- This statement may reduce query efficiency for a period of time after execution. The extent of the impact depends on the number of deleted conditions specified in the statement. The more conditions specified, the greater the impact.
### Examples
**1. Delete the row in my_table partition p1 where column k1 is 3.**
```sql
DELETE FROM my_table PARTITION p1
WHERE k1 = 3;
```
**2. Delete rows in my_table partition p1 where column k1 is greater than or equal to 3 and column k2 is "abc".**
```sql
DELETE FROM my_table PARTITION p1
WHERE k1 = 3 AND k2 = "abc";
```
**3. Delete rows in my_table partition (p1, p2) where column k1 is greater than or equal to 3 and column k2 is "abc".**
```sql
DELETE FROM my_table PARTITIONS (p1, p2)
WHERE k1 = 3 AND k2 = "abc";
```
## Delete via the USING clause
```sql
DELETE FROM table_name [table_alias]
[PARTITION partition_name | PARTITIONS (partition_name [, partition_name])]
[USING additional_tables]
WHERE condition
```
### Required parameters
- table_name: Specify the table from which the data should be deleted;
- WHERE condition: Specify a condition for selecting rows for deletion;
### Optional parameters
- PARTITION partition_name | PARTITIONS (partition_name [, partition_name]): Specify the name of the partition in which the deletion is to be performed. If the partition does not exist in the table, an error will be reported.
- table_alias: Aliases of the Table
### Note
- Only conditions on the key column can be specified when using the UNIQUE model.
### Example
Use the result of joining the `t2` and `t3` tables to delete the data in `t1`. The deleted table only supports the UNIQUE model.
```sql
-- Create t1, t2, t3 tables
CREATE TABLE t1
(id INT, c1 BIGINT, c2 STRING, c3 DOUBLE, c4 DATE)
UNIQUE KEY (id)
DISTRIBUTED BY HASH (id)
PROPERTIES('replication_num'='1', "function_column.sequence_col" = "c4");
CREATE TABLE t2
(id INT, c1 BIGINT, c2 STRING, c3 DOUBLE, c4 DATE)
DISTRIBUTED BY HASH (id)
PROPERTIES('replication_num'='1');
CREATE TABLE t3
(id INT)
DISTRIBUTED BY HASH (id)
PROPERTIES('replication_num'='1');
-- insert data
INSERT INTO t1 VALUES
(1, 1, '1', 1.0, '2000-01-01'),
(2, 2, '2', 2.0, '2000-01-02'),
(3, 3, '3', 3.0, '2000-01-03');
INSERT INTO t2 VALUES
(1, 10, '10', 10.0, '2000-01-10'),
(2, 20, '20', 20.0, '2000-01-20'),
(3, 30, '30', 30.0, '2000-01-30'),
(4, 4, '4', 4.0, '2000-01-04'),
(5, 5, '5', 5.0, '2000-01-05');
INSERT INTO t3 VALUES
(1),
(4),
(5);
-- remove rows from t1
DELETE FROM t1
USING t2 INNER JOIN t3 ON t2.id = t3.id
WHERE t1.id = t2.id;
```
The expected result is that the column with `id=1` in table `t1` is deleted.
```Plain
+----+----+----+--------+------------+
| id | c1 | c2 | c3 | c4 |
+----+----+----+--------+------------+
| 2 | 2 | 2 | 2.0 | 2000-01-02 |
| 3 | 3 | 3 | 3.0 | 2000-01-03 |
+----+----+----+--------+------------+
```
## Returned Results
Delete command is a SQL command that return results synchronously. The results are classified as follows:
### Implementation Success
If Delete completes successfully and is visible, the following results are returned.`Query OK`indicates success.
```sql
mysql> delete from test_tbl PARTITION p1 where k1 = 1;
Query OK, 0 rows affected (0.04 sec)
{'label':'delete_e7830c72-eb14-4cb9-bbb6-eebd4511d251', 'status':'VISIBLE', 'txnId':'4005'}
```
### Submitted Successfully but Invisible
Doris transaction commit is divided into two steps: commit and release version, only after the completion of the release version step, the results will be visible to the user.
If the commit has been successful, then it can be assumed that it will eventually be published successfully, Doris will try to wait for a certain period of time after the commit is completed, if the timeout period is exceeded even if the published version is not yet complete, it will be preferred to return to the user, prompting the user that the commit has been completed.
If Delete has been submitted and executed, but the release version is still not published and visible, the following result will be returned:
```sql
mysql> delete from test_tbl PARTITION p1 where k1 = 1;
Query OK, 0 rows affected (0.04 sec)
{'label':'delete_e7830c72-eb14-4cb9-bbb6-eebd4511d251', 'status':'COMMITTED', 'txnId':'4005', 'err':'delete job is committed but may be taking effect later' }
```
The result will also return a json string:
- `affected rows`Indicates the rows affected by this deletion. Since Doris deletion is currently a logical deletion, this value is constant at 0;
- `label`The automatically generated label identifies the import job. Each import job has a Label that is unique within a single database;
- `status`Indicates whether the data deletion is visible. If it's visible, the result displays `VISIBLE`; if it's invisible, the result displays `COMMITTED`;
- `txnId`The transaction id corresponding to Delete;
- `err`This field will display the details of Delete.
### Commit Failed, Transaction Cancelled
If the Delete statement fails to commit, the transaction will be automatically aborted by Doris and the following result will be returned:
```sql
mysql> delete from test_tbl partition p1 where k1 > 80;
ERROR 1064 (HY000): errCode = 2, detailMessage = {Cause of error}
```
For example, a timeout deletion will return the timeout time and the outstanding `(tablet=replica)`
```sql
mysql> delete from test_tbl partition p1 where k1 > 80;
ERROR 1064 (HY000): errCode = 2, detailMessage = failed to delete replicas from job: 4005, Unfinished replicas:10000=60000, 10001=60000, 10002=60000
```
### Summary
The correct logic for handling the results returned by Delete is:
- If returns `ERROR 1064 (HY000)` , the deletion failed;
- If returns`Query OK`, the deletion is successful;
- If `status` is `COMMITTED`, it means that the data is still not visible, users can wait for a while and then check the result with `show delete`;
- If `STATUS` is `VISIBLE`, the deletion is successful.
## Configurations
**TIMEOUT Configurations**
- `insert_timeout`
Because delete itself is a SQL command and treated as a special kind of insert, the delete statement also subject to session limitations. Timeout is determined by the `insert_timeout` value in the session, which can be increased in seconds by `SET insert_timeout = xxx`.
**IN Predicate Configuration**
- `max_allowed_in_element_num_of_delete`
If the user needs to occupy more elements when using the in predicate, the user can adjust the maximum number of elements allowed to be carried by `max_allowed_in_element_num_of_delete`. The default value is 1024.
## View History
Users can view the history of deletions that have been performed by using the show delete statement.
### Syntax
```sql
SHOW DELETE [FROM db_name]
```
### Example
```sql
mysql> show delete from test_db;
+-----------+---------------+---------------------+-----------------+----------+
| TableName | PartitionName | CreateTime | DeleteCondition | State |
+-----------+---------------+---------------------+-----------------+----------+
| empty_tbl | p3 | 2020-04-15 23:09:35 | k1 EQ "1" | FINISHED |
| test_tbl | p4 | 2020-04-15 23:09:53 | k1 GT "80" | FINISHED |
+-----------+---------------+---------------------+-----------------+----------+
2 rows in set (0.00 sec)
```