docs/content/spark/sql-upsert.md - paimon - Git at Google

 ---
 title: "SQL Upsert"
 weight: 10
 type: docs
 aliases:
 - /spark/sql-upsert.html
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->

 # SQL Upsert

 For table without primary key, Paimon supports upsert write mode: If the row with the same upsert key already exists, perform update; otherwise, perform insert.

 ## Usage

 Specify the following table properties when creating the table

 * `upsert-key`: Defines the key columns used for upsert, cannot be used together with primary key.
 Unlike primary key, the upsert key value can be `null`, and null-equality matching is supported.
 Multiple columns separated by commas.

 * `sequence.field` (optional): When new record share the same upsert key, the row with the larger `sequence.field` value is kept as the merge result.
 And it will also deduplicate the data being written.
 If `sequence.field` is not set, new record share the same upsert key simply update the existing one and no deduplication is performed.
 Multiple columns separated by commas.

 ## Example

 Create table:

 ```sql
 CREATE TABLE t (k1 INT, k2 INT, ts1 INT, ts2 INT, v STRING)
 TBLPROPERTIES ('upsert-key' = 'k1,k2', 'sequence.field' = 'ts1,ts2')
 ```

 Insert data1:

 ```sql
 INSERT INTO t values
 (null, null, 2, 1, 'v1'),
 (null, null, 2, 2, 'v4'),
 (1, null, 1, 1, 'v1'),
 (1, 2, 1, 1, 'v1'),
 (1, 2, 2, 1, 'v2')
 ```

 Query result:

 ```sql
 SELECT * FROM t ORDER BY k1, k2

 -- null, null, 2, 2, "v4"
 -- 1, null, 1, 1, "v1"
 -- 1, 2, 2, 1, "v2"
 ```

 Insert data2:

 ```sql
 INSERT INTO t values
 (null, null, 2, 1, 'v5'),
 (null, 1, 1, 1, 'v1'),
 (1, null, 2, 1, 'v2'),
 (1, 1, 1, 1, 'v1'),
 (1, 2, 2, 0, 'v3')
 ```

 Query result:

 ```sql
 SELECT * FROM t ORDER BY k1, k2

 -- null, null, 2, 2, "v4"
 -- null, 1, 1, 1, "v1"
 -- 1, null, 2, 1, "v2"
 -- 1, 1, 1, 1, "v1"
 -- 1, 2, 2, 1, "v2"
 ```
	---
	title: "SQL Upsert"
	weight: 10
	type: docs
	aliases:
	- /spark/sql-upsert.html
	---
	<!--
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->

	# SQL Upsert

	For table without primary key, Paimon supports upsert write mode: If the row with the same upsert key already exists, perform update; otherwise, perform insert.

	## Usage

	Specify the following table properties when creating the table

	* `upsert-key`: Defines the key columns used for upsert, cannot be used together with primary key.
	Unlike primary key, the upsert key value can be `null`, and null-equality matching is supported.
	Multiple columns separated by commas.

	* `sequence.field` (optional): When new record share the same upsert key, the row with the larger `sequence.field` value is kept as the merge result.
	And it will also deduplicate the data being written.
	If `sequence.field` is not set, new record share the same upsert key simply update the existing one and no deduplication is performed.
	Multiple columns separated by commas.

	## Example

	Create table:

	```sql
	CREATE TABLE t (k1 INT, k2 INT, ts1 INT, ts2 INT, v STRING)
	TBLPROPERTIES ('upsert-key' = 'k1,k2', 'sequence.field' = 'ts1,ts2')
	```

	Insert data1:

	```sql
	INSERT INTO t values
	(null, null, 2, 1, 'v1'),
	(null, null, 2, 2, 'v4'),
	(1, null, 1, 1, 'v1'),
	(1, 2, 1, 1, 'v1'),
	(1, 2, 2, 1, 'v2')
	```

	Query result:

	```sql
	SELECT * FROM t ORDER BY k1, k2

	-- null, null, 2, 2, "v4"
	-- 1, null, 1, 1, "v1"
	-- 1, 2, 2, 1, "v2"
	```

	Insert data2:

	```sql
	INSERT INTO t values
	(null, null, 2, 1, 'v5'),
	(null, 1, 1, 1, 'v1'),
	(1, null, 2, 1, 'v2'),
	(1, 1, 1, 1, 'v1'),
	(1, 2, 2, 0, 'v3')
	```

	Query result:

	```sql
	SELECT * FROM t ORDER BY k1, k2

	-- null, null, 2, 2, "v4"
	-- null, 1, 1, 1, "v1"
	-- 1, null, 2, 1, "v2"
	-- 1, 1, 1, 1, "v1"
	-- 1, 2, 2, 1, "v2"
	```