| --- |
| title: "SQL Functions" |
| weight: 2 |
| type: docs |
| aliases: |
| - /spark/sql-functions.html |
| --- |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| # SQL Functions |
| |
This section introduces all available Paimon Spark functions.
| |
## Built-in Functions
| |
| ### max_pt |
| |
| `sys.max_pt($table_name)` |
| |
It accepts a string literal that specifies the table name and returns the maximum valid top-level partition value.
- **valid**: only partitions that contain data files are considered
- **top-level**: if the table has multiple partition columns, only the value of the first partition column is returned
| |
It throws an exception when:
- the table is not a partitioned table
- the partitioned table has no partitions
- none of the partitions contain data files
| |
| **Example** |
| |
| ```sql |
| SELECT sys.max_pt('t'); |
| -- 20250101 |
| |
SELECT * FROM t WHERE pt = sys.max_pt('t');
| -- a, 20250101 |
| ``` |
| |
| ### path_to_descriptor |
| |
| `sys.path_to_descriptor($file_path)` |
| |
| Converts a file path (STRING) to a blob descriptor (BINARY). This function is useful when working with blob data stored in external files. It creates a blob descriptor that references the file at the specified path. |
| |
| **Arguments:** |
| - `file_path` (STRING): The path to the external file containing the blob data. |
| |
| **Returns:** |
| - A BINARY value representing the serialized blob descriptor. |
| |
| **Example** |
| |
| ```sql |
| -- Insert blob data using path_to_descriptor function |
| INSERT INTO t VALUES ('1', 'paimon', sys.path_to_descriptor('file:///path/to/blob_file')); |
| |
| -- Insert with partition |
| INSERT OVERWRITE TABLE t PARTITION(ds='1017', batch='test') |
| VALUES ('1', 'paimon', '1024', '12345678', '20241017', sys.path_to_descriptor('file:///path/to/blob_file')); |
| ``` |
| |
| ### descriptor_to_string |
| |
| `sys.descriptor_to_string($descriptor)` |
| |
| Converts a blob descriptor (BINARY) to its string representation (STRING). This function is useful for debugging or displaying the contents of a blob descriptor in a human-readable format. |
| |
| **Arguments:** |
| - `descriptor` (BINARY): The blob descriptor bytes to convert. |
| |
| **Returns:** |
| - A STRING representation of the blob descriptor. |
| |
| **Example** |
| |
| ```sql |
| -- Convert a blob descriptor to string for inspection |
| SELECT sys.descriptor_to_string(content) FROM t WHERE id = '1'; |
| ``` |
| |
## User-defined Functions
| |
| Paimon Spark supports two types of user-defined functions: lambda functions and file-based functions. |
| |
| This feature currently only supports the REST catalog. |
| |
| ### Lambda Function |
| |
Lambda functions allow users to define functions with Java lambda expressions, enabling inline, concise, and functional-style operations.
| |
| **Example** |
| |
| ```sql |
| -- Create Function |
| CALL sys.create_function(`function` => 'my_db.area_func', |
| `inputParams` => '[{"id": 0, "name":"length", "type":"INT"}, {"id": 1, "name":"width", "type":"INT"}]', |
| `returnParams` => '[{"id": 0, "name":"area", "type":"BIGINT"}]', |
| `deterministic` => true, |
| `comment` => 'comment', |
| `options` => 'k1=v1,k2=v2' |
| ); |
| |
| -- Alter Function |
| CALL sys.alter_function(`function` => 'my_db.area_func', |
| `change` => '{"action" : "addDefinition", "name" : "spark", "definition" : {"type" : "lambda", "definition" : "(Integer length, Integer width) -> { return (long) length * width; }", "language": "JAVA" } }' |
| ); |
| |
| -- Drop Function |
| CALL sys.drop_function(`function` => 'my_db.area_func'); |
| ``` |
| |
| ### File Function |
| |
Users can define functions in a file, which provides flexibility and modular support for function definitions. Only jar files are supported for now.
| |
Currently, Spark and Hive implementations of UDFs and UDAFs are supported; see [Spark UDFs](https://spark.apache.org/docs/latest/sql-ref-functions.html#udfs-user-defined-functions).
| |
| This feature requires Spark 3.4 or higher. |
| |
| **Example** |
| |
| ```sql |
-- Create Function or Temporary Function (a temporary function should not specify a database name)
| CREATE [TEMPORARY] FUNCTION <mydb>.simple_udf |
| AS 'com.example.SimpleUdf' |
| USING JAR '/tmp/SimpleUdf.jar' [, JAR '/tmp/SimpleUdfR.jar']; |
| |
-- Create or Replace Function or Temporary Function (a temporary function should not specify a database name)
| CREATE OR REPLACE [TEMPORARY] FUNCTION <mydb>.simple_udf |
| AS 'com.example.SimpleUdf' |
| USING JAR '/tmp/SimpleUdf.jar'; |
| |
| -- Describe Function |
| DESCRIBE FUNCTION [EXTENDED] <mydb>.simple_udf; |
| |
| -- Drop Function |
| DROP [TEMPORARY] FUNCTION <mydb>.simple_udf; |
| ``` |