blob: bf04badf7515281377711e7283c33fc485522209 [file] [log] [blame] [view]
---
title: "SQL Functions"
weight: 2
type: docs
aliases:
- /spark/sql-functions.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# SQL Functions
This section introduce all available Paimon Spark functions.
## Built-in Function
### max_pt
`sys.max_pt($table_name)`
It accepts a string type literal to specify the table name and return a max-valid-toplevel partition value.
- **valid**: the partition which contains data files
- **toplevel**: only return the first partition value if the table has multi-partition columns
It would throw exception when:
- the table is not a partitioned table
- the partitioned table does not have partition
- all of the partitions do not contains data files
**Example**
```sql
SELECT sys.max_pt('t');
-- 20250101
SELECT * FROM t where pt = sys.max_pt('t');
-- a, 20250101
```
### path_to_descriptor
`sys.path_to_descriptor($file_path)`
Converts a file path (STRING) to a blob descriptor (BINARY). This function is useful when working with blob data stored in external files. It creates a blob descriptor that references the file at the specified path.
**Arguments:**
- `file_path` (STRING): The path to the external file containing the blob data.
**Returns:**
- A BINARY value representing the serialized blob descriptor.
**Example**
```sql
-- Insert blob data using path_to_descriptor function
INSERT INTO t VALUES ('1', 'paimon', sys.path_to_descriptor('file:///path/to/blob_file'));
-- Insert with partition
INSERT OVERWRITE TABLE t PARTITION(ds='1017', batch='test')
VALUES ('1', 'paimon', '1024', '12345678', '20241017', sys.path_to_descriptor('file:///path/to/blob_file'));
```
### descriptor_to_string
`sys.descriptor_to_string($descriptor)`
Converts a blob descriptor (BINARY) to its string representation (STRING). This function is useful for debugging or displaying the contents of a blob descriptor in a human-readable format.
**Arguments:**
- `descriptor` (BINARY): The blob descriptor bytes to convert.
**Returns:**
- A STRING representation of the blob descriptor.
**Example**
```sql
-- Convert a blob descriptor to string for inspection
SELECT sys.descriptor_to_string(content) FROM t WHERE id = '1';
```
## User-defined Function
Paimon Spark supports two types of user-defined functions: lambda functions and file-based functions.
This feature currently only supports the REST catalog.
### Lambda Function
Empowering users to define functions using Java lambda expressions, enabling inline, concise, and functional-style operations.
**Example**
```sql
-- Create Function
CALL sys.create_function(`function` => 'my_db.area_func',
`inputParams` => '[{"id": 0, "name":"length", "type":"INT"}, {"id": 1, "name":"width", "type":"INT"}]',
`returnParams` => '[{"id": 0, "name":"area", "type":"BIGINT"}]',
`deterministic` => true,
`comment` => 'comment',
`options` => 'k1=v1,k2=v2'
);
-- Alter Function
CALL sys.alter_function(`function` => 'my_db.area_func',
`change` => '{"action" : "addDefinition", "name" : "spark", "definition" : {"type" : "lambda", "definition" : "(Integer length, Integer width) -> { return (long) length * width; }", "language": "JAVA" } }'
);
-- Drop Function
CALL sys.drop_function(`function` => 'my_db.area_func');
```
### File Function
Users can define functions within a file, providing flexibility and modular support for function definition, only supports jar files now.
Currently, supports Spark or Hive implementations of UDFs and UDAFs, see [Spark UDFs](https://spark.apache.org/docs/latest/sql-ref-functions.html#udfs-user-defined-functions)
This feature requires Spark 3.4 or higher.
**Example**
```sql
-- Create Function or Temporary Function (Temporary function should not specify database name)
CREATE [TEMPORARY] FUNCTION <mydb>.simple_udf
AS 'com.example.SimpleUdf'
USING JAR '/tmp/SimpleUdf.jar' [, JAR '/tmp/SimpleUdfR.jar'];
-- Create or Replace Temporary Function (Temporary function should not specify database name)
CREATE OR REPLACE [TEMPORARY] FUNCTION <mydb>.simple_udf
AS 'com.example.SimpleUdf'
USING JAR '/tmp/SimpleUdf.jar';
-- Describe Function
DESCRIBE FUNCTION [EXTENDED] <mydb>.simple_udf;
-- Drop Function
DROP [TEMPORARY] FUNCTION <mydb>.simple_udf;
```