absReturns the absolute value of a number.
abs(numeric_expression)
acosReturns the arc cosine or inverse cosine of a number.
acos(numeric_expression)
acoshReturns the area hyperbolic cosine or inverse hyperbolic cosine of a number.
acosh(numeric_expression)
asinReturns the arc sine or inverse sine of a number.
asin(numeric_expression)
asinhReturns the area hyperbolic sine or inverse hyperbolic sine of a number.
asinh(numeric_expression)
atanReturns the arc tangent or inverse tangent of a number.
atan(numeric_expression)
atanhReturns the area hyperbolic tangent or inverse hyperbolic tangent of a number.
atanh(numeric_expression)
atan2Returns the arc tangent or inverse tangent of expression_y / expression_x.
atan2(expression_y, expression_x)
cbrtReturns the cube root of a number.
cbrt(numeric_expression)
ceilReturns the nearest integer greater than or equal to a number.
ceil(numeric_expression)
cosReturns the cosine of a number.
cos(numeric_expression)
coshReturns the hyperbolic cosine of a number.
cosh(numeric_expression)
degreesConverts radians to degrees.
degrees(numeric_expression)
expReturns the base-e exponential of a number.
exp(numeric_expression)
factorialFactorial. Returns 1 if value is less than 2.
factorial(numeric_expression)
floorReturns the nearest integer less than or equal to a number.
floor(numeric_expression)
gcdReturns the greatest common divisor of expression_x and expression_y. Returns 0 if both inputs are zero.
gcd(expression_x, expression_y)
isnanReturns true if a given number is +NaN or -NaN otherwise returns false.
isnan(numeric_expression)
iszeroReturns true if a given number is +0.0 or -0.0 otherwise returns false.
iszero(numeric_expression)
lcmReturns the least common multiple of expression_x and expression_y. Returns 0 if either input is zero.
lcm(expression_x, expression_y)
lnReturns the natural logarithm of a number.
ln(numeric_expression)
logReturns the base-x logarithm of a number. Can either provide a specified base, or if omitted then takes the base-10 of a number.
log(base, numeric_expression) log(numeric_expression)
log10Returns the base-10 logarithm of a number.
log10(numeric_expression)
log2Returns the base-2 logarithm of a number.
log2(numeric_expression)
nanvlReturns the first argument if it's not NaN. Returns the second argument otherwise.
nanvl(expression_x, expression_y)
piReturns an approximate value of π.
pi()
powerReturns a base expression raised to the power of an exponent.
power(base, exponent)
powAlias of power.
radiansConverts degrees to radians.
radians(numeric_expression)
randomReturns a random float value in the range [0, 1). The random seed is unique to each row.
random()
roundRounds a number to the nearest integer.
round(numeric_expression[, decimal_places])
signumReturns the sign of a number. Negative numbers return -1. Zero and positive numbers return 1.
signum(numeric_expression)
sinReturns the sine of a number.
sin(numeric_expression)
sinhReturns the hyperbolic sine of a number.
sinh(numeric_expression)
sqrtReturns the square root of a number.
sqrt(numeric_expression)
tanReturns the tangent of a number.
tan(numeric_expression)
tanhReturns the hyperbolic tangent of a number.
tanh(numeric_expression)
truncTruncates a number to a whole number or truncated to the specified decimal places.
trunc(numeric_expression[, decimal_places])
numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
decimal_places: Optional. The number of decimal places to truncate to. Defaults to 0 (truncate to a whole number). If decimal_places is a positive integer, truncates digits to the right of the decimal point. If decimal_places is a negative integer, replaces digits to the left of the decimal point with 0.
coalesceReturns the first of its arguments that is not null. Returns null if all arguments are null. This function is often used to substitute a default value for null values.
coalesce(expression1[, ..., expression_n])
nullifReturns null if expression1 equals expression2; otherwise it returns expression1. This can be used to perform the inverse operation of coalesce.
nullif(expression1, expression2)
asciiReturns the ASCII value of the first character in a string.
ascii(str)
Related functions: chr
bit_lengthReturns the bit length of a string.
bit_length(str)
Related functions: length, octet_length
btrimTrims the specified trim string from the start and end of a string. If no trim string is provided, all whitespace is removed from the start and end of the input string.
btrim(str[, trim_str])
Related functions: ltrim, rtrim, trim
char_lengthAlias of length.
character_lengthAlias of length.
concatConcatenates multiple strings together.
concat(str[, ..., str_n])
Related functions: concat_ws
concat_wsConcatenates multiple strings together with a specified separator.
concat(separator, str[, ..., str_n])
Related functions: concat
chrReturns the character with the specified ASCII or Unicode code value.
chr(expression)
Related functions: ascii
ends_withTests if a string ends with a substring.
ends_with(str, substr)
initcapCapitalizes the first character in each word in the input string. Words are delimited by non-alphanumeric characters.
initcap(str)
Related functions: lower, upper
instrReturns the location where substr first appeared in str (counting from 1). If substr does not appear in str, return 0.
instr(str, substr)
leftReturns a specified number of characters from the left side of a string.
left(str, n)
Related functions: right
lengthReturns the number of characters in a string.
length(str)
Related functions: bit_length, octet_length
lowerConverts a string to lower-case.
lower(str)
Related functions: initcap, upper
lpadPads the left side of a string with another string to a specified string length.
lpad(str, n[, padding_str])
Related functions: rpad
ltrimRemoves leading spaces from a string.
ltrim(str)
Related functions: btrim, rtrim, trim
octet_lengthReturns the length of a string in bytes.
octet_length(str)
Related functions: bit_length, length
repeatReturns a string with an input string repeated a specified number.
repeat(str, n)
replaceReplaces all occurrences of a specified substring in a string with a new substring.
replace(str, substr, replacement)
reverseReverses the character order of a string.
reverse(str)
rightReturns a specified number of characters from the right side of a string.
right(str, n)
Related functions: left
rpadPads the right side of a string with another string to a specified string length.
rpad(str, n[, padding_str])
Related functions: lpad
rtrimRemoves trailing spaces from a string.
rtrim(str)
Related functions: btrim, ltrim, trim
split_partSplits a string based on a specified delimiter and returns the substring in the specified position.
split_part(str, delimiter, pos)
starts_withTests if a string starts with a substring.
starts_with(str, substr)
strposReturns the starting position of a specified substring in a string. Positions begin at 1. If the substring does not exist in the string, the function returns 0.
strpos(str, substr)
substrExtracts a substring of a specified number of characters from a specific starting position in a string.
substr(str, start_pos[, length])
translateTranslates characters in a string to specified translation characters.
translate(str, chars, translation)
to_hexConverts an integer to a hexadecimal string.
to_hex(int)
trimRemoves leading and trailing spaces from a string.
trim(str)
Related functions: btrim, ltrim, rtrim
upperConverts a string to upper-case.
upper(str)
Related functions: initcap, lower
uuidReturns UUID v4 string value which is unique per row.
uuid()
overlayReturns the string which is replaced by another string from the specified position and specified count length. For example, overlay('Txxxxas' placing 'hom' from 2 for 4) → Thomas
overlay(str PLACING substr FROM pos [FOR count])
levenshteinReturns the Levenshtein distance between the two given strings. For example, levenshtein('kitten', 'sitting') = 3
levenshtein(str1, str2)
substr_indexReturns the substring from str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. For example, substr_index('www.apache.org', '.', 1) = www, substr_index('www.apache.org', '.', -1) = org
substr_index(str, delim, count)
find_in_setReturns a value in the range of 1 to N if the string str is in the string list strlist consisting of N substrings. For example, find_in_set('b', 'a,b,c,d') = 2
find_in_set(str, strlist)
encodeEncode binary data into a textual representation.
encode(expression, format)
expression: Expression containing string or binary data
format: Supported formats are: base64, hex
Related functions: decode
decodeDecode binary data from textual representation in string.
decode(expression, format)
expression: Expression containing encoded string data
format: Same arguments as encode
Related functions: encode
Apache DataFusion uses a PCRE-like regular expression syntax (minus support for several features including look-around and backreferences). The following regular expression functions are supported:
regexp_likeReturns true if a regular expression has at least one match in a string, false otherwise.
regexp_like(str, regexp[, flags])
select regexp_like('Köln', '[a-zA-Z]ö[a-zA-Z]{2}'); +--------------------------------------------------------+ | regexp_like(Utf8("Köln"),Utf8("[a-zA-Z]ö[a-zA-Z]{2}")) | +--------------------------------------------------------+ | true | +--------------------------------------------------------+ SELECT regexp_like('aBc', '(b|d)', 'i'); +--------------------------------------------------+ | regexp_like(Utf8("aBc"),Utf8("(b|d)"),Utf8("i")) | +--------------------------------------------------+ | true | +--------------------------------------------------+
Additional examples can be found here
regexp_matchReturns a list of regular expression matches in a string.
regexp_match(str, regexp[, flags])
select regexp_match('Köln', '[a-zA-Z]ö[a-zA-Z]{2}'); +---------------------------------------------------------+ | regexp_match(Utf8("Köln"),Utf8("[a-zA-Z]ö[a-zA-Z]{2}")) | +---------------------------------------------------------+ | [Köln] | +---------------------------------------------------------+ SELECT regexp_match('aBc', '(b|d)', 'i'); +---------------------------------------------------+ | regexp_match(Utf8("aBc"),Utf8("(b|d)"),Utf8("i")) | +---------------------------------------------------+ | [B] | +---------------------------------------------------+
Additional examples can be found here
regexp_replaceReplaces substrings in a string that match a regular expression.
regexp_replace(str, regexp, replacement[, flags])
SELECT regexp_replace('foobarbaz', 'b(..)', 'X\\1Y', 'g'); +------------------------------------------------------------------------+ | regexp_replace(Utf8("foobarbaz"),Utf8("b(..)"),Utf8("X\1Y"),Utf8("g")) | +------------------------------------------------------------------------+ | fooXarYXazY | +------------------------------------------------------------------------+ SELECT regexp_replace('aBc', '(b|d)', 'Ab\\1a', 'i'); +-------------------------------------------------------------------+ | regexp_replace(Utf8("aBc"),Utf8("(b|d)"),Utf8("Ab\1a"),Utf8("i")) | +-------------------------------------------------------------------+ | aAbBac | +-------------------------------------------------------------------+
Additional examples can be found here
positionReturns the position of substr in orig_str
position(substr in origstr)
nowReturns the current UTC timestamp.
The now() return value is determined at query time and will return the same timestamp, no matter when in the query plan the function executes.
now()
current_dateReturns the current UTC date.
The current_date() return value is determined at query time and will return the same date, no matter when in the query plan the function executes.
current_date()
todayAlias of current_date.
current_timeReturns the current UTC time.
The current_time() return value is determined at query time and will return the same time, no matter when in the query plan the function executes.
current_time()
date_binCalculates time intervals and returns the start of the interval nearest to the specified timestamp. Use date_bin to downsample time series data by grouping rows into time-based “bins” or “windows” and applying an aggregate or selector function to each window.
For example, if you “bin” or “window” data into 15 minute intervals, an input timestamp of 2023-01-01T18:18:18Z will be updated to the start time of the 15 minute bin it is in: 2023-01-01T18:15:00Z.
date_bin(interval, expression, origin-timestamp)
1970-01-01T00:00:00Z (the UNIX epoch in UTC).The following intervals are supported:
date_truncTruncates a timestamp value to a specified precision.
date_trunc(precision, expression)
precision: Time precision to truncate to. The following precisions are supported:
expression: Time expression to operate on. Can be a constant, column, or function.
datetruncAlias of date_trunc.
date_partReturns the specified part of the date as an integer.
date_part(part, expression)
part: Part of the date to return. The following date parts are supported:
expression: Time expression to operate on. Can be a constant, column, or function.
datepartAlias of date_part.
extractReturns a sub-field from a time value as an integer. Similar to date_part, but with different arguments.
extract(field FROM source)
field: Part or field of the date to return. The following date fields are supported:
source: Source time expression to operate on. Can be a constant, column, or function.
make_dateMake a date from year/month/day component parts.
make_date(year, month, day)
❯ select make_date(2023, 1, 31);
+-------------------------------------------+
| make_date(Int64(2023),Int64(1),Int64(31)) |
+-------------------------------------------+
| 2023-01-31 |
+-------------------------------------------+
❯ select make_date('2023', '01', '31');
+-----------------------------------------------+
| make_date(Utf8("2023"),Utf8("01"),Utf8("31")) |
+-----------------------------------------------+
| 2023-01-31 |
+-----------------------------------------------+
Additional examples can be found here
to_charReturns a string representation of a date, time, timestamp or duration based on a Chrono format. Unlike the PostgreSQL equivalent of this function numerical formatting is not supported.
to_char(expression, format)
❯ ❯ select to_char('2023-03-01'::date, '%d-%m-%Y');
+----------------------------------------------+
| to_char(Utf8("2023-03-01"),Utf8("%d-%m-%Y")) |
+----------------------------------------------+
| 01-03-2023 |
+----------------------------------------------+
Additional examples can be found here
to_timestampConverts a value to a timestamp (YYYY-MM-DDT00:00:00Z). Supports strings, integer, unsigned integer, and double types as input. Strings are parsed as RFC3339 (e.g. ‘2023-07-20T05:44:00’) if no [Chrono formats] are provided. Integers, unsigned integers, and doubles are interpreted as seconds since the unix epoch (1970-01-01T00:00:00Z). Returns the corresponding timestamp.
Note: to_timestamp returns Timestamp(Nanosecond). The supported range for integer input is between -9223372037 and 9223372036. Supported range for string input is between 1677-09-21T00:12:44.0 and 2262-04-11T23:47:16.0. Please use to_timestamp_seconds for the input outside of supported bounds.
to_timestamp(expression[, ..., format_n])
❯ select to_timestamp('2023-01-31T09:26:56.123456789-05:00');
+-----------------------------------------------------------+
| to_timestamp(Utf8("2023-01-31T09:26:56.123456789-05:00")) |
+-----------------------------------------------------------+
| 2023-01-31T14:26:56.123456789 |
+-----------------------------------------------------------+
❯ select to_timestamp('03:59:00.123456789 05-17-2023', '%c', '%+', '%H:%M:%S%.f %m-%d-%Y');
+--------------------------------------------------------------------------------------------------------+
| to_timestamp(Utf8("03:59:00.123456789 05-17-2023"),Utf8("%c"),Utf8("%+"),Utf8("%H:%M:%S%.f %m-%d-%Y")) |
+--------------------------------------------------------------------------------------------------------+
| 2023-05-17T03:59:00.123456789 |
+--------------------------------------------------------------------------------------------------------+
Additional examples can be found here
to_timestamp_millisConverts a value to a timestamp (YYYY-MM-DDT00:00:00.000Z). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. ‘2023-07-20T05:44:00’) if no Chrono formats are provided. Integers and unsigned integers are interpreted as milliseconds since the unix epoch (1970-01-01T00:00:00Z). Returns the corresponding timestamp.
to_timestamp_millis(expression[, ..., format_n])
❯ select to_timestamp_millis('2023-01-31T09:26:56.123456789-05:00');
+------------------------------------------------------------------+
| to_timestamp_millis(Utf8("2023-01-31T09:26:56.123456789-05:00")) |
+------------------------------------------------------------------+
| 2023-01-31T14:26:56.123 |
+------------------------------------------------------------------+
❯ select to_timestamp_millis('03:59:00.123456789 05-17-2023', '%c', '%+', '%H:%M:%S%.f %m-%d-%Y');
+---------------------------------------------------------------------------------------------------------------+
| to_timestamp_millis(Utf8("03:59:00.123456789 05-17-2023"),Utf8("%c"),Utf8("%+"),Utf8("%H:%M:%S%.f %m-%d-%Y")) |
+---------------------------------------------------------------------------------------------------------------+
| 2023-05-17T03:59:00.123 |
+---------------------------------------------------------------------------------------------------------------+
Additional examples can be found here
to_timestamp_microsConverts a value to a timestamp (YYYY-MM-DDT00:00:00.000000Z). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. ‘2023-07-20T05:44:00’) if no Chrono formats are provided. Integers and unsigned integers are interpreted as microseconds since the unix epoch (1970-01-01T00:00:00Z) Returns the corresponding timestamp.
to_timestamp_micros(expression[, ..., format_n])
❯ select to_timestamp_micros('2023-01-31T09:26:56.123456789-05:00');
+------------------------------------------------------------------+
| to_timestamp_micros(Utf8("2023-01-31T09:26:56.123456789-05:00")) |
+------------------------------------------------------------------+
| 2023-01-31T14:26:56.123456 |
+------------------------------------------------------------------+
❯ select to_timestamp_micros('03:59:00.123456789 05-17-2023', '%c', '%+', '%H:%M:%S%.f %m-%d-%Y');
+---------------------------------------------------------------------------------------------------------------+
| to_timestamp_micros(Utf8("03:59:00.123456789 05-17-2023"),Utf8("%c"),Utf8("%+"),Utf8("%H:%M:%S%.f %m-%d-%Y")) |
+---------------------------------------------------------------------------------------------------------------+
| 2023-05-17T03:59:00.123456 |
+---------------------------------------------------------------------------------------------------------------+
Additional examples can be found here
to_timestamp_nanosConverts a value to a timestamp (YYYY-MM-DDT00:00:00.000000000Z). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. ‘2023-07-20T05:44:00’) if no Chrono formats are provided. Integers and unsigned integers are interpreted as nanoseconds since the unix epoch (1970-01-01T00:00:00Z). Returns the corresponding timestamp.
to_timestamp_nanos(expression[, ..., format_n])
❯ select to_timestamp_nanos('2023-01-31T09:26:56.123456789-05:00');
+-----------------------------------------------------------------+
| to_timestamp_nanos(Utf8("2023-01-31T09:26:56.123456789-05:00")) |
+-----------------------------------------------------------------+
| 2023-01-31T14:26:56.123456789 |
+-----------------------------------------------------------------+
❯ select to_timestamp_nanos('03:59:00.123456789 05-17-2023', '%c', '%+', '%H:%M:%S%.f %m-%d-%Y');
+--------------------------------------------------------------------------------------------------------------+
| to_timestamp_nanos(Utf8("03:59:00.123456789 05-17-2023"),Utf8("%c"),Utf8("%+"),Utf8("%H:%M:%S%.f %m-%d-%Y")) |
+--------------------------------------------------------------------------------------------------------------+
| 2023-05-17T03:59:00.123456789 |
+---------------------------------------------------------------------------------------------------------------+
Additional examples can be found here
to_timestamp_secondsConverts a value to a timestamp (YYYY-MM-DDT00:00:00.000Z). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. ‘2023-07-20T05:44:00’) if no Chrono formats are provided. Integers and unsigned integers are interpreted as seconds since the unix epoch (1970-01-01T00:00:00Z). Returns the corresponding timestamp.
to_timestamp_seconds(expression[, ..., format_n])
❯ select to_timestamp_seconds('2023-01-31T09:26:56.123456789-05:00');
+-------------------------------------------------------------------+
| to_timestamp_seconds(Utf8("2023-01-31T09:26:56.123456789-05:00")) |
+-------------------------------------------------------------------+
| 2023-01-31T14:26:56 |
+-------------------------------------------------------------------+
❯ select to_timestamp_seconds('03:59:00.123456789 05-17-2023', '%c', '%+', '%H:%M:%S%.f %m-%d-%Y');
+----------------------------------------------------------------------------------------------------------------+
| to_timestamp_seconds(Utf8("03:59:00.123456789 05-17-2023"),Utf8("%c"),Utf8("%+"),Utf8("%H:%M:%S%.f %m-%d-%Y")) |
+----------------------------------------------------------------------------------------------------------------+
| 2023-05-17T03:59:00 |
+----------------------------------------------------------------------------------------------------------------+
Additional examples can be found here
from_unixtimeConverts an integer to RFC3339 timestamp format (YYYY-MM-DDT00:00:00.000000000Z). Integers and unsigned integers are interpreted as nanoseconds since the unix epoch (1970-01-01T00:00:00Z) return the corresponding timestamp.
from_unixtime(expression)
array_appendAppends an element to the end of an array.
array_append(array, element)
❯ select array_append([1, 2, 3], 4); +--------------------------------------+ | array_append(List([1,2,3]),Int64(4)) | +--------------------------------------+ | [1, 2, 3, 4] | +--------------------------------------+
array_sortSort array.
array_sort(array, desc, nulls_first)
ASC or DESC).NULLS FIRST or NULLS LAST).❯ select array_sort([3, 1, 2]); +-----------------------------+ | array_sort(List([3,1,2])) | +-----------------------------+ | [1, 2, 3] | +-----------------------------+
array_catAlias of array_concat.
array_concatConcatenates arrays.
array_concat(array[, ..., array_n])
❯ select array_concat([1, 2], [3, 4], [5, 6]); +---------------------------------------------------+ | array_concat(List([1,2]),List([3,4]),List([5,6])) | +---------------------------------------------------+ | [1, 2, 3, 4, 5, 6] | +---------------------------------------------------+
array_containsAlias of array_has.
array_hasReturns true if the array contains the element
array_has(array, element)
array_has_allReturns true if all elements of sub-array exist in array
array_has_all(array, sub-array)
array_has_anyReturns true if any elements exist in both arrays
array_has_any(array, sub-array)
array_dimsReturns an array of the array's dimensions.
array_dims(array)
❯ select array_dims([[1, 2, 3], [4, 5, 6]]); +---------------------------------+ | array_dims(List([1,2,3,4,5,6])) | +---------------------------------+ | [2, 3] | +---------------------------------+
array_elementExtracts the element with the index n from the array.
array_element(array, index)
❯ select array_element([1, 2, 3, 4], 3); +-----------------------------------------+ | array_element(List([1,2,3,4]),Int64(3)) | +-----------------------------------------+ | 3 | +-----------------------------------------+
array_extractAlias of array_element.
array_fillReturns an array filled with copies of the given value.
DEPRECATED: use array_repeat instead!
array_fill(element, array)
flattenConverts an array of arrays to a flat array
The flattened array contains all the elements from all source arrays.
flatten(array)
array_indexofAlias of array_position.
array_joinAlias of array_to_string.
array_lengthReturns the length of the array dimension.
array_length(array, dimension)
❯ select array_length([1, 2, 3, 4, 5]); +---------------------------------+ | array_length(List([1,2,3,4,5])) | +---------------------------------+ | 5 | +---------------------------------+
array_ndimsReturns the number of dimensions of the array.
array_ndims(array, element)
❯ select array_ndims([[1, 2, 3], [4, 5, 6]]); +----------------------------------+ | array_ndims(List([1,2,3,4,5,6])) | +----------------------------------+ | 2 | +----------------------------------+
array_prependPrepends an element to the beginning of an array.
array_prepend(element, array)
❯ select array_prepend(1, [2, 3, 4]); +---------------------------------------+ | array_prepend(Int64(1),List([2,3,4])) | +---------------------------------------+ | [1, 2, 3, 4] | +---------------------------------------+
array_pop_frontReturns the array without the first element.
array_pop_front(array)
❯ select array_pop_front([1, 2, 3]); +-------------------------------+ | array_pop_front(List([1,2,3])) | +-------------------------------+ | [2, 3] | +-------------------------------+
array_pop_backReturns the array without the last element.
array_pop_back(array)
❯ select array_pop_back([1, 2, 3]); +-------------------------------+ | array_pop_back(List([1,2,3])) | +-------------------------------+ | [1, 2] | +-------------------------------+
array_positionReturns the position of the first occurrence of the specified element in the array.
array_position(array, element) array_position(array, element, index)
❯ select array_position([1, 2, 2, 3, 1, 4], 2); +----------------------------------------------+ | array_position(List([1,2,2,3,1,4]),Int64(2)) | +----------------------------------------------+ | 2 | +----------------------------------------------+
array_positionsSearches for an element in the array, returns all occurrences.
array_positions(array, element)
❯ select array_positions([1, 2, 2, 3, 1, 4], 2); +-----------------------------------------------+ | array_positions(List([1,2,2,3,1,4]),Int64(2)) | +-----------------------------------------------+ | [2, 3] | +-----------------------------------------------+
array_push_backAlias of array_append.
array_push_frontAlias of array_prepend.
array_repeatReturns an array containing element count times.
array_repeat(element, count)
❯ select array_repeat(1, 3); +---------------------------------+ | array_repeat(Int64(1),Int64(3)) | +---------------------------------+ | [1, 1, 1] | +---------------------------------+
❯ select array_repeat([1, 2], 2); +------------------------------------+ | array_repeat(List([1,2]),Int64(2)) | +------------------------------------+ | [[1, 2], [1, 2]] | +------------------------------------+
array_removeRemoves the first element from the array equal to the given value.
array_remove(array, element)
❯ select array_remove([1, 2, 2, 3, 2, 1, 4], 2); +----------------------------------------------+ | array_remove(List([1,2,2,3,2,1,4]),Int64(2)) | +----------------------------------------------+ | [1, 2, 3, 2, 1, 4] | +----------------------------------------------+
array_remove_nRemoves the first max elements from the array equal to the given value.
array_remove_n(array, element, max)
❯ select array_remove_n([1, 2, 2, 3, 2, 1, 4], 2, 2); +---------------------------------------------------------+ | array_remove_n(List([1,2,2,3,2,1,4]),Int64(2),Int64(2)) | +---------------------------------------------------------+ | [1, 3, 2, 1, 4] | +---------------------------------------------------------+
array_remove_allRemoves all elements from the array equal to the given value.
array_remove_all(array, element)
❯ select array_remove_all([1, 2, 2, 3, 2, 1, 4], 2); +--------------------------------------------------+ | array_remove_all(List([1,2,2,3,2,1,4]),Int64(2)) | +--------------------------------------------------+ | [1, 3, 1, 4] | +--------------------------------------------------+
array_replaceReplaces the first occurrence of the specified element with another specified element.
array_replace(array, from, to)
❯ select array_replace([1, 2, 2, 3, 2, 1, 4], 2, 5); +--------------------------------------------------------+ | array_replace(List([1,2,2,3,2,1,4]),Int64(2),Int64(5)) | +--------------------------------------------------------+ | [1, 5, 2, 3, 2, 1, 4] | +--------------------------------------------------------+
array_replace_nReplaces the first max occurrences of the specified element with another specified element.
array_replace_n(array, from, to, max)
❯ select array_replace_n([1, 2, 2, 3, 2, 1, 4], 2, 5, 2); +-------------------------------------------------------------------+ | array_replace_n(List([1,2,2,3,2,1,4]),Int64(2),Int64(5),Int64(2)) | +-------------------------------------------------------------------+ | [1, 5, 5, 3, 2, 1, 4] | +-------------------------------------------------------------------+
array_replace_allReplaces all occurrences of the specified element with another specified element.
array_replace_all(array, from, to)
❯ select array_replace_all([1, 2, 2, 3, 2, 1, 4], 2, 5); +------------------------------------------------------------+ | array_replace_all(List([1,2,2,3,2,1,4]),Int64(2),Int64(5)) | +------------------------------------------------------------+ | [1, 5, 5, 3, 5, 1, 4] | +------------------------------------------------------------+
array_reverseReturns the array with the order of the elements reversed.
array_reverse(array)
❯ select array_reverse([1, 2, 3, 4]); +------------------------------------------------------------+ | array_reverse(List([1, 2, 3, 4])) | +------------------------------------------------------------+ | [4, 3, 2, 1] | +------------------------------------------------------------+
array_sliceReturns a slice of the array based on 1-indexed start and end positions.
array_slice(array, begin, end)
❯ select array_slice([1, 2, 3, 4, 5, 6, 7, 8], 3, 6); +--------------------------------------------------------+ | array_slice(List([1,2,3,4,5,6,7,8]),Int64(3),Int64(6)) | +--------------------------------------------------------+ | [3, 4, 5, 6] | +--------------------------------------------------------+
array_to_stringConverts each element to its text representation.
array_to_string(array, delimiter)
❯ select array_to_string([[1, 2, 3, 4], [5, 6, 7, 8]], ',');
+----------------------------------------------------+
| array_to_string(List([1,2,3,4,5,6,7,8]),Utf8(",")) |
+----------------------------------------------------+
| 1,2,3,4,5,6,7,8 |
+----------------------------------------------------+
array_unionReturns an array of elements that are present in both arrays (all elements from both arrays) with out duplicates.
array_union(array1, array2)
❯ select array_union([1, 2, 3, 4], [5, 6, 3, 4]); +----------------------------------------------------+ | array_union([1, 2, 3, 4], [5, 6, 3, 4]); | +----------------------------------------------------+ | [1, 2, 3, 4, 5, 6] | +----------------------------------------------------+ ❯ select array_union([1, 2, 3, 4], [5, 6, 7, 8]); +----------------------------------------------------+ | array_union([1, 2, 3, 4], [5, 6, 7, 8]); | +----------------------------------------------------+ | [1, 2, 3, 4, 5, 6, 7, 8] | +----------------------------------------------------+
array_exceptReturns an array of the elements that appear in the first array but not in the second.
array_except(array1, array2)
❯ select array_except([1, 2, 3, 4], [5, 6, 3, 4]); +----------------------------------------------------+ | array_except([1, 2, 3, 4], [5, 6, 3, 4]); | +----------------------------------------------------+ | [1, 2] | +----------------------------------------------------+ ❯ select array_except([1, 2, 3, 4], [3, 4, 5, 6]); +----------------------------------------------------+ | array_except([1, 2, 3, 4], [3, 4, 5, 6]); | +----------------------------------------------------+ | [1, 2] | +----------------------------------------------------+
cardinalityReturns the total number of elements in the array.
cardinality(array)
❯ select cardinality([[1, 2, 3, 4], [5, 6, 7, 8]]); +--------------------------------------+ | cardinality(List([1,2,3,4,5,6,7,8])) | +--------------------------------------+ | 8 | +--------------------------------------+
emptyReturns 1 for an empty array or 0 for a non-empty array.
empty(array)
❯ select empty([1]); +------------------+ | empty(List([1])) | +------------------+ | 0 | +------------------+
list_appendAlias of array_append.
list_sortAlias of array_sort.
list_catAlias of array_concat.
list_concatAlias of array_concat.
list_dimsAlias of array_dims.
list_elementAlias of array_element.
list_extractAlias of array_element.
list_hasAlias of array_has.
list_has_allAlias of array_has_all.
list_has_anyAlias of array_has_any.
list_indexofAlias of array_position.
list_joinAlias of array_to_string.
list_lengthAlias of array_length.
list_ndimsAlias of array_ndims.
list_prependAlias of array_prepend.
list_pop_backAlias of array_pop_back.
list_pop_frontAlias of array_pop_front.
list_positionAlias of array_position.
list_positionsAlias of array_positions.
list_push_backAlias of array_append.
list_push_frontAlias of array_prepend.
list_repeatAlias of array_repeat.
list_removeAlias of array_remove.
list_remove_nAlias of array_remove_n.
list_remove_allAlias of array_remove_all.
list_replaceAlias of array_replace.
list_replace_nAlias of array_replace_n.
list_replace_allAlias of array_replace_all.
list_reverseAlias of array_reverse.
list_sliceAlias of array_slice.
list_to_stringAlias of list_to_string.
make_arrayReturns an Arrow array using the specified input expressions.
make_array(expression1[, ..., expression_n])
❯ select make_array(1, 2, 3, 4, 5); +----------------------------------------------------------+ | make_array(Int64(1),Int64(2),Int64(3),Int64(4),Int64(5)) | +----------------------------------------------------------+ | [1, 2, 3, 4, 5] | +----------------------------------------------------------+
make_listAlias of make_array.
string_to_arraySplits a string in to an array of substrings based on a delimiter. Any substrings matching the optional null_str argument are replaced with NULL.
starts_with(str, delimiter[, null_str])
NULLstring_to_listAlias of string_to_array.
trim_arrayRemoves the last n elements from the array.
DEPRECATED: use array_slice instead!
trim_array(array, n)
rangeReturns an Arrow array between start and stop with step. SELECT range(2, 10, 3) -> [2, 5, 8]
The range start..end contains all values with start <= x < end. It is empty if start >= end.
Step can not be 0 (then the range will be nonsense.).
structReturns an Arrow struct using the specified input expressions. Fields in the returned struct use the cN naming convention. For example: c0, c1, c2, etc.
struct(expression1[, ..., expression_n])
For example, this query converts two columns a and b to a single column with a struct type of fields c0 and c1:
select * from t;
+---+---+
| a | b |
+---+---+
| 1 | 2 |
| 3 | 4 |
+---+---+
select struct(a, b) from t;
+-----------------+
| struct(t.a,t.b) |
+-----------------+
| {c0: 1, c1: 2} |
| {c0: 3, c1: 4} |
+-----------------+
digestComputes the binary hash of an expression using the specified algorithm.
digest(expression, algorithm)
expression: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
algorithm: String expression specifying algorithm to use. Must be one of:
md5Computes an MD5 128-bit checksum for a string expression.
md5(expression)
sha224Computes the SHA-224 hash of a binary string.
sha224(expression)
sha256Computes the SHA-256 hash of a binary string.
sha256(expression)
sha384Computes the SHA-384 hash of a binary string.
sha384(expression)
sha512Computes the SHA-512 hash of a binary string.
sha512(expression)
arrow_castCasts a value to a specific Arrow data type:
arrow_cast(expression, datatype)
arrow_typeof]❯ select arrow_cast(-5, 'Int8') as a,
arrow_cast('foo', 'Dictionary(Int32, Utf8)') as b,
arrow_cast('bar', 'LargeUtf8') as c,
arrow_cast('2023-01-02T12:53:02', 'Timestamp(Microsecond, Some("+08:00"))') as d
;
+----+-----+-----+---------------------------+
| a | b | c | d |
+----+-----+-----+---------------------------+
| -5 | foo | bar | 2023-01-02T12:53:02+08:00 |
+----+-----+-----+---------------------------+
1 row in set. Query took 0.001 seconds.
arrow_typeofReturns the name of the underlying Arrow data type of the expression:
arrow_typeof(expression)
❯ select arrow_typeof('foo'), arrow_typeof(1);
+---------------------------+------------------------+
| arrow_typeof(Utf8("foo")) | arrow_typeof(Int64(1)) |
+---------------------------+------------------------+
| Utf8 | Int64 |
+---------------------------+------------------------+
1 row in set. Query took 0.001 seconds.