blob: 1dd3e912c60a92e3a953bd72e5c9d09d71649d39 [file] [log] [blame]
 AsterixDB – Builtin Functions

Builtin Functions

The system provides various classes of functions to support operations on numeric, string, spatial, and temporal data. This document explains how to use these functions.

Numeric Functions

abs

• Syntax:

abs(numeric_value)

• Computes the absolute value of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• The absolute value of the argument with the same type as the input argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": abs(2013), "v2": abs(-4036), "v3": abs(0), "v4": abs(float("-2013.5")), "v5": abs(double("-2013.593823748327284")) };

• The expected result is:

{ "v1": 2013, "v2": 4036, "v3": 0, "v4": 2013.5, "v5": 2013.5938237483274 }

acos

• Syntax:

acos(numeric_value)

• Computes the arc cosine value of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double arc cosine in radians for the argument, if the argument is in the range of -1 (inclusive) to 1 (inclusive),
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error,
• “NaN” for other legitimate numeric values.
• Example:

{ "v1": acos(1), "v2": acos(2), "v3": acos(0), "v4": acos(float("0.5")), "v5": acos(double("-0.5")) };

• The expected result is:

{ "v1": 0.0, "v2": "NaN", "v3": 1.5707963267948966, "v4": 1.0471975511965979, "v5": 2.0943951023931957 }

asin

• Syntax:

asin(numeric_value)

• Computes the arc sine value of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double arc sin in radians for the argument, if the argument is in the range of -1 (inclusive) to 1 (inclusive),
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error,
• “NaN” for other legitimate numeric values.
• Example:

{ "v1": asin(1), "v2": asin(2), "v3": asin(0), "v4": asin(float("0.5")), "v5": asin(double("-0.5")) };

• The expected result is:

{ "v1": 1.5707963267948966, "v2": "NaN", "v3": 0.0, "v4": 0.5235987755982989, "v5": -0.5235987755982989 }

atan

• Syntax:

atan(numeric_value)

• Computes the arc tangent value of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double arc tangent in radians for the argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": atan(1), "v2": atan(2), "v3": atan(0), "v4": atan(float("0.5")), "v5": atan(double("1000")) };

• The expected result is:

{ "v1": 0.7853981633974483, "v2": 1.1071487177940904, "v3": 0.0, "v4": 0.4636476090008061, "v5": 1.5697963271282298 }

atan2

• Syntax:

atan2(numeric_value1, numeric_value2)

• Computes the arc tangent value of numeric_value2/numeric_value1.

• Arguments:
• numeric_value1: a tinyint/smallint/integer/bigint/float/double value,
• numeric_value2: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double arc tangent in radians for numeric_value1 and numeric_value2,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": atan2(1, 2), "v2": atan2(0, 4), "v3": atan2(float("0.5"), double("-0.5")) };

• The expected result is:

{ "v1": 0.4636476090008061, "v2": 0.0, "v3": 2.356194490192345 }

ceil

• Syntax:

ceil(numeric_value)

• Computes the smallest (closest to negative infinity) number with no fractional part that is not less than the value of the argument. If the argument is already equal to mathematical integer, then the result is the same as the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• The ceiling value for the given number in the same type as the input argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{
"v1": ceil(2013),
"v2": ceil(-4036),
"v3": ceil(0.3),
"v4": ceil(float("-2013.2")),
"v5": ceil(double("-2013.893823748327284"))
};

• The expected result is:

{ "v1": 2013, "v2": -4036, "v3": 1.0, "v4": -2013.0, "v5": -2013.0 }

cos

• Syntax:

cos(numeric_value)

• Computes the cosine value of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double cosine value for the argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": cos(1), "v2": cos(2), "v3": cos(0), "v4": cos(float("0.5")), "v5": cos(double("1000")) };

• The expected result is:

{ "v1": 0.5403023058681398, "v2": -0.4161468365471424, "v3": 1.0, "v4": 0.8775825618903728, "v5": 0.562379076290703 }

cosh

• Syntax:

cosh(numeric_value)

• Computes the hyperbolic cosine value of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double hyperbolic cosine value for the argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": cosh(1), "v2": cosh(2), "v3": cosh(0), "v4": cosh(float("0.5")), "v5": cosh(double("8")) };

• The expected result is:

{ "v1": 1.5430806348152437, "v2": 3.7621956910836314, "v3": 1.0, "v4": 1.1276259652063807, "v5": 1490.479161252178 }

degrees

• Syntax:

degrees(numeric_value)

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• The degrees value for the given radians value. The returned value has type double,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": degrees(pi()) };

• The expected result is:

{ "v1": 180.0 }

e

• Syntax:

e()

• Return Value:

• e (base of the natural logarithm)
• Example:

{ "v1": e() };

• The expected result is:

{ "v1": 2.718281828459045 }

exp

• Syntax:

exp(numeric_value)

• Computes enumeric_value.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• enumeric_value,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": exp(1), "v2": exp(2), "v3": exp(0), "v4": exp(float("0.5")), "v5": exp(double("1000")) };

• The expected result is:

{ "v1": 2.718281828459045, "v2": 7.38905609893065, "v3": 1.0, "v4": 1.6487212707001282, "v5": "Infinity" }

floor

• Syntax:

floor(numeric_value)

• Computes the largest (closest to positive infinity) number with no fractional part that is not greater than the value. If the argument is already equal to mathematical integer, then the result is the same as the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• The floor value for the given number in the same type as the input argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{
"v1": floor(2013),
"v2": floor(-4036),
"v3": floor(0.8),
"v4": floor(float("-2013.2")),
"v5": floor(double("-2013.893823748327284"))
};

• The expected result is:

{ "v1": 2013, "v2": -4036, "v3": 0.0, "v4": -2014.0, "v5": -2014.0 }

ln

• Syntax:

ln(numeric_value)

• Computes logenumeric_value.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• logenumeric_value,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": ln(1), "v2": ln(2), "v3": ln(0), "v4": ln(float("0.5")), "v5": ln(double("1000")) };

• The expected result is:

{ "v1": 0.0, "v2": 0.6931471805599453, "v3": "-Infinity", "v4": -0.6931471805599453, "v5": 6.907755278982137 }

log

• Syntax:

log(numeric_value)

• Computes log10numeric_value.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• log10numeric_value,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": log(1), "v2": log(2), "v3": log(0), "v4": log(float("0.5")), "v5": log(double("1000")) };

• The expected result is:

{ "v1": 0.0, "v2": 0.3010299956639812, "v3": "-Infinity", "v4": -0.3010299956639812, "v5": 3.0 }

pi

• Syntax:

pi()

• Return Value:

• Pi
• Example:

{ "v1": pi() };

• The expected result is:

{ "v1": 3.141592653589793 }

power

• Syntax:

power(numeric_value1, numeric_value2)

• Computes numeric_value1numeric_value2.

• Arguments:
• numeric_value1: a tinyint/smallint/integer/bigint/float/double value,
• numeric_value2: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• numeric_value1numeric_value2,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": power(1, 2), "v3": power(0, 4), "v4": power(float("0.5"), double("-0.5")) };

• The expected result is:

{ "v1": 1, "v3": 0, "v4": 1.4142135623730951 }

• Syntax:

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• The radians value for the given degrees value. The returned value has type double,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

• The expected result is:

{ "v1": 3.141592653589793 }

round

• Syntax:

round(numeric_value[, round_digit])

• Rounds the value to the given number of integer digits to the right of the decimal point, or to the left of the decimal point if the number of digits is negative.

• Arguments:

• numeric_value: a tinyint/smallint/integer/bigint/float/double value that represents the numeric value to be rounded.
• round_digit: (Optional) a tinyint/smallint/integer/bigint/float/double value that specifies the digit to round to. This argument may be positive or negative; positive indicating that rounding needs to be to the right of the decimal point, and negative indicating that rounding needs to be to the left of the decimal point. Values such as 1.0 and 2.0 are acceptable, but values such as 1.3 and 1.5 result in a null. If omitted, the default is 0.
• Return Value:
• The rounded value for the given number. The returned value has the following type:
• bigint if the input value has type tinyint, smallint, integer or bigint,
• float if the input value has type float,
• double if the input value has type double;
• missing if the input value is a missing value,
• null if the input value is a null value,
• any other non-numeric input value will return a null value.
• Example:

{
"v1": round(2013),
"v2": round(-4036),
"v3": round(0.8),
"v4": round(float("-2013.256")),
"v5": round(double("-2013.893823748327284"))
"v6": round(123456, -1),
"v7": round(456.456, 2),
"v8": round(456.456, -1),
"v9": round(-456.456, -2)
};

• The expected result is:

{ "v1": 2013, "v2": -4036, "v3": 1.0, "v4": -2013.0, "v5": -2014.0, "v6": 123460, "v7": 456.46, "v8": 460, "v9": -500 }

sign

• Syntax:

sign(numeric_value)

• Computes the sign of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the sign (a tinyint) of the argument, -1 for negative values, 0 for 0, and 1 for positive values,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": sign(1), "v2": sign(2), "v3": sign(0), "v4": sign(float("0.5")), "v5": sign(double("-1000")) };

• The expected result is:

{ "v1": 1, "v2": 1, "v3": 0, "v4": 1, "v5": -1 }

sin

• Syntax:

sin(numeric_value)

• Computes the sine value of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double sine value for the argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": sin(1), "v2": sin(2), "v3": sin(0), "v4": sin(float("0.5")), "v5": sin(double("1000")) };

• The expected result is:

{ "v1": 0.8414709848078965, "v2": 0.9092974268256817, "v3": 0.0, "v4": 0.479425538604203, "v5": 0.8268795405320025 }

sinh

• Syntax:

sinh(numeric_value)

• Computes the hyperbolic sine value of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double hyperbolic sine value for the argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": sinh(1), "v2": sinh(2), "v3": sinh(0), "v4": sinh(float("0.5")), "v5": sinh(double("8")) };

• The expected result is:

{ "v1": 1.1752011936438014, "v2": 3.626860407847019, "v3": 0.0, "v4": 0.5210953054937474, "v5": 1490.4788257895502 }

sqrt

• Syntax:

sqrt(numeric_value)

• Computes the square root of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double square root value for the argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": sqrt(1), "v2": sqrt(2), "v3": sqrt(0), "v4": sqrt(float("0.5")), "v5": sqrt(double("1000")) };

• The expected result is:

{ "v1": 1.0, "v2": 1.4142135623730951, "v3": 0.0, "v4": 0.7071067811865476, "v5": 31.622776601683793 }

tan

• Syntax:

tan(numeric_value)

• Computes the tangent value of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double tangent value for the argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": tan(1), "v2": tan(2), "v3": tan(0), "v4": tan(float("0.5")), "v5": tan(double("1000")) };

• The expected result is:

{ "v1": 1.5574077246549023, "v2": -2.185039863261519, "v3": 0.0, "v4": 0.5463024898437905, "v5": 1.4703241557027185 }

tanh

• Syntax:

tanh(numeric_value)

• Computes the hyperbolic tangent value of the argument.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• Return Value:
• the double hyperbolic tangent value for the argument,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-numeric input value will cause a type error.
• Example:

{ "v1": tanh(1), "v2": tanh(2), "v3": tanh(0), "v4": tanh(float("0.5")), "v5": tanh(double("8")) };

• The expected result is:

{ "v1": 0.7615941559557649, "v2": 0.964027580075817, "v3": 0.0, "v4": 0.4621171572600098, "v5": 0.999999774929676 }

trunc

• Syntax:

trunc(numeric_value, number_digits)

• Truncates the number to the given number of integer digits to the right of the decimal point (left if digits is negative). Digits is 0 if not given.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value,
• number_digits: a tinyint/smallint/integer/bigint value.
• Return Value:
• the double tangent value for the argument,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is missing,
• a type error will be raised if:
• the first argument is any other non-numeric value,
• the second argument is any other non-tinyint, non-smallint, non-integer, and non-bigint value.
• Example:

{ "v1": trunc(1, 1), "v2": trunc(2, -2), "v3": trunc(0.122, 2), "v4": trunc(float("11.52"), -1), "v5": trunc(double("1000.5252"), 3) };

• The expected result is:

{ "v1": 1, "v2": 2, "v3": 0.12, "v4": 10.0, "v5": 1000.525 }

round_half_to_even

• Syntax:

round_half_to_even(numeric_value, [precision])

• Computes the closest numeric value to numeric_value that is a multiple of ten to the power of minus precision. precision is optional and by default value 0 is used.

• Arguments:
• numeric_value: a tinyint/smallint/integer/bigint/float/double value.
• precision: an optional tinyint/smallint/integer/bigint field representing the number of digits in the fraction of the the result
• Return Value:
• The rounded value for the given number in the same type as the input argument,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• a type error will be raised if:
• the first argument is any other non-numeric value,
• or, the second argument is any other non-tinyint, non-smallint, non-integer, or non-bigint value.
• Example:

{
"v1": round_half_to_even(2013),
"v2": round_half_to_even(-4036),
"v3": round_half_to_even(0.8),
"v4": round_half_to_even(float("-2013.256")),
"v5": round_half_to_even(double("-2013.893823748327284")),
"v6": round_half_to_even(double("-2013.893823748327284"), 2),
"v7": round_half_to_even(2013, 4),
"v8": round_half_to_even(float("-2013.256"), 5)
};

• The expected result is:

{ "v1": 2013, "v2": -4036, "v3": 1.0, "v4": -2013.0, "v5": -2014.0, "v6": -2013.89, "v7": 2013, "v8": -2013.256 }

String Functions

concat

• Syntax:

concat(string1, string2, ...)

• Returns a concatenated string from arguments.

• Arguments:
• string1: a string value,
• string2: a string value,
• ….
• Return Value:
• a concatenated string from arguments,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error.
• Example:

concat("test ", "driven ", "development");

• The expected result is:

"test driven development"

contains

• Syntax:

contains(string, substring_to_contain)

• Checks whether the string string contains the string substring_to_contain

• Arguments:
• string : a string that might contain the given substring,
• substring_to_contain : a target string that might be contained.
• Return Value:
• a boolean value, true if string contains substring_to_contain,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error,
• false otherwise.
• Note: an n_gram index can be utilized for this function.

• Example:
{ "v1": contains("I like x-phone", "phone"), "v2": contains("one", "phone") };

• The expected result is:

{ "v1": true, "v2": false }

ends_with

• Syntax:

ends_with(string, substring_to_end_with)

• Checks whether the string string ends with the string substring_to_end_with.

• Arguments:
• string : a string that might end with the given string,
• substring_to_end_with : a string that might be contained as the ending substring.
• Return Value:
• a boolean value, true if string contains substring_to_contain,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error,
• false otherwise.
• Example:

{
"v1": ends_with(" love product-b its shortcut_menu is awesome:)", ":)"),
"v2": ends_with(" awsome:)", ":-)")
};

• The expected result is:

{ "v1": true, "v2": false }

initcap (or title)

• Syntax:

initcap(string)

• Converts a given string string so that the first letter of each word is uppercase and every other letter is lowercase. The function has an alias called “title”.

• Arguments:
• string : a string to be converted.
• Return Value:
• a string as the title form of the given string,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-string input value will cause a type error.
• Example:

{ "v1": initcap("ASTERIXDB is here!"), "v2": title("ASTERIXDB is here!") };

• The expected result is:

{ "v1": "Asterixdb Is Here!", "v2": "Asterixdb Is Here!" }

length

• Syntax:

length(string)

• Returns the length of the string string. Note that the length is in the unit of code point. See the following examples for more details.

• Arguments:
• string : a string or null that represents the string to be checked.
• Return Value:
• an bigint that represents the length of string,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-string input value will cause a type error.
• Example:

length("test string");

• The expected result is:

11

• Example:

length("👩‍👩‍👧‍👦");

• The expected result is (the emoji character 👩‍👩‍👧‍👦 has 7 code points):

7

lower

• Syntax:

lower(string)

• Converts a given string string to its lowercase form.

• Arguments:
• string : a string to be converted.
• Return Value:
• a string as the lowercase form of the given string,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-string input value will cause a type error.
• Example:

lower("ASTERIXDB");

• The expected result is:

"asterixdb"

ltrim

• Syntax:

ltrim(string[, chars]);

• Returns a new string with all leading characters that appear in chars removed. By default, white space is the character to trim. Note that here one character means one code point. For example, the emoji 4-people-family notation “👩‍👩‍👧‍👦” contains 7 code points, and it is possible to trim a few code points (such as a 2-people-family “👨‍👦”) from it. See the following example for more details.

• Arguments:
• string : a string to be trimmed,
• chars : a string that contains characters that are used to trim.
• Return Value:
• a trimmed, new string,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error.
• Related functions: see trim(), rtrim()
• Example:

ltrim("me like x-phone", "eml");

• The expected result is:

" like x-phone"

• Example with multi-codepoint notation (trim the man and boy from the family of man, woman, girl and boy):

ltrim("👨‍👩‍👧‍👦", "👨‍👦")

• The expected result is (only woman, girl and boy are left in the family):

"👩‍👧‍👦"

position

• Syntax:

position(string, string_pattern)

• Returns the first position of string_pattern within string. The result is counted in the unit of code points. See the following example for more details.

• The function returns the 0-based position. Another version of the function returns the 1-based position. Below are the aliases for each version:

• 0-based: position, pos, position0, pos0.
• 1-based: position1, pos1.
• Arguments:

• string : a string that might contain the pattern.
• string_pattern : a pattern string to be matched.
• Return Value:
• the first position that string_pattern appears within string (starting at 0), or -1 if it does not appear,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will return a null.
• Example:

{
"v1": position("ppphonepp", "phone"),
"v2": position("hone", "phone"),
"v3": position1("ppphonepp", "phone"),
"v4": position1("hone", "phone")
};

• The expected result is:

{ "v1": 2, "v2": -1, v3": 3, "v4": -1 }

• Example of multi-code-point character:

position("👩‍👩‍👧‍👦🏀", "🏀");

• The expected result is (the emoji family character has 7 code points):

7

regexp_contains

• Syntax:

regexp_contains(string, string_pattern[, string_flags])

• Checks whether the strings string contains the regular expression pattern string_pattern (a Java regular expression pattern).

• Aliases:

• regexp_contains, regex_contains, contains_regexp, contains_regex.
• Arguments:

• string : a string that might contain the pattern.
• string_pattern : a pattern string to be matched.
• string_flag : (Optional) a string with flags to be used during regular expression matching.
• The following modes are enabled with these flags: dotall (s), multiline (m), case_insensitive (i), and comments and whitespace (x).
• Return Value:
• a boolean, returns true if string contains the pattern string_pattern, false otherwise.
• missing if any argument is a missing value.
• null if any argument is a null value but no argument is a missing value.
• any other non-string input value will return a null.
• Example:

{
"v1": regexp_contains("pphonepp", "p*hone"),
"v2": regexp_contains("hone", "p+hone")
};

• The expected result is:

{ "v1": true, "v2": false }

regexp_like

• Syntax:

regexp_like(string, string_pattern[, string_flags])

• Checks whether the string string exactly matches the regular expression pattern string_pattern (a Java regular expression pattern).

• Aliases:

• regexp_like, regex_like.
• Arguments:

• string : a string that might contain the pattern.
• string_pattern : a pattern string that might be contained.
• string_flag : (Optional) a string with flags to be used during regular expression matching.
• The following modes are enabled with these flags: dotall (s), multiline (m), case_insensitive (i), and comments and whitespace (x).
• Return Value:
• a boolean value, true if string contains the pattern string_pattern, false otherwise.
• missing if any argument is a missing value.
• null if any argument is a null value but no argument is a missing value.
• any other non-string input value will return a null.
• Example:

{
"v1": regexp_like(" can't stand acast the network is horrible:(", ".*acast.*"),
"v2": regexp_like("acast", ".*acst.*")
};

• The expected result is:

{ "v1": true, "v2": false }

regexp_position

• Syntax:

regexp_position(string, string_pattern[, string_flags])

• Returns first position of the regular expression string_pattern (a Java regular expression pattern) within string. The function returns the 0-based position. Another version of the function returns the 1-based position. Below are the aliases for each version:

• Aliases:

• 0-Based: regexp_position, regexp_pos, regexp_position0, regexp_pos0, regex_position, regex_pos, regex_position0, regex_pos0.
• 1-Based: regexp_position1, regexp_pos1, regex_position1 regex_pos1.
• Arguments:

• string : a string that might contain the pattern.
• string_pattern : a pattern string to be matched.
• string_flag : (Optional) a string with flags to be used during regular expression matching.
• The following modes are enabled with these flags: dotall (s), multiline (m), case_insensitive (i), and comments and whitespace (x).
• Return Value:
• the first position that the regular expression string_pattern appears in string (starting at 0), or -1 if it does not appear.
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will return a null.
• Example:

{
"v1": regexp_position("pphonepp", "p*hone"),
"v2": regexp_position("hone", "p+hone"),
"v3": regexp_position1("pphonepp", "p*hone"),
"v4": regexp_position1("hone", "p+hone")
};

• The expected result is:

{ "v1": 0, "v2": -1, "v3": 1, "v4": -1 }

regexp_replace

• Syntax:

regexp_replace(string, string_pattern, string_replacement[, string_flags])
regexp_replace(string, string_pattern, string_replacement[, replacement_limit])

• Checks whether the string string matches the given regular expression pattern string_pattern (a Java regular expression pattern), and replaces the matched pattern string_pattern with the new pattern string_replacement.

• Aliases:

• regexp_replace, regex_replace.
• Arguments:

• string : a string that might contain the pattern.
• string_pattern : a pattern string to be matched.
• string_replacement : a pattern string to be used as the replacement.
• string_flag : (Optional) a string with flags to be used during replace.
• The following modes are enabled with these flags: dotall (s), multiline (m), case_insensitive (i), and comments and whitespace (x).
• replacement_limit: (Optional) an integer specifying the maximum number of replacements to make (if negative then all occurrences will be replaced)
• Return Value:
• Returns a string that is obtained after the replacements.
• missing if any argument is a missing value.
• null if any argument is a null value but no argument is a missing value.
• any other non-string input value will return a null.
• Example:

regexp_replace(" like x-phone the voicemail_service is awesome", " like x-phone", "like product-a");

• The expected result is:

"like product-a the voicemail_service is awesome"

repeat

• Syntax:

repeat(string, n)

• Returns a string formed by repeating the input string n times.

• Arguments:
• string : a string to be repeated,
• n : an tinyint/smallint/integer/bigint value - how many times the string should be repeated.
• Return Value:
• a string that repeats the input string n times,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• a type error will be raised if:
• the first argument is any other non-string value,
• or, the second argument is not a tinyint, smallint, integer, or bigint.
• Example:

repeat("test", 3);

• The expected result is:

"testtesttest"

replace

• Syntax:

replace(string, search_string, replacement_string[, limit])

• Finds occurrences of the given substring search_string in the input string string and replaces them with the new substring replacement_string.

• Arguments:
• string : an input string,
• search_string : a string substring to be searched for,
• replacement_string : a string to be used as the replacement,
• limit : (Optional) an integer - maximum number of occurrences to be replaced. If not specified or negative then all occurrences will be replaced
• Return Value:
• Returns a string that is obtained after the replacements,
• missing if any argument is a missing value,
• any other non-string input value or non-integer limit will cause a type error,
• null if any argument is a null value but no argument is a missing value.
• Example:

{
"v1": replace(" like x-phone the voicemail_service is awesome", " like x-phone", "like product-a"),
"v2": replace("x-phone and x-phone", "x-phone", "product-a", 1)
};

• The expected result is:

{
"v1": "like product-a the voicemail_service is awesome",
"v2": "product-a and x-phone"
}

reverse

• Syntax:

reverse(string)

• Returns a string formed by reversing characters in the input string. For characters of multiple code points, code point is the minimal unit to reverse. See the following examples for more details.

• Arguments:
• string : a string to be reversed
• Return Value:
• a string containing characters from the the input string in the reverse order,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• a type error will be raised if:
• the first argument is any other non-string value
• Example:

reverse("hello");

• The expected result is:

"olleh"

• Example of multi-code-point character (Korean):

reverse("한글");

• The expected result is (the Korean characters are splitted into code points and then the code points are reversed):

"ᆯᅳᄀᆫᅡᄒ"

rtrim

• Syntax:

rtrim(string[, chars]);

• Returns a new string with all trailing characters that appear in chars removed. By default, white space is the character to trim. Note that here one character means one code point. For example, the emoji 4-people-family notation “👩‍👩‍👧‍👦” contains 7 code points, and it is possible to trim a few code points (such as a 2-people-family “👨‍👦”) from it. See the following example for more details.

• Arguments:
• string : a string to be trimmed,
• chars : a string that contains characters that are used to trim.
• Return Value:
• a trimmed, new string,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error.
• Related functions: see trim(), ltrim()
• Example:

{
"v1": rtrim("i like x-phone", "x-phone"),
"v2": rtrim("i like x-phone", "onexph")
};

• The expected result is:

{ "v1": "i like ", "v2": "i like x-" }

• Example with multi-codepoint notation (trim the man and boy from the family of man, woman, girl and boy):

rtrim("👨‍👩‍👧‍👦", "👨‍👦")

• The expected result is (only man, woman and girl are left in the family):

"👨‍👩‍👧"

split

• Syntax:

split(string, sep)

• Splits the input string into an array of substrings separated by the string sep.

• Arguments:
• string : a string to be split.
• Return Value:
• an array of substrings by splitting the input string by sep,
• in case of two consecutive seps in the string, the result of splitting the two consecutive seps will be the empty string "",
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-string input value will cause a type error.
• Example:

split("test driven development", " ");

• The expected result is:

[ "test", "driven", "development" ]

• Example with two consecutive seps in the string:

split("123//456", "/");

• The expected result is:

[ "123", "", "456" ]

starts_with

• Syntax:

starts_with(string, substring_to_start_with)

• Checks whether the string string starts with the string substring_to_start_with.

• Arguments:
• string : a string that might start with the given string.
• substring_to_start_with : a string that might be contained as the starting substring.
• Return Value:
• a boolean, returns true if string starts with the string substring_to_start_with,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error,
• false otherwise.
• Example:

{
"v1" : starts_with(" like the plan, amazing", " like"),
"v2" : starts_with("I like the plan, amazing", " like")
};

• The expected result is:

{ "v1": true, "v2": false }

substr

• Syntax:

substr(string, offset[, length])

• Returns the substring from the given string string based on the given start offset offset with the optional length. Note that both of the offset and length are in the unit of code point (e.g. the emoji family 👨‍👩‍👧‍👦 has 7 code points). The function uses the 0-based position. Another version of the function uses the 1-based position. Below are the aliases for each version:

• Aliases:

• 0-Based: substring, substr, substring0, substr0.
• 1-Based: substring1, substr1.
• Arguments:

• string : a string to be extracted.
• offset : an tinyint/smallint/integer/bigint value as the starting offset of the substring in string (starting at 0). If negative then counted from the end of the string.
• length : (Optional) an an tinyint/smallint/integer/bigint value as the length of the substring.
• Return Value:
• a string that represents the substring,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value, or if the substring could not be obtained because the starting offset is not within string bounds or length is negative.
• a null will be returned if:
• the first argument is any other non-string value.
• the second argument is not a tinyint, smallint, integer, or bigint.
• the third argument is not a tinyint, smallint, integer, or bigint if the argument is present.
• Example:

{ "v1": substr("test string", 6, 3), "v2": substr1("test string", 6, 3) };

• The expected result is:

{ "v1": "tri", "v2": "str" }

The function has an alias substring.

trim

• Syntax:

trim(string[, chars]);

• Returns a new string with all leading and trailing characters that appear in chars removed. By default, white space is the character to trim. Note that here one character means one code point. For example, the emoji 4-people-family notation “👩‍👩‍👧‍👦” contains 7 code points, and it is possible to trim a few code points (such as a 2-people-family “👨‍👦”) from it. See the following example for more details.

• Arguments:
• string : a string to be trimmed,
• chars : a string that contains characters that are used to trim.
• Return Value:
• a trimmed, new string,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error.
• Related functions: see ltrim(), rtrim()
• Example:

trim("i like x-phone", "xphoen");

• The expected result is:

" like "

• Example with multi-codepoint notation (trim the man and boy from the family of man, woman, girl and boy):

trim(“👨‍👩‍👧‍👦”, “👨‍👦”)

• The expected result is (only woman and girl are left in the family):

"👩‍👧"

upper

• Syntax:

upper(string)

• Converts a given string string to its uppercase form.

• Arguments:
• string : a string to be converted.
• Return Value:
• a string as the uppercase form of the given string,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-string input value will cause a type error.
• Example:

upper("hello")

• The expected result is:

"HELLO"

string_concat

• Syntax:

string_concat(array)

• Concatenates an array of strings array into a single string.

• Arguments:
• array : an array or multiset of strings (could be null or missing) to be concatenated.
• Return Value:
• the concatenated string value,
• missing if the argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• missing if any element in the input array is missing,
• null if any element in the input array is null but no element in the input array is missing,
• any other non-array input value or non-integer element in the input array will cause a type error.
• Example:

string_concat(["ASTERIX", " ", "ROCKS!"]);

• The expected result is:

"ASTERIX ROCKS!"

string_join

• Syntax:

string_join(array, string)

• Joins an array or multiset of strings array with the given separator string into a single string.

• Arguments:
• array : an array or multiset of strings (could be null) to be joined.
• string : a string to serve as the separator.
• Return Value:
• the joined string,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• missing if the first argument array contains a missing,
• null if the first argument array contains a null but does not contain a missing,
• a type error will be raised if:
• the first argument is any other non-array value, or contains any other non-string value,
• or, the second argument is any other non-string value.
• Example:

string_join(["ASTERIX", "ROCKS~"], "!! ");

• The expected result is:

"ASTERIX!! ROCKS~"

string_to_codepoint

• Syntax:

string_to_codepoint(string)

• Converts the string string to its code_based representation.

• Arguments:
• string : a string that will be converted.
• Return Value:
• an array of the code points for the string string,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-string input value will cause a type error.
• Example:

string_to_codepoint("Hello ASTERIX!");

• The expected result is:

[ 72, 101, 108, 108, 111, 32, 65, 83, 84, 69, 82, 73, 88, 33 ]

codepoint_to_string

• Syntax:

codepoint_to_string(array)

• Converts the ordered code_based representation array to the corresponding string.

• Arguments:
• array : an array of integer code_points.
• Return Value:
• a string representation of array.
• missing if the argument is a missing value,
• null if the argument is a null value,
• missing if any element in the input array is missing,
• null if any element in the input array is null but no element in the input array is missing,
• any other non-array input value or non-integer element in the input array will cause a type error.
• Example:

codepoint_to_string([72, 101, 108, 108, 111, 32, 65, 83, 84, 69, 82, 73, 88, 33]);

• The expected result is:

"Hello ASTERIX!"

substring_before

• Syntax:

substring_before(string, string_pattern)

• Returns the substring from the given string string before the given pattern string_pattern.

• Arguments:
• string : a string to be extracted.
• string_pattern : a string pattern to be searched.
• Return Value:
• a string that represents the substring,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error.
• Example:

substring_before(" like x-phone", "x-phone");

• The expected result is:

" like "

substring_after

• Syntax:

substring_after(string, string_pattern);

• Returns the substring from the given string string after the given pattern string_pattern.

• Arguments:
• string : a string to be extracted.
• string_pattern : a string pattern to be searched.
• Return Value:
• a string that represents the substring,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error.
• Example:

substring_after(" like x-phone", "xph");

• The expected result is:

"one"

Binary Functions

parse_binary

• Syntax:

parse_binary(string, encoding)

• Creates a binary from an string encoded in encoding format.

• Arguments:
• string : an encoded string,
• encoding : a string notation specifies the encoding type of the given string. Currently we support hex and base64 format.
• Return Value:
• a binary that is decoded from the given string,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error.
• Example:

[ parse_binary(“ABCDEF0123456789”,“hex”), parse_binary(“abcdef0123456789”,“HEX”), parse_binary(‘QXN0ZXJpeAE=’,“base64”) ];

• The expected result is:

[ hex(“ABCDEF0123456789”), hex(“ABCDEF0123456789”), hex(“4173746572697801”) ]

print_binary

• Syntax:

print_binary(binary, encoding)

• Prints a binary to the required encoding string format.

• Arguments:
• binary : a binary data need to be printed.
• encoding : a string notation specifies the expected encoding type. Currently we support hex and base64 format.
• Return Value:
• a string that represents the encoded format of a binary,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• any other non-string input value will cause a type error.
• Example:

[ print_binary(hex("ABCDEF0123456789"), "base64"), print_binary(base64("q83vASNFZ4k="), "hex") ]

• The expected result are:

[ "q83vASNFZ4k=", "ABCDEF0123456789" ]

binary_length

• Syntax:

binary_length(binary)

• Returns the number of bytes storing the binary data.

• Arguments:
• binary : a binary value to be checked.
• Return Value:
• an bigint that represents the number of bytes,
• missing if the argument is a missing value,
• null if the argument is a null value,
• any other non-binary input value will cause a type error.
• Example:

binary_length(hex("00AA"))

• The expected result is:

2

sub_binary

• Syntax:

sub_binary(binary, offset[, length])

• Returns the sub binary from the given binary based on the given start offset with the optional length.

• Arguments:
• binary : a binary to be extracted,
• offset : a tinyint, smallint, integer, or bigint value as the starting offset of the sub binary in binary (starting at 0),
• length : (Optional) a tinyint, smallint, integer, or bigint value as the length of the sub binary.
• Return Value:
• a binary that represents the sub binary,
• missing if any argument is a missing value,
• null if any argument is a null value but no argument is a missing value,
• a type error will be raised if:
• the first argument is any other non-binary value,
• or, the second argument is any other non-integer value,
• or, the third argument is any other non-integer value, if it is present.
• Example:

sub_binary(hex("AABBCCDD"), 4);

• The expected result is

hex("DD")

binary_concat

• Syntax:

binary_concat(array)

• Concatenates a binary array or multiset into a single binary.

• Arguments:
• array : an array or multiset of binaries (could be null or missing) to be concatenated.
• Return Value :
• the concatenated binary value,
• missing if the argument is a missing value,
• null if the argument is a null value,
• missing if any element in the input array is missing,
• null if any element in the input array is null but no element in the input array is missing,
• any other non-array input value or non-binary element in the input array will cause a type error.
• Example:

binary_concat([hex(“42”), hex(""), hex(‘42’)]);

• The expected result is

hex(“4242”)