DataFrame methods such as select and filter accept one or more logical expressions and there are many functions available for creating logical expressions. These are documented below.
:::{tip} Most functions and methods may receive and return an Expr, which can be chained together using a fluent-style API:
use datafusion::prelude::*; // create the expression `(a > 6) AND (b < 7)` col("a").gt(lit(6)).and(col("b").lt(lit(7)));
:::
| Syntax | Description |
|---|---|
| col(ident) | Reference a column in a dataframe col("a") |
:::{note} ident : A type which implement Into<Column> trait :::
| Syntax | Description |
|---|---|
| lit(value) | Literal value such as lit(123) or lit("hello") |
:::{note} value : A type which implement Literal :::
| Syntax | Description |
|---|---|
| and(x, y), x.and(y) | Logical AND |
| or(x, y), x.or(y) | Logical OR |
| !x, not(x), x.not() | Logical NOT |
:::{note} ! is a bitwise or logical complement operator in Rust, but it only works as a logical NOT in expression API. :::
:::{note} Since && and || are logical operators in Rust and cannot be overloaded these are not available in the expression API. :::
| Syntax | Description |
|---|---|
| x & y, bitwise_and(x, y), x.bitand(y) | AND |
| x | y, bitwise_or(x, y), x.bitor(y) | OR |
| x ^ y, bitwise_xor(x, y), x.bitxor(y) | XOR |
| x << y, bitwise_shift_left(x, y), x.shl(y) | Left shift |
| x >> y, bitwise_shift_right(x, y), x.shr(y) | Right shift |
| Syntax | Description |
|---|---|
| x.eq(y) | Equal |
| x.not_eq(y) | Not Equal |
| x.gt(y) | Greater Than |
| x.gt_eq(y) | Greater Than or Equal |
| x.lt(y) | Less Than |
| x.lt_eq(y) | Less Than or Equal |
:::{note} Comparison operators (<, <=, ==, >=, >) could be overloaded by the PartialOrd and PartialEq trait in Rust, but these operators always return a bool which makes them not work with the expression API. :::
| Syntax | Description |
|---|---|
| x + y, x.add(y) | Addition |
| x - y, x.sub(y) | Subtraction |
| x * y, x.mul(y) | Multiplication |
| x / y, x.div(y) | Division |
| x % y, x.rem(y) | Remainder |
| -x, x.neg() | Negation |
| Syntax | Description |
|---|---|
| abs(x) | absolute value |
| acos(x) | inverse cosine |
| acosh(x) | inverse hyperbolic cosine |
| asin(x) | inverse sine |
| asinh(x) | inverse hyperbolic sine |
| atan(x) | inverse tangent |
| atanh(x) | inverse hyperbolic tangent |
| atan2(y, x) | inverse tangent of y / x |
| cbrt(x) | cube root |
| ceil(x) | nearest integer greater than or equal to argument |
| cos(x) | cosine |
| cosh(x) | hyperbolic cosine |
| degrees(x) | converts radians to degrees |
| exp(x) | exponential |
| factorial(x) | factorial |
| floor(x) | nearest integer less than or equal to argument |
| gcd(x, y) | greatest common divisor |
| isnan(x) | predicate determining whether NaN/-NaN or not |
| iszero(x) | predicate determining whether 0.0/-0.0 or not |
| lcm(x, y) | least common multiple |
| ln(x) | natural logarithm |
| log(base, x) | logarithm of x for a particular base |
| log10(x) | base 10 logarithm |
| log2(x) | base 2 logarithm |
| nanvl(x, y) | returns x if x is not NaN otherwise returns y |
| pi() | approximate value of π |
| power(base, exponent) | base raised to the power of exponent |
| radians(x) | converts degrees to radians |
| round(x) | round to nearest integer |
| signum(x) | sign of the argument (-1, 0, +1) |
| sin(x) | sine |
| sinh(x) | hyperbolic sine |
| sqrt(x) | square root |
| tan(x) | tangent |
| tanh(x) | hyperbolic tangent |
| trunc(x) | truncate toward zero |
:::{note} Unlike to some databases the math functions in Datafusion works the same way as Rust math functions, avoiding failing on corner cases e.g.
select log(-1), log(0), sqrt(-1); +----------------+---------------+-----------------+ | log(Int64(-1)) | log(Int64(0)) | sqrt(Int64(-1)) | +----------------+---------------+-----------------+ | NaN | -inf | NaN | +----------------+---------------+-----------------+
:::
| Syntax | Description |
|---|---|
| coalesce([value, ...]) | Returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display. |
| case(expr) .when(expr) .end(),case(expr) .when(expr) .otherwise(expr) | CASE expression. The expression may chain multiple when expressions and end with an end or otherwise expression. Example: case(col(“a”) % lit(3)) .when(lit(0), lit(“A”)) .when(lit(1), lit(“B”)) .when(lit(2), lit(“C”)) .end()or, end with otherwise to match any other conditions: case(col(“b”).gt(lit(100))) .when(lit(true), lit(“value > 100”)) .otherwise(lit(“value <= 100”)) |
| nullif(value1, value2) | Returns a null value if value1 equals value2; otherwise it returns value1. This can be used to perform the inverse operation of the coalesce expression. |
| Syntax | Description |
|---|---|
| ascii(character) | Returns a numeric representation of the character (character). Example: ascii('a') -> 97 |
| bit_length(text) | Returns the length of the string (text) in bits. Example: bit_length('spider') -> 48 |
| btrim(text, characters) | Removes all specified characters (characters) from both the beginning and the end of the string (text). Example: btrim('aabchelloccb', 'abc') -> hello |
| char_length(text) | Returns number of characters in the string (text). The same as character_length and length. Example: character_length('lion') -> 4 |
| character_length(text) | Returns number of characters in the string (text). The same as char_length and length. Example: char_length('lion') -> 4 |
| concat(value1, [value2 [, ...]]) | Concatenates the text representations (value1, [value2 [, ...]]) of all the arguments. NULL arguments are ignored. Example: concat('aaa', 'bbc', NULL, 321) -> aaabbc321 |
| concat_ws(separator, value1, [value2 [, ...]]) | Concatenates the text representations (value1, [value2 [, ...]]) of all the arguments with the separator (separator). NULL arguments are ignored. concat_ws('/', 'path', 'to', NULL, 'my', 'folder', 123) -> path/to/my/folder/123 |
| chr(integer) | Returns a character by its numeric representation (integer). Example: chr(90) -> 8 |
| initcap | Converts the first letter of each word to upper case and the rest to lower case. Example: initcap('hi TOM') -> Hi Tom |
| left(text, number) | Returns a certain number (number) of first characters (text). Example: left('like', 2) -> li |
| length(text) | Returns number of characters in the string (text). The same as character_length and char_length. Example: length('lion') -> 4 |
| lower(text) | Converts all characters in the string (text) into lower case. Example: lower('HELLO') -> hello |
| lpad(text, length, [, fill]) | Extends the string to length (length) by prepending the characters (fill) (a space by default). Example: lpad('bb', 5, 'a') → aaabb |
| ltrim(text, text) | Removes all specified characters (characters) from the beginning of the string (text). Example: ltrim('aabchelloccb', 'abc') -> helloccb |
| md5(text) | Computes the MD5 hash of the argument (text). |
| octet_length(text) | Returns number of bytes in the string (text). |
| repeat(text, number) | Repeats the string the specified number of times. Example: repeat('1', 4) -> 1111 |
| replace(string, from, to) | Replaces a specified string (from) with another specified string (to) in the string (string). Example: replace('Hello', 'replace', 'el') -> Hola |
| reverse(text) | Reverses the order of the characters in the string (text). Example: reverse('hello') -> olleh |
| right(text, number) | Returns a certain number (number) of last characters (text). Example: right('like', 2) -> ke |
| rpad(text, length, [, fill]) | Extends the string to length (length) by prepending the characters (fill) (a space by default). Example: rpad('bb', 5, 'a') → bbaaa |
| rtrim | Removes all specified characters (characters) from the end of the string (text). Example: rtrim('aabchelloccb', 'abc') -> aabchello |
| digest(input, algorithm) | Computes the binary hash of input, using the algorithm. |
| split_part(string, delimiter, index) | Splits the string (string) based on a delimiter (delimiter) and picks out the desired field based on the index (index). |
| starts_with(string, prefix) | Returns true if the string (string) starts with the specified prefix (prefix). If not, it returns false. Example: starts_with('Hi Tom', 'Hi') -> true |
| strpos | Finds the position from where the substring matches the string |
| substr(string, position, [, length]) | Returns substring from the position (position) with length (length) characters in the string (string). |
| translate(string, from, to) | Replaces the characters in from with the counterpart in to. Example: translate('abcde', 'acd', '15') -> 1b5e |
| trim(string) | Removes all characters, space by default from the string (string) |
| upper | Converts all characters in the string into upper case. Example: upper('hello') -> HELLO |
| Syntax | Description |
|---|---|
| array_any_value(array) | Returns the first non-null element in the array. array_any_value([NULL, 1, 2, 3]) -> 1 |
| array_append(array, element) | Appends an element to the end of an array. array_append([1, 2, 3], 4) -> [1, 2, 3, 4] |
| array_concat(array[, ..., array_n]) | Concatenates arrays. array_concat([1, 2, 3], [4, 5, 6]) -> [1, 2, 3, 4, 5, 6] |
| array_has(array, element) | Returns true if the array contains the element array_has([1,2,3], 1) -> true |
| array_has_all(array, sub-array) | Returns true if all elements of sub-array exist in array array_has_all([1,2,3], [1,3]) -> true |
| array_has_any(array, sub-array) | Returns true if any elements exist in both arrays array_has_any([1,2,3], [1,4]) -> true |
| array_dims(array) | Returns an array of the array's dimensions. array_dims([[1, 2, 3], [4, 5, 6]]) -> [2, 3] |
| array_distinct(array) | Returns distinct values from the array after removing duplicates. array_distinct([1, 3, 2, 3, 1, 2, 4]) -> [1, 2, 3, 4] |
| array_element(array, index) | Extracts the element with the index n from the array array_element([1, 2, 3, 4], 3) -> 3 |
| empty(array) | Returns true for an empty array or false for a non-empty array. empty([1]) -> false |
| flatten(array) | Converts an array of arrays to a flat array flatten([[1], [2, 3], [4, 5, 6]]) -> [1, 2, 3, 4, 5, 6] |
| array_length(array, dimension) | Returns the length of the array dimension. array_length([1, 2, 3, 4, 5]) -> 5 |
| array_ndims(array) | Returns the number of dimensions of the array. array_ndims([[1, 2, 3], [4, 5, 6]]) -> 2 |
| array_pop_front(array) | Returns the array without the first element. array_pop_front([1, 2, 3]) -> [2, 3] |
| array_pop_back(array) | Returns the array without the last element. array_pop_back([1, 2, 3]) -> [1, 2] |
| array_position(array, element) | Searches for an element in the array, returns first occurrence. array_position([1, 2, 2, 3, 4], 2) -> 2 |
| array_positions(array, element) | Searches for an element in the array, returns all occurrences. array_positions([1, 2, 2, 3, 4], 2) -> [2, 3] |
| array_prepend(element, array) | Prepends an element to the beginning of an array. array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4] |
| array_repeat(element, count) | Returns an array containing element count times. array_repeat(1, 3) -> [1, 1, 1] |
| array_remove(array, element) | Removes the first element from the array equal to the given value. array_remove([1, 2, 2, 3, 2, 1, 4], 2) -> [1, 2, 3, 2, 1, 4] |
| array_remove_n(array, element, max) | Removes the first max elements from the array equal to the given value. array_remove_n([1, 2, 2, 3, 2, 1, 4], 2, 2) -> [1, 3, 2, 1, 4] |
| array_remove_all(array, element) | Removes all elements from the array equal to the given value. array_remove_all([1, 2, 2, 3, 2, 1, 4], 2) -> [1, 3, 1, 4] |
| array_replace(array, from, to) | Replaces the first occurrence of the specified element with another specified element. array_replace([1, 2, 2, 3, 2, 1, 4], 2, 5) -> [1, 5, 2, 3, 2, 1, 4] |
| array_replace_n(array, from, to, max) | Replaces the first max occurrences of the specified element with another specified element. array_replace_n([1, 2, 2, 3, 2, 1, 4], 2, 5, 2) -> [1, 5, 5, 3, 2, 1, 4] |
| array_replace_all(array, from, to) | Replaces all occurrences of the specified element with another specified element. array_replace_all([1, 2, 2, 3, 2, 1, 4], 2, 5) -> [1, 5, 5, 3, 5, 1, 4] |
| array_slice(array, begin,end) | Returns a slice of the array. array_slice([1, 2, 3, 4, 5, 6, 7, 8], 3, 6) -> [3, 4, 5, 6] |
| array_slice(array, begin, end, stride) | Returns a slice of the array with added stride feature. array_slice([1, 2, 3, 4, 5, 6, 7, 8], 3, 6, 2) -> [3, 5, 6] |
| array_to_string(array, delimiter) | Converts each element to its text representation. array_to_string([1, 2, 3, 4], ',') -> 1,2,3,4 |
| array_intersect(array1, array2) | Returns an array of the elements in the intersection of array1 and array2. array_intersect([1, 2, 3, 4], [5, 6, 3, 4]) -> [3, 4] |
| array_union(array1, array2) | Returns an array of the elements in the union of array1 and array2 without duplicates. array_union([1, 2, 3, 4], [5, 6, 3, 4]) -> [1, 2, 3, 4, 5, 6] |
| array_except(array1, array2) | Returns an array of the elements that appear in the first array but not in the second. array_except([1, 2, 3, 4], [5, 6, 3, 4]) -> [1, 2] |
| array_resize(array, size, value) | Resizes the list to contain size elements. Initializes new elements with value or empty if value is not set. array_resize([1, 2, 3], 5, 0) -> [1, 2, 3, 0, 0] |
| array_sort(array, desc, null_first) | Returns sorted array. array_sort([3, 1, 2, 5, 4]) -> [1, 2, 3, 4, 5] |
| cardinality(array/map) | Returns the total number of elements in the array or map. cardinality([[1, 2, 3], [4, 5, 6]]) -> 6 |
| make_array(value1, [value2 [, ...]]) | Returns an Arrow array using the specified input expressions. make_array(1, 2, 3) -> [1, 2, 3] |
| range(start [, stop, step]) | Returns an Arrow array between start and stop with step. SELECT range(2, 10, 3) -> [2, 5, 8] |
| string_to_array(array, delimiter, null_string) | Splits a string based on a delimiter and returns an array of parts. Any parts matching the optional null_string will be replaced with NULL. string_to_array('abc#def#ghi', '#', ' ') -> ['abc', 'def', 'ghi'] |
| trim_array(array, n) | Deprecated |
| Syntax | Description |
|---|---|
| regexp_match | Matches a regular expression against a string and returns matched substrings. |
| regexp_replace | Replaces strings that match a regular expression |
| Syntax | Description |
|---|---|
| date_part | Extracts a subfield from the date. |
| date_trunc | Truncates the date to a specified level of precision. |
| from_unixtime | Returns the unix time in format. |
| to_timestamp | Converts a string to a Timestamp(_, _) |
| to_timestamp_millis | Converts a string to a Timestamp(Milliseconds, None) |
| to_timestamp_micros | Converts a string to a Timestamp(Microseconds, None) |
| to_timestamp_seconds | Converts a string to a Timestamp(Seconds, None) |
| now() | Returns current time. |
| Syntax | Description |
|---|---|
| array([value1, ...]) | Returns an array of fixed size with each argument ([value1, ...]) on it. |
| in_list(expr, list, negated) | Returns true if (expr) belongs or not belongs (negated) to a list (list), otherwise returns false. |
| random() | Returns a random value from 0 (inclusive) to 1 (exclusive). |
| sha224(text) | Computes the SHA224 hash of the argument (text). |
| sha256(text) | Computes the SHA256 hash of the argument (text). |
| sha384(text) | Computes the SHA384 hash of the argument (text). |
| sha512(text) | Computes the SHA512 hash of the argument (text). |
| to_hex(integer) | Converts the integer (integer) to the corresponding hexadecimal string. |
| Syntax | Description |
|---|---|
| avg(expr) | Сalculates the average value for expr. |
| avg_distinct(expr) | Creates an expression to represent the avg(distinct) aggregate function |
| approx_distinct(expr) | Calculates an approximate count of the number of distinct values for expr. |
| approx_median(expr) | Calculates an approximation of the median for expr. |
| approx_percentile_cont(expr, percentile [, centroids]) | Calculates an approximation of the specified percentile for expr. Optional centroids parameter controls accuracy (default: 100). |
| approx_percentile_cont_with_weight(expr, weight_expr, percentile [, centroids]) | Calculates an approximation of the specified percentile for expr and weight_expr. Optional centroids parameter controls accuracy (default: 100). |
| bit_and(expr) | Computes the bitwise AND of all non-null input values for expr. |
| bit_or(expr) | Computes the bitwise OR of all non-null input values for expr. |
| bit_xor(expr) | Computes the bitwise exclusive OR of all non-null input values for expr. |
| bool_and(expr) | Returns true if all non-null input values (expr) are true, otherwise false. |
| bool_or(expr) | Returns true if any non-null input value (expr) is true, otherwise false. |
| count(expr) | Returns the number of rows for expr. |
| count_distinct(expr) | Creates an expression to represent the count(distinct) aggregate function |
| cube(exprs) | Creates a grouping set for all combination of exprs |
| grouping_set(exprs) | Create a grouping set. |
| max(expr) | Finds the maximum value of expr. |
| median(expr) | Сalculates the median of expr. |
| min(expr) | Finds the minimum value of expr. |
| rollup(exprs) | Creates a grouping set for rollup sets. |
| sum(expr) | Сalculates the sum of expr. |
| sum_distinct(expr) | Creates an expression to represent the sum(distinct) aggregate function |
You can also use the ExprFunctionExt trait to more easily build Aggregate arguments Expr.
See datafusion-examples/examples/expr_api.rs for example usage.
| Syntax | Equivalent to |
|---|---|
| first_value_udaf.call(vec![expr]).order_by(vec![expr]).build().unwrap() | first_value(expr, Some(vec![expr])) |
| Syntax | Description |
|---|---|
| exists | Creates an EXISTS subquery expression |
| in_subquery | df1.filter(in_subquery(col("foo"), df2))? is the equivalent of the SQL WHERE foo IN <df2> |
| not_exists | Creates a NOT EXISTS subquery expression |
| not_in_subquery | Creates a NOT IN subquery expression |
| scalar_subquery | Creates a scalar subquery expression |
| Syntax | Description |
|---|---|
| create_udf | Creates a new UDF with a specific signature and specific return type. |
| create_udaf | Creates a new UDAF with a specific signature, state type and return type. |