Expressions

DataFrame methods such as select and filter accept one or more logical expressions and there are many functions available for creating logical expressions. These are documented below.

Expressions can be chained together using a fluent-style API:

// create the expression `(a > 6) AND (b < 7)`
col("a").gt(lit(6)).and(col("b").lt(lit(7)))

Identifiers

FunctionNotes
colReference a column in a dataframe col("a")

Literal Values

FunctionNotes
litLiteral value such as lit(123) or lit("hello")

Boolean Expressions

FunctionNotes
andand(expr1, expr2) or expr1.and(expr2)
oror(expr1, expr2) or expr1.or(expr2)
notnot(expr) or expr.not()

Comparison Expressions

FunctionNotes
eqexpr1.eq(expr2)
gtexpr1.gt(expr2)
gt_eqexpr1.gt_eq(expr2)
ltexpr1.lt(expr2)
lt_eqexpr1.lt_eq(expr2)
not_eqexpr1.not_eq(expr2)

Math Functions

In addition to the math functions listed here, some Rust operators are implemented for expressions, allowing expressions such as col("a") + col("b") to be used.

FunctionNotes
abs(x)absolute value
acos(x)inverse cosine
asin(x)inverse sine
atan(x)inverse tangent
atan2(y, x)inverse tangent of y / x
ceil(x)nearest integer greater than or equal to argument
cos(x)cosine
exp(x)exponential
floor(x)nearest integer less than or equal to argument
ln(x)natural logarithm
log(base, x)logarithm of x for a particular base
log10(x)base 10 logarithm
log2(x)base 2 logarithm
power(base, exponent)base raised to the power of exponent
round(x)round to nearest integer
signum(x)sign of the argument (-1, 0, +1)
sin(x)sine
sqrt(x)square root
tan(x)tangent
trunc(x)truncate toward zero

Math functions usage notes:

Unlike to some databases the math functions in Datafusion works the same way as Rust math functions, avoiding failing on corner cases e.g

❯ select log(-1), log(0), sqrt(-1);
+----------------+---------------+-----------------+
| log(Int64(-1)) | log(Int64(0)) | sqrt(Int64(-1)) |
+----------------+---------------+-----------------+
| NaN            | -inf          | NaN             |
+----------------+---------------+-----------------+

Bitwise Operators

OperatorNotes
&Bitwise AND => (expr1 & expr2)
|Bitwise OR => (expr1 | expr2)
#Bitwise XOR => (expr1 # expr2)
<<Bitwise left shift => (expr1 << expr2)
>>Bitwise right shift => (expr1 << expr2)

Conditional Expressions

FunctionNotes
coalesceReturns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display.
caseCASE expression. Example: case(expr).when(expr, expr).when(expr, expr).otherwise(expr).end().
nullifReturns a null value if value1 equals value2; otherwise it returns value1. This can be used to perform the inverse operation of the coalesce expression.

String Expressions

FunctionNotes
ascii
bit_length
btrim
char_length
character_length
concat
concat_ws
chr
initcap
left
length
lower
lpad
ltrim
md5
octet_length
repeat
replace
reverse
right
rpad
rtrim
digest
split_part
starts_with
strpos
substr
translate
trim
upper

Regular Expressions

FunctionNotes
regexp_match
regexp_replace

Temporal Expressions

FunctionNotes
date_part
date_trunc
from_unixtime
to_timestamp
to_timestamp_millis
to_timestamp_micros
to_timestamp_seconds
now()current time

Other Expressions

FunctionNotes
array
in_list
random
sha224
sha256
sha384
sha512
struct
to_hex

Aggregate Functions

FunctionNotes
avg
approx_distinct
approx_median
approx_percentile_cont
approx_percentile_cont_with_weight
count
count_distinct
cube
grouping_set
max
median
min
rollup
sum

Subquery Expressions

FunctionNotes
exists
in_subquerydf1.filter(in_subquery(col("foo"), df2))? is the equivalent of the SQL WHERE foo IN <df2>
not_exists
not_in_subquery
scalar_subquery

User-Defined Function Expressions

FunctionNotes
create_udf
create_udaf