Expressions

DataFrame methods such as select and filter accept one or more logical expressions and there are many functions available for creating logical expressions. These are documented below.

Expressions can be chained together using a fluent-style API:

// create the expression `(a > 6) AND (b < 7)`
col("a").gt(lit(6)).and(col("b").lt(lit(7)))

Identifiers

Function	Notes
col	Reference a column in a dataframe `col("a")`

Literal Values

Function	Notes
lit	Literal value such as `lit(123)` or `lit("hello")`

Boolean Expressions

Function	Notes
and	`and(expr1, expr2)` or `expr1.and(expr2)`
or	`or(expr1, expr2)` or `expr1.or(expr2)`
not	`not(expr)` or `expr.not()`

Comparison Expressions

Function	Notes
eq	`expr1.eq(expr2)`
gt	`expr1.gt(expr2)`
gt_eq	`expr1.gt_eq(expr2)`
lt	`expr1.lt(expr2)`
lt_eq	`expr1.lt_eq(expr2)`
not_eq	`expr1.not_eq(expr2)`

Math Functions

In addition to the math functions listed here, some Rust operators are implemented for expressions, allowing expressions such as col("a") + col("b") to be used.

Function	Notes
abs(x)	absolute value
acos(x)	inverse cosine
asin(x)	inverse sine
atan(x)	inverse tangent
atan2(y, x)	inverse tangent of y / x
ceil(x)	nearest integer greater than or equal to argument
cos(x)	cosine
exp(x)	exponential
floor(x)	nearest integer less than or equal to argument
ln(x)	natural logarithm
log(base, x)	logarithm of x for a particular base
log10(x)	base 10 logarithm
log2(x)	base 2 logarithm
power(base, exponent)	base raised to the power of exponent
round(x)	round to nearest integer
signum(x)	sign of the argument (-1, 0, +1)
sin(x)	sine
sqrt(x)	square root
tan(x)	tangent
trunc(x)	truncate toward zero

Math functions usage notes:

Unlike to some databases the math functions in Datafusion works the same way as Rust math functions, avoiding failing on corner cases e.g

❯ select log(-1), log(0), sqrt(-1);
+----------------+---------------+-----------------+
| log(Int64(-1)) | log(Int64(0)) | sqrt(Int64(-1)) |
+----------------+---------------+-----------------+
| NaN            | -inf          | NaN             |
+----------------+---------------+-----------------+

Bitwise Operators

Operator	Notes
&	Bitwise AND => `(expr1 & expr2)`
\|	Bitwise OR => (expr1 \| expr2)
#	Bitwise XOR => `(expr1 # expr2)`
<<	Bitwise left shift => `(expr1 << expr2)`
>>	Bitwise right shift => `(expr1 << expr2)`

Conditional Expressions

Function	Notes
coalesce	Returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display.
case	CASE expression. Example: `case(expr).when(expr, expr).when(expr, expr).otherwise(expr).end()`.
nullif	Returns a null value if `value1` equals `value2`; otherwise it returns `value1`. This can be used to perform the inverse operation of the `coalesce` expression.

String Expressions

Function	Notes
ascii
bit_length
btrim
char_length
character_length
concat
concat_ws
chr
initcap
left
length
lower
lpad
ltrim
md5
octet_length
repeat
replace
reverse
right
rpad
rtrim
digest
split_part
starts_with
strpos
substr
translate
trim
upper

Regular Expressions

Function	Notes
regexp_match
regexp_replace

Temporal Expressions

Function	Notes
date_part
date_trunc
from_unixtime
to_timestamp
to_timestamp_millis
to_timestamp_micros
to_timestamp_seconds
now()	current time

Other Expressions

Function	Notes
array
in_list
random
sha224
sha256
sha384
sha512
struct
to_hex

Aggregate Functions

Function	Notes
avg
approx_distinct
approx_median
approx_percentile_cont
approx_percentile_cont_with_weight
count
count_distinct
cube
grouping_set
max
median
min
rollup
sum

Subquery Expressions

Function	Notes
exists
in_subquery	`df1.filter(in_subquery(col("foo"), df2))?` is the equivalent of the SQL `WHERE foo IN <df2>`
not_exists
not_in_subquery
scalar_subquery

User-Defined Function Expressions

Function	Notes
create_udf
create_udaf