In addition to the primary Expression DSL, PyIceberg provides a string-based statement interface for filtering rows in Iceberg tables. This guide explains the syntax and provides examples for supported operations.
The row filter syntax is designed to be similar to SQL WHERE clauses. Here are the basic components:
Columns can be referenced using either unquoted or quoted identifiers:
column_name "column.name"
The following literal types are supported:
'hello world'42, -42, 3.14true, false (case insensitive)column = 42 column != 42 column > 42 column >= 42 column < 42 column <= 42
!!! note The == operator is an alias for = and <> is an alias for !=
column = 'hello' column != 'world'
Check for NULL values using the IS NULL and IS NOT NULL operators:
column IS NULL column IS NOT NULL
For floating-point columns, you can check for NaN values:
column IS NAN column IS NOT NAN
Check if a value is in a set of values:
column IN ('a', 'b', 'c') column NOT IN (1, 2, 3)
The LIKE operator supports pattern matching with a wildcard % at the end of the string:
column LIKE 'prefix%' column NOT LIKE 'prefix%'
!!! important The % wildcard is only supported at the end of the pattern. Using it in the middle or beginning of the pattern will raise an error.
The BETWEEN operator filters a numeric value against an inclusive range, e.g. a between 1 and 2 is equivalent to a >= 1 and a <= 2.
column BETWEEN 1 AND 2 column BETWEEN 1.0 AND 2.0
Combine multiple conditions using logical operators:
column1 = 42 AND column2 = 'hello' column1 > 0 OR column2 IS NULL NOT (column1 = 42)
!!! tip Parentheses can be used to group logical operations for clarity: sql (column1 = 42 AND column2 = 'hello') OR column3 IS NULL
Here are some complete examples showing how to combine different operations:
-- Complex filter with multiple conditions status = 'active' AND age > 18 AND NOT (country IN ('US', 'CA')) -- Filter with string pattern matching name LIKE 'John%' AND age >= 21 -- Filter with NULL checks and numeric comparisons price IS NOT NULL AND price > 100 AND quantity > 0 -- Filter with multiple logical operations (status = 'pending' OR status = 'processing') AND NOT (priority = 'low')
String Quoting: Always use single quotes for string literals. Double quotes are reserved for column identifiers.
-- Correct name = 'John' -- Incorrect name = "John"
Wildcard Usage: The % wildcard in LIKE patterns can only appear at the end.
-- Correct name LIKE 'John%' -- Incorrect (will raise an error) name LIKE '%John%'
Case Sensitivity: Boolean literals (true/false) are case insensitive, but string comparisons are case sensitive.
-- All valid is_active = true is_active = TRUE is_active = True -- Case sensitive status = 'Active' -- Will not match 'active'