Fix strict NotEqualTo/NotIn pruning with partial nulls or NaNs (#3521)

## Summary

Related to #3498

Fix strict metrics evaluation for `NotEqualTo` and `NotIn` so that the
null/NaN short-circuit only proves a file matches when the column contains
only nulls or only NaNs. Files that mix nulls or NaNs with regular values
now continue through the existing bounds checks instead of being treated
as `ROWS_MUST_MATCH`.
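A minimal sketch of the fixed `NotEqualTo` decision, written against a hypothetical `ColumnMetrics` model rather than PyIceberg's actual strict evaluator. It assumes that `value_count` includes nulls and NaNs, that `lower`/`upper` bound only the non-null, non-NaN values, and that nulls and NaNs satisfy `x != literal`. The names `ColumnMetrics` and `strict_not_equal` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the evaluator's two outcomes.
ROWS_MUST_MATCH = True
ROWS_MIGHT_NOT_MATCH = False


@dataclass
class ColumnMetrics:
    """Simplified per-file column metrics (hypothetical model).

    value_count includes null and NaN values; lower/upper are assumed to
    bound only the non-null, non-NaN values and are None when absent.
    """

    value_count: int
    null_count: int = 0
    nan_count: int = 0
    lower: Optional[float] = None
    upper: Optional[float] = None


def strict_not_equal(m: ColumnMetrics, literal: float) -> bool:
    # Short-circuit only when every row is null, or every row is NaN:
    # neither ever equals the literal, so `x != literal` holds everywhere.
    if m.null_count == m.value_count or m.nan_count == m.value_count:
        return ROWS_MUST_MATCH
    # Mixed files fall through to the bounds: every non-null, non-NaN
    # value must lie strictly above or strictly below the literal.
    if m.lower is not None and m.lower > literal:
        return ROWS_MUST_MATCH
    if m.upper is not None and m.upper < literal:
        return ROWS_MUST_MATCH
    # No proof available: stay conservative.
    return ROWS_MIGHT_NOT_MATCH
```

Falling back to `ROWS_MIGHT_NOT_MATCH` whenever proof is missing is what keeps a strict evaluator safe: it may under-report matches, but it must never over-report them.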

## Root Cause

The strict evaluator used `_can_contain_nulls` / `_can_contain_nans` for
negative predicates. That is too broad: a file with values `[null, 5]` and
bounds `5..5` cannot be proven to match `x != 5` or `x not in {5}`, because
its non-null row holds `5` and fails the predicate.
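The `[null, 5]` case can be checked directly with a small, self-contained sketch. It derives simplified metrics from raw values, evaluates `x != 5` row by row as ground truth, and compares a "may contain nulls or NaNs" short-circuit with an "only nulls or only NaNs" one. The names `Summary`, `summarize`, `old_short_circuit`, and `new_short_circuit` are hypothetical, and nulls are assumed to satisfy `x != literal`.

```python
import math
from typing import List, NamedTuple, Optional


class Summary(NamedTuple):
    """Simplified file metrics (hypothetical model)."""

    value_count: int
    null_count: int
    nan_count: int
    lower: Optional[float]
    upper: Optional[float]


def summarize(values: List[Optional[float]]) -> Summary:
    non_null = [v for v in values if v is not None]
    finite = [v for v in non_null if not math.isnan(v)]
    # Bounds are taken over non-null, non-NaN values only.
    return Summary(
        value_count=len(values),
        null_count=len(values) - len(non_null),
        nan_count=len(non_null) - len(finite),
        lower=min(finite) if finite else None,
        upper=max(finite) if finite else None,
    )


def every_row_not_equal(values: List[Optional[float]], literal: float) -> bool:
    # Ground truth: nulls are assumed to satisfy the predicate, and
    # NaN != literal is already True in Python.
    return all(v is None or v != literal for v in values)


def old_short_circuit(s: Summary) -> bool:
    # Too broad: any null or NaN at all "proves" a match.
    return s.null_count > 0 or s.nan_count > 0


def new_short_circuit(s: Summary) -> bool:
    # Only an all-null or all-NaN column proves a match.
    return s.null_count == s.value_count or s.nan_count == s.value_count


values = [None, 5.0]
s = summarize(values)
print(old_short_circuit(s), every_row_not_equal(values, 5.0), new_short_circuit(s))
# prints: True False False
```

The broad check claims a match that the rows themselves contradict, while the narrow check declines to short-circuit and leaves the decision to the bounds.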

## Java Parity

This matches Java's `StrictMetricsEvaluator`, which only short-circuits
negative predicates when the column contains only nulls or only NaNs:

- [`notEq`](https://github.com/apache/iceberg/blob/0b30919372df34afb632f037df88c05cdba0b134/api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java#L341-L375)
- [`notIn`](https://github.com/apache/iceberg/blob/0b30919372df34afb632f037df88c05cdba0b134/api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java#L418-L462)
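The same idea for `NotIn`, as a simplified sketch loosely following the shape of the linked Java `notIn` rather than a port of it. After the all-null/all-NaN short-circuit, literals outside `[lower, upper]` are discarded; the file is only proven to match when no candidate literal remains. `ColumnMetrics` and `strict_not_in` are hypothetical names, and literals are assumed to be non-NaN.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical stand-ins for the evaluator's two outcomes.
ROWS_MUST_MATCH = True
ROWS_MIGHT_NOT_MATCH = False


@dataclass
class ColumnMetrics:
    """Simplified per-file column metrics (hypothetical model)."""

    value_count: int
    null_count: int = 0
    nan_count: int = 0
    lower: Optional[float] = None  # bound over non-null, non-NaN values
    upper: Optional[float] = None


def strict_not_in(m: ColumnMetrics, literals: Set[float]) -> bool:
    # Only a column of all nulls or all NaNs short-circuits.
    if m.null_count == m.value_count or m.nan_count == m.value_count:
        return ROWS_MUST_MATCH
    # Keep only literals that some non-null, non-NaN value could equal.
    candidates = set(literals)
    if m.lower is not None:
        candidates = {v for v in candidates if v >= m.lower}
    if m.upper is not None:
        candidates = {v for v in candidates if v <= m.upper}
    # No literal falls inside the bounds, so no row can be in the set.
    if not candidates:
        return ROWS_MUST_MATCH
    return ROWS_MIGHT_NOT_MATCH
```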

## Validation

- `UV_CACHE_DIR=.cache/uv PYTHON_GIL=1 PYTHONPATH=. uv run pytest tests/expressions/test_evaluator.py -k "mixed_nulls_and_matching_bounds or mixed_nans_and_matching_bounds or all_nulls or all_nans or strict_integer_not_in"`
- `UV_CACHE_DIR=.cache/uv PYTHON_GIL=1 PYTHONPATH=. uv run pytest tests/expressions/test_evaluator.py`
- `UV_CACHE_DIR=.cache/uv PYTHON_GIL=1 PYTHONPATH=. uv run ruff check pyiceberg/expressions/visitors.py tests/expressions/test_evaluator.py`
- `git diff --check`

---------

Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
2 files changed
tree: 1802de1b0bde036692174a0f6b098e4de60e71ae