Fix DELETED manifest entry snapshot_id in OverwriteFiles (#3237)

# Rationale for this change
When `_OverwriteFiles._deleted_entries()` creates DELETED manifest
entries, it now sets `snapshot_id` to the current (deleting) snapshot's ID
instead of retaining the original INSERT snapshot's ID.

Closes #3236 

According to the [Iceberg spec (Manifest Entry
Fields)](https://iceberg.apache.org/spec/#manifest-entry-fields),
`snapshot_id` for a DELETED entry (status=2) should be the ID of the
snapshot in which the file was deleted. However,
`_OverwriteFiles._deleted_entries()` was copying the original entry's
`snapshot_id` (from the INSERT snapshot) into the new DELETED entry.
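
The intended behavior can be illustrated with a minimal, self-contained sketch. It does not use PyIceberg's actual classes: `Entry`, `Status`, and `mark_deleted` are hypothetical names introduced here only to show the rule that a DELETED entry carries the deleting snapshot's ID rather than the ID of the snapshot that added the file.

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class Status(IntEnum):
    # Status codes as defined for manifest entries in the Iceberg spec.
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class Entry:
    # Hypothetical stand-in for a manifest entry, not PyIceberg's class.
    status: Status
    snapshot_id: int
    file_path: str


def mark_deleted(entry: Entry, deleting_snapshot_id: int) -> Entry:
    """Return a DELETED copy of `entry` stamped with the deleting snapshot's ID."""
    return replace(entry, status=Status.DELETED, snapshot_id=deleting_snapshot_id)


# A file added in snapshot 100 and removed by an overwrite in snapshot 200.
added = Entry(Status.ADDED, snapshot_id=100, file_path="data/a.parquet")
deleted = mark_deleted(added, deleting_snapshot_id=200)
```

Before the fix, the equivalent of `deleted.snapshot_id` would have stayed at the adding snapshot's ID (100 in this sketch) instead of 200.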

This caused downstream consumers that filter manifest entries by
`snapshot_id` (e.g. Iceberg Java's `IncrementalChangelogScan`) to
silently miss DELETED files, breaking CDC pipelines.
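
To see why a consumer would miss the deletion, consider a simplified filter written under stated assumptions. This is not how Iceberg Java's `IncrementalChangelogScan` is implemented; `ManifestEntry` and `deleted_in_snapshot` are hypothetical names that only model "select DELETED entries whose `snapshot_id` matches the snapshot being scanned".

```python
from typing import List, NamedTuple

DELETED = 2  # manifest entry status code for a deleted file


class ManifestEntry(NamedTuple):
    # Hypothetical stand-in for a manifest entry.
    status: int
    snapshot_id: int
    file_path: str


def deleted_in_snapshot(entries: List[ManifestEntry], snapshot_id: int) -> List[str]:
    """Return paths of files recorded as deleted by the given snapshot."""
    return [
        e.file_path
        for e in entries
        if e.status == DELETED and e.snapshot_id == snapshot_id
    ]


# The file was added in snapshot 100 and deleted in snapshot 200.
old_behavior = [ManifestEntry(DELETED, 100, "data/a.parquet")]  # keeps the INSERT snapshot ID
new_behavior = [ManifestEntry(DELETED, 200, "data/a.parquet")]  # carries the deleting snapshot ID

missed = deleted_in_snapshot(old_behavior, 200)
found = deleted_in_snapshot(new_behavior, 200)
```

With the old entry, a scan of snapshot 200 finds no deleted files; with the corrected entry, the deleted path is returned.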

## Are these changes tested?
Added `test_manifest_entry_snapshot_id_after_partial_deletes` in
`tests/integration/test_deletes.py`.

## Are there any user-facing changes?
N/A

---------

Signed-off-by: Sotaro Hikita <bering1814@gmail.com>
2 files changed
tree: 43cc3054e8867c0e81370d5f279d11ff44c53883
  1. .github/
  2. dev/
  3. mkdocs/
  4. notebooks/
  5. pyiceberg/
  6. tests/
  7. vendor/
  8. .asf.yaml
  9. .codespellrc
  10. .gitignore
  11. .markdownlint.yaml
  12. .pre-commit-config.yaml
  13. LICENSE
  14. Makefile
  15. MANIFEST.in
  16. NOTICE
  17. pyproject.toml
  18. README.md
  19. ruff.toml
  20. setup.py
  21. uv.lock
README.md

Iceberg Python

PyIceberg is a Python library for programmatic access to Iceberg table metadata as well as to table data in Iceberg format. It is a Python implementation of the Iceberg table spec.

The documentation is available at https://py.iceberg.apache.org/.

Get in Touch