| tag | db41c2e3542355a74d3fcc4dc6c7eb598acf8e68 | |
|---|---|---|
| tagger | Fokko Driesprong <fokko@apache.org> | Thu Aug 21 14:03:37 2025 +0200 |
| object | 8db086d00e26339b45a2bfffcff46ec39722a7cd | |
PyIceberg 0.10.0rc1
| commit | 8db086d00e26339b45a2bfffcff46ec39722a7cd | |
|---|---|---|
| author | Hanzhi Wang <hanzhi@apple.com> | Wed Aug 20 14:43:25 2025 -0700 |
| committer | GitHub <noreply@github.com> | Wed Aug 20 23:43:25 2025 +0200 |
| tree | 14b394dd76e5d1abf9de763c05856561027944b2 | |
| parent | 950fc7131b8e597f73647c6ff2bd78d0b24102ad | |
perf: optimize `inspect.partitions` (#2359)
Parallelizes manifest processing to improve performance for large tables
with many manifest files. After parallel processing, merges the
resulting partition maps to produce the final aggregated result.
Previous example ref: e937f6a1811c9e090552a4ae2015a8032e7ea910
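
The parallelize-then-merge approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the entry shape (a partition key plus a record count), the helper names `partitions_for_manifest`, `merge_partition_maps`, and `inspect_partitions`, and the use of a thread pool are all hypothetical stand-ins for PyIceberg's internals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical shapes, for illustration only (not PyIceberg's internal types):
# a manifest is a list of entries, each entry a (partition_key, record_count) pair.
PartitionKey = Tuple[str, ...]
Entry = Tuple[PartitionKey, int]
PartitionMap = Dict[PartitionKey, Dict[str, int]]


def partitions_for_manifest(entries: List[Entry]) -> PartitionMap:
    """Aggregate a single manifest's entries into a per-partition map."""
    result: PartitionMap = {}
    for key, record_count in entries:
        stats = result.setdefault(key, {"record_count": 0, "file_count": 0})
        stats["record_count"] += record_count
        stats["file_count"] += 1
    return result


def merge_partition_maps(maps: Iterable[PartitionMap]) -> PartitionMap:
    """Combine the per-manifest maps into one aggregated map."""
    merged: PartitionMap = {}
    for partial in maps:
        for key, stats in partial.items():
            target = merged.setdefault(key, {"record_count": 0, "file_count": 0})
            target["record_count"] += stats["record_count"]
            target["file_count"] += stats["file_count"]
    return merged


def inspect_partitions(manifests: List[List[Entry]], max_workers: int = 4) -> PartitionMap:
    """Process manifests in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_maps = list(pool.map(partitions_for_manifest, manifests))
    return merge_partition_maps(partial_maps)


# Two made-up manifests that share one partition.
manifests = [
    [(("2025-08-20",), 10), (("2025-08-21",), 5)],
    [(("2025-08-20",), 7)],
]
result = inspect_partitions(manifests)
```

Each worker builds its own map, so no locking is needed while manifests are processed; shared state is only touched in the single-threaded merge step.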
# Rationale for this change
Perf improvement.
We experienced slowness with `table.inspect.partitions()` on large
tables.
# Are these changes tested?
Yes.
# Are there any user-facing changes?
No.
---------
Co-authored-by: Hanzhi Wang <hanzhi_wang@apple.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>

PyIceberg is a Python library for programmatic access to Iceberg table metadata as well as to table data in Iceberg format. It is a Python implementation of the Iceberg table spec.
The documentation is available at https://py.iceberg.apache.org/.