fix: Use correct byte representation for decimal hashing (#1998)

## Which issue does this PR close?

- Closes #1981.

## What changes are included in this PR?

The
[spec](https://iceberg.apache.org/spec/#appendix-b-32-bit-hash-requirements)
states that:

> "Decimal values are hashed using the minimum number of bytes required
> to hold the unscaled value as a two's complement big-endian".

Prior to this fix, we would incorrectly include redundant leading `0xFF`
bytes (sign extension) in the bytes that get hashed. Now, we hash only
the bytes starting with the one that is needed to preserve the sign, and
everything that follows it.
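
As an illustration of the byte-trimming rule quoted from the spec, here is a minimal sketch. It is not the code changed in this PR: the function name `minimal_twos_complement_be` is hypothetical, and it assumes the unscaled value fits in an `i128`. A leading byte is dropped only when it is pure sign extension, so the sign of the remaining bytes is unchanged.

```rust
/// Hypothetical helper (not the crate's actual API): returns the shortest
/// big-endian two's-complement encoding of an unscaled decimal value.
fn minimal_twos_complement_be(unscaled: i128) -> Vec<u8> {
    let bytes = unscaled.to_be_bytes(); // always 16 bytes, big-endian
    let mut start = 0;
    // Always keep at least one byte, so stop before the last index.
    while start < bytes.len() - 1 {
        // A leading byte is redundant only if it repeats the sign bit of
        // the byte after it: 0x00 before a byte with the top bit clear,
        // or 0xFF before a byte with the top bit set.
        let redundant = match bytes[start] {
            0x00 => bytes[start + 1] & 0x80 == 0,
            0xFF => bytes[start + 1] & 0x80 != 0,
            _ => false,
        };
        if !redundant {
            break;
        }
        start += 1;
    }
    bytes[start..].to_vec()
}
```

Under this sketch, `-1` encodes as `[0xFF]`, `-129` as `[0xFF, 0x7F]` (the `0xFF` must stay because `0x7F` alone would read as positive), and `128` as `[0x00, 0x80]`.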

## Are these changes tested?

Added unit tests for the original scenario mentioned in the issue, as well
as some additional cases.
1 file changed
tree: 8acc4ba9fdfe8e8e6ce6fa56bf5dd3a3af437ec9
README.md

Apache Iceberg™ Rust

Rust implementation of Apache Iceberg™.

Components

The Apache Iceberg Rust project is composed of the following components:

| Name | Release | Docs |
|------|---------|------|
| iceberg | iceberg image | docs release, docs dev |
| iceberg-datafusion | iceberg-datafusion image | docs release, docs dev |
| iceberg-catalog-glue | iceberg-catalog-glue image | docs release, docs dev |
| iceberg-catalog-hms | iceberg-catalog-hms image | docs release, docs dev |
| iceberg-catalog-rest | iceberg-catalog-rest image | docs release, docs dev |

Iceberg Rust Implementation Status

The features that Iceberg Rust currently supports can be found here.

Supported Rust Version

Iceberg Rust is built and tested with stable Rust and keeps a rolling MSRV (minimum supported Rust version). Rust releases from at least the past three months are supported. The MSRV is updated when we release iceberg-rust.

Check the current MSRV on crates.io.

Contribute

Apache Iceberg is an active open-source project, governed under the Apache Software Foundation (ASF). Iceberg-rust is always open to people who want to use or contribute to it. Here are some ways to get involved.

The Apache Iceberg community is built on the principles described in the Apache Way and all who engage with the community are expected to be respectful, open, come with the best interests of the community in mind, and abide by the Apache Foundation Code of Conduct.

Users

  • Databend: An open-source cloud data warehouse that serves as a cost-effective alternative to Snowflake.
  • Lakekeeper: An Apache-licensed Iceberg REST Catalog with data access controls.
  • Moonlink: A Rust library that enables sub-second mirroring (CDC) of Postgres tables into Iceberg.
  • RisingWave: A Postgres-compatible SQL database designed for real-time event streaming data processing, analysis, and management.
  • Wrappers: Postgres Foreign Data Wrapper development framework in Rust.

License

Licensed under the Apache License, Version 2.0