| commit | 4f010424b4e25954d819dd28fdeb582139b3af1b | |
|---|---|---|
| author | Badal Prasad Singh <badal@datazip.io> | Wed Sep 24 01:30:03 2025 +0530 |
| committer | GitHub <noreply@github.com> | Tue Sep 23 16:00:03 2025 -0400 |
| tree | 163883fd0a1e9c4c5971fe0fde574c61a18a7321 | |
| parent | a250163b7c8ad15df84cb3b2ffdc266b789b6c4d | |
feat(table): add fanout partition writer and rolling data writer (#524)

# Partitioned Fanout Writer with Rolling Data File Support (Append Mode)

This PR completes the implementation of partitioned writing with support for rolling data files in append mode. It enables efficient, parallelized ingestion into partitioned tables while maintaining manifest and snapshot correctness.

**Slack Thread Discussion**: [Link](https://apache-iceberg.slack.com/archives/C05J3MJ42BD/p1751002533414969)

**Proposal Document**: [Google Drive](https://drive.google.com/file/d/18CwR9nhwkThs-Q-JZZvisBEaDICvp5Z7/view?usp=drive_link)

### Details

* Introduced parallel processing of `arrow.Record` using a user-defined number of goroutines.
* Each goroutine maintains its own hash table to map partition keys to row indices.
* After partitioning, `compute.Take()` is used to extract per-partition data slices.
* Integrated dedicated rolling writers per partition to manage data file size thresholds and output constraints.

### Tests Performed

* [x] Compatible with all partition transforms
* [x] Handled null values in partition columns
* [x] Validated compatibility with partition spec evolution
* [x] Verified correctness for non-linear transformation cases
* [x] Confirmed schema evolution compatibility
* [x] Partition pruning verified

---

@zeroshade, would appreciate your review when you get a chance!

---------

Signed-off-by: badalprasadsingh <badal@datazip.io>
Co-authored-by: Matt Topol <zotthewizard@gmail.com>
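The steps listed under Details can be sketched in plain Go. This is a minimal illustration under stated assumptions, not the code from the PR: the `row` type, `partitionIndices`, `take`, and `roll` are hypothetical names, plain slices stand in for `arrow.Record`, `take` stands in for `compute.Take()`, and a row-count limit stands in for the data file size threshold.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// row is a hypothetical stand-in for one row of an arrow.Record:
// a partition key plus a payload value.
type row struct {
	key   string
	value int
}

// partitionIndices splits rows into contiguous chunks, one per worker.
// Each goroutine fills its own map from partition key to row indices,
// so no locking is needed; the per-worker maps are merged afterwards.
func partitionIndices(rows []row, workers int) map[string][]int {
	if workers < 1 {
		workers = 1
	}
	local := make([]map[string][]int, workers)
	chunk := (len(rows) + workers - 1) / workers

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start, end := w*chunk, (w+1)*chunk
		if end > len(rows) {
			end = len(rows)
		}
		local[w] = make(map[string][]int)
		if start >= end {
			continue
		}
		wg.Add(1)
		go func(w, start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				local[w][rows[i].key] = append(local[w][rows[i].key], i)
			}
		}(w, start, end)
	}
	wg.Wait()

	// Merge in worker order so indices stay ascending within each key.
	merged := make(map[string][]int)
	for _, m := range local {
		for k, idx := range m {
			merged[k] = append(merged[k], idx...)
		}
	}
	return merged
}

// take gathers the rows at the given indices (the role compute.Take()
// plays for Arrow data in the commit description).
func take(rows []row, indices []int) []row {
	out := make([]row, 0, len(indices))
	for _, i := range indices {
		out = append(out, rows[i])
	}
	return out
}

// roll cuts one partition's rows into "files" of at most maxRows rows,
// a simplified stand-in for rolling over at a size threshold.
func roll(rows []row, maxRows int) [][]row {
	if maxRows < 1 {
		maxRows = 1
	}
	var files [][]row
	for start := 0; start < len(rows); start += maxRows {
		end := start + maxRows
		if end > len(rows) {
			end = len(rows)
		}
		files = append(files, rows[start:end])
	}
	return files
}

func main() {
	rows := []row{{"us", 1}, {"eu", 2}, {"us", 3}, {"ap", 4}, {"eu", 5}}
	groups := partitionIndices(rows, 2)

	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		files := roll(take(rows, groups[k]), 1)
		fmt.Printf("partition=%s indices=%v files=%d\n", k, groups[k], len(files))
	}
}
```

Giving each goroutine a private map avoids lock contention during hashing, at the cost of a single-threaded merge at the end. Whether the PR merges per-worker results this way is not stated in the commit message.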
iceberg is a Golang implementation of the Iceberg table spec.
$ git clone https://github.com/apache/iceberg-go.git
$ cd iceberg-go/cmd/iceberg && go build .
| Filesystem Type | Supported |
|---|---|
| S3 | X |
| Google Cloud Storage | X |
| Azure Blob Storage | X |
| Local Filesystem | X |
| Operation | Supported |
|---|---|
| Get Schema | X |
| Get Snapshots | X |
| Get Sort Orders | X |
| Get Partition Specs | X |
| Get Manifests | X |
| Create New Manifests | X |
| Plan Scan | X |
| Plan Scan for Snapshot | X |
| Operation | REST | Hive | Glue | SQL |
|---|---|---|---|---|
| Load Table | X | X | X | |
| List Tables | X | X | X | |
| Create Table | X | X | X | |
| Register Table | X | X | | |
| Update Current Snapshot | X | X | X | |
| Create New Snapshot | X | X | X | |
| Rename Table | X | X | X | |
| Drop Table | X | X | X | |
| Alter Table | X | X | X | |
| Check Table Exists | X | X | X | |
| Set Table Properties | X | X | X | |
| List Namespaces | X | X | X | |
| Create Namespace | X | X | X | |
| Check Namespace Exists | X | X | X | |
| Drop Namespace | X | X | X | |
| Update Namespace Properties | X | X | X | |
| Create View | X | X | | |
| Load View | X | | | |
| List View | X | X | | |
| Drop View | X | X | | |
| Check View Exists | X | X | | |
As long as the FileSystem is supported and the Catalog supports altering the table, the following tracks the current write support:
| Operation | Supported |
|---|---|
| Append Stream | X |
| Append Data Files | X |
| Rewrite Files | |
| Rewrite manifests | |
| Overwrite Files | |
| Write Pos Delete | |
| Write Eq Delete | |
| Row Delta | |
Run `go build ./cmd/iceberg` from the root of this repository to build the CLI executable. Alternatively, run `go install github.com/apache/iceberg-go/cmd/iceberg` to install it to the `bin` directory of your `GOPATH`.
The `iceberg` CLI usage is very similar to that of the pyiceberg CLI.
You can pass the catalog URI with the `--uri` argument.
Example: start the Iceberg REST API Docker image, which listens on port 8181 by default:
docker pull apache/iceberg-rest-fixture:latest
docker run -p 8181:8181 apache/iceberg-rest-fixture:latest
Then run the `iceberg` CLI, pointing it at the REST API server:
./iceberg --uri http://0.0.0.0:8181 list
┌─────┐
| IDs |
| --- |
└─────┘
Create Namespace
./iceberg --uri http://0.0.0.0:8181 create namespace taxitrips
List Namespace
./iceberg --uri http://0.0.0.0:8181 list
┌───────────┐
| IDs       |
| --------- |
| taxitrips |
└───────────┘
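For readers who want to see what a namespace listing looks like at the HTTP level, the sketch below issues the request directly. It assumes the catalog follows the Iceberg REST catalog specification, where `GET /v1/namespaces` (with no prefix configured) returns a JSON object whose `namespaces` field is a list of multi-level names. The function names `listNamespaces` and `parseNamespaces` are hypothetical, joining levels with `.` is an illustrative choice, and nothing here is taken from the CLI's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// listNamespacesResponse mirrors the "namespaces" field of the REST
// catalog's list-namespaces response; each namespace is a list of levels.
type listNamespacesResponse struct {
	Namespaces [][]string `json:"namespaces"`
}

// parseNamespaces decodes a list-namespaces response body and joins the
// levels of each namespace with "." (an illustrative display choice).
func parseNamespaces(data []byte) ([]string, error) {
	var body listNamespacesResponse
	if err := json.Unmarshal(data, &body); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(body.Namespaces))
	for _, levels := range body.Namespaces {
		names = append(names, strings.Join(levels, "."))
	}
	return names, nil
}

// listNamespaces fetches the namespaces from a REST catalog at baseURI.
// It assumes no catalog prefix and no authentication, as with a local
// test fixture.
func listNamespaces(baseURI string) ([]string, error) {
	resp, err := http.Get(strings.TrimRight(baseURI, "/") + "/v1/namespaces")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseNamespaces(data)
}

func main() {
	// Same URI as in the CLI examples; requires the Docker fixture to be running.
	names, err := listNamespaces("http://0.0.0.0:8181")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range names {
		fmt.Println(name)
	}
}
```

If the REST fixture is not running, the program prints the connection error and exits.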