tag 3b6eeb39d1810d79bbcc73a31bc0c12be51814df
tagger Matt Topol <zotthewizard@gmail.com> Thu May 29 11:55:00 2025 -0400
object 8270498f1932bececb5cec22c174df5f1865f7eb

Release v0.3.0

commit 8270498f1932bececb5cec22c174df5f1865f7eb
author Matt Topol <zotthewizard@gmail.com> Wed May 21 07:45:42 2025 -0400
committer GitHub <noreply@github.com> Wed May 21 13:45:42 2025 +0200
tree 2eff8367fda4273f8dc738536cf47eab201b6b7f
parent f3813937d149b503052dcd1b7820d273f396417a

fix: Writing Avro for Spark (#435)

Turns out that the Java Iceberg implementation *requires* the Avro field names to match exactly instead of relying only on the field-ids/element-ids. In this case, the issue is the Partition Field Summary element records. It appears that all "struct" types in Iceberg Java are expected to be named ["r" + fieldID](https://github.com/apache/iceberg/blob/a7f3dc79a2f42a4875ac35eec2137ecff15204fc/core/src/main/java/org/apache/iceberg/avro/TypeToSchema.java#L100-L105), and if you don't name them appropriately, Spark throws a very unhelpful exception (`ArrayStoreException`). This was causing anything based on Java Iceberg to fail when reading Avro written by Iceberg-Go.

fixes #434
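The naming rule behind the fix can be sketched in Go. This is a hypothetical helper, not iceberg-go's actual code; using 508 (the element ID of the Partition Field Summary list in the manifest-list schema) is an illustrative assumption:

```go
package main

import "fmt"

// avroStructName returns the Avro record name that Java Iceberg
// expects for a nested struct type: the literal "r" followed by the
// struct's field/element ID, e.g. "r508".
func avroStructName(fieldID int) string {
	return fmt.Sprintf("r%d", fieldID)
}

func main() {
	// Matching this name exactly is what keeps Spark (via Java
	// Iceberg) from throwing an ArrayStoreException on read.
	fmt.Println(avroStructName(508)) // r508
}
```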
iceberg

`iceberg` is a Golang implementation of the Iceberg table spec.
```
$ git clone https://github.com/apache/iceberg-go.git
$ cd iceberg-go/cmd/iceberg && go build .
```
| Filesystem Type | Supported |
| --- | --- |
| S3 | X |
| Google Cloud Storage | X |
| Azure Blob Storage | X |
| Local Filesystem | X |
| Operation | Supported |
| --- | --- |
| Get Schema | X |
| Get Snapshots | X |
| Get Sort Orders | X |
| Get Partition Specs | X |
| Get Manifests | X |
| Create New Manifests | X |
| Plan Scan | X |
| Plan Scan for Snapshot | X |
| Operation | REST | Hive | DynamoDB | Glue | SQL |
| --- | --- | --- | --- | --- | --- |
| Load Table | X | | | X | X |
| List Tables | X | | | X | X |
| Create Table | X | | | X | X |
| Register Table | X | | | X | |
| Update Current Snapshot | X | | | | X |
| Create New Snapshot | X | | | | X |
| Rename Table | X | | | X | X |
| Drop Table | X | | | X | X |
| Alter Table | X | | | X | X |
| Check Table Exists | X | | | X | X |
| Set Table Properties | X | | | X | X |
| List Namespaces | X | | | X | X |
| Create Namespace | X | | | X | X |
| Check Namespace Exists | X | | | X | X |
| Drop Namespace | X | | | X | X |
| Update Namespace Properties | X | | | X | X |
| Create View | X | | | | |
| List View | X | | | | |
| Drop View | X | | | | |
| Check View Exists | X | | | | |
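The REST catalog operations above map onto plain HTTP endpoints defined by the Iceberg REST catalog spec; for instance, `List Namespaces` is `GET {uri}/v1/namespaces`. A minimal stdlib-only sketch of decoding that response (the endpoint path and response shape follow the spec; the canned JSON body stands in for a real HTTP response):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listNamespacesResponse mirrors the body returned by
// GET {uri}/v1/namespaces in the Iceberg REST catalog spec:
// each namespace is a list of name parts, e.g. ["taxitrips"].
type listNamespacesResponse struct {
	Namespaces [][]string `json:"namespaces"`
}

// parseNamespaces decodes a List Namespaces response body.
func parseNamespaces(body []byte) ([][]string, error) {
	var resp listNamespacesResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	return resp.Namespaces, nil
}

func main() {
	// In practice this body would come from an HTTP GET against
	// the catalog URI (e.g. http://0.0.0.0:8181/v1/namespaces).
	body := []byte(`{"namespaces": [["taxitrips"]]}`)
	ns, err := parseNamespaces(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(ns) // [[taxitrips]]
}
```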
As long as the filesystem is supported and the catalog supports altering the table, the following table tracks the current write support:
| Operation | Supported |
| --- | --- |
| Append Stream | X |
| Append Data Files | X |
| Rewrite Files | |
| Rewrite Manifests | |
| Overwrite Files | |
| Write Pos Delete | |
| Write Eq Delete | |
| Row Delta | |
Run `go build ./cmd/iceberg` from the root of this repository to build the CLI executable. Alternatively, you can run `go install github.com/apache/iceberg-go/cmd/iceberg` to install it to the `bin` directory of your `GOPATH`.
The `iceberg` CLI usage is very similar to the pyiceberg CLI. You can pass the catalog URI with the `--uri` argument.
Example: you can start the Iceberg REST API Docker image, which by default runs on port 8181:

```
docker pull apache/iceberg-rest-fixture:latest
docker run -p 8181:8181 apache/iceberg-rest-fixture:latest
```
and run the `iceberg` CLI pointing to the REST API server:

```
./iceberg --uri http://0.0.0.0:8181 list
┌─────┐
| IDs |
| --- |
└─────┘
```
Create Namespace

```
./iceberg --uri http://0.0.0.0:8181 create namespace taxitrips
```

List Namespace

```
./iceberg --uri http://0.0.0.0:8181 list
┌───────────┐
| IDs       |
| --------- |
| taxitrips |
└───────────┘
```