PyIceberg comes with a CLI that is available after installing the pyiceberg package.
You can pass the path to the Catalog using the --uri and --credential arguments, but it is recommended to set up a ~/.pyiceberg.yaml config as described in the Catalog section.
    ➜ pyiceberg --help
    Usage: pyiceberg [OPTIONS] COMMAND [ARGS]...

    Options:
      --catalog TEXT
      --verbose BOOLEAN
      --output [text|json]
      --ugi TEXT
      --uri TEXT
      --credential TEXT
      --help                Show this message and exit.

    Commands:
      create      Operation to create a namespace.
      describe    Describe a namespace or a table.
      drop        Operations to drop a namespace or table.
      files       List all the files of the table.
      list        List tables or namespaces.
      list-refs   List all the refs in the provided table.
      location    Return the location of the table.
      properties  Properties on tables/namespaces.
      rename      Rename a table.
      schema      Get the schema of the table.
      spec        Return the partition spec of the table.
      uuid        Return the UUID of the table.
      version     Print pyiceberg version.
This example assumes that you have a default catalog set. To load another catalog, such as the rest catalog from the example above, set --catalog rest.
    ➜ pyiceberg list
    default
    nyc

    ➜ pyiceberg list nyc
    nyc.taxis
    ➜ pyiceberg describe nyc.taxis
    Table format version  1
    Metadata location     file:/.../nyc.db/taxis/metadata/00000-aa3a3eac-ea08-4255-b890-383a64a94e42.metadata.json
    Table UUID            6cdfda33-bfa3-48a7-a09e-7abb462e3460
    Last Updated          1661783158061
    Partition spec        []
    Sort order            []
    Current schema        Schema, id=0
                          ├── 1: VendorID: optional long
                          ├── 2: tpep_pickup_datetime: optional timestamptz
                          ├── 3: tpep_dropoff_datetime: optional timestamptz
                          ├── 4: passenger_count: optional double
                          ├── 5: trip_distance: optional double
                          ├── 6: RatecodeID: optional double
                          ├── 7: store_and_fwd_flag: optional string
                          ├── 8: PULocationID: optional long
                          ├── 9: DOLocationID: optional long
                          ├── 10: payment_type: optional long
                          ├── 11: fare_amount: optional double
                          ├── 12: extra: optional double
                          ├── 13: mta_tax: optional double
                          ├── 14: tip_amount: optional double
                          ├── 15: tolls_amount: optional double
                          ├── 16: improvement_surcharge: optional double
                          ├── 17: total_amount: optional double
                          ├── 18: congestion_surcharge: optional double
                          └── 19: airport_fee: optional double
    Current snapshot      Operation.APPEND: id=5937117119577207079, schema_id=0
    Snapshots             Snapshots
                          └── Snapshot 5937117119577207079, schema 0: file:/.../nyc.db/taxis/metadata/snap-5937117119577207079-1-94656c4f-4c66-4600-a4ca-f30377300527.avro
    Properties            owner                 root
                          write.format.default  parquet
Or output in JSON for automation:
    ➜ pyiceberg --output json describe nyc.taxis | jq
    {
      "identifier": [
        "nyc",
        "taxis"
      ],
      "metadata_location": "file:/.../nyc.db/taxis/metadata/00000-aa3a3eac-ea08-4255-b890-383a64a94e42.metadata.json",
      "metadata": {
        "location": "file:/.../nyc.db/taxis",
        "table-uuid": "6cdfda33-bfa3-48a7-a09e-7abb462e3460",
        "last-updated-ms": 1661783158061,
        "last-column-id": 19,
        "schemas": [
          {
            "type": "struct",
            "fields": [
              {
                "id": 1,
                "name": "VendorID",
                "type": "long",
                "required": false
              },
              ...
              {
                "id": 19,
                "name": "airport_fee",
                "type": "double",
                "required": false
              }
            ],
            "schema-id": 0,
            "identifier-field-ids": []
          }
        ],
        "current-schema-id": 0,
        "partition-specs": [
          {
            "spec-id": 0,
            "fields": []
          }
        ],
        "default-spec-id": 0,
        "last-partition-id": 999,
        "properties": {
          "owner": "root",
          "write.format.default": "parquet"
        },
        "current-snapshot-id": 5937117119577207000,
        "snapshots": [
          {
            "snapshot-id": 5937117119577207000,
            "timestamp-ms": 1661783158061,
            "manifest-list": "file:/.../nyc.db/taxis/metadata/snap-5937117119577207079-1-94656c4f-4c66-4600-a4ca-f30377300527.avro",
            "summary": {
              "operation": "append",
              "spark.app.id": "local-1661783139151",
              "added-data-files": "1",
              "added-records": "2979431",
              "added-files-size": "46600777",
              "changed-partition-count": "1",
              "total-records": "2979431",
              "total-files-size": "46600777",
              "total-data-files": "1",
              "total-delete-files": "0",
              "total-position-deletes": "0",
              "total-equality-deletes": "0"
            },
            "schema-id": 0
          }
        ],
        "snapshot-log": [
          {
            "snapshot-id": "5937117119577207079",
            "timestamp-ms": 1661783158061
          }
        ],
        "metadata-log": [],
        "sort-orders": [
          {
            "order-id": 0,
            "fields": []
          }
        ],
        "default-sort-order-id": 0,
        "refs": {
          "main": {
            "snapshot-id": 5937117119577207000,
            "type": "branch"
          }
        },
        "format-version": 1,
        "schema": {
          "type": "struct",
          "fields": [
            {
              "id": 1,
              "name": "VendorID",
              "type": "long",
              "required": false
            },
            ...
            {
              "id": 19,
              "name": "airport_fee",
              "type": "double",
              "required": false
            }
          ],
          "schema-id": 0,
          "identifier-field-ids": []
        },
        "partition-spec": []
      }
    }
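As a sketch of the automation use case, the JSON can be loaded with Python's standard `json` module. The snippet parses a trimmed, hand-copied sample of the output above rather than live CLI output, and `summarize` is a hypothetical helper, not part of PyIceberg. The snapshot ID in the sample is written as it appears in the text output (5937117119577207079), on the assumption that the CLI itself emits it unrounded. The JSON above shows 5937117119577207000 instead, which is consistent with the ID having passed through a double-precision float in the `jq` step, although the source does not confirm this.

```python
import json

# Trimmed, hand-copied sample of the `describe` JSON (not live CLI output).
# Assumption: the snapshot ID is the unrounded value from the text output.
SAMPLE = """
{
  "identifier": ["nyc", "taxis"],
  "metadata": {
    "table-uuid": "6cdfda33-bfa3-48a7-a09e-7abb462e3460",
    "format-version": 1,
    "current-snapshot-id": 5937117119577207079,
    "properties": {"owner": "root", "write.format.default": "parquet"}
  }
}
"""


def summarize(describe_json: str) -> dict:
    """Pick a few fields out of the describe output (hypothetical helper)."""
    doc = json.loads(describe_json)
    meta = doc["metadata"]
    return {
        "table": ".".join(doc["identifier"]),
        "uuid": meta["table-uuid"],
        "format_version": meta["format-version"],
        "current_snapshot_id": meta["current-snapshot-id"],
        "properties": meta["properties"],
    }


summary = summarize(SAMPLE)
print(summary["table"], summary["current_snapshot_id"])

# Python's json module keeps integers exact; a round trip through a float
# does not, because this ID is not representable as a double.
snapshot_id = summary["current_snapshot_id"]
survives_float = int(float(snapshot_id)) == snapshot_id
print("survives float round trip:", survives_float)
```

If exact snapshot IDs matter downstream, reading the CLI output directly, or using the string-valued `snapshot-log` entries, avoids depending on how an intermediate tool handles large integers.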
You can also add, update or remove properties on tables or namespaces:
    ➜ pyiceberg properties set table nyc.taxis write.metadata.delete-after-commit.enabled true
    Set write.metadata.delete-after-commit.enabled=true on nyc.taxis

    ➜ pyiceberg properties get table nyc.taxis write.metadata.delete-after-commit.enabled
    true

    ➜ pyiceberg properties remove table nyc.taxis write.metadata.delete-after-commit.enabled
    Property write.metadata.delete-after-commit.enabled removed from nyc.taxis

    ➜ pyiceberg properties get table nyc.taxis write.metadata.delete-after-commit.enabled
    Could not find property write.metadata.delete-after-commit.enabled on nyc.taxis
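When these property commands are driven from a script, it can help to build the argument list in one place. The following is a minimal sketch, not part of PyIceberg: `properties_argv` is a hypothetical helper that mirrors the command shape shown above (`properties <action> table <identifier> <key> [value]`) and places `--catalog` before the subcommand, as in the usage line. Accepting `namespace` in place of `table` is an assumption based on the statement that properties apply to tables or namespaces.

```python
import shlex
from typing import List, Optional


def properties_argv(
    action: str,
    kind: str,
    identifier: str,
    key: str,
    value: Optional[str] = None,
    catalog: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `pyiceberg properties ...` (hypothetical helper)."""
    if action not in ("get", "set", "remove"):
        raise ValueError(f"unsupported action: {action}")
    # Assumption: "namespace" is accepted wherever "table" is.
    if kind not in ("table", "namespace"):
        raise ValueError(f"unsupported kind: {kind}")
    # A value is required for "set" and rejected for "get" and "remove".
    if (action == "set") != (value is not None):
        raise ValueError("a value is required for 'set' and only for 'set'")

    argv = ["pyiceberg"]
    if catalog is not None:
        # Global options precede the subcommand.
        argv += ["--catalog", catalog]
    argv += ["properties", action, kind, identifier, key]
    if value is not None:
        argv.append(value)
    return argv


KEY = "write.metadata.delete-after-commit.enabled"
set_cmd = properties_argv("set", "table", "nyc.taxis", KEY, "true")
get_cmd = properties_argv("get", "table", "nyc.taxis", KEY, catalog="rest")

print(shlex.join(set_cmd))
print(shlex.join(get_cmd))
```

With the pyiceberg package installed and a catalog configured, a list built this way could be passed to `subprocess.run(set_cmd, check=True)`. Passing a list rather than a single string avoids shell quoting issues in identifiers and values.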