PyIceberg

PyIceberg is a Python implementation for accessing Iceberg tables, without the need for a JVM.

Install

Before installing PyIceberg, make sure that you're on an up-to-date version of pip:

pip install --upgrade pip

You can install the latest release version from PyPI:

pip install "pyiceberg[s3fs,hive]"

You can also install it directly from GitHub, which is not recommended but is sometimes handy:

pip install "git+https://github.com/apache/iceberg.git#subdirectory=python&egg=pyiceberg[s3fs]"

Or clone the repository for local development:

git clone https://github.com/apache/iceberg.git
cd iceberg/python
pip3 install -e ".[s3fs,hive]"
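Whichever install route you use, it can help to confirm that the package actually landed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of PyIceberg.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "pyiceberg") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the package itself.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pyiceberg is not installed in this environment")
    else:
        print(f"pyiceberg {found} is installed")
```

Running this with the same interpreter that `pip` installed into avoids the common mistake of checking a different virtual environment.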

You can mix and match optional dependencies depending on your needs:

| Key | Description |
|---|---|
| hive | Support for the Hive metastore |
| glue | Support for AWS Glue |
| dynamodb | Support for AWS DynamoDB |
| pyarrow | PyArrow as a FileIO implementation to interact with the object store |
| pandas | Installs both PyArrow and Pandas |
| duckdb | Installs both PyArrow and DuckDB |
| ray | Installs PyArrow, Pandas, and Ray |
| s3fs | S3FS as a FileIO implementation to interact with the object store |
| adlfs | ADLFS as a FileIO implementation to interact with the object store |
| snappy | Support for snappy Avro compression |
| gcs | GCS as a FileIO implementation to interact with the object store |
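As a rough illustration of mixing and matching, the sketch below assembles the requirement string that you would pass to `pip install` from a chosen set of keys. The helper `pip_requirement` and the `KNOWN_EXTRAS` tuple are hypothetical names for this example; the keys are copied from the list of optional dependencies above.

```python
# Keys copied from the optional-dependency list above.
KNOWN_EXTRAS = (
    "hive", "glue", "dynamodb", "pyarrow", "pandas", "duckdb",
    "ray", "s3fs", "adlfs", "snappy", "gcs",
)


def pip_requirement(*extras: str) -> str:
    """Build a requirement string such as 'pyiceberg[s3fs,hive]'.

    Hypothetical helper: rejects keys that are not in KNOWN_EXTRAS and
    drops duplicates while keeping the order in which keys were given.
    """
    unknown = [extra for extra in extras if extra not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    unique = list(dict.fromkeys(extras))
    if not unique:
        return "pyiceberg"
    return f"pyiceberg[{','.join(unique)}]"


if __name__ == "__main__":
    # Prints: pyiceberg[glue,pyarrow,duckdb]
    print(pip_requirement("glue", "pyarrow", "duckdb"))
```

The resulting string should be quoted on the command line, as in the install commands above, so that the shell does not interpret the square brackets.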

You need to install at least one of s3fs, adlfs, gcs, or pyarrow to be able to fetch files.
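To see whether a file-fetching backend is present in the current environment, a check along these lines may help. It is a sketch under stated assumptions: the helper name is hypothetical, and the mapping from each key to an importable module name (in particular `gcs` to `gcsfs`) is an assumption rather than something this page states.

```python
from importlib.util import find_spec

# Key from the optional-dependency list -> module it is ASSUMED to provide.
# The "gcs" -> "gcsfs" mapping in particular is an assumption.
FILEIO_MODULES = {
    "s3fs": "s3fs",
    "adlfs": "adlfs",
    "gcs": "gcsfs",
    "pyarrow": "pyarrow",
}


def available_fileio_backends(modules=FILEIO_MODULES):
    """Return the keys whose module can be located, without importing it."""
    return [key for key, module in modules.items() if find_spec(module) is not None]


if __name__ == "__main__":
    backends = available_fileio_backends()
    if backends:
        print("File-fetching backends found:", ", ".join(backends))
    else:
        print("No backend found; install one of: " + ", ".join(FILEIO_MODULES))
```

Using `find_spec` only looks the module up, so the check stays quick even when a heavy package such as PyArrow is installed.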

Both a CLI and a Python API are available.