PyIceberg

PyIceberg is a Python implementation for accessing Iceberg tables, without the need for a JVM.

Install

Before installing PyIceberg, make sure that you're on an up-to-date version of pip:

pip install --upgrade pip

You can install the latest release version from PyPI:

pip install "pyiceberg[s3fs,hive]"

You can also install it directly from GitHub (not recommended, but sometimes handy):

pip install "git+https://github.com/apache/iceberg.git#subdirectory=python&egg=pyiceberg[s3fs]"

Or clone the repository for local development:

git clone https://github.com/apache/iceberg.git
cd iceberg/python
pip3 install -e ".[s3fs,hive]"

You can mix and match optional dependencies depending on your needs:

Key	Description
hive	Support for the Hive metastore
glue	Support for AWS Glue
dynamodb	Support for AWS DynamoDB
pyarrow	PyArrow as a FileIO implementation to interact with the object store
pandas	Installs both PyArrow and Pandas
duckdb	Installs both PyArrow and DuckDB
ray	Installs PyArrow, Pandas, and Ray
s3fs	S3FS as a FileIO implementation to interact with the object store
adlfs	ADLFS as a FileIO implementation to interact with the object store
snappy	Support for snappy Avro compression
gcs	GCS as a FileIO implementation to interact with the object store
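
As a small illustration of how the keys in the table combine into one install target, the sketch below builds the requirement string that is passed to pip. The helper name requirement_with_extras and the KNOWN_EXTRAS set are hypothetical and exist only for this example; the set is copied from the table above and may not match the extras of every release.

```python
# Extras copied from the table above (an assumption for this sketch;
# the real list depends on the PyIceberg release you install).
KNOWN_EXTRAS = {
    "hive", "glue", "dynamodb", "pyarrow", "pandas", "duckdb",
    "ray", "s3fs", "adlfs", "snappy", "gcs",
}


def requirement_with_extras(extras):
    """Build a pip requirement string such as 'pyiceberg[s3fs,hive]'."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    # Drop duplicates while keeping the order the caller gave.
    ordered = list(dict.fromkeys(extras))
    if not ordered:
        return "pyiceberg"
    return f"pyiceberg[{','.join(ordered)}]"


print(requirement_with_extras(["s3fs", "hive"]))  # prints pyiceberg[s3fs,hive]
```

The resulting string is what goes inside the quotes of the pip install command, for example pip install "pyiceberg[s3fs,hive]".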

You need to install at least one of s3fs, adlfs, gcs, or pyarrow to fetch files.
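
To check whether an environment already has one of these file-access backends, a minimal sketch using only the standard library is shown below. The mapping from extra name to importable module is an assumption (in particular, that the gcs extra provides the gcsfs module), and available_fileio_backends is a hypothetical helper, not part of PyIceberg.

```python
from importlib.util import find_spec

# Assumed mapping from the extra name to the top-level module it installs.
FILEIO_MODULES = {
    "s3fs": "s3fs",
    "adlfs": "adlfs",
    "gcs": "gcsfs",  # assumption: the gcs extra pulls in gcsfs
    "pyarrow": "pyarrow",
}


def available_fileio_backends():
    """Return the extras whose module can be imported in this environment."""
    return [
        extra
        for extra, module in FILEIO_MODULES.items()
        if find_spec(module) is not None  # None means the module is not installed
    ]


backends = available_fileio_backends()
if backends:
    print("FileIO backends found:", ", ".join(backends))
else:
    print("No FileIO backend found; install s3fs, adlfs, gcs, or pyarrow.")
```

The check only looks up whether each module can be found and does not import it, so it stays cheap even when large packages such as pyarrow are installed.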

Both a CLI and a Python API are available.