PyIceberg is a Python implementation for accessing Iceberg tables, without the need for a JVM.
Before installing PyIceberg, make sure that you're on an up-to-date version of pip:
pip install --upgrade pip
You can install the latest release version from PyPI:
pip install "pyiceberg[s3fs,hive]"
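To confirm that the install succeeded, one option is to ask Python's standard `importlib.metadata` module for the installed distribution version. The following is a minimal sketch; the helper name `installed_version` is hypothetical and is not part of PyIceberg.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Hypothetical helper: return the installed version string, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version("pyiceberg")
    print(f"pyiceberg {found}" if found else "pyiceberg is not installed")
```

Run it with the same interpreter that pip installed into, otherwise it may report a different environment.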
You can also install it directly from GitHub, which is not recommended but sometimes handy:
pip install "git+https://github.com/apache/iceberg.git#subdirectory=python&egg=pyiceberg[s3fs]"
Or clone the repository for local development:
git clone https://github.com/apache/iceberg.git
cd iceberg/python
pip3 install -e ".[s3fs,hive]"
You can mix and match optional dependencies depending on your needs:
| Key | Description |
|---|---|
| hive | Support for the Hive metastore |
| glue | Support for AWS Glue |
| dynamodb | Support for AWS DynamoDB |
| pyarrow | PyArrow as a FileIO implementation to interact with the object store |
| pandas | Installs both PyArrow and Pandas |
| duckdb | Installs both PyArrow and DuckDB |
| ray | Installs PyArrow, Pandas, and Ray |
| s3fs | S3FS as a FileIO implementation to interact with the object store |
| adlfs | ADLFS as a FileIO implementation to interact with the object store |
| snappy | Support for snappy Avro compression |
| gcs | GCS as a FileIO implementation to interact with the object store |
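
As an illustration of how the keys in the table combine, the sketch below builds the bracketed requirement string that pip expects. The helper `requirement_spec` and the `KNOWN_EXTRAS` set are hypothetical; the set is copied from the table on this page, not queried from PyPI.

```python
# Extras copied from the table on this page (an assumption: the list may change between releases).
KNOWN_EXTRAS = {
    "hive", "glue", "dynamodb", "pyarrow", "pandas", "duckdb",
    "ray", "s3fs", "adlfs", "snappy", "gcs",
}


def requirement_spec(extras):
    """Hypothetical helper: build a pip requirement such as 'pyiceberg[hive,s3fs]'."""
    requested = set(extras)
    unknown = sorted(requested - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    if not requested:
        return "pyiceberg"
    # Sort for a stable, de-duplicated result.
    return f"pyiceberg[{','.join(sorted(requested))}]"


if __name__ == "__main__":
    # Quote the requirement so the shell does not interpret the brackets.
    print(f'pip install "{requirement_spec(["s3fs", "hive"])}"')
    # prints: pip install "pyiceberg[hive,s3fs]"
```

The order of extras inside the brackets does not matter to pip; sorting only keeps the output predictable.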
You need to install at least one of s3fs, adlfs, gcs, or pyarrow to fetch files.
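A quick way to see which of these file-access backends are present in the current environment is to probe for their modules without importing them. This is a minimal sketch: the function name and the mapping are hypothetical, and the module names are assumptions (in particular, that the gcs extra provides a module named `gcsfs`).

```python
from importlib.util import find_spec

# Extra name -> top-level module it is assumed to provide.
# Assumption: the "gcs" extra installs the "gcsfs" package.
FILEIO_MODULES = {
    "s3fs": "s3fs",
    "adlfs": "adlfs",
    "gcs": "gcsfs",
    "pyarrow": "pyarrow",
}


def available_fileio_backends():
    """Hypothetical helper: list the extras whose module can be found."""
    return [
        extra
        for extra, module in FILEIO_MODULES.items()
        if find_spec(module) is not None  # find_spec locates the module without importing it
    ]


if __name__ == "__main__":
    found = available_fileio_backends()
    if found:
        print("Available FileIO backends:", ", ".join(found))
    else:
        print("No FileIO backend found; install one of:", ", ".join(FILEIO_MODULES))
```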
Both a CLI and a Python API are available.