---
hide:
- navigation
---
<!--
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
-->
# Contributing
We welcome contributions to Apache Iceberg! To learn more about contributing to Apache Iceberg, please refer to the [official Iceberg contribution guidelines](https://iceberg.apache.org/contribute/). These guidelines are intended as helpful suggestions to make the contribution process as seamless as possible, and are not strict rules.
If you would like to discuss your proposed change before contributing, we encourage you to visit our [Community](https://iceberg.apache.org/community/) page. There, you will find various ways to connect with the community, including Slack and our mailing lists. Alternatively, you can open a [new issue](https://github.com/apache/iceberg-python/issues) directly in the GitHub repository.
For first-time contributors, feel free to check out our [good first issues](https://github.com/apache/iceberg-python/issues/?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22) for an easy way to get started.
## Contributing to PyIceberg
The PyIceberg Project is hosted on GitHub at <https://github.com/apache/iceberg-python>.
For development, [uv](https://docs.astral.sh/uv/) is used for dependency management and packaging. uv is a Python package installer and resolver, written in Rust, that serves as a drop-in replacement for pip and virtualenv.
### Getting Started
Install uv and set up the development environment:
```bash
make install
```
This will install uv if needed, create a virtual environment in `.venv`, and install all dependencies.
If you only want to install uv:
```bash
make install-uv
```
### Python Version Selection
You can specify which Python version to use when creating your virtual environment:
```bash
PYTHON=3.12 make install # Create environment with Python 3.12
make test # Run tests against Python 3.12
```
> **Tip:** `uv python list` shows available interpreters. `uv python install 3.12` can install one if needed.
### IDE Setup
After running `make install`, configure your IDE to use the Python interpreter at `.venv/bin/python`.
**To set up IDEA with uv:**
- Open up the Python project in IntelliJ
- Make sure that you're on latest main
- Go to File -> Project Structure (⌘;)
- Go to Platform Settings -> SDKs
- Add Python SDK -> Virtualenv Environment -> Existing environment
- Point to `.venv/bin/python`
**VS Code:**
- Press Cmd/Ctrl+Shift+P -> "Python: Select Interpreter"
- Choose `.venv/bin/python`
### Advanced uv Usage
For full control over your environment, you can use uv commands directly. See the [uv documentation](https://docs.astral.sh/uv/) to learn more about:
- Managing dependencies with `uv add` and `uv remove`
- Python version management with `uv python`
- Running commands with `uv run`
- Lock file management with `uv.lock`
## Installation from source
Clone the repository for local development:
```sh
git clone https://github.com/apache/iceberg-python.git
cd iceberg-python
pip3 install -e ".[s3fs,hive]"
```
Installing directly from GitHub is not recommended, but is sometimes handy:
```shell
pip install "git+https://github.com/apache/iceberg-python.git#egg=pyiceberg[pyarrow]"
```
## Linting
`prek` is used for autoformatting and linting:
```bash
make lint
```
`prek` automatically fixes violations such as import order and formatting. Pylint errors you need to fix yourself.
In addition to manually running `make lint`, you can install the pre-commit hooks in your local repo with `prek install`. By doing this, linting is run automatically every time you make a commit.
You can bump the integrations to the latest version using `prek auto-update`. This checks whether there is a newer version of `{ruff,mypy,...}` and updates the YAML configuration.
## Cleaning
Remove old cached files generated during the Cython build process:
```bash
make clean
```
This helps prevent build failures and unexpected behavior by removing outdated files, ensuring that only up-to-date sources are used and the build environment is clean.
## Testing
For Python, `pytest` is used as a testing framework in combination with `coverage` to enforce 90%+ code coverage.
```bash
make test
```
By default, S3 and ADLS tests are ignored because they require MinIO and Azurite to be running.
To run the S3 suite:
```bash
make test-s3
```
To run the ADLS suite:
```bash
make test-adls
```
To pass additional arguments to pytest, you can use `PYTEST_ARGS`.
### Run pytest in verbose mode
```sh
make test PYTEST_ARGS="-v"
```
### Run pytest with pdb enabled
```sh
make test PYTEST_ARGS="--pdb"
```
To see all available pytest arguments, run `make test PYTEST_ARGS="--help"`.
### Integration tests
PyIceberg has integration tests with Apache Spark. Spark will create a new database and provision some tables that PyIceberg can query against.
```sh
make test-integration
```
This restarts the containers to get to a clean state and then runs the pytest suite. If something changed in the Dockerfile or the provision script, rebuild the containers from scratch with:
```sh
make test-integration-rebuild
```
#### Running Integration Tests against REST Catalogs
!!! warning "Do not run against production catalogs"
    The integration tests will delete data throughout the entirety of your catalog. Running these integration tests against production catalogs will result in data loss.
PyIceberg supports the ability to run our catalog tests against an arbitrary REST Catalog.
To run the catalog tests, specify which REST catalog to run against with the `PYICEBERG_TEST_CATALOG` environment variable:
```sh
export PYICEBERG_TEST_CATALOG=test_catalog
```
The catalog in question can be configured either through the `~/.pyiceberg.yaml` file or through environment variables.
```yaml
catalog:
  test_catalog:
    uri: http://rest-catalog/ws/
    credential: t-1234:secret
```
```sh
export PYICEBERG_CATALOG__TEST_CATALOG__URI=thrift://localhost:9083
export PYICEBERG_CATALOG__TEST_CATALOG__ACCESS_KEY_ID=username
export PYICEBERG_CATALOG__TEST_CATALOG__SECRET_ACCESS_KEY=password
```
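The variable names above appear to mirror the nesting of the YAML file, with a `PYICEBERG_` prefix and double underscores separating levels. The sketch below illustrates that reading only: `env_to_nested` is a hypothetical helper, not PyIceberg's loader, and the real loader may normalize key names further.

```python
from typing import Any, Dict, Mapping

PREFIX = "PYICEBERG_"


def env_to_nested(environ: Mapping[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: fold PYICEBERG_A__B__C=value into {"a": {"b": {"c": value}}}."""
    config: Dict[str, Any] = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # ignore unrelated environment variables
        # Drop the prefix, lowercase, and split on the double underscore.
        parts = name[len(PREFIX):].lower().split("__")
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config


example_env = {
    "PYICEBERG_CATALOG__TEST_CATALOG__URI": "thrift://localhost:9083",
    "PYICEBERG_CATALOG__TEST_CATALOG__ACCESS_KEY_ID": "username",
    "HOME": "/home/user",
}
nested = env_to_nested(example_env)
print(nested)
```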
## Code standards
Below are the formalized conventions that we adhere to in the PyIceberg project. The goal is to have a common agreement on how to evolve the codebase, and to serve as guidelines for newcomers to the project.
### API Compatibility
It is important to keep the Python public API compatible across versions. The official Python style guide, [PEP-8](https://peps.python.org/pep-0008/), describes public attributes as follows: _Public attributes should have no leading underscores_. Keeping the API compatible means not removing any methods without notice, and not removing or renaming any existing parameters. Adding new optional parameters is okay.
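As a minimal sketch of a compatible change, consider a public function that gains an optional parameter whose default preserves the old behavior. The function `list_tables` and its `limit` parameter are hypothetical and exist only for illustration.

```python
from typing import List, Optional


def list_tables(namespace: str, limit: Optional[int] = None) -> List[str]:
    """Hypothetical public function; `limit` was added in a later release."""
    tables = [f"{namespace}.table_{i}" for i in range(3)]
    # The new parameter defaults to the old behavior, so existing calls keep working.
    return tables if limit is None else tables[:limit]


# A call written against the older signature is still valid.
print(list_tables("default"))
# A call that opts in to the new parameter.
print(list_tables("default", limit=1))
```

Renaming `namespace` or making `limit` required would break the first call, which is why such changes need a deprecation period.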
If you want to remove a method, please add a deprecation notice by annotating the function using `@deprecated`:
```python
from pyiceberg.utils.deprecated import deprecated


@deprecated(
    deprecated_in="0.1.0",
    removed_in="0.2.0",
    help_message="Please use load_something_else() instead",
)
def load_something():
    pass
```
Which will warn:
```text
Call to load_something, deprecated in 0.1.0, will be removed in 0.2.0. Please use load_something_else() instead.
```
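To make the mechanics concrete, here is a minimal sketch of how a decorator of this shape could be built on the standard `warnings` module. `deprecated_sketch` is a hypothetical stand-in and not PyIceberg's actual implementation; the use of `DeprecationWarning` as the warning category is an assumption.

```python
import functools
import warnings
from typing import Any, Callable


def deprecated_sketch(deprecated_in: str, removed_in: str, help_message: str = "") -> Callable:
    """Illustrative stand-in, not PyIceberg's actual implementation."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            message = f"Call to {func.__name__}, deprecated in {deprecated_in}, will be removed in {removed_in}."
            if help_message:
                message = f"{message} {help_message}."
            # stacklevel=2 attributes the warning to the caller, not to this wrapper.
            warnings.warn(message, category=DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated_sketch(deprecated_in="0.1.0", removed_in="0.2.0", help_message="Please use load_something_else() instead")
def load_something() -> None:
    pass


# Record the warning instead of printing it to stderr, so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    load_something()

print(caught[0].message)
```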
If you want to remove a property or notify about a behavior change, please add a deprecation notice by calling the `deprecation_message` function:
```python
from pyiceberg.utils.deprecated import deprecation_message

deprecation_message(
    deprecated_in="0.1.0",
    removed_in="0.2.0",
    help_message="The old_property is deprecated. Please use the something_else property instead.",
)
```
Which will warn:
```text
Deprecated in 0.1.0, will be removed in 0.2.0. The old_property is deprecated. Please use the something_else property instead.
```
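For a property, the call typically belongs inside the getter, so the warning fires only when the deprecated attribute is actually read. The sketch below uses a local stand-in, `deprecation_message_sketch`, so that it runs without PyIceberg installed; the class `Thing` is hypothetical, and the warning category and `stacklevel` are assumptions.

```python
import warnings


def deprecation_message_sketch(deprecated_in: str, removed_in: str, help_message: str) -> None:
    """Local stand-in so the example runs without PyIceberg installed."""
    warnings.warn(
        f"Deprecated in {deprecated_in}, will be removed in {removed_in}. {help_message}",
        category=DeprecationWarning,
        stacklevel=3,  # skip this helper and the property getter
    )


class Thing:
    """Hypothetical class with one deprecated property."""

    def __init__(self) -> None:
        self.something_else = 42

    @property
    def old_property(self) -> int:
        deprecation_message_sketch(
            deprecated_in="0.1.0",
            removed_in="0.2.0",
            help_message="The old_property is deprecated. Please use the something_else property instead.",
        )
        # Keep returning the value so existing readers continue to work.
        return self.something_else


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = Thing().old_property

print(value, caught[0].message)
```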
### Logging
PyIceberg uses Python's standard logging module. You can control the logging level using either:
**CLI option:**
```bash
pyiceberg --log-level DEBUG describe my_table
```
**Environment variable:**
```bash
export PYICEBERG_LOG_LEVEL=DEBUG
pyiceberg describe my_table
```
Valid log levels are: `DEBUG`, `INFO`, `WARNING` (default), `ERROR`, `CRITICAL`.
Debug logging is particularly useful for troubleshooting issues with FileIO implementations, catalog connections, and other integration points.
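When embedding PyIceberg in a script rather than using the CLI, the same variable can be honored with the standard library. This is a sketch under assumptions: `resolve_log_level` is a hypothetical helper, the `pyiceberg` logger name assumes the library's loggers are named after its modules, and falling back to `WARNING` on an unknown value is a choice made here, not documented behavior.

```python
import logging
import os
from typing import Mapping

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def resolve_log_level(environ: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: map PYICEBERG_LOG_LEVEL to a logging constant."""
    name = environ.get("PYICEBERG_LOG_LEVEL", "WARNING").upper()
    if name not in VALID_LEVELS:
        name = "WARNING"  # assumption: unknown values fall back to the default
    return getattr(logging, name)


# Assumption: PyIceberg's loggers live under the "pyiceberg" namespace.
logging.getLogger("pyiceberg").setLevel(resolve_log_level())
```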
### Type annotations
For type annotations, the types from the `typing` module are used.
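A short illustration of what this looks like in practice; the function `first_match` is hypothetical and only demonstrates the annotation style.

```python
from typing import Dict, List, Optional


def first_match(rows: List[Dict[str, str]], key: str, value: str) -> Optional[Dict[str, str]]:
    """Return the first row whose `key` equals `value`, or None if there is none."""
    for row in rows:
        if row.get(key) == value:
            return row
    return None


rows = [{"name": "a", "kind": "table"}, {"name": "b", "kind": "view"}]
print(first_match(rows, "kind", "view"))
```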
### Third party libraries
PyIceberg naturally integrates into the rich Python ecosystem; however, it is important to be hesitant about adding third-party packages. Adding a lot of packages makes the library heavyweight and causes incompatibilities with other projects if they use a different version of the same library. Also, big libraries such as `s3fs`, `adlfs`, `pyarrow`, and `thrift` should be optional, to avoid downloading everything without being sure it is actually used.
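One common way to keep a dependency optional is to import it lazily and raise a helpful error when it is missing. The sketch below shows that pattern in general terms; `require_optional` is a hypothetical helper and not part of PyIceberg, and the install hint assumes an extra with the given name exists.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Hypothetical helper: import an optional dependency or explain how to install it."""
    try:
        # Import only when the feature is used, so the base install stays light.
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise ModuleNotFoundError(
            f"{module_name} is required for this feature; install it with: pip install 'pyiceberg[{extra}]'"
        ) from exc


# A module from the standard library stands in for an installed optional dependency.
json_module = require_optional("json", "example")
print(json_module.__name__)
```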