| .. Licensed to the Apache Software Foundation (ASF) under one |
| .. or more contributor license agreements. See the NOTICE file |
| .. distributed with this work for additional information |
| .. regarding copyright ownership. The ASF licenses this file |
| .. to you under the Apache License, Version 2.0 (the |
| .. "License"); you may not use this file except in compliance |
| .. with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| .. software distributed under the License is distributed on an |
| .. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| .. KIND, either express or implied. See the License for the |
| .. specific language governing permissions and limitations |
| .. under the License. |
| |
| Introduction |
| ============ |
| We welcome and encourage contributions of all kinds, such as: |
| |
| 1. Tickets with issue reports or feature requests |
| 2. Documentation improvements |
| 3. Code, both pull requests and (especially) pull request reviews. |
| |
| In addition to submitting new PRs, we have a healthy tradition of community members reviewing each other’s PRs. |
| Doing so is a great way to help the community as well as get more familiar with Rust and the relevant codebases. |
| |
| Before opening a pull request that touches PyO3 bindings, please review the |
| :ref:`PyO3 class mutability guidelines <ffi_pyclass_mutability>` so you can flag missing |
| ``#[pyclass(frozen)]`` annotations during development and review. |
| |
| How to develop |
| -------------- |
| |
| This assumes that you have Rust and Cargo installed. We use the workflow recommended by |
| `pyo3 <https://github.com/PyO3/pyo3>`_ and `maturin <https://github.com/PyO3/maturin>`_. We recommend using |
| `uv <https://docs.astral.sh/uv/>`_ for Python package management. |
| |
| By default, ``uv`` attempts to build the datafusion Python package. For development we prefer to build it manually. This means |
| that when you create your virtual environment with ``uv sync`` you need to pass the additional option ``--no-install-package datafusion``, |
| and for ``uv run`` commands you need the additional option ``--no-project``. |
| |
| Bootstrap: |
| |
| .. code-block:: shell |
| |
| # fetch this repo |
| git clone git@github.com:apache/datafusion-python.git |
| # create the virtual environment |
| uv sync --dev --no-install-package datafusion |
| # activate the environment |
| source .venv/bin/activate |
| |
| The tests rely on test data in git submodules. |
| |
| .. code-block:: shell |
| |
| git submodule init |
| git submodule update |
| |
| |
| Whenever the Rust code changes (through your own edits or via ``git pull``), rebuild and rerun the tests: |
| |
| .. code-block:: shell |
| |
| # make sure you activate the venv using "source .venv/bin/activate" first |
| maturin develop --uv |
| python -m pytest |
| |
| Running & Installing pre-commit hooks |
| ------------------------------------- |
| |
| ``datafusion-python`` uses `pre-commit <https://pre-commit.com/>`_ to lint code locally, which reduces the number of commits that fail in CI because of linter errors. Using the pre-commit hooks is optional, but it helps keep PRs clean and concise. |
| |
| Install the hooks by running :code:`pre-commit install`. This installs them into the ``.git/hooks`` directory of your checkout, and they then run each time you commit. If a hook finds an offending lint, the commit is aborted so that you can fix the problem locally before pushing. |
| |
| The pre-commit hooks can also be run ad hoc, without installing them, by running :code:`pre-commit run --all-files`. |
| |
| Guidelines for Separating Python and Rust Code |
| ---------------------------------------------- |
| |
| Version 40 of ``datafusion-python`` introduced ``python`` wrappers around the ``pyo3`` generated code to vastly improve the user experience. (See the `blog post <https://datafusion.apache.org/blog/2024/08/20/python-datafusion-40.0.0/>`_ and `pull request <https://github.com/apache/datafusion-python/pull/750>`_ for more details.) |
| |
| Mostly, the ``python`` code is limited to pure wrappers with type hints and good docstrings, but there are a few cases where the code does more: |
| |
| 1. Trivial aliases like :py:func:`~datafusion.functions.array_append` and :py:func:`~datafusion.functions.list_append`. |
| 2. Simple type conversion, like from a ``path`` to a ``string`` of the path or from ``number`` to ``lit(number)``. |
| 3. The additional code makes an API **much** more pythonic, like we do for :py:func:`~datafusion.functions.named_struct` (see `source code <https://github.com/apache/datafusion-python/blob/a0913c728f5f323c1eb4913e614c9d996083e274/python/datafusion/functions.py#L1040-L1046>`_). |
| |
| |
| Update Dependencies |
| ------------------- |
| |
| To change or update dependencies, edit ``pyproject.toml`` and run |
| |
| .. code-block:: shell |
| |
| uv sync --dev --no-install-package datafusion |
| |
| Improving Build Speed |
| --------------------- |
| |
| The `pyo3 <https://github.com/PyO3/pyo3>`_ dependency of this project contains a ``build.rs`` file which |
| can cause it to rebuild frequently. You can prevent this from happening by defining a ``PYO3_CONFIG_FILE`` |
| environment variable that points to a file with your build configuration. Whenever your build configuration |
| changes, such as during some major version updates, you will need to regenerate this file. This variable |
| should point to a fully resolved path on your build machine. |
| |
| To generate this file, use the following command: |
| |
| .. code-block:: shell |
| |
| PYO3_PRINT_CONFIG=1 cargo build |
| |
| This will generate some output that looks like the following. You will want to copy these contents into |
| a file. If you place this file in your project directory with filename ``.pyo3_build_config`` it will |
| be ignored by ``git``. |
| |
| .. code-block:: |
| |
| implementation=CPython |
| version=3.9 |
| shared=true |
| abi3=true |
| lib_name=python3.12 |
| lib_dir=/opt/homebrew/opt/python@3.12/Frameworks/Python.framework/Versions/3.12/lib |
| executable=/Users/myusername/src/datafusion-python/.venv/bin/python |
| pointer_width=64 |
| build_flags= |
| suppress_build_script_link_lines=false |
| |
| Add the environment variable to your system. |
| |
| .. code-block:: shell |
| |
| export PYO3_CONFIG_FILE="/Users/myusername/src/datafusion-python/.pyo3_build_config" |
| |
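Because the variable should hold a fully resolved path, it can help to compute that path rather than type it by hand. The following is a small sketch under the assumption that the file is named ``.pyo3_build_config`` as suggested above; ``resolve_config_path`` and ``export_line`` are hypothetical helper names, not part of ``pyo3`` or ``maturin``.

```python
from pathlib import Path


def resolve_config_path(config_file: str = ".pyo3_build_config") -> Path:
    """Return an absolute path with ``~``, ``..``, and symlinks resolved."""
    return Path(config_file).expanduser().resolve()


def export_line(config_file: str = ".pyo3_build_config") -> str:
    """Build the shell line that sets PYO3_CONFIG_FILE to the resolved path."""
    return f'export PYO3_CONFIG_FILE="{resolve_config_path(config_file)}"'


if __name__ == "__main__":
    # Run from the project directory, then paste the printed line into your shell profile.
    print(export_line())
```

Running the script from the project directory prints an ``export`` line whose path does not depend on the directory you later build from.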
| If you are on a Mac and you use VS Code for your IDE, you will want to add these variables |
| to your settings. You can find the appropriate rust flags by looking in the |
| ``.cargo/config.toml`` file. |
| |
| .. code-block:: |
| |
| "rust-analyzer.cargo.extraEnv": { |
| "RUSTFLAGS": "-C link-arg=-undefined -C link-arg=dynamic_lookup", |
| "PYO3_CONFIG_FILE": "/Users/myusername/src/datafusion-python/.pyo3_build_config" |
| }, |
| "rust-analyzer.runnables.extraEnv": { |
| "RUSTFLAGS": "-C link-arg=-undefined -C link-arg=dynamic_lookup", |
| "PYO3_CONFIG_FILE": "/Users/myusername/src/datafusion-python/.pyo3_build_config" |
| } |