
DataFusion Python Release Process

Development happens on the main branch, and most of the time, we depend on DataFusion using GitHub dependencies rather than using an official release from crates.io. This allows us to pick up new features and bug fixes frequently by creating PRs to move to a later revision of the code. It also means we can incrementally make updates that are required due to changes in DataFusion rather than having a large amount of work to do when the next official release is available.

When there is a new official release of DataFusion, we update the main branch to point to that, update the version number, and create a new release branch, such as branch-0.8. Once this branch is created, we switch the main branch back to using GitHub dependencies. The release activity (such as generating the changelog) can then happen on the release branch without blocking ongoing development in the main branch.

We can cherry-pick commits from the main branch into branch-0.8 as needed and then create new patch releases from that branch.
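The following sketch illustrates this naming convention by deriving the release branch and tag names from a version string. It assumes the branch is named after the MAJOR.MINOR part of the version (inferred from the branch-0.8 example). It also assumes RC tags follow the 0.8.0-rc1 pattern used later in this guide. `release_names` is a hypothetical helper, not part of the dev/release scripts.

```python
import re


def release_names(version: str, rc: int) -> dict[str, str]:
    """Derive release branch and tag names (hypothetical helper).

    Assumes branch-MAJOR.MINOR naming, inferred from the branch-0.8 example.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, _patch = match.groups()
    return {
        "release_branch": f"branch-{major}.{minor}",
        "rc_tag": f"{version}-rc{rc}",
        "release_tag": version,
    }


print(release_names("0.8.0", 1))
# {'release_branch': 'branch-0.8', 'rc_tag': '0.8.0-rc1', 'release_tag': '0.8.0'}
```

Under this assumption, a patch release such as 0.8.1 maps to the same branch-0.8, which matches the practice of cherry-picking fixes into an existing release branch.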

Detailed Guide

Pre-requisites

Releases can currently only be created by PMC members due to the permissions needed.

You will need a GitHub Personal Access Token. Follow these instructions to generate one if you do not already have one.

You will need a TestPyPI API token. Create one at https://test.pypi.org/manage/account/#api-tokens, setting the “Scope” to “Entire account”. Publishing the final release to PyPI requires a separate token created on pypi.org.

You will also need access to the datafusion project on testpypi.

Preparing the main Branch

Before creating a new release:

  • The main branch must not have any GitHub dependencies.
  • A PR should be created and merged to update the major version number of the project.
  • A new release branch should be created, such as branch-0.8.

Change Log

We maintain a CHANGELOG.md so our users know what has been changed between releases.

The changelog is generated using a Python script:

$ GITHUB_TOKEN=<TOKEN> ./dev/release/generate-changelog.py apache/arrow-datafusion-python 24.0.0 HEAD > dev/changelog/25.0.0.md

This script creates a changelog from GitHub PRs based on the labels associated with them, as well as on titles starting with feat:, fix:, or docs:. The script will produce output similar to:

Fetching list of commits between 24.0.0 and HEAD
Fetching pull requests
Categorizing pull requests
Generating changelog content
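The title-prefix part of that categorization can be sketched as follows. This is a simplified illustration, not the logic of generate-changelog.py itself, which also uses PR labels. The category names and the `categorize` helper are hypothetical.

```python
PREFIXES = ("feat", "fix", "docs")


def categorize(titles: list[str]) -> dict[str, list[str]]:
    """Group PR titles by their feat:/fix:/docs: prefix; everything else goes to "other"."""
    categories: dict[str, list[str]] = {prefix: [] for prefix in PREFIXES}
    categories["other"] = []
    for title in titles:
        for prefix in PREFIXES:
            if title.lower().startswith(prefix + ":"):
                categories[prefix].append(title)
                break
        else:  # no prefix matched
            categories["other"].append(title)
    return categories


# Hypothetical PR titles for illustration.
titles = ["feat: add UDF support", "fix: handle nulls", "Bump pyo3"]
print(categorize(titles)["other"])  # ['Bump pyo3']
```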

This process is not fully automated, so there are some additional manual steps:

  • Add the ASF header to the generated file
  • Add a link to this changelog from the top-level /datafusion/CHANGELOG.md
  • Add the following content (copy it from the previous version's changelog and update it as appropriate):
## [24.0.0](https://github.com/apache/arrow-datafusion-python/tree/24.0.0) (2023-05-06)

[Full Changelog](https://github.com/apache/arrow-datafusion-python/compare/23.0.0...24.0.0)
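The header above follows a fixed pattern. It can be produced from the previous version, the new version, and the release date. The following small sketch reproduces the 24.0.0 example. `changelog_header` is a hypothetical helper, not something the release scripts provide.

```python
from datetime import date

REPO = "https://github.com/apache/arrow-datafusion-python"


def changelog_header(previous: str, current: str, released: date) -> str:
    """Build the release heading and Full Changelog link for CHANGELOG.md (hypothetical helper)."""
    return (
        f"## [{current}]({REPO}/tree/{current}) ({released.isoformat()})\n"
        "\n"
        f"[Full Changelog]({REPO}/compare/{previous}...{current})\n"
    )


print(changelog_header("23.0.0", "24.0.0", date(2023, 5, 6)))
```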

Preparing a Release Candidate

Tag the Repository

git tag 0.8.0-rc1
git push apache 0.8.0-rc1

Create a source release

./dev/release/create-tarball.sh 0.8.0 1

This will also create the email template to send to the mailing list. Here is an example:

To: dev@arrow.apache.org
Subject: [VOTE][RUST][DataFusion] Release DataFusion Python Bindings 0.7.0 RC2
Hi,

I would like to propose a release of Apache Arrow DataFusion Python Bindings,
version 0.7.0.

This release candidate is based on commit: bd1b78b6d444b7ab172c6aec23fa58c842a592d7 [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].
The Python wheels are located at [4].

Please download, verify checksums and signatures, run the unit tests, and vote
on the release. The vote will be open for at least 72 hours.

Only votes from PMC members are binding, but all members of the community are
encouraged to test the release and vote with "(non-binding)".

The standard verification procedure is documented at https://github.com/apache/arrow-datafusion-python/blob/main/dev/release/README.md#verifying-release-candidates.

[ ] +1 Release this as Apache Arrow DataFusion Python 0.7.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow DataFusion Python 0.7.0 because...

Here is my vote:

+1

[1]: https://github.com/apache/arrow-datafusion-python/tree/bd1b78b6d444b7ab172c6aec23fa58c842a592d7
[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-python-0.7.0-rc2
[3]: https://github.com/apache/arrow-datafusion-python/blob/bd1b78b6d444b7ab172c6aec23fa58c842a592d7/CHANGELOG.md
[4]: https://test.pypi.org/project/datafusion/0.7.0/

Create a draft email using this content, but do not send until after completing the next step.

Publish Python Artifacts to testpypi

This section assumes some familiarity with publishing Python packages to PyPI. For more information, refer to this tutorial.

Publish Python Wheels to testpypi

Pushing an rc tag to the release branch will cause a GitHub Workflow to run that will build the Python wheels.

Go to https://github.com/apache/arrow-datafusion-python/actions and look for an action named “Python Release Build” that has run against the pushed tag.

Click on the action and scroll down to the “Artifacts” section at the bottom of the page. Download dist.zip. It should contain files such as:

datafusion-22.0.0-cp37-abi3-macosx_10_7_x86_64.whl
datafusion-22.0.0-cp37-abi3-macosx_11_0_arm64.whl
datafusion-22.0.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
datafusion-22.0.0-cp37-abi3-win_amd64.whl
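Before uploading, a quick sanity check can confirm that every wheel in dist.zip carries the expected version and can list the platforms that were built. The following sketch relies on the standard wheel file name layout, {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl. `check_wheels` is a hypothetical helper.

```python
from pathlib import PurePath


def check_wheels(filenames: list[str], expected_version: str) -> tuple[list[str], list[str]]:
    """Return (platform tags, problems) for a set of wheel files (hypothetical helper).

    The version is the second dash-separated field of a wheel name and the
    platform tag is the last one; a wheel name has 5 fields, or 6 with a build tag.
    """
    platforms: list[str] = []
    problems: list[str] = []
    for filename in filenames:
        name = PurePath(filename).name
        if not name.endswith(".whl"):
            problems.append(f"not a wheel: {name}")
            continue
        parts = name[: -len(".whl")].split("-")
        if len(parts) not in (5, 6):
            problems.append(f"unexpected wheel file name: {name}")
            continue
        if parts[0] != "datafusion" or parts[1] != expected_version:
            problems.append(f"{name}: expected datafusion {expected_version}")
        platforms.append(parts[-1])
    return platforms, problems


wheels = [
    "datafusion-22.0.0-cp37-abi3-macosx_10_7_x86_64.whl",
    "datafusion-22.0.0-cp37-abi3-macosx_11_0_arm64.whl",
    "datafusion-22.0.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "datafusion-22.0.0-cp37-abi3-win_amd64.whl",
]
platforms, problems = check_wheels(wheels, "22.0.0")
print(problems)  # []
```

Comparing `platforms` against the list above catches a missing platform build before anything is uploaded.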

Upload the wheels to testpypi.

unzip dist.zip
python3 -m pip install --upgrade setuptools twine build
python3 -m twine upload --repository testpypi datafusion-22.0.0-cp37-abi3-*.whl

When prompted for a username, enter __token__. When prompted for a password, enter your TestPyPI API token.

Publish Python Source Distribution to testpypi

Download the source tarball created earlier with create-tarball.sh, untar it, and run:

maturin sdist --out dist

This will create a file named dist/datafusion-0.7.0.tar.gz. Upload this to testpypi:

python3 -m twine upload --repository testpypi dist/datafusion-0.7.0.tar.gz

Send the Email

Send the email to start the vote.

Verifying a Release

Install the release from testpypi:

pip install --extra-index-url https://test.pypi.org/simple/ datafusion==0.7.0

Try running one of the examples from the top-level README, or write some custom Python code to query some available data files.
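The vote email also asks voters to verify the checksums of the source tarball. The following minimal sketch performs the SHA-512 part of that check. It assumes the checksum file sits next to the tarball in sha512sum style ("<hex digest>  <file name>"). Signature checking, for example with gpg --verify, is not covered. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 digest of a file, read in chunks so large tarballs are not loaded at once."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(tarball: Path, checksum_file: Path) -> bool:
    """Compare a file against the first field of a sha512sum-style checksum file (assumed format)."""
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(tarball) == expected
```

For example, `verify_sha512(Path("apache-arrow-datafusion-python-0.7.0.tar.gz"), Path("apache-arrow-datafusion-python-0.7.0.tar.gz.sha512"))` would return True for an intact download. Those file names are hypothetical; use the names actually published in the RC directory.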

Publishing a Release

Publishing Apache Source Release

Once the vote passes, we can publish the release.

Create the source release tarball:

./dev/release/release-tarball.sh 0.8.0 1

Publishing Rust Crate to crates.io

Some projects depend on the Rust crate directly, so we also publish it to crates.io:

cargo publish

Publishing Python Artifacts to PyPi

Go to the TestPyPI page for DataFusion and download all of the published artifacts into a dist-release/ directory. Then upload them to PyPI using twine:

twine upload --repository pypi dist-release/*

Publish Python Artifacts to Anaconda

Publishing artifacts to Anaconda is similar to publishing to PyPI. First, download the source release tarball and untar it.

# Assumes an existing conda environment named `datafusion-dev`; if you do not have one, see the root README for instructions
conda activate datafusion-dev
conda build .

This sets up a conda build environment and builds the artifacts inside it. This step can take a few minutes because the entire build, host, and runtime environments must be set up. Once the build completes, conda build prints the local filesystem path of the resulting package. Copy that path for the upload step.

For example: /home/conda/envs/datafusion/conda-bld/linux-64/datafusion-0.7.0.tar.bz2

Now you are ready to publish this resulting package to anaconda.org. This can be accomplished in a few simple steps.

# First login to Anaconda with the datafusion credentials
anaconda login
# Upload the package
anaconda upload /home/conda/envs/datafusion/conda-bld/linux-64/datafusion-0.7.0.tar.bz2

Push the Release Tag

git checkout 0.8.0-rc1
git tag 0.8.0
git push apache 0.8.0

Add the release to Apache Reporter

Add the release to https://reporter.apache.org/addrelease.html?arrow with a version name prefixed with RS-DATAFUSION-PYTHON, for example RS-DATAFUSION-PYTHON-31.0.0.

The release information is used to generate a template for a board report (see example here).

Delete old RCs and Releases

See the ASF documentation on when to archive for more information.

Deleting old release candidates from dev svn

Release candidates should be deleted once the release is published.

Get a list of DataFusion release candidates:

svn ls https://dist.apache.org/repos/dist/dev/arrow | grep datafusion-python

Delete a release candidate:

svn delete -m "delete old DataFusion RC" https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-python-7.1.0-rc1/
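When several RCs have accumulated, the listing from svn ls can be turned into delete commands mechanically. The following sketch only prints the commands (a dry run) so they can be reviewed before running. It assumes RC directories follow the apache-arrow-datafusion-python-<version>-rc<n> naming shown above. `rc_delete_commands` is a hypothetical helper.

```python
import re

DEV_URL = "https://dist.apache.org/repos/dist/dev/arrow"
# RC directory naming assumed from the example above.
RC_DIR = re.compile(r"apache-arrow-datafusion-python-\d+\.\d+\.\d+-rc\d+")


def rc_delete_commands(svn_ls_output: str) -> list[str]:
    """Turn `svn ls` output into svn delete commands for DataFusion Python RCs (dry run only)."""
    commands = []
    for line in svn_ls_output.splitlines():
        entry = line.strip().rstrip("/")  # svn ls marks directories with a trailing slash
        if RC_DIR.fullmatch(entry):
            commands.append(f'svn delete -m "delete old DataFusion RC" {DEV_URL}/{entry}/')
    return commands


# Hypothetical listing for illustration.
listing = "apache-arrow-datafusion-python-7.1.0-rc1/\napache-arrow-datafusion-python-7.1.0-rc2/\n"
for command in rc_delete_commands(listing):
    print(command)
```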

Deleting old releases from release svn

Only the latest release should be available. Delete old releases after publishing the new release.

Get a list of DataFusion releases:

svn ls https://dist.apache.org/repos/dist/release/arrow | grep datafusion-python

Delete a release:

svn delete -m "delete old DataFusion release" https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-python-7.0.0
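Because only the latest release should remain, the releases to delete are all but the highest version in the svn ls listing. The following sketch sorts versions numerically, so that 10.0.0 counts as newer than 9.0.0, and only prints the resulting commands. The directory naming is taken from the example above, and accepting an optional apache- prefix is an assumption. `old_release_delete_commands` is a hypothetical helper.

```python
import re

RELEASE_URL = "https://dist.apache.org/repos/dist/release/arrow"
# Naming from the example above; the optional "apache-" prefix is an assumption.
RELEASE_DIR = re.compile(r"(?:apache-)?arrow-datafusion-python-(\d+)\.(\d+)\.(\d+)")


def old_release_delete_commands(svn_ls_output: str) -> list[str]:
    """Build svn delete commands for every release except the newest one (dry run only)."""
    releases = []
    for line in svn_ls_output.splitlines():
        entry = line.strip().rstrip("/")
        match = RELEASE_DIR.fullmatch(entry)
        if match:
            # Compare versions numerically so that 10.0.0 sorts after 9.0.0.
            releases.append((tuple(int(part) for part in match.groups()), entry))
    releases.sort()
    return [
        f'svn delete -m "delete old DataFusion release" {RELEASE_URL}/{entry}'
        for _, entry in releases[:-1]
    ]


# Hypothetical listing for illustration.
listing = "arrow-datafusion-python-7.0.0/\narrow-datafusion-python-10.0.0/\narrow-datafusion-python-9.0.0/\n"
for command in old_release_delete_commands(listing):
    print(command)
```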