<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# DataFusion Python Release Process
Development happens on the `main` branch, and most of the time, we depend on DataFusion using GitHub dependencies
rather than using an official release from crates.io. This allows us to pick up new features and bug fixes frequently
by creating PRs to move to a later revision of the code. It also means we can incrementally make updates that are
required due to changes in DataFusion rather than having a large amount of work to do when the next official release
is available.
When there is a new official release of DataFusion, we update the `main` branch to point to that, update the version
number, and create a new release branch, such as `branch-0.8`. Once this branch is created, we switch the `main` branch
back to using GitHub dependencies. The release activity (such as generating the changelog) can then happen on the
release branch without blocking ongoing development in the `main` branch.
We can cherry-pick commits from the `main` branch into `branch-0.8` as needed and then create new patch releases
from that branch.
## Detailed Guide
### Pre-requisites
Releases can currently only be created by PMC members due to the permissions needed.
You will need a GitHub Personal Access Token. Follow
[these instructions](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token)
to generate one if you do not already have one.
You will need a PyPI API token. Create one at https://test.pypi.org/manage/account/#api-tokens, setting the “Scope” to
“Entire account”.
You will also need access to the [datafusion](https://test.pypi.org/project/datafusion/) project on testpypi.
### Preparing the `main` Branch
Before creating a new release:
- Ensure that the `main` branch does not have any GitHub dependencies.
- Create and merge a PR to update the major version number of the project.
- Create a new release branch, such as `branch-0.8`.
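The first item in the list above can be checked mechanically. The following is a minimal sketch, assuming that GitHub dependencies appear in `Cargo.toml` as `git = "..."` entries (the standard Cargo syntax for git dependencies). The helper name `has_git_deps` is hypothetical and is not part of this repository.

```bash
# Hypothetical helper: prints any lines of a Cargo manifest that declare a
# git dependency, and exits 0 if at least one was found (grep semantics).
has_git_deps() {
    grep -nE '(^|[^A-Za-z0-9_-])git[[:space:]]*=[[:space:]]*"' "$1"
}

# Example usage (run from the repository root):
#   if has_git_deps Cargo.toml; then
#       echo "Cargo.toml still has git dependencies" >&2
#   fi
```

If the helper prints nothing, the manifest no longer references a git revision and the branch should be ready for the version bump.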
## Preparing a Release Candidate
### Change Log
We maintain a `CHANGELOG.md` so our users know what has been changed between releases.
The changelog is generated using a Python script:
```bash
$ GITHUB_TOKEN=<TOKEN> ./dev/release/generate-changelog.py 24.0.0 HEAD 25.0.0 > dev/changelog/25.0.0.md
```
This script creates a changelog from GitHub PRs, categorizing them by their labels and by
titles starting with `feat:`, `fix:`, or `docs:`. The script will produce output similar to:
```
Fetching list of commits between 24.0.0 and HEAD
Fetching pull requests
Categorizing pull requests
Generating changelog content
```
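To illustrate the title-prefix convention, here is a rough sketch of how a PR title could be mapped to a category. This is not the code used by `generate-changelog.py`; the function name and the category names are assumptions made for the example, and label-based categorization is not shown.

```bash
# Illustrative only: map a PR title to a category based on its prefix.
# Category names are hypothetical, not taken from generate-changelog.py.
categorize_title() {
    case "$1" in
        feat:*) echo "feature" ;;
        fix:*)  echo "fix" ;;
        docs:*) echo "docs" ;;
        *)      echo "other" ;;
    esac
}

categorize_title "feat: add a new DataFrame method"
categorize_title "Update dependencies"
```

Using one of the prefixes in the PR title may help the PR land in the right section even when labels are missing.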
### Update the version number
The only place you should need to update the version is in the root `Cargo.toml`.
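Before tagging, it can be useful to confirm the version that the manifest actually declares. The sketch below assumes the version is set in a `[package]` (or `[workspace.package]`) table of the root `Cargo.toml`; the helper name `cargo_version` is hypothetical.

```bash
# Hypothetical helper: print the version declared in the [package] or
# [workspace.package] table of a Cargo manifest.
cargo_version() {
    awk -F'"' '
        /^\[/ { in_pkg = ($0 ~ /^\[(workspace\.)?package\]/) }
        in_pkg && /^version[[:space:]]*=/ { print $2; exit }
    ' "$1"
}

# Example usage (run from the repository root):
#   cargo_version Cargo.toml
```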
### Tag the Repository
Commit the changes to the changelog and version.
Push a tag to start the CI process for release candidates. The following assumes that you have
a remote named `apache` that points to the upstream `apache` repository rather than your
personal fork.
```bash
git tag 0.8.0-rc1
git push apache 0.8.0-rc1
```
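The tag name follows the pattern `<version>-rc<number>`, which matches the two arguments passed to the release scripts. As a small sketch, a helper like the following (the name `rc_tag` is hypothetical) could build the tag from those two values to avoid typos:

```bash
# Hypothetical helper: build a release candidate tag from a version and an
# RC number, e.g. "0.8.0" and "1" -> "0.8.0-rc1".
rc_tag() {
    local version=$1 rc=$2
    case "$rc" in
        ''|*[!0-9]*) echo "RC number must be a positive integer" >&2; return 1 ;;
    esac
    if [ -z "$version" ]; then
        echo "version must not be empty" >&2
        return 1
    fi
    printf '%s-rc%s\n' "$version" "$rc"
}

# Example usage:
#   tag=$(rc_tag 0.8.0 1)
#   git tag "$tag" && git push apache "$tag"
```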
### Create a source release
```bash
./dev/release/create-tarball.sh 0.8.0 1
```
This will also create the email template to send to the mailing list.
Create a draft email using this content, but do not send until after completing the next step.
### Publish Python Artifacts to testpypi
This section assumes some familiarity with publishing Python packages to PyPI. For more information, refer to
[this tutorial](https://packaging.python.org/en/latest/tutorials/packaging-projects/#uploading-the-distribution-archives).
#### Publish Python Wheels to testpypi
Pushing an `rc` tag to the release branch will cause a GitHub Workflow to run that will build the Python wheels.
Go to https://github.com/apache/datafusion-python/actions and look for an action named "Python Release Build"
that has run against the pushed tag.
Click on the action and scroll down to the bottom of the page titled "Artifacts". Download `dist.zip`. It should
contain files such as:
```text
datafusion-22.0.0-cp37-abi3-macosx_10_7_x86_64.whl
datafusion-22.0.0-cp37-abi3-macosx_11_0_arm64.whl
datafusion-22.0.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
datafusion-22.0.0-cp37-abi3-win_amd64.whl
```
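Before uploading, it may be worth checking that every wheel in `dist.zip` carries the version being released. The sketch below relies on the standard wheel filename layout, in which the version is the second `-`-separated field. The helper names are hypothetical.

```bash
# Hypothetical helper: extract the version from a wheel filename.
# Wheel names are "<distribution>-<version>-<python>-<abi>-<platform>.whl".
wheel_version() {
    basename "$1" | cut -d- -f2
}

# Hypothetical helper: exit non-zero if any wheel does not match the
# expected version.
check_wheel_versions() {
    local expected=$1 status=0 whl
    shift
    for whl in "$@"; do
        if [ "$(wheel_version "$whl")" != "$expected" ]; then
            echo "unexpected version in $whl" >&2
            status=1
        fi
    done
    return "$status"
}

# Example usage after unzipping dist.zip:
#   check_wheel_versions 22.0.0 datafusion-*.whl
```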
Upload the wheels to testpypi.
```bash
unzip dist.zip
python3 -m pip install --upgrade setuptools twine build
python3 -m twine upload --repository testpypi datafusion-22.0.0-cp37-abi3-*.whl
```
When prompted for a username, enter `__token__`. When prompted for a password, enter the PyPI API token created in the
pre-requisites step.
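As an alternative to answering the prompts, `twine` can read credentials from the `TWINE_USERNAME` and `TWINE_PASSWORD` environment variables. The following is a minimal sketch, assuming the testpypi API token has been exported in a variable named `TESTPYPI_TOKEN`; that variable name and the helper name are hypothetical.

```bash
# Hypothetical helper: upload the given files to testpypi without prompts.
# Assumes TESTPYPI_TOKEN holds a testpypi API token.
upload_to_testpypi() {
    if [ -z "${TESTPYPI_TOKEN:-}" ]; then
        echo "TESTPYPI_TOKEN is not set" >&2
        return 1
    fi
    TWINE_USERNAME=__token__ TWINE_PASSWORD="$TESTPYPI_TOKEN" \
        python3 -m twine upload --non-interactive --repository testpypi "$@"
}

# Example usage:
#   upload_to_testpypi datafusion-22.0.0-cp37-abi3-*.whl
```

Setting the variables only for the single command keeps the token out of the rest of the shell session's environment.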
#### Publish Python Source Distribution to testpypi
Download the source tarball created in the previous step, untar it, and run:
```bash
maturin sdist
```
This will create a file named `dist/datafusion-0.7.0.tar.gz`. Upload this to testpypi:
```bash
python3 -m twine upload --repository testpypi dist/datafusion-0.7.0.tar.gz
```
### Send the Email
Send the email to start the vote.
## Verifying a Release
Releases may be verified using `verify-release-candidate.sh`:
```bash
git clone https://github.com/apache/datafusion-python.git
cd datafusion-python
dev/release/verify-release-candidate.sh 48.0.0 1
```
Alternatively, one can run unit tests against a testpypi release candidate:
```bash
# clone a fresh repo
git clone https://github.com/apache/datafusion-python.git
cd datafusion-python
# checkout the release commit
git fetch --tags
git checkout 40.0.0-rc1
git submodule update --init --recursive
# create the env
python3 -m venv .venv
source .venv/bin/activate
# install release candidate
pip install --extra-index-url https://test.pypi.org/simple/ datafusion==40.0.0
# install test dependencies
pip install pytest numpy pytest-asyncio
# run the tests
pytest --import-mode=importlib python/tests -vv
```
Try running one of the examples from the top-level README, or write some custom Python code to query some available
data files.
## Publishing a Release
### Publishing Apache Source Release
Once the vote passes, we can publish the release.
Create the source release tarball:
```bash
./dev/release/release-tarball.sh 0.8.0 1
```
### Publishing Rust Crate to crates.io
Some projects depend on the Rust crate directly, so we publish it to crates.io:
```shell
cargo publish
```
### Publishing Python Artifacts to PyPI
Go to the testpypi page for DataFusion and download
[all published artifacts](https://test.pypi.org/project/datafusion/#files) into a `dist-release/` directory. Then
upload them using `twine`:
```bash
twine upload --repository pypi dist-release/*
```
### Publish Python Artifacts to conda-forge
PyPI packages are uploaded to conda-forge automatically via the [datafusion feedstock](https://github.com/conda-forge/datafusion-feedstock).
### Push the Release Tag
```bash
git checkout 0.8.0-rc1
git tag 0.8.0
git push apache 0.8.0
```
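The release tag is the release candidate tag with its `-rc<number>` suffix removed. As a small sketch, a helper such as the following (the name `release_tag_from_rc` is hypothetical) could derive it from the approved RC tag:

```bash
# Hypothetical helper: strip the "-rc<number>" suffix from an RC tag,
# e.g. "0.8.0-rc1" -> "0.8.0". Fails if the argument is not an RC tag.
release_tag_from_rc() {
    case "$1" in
        *-rc[0-9]*) printf '%s\n' "${1%-rc*}" ;;
        *) echo "not an rc tag: $1" >&2; return 1 ;;
    esac
}

# Example usage:
#   rc=0.8.0-rc1
#   git checkout "$rc"
#   git tag "$(release_tag_from_rc "$rc")"
```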
### Add the release to Apache Reporter
Add the release to https://reporter.apache.org/addrelease.html?datafusion with a version name prefixed with `DATAFUSION-PYTHON`,
for example `DATAFUSION-PYTHON-31.0.0`.
The release information is used to generate a template for a board report (see example from Apache Arrow
[here](https://github.com/apache/arrow/pull/14357)).
### Delete old RCs and Releases
See the ASF documentation on [when to archive](https://www.apache.org/legal/release-policy.html#when-to-archive)
for more information.
#### Deleting old release candidates from `dev` svn
Release candidates should be deleted once the release is published.
Get a list of DataFusion release candidates:
```bash
svn ls https://dist.apache.org/repos/dist/dev/datafusion | grep datafusion-python
```
Delete a release candidate:
```bash
svn delete -m "delete old DataFusion RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-python-7.1.0-rc1/
```
#### Deleting old releases from `release` svn
Only the latest release should be available. Delete old releases after publishing the new release.
Get a list of DataFusion releases:
```bash
svn ls https://dist.apache.org/repos/dist/release/datafusion | grep datafusion-python
```
Delete a release:
```bash
svn delete -m "delete old DataFusion release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-python-7.0.0
```
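To find which releases are candidates for deletion, the listing can be sorted by version and the newest entry dropped. This is a sketch that assumes `sort -V` (version sort, available in GNU coreutils and recent BSD/macOS) and a hypothetical helper name; review the output before deleting anything.

```bash
# Hypothetical helper: read release directory names on stdin and print all
# but the one with the highest version.
old_releases() {
    sort -V | sed '$d'
}

# Example usage:
#   svn ls https://dist.apache.org/repos/dist/release/datafusion \
#       | grep datafusion-python | old_releases

printf '%s\n' datafusion-python-10.0.0/ datafusion-python-7.0.0/ datafusion-python-9.0.0/ | old_releases
```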