blob: 898bceb6f7f41dda797e049bb6fa4261fe076aa1 [file] [log] [blame] [view]
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Release Process
DataFusion typically has major releases around once per month, including breaking API changes.
Patch releases are made on an adhoc basis, but we try and avoid them given the frequent major releases.
## Release Process Overview
New development happens on the `main` branch.
Releases are made from branches, e.g. `branch-50` for the `50.x.y` release series.
To prepare for a new release series, we:
- Create a new branch from `main`, such as `branch-50` in the Apache repository (not in a fork)
- Continue merging new features changes to `main` branch
- Prepare the release branch for release:
- Update version numbers in `Cargo.toml` files and create `CHANGELOG.md`
- Add additional changes to the release branch as needed
- When the code is ready, create GitHub tags release candidate (rc) artifacts from the release branch.
- After the release is approved, publish to [crates.io], the ASF distribution servers, and GitHub tags.
To add changes to the release branch, depending on the change we either:
- Fix the issue on `main` and then backport the change to the release branch (e.g. [#18129])
- Fix the issue on the release branch and then forward-port the change back to `main` (e.g.[#18057])
[crates.io]: https://crates.io/crates/datafusion
[#18129]: https://github.com/apache/datafusion/pull/18129
[#18057]: https://github.com/apache/datafusion/pull/18057
## Backporting (add changes) to `branch-*` branch
If you would like to propose your change for inclusion in a patch release, the
change must be applied to the relevant release branch. To do so please follow
these steps:
1. Find (or create) the issue for the incremental release ([example release issue]) and discuss the proposed change there with the maintainers.
2. Follow normal workflow to create PR to `main` branch and wait for its approval and merge.
3. After PR is squash merged to `main`, branch from most recent release branch (e.g. `branch-50`), cherry-pick the commit and create a PR targeting the release branch [example backport PR].
For example, to backport commit `12345` from `main` to `branch-50`:
```shell
git checkout branch-50
git checkout -b backport_to_50
git cherry-pick 12345 # your git commit hash
git push -u <your fork>
# make a PR as normal targeting branch-50, prefixed with [branch-50]
```
It is also acceptable to fix the issue directly on the release branch first
and then cherry-pick the change back to `main` branch in a new PR.
[example release issue]: https://github.com/apache/datafusion/issues/18072
[example backport pr]: https://github.com/apache/datafusion/pull/18131
## Release Prerequisites
### Add git remote for `apache` repo
The instructions below assume the upstream git repo `git@github.com:apache/datafusion.git` in remote `apache`.
```shell
git remote add apache git@github.com:apache/datafusion.git
```
### Create GitHub Personal Access Token (PAT)
A personal access token (PAT) is needed for changelog automation script. If you
do not already have one, create a token with the `repo` access by navigating to
[GitHub Developer Settings] page, and [follow these steps].
[github developer settings]: https://github.com/settings/developers
[follow these steps]: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
### Add GPG Public Key to SVN `KEYS` file
If you will be releasing the final tarball, your GPG public key must be present in the following SVN files:
- https://dist.apache.org/repos/dist/dev/datafusion/KEYS
- https://dist.apache.org/repos/dist/release/datafusion/KEYS
See instructions at https://infra.apache.org/release-signing.html#generate for generating keys.
Committers can add signing keys using the Subversion client and their ASF account:
```shell
$ svn co https://dist.apache.org/repos/dist/dev/datafusion
$ cd datafusion
$ editor KEYS # add your key here
$ svn ci KEYS # commit changes
```
Follow the instructions in the header of the KEYS file to append your key. Here is an example:
```shell
(gpg --list-sigs "John Doe" && gpg --armor --export "John Doe") >> KEYS
svn commit KEYS -m "Add key for John Doe"
```
## Release Process: Step by Step
As part of the Apache governance model, official releases consist of signed
source tarballs approved by the PMC.
We then publish the code in the approved artifacts to crates.io.
### 1. Create Release Branch
First create a new release branch from `main` in the apache repository.
For example, to create the `branch-50` branch for the `50.x.y` release series:
```shell
git fetch apache # make sure we are up to date
git checkout apache/main # checkout current latest development branch
git checkout -b branch-50 # create local branch
git push -u apache branch-50 # push branch to apache remote
```
### 2. Add a protection to release candidate branch
To protect a release candidate branch from accidental merges, run:
```shell
./dev/release/add-branch-protection.sh 50
```
The script will modify `.asf.yaml` and add following block:
```yaml
branch-50:
required_pull_request_reviews:
required_approving_review_count: 1
```
- Create a PR.
- Merge to `main`.
### 3. Prepare PR to Update Changelog and the Release Version
First, prepare a PR to update the changelog and versions to reflect the planned
release. See [#18173](https://github.com/apache/datafusion/pull/18173) for an example.
#### Update Version Numbers
Manually update the DataFusion version in the root `Cargo.toml` to reflect the new release version.
Ensure Cargo.lock is updated accordingly by running:
```shell
cargo check -p datafusion
```
#### Changelog Generation
We maintain a [changelog] so our users know what has been changed between releases.
[changelog]: ../changelog
The changelog is generated using a Python script.
To run the script, you will need a GitHub Personal Access Token (described in the prerequisites section) and the `PyGitHub` library. First install the `PyGitHub` dependency via `pip`:
```shell
pip3 install PyGitHub
```
To generate the changelog, set the `GITHUB_TOKEN` environment variable and then run `./dev/release/generate-changelog.py`
providing two commit ids or tags followed by the version number of the release being created. For example,
to generate a change log of all changes between the `50.3.0` tag and `branch-51`, in preparation for release `51.0.0`:
> [!NOTE]
>
> If you see errors such as the following, it is likely due to not setting
> the `GITHUB_TOKEN` environment variable.
>
> ```
> Request GET ... failed with 403: rate limit exceeded
> ```
```shell
export GITHUB_TOKEN=<your-token-here>
./dev/release/generate-changelog.py 50.3.0 branch-51 51.0.0 > dev/changelog/51.0.0.md
```
This script creates a changelog from GitHub PRs based on the labels associated with them as well as looking for
titles starting with `feat:`, `fix:`, or `docs:`.
Once the change log is generated, run `prettier` to format the document:
```shell
prettier -w dev/changelog/51.0.0.md
```
#### Commit and PR
Then commit the changes and create a PR targeting the release branch.
```shell
git commit -a -m 'Update version'
```
Remember to merge any fixes back to `main` branch as well.
### 4. Prepare Release Candidate Artifacts
After the PR gets merged, you are ready to create release artifacts based off the
merged commit.
(Note you need to be a committer to run these scripts as they upload to the apache svn distribution servers)
#### Pick a Release Candidate (RC) number
Pick numbers in sequential order, with `1` for `rc1`, `2` for `rc2`, etc.
#### Create git Tag for the Release:
While the official release artifacts are signed tarballs and zip files, we also
tag the commit it was created for convenience and code archaeology. Release tags
have the format `<version>` (e.g. `38.0.0`), and release candidates have the
format `<version>-rc<rc>` (e.g. `38.0.0-rc0`). See [the list of existing
tags].
[the list of existing tags]: https://github.com/apache/datafusion/tags
Using a string such as `38.0.0` as the `<version>`, create and push the rc tag by running these commands:
```shell
git fetch apache
git tag <version>-<rc> apache/branch-X # create tag from the release branch
git push apache <version>-<rc> # push tag to Github remote
```
For example, to create the `50.3.0-rc1 tag from `branch-50`:
```shell
git fetch apache
git tag 50.3.0-rc1 apache/branch-50
git push apache 50.3.0-rc1
```
#### Create, Sign, and Upload Artifacts
Run the `create-tarball.sh` script with the `<version>` tag and `<rc>` and you determined in previous steps:
For example, to create the `50.3.0-rc1` artifacts:
```shell
GH_TOKEN=<TOKEN> ./dev/release/create-tarball.sh 50.3.0 1
```
The `create-tarball.sh` script
1. Creates and uploads all release candidate artifacts to the [datafusion
dev](https://dist.apache.org/repos/dist/dev/datafusion) location on the
apache distribution SVN server
2. Provides you an email template to
send to dev@datafusion.apache.org for release voting.
### 5. Vote on Release Candidate Artifacts
Send the email output from the script to dev@datafusion.apache.org.
In order to publish the release on crates.io, it must be "official". To become
official it needs at least three PMC members to vote +1 on it.
#### Verifying Release Candidates
The `dev/release/verify-release-candidate.sh` is a script in this repository that can assist in the verification process. Run it like:
```shell
./dev/release/verify-release-candidate.sh 50.3.0 1
```
#### If the Release is not Approved
If the release is not approved, fix whatever the problem is, merge changelog
changes into the release branch and try again with the next RC number.
Remember to merge any fixes back to `main` branch as well.
#### If the Release is Approved: Call the Vote
Call the vote on the Arrow dev list by replying to the RC voting thread. The
reply should have a new subject constructed by adding `[RESULT]` prefix to the
old subject line.
Sample announcement template:
```
The vote has passed with <NUMBER> +1 votes. Thank you to all who helped
with the release verification.
```
### 6. Finalize the Release
NOTE: steps in this section can only be done by PMC members.
#### After the release is approved
Move artifacts to the release location in SVN, e.g.
https://dist.apache.org/repos/dist/release/datafusion/datafusion-50.3.0/, using
the `release-tarball.sh` script:
```shell
./dev/release/release-tarball.sh 50.3.0 1
```
Congratulations! The release is now official!
### 7. Create Release git tags
Tag the same release candidate commit with the final release tag
```shell
git co apache/50.3.0-rc1
git tag 50.3.0
git push apache 50.3.0
```
### 8. Publish on Crates.io
Only approved releases of the tarball should be published to
crates.io, in order to conform to Apache Software Foundation
governance standards.
An Arrow committer can publish this crate after an official project release has
been made to crates.io using the following instructions.
Follow [these
instructions](https://doc.rust-lang.org/cargo/reference/publishing.html) to
create an account and login to crates.io before asking to be added as an owner
to all DataFusion crates.
Download and unpack the official release tarball
Verify that the Cargo.toml in the tarball contains the correct version
(e.g. `version = "38.0.0"`) and then publish the crates by running the following commands
```shell
(cd datafusion/common && cargo publish)
(cd datafusion/expr-common && cargo publish)
(cd datafusion/physical-expr-common && cargo publish)
(cd datafusion/functions-aggregate-common && cargo publish)
(cd datafusion/functions-window-common && cargo publish)
(cd datafusion/doc && cargo publish)
(cd datafusion/expr && cargo publish)
(cd datafusion/macros && cargo publish)
(cd datafusion/execution && cargo publish)
(cd datafusion/functions && cargo publish)
(cd datafusion/physical-expr && cargo publish)
(cd datafusion/physical-expr-adapter && cargo publish)
(cd datafusion/functions-aggregate && cargo publish)
(cd datafusion/functions-window && cargo publish)
(cd datafusion/functions-nested && cargo publish)
(cd datafusion/sql && cargo publish)
(cd datafusion/optimizer && cargo publish)
(cd datafusion/common-runtime && cargo publish)
(cd datafusion/physical-plan && cargo publish)
(cd datafusion/pruning && cargo publish)
(cd datafusion/physical-optimizer && cargo publish)
(cd datafusion/session && cargo publish)
(cd datafusion/datasource && cargo publish)
(cd datafusion/catalog && cargo publish)
(cd datafusion/catalog-listing && cargo publish)
(cd datafusion/functions-table && cargo publish)
(cd datafusion/datasource-arrow && cargo publish)
(cd datafusion/datasource-csv && cargo publish)
(cd datafusion/datasource-json && cargo publish)
(cd datafusion/datasource-parquet && cargo publish)
(cd datafusion/core && cargo publish)
(cd datafusion/proto-common && cargo publish)
(cd datafusion/proto && cargo publish)
(cd datafusion/datasource-avro && cargo publish)
(cd datafusion/substrait && cargo publish)
(cd datafusion/ffi && cargo publish)
(cd datafusion-cli && cargo publish)
(cd datafusion/spark && cargo publish)
(cd datafusion/sqllogictest && cargo publish)
```
### Publish datafusion-cli on Homebrew
Note: [`datafusion` formula](https://formulae.brew.sh/formula/datafusion) is [updated automatically](https://github.com/Homebrew/homebrew-core/pulls?q=is%3Apr+datafusion+is%3Aclosed),
so no action is needed.
### 9: Add the release to Apache Reporter
When you have published the release, please help the project by adding the release to
[Apache Reporter](https://reporter.apache.org/). The reporter system should
send you a reminder email, but in case you miss it, you can add
the release to https://reporter.apache.org/addrelease.html?datafusion following
the examples from previous releases.
The release information is used to generate a template for a board report (see example from Apache Arrow project
[here](https://github.com/apache/arrow/pull/14357)).
### 10: Delete old RCs and Releases
See the ASF documentation on [when to archive](https://www.apache.org/legal/release-policy.html#when-to-archive)
for more information.
#### Deleting old release candidates from `dev` svn
Release candidates should be deleted once the release is published.
Get a list of DataFusion release candidates:
```shell
svn ls https://dist.apache.org/repos/dist/dev/datafusion
```
Delete a release candidate:
```shell
svn delete -m "delete old DataFusion RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-50.0.0-rc1/
```
#### Deleting old releases from `release` svn
Only the latest release should be available. Delete old releases after publishing the new release.
Get a list of DataFusion releases:
```shell
svn ls https://dist.apache.org/repos/dist/release/datafusion
```
Delete a release:
```shell
svn delete -m "delete old DataFusion release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-50.0.0
```