DataFusion typically has major releases around once per month, including breaking API changes.
Patch releases are made on an ad hoc basis, but we try to avoid them given the frequent major releases.
New development happens on the main branch. Releases are made from branches, e.g. branch-50 for the 50.x.y release series.
To prepare for a new release series, we:
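- Create a new branch from main, such as branch-50, in the Apache repository (not in a fork)
- Continue merging new work to the main branch
- Update the version numbers in the Cargo.toml files and create CHANGELOG.md

To add changes to the release branch, depending on the change we either: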
- Fix the issue on main and then backport the change to the release branch (e.g. #18129)
- Fix the issue on the release branch and then forward-port the change to main (e.g. #18057)

Backporting changes to a branch-* branch

If you would like to propose your change for inclusion in a patch release, the change must be applied to the relevant release branch. To do so, please follow these steps:
1. Follow the normal workflow to create a PR to the main branch and wait for its approval and merge.
2. After the PR is merged to main, branch from the most recent release branch (e.g. branch-50), cherry-pick the commit, and create a PR targeting the release branch (example backport PR).

For example, to backport commit 12345 from main to branch-50:
    git checkout branch-50
    git checkout -b backport_to_50
    git cherry-pick 12345 # your git commit hash
    git push -u <your fork>
    # make a PR as normal targeting branch-50, prefixed with [branch-50]
It is also acceptable to fix the issue directly on the release branch first and then cherry-pick the change back to the main branch in a new PR.
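The backport steps above can be sketched as a small dry-run helper that prints the commands for review instead of running them. The function name `backport_plan` and the `origin` remote name are assumptions for illustration, not part of the DataFusion tooling.

```shell
# Hypothetical dry-run helper: prints the backport commands for review
# rather than executing them. The backport_to_<major> branch name mirrors
# the example above.
backport_plan() (
  commit="$1"   # commit hash on main to cherry-pick
  major="$2"    # release series, e.g. 50 for branch-50
  fork="$3"     # name of your fork's git remote (assumed, e.g. "origin")
  echo "git checkout branch-$major"
  echo "git checkout -b backport_to_$major"
  echo "git cherry-pick $commit"
  echo "git push -u $fork backport_to_$major"
)

backport_plan 12345 50 origin
```

Printing the plan first makes it easy to confirm the target branch before anything is pushed.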
Add a git remote for the apache repo

The instructions below assume the upstream git repo git@github.com:apache/datafusion.git is configured as the remote named apache.
git remote add apache git@github.com:apache/datafusion.git
A personal access token (PAT) is needed for the changelog automation script. If you do not already have one, create a token with repo access by navigating to the GitHub Developer Settings page and following these steps.
Add your signing key to the KEYS file

If you will be releasing the final tarball, your GPG public key must be present in the project's KEYS files in SVN.
See instructions at https://infra.apache.org/release-signing.html#generate for generating keys.
Committers can add signing keys using the Subversion client and their ASF account:
    $ svn co https://dist.apache.org/repos/dist/dev/datafusion
    $ cd datafusion
    $ editor KEYS # add your key here
    $ svn ci KEYS # commit changes
Follow the instructions in the header of the KEYS file to append your key. Here is an example:
    (gpg --list-sigs "John Doe" && gpg --armor --export "John Doe") >> KEYS
    svn commit KEYS -m "Add key for John Doe"
As part of the Apache governance model, official releases consist of signed source tarballs approved by the PMC. We then publish the code in the approved artifacts to crates.io.
First create a new release branch from main in the apache repository.
For example, to create the branch-50 branch for the 50.x.y release series:
    git fetch apache              # make sure we are up to date
    git checkout apache/main      # checkout current latest development branch
    git checkout -b branch-50     # create local branch
    git push -u apache branch-50  # push branch to apache remote
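As a small illustration of the branch naming convention, the sketch below derives the release branch name from a version string. The helper name `release_branch` is hypothetical and assumes the branch-<major> convention described earlier.

```shell
# Hypothetical helper: derive the release branch name from a version string.
# Assumes the branch-<major> convention used for release series.
release_branch() (
  version="$1"
  major="${version%%.*}"   # keep only the text before the first dot
  case "$major" in
    ''|*[!0-9]*) echo "invalid version: $version" >&2; exit 1 ;;
  esac
  echo "branch-$major"
)

release_branch 50.3.0   # prints: branch-50
```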
To protect a release candidate branch from accidental merges, run:
./dev/release/add-branch-protection.sh 50
The script will modify .asf.yaml and add the following block:

    branch-50:
      required_pull_request_reviews:
        required_approving_review_count: 1
Then merge this change into main.

First, prepare a PR to update the changelog and versions to reflect the planned release. See #18173 for an example.
Manually update the DataFusion version in the root Cargo.toml to reflect the new release version.
Ensure Cargo.lock is updated accordingly by running:
cargo check -p datafusion
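Before opening the PR, it can help to confirm that the manifest really carries the planned version. The sketch below is one way to do that. The helper names are hypothetical, and it assumes the release version is the first `version = "..."` line in the root Cargo.toml.

```shell
# Hypothetical check: print the first `version = "..."` entry in a manifest.
# Assumption: the release version is the first such line in the root Cargo.toml.
cargo_version() (
  manifest="$1"
  sed -n 's/^version *= *"\([^"]*\)".*$/\1/p' "$manifest" | head -n 1
)

# Fail unless the manifest declares the expected version.
check_version() (
  manifest="$1"
  expected="$2"
  actual="$(cargo_version "$manifest")"
  if [ "$actual" != "$expected" ]; then
    echo "expected $expected, found ${actual:-nothing} in $manifest" >&2
    exit 1
  fi
)

# Usage (not run here):
#   check_version Cargo.toml 51.0.0
```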
We maintain a changelog so our users know what has been changed between releases.
The changelog is generated using a Python script.
To run the script, you will need a GitHub Personal Access Token (described in the prerequisites section) and the PyGitHub library. First install the PyGitHub dependency via pip:
pip3 install PyGitHub
To generate the changelog, set the GITHUB_TOKEN environment variable and then run ./dev/release/generate-changelog.py, providing two commit ids or tags followed by the version number of the release being created. For example, to generate a changelog of all changes between the 50.3.0 tag and branch-51, in preparation for release 51.0.0:
    export GITHUB_TOKEN=<your-token-here>
    ./dev/release/generate-changelog.py 50.3.0 branch-51 51.0.0 > dev/changelog/51.0.0.md

Note: If you see an error such as the following, the GITHUB_TOKEN environment variable is probably not set.

    Request GET ... failed with 403: rate limit exceeded
This script creates a changelog from GitHub PRs based on the labels associated with them as well as looking for titles starting with feat:, fix:, or docs:.
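Since a missing token only surfaces as a rate-limit error partway through the run, a guard that fails early may save some time. The following is a minimal sketch. `require_github_token` is a hypothetical helper, not part of the release scripts.

```shell
# Hypothetical guard: fail early when GITHUB_TOKEN is unset or empty.
require_github_token() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set; see the prerequisites section" >&2
    return 1
  fi
}

# Usage (not run here): the changelog script only starts if the guard passes.
#   require_github_token && \
#     ./dev/release/generate-changelog.py 50.3.0 branch-51 51.0.0 > dev/changelog/51.0.0.md
```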
Once the changelog is generated, run prettier to format the document:
prettier -w dev/changelog/51.0.0.md
Then commit the changes and create a PR targeting the release branch.
git commit -a -m 'Update version'
Remember to merge any fixes back to the main branch as well.
After the PR gets merged, you are ready to create release artifacts based off the merged commit.
(Note you need to be a committer to run these scripts as they upload to the apache svn distribution servers)
Pick release candidate (RC) numbers in sequential order, with 1 for rc1, 2 for rc2, etc.
While the official release artifacts are signed tarballs and zip files, we also tag the commit they were created from, for convenience and code archaeology. Release tags have the format <version> (e.g. 38.0.0), and release candidate tags have the format <version>-rc<rc> (e.g. 38.0.0-rc1). See the list of existing tags.
Using a string such as 38.0.0 as the <version>, create and push the rc tag by running these commands:
    git fetch apache
    git tag <version>-rc<rc> apache/branch-X # create tag from the release branch
    git push apache <version>-rc<rc> # push tag to GitHub remote
For example, to create the 50.3.0-rc1 tag from branch-50:

    git fetch apache
    git tag 50.3.0-rc1 apache/branch-50
    git push apache 50.3.0-rc1
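To avoid typos in tag names, the tag can be assembled from the version and RC number. This is a sketch only. `rc_tag` is a hypothetical helper, and the MAJOR.MINOR.PATCH check is an assumption based on the version strings used in this guide.

```shell
# Hypothetical helper: build an RC tag such as 50.3.0-rc1.
rc_tag() (
  version="$1"
  rc="$2"
  # Accept only MAJOR.MINOR.PATCH versions (assumed format).
  if ! printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "invalid version: $version" >&2
    exit 1
  fi
  # Accept only numeric RC numbers.
  case "$rc" in
    ''|*[!0-9]*) echo "invalid rc number: $rc" >&2; exit 1 ;;
  esac
  echo "$version-rc$rc"
)

rc_tag 50.3.0 1   # prints: 50.3.0-rc1
```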
Run the create-tarball.sh script with the <version> and <rc> number you determined in the previous steps.
For example, to create the 50.3.0-rc1 artifacts:
GH_TOKEN=<TOKEN> ./dev/release/create-tarball.sh 50.3.0 1
The create-tarball.sh script:

1. Creates and uploads all release candidate artifacts to the datafusion dev location on the Apache distribution SVN server.
2. Provides you with an email template to send to dev@datafusion.apache.org for release voting.
Send the email output from the script to dev@datafusion.apache.org.
In order to publish the release on crates.io, it must be “official”. To become official it needs at least three PMC members to vote +1 on it.
The dev/release/verify-release-candidate.sh script in this repository can assist in the verification process. Run it like this:
./dev/release/verify-release-candidate.sh 50.3.0 1
If the release is not approved, fix the problem, merge the changelog changes into the release branch, and try again with the next RC number.
Remember to merge any fixes back to the main branch as well.
Call the vote on the DataFusion dev list (dev@datafusion.apache.org) by replying to the RC voting thread. The reply should have a new subject constructed by adding a [RESULT] prefix to the old subject line.
Sample announcement template:
The vote has passed with <NUMBER> +1 votes. Thank you to all who helped with the release verification.
NOTE: steps in this section can only be done by PMC members.
Move artifacts to the release location in SVN, e.g. https://dist.apache.org/repos/dist/release/datafusion/datafusion-50.3.0/, using the release-tarball.sh script:
./dev/release/release-tarball.sh 50.3.0 1
Congratulations! The release is now official!
Tag the same release candidate commit with the final release tag:

    git fetch apache --tags
    git checkout 50.3.0-rc1
    git tag 50.3.0
    git push apache 50.3.0
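The final tag name is simply the RC tag with its -rc<rc> suffix removed, which the sketch below makes explicit. `final_tag` is a hypothetical helper name.

```shell
# Hypothetical helper: derive the final release tag from an RC tag.
final_tag() (
  rc_tag_name="$1"
  case "$rc_tag_name" in
    *-rc[0-9]*) echo "${rc_tag_name%-rc*}" ;;   # drop the -rc<rc> suffix
    *) echo "not an RC tag: $rc_tag_name" >&2; exit 1 ;;
  esac
)

final_tag 50.3.0-rc1   # prints: 50.3.0

# Usage (not run here): tag the RC commit directly, without a checkout.
#   git tag "$(final_tag 50.3.0-rc1)" 50.3.0-rc1
```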
Only approved releases of the tarball should be published to crates.io, in order to conform to Apache Software Foundation governance standards.
After an official project release has been made, an Arrow committer can publish the crates to crates.io using the following instructions.
Follow these instructions to create an account and login to crates.io before asking to be added as an owner to all DataFusion crates.
Download and unpack the official release tarball
Verify that the Cargo.toml in the tarball contains the correct version (e.g. version = "38.0.0") and then publish the crates by running the following commands:
    (cd datafusion/common && cargo publish)
    (cd datafusion/expr-common && cargo publish)
    (cd datafusion/physical-expr-common && cargo publish)
    (cd datafusion/functions-aggregate-common && cargo publish)
    (cd datafusion/functions-window-common && cargo publish)
    (cd datafusion/doc && cargo publish)
    (cd datafusion/expr && cargo publish)
    (cd datafusion/macros && cargo publish)
    (cd datafusion/execution && cargo publish)
    (cd datafusion/functions && cargo publish)
    (cd datafusion/physical-expr && cargo publish)
    (cd datafusion/physical-expr-adapter && cargo publish)
    (cd datafusion/functions-aggregate && cargo publish)
    (cd datafusion/functions-window && cargo publish)
    (cd datafusion/functions-nested && cargo publish)
    (cd datafusion/sql && cargo publish)
    (cd datafusion/optimizer && cargo publish)
    (cd datafusion/common-runtime && cargo publish)
    (cd datafusion/physical-plan && cargo publish)
    (cd datafusion/pruning && cargo publish)
    (cd datafusion/physical-optimizer && cargo publish)
    (cd datafusion/session && cargo publish)
    (cd datafusion/datasource && cargo publish)
    (cd datafusion/catalog && cargo publish)
    (cd datafusion/catalog-listing && cargo publish)
    (cd datafusion/functions-table && cargo publish)
    (cd datafusion/datasource-arrow && cargo publish)
    (cd datafusion/datasource-csv && cargo publish)
    (cd datafusion/datasource-json && cargo publish)
    (cd datafusion/datasource-parquet && cargo publish)
    (cd datafusion/core && cargo publish)
    (cd datafusion/proto-common && cargo publish)
    (cd datafusion/proto && cargo publish)
    (cd datafusion/datasource-avro && cargo publish)
    (cd datafusion/substrait && cargo publish)
    (cd datafusion/ffi && cargo publish)
    (cd datafusion-cli && cargo publish)
    (cd datafusion/spark && cargo publish)
    (cd datafusion/sqllogictest && cargo publish)
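Because the crates presumably must be published in this order, a loop that stops at the first failure may be safer than pasting the commands one by one. The sketch below defaults to a dry run that only prints the commands. The `CRATES` variable, the `publish_all` function, and the `DRY_RUN` switch are assumptions for illustration, and only the first three directories from the list are included for brevity.

```shell
# Assumption: CRATES holds the crate directories in publish order. Only the
# first three entries of the full list are shown here for brevity.
CRATES="datafusion/common datafusion/expr-common datafusion/physical-expr-common"

# Hypothetical helper: publish each crate in order, stopping on first failure.
# With DRY_RUN=1 (the default) the commands are printed instead of run.
publish_all() (
  for dir in $CRATES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "(cd $dir && cargo publish)"
    else
      (cd "$dir" && cargo publish) || exit 1
    fi
  done
)

DRY_RUN=1 publish_all
```

Setting DRY_RUN=0 would run the real publish commands from the root of the unpacked tarball.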
Note: the datafusion Homebrew formula is updated automatically, so no action is needed.
When you have published the release, please help the project by adding the release to Apache Reporter. The reporter system should send you a reminder email, but in case you miss it, you can add the release to https://reporter.apache.org/addrelease.html?datafusion following the examples from previous releases.
The release information is used to generate a template for a board report (see the example from the Apache Arrow project).
See the ASF documentation on when to archive for more information.
Delete old release candidates from dev svn

Release candidates should be deleted once the release is published.
Get a list of DataFusion release candidates:
svn ls https://dist.apache.org/repos/dist/dev/datafusion
Delete a release candidate:
svn delete -m "delete old DataFusion RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-50.0.0-rc1/
Delete old releases from release svn

Only the latest release should be available. Delete old releases after publishing the new release.
Get a list of DataFusion releases:
svn ls https://dist.apache.org/repos/dist/release/datafusion
Delete a release:
svn delete -m "delete old DataFusion release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-50.0.0
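The two delete commands differ only in the URL they target, so the URLs can be built from the version and RC number. This is a minimal sketch. The helper names are hypothetical, and the directory naming is taken from the examples above.

```shell
# Base of the Apache distribution SVN server, as used in the commands above.
DIST_BASE="https://dist.apache.org/repos/dist"

# Hypothetical helpers: build the SVN URLs for an RC and for a final release.
rc_url() ( echo "$DIST_BASE/dev/datafusion/apache-datafusion-$1-rc$2/" )
release_url() ( echo "$DIST_BASE/release/datafusion/datafusion-$1" )

rc_url 50.0.0 1
release_url 50.0.0

# Usage (not run here):
#   svn delete -m "delete old DataFusion RC" "$(rc_url 50.0.0 1)"
#   svn delete -m "delete old DataFusion release" "$(release_url 50.0.0)"
```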