Development happens on the `main` branch, and most of the time we depend on DataFusion using a git dependency (pinned to a specific git revision) rather than an official release from crates.io. This allows us to pick up new features and bug fixes frequently by creating PRs to move to a later revision of the code. It also means we can make the updates required by changes in DataFusion incrementally, rather than having a large amount of work to do when the next official release is available.
When there is a new official release of DataFusion, we update the `main` branch to point to that, update the version number, and create a new release branch, such as `branch-0.11`. Once this branch is created, we switch the `main` branch back to using GitHub dependencies. The release activity (such as generating the changelog) can then happen on the release branch without blocking ongoing development in the `main` branch.
We can cherry-pick commits from the `main` branch into `branch-0.11` as needed and then create new patch releases from that branch.
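The cherry-pick flow described above can be tried out safely in a throwaway repository. The following is a minimal sketch, not the project's tooling: the file names, commit messages, and committer identity are made up for illustration, and only the branch names `main` and `branch-0.11` come from the text.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the example never touches a real checkout.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example Committer"          # hypothetical identity
git config user.email "committer@example.invalid"

# The commit that the release branch is cut from.
echo "base" > base.txt
git add base.txt
git commit -q -m "initial commit"
git branch branch-0.11

# A fix lands on main after the release branch was created.
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "fix: example bug fix"
fix_sha="$(git rev-parse HEAD)"

# Bring only that commit over to the release branch.
# -x records the original commit hash in the new commit message.
git checkout -q branch-0.11
git cherry-pick -x "$fix_sha" > /dev/null

picked_subject="$(git log -1 --pretty=%s)"
echo "branch-0.11 now ends with: $picked_subject"
```

In a real checkout the cherry-picked commit would then go through a PR against the release branch, as listed in the task table.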
Although some tasks can only be performed by a PMC member, many tasks can be performed by committers and contributors.
Task | Role Required |
---|---|
Create PRs against main branch to update DataFusion dependencies | None |
Create PRs against main branch to update Ballista version | None |
Create release branch (e.g. branch-0.11) | Committer |
Create PRs against release branch with CHANGELOG | None |
Create PRs against release branch with cherry-picked commits | None |
Create release candidate tag | Committer |
Create release candidate tarball and publish to SVN | PMC |
Start vote on mailing list | PMC |
Call vote on mailing list | PMC |
Publish release tarball to SVN | PMC |
Publish binary artifacts to crates.io | PMC |
Create PR against datafusion-site with updated documentation | None |
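The upstream repository git@github.com:apache/datafusion-ballista.git must be added as a git remote named `apache`.
Before creating a new release, prepare the `main` branch: point the DataFusion dependency at the official release and update the Ballista version number, for example:
./dev/update_ballista_versions.py 0.11.0
Then create the release branch, for example `branch-0.11`.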
Once the release branch has been created, the `main` branch can immediately go back to depending on DataFusion with a GitHub dependency.
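As an illustration of the branch naming used here, the sketch below derives the release branch name from a version string and prints the git commands that would create it. The helper `release_branch_name` is hypothetical, and so is the assumption that the branch is cut from `apache/main`. The commands are printed rather than run.

```shell
# Hypothetical helper: derive the release branch name from a version string.
release_branch_name() {
  local version="$1"
  local major_minor="${version%.*}"   # strip the patch component: 0.11.0 -> 0.11
  printf 'branch-%s\n' "$major_minor"
}

branch="$(release_branch_name 0.11.0)"

# Print the commands instead of running them (assumes the remote is named "apache").
echo "git checkout -b $branch apache/main"
echo "git push apache $branch"
```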
We maintain a `CHANGELOG.md` so our users know what has changed between releases.
You will need a GitHub Personal Access Token for the following steps. Follow these instructions to generate one if you do not already have one.
The changelog is generated using a Python script. There is a dependency on `PyGitHub`, which can be installed using pip:
pip3 install PyGitHub
Run the following command to generate the changelog content.
$ GITHUB_TOKEN=<TOKEN> ./dev/release/generate-changelog.py apache/datafusion-ballista 0.11.0 HEAD > 0.12.0.md
This script creates a changelog from GitHub PRs based on the labels associated with them, and also looks for titles starting with `feat:`, `fix:`, or `docs:`. The script will produce output similar to:
Fetching list of commits between 0.11.0 and HEAD
Fetching pull requests
Categorizing pull requests
Generating changelog content
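The title-prefix part of the categorization can be pictured with a small shell sketch. This is only an illustration of matching on `feat:`, `fix:`, and `docs:`. It is not the logic of `generate-changelog.py`, the section names are assumptions, and label-based categorization is not modeled.

```shell
# Illustrative only: map a PR title to a changelog section by its prefix.
# The section names are assumptions, not taken from generate-changelog.py.
categorize_title() {
  case "$1" in
    feat:*) echo "Implemented enhancements" ;;
    fix:*)  echo "Fixed bugs" ;;
    docs:*) echo "Documentation updates" ;;
    *)      echo "Other" ;;
  esac
}

categorize_title "feat: add an example feature"
categorize_title "Update dependencies"
```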
This process is not fully automated, so there are some additional manual steps:
## [0.12.0](https://github.com/apache/datafusion-ballista/tree/0.12.0) (2024-01-14)
[Full Changelog](https://github.com/apache/datafusion-ballista/compare/0.11.0...0.12.0)
Send a PR to get these changes merged into the release branch (e.g. `branch-0.12`). If new commits that could change the changelog content land in the release branch before the PR is merged, rerun the changelog script to regenerate the changelog and update the PR accordingly.
After the PR gets merged, you are ready to create release artifacts based off the merged commit.
(Note you need to be a committer to run these scripts as they upload to the apache svn distribution servers)
Pick numbers in sequential order, with `0` for `rc0`, `1` for `rc1`, etc.
While the official release artifacts are signed tarballs and zip files, we also tag the commit they were created from, for convenience and code archaeology.
Using a string such as `0.11.0` as the `<version>`, create and push the tag by running these commands:
git tag <version>-<rc>
# push tag to GitHub remote
git push apache <version>-<rc>
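To make the tag naming concrete, here is a minimal sketch that builds a release candidate tag from a version and an RC number. It assumes the tag has the form `<version>-rc<N>`, matching the `0.11.0-rc1` tag checked out later in this guide. The helper `rc_tag` is hypothetical, and the git commands are printed rather than executed.

```shell
# Hypothetical helper: build an RC tag name such as 0.11.0-rc1.
rc_tag() {
  local version="$1"
  local rc="$2"
  case "$rc" in
    # Reject an empty value or anything containing a non-digit.
    ''|*[!0-9]*) echo "RC number must be a non-negative integer" >&2; return 1 ;;
  esac
  printf '%s-rc%s\n' "$version" "$rc"
}

tag="$(rc_tag 0.11.0 1)"

# Print the commands instead of running them (assumes the remote is named "apache").
echo "git tag $tag"
echo "git push apache $tag"
```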
See instructions at https://infra.apache.org/release-signing.html#generate for generating keys.
Committers can add signing keys using a Subversion client and their ASF account, for example:
$ svn co https://dist.apache.org/repos/dist/dev/datafusion
$ cd datafusion
$ editor KEYS
$ svn ci KEYS
Follow the instructions in the header of the KEYS file to append your key. Here is an example:
(gpg --list-sigs "John Doe" && gpg --armor --export "John Doe") >> KEYS
svn commit KEYS -m "Add key for John Doe"
Run `create-tarball.sh` with the `<version>` tag and `<rc>` number you found in previous steps:
GH_TOKEN=<TOKEN> ./dev/release/create-tarball.sh 0.11.0 1
The `create-tarball.sh` script creates and uploads all release candidate artifacts to the datafusion dev location on the Apache distribution SVN server, and provides an email template to send to dev@datafusion.apache.org for release voting.
Send the email output from the script to dev@datafusion.apache.org.
For the release to become “official” it needs at least three PMC members to vote +1 on it.
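The approval threshold can be expressed as a small check. This sketch encodes the "at least three +1 votes from PMC members" rule stated above. The additional condition that +1 votes outnumber -1 votes is an assumption based on general ASF release voting policy and is not stated in this guide. The function name `vote_passes` is hypothetical.

```shell
# Returns success (exit status 0) when a release vote passes.
# $1 = number of +1 votes from PMC members
# $2 = number of -1 votes (optional, defaults to 0)
vote_passes() {
  local plus="$1"
  local minus="${2:-0}"
  # At least three +1 votes (from this guide), and more +1 than -1
  # (assumption based on ASF release policy).
  [ "$plus" -ge 3 ] && [ "$plus" -gt "$minus" ]
}

if vote_passes 4 0; then
  echo "The vote has passed with 4 +1 votes."
fi
```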
`dev/release/verify-release-candidate.sh` is a script in this repository that can assist in the verification process. Run it like:
./dev/release/verify-release-candidate.sh 0.11.0 0
If the release is not approved, fix the problem, merge any changelog changes into `main`, and try again with the next RC number.
NOTE: steps in this section can only be done by PMC members.
Move artifacts to the release location in SVN, e.g. https://dist.apache.org/repos/dist/release/datafusion/datafusion-ballista-0.8.0/, using the `release-tarball.sh` script:
./dev/release/release-tarball.sh 0.11.0 1
Congratulations! The release is now official!
Tag the same release candidate commit with the final release tag:
git checkout 0.11.0-rc1
git tag 0.11.0
git push apache 0.11.0
Only approved releases of the tarball should be published to crates.io, in order to conform to Apache Software Foundation governance standards.
After an official project release has been made, a DataFusion committer can publish the crates to crates.io using the following instructions.
Follow these instructions to create an account and login to crates.io before asking to be added as an owner of the following crates:
Download and unpack the official release tarball
Verify that the `Cargo.toml` in the tarball contains the correct version (e.g. `version = "0.8.0"`) and then publish the crates with the following commands. Crates need to be published in the correct order as shown in this diagram.
To update this diagram, manually edit the dependencies in crate-deps.dot and then run:
dot -Tsvg dev/release/crate-deps.dot > dev/release/crate-deps.svg
(cd ballista/core && cargo publish)
(cd ballista/executor && cargo publish)
(cd ballista/scheduler && cargo publish)
(cd ballista/client && cargo publish)
(cd ballista-cli && cargo publish)
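The version check and the publish order above can be wrapped in two small helpers. This is a sketch under assumptions: `manifest_has_version` and `print_publish_commands` are hypothetical names, the version check only matches a manifest line that is exactly `version = "<expected>"` (it would not handle a version inherited from a workspace), and the publish commands are printed rather than run.

```shell
# Crate directories, in the publish order given above.
CRATE_DIRS="ballista/core ballista/executor ballista/scheduler ballista/client ballista-cli"

# Succeeds only if the manifest ($1) has a line that is exactly:
#   version = "<expected>"   where <expected> is $2
# -F treats the pattern as a fixed string, -x requires a whole-line match.
manifest_has_version() {
  grep -Fxq "version = \"$2\"" "$1"
}

# Prints the publish commands in order; nothing is executed.
print_publish_commands() {
  local dir
  for dir in $CRATE_DIRS; do
    echo "(cd $dir && cargo publish)"
  done
}

print_publish_commands
```

Before publishing for real, `cargo publish --dry-run` can be run in each directory to check packaging without uploading anything.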
Pushing a release tag causes Docker images to be published.
Images can be found at https://github.com/apache/datafusion-ballista/pkgs/container/datafusion-ballista-standalone
Call the vote on the DataFusion dev list by replying to the RC voting thread. The reply should have a new subject constructed by adding a `[RESULT]` prefix to the old subject line.
Sample announcement template:
The vote has passed with <NUMBER> +1 votes. Thank you to all who helped with the release verification.
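As a small illustration, the result subject and announcement can be generated from the original vote subject and the vote count. The helper names and the example vote subject below are hypothetical. Only the `[RESULT]` prefix and the announcement wording come from this guide.

```shell
# Prefix the original vote subject with [RESULT].
result_subject() {
  printf '[RESULT] %s\n' "$1"
}

# Fill the number of +1 votes into the sample announcement.
result_announcement() {
  printf 'The vote has passed with %s +1 votes. Thank you to all who helped with the release verification.\n' "$1"
}

# Hypothetical vote subject, for illustration only.
result_subject "[VOTE] Release Apache DataFusion Ballista 0.11.0 RC1"
result_announcement 5
```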
Add the release to https://reporter.apache.org/addrelease.html?datafusion with a version name prefixed with `BALLISTA-`, for example `BALLISTA-0.9.0`.
The release information is used to generate a template for a board report (see example here).
See the ASF documentation on when to archive for more information.
Release candidates should be deleted from the `dev` SVN location once the release is published.
Get a list of Ballista release candidates:
svn ls https://dist.apache.org/repos/dist/dev/datafusion | grep ballista
Delete a release candidate:
svn delete -m "delete old Ballista RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-ballista-0.8.0-rc1/
Only the latest release should be available in the `release` SVN location. Delete old releases after publishing the new release.
Get a list of Ballista releases:
svn ls https://dist.apache.org/repos/dist/release/datafusion | grep ballista
Delete a release:
svn delete -m "delete old Ballista release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-ballista-0.8.0
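When several old entries have accumulated, the cleanup commands can be generated from the `svn ls` listing. The sketch below is a hypothetical helper: it reads directory names on standard input and prints one `svn delete` command for each entry except the one to keep. It only prints commands, and the sample listing (including `datafusion-ballista-0.9.0`) is made-up example data.

```shell
# Reads directory names (as printed by `svn ls ... | grep ballista`) on stdin
# and prints an `svn delete` command for every entry except the one to keep.
# $1 = base URL of the SVN location, $2 = directory name to keep
print_delete_commands() {
  local base_url="$1"
  local keep="$2"
  local entry
  while IFS= read -r entry; do
    entry="${entry%/}"                  # svn ls prints directories with a trailing slash
    [ -n "$entry" ] || continue         # skip blank lines
    [ "$entry" = "$keep" ] && continue  # never delete the latest release
    echo "svn delete -m \"delete old Ballista release\" $base_url/$entry"
  done
}

# Example with a made-up listing; in practice the input would come from `svn ls`.
printf 'datafusion-ballista-0.8.0/\ndatafusion-ballista-0.9.0/\n' |
  print_delete_commands https://dist.apache.org/repos/dist/release/datafusion datafusion-ballista-0.9.0
```

Reviewing the printed commands before running them keeps the deletion a deliberate manual step.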
We typically crowdsource release announcements by collaborating on a Google document, usually starting with a copy of the previous release announcement.
Run the following commands to get the number of commits and number of unique contributors for inclusion in the blog post.
git log --pretty=oneline 0.10.0..0.11.0 ballista ballista-cli examples | wc -l
git shortlog -sn 0.10.0..0.11.0 ballista ballista-cli examples | wc -l
Once there is consensus on the contents of the post, create a PR to add a blog post to the datafusion-site repository. Note that there is no need for a formal PMC vote on the blog post contents since this isn't considered to be a “release”.
Once the PR is merged, a GitHub action will publish the new blog post to https://datafusion.apache.org/blog/.