This documentation explains the release process for Apache DataFusion Comet. Some preparation tasks can be performed by any contributor, while certain release tasks can only be performed by a DataFusion Project Management Committee (PMC) member.
The following is a quick-reference checklist for the full release process. See the detailed sections below for instructions on each step.
Before starting the release process, review the user guide to ensure it accurately reflects the current state of the project:
It is also recommended to run benchmarks (such as TPC-H and TPC-DS) comparing performance against the previous release to check for regressions. See the Comet Benchmarking Guide for instructions.
Agentic coding tools can be particularly helpful for these tasks, for example by scanning the codebase for newly registered expressions and cross-referencing them against the documented list, or by generating test queries to verify expression support status.
Any issues found should be addressed before creating the release branch.
This part of the process can be performed by any committer.
Here are the steps, using the 0.13.0 release as an example.
This document assumes that GitHub remotes are set up as follows:
    $ git remote -v
    apache  git@github.com:apache/datafusion-comet.git (fetch)
    apache  git@github.com:apache/datafusion-comet.git (push)
    origin  git@github.com:yourgithubid/datafusion-comet.git (fetch)
    origin  git@github.com:yourgithubid/datafusion-comet.git (push)
Create a release branch from the latest commit in main and push to the apache repo:
    git fetch apache
    git checkout main
    git reset --hard apache/main
    git checkout -b branch-0.13
    git push apache branch-0.13
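The branch name follows from the release version. As a minimal sketch, assuming release branches are always named `branch-MAJOR.MINOR` as in the 0.13.0 example (the helper name `release_branch` is hypothetical and not part of the Comet tooling):

```python
def release_branch(version: str) -> str:
    """Return the release branch name for a version, e.g. 0.13.0 -> branch-0.13.

    Assumes the version has the form MAJOR.MINOR.PATCH with numeric parts.
    """
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, _patch = parts
    return f"branch-{major}.{minor}"


print(release_branch("0.13.0"))  # branch-0.13
```

The patch component is dropped, so a later patch release in the same series would map to the same branch under this assumption.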
Generate the documentation content for this release. The docs on main contain only template markers, so we need to generate the actual content (config tables, compatibility matrices) for the release branch:
    ./dev/generate-release-docs.sh
    git add docs/source/user-guide/latest/
    git commit -m "Generate docs for 0.13.0 release"
    git push apache branch-0.13
This freezes the documentation to reflect the configs and expressions available in this release.
In the release branch, update the pom.xml files to change the Maven version from 0.13.0-SNAPSHOT to 0.13.0.
There is no need to update the Rust crate versions because they will already be 0.13.0.
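The version strings in this step, and the development version that main moves to next, follow a simple pattern. The sketch below only illustrates that arithmetic. It assumes the next development version always bumps the minor component, as in the 0.13.0 to 0.14.0 example; both function names are hypothetical and nothing here edits pom.xml files.

```python
SNAPSHOT_SUFFIX = "-SNAPSHOT"


def release_version(snapshot_version: str) -> str:
    """Strip the -SNAPSHOT suffix: 0.13.0-SNAPSHOT -> 0.13.0."""
    if not snapshot_version.endswith(SNAPSHOT_SUFFIX):
        raise ValueError(f"not a snapshot version: {snapshot_version!r}")
    return snapshot_version[: -len(SNAPSHOT_SUFFIX)]


def next_snapshot(released_version: str) -> str:
    """Return the next development version: 0.13.0 -> 0.14.0-SNAPSHOT.

    Assumption: the minor component is bumped and the patch reset to 0.
    """
    major, minor, _patch = (int(p) for p in released_version.split("."))
    return f"{major}.{minor + 1}.0{SNAPSHOT_SUFFIX}"


print(release_version("0.13.0-SNAPSHOT"))  # 0.13.0
print(next_snapshot("0.13.0"))             # 0.14.0-SNAPSHOT
```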
Create a PR against the main branch to prepare for developing the next release:
Update the Rust crate version to 0.14.0 and the Maven version to 0.14.0-SNAPSHOT (both in the pom.xml files and also in the diff files under dev/diffs).

Generate a change log to cover changes between the previous release and the release branch HEAD by running the provided dev/release/generate-changelog.py.
It is recommended that you set up a virtual Python environment and then install the dependencies:
    cd dev/release
    python3 -m venv venv
    source venv/bin/activate
    pip3 install -r requirements.txt
To generate the changelog, set the GITHUB_TOKEN environment variable to a valid token and then run the script providing two commit ids or tags followed by the version number of the release being created. The following example generates a change log of all changes between the previous version and the current release branch HEAD revision.
    export GITHUB_TOKEN=<your-token-here>
    python3 generate-changelog.py 0.12.0 HEAD 0.13.0 > ../changelog/0.13.0.md
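To make the argument order explicit, the following sketch assembles the same command line from its three inputs and checks that a token is present. It is an illustration only: `changelog_command` and `token_is_set` are hypothetical helpers, and the output path simply mirrors the `../changelog/<version>.md` pattern from the example above.

```python
import os


def changelog_command(previous: str, current: str, version: str):
    """Return (argv, output_path) for a changelog run.

    previous and current are commit ids or tags; version is the release
    being created and also names the output file.
    """
    argv = ["python3", "generate-changelog.py", previous, current, version]
    output_path = f"../changelog/{version}.md"
    return argv, output_path


def token_is_set(env=None) -> bool:
    """Report whether GITHUB_TOKEN is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("GITHUB_TOKEN"))


argv, out = changelog_command("0.12.0", "HEAD", "0.13.0")
print(" ".join(argv), ">", out)
```

Checking the token up front avoids starting a run that cannot authenticate to GitHub.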
Create a PR against the main branch to add this change log. Once it is approved and merged, cherry-pick the commit into the release branch.
The build process requires Docker. Download the latest Docker Desktop from https://www.docker.com/products/docker-desktop/. If you have multiple Docker contexts, switch to the Docker Desktop context. For example:
    $ docker context ls
    NAME                  DESCRIPTION                               DOCKER ENDPOINT                               ERROR
    default               Current DOCKER_HOST based configuration   unix:///var/run/docker.sock
    desktop-linux         Docker Desktop                            unix:///Users/parth/.docker/run/docker.sock
    my_custom_context *                                             tcp://192.168.64.2:2376

    $ docker context use desktop-linux
The build-release-comet.sh script creates a Docker image for each architecture and uses that image to build the platform-specific binaries. These builder images are recreated every time the script is run. The script optionally allows overriding the repository and branch from which the binaries are built. Note that the local git repo is not used to build the binaries, but it is used to build the final uber jar.
    Usage: build-release-comet.sh [options]

    This script builds comet native binaries inside a docker image. The image is named
    "comet-rm" and will be generated by this script

    Options are:

    -r [repo] : git repo (default: https://github.com/apache/datafusion-comet.git)
    -b [branch] : git branch (default: release)
    -t [tag] : tag for the spark-rm docker image to use for building (default: "latest").
Example:
cd dev/release && ./build-release-comet.sh && cd ../..
The build output is installed to a temporary local Maven repository. The build script prints the location of this repository at the end. This location is required later when deploying the artifacts to a staging repository.
Ensure that the Maven version update and changelog cherry-pick have been pushed to the release branch before tagging.
Tag the release branch with 0.13.0-rc1 and push to the apache repo:

    git fetch apache
    git checkout branch-0.13
    git reset --hard apache/branch-0.13
    git tag 0.13.0-rc1
    git push apache 0.13.0-rc1
Note that pushing a release candidate tag will trigger a GitHub workflow that will build a Docker image and publish it to GitHub Container Registry at https://github.com/apache/datafusion-comet/pkgs/container/datafusion-comet
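Release candidate tags combine the release version with a candidate number. A minimal sketch of that naming rule, assuming the `<version>-rc<N>` pattern seen in 0.13.0-rc1 with candidates numbered from 1 (`rc_tag` is a hypothetical helper, not part of the release scripts):

```python
def rc_tag(version: str, rc: int) -> str:
    """Return the git tag for a release candidate, e.g. ("0.13.0", 1) -> 0.13.0-rc1."""
    if rc < 1:
        raise ValueError("release candidate numbers are assumed to start at 1")
    return f"{version}-rc{rc}"


print(rc_tag("0.13.0", 1))  # 0.13.0-rc1
```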
In the docs directory:

Update docs/source/index.rst and add a new navigation menu link for the new release in the section _toc.user-guide-links-versioned.

Update build.sh to delete the locally cloned comet-* branch for the new release, e.g. comet-0.13.

Update generate-versions.py:

    latest_released_version = "0.13.0"
    previous_versions = ["0.11.0", "0.12.0"]
Test the documentation build locally, following the instructions in docs/README.md.
Once verified, create a PR against the main branch with these documentation changes. After merging, the docs will be deployed to https://datafusion.apache.org/comet/ by the documentation publishing workflow.
Note that the download links in the installation guide will not work until the release is finalized, but having the documentation available could be useful for anyone testing out the release candidate during the voting period.
Most of this part of the process can only be performed by a PMC member.
Set up your project in the ASF Nexus Repository by following the instructions at https://infra.apache.org/publishing-maven-artifacts.html
Set up your development environment by following the instructions at https://infra.apache.org/publishing-maven-artifacts.html
The script publish-to-maven.sh will publish the artifacts created by the build-release-comet.sh script. The artifacts will be signed using the gpg key of the release manager and uploaded to the maven staging repository.
Note that installed GPG keys can be listed with gpg --list-keys. The gpg key is a 40 character hex string.
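Because the key is expected to be a 40 character hex string, a quick sanity check before running the script can catch copy-and-paste mistakes. The sketch below is a hypothetical helper, not something publish-to-maven.sh does. Stripping spaces is an added convenience, on the assumption that the value may have been copied from a fingerprint display that groups the digits.

```python
import re

# Exactly 40 hexadecimal characters, upper or lower case.
_HEX40 = re.compile(r"[0-9A-Fa-f]{40}")


def looks_like_gpg_key(value: str) -> bool:
    """Return True if value is a 40 character hex string once spaces are removed."""
    return _HEX40.fullmatch(value.replace(" ", "")) is not None


print(looks_like_gpg_key("0123456789ABCDEF0123456789ABCDEF01234567"))  # True
print(looks_like_gpg_key("not-a-key"))                                 # False
```

This only checks the shape of the string. It does not confirm that the key exists in the local keyring.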
Note: This script requires xmllint to be installed. On macOS, xmllint is available by default.
On Ubuntu: apt-get install -y libxml2-utils
On RedHat: yum install -y xmlstarlet
    ./dev/release/publish-to-maven.sh -h
    usage: publish-to-maven.sh options

    Publish signed artifacts to Maven.

    Options
    -u ASF_USERNAME - Username of ASF committer account
    -r LOCAL_REPO - path to temporary local maven repo (created and written to by 'build-release-comet.sh')

    The following will be prompted for -
    ASF_PASSWORD - Password of ASF committer account
    GPG_KEY - GPG key used to sign release artifacts
    GPG_PASSPHRASE - Passphrase for GPG key

Example:

    ./dev/release/publish-to-maven.sh -u release_manager_asf_id -r /tmp/comet-staging-repo-VsYOX
    ASF Password :
    GPG Key (Optional):
    GPG Passphrase :
    Creating Nexus staging repository ...
In the Nexus repository UI (https://repository.apache.org/) locate and verify the artifacts in staging (https://central.sonatype.org/publish/release/#locate-and-examine-your-staging-repository).
The script closes the staging repository but does not release it. Releasing to Maven Central is a manual step performed only after the vote passes (see Publishing Maven Artifacts below).
Note that the Maven artifacts are always published under the final release version (e.g. 0.13.0), not the RC version. The -rc1 / -rc2 suffix only appears in the git tag and in the source tarball in SVN. Because the script creates a new staging repository on each run, re-staging the same version for a subsequent RC is supported as long as no staging repository for that version has been released to Maven Central.
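The relationship between an RC tag and the Maven version can be sketched as a small parser. This is an illustration under the assumption that RC tags always have the form `MAJOR.MINOR.PATCH-rcN`; `split_rc_tag` is a hypothetical name and is not used by the release scripts.

```python
import re

# An RC tag is the final version followed by -rc and a candidate number.
_RC_TAG = re.compile(r"(\d+\.\d+\.\d+)-rc(\d+)")


def split_rc_tag(tag: str):
    """Split an RC tag into (maven_version, rc_number).

    The Maven version is the part before the -rc suffix, so every RC of a
    release maps to the same Maven version.
    """
    match = _RC_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a release candidate tag: {tag!r}")
    return match.group(1), int(match.group(2))


print(split_rc_tag("0.13.0-rc1"))  # ('0.13.0', 1)
print(split_rc_tag("0.13.0-rc2"))  # ('0.13.0', 2)
```

Both tags yield the Maven version 0.13.0, which is why at most one staging repository per version should ever be released.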
The create-tarball.sh script creates a signed source tarball and uploads it to the dev subversion repository.
Before running this script, ensure you have:
The requests Python package installed (pip3 install requests)

Run the create-tarball script on the release candidate tag (0.13.0-rc1):
./dev/release/create-tarball.sh 0.13.0 1
This will generate an email template for starting the vote.
Send the email that is generated in the previous step to dev@datafusion.apache.org.
The verification procedure for voters is documented in Verifying Release Candidates. Voters can also use the dev/release/verify-release-candidate.sh script to assist with verification:
./dev/release/verify-release-candidate.sh 0.13.0 1
If the vote does not pass, address the issues raised, increment the release candidate number, and repeat from the Tag the Release Candidate step. For example, the next attempt would be tagged 0.13.0-rc2.
Before staging the next RC, drop the previous RC's staging repository in the Nexus UI by selecting it and clicking “Drop”. This avoids leaving multiple closed staging repositories for the same version and prevents accidentally releasing the wrong one when the vote eventually passes. The Maven version (e.g. 0.13.0) is shared across all RCs, so each run of publish-to-maven.sh creates a new staging repository for the same GAV (groupId, artifactId, version). Only one of them should ever be released to Maven Central.
Once the vote passes, we can publish the source and binary releases.
Run the release-tarball script to move the tarball to the release subversion repository.
./dev/release/release-tarball.sh 0.13.0 1
Go to https://github.com/apache/datafusion-comet/releases and create a release for the release tag, and paste the changelog in the description.
Promote the Maven artifacts from staging to production by visiting https://repository.apache.org/#stagingRepositories, selecting the staging repository, and then clicking the “release” button.
Push a release tag (0.13.0) to the apache repository:

    git fetch apache
    git checkout 0.13.0-rc1
    git tag 0.13.0
    git push apache 0.13.0
Note that pushing a release tag will trigger a GitHub workflow that will build a Docker image and publish it to GitHub Container Registry at https://github.com/apache/datafusion-comet/pkgs/container/datafusion-comet
Reply to the vote thread to close the vote and announce the release. The announcement email should include:
Register the release with the Apache Reporter Service using a version such as COMET-0.13.0.
See the ASF documentation on when to archive for more information.
Release candidates in the dev svn repository should be deleted once the release is published.
Get a list of DataFusion Comet release candidates:
svn ls https://dist.apache.org/repos/dist/dev/datafusion | grep comet
Delete a release candidate:
svn delete -m "delete old DataFusion Comet RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-comet-0.13.0-rc1/
Only the latest release should be available in the release svn repository. Delete old releases after publishing the new release.
Get a list of DataFusion Comet releases:
svn ls https://dist.apache.org/repos/dist/release/datafusion | grep comet
Delete a release:
svn delete -m "delete old DataFusion Comet release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-0.12.0
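The two svn locations used above differ in both the repository area and the directory name. The following sketch builds both URLs from a version so the difference is easy to see. It assumes the directory naming shown in the two delete commands above holds for every release; the function names are hypothetical.

```python
DIST_ROOT = "https://dist.apache.org/repos/dist"


def rc_svn_url(version: str, rc: int) -> str:
    """URL of a release candidate directory in the dev area."""
    return f"{DIST_ROOT}/dev/datafusion/apache-datafusion-comet-{version}-rc{rc}/"


def release_svn_url(version: str) -> str:
    """URL of a published release directory in the release area."""
    return f"{DIST_ROOT}/release/datafusion/datafusion-comet-{version}"


print(rc_svn_url("0.13.0", 1))
print(release_svn_url("0.12.0"))
```

Printing the URL and checking it against the output of `svn ls` before running `svn delete` is a cheap safeguard, since a delete in the release area removes a published artifact.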
Writing a blog post about the release is a great way to generate more interest in the project. We typically create a Google document where the community can collaborate on a blog post. Once the content is agreed then a PR can be created against the datafusion-site repository to add the blog post. Any contributor can drive this process.