<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Apache DataFusion Comet: Release Process
This documentation explains the release process for Apache DataFusion Comet. Some preparation tasks can be
performed by any contributor, while certain release tasks can only be performed by a DataFusion Project Management
Committee (PMC) member.
## Checklist
The following is a quick-reference checklist for the full release process. See the detailed sections below for
instructions on each step.
- [ ] Release preparation: review expression support status and user guide
- [ ] Create release branch
- [ ] Generate release documentation
- [ ] Update Maven version in release branch
- [ ] Update version in main for next development cycle
- [ ] Generate the change log and create PR against main
- [ ] Cherry-pick the change log commit into the release branch
- [ ] Build the jars
- [ ] Tag the release candidate
- [ ] Update documentation for the new release
- [ ] Publish Maven artifacts to staging
- [ ] Create the release candidate tarball
- [ ] Start the email voting thread
- [ ] Once the vote passes:
  - [ ] Publish source tarball
  - [ ] Create GitHub release
  - [ ] Promote Maven artifacts to production
  - [ ] Push the release tag
  - [ ] Close the vote and announce the release
- [ ] Post release:
  - [ ] Register the release with Apache Reporter
  - [ ] Delete old RCs and releases from SVN
  - [ ] Write a blog post
## Release Preparation
Before starting the release process, review the user guide to ensure it accurately reflects the current state of the
project:
- Review the supported expressions and operators lists in the user guide. Verify that any expressions added since
the last release are included and that their support status is accurate.
- Spot-check the support status of individual expressions by running tests or queries to confirm they work as
documented.
- Look for any expressions that may have regressed or changed behavior since the last release and update the
documentation accordingly.
It is also recommended to run benchmarks (such as TPC-H and TPC-DS) comparing performance against the previous
release to check for regressions. See the
[Comet Benchmarking Guide](benchmarking.md) for instructions.
These are tasks where agentic coding tools can be particularly helpful — for example, scanning the codebase for
newly registered expressions and cross-referencing them against the documented list, or generating test queries to
verify expression support status.
Any issues found should be addressed before creating the release branch.
## Creating the Release Candidate
This part of the process can be performed by any committer.
Here are the steps, using the 0.13.0 release as an example.
### Create Release Branch
This document assumes that GitHub remotes are set up as follows:
```shell
$ git remote -v
apache git@github.com:apache/datafusion-comet.git (fetch)
apache git@github.com:apache/datafusion-comet.git (push)
origin git@github.com:yourgithubid/datafusion-comet.git (fetch)
origin git@github.com:yourgithubid/datafusion-comet.git (push)
```
Create a release branch from the latest commit in main and push to the `apache` repo:
```shell
git fetch apache
git checkout main
git reset --hard apache/main
git checkout -b branch-0.13
git push apache branch-0.13
```
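The branch name in the example follows from the release version by dropping the patch component. As a minimal
sketch of that convention (the `release_branch_name` helper is hypothetical and is not part of the Comet repository):

```shell
# Hypothetical helper: derive the release branch name from a full version.
# "0.13.0" -> "branch-0.13" (the patch component is dropped).
release_branch_name() {
  version="$1"
  printf 'branch-%s\n' "${version%.*}"
}

# Possible usage:
#   git checkout -b "$(release_branch_name 0.13.0)"
```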
### Generate Release Documentation
Generate the documentation content for this release. The docs on `main` contain only template markers,
so we need to generate the actual content (config tables, compatibility matrices) for the release branch:
```shell
./dev/generate-release-docs.sh
git add docs/source/user-guide/latest/
git commit -m "Generate docs for 0.13.0 release"
git push apache branch-0.13
```
This freezes the documentation to reflect the configs and expressions available in this release.
### Update Maven Version
In the release branch, change the Maven version in the `pom.xml` files from `0.13.0-SNAPSHOT` to `0.13.0`.
There is no need to update the Rust crate versions because they will already be `0.13.0`.
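After editing the versions, it can be worth confirming that no `pom.xml` in the release branch still mentions a
`-SNAPSHOT` version. The following is a minimal sketch under that assumption; `find_snapshot_poms` is a hypothetical
helper, not a script shipped with Comet, and it flags any mention of `-SNAPSHOT`, including in dependency versions:

```shell
# Hypothetical helper: list pom.xml files under a directory (default: the
# current directory) that still contain the string "-SNAPSHOT".
# Prints nothing when the tree is clean.
find_snapshot_poms() {
  root="${1:-.}"
  # grep exits non-zero when nothing matches, which is not an error here.
  find "$root" -type f -name pom.xml -exec grep -l -e '-SNAPSHOT' {} + || true
}

# Possible usage, from the repository root:
#   find_snapshot_poms .
```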
### Update Version in main
Create a PR against the main branch to prepare for developing the next release:
- Update the Rust crate version to `0.14.0`.
- Update the Maven version to `0.14.0-SNAPSHOT` (both in the `pom.xml` files and also in the diff files
under `dev/diffs`).
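The next development version in this example is the current release with the minor component incremented and the
patch component reset to zero. A minimal sketch of that arithmetic, assuming plain `MAJOR.MINOR.PATCH` versions
without leading zeros (the `next_minor_version` helper is hypothetical):

```shell
# Hypothetical helper: compute the next minor version.
# "0.13.0" -> "0.14.0"
next_minor_version() {
  version="$1"
  major="${version%%.*}"   # text before the first dot, e.g. "0"
  rest="${version#*.}"     # text after the first dot, e.g. "13.0"
  minor="${rest%%.*}"      # e.g. "13"
  printf '%s.%s.0\n' "$major" "$((minor + 1))"
}

# Possible usage:
#   rust_version="$(next_minor_version 0.13.0)"            # Rust crate version
#   maven_version="$(next_minor_version 0.13.0)-SNAPSHOT"  # Maven version
```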
### Generate the Change Log
Generate a change log covering the changes between the previous release and the release branch HEAD by running
the provided `dev/release/generate-changelog.py` script.
It is recommended that you set up a virtual Python environment and then install the dependencies:
```shell
cd dev/release
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
```
To generate the changelog, set the `GITHUB_TOKEN` environment variable to a valid token and then run the script
providing two commit ids or tags followed by the version number of the release being created. The following
example generates a change log of all changes between the previous version and the current release branch HEAD revision.
```shell
export GITHUB_TOKEN=<your-token-here>
python3 generate-changelog.py 0.12.0 HEAD 0.13.0 > ../changelog/0.13.0.md
```
Create a PR against the _main_ branch to add this change log. Once it is approved and merged, cherry-pick the
commit into the release branch.
### Build the jars
#### A note on workspace cleanliness
The `common/pom.xml` resource configuration unconditionally bundles
`native/target/{x86_64,aarch64}-apple-darwin/release/libcomet.dylib` into the
`common` jar when those files exist on disk. Maven's `clean` removes
`common/target` but does not touch Cargo's `native/target` directory, so a
stale dylib left over from a prior local `make release` or `make release-linux`
on the release manager's workstation can silently end up in a release jar
(see [#2232](https://github.com/apache/datafusion-comet/issues/2232) for the
incident in 0.9.1).
The `build-release-comet.sh` script now runs `cargo clean` for you, but as a
defensive measure, prefer running the release build from a fresh clone of the
repository rather than your day-to-day working tree.
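If a fresh clone is not practical, a quick scan of Cargo's output directory can reveal leftover libraries before
the build starts. This is a minimal sketch that only looks for the `libcomet.dylib` file name mentioned above;
`find_stale_native_libs` is a hypothetical helper and not part of the release scripts:

```shell
# Hypothetical helper: list libcomet.dylib files left under native/target
# of the given checkout (default: the current directory).
# Prints nothing when native/target is absent or contains no such file.
find_stale_native_libs() {
  root="${1:-.}"
  [ -d "$root/native/target" ] || return 0
  find "$root/native/target" -type f -name 'libcomet.dylib'
}

# Possible usage, from the repository root:
#   find_stale_native_libs .
```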
#### Setup to do the build
The build process requires Docker. Download the latest Docker Desktop from https://www.docker.com/products/docker-desktop/.
If you have multiple Docker contexts, switch to the Docker Desktop context. For example:
```shell
$ docker context ls
NAME                  DESCRIPTION                               DOCKER ENDPOINT                               ERROR
default               Current DOCKER_HOST based configuration   unix:///var/run/docker.sock
desktop-linux         Docker Desktop                            unix:///Users/parth/.docker/run/docker.sock
my_custom_context *                                             tcp://192.168.64.2:2376
$ docker context use desktop-linux
```
#### Run the build script
The `build-release-comet.sh` script will create a Docker image for each architecture and use the image
to build the platform-specific binaries. These builder images are created every time this script is run.
The script optionally allows overriding the repository and branch from which the binaries are built. Note that
the local git repo is not used to build the binaries, but it is used to build the final uber jar.
```shell
Usage: build-release-comet.sh [options]
This script builds comet native binaries inside a docker image. The image is named
"comet-rm" and will be generated by this script
Options are:
-r [repo] : git repo (default: https://github.com/apache/datafusion-comet.git)
-b [branch] : git branch (default: release)
-t [tag] : tag for the spark-rm docker image to use for building (default: "latest").
```
Example:
```shell
cd dev/release && ./build-release-comet.sh && cd ../..
```
#### Build output
The build output is installed to a temporary local Maven repository. The build script prints the location of this
repository at the end. You will need this location when deploying the artifacts to a staging repository.
### Tag the Release Candidate
Ensure that the Maven version update and changelog cherry-pick have been pushed to the release branch before tagging.
Tag the release branch with `0.13.0-rc1` and push to the `apache` repo:
```shell
git fetch apache
git checkout branch-0.13
git reset --hard apache/branch-0.13
git tag 0.13.0-rc1
git push apache 0.13.0-rc1
```
Note that pushing a release candidate tag will trigger a GitHub workflow that will build a Docker image and publish
it to the [GitHub Container Registry](https://github.com/apache/datafusion-comet/pkgs/container/datafusion-comet).
### Publishing Documentation
In the `docs` directory:
- Update `docs/source/index.rst` and add a new navigation menu link for the new release in the section `_toc.user-guide-links-versioned`.
- Add a new line to `build.sh` to delete the locally cloned `comet-*` branch for the new release, e.g. `comet-0.13`.
- Update the main method in `generate-versions.py`:
```python
latest_released_version = "0.13.0"
previous_versions = ["0.11.0", "0.12.0"]
```
Test the documentation build locally, following the instructions in `docs/README.md`.
Once verified, create a PR against the main branch with these documentation changes. After merging, the docs will be
deployed to https://datafusion.apache.org/comet/ by the documentation publishing workflow.
Note that the download links in the installation guide will not work until the release is finalized, but having the
documentation available could be useful for anyone testing out the release candidate during the voting period.
## Publishing the Release Candidate
Most of this part of the process can only be performed by a PMC member.
### Publish the maven artifacts
#### Setup maven
##### One time project setup
Set up your project in the ASF Nexus Repository as described at https://infra.apache.org/publishing-maven-artifacts.html
##### Release Manager Setup
Set up your development environment as described at https://infra.apache.org/publishing-maven-artifacts.html
##### Build and publish a release candidate to Nexus
The script `publish-to-maven.sh` will publish the artifacts created by the `build-release-comet.sh` script.
The artifacts will be signed using the GPG key of the release manager and uploaded to the Maven staging repository.
Installed GPG keys can be listed with `gpg --list-keys`. The key is identified by its fingerprint, a 40-character
hex string.
Note: This script needs `xmllint` to be installed:
- On macOS, `xmllint` is available by default.
- On Ubuntu, run `apt-get install -y libxml2-utils`.
- On RedHat, run `yum install -y xmlstarlet`.
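Since the release steps rely on several command-line tools (`docker`, `xmllint`, `gpg`, and `svn` are all mentioned
in this guide), it may help to check that they are on the `PATH` before starting. A minimal sketch, using a
hypothetical `missing_tools` helper that is not part of the release scripts:

```shell
# Hypothetical helper: print the name of each given command that is not
# found on PATH. Prints nothing when every command is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Possible usage:
#   missing_tools docker xmllint gpg svn
```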
```shell
./dev/release/publish-to-maven.sh -h
usage: publish-to-maven.sh options
Publish signed artifacts to Maven.
Options
-u ASF_USERNAME - Username of ASF committer account
-r LOCAL_REPO - path to temporary local maven repo (created and written to by 'build-release-comet.sh')
The following will be prompted for -
ASF_PASSWORD - Password of ASF committer account
GPG_KEY - GPG key used to sign release artifacts
GPG_PASSPHRASE - Passphrase for GPG key
```
Example:
```shell
./dev/release/publish-to-maven.sh -u release_manager_asf_id -r /tmp/comet-staging-repo-VsYOX
ASF Password :
GPG Key (Optional):
GPG Passphrase :
Creating Nexus staging repository
...
```
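Because the key is expected as a 40-character hex string, a simple format check can catch copy-and-paste mistakes
(such as a fingerprint pasted with spaces) before the script prompts for it. This is a sketch only;
`is_gpg_fingerprint` is a hypothetical helper, and it checks the format of the string, not whether the key exists
in your keyring:

```shell
# Hypothetical helper: succeed only if the argument is exactly 40
# hexadecimal characters (no spaces or other separators).
is_gpg_fingerprint() {
  case "$1" in
    *[!0-9A-Fa-f]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 40 ]
}

# Possible usage:
#   is_gpg_fingerprint "$GPG_KEY" || echo "GPG key does not look like a fingerprint" >&2
```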
In the Nexus repository UI (https://repository.apache.org/) locate and verify the artifacts in
staging (https://central.sonatype.org/publish/release/#locate-and-examine-your-staging-repository).
The script closes the staging repository but does not release it. Releasing to Maven Central is a manual step
performed only after the vote passes (see [Publishing Maven Artifacts](#publishing-maven-artifacts) below).
Note that the Maven artifacts are always published under the final release version (e.g. `0.13.0`), not the RC
version — the `-rc1` / `-rc2` suffix only appears in the git tag and the source tarball in SVN. Because the script
creates a new staging repository on each run, re-staging the same version for a subsequent RC is supported as long
as no staging repository for that version has been released to Maven Central.
### Create the Release Candidate Tarball
The `create-tarball.sh` script creates a signed source tarball and uploads it to the dev subversion repository.
#### Prerequisites
Before running this script, ensure you have:
1. A GPG key set up for signing, with your public key uploaded to https://pgp.mit.edu/
2. Apache SVN credentials (you must be logged into the Apache SVN server)
3. The `requests` Python package installed (`pip3 install requests`)
#### Run the script
Run the create-tarball script on the release candidate tag (`0.13.0-rc1`):
```shell
./dev/release/create-tarball.sh 0.13.0 1
```
This will generate an email template for starting the vote.
### Start an Email Voting Thread
Send the email that is generated in the previous step to `dev@datafusion.apache.org`.
The verification procedure for voters is documented in
[Verifying Release Candidates](https://github.com/apache/datafusion-comet/blob/main/dev/release/verifying-release-candidates.md).
Voters can also use the `dev/release/verify-release-candidate.sh` script to assist with verification:
```shell
./dev/release/verify-release-candidate.sh 0.13.0 1
```
### If the Vote Fails
If the vote does not pass, address the issues raised, increment the release candidate number, and repeat from
the [Tag the Release Candidate](#tag-the-release-candidate) step. For example, the next attempt would be tagged
`0.13.0-rc2`.
Before staging the next RC, drop the previous RC's staging repository in the
[Nexus UI](https://repository.apache.org/#stagingRepositories) by selecting it and clicking "Drop". This avoids
leaving multiple closed staging repositories for the same version and prevents accidentally releasing the wrong
one when the vote eventually passes. The Maven version (e.g. `0.13.0`) is shared across all RCs, so each run of
`publish-to-maven.sh` creates a new staging repository for the same GAV — only one of them should ever be
released to Maven Central.
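The next release candidate tag can be derived from the tags that already exist. The following is a minimal sketch,
assuming tags follow the `<version>-rc<N>` pattern used in this guide; `next_rc_tag` is a hypothetical helper that
reads tag names on standard input:

```shell
# Hypothetical helper: read tag names on stdin (one per line) and print the
# next release candidate tag for the given version.
next_rc_tag() {
  version="$1"
  # Escape dots so "0.13.0" is matched literally in the sed pattern.
  pattern=$(printf '%s' "$version" | sed 's/\./\\./g')
  # Extract existing RC numbers for this version and keep the highest one.
  last=$(sed -n "s/^${pattern}-rc\([0-9][0-9]*\)\$/\1/p" | sort -n | tail -n 1)
  printf '%s-rc%s\n' "$version" "$(( ${last:-0} + 1 ))"
}

# Possible usage:
#   git tag -l | next_rc_tag 0.13.0
```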
## Publishing Binary Releases
Once the vote passes, we can publish the source and binary releases.
### Publishing Source Tarball
Run the release-tarball script to move the tarball to the release subversion repository.
```shell
./dev/release/release-tarball.sh 0.13.0 1
```
### Create a release in the GitHub repository
Go to https://github.com/apache/datafusion-comet/releases, create a release for the release tag, and paste the
changelog into the description.
### Publishing Maven Artifacts
Promote the Maven artifacts from staging to production by visiting https://repository.apache.org/#stagingRepositories,
selecting the staging repository, and clicking the "Release" button.
### Push a release tag to the repo
Push a release tag (`0.13.0`) to the `apache` repository.
```shell
git fetch apache
git checkout 0.13.0-rc1
git tag 0.13.0
git push apache 0.13.0
```
Note that pushing a release tag will trigger a GitHub workflow that will build a Docker image and publish
it to the [GitHub Container Registry](https://github.com/apache/datafusion-comet/pkgs/container/datafusion-comet).
Reply to the vote thread to close the vote and announce the release. The announcement email should include:
- The release version
- A link to the release notes / changelog
- A link to the download page or Maven coordinates
- Thanks to everyone who contributed and voted
## Post Release
### Register the release
Register the release with the [Apache Reporter Service](https://reporter.apache.org/addrelease.html?datafusion) using
a version such as `COMET-0.13.0`.
### Delete old RCs and Releases
See the ASF documentation on [when to archive](https://www.apache.org/legal/release-policy.html#when-to-archive)
for more information.
#### Deleting old release candidates from `dev` svn
Release candidates should be deleted once the release is published.
Get a list of DataFusion Comet release candidates:
```shell
svn ls https://dist.apache.org/repos/dist/dev/datafusion | grep comet
```
Delete a release candidate:
```shell
svn delete -m "delete old DataFusion Comet RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-comet-0.13.0-rc1/
```
#### Deleting old releases from `release` svn
Only the latest release should be available. Delete old releases after publishing the new release.
Get a list of DataFusion Comet releases:
```shell
svn ls https://dist.apache.org/repos/dist/release/datafusion | grep comet
```
Delete a release:
```shell
svn delete -m "delete old DataFusion Comet release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-0.12.0
```
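To identify which releases are candidates for deletion, the listing can be filtered so that everything except the
newest version is printed. This is a sketch under the assumption that release directories are named
`datafusion-comet-<MAJOR.MINOR.PATCH>`, as in the example above; `old_comet_releases` is a hypothetical helper that
reads the listing on standard input:

```shell
# Hypothetical helper: read directory names on stdin (one per line, with or
# without a trailing slash) and print every datafusion-comet-<version>
# entry except the one with the highest version.
old_comet_releases() {
  sed -n 's|^datafusion-comet-\([0-9][0-9.]*\)/\{0,1\}$|\1|p' \
    | sort -t . -k 1,1n -k 2,2n -k 3,3n \
    | sed '$d' \
    | sed 's|^|datafusion-comet-|'
}

# Possible usage:
#   svn ls https://dist.apache.org/repos/dist/release/datafusion | old_comet_releases
```

Review the output before deleting anything, since the sketch ignores entries that do not match the assumed naming.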
### Write a blog post
Writing a blog post about the release is a great way to generate more interest in the project. We typically create a
Google document where the community can collaborate on a blog post. Once the content is agreed upon, a PR can be
created against the [datafusion-site](https://github.com/apache/datafusion-site) repository to add the blog post. Any
contributor can drive this process.