| <!--- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| # Apache DataFusion Comet: Release Process |
| |
| This documentation explains the release process for Apache DataFusion Comet. Some preparation tasks can be |
| performed by any contributor, while certain release tasks can only be performed by a DataFusion Project Management |
| Committee (PMC) member. |
| |
| ## Checklist |
| |
| The following is a quick-reference checklist for the full release process. See the detailed sections below for |
| instructions on each step. |
| |
| - [ ] Release preparation: review expression support status and user guide |
| - [ ] Create release branch |
| - [ ] Generate release documentation |
| - [ ] Update Maven version in release branch |
| - [ ] Update version in main for next development cycle |
| - [ ] Generate the change log and create PR against main |
| - [ ] Cherry-pick the change log commit into the release branch |
| - [ ] Build the jars |
| - [ ] Tag the release candidate |
| - [ ] Update documentation for the new release |
| - [ ] Publish Maven artifacts to staging |
| - [ ] Create the release candidate tarball |
| - [ ] Start the email voting thread |
| - [ ] Once the vote passes: |
| - [ ] Publish source tarball |
| - [ ] Create GitHub release |
| - [ ] Promote Maven artifacts to production |
| - [ ] Push the release tag |
| - [ ] Close the vote and announce the release |
| - [ ] Post release: |
| - [ ] Register the release with Apache Reporter |
| - [ ] Delete old RCs and releases from SVN |
| - [ ] Write a blog post |
| |
| ## Release Preparation |
| |
| Before starting the release process, review the user guide to ensure it accurately reflects the current state of the |
| project: |
| |
| - Review the supported expressions and operators lists in the user guide. Verify that any expressions added since |
| the last release are included and that their support status is accurate. |
| - Spot-check the support status of individual expressions by running tests or queries to confirm they work as |
| documented. |
| - Look for any expressions that may have regressed or changed behavior since the last release and update the |
| documentation accordingly. |
| |
| It is also recommended to run benchmarks (such as TPC-H and TPC-DS) comparing performance against the previous |
| release to check for regressions. See the |
| [Comet Benchmarking Guide](benchmarking.md) for instructions. |
| |
| These are tasks where agentic coding tools can be particularly helpful — for example, scanning the codebase for |
| newly registered expressions and cross-referencing them against the documented list, or generating test queries to |
| verify expression support status. |
| |
| Any issues found should be addressed before creating the release branch. |
| |
| ## Creating the Release Candidate |
| |
| This part of the process can be performed by any committer. |
| |
| Here are the steps, using the 0.13.0 release as an example. |
| |
| ### Create Release Branch |
| |
| This document assumes that GitHub remotes are set up as follows: |
| |
| ```shell |
| $ git remote -v |
| apache git@github.com:apache/datafusion-comet.git (fetch) |
| apache git@github.com:apache/datafusion-comet.git (push) |
| origin git@github.com:yourgithubid/datafusion-comet.git (fetch) |
| origin git@github.com:yourgithubid/datafusion-comet.git (push) |
| ``` |
| |
| Create a release branch from the latest commit in main and push to the `apache` repo: |
| |
| ```shell |
| git fetch apache |
| git checkout main |
| git reset --hard apache/main |
| git checkout -b branch-0.13 |
| git push apache branch-0.13 |
| ``` |
| |
| ### Generate Release Documentation |
| |
| Generate the documentation content for this release. The docs on `main` contain only template markers, |
| so we need to generate the actual content (config tables, compatibility matrices) for the release branch: |
| |
| ```shell |
| ./dev/generate-release-docs.sh |
| git add docs/source/user-guide/latest/ |
| git commit -m "Generate docs for 0.13.0 release" |
| git push apache branch-0.13 |
| ``` |
| |
| This freezes the documentation to reflect the configs and expressions available in this release. |
| |
| ### Update Maven Version |
| |
| Update the `pom.xml` files in the release branch to change the Maven version from `0.13.0-SNAPSHOT` to `0.13.0`. |
| |
| There is no need to update the Rust crate versions because they will already be `0.13.0`. |
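
The version change can be scripted. The following is a minimal sketch, assuming the `versions-maven-plugin` (`mvn versions:set`) is acceptable for this build; this guide does not prescribe a tool, and `strip_snapshot` and `set_release_version` are hypothetical helper names that do not exist in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Comet repository.

# Derive the release version by stripping a trailing -SNAPSHOT suffix.
strip_snapshot() {
  printf '%s\n' "${1%-SNAPSHOT}"
}

# Assumption: the versions:set goal of versions-maven-plugin handles this
# multi-module build. Run from the repository root on the release branch.
set_release_version() {
  local release
  release="$(strip_snapshot "$1")"
  mvn versions:set -DnewVersion="${release}" -DgenerateBackupPoms=false
}

# Show the version that would be written to the pom.xml files.
strip_snapshot "0.13.0-SNAPSHOT"
```

Calling `set_release_version 0.13.0-SNAPSHOT` would then rewrite the `pom.xml` files. Review the result with `git diff` before committing, since the plugin may not touch every place where the version appears.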
| |
| ### Update Version in main |
| |
| Create a PR against the main branch to prepare for developing the next release: |
| |
| - Update the Rust crate version to `0.14.0`. |
| - Update the Maven version to `0.14.0-SNAPSHOT` (both in the `pom.xml` files and also in the diff files |
| under `dev/diffs`). |
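
As a small illustration of how the two version strings relate, the sketch below derives the next development versions from the release version. It assumes a plain `MAJOR.MINOR.PATCH` scheme where each release bumps the minor component, as in the 0.13.0 to 0.14.0 example above; `next_minor_version` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bump the minor component of MAJOR.MINOR.PATCH
# and reset the patch component to 0.
next_minor_version() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  # 10# forces base 10 so a component such as "08" is not read as octal.
  printf '%s.%s.0\n' "${major}" "$((10#${minor} + 1))"
}

release="0.13.0"
next="$(next_minor_version "${release}")"
echo "Rust crate version: ${next}"
echo "Maven version:      ${next}-SNAPSHOT"
```

After the edits, searching the `pom.xml` files and `dev/diffs` for the old `0.13.0-SNAPSHOT` string is a quick way to spot anything that was missed.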
| |
| ### Generate the Change Log |
| |
| Generate a change log to cover changes between the previous release and the release branch HEAD by running |
| the provided `dev/release/generate-changelog.py`. |
| |
| It is recommended that you set up a virtual Python environment and then install the dependencies: |
| |
| ```shell |
| cd dev/release |
| python3 -m venv venv |
| source venv/bin/activate |
| pip3 install -r requirements.txt |
| ``` |
| |
| To generate the change log, set the `GITHUB_TOKEN` environment variable to a valid token, then run the script with |
| two commit IDs or tags followed by the version number of the release being created. The following example |
| generates a change log of all changes between the previous release (`0.12.0`) and the release branch `HEAD` revision. |
| |
| ```shell |
| export GITHUB_TOKEN=<your-token-here> |
| python3 generate-changelog.py 0.12.0 HEAD 0.13.0 > ../changelog/0.13.0.md |
| ``` |
| |
| Create a PR against the _main_ branch to add this change log. Once it is approved and merged, cherry-pick the |
| commit into the release branch. |
| |
| ### Build the jars |
| |
| #### A note on workspace cleanliness |
| |
| The `common/pom.xml` resource configuration unconditionally bundles |
| `native/target/{x86_64,aarch64}-apple-darwin/release/libcomet.dylib` into the |
| `common` jar when those files exist on disk. Maven's `clean` removes |
| `common/target` but does not touch Cargo's `native/target` directory, so a |
| stale dylib left over from a prior local `make release` or `make release-linux` |
| on the release manager's workstation can silently end up in a release jar |
| (see [#2232](https://github.com/apache/datafusion-comet/issues/2232) for the |
| incident in 0.9.1). |
| |
| The `build-release-comet.sh` script now runs `cargo clean` for you, but as a |
| defensive measure, prefer running the release build from a fresh clone of the |
| repository rather than your day-to-day working tree. |
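
If you do build from an existing working tree, a quick check for leftover macOS libraries can help. The sketch below is a hypothetical pre-build check, not part of the repository; it only assumes the `native/target/.../libcomet.dylib` layout described above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check, not part of the Comet repository.
# Lists every libcomet.dylib under <repo>/native/target. No output means
# no stale macOS library was found.
find_stale_dylibs() {
  local repo_root="${1:-.}"
  if [ -d "${repo_root}/native/target" ]; then
    find "${repo_root}/native/target" -type f -name 'libcomet.dylib'
  fi
}

stale="$(find_stale_dylibs .)"
if [ -n "${stale}" ]; then
  echo "Stale native libraries found:"
  echo "${stale}"
else
  echo "No stale libcomet.dylib found under native/target"
fi
```

Run it from the repository root. Any path it prints is a file that could be bundled into the `common` jar.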
| |
| #### Setup to do the build |
| |
| The build process requires Docker. Download the latest Docker Desktop from https://www.docker.com/products/docker-desktop/. |
| If you have multiple Docker contexts, switch to the Docker Desktop context. For example: |
| |
| ```shell |
| $ docker context ls |
| NAME DESCRIPTION DOCKER ENDPOINT ERROR |
| default Current DOCKER_HOST based configuration unix:///var/run/docker.sock |
| desktop-linux Docker Desktop unix:///Users/parth/.docker/run/docker.sock |
| my_custom_context * tcp://192.168.64.2:2376 |
| |
| $ docker context use desktop-linux |
| ``` |
| |
| #### Run the build script |
| |
| The `build-release-comet.sh` script creates a Docker image for each architecture and uses it to build the |
| platform-specific binaries. These builder images are recreated every time the script runs. The script optionally |
| allows overriding the repository and branch to build the binaries from. Note that the local git repo is not used |
| to build the binaries, but it is used to build the final uber jar. |
| |
| ```shell |
| Usage: build-release-comet.sh [options] |
| |
| This script builds comet native binaries inside a docker image. The image is named |
| "comet-rm" and will be generated by this script |
| |
| Options are: |
| |
| -r [repo] : git repo (default: https://github.com/apache/datafusion-comet.git) |
| -b [branch] : git branch (default: release) |
| -t [tag] : tag for the spark-rm docker image to use for building (default: "latest"). |
| ``` |
| |
| Example: |
| |
| ```shell |
| cd dev/release && ./build-release-comet.sh && cd ../.. |
| ``` |
| |
| #### Build output |
| |
| The build output is installed to a temporary local Maven repository. The build script prints the location of this |
| repository at the end. You will need this location when deploying the artifacts to a staging repository. |
| |
| ### Tag the Release Candidate |
| |
| Ensure that the Maven version update and changelog cherry-pick have been pushed to the release branch before tagging. |
| |
| Tag the release branch with `0.13.0-rc1` and push to the `apache` repo: |
| |
| ```shell |
| git fetch apache |
| git checkout branch-0.13 |
| git reset --hard apache/branch-0.13 |
| git tag 0.13.0-rc1 |
| git push apache 0.13.0-rc1 |
| ``` |
| |
| Note that pushing a release candidate tag triggers a GitHub workflow that builds a Docker image and publishes |
| it to GitHub Container Registry at https://github.com/apache/datafusion-comet/pkgs/container/datafusion-comet. |
| |
| ### Publishing Documentation |
| |
| In the `docs` directory: |
| |
| - Update `docs/source/index.rst` and add a new navigation menu link for the new release in the section `_toc.user-guide-links-versioned` |
| - Add a new line to `build.sh` to delete the locally cloned `comet-*` branch for the new release e.g. `comet-0.13` |
| - Update the main method in `generate-versions.py`: |
| |
| ```python |
| latest_released_version = "0.13.0" |
| previous_versions = ["0.11.0", "0.12.0"] |
| ``` |
| |
| Test the documentation build locally, following the instructions in `docs/README.md`. |
| |
| Once verified, create a PR against the main branch with these documentation changes. After merging, the docs will be |
| deployed to https://datafusion.apache.org/comet/ by the documentation publishing workflow. |
| |
| Note that the download links in the installation guide will not work until the release is finalized, but having the |
| documentation available could be useful for anyone testing out the release candidate during the voting period. |
| |
| ## Publishing the Release Candidate |
| |
| Most of this part of the process can only be performed by a PMC member. |
| |
| ### Publish the maven artifacts |
| |
| #### Setup maven |
| |
| ##### One time project setup |
| |
| Set up the project in the ASF Nexus Repository by following https://infra.apache.org/publishing-maven-artifacts.html |
| |
| ##### Release Manager Setup |
| |
| Set up your development environment by following https://infra.apache.org/publishing-maven-artifacts.html |
| |
| ##### Build and publish a release candidate to Nexus |
| |
| The script `publish-to-maven.sh` will publish the artifacts created by the `build-release-comet.sh` script. |
| The artifacts will be signed using the gpg key of the release manager and uploaded to the maven staging repository. |
| |
| Installed GPG keys can be listed with `gpg --list-keys`. The key is identified by its fingerprint, a 40-character hex string. |
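
Before entering the key at the script's prompt, it can be worth checking that the value has the expected shape. The following sketch is a hypothetical helper, not part of `publish-to-maven.sh`, and the key shown is a placeholder rather than a real fingerprint.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the argument is exactly
# 40 hexadecimal characters.
is_gpg_fingerprint() {
  [[ "$1" =~ ^[0-9A-Fa-f]{40}$ ]]
}

key="0123456789ABCDEF0123456789ABCDEF01234567"  # placeholder, not a real key
if is_gpg_fingerprint "${key}"; then
  echo "looks like a full fingerprint"
else
  echo "not a 40-character hex string"
fi
```

If `gpg` prints the fingerprint in space-separated groups, remove the spaces before running the check.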
| |
| Note: This script needs `xmllint` to be installed: |
| |
| - On macOS, `xmllint` is available by default. |
| - On Ubuntu, run `apt-get install -y libxml2-utils`. |
| - On RedHat, run `yum install -y libxml2`. |
| |
| ```shell |
| ./dev/release/publish-to-maven.sh -h |
| usage: publish-to-maven.sh options |
| |
| Publish signed artifacts to Maven. |
| |
| Options |
| -u ASF_USERNAME - Username of ASF committer account |
| -r LOCAL_REPO - path to temporary local maven repo (created and written to by 'build-release-comet.sh') |
| |
| The following will be prompted for - |
| ASF_PASSWORD - Password of ASF committer account |
| GPG_KEY - GPG key used to sign release artifacts |
| GPG_PASSPHRASE - Passphrase for GPG key |
| ``` |
| |
| Example: |
| |
| ```shell |
| ./dev/release/publish-to-maven.sh -u release_manager_asf_id -r /tmp/comet-staging-repo-VsYOX |
| ASF Password : |
| GPG Key (Optional): |
| GPG Passphrase : |
| Creating Nexus staging repository |
| ... |
| ``` |
| |
| In the Nexus repository UI (https://repository.apache.org/) locate and verify the artifacts in |
| staging (https://central.sonatype.org/publish/release/#locate-and-examine-your-staging-repository). |
| |
| The script closes the staging repository but does not release it. Releasing to Maven Central is a manual step |
| performed only after the vote passes (see [Publishing Maven Artifacts](#publishing-maven-artifacts) below). |
| |
| Note that the Maven artifacts are always published under the final release version (e.g. `0.13.0`), not the RC |
| version — the `-rc1` / `-rc2` suffix only appears in the git tag and the source tarball in SVN. Because the script |
| creates a new staging repository on each run, re-staging the same version for a subsequent RC is supported as long |
| as no staging repository for that version has been released to Maven Central. |
| |
| ### Create the Release Candidate Tarball |
| |
| The `create-tarball.sh` script creates a signed source tarball and uploads it to the dev subversion repository. |
| |
| #### Prerequisites |
| |
| Before running this script, ensure you have: |
| |
| 1. A GPG key set up for signing, with your public key uploaded to https://pgp.mit.edu/ |
| 2. Apache SVN credentials (you must be logged into the Apache SVN server) |
| 3. The `requests` Python package installed (`pip3 install requests`) |
| |
| #### Run the script |
| |
| Run the create-tarball script on the release candidate tag (`0.13.0-rc1`): |
| |
| ```shell |
| ./dev/release/create-tarball.sh 0.13.0 1 |
| ``` |
| |
| This will generate an email template for starting the vote. |
| |
| ### Start an Email Voting Thread |
| |
| Send the email that is generated in the previous step to `dev@datafusion.apache.org`. |
| |
| The verification procedure for voters is documented in |
| [Verifying Release Candidates](https://github.com/apache/datafusion-comet/blob/main/dev/release/verifying-release-candidates.md). |
| Voters can also use the `dev/release/verify-release-candidate.sh` script to assist with verification: |
| |
| ```shell |
| ./dev/release/verify-release-candidate.sh 0.13.0 1 |
| ``` |
| |
| ### If the Vote Fails |
| |
| If the vote does not pass, address the issues raised, increment the release candidate number, and repeat from |
| the [Tag the Release Candidate](#tag-the-release-candidate) step. For example, the next attempt would be tagged |
| `0.13.0-rc2`. |
| |
| Before staging the next RC, drop the previous RC's staging repository in the |
| [Nexus UI](https://repository.apache.org/#stagingRepositories) by selecting it and clicking "Drop". This avoids |
| leaving multiple closed staging repositories for the same version and prevents accidentally releasing the wrong |
| one when the vote eventually passes. The Maven version (e.g. `0.13.0`) is shared across all RCs, so each run of |
| `publish-to-maven.sh` creates a new staging repository for the same GAV — only one of them should ever be |
| released to Maven Central. |
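
The tag for the next attempt follows mechanically from the previous one. As a minimal sketch, assuming tags always have the form `VERSION-rcN` as in this guide, a hypothetical `next_rc_tag` helper could compute it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: given VERSION-rcN, print VERSION-rc(N+1).
next_rc_tag() {
  local tag="$1"
  local version="${tag%-rc*}"   # everything before the final "-rc"
  local rc="${tag##*-rc}"       # the number after "-rc"
  printf '%s-rc%s\n' "${version}" "$((10#${rc} + 1))"
}

next_rc_tag "0.13.0-rc1"
```

The new RC number is then used wherever the earlier steps used `1`, for example `./dev/release/create-tarball.sh 0.13.0 2`.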
| |
| ## Publishing Binary Releases |
| |
| Once the vote passes, we can publish the source and binary releases. |
| |
| ### Publishing Source Tarball |
| |
| Run the release-tarball script to move the tarball to the release subversion repository. |
| |
| ```shell |
| ./dev/release/release-tarball.sh 0.13.0 1 |
| ``` |
| |
| ### Create a release in the GitHub repository |
| |
| Go to https://github.com/apache/datafusion-comet/releases and create a release for the release tag, and paste the |
| changelog in the description. |
| |
| ### Publishing Maven Artifacts |
| |
| To promote the Maven artifacts from staging to production, open https://repository.apache.org/#stagingRepositories |
| and select the staging repository, then click the "Release" button. |
| |
| ### Push a release tag to the repo |
| |
| Push a release tag (`0.13.0`) to the `apache` repository. |
| |
| ```shell |
| git fetch apache |
| git checkout 0.13.0-rc1 |
| git tag 0.13.0 |
| git push apache 0.13.0 |
| ``` |
| |
| Note that pushing a release tag triggers a GitHub workflow that builds a Docker image and publishes |
| it to GitHub Container Registry at https://github.com/apache/datafusion-comet/pkgs/container/datafusion-comet. |
| |
| Reply to the vote thread to close the vote and announce the release. The announcement email should include: |
| |
| - The release version |
| - A link to the release notes / changelog |
| - A link to the download page or Maven coordinates |
| - Thanks to everyone who contributed and voted |
| |
| ## Post Release |
| |
| ### Register the release |
| |
| Register the release with the [Apache Reporter Service](https://reporter.apache.org/addrelease.html?datafusion) using |
| a version such as `COMET-0.13.0`. |
| |
| ### Delete old RCs and Releases |
| |
| See the ASF documentation on [when to archive](https://www.apache.org/legal/release-policy.html#when-to-archive) |
| for more information. |
| |
| #### Deleting old release candidates from `dev` svn |
| |
| Release candidates should be deleted once the release is published. |
| |
| Get a list of DataFusion Comet release candidates: |
| |
| ```shell |
| svn ls https://dist.apache.org/repos/dist/dev/datafusion | grep comet |
| ``` |
| |
| Delete a release candidate: |
| |
| ```shell |
| svn delete -m "delete old DataFusion Comet RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-comet-0.13.0-rc1/ |
| ``` |
| |
| #### Deleting old releases from `release` svn |
| |
| Only the latest release should be available. Delete old releases after publishing the new release. |
| |
| Get a list of DataFusion Comet releases: |
| |
| ```shell |
| svn ls https://dist.apache.org/repos/dist/release/datafusion | grep comet |
| ``` |
| |
| Delete a release: |
| |
| ```shell |
| svn delete -m "delete old DataFusion Comet release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-0.12.0 |
| ``` |
| |
| ### Write a blog post |
| |
| Writing a blog post about the release is a great way to generate more interest in the project. We typically create a |
| Google document where the community can collaborate on a blog post. Once the content is agreed upon, a PR can be |
| created against the [datafusion-site](https://github.com/apache/datafusion-site) repository to add the blog post. Any |
| contributor can drive this process. |