Release process for Apache Arrow nanoarrow

Verifying a nanoarrow release candidate

Release candidates for nanoarrow are uploaded to https://dist.apache.org/repos/dist/dev/arrow/ prior to a release vote being called on the Apache Arrow developer mailing list. A script (verify-release-candidate.sh) is provided to verify such a release candidate. For example, to verify nanoarrow 0.7.0-rc0, one could run:

git clone https://github.com/apache/arrow-nanoarrow.git arrow-nanoarrow
cd arrow-nanoarrow/dev/release
./verify-release-candidate.sh 0.7.0 0

The verification script itself is written in bash and requires the curl, gpg, and shasum/sha512sum commands. These are typically available from a package manager except on Windows (see below). CMake, Python (>=3.8), and a C/C++ compiler are required to verify the C libraries; Python (>=3.8) is required to verify the Python bindings; and R (>= 4.0) is required to verify the R bindings. See below for platform-specific direction for how to obtain verification dependencies.
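Before starting a long verification run, it can save time to confirm that the required commands are on the PATH. The following is a minimal sketch, not part of the release tooling; the function names `check_commands` and `check_checksum_tool` are hypothetical, and the command name `python3` in the usage comment is an assumption about how Python is installed.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 if all are present, 1 otherwise.
check_commands() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Either sha512sum or shasum is sufficient, so check for them together.
check_checksum_tool() {
  command -v sha512sum >/dev/null 2>&1 || command -v shasum >/dev/null 2>&1
}

# Example: tools needed by the script itself plus C library verification
# check_commands bash curl gpg cmake python3 && check_checksum_tool
```

Add `Rscript` to the list when R verification is also planned.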

Options are passed to the verification script using environment variables. For example, to run only C library verification (requires CMake and Python but not R):

TEST_DEFAULT=0 TEST_C=1 TEST_C_BUNDLED=1 ./verify-release-candidate.sh 0.7.0 0

To run only R package verification (requires R but not CMake or Arrow C++):

TEST_DEFAULT=0 TEST_R=1 ./verify-release-candidate.sh 0.7.0 0

To run only Python verification (requires Python but not CMake or Arrow C++):

TEST_DEFAULT=0 TEST_PYTHON=1 ./verify-release-candidate.sh 0.7.0 0
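The examples above suggest that each TEST_* variable falls back to TEST_DEFAULT when it is not set explicitly. The sketch below illustrates that pattern; it is an assumption about how the options combine, not an excerpt from verify-release-candidate.sh, and `resolve_test_flags` is a hypothetical helper.

```shell
# Hypothetical sketch: each TEST_* option falls back to TEST_DEFAULT,
# which itself defaults to 1 (run everything).
resolve_test_flags() {
  local default="${TEST_DEFAULT:-1}"
  echo "TEST_C=${TEST_C:-$default}"
  echo "TEST_C_BUNDLED=${TEST_C_BUNDLED:-$default}"
  echo "TEST_R=${TEST_R:-$default}"
  echo "TEST_PYTHON=${TEST_PYTHON:-$default}"
}

# Example: print the flags that would result from enabling only R
# TEST_DEFAULT=0 TEST_R=1 resolve_test_flags
```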

MacOS

On MacOS you can install a modern C/C++ toolchain via the XCode Command Line Tools (i.e., xcode-select --install). Other dependencies are available via Homebrew:

brew install cmake gnupg

You can install R using the instructions provided on the R Project Download page; the system python3 provided by MacOS is sufficient to verify the release candidate.

For older MacOS or MacOS without Homebrew, you can install GnuPG and install CMake separately.

Conda (Linux and MacOS)

Using conda, one can install all requirements needed for verification on Linux or MacOS. We recommend installing gnupg using a system installer instead, because interactions with other installations may cause a crash.

conda create --name nanoarrow-verify-rc
conda activate nanoarrow-verify-rc
conda config --set channel_priority strict

conda install -c conda-forge compilers git cmake
# For R (see below about potential interactions with system R
# before installing via conda on MacOS)
conda install -c conda-forge r-testthat r-hms r-blob r-pkgbuild r-bit64

Note that using conda-provided R when there is also a system install of R on MacOS is unlikely to work.

Windows

On Windows, prerequisites can be installed using officially provided installers: Visual Studio, CMake, and Git should provide the prerequisites to verify the C library; R and Rtools can be installed using the official R-project installer.

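# Build GoogleTest against the shared C runtime (needed with Visual Studio)
export NANOARROW_CMAKE_OPTIONS="-Dgtest_force_shared_crt=ON"
# Pass location of R to the verification script
# (adjust the path to match the installed R version)
export R_HOME="/c/Program Files/R/R-4.5.0"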

Unfortunately, verifying Python via the release verification script on Windows may not work in some shells; successful verification may require TEST_PYTHON=0 ./verify-release-candidate.sh.

Debian/Ubuntu

On Debian/Ubuntu (e.g., docker run --rm -it ubuntu:latest) you can install prerequisites using apt.

apt-get update && apt-get install -y git g++ cmake r-base gnupg curl python3-dev python3-venv

If you have never installed an R package before, R verification will fail when it tries to install any missing dependencies. Because of how R is configured by default, you must install your first package in an interactive session and select yes when it asks if you would like to create a user-specific directory.

Fedora

On recent Fedora (e.g., docker run --rm -it fedora:latest), you can install all prerequisites using dnf:

dnf install -y git cmake R gnupg curl python3-devel python3-virtualenv awk

Arch Linux

On Arch Linux (e.g., docker run --rm -it archlinux:latest), you can install all prerequisites using pacman:

pacman -Sy git gcc make cmake r-base gnupg curl python

Alpine Linux

On Alpine Linux (e.g., docker run --rm -it alpine:latest), all prerequisites are available using apk add.


apk add bash linux-headers git cmake R R-dev g++ gnupg curl python3-dev

Big endian

One can verify a nanoarrow release candidate on big endian by setting DOCKER_DEFAULT_PLATFORM=linux/s390x and following the instructions for Alpine Linux, Fedora, or Debian/Ubuntu.
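When running under emulation, it is worth confirming that the container really reports a big endian byte order before spending time on verification. The following sketch does this with `od`; the `byte_order` function name is hypothetical, and the check assumes an `od` that supports `-t u2` (GNU coreutils does; a minimal BusyBox build may not).

```shell
# Print "little" or "big" depending on how the host orders a 2-byte integer.
# The bytes 0x01 0x00 read as one unsigned 16-bit value give 1 on a
# little-endian host and 256 on a big-endian host.
byte_order() {
  local value
  value="$(printf '\001\000' | od -An -tu2 | tr -d ' \n')"
  case "$value" in
    1) echo "little" ;;
    256) echo "big" ;;
    *) echo "unknown" ;;
  esac
}

# Inside a container started with DOCKER_DEFAULT_PLATFORM=linux/s390x,
# this is expected to print "big":
# byte_order
```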

Creating a release candidate

The first step to creating a nanoarrow release is to create a maint-VERSION branch (e.g., usethis::pr_init("maint-0.7.0")) and push the branch to upstream. This is a good opportunity to run through the above instructions, targeting the maint-VERSION branch that was just pushed, to make sure the verification script and instructions are up-to-date.

This is a good time to run other final checks such as:

Run through some R packaging release checks (e.g., urlchecker, winbuilder)
Manually dispatch the Verification workflow.
Manually dispatch the Python wheels workflow.
Create a draft PR into WrapDB to make sure tests pass in their CI
Create a draft PR into vcpkg to make sure tests pass in their CI
Draft a release blog post and make a draft PR into arrow-site.
Review nanoarrow dev documentation for obvious holes/typos.

When these steps are complete, run 01-prepare.sh:

# from the repository root
# 01-prepare.sh <nanoarrow-dir> <prev_version> <version> <next_version> <rc-num>
dev/release/01-prepare.sh . 0.6.0 0.7.0 0.8.0 0

This will update version numbers, the changelog, and create the git tag apache-arrow-nanoarrow-0.7.0-rc0. Check to make sure that the changelog and versions are what you expect them to be before pushing the tag (you may wish to do this by opening a dummy PR to run CI and look at the diff from the main branch).

One part of this workflow that is not yet automated is updating the R NEWS.md from the changelog: any changelog items for the r/... component can be copied to the R NEWS.md file. This is not essential (i.e., it does not affect the ability to release the R package).

When you are satisfied that the code at this tag is release-candidate worthy, git push the tag to the upstream repository (or whatever your remote name is for the apache/arrow-nanoarrow repo). This will kick off a packaging workflow that will create a GitHub release and upload assets that are required for later steps. This step can be done by any Arrow committer.
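The tag name follows the pattern apache-arrow-nanoarrow-VERSION-rcN used elsewhere in this document. A small sketch of deriving it and pushing it is shown below; `rc_tag` is a hypothetical helper, and the remote name `upstream` is an assumption that should be replaced with whatever name points at apache/arrow-nanoarrow.

```shell
# Build the release candidate tag name from a version and an rc number.
rc_tag() {
  local version="$1" rc_num="$2"
  echo "apache-arrow-nanoarrow-${version}-rc${rc_num}"
}

# Inspect the tagged commit locally, then push only that tag:
# git show --stat "$(rc_tag 0.7.0 0)"
# git push upstream "$(rc_tag 0.7.0 0)"
```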

Next, all assets need to be signed by somebody whose GPG key is listed in the Arrow developers KEYS file by calling 02-sign.sh. The caller of the script does not need to be on any particular branch to call the script but does need the dev/release/.env file to exist setting the appropriate GPG_KEY_ID environment variable.

# 02-sign.sh <version> <rc-num>
dev/release/02-sign.sh 0.7.0 0

Finally, run 03-source.sh. This step can be done by any Arrow committer. The caller of this script does not need to be on any particular branch but does need the dev/release/.env file to exist setting the appropriate APACHE_USERNAME environment variable.

# 03-source.sh <version> <rc-num>
dev/release/03-source.sh 0.7.0 0
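Both 02-sign.sh and 03-source.sh read settings from dev/release/.env. The sketch below checks that such a file defines the two variables named above before either script is run. It assumes the file consists of shell-compatible KEY=value lines (the exact format expected by the scripts is not shown here), and `check_release_env` is a hypothetical helper.

```shell
# Check that an env file defines GPG_KEY_ID and APACHE_USERNAME.
# Assumes shell-compatible KEY=value lines; pass a path containing a
# slash (e.g., dev/release/.env) so the file is not looked up on PATH.
check_release_env() {
  local env_file="$1"
  [ -f "$env_file" ] || { echo "not found: $env_file" >&2; return 1; }
  (
    # Source in a subshell so the caller's environment is untouched
    unset GPG_KEY_ID APACHE_USERNAME
    . "$env_file"
    status=0
    [ -n "${GPG_KEY_ID:-}" ] || { echo "GPG_KEY_ID is not set" >&2; status=1; }
    [ -n "${APACHE_USERNAME:-}" ] || { echo "APACHE_USERNAME is not set" >&2; status=1; }
    exit "$status"
  )
}

# Example:
# check_release_env dev/release/.env
```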

You should check that the release verification runs locally and/or start a Verification workflow and wait for it to complete.

At this point the release candidate is suitable for a vote on the Apache Arrow developer mailing list.

[VOTE] Release nanoarrow 0.7.0

Hello,

I would like to propose the following release candidate (rc0) of Apache Arrow nanoarrow [0] version 0.7.0. This is an initial release consisting of 44 resolved GitHub issues from 5 contributors [1].

This release candidate is based on commit: {rc_commit} [2]

The source release rc0 is hosted at [3].
The changelog is located at [4].

Please download, verify checksums and signatures, run the unit tests, and vote on the release. See [5] for how to validate a release candidate.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow nanoarrow 0.7.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow nanoarrow 0.7.0 because...

[0] https://github.com/apache/arrow-nanoarrow
[1] https://github.com/apache/arrow-nanoarrow/milestone/4?closed=1
[2] https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.7.0-rc0
[3] https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.7.0-rc0/
[4] https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.7.0-rc0/CHANGELOG.md
[5] https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md
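Links [2] through [4] in the template above all derive from the version and the rc number, so they can be generated rather than edited by hand. The following is a sketch only; `print_vote_links` is a hypothetical helper, and using `git rev-parse` to fill in {rc_commit} is an assumption about what that placeholder should contain.

```shell
# Print the tag-dependent links used in the vote email template.
print_vote_links() {
  local version="$1" rc_num="$2"
  local tag="apache-arrow-nanoarrow-${version}-rc${rc_num}"
  local repo="https://github.com/apache/arrow-nanoarrow"
  echo "[2] ${repo}/tree/${tag}"
  echo "[3] https://dist.apache.org/repos/dist/dev/arrow/${tag}/"
  echo "[4] ${repo}/blob/${tag}/CHANGELOG.md"
}

# Example:
# print_vote_links 0.7.0 0
# The commit the tag points at can be looked up in a local clone with:
# git rev-parse "apache-arrow-nanoarrow-0.7.0-rc0^{commit}"
```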

Post-release

After a passing release vote, the following tasks must be completed:

[ ] Close GitHub milestone
[ ] Add release to the Apache Reporter System
[ ] Upload artifacts to Subversion
[ ] Create GitHub release
[ ] Submit R package to CRAN
[ ] Submit Python package to PyPI
[ ] Update Python package on conda-forge
[ ] Update the WrapDB entry
[ ] Update the vcpkg entry
[ ] Update release documentation
[ ] Release blog post at https://github.com/apache/arrow-site/pull/288
[ ] Send announcement to announce@apache.org
[ ] Remove old artifacts from SVN
[ ] Bump versions on main

Close GitHub milestone

Find the appropriate entry in https://github.com/apache/arrow-nanoarrow/milestones/ and mark it as closed.

Add release to the Apache Reporter System

The reporter system for Arrow can be found at https://reporter.apache.org/addrelease.html?arrow. To add a release, a PMC member must log in with their Apache username/password. The release names are in the form NANOARROW-0.7.0.

Upload artifacts to Subversion / Create GitHub Release

These are both handled by post-01-upload.sh. This script must be run by a PMC member whose APACHE_USERNAME environment variable has been set in .env.

dev/release/post-01-upload.sh 0.7.0 0

Submit R package to CRAN

The R package can be updated directly from the release branch if no updates are required to pass CRAN checks. Because most updates are just updates of the underlying C library, most updates should not require special changes just to submit to CRAN. It is usually a good idea to check the package URLs and run a WinBuilder check before submitting, just to be sure. These can be run with:

urlchecker::url_check()
devtools::check_win_devel()

If there are no NOTEs, WARNINGs, or ERRORs on the winbuilder results (emailed to the package maintainer), the package can be submitted to CRAN with:

devtools::submit_cran()

If changes are required, create a branch called r-cran-maint-0.7.0, make any changes required, and resubmit to CRAN after bumping the “tweak” version (e.g., Version: 0.7.0.1 in the DESCRIPTION). Ensure those changes are also reflected in the main branch after submission is successful.
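The “tweak” component is the fourth element of the version, so a first resubmission of 0.7.0 becomes 0.7.0.1 and a second becomes 0.7.0.2. A small sketch of that arithmetic is shown below; `next_tweak_version` is a hypothetical helper and is not part of the release scripts.

```shell
# Print the next "tweak" version: 0.7.0 -> 0.7.0.1, 0.7.0.1 -> 0.7.0.2
next_tweak_version() {
  local version="$1" base tweak
  case "$version" in
    # Four components: increment the existing tweak
    *.*.*.*) base="${version%.*}"; tweak="${version##*.}" ;;
    # Three components: no tweak yet, so start from zero
    *.*.*) base="$version"; tweak=0 ;;
    *) echo "unexpected version: $version" >&2; return 1 ;;
  esac
  echo "${base}.$((tweak + 1))"
}

# Example:
# next_tweak_version 0.7.0
```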

Submit Python package to PyPI

The Python package source distribution and wheels are built using the Build Python Wheels action on the maint-0.7.0 branch after cutting the release candidate.

To submit these to PyPI, download all assets from the run into a folder (e.g., python/dist) and run twine upload:

# pip install twine

# Clear the dist directory
rm -rf python/dist
mkdir python/dist

# Download assets from the latest `maint-x.x.x` branch run,
# remove the pyodide wheels (which will be rejected by PyPI)
pushd python/dist
gh run download 15963020465
rm -rf release-wheels-pyodide
popd

twine upload python/dist/**/*.tar.gz python/dist/**/*.whl

You will need to enter a token with “upload packages” permissions for the nanoarrow PyPI project.
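Because uploads to PyPI cannot be replaced once published, it may be worth listing exactly what will be uploaded first. The sketch below fails if any pyodide wheel is still present and otherwise prints the files that match the twine upload patterns above. The `check_dist_dir` name is hypothetical, and matching pyodide wheels by `wasm32` in the file name is an assumption about how those wheels are tagged.

```shell
# List the files that would be uploaded and fail if a WebAssembly
# (pyodide) wheel is still present. Matching on "wasm32" in the file
# name is an assumption about how those wheels are named.
check_dist_dir() {
  local dist_dir="$1"
  local leftovers
  leftovers="$(find "$dist_dir" -type f -name '*wasm32*.whl')"
  if [ -n "$leftovers" ]; then
    echo "pyodide wheels still present:" >&2
    echo "$leftovers" >&2
    return 1
  fi
  find "$dist_dir" -type f \( -name '*.tar.gz' -o -name '*.whl' \) | sort
}

# Example:
# check_dist_dir python/dist
```

Running `twine check` on the same files before `twine upload` additionally validates the package metadata.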

Update Python package on conda-forge

The conda-forge feedstock is updated automatically by a conda-forge bot after the source distribution has been uploaded to PyPI (typically this takes several hours). This will also start a CI run to ensure that the updated version will build on conda-forge.

Update the WrapDB Entry

The nanoarrow C library is available for users of the Meson build system via WrapDB. When a new release is added, a PR into the WrapDB repository is required to make the new version available to users. See https://github.com/mesonbuild/wrapdb/pull/1536 for a template PR. It is also a good idea to do this step before the release candidate is cut to catch packaging issues before finalizing the content of the version.

Update the vcpkg Entry

The nanoarrow C library is available on vcpkg. When a new release is added, a PR into the vcpkg repository is required to make the new version available to users. See https://github.com/microsoft/vcpkg/pull/46029 for a template PR. It is a good idea to do this step before merging a release to catch packaging issues before finalizing the content of the version.

Update release documentation

The nanoarrow documentation is populated from the asf-site branch of this repository. To update the documentation, first clone just the asf-site branch:

git clone -b asf-site --single-branch https://github.com/apache/arrow-nanoarrow.git
cd arrow-nanoarrow

Download the 0.7.0 documentation:

curl -L https://github.com/apache/arrow-nanoarrow/releases/download/apache-arrow-nanoarrow-0.7.0/docs.tgz \
  -o docs.tgz

Extract the documentation and rename the directory to 0.7.0:

tar -xvzf docs.tgz
mv nanoarrow-docs 0.7.0

Then remove the existing latest directory and run the extraction again, renaming to latest instead:

rm -rf latest
tar -xvzf docs.tgz
mv nanoarrow-docs latest
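The two extraction steps above can be combined into one function that works from an already downloaded docs.tgz. This is a sketch only: `publish_docs` is a hypothetical helper, it assumes the archive unpacks to a nanoarrow-docs directory as in the commands above, and it refuses to overwrite an existing version directory (a choice made here, not something the steps above require).

```shell
# Extract a docs tarball into both <version>/ and latest/ under the
# current directory. Assumes the archive unpacks to nanoarrow-docs/.
publish_docs() {
  local tarball="$1" version="$2"
  if [ -e "$version" ]; then
    echo "already exists: $version" >&2
    return 1
  fi
  # First copy: the versioned directory
  tar -xzf "$tarball" || return 1
  mv nanoarrow-docs "$version" || return 1
  # Second copy: replace the existing latest directory
  rm -rf latest
  tar -xzf "$tarball" || return 1
  mv nanoarrow-docs latest
}

# Example (from the root of the asf-site checkout):
# publish_docs docs.tgz 0.7.0
```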

Finally, update switcher.json with entries pointing /latest/ and /0.7.0/ to "version": "0.7.0":

[
    {
        "version": "dev",
        "url": "https://arrow.apache.org/nanoarrow/main/"
    },
    {
        "version": "0.7.0",
        "url": "https://arrow.apache.org/nanoarrow/latest/"
    },
    {
        "version": "0.7.0",
        "url": "https://arrow.apache.org/nanoarrow/0.7.0/"
    },
    {
        "version": "0.6.0",
        "url": "https://arrow.apache.org/nanoarrow/0.6.0/"
    },
    ...
]

This can/should be automated for future releases.

Release blog post

Do a final review of the blog post that was drafted prior to preparation of the release candidate, then merge it.

Send announcement

This email should be sent to announce@apache.org and dev@arrow.apache.org. It must be sent from your Apache email address and must be sent through the mail-relay.apache.org outgoing server. It also must be in plain text or it will be rejected by the announce mailing list.

Email template:

[ANNOUNCE] Apache Arrow nanoarrow 0.7.0 Released

The Apache Arrow community is pleased to announce the 0.7.0 release of
Apache Arrow nanoarrow. This initial release covers 79 resolved issues
from 9 contributors[1].

The release is available now from [2], release notes are available at
[3], and a blog post highlighting new features and breaking changes is
available at [4].

What is Apache Arrow?
---------------------
Apache Arrow is a columnar in-memory analytics layer designed to
accelerate big data. It houses a set of canonical in-memory
representations of flat and hierarchical data along with multiple
language-bindings for structure manipulation. It also provides
low-overhead streaming and batch messaging, zero-copy interprocess
communication (IPC), and vectorized in-memory analytics libraries.
Languages currently supported include C, C++, C#, Go, Java,
JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

What is Apache Arrow nanoarrow?
-------------------------------
Apache Arrow nanoarrow is a C library for building and interpreting
Arrow C Data interface structures with bindings for users of R and
Python. The vision of nanoarrow is that it should be trivial for a
library or application to implement an Arrow-based interface. The
library provides helpers to create types, schemas, and metadata, an
API for building arrays element-wise, and an API to extract elements
element-wise from an array. For a more detailed description of the
features nanoarrow provides and motivation for its development, see
[5].

Please report any feedback to the mailing lists ([6], [7]).

Regards,
The Apache Arrow Community

[1] https://github.com/apache/arrow-nanoarrow/issues?q=milestone%3A%22nanoarrow+0.7.0%22+is%3Aclosed
[2] https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-nanoarrow-0.7.0
[3] https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.7.0/CHANGELOG.md
[4] https://arrow.apache.org/blog/2024/05/27/nanoarrow-0.7.0-release/
[5] https://arrow.apache.org/nanoarrow/
[6] https://lists.apache.org/list.html?user@arrow.apache.org
[7] https://lists.apache.org/list.html?dev@arrow.apache.org

Remove old artifacts from SVN

These artifacts include any release candidates that were uploaded to https://dist.apache.org/repos/dist/dev/arrow/ and old releases that were uploaded to https://dist.apache.org/repos/dist/release/arrow/. You can remove them using:

dev/release/post-02-remove-old-artifacts.sh
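Before removing anything, it can be useful to preview which release candidate directories currently exist. The sketch below filters a directory listing down to nanoarrow release candidates. The `filter_nanoarrow_rcs` name is hypothetical, and this is only a preview aid: how post-02-remove-old-artifacts.sh selects artifacts is not described here.

```shell
# Read a directory listing on stdin (one entry per line, as printed by
# `svn ls`) and print only nanoarrow release candidate directories.
filter_nanoarrow_rcs() {
  grep -E '^apache-arrow-nanoarrow-[0-9]+\.[0-9]+\.[0-9]+-rc[0-9]+/?$'
}

# Example (requires network access and the svn client):
# svn ls https://dist.apache.org/repos/dist/dev/arrow/ | filter_nanoarrow_rcs
```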

Bump versions on main

This is handled by post-03-bump-versions.sh. Create a branch and then run:

dev/release/post-03-bump-versions.sh . 0.7.0 0.8.0

One part of this workflow that is not yet automated is porting the R NEWS.md updates that occurred when updating the changelog. This is not essential, but it makes the R NEWS.md update for the next release easier to follow.

After this PR is merged, create the dev tag that is used to generate the changelog:

git tag -a apache-arrow-nanoarrow-0.8.0.dev -m "tag dev 0.8.0"
git push upstream apache-arrow-nanoarrow-0.8.0.dev