---
layout: post
title: "Apache Arrow nanoarrow 0.4.0 Release"
date: "2023-01-29 00:00:00"
author: pmc
categories: [release]
---

The Apache Arrow team is pleased to announce the 0.4.0 release of Apache Arrow nanoarrow. This release covers 46 resolved issues from 5 contributors.

Release Highlights

The primary focus of the nanoarrow 0.4.0 release was testing, stability, and code quality. Notably, an implementation of the C data interface integration test protocol was added to ensure data produced or consumed by nanoarrow can be consumed or produced by other Arrow implementations.

See the Changelog for a detailed list of contributions to this release.

Breaking Changes

Changes included in the nanoarrow 0.4.0 release will not break most downstream code; however, several changes in the C library may result in additional compiler warnings that could cause downstream build failures for projects with strict compiler warning policies.

First, in debug mode (i.e., when NANOARROW_DEBUG is defined), ignoring the return value of a function that returns ArrowErrorCode now issues a compiler warning for compilers that support an “unused result” attribute. Ignoring the return value of these functions is a common error, and return values that are not equal to NANOARROW_OK should be propagated as soon as possible. The C library provides tools to check return values in a readable way. Notably:

  • NANOARROW_RETURN_NOT_OK() can be used in a wrapper function that also returns ArrowErrorCode.
  • NANOARROW_THROW_NOT_OK() can be used from C++ code that includes nanoarrow.hpp and is prepared to handle exceptions.
  • NANOARROW_ASSERT_OK() can be used to check for NANOARROW_OK only in debug mode (i.e., silently ignore errors in release mode).

Of these, the first two are preferred. The Getting Started with nanoarrow in C/C++ tutorial includes examples and advice for handling errors emanating from the nanoarrow C library.
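To make the propagation pattern concrete, here is a minimal, self-contained sketch in plain C. It deliberately does not use nanoarrow: `MyErrorCode`, `MY_OK`, `MY_CHECK_RESULT`, `MY_RETURN_NOT_OK()`, `append_value()`, and `append_three()` are hypothetical stand-ins that imitate the "unused result" attribute and the early-return style of NANOARROW_RETURN_NOT_OK(). Unlike nanoarrow, which only enables the attribute when NANOARROW_DEBUG is defined, this sketch enables it whenever the compiler is GCC or Clang.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; these are not nanoarrow's
 * definitions, just the same pattern in miniature. */
typedef int MyErrorCode;
#define MY_OK 0

/* GCC and Clang warn when the result of a function marked this way is
 * discarded; on other compilers the marker expands to nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define MY_CHECK_RESULT __attribute__((warn_unused_result))
#else
#define MY_CHECK_RESULT
#endif

/* Evaluate EXPR once and return its code from the enclosing function if it
 * is not MY_OK (the enclosing function must itself return MyErrorCode). */
#define MY_RETURN_NOT_OK(EXPR)              \
  do {                                      \
    const MyErrorCode my_code_ = (EXPR);    \
    if (my_code_ != MY_OK) return my_code_; \
  } while (0)

/* A fallible operation: append one value to a fixed-capacity buffer. */
MY_CHECK_RESULT MyErrorCode append_value(int64_t* buf, int64_t* size,
                                         int64_t capacity, int64_t value) {
  if (*size >= capacity) return ENOMEM;
  buf[(*size)++] = value;
  return MY_OK;
}

/* A wrapper that also returns MyErrorCode and propagates the first failure
 * immediately instead of carrying on with a partially built buffer. */
MY_CHECK_RESULT MyErrorCode append_three(int64_t* buf, int64_t* size,
                                         int64_t capacity) {
  MY_RETURN_NOT_OK(append_value(buf, size, capacity, 1));
  MY_RETURN_NOT_OK(append_value(buf, size, capacity, 2));
  MY_RETURN_NOT_OK(append_value(buf, size, capacity, 3));
  return MY_OK;
}
```

In this sketch, writing `append_three(buf, &size, capacity);` as a bare statement draws a warning from GCC or Clang, while checking or propagating the result does not. With a capacity of 2, the third append fails and `ENOMEM` is returned to the caller unchanged.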

Second, in debug mode (i.e., when NANOARROW_DEBUG is defined), the appropriate attribute was added to check the format string passed to ArrowErrorSet() against the provided arguments. Correct code will be unaffected by this change; however, actual arguments that do not match the format string (e.g., an int64_t that is passed to ArrowErrorSet() with a format string of "%d") should be cast to the appropriate C type (e.g., int) or the format string should be fixed to support the type of the actual argument (e.g., using "%" PRId64).
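The following sketch shows the kind of check involved, using the GCC/Clang `format(printf, ...)` attribute on a hypothetical `my_error_set()` function (a stand-in for ArrowErrorSet(), not its actual declaration); `struct MyError`, `my_error_init()`, and `report_invalid_length()` are likewise hypothetical. With format warnings enabled (for example via GCC's `-Wall`), passing an `int64_t` to `%d` produces a warning, whereas the `PRId64` macro from `<inttypes.h>` keeps the format string and the argument in agreement.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error holder with a fixed-size message buffer. */
struct MyError {
  char message[256];
};

/* On GCC/Clang, ask the compiler to check the printf-style format string
 * (parameter fmt_index) against the variadic arguments (from first_arg). */
#if defined(__GNUC__) || defined(__clang__)
#define MY_CHECK_PRINTF(fmt_index, first_arg) \
  __attribute__((format(printf, fmt_index, first_arg)))
#else
#define MY_CHECK_PRINTF(fmt_index, first_arg)
#endif

void my_error_init(struct MyError* error) {
  memset(error->message, 0, sizeof(error->message));
}

/* The attribute goes on the declaration: format is parameter 2,
 * the values to check start at parameter 3. */
void my_error_set(struct MyError* error, const char* fmt, ...)
    MY_CHECK_PRINTF(2, 3);

void my_error_set(struct MyError* error, const char* fmt, ...) {
  va_list args;
  va_start(args, fmt);
  /* With a nonzero size, vsnprintf truncates and always NUL-terminates. */
  vsnprintf(error->message, sizeof(error->message), fmt, args);
  va_end(args);
}

void report_invalid_length(struct MyError* error, int64_t length) {
  /* my_error_set(error, "Invalid length: %d", length);
   * would warn: %d expects an int, but length is an int64_t. */

  /* Fix 1: use the matching <inttypes.h> format macro. */
  my_error_set(error, "Invalid length: %" PRId64, length);

  /* Fix 2 (silently truncates values outside the range of int):
   * my_error_set(error, "Invalid length: %d", (int)length); */
}
```

Of the two fixes, the `PRId64` form preserves the full 64-bit value, which is why it is used in the sketch.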

Third, functions in the C library that do not take ownership of or modify their input are now properly marked as const. For example, ArrowArrayViewGetIntUnsafe() previously accepted a struct ArrowArrayView* and now accepts a const struct ArrowArrayView*. This change makes it more difficult to accidentally modify input intended to be read-only and improves usability from C++. Downstream projects that get a new warning about discarding a const qualifier may need to adjust variable declarations or formal parameter types; however, most projects should be unaffected by this change.
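A small sketch of why the const qualifier matters, using a hypothetical `struct MyInt32View` that is far simpler than struct ArrowArrayView (the names `my_view_get_int()` and `my_view_sum()` are also illustrative, not nanoarrow API). Because `my_view_get_int()` accepts a pointer to const, it can be called from `my_view_sum()`, which itself only holds a pointer to const; had the accessor taken a non-const pointer, that call would produce a warning about discarding the const qualifier. Callers holding a non-const pointer are unaffected, since C converts `T*` to `const T*` implicitly.

```c
#include <stdint.h>

/* Hypothetical, much-simplified read-only view over int32 values. */
struct MyInt32View {
  const int32_t* values;
  int64_t length;
};

/* Read-only accessor: the pointer to const documents (and lets the
 * compiler enforce) that the view is not modified. */
int64_t my_view_get_int(const struct MyInt32View* view, int64_t i) {
  return (int64_t)view->values[i];
}

/* A caller that only holds a pointer to const can use the accessor
 * directly, with no cast needed to drop the qualifier. */
int64_t my_view_sum(const struct MyInt32View* view) {
  int64_t total = 0;
  for (int64_t i = 0; i < view->length; i++) {
    total += my_view_get_int(view, i);
  }
  return total;
}
```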

C/C++

The nanoarrow 0.4.0 release includes a number of bugfixes and improvements to the core C library and C++ helpers.

  • An implementation of the C data interface integration test was added, including a reader/writer for the Arrow integration testing JSON format. This was used to improve test coverage of the IPC reader and to add nanoarrow as a participating member of integration testing in the CI job that runs in the main Arrow repository.
  • The C library now supports a wider range of extended compiler warnings to make it easier to vendor in projects with strict compiler warning policies.
  • C++ helpers were improved to support const-correctness. As a result, the UniqueSchema, UniqueArray, UniqueArrayView, and UniqueBuffer types now work with a wider variety of C++ containers (e.g., std::unordered_map).

R bindings

The nanoarrow R bindings are distributed as the nanoarrow package on CRAN. The 0.4.0 release of the R bindings includes improvements in type support and stability. Notably:

  • Documentation was improved for low-level users of nanoarrow who produce or consume ArrowArray, ArrowSchema, and/or ArrowArrayStream structures from C or C++ code in other R packages.
  • Improved conversion of list()s to support more types when the arrow R package is not available.
  • Added more implementations of as_nanoarrow_array_stream() to support more object types from the arrow R package.
  • Added conversion from Arrow integer arrays to character().

Python bindings

TODO!

Contributors

This release consists of contributions from 5 contributors in addition to the invaluable advice and support of the Apache Arrow developer mailing list.

$ git shortlog -sn 798a1b8f096c84e2b6f887427649f1cb496412b2..apache-arrow-nanoarrow-0.4.0 | grep -v "GitHub Actions"
  35  Dewey Dunnington
   3  William Ayd
   2  Dirk Eddelbuettel
   2  Joris Van den Bossche
   1  eitsupi