feat(r): Provide LinkingTo headers for extension packages (#332)

In developing geoarrow, an R extension that imports/exports a number of
Arrow C data structures wrapped by nanoarrow S3 objects, it has become
apparent that the sanitize and allocate operations are non-trivial and
basically have to be copied in every package that wants to import/export
nanoarrow things. @eddelbuettel has run up against some valid use-cases
as well! https://github.com/eddelbuettel/linesplitter/issues/1 ,
https://github.com/apache/arrow-nanoarrow/issues/187

This PR makes the definition of how Arrow C Data interface objects are
encoded as R external pointers available as a public header (such that
downstream packages can `LinkingTo: nanoarrow` and `#include
<nanoarrow/r.h>`). I think the initial target will just be to allocate an
owning external pointer and to sanitize an input SEXP.
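
A rough sketch of those two operations, written in plain C so that it compiles without R headers. `struct XPtr` is a hypothetical stand-in for an R external pointer with a class attribute, and `schema_owning_xptr()`, `schema_xptr_finalize()`, `schema_from_xptr()`, and `demo_schema_release()` are illustrative names, not the contents of `nanoarrow/r.h`. The class string and the exact checks are assumptions about what sanitizing would involve.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Layout from the Arrow C Data Interface specification
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

// Hypothetical stand-in for an R external pointer: an address plus a class tag
struct XPtr {
  void* addr;
  const char* cls;
};

// Allocate: the wrapper owns a heap-allocated struct that starts out released
static struct XPtr schema_owning_xptr(void) {
  struct XPtr xptr = {NULL, "nanoarrow_schema"};  // class name is an assumption
  struct ArrowSchema* schema = malloc(sizeof(struct ArrowSchema));
  if (schema != NULL) {
    schema->release = NULL;
    xptr.addr = schema;
  }
  return xptr;
}

// Finalize: what a garbage-collector hook would do when the wrapper dies
static void schema_xptr_finalize(struct XPtr* xptr) {
  struct ArrowSchema* schema = xptr->addr;
  if (schema != NULL) {
    if (schema->release != NULL) schema->release(schema);
    free(schema);
  }
  xptr->addr = NULL;
}

// Sanitize: return a usable schema, or NULL with a reason for the caller
static struct ArrowSchema* schema_from_xptr(const struct XPtr* xptr,
                                            const char** reason) {
  if (xptr->cls == NULL || strcmp(xptr->cls, "nanoarrow_schema") != 0) {
    *reason = "object is not a nanoarrow_schema";
    return NULL;
  }
  struct ArrowSchema* schema = xptr->addr;
  if (schema == NULL) {
    *reason = "external pointer address is NULL";
    return NULL;
  }
  if (schema->release == NULL) {
    *reason = "schema has already been released";
    return NULL;
  }
  return schema;
}

// Trivial release callback used only to make a schema look valid
static void demo_schema_release(struct ArrowSchema* schema) {
  schema->release = NULL;
}
```

In an R package the wrapper would be a real external pointer (for example one created with `R_MakeExternalPtr()` and given a finalizer with `R_RegisterCFinalizer()`), and the failure paths would raise an R error instead of returning NULL.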

@eddelbuettel Are there any other operations that are blocking any of
your projects that would be must-haves in this header?

(I know it's missing array_stream...I forgot about the ability to supply
R finalizers and so it's a slightly more complicated change)
19 files changed
tree: d0610ba96b7e53e4ee3ef02bbefc1ca04d2d079b
README.md

nanoarrow


The nanoarrow library is a set of helper functions to interpret and generate Arrow C Data Interface and Arrow C Stream Interface structures. The library is in active early development and users should update regularly from the main branch of this repository.

Whereas the current suite of Arrow implementations provides the basis for a comprehensive data analysis toolkit, this library is intended to support clients that wish to produce or interpret Arrow C Data and/or Arrow C Stream structures where linking to a higher level Arrow binding is difficult or impossible.
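
To make concrete what producing and interpreting such a structure involves, here is a minimal sketch that builds and reads an int32 array by hand, without nanoarrow. The `struct ArrowArray` layout follows the Arrow C Data Interface specification; `export_int32()`, `release_int32_array()`, and `sum_int32()` are hypothetical helpers written for this illustration and are not part of nanoarrow.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Layout from the Arrow C Data Interface specification
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Release callback: free what the producer allocated, then mark as released
static void release_int32_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]);  // data buffer
  free(array->buffers);            // table of buffer pointers
  array->release = NULL;
}

// Producer (hypothetical helper): export n int32 values with no nulls
static int export_int32(const int32_t* values, int64_t n, struct ArrowArray* out) {
  const void** buffers = malloc(2 * sizeof(void*));
  int32_t* data = malloc((size_t)(n > 0 ? n : 1) * sizeof(int32_t));
  if (buffers == NULL || data == NULL) {
    free(buffers);
    free(data);
    return -1;
  }

  for (int64_t i = 0; i < n; i++) data[i] = values[i];
  buffers[0] = NULL;  // the validity bitmap may be omitted when null_count is 0
  buffers[1] = data;

  out->length = n;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = buffers;
  out->children = NULL;
  out->dictionary = NULL;
  out->release = &release_int32_array;
  out->private_data = NULL;
  return 0;
}

// Consumer (hypothetical helper): sum the values, honoring the array offset
static int64_t sum_int32(const struct ArrowArray* array) {
  const int32_t* data = (const int32_t*)array->buffers[1];
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; i++) {
    total += data[array->offset + i];
  }
  return total;
}
```

A caller would fill a `struct ArrowArray` with `export_int32()`, read it with `sum_int32()`, and finish by calling `array.release(&array)`. The sketch covers one type with no nulls and no schema; handling the general case is the bookkeeping that helper functions like nanoarrow's are meant to take over.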

Using the C library

The nanoarrow C library is intended to be copied and vendored. This can be done using CMake or by using the bundled nanoarrow.h/nanoarrow.c distribution available in the dist/ directory in this repository. Examples of both can be found in the examples/ directory in this repository.

A simple producer example:

#include "nanoarrow.h"

int make_simple_array(struct ArrowArray* array_out, struct ArrowSchema* schema_out) {
  struct ArrowError error;
  // Mark both outputs as released so that a caller can tell, after a failure,
  // which of them (if any) still needs to be released
  array_out->release = NULL;
  schema_out->release = NULL;

  // Build an int32 array containing 1, 2, 3
  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(array_out, NANOARROW_TYPE_INT32));

  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(array_out));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 1));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 2));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 3));
  NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuildingDefault(array_out, &error));

  // Describe the array's type in the accompanying schema
  NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema_out, NANOARROW_TYPE_INT32));

  return NANOARROW_OK;
}

A simple consumer example:

#include <errno.h>
#include <stdio.h>

#include "nanoarrow.h"

int print_simple_array(struct ArrowArray* array, struct ArrowSchema* schema) {
  struct ArrowError error;
  struct ArrowArrayView array_view;
  NANOARROW_RETURN_NOT_OK(ArrowArrayViewInitFromSchema(&array_view, schema, &error));

  // Stop here rather than reading non-int32 storage as integers
  if (array_view.storage_type != NANOARROW_TYPE_INT32) {
    printf("Array has storage that is not int32\n");
    ArrowArrayViewReset(&array_view);
    return EINVAL;
  }

  int result = ArrowArrayViewSetArray(&array_view, array, &error);
  if (result != NANOARROW_OK) {
    ArrowArrayViewReset(&array_view);
    return result;
  }

  for (int64_t i = 0; i < array->length; i++) {
    printf("%d\n", (int)ArrowArrayViewGetIntUnsafe(&array_view, i));
  }

  ArrowArrayViewReset(&array_view);
  return NANOARROW_OK;
}