feat(r): Improve printing and conversion of buffers (#208)

After #207, buffer data types are readily available. Previously, the R
bindings worked out most of this information on their own to support
debug printing of buffers. This PR uses the new feature to improve the
debug output and, along the way, adds the ability to convert buffers
into R vectors.

Before this PR:

``` r
library(nanoarrow)
as_nanoarrow_array(c(NA, stringr::words))
#> <nanoarrow_array string[981]>
#>  $ length    : int 981
#>  $ null_count: int 1
#>  $ offset    : int 0
#>  $ buffers   :List of 3
#>   ..$ :<nanoarrow_buffer_validity[123 b] at 0x13b71d8d0>
#>   ..$ :<nanoarrow_buffer_data_offset32[3928 b] at 0x13d87b600>
#>   ..$ :<nanoarrow_buffer_data_utf8[5126 b] at 0x13d87c600>
#>  $ dictionary: NULL
#>  $ children  : list()
```

<sup>Created on 2023-05-30 with [reprex
v2.0.2](https://reprex.tidyverse.org)</sup>

After this PR:

``` r
library(nanoarrow)
as_nanoarrow_array(c(NA, stringr::words))
#> <nanoarrow_array string[981]>
#>  $ length    : int 981
#>  $ null_count: int 1
#>  $ offset    : int 0
#>  $ buffers   :List of 3
#>   ..$ :<nanoarrow_buffer validity<bool>[984][123 b]> `FALSE TRUE TRUE TRUE T...`
#>   ..$ :<nanoarrow_buffer data_offset<int32>[982][3928 b]> `0 0 1 5 10 18 24 ...`
#>   ..$ :<nanoarrow_buffer data<string>[5126 b]> `aableaboutabsoluteacceptacco...`
#>  $ dictionary: NULL
#>  $ children  : list()
```
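
To make the new output easier to read, here is a minimal C sketch of how the three buffers of a string array fit together. It is not code from nanoarrow or the R bindings: the function names (`bit_is_set`, `get_string`, `validity_bytes`, `offsets_bytes`) are hypothetical, and it assumes only the standard Arrow layout for a string array (an LSB-first validity bitmap, `n + 1` int32 offsets, and a contiguous data buffer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: returns 1 if bit i of an Arrow validity bitmap is set.
// Arrow packs validity bits least-significant bit first within each byte.
int bit_is_set(const uint8_t* bitmap, int64_t i) {
  return (bitmap[i / 8] >> (i % 8)) & 1;
}

// Hypothetical helper: copies element i of a string array into out as a
// NUL-terminated string. Returns the element's length in bytes, or -1 if the
// element is null. A NULL validity pointer means "no nulls".
int64_t get_string(const uint8_t* validity, const int32_t* offsets,
                   const char* data, int64_t i, char* out, size_t out_size) {
  if (validity != NULL && !bit_is_set(validity, i)) {
    return -1;
  }

  // Element i occupies bytes [offsets[i], offsets[i + 1]) of the data buffer
  int32_t start = offsets[i];
  int32_t len = offsets[i + 1] - start;
  assert(len >= 0 && (size_t)len < out_size);

  memcpy(out, data + start, (size_t)len);
  out[len] = '\0';
  return len;
}

// Minimum validity buffer size in bytes for n elements: one bit per element,
// rounded up to a whole byte
int64_t validity_bytes(int64_t n) { return (n + 7) / 8; }

// Offsets buffer size in bytes for n elements: n + 1 int32 values
int64_t offsets_bytes(int64_t n) { return (n + 1) * (int64_t)sizeof(int32_t); }
```

With the values printed above (validity `FALSE TRUE TRUE ...`, offsets `0 0 1 5 10 18 24 ...`, data `aableabout...`), element 0 is null, element 1 is `a`, and element 2 is `able`. The sizes also line up for 981 elements: `validity_bytes(981)` is 123 (so the buffer holds 984 bits) and `offsets_bytes(981)` is 3928.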

<sup>Created on 2023-05-30 with [reprex
v2.0.2](https://reprex.tidyverse.org)</sup>
12 files changed
tree: bbce2883864dd1bb11e477bb086af6d905d81a47
  1. .github/
  2. ci/
  3. dev/
  4. dist/
  5. docs/
  6. examples/
  7. extensions/
  8. python/
  9. r/
  10. src/
  11. .asf.yaml
  12. .clang-format
  13. .env
  14. .gitattributes
  15. .gitignore
  16. CHANGELOG.md
  17. CMakeLists.txt
  18. CMakePresets.json
  19. CMakeUserPresets.json.example
  20. docker-compose.yml
  21. LICENSE.txt
  22. NOTICE.txt
  23. README.md
  24. valgrind.supp
README.md

nanoarrow

The nanoarrow library is a set of helper functions to interpret and generate Arrow C Data Interface and Arrow C Stream Interface structures. The library is in active early development and users should update regularly from the main branch of this repository.

Whereas the current suite of Arrow implementations provides the basis for a comprehensive data analysis toolkit, this library is intended to support clients that wish to produce or interpret Arrow C Data and/or Arrow C Stream structures where linking to a higher-level Arrow binding is difficult or impossible.
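
For readers unfamiliar with the structures involved, the following is a rough, standalone sketch of what producing and consuming an Arrow C Data Interface array looks like without any helper library. The `struct ArrowArray` layout follows the Arrow C Data Interface specification; the functions `make_int32_array()`, `release_int32_array()`, and `sum_int32_array()` are hypothetical names invented for this illustration and are not part of nanoarrow. The sketch is meant to be compiled on its own, not together with nanoarrow.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
// Layout as given in the Arrow C Data Interface specification
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif

// Hypothetical release callback: frees what make_int32_array() allocated and
// marks the structure as released by setting release to NULL.
void release_int32_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]);
  free((void*)array->buffers);
  array->release = NULL;
}

// Hypothetical hand-written producer: copies n values into a new int32 array
// with no nulls. Returns 0 on success or -1 if an allocation fails.
int make_int32_array(const int32_t* values, int64_t n, struct ArrowArray* out) {
  const void** buffers = (const void**)malloc(2 * sizeof(void*));
  int32_t* data = (int32_t*)malloc((size_t)(n > 0 ? n : 1) * sizeof(int32_t));
  if (buffers == NULL || data == NULL) {
    free((void*)buffers);
    free(data);
    return -1;
  }

  if (n > 0) {
    memcpy(data, values, (size_t)n * sizeof(int32_t));
  }

  buffers[0] = NULL;  // the validity buffer may be NULL when null_count is 0
  buffers[1] = data;  // the data buffer holds the int32 values

  out->length = n;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = buffers;
  out->children = NULL;
  out->dictionary = NULL;
  out->release = &release_int32_array;
  out->private_data = NULL;
  return 0;
}

// Hypothetical consumer: sums the values of a non-null int32 array.
int64_t sum_int32_array(const struct ArrowArray* array) {
  assert(array->release != NULL);  // a released array must not be read
  const int32_t* data = (const int32_t*)array->buffers[1];
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; i++) {
    total += data[array->offset + i];
  }
  return total;
}
```

A caller would fill a `struct ArrowArray` with `make_int32_array()`, read it with `sum_int32_array()`, and then call `array.release(&array)` exactly once when finished. Even this small case needs manual buffer bookkeeping and says nothing about the accompanying schema, which is the kind of work the helpers in this library are intended to take over.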

Using the C library

The nanoarrow C library is intended to be copied and vendored. This can be done using CMake or by using the bundled nanoarrow.h/nanoarrow.c distribution available in the dist/ directory of this repository. Examples of both approaches can be found in the examples/ directory.

A simple producer example:

#include "nanoarrow.h"

int make_simple_array(struct ArrowArray* array_out, struct ArrowSchema* schema_out) {
  struct ArrowError error;

  // Mark both outputs as released so that a caller can tell whether they
  // were initialized if this function returns early
  array_out->release = NULL;
  schema_out->release = NULL;

  // Build an int32 array containing the values 1, 2, and 3
  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(array_out, NANOARROW_TYPE_INT32));

  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(array_out));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 1));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 2));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 3));
  NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuildingDefault(array_out, &error));

  // Describe the array's type with a matching int32 schema
  NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema_out, NANOARROW_TYPE_INT32));

  return NANOARROW_OK;
}

A simple consumer example:

#include <errno.h>
#include <stdio.h>

#include "nanoarrow.h"

int print_simple_array(struct ArrowArray* array, struct ArrowSchema* schema) {
  struct ArrowError error;
  struct ArrowArrayView array_view;
  NANOARROW_RETURN_NOT_OK(ArrowArrayViewInitFromSchema(&array_view, schema, &error));

  // This example only prints int32 arrays, so stop here for any other type
  if (array_view.storage_type != NANOARROW_TYPE_INT32) {
    printf("Array has storage that is not int32\n");
    ArrowArrayViewReset(&array_view);
    return EINVAL;
  }

  int result = ArrowArrayViewSetArray(&array_view, array, &error);
  if (result != NANOARROW_OK) {
    ArrowArrayViewReset(&array_view);
    return result;
  }

  for (int64_t i = 0; i < array->length; i++) {
    printf("%d\n", (int)ArrowArrayViewGetIntUnsafe(&array_view, i));
  }

  ArrowArrayViewReset(&array_view);
  return NANOARROW_OK;
}