commit | c738f90e874ec38b2fa2acebd154a33da36dc9a4 | |
---|---|---|
author | Dewey Dunnington <dewey@dunnington.ca> | Fri Jun 09 11:02:49 2023 -0400 |
committer | GitHub <noreply@github.com> | Fri Jun 09 12:02:49 2023 -0300 |
tree | f70a537a7d0e2b931eaeae6b0360c4959f70dc24 | |
parent | 2d9efd877216d5c7280828bcf81b9338baf31ca9 | |
feat: Include dictionary member in `ArrowArrayView` struct (#221)

This doesn't add any features really, but ensures that one can walk `ArrowSchema`, `ArrowArray`, and `ArrowArrayView` recursively using the same pattern.

I'd like to include this in the 0.2 release because it is very difficult to work around this limitation: for example, if you have a deeply nested dictionary field, you currently have to walk your whole tree of arrays twice (once to validate the non-dictionary bits, once looking for dictionary bits that may or may not exist to validate). The R package worked around this in some creative but error-prone ways and I'd like to avoid anybody else attempting creative workarounds when this is a feature that will almost certainly get added soon.

Another application of this is device array support, since that PR essentially constructs an `ArrowArrayView` and recursively copies the buffer view from the array view to a standalone buffer using some device-specific logic.
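As a rough illustration of the "same pattern" mentioned in the commit message, the sketch below walks an `ArrowArray` tree in a single pass by following both `children` and `dictionary`. The struct layout is copied from the Arrow C Data Interface specification so the snippet compiles on its own; real code would include `nanoarrow.h` instead. `CountArrayNodes` is a hypothetical helper written for this example and is not part of nanoarrow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Layout from the Arrow C Data Interface specification, repeated here only so
// the sketch is self-contained. Do not redefine it when nanoarrow.h is included.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Hypothetical helper: counts every node reachable from `array`, visiting
// children and the optional dictionary in the same recursive pass.
int64_t CountArrayNodes(const struct ArrowArray* array) {
  assert(array != NULL);

  int64_t n_nodes = 1;  // the node itself

  for (int64_t i = 0; i < array->n_children; i++) {
    n_nodes += CountArrayNodes(array->children[i]);
  }

  // The dictionary is just one more subtree to recurse into.
  if (array->dictionary != NULL) {
    n_nodes += CountArrayNodes(array->dictionary);
  }

  return n_nodes;
}
```

With a `dictionary` member available on each structure, a validation or copy routine could presumably take the same shape: handle the current node, recurse into children, then recurse into the dictionary if one is present.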
The nanoarrow library is a set of helper functions to interpret and generate Arrow C Data Interface and Arrow C Stream Interface structures. The library is in active early development and users should update regularly from the main branch of this repository.
Whereas the current suite of Arrow implementations provides the basis for a comprehensive data analysis toolkit, this library is intended to support clients that wish to produce or interpret Arrow C Data and/or Arrow C Stream structures where linking to a higher-level Arrow binding is difficult or impossible.
The nanoarrow C library is intended to be copied and vendored. This can be done using CMake or by using the bundled nanoarrow.h/nanoarrow.c distribution available in the dist/ directory in this repository. Examples of both can be found in the examples/ directory in this repository.
A simple producer example:
```c
#include "nanoarrow.h"

int make_simple_array(struct ArrowArray* array_out, struct ArrowSchema* schema_out) {
  struct ArrowError error;
  array_out->release = NULL;
  schema_out->release = NULL;

  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(array_out, NANOARROW_TYPE_INT32));

  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(array_out));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 1));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 2));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 3));
  NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuildingDefault(array_out, &error));

  NANOARROW_RETURN_NOT_OK(ArrowSchemaInit(schema_out, NANOARROW_TYPE_INT32));

  return NANOARROW_OK;
}
```
A simple consumer example:
```c
#include <stdio.h>

#include "nanoarrow.h"

int print_simple_array(struct ArrowArray* array, struct ArrowSchema* schema) {
  struct ArrowError error;
  struct ArrowArrayView array_view;
  NANOARROW_RETURN_NOT_OK(ArrowArrayViewInitFromSchema(&array_view, schema, &error));

  if (array_view.storage_type != NANOARROW_TYPE_INT32) {
    printf("Array has storage that is not int32\n");
  }

  int result = ArrowArrayViewSetArray(&array_view, array, &error);
  if (result != NANOARROW_OK) {
    ArrowArrayViewReset(&array_view);
    return result;
  }

  for (int64_t i = 0; i < array->length; i++) {
    printf("%d\n", (int)ArrowArrayViewGetIntUnsafe(&array_view, i));
  }

  ArrowArrayViewReset(&array_view);
  return NANOARROW_OK;
}
```