commit | ef925885166be36bfdbacb35aabe23e0be9f7c47 | |
---|---|---|
author | Dewey Dunnington <dewey@dunnington.ca> | Tue Dec 12 17:10:07 2023 -0400 |
committer | GitHub <noreply@github.com> | Tue Dec 12 17:10:07 2023 -0400 |
tree | b9e9c18c950bebd359e9021470568aa030fc770d | |
parent | 3eee79a62dba215b32ef69c63b9b793c5fd6ba78 | |

chore(extensions/nanoarrow_ipc): Add golden file testing JSON tests to existing arrow-testing tests (#334)

This PR adds a "golden file" integration test to the existing suite of tests that read files from the arrow-testing repo. The existing test read the IPC stream file using nanoarrow's IPC reader and Arrow C++'s IPC reader and used Arrow C++ to check equality. The added test is more similar to the golden file test described in the integration testing section of the documentation (read IPC, read testing JSON, use nanoarrow testing utils to check equality).

This involved some refactoring to re-use the infrastructure properly for both types of tests.

Originally I wrote a dedicated executable and some bash to do these kinds of tests ( https://gist.github.com/paleolimbot/ec2a2067198f0de1901c107c783d3b26 ); however, integrating it into the existing tests is cleaner and makes it easier to run them with valgrind.

The nanoarrow library is a set of helper functions to interpret and generate Arrow C Data Interface and Arrow C Stream Interface structures. The library is in active early development and users should update regularly from the main branch of this repository.
Whereas the current suite of Arrow implementations provides the basis for a comprehensive data analysis toolkit, this library is intended to support clients that wish to produce or interpret Arrow C Data and/or Arrow C Stream structures where linking to a higher-level Arrow binding is difficult or impossible.
The nanoarrow C library is intended to be copied and vendored. This can be done using CMake or by using the bundled nanoarrow.h/nanoarrow.c distribution available in the dist/ directory in this repository. Examples of both can be found in the examples/ directory in this repository.
A simple producer example:
```c
#include "nanoarrow.h"

int make_simple_array(struct ArrowArray* array_out, struct ArrowSchema* schema_out) {
  struct ArrowError error;
  array_out->release = NULL;
  schema_out->release = NULL;

  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(array_out, NANOARROW_TYPE_INT32));

  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(array_out));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 1));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 2));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 3));
  NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuildingDefault(array_out, &error));

  NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema_out, NANOARROW_TYPE_INT32));

  return NANOARROW_OK;
}
```
A simple consumer example:
```c
#include <stdio.h>

#include "nanoarrow.h"

int print_simple_array(struct ArrowArray* array, struct ArrowSchema* schema) {
  struct ArrowError error;
  struct ArrowArrayView array_view;
  NANOARROW_RETURN_NOT_OK(ArrowArrayViewInitFromSchema(&array_view, schema, &error));

  if (array_view.storage_type != NANOARROW_TYPE_INT32) {
    printf("Array has storage that is not int32\n");
  }

  int result = ArrowArrayViewSetArray(&array_view, array, &error);
  if (result != NANOARROW_OK) {
    ArrowArrayViewReset(&array_view);
    return result;
  }

  for (int64_t i = 0; i < array->length; i++) {
    printf("%d\n", (int)ArrowArrayViewGetIntUnsafe(&array_view, i));
  }

  ArrowArrayViewReset(&array_view);
  return NANOARROW_OK;
}
```
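
For context on what the helpers in the two examples above take care of, here is a rough sketch that builds and reads a non-nullable int32 array using only the `struct ArrowArray` layout from the Arrow C Data Interface specification, without nanoarrow. The function names `raw_make_int32_array()`, `raw_int32_release()`, and `raw_sum_int32()` are hypothetical and are not part of the nanoarrow API; the sketch assumes a primitive array with no children or dictionary and is meant as an illustration rather than a replacement for the library.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
// Struct layout as published in the Arrow C Data Interface specification.
// The include guard keeps this compatible with headers that already define it.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif

// Hypothetical release callback: frees what raw_make_int32_array() allocated
// and marks the structure as released by setting release to NULL.
void raw_int32_release(struct ArrowArray* array) {
  free((void*)array->buffers[1]);  // data buffer
  free((void*)array->buffers);     // array of buffer pointers
  array->release = NULL;
}

// Hypothetical helper: copy n values into a new non-nullable int32 array.
// Returns 0 on success and -1 if an allocation fails.
int raw_make_int32_array(struct ArrowArray* out, const int32_t* values, int64_t n) {
  memset(out, 0, sizeof(*out));

  const void** buffers = malloc(2 * sizeof(void*));
  int32_t* data = malloc((size_t)(n > 0 ? n : 1) * sizeof(int32_t));
  if (buffers == NULL || data == NULL) {
    free(buffers);
    free(data);
    return -1;
  }

  if (n > 0) {
    memcpy(data, values, (size_t)n * sizeof(int32_t));
  }

  buffers[0] = NULL;  // validity bitmap may be omitted when null_count == 0
  buffers[1] = data;  // int32 values

  out->length = n;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->buffers = buffers;
  out->release = &raw_int32_release;
  return 0;
}

// Hypothetical helper: sum the non-null values of an int32 array, honoring
// the logical offset and the (optional) least-significant-bit-first bitmap.
int64_t raw_sum_int32(const struct ArrowArray* array) {
  const uint8_t* validity = (const uint8_t*)array->buffers[0];
  const int32_t* data = (const int32_t*)array->buffers[1];
  int64_t sum = 0;
  for (int64_t i = 0; i < array->length; i++) {
    int64_t j = array->offset + i;
    if (validity != NULL && !(validity[j / 8] & (1 << (j % 8)))) {
      continue;  // element is null
    }
    sum += data[j];
  }
  return sum;
}
```

A caller would pass a caller-allocated `struct ArrowArray` to `raw_make_int32_array()`, read it with `raw_sum_int32()`, and finish by calling `array.release(&array)`. Tracking buffer ownership, offsets, and validity bits by hand in this way is the kind of bookkeeping that the builder and array view helpers shown earlier are meant to handle.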