commit | bab66ac3d503519024ae086c44d5152f2f3ea0c9 | |
---|---|---|
author | Dewey Dunnington <dewey@dunnington.ca> | Tue Apr 09 16:18:50 2024 -0300 |
committer | GitHub <noreply@github.com> | Tue Apr 09 16:18:50 2024 -0300 |
tree | 4e8f385f22072f0a28ac455b3abba4232f187ca0 | |
parent | 00aa9c381737e97569a1282bfcf96050d3de5d56 | |
feat(python): Clarify interaction between the CDeviceArray, the CArrayView, and the CArray (#409)

When device support was first added, the `CArrayView` was device-aware but the `CArray` was not. This worked well until it was clear that `__arrow_c_array__` needed to error if it did not represent a CPU array (and the `CArray` had no way to check). Now, the `CArray` has a `device_type` and `device_id`. A nice side-effect of this is that we get back the `view()` method (whose removal @jorisvandenbossche had lamented!).

This also implements the device array protocol to help test https://github.com/apache/arrow/pull/40717 . This protocol isn't finalized yet and I could remove that part until it is (although it doesn't seem likely to change).

The non-cpu case is still hard to test without real-world CUDA support...this PR is just trying to get the right information in the right place as early as possible.

```python
import nanoarrow as na

array = na.c_array([1, 2, 3], na.int32())
array.device_type, array.device_id
#> (1, 0)
```

---------

Co-authored-by: Dane Pitkin <48041712+danepitkin@users.noreply.github.com>
The nanoarrow library is a set of helper functions to interpret and generate Arrow C Data Interface and Arrow C Stream Interface structures. The library is in active early development and users should update regularly from the main branch of this repository.
Whereas the current suite of Arrow implementations provides the basis for a comprehensive data analysis toolkit, this library is intended to support clients that wish to produce or interpret Arrow C Data and/or Arrow C Stream structures where linking to a higher-level Arrow binding is difficult or impossible.
The nanoarrow C library is intended to be copied and vendored. This can be done using CMake or by using the bundled nanoarrow.h/nanoarrow.c distribution available in the dist/ directory in this repository. Examples of both can be found in the examples/ directory in this repository.
A simple producer example:
    #include "nanoarrow.h"

    int make_simple_array(struct ArrowArray* array_out, struct ArrowSchema* schema_out) {
      struct ArrowError error;
      array_out->release = NULL;
      schema_out->release = NULL;

      NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(array_out, NANOARROW_TYPE_INT32));

      NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(array_out));
      NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 1));
      NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 2));
      NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 3));
      NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuildingDefault(array_out, &error));

      NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema_out, NANOARROW_TYPE_INT32));

      return NANOARROW_OK;
    }
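For comparison, the following is a rough sketch of what the helpers above spare a producer from writing by hand. It assumes the `struct ArrowArray` field layout published in the Arrow C Data Interface specification and does not use nanoarrow at all. The names `make_int32_array_by_hand` and `release_int32_array` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Field layout as published in the Arrow C Data Interface specification.
 * In real code this definition would come from nanoarrow.h instead. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical release callback: frees what the producer below allocated. */
void release_int32_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]); /* data buffer */
  free((void*)array->buffers);    /* table of buffer pointers */
  array->release = NULL;          /* a NULL release marks the struct as released */
}

/* Hypothetical hand-written producer: copies n values into a new int32 array
 * without nulls. Returns 0 on success and -1 on invalid input or allocation
 * failure. */
int make_int32_array_by_hand(const int32_t* values, int64_t n,
                             struct ArrowArray* out) {
  out->release = NULL;
  if (n < 0) return -1;

  const void** buffers = (const void**)malloc(2 * sizeof(const void*));
  int32_t* data = (int32_t*)malloc((size_t)(n > 0 ? n : 1) * sizeof(int32_t));
  if (buffers == NULL || data == NULL) {
    free((void*)buffers);
    free(data);
    return -1;
  }
  if (n > 0) memcpy(data, values, (size_t)n * sizeof(int32_t));

  buffers[0] = NULL; /* the validity bitmap may be omitted when there are no nulls */
  buffers[1] = data;

  out->length = n;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = buffers;
  out->children = NULL;
  out->dictionary = NULL;
  out->private_data = NULL;
  out->release = &release_int32_array;
  return 0;
}
```

Whoever ends up owning the struct is expected to call `array.release(&array)` exactly once. This holds for arrays built with the nanoarrow helpers as well as for this hand-written version.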
A simple consumer example:
    #include <stdio.h>
    #include "nanoarrow.h"

    int print_simple_array(struct ArrowArray* array, struct ArrowSchema* schema) {
      struct ArrowError error;
      struct ArrowArrayView array_view;
      NANOARROW_RETURN_NOT_OK(ArrowArrayViewInitFromSchema(&array_view, schema, &error));

      if (array_view.storage_type != NANOARROW_TYPE_INT32) {
        printf("Array has storage that is not int32\n");
      }

      int result = ArrowArrayViewSetArray(&array_view, array, &error);
      if (result != NANOARROW_OK) {
        ArrowArrayViewReset(&array_view);
        return result;
      }

      for (int64_t i = 0; i < array->length; i++) {
        printf("%d\n", (int)ArrowArrayViewGetIntUnsafe(&array_view, i));
      }

      ArrowArrayViewReset(&array_view);
      return NANOARROW_OK;
    }
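To illustrate the kind of bookkeeping a consumer faces when reading the raw structure directly, here is a minimal sketch that sums the non-null values of an int32 array by hand. It assumes the `struct ArrowArray` layout and the validity bitmap convention (least-significant bit first) from the Arrow C Data Interface and columnar format specifications. `int32_element_is_valid` and `sum_int32_array_by_hand` are hypothetical names, not nanoarrow functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Field layout as published in the Arrow C Data Interface specification.
 * In real code this definition would come from nanoarrow.h instead. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical helper: returns 1 if logical element i is non-null, else 0. */
int int32_element_is_valid(const struct ArrowArray* array, int64_t i) {
  const uint8_t* validity = (const uint8_t*)array->buffers[0];
  if (validity == NULL) return 1; /* no bitmap means every element is valid */
  int64_t j = array->offset + i;  /* logical index -> physical index */
  return (validity[j / 8] >> (j % 8)) & 1;
}

/* Hypothetical consumer: sums the non-null elements of an int32 array. */
int64_t sum_int32_array_by_hand(const struct ArrowArray* array) {
  const int32_t* data = (const int32_t*)array->buffers[1];
  int64_t sum = 0;
  for (int64_t i = 0; i < array->length; i++) {
    if (int32_element_is_valid(array, i)) {
      sum += data[array->offset + i];
    }
  }
  return sum;
}
```

Note that both the bitmap lookup and the data lookup must add `array->offset`; forgetting it in either place is an easy mistake to make when working with sliced arrays by hand.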
CMake is the officially supported build system for nanoarrow. However, the Meson backend is an experimental feature you may also wish to try.
To run the test suite with Meson, you will want to first install the testing dependencies via the wrap database (n.b. no wrap database entry exists for Arrow - that must be installed separately).
mkdir subprojects
meson wrap install gtest
meson wrap install google-benchmark
meson wrap install nlohmann_json
The Arrow C++ library must also be discoverable via pkg-config to build tests.
You can then set up your build directory:
meson setup builddir
cd builddir
And configure your project (this could also have been done inline with `setup`):
meson configure -DNANOARROW_BUILD_TESTS=true -DNANOARROW_BUILD_BENCHMARKS=true
Note that if your Arrow pkg-config profile is installed in a non-standard location on your system, you may pass `--pkg-config-path <path to directory with arrow.pc>` to either the setup or configure steps above.
With the above out of the way, the `compile` command should take care of the rest:
meson compile
Upon a successful build you can execute the test suite and benchmarks with the following commands:
meson test nanoarrow: # default test run
meson test nanoarrow: --wrap valgrind # run tests under valgrind
meson test nanoarrow: --benchmark --verbose # run benchmarks