fix: Relax comparison strictness such that integration tests pass (#399)

These are the changes required for
https://github.com/apache/arrow/pull/39302 to result in passing
integration tests for nanoarrow. Most of them relate to
comparison:

- We needed an option to allow metadata to be compared on a key/value
basis without considering order (for Java, which seems to reorder
metadata on read)
- We needed the ability to treat NULL metadata and zero-size metadata as
equivalent (for Go, which always exports zero-length metadata)
- We needed an option to ignore flags for top-level batches (for C#,
which exports nullable structs)
- We needed to ensure that the last few bits of the validity buffer were
zeroed (for C#, although this is now fixed in C# on Arrow main)
- We needed to ensure that no buffers were NULL (for C#, which leaks the
top-level array if it encounters one, at least in the integration tests;
this should really be fixed in C#)
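
The validity-buffer item matters for comparison because a byte-wise check of two bitmaps would otherwise see differences in padding bits that carry no data. A minimal sketch of the idea follows, assuming a bitmap that starts at offset 0; the function name zero_trailing_validity_bits is hypothetical and is not taken from the nanoarrow sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Clear the unused high bits of the final byte of a validity bitmap.
 * Arrow bitmaps are least-significant-bit first: element i lives in
 * byte i / 8 at bit i % 8. Assumption: the bitmap starts at offset 0. */
void zero_trailing_validity_bits(uint8_t* bitmap, int64_t length) {
  int64_t used_bits = length % 8;
  if (bitmap == NULL || length <= 0 || used_bits == 0) {
    return; /* nothing to clear: no bitmap, or the last byte is fully used */
  }

  /* Keep only the low used_bits bits of the last byte */
  uint8_t keep_mask = (uint8_t)((1u << used_bits) - 1u);
  bitmap[length / 8] &= keep_mask;
}
```

For a 10-element array whose two bitmap bytes are both 0xFF, the call leaves the first byte untouched and reduces the second to 0x03.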
5 files changed
tree: f0e0dbe7ca419a2e5e25182c09961ceb3877f42f
  1. .github/
  2. ci/
  3. cmake/
  4. dev/
  5. dist/
  6. docs/
  7. examples/
  8. extensions/
  9. python/
  10. r/
  11. src/
  12. thirdparty/
  13. .asf.yaml
  14. .clang-format
  15. .cmake-format
  16. .env
  17. .flake8
  18. .gitattributes
  19. .gitignore
  20. .isort.cfg
  21. .pre-commit-config.yaml
  22. CHANGELOG.md
  23. CMakeLists.txt
  24. CMakePresets.json
  25. CMakeUserPresets.json.example
  26. docker-compose.yml
  27. LICENSE.txt
  28. meson.build
  29. meson.options
  30. NOTICE.txt
  31. README.md
  32. valgrind.supp
README.md

nanoarrow


The nanoarrow library is a set of helper functions to interpret and generate Arrow C Data Interface and Arrow C Stream Interface structures. The library is in active early development and users should update regularly from the main branch of this repository.

Whereas the current suite of Arrow implementations provides the basis for a comprehensive data analysis toolkit, this library is intended to support clients that wish to produce or interpret Arrow C Data and/or Arrow C Stream structures where linking to a higher level Arrow binding is difficult or impossible.
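
For context, the structures in question are plain C structs defined by the Arrow C Data Interface specification. The sketch below fills one in by hand for an int32 array without nulls, which is roughly the kind of bookkeeping a helper library takes over. The struct layout follows the specification; export_int32_array and release_int32_array are hypothetical names used only for illustration and are not part of nanoarrow.

```c
#include <stdint.h>
#include <stdlib.h>

/* Struct layout as given by the Arrow C Data Interface specification */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Release callback: frees what export_int32_array() allocated and
 * marks the structure as released by setting release to NULL. */
static void release_int32_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]);
  free(array->buffers);
  array->release = NULL;
}

/* Hypothetical helper: copy n values into a new int32 array with no nulls.
 * Returns 0 on success or -1 if allocation fails. */
int export_int32_array(const int32_t* values, int64_t n, struct ArrowArray* out) {
  const void** buffers = malloc(2 * sizeof(void*));
  int32_t* data = malloc((size_t)(n > 0 ? n : 1) * sizeof(int32_t));
  if (buffers == NULL || data == NULL) {
    free(buffers);
    free(data);
    return -1;
  }

  for (int64_t i = 0; i < n; i++) {
    data[i] = values[i];
  }

  buffers[0] = NULL; /* validity bitmap may be omitted when null_count is 0 */
  buffers[1] = data; /* data buffer */

  out->length = n;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = buffers;
  out->children = NULL;
  out->dictionary = NULL;
  out->release = &release_int32_array;
  out->private_data = NULL;
  return 0;
}
```

A consumer reads the values through buffers[1] and calls array.release(&array) exactly once when it is finished with the array.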

Using the C library

The nanoarrow C library is intended to be copied and vendored. This can be done using CMake or by using the bundled nanoarrow.h/nanoarrow.c distribution available in the dist/ directory in this repository. Examples of both can be found in the examples/ directory in this repository.

A simple producer example:

#include "nanoarrow.h"

int make_simple_array(struct ArrowArray* array_out, struct ArrowSchema* schema_out) {
  struct ArrowError error;
  array_out->release = NULL;
  schema_out->release = NULL;

  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(array_out, NANOARROW_TYPE_INT32));

  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(array_out));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 1));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 2));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 3));
  NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuildingDefault(array_out, &error));

  NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema_out, NANOARROW_TYPE_INT32));

  return NANOARROW_OK;
}

A simple consumer example:

#include <errno.h>
#include <stdio.h>

#include "nanoarrow.h"

int print_simple_array(struct ArrowArray* array, struct ArrowSchema* schema) {
  struct ArrowError error;
  struct ArrowArrayView array_view;
  NANOARROW_RETURN_NOT_OK(ArrowArrayViewInitFromSchema(&array_view, schema, &error));

  // Stop here rather than reading non-int32 storage as integers
  if (array_view.storage_type != NANOARROW_TYPE_INT32) {
    printf("Array has storage that is not int32\n");
    ArrowArrayViewReset(&array_view);
    return EINVAL;
  }

  int result = ArrowArrayViewSetArray(&array_view, array, &error);
  if (result != NANOARROW_OK) {
    ArrowArrayViewReset(&array_view);
    return result;
  }

  for (int64_t i = 0; i < array->length; i++) {
    printf("%d\n", (int)ArrowArrayViewGetIntUnsafe(&array_view, i));
  }

  ArrowArrayViewReset(&array_view);
  return NANOARROW_OK;
}

Building with Meson

CMake is the officially supported build system for nanoarrow. However, the Meson backend is an experimental feature you may also wish to try.

To run the test suite with Meson, you will want to first install the testing dependencies via the wrap database (n.b. no wrap database entry exists for Arrow - that must be installed separately).

mkdir subprojects
meson wrap install gtest
meson wrap install google-benchmark
meson wrap install nlohmann_json

The Arrow C++ library must also be discoverable via pkg-config to build the tests.

You can then set up your build directory:

meson setup builddir
cd builddir

Then configure your project (this could also have been done inline with setup):

meson configure -DNANOARROW_BUILD_TESTS=true -DNANOARROW_BUILD_BENCHMARKS=true

Note that if your Arrow pkg-config profile is installed in a non-standard location on your system, you may pass --pkg-config-path <path to directory with arrow.pc> to either the setup or configure steps above.

With the above out of the way, the compile command should take care of the rest:

meson compile

Upon a successful build you can execute the test suite and benchmarks with the following commands:

meson test nanoarrow:  # default test run
meson test nanoarrow: --wrap valgrind  # run tests under valgrind
meson test nanoarrow: --benchmark --verbose # run benchmarks