Helpers for Arrow C Data & Arrow C Stream interfaces

nanoarrow

The nanoarrow libraries are a set of helpers to produce and consume Arrow data, including the Arrow C Data, Arrow C Stream, and Arrow C Device structures and the serialized Arrow IPC format. The vision of nanoarrow is that it should be trivial for libraries to produce and consume Arrow data: it helps fulfill this vision by providing high-quality, easy-to-adopt helpers to produce, consume, and test Arrow data types and arrays.

The nanoarrow libraries were built to be:

  • Small: nanoarrow’s C runtime compiles into a few hundred kilobytes and its R and Python bindings both have an installed size of ~1 MB.
  • Easy to depend on: nanoarrow's C library is distributed as two files (nanoarrow.c and nanoarrow.h) and its R and Python bindings have zero dependencies.
  • Useful: The Arrow Columnar Format includes a wide range of data type and data encoding options. To the greatest extent practicable, nanoarrow strives to support the entire Arrow columnar specification (see the Arrow implementation status page for implementation status).

Getting started

The nanoarrow Python bindings are available from PyPI and conda-forge:

pip install nanoarrow
conda install nanoarrow -c conda-forge

The nanoarrow R package is available from CRAN:

install.packages("nanoarrow")

See the nanoarrow Documentation for extended tutorials and API reference for the C, C++, Python, and R libraries.

The nanoarrow GitHub repository additionally provides a number of examples covering how to use nanoarrow in a variety of build configurations.

Development

Building with CMake

CMake is the primary build system used to develop and test the nanoarrow C library. You can build nanoarrow with:

mkdir build && cd build
cmake ..
cmake --build .

Building nanoarrow with tests currently requires Arrow C++. If Arrow C++ is installed via a system package manager like apt, dnf, or brew, the tests can be built with:

mkdir build && cd build
cmake .. -DNANOARROW_BUILD_TESTS=ON
cmake --build .

Tests can be run with ctest.
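
As a minimal sketch of the ctest step, the helper below runs the tests from a given build directory and prints a test's output only when that test fails. The function name run_nanoarrow_tests is hypothetical (it is not part of nanoarrow), and it assumes it is called from the repository root rather than from inside the build directory.

```shell
# Hypothetical helper (not part of nanoarrow): run the test suite
# from a CMake build directory, defaulting to ./build.
run_nanoarrow_tests() {
  build_dir="${1:-build}"
  if [ ! -d "$build_dir" ]; then
    echo "build directory '$build_dir' not found; run the CMake steps first" >&2
    return 1
  fi
  # --output-on-failure prints a test's output only when that test fails
  (cd "$build_dir" && ctest --output-on-failure)
}
```

For example, run_nanoarrow_tests build runs the suite in the build directory created by the commands above.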

Building with Meson

CMake is the officially supported build system for nanoarrow. However, the Meson backend is an experimental feature you may also wish to try.

To run the test suite with Meson, you will want to first install the testing dependencies via the wrap database (n.b. no wrap database entry exists for Arrow - that must be installed separately).

mkdir subprojects
meson wrap install gtest
meson wrap install google-benchmark
meson wrap install nlohmann_json

The Arrow C++ library must also be discoverable via pkg-config to build tests.
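
One way to check discoverability before configuring is to query pkg-config directly. This is a sketch only: the function name check_arrow_pkgconfig is hypothetical, and the module name arrow is assumed from the arrow.pc file that Arrow C++ installs.

```shell
# Hypothetical helper: report whether pkg-config can locate Arrow C++.
# Assumes the pkg-config module is named "arrow" (i.e. arrow.pc).
check_arrow_pkgconfig() {
  if pkg-config --exists arrow; then
    echo "Arrow C++ found: $(pkg-config --modversion arrow)"
  else
    echo "Arrow C++ not found by pkg-config"
  fi
}

check_arrow_pkgconfig
```

If the "not found" message is printed even though Arrow C++ is installed, the directory containing arrow.pc is probably not on the pkg-config search path.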

You can then set up your build directory:

meson setup builddir
cd builddir

Then configure your project (this could also have been done inline with setup):

meson configure -DNANOARROW_BUILD_TESTS=true -DNANOARROW_BUILD_BENCHMARKS=true

Note that if your Arrow pkg-config profile is installed in a non-standard location on your system, you may pass the --pkg-config-path <path to directory with arrow.pc> option to either the setup or configure steps above.

With the above out of the way, the compile command should take care of the rest:

meson compile

Upon a successful build you can execute the test suite and benchmarks with the following commands:

meson test nanoarrow:  # default test run
meson test nanoarrow: --wrap valgrind  # run tests under valgrind
meson test nanoarrow: --benchmark --verbose  # run benchmarks