The nanoarrow Python package provides bindings to the nanoarrow C library. Like the nanoarrow C library, it provides tools to facilitate the use of the Arrow C Data and Arrow C Stream interfaces.
Python bindings for nanoarrow are not yet available on PyPI. You can install via URL (requires a C compiler):
python -m pip install "https://github.com/apache/arrow-nanoarrow/archive/refs/heads/main.zip#egg=nanoarrow&subdirectory=python"
If you can import the namespace, you're good to go!
import nanoarrow as na
The Arrow C Data and Arrow C Stream interfaces consist of three structures: the ArrowSchema, which represents the data type of an array; the ArrowArray, which represents the values of an array; and the ArrowArrayStream, which represents zero or more ArrowArrays with a common ArrowSchema. All three can be wrapped by Python objects using the nanoarrow Python package.
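To make the first of these structures more concrete, here is a minimal ctypes sketch of the ArrowSchema layout as laid out in the Arrow C Data interface specification. It only illustrates the struct fields; it is not how nanoarrow wraps the structure, and the field values used below are made up for the example.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """Illustrative mirror of the C struct; not part of nanoarrow."""


# Fields are assigned after class creation because the struct refers to itself.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),       # type encoded as a format string
    ("name", ctypes.c_char_p),         # optional field name
    ("metadata", ctypes.c_char_p),     # optional binary-encoded metadata
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))),
    ("private_data", ctypes.c_void_p),
]

# Example values only: "i" is the format string for a 32-bit integer.
schema = ArrowSchema(format=b"i", name=b"some_column", n_children=0)
print(schema.format.decode(), schema.name.decode(), schema.n_children)
# prints: i some_column 0
```

The release member is the callback that a consumer invokes when it is done with the structure, which is what allows wrapper objects to clean up after themselves.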
Use nanoarrow.schema() to convert a data type-like object to an ArrowSchema. This is currently only implemented for pyarrow objects.
import pyarrow as pa
schema = na.schema(pa.decimal128(10, 3))
You can extract the fields of a Schema object one at a time or parse it into a view to extract deserialized parameters.
print(schema.format)
print(schema.view().decimal_precision)
print(schema.view().decimal_scale)
d:10,3
10
3
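The format string printed above follows the Arrow C Data interface convention for decimals: d:precision,scale, with an optional third bit width parameter that defaults to 128. As a rough illustration of what "deserialized parameters" means, the following pure-Python sketch decodes such a string. The function name parse_decimal_format is hypothetical and this is not nanoarrow's implementation.

```python
def parse_decimal_format(fmt):
    """Split an Arrow decimal format string into (precision, scale, bitwidth).

    Hypothetical helper for illustration; nanoarrow does this in C.
    """
    type_code, _, params = fmt.partition(":")
    if type_code != "d":
        raise ValueError(f"not a decimal format string: {fmt!r}")

    parts = [int(p) for p in params.split(",")]
    if len(parts) == 2:
        precision, scale = parts
        bitwidth = 128  # the bit width is omitted for decimal128
    elif len(parts) == 3:
        precision, scale, bitwidth = parts
    else:
        raise ValueError(f"unexpected decimal parameters: {params!r}")
    return precision, scale, bitwidth


print(parse_decimal_format("d:10,3"))
# prints: (10, 3, 128)
```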
The nanoarrow.schema() helper is currently only implemented for pyarrow objects. If your data type has an _export_to_c()-like function, you can instead allocate an empty ArrowSchema and pass its address to that function:
schema = na.Schema.allocate()
pa.int32()._export_to_c(schema._addr())
schema.view().type
'int32'
The Schema object cleans up after itself: when the object is deleted, the underlying ArrowSchema is released.
You can use nanoarrow.array() to convert an array-like object to a nanoarrow.Array, optionally attaching a Schema that can be used to interpret its contents. This is currently only implemented for pyarrow objects.
array = na.array(pa.array(["one", "two", "three", None]))
As with the Schema, you can inspect an Array by extracting fields individually:
print(array.length)
print(array.null_count)
4
1
You can also parse the Array/Schema combination into a view whose contents are more readily accessible:
import numpy as np
view = array.view()
[np.array(buffer) for buffer in view.buffers]
[array([7], dtype=uint8),
 array([ 0,  3,  6, 11, 11], dtype=int32),
 array([b'o', b'n', b'e', b't', b'w', b'o', b't', b'h', b'r', b'e', b'e'], dtype='|S1')]
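The three buffers shown above are the validity bitmap, the int32 offsets, and the UTF-8 data of a string array. As an illustration of how they fit together, the following pure-Python sketch rebuilds the original values from hand-copied buffer contents. The helper name decode_utf8_array is hypothetical and is not part of nanoarrow.

```python
import struct

# Buffer contents copied from the output above.
validity = bytes([7])                          # 0b00000111: first three slots valid
offsets = struct.pack("<5i", 0, 3, 6, 11, 11)  # length + 1 little-endian int32s
data = b"onetwothree"


def decode_utf8_array(validity, offsets, data, length):
    """Rebuild Python values from the three buffers of a string array."""
    offs = struct.unpack(f"<{length + 1}i", offsets)
    values = []
    for i in range(length):
        # Validity bits are packed least-significant bit first.
        is_valid = (validity[i // 8] >> (i % 8)) & 1
        if is_valid:
            values.append(data[offs[i]:offs[i + 1]].decode("utf-8"))
        else:
            values.append(None)
    return values


decoded = decode_utf8_array(validity, offsets, data, 4)
print(decoded)
# prints: ['one', 'two', 'three', None]
```

The null element contributes no bytes to the data buffer, which is why the last two offsets are both 11.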
As with the Schema, you can allocate an empty Array and access its address with _addr() to pass to other array-exporting functions.
array = na.Array.allocate(na.Schema.allocate())
pa.array([1, 2, 3])._export_to_c(array._addr(), array.schema._addr())
array.length
3
You can use nanoarrow.array_stream() to convert an object representing a sequence of Arrays with a common Schema to a nanoarrow.ArrayStream. This is currently only implemented for pyarrow objects.
pa_array_child = pa.array([1, 2, 3], pa.int32())
pa_array = pa.record_batch([pa_array_child], names=["some_column"])
reader = pa.RecordBatchReader.from_batches(pa_array.schema, [pa_array])
array_stream = na.array_stream(reader)
You can pull the next array from the stream using .get_next() or use it like an iterator. The .get_next() method will return None when there are no more arrays in the stream.
print(array_stream.get_schema())
for array in array_stream:
    print(array.length)
print(array_stream.get_next() is None)
struct<some_column: int32>
3
True
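The relationship between .get_next() returning None and iteration can be sketched in pure Python. The FakeArrayStream class below is a hypothetical stand-in that uses plain lists in place of arrays. It only mimics the behavior described above and makes no claim about how nanoarrow.ArrayStream is implemented.

```python
class FakeArrayStream:
    """Hypothetical stand-in for a stream of arrays; not part of nanoarrow."""

    def __init__(self, batches):
        self._batches = iter(batches)

    def get_next(self):
        # Return the next batch, or None once the stream is exhausted.
        return next(self._batches, None)

    def __iter__(self):
        # iter(callable, sentinel) calls get_next() until it returns None.
        return iter(self.get_next, None)


stream = FakeArrayStream([[1, 2, 3], [4, 5]])
lengths = [len(batch) for batch in stream]
after = stream.get_next()

print(lengths)
# prints: [3, 2]
print(after is None)
# prints: True
```

Because both access styles consume the same underlying source, a stream that has been iterated to completion has nothing left for a later .get_next() call.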
You can also get the address of a freshly-allocated stream to pass to a suitable exporting function:
array_stream = na.ArrayStream.allocate()
reader._export_to_c(array_stream._addr())
array_stream.get_schema()
struct<some_column: int32>
Python bindings for nanoarrow are managed with setuptools. This means you can build the project using:
git clone https://github.com/apache/arrow-nanoarrow.git
cd arrow-nanoarrow/python
pip install -e .
Tests use pytest:
# Install dependencies
pip install -e .[test]

# Run tests
pytest -vvx