Native Rust implementation of Apache Arrow

Coverage Status

This crate contains a native Rust implementation of the Arrow columnar format. It uses nightly Rust.

Developer's guide

Refer to lib.rs for an introduction to this crate and current functionality.

How to run the tests

The tests of this crate depend on two environment variables to be defined. Assuming that you are in this crates' current directory:

export PARQUET_TEST_DATA=../../cpp/submodules/parquet-testing/data
export ARROW_TEST_DATA=../../testing/data
cargo test

runs all the tests.

Examples

The examples folder shows how to construct some different types of Arrow arrays, including dynamic arrays created at runtime.

Examples can be run using the cargo run --example command. For example:

cargo run --example builders
cargo run --example dynamic_types
cargo run --example read_csv

IPC

The IPC flatbuffer code was generated by running this command from the root of the project, using flatc version 1.10.0:

./regen.sh

The above script will run the flatc compiler and perform some adjustments to the source code:

  • Replace type__ with type_
  • Remove org::apache::arrow::flatbuffers namespace
  • Add includes to each generated file

Features

Arrow uses the following features:

  • simd - Arrow uses the packed_simd crate to optimize many of the implementations in the compute module using SIMD intrinsics. These optimizations are turned off by default.
  • flight which contains useful functions to convert between the Flight wire format and Arrow data
  • prettyprint which is a utility for printing record batches

Other than simd all the other features are enabled by default. Disabling prettyprint might be necessary in order to compile Arrow to the wasm32-unknown-unknown WASM target.

Publishing to crates.io

An Arrow committer can publish this crate after an official project release has been made to crates.io using the following instructions.

Follow these instructions to create an account and login to crates.io before asking to be added as an owner of the arrow crate.

Checkout the tag for the version to be released. For example:

git checkout apache-arrow-0.11.0

If the Cargo.toml in this tag already contains version = "0.11.0" (as it should) then the crate can be published with the following command:

cargo publish

If the Cargo.toml does not have the correct version then it will be necessary to modify it manually. Since there is now a modified file locally that is not committed to GitHub it will be necessary to use the following command.

cargo publish --allow-dirty