We recently released the 6.0.0 Rust version of Apache Arrow, which coincides with the [Arrow 6.0.0 release]({{ site.baseurl }}/6.0.0.html). This post highlights some of the improvements in the Rust implementation. The full changelog can be found here.
The Rust Arrow implementation would not be possible without the wonderful work and support of our community, and the 6.0.0 release is no exception. It includes 99 commits from 35 individual contributors, many of them with their first contribution. Thank you all very much.
Highlighted features and changes between release 5.0.0 and this release are:
- sort() for BinaryArray
- Replacement of ArrayData::new() with ArrayData::try_new() and unsafe ArrayData::new_unchecked

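To illustrate the idea behind pairing a validating `try_new()` with an `unsafe` `new_unchecked`, here is a minimal sketch that uses only the standard library. `RawInts` and `encode` are hypothetical names invented for this example; they are not Arrow's `ArrayData` and do not mirror its actual signature. The sketch only shows the general pattern: the checked constructor verifies invariants and returns a `Result`, while the unchecked one trusts the caller.

```rust
/// Hypothetical, simplified stand-in for raw array data (NOT Arrow's `ArrayData`).
#[derive(Debug, PartialEq)]
pub struct RawInts {
    len: usize,
    bytes: Vec<u8>,
}

impl RawInts {
    /// Checked constructor: verifies the buffer can hold `len` 4-byte values.
    pub fn try_new(len: usize, bytes: Vec<u8>) -> Result<Self, String> {
        let needed = len
            .checked_mul(4)
            .ok_or_else(|| "length overflow".to_string())?;
        if bytes.len() < needed {
            return Err(format!(
                "buffer has {} bytes, but {} are needed",
                bytes.len(),
                needed
            ));
        }
        Ok(Self { len, bytes })
    }

    /// Unchecked constructor: skips validation entirely.
    ///
    /// # Safety
    /// The caller must guarantee that `bytes.len() >= len * 4`.
    pub unsafe fn new_unchecked(len: usize, bytes: Vec<u8>) -> Self {
        Self { len, bytes }
    }

    /// Reads value `i` as a little-endian i32. This sketch keeps bounds-checked
    /// indexing; an optimized implementation might rely on the constructor's
    /// invariant instead, which is why the unchecked path is marked `unsafe`.
    pub fn value(&self, i: usize) -> i32 {
        assert!(i < self.len, "index out of bounds");
        let s = i * 4;
        i32::from_le_bytes([
            self.bytes[s],
            self.bytes[s + 1],
            self.bytes[s + 2],
            self.bytes[s + 3],
        ])
    }
}

/// Hypothetical helper: encodes i32 values as little-endian bytes.
pub fn encode(values: &[i32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

fn main() {
    let bytes = encode(&[7, -1]);

    // Validation succeeds: two values fit in eight bytes.
    let checked = RawInts::try_new(2, bytes.clone()).expect("valid buffer");
    println!("{:?}", checked.value(1));

    // Validation fails: three values would need twelve bytes.
    println!("{:?}", RawInts::try_new(3, bytes.clone()));

    // The caller takes responsibility for the invariant here.
    let unchecked = unsafe { RawInts::new_unchecked(2, bytes) };
    println!("{:?}", unchecked == checked);
}
```

In this pattern most callers would use the checked constructor, and the `unsafe` variant is reserved for code paths where the data is already known to be valid and the cost of re-validation matters.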
Of course, this release also contains bug fixes, performance improvements, and improved documentation examples. For the full list of changes, please consult the changelog.
Arrow releases major versions every three months. The Rust implementation follows this major release cycle, and additionally releases minor version updates approximately every other week to speed the flow of new features and fixes.
You can always find the latest releases on crates.io: arrow, parquet, arrow-flight, and parquet-derive.
DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on top of Arrow. Ballista is a distributed compute platform. These projects are now in their own repository, and are no longer released in lock-step with Arrow.
Memory usage during sorting has been improved by the pull request resolving issue 553. The following example demonstrates how to sort an array.
```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int32Array};
use arrow::compute::sort;

fn main() {
    // Build an Int32Array and wrap it as a dynamically typed ArrayRef.
    let array: ArrayRef = Arc::new(Int32Array::from(vec![5, 4, 23, 1, 20, 2]));
    println!("{:?}", array);

    // Passing `None` sorts with the default sort options.
    let sorted_array = sort(&array, None).unwrap();
    println!("{:?}", sorted_array);
}
```
For further examples, see the source code.
Here are some of the initiatives that contributors are currently working on for future releases:
- UnionArray support to follow the latest Arrow spec

Again, thank you to all the contributors for this release. Here is the raw git listing:
```
    23  Andrew Lamb
     8  Ben Chambers
     8  Navin
     7  Jiayu Liu
     5  Wakahisa
     4  Ruihang Xia
     3  Daniël Heres
     3  Matthew Turner
     3  Sumit
     2  Boaz
     2  Chojan Shang
     2  Ilya Biryukov
     2  Krisztián Szűcs
     2  Markus Westerlind
     2  Roee Shlomo
     2  Sergii Mikhtoniuk
     2  Wang Fenjin
     2  baishen
     1  Carol (Nichols || Goulding)
     1  Christian Williams
     1  Felix Yan
     1  Jorge Leitao
     1  Kornelijus Survila
     1  Matthew Zeitlin
     1  Mike Seddon
     1  Mykhailo Osypov
     1  Neal Richardson
     1  Pete Koomen
     1  QP Hou
     1  Richard
     1  Xavier Lange
     1  Yuan Zhou
     1  aiglematth
     1  mathiaspeters-sig
     1  msalib
```
If you are interested in contributing to the Rust implementation of Apache Arrow, we would love to have you! You can help by trying out Arrow on your own data and projects, filing bug reports, and contributing to the documentation, tests, or code. A list of open issues suitable for beginners is here and the full list is here.