We recently celebrated releasing version 16.0.0 of the Rust implementation of Apache Arrow. While we still get a few comments asking “most Rust libraries use versions 0.x.0, why are you at 16.0.0?”, our versioning scheme appears to be working well: it permits quick releases of new features and API evolution in a semver-compatible way without breaking downstream projects.
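The semver argument rests on Cargo's default "caret" version requirements. A requirement such as 16.0.0 accepts any later 16.x.y release but never 17.0.0, while for 0.x versions the minor component plays the role of the major one. The minimal sketch below models those rules for plain major.minor.patch triples. The `Version` type and `caret_matches` function are hypothetical illustrations, not part of Cargo or any crate's API, and pre-release tags and partial requirements such as "0.4" are deliberately ignored.

```rust
/// Hypothetical version triple; pre-release and build metadata are ignored.
/// Field order matters: the derived Ord compares major, then minor, then patch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl Version {
    pub fn new(major: u64, minor: u64, patch: u64) -> Self {
        Version { major, minor, patch }
    }
}

/// Sketch of Cargo's caret rule `^req`: `candidate` must be >= `req`, and the
/// leftmost non-zero component of `req` must match exactly.
pub fn caret_matches(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        // e.g. ^16.0.0 means >=16.0.0, <17.0.0
        candidate.major == req.major
    } else if req.minor > 0 {
        // e.g. ^0.4.2 means >=0.4.2, <0.5.0
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        // e.g. ^0.0.3 means >=0.0.3, <0.0.4
        candidate.major == 0 && candidate.minor == 0 && candidate.patch == req.patch
    }
}
```

Under these rules, a downstream crate that depends on arrow = "16.0.0" picks up later 16.x releases automatically, but only moves to 17.0.0, which may contain breaking API changes, when its authors update the requirement.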
This post contains highlights from the last four months (versions 10.0.0 to 16.0.0) of arrow-rs and parquet-rs development as well as a roadmap of future work. The full list of awesomeness can be found in the CHANGELOG.
As you may remember, the arrow and parquet implementations live in the same repository, share the same release schedule, and are covered in the same blog post. This is not for technical reasons, but it helps keep the maintenance burden of delivering great Apache software reasonable, and it allows easier development of optimized conversion between the Arrow and Parquet formats.
The parquet crate has seen a return to substantial improvements after being relatively dormant for several years. The current major areas of focus include reading asynchronously from remote object stores.

Some Major Highlights:
The writer now uses std::io::Write rather than a custom ParquetWriter trait, making it more interoperable with the rest of the Rust ecosystem.
The projection API is easier to use with nested types.
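Because the writer targets std::io::Write, the same writing code works for files, in-memory buffers, or network streams. The sketch below is a hypothetical toy writer, not the parquet crate's API (`ToyFileWriter` and `metadata_len_from_footer` are illustrative names), that is generic over `W: Write` and follows only Parquet's outer framing: the 4-byte `PAR1` magic, the data, a metadata blob, the metadata length as a 4-byte little-endian integer, and `PAR1` again. Real Parquet metadata is Thrift-encoded; here it is treated as opaque bytes.

```rust
use std::io::{self, Write};

/// Parquet files begin and end with this 4-byte magic number.
const MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical toy writer, NOT the parquet crate's API: it only mimics
/// Parquet's outer framing and works with any `W: Write` sink.
pub struct ToyFileWriter<W: Write> {
    sink: W,
}

impl<W: Write> ToyFileWriter<W> {
    /// Writes the leading magic and takes ownership of the sink.
    pub fn try_new(mut sink: W) -> io::Result<Self> {
        sink.write_all(MAGIC)?;
        Ok(ToyFileWriter { sink })
    }

    /// Appends an opaque chunk of (already encoded) data.
    pub fn write_chunk(&mut self, data: &[u8]) -> io::Result<()> {
        self.sink.write_all(data)
    }

    /// Writes the metadata, its length as a 4-byte little-endian integer and
    /// the trailing magic, then hands the sink back to the caller.
    pub fn close(mut self, metadata: &[u8]) -> io::Result<W> {
        if metadata.len() > u32::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "metadata too large"));
        }
        self.sink.write_all(metadata)?;
        self.sink.write_all(&(metadata.len() as u32).to_le_bytes())?;
        self.sink.write_all(MAGIC)?;
        self.sink.flush()?;
        Ok(self.sink)
    }
}

/// Reads the metadata length from the final 8 bytes of a file, or returns
/// `None` if the trailing magic is missing.
pub fn metadata_len_from_footer(footer: &[u8; 8]) -> Option<u32> {
    if footer[4..] != MAGIC[..] {
        return None;
    }
    let mut len = [0u8; 4];
    len.copy_from_slice(&footer[..4]);
    Some(u32::from_le_bytes(len))
}
```

Writing into a `Vec<u8>` and into a `std::fs::File` uses exactly the same code. The footer helper also hints at why reading from remote storage is practical: a reader only needs the last 8 bytes of the file to learn how much metadata to fetch, and can then request just the byte ranges it needs.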
The Rust arrow implementation has also seen substantial enhancements, in addition to bug fixes and performance improvements.

Some Major Highlights:
Continued work on the soundness of the safe APIs; check out the README and the module-level rustdocs for more details. Among other things, we have added additional validation checks to the string kernels and DecimalArrays, and sealed some sensitive traits.
… arrow-rs, thanks to @HaoYang670.

Looking Forward:
Some unsafe code will likely always be required in arrow (for fast IPC, for example), but we are also working to improve the underlying ArrayData structure to make it more compatible with the ecosystem (e.g. using Bytes), to support faster decoding from parquet, and to avoid bugs related to offsets (slicing), which are a frequent pain point.

There are also several areas looking for help.
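The string validation and the offset (slicing) pitfall mentioned above can be illustrated with a simplified model of an Arrow-style UTF-8 array. In the sketch below, `ToyStringArray` is a hypothetical type, not arrow-rs's ArrayData or StringArray: it validates its offsets and values once at construction, shares the buffers between slices via `Rc`, and stores a logical offset that every element access must add.

```rust
use std::rc::Rc;

/// Hypothetical, simplified model of an Arrow-style UTF-8 array (not an arrow-rs type).
pub struct ToyStringArray {
    offsets: Rc<[i32]>, // one more entry than the number of elements in the full data
    values: Rc<[u8]>,   // concatenated UTF-8 bytes of all elements
    offset: usize,      // first element visible in this (possibly sliced) view
    len: usize,         // number of elements visible in this view
}

impl ToyStringArray {
    /// Validates the buffers once, so later accesses can rely on them.
    pub fn try_new(offsets: Vec<i32>, values: Vec<u8>) -> Result<Self, String> {
        if offsets.is_empty() {
            return Err("offsets must contain at least one entry".to_string());
        }
        if offsets[0] < 0 {
            return Err("first offset must be non-negative".to_string());
        }
        if offsets.windows(2).any(|w| w[1] < w[0]) {
            return Err("offsets must be non-decreasing".to_string());
        }
        // All offsets are now non-negative, so the casts below cannot wrap.
        if offsets[offsets.len() - 1] as usize > values.len() {
            return Err("last offset exceeds the values buffer".to_string());
        }
        for w in offsets.windows(2) {
            if std::str::from_utf8(&values[w[0] as usize..w[1] as usize]).is_err() {
                return Err("element is not valid UTF-8".to_string());
            }
        }
        let len = offsets.len() - 1;
        Ok(ToyStringArray {
            offsets: Rc::from(offsets),
            values: Rc::from(values),
            offset: 0,
            len,
        })
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Zero-copy slice: shares both buffers and only adjusts `offset` and `len`.
    pub fn slice(&self, offset: usize, len: usize) -> ToyStringArray {
        assert!(offset + len <= self.len, "slice out of bounds");
        ToyStringArray {
            offsets: Rc::clone(&self.offsets),
            values: Rc::clone(&self.values),
            offset: self.offset + offset,
            len,
        }
    }

    /// Returns element `i` of this view.
    pub fn value(&self, i: usize) -> &str {
        assert!(i < self.len, "index out of bounds");
        // Forgetting to add `self.offset` here is the classic slicing bug:
        // it silently returns the wrong element once the array is sliced.
        let start = self.offsets[self.offset + i] as usize;
        let end = self.offsets[self.offset + i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).expect("validated in try_new")
    }
}
```

A performance-oriented implementation would typically skip the per-access UTF-8 re-check with an unchecked conversion, which is only sound because of the upfront validation; this sketch keeps the checked conversion to stay entirely in safe Rust.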
While some open source software can be created mostly by a single contributor, we believe the greatest software, with the largest impact and reach, is built around its community. That is why Arrow is part of the Apache Software Foundation, and our releases, past and present, are the result of our amazing community's effort.
We would like to thank everyone who has contributed to the arrow-rs repository since the 9.0.2 release. Keep up the great work; we look forward to continued improvements:
    git shortlog -sn 9.0.0..16.0.0
        47  Liang-Chi Hsieh
        45  Raphael Taylor-Davies
        43  Andrew Lamb
        40  Remzi Yang
         8  Sergey Glushchenko
         7  Jörn Horstmann
         6  Shani Solomon
         6  dependabot[bot]
         5  Yang Jiang
         4  jakevin
         4  Chao Sun
         4  Yijie Shen
         3  kazuhiko kikuchi
         2  Sumit
         2  Ismail-Maj
         2  Kamil Konior
         2  tfeda
         2  Matthew Turner
         1  iyupeng
         1  ryan-jacobs1
         1  Alex Qyoun-ae
         1  tjwilson90
         1  Andy Grove
         1  Atef Sawaed
         1  Daniël Heres
         1  DuRipeng
         1  Helgi Kristvin Sigurbjarnarson
         1  Kun Liu
         1  Kyle Barron
         1  Marc Garcia
         1  Peter C. Jentsch
         1  Remco Verhoef
         1  Sven Cattell
         1  Thomas Peiselt
         1  Tiphaine Ruy
         1  Trent Feda
         1  Wang Fenjin
         1  Ze'ev Maor
         1  diana
If you are interested in contributing to the Rust subproject in Apache Arrow, you can find a list of open issues suitable for beginners here and the full list here.
Other ways to get involved include trying out Arrow on some of your data, filing bug reports, and helping to improve the documentation.