We recently released version 5.0.0 of the Rust implementation of Apache Arrow, which coincides with the Arrow 5.0.0 release. This post highlights some of the improvements in the Rust implementation. The full changelog can be found here.
The Rust Arrow implementation would not be possible without the wonderful work and support of our community, and the 5.0.0 release is no exception. It includes 161 commits from 34 individual contributors, many of them making their first contribution. Thank you all very much.
Feature-wise, this release adds:
RecordBatches.
We continue to leverage the Rust ecosystem to deliver reliable and performant code. We made significant progress towards running the Rust test suite under the Miri checker (a sort of Valgrind for Rust) to detect memory access violations, and we expect it to be fully enabled in CI for the 5.1.0 release.
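To make the kind of check concrete, here is a minimal sketch of the sort of unsafe buffer access that Miri is designed to examine. It is not taken from the Arrow codebase: the function names `read_unchecked` and `read_checked` are hypothetical and exist only for illustration. The safe wrapper validates the index before delegating to the raw-pointer read, which is the pattern a memory checker helps keep honest.

```rust
/// Hypothetical helper: reads the `i`-th native-endian `u32` from a byte
/// buffer without any bounds check.
///
/// # Safety
/// The caller must guarantee that `(i + 1) * 4 <= bytes.len()`.
unsafe fn read_unchecked(bytes: &[u8], i: usize) -> u32 {
    unsafe {
        // Byte buffers carry no alignment guarantee for `u32`, so use
        // `read_unaligned` rather than dereferencing the pointer directly.
        let ptr = bytes.as_ptr().add(i * 4) as *const u32;
        ptr.read_unaligned()
    }
}

/// Safe wrapper: returns `None` when index `i` is out of range.
fn read_checked(bytes: &[u8], i: usize) -> Option<u32> {
    // Overflow-safe computation of the end offset of element `i`.
    let end = i.checked_add(1)?.checked_mul(4)?;
    if end <= bytes.len() {
        // SAFETY: the range `[i * 4, end)` was just verified to be in bounds.
        Some(unsafe { read_unchecked(bytes, i) })
    } else {
        None
    }
}

fn main() {
    // Build a buffer holding the values 1, 2, 3 as native-endian `u32`s.
    let mut bytes: Vec<u8> = Vec::new();
    for v in [1u32, 2, 3].iter() {
        bytes.extend_from_slice(&v.to_ne_bytes());
    }

    assert_eq!(read_checked(&bytes, 1), Some(2));
    assert_eq!(read_checked(&bytes, 3), None);
}
```

If the bounds check in `read_checked` were wrong, a test exercising it under `cargo +nightly miri test` would be expected to report the out-of-bounds read at the point where it happens, rather than letting it silently return garbage.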
Of course, this release also contains bug fixes, performance improvements, and improved documentation examples. For the full list of changes, please consult the changelog.
The parquet-derive crate now automatically derives the required parquet schema, and the parquet crate had several bug fixes and enhancements.
Arrow releases major versions every three months. The Rust implementation has been experimenting with releasing minor version updates to speed the flow of new features and fixes. By implementing a new development process, as described in A New Development Workflow for Arrow's Rust Implementation, we have successfully created four minor releases on the 4.x.x line, one every other week, without any reports of breakage.
You can always find the latest releases on crates.io: arrow, parquet, arrow-flight, and parquet-derive.
DataFusion is an in-memory query engine with DataFrame and SQL APIs, built on top of Arrow. Ballista is a distributed compute platform. These projects are now in their own repository, and are no longer released in lock-step with Arrow. Expect further news in this area soon.
Here are some of the initiatives that contributors are currently working on for future releases:
unsafe to make arrow faster and more secure -- see the mailing list discussion for more details.
Again, thank you to all the contributors for this release. Here is the raw git listing:
28 Jorge Leitao
27 Andrew Lamb
15 Jiayu Liu
12 Ritchie Vink
10 Wakahisa
8 Raphael Taylor-Davies
6 Daniël Heres
5 Andy Grove
5 Navin
5 Jörn Horstmann
4 Ádám Lippai
4 Dominik Moritz
4 Marco Neumann
3 Roee Shlomo
3 Michael Edwards
2 Steven
2 Krisztián Szűcs
2 Gary Pennington
1 Ben Chambers
1 Max Meldrum
1 Edd Robinson
1 Gang Liao
1 Chojan Shang
1 Boaz
1 Wes McKinney
1 Yordan Pavlov
1 baishen
1 hulunbier
1 kazuhiko kikuchi
1 Dmitry Patsura
1 Kornelijus Survila
1 Laurent Mazare
1 Manish Gill
1 Marc van Heerden
If you are interested in contributing to the Rust implementation of Apache Arrow, we would love to have you! You can help by trying out Arrow on some of your own data and projects and filing bug reports, or by contributing to the documentation, tests, or code. A list of open issues suitable for beginners is here and the full list is here.