With the recent release of version 32.0.0 of the Rust implementation of Apache Arrow, it seemed timely to highlight some of the community's work since the last update.
The most recent list of detailed changes can always be found in the CHANGELOG, with the full historical list available here.
arrow and arrow-flight are native Rust implementations of Apache Arrow. Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
The Rust language offers best-in-class performance, memory safety, and the developer productivity of a modern programming language. These features make Rust an excellent choice for building modern high-performance analytical systems. When combined, Rust and the Apache Arrow ecosystem are a compelling toolkit for building the next generation of systems.
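To make the columnar layout and zero-copy reads described above more concrete, here is a minimal sketch in plain Rust using only the standard library. It is not the arrow crate's API: the `Int32Column` type and its methods are hypothetical names chosen for illustration, and the validity buffer is a simple `bool` slice, whereas Arrow uses a packed bitmap. The point it shows is that a slice of a column can share the same underlying buffers and adjust only an offset and a length.

```rust
use std::sync::Arc;

/// A simplified, hypothetical column of nullable 32-bit integers.
/// Values live in one contiguous buffer; nulls are tracked separately.
#[derive(Clone, Debug)]
struct Int32Column {
    values: Arc<[i32]>,    // shared, contiguous value buffer
    validity: Arc<[bool]>, // true where the slot holds a value
    offset: usize,         // first logical row within the buffers
    len: usize,            // number of logical rows
}

impl Int32Column {
    /// Build a column from optional values; null slots store a placeholder 0.
    fn from_options(data: &[Option<i32>]) -> Self {
        let values: Vec<i32> = data.iter().map(|v| v.unwrap_or(0)).collect();
        let validity: Vec<bool> = data.iter().map(|v| v.is_some()).collect();
        Self {
            values: values.into(),
            validity: validity.into(),
            offset: 0,
            len: data.len(),
        }
    }

    /// Zero-copy slice: only offset and length change, the buffers are shared.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            values: Arc::clone(&self.values),
            validity: Arc::clone(&self.validity),
            offset: self.offset + offset,
            len,
        }
    }

    /// Read one logical row, returning None for a null slot.
    fn get(&self, i: usize) -> Option<i32> {
        assert!(i < self.len, "index out of bounds");
        let idx = self.offset + i;
        if self.validity[idx] {
            Some(self.values[idx])
        } else {
            None
        }
    }

    /// Sum of the non-null values, scanning the contiguous buffer.
    fn sum(&self) -> i64 {
        (self.offset..self.offset + self.len)
            .filter(|&i| self.validity[i])
            .map(|i| self.values[i] as i64)
            .sum()
    }
}

fn main() {
    let column = Int32Column::from_options(&[Some(1), None, Some(3), Some(4)]);
    let tail = column.slice(2, 2);

    // Both columns point at the same allocation; nothing was copied.
    assert!(Arc::ptr_eq(&column.values, &tail.values));
    assert_eq!(tail.get(0), Some(3));

    println!("sum = {}, tail sum = {}", column.sum(), tail.sum());
}
```

The real Arrow arrays follow the same general idea of shared, reference-counted buffers plus an offset and length, but with many more data types and a precisely specified memory layout.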
The repository recently passed 1400 stars on GitHub, and the community has been focused on performance and feature completeness.
Major Highlights
The arrow crate has been split into multiple smaller crates, and large kernels have been moved behind optional feature flags. These changes allow downstream projects to choose a smaller dependency footprint and shorter build times, if desired.
into_builder methods
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. The Apache Parquet implementation in Rust is one of the fastest and most sophisticated open source implementations available.
Major Highlights
Modern analytic workloads increasingly make use of blob storage facilities, such as S3, to store large volumes of queryable data. A native Rust object storage implementation that works well with the Rust ecosystem in general, and with the Arrow IO abstractions in particular, is an important building block for many applications. The object_store crate was donated to the Apache Arrow project in July 2022 to fill this need, and while it follows a separate release schedule from the arrow and parquet crates, it forms an integral part of the overarching Arrow IO story.
Recent improvements include the following:
Major Highlights
While some open source software can be created mostly by a single contributor, we believe the greatest software with the largest impact and reach is built around its community. Thus, Arrow is proud to be part of the Apache Software Foundation, and our releases, both past and present, are a result of our amazing community's effort.
We would like to thank everyone who has contributed to the arrow-rs repository since the 16.0.0 release. Keep up the great work, and we look forward to continued improvements:
% git shortlog -sn 16.0.0..32.0.0
   347  Raphael Taylor-Davies
   166  Liang-Chi Hsieh
    94  Andrew Lamb
    36  Remzi Yang
    30  Kun Liu
    21  Yang Jiang
    20  askoa
    17  dependabot[bot]
    15  Vrishabh
    12  Dan Harris
    12  Wei-Ting Kuo
    11  Daniël Heres
    11  Jörn Horstmann
     9  Brent Gardner
     9  Ian Alexander Joiner
     9  Jiayu Liu
     9  Martin Grigorov
     8  Palladium
     7  Jeffrey
     7  Marco Neumann
     6  Robert Pack
     6  Will Jones
     4  Andy Grove
     4  comphead
     3  Adrián Gallego Castellanos
     3  Markus Westerlind
     3  Quentin
     2  Alex Qyoun-ae
     2  Dmitry Patsura
     2  Frank
     2  Jiacai Liu
     2  Marc Garcia
     2  Marko Grujic
     2  Max Burke
     2  Your friendly neighborhood geek
     2  sachin agarwal
     1  Aarash Heydari
     1  Adam Gutglick
     1  Andrey Frolov
     1  Anthony Poncet
     1  Artjoms Iskovs
     1  Ben Kimock
     1  Brian Phillips
     1  Carol (Nichols || Goulding)
     1  Christian Salvati
     1  Dalton Modlin
     1  Daniel Martinez Maqueda
     1  Daniel Poelzleithner
     1  Davis Silverman
     1  Dhruv Vats
     1  Fabio Silva
     1  GeauxEric
     1  George Andronchik
     1  Ismail-Maj
     1  Ismaël Mejía
     1  JanKaul
     1  JasonLi
     1  Javier Goday
     1  Jayjeet Chakraborty
     1  Jean-Charles Campagne
     1  Jie Han
     1  John Hughes
     1  Jon Mease
     1  Kevin Lim
     1  Kohei Suzuki
     1  Konstantin Fastov
     1  Marius S
     1  Masato Kato
     1  Matthijs Brobbel
     1  Michael Edwards
     1  Pier-Olivier Thibault
     1  Remco Verhoef
     1  Rutvik Patel
     1  Sean Smith
     1  Sid
     1  Stanislav Lukeš
     1  Steve Vaughan
     1  Stuart Carnie
     1  Sumit
     1  Trent Feda
     1  Valeriy V. Vorotyntsev
     1  Wenjun L
     1  X
     1  aksharau
     1  bmmeijers
     1  chunshao.rcs
     1  jakevin
     1  kastolars
     1  nvartolomei
     1  xudong.w
     1  哇呜哇呜呀咦耶
     1  尹吉峰
If you are interested in contributing to the Rust subproject in Apache Arrow, we encourage you to try out Arrow on some of your data, help improve the documentation, or submit a PR. You can find a list of open issues suitable for beginners here and the full list here.