The Apache Arrow team is pleased to announce the 0.12.0 release. This is the largest release yet in the project, covering 3 months of development work and including 614 resolved issues from 77 distinct contributors.
See the Install Page to learn how to get the libraries for your platform. The complete changelog is also available.
It’s a huge release, but we’ll give some brief highlights and news from the project to help guide you to the parts of the project that may be of interest.
The Arrow team is growing! Since the 0.11.0 release we have added 3 new committers:
We are also pleased to announce that Krisztián Szűcs has been promoted from committer to PMC (Project Management Committee) member.
Thank you for all your contributions!
Since the last release, we have received 3 code donations into the Apache project.
We are excited to continue to grow the Apache Arrow development community.
Since the last release, we have merged the Python and C++ documentation to create a combined project-wide documentation site: https://arrow.apache.org/docs. There is now some prose documentation about many parts of the C++ library. We intend to keep adding documentation for other parts of Apache Arrow to this site.
We have started providing official APT and Yum repositories for C++ and GLib (C). See the install document for details.
Much of the C++ development work over the last 3 months concerned internal code refactoring and performance improvements. Some user-visible highlights of note:
Since the LLVM-based Gandiva expression compiler was donated to Apache Arrow during the last release cycle, development there has been moving along. We expect to have Windows support for Gandiva and to ship this in downstream packages (like Python) in the 0.13 release time frame.
The Arrow Go development team has been expanding. The Go library has gained support for many missing features from the columnar format as well as semantic constructs like chunked arrays and tables that are used heavily in the C++ project.
Development of the GLib-based C bindings and corresponding Ruby interfaces has advanced in lock-step with the C++, Python, and R libraries. In this release, there are many new features in C and Ruby:
We fixed a ton of bugs and made many improvements throughout the Python project. Some highlights from the Python side include:
The pyarrow.input_stream and pyarrow.output_stream functions support read and write buffering. This is analogous to BufferedIOBase from the Python standard library, but the internals are implemented natively in C++.
The R library made huge progress in 0.12, with work led by new committer Romain Francois. The R project's features are not far behind the Python library, and we are hoping to be able to make the R library available to CRAN users for use with Apache Spark or for reading and writing Parquet files over the next quarter.
Users of the feather R library will see significant speed increases in many cases when reading Feather files with the new Arrow R library.
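As an aside on the Python buffering note above: the following is a minimal sketch of what read and write buffering buys you, using only the standard library's io.BufferedWriter (a BufferedIOBase implementation) rather than pyarrow itself. The CountingRaw class and the write_chunks helper are hypothetical names introduced for illustration. The sketch does not show how the natively implemented pyarrow streams work internally. It only demonstrates the general idea that a buffer coalesces many small writes into fewer writes against the underlying sink.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw sink that records how many times write() is called."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        # b may be bytes or a memoryview, depending on the caller.
        self.calls += 1
        self.data += bytes(b)
        return len(b)


def write_chunks(buffered, n_chunks=100, chunk=b"x" * 10):
    """Write n_chunks small chunks, optionally through a 4 KiB buffer.

    Returns the number of writes that reached the raw sink and the bytes stored.
    """
    raw = CountingRaw()
    sink = io.BufferedWriter(raw, buffer_size=4096) if buffered else raw
    for _ in range(n_chunks):
        sink.write(chunk)
    sink.flush()  # push any bytes still held in the buffer down to the raw sink
    return raw.calls, bytes(raw.data)


unbuffered_calls, unbuffered_data = write_chunks(buffered=False)
buffered_calls, buffered_data = write_chunks(buffered=True)
print("raw writes without buffering:", unbuffered_calls)
print("raw writes with buffering:", buffered_calls)
```

Both paths store the same bytes, but the buffered path should reach the raw sink far fewer times, which is the benefit buffering is meant to provide when each underlying write is expensive.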
Rust development has been active over the last 3 months; see the changelog for details.
A native Rust implementation was just donated to the project, and the community intends to provide a similar level of functionality for reading and writing Parquet files using the Arrow in-memory columnar format as an intermediary.
Apache Arrow has become a large, diverse open source project. It is now being used in dozens of downstream open source and commercial projects. Work will be proceeding in many areas in 2019:
It promises to be an exciting 2019. We look forward to having you involved in the development community.